Trade Data Sources Map — Replicating ImportGenius

Canonical reference catalog of data sources that can replicate ImportGenius’s capabilities. Durable cross-project knowledge — applies to Frank’s Renso engagement, Renso Hub vision, and diligence work across companies.

See also [[renso-trade-intelligence-stack]] (where this gets applied), [[renso-engagement-frank-cto]] (Phase 3 deliverables this powers), [[Financial Data Sources]] (complementary macro/finance sources), [[Canadian Government Data Indexing]] (StatCan and Canadian API patterns).

(source: open-web research synthesis, 2026-05-14)


Section 1: The Core Insight — What ImportGenius Actually Is

ImportGenius is not a proprietary data collection. It is primarily:

  1. US CBP AMS Vessel Cargo Manifests — legally public per 19 CFR §103.31 — the spine of the product
  2. ~12 Latin American customs systems publishing full Bill of Lading (B/L) data at the shipment level
  3. Asian customs data sourced through customs broker networks (not government portals — these governments do not publish B/L data publicly)
  4. Entity resolution + HS classification + UI layered on top of raw manifest data

Replication is feasible because (1) and (2) are statutorily public. The hard part is not access — it is the entity resolution and consistent HS normalization that ImportGenius has invested in over years. The $0 coverage gap (Section 11) is real but bounded.

(source: open-web research synthesis, 2026-05-14)


Section 2: Primary Manifest Data — Pricing Tier Table

Commercial aggregators that sit on top of the primary manifest data:

| Source | Coverage | URL | Cost |
| --- | --- | --- | --- |
| ImportYeti | US AMS Jan 2015+ | importyeti.com (beta API: data.importyeti.com) | Free OSINT tier / ~$10/mo |
| OEC Bulk BoL | US AMS Jan 2021–Feb 2026 | oec.world/en/resources/bulk-download/bill-of-lading | Premium (gated) |
| ImportInfo | US AMS Oct 2012+ | importinfo.com | $59–$149/mo |
| ImportGenius | US + 21 countries | importgenius.com | $229–$1,999+/mo |
| Descartes Datamyne | 230 markets daily BOL | datamyne.com | Enterprise (contact) |
| Panjiva / PIERS (S&P Global) | US + 30+ countries; 16M+ records 2023 | spglobal.com | $500–$2,000+/mo est. |
| Trademo | 2.3B+ shipments | trademo.com | $20K+ per purchase |
| Volza | 203 countries | volza.com | $3.50/record API |
| manifestDB | 65 countries (raw CSV) | manifestdb.com | Contact |
| ExportGenius (separate from ImportGenius) | India-focused | exportgenius.in | Custom |
| Seair Exim | 80+ countries, India-strong | seair.co.in | Subscription |
| Eximpedia | 130+ countries | eximpedia.app | Usage-based |
| Tendata | Global, APAC-strong | tendata.com | Subscription |

Decision rule: For the $0 architecture (Section 10), ImportYeti is the primary US AMS source. Peru SUNAT is the primary Latin America mirror source — already implemented in Frank’s Product_Intelligence_Engine.py.

(source: open-web research synthesis, 2026-05-14)


Section 3: Foreign Customs Systems by Country (B/L Level)

Open with Full Shipment Records (the Gold Tier)

These countries publish individual Bill of Lading records publicly or with minimal registration friction.

Peru (SUNAT/Aduanet, aduanet.gob.pe/cl-ad-itconsultadwh/ieITS01Alias): full B/L records with date, FOB value, weight, shipper, consignee, product description, and HS code. Free. Already implemented in Frank’s Product_Intelligence_Engine.py. Moderate CAPTCHA barrier. Richest open Latin America source. Canonical mirror for Vietnam→Canada and China→Canada intelligence. (source: /Users/franknguyen/renso/Product_Intelligence_Engine.py, open-web research synthesis, 2026-05-14)

Colombia (DIAN/MUISCA, dian.gov.co): full B/L data. Free. CAPTCHA-gated. (source: open-web research synthesis, 2026-05-14)

Ecuador (SENAE/ECUAPASS portal): full B/L data. Free. Login + NIT registration may be required. (source: open-web research synthesis, 2026-05-14)

Bolivia (Aduana Nacional): confirmed by ImportGenius coverage. Limited English documentation. (source: open-web research synthesis, 2026-05-14)

Costa Rica (DGA): full B/L confirmed. (source: open-web research synthesis, 2026-05-14)

Panama (ANA): full B/L; Colon Free Zone is a high-volume port (~242K import shipments). (source: open-web research synthesis, 2026-05-14)

Venezuela (SENIAT): listed by ImportGenius; coverage unclear given the country’s operational instability. (source: open-web research synthesis, 2026-05-14)

Uruguay (DNA): confirmed by aggregators; third-party access only. (source: open-web research synthesis, 2026-05-14)

Paraguay: partial coverage; third-party access only. (source: open-web research synthesis, 2026-05-14)

Russia (FCS, eng.customs.gov.ru): live B/L confirmed by ImportGenius (land/sea/air + prices). Sanctions complications make Western pipelines operationally risky. (source: open-web research synthesis, 2026-05-14)

Ukraine (Customs Service): land/sea/air + prices per ImportGenius. (source: open-web research synthesis, 2026-05-14)

Turkey: GTIP codes available via aggregators; source mechanism unclear. (source: open-web research synthesis, 2026-05-14)


Aggregate Only / No Public B/L (Use Mirror Data Instead)

These countries do not publish shipment-level customs records. Intelligence on them must be obtained through the mirror data technique (Section 9) or commercial subscriptions.

| Country | Portal / Authority | Status |
| --- | --- | --- |
| Vietnam | GDVC (customs.gov.vn) | Statistics only |
| China | GACC (english.customs.gov.cn) | Monthly press briefings only |
| Pakistan | FBR/WeBOC | Being replaced; no public B/L |
| Bangladesh | NBR | Aggregates only |
| Sri Lanka | Customs | “Trade info not released to third parties” (official statement) |
| Philippines | Bureau of Customs | No public B/L |
| Indonesia | DJBC | Third-party claims; sources opaque |
| Thailand | Customs Department | Aggregates only |
| India | DGCIS FTDDP | Aggregates only; row-level via broker networks |
| Mexico | ANAM/SAT | Pedimento-level not public; only aggregate statistics; third parties source from customs brokers |
| Chile | SNA | Government gives aggregates; B/L via commercial only |
| Argentina | ARCA (formerly AFIP/DGA) | Partial; regime in flux (Decree 953/2024 replaced AFIP with ARCA) |
| Brazil | Comex Stat (comexstat.mdic.gov.br) | Free API but aggregate HS-8 monthly only (not individual B/L); R package comexr; monthly since 1997; CIF/freight included |

Note on Brazil: Comex Stat is still valuable as a gap-fill for Brazil aggregate flows even though it lacks B/L-level data. Use as sanity-check alongside individual country B/L sources.
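
One way to run that kind of sanity check is to sum shipment-level values per HS-6 prefix and divide by the official aggregate for the same prefix. The sketch below is only illustrative: the row fields (`hs_code`, `value_usd`) and the sample figures are hypothetical, and it assumes both sides have already been aligned to the same period, direction, and valuation basis.

```python
from collections import defaultdict


def coverage_by_hs(shipments, aggregates, hs_digits=6):
    """Share of each official aggregate that the shipment-level rows account for.

    shipments:  iterable of dicts with 'hs_code' (digits only) and 'value_usd'
    aggregates: dict mapping HS prefix -> official total value for the same period
    """
    totals = defaultdict(float)
    for row in shipments:
        totals[row["hs_code"][:hs_digits]] += row["value_usd"]

    report = {}
    for hs, official in aggregates.items():
        # A zero aggregate cannot be used as a denominator; flag it as None.
        report[hs] = totals.get(hs, 0.0) / official if official > 0 else None
    return report


# Hypothetical sample rows, not real trade data.
shipments = [
    {"hs_code": "09011100", "value_usd": 40_000.0},
    {"hs_code": "09011190", "value_usd": 10_000.0},
    {"hs_code": "85171300", "value_usd": 5_000.0},
]
aggregates = {"090111": 100_000.0, "851713": 5_000.0}

print(coverage_by_hs(shipments, aggregates))
```

A ratio far below 1 suggests the scraped B/L set is missing shipments; a ratio well above 1 usually points to duplicates or a unit or currency mismatch.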

(source: open-web research synthesis, 2026-05-14)


Section 4: Vessel & Container Tracking

Free Tier

| Source | Notes | URL |
| --- | --- | --- |
| aisstream.io | Free WebSocket; ~200km coastal; ~300 msg/sec worldwide | aisstream.io |
| AISHub | Free with AIS feed contribution (reciprocal); coastal | aishub.net |
| Maersk Developer | Free; Track and Trace Plus REST endpoint | developer.maersk.com |
| Hapag-Lloyd API | Free with developer registration | api-portal.hlag.com |
| CMA CGM API | Free with developer registration | api-portal.cma-cgm.com |
| Track-Trace.com | Free basic; 50+ carrier lookups | track-trace.com |
| EQUASIS | Free with registration; Port State Control records (not live tracking) | equasis.org |

Key limit: aisstream.io is coastal only (~200km). Real-time satellite AIS (Spire, Kpler) is enterprise tier.

Paid Tier

| Source | Notes |
| --- | --- |
| MarineTraffic / Kpler | Enterprise; removed credit system Jan 2025 |
| VesselFinder | €330 / 10K credits; archive since 2009 |
| Spire Maritime / Pole Star / exactEarth | Enterprise, satellite-class; global coverage |
| FleetMon | Subscription; acquired March 2023 |
| Shipsgo | Credit-based; 160+ carriers; 1 credit per upload, then unlimited subsequent queries |

(source: open-web research synthesis, 2026-05-14)


Section 5: Customs Aggregates (Free APIs)

These are country/HS-level aggregate flows — not shipment B/L records. Use for volume baseline and sanity-checking micro sources. See also [[renso-trade-intelligence-stack]] Layer 1 (Macro).

| Source | What | Access | Rate Limit / Notes |
| --- | --- | --- | --- |
| UN Comtrade Plus | HS-6 bilateral, 200+ countries | comtradedeveloper.un.org + Python lib comtradeapicall | 500 calls/day, 100K records/call |
| US Census International Trade | US HS-level monthly | census.gov/data/developers | No key needed as of March 2025 |
| Eurostat Comext | EU bilateral HS-8 | ec.europa.eu/eurostat (SDMX 2.1 + 3.0) | None stated; CSV bulk back to 1988 |
| Statistics Canada WDS | Canada bilateral monthly | statcan.gc.ca, Table 12-10-0011-01 (monthly), 12-10-0143 (BEC5) | None |
| Brazil Comex Stat | Brazil bilateral HS-8 monthly since 1997 | comexstat.mdic.gov.br; R package comexr | None stated |
| USITC DataWeb | US trade + HTS classification | dataweb.usitc.gov (account required) | None stated |
| World Bank WITS / TRAINS | Trade + tariffs | wits.worldbank.org/API/V1; Python: world_trade_data | No full-DB queries |
| WTO Statistics | Policy + tariffs | stats.wto.org | Yes |
| IMF DOTS | Direction of Trade | imf.org/en/Data | Yes |
| OECD Trade Stats | OECD countries | stats.oecd.org (SDMX) | Yes |
| Vietnam GSO | Vietnam aggregates | gso.gov.vn | No formal API |
| China GACC | Monthly press briefings | english.customs.gov.cn | No API |
| India DGCIS FTDDP | Annual statistical publications | ftddp.dgciskol.gov.in | No API |
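
The UN Comtrade limits in the table (500 calls/day, 100K records/call) turn a backfill into a simple budgeting exercise. The sketch below is a rough planner, not a client: it assumes one query per reporter-year and that a result larger than the record cap has to be split into several calls, both of which are assumptions about how the pull would be organized rather than documented API behavior.

```python
import math

CALLS_PER_DAY = 500         # limit quoted in the table
RECORDS_PER_CALL = 100_000  # limit quoted in the table


def plan_calls(reporters, years, est_records_per_query):
    """Return (total_calls, days_needed) for a reporter x year backfill."""
    # Each query is split into enough calls to stay under the record cap.
    calls_per_query = max(1, math.ceil(est_records_per_query / RECORDS_PER_CALL))
    total_calls = len(reporters) * len(years) * calls_per_query
    days_needed = math.ceil(total_calls / CALLS_PER_DAY)
    return total_calls, days_needed


# Three reporters, ten years, roughly 250K records per reporter-year (a guess).
calls, days = plan_calls(["CAN", "VNM", "CHN"], range(2015, 2025), 250_000)
print(calls, days)  # 90 1
```

Under these assumptions a three-country, ten-year pull fits inside one day's quota, while a pull across 200 reporters at one call each per year would take four days.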

(source: open-web research synthesis, 2026-05-14; source: /Users/franknguyen/renso/Master_Trade_Data_Directory.md)


Section 6: Entity Resolution & Enrichment

Entity resolution is the hard part of replicating ImportGenius — normalizing scraped shipper/consignee strings to canonical legal entities.

| Source | Coverage | Cost |
| --- | --- | --- |
| GLEIF Golden Copy | 4M+ LEIs in 200+ jurisdictions; daily delta | Free bulk download |
| OpenCorporates | 200M companies, 140 jurisdictions; OpenRefine reconciliation API | Free non-commercial; £2,250–£12,000+/yr commercial |
| MRAS / Canadian Business Registries | All provinces, beneficial ownership since Jan 2024 | Free |
| OrgBook BC | BC verifiable credentials | Free API (already used in Frank’s stack) |
| Companies House UK | Full REST API; no key required | Free |
| OFAC SDN + EU + UN sanctions | Bulk XML/CSV; fuzzy search UI | Free |
| USPTO PatentsView | Patent assignments + inventors | Free API at patentsview.org/api |
| FCC ID database | Electronics certifications | Free at fccid.io / fcc.report (already in PIE.py) |
| Wikidata SPARQL | Aliases, identifiers | Free at query.wikidata.org |
| D&B Direct / LSEG World-Check | Enterprise tier | Paid |
| Crunchbase / Dealroom | Startup/VC-backed entities | Free limited / paid |

Recommended resolution pipeline: GLEIF Golden Copy → RapidFuzz or Elasticsearch local index → OpenCorporates reconciliation API → MRAS/OrgBook for Canadian entities → OFAC/EU/UN sanctions screening on every entity resolved.
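
The first two stages of that pipeline (a local registry index plus fuzzy matching) can be sketched as follows. To keep the example dependency-free, the standard library's `difflib.SequenceMatcher` stands in for RapidFuzz; the suffix list, the 0.85 threshold, and the registry entries with their `LEI-EXAMPLE-…` keys are all hypothetical placeholders, not GLEIF data.

```python
import re
from difflib import SequenceMatcher

# Legal-form tokens dropped before comparison (illustrative, far from complete).
LEGAL_SUFFIXES = {"CO", "COMPANY", "LTD", "LIMITED", "INC", "CORP",
                  "CORPORATION", "LLC", "JSC", "SA"}


def normalize(name):
    """Uppercase, strip punctuation, and drop common legal-form tokens."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def best_match(raw_name, registry, threshold=0.85):
    """Return (key, score) of the closest registry entry, or (None, score)."""
    target = normalize(raw_name)
    best_key, best_score = None, 0.0
    for key, legal_name in registry.items():
        score = SequenceMatcher(None, target, normalize(legal_name)).ratio()
        if score > best_score:
            best_key, best_score = key, score
    if best_score < threshold:
        return None, best_score  # too ambiguous; leave for manual review
    return best_key, best_score


# Hypothetical registry standing in for a GLEIF Golden Copy extract.
registry = {
    "LEI-EXAMPLE-0001": "Saigon Example Garment Joint Stock Company",
    "LEI-EXAMPLE-0002": "Hanoi Sample Electronics Co., Ltd.",
}

print(best_match("HANOI SAMPLE ELECTRONICS CO LTD", registry))
```

A linear scan like this is only workable for small registries; at GLEIF scale the same normalize-then-score idea would sit behind the RapidFuzz or Elasticsearch index named in the pipeline. Names that fall below the threshold are better routed to review than force-matched, since a wrong match would contaminate the sanctions-screening step.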

(source: open-web research synthesis, 2026-05-14; source: /Users/franknguyen/renso/Global_Trade_Aggregator_V1_Staff_Engineered.md)


Section 7: Classification (Free-Text → HS Code)

Matching scraped product descriptions to HS codes is the classification layer. The hierarchy is HS-2 (chapter) → HS-4 (heading) → HS-6 (subheading, internationally standard) → HS-8/10 (national tariff lines).

| Source | Notes | URL |
| --- | --- | --- |
| USITC HTS | Full text + CSV/JSON/Excel | hts.usitc.gov; data.gov dataset |
| WCO Trade Tools | HS 2022 legal text free; explanatory notes subscription | wcotradetools.org |
| GitHub: datasets/harmonized-system | Community datapackage | github.com/datasets/harmonized-system |
| GitHub: warrantgroup/WCO-HS-Codes | Raw CSV | github.com/warrantgroup/WCO-HS-Codes |
| WTO HS Tracker | Track HS code changes across versions | hstracker.wto.org |
| CBSA D-series memoranda | Canadian classification guidance | cbsa-asfc.gc.ca |
| LLM batch classifier | Claude Sonnet $3/MTok input; 85–92% accuracy at HS-4 with good prompting | |

Practical note: For the $0 tier, the USITC HTS CSV as a lookup table plus a Claude Sonnet batch classifier for free-text descriptions is sufficient for HS-4 accuracy. HS-6 precision requires the WCO legal text.
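
Because accuracy is quoted per level of the HS hierarchy, it helps to score a classifier's output at each level separately. The sketch below splits a code into chapter, heading, and subheading and computes accuracy at a chosen depth; the function names and the sample prediction pairs are hypothetical, and the pairs are made up for illustration rather than taken from any evaluation.

```python
def hs_levels(code):
    """Split an HS code into its hierarchy levels (dots and spaces ignored)."""
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) < 6:
        raise ValueError(f"need at least 6 digits, got {code!r}")
    return {"chapter": digits[:2], "heading": digits[:4], "subheading": digits[:6]}


def agreement_depth(predicted, reference):
    """Deepest level (0, 2, 4 or 6 digits) at which two codes agree."""
    a, b = hs_levels(predicted), hs_levels(reference)
    depth = 0
    for level, n_digits in (("chapter", 2), ("heading", 4), ("subheading", 6)):
        if a[level] != b[level]:
            break
        depth = n_digits
    return depth


def accuracy_at(pairs, digits=4):
    """Fraction of (predicted, reference) pairs that agree to `digits` digits."""
    hits = sum(1 for pred, ref in pairs if agreement_depth(pred, ref) >= digits)
    return hits / len(pairs)


# Hypothetical (predicted, reference) pairs.
pairs = [
    ("0901.11.00", "0901.21.00"),  # same heading, different subheading
    ("8517.13.00", "8517.13.00"),  # exact match
    ("6109.10.00", "6110.20.20"),  # same chapter only
]
for d in (2, 4, 6):
    print(d, round(accuracy_at(pairs, d), 3))
```

Scoring this way against a hand-labeled sample is how a figure like the HS-4 accuracy range above could be checked on your own product descriptions.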

(source: open-web research synthesis, 2026-05-14)


Section 8: Adjacent Intelligence

Sources adjacent to manifest/customs data that enrich trade intelligence:

| Source | What | URL |
| --- | --- | --- |
| FDA facility registration | Search by country/product/firm | accessdata.fda.gov |
| USDA APHIS/FSIS | Foreign establishment lists | fsis.usda.gov |
| CFIA licensed establishments | Canadian food inspection | inspection.canada.ca |
| China GACC Decree 248 / CIFER | China-approved foreign food exporters; USDA FAS publishes updates | fas.usda.gov |
| EU CBAM | Carbon Border Adjustment declarations (live since Oct 2023) | ec.europa.eu/taxation_customs/cbam |
| gCaptain / MarineLink / Maritime Executive | Free trade press | gcaptain.com / marinelink.com |
| EventRegistry | 200 articles/day free | eventregistry.org |
| Google News RSS | Search-based news feed | news.google.com/rss/search?q= |
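
For the Google News RSS entry, the feed URL is just the search endpoint with a URL-encoded query appended. A minimal sketch, assuming the `https://` scheme and quoting multi-word terms as exact phrases (the helper name is hypothetical):

```python
from urllib.parse import quote_plus

BASE = "https://news.google.com/rss/search?q="  # scheme is an assumption


def news_feed_url(*terms):
    """Build a search-feed URL; multi-word terms are quoted as exact phrases."""
    query = " ".join(f'"{t}"' if " " in t else t for t in terms)
    return BASE + quote_plus(query)


print(news_feed_url("Hai Phong port", "congestion"))
# https://news.google.com/rss/search?q=%22Hai+Phong+port%22+congestion
```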

(source: open-web research synthesis, 2026-05-14)


Section 9: The Mirror Data Technique (Frank’s Core Use Case)

When both the origin and the destination country have closed customs systems, query a third country whose customs records are open and which trades with the closed country of interest. The same shipper or consignee then appears by name in the open country’s records.

The canonical problem: Vietnam → Canada. Canada suppresses vessel manifest data. Vietnam is aggregate-only. Both ends are closed.

For Vietnam (Origin-Closed)

Vietnam → US (via US AMS / ImportYeti) — PRIMARY MIRROR. US is Vietnam’s #1–2 export destination. Virtually any Vietnamese factory shipping to Canada also ships to the US. HIGH signal quality. This is the first query in any Vietnam factory fingerprinting workflow.

Vietnam → Peru (Peru SUNAT — Frank’s PIE.py) — SECONDARY MIRROR. Smaller trade lane; useful for food/consumer goods. Medium signal. Already implemented.

Vietnam → Colombia / Ecuador / Bolivia — TERTIARY MIRRORS. Open B/L systems comparable to Peru’s SUNAT (Section 3). Medium signal. Useful for cross-validation.

For China (Origin-Closed)

China → US (US AMS) — PRIMARY MIRROR. Highest-volume lane in AMS database. HIGH signal quality.

China → Peru / Colombia — SECONDARY MIRRORS. Good for food/consumer goods cross-validation. Medium signal.
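
The mirror priorities above can be kept as data so that every workflow queries sources in the same order. This is a small sketch of that idea; the dictionary layout and the `mirror_plan` helper are hypothetical, while the tiers and signal ratings are copied from the lists above.

```python
# (source, tier, signal) per closed origin, in query order.
MIRRORS = {
    "Vietnam": [
        ("US AMS (ImportYeti)", "primary", "high"),
        ("Peru SUNAT", "secondary", "medium"),
        ("Colombia / Ecuador / Bolivia", "tertiary", "medium"),
    ],
    "China": [
        ("US AMS", "primary", "high"),
        ("Peru / Colombia", "secondary", "medium"),
    ],
}


def mirror_plan(origin, min_signal=None):
    """Return mirror sources for a closed origin, in priority order."""
    if origin not in MIRRORS:
        raise KeyError(f"no mirror sources catalogued for {origin!r}")
    return [
        source
        for source, _tier, signal in MIRRORS[origin]
        if min_signal is None or signal == min_signal
    ]


print(mirror_plan("Vietnam"))
print(mirror_plan("China", min_signal="high"))
```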

Recipe: Fingerprinting a Vietnamese Factory for Its Canadian Customer Base

  1. Query ImportYeti (US AMS) for factory name OR HS code + origin=Vietnam → get US consignees
  2. Cross-reference Peru SUNAT for the same shipper name → get Peruvian consignees
  3. Union the consignee set → factory’s global customer base (US + Latin America proxy)
  4. Resolve all names via GLEIF + OpenCorporates + OrgBook → canonical legal entities
  5. Canadian customers typically cluster near US customers → cross-reference MRAS / Canadian business registries / trade press to identify Canadian counterparts
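
Steps 1 to 4 of the recipe can be sketched as a single union over already-fetched rows. Everything below is illustrative: the row fields (`shipper`, `consignee`), the source labels, and the sample company names are hypothetical, and `resolve` is a stand-in for the GLEIF / OpenCorporates / OrgBook lookup, which defaults to returning the cleaned name unchanged.

```python
def _key(name):
    """Crude comparison key: uppercase, punctuation removed, spaces collapsed."""
    return " ".join(name.upper().replace(".", " ").replace(",", " ").split())


def customer_base(shipper, sources, resolve=lambda name: name):
    """Union consignees of one shipper across sources.

    sources: dict of label -> iterable of rows with 'shipper' and 'consignee'
    Returns: dict of resolved consignee -> set of source labels it was seen in
    """
    wanted = _key(shipper)
    found = {}
    for label, rows in sources.items():
        for row in rows:
            if _key(row["shipper"]) == wanted:
                entity = resolve(_key(row["consignee"]))
                found.setdefault(entity, set()).add(label)
    return found


# Hypothetical rows standing in for ImportYeti and Peru SUNAT query results.
sources = {
    "us_ams": [
        {"shipper": "Example Garment Co., Ltd.", "consignee": "Acme Apparel Inc"},
        {"shipper": "Other Factory", "consignee": "Someone Else LLC"},
    ],
    "peru_sunat": [
        {"shipper": "EXAMPLE GARMENT CO LTD", "consignee": "Importadora Ejemplo S.A.C."},
    ],
}

print(customer_base("Example Garment Co Ltd", sources))
```

Exact matching on a cleaned key will miss spelling variants, so in practice the shipper filter would use the fuzzy matching from Section 6. Step 5 (finding Canadian counterparts) stays a research task and is not automated here.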

(source: open-web research synthesis, 2026-05-14; source: /Users/franknguyen/renso/Product_Intelligence_Engine.py)


Section 10: The $0/Month Replication Architecture

Target: ~80% ImportGenius coverage on Canada–Vietnam, Canada–China, and Latin America trade lanes.

Ingestion Layer:
  - ImportYeti scraper (US AMS since 2015; free OSINT tier)
  - Peru SUNAT scraper (already in PIE.py — aduanet.gob.pe)
  - Colombia DIAN scraper (CAPTCHA bypass needed)
  - Brazil Comex Stat API (aggregate gap-fill — comexstat.mdic.gov.br)
  - UN Comtrade, US Census, StatCan, Eurostat APIs (macro sanity-check)
  - aisstream.io WebSocket (AIS vessel positions — coastal)
  - Maersk / Hapag-Lloyd / CMA CGM developer APIs (container milestones — free)
  - FCC ID scraper (fccid.io — already in PIE.py)

Resolution Layer:
  - GLEIF Golden Copy bulk → indexed in RapidFuzz or Elasticsearch
  - OpenCorporates API (non-commercial tier)
  - OrgBook BC + MRAS for Canadian entities
  - OFAC + EU + UN sanctions screening on every resolved entity

Classification Layer:
  - USITC HTS CSV lookup table (hts.usitc.gov)
  - Claude Sonnet batch classifier for free-text → HS-4 (refinement to HS-6 and HS-10 needs the WCO legal text and national tariff lines; see Section 7)

Storage Layer:
  - Postgres `shipments` canonical table
  - Postgres `entities` LEI-keyed table
  - S3/R2 for raw scrape archives

Query Layer:
  - SQL + Metabase/Superset dashboards
  - Optional: Typesense or Meilisearch for full-text search
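
A minimal sketch of the two storage tables and one query-layer aggregate follows. It uses the standard library's `sqlite3` purely so the example runs anywhere; the architecture above calls for Postgres, and every column name here is an assumption modeled on the fields listed for Peru SUNAT in Section 3. The inserted rows are made-up placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    lei        TEXT PRIMARY KEY,   -- LEI-keyed, per the storage layer
    legal_name TEXT NOT NULL,
    country    TEXT
);
CREATE TABLE shipments (
    id            INTEGER PRIMARY KEY,
    source        TEXT NOT NULL,   -- e.g. 'us_ams', 'peru_sunat'
    arrival_date  TEXT,
    shipper_raw   TEXT,            -- name exactly as scraped
    consignee_raw TEXT,
    shipper_lei   TEXT REFERENCES entities(lei),
    consignee_lei TEXT REFERENCES entities(lei),
    hs_code       TEXT,
    description   TEXT,
    weight_kg     REAL,
    value_usd     REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO entities VALUES (?, ?, ?)",
    ("LEI-EXAMPLE-0001", "Example Garment Co Ltd", "VN"),
)
conn.executemany(
    "INSERT INTO shipments (source, arrival_date, shipper_raw, shipper_lei,"
    " hs_code, value_usd) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("us_ams", "2024-03-01", "EXAMPLE GARMENT CO LTD", "LEI-EXAMPLE-0001", "610910", 10000.0),
        ("peru_sunat", "2024-04-15", "Example Garment Co., Ltd.", "LEI-EXAMPLE-0001", "6109100000", 20000.0),
    ],
)

# Shipments and value per resolved shipper and HS-4 heading, across sources.
rows = conn.execute(
    """
    SELECT e.legal_name, substr(s.hs_code, 1, 4) AS hs4,
           COUNT(*) AS shipments, SUM(s.value_usd) AS total_usd
    FROM shipments AS s
    JOIN entities AS e ON e.lei = s.shipper_lei
    GROUP BY e.legal_name, substr(s.hs_code, 1, 4)
    """
).fetchall()
print(rows)
```

Keeping the raw scraped names alongside the resolved LEI columns means entity resolution can be re-run later without re-scraping.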

(source: open-web research synthesis, 2026-05-14)


Section 11: Coverage Gaps and Limits (Honest)

The $0 architecture MISSES:

  • Vietnam, China, Pakistan, Bangladesh, Indonesia, Thailand, Philippines — no shipment-level data without commercial subscriptions. These markets are sourced through customs broker networks, not government portals. Only Panjiva, Volza, Trademo, Datamyne, Tendata reach there.
  • Real-time satellite AIS — aisstream.io is coastal only (~200km from shore). Spire Maritime / Kpler / exactEarth is enterprise tier. Vessels in open ocean are invisible.
  • Historical data before 2015 — ImportYeti US AMS starts Jan 2015. OEC BoL starts Jan 2021. Earlier US manifest history requires ImportInfo ($59+/mo, back to Oct 2012) or PIERS.
  • AI-powered entity resolution at scale — ImportGenius has invested years in their entity graph. The GLEIF + OpenCorporates + RapidFuzz approach works but requires operational tuning and will have higher false-positive rates on ambiguous company names.

(source: open-web research synthesis, 2026-05-14)


Section 12: Open-Source Prior Art

| Repository | What |
| --- | --- |
| frnsys/trade | Python AMS manifest analysis |
| marcdacosta/ambient-shipping | AIS + BoL correlation utilities |

Both are useful starting points for the ingestion layer. Check for upstream changes before building custom scrapers.

(source: open-web research synthesis, 2026-05-14)


  • [[renso-trade-intelligence-stack]] — where this catalog gets applied: Frank’s Product Intelligence Engine, triangulation architecture, Peru SUNAT implementation, Meta Ad Library tooling
  • [[renso-engagement-frank-cto]] — Phase 3 AI/data intelligence deliverables that this powers
  • [[renso-strategic-vision]] — Renso Hub data-subscription product that these pipelines would feed
  • [[Financial Data Sources]] — macro financial data sources (FRED, EDGAR, Bloomberg) in complementary context
  • [[Canadian Government Data Indexing]] — StatCan API patterns and Canadian government data access