Bulk Matching External Records
Bulk match incoming or external records against existing records or clusters in a mastering project.
Record and Cluster Matching
Incoming records are matched against existing records or clusters in a mastering project.
Records: Tamr compares each incoming record against the existing records in the unified dataset to find matching records. For each incoming record with a match, the response includes the pair of matching record ids and the confidence score.
Clusters: Tamr compares each incoming record against the existing clusters to find matching clusters. For each incoming record with a match, the response includes the record id and cluster id pair and the average confidence score.
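Because one incoming record may match several existing records or clusters, a consumer often wants only the strongest match per incoming record. The sketch below assumes the response has already been parsed into tuples of (incoming id, matched id, confidence). That tuple shape and the function name `best_matches` are hypothetical and are not part of the Tamr response format.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical, already-parsed match: (incoming id, matched record or cluster id, confidence).
Match = Tuple[str, str, float]


def best_matches(matches: Iterable[Match]) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-confidence match for each incoming record id."""
    best: Dict[str, Tuple[str, float]] = {}
    for incoming_id, matched_id, confidence in matches:
        current = best.get(incoming_id)
        # Replace the stored match only when the new one is strictly stronger.
        if current is None or confidence > current[1]:
            best[incoming_id] = (matched_id, confidence)
    return best
```

Incoming records with no match never appear in the input, so they are absent from the returned mapping as well.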
Checklist before proceeding
- At least one mastering project exists (see Creating a Project).
- At least one dataset has been added to the project and schema mapped to the project's unified dataset (see Adding a Dataset).
- The Generate Record Pairs and Update Results jobs have both been run.
Bulk Match and Published Clusters
If clusters have been published, bulk match will be performed against records and clusters from the most recent cluster publication. If clusters have never been published, bulk match will be performed against the current records and clusters.
Synchronous Operation

Synchronous operation to bulk match external or incoming records to a mastering project.
- Bulk Match: POST /dedup/match/records/{name}
Submit records in a stream for matching against existing records, where {name} is the name of the unified dataset of a mastering project.
OR
- Bulk Match: POST /dedup/match/clusters/{name}
Submit records in a stream for matching against existing clusters, where {name} is the name of the unified dataset of a mastering project.
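The two synchronous endpoints above differ only in the records or clusters path segment, so a client can build either path from one helper. The sketch below also serializes records as newline-delimited JSON. That body format is an assumption, as are the helper names `match_path` and `to_json_lines`, so check the API reference for the payload Tamr actually expects.

```python
import json
from typing import Any, Dict, Iterable, Iterator
from urllib.parse import quote


def match_path(target: str, unified_dataset: str) -> str:
    """Build the synchronous bulk match path for 'records' or 'clusters'."""
    if target not in ("records", "clusters"):
        raise ValueError("target must be 'records' or 'clusters'")
    # Percent-encode the dataset name so spaces and slashes are safe in a URL path.
    return f"/dedup/match/{target}/{quote(unified_dataset, safe='')}"


def to_json_lines(records: Iterable[Dict[str, Any]]) -> Iterator[bytes]:
    """Yield one JSON document per line (assumed stream format)."""
    for record in records:
        yield json.dumps(record).encode("utf-8") + b"\n"
```

A generator such as `to_json_lines(...)` can be handed to an HTTP client that accepts an iterable request body, which lets records be sent as they are produced rather than buffered in memory.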
Asynchronous Operation

Asynchronous operation to bulk match external or incoming records to a mastering project.
- Bulk Match: POST /projects/{project}:bulkMatch
Submit records in a single batch for matching, where project is the name of the mastering project and type is either of the keywords records or clusters. Capture the id of the submitted job from the response.
- Wait For Job: GET /operations/{operationId}
Poll the status of the job submitted in Step 1, using the captured id as {operationId}, until the status SUCCEEDED is received.
- Export Results: GET /projects/{project}/results/{operationId}
Export the materialized bulk record match results dataset, using the id captured in Step 1 as {operationId}.
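The three asynchronous steps above can be sketched as a single submit, poll, and export function. This is a minimal illustration under several assumptions: `http` is a hypothetical callable `(method, path, body)` that returns the decoded JSON response, `type` is passed as a query parameter, and the job status is either a plain string or an object with a `state` field. None of these details are confirmed by this page.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def operation_state(operation: Dict[str, Any]) -> Optional[str]:
    """Read the job state; the response layout is an assumption."""
    status = operation.get("status")
    if isinstance(status, dict):
        return status.get("state")
    return status


def bulk_match_async(
    http: Callable[..., Any],          # hypothetical transport: http(method, path, body=None)
    project: str,
    match_type: str,
    records: List[Dict[str, Any]],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    if match_type not in ("records", "clusters"):
        raise ValueError("match_type must be 'records' or 'clusters'")

    # Step 1: submit the batch and capture the id of the submitted job.
    # Passing type as a query parameter is an assumption.
    submitted = http("POST", f"/projects/{project}:bulkMatch?type={match_type}", records)
    operation_id = submitted["id"]

    # Step 2: poll the job until its status is SUCCEEDED, giving up after max_polls.
    for _ in range(max_polls):
        operation = http("GET", f"/operations/{operation_id}")
        if operation_state(operation) == "SUCCEEDED":
            # Step 3: export the materialized results for this job.
            return http("GET", f"/projects/{project}/results/{operation_id}")
        sleep(poll_seconds)

    raise TimeoutError(f"operation {operation_id} did not succeed after {max_polls} polls")
```

Injecting `http` and `sleep` keeps the sketch independent of any particular HTTP library and makes the polling loop easy to exercise without a running server. A production client would also want to stop early on whatever failure states the operations endpoint reports.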