Bulk Matching External Records
Temporarily store a large number of incoming or external records (from tens of thousands to millions) in a mastering project until the match job completes, and identify matches between the incoming records and either existing records or existing clusters.
Matching Records and Clusters
Incoming records are matched against existing records or clusters in a mastering project.
- Records: Tamr Core compares each incoming record against existing records in the unified dataset to find any matching records. For each incoming record that has a match, the response includes the pair of matching record IDs and the confidence score.
- Clusters: Tamr Core compares each incoming record against existing clusters to find any matching clusters. For each incoming record that has a match, the response includes the record ID and cluster ID pair and the average confidence score.
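To make the two result shapes concrete, the following Python sketch models them as small data classes. The class and field names (`RecordMatch`, `ClusterMatch`, `incoming_id`, and so on) are hypothetical and are not the actual response schema, which is described in the Swagger API documentation. The helper `to_cluster_matches` also assumes one plausible reading of "average confidence score": the mean of the pairwise scores between an incoming record and the members of a cluster it matched. Tamr Core may compute this value differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RecordMatch:
    """Hypothetical record-level result: incoming record, existing record, score."""
    incoming_id: str
    existing_id: str
    score: float


@dataclass(frozen=True)
class ClusterMatch:
    """Hypothetical cluster-level result: incoming record, cluster, average score."""
    incoming_id: str
    cluster_id: str
    avg_score: float


def to_cluster_matches(
    record_matches: List[RecordMatch], cluster_of: Dict[str, str]
) -> List[ClusterMatch]:
    """Roll record matches up to clusters by averaging scores (assumed rule)."""
    scores: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for m in record_matches:
        cluster_id = cluster_of.get(m.existing_id)
        if cluster_id is None:
            continue  # existing record is not in any known cluster; skip it
        scores[(m.incoming_id, cluster_id)].append(m.score)
    return [
        ClusterMatch(incoming, cluster, mean(values))
        for (incoming, cluster), values in scores.items()
    ]
```

For example, an incoming record that scores 0.875 and 0.625 against two members of cluster `c-A` and 0.5 against one member of `c-B` yields average scores of 0.75 and 0.5 respectively.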
Before You Begin
Verify the following before completing the procedures in this topic:
- At least one mastering project exists (Creating a Project).
- The project includes at least one dataset, and you have performed schema mapping on the project’s unified dataset (Adding a Dataset).
- You have run the Generate Record Pairs and the Update Results jobs at least once.
Typically, after you make changes that affect clusters, you publish the clusters and then use a matching service. If there are no changes (that is, clusters have not been published since the last time you used a matching service), -1 is returned.
Asynchronous Operation - Bulk Matching External Records
To run asynchronous bulk matching:
1. Bulk Match: POST /projects/{project}:bulkMatch
   Submit records in a single batch for matching, where {project} is the name of the Mastering project and type is either of the keywords records or clusters. Capture the id of the submitted job from the response.
2. Wait For Job: GET /v1/operations/{operationId}
   Poll the status of the job submitted in Step 1, passing the captured {id} as {operationId}, until the status SUCCEEDED is returned.
3. Export Results: GET /projects/{project}/results/{operationId}
   Export the updated bulk record match results dataset using the captured {id} from Step 1.
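The three asynchronous steps can be sketched in Python using only the standard library. This is a hedged sketch, not an official client: the base URL, the newline-delimited JSON request body, passing type as a query parameter, the terminal state names other than SUCCEEDED, and the `status.state` field in the operation response are all assumptions to check against the Swagger API documentation, and authentication is omitted. The HTTP call is injected through the hypothetical `send` parameter so the polling logic can be exercised without a server.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical base URL: replace with your instance's host, port, and API prefix.
BASE_URL = "http://localhost:9100/api/versioned"

# A transport takes (method, url, body) and returns the response body as text.
Transport = Callable[[str, str, Optional[bytes]], str]


def urllib_transport(method: str, url: str, body: Optional[bytes]) -> str:
    """Send one HTTP request with the standard library (authentication omitted)."""
    req = urllib.request.Request(
        url, data=body, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def bulk_match_async(
    project: str,
    records: Iterable[dict],
    match_type: str,
    send: Transport = urllib_transport,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Submit a batch, wait for the operation, and return the exported results."""
    if match_type not in ("records", "clusters"):
        raise ValueError("match_type must be 'records' or 'clusters'")
    project_q = urllib.parse.quote(project, safe="")

    # Step 1: submit every record in one request. Assumptions: the body is
    # newline-delimited JSON and `type` travels as a query parameter.
    body = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    submit_url = f"{BASE_URL}/projects/{project_q}:bulkMatch?type={match_type}"
    op_id = json.loads(send("POST", submit_url, body))["id"]

    # Step 2: poll the operation. Assumption: its state is at status.state.
    deadline = time.monotonic() + timeout_seconds
    while True:
        op = json.loads(send("GET", f"{BASE_URL}/v1/operations/{op_id}", None))
        state = op["status"]["state"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"bulk match operation {op_id} ended as {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"bulk match operation {op_id} still {state}")
        time.sleep(poll_seconds)

    # Step 3: export the results dataset for the captured operation id.
    return send("GET", f"{BASE_URL}/projects/{project_q}/results/{op_id}", None)
```

Polling against a deadline and treating failure states as terminal keeps the loop from spinning forever if the job never reaches SUCCEEDED.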
Synchronous Operation - Bulk Matching External Records
To run synchronous bulk matching, use either of the following endpoints:
- Bulk Match Records: POST /dedup/match/records/{name}
  Submit records in a stream for matching against existing records, where {name} is the name of the unified dataset of a Mastering project.
- Bulk Match Clusters: POST /dedup/match/clusters/{name}
  Submit records in a stream for matching against existing clusters, where {name} is the name of the unified dataset of a Mastering project.
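For the synchronous endpoints, records can be streamed rather than buffered in memory. The Python sketch below assumes a hypothetical base URL, newline-delimited JSON in both the request and the response, and no authentication; none of these details come from the endpoint reference. It sends the records with chunked transfer encoding by passing a generator as the request body to `urllib.request`.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, Iterator

# Hypothetical base URL: replace with your instance's host, port, and API prefix.
BASE_URL = "http://localhost:9100/api"


def match_endpoint(unified_dataset: str, target: str) -> str:
    """Build the synchronous match URL for target 'records' or 'clusters'."""
    if target not in ("records", "clusters"):
        raise ValueError("target must be 'records' or 'clusters'")
    name = urllib.parse.quote(unified_dataset, safe="")
    return f"{BASE_URL}/dedup/match/{target}/{name}"


def ndjson_chunks(records: Iterable[dict]) -> Iterator[bytes]:
    """Lazily yield one newline-terminated JSON document per record (assumed format)."""
    for record in records:
        yield (json.dumps(record) + "\n").encode("utf-8")


def match_stream(unified_dataset: str, target: str, records: Iterable[dict]) -> Iterator[dict]:
    """POST records as a chunked stream and yield each parsed response line."""
    req = urllib.request.Request(
        match_endpoint(unified_dataset, target),
        data=ndjson_chunks(records),  # an iterable body is sent chunked
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # assumption: one JSON match result per line
            if line.strip():
                yield json.loads(line)
```

Using a generator on both sides keeps memory use flat regardless of how many records are submitted, which matters at the tens-of-thousands-to-millions scale this page describes.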
For more information on these endpoints, see the Swagger API documentation installed with your Tamr Core instance.