Tamr Documentation

Bulk Matching External Records

Temporarily store a large number of incoming or external records (from tens of thousands to millions) in a mastering project until the match job is complete, and identify matches between the incoming records and existing records or clusters.

Record and Cluster Matching

Incoming records are matched against existing records or clusters in a mastering project.

Records: Tamr compares each incoming record against existing records in the unified dataset to determine any matching records. For each record that finds a match, the response includes the matching record id pair and the confidence score.

Clusters: Tamr compares each incoming record against existing clusters to find any matching clusters. For each record that finds a match, the response includes the matching record id and cluster id pair, and the average confidence score.

Checklist before proceeding

Typically, after you make changes that affect clusters, you publish the clusters and then use a matching service. If there are no changes (that is, clusters have not been published since the last time you used a matching service), -1 is returned.

Asynchronous Operation

Asynchronous operation to bulk match external or incoming records to a mastering project.

  1. Bulk Match: POST /projects/{project}:bulkMatch
    Submit records in a single batch for matching, where {project} is the name of the Mastering project, and type is one of the keywords records or clusters. Additionally, capture the id of the submitted job from the response.
  2. Wait For Job: GET /operations/{operationId}
    Poll the status of the job submitted in Step 1, using the captured id as {operationId}, until the status SUCCEEDED is received.
  3. Export Results: GET /projects/{project}/results/{operationId}
    Export the materialized bulk record match results dataset, using the id captured in Step 1 as {operationId}.
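
The three steps above can be sketched in Python as a submit, poll, export loop. This is a minimal illustration under stated assumptions, not a reference client: the base URL is hypothetical, authentication is omitted, and the JSON shapes (a request body sent as a JSON array, an "id" field in the submit response, a top-level "status" field in the operation response, and the FAILED and CANCELED terminal states) are assumptions to check against the API spec for your Tamr version.

```python
import json
import time
import urllib.request

# Hypothetical base URL; replace with the API root of your Tamr instance.
BASE_URL = "https://tamr.example.com/api"


def send_json(method, url, body=None, headers=None):
    """Send one HTTP request and parse the response body as JSON (assumed format)."""
    request = urllib.request.Request(url, data=body, method=method, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def bulk_match_async(project, records, send=send_json, poll_seconds=5.0, max_polls=120):
    """Submit a batch, wait for the job to succeed, then export the results."""
    # Step 1: submit the batch and capture the job id.
    # Assumption: the body is a JSON array and the response has an "id" field.
    payload = json.dumps(records).encode("utf-8")
    job = send(
        "POST",
        f"{BASE_URL}/projects/{project}:bulkMatch",
        payload,
        {"Content-Type": "application/json"},
    )
    operation_id = job["id"]

    # Step 2: poll the operation until its status is SUCCEEDED.
    for _ in range(max_polls):
        operation = send("GET", f"{BASE_URL}/operations/{operation_id}")
        status = operation["status"]  # assumed top-level field
        if status == "SUCCEEDED":
            break
        if status in ("FAILED", "CANCELED"):  # assumed terminal states
            raise RuntimeError(f"bulk match job {operation_id} ended as {status}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"bulk match job {operation_id} did not finish in time")

    # Step 3: export the results dataset for the finished job.
    return send("GET", f"{BASE_URL}/projects/{project}/results/{operation_id}")
```

The `send` parameter exists only so the HTTP layer can be swapped out, for example to add credentials or to test the polling logic without a running server.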

Synchronous Operation

Synchronous operation to bulk match external or incoming records to a mastering project.

  1. Bulk Match: POST /dedup/match/records/{name}
    Submit records in a stream for matching against existing records, where {name} is the name of the unified dataset of a Mastering project.

OR

  1. Bulk Match: POST /dedup/match/clusters/{name}
    Submit records in a stream for matching against existing clusters, where {name} is the name of the unified dataset of a Mastering project.
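
The two synchronous endpoints above differ only in the path segment (records or clusters), so a single helper can serve both. The sketch below is illustrative only: the base URL is hypothetical, authentication is omitted, and sending one JSON object per line is an assumed stream format that should be confirmed against the API spec. For simplicity it buffers the encoded records in memory before sending; a client handling millions of records would send them in chunks instead.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the API root of your Tamr instance.
BASE_URL = "https://tamr.example.com/api"


def match_url(target, unified_dataset):
    """Build the synchronous match URL for 'records' or 'clusters'."""
    if target not in ("records", "clusters"):
        raise ValueError("target must be 'records' or 'clusters'")
    # Percent-encode the dataset name so spaces and slashes are safe in the path.
    name = urllib.parse.quote(unified_dataset, safe="")
    return f"{BASE_URL}/dedup/match/{target}/{name}"


def encode_records(records):
    """Yield each record as one line of JSON (assumed stream format)."""
    for record in records:
        yield (json.dumps(record) + "\n").encode("utf-8")


def match_sync(target, unified_dataset, records):
    """POST the records and return the raw response text."""
    request = urllib.request.Request(
        match_url(target, unified_dataset),
        data=b"".join(encode_records(records)),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call such as `match_sync("clusters", "my_unified_dataset", incoming)` would target the cluster endpoint, where `my_unified_dataset` and `incoming` are placeholder names.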



