Working with Record Pairs
Understand record matching, and learn how to curate and review high-impact record pairs in a mastering project.

A mastering project allows you to find similar records by generating record pairs from all datasets mapped to the unified schema. A record pair is defined as two records that form a potential match.
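To make the idea of a record pair concrete, here is a minimal sketch that enumerates every unordered pair of distinct records in a small unified dataset. The record fields and the `all_record_pairs` helper are hypothetical illustrations, not part of Tamr.

```python
from itertools import combinations

# Hypothetical unified records: each dict is one record mapped to the unified schema.
records = [
    {"id": "a1", "source": "crm", "name": "Acme Corp"},
    {"id": "b7", "source": "erp", "name": "ACME Corporation"},
    {"id": "c3", "source": "crm", "name": "Globex"},
]

def all_record_pairs(records):
    """Return every unordered pair of distinct records as (id, id) tuples."""
    return [(r1["id"], r2["id"]) for r1, r2 in combinations(records, 2)]

pairs = all_record_pairs(records)
print(pairs)  # [('a1', 'b7'), ('a1', 'c3'), ('b7', 'c3')]
```

The number of such pairs grows quadratically with the number of records, which is why a blocking model is used to keep only the pairs that are plausible matches.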
Generating Record Pairs
Blocking Model
Once the unified dataset is created, the next step is to create a blocking model, which generates record pairs that are potential matches. See Managing Record Pairs.
The goal of the blocking model is to help you efficiently identify matching record pairs. A blocking model is composed of one or more blocking terms, which define matching conditions on unified attribute values. See Adding a Blocking Term.
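The following sketch models a blocking term as a unified attribute plus a tokenizer, a similarity function, and a threshold, and treats a blocking model as a list of such terms. Everything here is a hypothetical illustration: the `BlockingTerm` class, the whitespace tokenizer, and the Jaccard similarity are stand-ins, and keeping a pair when *any* term matches is an assumed combination rule, not a description of how Tamr combines terms. For clarity it compares every pair; a real implementation would use an index to avoid that quadratic scan.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable

def word_tokens(value: str) -> set[str]:
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return set(value.lower().split())

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

@dataclass
class BlockingTerm:
    """Hypothetical blocking term: a matching condition on one unified attribute."""
    attribute: str
    tokenizer: Callable[[str], set[str]]
    similarity: Callable[[set[str], set[str]], float]
    threshold: float

    def matches(self, r1: dict, r2: dict) -> bool:
        t1 = self.tokenizer(r1.get(self.attribute, ""))
        t2 = self.tokenizer(r2.get(self.attribute, ""))
        return self.similarity(t1, t2) >= self.threshold

def block(records: list[dict], terms: list[BlockingTerm]) -> list[tuple[str, str]]:
    """Keep a pair if ANY term matches (the OR rule is an assumption)."""
    return [
        (r1["id"], r2["id"])
        for r1, r2 in combinations(records, 2)
        if any(term.matches(r1, r2) for term in terms)
    ]

records = [
    {"id": "a1", "name": "Acme Corp", "city": "Boston"},
    {"id": "b7", "name": "Acme Corporation", "city": "Boston"},
    {"id": "c3", "name": "Globex", "city": "Springfield"},
]
model = [BlockingTerm("name", word_tokens, jaccard, 0.3)]
print(block(records, model))  # [('a1', 'b7')]
```

Raising the threshold or choosing a stricter tokenizer produces fewer pairs; lowering it produces more, which is the trade-off the pair-count estimate helps you judge.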
Estimating Pair Counts
When creating a blocking model, it helps to estimate the number of record pairs that Tamr will generate. For example, for a real-world dataset of 1M customer account records, Tamr can typically find 50M potentially matching record pairs.
Estimating record pair counts allows you to iterate quickly when discovering blocking terms and to see the effect of adjusting thresholds, tokenizers, and similarity functions.
See Estimating Pair Counts.
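As a rough sense of scale, 1M records have n(n-1)/2 ≈ 5 × 10^11 possible pairs, so 50M candidate pairs is about 0.01% of them. The sketch below computes that total and shows one generic way to estimate the output of a hypothetical "share at least one token" blocking term without generating any pairs: group records by token and sum k(k-1)/2 over the groups. This is an illustrative technique, not Tamr's estimation method, and the helper names are hypothetical.

```python
from collections import defaultdict
from math import comb

def total_pairs(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return comb(n, 2)

def word_tokens(value: str) -> set[str]:
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return set(value.lower().split())

def estimate_pairs(values, tokenizer) -> int:
    """Upper-bound pair count for a hypothetical shared-token blocking term.

    Records are grouped by token; a group of k records yields k*(k-1)/2 pairs.
    A pair that shares several tokens is counted once per shared token, so
    the sum can over-count but never under-counts.
    """
    block_sizes = defaultdict(int)
    for value in values:
        for token in tokenizer(value):
            block_sizes[token] += 1
    return sum(comb(k, 2) for k in block_sizes.values())

print(total_pairs(1_000_000))  # 499999500000
names = ["Acme Corp", "Acme Corporation", "Acme Inc", "Globex"]
print(estimate_pairs(names, word_tokens))  # 3 (out of 6 possible pairs)
```

Because the estimate only counts tokens, it runs in roughly linear time, which is what makes quick iteration over tokenizers and thresholds practical.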
Curating and Reviewing Record Pairs
Initially, the Admin and/or Curator select a handful of arbitrary record pairs and classify them as Match or No Match. These initial labels give Tamr the first feedback it needs to begin learning, and allow the Curator to initialize the entity resolution model (see Updating Mastering Results).
Once Tamr runs the model, it identifies high-impact record pairs for review. You can assign high-impact record pairs to Reviewers, who classify them as Match or No Match. Curators can then verify the Reviewers' classifications.
The iterative curation of high-impact record pairs allows Tamr to accurately classify all record pairs as a match or non-match.
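This document does not say how Tamr decides which pairs are high impact. As an assumption-labeled illustration only, the sketch below uses uncertainty sampling, a common active-learning heuristic: given a hypothetical model's match probability for each pair, it selects the pairs whose probability is closest to 0.5, where a human label is most informative. The `select_for_review` function and the scores are hypothetical.

```python
def select_for_review(scored_pairs, k):
    """Return the k pair ids whose match probability is closest to 0.5.

    scored_pairs: list of (pair_id, probability) tuples from a hypothetical
    model. Uncertainty sampling is used here as a stand-in for "high impact".
    """
    ranked = sorted(scored_pairs, key=lambda item: abs(item[1] - 0.5))
    return [pair_id for pair_id, _ in ranked[:k]]

scored = [("p1", 0.97), ("p2", 0.52), ("p3", 0.08), ("p4", 0.41), ("p5", 0.63)]
print(select_for_review(scored, 2))  # ['p2', 'p4']
```

In an iterative loop, the labels collected for the selected pairs would be added to the training data and the model re-run, so that each round of review targets the pairs the model is least sure about.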
Updated over 4 years ago