A mastering project allows you to find similar records by generating record pairs from all datasets mapped to the unified schema. A record pair is defined as two records that form a potential match.
The goal of the blocking model is to help you efficiently identify matching record pairs. A blocking model is composed of one or more blocking terms, which define matching conditions for unified attribute values. See Adding a Blocking Clause.
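To illustrate the general idea of blocking, here is a minimal Python sketch. It is not Tamr Core's implementation or API: the sample records, the `first_token` key function, and the `candidate_pairs` helper are all hypothetical names chosen for this example. It assumes a single blocking condition that pairs records whose `name` values share the same first word.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "name": "Acme Corp", "city": "Boston"},
    {"id": 2, "name": "ACME Corporation", "city": "Boston"},
    {"id": 3, "name": "Globex", "city": "Springfield"},
    {"id": 4, "name": "Acme Inc", "city": "Chicago"},
]


def first_token(value):
    """Toy blocking key (an assumption): the lowercased first word."""
    return value.lower().split()[0]


def candidate_pairs(records, attribute, key_fn):
    """Group records by blocking key, then pair records within each group."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key_fn(record[attribute])].append(record["id"])

    pairs = set()
    for ids in blocks.values():
        # Only records that share a key are ever compared.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


pairs = candidate_pairs(records, "name", first_token)
print(sorted(pairs))
```

Only records that land in the same block are paired, so the Globex record is never compared against the Acme records. A real blocking term would typically use tokenizers, similarity functions, and thresholds rather than an exact key match.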
When creating a blocking model, it helps to estimate the number of record pairs that Tamr Core will generate. For example, in a real-world dataset of 1M customer account records, the model can typically find 50M potentially matching record pairs.
Estimating record pair counts allows you to iterate quickly when discovering blocking terms and see the effect of adjusting thresholds, tokenizers and similarity functions.
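The arithmetic behind such an estimate can be sketched as follows. This is a simplified illustration, not how Tamr Core computes its estimate: it assumes a single exact-match blocking key per record, and the function names and sample keys are hypothetical. The pair count is derived from block sizes alone, so no pairs need to be generated.

```python
from collections import Counter
from math import comb


def estimate_pair_count(keys):
    """Count candidate pairs from block sizes without materializing pairs."""
    block_sizes = Counter(keys)
    # A block of n records contributes n * (n - 1) / 2 pairs.
    return sum(comb(n, 2) for n in block_sizes.values())


def all_pairs_count(n_records):
    """Number of pairs if every record were compared with every other."""
    return comb(n_records, 2)


# Hypothetical blocking keys, one per record.
keys = ["acme", "acme", "globex", "acme", "initech", "initech"]

estimated = estimate_pair_count(keys)
total = all_pairs_count(len(keys))
print(f"{estimated} candidate pairs out of {total} possible pairs")
```

For the 1M-record example above, comparing every record with every other would yield 499,999,500,000 pairs, so 50M candidate pairs is roughly 0.01% of the full comparison space.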
Initially, a handful of arbitrary record pairs are selected by an admin or curator and classified as Match or No Match. These initial matching and non-matching record pairs provide the model with the first feedback required to begin learning and allow the curator to initialize the entity resolution model. See Training Initial Pairs and Reviewing Record Pairs.
Once you run the model, it identifies high-impact record pairs for review. You can assign high-impact record pairs to reviewers. Curators and verifiers can then verify these assignments.
The iterative curation of high-impact record pairs allows Tamr Core to accurately classify all record pairs as matching or non-matching pairs.
Both curators and verifiers can assign and verify pairs. See the following: