Training Initial Pairs
This topic guides you through the initial training stage in mastering projects, which takes place before a curator first updates results.
Before you can begin initial training on the mastering model, a curator creates the unified dataset, optionally sets up record grouping, and configures the blocking model. After a curator applies feedback and updates results at least once, you can begin training the model.
To begin, on the Pairs page, label an initial set of pairs as Match or No match. These initial responses provide the first feedback required to begin machine learning. Typically, one team member reviews these pairs, identifies matching and non-matching pairs, and notes any pairs that are difficult to assess.
To help Tamr Core learn effectively, label pairs that cover a variety of data attributes and represent the different ways in which records or groups may or may not match. Label carefully so that Tamr Core has an accurate set of training data.
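If you keep your own notes on the pairs you label, a small script can help you check that the initial set is varied. The following is only a sketch under stated assumptions: `LabeledPair` and `summarize_labels` are hypothetical names for personal bookkeeping, not part of Tamr Core or its API, and the sample pairs are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LabeledPair:
    """One manually reviewed pair (hypothetical structure, not a Tamr Core type)."""
    pair_id: str
    label: str  # "Match" or "No match"
    differing_attributes: frozenset = field(default_factory=frozenset)
    hard_to_assess: bool = False


def summarize_labels(pairs):
    """Count labels, attributes that differ within pairs, and hard-to-assess pairs."""
    label_counts = Counter(p.label for p in pairs)
    attribute_counts = Counter(a for p in pairs for a in p.differing_attributes)
    hard = [p.pair_id for p in pairs if p.hard_to_assess]
    return {
        "labels": dict(label_counts),
        "attributes": dict(attribute_counts),
        "hard_to_assess": hard,
    }


# Invented sample notes; replace with your own records of labeled pairs.
pairs = [
    LabeledPair("p1", "Match", frozenset({"name"})),
    LabeledPair("p2", "No match", frozenset({"name", "address"})),
    LabeledPair("p3", "Match", frozenset({"phone"}), hard_to_assess=True),
]
summary = summarize_labels(pairs)
print(summary)
```

A summary in which one label or one attribute dominates suggests that the initial set may not yet reflect the different ways records can match or fail to match.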
Remember, you can select Compare details to open a side-by-side view to compare records or groups. See Viewing Pairs Side-By-Side.
For information on reviewing and labeling pairs, see Reviewing Pairs and Labeling Pairs.
Remember: Do not bulk label. In this stage of initial training, Tamr Core values quality over quantity. See Working with Tamr Core Machine Learning Models.
When you have identified a solid set of matching and non-matching pairs, ask a curator to select Apply feedback and update results to update Tamr Core. See Curating Project Jobs and Viewing Metrics.