Tamr can accelerate the schema mapping process by learning which input attributes are likely candidates to be mapped to unified attributes.
Essentially, the Tamr model treats schema mapping as a categorization task: if the unified attributes are the categories, which is the best fit for each input attribute? The model tokenizes all of the values for an attribute and uses the MinHash sampling technique to evaluate overlap. The model assigns a score to indicate relative similarity and then makes mapping suggestions for you to review. When you accept or reject suggestions, you provide more information to the model.
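To make the idea of tokenizing values and comparing them with MinHash more concrete, here is a minimal sketch in Python. It is not Tamr's implementation: the function names (`tokenize`, `minhash_signature`, `estimate_similarity`, `suggest_mapping`), the word-level tokenization, the choice of 128 hash functions, and the sample values are all assumptions made for illustration. It shows only the general technique, in which each attribute's token set is reduced to a short signature and the fraction of matching signature slots estimates how much two token sets overlap.

```python
import hashlib
import re


def tokenize(values):
    """Split every value of an attribute into lowercase word tokens (assumed scheme)."""
    tokens = set()
    for value in values:
        tokens.update(re.findall(r"\w+", str(value).lower()))
    return tokens


def _hash(token, seed):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=128):
    """Keep the minimum hash value per seed; an empty token set gets an empty signature."""
    if not tokens:
        return []
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching signature slots, which estimates Jaccard similarity."""
    if not sig_a or not sig_b:
        return 0.0
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def suggest_mapping(input_values, unified_attributes):
    """Return (best-fitting unified attribute, score) for one input attribute."""
    input_sig = minhash_signature(tokenize(input_values))
    scores = {
        name: estimate_similarity(input_sig, minhash_signature(tokenize(values)))
        for name, values in unified_attributes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up sample data: two unified attributes and one unmapped input attribute.
unified = {
    "city": ["Boston", "New York", "Cambridge", "Chicago"],
    "company_name": ["Acme Corp", "Globex Inc", "Initech LLC"],
}
best, score = suggest_mapping(["boston", "chicago", "Seattle"], unified)
print(best, round(score, 2))
```

In this sketch the unified attributes play the role of categories, and the input attribute is suggested for whichever category its values overlap with most. A real system would also need to handle numeric values, very large attributes, and ties, none of which are covered here.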
This process is iterative. Some steps you complete or initiate yourself, and others are computed by the Tamr model.
1. Create unified attributes and map input attributes to a few of them, either manually or by bootstrapping (done by you).
2. Update the unified dataset by running the job (initiated by you, completed by Tamr).
3. Learn from mappings by running the job (initiated by you, completed by Tamr).
4. Generate mapping suggestions (initiated by you, completed by Tamr).
5. Review and accept or reject Tamr's mapping suggestions (done by you). To control the number and quality of the suggestions that you review, you can specify how similar you want the mapping suggestions to be.
6. Repeat if necessary (done by you).
Steps 1 and 5 involve expert input and feedback.
Steps 2-4 represent jobs that Tamr runs to learn and generate suggested mappings for attributes in the unified dataset.
When you add more datasets to the project, repeat this process: run steps 2-4 again, then review and accept or reject Tamr's suggestions.
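The similarity setting mentioned in the review step can be pictured as a threshold on the suggestion score. The sketch below is only an illustration under that assumption: the `Suggestion` class, the `filter_suggestions` function, the 0 to 1 score range, and the sample scores are hypothetical and do not reflect Tamr's API or real output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    input_attribute: str
    unified_attribute: str
    score: float  # relative similarity; assumed here to lie in [0, 1]


def filter_suggestions(suggestions, min_score):
    """Keep suggestions at or above min_score, highest score first."""
    kept = [s for s in suggestions if s.score >= min_score]
    return sorted(kept, key=lambda s: s.score, reverse=True)


# Made-up suggestions for illustration only.
suggestions = [
    Suggestion("cust_city", "city", 0.82),
    Suggestion("town", "city", 0.41),
    Suggestion("org", "company_name", 0.67),
]

strict = filter_suggestions(suggestions, min_score=0.6)   # fewer, stronger suggestions
lenient = filter_suggestions(suggestions, min_score=0.3)  # more suggestions to review
```

A higher threshold leaves fewer suggestions to review but may hide correct mappings with weaker overlap; a lower threshold surfaces more candidates at the cost of more rejections.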
If you are running a mastering or categorization project, you can continue optimizing the schema for the unified dataset as needed. If, on the other hand, you are satisfied with the set of unified attributes, you can proceed to preview the records in the unified dataset.