Tamr Core can accelerate the schema mapping process by learning which input attributes are likely to be mapped to unified attributes.
Essentially, the model treats schema mapping as a categorization task: if the unified attributes are the categories, which one is the best fit for each input attribute? The model tokenizes all of the values for an attribute and uses the MinHash sampling technique to evaluate how much the values of different attributes overlap. The model assigns a score to indicate relative similarity and then makes mapping suggestions for you to review. When you accept or reject suggestions, you provide more information to the model.
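To make the idea of tokenizing values and using MinHash to evaluate overlap concrete, here is a minimal Python sketch. It is not Tamr Core's implementation: the function names (`tokenize`, `minhash_signature`, `estimated_overlap`), the tokenization rule, and the choice of 128 hash functions are all assumptions made for illustration.

```python
import hashlib
import re


def tokenize(values):
    """Lowercase each value and split it into alphanumeric tokens (assumed rule)."""
    tokens = set()
    for value in values:
        tokens.update(re.findall(r"[a-z0-9]+", str(value).lower()))
    return tokens


def _hash(token, seed):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=128):
    """Keep the minimum hash value per hash function (hypothetical helper)."""
    if not tokens:
        raise ValueError("cannot build a signature from an empty token set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_overlap(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Made-up sample values for one input attribute and one unified attribute.
    input_values = ["Acme Corp", "Globex Inc", "Initech LLC"]
    unified_values = ["ACME Corp", "Initech LLC", "Umbrella Co"]
    score = estimated_overlap(
        minhash_signature(tokenize(input_values)),
        minhash_signature(tokenize(unified_values)),
    )
    print(f"estimated overlap: {score:.2f}")
```

The signature has a fixed size regardless of how many values an attribute has, which is why a sampling technique like this is attractive for comparing large columns. The score is an estimate, so it varies slightly with the number of hash functions.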
This process is iterative: you complete or initiate some steps, and the model computes others.
1. Create unified attributes and map input attributes to a few of them, either manually or by bootstrapping attributes.
2. Update the unified dataset by running the job (initiated by you, completed by Tamr Core).
3. Learn from mappings by running the job (initiated by you, completed by Tamr Core).
4. Generate mapping suggestions (initiated by you, completed by Tamr Core).
5. Review and accept or reject mapping suggestions.
   To control the number and quality of the suggestions that you review, you can specify how similar you want the mapping suggestions to be.
6. Repeat if necessary.
Steps 1 and 5 involve expert input and feedback.
Steps 2-4 represent jobs that Tamr Core runs to learn and generate suggested mappings for attributes in the unified dataset.
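The effect of specifying how similar the suggestions must be can be sketched as a simple threshold filter. This is an illustration only, not a Tamr Core API: the `Suggestion` class, the `filter_suggestions` function, and the assumption that scores fall between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """A hypothetical mapping suggestion with a similarity score."""
    input_attribute: str
    unified_attribute: str
    score: float  # assumed to range from 0 (no similarity) to 1 (identical)


def filter_suggestions(suggestions, min_score):
    """Keep suggestions at or above min_score, highest score first."""
    kept = [s for s in suggestions if s.score >= min_score]
    return sorted(kept, key=lambda s: s.score, reverse=True)


if __name__ == "__main__":
    # Made-up suggestions for illustration.
    candidates = [
        Suggestion("cust_name", "name", 0.9),
        Suggestion("addr1", "address", 0.4),
        Suggestion("tel", "phone", 0.7),
    ]
    for s in filter_suggestions(candidates, min_score=0.6):
        print(s.input_attribute, "->", s.unified_attribute, s.score)
```

A higher threshold leaves fewer, more reliable suggestions to review; a lower one surfaces more candidates at the cost of more rejections.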
When you add more datasets to the project, you repeat this process by running steps 2-4 again and then reviewing the new suggestions and accepting or rejecting them.
If you are running a mastering or categorization project, you can continue optimizing the schema for the unified dataset as needed. If, on the other hand, you are satisfied with the set of unified attributes, you can proceed to previewing the records in the unified dataset.