As a curator, you can manage schema mapping, mastering, golden records, and categorization projects. You provide the initial data expertise to validate and train Tamr models for these projects, and you can complete any task that a reviewer or verifier can perform.
In the schema mapping workflow, you design the unified schema by mapping the attributes, or columns, in input datasets to unified attributes, which trains the Tamr model for recommending subsequent mappings.
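To make the idea of mapping input columns to unified attributes concrete, here is a minimal sketch in plain Python. It is an illustration only and does not use Tamr's API. The dataset names, column names, and the `to_unified` helper are all hypothetical.

```python
# Hypothetical mapping from each input dataset's columns to unified attributes.
# In practice a curator builds this mapping in the product; this dict is illustrative.
ATTRIBUTE_MAPPING = {
    "crm_customers": {"fname": "first_name", "lname": "last_name", "zip": "postal_code"},
    "billing_accounts": {"given_name": "first_name", "surname": "last_name", "postcode": "postal_code"},
}


def to_unified(dataset_name, record):
    """Rename mapped columns to their unified attribute names.

    Columns with no mapping are dropped from the result.
    """
    mapping = ATTRIBUTE_MAPPING[dataset_name]
    return {unified: record[source] for source, unified in mapping.items() if source in record}


crm_row = {"fname": "Ada", "lname": "Lovelace", "zip": "02139", "internal_id": 7}
billing_row = {"given_name": "Grace", "surname": "Hopper", "postcode": "10001"}

unified_rows = [
    to_unified("crm_customers", crm_row),
    to_unified("billing_accounts", billing_row),
]
```

Both rows end up with the same three attribute names, which is what lets later steps compare records across datasets.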
In mastering projects, you are responsible for:
- Defining the blocking model to reduce the number of pairwise comparisons Tamr makes when identifying matching and non-matching records.
- Labeling an initial set of record pairs as match or no-match to train the Tamr model.
- Running jobs to train the Tamr models with the feedback that team members provide on record pairs and clusters.
You can also complete all of the tasks that a reviewer or a verifier can perform in a mastering project.
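The effect of blocking on the number of pairwise comparisons can be sketched as follows. This is a generic illustration of blocking, not Tamr's blocking model. The blocking key (first letter of the last name plus postal code) and the field names are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: lowercased first letter of the last name plus postal code.
    return (record["last_name"][:1].lower(), record["postal_code"])


def candidate_pairs(records):
    """Return ID pairs for records that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


records = [
    {"id": 1, "last_name": "Lovelace", "postal_code": "02139"},
    {"id": 2, "last_name": "lovelace", "postal_code": "02139"},
    {"id": 3, "last_name": "Hopper", "postal_code": "10001"},
    {"id": 4, "last_name": "Hoper", "postal_code": "10001"},
]

all_pairs = list(combinations([r["id"] for r in records], 2))
blocked_pairs = candidate_pairs(records)
```

With these four records, comparing everything would take six comparisons, while the blocked version yields two candidate pairs. A key that is too strict would also hide true matches, which is why the choice of blocking criteria calls for curator judgment.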
In a golden records project, you are responsible for configuring the rules that consolidate each cluster of records from a mastering project into a single golden record, whose data values best represent the entity that the cluster describes.
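As a rough sketch of what such consolidation rules do, the following Python applies one rule per attribute to a cluster of records. The rule names (`most_common`, `longest`), the attributes, and the sample data are hypothetical, and this is not how Tamr expresses or evaluates its rules.

```python
from collections import Counter


def most_common(values):
    """Pick the most frequent non-empty value, or None if there is none."""
    values = [v for v in values if v]
    return Counter(values).most_common(1)[0][0] if values else None


def longest(values):
    """Pick the longest non-empty value, or None if there is none."""
    values = [v for v in values if v]
    return max(values, key=len) if values else None


# Hypothetical rule set: one consolidation rule per golden record attribute.
RULES = {"name": longest, "email": most_common}


def golden_record(cluster):
    """Apply each attribute's rule across all records in the cluster."""
    return {attr: rule([rec.get(attr) for rec in cluster]) for attr, rule in RULES.items()}


cluster = [
    {"name": "A. Lovelace", "email": "ada@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Ada", "email": None},
]

golden = golden_record(cluster)
```

Here the golden record takes the longest name and the most frequent email. Different attributes often need different rules, for example preferring the most recent value for an address.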
In a categorization project, you are responsible for:
- Uploading and, if necessary, editing the taxonomy used to categorize records.
- Categorizing a subset of records to train the Tamr model.
- Running jobs to train the Tamr model with the feedback that team members provide on record categorizations.
You can also complete all of the tasks that a reviewer or a verifier can perform in a categorization project.
You can also define transformations in schema mapping, mastering, and categorization projects to modify the unified dataset. For example, you can concatenate first and last name fields, remove whitespace, or apply consistent date formatting.
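The three example transformations can be sketched in plain Python as shown below. This only illustrates the logic and is not the syntax used to write transformations in Tamr. The field names, the accepted date formats, and the assumption that slash-separated dates are month-first are all hypothetical.

```python
from datetime import datetime

# Assumed input formats; slash-separated dates are treated as month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text):
    """Return the date in ISO 8601 form (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(record):
    """Trim whitespace, add a concatenated full name, and standardize the date."""
    first = record["first_name"].strip()
    last = record["last_name"].strip()
    return {
        **record,
        "first_name": first,
        "last_name": last,
        "full_name": f"{first} {last}",
        "signup_date": normalize_date(record["signup_date"]),
    }


raw = {"first_name": " Ada ", "last_name": "Lovelace  ", "signup_date": "12/10/2023"}
clean = transform(raw)
```

Returning `None` for unparseable dates, rather than guessing, keeps bad values visible so they can be reviewed later.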