A Tamr schema mapping project allows you to build a common view of an entity (for example, a person or an organization) across your data sources.
In this type of project, you "map" attributes from multiple input datasets to a consistent set of attributes in a unified schema. This allows you to harmonize multiple data sources that contain the same type of content but label values with different attribute names or store them in different formats.
The unified schema contains all of the attributes needed to answer questions downstream. It can be helpful to think of the attributes in the unified dataset as the set of column headers in the table into which Tamr consolidates data.
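The mapping idea can be illustrated with a minimal sketch in plain Python. This is not Tamr's API: the dataset names (`crm`, `erp`), the attribute names, and the `to_unified` helper are all hypothetical and exist only to show how differently named input attributes land under one set of column headers.

```python
from typing import Dict, List, Optional

# Hypothetical input datasets: same kind of content, different attribute names.
crm_records = [{"full_name": "Ada Lovelace", "phone_no": "555-0100"}]
erp_records = [{"name": "Grace Hopper", "telephone": "555-0199"}]

# Hypothetical unified schema: the "column headers" of the consolidated table.
UNIFIED_ATTRIBUTES = ["name", "phone"]

# Hypothetical mappings from each source's input attributes to unified attributes.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "crm": {"full_name": "name", "phone_no": "phone"},
    "erp": {"name": "name", "telephone": "phone"},
}


def to_unified(source: str, records: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Rename each record's attributes according to the mapping for its source."""
    mapping = MAPPINGS[source]
    unified = []
    for record in records:
        # Start with every unified attribute empty, then fill in mapped values.
        row: Dict[str, Optional[str]] = {attr: None for attr in UNIFIED_ATTRIBUTES}
        for input_attr, value in record.items():
            target = mapping.get(input_attr)
            if target is not None:  # unmapped input attributes are ignored
                row[target] = value
        unified.append(row)
    return unified


unified_dataset = to_unified("crm", crm_records) + to_unified("erp", erp_records)
```

In this sketch, both sources end up with the same `name` and `phone` attributes, which is the effect a unified schema is meant to achieve.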
You can then apply transformations to the unified dataset to clean, reformat, or otherwise change the unified dataset without affecting the source values from your input datasets.
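As a rough illustration of transforming unified data while leaving source values untouched, the sketch below works on a copy of the records. It is plain Python, not Tamr's transformation language; the record contents and the `transform` helper are hypothetical.

```python
import copy
import re
from typing import Dict, List

# Hypothetical records as they arrived from an input dataset.
source_records = [{"name": "  ada lovelace ", "phone": "(555) 010-0100"}]


def transform(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return cleaned copies of the records; the originals are not modified."""
    result = copy.deepcopy(records)
    for row in result:
        # Clean: trim whitespace and apply title case to names.
        row["name"] = row["name"].strip().title()
        # Reformat: keep only the digits of the phone number.
        row["phone"] = re.sub(r"\D", "", row["phone"])
    return result


transformed = transform(source_records)
```

Because `transform` operates on a deep copy, `source_records` still holds the original values after the call.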
For example, a pharmaceutical company can use a schema mapping project to converge records from thousands of clinical trials into a single standard CDISC SDTM version. The standardized data that results allows the company to comply with FDA standards. It also makes it easier to implement other initiatives, such as building integrated, curated data hubs. Data hubs with clean data enable scientific insights across many clinical trials.
The schema mapping workflow consists of the following stages:
- Create the project and upload the input datasets, or use the API to add an input dataset to a project. Team members with the admin role complete this stage.
- Profile datasets to compute metrics for each dataset and its attributes and to create a sample of the records for display. An admin typically profiles input datasets on upload, and can re-run profiling at any point in the workflow.
- Optional. Tag datasets. An admin can add metadata about each input dataset by adding tags. Tags allow you to organize and filter datasets, which can be useful in later stages of data mastering.
- Design the unified schema and create its unified attributes. Curators complete this stage.
- Begin mapping input attributes to unified attributes. Curators complete this stage.
- Use Tamr to generate attribute mapping recommendations, and accept or reject those suggestions. Curators complete this stage.
- Optional. Set up transformations for the data in the unified dataset. Curators complete this stage.
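The recommend-then-review step in the workflow above can be pictured with a small sketch. How Tamr actually generates recommendations is not described here; the example substitutes simple name similarity from Python's standard `difflib` module, and the `recommend` function, its `threshold` parameter, and the attribute names are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def recommend(
    input_attrs: List[str],
    unified_attrs: List[str],
    threshold: float = 0.5,  # hypothetical cutoff below which nothing is suggested
) -> Dict[str, Optional[str]]:
    """Suggest, for each input attribute, the most similarly named unified attribute."""
    suggestions: Dict[str, Optional[str]] = {}
    for input_attr in input_attrs:
        best: Optional[str] = None
        best_score = 0.0
        for unified_attr in unified_attrs:
            score = SequenceMatcher(None, input_attr.lower(), unified_attr.lower()).ratio()
            if score > best_score:
                best, best_score = unified_attr, score
        suggestions[input_attr] = best if best_score >= threshold else None
    return suggestions


# A curator would then accept or reject each suggestion; here the decisions are hard-coded.
suggested = recommend(["Name", "phone_no", "zzz"], ["name", "phone"])
curator_accepts = {"Name": True, "phone_no": True}
accepted = {
    attr: target
    for attr, target in suggested.items()
    if target is not None and curator_accepts.get(attr, False)
}
```

Keeping the suggestion step separate from the accept/reject step mirrors the workflow: the system proposes mappings, and a curator makes the final call.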
Tip: Team members with the reviewer role can view progress in a schema mapping project. However, these projects do not provide an interface for comments or feedback from reviewers.
For more information, see User Roles and the Tamr Documentation.