Schema Mapping Projects

In a schema mapping project, you map attributes from input datasets to new attributes in a single, unified schema to create the resulting unified dataset.

Purpose and Overview

A Tamr Core schema mapping project allows you to build a common view of an entity (for example, a person or an organization) across your data sources.

You do this by "mapping" attributes from multiple input datasets to a consistent set of attributes in a unified schema. This allows you to harmonize multiple data sources that contain the same type of content, but identify values with different attribute names or store them in different formats.

The unified schema contains all of the attributes needed to answer questions downstream. It can be helpful to think of the attributes in the unified dataset as the set of column headers in the table into which the project consolidates data.
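As a rough illustration of the idea, the following Python sketch maps attributes from two input datasets to one unified schema. It is not Tamr Core code and does not use the Tamr Core API: the dataset names, attribute names, and the `unify` helper are all hypothetical, chosen only to show how differently named input attributes can land under the same unified attribute.

```python
# Hypothetical input datasets: the same kind of entity, but with
# different attribute names in each source.
crm_records = [
    {"full_name": "Ada Lovelace", "phone_no": "555-0100"},
]
billing_records = [
    {"customer": "Alan Turing", "telephone": "555-0199", "balance": "12.50"},
]

# Unified attributes: the "column headers" of the unified dataset.
UNIFIED_ATTRIBUTES = ["name", "phone"]

# For each input dataset, which input attribute maps to which unified attribute.
ATTRIBUTE_MAPPINGS = {
    "crm": {"full_name": "name", "phone_no": "phone"},
    "billing": {"customer": "name", "telephone": "phone"},
}


def unify(dataset_name, records):
    """Project records from one input dataset onto the unified attributes."""
    mapping = ATTRIBUTE_MAPPINGS[dataset_name]
    unified = []
    for record in records:
        # Start with every unified attribute empty, then fill in mapped values.
        row = {attribute: None for attribute in UNIFIED_ATTRIBUTES}
        for input_attribute, value in record.items():
            target = mapping.get(input_attribute)
            if target is not None:  # unmapped attributes (e.g. "balance") are skipped
                row[target] = value
        unified.append(row)
    return unified


unified_dataset = unify("crm", crm_records) + unify("billing", billing_records)
print(unified_dataset)
```

In this sketch, the unmapped `balance` attribute simply does not appear in the unified dataset, because no unified attribute was designed for it.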

You can then apply transformations to clean, reformat, or otherwise change the data in the unified dataset without affecting the source values in your input datasets.
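The following Python sketch illustrates that separation in miniature. It is an assumption-laden stand-in, not Tamr Core's transformation language: the records and the `normalize_phone` helper are hypothetical, and the point is only that the cleanup runs on the unified copy while the input values stay as they were.

```python
import copy

# Hypothetical source values, stored in inconsistent formats.
input_records = [
    {"phone": " (555) 010-0100 "},
    {"phone": "555.010.0199"},
]


def normalize_phone(value):
    """Keep only the digits of a phone number."""
    return "".join(ch for ch in value if ch.isdigit())


# Work on a copy that stands in for the unified dataset, so the
# input records are never modified.
unified_records = copy.deepcopy(input_records)
for row in unified_records:
    row["phone"] = normalize_phone(row["phone"])

print(unified_records)
print(input_records)  # still in the original formats
```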

For example, a pharmaceutical company can use a schema mapping project to converge records from thousands of clinical trials into a single standard CDISC SDTM version. The standardized data that results allows the company to comply with FDA standards. It also makes it easier to implement other initiatives, such as building integrated, curated data hubs. Data hubs with clean data enable scientific insights across many clinical trials.

Schema Mapping Workflow

The schema mapping workflow consists of the following stages:

  1. Create the project and upload the input datasets, or use the API to add an input dataset to a project. Team members with the admin or author role complete this stage.
  2. Profile datasets to compute metrics for the dataset and its attributes and create a sample of the records for display. The admin or author typically profiles input datasets on upload, and can re-run profiling at any point in the workflow.
  3. Design the unified attributes for the unified schema. Stakeholders throughout your organization can contribute to this stage.
  4. Create the unified dataset and begin mapping input attributes to unified attributes. Curators complete this stage.
  5. Use Tamr Core's machine learning model to generate attribute mapping recommendations, and accept or reject these suggestions. Curators complete this stage.
  6. Optional. Set up transformations for the data in the unified dataset. Curators complete this stage.
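
To make the recommend-then-review loop in stage 5 more concrete, here is a minimal Python sketch. It does not reproduce Tamr Core's machine learning model: plain name similarity from the standard library's `difflib` stands in for the model, and the attribute names, the `suggest` helper, and the curator decisions are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical unified attributes and input attributes.
UNIFIED_ATTRIBUTES = ["name", "phone", "email"]
INPUT_ATTRIBUTES = ["full_name", "phone_no", "e_mail_addr", "balance"]


def suggest(input_attribute, unified_attributes=UNIFIED_ATTRIBUTES):
    """Return (unified_attribute, similarity) for the closest attribute name.

    Name similarity is only a stand-in for a real recommendation model.
    """
    def similarity(candidate):
        return SequenceMatcher(
            None, input_attribute.lower(), candidate.lower()
        ).ratio()

    best = max(unified_attributes, key=similarity)
    return best, similarity(best)


# Generate one suggestion per input attribute.
suggestions = {attribute: suggest(attribute) for attribute in INPUT_ATTRIBUTES}

# A curator reviews each suggestion and accepts (True) or rejects (False) it.
decisions = {
    "full_name": True,
    "phone_no": True,
    "e_mail_addr": True,
    "balance": False,  # no suitable unified attribute, so the suggestion is rejected
}

# Only accepted suggestions become attribute mappings.
accepted_mappings = {
    attribute: target
    for attribute, (target, _score) in suggestions.items()
    if decisions.get(attribute, False)
}

for attribute, (target, score) in suggestions.items():
    status = "accepted" if attribute in accepted_mappings else "rejected"
    print(f"{attribute} -> {target} ({score:.2f}, {status})")
```

The design choice worth noting is that a suggestion never becomes a mapping on its own: the curator's decision is the gate, which mirrors the accept-or-reject step described above.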

Tip: Team members with the reviewer or verifier role can view progress in a schema mapping project. However, these projects do not provide an interface for comments or feedback from these team members.

For more information, see User Roles and Tamr Core Documentation.