Schema Mapping Workflow
To harmonize multiple data sources, curators align attributes from input data sources with the attributes in a unified schema.
After an admin or author creates the project, you can upload one or more datasets. You can then review all of the attributes from all input datasets on the Schema Mapping page of the project. On this page, you create a unified schema from one or more tabular datasets.
You define the unified attributes in the target dataset, and how input attributes map to these unified attributes.
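As a rough illustration of what a set of mappings expresses, the following Python sketch models a mapping as a lookup from an input attribute to a unified attribute and applies it to a single record. The dataset names, the `SchemaMapping` type, and the `apply_mapping` helper are hypothetical and exist only for this example; they are not part of the product.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping table: (input dataset, input attribute) -> unified attribute.
SchemaMapping = Dict[Tuple[str, str], str]


def apply_mapping(
    dataset: str, record: Dict[str, Any], mapping: SchemaMapping
) -> Dict[str, List[Any]]:
    """Project one input record onto the unified schema.

    Values are collected in lists because several input attributes
    may map to the same unified attribute.
    """
    unified: Dict[str, List[Any]] = {}
    for attribute, value in record.items():
        target = mapping.get((dataset, attribute))
        if target is None:
            continue  # unmapped input attributes are left out of the unified record
        unified.setdefault(target, []).append(value)
    return unified


# Illustrative mappings; the dataset names are assumptions.
mapping: SchemaMapping = {
    ("customers.csv", "givenName"): "firstName_original",
    ("orders.csv", "First_Name"): "firstName_original",
}

print(apply_mapping("customers.csv", {"givenName": "Ada", "id": 7}, mapping))
```

Keying the lookup on both the dataset and the attribute name keeps identically named attributes from different input datasets distinct.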
When you manually map some of the input attributes to unified attributes, you provide information to the machine learning model so that it can recommend additional mappings to help automate the process.
For example, your input datasets have attributes for givenName, First_Name, and Name. Using your knowledge of both the input datasets and the downstream needs of data consumers, you decide that a unified attribute of firstName_original should store all first name values, and you map the givenName input attribute to that unified attribute. This initial mapping trains the model, which can then suggest additional mappings, potentially including First_Name and Name to firstName_original. As you iteratively accept suggestions, the suggestions that the model makes become increasingly helpful.
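To make the idea of suggestions seeded by manual mappings concrete, here is a minimal Python sketch that proposes mappings for unmapped attributes by comparing their names with attributes that are already mapped. It is a stand-in based on simple string similarity from the standard library, not the product's machine learning model; the `normalize` and `suggest_mappings` helpers, the 0.5 threshold, and the `postalCode` attribute are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional


def normalize(name: str) -> str:
    """Lowercase the name and drop non-alphanumeric characters."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_mappings(
    mapped: Dict[str, str], unmapped: Iterable[str], threshold: float = 0.5
) -> Dict[str, str]:
    """Suggest a unified attribute for each unmapped input attribute.

    `mapped` holds manual mappings (input attribute -> unified attribute).
    A candidate gets the target of its most similar mapped attribute,
    provided the similarity reaches `threshold` (an arbitrary cut-off).
    """
    suggestions: Dict[str, str] = {}
    for candidate in unmapped:
        best_score = 0.0
        best_target: Optional[str] = None
        for source, target in mapped.items():
            score = SequenceMatcher(
                None, normalize(candidate), normalize(source)
            ).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_target is not None and best_score >= threshold:
            suggestions[candidate] = best_target
    return suggestions


# One manual mapping seeds the suggestions for the remaining attributes.
manual = {"givenName": "firstName_original"}
print(suggest_mappings(manual, ["First_Name", "Name", "postalCode"]))
```

Each accepted suggestion can be added to `manual` before the next call, which loosely mirrors the iterative accept-and-refine loop described above.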
You can then add data transformations to attributes in the unified dataset, specified input datasets, or both.