Profiling a Dataset

Profiling creates a sample of the dataset for preview, computes metrics, and saves metadata for use during schema mapping.

Profiling an input dataset does the following:

  • Creates a sample of the dataset to preview on the Schema Mapping and Unified Dataset pages.
  • Computes the dataset record count and, for each attribute in the dataset, metrics including percentage null, percentage distinct, and most frequent values (a distinct-value histogram).
  • Produces metadata that the machine learning model can use when learning how to map attributes from input datasets to the unified dataset.
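
As a rough illustration of the per-attribute metrics listed above, the following Python sketch computes a record count, percentage null, percentage distinct, and most frequent values for one column of values. It is not the product's implementation: the function name profile_attribute, the top_n parameter, and the choice to count distinct non-null values against the total record count are all assumptions made for this example.

```python
from collections import Counter


def profile_attribute(values, top_n=3):
    """Compute simple profile metrics for one attribute (hypothetical helper)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "record_count": total,
        # Share of records whose value is null.
        "pct_null": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # Assumption: distinct non-null values as a share of all records.
        "pct_distinct": 100.0 * len(counts) / total if total else 0.0,
        # Most frequent values, each with its percentage of all records.
        "most_frequent": [
            (value, 100.0 * n / total) for value, n in counts.most_common(top_n)
        ],
    }


# Made-up sample data for illustration only.
cities = ["Boston", "Boston", "Cambridge", None, "Boston", "Salem", None, "Cambridge"]
profile = profile_attribute(cities)
print(profile)
```

With this sample, two of the eight records are null (25%), three distinct values appear (37.5%), and "Boston" is the most frequent value.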

You can profile datasets when you upload them or while you work in a project.

To profile an input dataset:

  1. Open a schema mapping, mastering, or categorization project and select Datasets.
  2. Select a dataset by using the checkbox at the left end of its row, and then choose Profile.
  3. Select Profile n Dataset to confirm, where n is the number of datasets you selected.

To update profiling during schema mapping:

  1. Open the Schema Mapping page of a schema mapping, mastering, or categorization project. The input attributes appear on the left.
  2. Expand the information for an input attribute (on the left) or a unified attribute (on the right) by selecting the expand arrow to the left of the attribute name. If profiling is needed, one of the following options appears:
  • A Profile option appears if the attribute's dataset has never been profiled.

Expanding details for an attribute that has not been profiled.

  • A Profiling out of date option appears if work in the project requires profiling to be refreshed.

Expanding details for an attribute with an out-of-date profile.

Viewing Attribute Metrics

When you expand an attribute's details on the Schema Mapping page, the following information appears:

  • Record count.
  • Most frequent values with percentages.
  • A bar chart showing the percentage of records that contain each type of data value versus the percentage that are null, empty, or blank. Move your cursor over the colored sections of the bar to see the calculated percentages.

Records for this city attribute are 76% text values, 12% lists with multiple values (with a maximum of 6 and a mean of 1.04), and 12% empty (null, blank string, or empty list).
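
The value-type breakdown behind a bar like this one can be sketched in Python as follows. This is an illustrative approximation only: the classify and value_type_breakdown names are hypothetical, and the rules used here (whitespace-only strings count as blank, a single-element list counts as a single text value, and the maximum and mean are taken over the number of values in each non-empty record) are assumptions, not documented product behavior.

```python
from collections import Counter
from statistics import mean


def classify(value):
    """Label one record's value as 'empty', 'text', 'list', or 'other'."""
    if value is None:
        return "empty"
    if isinstance(value, str):
        # Assumption: whitespace-only strings are treated as blank.
        return "text" if value.strip() else "empty"
    if isinstance(value, list):
        if not value:
            return "empty"
        # Assumption: only lists with more than one value count as "list".
        return "list" if len(value) > 1 else "text"
    return "other"


def value_type_breakdown(values):
    """Percentage of records per value type, plus max and mean value counts."""
    total = len(values)
    kinds = Counter(classify(v) for v in values)
    # Number of values held by each non-empty record (1 for a plain value).
    lengths = [
        len(v) if isinstance(v, list) else 1
        for v in values
        if classify(v) != "empty"
    ]
    return {
        "pct": {kind: 100.0 * n / total for kind, n in kinds.items()},
        "max_values": max(lengths, default=0),
        "mean_values": mean(lengths) if lengths else 0.0,
    }


# Made-up sample data for illustration only.
records = ["Boston", ["Boston", "Cambridge"], None, "", [], "Salem", ["Lynn"], "Lowell"]
breakdown = value_type_breakdown(records)
print(breakdown)
```

In this sample, half of the records are classified as text, one record is a multi-value list, and three records (a null, a blank string, and an empty list) are classified as empty.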

You can also access metrics on the Unified Dataset page. Move your cursor over the bar chart under an attribute name to see its metrics.


Viewing profiled data on the Unified Dataset page.