Tamr Documentation

Using the Dataset Catalog

The dataset catalog provides access to all input and unified datasets and to the internal datasets generated by Tamr jobs.

Curators with sufficient permissions can work with datasets in a project. Admins can use the dataset catalog to manage datasets, including datasets generated by Tamr, without locating and opening a project.

Viewing and Filtering Datasets

To view datasets and optionally filter by type:

  1. At the top right of your browser, select Dataset Catalog. By default, all datasets appear.
  2. From the Filter dropdown, select an option to show only one type of dataset:
    • Source: Input datasets. See Uploading a Dataset into Tamr.
    • Unified Datasets: The result of mapping the attributes in one or more source datasets to attributes in a unified dataset. See Schema Mapping Workflow.
    • Results and Internals: Datasets that Tamr generates to store computation results and internal object properties. See Datasets Generated by Tamr.
    • Sample, Group-by and API-derived: Sample datasets are smaller versions of existing datasets that Tamr generates to support faster previewing. Group-by and API-derived datasets are transformed versions of existing datasets that users create.
    • System: Datasets that Tamr generates with metadata that supports internal processes.

Managing Project Membership

You can add datasets to projects and remove datasets from projects.

To update project membership:

  1. Navigate to the Dataset Catalog page.
  2. Move your cursor along the left edge of the table of datasets to show selection checkboxes.
  3. Select one or more checkboxes to select datasets and then choose Input to projects. The Manage project membership for n datasets dialog box opens.
  4. To add all selected datasets to a project, select the project checkbox.
    To remove all selected datasets from a project, clear the project checkbox.
    Tip: A horizontal line in a project checkbox indicates that some, but not all, of the selected datasets have been added to that project.
  5. Select Update.
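
The three checkbox states in the dialog box (selected, cleared, and horizontal line) can be read as a summary of how many of the selected datasets already belong to a project. The following Python sketch models that logic for illustration only. The function name `checkbox_state`, the state labels, and the dataset names are hypothetical and are not part of any Tamr API.

```python
from typing import Iterable, Set


def checkbox_state(selected: Iterable[str], project_members: Set[str]) -> str:
    """Summarize how many of the selected datasets belong to one project.

    Hypothetical helper for illustration; not part of Tamr.
    """
    selected = list(selected)
    in_project = sum(1 for name in selected if name in project_members)
    if in_project == 0:
        return "cleared"
    if in_project == len(selected):
        return "selected"
    # Some, but not all, of the selected datasets are in the project.
    return "horizontal line"


# Made-up dataset names standing in for one project's current inputs.
members = {"customers.csv", "orders.csv"}
print(checkbox_state(["customers.csv", "orders.csv"], members))
print(checkbox_state(["customers.csv", "invoices.csv"], members))
print(checkbox_state(["invoices.csv"], members))
```

Under this model, selecting a checkbox that shows a horizontal line adds the remaining datasets to the project, and clearing it removes all of the selected datasets.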

Profiling Datasets

Profiling creates a sample of the dataset for preview, computes metrics, and saves metadata for use during schema mapping. For examples, see Viewing Attribute Metrics.

To profile one or more datasets:

  1. Navigate to the Dataset Catalog page and move your cursor along the left edge of the table of datasets to show selection checkboxes.
  2. Select one or more checkboxes to select datasets and then choose Profile.
  3. Select Profile n Datasets to confirm.
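
To give a sense of what computing metrics for an attribute can involve, here is a minimal Python sketch. It is an assumption-laden illustration: this page does not list the metrics Tamr computes, so the counts below (total, non-empty, and distinct values) are generic profiling measures, and `profile_attribute` and the sample records are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def profile_attribute(records: List[Record], attribute: str) -> Dict[str, int]:
    """Compute simple, generic metrics for one attribute.

    Illustrative only; not the metrics or the method that Tamr uses.
    """
    values = [record.get(attribute) for record in records]
    # Treat missing keys, None, and empty strings as empty values.
    non_empty = [value for value in values if value not in (None, "")]
    return {
        "total": len(values),
        "non_empty": len(non_empty),
        "distinct": len(set(non_empty)),
    }


# Made-up sample records.
sample = [
    {"name": "Acme", "city": "Boston"},
    {"name": "Acme", "city": ""},
    {"name": "Globex", "city": None},
]
print(profile_attribute(sample, "name"))
print(profile_attribute(sample, "city"))
```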

Previewing a Dataset

Preview shows a sample of the records in an input dataset in tabular format. Preview is available after a dataset is profiled.

To preview a dataset:

  1. Navigate to the Dataset Catalog page.
  2. Locate the dataset and select Preview. You might need to scroll horizontally to see this option.
    A dialog box with a sample of the records in the dataset opens.

Tagging Datasets

Tags are system-wide metadata values that can help you and your team organize and locate input datasets. For more information about the options available for working with tags, see Managing Dataset Tags.

To tag one or more datasets:

  1. Navigate to the Dataset Catalog page and move your cursor along the left edge of the table of datasets to show selection checkboxes.
  2. Select one or more checkboxes to select datasets and then choose Tag. The Tags dialog box opens.
  3. Select the checkboxes of the tags you want to add to the datasets, or clear the checkboxes to remove the tags.
    Tip: A horizontal line in a tag checkbox indicates that some, but not all, of the selected datasets have that tag.

You can also create a tag by typing a unique name into the search box. To edit tag names or delete tags, select manage.

Uploading Datasets

Uploading a dataset in the dataset catalog is similar to uploading a dataset into a project. For more information about preparing files for upload, see Preparing a Dataset.

To upload a dataset:

  1. Navigate to the Dataset Catalog page.
  2. Select Add new dataset. The Add Dataset dialog box opens.
  3. To upload a local file, select Choose file. See Upload a Local File for details.
    To upload from another source configured for your installation, see Upload from a Connected External Source or Upload with the DMS for details.
  4. After upload, you can adjust the permissions in a policy to give team members access to the datasets. See Managing User Accounts and Access.

Exporting a Dataset

Exporting a dataset from the dataset catalog is similar to exporting a dataset from a project.

To export a dataset:

  1. Navigate to the Dataset Catalog page.
  2. Locate the dataset and select Export. You might need to scroll horizontally to see this option.
  3. Select Generate Export to start a dataset export job.
  4. When the dataset export job finishes, choose Export available. A menu of options appears.
  5. To download the export locally, select Download export. The CSV file downloads to your local file system.
    If the DMS is configured for your installation, you can also export to cloud storage.
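
After the CSV file is on your local file system, you can inspect it with any CSV-capable tool. As one possible approach, the following Python sketch reads an export with the standard library `csv` module. The helper name `read_export` and the column names are hypothetical, and an in-memory string stands in for the downloaded file.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_export(handle: TextIO) -> List[Dict[str, str]]:
    """Read a CSV export into a list of rows keyed by column header."""
    return list(csv.DictReader(handle))


# Stand-in for a downloaded export; the columns and values are made up.
sample = io.StringIO("id,name\n1,Acme\n2,Globex\n")
rows = read_export(sample)
print(len(rows), rows[0]["name"])
```

For a real export, pass a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever your browser saved the download. The UTF-8 encoding here is an assumption; adjust it if your export uses a different encoding.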

Updated 2 months ago

