Using the Dataset Catalog
The dataset catalog provides access to all input and unified datasets and to the internal datasets generated by system jobs.
Admins, as well as authors and curators with sufficient permission, can work with datasets in a project. Admins can also use the dataset catalog to manage datasets, including datasets generated by Tamr Core, without locating and opening a project.
Viewing and Filtering Datasets
To view datasets and optionally filter by type:
- At the top right of your browser, select Dataset Catalog. By default, all datasets appear.
- Optional. To show only one type of dataset, from the Filter dropdown select one of these options:
  - Source: Input datasets. See Uploading a Dataset into Tamr Core.
  - Unified Datasets: The result of mapping the attributes in source dataset(s) to attributes in a unified dataset. See Schema Mapping Workflow.
  - Results and Internals: The system generates datasets to store computation results and internal object properties. See Datasets Generated by Tamr Core.
  - Sample, Group-by and API-derived: The system generates sample datasets as smaller versions of existing datasets to support faster previewing. Users create group-by and API-derived datasets as transformed versions of existing datasets.
  - System: The system generates datasets with metadata that supports internal processes.
Managing Project Membership
You can add datasets to projects and remove datasets from projects.
To update project membership:
- Open the Dataset Catalog page.
- Move your cursor along the left edge of the table of datasets to show selection checkboxes.
- Select one or more checkboxes to choose datasets and then select Input to projects. The Manage project membership for n datasets dialog opens.
- To add all selected datasets to a project, select that project's checkbox.
To remove all selected datasets from a project, clear the project checkbox.
Tip: A horizontal line in a project checkbox indicates that a subset of the selected datasets are already members of that project.
- Select Update.
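The checkbox behavior in the steps above can be modeled with a short sketch. This is an illustration only, not Tamr Core code: the names CheckboxState, checkbox_state, and apply_update are hypothetical, datasets are represented by plain name strings, and the sketch assumes that a checkbox left showing a horizontal line makes no change to that project.

```python
from enum import Enum


class CheckboxState(Enum):
    """Hypothetical model of the three states a project checkbox can show."""
    CHECKED = "checked"      # every selected dataset is in the project
    UNCHECKED = "unchecked"  # no selected dataset is in the project
    PARTIAL = "partial"      # horizontal line: only some are in the project


def checkbox_state(selected, project_members):
    """Return the state shown for one project, given the selected datasets."""
    selected = set(selected)
    in_project = selected & set(project_members)
    if not in_project:
        return CheckboxState.UNCHECKED
    if in_project == selected:
        return CheckboxState.CHECKED
    return CheckboxState.PARTIAL


def apply_update(selected, project_members, new_state):
    """Return the project's datasets after Update for the final checkbox state."""
    members = set(project_members)
    if new_state is CheckboxState.CHECKED:
        return members | set(selected)  # add all selected datasets
    if new_state is CheckboxState.UNCHECKED:
        return members - set(selected)  # remove all selected datasets
    return members  # assumption: a partial checkbox leaves membership as it is
```

For example, if two datasets are selected and only one of them already belongs to a project, checkbox_state returns PARTIAL. Passing CHECKED to apply_update then adds the missing dataset, and passing UNCHECKED removes the one that was a member.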
Profiling Datasets
Profiling creates a sample of the dataset for preview, computes metrics, and saves metadata for use during schema mapping. For examples, see Viewing Attribute Metrics.
To profile one or more datasets:
- Open the Dataset Catalog page and move your cursor along the left edge of the table of datasets to show selection checkboxes.
- Select one or more checkboxes to choose datasets and then select Profile.
- To confirm, select Profile n Datasets.
Previewing a Dataset
Preview shows a sample of the records in an input dataset in tabular format. Preview is available after you profile a dataset.
To preview a dataset:
- Open the Dataset Catalog page.
- Locate the dataset and select Preview. You might need to scroll horizontally to see this option.
A dialog opens to show a sample of the records in the dataset.
Tagging Datasets
Tags are system-wide metadata values that can help you and your team organize and locate input datasets. For more information about the options available for working with tags, see Managing Dataset Tags.
To tag one or more datasets:
- Open the Dataset Catalog page and move your cursor along the left edge of the table of datasets to show selection checkboxes.
- Select one or more checkboxes to choose datasets and then select Tag. The Tags dialog opens.
- Select the checkboxes for the tags you want to add to the datasets, or clear the checkboxes to remove the tags.
Tip: A horizontal line in a tag checkbox indicates that a subset of the selected datasets have that tag.
You can also create a tag by typing a unique name into the search box, or select manage to edit tag names or delete tags.
Uploading Datasets
Uploading a dataset using the dataset catalog is similar to uploading a dataset into a project. For more information about preparing files for upload, see Preparing a Dataset.
To upload a dataset:
- Open the Dataset Catalog page.
- Select Add new dataset. The Add Dataset dialog opens.
- To upload a local file, select Choose file. See Uploading a Local File for details.
To upload from cloud storage (if configured for your installation) or another external source, see Uploading from Cloud Storage or Uploading with a JDBC Driver for details.
- After upload, you can adjust the permissions in a policy to give team members access to the datasets. See Managing User Accounts and Access.
Exporting a Dataset
Exporting a dataset from the dataset catalog is similar to exporting a dataset from a project.
To export a dataset from the Dataset Catalog page, locate the dataset and select an Export option. You might need to scroll horizontally to see Export.
You can export files:
- In CSV format to local storage. See Exporting a Dataset to a Local File System.
- In CSV or Avro format to cloud storage. See Exporting to Cloud Storage.
- In Parquet format to cloud storage, and in other formats to other external data stores. See Exporting with a JDBC Driver.