
Exporting a Dataset

Export a dataset from a project to save locally.

Admins and Curators can export input datasets from the Datasets page in a project.

  • If the Data Movement Service (DMS) is enabled and configured, you can export a dataset to a previously configured cloud storage location in either CSV or Parquet format.

    Important: If DMS is enabled for your instance, you cannot download datasets to a local file system via the UI.

  • If DMS is not enabled, you can download a dataset in CSV format to your local file system.

Note: Admins can also export any dataset file from the Dataset Catalog, including input datasets, unified datasets, and internal datasets. See Exporting a Dataset from the Dataset Catalog.

Exporting to Cloud Storage with DMS

You use DMS to export data in CSV or Parquet format to cloud storage.

Important: Users who need access to data files exported to cloud storage must be given access to the appropriate cloud storage locations.

Note: For information about how to use APIs to export datasets, see Using the DMS API.

Export File Format for Cloud Storage Destinations

Files exported to cloud storage destinations have the following characteristics:

  • Format: Parquet or comma-separated values (.csv). In CSV files, the delimiter is a comma (,), and the double quote (") is both the quote character and the escape character. Spaces in attribute names are handled differently in CSV and Parquet files:
    • For exports in CSV format, if an attribute name in the Tamr dataset includes a space, the exported column name includes the space.
    • For exports in Parquet format, Tamr automatically replaces spaces with underscores. For example, the “Cluster Name” attribute is exported as “Cluster_Name”.
  • Encoding: UTF-8.
  • Header: File contains a header row.
  • Multivalues: Multivalues are delimited by the character |.
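
The naming difference between the two formats matters when the same dataset is exported both ways and the files are compared downstream. The following is a minimal sketch, assuming only the rule stated above (spaces become underscores in Parquet column names). The function names are hypothetical and are not part of Tamr; how other special characters are handled is not covered here.

```python
def parquet_column_name(attribute_name: str) -> str:
    """Apply the documented rule: spaces in attribute names become underscores."""
    return attribute_name.replace(" ", "_")


def csv_to_parquet_columns(csv_header: list[str]) -> dict[str, str]:
    """Map each CSV column name to the name expected in the Parquet export."""
    return {name: parquet_column_name(name) for name in csv_header}


if __name__ == "__main__":
    # "Cluster Name" is the example from the text; "id" is a made-up column.
    print(csv_to_parquet_columns(["Cluster Name", "id"]))
    # {'Cluster Name': 'Cluster_Name', 'id': 'id'}
```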

Export a Dataset to a Cloud Storage Destination

To export a dataset to a cloud storage destination:

  1. Open a schema mapping, mastering, or categorization project and select the Datasets page.
  2. Locate the dataset and choose Export.
  3. Select Confirm to start a dataset export job.
  4. When the dataset export job finishes, choose Export available.
  5. Select Export to (provider name). The Export Dataset dialog opens.
  6. Select the file type for your export: CSV or Parquet.
  7. Specify a new or existing destination path for the file.
    • ADLS2: Account Name, Container, Path.
    • AWS S3: Region, Bucket, Path.
    • GCS: Project, Bucket, Path.
    Tip: To search for an existing path, you can supply values for the first two fields and then select Apply. To reduce the time a search takes, provide as much of the path as possible. If you change the file type or the destination values, select Apply again to refresh the file finder.
  8. Select Export Dataset to export the dataset into file(s) with the dataset name in the specified folder.
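
If you keep destination details in a script or a runbook before entering them in the Export Dataset dialog, a small check can catch a missing field early. The sketch below only encodes the field names listed in the steps above. The helper name and the sample values are hypothetical, and the code does not call any Tamr or cloud provider API.

```python
# Destination fields per provider, as listed in the export steps.
REQUIRED_FIELDS = {
    "ADLS2": ("Account Name", "Container", "Path"),
    "AWS S3": ("Region", "Bucket", "Path"),
    "GCS": ("Project", "Bucket", "Path"),
}


def missing_destination_fields(provider: str, values: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank for this provider."""
    if provider not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return [
        field
        for field in REQUIRED_FIELDS[provider]
        if not values.get(field, "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    destination = {"Region": "us-east-1", "Bucket": "example-bucket"}
    print(missing_destination_fields("AWS S3", destination))  # ['Path']
```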

Exporting a Dataset to a Local File System

If DMS is not enabled, you can use the user interface to export datasets in CSV format from a project and download them to a local file system.

Note: For information about how to use APIs to export datasets, customers can consult the Tamr Help Center.

Export File Format for Local Downloads

Files downloaded to a local file system have the following characteristics:

  • Format: Comma-separated values (.csv). The delimiter is a comma (,), and the double quote (") is both the quote character and the escape character.
  • Encoding: UTF-8.
  • Header: File contains a header row.
  • Multivalues: Multivalues are delimited by the character |.
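
These characteristics map directly onto the options of Python's standard csv module. The following is a minimal sketch of reading a downloaded file, not an official Tamr utility. The function names, the file name, and the column names in the usage comment are hypothetical. The file does not mark which columns hold multivalues, so the caller names them, and treating an empty cell as an empty list is an assumption.

```python
import csv

MULTIVALUE_DELIMITER = "|"


def split_multivalue(cell: str) -> list[str]:
    """Split one cell on "|"; an empty cell is treated as no values (assumption)."""
    return cell.split(MULTIVALUE_DELIMITER) if cell else []


def read_export(path: str, multivalue_columns: list[str]) -> list[dict]:
    """Read an exported CSV: UTF-8, header row, comma-delimited, "" escapes a quote."""
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(
            handle,
            delimiter=",",
            quotechar='"',
            doublequote=True,  # a quote inside a quoted field appears doubled
        )
        rows = []
        for row in reader:
            for column in multivalue_columns:
                row[column] = split_multivalue(row.get(column) or "")
            rows.append(row)
        return rows


# Example call (hypothetical file and column names):
# rows = read_export("my_dataset.csv", multivalue_columns=["phones"])
```

A value that itself contains the | character cannot be told apart from a multivalue boundary with this approach, so check such columns before relying on the split.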

Export a Dataset Locally

To export a dataset locally:

  1. Open a schema mapping, mastering, or categorization project and select the Datasets page.
  2. Locate the dataset and choose Export.
  3. Select Confirm to start a dataset export job.
  4. When the dataset export job finishes, choose Export available.
  5. Select Download export. The CSV file downloads to your local file system.