Tamr Documentation

Exporting a Dataset from the Dataset Catalog

As an admin, you can export datasets from the Tamr Dataset Catalog. How you export depends on whether the Tamr Data Movement Service (DMS) is enabled:

  • If DMS is enabled and configured, you can use the user interface to export a dataset from Tamr to a previously configured cloud storage location, in either CSV or Parquet format.
  • If DMS is not enabled, you can download a dataset in CSV format to your local file system.

Exporting to Cloud Storage with DMS

You can use the Tamr DMS to export datasets to cloud storage in CSV or Parquet format.

Important: Tamr users who need access to data files exported to cloud storage must be given access to the appropriate cloud storage locations.

Note: For information about how to use APIs to export datasets, see Using the DMS API.

Export File Format for Cloud Storage Destination

Files exported to cloud storage destinations have the following characteristics:

  • Format: Parquet or comma-separated values (.csv). For CSV files, the delimiter is a comma (,), and both the quote character and the escape character are a double quote ("). Spaces in attribute names are handled differently in CSV and Parquet files:
    • For exports in CSV format, if an attribute name in the Tamr dataset includes a space, the exported column name keeps the space.
    • For exports in Parquet format, Tamr automatically replaces spaces with underscores. For example, the “Cluster Name” attribute is exported as “Cluster_Name”.
  • Encoding: UTF-8.
  • Header: File contains a header row.
  • Multivalues: Multivalues are delimited by the character |.
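If you compare CSV and Parquet exports of the same dataset, or move a downstream job from one format to the other, you may need to map column names between them. The Python sketch below assumes that the only change is each space becoming one underscore, as in the "Cluster Name" example. It does not cover how Tamr treats other characters. The function names are hypothetical.

```python
def parquet_column_name(attribute: str) -> str:
    """Return the expected Parquet column name for a Tamr attribute name.

    Assumption: each space is replaced with a single underscore and every
    other character is kept unchanged.
    """
    return attribute.replace(" ", "_")


def csv_to_parquet_columns(csv_header: list[str]) -> dict[str, str]:
    """Map each CSV export column name to its expected Parquet column name."""
    return {name: parquet_column_name(name) for name in csv_header}


if __name__ == "__main__":
    # Header row of a hypothetical CSV export.
    print(csv_to_parquet_columns(["Cluster Name", "record_id"]))
    # prints {'Cluster Name': 'Cluster_Name', 'record_id': 'record_id'}
```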

Exporting a Dataset to a Cloud Storage Destination

To export to a cloud storage destination:

  1. On the Dataset Catalog page, locate the dataset and choose Export.
  2. Select Confirm to start the dataset export job.
  3. When the dataset export job finishes, select Export available.
  4. Select Export to (provider name). The Export Dataset dialog opens.
  5. Select the file type for your export: CSV or Parquet.
  6. Specify a new or existing destination path for the file. The fields depend on your cloud storage provider:
    • ADLS2: Account Name, Container, Path
    • AWS S3: Region, Bucket, Path
    • GCS: Project, Bucket, Path
     To search for an existing path, supply values for the first two fields and then select Apply.
     Tip: To reduce the time a search takes, provide as much of the path as possible.
     If you change the file type or the destination values, select Apply to refresh the file finder.
  7. Select Export Dataset. Tamr exports the dataset to one or more files, named with the dataset name, in the specified folder.
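To open the exported files with tools outside Tamr, it can help to write the destination as a storage URI. The Python sketch below combines the dialog fields using each provider's common URI scheme: abfss:// for ADLS Gen2, s3:// for Amazon S3, and gs:// for Google Cloud Storage. This is an assumption about how you might address the files with your own tooling. It does not describe how Tamr stores or displays the path. The function name and example values are hypothetical.

```python
def destination_uri(provider: str, bucket: str, path: str, account_name: str = "") -> str:
    """Build a conventional storage URI from Export Dataset dialog fields.

    provider: "ADLS2", "AWS S3", or "GCS" (hypothetical labels for this sketch).
    bucket: the Container field (ADLS2) or the Bucket field (AWS S3, GCS).
    account_name: the Account Name field, required for ADLS2 only.
    Region (AWS S3) and Project (GCS) are not part of these URIs.
    """
    path = path.strip("/")  # Normalize leading and trailing slashes.
    if provider == "ADLS2":
        if not account_name:
            raise ValueError("ADLS2 destinations need an account name")
        return f"abfss://{bucket}@{account_name}.dfs.core.windows.net/{path}"
    if provider == "AWS S3":
        return f"s3://{bucket}/{path}"
    if provider == "GCS":
        return f"gs://{bucket}/{path}"
    raise ValueError(f"unknown provider: {provider!r}")


if __name__ == "__main__":
    # Hypothetical values typed into the Export Dataset dialog.
    print(destination_uri("AWS S3", "my-bucket", "exports/customers"))
    # prints s3://my-bucket/exports/customers
    print(destination_uri("ADLS2", "tamr-exports", "customers", account_name="myaccount"))
    # prints abfss://tamr-exports@myaccount.dfs.core.windows.net/customers
```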

Exporting a Dataset to a Local File System

If DMS is not enabled, you can download a dataset to a local file system from the Dataset Catalog.

Note: For information about how to use APIs to export datasets, Tamr customers can consult the Tamr Help Center knowledge base.

Export File Format for Local Download

Tamr downloads files with the following characteristics:

  • Format: Comma-separated values (.csv). The delimiter is a comma (,), and both the quote character and the escape character are a double quote (").
  • Encoding: UTF-8.
  • Header: File contains a header row.
  • Multivalues: Multivalues are delimited by the character |.
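The format above corresponds to the settings of Python's standard csv module: a comma delimiter, a double-quote quote character, and quotes inside a field escaped by doubling them, which is the module's default doublequote behavior. The sketch below also splits multivalue fields on |. It assumes that you know which columns hold multivalues, that an empty field means no values, and that individual values never contain a literal |. The function name, sample data, and column names are hypothetical.

```python
import csv
import io
from typing import TextIO


def read_tamr_csv(f: TextIO, multivalue_columns: set[str]) -> list[dict]:
    """Parse a Tamr CSV export, splitting the named multivalue columns on '|'."""
    # Delimiter ',' and quote '"'. With doublequote=True, "" inside a quoted
    # field is read as a literal ", which matches an escape character of ".
    # The header row supplies the dict keys.
    reader = csv.DictReader(f, delimiter=",", quotechar='"', doublequote=True)
    rows = []
    for row in reader:
        parsed = {}
        for name, value in row.items():
            if name in multivalue_columns:
                # Assumption: an empty (or missing) field means no values.
                parsed[name] = value.split("|") if value else []
            else:
                parsed[name] = value
        rows.append(parsed)
    return rows


if __name__ == "__main__":
    # Hypothetical export: a quoted comma, doubled quotes, and a multivalue column.
    sample = 'id,Name,Tags\n1,"Smith, J.",a|b\n2,"Say ""hi""",\n'
    for row in read_tamr_csv(io.StringIO(sample), multivalue_columns={"Tags"}):
        print(row)
    # prints {'id': '1', 'Name': 'Smith, J.', 'Tags': ['a', 'b']}
    # prints {'id': '2', 'Name': 'Say "hi"', 'Tags': []}
```

For a downloaded file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_tamr_csv`. The csv module documentation recommends `newline=""` for files that a csv reader will parse.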

Downloading a Dataset to a Local File System

To export a dataset to a local file system:

  1. On the Dataset Catalog page, locate the dataset and choose Export.
  2. Select Confirm to start the dataset export job.
  3. When the dataset export job finishes, select Export available.
  4. Select Download Export. The file downloads to your computer.
