Tamr Documentation

Exporting a Dataset

Export a dataset from a Tamr project to cloud storage or to a local file system.

You can export datasets from the Datasets page in a project.

  • If the Tamr Data Movement Service (DMS) is enabled and configured, you can export a dataset from Tamr to a previously configured cloud storage location, in either CSV or Parquet format.
    Important: If DMS is enabled for your instance, you cannot download datasets to a local file system via the UI.
  • If DMS is not enabled, you can download a dataset in CSV format to your local file system.

Note: Admins can also export dataset files from the Dataset Catalog. See Exporting a Dataset from the Dataset Catalog.

Exporting to Cloud Storage with DMS

You use the Tamr DMS to export data in CSV or Parquet format to cloud storage.

Important: Tamr users who need access to data files exported to cloud storage must be given access to the appropriate cloud storage locations.

Note: For information about how to use APIs to export datasets, see Using the DMS API.

Export File Format for Cloud Storage Destinations

Files exported to cloud storage destinations have the following characteristics:

  • Format: Parquet or comma-separated values (.csv). In CSV files, the delimiter is a comma (,), the quote character is a double quote ("), and the escape character is also a double quote ("), so a literal quote inside a quoted field appears as "". Spaces in attribute names are handled differently in CSV and Parquet files:
    • For exports in CSV format, if an attribute name in the Tamr dataset includes a space, the exported column name includes the space.
    • For exports in Parquet format, Tamr automatically replaces spaces with underscores. For example, the “Cluster Name” attribute is exported as “Cluster_Name”.
  • Encoding: UTF-8.
  • Header: File contains a header row.
  • Multivalues: Multivalues are delimited by the pipe character (|).
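If downstream code looks up columns by their Tamr attribute names, the naming rule for each format can be captured in a small helper. This is a minimal Python sketch: the function name `exported_column_name` is hypothetical, and it assumes spaces are the only characters Tamr rewrites in Parquet column names, since that is the only rule documented here.

```python
def exported_column_name(attribute: str, file_format: str) -> str:
    """Return the expected column name for a Tamr attribute in an export.

    Encodes only the documented rule: CSV keeps spaces as-is, while Parquet
    replaces each space with an underscore. Any other character is assumed
    to pass through unchanged (an assumption, not documented behavior).
    """
    fmt = file_format.lower()
    if fmt == "csv":
        return attribute
    if fmt == "parquet":
        # "Cluster Name" -> "Cluster_Name"
        return attribute.replace(" ", "_")
    raise ValueError(f"unsupported export format: {file_format!r}")


print(exported_column_name("Cluster Name", "parquet"))  # Cluster_Name
print(exported_column_name("Cluster Name", "csv"))      # Cluster Name
```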

Export a Dataset to a Cloud Storage Destination

To export a dataset to a cloud storage destination:

  1. Open a schema mapping, mastering, or categorization project and select the Datasets page.
  2. Locate the dataset and choose Export.
  3. Select Confirm to start a dataset export job.
  4. When the dataset export job finishes, choose Export available.
  5. Select Export to (provider name). The Export Dataset dialog opens.
  6. Select the file type for your export: CSV or Parquet.
  7. Specify a new or existing destination path for the file. The fields depend on your cloud storage provider:
    • ADLS2: Account Name, Container, Path
    • AWS S3: Region, Bucket, Path
    • GCS: Project, Bucket, Path
    To search for an existing path, you can supply values for the first two fields and then select Apply.
    Tip: To reduce the time a search takes, provide as much of the path as possible.
    If you change the file type or the destination values, select Apply to refresh the file finder.
  8. Select Export Dataset. Tamr exports the dataset into one or more files, named with the dataset name, in the specified folder.
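After the export finishes, users with access to the destination can reach the files with the cloud provider's own tools. As a rough aid, the following Python sketch turns the destination fields from step 7 into the URI form those tools conventionally accept: `s3://` for AWS S3, `gs://` for GCS, and the `abfss://` form used by Hadoop and Spark for ADLS Gen2. The function name `destination_uri` and its dictionary keys are hypothetical, and Tamr does not necessarily display these URIs itself.

```python
from typing import Mapping


def destination_uri(provider: str, fields: Mapping[str, str]) -> str:
    """Build a conventional storage URI for an export folder.

    The keys of `fields` are hypothetical lowercase forms of the Export
    Dataset dialog labels. Region (AWS S3) and Project (GCS) identify where
    the bucket lives but are not part of these URIs, so they are ignored.
    """
    path = fields.get("path", "").strip("/")
    if provider == "ADLS2":
        # abfss://<container>@<account>.dfs.core.windows.net/<path>
        base = f"abfss://{fields['container']}@{fields['account_name']}.dfs.core.windows.net"
    elif provider == "AWS S3":
        base = f"s3://{fields['bucket']}"
    elif provider == "GCS":
        base = f"gs://{fields['bucket']}"
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return f"{base}/{path}" if path else base


print(destination_uri("AWS S3", {"region": "us-east-1", "bucket": "my-bucket", "path": "exports/customers/"}))
# s3://my-bucket/exports/customers
```

The exported file names within that folder are not specified beyond "named with the dataset name," so the sketch stops at the folder level.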

Exporting a Dataset to a Local File System

If DMS is not enabled, you can use the Tamr user interface to export datasets in CSV format from a project and download them to a local file system.

Note: For information about how to use APIs to export datasets, Tamr customers can consult the Tamr Help Center knowledge base.

Export File Format for Local Downloads

Files downloaded to a local file system have the following characteristics:

  • Format: Comma-separated values (.csv). The delimiter is a comma (,), the quote character is a double quote ("), and the escape character is also a double quote ("), so a literal quote inside a quoted field appears as "".
  • Encoding: UTF-8.
  • Header: File contains a header row.
  • Multivalues: Multivalues are delimited by the pipe character (|).
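A downloaded export can be read with Python's standard `csv` module by matching this dialect; the same approach applies to CSV files exported to cloud storage, which use the same format. This is a minimal sketch: the function name `read_tamr_export` is hypothetical, and because the file itself does not mark which columns hold multivalues, the caller passes that set explicitly. Treating an empty field as an empty list is an assumption.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Set, Union

Record = Dict[str, Union[str, List[str]]]


def read_tamr_export(lines: Iterable[str], multivalue_columns: Set[str]) -> Iterator[Record]:
    """Yield one dict per data row of a Tamr CSV export.

    The dialect mirrors the documented format: comma delimiter, double-quote
    quote character, and a doubled quote ("") as the escape for a literal
    quote. The header row supplies the dictionary keys.
    """
    reader = csv.DictReader(lines, delimiter=",", quotechar='"', doublequote=True)
    for row in reader:
        record: Record = dict(row)
        for column in multivalue_columns:
            value = row.get(column)
            if value is not None:
                # Multivalues are "|"-delimited; an empty field becomes [] (assumption).
                record[column] = value.split("|") if value else []
        yield record


sample = (
    "id,Cluster Name,emails\n"
    '1,"Acme, Inc.",a@example.com|b@example.com\n'
    '2,"The ""Best"" Co",\n'
)
rows = list(read_tamr_export(io.StringIO(sample), {"emails"}))
for r in rows:
    print(r)
```

To read a file from disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the `io.StringIO` sample; the `csv` module documentation recommends `newline=""` so that line breaks inside quoted fields are handled correctly.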

Export a Dataset Locally

To export a dataset locally:

  1. Open a schema mapping, mastering, or categorization project and select the Datasets page.
  2. Locate the dataset and choose Export.
  3. Select Confirm to start a dataset export job.
  4. When the dataset export job finishes, choose Export available.
  5. Select Download export. The CSV file downloads to your local file system.

