Exporting a Dataset from the Dataset Catalog
As an admin, you can use the dataset catalog to prepare a dataset for export and then export it.
- If the Data Movement Service (DMS) is not enabled, you can download a dataset in CSV format to your local file system.
- If DMS is enabled and configured, you can use the user interface to export a dataset to a previously configured cloud storage location, in either CSV or Parquet format.
Exporting a Dataset to a Local File System
If DMS is not enabled, you can download a dataset to a local file system from the dataset catalog.
Note: For information about how to use APIs to export datasets, consult the Tamr Help Center.
Export File Format for Local Download
Tamr Core downloads files with the following characteristics:
- Format: Comma-separated values (.csv). The delimiter, quote, and escape characters are the comma (,), the double quote ("), and the double quote ("), respectively.
- Encoding: UTF-8.
- Header: File contains a header row.
- Multivalues: Multi-value arrays are delimited by the pipe character (|).
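The characteristics above map directly onto a standard CSV parser. The following is a minimal sketch, not part of Tamr Core, of how a downloaded file could be read with Python's built-in csv module. The function name read_export, the multi_value_columns argument, and the sample column names are hypothetical. The sketch also assumes that an empty field represents an empty array, which the characteristics above do not confirm.

```python
import csv
import io


def read_export(stream, multi_value_columns):
    """Parse an exported CSV stream into a list of dicts.

    Assumes the documented format: comma delimiter, double-quote quoting,
    a doubled double quote as the escape, and a header row.
    multi_value_columns (hypothetical argument) names the columns whose
    values should be split on the pipe character.
    """
    reader = csv.DictReader(
        stream, delimiter=",", quotechar='"', doublequote=True
    )
    rows = []
    for row in reader:
        for name in multi_value_columns:
            value = row.get(name)
            if value is not None:
                # Assumption: an empty field means an empty array.
                row[name] = value.split("|") if value else []
        rows.append(row)
    return rows


# Small in-memory sample with made-up columns, standing in for a download.
sample = 'id,name,phones\r\n1,"Acme ""West"", Inc.",555-0100|555-0199\r\n'
rows = read_export(io.StringIO(sample), ["phones"])
```

To read a real download, open the file with open(path, newline="", encoding="utf-8") and pass the file object as stream. Only columns that you know to be multi-value should be split, because a pipe character can also occur inside an ordinary text value.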
Export a Dataset to a Local File System
To export a dataset to a local file system:
- At the top right of your browser, select Dataset Catalog. A list of all datasets appears.
- Locate the dataset and select Export. You might need to scroll horizontally to see this option.
- Select Confirm to start the dataset export job. This process can take several minutes.
- When the dataset export job finishes, select Export available.
- Select Download Export. The file downloads to your computer.
Exporting to Cloud Storage with DMS
You use the DMS to export datasets in CSV or Parquet format to cloud storage.
Important: Team members who need access to the data files you export to cloud storage must be given access to the appropriate cloud storage locations.
Note: For information about how to use APIs to export datasets, see Using the DMS API.
Export File Format for Cloud Storage Destination
Files exported to cloud storage destinations have the following characteristics:
- Format: Parquet or comma-separated values (.csv). The delimiter, quote, and escape characters are the comma (,), the double quote ("), and the double quote ("), respectively. Spaces in attribute names are handled differently in CSV and Parquet files:
  - For exports in CSV format, if an attribute name in the dataset includes a space, the exported column name includes the space.
  - For exports in Parquet format, Tamr Core automatically replaces spaces with underscores. For example, the “Cluster Name” attribute is exported as “Cluster_Name”.
- Encoding: UTF-8.
- Header: File contains a header row.
- Multivalues: Multi-value arrays are delimited by the pipe character (|).
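If downstream code has to work with both CSV and Parquet exports of the same dataset, the column-name difference described above needs to be reconciled. The following is a small sketch of one way to do that. The helper name parquet_column_name is hypothetical, and the sketch covers only the space-to-underscore rule stated above. Whether other characters are also changed is not stated here, so the sketch leaves them untouched.

```python
def parquet_column_name(attribute_name: str) -> str:
    """Return the column name expected in a Parquet export.

    Applies only the documented rule: spaces become underscores.
    """
    return attribute_name.replace(" ", "_")


# Map attribute names (as they appear in a CSV header) to Parquet names.
csv_header = ["Cluster Name", "id", "Record Count"]
csv_to_parquet = {name: parquet_column_name(name) for name in csv_header}
```

With this mapping, code written against the CSV column names can look up the matching Parquet column instead of hard-coding both spellings.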
Exporting a Dataset to a Cloud Storage Destination
You supply the following provider-specific values to identify the cloud storage destination for your export:
- ADLS2: Account Name, Container, Path
- AWS S3: Region, Bucket, Path
- GCS: Project, Bucket, Path
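Before you open the Export Dataset dialog, it can help to confirm that you have every value the provider requires. The following sketch expresses the list above as a simple checklist. It is an illustration only: the names REQUIRED_FIELDS and missing_destination_fields are hypothetical and are not part of Tamr Core or the DMS API, and the sample values are made up.

```python
# Provider-specific values, taken from the list above.
REQUIRED_FIELDS = {
    "ADLS2": ("Account Name", "Container", "Path"),
    "AWS S3": ("Region", "Bucket", "Path"),
    "GCS": ("Project", "Bucket", "Path"),
}


def missing_destination_fields(provider, values):
    """Return the required field names that are absent or blank in values."""
    if provider not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown provider: {provider!r}")
    missing = []
    for name in REQUIRED_FIELDS[provider]:
        value = values.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Example with made-up values: the Path has not been decided yet.
todo = missing_destination_fields(
    "AWS S3", {"Region": "us-east-1", "Bucket": "exports"}
)
```

Here todo lists the values that still need to be gathered before starting the export.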
To export to a cloud storage destination:
- On the Dataset Catalog page, locate the dataset and choose Export.
- Select Confirm to start the dataset export job.
- When the dataset export job finishes, select Export available.
- Select Export to (provider name). The Export Dataset dialog opens.
- Select the file type for your export: CSV or Parquet. If you change the file type, select Apply to refresh the file finder.
- Specify a new or existing destination path for the file.
To search for an existing path, you can supply values for the first two fields and then select Apply. However, to reduce the time a search can take, provide as much of the path as possible.
If you change the destination values, select Apply to refresh the file finder.
- Select Export Dataset. Tamr Core exports the dataset into one or more files that use the dataset name, in the specified folder.