A Primer on Exporting Datasets
You can export and download a dataset using either the Tamr UI or the API.
From the UI
Please see the Tamr documentation on Exporting a Dataset.
Datasets are exported in CSV format by default. Datasets that contain complex data types cannot be exported as CSV; for these, the UI shows Export unavailable. You can configure such datasets to be exported in JSON format using the Tamr API.
Using the API
To access the Tamr API, see Using the APIs.
Before you can export a dataset using the API, you need to determine its id, latest revision number, and column headers:
- Navigate to the Tamr swagger docs interface at <hostname>:<port>/docs.
- Click on the dataset service.
- Click to expand the GET /datasets/named/{name} API endpoint as shown in the screenshot below.
- Enter the name of the dataset you would like to export under the name parameter.
- Click Try it out!
If the API call succeeds, it returns a JSON object containing the following fields:
- ["documentId"]["id"] -> dataset_id
- ["lastModified"]["version"] -> latest_revision_number
- ["data"]["fields"] -> data_columns
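The field lookup above can be sketched in Python. This is a minimal illustration, not part of the Tamr tooling: the function name extract_export_inputs and the sample response are hypothetical, and a real response from GET /datasets/named/{name} will contain many more fields and may use different value types.

```python
from typing import Any, Dict, List, Tuple


def extract_export_inputs(dataset: Dict[str, Any]) -> Tuple[Any, Any, List[Any]]:
    """Pull the three values needed for an export out of the dataset JSON."""
    dataset_id = dataset["documentId"]["id"]
    latest_revision_number = dataset["lastModified"]["version"]
    data_columns = dataset["data"]["fields"]
    return dataset_id, latest_revision_number, data_columns


# Hypothetical response, trimmed to the fields used above (values are made up).
sample_response = {
    "documentId": {"id": "42"},
    "lastModified": {"version": 7},
    "data": {"fields": ["id", "name", "address"]},
}

dataset_id, latest_revision_number, data_columns = extract_export_inputs(sample_response)
print(dataset_id, latest_revision_number, data_columns)
```

A missing key raises KeyError, which is usually the desired behavior here: an export request built from incomplete values would fail later anyway.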
Note these down as you will need them to generate an export for the dataset. To generate an export:
- Click on the unify service in the swagger docs.
- Click to expand the POST /export API endpoint.
- Fill the body with the following:
{
"datasetId": dataset_id,
"revision": latest_revision_number,
"columns": data_columns,
"formatConfiguration": format_config
}
Replace the placeholders dataset_id, latest_revision_number, and data_columns with the values obtained from the previous API call. You may also choose to export only a subset of the columns in data_columns. To export in JSON format, use {"@class":"com.tamr.procurify.models.export.JsonFormat$Configuration"} as the format_config. To configure a CSV export instead, use:
{
"@class": "com.tamr.procurify.models.export.CsvFormat$Configuration",
"writeHeader": true,
"delimiterCharacter": ",",
"quoteCharacter": "\"",
"nullValue": ""
}
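As a rough sketch, the request body and the two format configurations could be assembled in Python as follows. The helper build_export_body and its as_json flag are hypothetical names introduced for illustration; the keys and @class values are the ones shown in this section, and the sample id, revision, and columns are made up.

```python
import json
from typing import Any, Dict, List

# Format configurations copied from the text of this section.
JSON_FORMAT = {"@class": "com.tamr.procurify.models.export.JsonFormat$Configuration"}
CSV_FORMAT = {
    "@class": "com.tamr.procurify.models.export.CsvFormat$Configuration",
    "writeHeader": True,
    "delimiterCharacter": ",",
    "quoteCharacter": "\"",
    "nullValue": "",
}


def build_export_body(dataset_id: Any, revision: Any, columns: List[str],
                      as_json: bool = False) -> Dict[str, Any]:
    """Assemble the POST /export body; pass a subset of columns to export fewer."""
    format_config = JSON_FORMAT if as_json else CSV_FORMAT
    return {
        "datasetId": dataset_id,
        "revision": revision,
        "columns": list(columns),
        "formatConfiguration": dict(format_config),  # copy so callers can edit it
    }


# Made-up values standing in for dataset_id, latest_revision_number, data_columns.
body = build_export_body("42", 7, ["id", "name"])
print(json.dumps(body, indent=2))
```

Serializing with json.dumps avoids the comma and quoting mistakes that are easy to make when editing the body by hand in the swagger interface.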
- Click Try it out! An export job should start and appear in the UI.
- Once the job is finished, you can download the export either from the UI using the Download export link or by navigating to :/api/export/read/dataset/dataset_id in your browser.
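If you prefer a script to the swagger interface, the POST step might look like the sketch below. It only prepares the request and does not send it. Everything environment-specific here is an assumption: the export_url value is a placeholder (take the real address of the POST /export endpoint from your swagger docs), the auth_header value is a placeholder (see Using the APIs for the actual authentication scheme), and prepare_export_request is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Any, Dict


def prepare_export_request(export_url: str, body: Dict[str, Any],
                           auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the export endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        export_url,
        data=data,
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


# Made-up body; in practice use the values returned by the dataset lookup.
body = {
    "datasetId": "42",
    "revision": 7,
    "columns": ["id", "name"],
    "formatConfiguration": {
        "@class": "com.tamr.procurify.models.export.JsonFormat$Configuration"
    },
}

# Placeholder URL and credentials (assumptions, not real Tamr values).
request = prepare_export_request(
    "http://tamr.example.com/path/to/export", body, "PLACEHOLDER-CREDENTIALS"
)
print(request.get_method(), request.full_url)

# To actually start the export job, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes it easy to inspect the payload before an export job is started.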
You may also consider setting user roles and policies to specify dataset access. See Tamr User Policies for more information.