Data Movement Service
The data movement service lets you ingest data files from your cloud storage into Tamr Core and export data files from Tamr Core to your cloud storage.
Tamr recommends using Core Connect to import and export large data files between Tamr Core and your cloud storage provider.
The data movement service (DMS) is designed to facilitate large data movement jobs between your cloud storage solution and your Tamr instance. After DMS is configured, you can use this service either through the Tamr user interface (UI) or through the DMS API.
DMS supports the CSV and Parquet file formats for ingesting and exporting Tamr datasets. Tamr supports ingesting and exporting datasets from cloud storage within your cloud provider.
If DMS is enabled for your instance, users cannot download datasets to their local file system via the UI. This allows organizations to ensure all teams follow the appropriate data access policies, which are managed via their cloud storage accounts.
Important: Tamr Core users who need access to data files exported to cloud storage must be given access to the appropriate cloud storage locations.
Before You Use DMS
- The current version of DMS supports API interaction only through command-line utilities, such as cURL.
- DMS does not support Parquet files that include arrays with nulls.
- For DMS jobs, the job ID is a GUID created by DMS and uses a different format than the numeric job IDs created by Tamr.
- For successfully completed DMS jobs, the status is "completed" instead of "succeeded", which is reported for other jobs. See Managing Jobs.
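Because DMS and Tamr report job IDs and success statuses differently, a script that checks both kinds of jobs has to identify the job type before comparing statuses. The following Python sketch shows one way to do that, assuming job IDs and statuses are available as plain strings. The function and constant names are hypothetical, and two behaviors are assumptions rather than documented DMS behavior: any ID that parses as a UUID is treated as a DMS job ID, and statuses are compared case-insensitively.

```python
import uuid

# Success statuses as described in the list above.
DMS_SUCCESS_STATUS = "completed"
TAMR_SUCCESS_STATUS = "succeeded"


def is_dms_job_id(job_id: str) -> bool:
    """Guess whether job_id is a DMS GUID rather than a numeric Tamr job ID.

    Assumption: any non-numeric string that parses as a UUID is a DMS job ID.
    """
    if job_id.isdigit():
        return False  # numeric IDs are created by Tamr
    try:
        uuid.UUID(job_id)
    except ValueError:
        return False  # neither numeric nor a GUID; treated as a Tamr job
    return True


def job_succeeded(job_id: str, status: str) -> bool:
    """Compare a reported status with the success value for that job type.

    Assumption: statuses are compared case-insensitively.
    """
    expected = DMS_SUCCESS_STATUS if is_dms_job_id(job_id) else TAMR_SUCCESS_STATUS
    return status.strip().lower() == expected
```

For example, job_succeeded("42", "succeeded") and job_succeeded("3f2504e0-4f89-11d3-9a0c-0305e82c3301", "completed") both return True, while swapping the two statuses returns False.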
Parquet File Support
Tamr is able to ingest all Parquet files, including complex Parquet files with lists, maps, and structs.
Single-level lists appear as they are defined, with null values appearing as primitive nulls (as opposed to string nulls). When exporting Parquet files, nulls are excluded entirely, as defined in the Parquet specification.
Maps, structs, and lists nested deeper than two levels are partially supported: the column type is set to string, and the nested value is converted to a string.
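To make the conversion rule more concrete, here is a minimal Python sketch that measures how deeply a value is nested and serializes it to a string once it exceeds two levels. It is illustrative only: the helper names are hypothetical, it assumes "two levels" counts container levels (maps and structs are modeled as Python dicts, lists as Python lists), and JSON is used as a stand-in serialization, so the string Tamr actually produces may look different.

```python
import json
from typing import Any

# Assumption: "two levels" in the text above is read as two container levels.
MAX_NATIVE_DEPTH = 2


def nesting_depth(value: Any) -> int:
    """Count container levels (dicts or lists) around the innermost primitive."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # primitives, including None, add no depth


def stringify_if_deep(value: Any, max_depth: int = MAX_NATIVE_DEPTH) -> Any:
    """Return value unchanged, or a JSON string if it is nested too deeply.

    JSON is a stand-in format; Tamr's actual string representation may differ.
    """
    if nesting_depth(value) > max_depth:
        return json.dumps(value, sort_keys=True)
    return value
```

For example, stringify_if_deep({"a": {"b": {"c": 1}}}) returns the string '{"a": {"b": {"c": 1}}}', while a single-level list such as ["a", None] is returned unchanged with its null intact.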
Note: If using the DMS API, set the inheritSchema option to false to convert all primitive types to string.
Below is an example of a complex struct converted to a string.
Configuring and Using DMS
To configure and use DMS, see: