The Data Movement Service (DMS) is designed to facilitate large data movement jobs between your cloud storage solution and your Tamr instance. After DMS is configured, you can use this service either through the Tamr user interface (UI) or through the DMS API.
DMS supports the CSV and Parquet formats for Tamr dataset ingest and export. Tamr can ingest datasets from, and export datasets to, cloud storage within your cloud provider.
If DMS is enabled for your instance, users cannot download datasets to their local file system via the UI. This allows organizations to ensure all teams follow the appropriate data access policies, which are managed via their cloud storage accounts.
Important: Tamr users who need access to data files exported to cloud storage must be given access to the appropriate cloud storage locations.
- The current version of DMS supports API interaction only through command-line utilities, such as cURL.
- DMS does not support Parquet files that include arrays with
- For DMS jobs, the job ID is a GUID created by DMS and uses a different format than the numeric job IDs created by Tamr.
- For successfully completed DMS jobs, the status is "completed" instead of "succeeded", which is reported for other Tamr jobs.
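Because DMS jobs and other Tamr jobs differ in both job ID format and success status, a script that tracks both kinds of jobs may need to normalize them. The following is a minimal Python sketch under stated assumptions: the helper names `is_dms_job_id` and `job_succeeded` are hypothetical, the status comparison is case-insensitive because the exact casing returned by each service is not stated here, and no DMS API call is made.

```python
import uuid

# Status strings taken from the notes above. The comparison is
# case-insensitive because the exact casing is an assumption.
SUCCESS_STATUSES = {"completed", "succeeded"}


def is_dms_job_id(job_id) -> bool:
    """Return True if job_id parses as a GUID (DMS style), else False."""
    try:
        uuid.UUID(str(job_id))
        return True
    except ValueError:
        return False


def job_succeeded(status: str) -> bool:
    """Treat DMS 'completed' and Tamr 'succeeded' as the same outcome."""
    return status.strip().lower() in SUCCESS_STATUSES
```

With these helpers, a monitoring script can branch on the ID format and apply one success check regardless of which service ran the job.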
Tamr can ingest all Parquet files, including complex Parquet files with lists, maps, and structs.
Single-level lists appear as they are defined, with null values appearing as primitive nulls (as opposed to string nulls). When exporting Parquet files, nulls are excluded completely, as defined in the Parquet specification.
Maps, structs, and lists nested deeper than two levels are partially supported; the column type is string, and the struct is converted to a string.
Note: If using the DMS API, set the inheritSchema option to false to convert all primitive types to string.
Below is an example of a complex struct converted to a string.
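As a rough illustration of what converting a struct to a string can look like, the Python sketch below serializes a nested value with JSON. This is an assumption made for illustration only: the record, its field names, and the `stringify_complex` helper are hypothetical, and the exact string format that DMS produces may differ.

```python
import json

# Hypothetical nested record; field names are invented for illustration.
record = {
    "id": 7,
    "address": {"city": "Boston", "geo": {"lat": 42.36, "lon": -71.06}},
}


def stringify_complex(value):
    """Serialize dicts and lists to a JSON string; leave primitives unchanged."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    return value


# Each complex column becomes a single string column; primitives pass through.
flattened = {column: stringify_complex(value) for column, value in record.items()}
print(flattened["address"])
```

In this sketch the `address` column ends up holding one string that contains the whole struct, while `id` keeps its primitive type.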
To configure and use DMS, see: