Import and export data files between Tamr Core and a cloud storage provider with the Core Connect API service.
Important: Core Connect is in limited release. If you have questions about using the Core Connect limited release APIs, contact Tamr Support.
Before you can use the Core Connect API, your system administrator must configure Core Connect. See Configuring Core Connect.
You can import and export data files between Tamr Core and a cloud storage provider by issuing POST requests to the RESTful web API for the Core Connect service. These POST requests start a job to transfer the files, and return a response with a job ID. You issue GET requests to obtain information about the jobs initiated by POST requests, including whether they are complete.
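The POST-then-GET pattern described above can be sketched with the Python standard library. This is a minimal sketch, not the documented client: BASE_URL, the endpoint paths in the usage note, and the terminal status names (SUCCEEDED, FAILED, CANCELED) are hypothetical placeholders. Take the real paths and job states from the Swagger documentation for your instance.

```python
import json
import time
import urllib.request

# Hypothetical host; replace with http://<tamr_ip>:9100 for your instance.
BASE_URL = "http://tamr.example.com:9100"


def build_post(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a Core Connect job."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get(path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for information about a job."""
    return urllib.request.Request(url=BASE_URL + path, method="GET")


def wait_for_job(fetch_status, job_id: str, poll_seconds: float = 5.0,
                 max_polls: int = 60) -> str:
    """Call fetch_status(job_id) until it returns a terminal state.

    fetch_status is any callable that returns a status string, for example one
    that sends build_get(...) and reads the status from the JSON response.
    The terminal state names are assumptions, not documented values.
    """
    terminal = {"SUCCEEDED", "FAILED", "CANCELED"}
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status in terminal:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

To actually send a request, pass it to urllib.request.urlopen(req) and read the job ID from the JSON response. Injecting fetch_status keeps the polling loop independent of the exact response shape, which you should confirm in Swagger.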
Core Connect supports the import and export of the following file types:
- Avro, delimited, and Parquet files for S3, ADLSGen2, HDFS, GCS, and the server local file system
- Newline-delimited JSON files for S3 and for the server local file system (export only)
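The support matrix above can be encoded as a small lookup table, for example to validate a job configuration before issuing a request. The format and storage labels ("parquet", "ndjson", "local", and so on) are hypothetical names chosen for this sketch; they are not Core Connect identifiers.

```python
# Directions supported for each (format, storage) pair, per the list above.
BOTH = frozenset({"import", "export"})
STORAGES = ("s3", "adlsgen2", "hdfs", "gcs", "local")

SUPPORTED = {}
for fmt in ("avro", "delimited", "parquet"):
    for storage in STORAGES:
        SUPPORTED[(fmt, storage)] = BOTH

# Newline-delimited JSON: S3 in both directions, local file system export only.
SUPPORTED[("ndjson", "s3")] = BOTH
SUPPORTED[("ndjson", "local")] = frozenset({"export"})


def is_supported(fmt: str, storage: str, direction: str) -> bool:
    """Return True if the format/storage/direction combination is listed."""
    return direction in SUPPORTED.get((fmt.lower(), storage.lower()), frozenset())
```

For example, is_supported("ndjson", "local", "import") is False because newline-delimited JSON on the server local file system is export only.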
Supported JDBC drivers: See the Core Connect overview for the current list of supported JDBC drivers.
Using the Core Connect API Swagger Documentation
Interactive Swagger API documentation is installed with each Tamr Core instance. For an introduction to using Swagger API documentation, see Using the Tamr Core API.
The Core Connect API Body Key Reference provides JSON key:value definitions, and Core Connect API Example Requests provides sample calls. For the full list of Connect API endpoints, commands, and keys, refer to the Connect API Swagger documentation, available at http://<tamr_ip>:9100/docs.
Key Features of the Core Connect API
Authentication
See Configuring Core Connect for authentication instructions for GCS, S3, and ADLS2.
Jinja templating is part of the API request processing and can be used to specify environment variable names. This allows you to avoid transmission of sensitive information, such as credentials. See Using a Jinja Template and Core Connect API Example Requests for an example.
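The idea is that the request body carries only the name of an environment variable, and the value is filled in during request processing, so the secret itself is never sent. The sketch below assumes a "{{ VAR_NAME }}" placeholder form and hypothetical credential key names (accessKeyId, secretAccessKey); see Using a Jinja Template for the exact syntax. preview_render is a simplified regex stand-in for Jinja, useful only for checking a body locally; it is not the Core Connect renderer.

```python
import os
import re


def env_placeholder(var_name: str) -> str:
    """Return a template reference to an environment variable (assumed syntax)."""
    return "{{ " + var_name + " }}"


def build_s3_credentials(access_key_var: str, secret_key_var: str) -> dict:
    """Build a credentials block that names env vars instead of embedding secrets.

    The key names are hypothetical; check the Body Key Reference.
    """
    return {
        "accessKeyId": env_placeholder(access_key_var),
        "secretAccessKey": env_placeholder(secret_key_var),
    }


_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def preview_render(value: str, env=None) -> str:
    """Simplified stand-in for Jinja: replace {{ NAME }} with env[NAME].

    Unknown names are left as-is so missing variables are easy to spot.
    """
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
```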
For Hadoop Distributed File System (HDFS) and Hive, Core Connect supports Kerberos authentication. See Configuring Core Connect.
Data Import
The Core Connect API supports the following key features for data import:
- Single and batch queries.
- Avro primitive types and array types are handled automatically. Avro complex types can be ignored.
- Core Connect adds a column, TAMRSEQ, to the imported dataset and populates it with the row number. If your data does not include a primary key, you can use TAMRSEQ as the primary key by leaving the primary_key list empty ([]) during import.
- Option to import data additively or destructively using the truncateTamrDataset body key. When set to false (default), records from the imported file are added to the target dataset. When set to true, all records are deleted (truncated) from the target dataset before the file is imported.
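The two import options above can be illustrated with a request-body builder and a local model of the additive and truncating behavior. Only truncateTamrDataset comes from this page; query, datasetName, and the camelCase primaryKey spelling are hypothetical key names, so check the Body Key Reference. simulate_import models the documented semantics on plain lists and does not call Core Connect.

```python
def build_import_body(query: str, dataset_name: str, primary_key=None,
                      truncate: bool = False) -> dict:
    """Build a hypothetical import body.

    An empty primary key list means Core Connect uses TAMRSEQ as the key.
    """
    return {
        "query": query,                          # hypothetical key name
        "datasetName": dataset_name,             # hypothetical key name
        "primaryKey": list(primary_key or []),   # [] -> TAMRSEQ becomes the key
        "truncateTamrDataset": truncate,         # False = additive (default)
    }


def simulate_import(existing: list, incoming: list, truncate: bool = False) -> list:
    """Model truncateTamrDataset: replace existing records, or append to them."""
    return list(incoming) if truncate else list(existing) + list(incoming)
```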
Data Export
The Core Connect API supports the following key features for data export:
- Export either a full or delta dataset. See Options for Exporting below.
- Export a dataset into a JDBC-compliant table. If the table does not exist, Core Connect can create it automatically, mapping data types to the native database types for Oracle, Postgres, SQL Server, Redshift, and H2.
- Execute Hive CREATE TABLE scripts via the 'execute' API.
Options for Exporting
You can choose to export either the full Tamr Core dataset or only the new and modified records between two available versions. The API endpoints for export to either file systems or a datastore are set to export full datasets by default. To export only new and modified records, set exportDelta to true in exportDeltaConfig.
- By default, Core Connect exports only the records that are new or modified between the two latest versions of the specified dataset.
- To specify two dataset versions, you can set fromVersion and/or toVersion in deltaConfig.
Note: If the changes between the two latest, or specified, versions of the requested dataset are not incremental (that is, the data has been truncated and reloaded), an API call with exportDelta set to true returns an error.
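A delta-export configuration can be assembled as below. This sketch assumes that fromVersion and toVersion sit in the same exportDeltaConfig object as exportDelta (this page names the object both exportDeltaConfig and deltaConfig, so confirm the name in the Body Key Reference). The check that fromVersion is lower than toVersion is a client-side assumption added for illustration, not documented server behavior.

```python
def build_export_delta_config(from_version=None, to_version=None) -> dict:
    """Build an exportDeltaConfig block that exports only new/modified records.

    With no versions given, Core Connect compares the two latest versions.
    """
    if (from_version is not None and to_version is not None
            and from_version >= to_version):
        # Assumed sanity check; the server may apply its own rules.
        raise ValueError("fromVersion must be lower than toVersion")
    config = {"exportDelta": True}
    if from_version is not None:
        config["fromVersion"] = from_version
    if to_version is not None:
        config["toVersion"] = to_version
    return {"exportDeltaConfig": config}
```

Merge the returned dictionary into the body of an export request; omit it entirely to export the full dataset.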
Profiling
The Core Connect API supports the following key features for profiling:
- Profiles up to 100,000 records of JDBC data or streams by default. You can change this limit with the TAMR_CONNECT_PROFILING_SAMPLE_SIZE variable. See Configuring Core Connect.
- Profiling results for each column are saved as records in a Tamr Core dataset.
See Core Connect API Example Requests for an example.
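The profiling behavior above can be sketched as a request-body builder plus a helper that reports which sample size applies under a given environment. The body keys (query, profileDatasetName) are hypothetical; use the Example Requests page for the real shape. effective_sample_size only mirrors the documented default of 100,000 and the TAMR_CONNECT_PROFILING_SAMPLE_SIZE override for the server configuration you pass in; it is not Core Connect code.

```python
DEFAULT_PROFILING_SAMPLE_SIZE = 100_000


def build_profile_body(query: str, profile_dataset_name: str) -> dict:
    """Build a hypothetical profiling request body.

    Results are written as one record per column to the named Tamr Core dataset.
    """
    return {
        "query": query,                              # hypothetical key name
        "profileDatasetName": profile_dataset_name,  # hypothetical key name
    }


def effective_sample_size(env: dict) -> int:
    """Return the profiling sample size implied by a server environment mapping."""
    raw = env.get("TAMR_CONNECT_PROFILING_SAMPLE_SIZE")
    return int(raw) if raw else DEFAULT_PROFILING_SAMPLE_SIZE
```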