
Using the Core Connect API

Import and export data files between Tamr Core and a cloud storage provider with the Core Connect API service.

Important: Core Connect is in limited release. If you have questions about using the Core Connect limited release APIs, contact Tamr Support.

Before you can use the Core Connect API, your system administrator must configure Core Connect. See Configuring Core Connect.

You can import and export data files between Tamr Core and a cloud storage provider by issuing POST requests to the RESTful web API for the Core Connect service. Each POST request starts a file-transfer job and returns a response with the job ID. To get information about a job started by a POST request, including whether it is complete, issue a GET request.
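The start-then-poll pattern described above can be sketched in Python with only the standard library. This is a minimal, hypothetical client, not an official Tamr library: the endpoint paths, the `jobId` and `status` response keys, and the terminal state names are placeholder assumptions, so take the real values from the Swagger documentation before using it.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal job states; confirm the real values in the Swagger docs.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}


def post_json(url: str, body: dict) -> dict:
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def get_json(url: str) -> dict:
    """GET a URL and return the decoded JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


def run_job(base_url: str, start_path: str, status_path_template: str, body: dict) -> str:
    """Start a job with POST, then poll it with GET until it finishes.

    Hypothetical: both paths and the 'jobId' and 'status' keys are placeholders;
    take the real ones from http://<tamr_ip>:9100/docs.
    """
    job_id = post_json(base_url + start_path, body)["jobId"]
    status_url = base_url + status_path_template.format(job_id=job_id)
    return wait_for_job(lambda: get_json(status_url)["status"])
```

Passing `get_status` and `sleep` in as functions keeps the polling loop independent of the network, so it can be exercised without a running Tamr Core instance.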

Core Connect supports the import and export of the following file types:

  • Avro and delimited files for S3, ADLSGen2, HDFS, GCS, and the server local file system
  • Newline-delimited JSON files for S3 and for the server local file system (export only)

Core Connect does not support import and export of Parquet files. If you currently use the Data Movement Service (DMS) to import and export Parquet files between Tamr Core and cloud storage locations, continue to use DMS, through either the Tamr Core UI or the DMS API.
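The supported combinations listed above can be captured in a small lookup table, which a client script could check before submitting a request. This is an illustrative sketch transcribed from this page only; the format and storage labels (for example `"local"` for the server local file system) are hypothetical names, not values the API expects.

```python
# Support matrix transcribed from this page.
# "local" is a hypothetical label for the server local file system.
AVRO_DELIMITED_STORAGE = ("S3", "ADLSGen2", "HDFS", "GCS", "local")

SUPPORT = {}
for storage in AVRO_DELIMITED_STORAGE:
    for file_format in ("avro", "delimited"):
        SUPPORT[(file_format, storage)] = {"import", "export"}

# Newline-delimited JSON: import and export for S3, export only for local files.
SUPPORT[("ndjson", "S3")] = {"import", "export"}
SUPPORT[("ndjson", "local")] = {"export"}


def is_supported(file_format: str, storage: str, direction: str) -> bool:
    """Return True if this page lists the format/storage/direction combination.

    Parquet never appears in the matrix, so it is always reported as unsupported.
    """
    return direction.lower() in SUPPORT.get((file_format.lower(), storage), set())
```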

Supported JDBC drivers: For the current list of supported JDBC drivers, see the Core Connect overview.

Using the Core Connect API Swagger Documentation

Interactive Swagger API documentation is installed with each Tamr Core instance. For an introduction to using Swagger API documentation, see Using the Tamr Core API.

The Core Connect API Body Key Reference provides JSON key:value definitions, and Core Connect API Example Requests provides sample calls. For the full list of Core Connect API endpoints, commands, and keys, see the Core Connect API Swagger documentation at http://<tamr_ip>:9100/docs.
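If you prefer to list the available endpoints from a script, you can walk the OpenAPI (Swagger) document behind the Swagger UI. This sketch relies only on the standard OpenAPI layout (`paths`, then one key per HTTP method). The URL of the raw JSON document is not given on this page, so `spec_url` is an assumption you would take from the Swagger UI at http://<tamr_ip>:9100/docs.

```python
import json
import urllib.request

# Operation keys that the OpenAPI specification allows in a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_spec(spec_url: str) -> dict:
    """Download an OpenAPI/Swagger JSON document (spec_url is an assumption)."""
    with urllib.request.urlopen(spec_url) as response:
        return json.load(response)


def list_operations(spec: dict) -> list:
    """Return (METHOD, path) pairs for every operation in the spec, sorted by path."""
    operations = []
    for path, path_item in sorted(spec.get("paths", {}).items()):
        for key in sorted(path_item):
            # Path items can also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return operations
```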

Key Features of the Core Connect API

Authentication

See Configuring Core Connect for authentication instructions for GCS, S3, and ADLSGen2.

Core Connect applies Jinja templating when it processes API requests, so you can refer to environment variables by name in a request instead of supplying their values. This lets you avoid transmitting sensitive information, such as credentials. For an example, see Using a Jinja Template and Core Connect API Example Requests.
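A client can build request bodies that carry only variable names, never the secrets themselves. In this sketch, the `{{ NAME }}` placeholder syntax and the `jdbcUrl`, `dbUsername`, and `dbPassword` key names are assumptions for illustration; confirm the exact syntax in Using a Jinja Template and the key names in the Core Connect API Body Key Reference.

```python
def env_placeholder(var_name: str) -> str:
    """Return a Jinja expression that refers to an environment variable by name.

    Assumption: Core Connect resolves "{{ NAME }}" to the value of the
    environment variable NAME on the server side.
    """
    if not var_name.isidentifier():
        raise ValueError(f"not a valid variable name: {var_name!r}")
    return "{{ " + var_name + " }}"


def jdbc_query_config(jdbc_url: str, user_var: str, password_var: str) -> dict:
    """Build a connection block that carries variable names, never the secrets."""
    return {
        "jdbcUrl": jdbc_url,                          # hypothetical key name
        "dbUsername": env_placeholder(user_var),      # hypothetical key name
        "dbPassword": env_placeholder(password_var),  # hypothetical key name
    }
```

Because the body contains only `{{ DB_PASSWORD }}` rather than the password, it can be logged or stored alongside scripts without exposing the credential.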

For the Hadoop Distributed File System (HDFS) and Hive, Core Connect supports Kerberos authentication. See Configuring Core Connect.

Data Import

The Core Connect API supports the following key features for data import:

  • Single queries and batch queries.
  • Avro primitive types and array types are handled automatically. Other Avro complex types can be ignored.
  • Core Connect adds a TAMRSEQ column to the imported dataset and populates it with each record's row number. If your data does not include a primary key, you can use TAMRSEQ as the primary key by leaving the primary_key list empty ('[]') during import.
  • Additive or destructive import, controlled by the truncateTamrDataset body key. When set to false (the default), records from the imported file are added to the target dataset. When set to true, all records are deleted (truncated) from the target dataset before the file is imported.
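The import options above can be sketched as a request-body builder plus a local model of the truncateTamrDataset behavior. Only `truncateTamrDataset` and `primary_key` come from this page; `datasetName`, `query`, and `queryConfig` are hypothetical key names, and the exact spelling of the primary key field should be confirmed in the Core Connect API Body Key Reference. `simulate_import` is an illustration of the documented semantics, not Core Connect code.

```python
from typing import Iterable, List, Optional

# Named on this page; confirm the exact spelling in the Body Key Reference.
PRIMARY_KEY_FIELD = "primary_key"


def build_import_body(
    dataset_name: str,
    query: str,
    query_config: dict,
    primary_key: Optional[List[str]] = None,
    truncate: bool = False,
) -> dict:
    """Build a single-query import request body (most key names hypothetical)."""
    return {
        "datasetName": dataset_name,   # hypothetical key name
        "query": query,                # hypothetical key name
        "queryConfig": query_config,   # hypothetical key name
        # An empty list means TAMRSEQ is used as the primary key.
        PRIMARY_KEY_FIELD: list(primary_key or []),
        # False (default) appends records; True truncates the dataset first.
        "truncateTamrDataset": truncate,
    }


def simulate_import(existing: Iterable, incoming: Iterable, truncate: bool = False) -> list:
    """Local model of truncateTamrDataset: append records, or replace them all."""
    return list(incoming) if truncate else list(existing) + list(incoming)
```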

Data Export

The Core Connect API supports the following key features for data export:

  • Export either a full or delta dataset.
  • Export a dataset to a JDBC-compliant table. If the table does not exist, Core Connect can create it automatically, mapping column data types to the database's native types for Oracle, Postgres, SQL Server, Redshift, and H2.
  • Run Hive CREATE TABLE scripts with the execute API.
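Before relying on automatic table creation with native types, a client could check which database a JDBC URL points to. This hypothetical pre-flight helper is not part of the Core Connect API and does not contact the database. It assumes each vendor's standard driver URL prefix; a Redshift cluster reached through the PostgreSQL driver, for example, would be reported as Postgres.

```python
from typing import Optional

# Standard JDBC URL prefixes for the databases this page lists for automatic
# table creation with native types. Other drivers may use different prefixes.
NATIVE_TYPE_DIALECTS = {
    "jdbc:oracle:": "Oracle",
    "jdbc:postgresql:": "Postgres",
    "jdbc:sqlserver:": "SQL Server",
    "jdbc:redshift:": "Redshift",
    "jdbc:h2:": "H2",
}


def native_type_dialect(jdbc_url: str) -> Optional[str]:
    """Return the database name if auto-created tables get native types, else None."""
    url = jdbc_url.strip().lower()
    for prefix, name in NATIVE_TYPE_DIALECTS.items():
        if url.startswith(prefix):
            return name
    return None
```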

Profiling

The Core Connect API supports the following key features for profiling:

  • Profiles JDBC data or streams, sampling up to 100,000 records by default. You can change this limit with the TAMR_CONNECT_PROFILING_SAMPLE_SIZE variable. See Configuring Core Connect.
  • Profiling results for each column are saved as records in a Tamr Core dataset.

See Core Connect API Example Requests for an example.
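The sampling rule above amounts to "use TAMR_CONNECT_PROFILING_SAMPLE_SIZE if it is set, otherwise 100,000, and never read more records than exist." This is a client-side model of that documented behavior, not Core Connect's code: the variable is set in the Core Connect configuration, so the `env` mapping here only stands in for it, and rejecting non-positive values is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_PROFILING_SAMPLE_SIZE = 100_000  # default stated on this page


def profiling_sample_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the configured sample size, or the default if the variable is unset."""
    env = os.environ if env is None else env
    raw = env.get("TAMR_CONNECT_PROFILING_SAMPLE_SIZE", "").strip()
    if not raw:
        return DEFAULT_PROFILING_SAMPLE_SIZE
    size = int(raw)
    if size <= 0:  # assumption: only positive sizes are meaningful
        raise ValueError(f"sample size must be positive, got {size}")
    return size


def records_profiled(total_records: int, sample_size: int) -> int:
    """Profiling reads at most sample_size records."""
    return min(total_records, sample_size)
```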