You can import and export data files between Tamr Core and a cloud storage provider by issuing POST requests to the RESTful web API for the Core Connect service. Each POST request starts a job to transfer the files and returns a response that contains a job ID. To obtain information about a job started by a POST request, including whether it is complete, issue a GET request.

Before you can use the Core Connect API to connect to a cloud storage provider, your system administrator must configure Core Connect. See Configuring Core Connect.

You can also use Core Connect to access external datastores with JDBC drivers.

Core Connect supports the import and export of the following file types:

Avro, delimited, and Parquet files for S3, ADLSGen2, HDFS, GCS, and the server local file system
Newline-delimited JSON files for server local file system (export only) and S3

Supported JDBC drivers: See the Core Connect overview for the current list of supported JDBC drivers.

Using the Core Connect API Swagger Documentation

Interactive Swagger API documentation is installed with each Tamr Core instance. For an introduction to using Swagger API documentation, see Using the Tamr Core API.

The Core Connect API Body Key Reference provides JSON key:value definitions, and Core Connect API Example Requests provides sample calls. For the full list of Connect API endpoints, commands, and keys, refer to the Connect API Swagger documentation, available at http://<tamr_ip>:9100/docs.

Key Features of the Core Connect API

Authentication

See Configuring Core Connect for authentication instructions for GCS, S3, and ADLSGen2.

Core Connect processes Jinja templates in API requests, so a request can name an environment variable instead of containing its value. This allows you to avoid transmitting sensitive information, such as credentials. See Using a Jinja Template and Core Connect API Example Requests for an example.
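
As a minimal sketch of this idea, the following Python builds a request fragment whose credential fields hold Jinja placeholders rather than secret values. The field names `accessKey` and `secretKey`, the `credentials` wrapper, and the environment variable names are hypothetical, and the assumption that a bare `{{ NAME }}` expression resolves to an environment variable should be checked against Using a Jinja Template.

```python
import json


def templated_credentials(access_key_var: str, secret_key_var: str) -> dict:
    """Return credential fields holding Jinja placeholders, not secret values.

    The field names "accessKey" and "secretKey" are hypothetical; use the
    keys listed in the Core Connect API Body Key Reference.
    """
    return {
        # "{{ NAME }}" is standard Jinja expression syntax. That the server
        # resolves NAME from its environment is an assumption of this sketch.
        "accessKey": "{{ " + access_key_var + " }}",
        "secretKey": "{{ " + secret_key_var + " }}",
    }


# Hypothetical environment variable names, set on the Core Connect host.
body = {"credentials": templated_credentials("MY_ACCESS_KEY", "MY_SECRET_KEY")}
payload = json.dumps(body)
print(payload)
```

Only the variable names travel in the request; the values stay on the host where Core Connect runs.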

For Hadoop File System (HDFS) and Hive, Core Connect supports Kerberos authentication. See Configuring Core Connect.

Data Import

The Core Connect API supports the following key features for data import:

Single query and batch queries.
Avro primitive types and array types are handled automatically. Avro complex types can be ignored.
Core Connect adds a column, TAMRSEQ, to the imported dataset and populates it with the row number. If your data does not include a primary key, you can use the TAMRSEQ column as the primary key by leaving the primaryKey array empty ([]) during import.
Additive or destructive import, controlled by the truncateTamrDataset body key. When set to false (default), records from the imported file are added to the target dataset. When set to true, all records are deleted (truncated) from the target dataset before the file is imported.
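
The two import keys described above can be sketched in Python as follows. The key names `primaryKey` and `truncateTamrDataset` come from this page; the helper name and the assumption that both keys sit side by side in the same JSON object are illustrative only, so check the Core Connect API Body Key Reference for their actual position in the request body.

```python
import json
from typing import List, Optional


def build_import_options(primary_key: Optional[List[str]] = None,
                         truncate: bool = False) -> dict:
    """Build the import options discussed above (hypothetical helper).

    An empty primaryKey array asks Core Connect to use TAMRSEQ as the key.
    truncate=False appends records; truncate=True empties the target
    dataset before the file is imported.
    """
    return {
        "primaryKey": list(primary_key) if primary_key else [],
        "truncateTamrDataset": truncate,
    }


# Additive import of data that has no primary key of its own.
additive = build_import_options()

# Destructive reload keyed on an existing column ("id" is a made-up name).
reload_all = build_import_options(primary_key=["id"], truncate=True)

print(json.dumps(additive))
print(json.dumps(reload_all))
```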

Data Export

The Core Connect API supports the following key features for data export:

Export either a full or delta dataset. See Options for Exporting below.
Export a dataset into a JDBC-compliant table. If the table does not exist, Core Connect can create it automatically, mapping data types to the native database types for Oracle, Postgres, SQL Server, Redshift, and H2.
Execute Hive create table scripts through the execute API.

Options for Exporting

You can export either the full Tamr Core dataset or only the records that are new or modified between two available versions. The API endpoints for export to a file system or a datastore export the full dataset by default. To export only new and modified records, set exportDelta to true in exportDeltaConfig.

By default, a delta export contains only the records that are new or modified between the two latest versions of the specified dataset.
To compare two specific dataset versions instead, specify fromVersion, toVersion, or both in deltaConfig.

See Delta Export Examples.

Note: If the changes between the two latest, or specified, versions of the requested dataset are not incremental (that is, the data has been truncated and reloaded), an API call with exportDelta set to true returns an error.
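
A delta export configuration along these lines can be sketched in Python. This page names both exportDeltaConfig and deltaConfig; the sketch assumes they refer to the same object and places all three keys in `exportDeltaConfig`. It also assumes that versions are integers. The helper name is hypothetical. Confirm the exact structure in Delta Export Examples.

```python
import json
from typing import Optional


def build_delta_config(from_version: Optional[int] = None,
                       to_version: Optional[int] = None) -> dict:
    """Build a delta export fragment (hypothetical helper).

    With no versions given, only exportDelta is set, which corresponds to
    the default comparison of the two latest dataset versions.
    """
    config = {"exportDelta": True}
    # Assumption: fromVersion and toVersion live beside exportDelta and
    # take integer version numbers.
    if from_version is not None:
        config["fromVersion"] = from_version
    if to_version is not None:
        config["toVersion"] = to_version
    return {"exportDeltaConfig": config}


latest_delta = build_delta_config()
pinned_delta = build_delta_config(from_version=3, to_version=7)

print(json.dumps(latest_delta))
print(json.dumps(pinned_delta))
```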

Profiling

The Core Connect API supports the following key features for profiling:

Profiles up to 100,000 records of JDBC data or a JDBC stream by default. You can change this limit with the TAMR_CONNECT_PROFILING_SAMPLE_SIZE variable. See Configuring Core Connect.
Profiling results for each column are saved as records in a Tamr Core dataset.

See Core Connect API Example Requests for an example.

Managing Jobs

The Core Connect API allows you to monitor and modify jobs. You can:

Get a specific job by ID: GET /jobs/{id}
Cancel a job by ID: POST /jobs/{id}/cancel
Get a list of jobs initiated between two dates: GET /jobs
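
The job endpoints above lend themselves to a simple polling loop. The following Python is a minimal sketch: the paths `/jobs/{id}` and `/jobs/{id}/cancel` come from this page, while the base URL, the shape of the job response, and the helper names are assumptions. Because the response format is not described here, the caller supplies both the function that fetches the job and the test that decides whether it has finished.

```python
import time
from typing import Callable


def job_url(base_url: str, job_id: str) -> str:
    """URL for GET /jobs/{id}. The base URL and any API prefix are assumed."""
    return f"{base_url.rstrip('/')}/jobs/{job_id}"


def cancel_url(base_url: str, job_id: str) -> str:
    """URL for POST /jobs/{id}/cancel."""
    return f"{job_url(base_url, job_id)}/cancel"


def wait_for_job(fetch_job: Callable[[], dict],
                 is_finished: Callable[[dict], bool],
                 interval_s: float = 5.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_job until is_finished accepts the result, then return it.

    fetch_job would typically issue the GET request and parse the JSON.
    is_finished encodes whatever "complete" means in the actual response.
    """
    for attempt in range(max_polls):
        job = fetch_job()
        if is_finished(job):
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Offline demonstration with canned responses; "state" is a made-up field.
canned = iter([{"state": "RUNNING"}, {"state": "RUNNING"}, {"state": "DONE"}])
result = wait_for_job(lambda: next(canned),
                      lambda job: job["state"] == "DONE",
                      sleep=lambda seconds: None)
print(result)
```

Passing the fetch and completion logic in as callables keeps the loop independent of any HTTP client and of the real response fields, which are documented in the Swagger documentation.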

For the full list of Connect API endpoints, commands, and keys, refer to the Connect API Swagger documentation, available at http://<tamr_ip>:9100/docs.

You can also view Core Connect job statuses in the Core Connect UI. To view the status of a job's execution, navigate to the Jobs tab. See Monitoring Job Status for more on job statuses.