
Core Connect API Body Key Reference

Descriptions of the JSON keys that you can set in the body of Connect POST calls.

This reference is intended as a supplement to the Connect API Swagger documentation, which is available at http://<tamr_ip>:9100/docs.

To reduce repetition, keys that are common across different file types, cloud storage providers, and either export or import are described only once. This reference repeats descriptions only when a difference exists based on the requirements of a specific type, provider, or target.

As a result, you may need to refer to several sections to find information about all of the keys that you can define in the body of a given POST call.

JDBC Keys

You include these JSON key:value pairs in the body of the following JDBC-specific requests:

  • POST /jdbcIngest/batch
  • POST /jdbcIngest/execute
  • POST /jdbcIngest/ingest
  • POST /jdbcIngest/preview
  • POST /jdbcIngest/profile
  • POST /urlExport/jdbc

queryConfig Object

You include the queryConfig object in all JDBC calls.

Key Description Data Type
dbPassword The password to use when authenticating to the data source string
dbUsername The username to use when authenticating to the data source string
fetchSize The number of records for the JDBC driver to retrieve at a time. Can be adjusted to improve read performance at the expense of memory consumption. int
jdbcUrl The location and the type of the data source using MySQL connection URL syntax. See Connection URL Syntax in the Oracle documentation.

Note: The ending slash is optional for all JDBC parquet exports across all cloud providers.

string
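
For example, a minimal queryConfig object might look like the following sketch; the connection URL, credentials, and fetch size are illustrative placeholders, not values from this reference:

"queryConfig": {
 "jdbcUrl": "jdbc:postgresql://example-host:5432/source_db",
 "dbUsername": "example_user",
 "dbPassword": "example_password",
 "fetchSize": 10000
}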

Keys for JDBC Import Only

The following keys apply only to calls to one or more of the /jdbcIngest endpoints.

Key Description Data Type
queryTargetList Applies to jdbcIngest/batch only.

Describes the target Tamr datasets and their primary keys in order to ingest multiple query results at a time. Useful in cases where you have hundreds of tables to ingest, for example, research assays.

Descriptions of this object's keys follow.

Array
datasetName Applies to /jdbcIngest/ingest, /jdbcIngest/preview, and /jdbcIngest/batch (in the queryTargetList object) only.

Name of the dataset in Tamr Core.

string
policyIds Optional. An array of the authorization policies that will include the new Tamr Core dataset as a resource.

Note: If present, this list does not update the policyIds of a dataset that already exists and is being updated.

array
primaryKey Applies to /jdbcIngest/ingest, /jdbcIngest/preview, and /jdbcIngest/batch (in the queryTargetList object) only.

Optional. The individual field that uniquely identifies each record in the source.

If left blank, Tamr Core uses the `TAMRSEQ` column as the primary key. See Data Import in Using the Core Connect API.

string
profile If set to true, Tamr Core adds a job to profile the new dataset after ingest to the queue. Set to false to profile the dataset at another time. Defaults to false. Boolean
query Applies to /jdbcIngest/ingest, /jdbcIngest/batch, and /jdbcIngest/preview only.

SQL query used to retrieve data from a JDBC source.

string
recipeId Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. Integer
retrieveConnectMetadata Applies to /jdbcIngest/ingest only.

When set to true, Core Connect imports service metadata. Defaults to false when left blank. See Core Connect API Example Requests for an example.

Boolean
retrieveSourceMetadata Applies to /jdbcIngest/ingest and Snowflake JDBC sources only.

When set to true, Core Connect retrieves the metadata for the dataset stored in the Snowflake source. Defaults to false when left blank. See Core Connect API Example Requests for an example.

Boolean
statement Applies to /jdbcIngest/execute only. This SQL statement is not expected to return any results. For example:

INSERT INTO <final tablename> SELECT * FROM <tamr staging table> WHERE <condition>

string
metadataQueryConfig Applies to /jdbcIngest/ingest and Snowflake JDBC sources only.

Optional. For a data source in Snowflake, retrieves metadata. See Adding a Metadata Property. For example:

"metadataQueryConfig": {
 "query": "select COLUMN_NAME, TAG_NAME, TAG_VALUE from snowflake.account_usage.tag_references where OBJECT_NAME = '<import table>'",
 "attributeColumn": "COLUMN_NAME",
 "keyColumn": "TAG_NAME",
 "valueColumn": "TAG_VALUE"
}

Contains the following keys:

  • query: SQL query identifying the attribute, key, and value columns in the dataset to import.
  • attributeColumn: the name of the column in the query that indicates the ingested dataset attribute.
  • keyColumn: the name of the column in the query for the metadata property’s key.
  • valueColumn: the name of the column in the query for the metadata property’s value.
Object
truncateTamrDataset Allows you to include imported data additively or destructively.
  • When set to false (default), records from the imported file are added to the target dataset.
  • When set to true, all records are deleted (truncated) from the target dataset before the file is imported.
Boolean
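
As an illustration, a /jdbcIngest/ingest request body might combine these keys with the queryConfig object as follows. All table, dataset, and connection values are hypothetical placeholders:

{
 "query": "SELECT * FROM source_schema.customers",
 "datasetName": "customers",
 "primaryKey": "customer_id",
 "profile": true,
 "truncateTamrDataset": false,
 "queryConfig": {
  "jdbcUrl": "jdbc:postgresql://example-host:5432/source_db",
  "dbUsername": "example_user",
  "dbPassword": "example_password",
  "fetchSize": 10000
 }
}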

Keys for JDBC Export Only

The following keys apply to /urlExport/jdbc only.

Key Description Data Type
exportDataConfig See sinkConfig Object. object
targetTableName The name of the target table in the target system to write exported data into. string
unifyDatasetName The Tamr dataset to export. string
truncateBeforeLoad When set to true, Connect deletes all of the rows in the target table before writing exported data into it. Defaults to false. Boolean
batchInsertSize The number of records to accumulate in the JDBC batch pool before sending to the database engine for insertion. Increasing the batch size can improve throughput; however, too large a value can result in out-of-memory errors. Tamr recommends trying the following increments: 5000, 50000, 100000. int
createTable Defaults to true: if the table being exported does not exist at the target, Core Connect creates a table on export.
For JDBC driver implementations that do not return a list of existing tables due to security or other implementation settings, you can set this key to false. When false, Core Connect does not attempt to create tables on export.
Boolean
intermittentCommits When set to true, Core Connect commits each batch of records. When set to false, Core Connect writes the entire export as one large transaction. A single large transaction can be useful when you require the export to succeed or fail as a whole, rather than leave partial results if later records fail due to out-of-memory or size limitations. However, you must allocate enough transaction space to the target database for export jobs to succeed. This can be a challenge with SQL Server databases. Boolean
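
For example, a /urlExport/jdbc request body might look like the following sketch; the dataset, table, and connection values are placeholders:

{
 "unifyDatasetName": "customers_unified",
 "targetTableName": "analytics.customers_export",
 "truncateBeforeLoad": true,
 "batchInsertSize": 50000,
 "queryConfig": {
  "jdbcUrl": "jdbc:sqlserver://example-host:1433;databaseName=target_db",
  "dbUsername": "example_user",
  "dbPassword": "example_password"
 }
}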

S3-Specific Keys

You include these JSON key:value pairs in the body of the following requests.

  • POST /urlExport/s3/avro
  • POST /urlExport/s3/delimited
  • POST /urlExport/s3/json
  • POST /urlIngest/s3/avro
  • POST /urlIngest/s3/delimited
  • POST /urlIngest/s3/json
  • POST /urlIngest/s3/delimited/profile
Key Description Data Type
accessKey S3 Access Key.

If S3 Access Key / Secret Keys are not provided, or role assumption via the awsRoleArn key is not used, Core Connect falls back to the AWS Default Credentials lookup chain as described in the Working with AWS Credentials documentation.

string (optional)
awsEndpointUrl S3 AWS endpoint URL. Can be used with S3 private cloud. For more information, see the AWS PrivateLink for Amazon S3 documentation. string (optional)
awsExternalId AWS External Id when used with AWS Role assumption capability. For more information, see the AWS AssumeRole documentation.

Role session name is fixed as "tamr.connect" and the session duration is fixed to 1 hour.

string (optional)
awsRegion The AWS region to use with API calls. If a region is not specified, the default region provided by the lookup chain is used. For more information, see the AWS Region Selection documentation.

string (optional)
awsRoleArn Role ARN when used with AWS role assumption capability.

string (optional)
awsStsRegion When using awsRoleArn and awsExternalId, the awsStsRegion can be specified to reduce latency of authentication calls. For more information, see AWS Temporary security credentials in the IAM documentation. string (optional)
secretKey S3 Secret Access Key. If S3 Access Key / Secret Keys are not provided, or role assumption via the awsRoleArn key is not used, Core Connect falls back to the AWS Default Credentials lookup chain as described in the Working with AWS Credentials documentation.

string (optional)
sessionToken The AWS session token to be used in authentication with an AWS access key and secret key. See GetSessionToken in the AWS documentation. string (optional)
encryptionConfig Applies to the urlExport/s3 endpoints only.

See encryptionConfig Object for descriptions of its keys.

object (optional)
sinkThreads Applies to the /urlExport/s3 endpoints only.

The number of parallel threads running the export job.

int (optional)
encryptionConfig Object

Key Description Data Type
algorithm S3 encryption algorithm. Possible values are AES256, KMS, or CUSTOM. string
customerB64EncKey Specifies the base64-encoded 256-bit encryption key to use to decrypt the source object. string
customerB64EncKeyMD5Digest Specifies the base64-encoded 128-bit MD5 digest of the encryption key used to decrypt/encrypt the source object. string
kmsKeyId Specifies the ID of the existing key in Vault to be used to decrypt/encrypt the object. string
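
For example, an encryptionConfig object for a KMS-managed key might look like this sketch; the key ID is a placeholder:

"encryptionConfig": {
 "algorithm": "KMS",
 "kmsKeyId": "example-kms-key-id"
}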

ADLS2-Specific Keys

You include these JSON key:value pairs in the body of the following requests.

  • POST /urlExport/adls2/avro
  • POST /urlExport/adls2/delimited
  • POST /urlIngest/adls2/avro
  • POST /urlIngest/adls2/delimited
Key Description Data Type
accountKey Sent with accountName, the accountKey grants full access to the Azure storage account. For additional information about the security of this method, see Manage storage account access keys in the Microsoft documentation. string
clientId The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. See the Azure Databricks - Accessing Data Lake - Using a Service Principal video for more information about service principals. Tip: The latter half of the video applies to Databricks and can be skipped. The service principal should have the "Storage Blob Data Contributor" role or ACLs to access the specified container. string
clientSecret See clientId. The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. string
sasToken Used to grant limited access to a resource. For more information, see Grant limited access to Azure Storage resources using shared access signatures (SAS). string
tenantId The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. The tenantId, also known as directoryId in the Azure documentation, is used to authenticate as a service principal. string
url The path or URI for the source file. Must be in the format "adls2://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/PATH_PREFIX" or "https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/PATH_PREFIX".

Note: When importing files from a directory, files are imported recursively through the directory and subdirectories. For importing delimited files, all files in the directory and subdirectories must be delimited files with identical formats, including having the same delimiter, schema, and primary key columns. For importing Avro, all files in the directory and subdirectories must have the same primary key.

string
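
For example, a /urlIngest/adls2/delimited request body authenticating as a service principal might look like the following sketch; the account, container, dataset, and credential values are all placeholders:

{
 "url": "adls2://exampleaccount.blob.core.windows.net/example-container/data/customers.csv",
 "datasetName": "customers.csv",
 "primaryKey": "customer_id",
 "clientId": "00000000-0000-0000-0000-000000000000",
 "clientSecret": "<client secret>",
 "tenantId": "00000000-0000-0000-0000-000000000000"
}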

GCS-Specific Keys

You include these JSON key:value pairs in the body of the following requests.

  • POST /urlExport/gcs/delimited
  • POST /urlExport/gcs/avro
  • POST /urlIngest/gcs/avro
  • POST /urlIngest/gcs/delimited
Key Description Data Type
project The Google project in which the file is located. See Projects in the Google Cloud Storage documentation. string
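
For example, a /urlIngest/gcs/delimited request body might combine the project key with the common delimited import keys, as in this sketch; the bucket path, dataset name, and project are placeholders, and the gs:// scheme shown is an assumption rather than a value confirmed by this reference:

{
 "url": "gs://example-bucket/data/customers.csv",
 "datasetName": "customers.csv",
 "primaryKey": "customer_id",
 "project": "example-gcp-project"
}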

HDFS-Specific Keys

This JSON key:value pair is used in the body of the following requests:

  • POST /urlExport/hdfs/avro
  • POST /urlIngest/hdfs/avro/avroSchema
Key Description Data Type
hadoopUserName HDFS username to use when HDFS is not protected by Kerberos. This overrides the username specified by the Connect configuration settings. string
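
For example, a /urlExport/hdfs/avro request body might set hadoopUserName alongside the common Avro export keys; all names and paths are placeholders:

{
 "datasetName": "customers_unified",
 "url": "hdfs://data/customers_export.avro",
 "hadoopUserName": "tamr"
}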

Delimited File Import Keys

You include these JSON key:value pairs in the body of the following requests.

  • POST /urlIngest/s3/delimited
  • POST /urlIngest/adls2/delimited
  • POST /urlIngest/gcs/delimited
  • POST /urlIngest/serverfs/delimited
  • POST /urlIngest/s3/delimited/profile
  • POST /urlIngest/serverfs/delimited/profile

In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.

Key Description Data Type
datasetName Name of the dataset in Tamr Core. string
policyIds Optional. An array of the authorization policies that will include the new Tamr Core dataset as a resource.

Note: If present, this list does not update the policyIds of a dataset that already exists and is being updated.

array
primaryKey Set of fields uniquely identifying each record in the source. string
profile If set to true, Tamr Core adds a job to profile the new dataset after ingest to the queue. Set to false to profile the dataset at another time. Defaults to false. Boolean
recipeId Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. integer
resourceConfig See resourceConfig Object for descriptions of its keys. object
truncateTamrDataset Allows you to include imported data additively or destructively.
  • When set to false (default), records from the imported file are added to the target dataset.
  • When set to true, all records are deleted (truncated) from the target dataset before the file is imported.
Boolean
url The path or URI to the source file or directory, such as "file:///home/tamr/my_file.csv", "s3://bucket/data/my_file.csv", or "s3://bucket/data".

Note: When importing files from a directory, files are imported recursively through the directory and subdirectories. All files in the directory and subdirectories must be delimited files with identical formats, including having the same delimiter, schema, and primary key columns.

string
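
Putting these keys together, a /urlIngest/serverfs/delimited request body might look like the following sketch; the path, dataset name, and primary key are placeholders:

{
 "url": "file:///home/tamr/my_file.csv",
 "datasetName": "my_file.csv",
 "primaryKey": "id",
 "profile": true,
 "truncateTamrDataset": false,
 "resourceConfig": {
  "columnDelimiter": ",",
  "characterSetName": "UTF-8"
 }
}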

resourceConfig Object

Key Description Data Type
quoteCharacter Value used for escaping values where the field delimiter is part of the value. For example, the value " a , b " is parsed as a , b.
Defaults to a quotation mark character (").
string
recordSeparator Character sequence used to indicate a new line. Defaults to \n. string
characterSetName Applies to urlIngest endpoints only.

Name of the character set used in the file. Defaults to UTF-8. See the Name column of the Character Sets table for more information.

string
charactersForDetection Applies to urlIngest endpoints only.

List of characters to use to automatically determine the column delimiter when the column delimiter is not known.

string
columnDelimiter Value used to separate individual fields in the input. Defaults to a comma (,). string
columns Applies to urlIngest endpoints only.

Specifies a custom header line.

string
delimiterDetectionEnabled Applies to urlIngest endpoints only.

When the column delimiter is not known, set this key to true so that Core Connect will analyze the dataset to detect the delimiter. The charactersForDetection key provides a hint for what characters to consider.

Boolean
escapeCharacter Value used for escaping the quote character inside an already escaped value. For example, the value " "" a , b "" " is parsed as " a , b ". string
maxCharsPerColumn Applies to urlIngest endpoints only.

Limit of characters to read in a single field. Used to avoid out-of-memory errors in case of invalid delimiters. Defaults to 4096, with a maximum of 40 MB.

int
multiValueDelimiter Applies to urlIngest endpoints only.

Value used to separate values in a field into an array. For example, the following color values are ingested as an array of strings when you set the multiValueDelimiter to a pipe (|) character.

sku, colors
"1001232", "blue|red|green"

string
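
For example, a resourceConfig object for a pipe-delimited file whose fields can hold multiple values might look like this sketch:

"resourceConfig": {
 "columnDelimiter": "|",
 "quoteCharacter": "\"",
 "recordSeparator": "\n",
 "multiValueDelimiter": ";"
}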

Delimited File Export Keys

You include these JSON key:value pairs in the body of the following requests.

  • POST /urlExport/s3/delimited
  • POST /urlExport/adls2/delimited
  • POST /urlExport/gcs/delimited
  • POST /urlExport/serverfs/delimited

In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.

Key Description Data Type
datasetName Name of the dataset in Tamr. string
resourceConfig This object contains a subset of the keys that you can set when ingesting delimited files. You can set:
  • columnDelimiter
  • escapeCharacter
  • quoteCharacter
  • recordSeparator

For more information, see resourceConfig Object.

object
sinkConfig See sinkConfig Object for descriptions of its keys. object
url The path or URI to the destination for the file such as "file:///home/tamr/my_file.csv" or "s3://bucket/data/my_file.csv" string
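
For example, a /urlExport/s3/delimited request body might combine these keys as follows; the bucket path and dataset name are placeholders:

{
 "url": "s3://example-bucket/exports/customers.csv",
 "datasetName": "customers_unified",
 "resourceConfig": {
  "columnDelimiter": ","
 },
 "sinkConfig": {
  "limitRecords": 100000
 }
}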

sinkConfig Object

In the body of /urlExport/jdbc requests, an object with the following keys is named exportDataConfig.

Key Description Data Type
columnsExcludeRegex Regular expression that defines how to exclude certain columns from export. string
deltaConfig Configures how differences are computed on export.

In the deltaConfig object, you specify values for the following keys:

  • fromVersion (int): The starting version of the Tamr Core dataset used in the delta export.
  • toVersion (int): The ending version of the Tamr Core dataset used in the delta export.
object
exportDelta If true, the delta is exported based on deltaConfig. If true and deltaConfig is null, exports the latest delta if available. Boolean
limitRecords Sets a limit on the number of records to export. int
mergeArrayValuesEnabled If true, concatenates arrays of values before export. Boolean
mergedArrayValuesDelimiter The delimiter used to concatenate array values. string
renameFields Provides the ability to rename fields on export.

Example:

{"Addr":"Address"}

In this example, the Addr field in the Tamr Core dataset will be exported as the Address field.

Dictionary
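
For example, a sinkConfig object that exports the delta between two dataset versions, renames a field, and excludes columns by pattern might look like the following sketch; the version numbers, field names, and regular expression are placeholders:

"sinkConfig": {
 "exportDelta": true,
 "deltaConfig": {
  "fromVersion": 3,
  "toVersion": 5
 },
 "renameFields": {"Addr": "Address"},
 "columnsExcludeRegex": "^internal_.*"
}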

Avro File Import Keys

You include these JSON key:value pairs in the body of the following requests.

  • POST /urlIngest/adls2/avro
  • POST /urlIngest/gcs/avro
  • POST /urlIngest/serverfs/avro

In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.

Key Description Data Type
datasetName Name of the dataset in Tamr Core. string
policyIds Optional. An array of the authorization policies that will include the new Tamr Core dataset as a resource.

Note: If present, this list does not update the policyIds of a dataset that already exists and is being updated.

array
primaryKey Set of fields uniquely identifying each record in the source. string
profile If set to true, Tamr Core adds a job to profile the new dataset after ingest to the queue. Set to false to profile the dataset at another time. Defaults to false. Boolean
recipeId Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. integer
resourceConfig The resourceConfig object contains the following key for Avro files:

  • ignoreComplexTypes (Boolean)

Avro supports six complex data types: Records, Enums, Arrays, Maps, Unions, and Fixed.

  • When set to true, Core Connect ignores any complex types detected in the Avro file.
  • When set to false, Core Connect returns an error when complex types are detected in the Avro file.

object
truncateTamrDataset Allows you to include imported data additively or destructively.
  • When set to false (default), records from the imported file are added to the target dataset.
  • When set to true, all records are deleted (truncated) from the target dataset before the file is imported.
Boolean
url The path or URI for the source file, such as "file:///home/tamr/my_file.avro", "hdfs://data/my_file.avro", or "hdfs://data/".

Note: When importing files from a directory, Tamr Core imports all .avro files recursively through the directory and subdirectories. These files must all have the same primary key.

string
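
For example, a /urlIngest/serverfs/avro request body might look like the following sketch; the path, dataset name, and primary key are placeholders:

{
 "url": "file:///home/tamr/my_file.avro",
 "datasetName": "my_file.avro",
 "primaryKey": "id",
 "resourceConfig": {
  "ignoreComplexTypes": true
 }
}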

Avro File Export Keys

You include these JSON key:value pairs in the body of the following requests.

  • POST /urlExport/hdfs/avro
  • POST /urlExport/serverfs/avro
  • POST /urlExport/hdfs/avro/avroSchema

In addition to these keys, include the sinkConfig keys and any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.

Key Description Data Type
datasetName Name of the dataset in Tamr. string
url The path or URI for the file destination such as "file:///home/tamr/my_file.avro" or "hdfs://data/my_file.avro" string

JSON File Export Keys

The resourceConfig object for /urlExport/s3/json and /urlExport/serverfs/json contains the following keys for JSON files.

Key Description Data Type
flattenEnable Defines the type applied to exported values: array of values or values.
  • When set to false (default), the type for every attribute is an array of values. For example: {"TAMRSEQ":["1"],"PK":["305d411f"],"surname":["matei"]}.
  • When set to true, the type for every attribute is a value. For example: {"TAMRSEQ":"1","PK":"305d411f","surname":"matei"}.
Boolean
simplifiedDataTypesEnable Defines whether to simplify data types (true), or use the data types registered for the dataset you are exporting within Tamr Core (false). Boolean

These keys interact as follows:

  • When simplifiedDataTypesEnable=true && flattenEnable=false, all data is exported as array of string type.
  • When simplifiedDataTypesEnable=true && flattenEnable=true, all data is exported as string type.
  • When simplifiedDataTypesEnable=false, data is exported matching the type for its column as defined in the dataset service (regardless of the setting of flattenEnable).
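
For example, a /urlExport/serverfs/json request body that produces flattened output of string type might look like this sketch; the path and dataset name are placeholders:

{
 "datasetName": "customers_unified",
 "url": "file:///home/tamr/customers.json",
 "resourceConfig": {
  "flattenEnable": true,
  "simplifiedDataTypesEnable": true
 }
}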