Descriptions of the JSON keys that you can set in the body of Connect POST calls.
This reference is intended as a supplement to the Connect API Swagger documentation, which is available at http://<tamr_ip>:9100/docs.
To reduce repetition, keys that are common across different file types, cloud storage providers, and either export or import are described only once. This reference repeats descriptions only when a difference exists based on the requirements of a specific type, provider, or target.
As a result, you may need to refer to several sections to find information about all of the keys that you can define in the body of a given POST call.
JDBC Keys
You include these JSON key:value pairs in the body of the following JDBC-specific requests:
- POST /jdbcIngest/batch
- POST /jdbcIngest/execute
- POST /jdbcIngest/ingest
- POST /jdbcIngest/preview
- POST /jdbcIngest/profile
- POST /urlExport/jdbc
queryConfig Object
You include the queryConfig object in all JDBC calls.
Key | Description | Data Type |
---|---|---|
dbPassword | The password to use when authenticating to the data source. | string |
dbUsername | The username to use when authenticating to the data source. | string |
fetchSize | The number of records for the JDBC driver to retrieve at a time. Can be adjusted to improve read performance at the expense of memory consumption. | int |
jdbcUrl | The location and the type of the data source, using MySQL connection URL syntax. See Connection URL Syntax in the Oracle documentation. | string |
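For example, a queryConfig object for a PostgreSQL source might look like the following sketch (the connection URL, credentials, and fetch size are illustrative):

```json
{
  "queryConfig": {
    "jdbcUrl": "jdbc:postgresql://example-db-host:5432/sales",
    "dbUsername": "tamr_user",
    "dbPassword": "secret",
    "fetchSize": 10000
  }
}
```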
Keys for JDBC Import Only
The following keys apply to calls to one or more of the jdbcIngest endpoints only.
Key | Description | Data Type |
---|---|---|
queryTargetList | Applies to /jdbcIngest/batch only. Describes the target Tamr Core datasets and their primary keys in order to ingest multiple query results at a time. Useful in cases where you have hundreds of tables to ingest, for example, research assays. Descriptions of this object's keys follow. | array |
datasetName | Applies to /jdbcIngest/ingest, /jdbcIngest/preview, and /jdbcIngest/batch (in the queryTargetList object) only. Name of the dataset in Tamr Core. | string |
policyIds | Optional. The authorization policies that will include the new Tamr Core dataset as a resource. Note: If present, this list does not update the policyIds of a dataset that already exists and is being updated. | array |
primaryKey | Applies to /jdbcIngest/ingest, /jdbcIngest/preview, and /jdbcIngest/batch (in the queryTargetList object) only. Optional. The individual field that uniquely identifies each record in the source. If left blank, Tamr Core uses the `TAMRSEQ` column as the primary key. See Data Import in Using the Core Connect API. | array |
profile | If set to true, Tamr Core adds a job to the queue to profile the new dataset after ingest. Set to false to profile the dataset at another time. Defaults to false. | Boolean |
query | Applies to /jdbcIngest/ingest, /jdbcIngest/batch, and /jdbcIngest/preview only. SQL query used to retrieve data from a JDBC source. | string |
recipeId | Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. | integer |
retrieveConnectMetadata | Applies to /jdbcIngest/ingest only. When set to true, Core Connect imports services metadata. Defaults to false when left blank. See Core Connect API Example Requests for an example. | Boolean |
retrieveSourceMetadata | Applies to /jdbcIngest/ingest and Snowflake JDBC sources only. When set to true, Core Connect retrieves the metadata for the dataset stored in the Snowflake source. Defaults to false when left blank. See Core Connect API Example Requests for an example. | Boolean |
statement | Applies to /jdbcIngest/execute only. This SQL statement is not expected to return any results. For example: INSERT INTO <final tablename> SELECT * FROM <tamr staging table> WHERE <condition> | string |
metadataQueryConfig | Applies to /jdbcIngest/ingest and Snowflake JDBC sources only. Optional. For a data source in Snowflake, retrieves metadata. See Adding a Metadata Property. | object |
truncateTamrDataset | Determines whether imported data is included additively or destructively: when set to true, Core Connect truncates (deletes all records in) the existing Tamr Core dataset before importing; when set to false, imported records are added to the existing dataset. | Boolean |
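Putting these keys together, a request body for POST /jdbcIngest/ingest might look like the following sketch (the dataset name, query, primary key, and connection details are all illustrative):

```json
{
  "query": "SELECT * FROM customers",
  "datasetName": "customers",
  "primaryKey": ["customer_id"],
  "profile": true,
  "truncateTamrDataset": false,
  "queryConfig": {
    "jdbcUrl": "jdbc:postgresql://example-db-host:5432/sales",
    "dbUsername": "tamr_user",
    "dbPassword": "secret",
    "fetchSize": 10000
  }
}
```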
Keys for JDBC Export Only
The following keys apply to /urlExport/jdbc only.
Key | Description | Data Type |
---|---|---|
exportDataConfig | See sinkConfig Object. | object |
targetTableName | The name of the table in the target system to write exported data into. | string |
unifyDatasetName | The Tamr Core dataset to export. | string |
truncateBeforeLoad | When set to true, Core Connect deletes all of the rows in the target table before writing exported data into it. Defaults to false. | Boolean |
batchInsertSize | The number of records to accumulate in the JDBC batch pool before sending them to the database engine for insertion. Increasing the batch size can improve throughput; however, too large a value can result in out of memory errors. Tamr recommends trying the following increments: 5000, 50000, 100000. | int |
createTable | Defaults to true: if the table being exported does not exist at the target, Core Connect creates the table on export. For JDBC driver implementations that do not return a list of existing tables due to security or other implementation settings, you can set this key to false. When false, Core Connect does not attempt to create tables on export. | Boolean |
intermittentCommits | When set to true, Core Connect commits each batch of records. When set to false, Core Connect writes the entire export as one large transaction. A single large transaction can be useful when you require the export to succeed or fail as a whole, rather than committing partial results when later records fail due to out of memory or size limitations. However, you must allocate enough transaction space to the target database for export jobs to succeed. This can be a challenge with SQL Server databases. | Boolean |
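For example, a request body for POST /urlExport/jdbc might look like the following sketch (the dataset name, table name, and connection details are illustrative):

```json
{
  "unifyDatasetName": "customers_golden_records",
  "targetTableName": "EXPORTED_CUSTOMERS",
  "truncateBeforeLoad": true,
  "batchInsertSize": 5000,
  "createTable": true,
  "intermittentCommits": true,
  "queryConfig": {
    "jdbcUrl": "jdbc:sqlserver://example-db-host:1433;databaseName=exports",
    "dbUsername": "tamr_user",
    "dbPassword": "secret"
  }
}
```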
S3-Specific Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/s3/avro
- POST /urlExport/s3/delimited
- POST /urlExport/s3/json
- POST /urlIngest/s3/avro
- POST /urlIngest/s3/delimited
- POST /urlIngest/s3/json
- POST /urlIngest/s3/delimited/profile
Key | Description | Data Type |
---|---|---|
accessKey | S3 access key. If the S3 access key and secret key are not provided, and role assumption via the awsRoleArn key is not used, Core Connect falls back to the AWS default credentials lookup chain as described in the Working with AWS Credentials documentation. | string (optional) |
awsEndpointUrl | S3 AWS endpoint URL. Can be used with S3 private cloud. For more information, see the AWS PrivateLink for Amazon S3 documentation. | string (optional) |
awsExternalId | AWS external ID when used with the AWS role assumption capability. For more information, see the AWS AssumeRole documentation. The role session name is fixed as "tamr.connect" and the session duration is fixed at 1 hour. | string (optional) |
awsRegion | The AWS region to use with API calls. If a region is not specified, the default region provided by the lookup chain is used. For more information, see the AWS Region Selection documentation. | string (optional) |
awsRoleArn | Role ARN when used with the AWS role assumption capability. | string (optional) |
awsStsRegion | When using awsRoleArn and awsExternalId, the awsStsRegion can be specified to reduce the latency of authentication calls. For more information, see AWS Temporary security credentials in the IAM documentation. | string (optional) |
secretKey | S3 secret access key. If the S3 access key and secret key are not provided, and role assumption via the awsRoleArn key is not used, Core Connect falls back to the AWS default credentials lookup chain as described in the Working with AWS Credentials documentation. | string (optional) |
sessionToken | The AWS session token to be used in authentication with an AWS access key and secret key. See GetSessionToken in the AWS documentation. | string (optional) |
encryptionConfig | Applies to the /urlExport/s3 endpoints only. See encryptionConfig Object for descriptions of its keys. | object (optional) |
sinkThreads | Applies to the /urlExport/s3 endpoints only. The number of parallel threads running the export job. | int |
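As a sketch, the S3-specific portion of a request body for POST /urlIngest/s3/delimited using role assumption might look like the following (the bucket, role ARN, external ID, and dataset name are illustrative; the remaining keys are described under Delimited File Import Keys):

```json
{
  "url": "s3://example-bucket/data/customers.csv",
  "datasetName": "customers",
  "awsRoleArn": "arn:aws:iam::123456789012:role/example-tamr-role",
  "awsExternalId": "example-external-id",
  "awsRegion": "us-east-1",
  "awsStsRegion": "us-east-1"
}
```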
encryptionConfig Object
Key | Description | Data Type |
---|---|---|
algorithm | S3 encryption algorithm. Possible values are AES256, KMS, or CUSTOM. | string |
customerB64EncKey | Specifies the base64-encoded 256-bit encryption key to use to decrypt the source object. | string |
customerB64EncKeyMD5Digest | Specifies the base64-encoded 128-bit MD5 digest of the encryption key used to decrypt/encrypt the source object. | string |
kmsKeyId | Specifies the ID of the existing key in Vault to be used to decrypt/encrypt the object. | string |
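For example, an encryptionConfig object that uses a KMS-managed key might look like the following (the key ID is a placeholder):

```json
{
  "encryptionConfig": {
    "algorithm": "KMS",
    "kmsKeyId": "example-kms-key-id"
  }
}
```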
ADLS2-Specific Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/adls2/avro
- POST /urlExport/adls2/delimited
- POST /urlIngest/adls2/avro
- POST /urlIngest/adls2/delimited
Key | Description | Data Type |
---|---|---|
accountKey | Sent with accountName, the accountKey grants full access to the Azure storage account. For additional information about the security of this method, see Manage storage account access keys in the Microsoft documentation. | string |
clientId | The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. See the Azure Databricks - Accessing Data Lake - Using a Service Principal video for more information about service principals. Tip: The latter half of the video applies to Databricks and can be skipped. The service principal must have the "Storage Blob Data Contributor" role, or ACLs that grant access to the specified container. | string |
clientSecret | See clientId. The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. | string |
sasToken | Used to grant limited access to a resource. For more information, see Grant limited access to Azure Storage resources using shared access signatures (SAS). | string |
tenantId | The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. The tenantId is also known as the directoryId in the Azure documentation. | string |
url | The path or URI for the source file or directory. Must be in the format "adls2://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/PATH_PREFIX" or "https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/PATH_PREFIX". Note: When importing files from a directory, files are imported recursively through the directory and subdirectories. For importing delimited files, all files in the directory and subdirectories must be delimited files with identical formats, including having the same delimiter, schema, and primary key columns. For importing Avro files, all files in the directory and subdirectories must have the same primary key. | string |
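A sketch of a request body for POST /urlIngest/adls2/delimited that authenticates as a service principal follows (the account, container, and credential values are placeholders):

```json
{
  "url": "adls2://exampleaccount.blob.core.windows.net/examplecontainer/data/customers.csv",
  "datasetName": "customers",
  "clientId": "00000000-0000-0000-0000-000000000000",
  "clientSecret": "example-client-secret",
  "tenantId": "11111111-1111-1111-1111-111111111111"
}
```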
GCS-Specific Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/gcs/delimited
- POST /urlExport/gcs/avro
- POST /urlIngest/gcs/avro
- POST /urlIngest/gcs/delimited
Key | Description | Data Type |
---|---|---|
project | The Google project in which the file is located. See Projects Cloud Storage in the Google documentation. | string |
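For example, a request body for POST /urlIngest/gcs/delimited might combine the project key with the common import keys (the project name, bucket, and URL scheme shown are illustrative):

```json
{
  "url": "gs://example-bucket/data/customers.csv",
  "datasetName": "customers",
  "project": "example-gcp-project"
}
```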
HDFS-Specific Keys
This JSON key:value pair is used in the body of the following requests:
- POST /urlExport/hdfs/avro
- POST /urlIngest/hdfs/avro/avroSchema
Key | Description | Data Type |
---|---|---|
hadoopUserName | HDFS username to use when HDFS is not protected by Kerberos. This overrides the username specified by the Connect configuration settings. | string |
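For example, an export to HDFS as a user other than the configured default might look like the following sketch (the path and username are illustrative):

```json
{
  "datasetName": "customers_golden_records",
  "url": "hdfs://data/exports/customers.avro",
  "hadoopUserName": "tamr"
}
```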
Delimited File Import Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlIngest/s3/delimited
- POST /urlIngest/adls2/delimited
- POST /urlIngest/gcs/delimited
- POST /urlIngest/serverfs/delimited
- POST /urlIngest/s3/delimited/profile
- POST /urlIngest/serverfs/delimited/profile
In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr Core. | string |
policyIds | Optional. An array of the authorization policies that will include the new Tamr Core dataset as a resource. Note: If present, this list does not update the policyIds of a dataset that already exists and is being updated. | array |
primaryKey | Set of fields uniquely identifying each record in the source. | string |
profile | If set to true, Tamr Core adds a job to the queue to profile the new dataset after ingest. Set to false to profile the dataset at another time. Defaults to false. | Boolean |
recipeId | Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. | integer |
resourceConfig | See resourceConfig Object for descriptions of its keys. | Object |
truncateTamrDataset | Determines whether imported data is included additively or destructively: when set to true, Core Connect truncates (deletes all records in) the existing Tamr Core dataset before importing; when set to false, imported records are added to the existing dataset. | Boolean |
url | The path or URI to the source file or directory, such as "file:///home/tamr/my_file.csv", "s3://bucket/data/my_file.csv", or "s3://bucket/data". Note: When importing files from a directory, files are imported recursively through the directory and subdirectories. All files in the directory and subdirectories must be delimited files with identical formats, including having the same delimiter, schema, and primary key columns. | string |
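Combining these keys, a request body for POST /urlIngest/serverfs/delimited might look like the following sketch (the path, dataset name, and primary key are illustrative):

```json
{
  "url": "file:///home/tamr/data/customers.csv",
  "datasetName": "customers",
  "primaryKey": "customer_id",
  "profile": true,
  "resourceConfig": {
    "columnDelimiter": ",",
    "recordSeparator": "\n"
  }
}
```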
resourceConfig Object
Key | Description | Data Type |
---|---|---|
quoteCharacter | Value used to escape values in which the field delimiter is part of the value. For example, the value " a , b " is parsed as a , b. Defaults to the quotation mark character ("). | string |
recordSeparator | Character sequence used to indicate a new line. Defaults to \n. | string |
characterSetName | Applies to urlIngest endpoints only. Name of the character set used in the file. Defaults to UTF-8. See the Name column of the Character Sets table for more information. | string |
charactersForDetection | Applies to urlIngest endpoints only. List of characters to use to automatically determine the column delimiter when the column delimiter is not known. | string |
columnDelimiter | Value used to separate individual fields in the input. Defaults to a comma (,). | string |
columns | Applies to urlIngest endpoints only. Specifies a custom header line. | string |
delimiterDetectionEnabled | Applies to urlIngest endpoints only. When the column delimiter is not known, set this key to true so that Core Connect analyzes the dataset to detect the delimiter. The charactersForDetection key provides a hint for which characters to consider. | Boolean |
escapeCharacter | Value used to escape the quote character inside an already escaped value. For example, the value " "" a , b "" " is parsed as " a , b ". | string |
maxCharsPerColumn | Applies to urlIngest endpoints only. Limit on the number of characters to read in a single field. Used to avoid out of memory errors in the case of invalid delimiters. Defaults to 4096, with a maximum of 40 MB. | int |
multiValueDelimiter | Applies to urlIngest endpoints only. Value used to separate the values in a field into an array. For example, with the multiValueDelimiter set to a pipe (\|) character, the value red\|green\|blue is ingested as an array of three strings. | string |
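For example, a resourceConfig object for ingesting a pipe-delimited file with multi-value fields might look like the following (the delimiter choices are illustrative):

```json
{
  "resourceConfig": {
    "columnDelimiter": "|",
    "quoteCharacter": "\"",
    "escapeCharacter": "\\",
    "recordSeparator": "\n",
    "multiValueDelimiter": ";",
    "characterSetName": "UTF-8"
  }
}
```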
Delimited File Export Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/s3/delimited
- POST /urlExport/adls2/delimited
- POST /urlExport/gcs/delimited
- POST /urlExport/serverfs/delimited
In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr. | string |
resourceConfig | This object contains the subset of the keys that you can set when ingesting delimited files that also apply on export: columnDelimiter, escapeCharacter, quoteCharacter, and recordSeparator. For more information, see resourceConfig Object. | object |
sinkConfig | See sinkConfig Object for descriptions of its keys. | object |
url | The path or URI to the destination for the file such as "file:///home/tamr/my_file.csv" or "s3://bucket/data/my_file.csv" | string |
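For example, a request body for POST /urlExport/serverfs/delimited might look like the following sketch (the dataset name and path are illustrative):

```json
{
  "datasetName": "customers_golden_records",
  "url": "file:///home/tamr/exports/customers.csv",
  "resourceConfig": {
    "columnDelimiter": ","
  },
  "sinkConfig": {
    "mergeArrayValuesEnabled": true,
    "mergedArrayValuesDelimiter": "|"
  }
}
```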
sinkConfig Object
In /urlExport/jdbc, an object with the following keys is named exportDataConfig.
Key | Description | Data Type |
---|---|---|
columnsExcludeRegex | Regular expression that defines how to exclude certain columns from export. | string |
deltaConfig | Configures how differences are computed on export, based on the keys that you specify in the deltaConfig object. | object |
exportDelta | If true, the delta is exported based on deltaConfig. If true and deltaConfig is null, exports the latest delta if available. | Boolean |
limitRecords | Sets a limit on the number of records to export. | int |
mergeArrayValuesEnabled | If true, concatenates arrays of values before export. | Boolean |
mergedArrayValuesDelimiter | The delimiter used to concatenate array values. | string |
renameFields | Provides the ability to rename fields on export. For example, "renameFields": { "Addr": "Address" } exports the Addr field in the Tamr Core dataset as the Address field. | Dictionary |
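A sketch of a sinkConfig object that excludes columns, concatenates array values, and renames a field on export follows (the regular expression and field names are illustrative):

```json
{
  "sinkConfig": {
    "columnsExcludeRegex": "^internal_.*",
    "mergeArrayValuesEnabled": true,
    "mergedArrayValuesDelimiter": "|",
    "renameFields": { "Addr": "Address" },
    "limitRecords": 100000
  }
}
```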
Avro File Import Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlIngest/adls2/avro
- POST /urlIngest/gcs/avro
- POST /urlIngest/s3/avro
- POST /urlIngest/serverfs/avro
In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr Core. | string |
policyIds | Optional. An array of the authorization policies that will include the new Tamr Core dataset as a resource. | array |
primaryKey | Set of fields uniquely identifying each record in the source. | string |
profile | If set to true, Tamr Core adds a job to the queue to profile the new dataset after ingest. | Boolean |
recipeId | Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. | integer |
resourceConfig | Contains a key for Avro files. Avro supports six complex data types: Records, Enums, Arrays, Maps, Unions, and Fixed. | object |
truncateTamrDataset | Determines whether imported data is included additively or destructively: when set to true, Core Connect truncates (deletes all records in) the existing Tamr Core dataset before importing; when set to false, imported records are added to the existing dataset. | Boolean |
url | The path or URI for the source file such as "file:///home/tamr/my_file.avro", "hdfs://data/my_file.avro", or "hdfs://data/". Note: When importing files from a directory, Tamr Core imports all .avro files recursively through the directory and subdirectories. These files must all have the same primary key. | string |
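For example, a request body for POST /urlIngest/serverfs/avro might look like the following sketch (the path, dataset name, and primary key are illustrative):

```json
{
  "url": "file:///home/tamr/data/customers.avro",
  "datasetName": "customers_avro",
  "primaryKey": "customer_id",
  "profile": false,
  "truncateTamrDataset": false
}
```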
Avro File Export Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/hdfs/avro
- POST /urlExport/serverfs/avro
- POST /urlExport/hdfs/avro/avroSchema
In addition to these keys, include the sinkConfig keys and any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr. | string |
url | The path or URI for the file destination such as "file:///home/tamr/my_file.avro" or "hdfs://data/my_file.avro" | string |
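For example, a request body for POST /urlExport/hdfs/avro might look like the following sketch (the dataset name, path, and sinkConfig regular expression are illustrative):

```json
{
  "datasetName": "customers_golden_records",
  "url": "hdfs://data/exports/customers.avro",
  "sinkConfig": {
    "columnsExcludeRegex": "^internal_.*"
  }
}
```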
JSON File Export Keys
The resourceConfig object for /urlExport/s3/json and /urlExport/serverfs/json contains the following keys for JSON files.
Key | Description | Data Type |
---|---|---|
flattenEnable | Defines whether exported values are flattened: when set to false, data is exported as arrays of values; when set to true, as single values. | Boolean |
simplifiedDataTypesEnable | Defines whether to simplify data types (true), or to use the data types registered for the dataset you are exporting within Tamr Core (false). | Boolean |
These keys interact as follows:
- When simplifiedDataTypesEnable=true && flattenEnable=false, all data is exported as array of string type.
- When simplifiedDataTypesEnable=true && flattenEnable=true, all data is exported as string type.
- When simplifiedDataTypesEnable=false, data is exported matching the type for its column as defined in the dataset service (regardless of the setting of flattenEnable).
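For example, to export each record as a flat JSON object with simplified string values, the resourceConfig object might look like the following:

```json
{
  "resourceConfig": {
    "flattenEnable": true,
    "simplifiedDataTypesEnable": true
  }
}
```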