Descriptions of the JSON keys that you can set in the body of Connect POST calls.
This reference is intended as a supplement to the Connect API Swagger documentation, which is available at http://<tamr_ip>:9100/docs.
To reduce repetition, keys that are common across different file types, cloud storage providers, and either export or import are described only once. This reference repeats descriptions only when a difference exists based on the requirements of a specific type, provider, or target.
As a result, you may need to refer to several sections to find information about all of the keys that you can define in the body of a given POST call.
JDBC Keys
You include these JSON key:value pairs in the body of the following JDBC-specific requests:
- POST /jdbcIngest/batch
- POST /jdbcIngest/execute
- POST /jdbcIngest/ingest
- POST /jdbcIngest/preview
- POST /jdbcIngest/profile
- POST /urlExport/jdbc
queryConfig Object
You include the queryConfig object in all JDBC calls.
Key | Description | Data Type |
---|---|---|
dbPassword | The password to use when authenticating to the data source. | string |
dbUsername | The username to use when authenticating to the data source. | string |
fetchSize | The number of records for the JDBC driver to retrieve at a time. Can be adjusted to improve read performance at the expense of memory consumption. | int |
jdbcUrl | The location and the type of the data source, using MySQL connection URL syntax. See Connection URL Syntax in the Oracle documentation. | string |
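For example, a queryConfig object for a PostgreSQL source might look like the following sketch (the connection URL, credentials, and fetch size are illustrative):

```json
{
  "queryConfig": {
    "jdbcUrl": "jdbc:postgresql://example-db-host:5432/sales",
    "dbUsername": "tamr_user",
    "dbPassword": "secret",
    "fetchSize": 10000
  }
}
```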
Keys for JDBC Import Only
The following keys apply to calls to one or more of the jdbcIngest endpoints only.
Key | Description | Data Type |
---|---|---|
queryTargetList | Applies to /jdbcIngest/batch only. Describes the target Tamr Core datasets and their primary keys in order to ingest multiple query results at a time. Useful in cases where you have hundreds of tables to ingest, for example, research assays. Descriptions of this object's keys follow. | array |
datasetName | Applies to /jdbcIngest/ingest, /jdbcIngest/preview, and /jdbcIngest/batch (in the queryTargetList object) only. Name of the dataset in Tamr Core. | string |
policyIds | Optional. The authorization policies that will include the new Tamr Core dataset as a resource. Note: If present, this list does not update the policyIds of a dataset that already exists and is being updated. | array |
primaryKey | Applies to /jdbcIngest/ingest, /jdbcIngest/preview, and /jdbcIngest/batch (in the queryTargetList object) only. Optional. The individual field that uniquely identifies each record in the source. If left blank, Tamr Core uses the `TAMRSEQ` column as the primary key. See Data Import in Using the Core Connect API. | array |
profile | If set to true, Tamr Core adds a job to the queue to profile the new dataset after ingest. Set to false to profile the dataset at another time. Defaults to false. | Boolean |
query | Applies to /jdbcIngest/ingest, /jdbcIngest/batch, and /jdbcIngest/preview only. SQL query used to retrieve data from a JDBC source. | string |
recipeId | Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. | integer |
retrieveConnectMetadata | Applies to /jdbcIngest/ingest only. When set to true, Core Connect imports services metadata. Defaults to false when left blank. See Core Connect API Example Requests for an example. | Boolean |
retrieveSourceMetadata | Applies to /jdbcIngest/ingest and Snowflake JDBC sources only. When set to true, Core Connect retrieves the metadata for the dataset stored in the Snowflake source. Defaults to false when left blank. See Core Connect API Example Requests for an example. | Boolean |
statement | Applies to /jdbcIngest/execute only. This SQL statement is not expected to return any results. For example: INSERT INTO <final tablename> SELECT * FROM <tamr staging table> WHERE <condition> | string |
metadataQueryConfig | Applies to /jdbcIngest/ingest and Snowflake JDBC sources only. Optional. For a data source in Snowflake, retrieves metadata. See Adding a Metadata Property. | object |
truncateTamrDataset | Determines whether imported data is included additively or destructively: when set to true, Core Connect truncates (deletes all records in) the existing Tamr Core dataset before importing; when set to false, imported records are added to the existing dataset. | Boolean |
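Putting these keys together, a request body for POST /jdbcIngest/ingest might look like the following sketch (the dataset name, query, primary key, and connection details are all illustrative):

```json
{
  "query": "SELECT * FROM customers",
  "datasetName": "customers",
  "primaryKey": ["customer_id"],
  "profile": true,
  "truncateTamrDataset": false,
  "queryConfig": {
    "jdbcUrl": "jdbc:postgresql://example-db-host:5432/sales",
    "dbUsername": "tamr_user",
    "dbPassword": "secret",
    "fetchSize": 10000
  }
}
```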
Keys for JDBC Export Only
The following keys apply to /urlExport/jdbc only.
Key | Description | Data Type |
---|---|---|
exportDataConfig | See sinkConfig Object. | object |
targetTableName | The name of the table in the target system to write exported data into. | string |
unifyDatasetName | The Tamr Core dataset to export. | string |
truncateBeforeLoad | When set to true, Core Connect deletes all of the rows in the target table before writing exported data into it. Defaults to false. | Boolean |
batchInsertSize | The number of records to accumulate in the JDBC batch pool before sending them to the database engine for insertion. Increasing the batch size can improve throughput; however, too large a value can result in out of memory errors. Tamr recommends trying the following increments: 5000, 50000, 100000. | int |
createTable | Defaults to true: if the table being exported does not exist at the target, Core Connect creates the table on export. For JDBC driver implementations that do not return a list of existing tables due to security or other implementation settings, you can set this key to false. When false, Core Connect does not attempt to create tables on export. | Boolean |
intermittentCommits | When set to true, Core Connect commits each batch of records. When set to false, Core Connect writes the entire export as one large transaction. A single large transaction can be useful when you require the export to succeed or fail as a whole, rather than committing partial results when later records fail due to out of memory or size limitations. However, you must allocate enough transaction space to the target database for export jobs to succeed. This can be a challenge with SQL Server databases. | Boolean |
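For example, a request body for POST /urlExport/jdbc might look like the following sketch (the dataset name, table name, and connection details are illustrative):

```json
{
  "unifyDatasetName": "customers_golden_records",
  "targetTableName": "EXPORTED_CUSTOMERS",
  "truncateBeforeLoad": true,
  "batchInsertSize": 5000,
  "createTable": true,
  "intermittentCommits": true,
  "queryConfig": {
    "jdbcUrl": "jdbc:sqlserver://example-db-host:1433;databaseName=exports",
    "dbUsername": "tamr_user",
    "dbPassword": "secret"
  }
}
```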
S3-Specific Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/s3/avro
- POST /urlExport/s3/delimited
- POST /urlExport/s3/json
- POST /urlIngest/s3/avro
- POST /urlIngest/s3/delimited
- POST /urlIngest/s3/json
- POST /urlIngest/s3/delimited/profile
Key | Description | Data Type |
---|---|---|
accessKey | S3 access key. If the S3 access key and secret key are not provided, and role assumption via the awsRoleArn key is not used, Core Connect falls back to the AWS default credentials lookup chain as described in the Working with AWS Credentials documentation. | string (optional) |
awsEndpointUrl | S3 AWS endpoint URL. Can be used with S3 private cloud. For more information, see the AWS PrivateLink for Amazon S3 documentation. | string (optional) |
awsExternalId | AWS external ID when used with the AWS role assumption capability. For more information, see the AWS AssumeRole documentation. The role session name is fixed as "tamr.connect" and the session duration is fixed at 1 hour. | string (optional) |
awsRegion | The AWS region to use with API calls. If a region is not specified, the default region provided by the lookup chain is used. For more information, see the AWS Region Selection documentation. | string (optional) |
awsRoleArn | Role ARN when used with the AWS role assumption capability. | string (optional) |
awsStsRegion | When using awsRoleArn and awsExternalId, the awsStsRegion can be specified to reduce the latency of authentication calls. For more information, see AWS Temporary security credentials in the IAM documentation. | string (optional) |
secretKey | S3 secret access key. If the S3 access key and secret key are not provided, and role assumption via the awsRoleArn key is not used, Core Connect falls back to the AWS default credentials lookup chain as described in the Working with AWS Credentials documentation. | string (optional) |
sessionToken | The AWS session token to be used in authentication with an AWS access key and secret key. See GetSessionToken in the AWS documentation. | string (optional) |
encryptionConfig | Applies to the /urlExport/s3 endpoints only. See encryptionConfig Object for descriptions of its keys. | object (optional) |
sinkThreads | Applies to the /urlExport/s3 endpoints only. The number of parallel threads running the export job. | int |
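As a sketch, the S3-specific portion of a request body for POST /urlIngest/s3/delimited using role assumption might look like the following (the bucket, role ARN, external ID, and dataset name are illustrative; the remaining keys are described under Delimited File Import Keys):

```json
{
  "url": "s3://example-bucket/data/customers.csv",
  "datasetName": "customers",
  "awsRoleArn": "arn:aws:iam::123456789012:role/example-tamr-role",
  "awsExternalId": "example-external-id",
  "awsRegion": "us-east-1",
  "awsStsRegion": "us-east-1"
}
```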
encryptionConfig Object
Key | Description | Data Type |
---|---|---|
algorithm | S3 encryption algorithm. Possible values are AES256, KMS, or CUSTOM. | string |
customerB64EncKey | Specifies the base64-encoded 256-bit encryption key to use to decrypt the source object. | string |
customerB64EncKeyMD5Digest | Specifies the base64-encoded 128-bit MD5 digest of the encryption key used to decrypt/encrypt the source object. | string |
kmsKeyId | Specifies the ID of the existing key in Vault to be used to decrypt/encrypt the object. | string |
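For example, an encryptionConfig object that uses a KMS-managed key might look like the following (the key ID is a placeholder):

```json
{
  "encryptionConfig": {
    "algorithm": "KMS",
    "kmsKeyId": "example-kms-key-id"
  }
}
```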
ADLS2-Specific Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/adls2/avro
- POST /urlExport/adls2/delimited
- POST /urlIngest/adls2/avro
- POST /urlIngest/adls2/delimited
Key | Description | Data Type |
---|---|---|
accountKey | Sent with accountName, the accountKey grants full access to the Azure storage account. For additional information about the security of this method, see Manage storage account access keys in the Microsoft documentation. | string |
clientId | The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. See the Azure Databricks - Accessing Data Lake - Using a Service Principal video for more information about service principals. Tip: The latter half of the video applies to Databricks and can be skipped. The service principal must have the "Storage Blob Data Contributor" role, or ACLs that grant access to the specified container. | string |
clientSecret | See clientId. The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. | string |
sasToken | Used to grant limited access to a resource. For more information, see Grant limited access to Azure Storage resources using shared access signatures (SAS). | string |
tenantId | The clientId, clientSecret, and tenantId are all required to authenticate as a service principal. The tenantId is also known as the directoryId in the Azure documentation. | string |
url | The path or URI for the source file or directory. Must be in the format "adls2://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/PATH_PREFIX" or "https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/PATH_PREFIX". Note: When importing files from a directory, files are imported recursively through the directory and subdirectories. For importing delimited files, all files in the directory and subdirectories must be delimited files with identical formats, including having the same delimiter, schema, and primary key columns. For importing Avro files, all files in the directory and subdirectories must have the same primary key. | string |
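A sketch of a request body for POST /urlIngest/adls2/delimited that authenticates as a service principal follows (the account, container, and credential values are placeholders):

```json
{
  "url": "adls2://exampleaccount.blob.core.windows.net/examplecontainer/data/customers.csv",
  "datasetName": "customers",
  "clientId": "00000000-0000-0000-0000-000000000000",
  "clientSecret": "example-client-secret",
  "tenantId": "11111111-1111-1111-1111-111111111111"
}
```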
GCS-Specific Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/gcs/delimited
- POST /urlExport/gcs/avro
- POST /urlIngest/gcs/avro
- POST /urlIngest/gcs/delimited
Key | Description | Data Type |
---|---|---|
project | The Google project in which the file is located. See Projects Cloud Storage in the Google documentation. | string |
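For example, a request body for POST /urlIngest/gcs/delimited might combine the project key with the common import keys (the project name, bucket, and URL scheme shown are illustrative):

```json
{
  "url": "gs://example-bucket/data/customers.csv",
  "datasetName": "customers",
  "project": "example-gcp-project"
}
```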
HDFS-Specific Keys
This JSON key:value pair is used in the body of the following requests:
- POST /urlExport/hdfs/avro
- POST /urlIngest/hdfs/avro/avroSchema
Key | Description | Data Type |
---|---|---|
hadoopUserName | HDFS username to use when HDFS is not protected by Kerberos. This overrides the username specified by the Connect configuration settings. | string |
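For example, an export to HDFS as a user other than the configured default might look like the following sketch (the path and username are illustrative):

```json
{
  "datasetName": "customers_golden_records",
  "url": "hdfs://data/exports/customers.avro",
  "hadoopUserName": "tamr"
}
```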
Delimited File Import Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlIngest/s3/delimited
- POST /urlIngest/adls2/delimited
- POST /urlIngest/gcs/delimited
- POST /urlIngest/serverfs/delimited
- POST /urlIngest/s3/delimited/profile
- POST /urlIngest/serverfs/delimited/profile
In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr Core. | string |
policyIds | Optional. An array of the authorization policies that will include the new Tamr Core dataset as a resource. Note: If present, this list does not update the policyIds of a dataset that already exists and is being updated. | array |
primaryKey | Set of fields uniquely identifying each record in the source. | string |
profile | If set to true, Tamr Core adds a job to the queue to profile the new dataset after ingest. Set to false to profile the dataset at another time. Defaults to false. | Boolean |
recipeId | Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. | integer |
resourceConfig | See resourceConfig Object for descriptions of its keys. | Object |
truncateTamrDataset | Determines whether imported data is included additively or destructively: when set to true, Core Connect truncates (deletes all records in) the existing Tamr Core dataset before importing; when set to false, imported records are added to the existing dataset. | Boolean |
url | The path or URI to the source file or directory, such as "file:///home/tamr/my_file.csv", "s3://bucket/data/my_file.csv", or "s3://bucket/data". Note: When importing files from a directory, files are imported recursively through the directory and subdirectories. All files in the directory and subdirectories must be delimited files with identical formats, including having the same delimiter, schema, and primary key columns. | string |
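Combining these keys, a request body for POST /urlIngest/serverfs/delimited might look like the following sketch (the path, dataset name, and primary key are illustrative):

```json
{
  "url": "file:///home/tamr/data/customers.csv",
  "datasetName": "customers",
  "primaryKey": "customer_id",
  "profile": true,
  "resourceConfig": {
    "columnDelimiter": ",",
    "recordSeparator": "\n"
  }
}
```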
resourceConfig Object
Key | Description | Data Type |
---|---|---|
quoteCharacter | Value used to escape values in which the field delimiter is part of the value. For example, the value " a , b " is parsed as a , b. Defaults to the quotation mark character ("). | string |
recordSeparator | Character sequence used to indicate a new line. Defaults to \n. | string |
characterSetName | Applies to urlIngest endpoints only. Name of the character set used in the file. Defaults to UTF-8. See the Name column of the Character Sets table for more information. | string |
charactersForDetection | Applies to urlIngest endpoints only. List of characters to use to automatically determine the column delimiter when the column delimiter is not known. | string |
columnDelimiter | Value used to separate individual fields in the input. Defaults to a comma (,). | string |
columns | Applies to urlIngest endpoints only. Specifies a custom header line. | string |
delimiterDetectionEnabled | Applies to urlIngest endpoints only. When the column delimiter is not known, set this key to true so that Core Connect analyzes the dataset to detect the delimiter. The charactersForDetection key provides a hint for which characters to consider. | Boolean |
escapeCharacter | Value used to escape the quote character inside an already escaped value. For example, the value " "" a , b "" " is parsed as " a , b ". | string |
maxCharsPerColumn | Applies to urlIngest endpoints only. Limit on the number of characters to read in a single field. Used to avoid out of memory errors in the case of invalid delimiters. Defaults to 4096, with a maximum of 40 MB. | int |
multiValueDelimiter | Applies to urlIngest endpoints only. Value used to separate the values in a field into an array. For example, with the multiValueDelimiter set to a pipe (\|) character, the value red\|green\|blue is ingested as an array of three strings. | string |
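For example, a resourceConfig object for ingesting a pipe-delimited file with multi-value fields might look like the following (the delimiter choices are illustrative):

```json
{
  "resourceConfig": {
    "columnDelimiter": "|",
    "quoteCharacter": "\"",
    "escapeCharacter": "\\",
    "recordSeparator": "\n",
    "multiValueDelimiter": ";",
    "characterSetName": "UTF-8"
  }
}
```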
Delimited File Export Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/s3/delimited
- POST /urlExport/adls2/delimited
- POST /urlExport/gcs/delimited
- POST /urlExport/serverfs/delimited
In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr. | string |
resourceConfig | This object contains the subset of the keys that you can set when ingesting delimited files that also apply on export: columnDelimiter, escapeCharacter, quoteCharacter, and recordSeparator. For more information, see resourceConfig Object. | object |
sinkConfig | See sinkConfig Object for descriptions of its keys. | object |
url | The path or URI to the destination for the file such as "file:///home/tamr/my_file.csv" or "s3://bucket/data/my_file.csv" | string |
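For example, a request body for POST /urlExport/serverfs/delimited might look like the following sketch (the dataset name and path are illustrative):

```json
{
  "datasetName": "customers_golden_records",
  "url": "file:///home/tamr/exports/customers.csv",
  "resourceConfig": {
    "columnDelimiter": ","
  },
  "sinkConfig": {
    "mergeArrayValuesEnabled": true,
    "mergedArrayValuesDelimiter": "|"
  }
}
```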
sinkConfig Object
In /urlExport/jdbc, an object with the following keys is named exportDataConfig.
Key | Description | Data Type |
---|---|---|
columnsExcludeRegex | Regular expression that defines how to exclude certain columns from export. | string |
deltaConfig | Configures how differences are computed on export, based on the keys that you specify in the deltaConfig object. | object |
exportDelta | If true, the delta is exported based on deltaConfig. If true and deltaConfig is null, exports the latest delta if available. | Boolean |
limitRecords | Sets a limit on the number of records to export. | int |
mergeArrayValuesEnabled | If true, concatenates arrays of values before export. | Boolean |
mergedArrayValuesDelimiter | The delimiter used to concatenate array values. | string |
renameFields | Provides the ability to rename fields on export. For example, "renameFields": { "Addr": "Address" } exports the Addr field in the Tamr Core dataset as the Address field. | Dictionary |
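A sketch of a sinkConfig object that excludes columns, concatenates array values, and renames a field on export follows (the regular expression and field names are illustrative):

```json
{
  "sinkConfig": {
    "columnsExcludeRegex": "^internal_.*",
    "mergeArrayValuesEnabled": true,
    "mergedArrayValuesDelimiter": "|",
    "renameFields": { "Addr": "Address" },
    "limitRecords": 100000
  }
}
```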
Avro File Import Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlIngest/adls2/avro
- POST /urlIngest/gcs/avro
- POST /urlIngest/s3/avro
- POST /urlIngest/serverfs/avro
In addition to these keys, include any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr Core. | string |
policyIds | Optional. An array of the authorization policies that will include the new Tamr Core dataset as a resource. | array |
primaryKey | Set of fields uniquely identifying each record in the source. | string |
profile | If set to true, Tamr Core adds a job to the queue to profile the new dataset after ingest. | Boolean |
recipeId | Optional. To add a dataset to a specific project, you can supply the recipeId to associate the dataset with. | integer |
resourceConfig | Contains a key for Avro files. Avro supports six complex data types: Records, Enums, Arrays, Maps, Unions, and Fixed. | object |
truncateTamrDataset | Determines whether imported data is included additively or destructively: when set to true, Core Connect truncates (deletes all records in) the existing Tamr Core dataset before importing; when set to false, imported records are added to the existing dataset. | Boolean |
url | The path or URI for the source file such as "file:///home/tamr/my_file.avro", "hdfs://data/my_file.avro", or "hdfs://data/". Note: When importing files from a directory, Tamr Core imports all .avro files recursively through the directory and subdirectories. These files must all have the same primary key. | string |
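For example, a request body for POST /urlIngest/serverfs/avro might look like the following sketch (the path, dataset name, and primary key are illustrative):

```json
{
  "url": "file:///home/tamr/data/customers.avro",
  "datasetName": "customers_avro",
  "primaryKey": "customer_id",
  "profile": false,
  "truncateTamrDataset": false
}
```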
Avro File Export Keys
You include these JSON key:value pairs in the body of the following requests.
- POST /urlExport/hdfs/avro
- POST /urlExport/serverfs/avro
- POST /urlExport/hdfs/avro/avroSchema
In addition to these keys, include the sinkConfig keys and any cloud-specific keys in the request body. See S3-Specific Keys, ADLS2-Specific Keys, or GCS-Specific Keys.
Key | Description | Data Type |
---|---|---|
datasetName | Name of the dataset in Tamr. | string |
url | The path or URI for the file destination such as "file:///home/tamr/my_file.avro" or "hdfs://data/my_file.avro" | string |
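For example, a request body for POST /urlExport/hdfs/avro might look like the following sketch (the dataset name, path, and sinkConfig regular expression are illustrative):

```json
{
  "datasetName": "customers_golden_records",
  "url": "hdfs://data/exports/customers.avro",
  "sinkConfig": {
    "columnsExcludeRegex": "^internal_.*"
  }
}
```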
JSON File Export Keys
The resourceConfig object for /urlExport/s3/json and /urlExport/serverfs/json contains the following keys for JSON files.
Key | Description | Data Type |
---|---|---|
flattenEnable | Defines whether exported values are flattened: when set to false, data is exported as arrays of values; when set to true, as single values. | Boolean |
simplifiedDataTypesEnable | Defines whether to simplify data types (true), or to use the data types registered for the dataset you are exporting within Tamr Core (false). | Boolean |
These keys interact as follows:
- When simplifiedDataTypesEnable=true && flattenEnable=false, all data is exported as array of string type.
- When simplifiedDataTypesEnable=true && flattenEnable=true, all data is exported as string type.
- When simplifiedDataTypesEnable=false, data is exported matching the type for its column as defined in the dataset service (regardless of the setting of flattenEnable).
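For example, to export each record as a flat JSON object with simplified string values, the resourceConfig object might look like the following:

```json
{
  "resourceConfig": {
    "flattenEnable": true,
    "simplifiedDataTypesEnable": true
  }
}
```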