Examples of requests for the Core Connect API.
Single Dataset Query Import (Oracle jdbc url)
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \
"query":"SELECT * FROM tamr.TAMR_TEST", \
"datasetName":"oracle_test", \
"queryConfig":{ \
"jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \
"dbPassword": "<password>", \
"dbUsername": "<username>", \
"truncateTamrDataset": false,
"fetchSize":1 \
} \
}' 'http://localhost:9100/api/connect/jdbcIngest'
Jinja Replacement for Passwords
Jinja templating is part of the API request processing and can be used to specify environment variable names instead of sending password literals. See Using a Jinja Template.
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \
"query":"SELECT * FROM tamr.TAMR_TEST", \
"datasetName":"oracle_test", \
"queryConfig":{ \
"jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \
"dbUsername":"{{ CONNECT_DB_USERNAME }}", \
"dbPassword":"{{ CONNECT_DB_PASSWORD }}", \
"fetchSize":1 \
} \
}' 'http://localhost:9100/api/connect/jdbcIngest'
Batch Dataset Query Import (Oracle jdbc url)
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \
"queryConfig": { \
"jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \
"dbPassword": "<password>", \
"dbUsername": "<username>", \
"fetchSize": 0 \
}, \
"queryTargetList": [ \
{ \
"query": "SELECT * FROM tamr.TAMR_TEST where rownum<2", \
"datasetName": "oracle_test1", \
"truncateTamrDataset": true,
"primaryKey": [] \
}, \
{ \
"query": "SELECT * FROM tamr.TAMR_TEST where rownum>=2", \
"datasetName": "oracle_test2", \
"truncateTamrDataset": false,
"primaryKey": [] \
} \
] \
}' 'http://localhost:9100/api/connect/jdbcIngest/batch'
Presto Import
JDBC driver added for import from Presto DB with SSL authentication.
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{ \
"query": "select supplier_id, supplier_number, primary_address_state, primary_address_country_code, po_method, payment_method from reg_prod.supplier", \
"datasetName": "suppliers", \
"primaryKey": [ \
"supplier_id" \
], \
"truncateTamrDataset": false,
"queryConfig": { \
"jdbcUrl": "jdbc:presto://prod.company.net:8446/hive/default?SSL=true&SSLTrustStorePath=/data/temp/Prod_key_files/truststore.jks&SSLTrustStorePassword=<password>&SSLKeyStorePath=/data/temp/Prod_key_files/keystorepresto.jks&SSLKeyStorePassword=<password>", \
"dbUsername": "<kerberos_user_name>", \
"dbPassword": "<password>", \
"fetchSize": 10000 \
} \
}' 'http://localhost:9100/api/connect/jdbcIngest/ingest'
Google BigQuery Examples
The BigQuery driver is provided by CData. See the full CData BigQuery documentation.
In the examples below, authentication is performed through the instance account. Other methods of authentication are available. See Authenticating to Google BigQuery and Using OAuth Authentication.
Google BigQuery Import
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \
"query": "SELECT * from my_target_table limit 100000000;", \
"datasetName": "example_source_dataset", \
"primaryKey": ["record_id"], \
"queryConfig": { "jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;"} \
}' 'http://localhost:9100/api/connect/jdbcIngest/ingest'
Google BigQuery Export
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \
"createTable": true, \
"exportDataConfig": { \
"mergedArrayValuesDelimiter": "|", \
"renameFields":{"col-with-dashes or with spaces": "a_valid_name"} \
}, \
"queryConfig": { \
"jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;" \
}, \
"unifyDatasetName": "example_unified_dataset" \
}' 'http://localhost:9100/api/connect/urlExport/jdbc'
Salesforce Examples
The Salesforce driver provided by CData. See the CData documentation for authentication options and connection string options.
If your organization has customized the Salesforce login URL, you need to set the LoginURL in the jdbcUrl
request body key.
Salesforce Import
{
"datasetName": "ingest_salesforce_example",
"primaryKey": [ "Id" ],
"query": "SELECT * FROM Tamr_Customer__c",
"queryConfig": {
"dbPassword": "<password>", \
"dbUsername": "<username>", \
"jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
}
}
Salesforce Export
{
"createTable": false,
"truncateBeforeLoad": true,
"exportDataConfig": {},
"queryConfig": {
"dbPassword": "<password>",
"dbUsername": "<username>",
"jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
},
"targetTableName": "Tamr_Customer__c",
"unifyDatasetName": "GoldenAccounts_golden_records"
}
Local Filesystem Parquet Examples
Local Filesystem Parquet Import
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=/<local file path>;..."
},
"datasetName": "<dataset name>",
"query": "select * from [<filename (no .parquet extension)>]"
}' 'http://localhost:9100/api/connect/jdbcIngest/ingest'
Local Filesystem Parquet Export
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d
'{
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=/<local file path>"
},
"unifyDatasetName": "<dataset name>",
"targetTableName": "<parquet file name>"
}' 'http://localhost:9100/api/connect/urlExport/jdbc'
Parquet Multi-file Examples
To import multiple Parquet files in a given directory, in the queryConfig/jdbcUrl
parameter, you add the AggregateFiles=true
specification, and make the query: "query": "select * from <any_string>"
. In the case of multi-file ingest, this last string value can be any string as the value is irrelevant.
To export multiple Parquet files, you specify a maximum number of records in a file with batchInsertSize
and, in the queryConfig/jdbcUrl
input parameter, define InsertMode=batch
. Each filename that results starts with a timestamp and ends with a part number.
To show these changes in context, examples of multi-file calls to a local file system follow.
Local File System Multi-file Import
curl -X POST --header 'Content-Type: application/json' --header 'Accept:
application/json' -H 'Authorization: BasicCreds <creds>' -d '{
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=/<local file path>;AggregateFiles=true;..."
},
"datasetName": "<dataset name>",
"query": "select * from <any_string>"
}' 'http://localhost:9100/api/connect/jdbcIngest/ingest'
Local File System Multi-file Export
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=/<local file path>;InsertMode=batch;..."
},
"unifyDatasetName": "<dataset name>",
"targetTableName": "<parquet file name>",
“batchInsertSize”: <batch size>
}' 'http://<hostname>:9100/api/connect/urlExport/jdbc'
ADLS2 CSV Examples
ADLS2 CSV Import
curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{
"url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \
"accountName": "adls2store", \
"datasetName": "myDelimitedDataset", \
"primaryKey": [], \
"clientId": "d08d0a03-40ce-4a7b-9367-3000000002", \
"clientSecret": "qIKeyKeyKeywzmqepcxlcxtiY=", \
"tenantId": "8d0a03-40ce-4a7b-9367-3000000002", \
}' 'http://localhost:9100/api/connect/urlIngest/adls2/delimited'
ADLS2 CSV Export
curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{
"url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \
"accountName": "adls2store", \
"datasetName": "myDelimitedDataset", \
"primaryKey": [], \
"clientId": "<clientId>", \
"clientSecret": "<clientSecret>", \
"tenantId": "<tenantId>", \
"sinkConfig": { \
"mergedArrayValuesDelimiter": "", \
"columnsExcludeRegex": "", \
"renameFields": {}, \
"mergeArrayValuesEnabled": false, \
"limitRecords": 0 \
} \
}' 'http://localhost:9100/api/connect/urlExport/adls2/delimited'
ADLS2 Parquet Examples
ADLS2 Parquet Import
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d
'{
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=abfss://<container name>/<path>;AzureStorageAccount=<storage account>;..."
},
"datasetName": "<dataset name>",
"query": "select * from [<filename (no .parquet extension)>]"
}' 'http://<host>:9100/api/connect/jdbcIngest/ingest'
ADLS2 Parquet Export
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=abfss://<container name>/<path>;AzureStorageAccount=<storage account>;..."
},
"unifyDatasetName": "<dataset name>",
"targetTableName": "<parquet file name>"
}' 'http://localhost:9100/api/connect/urlExport/jdbc'
ADLS2 Multi-file Examples
ADLS2 Avro Multi-file Import
curl -X POST 'http://localhost:9050/api/urlIngest/adls2/avro?async=true'
-H 'Content-Type: application/json;charset=utf-8'
-H 'Authorization: BasicCreds <EncodedCredentials>'
-d '{"url":"adls2://tamrcoreconnect.blob.core.windows.net/coreconnect/core-connect-unit-tests/small-avro/same-schema/",
"primaryKey":["DoesNotExist"],
"resourceConfig":{},
"datasetName":"ingest-adls2-avro"}'
ADLS2 Avro Multi-file Export
curl -X POST 'http://localhost:9050/api/urlExport/adls2/avro?async=true'
-H 'Content-Type: application/json;charset=utf-8'
-H 'Authorization: BasicCreds <EncodedCredentials>'
-d '{"url":"adls2://tamrcoreconnect.blob.core.windows.net/coreconnect/test-export/export-adls2.avro",
"sinkConfig":{},"resourceConfig":{},"datasetName":"test"}'
Redshift Export Example
{
"unifyDatasetName": "lb_lineage.csv",
"queryConfig": {
"jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
"dbPassword": "<password>",
"dbUsername": "<username>",
"fetchSize": 0
},
"targetTableName":"staging.output",
"batchInsertSize": 10000,
"truncateBeforeLoad": false,
"exportDataConfig": {
"mergedArrayValuesDelimiter": "||",
"limitRecords": 0,
"columnsExcludeRegex": "(last|first|middle).*"
}
}
S3 Examples
S3 Parquet Import
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d
'{
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=s3://<bucket>/<path>;AuthScheme=AwsEC2Roles;..."
},
"datasetName": "<dataset name>",
"query": "select * from [<filename (no .parquet extension)>]"
}' 'http://<host>:9100/api/connect/jdbcIngest/ingest'
S3 Parquet Export
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds YWRtaW46ZHQ=' -d '{
"unifyDatasetName": "<dataset name>",
"queryConfig": {
"jdbcUrl": "jdbc:parquet:URI=s3://<bucket>/<path>;AuthScheme=AwsEC2Roles";...
},
"targetTableName": "<parquet file name>"
}' 'http://localhost:9100/api/connect/urlExport/jdbc'
S3 JSON Export Example
{
"url": "s3://com-tamr-test-users/connect/na_ud_102.json",
"datasetName": "Nasdaq Categorization_unified_dataset",
"accessKey": "<secret>",
"secretKey": "<key>",
"awsRegion": "us-east-1",
"awsRoleArn": "<role-arn>",
"awsExternalId": "<external-id>",
"awsStsRegion": "us-east-1"
}
GCS Parquet Examples
When importing or exporting Parquet files to GCS, you must set the OAuthAccessToken
field to a token that has read/write permissions to the specified bucket. If you run the cURL command from an instance that has these permissions, you can embed the gcloud auth application-default print-access-token
command into the cURL to get the token, as shown in the examples that follow.
GCS Parquet Import
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds <creds>=' -d
"{
\"queryConfig\": {
\"jdbcUrl\":
\"jdbc:parquet:URI=gs://<bucket>/<path>;ProjectId=<project id>;
OAuthAccessToken=$(gcloud auth application-default print-access-token);...\"
},
\"datasetName\": \"<dataset name>\",
\"query\": \"select * from [<filename (no .parquet extension)>]\"
}" 'http://<host>:9100/api/connect/jdbcIngest/ingest'
GCS Parquet File Export
curl -X POST --header 'Content-Type: application/json' --header
'Accept: application/json' -H 'Authorization: BasicCreds YWRtaW46ZHQ='
-d "{
\"unifyDatasetName\": \"<dataset name>\",
\"queryConfig\": {
\"jdbcUrl\": \"jdbc:parquet:URI=gs://<bucket>/<path>;ProjectId=tamr-gce-dev;
OAuthAccessToken=$(gcloud auth application-default print-access-token);...\"
},
\"targetTableName\": \"<target parquet file name>\"
}" 'http://<host>:9100/api/connect/urlExport/jdbc'
Azure SQL Export Example with Target Table Containing Spaces
{
"unifyDatasetName": "employee names.txt",
"queryConfig": {
"jdbcUrl": "jdbc:sqlserver://connect.database.windows.net:1433;DatabaseName=test;EncryptionMethod=ssl;ValidateServerCertificate=0;",
"dbPassword": "<password>",
"dbUsername": "<username>",
"fetchSize": 0
},
"truncateBeforeLoad": true,
"targetTableName": "dbo.\"test period ai\"",
"batchInsertSize": 10000,
"exportDataConfig": {
"mergedArrayValuesDelimiter": "|",
"columnsExcludeRegex": "",
"limitRecords": 0
}
}
Delta Export Examples
You can choose to export a set of new or modified records of a Tamr Core dataset between two materialization versions. Using the same API endpoints for full export (either to file systems or database), in exportDeltaConfig
set exportDelta
to true
, as shown in the example below).
Delta Export to RedShift for Latest Change
{
"unifyDatasetName": "lb_lineage.csv",
"queryConfig": {
"jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
"dbUsername": "<dbuser>",
"dbPassword": "<dbpassword>",
"fetchSize": 0
},
"targetTableName":"staging.output",
"batchInsertSize": 10000,
"truncateBeforeLoad": false,
"exportDataConfig": {
"exportDelta": true
}
}
Delta Export to Redshift for Specific Versions
By default, when you set exportDelta
to true
, Core Connect attempts to export the set of new or modified records between the two latest versions of the specified dataset.
For changes between two versions of the dataset, specify fromVersion
and/or toVersion
in deltaConfig
as shown below.
If the change between two specified versions or two latest versions of the requested dataset is not incremental (that is, the data is truncated and reloaded), the API call with exportDelta
set to true
returns an error.
{
"unifyDatasetName": "lb_lineage.csv",
"queryConfig": {
"jdbcUrl": "jdbc:redshift://field-testing.site.us-east-1.redshift.amazonaws.com:5439/mydb",
"dbUsername": "<dbusername>",
"dbPassword": "<dbpassword>",
"fetchSize": 0
},
"targetTableName":"staging.output",
"batchInsertSize": 10000,
"truncateBeforeLoad": false,
"exportDataConfig": {
"exportDelta": true,
"deltaConfig": {
"fromVersion": 100,
"toVersion": 102
}
}
Retrieve Metadata Example
Some JDBC sources store metadata properties for datasets in the source.
To retrieve metadata for services or for the dataset (currently available only for Snowflake) you can set the following body keys to true
:
retrieveConnectMetadata
: Core Connect retrieves the metadata for services.retrieveSourceMetadata
: Core Connect retrieves the metadata for the dataset stored in the Snowflake source. All source metadata properties have the prefixconnect.source
.
The following example imports both service and dataset metadata from Snowflake.
{
"datasetName": "snowflake_dataset_CUSTOMER",
"metadataConfig": {
"retrieveConnectMetadata": true,
"retrieveSourceMetadata": true
},
"primaryKey": ["C_CUSTKEY"],
"query": "select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER",
"queryConfig": {
"dbPassword": "<password>",
"dbUsername": "<username>",
"fetchSize": 0,
"jdbcUrl": "jdbc:snowflake://rx83695.us-central1.gcp.snowflakecomputing.com/?db=SNOWFLAKE_SAMPLE_DATA&warehouse=COMPUTE_WH&schema=TPCH_SF1&application=tamr"
}
}
Profiling Example
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \
"queryConfig": { \
"jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \
"dbUsername": "<userName>", \
"dbPassword": "<password>", \
"fetchSize": 1 \
}, \
"queryTargetList": [ \
{ \
"query": "SELECT * FROM tamr.TAMR_TEST", \
"datasetName": "profile_a", \
"primaryKey": [] \
}, \
{ \
"query": "SELECT * FROM tamr.ATTRIBUTE_MAPPINGS", \
"datasetName": "profile_a", \
"primaryKey": [] \
} \
\
] \
}' 'http://localhost:9100/api/jdbcIngest/profile'
This call uploads profiling information as records to the specified datasets.
Note: Tamr ignores any value you supply for primaryKey
. This endpoint generates its own primary key by concatenating the table name and the column name.
The resulting dataset contains the following attributes based on the profile:
{"TableName", "Source Table Name"},
{"ColumnName", "Column Name"},
{"ColumnType", "Column Data Type"},
{"DistinctValueCount", "Number of distinct values"},
{"EmptyValueCount", "Number of empty values"},
{"MaxValue", "Numerical or Date/Time Max"},
{"MinValue", "Numerical or Date/Time Min"},
{"MeanValue", "Numerical Mean"},
{"StdDevValue", "Standard Deviation"},
{"RecordCount", "Number of records"},
{"TotalValueCount", "Number of values count values in array in case of multi-value delimiter"},
{"Top100Values", "Top 100 most frequently occurring values"},
{"Top100Frequencies", "Frequency of Top100Values in a string"},
{"Top100Counts", "Absolute counts of Top100Values in a string"},
{primaryKeyName, "Primary Key"}