
Core Connect API Example Requests

Examples of requests for the Core Connect API.
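
Most of these requests pass Tamr credentials in a BasicCreds Authorization header. The following is a minimal sketch of producing the encoded value, assuming the header takes the Base64 encoding of <username>:<password>; substitute your own credentials and consult the authentication documentation for specifics.

# Assumption: BasicCreds expects base64("<username>:<password>").
CREDS=$(echo -n '<username>:<password>' | base64)
# Use the value in place of <Base64EncodedPassword> in the requests below:
#   --header "Authorization: BasicCreds $CREDS"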

Single Dataset Query Import (Oracle JDBC URL)

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query":"SELECT * FROM tamr.TAMR_TEST", \ 
 "datasetName":"oracle_test", \ 
 "queryConfig":{ \ 
 "jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
 "dbPassword": "<password>", \
 "dbUsername": "<username>", \
 "truncateTamrDataset": false,
 "fetchSize":1 \ 
 } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest'

Jinja Replacement for Passwords

Jinja templating is part of the API request processing and can be used to specify environment variable names instead of sending password literals. See Using a Jinja Template.
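
For example, the request below references CONNECT_DB_USERNAME and CONNECT_DB_PASSWORD. A minimal sketch of providing these values follows, assuming they must be defined as environment variables in the environment where Core Connect runs; see Using a Jinja Template for the exact requirements.

# Assumption: the named variables are read from the Core Connect environment when the request is processed.
export CONNECT_DB_USERNAME='<username>'
export CONNECT_DB_PASSWORD='<password>'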

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query":"SELECT * FROM tamr.TAMR_TEST", \ 
 "datasetName":"oracle_test", \ 
 "queryConfig":{ \ 
 "jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
 "dbUsername":"{{ CONNECT_DB_USERNAME }}", \ 
 "dbPassword":"{{ CONNECT_DB_PASSWORD }}", \ 
 "fetchSize":1 \ 
 } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest'

Batch Dataset Query Import (Oracle JDBC URL)

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
     "dbPassword": "<password>", \
     "dbUsername": "<username>", \
     "fetchSize": 0 \ 
   }, \ 
   "queryTargetList": [ \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST where rownum<2", \ 
       "datasetName": "oracle_test1", \ 
       "truncateTamrDataset": true,
       "primaryKey": [] \ 
     }, \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST where rownum>=2", \ 
       "datasetName": "oracle_test2", \ 
       "truncateTamrDataset": false,
       "primaryKey": [] \ 
     } \ 
   ] \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/batch'

Presto Import

A JDBC driver is included for importing from Presto DB with SSL authentication.

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{ \ 
   "query": "select supplier_id, supplier_number, primary_address_state, primary_address_country_code, po_method, payment_method from reg_prod.supplier", \ 
   "datasetName": "suppliers", \ 
   "primaryKey": [ \ 
     "supplier_id" \ 
   ], \ 
   "truncateTamrDataset": false,
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:presto://prod.company.net:8446/hive/default?SSL=true&SSLTrustStorePath=/data/temp/Prod_key_files/truststore.jks&SSLTrustStorePassword=<password>&SSLKeyStorePath=/data/temp/Prod_key_files/keystorepresto.jks&SSLKeyStorePassword=<password>", \      
     "dbUsername": "<kerberos_user_name>", \
     "dbPassword": "<password>", \
     "fetchSize": 10000 \ 
   } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Google BigQuery Examples

The BigQuery driver is provided by CData. See the full CData BigQuery documentation.

In the examples below, authentication is performed through the instance account. Other methods of authentication are available. See Authenticating to Google BigQuery and Using OAuth Authentication.

Google BigQuery Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query": "SELECT * from my_target_table limit 100000000;",  \ 
 "datasetName": "example_source_dataset",  \ 
 "primaryKey": ["record_id"],  \ 
 "queryConfig": { "jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;"} \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Google BigQuery Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "createTable": true, \ 
 "exportDataConfig": { \ 
     "mergedArrayValuesDelimiter": "|", \ 
     "renameFields":{"col-with-dashes or with spaces": "a_valid_name"} \ 
 }, \ 
 "queryConfig": { \ 
     "jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;" \ 
 }, \ 
 "unifyDatasetName": "example_unified_dataset" \ 
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

Salesforce Examples

The Salesforce driver is provided by CData. See the CData documentation for authentication options and connection string options.

If your organization has customized the Salesforce login URL, you must include the LoginURL property in the jdbcUrl request body key.
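
For example, a queryConfig with a custom login URL might look like the following sketch. The LoginURL value is a placeholder, and the other connection properties follow the Salesforce examples below; see the CData connection string documentation for the exact property behavior.

"queryConfig": {
  "dbPassword": "<password>",
  "dbUsername": "<username>",
  "jdbcUrl": "jdbc:salesforce:LoginURL=https://<custom-login-domain>;Security Token=SecureToken;UseSandbox=true"
}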

Salesforce Import

{
  "datasetName": "ingest_salesforce_example",
  "primaryKey": [ "Id" ],
  "query": "SELECT * FROM Tamr_Customer__c",
  "queryConfig": {
    "dbPassword": "<password>", \
    "dbUsername": "<username>", \
    "jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
  }
}

Salesforce Export

{
  "createTable": false,
  "truncateBeforeLoad": true,
  "exportDataConfig": {},
  "queryConfig": {
    "dbPassword": "<password>",
    "dbUsername": "<username>",
    "jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
  },
  "targetTableName": "Tamr_Customer__c",
  "unifyDatasetName": "GoldenAccounts_golden_records"
}

Local Filesystem Parquet Examples

Local Filesystem Parquet Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/<local file path>;..."
   },
   "datasetName": "<dataset name>",
   "query": "select * from [<filename (no .parquet extension)>]"
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Local Filesystem Parquet Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/<local file path>"
   },
   "unifyDatasetName": "<dataset name>",
   "targetTableName": "<parquet file name>"
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

Parquet Multi-file Examples

To import multiple Parquet files from a given directory, add the AggregateFiles=true specification to the queryConfig/jdbcUrl parameter and set the query to "query": "select * from <any_string>". For multi-file ingest, the table name in this query can be any string; its value is ignored.

To export multiple Parquet files, specify the maximum number of records per file with batchInsertSize and define InsertMode=batch in the queryConfig/jdbcUrl input parameter. Each resulting filename starts with a timestamp and ends with a part number.

To show these changes in context, examples of multi-file calls to a local file system follow.

Local File System Multi-file Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/<local file path>;AggregateFiles=true;..."
   },
   "datasetName": "<dataset name>",
   "query": "select * from <any_string>"
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Local File System Multi-file Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/<local file path>;InsertMode=batch;..."
   },
   "unifyDatasetName": "<dataset name>",
   "targetTableName": "<parquet file name>",
   "batchInsertSize": <batch size>
 }' 'http://<hostname>:9100/api/connect/urlExport/jdbc'

ADLS2 CSV Examples

ADLS2 CSV Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{ 
   "url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \ 
   "accountName": "adls2store", \
   "datasetName": "myDelimitedDataset", \ 
   "primaryKey": [], \ 
   "clientId": "d08d0a03-40ce-4a7b-9367-3000000002", \ 
   "clientSecret": "qIKeyKeyKeywzmqepcxlcxtiY=", \ 
   "tenantId": "8d0a03-40ce-4a7b-9367-3000000002", \
 }' 'http://localhost:9100/api/connect/urlIngest/adls2/delimited'

ADLS2 CSV Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{ 
   "url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \ 
   "accountName": "adls2store", \
   "datasetName": "myDelimitedDataset", \ 
   "primaryKey": [], \ 
   "clientId": "<clientId>", \ 
   "clientSecret": "<clientSecret>", \ 
   "tenantId": "<tenantId>", \
   "sinkConfig": { \
    "mergedArrayValuesDelimiter": "", \
    "columnsExcludeRegex": "", \
    "renameFields": {}, \
    "mergeArrayValuesEnabled": false, \
    "limitRecords": 0 \
   } \ 
 }' 'http://localhost:9100/api/connect/urlExport/adls2/delimited'

ADLS2 Parquet Examples

ADLS2 Parquet Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=abfss://<container name>/<path>;AzureStorageAccount=<storage account>;..."
   },
   "datasetName": "<dataset name>",
   "query": "select * from [<filename (no .parquet extension)>]"
 }' 'http://<host>:9100/api/connect/jdbcIngest/ingest'

ADLS2 Parquet Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=abfss://<container name>/<path>;AzureStorageAccount=<storage account>;..."
   },
   "unifyDatasetName": "<dataset name>",
   "targetTableName": "<parquet file name>"
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

ADLS2 Multi-file Examples

ADLS2 Avro Multi-file Import

curl -X POST 'http://localhost:9050/api/urlIngest/adls2/avro?async=true' \
-H 'Content-Type: application/json;charset=utf-8' \
-H 'Authorization: BasicCreds <EncodedCredentials>' \
-d '{"url":"adls2://tamrcoreconnect.blob.core.windows.net/coreconnect/core-connect-unit-tests/small-avro/same-schema/",
"primaryKey":["DoesNotExist"],
"resourceConfig":{},
"datasetName":"ingest-adls2-avro"}'

ADLS2 Avro Multi-file Export

curl -X POST 'http://localhost:9050/api/urlExport/adls2/avro?async=true' \
-H 'Content-Type: application/json;charset=utf-8' \
-H 'Authorization: BasicCreds <EncodedCredentials>' \
-d '{"url":"adls2://tamrcoreconnect.blob.core.windows.net/coreconnect/test-export/export-adls2.avro",
"sinkConfig":{},"resourceConfig":{},"datasetName":"test"}'

Redshift Export Example

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbPassword": "<password>", 
    "dbUsername": "<username>", 
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
 "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "mergedArrayValuesDelimiter": "||",
    "limitRecords": 0,
    "columnsExcludeRegex": "(last|first|middle).*"
  }
}

S3 Examples

S3 Parquet Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=s3://<bucket>/<path>;AuthScheme=AwsEC2Roles;..."
   },
   "datasetName": "<dataset name>",
   "query": "select * from [<filename (no .parquet extension)>]"
 }' 'http://<host>:9100/api/connect/jdbcIngest/ingest'

S3 Parquet Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds YWRtaW46ZHQ=' -d '{
  "unifyDatasetName": "<dataset name>",
  "queryConfig": {
    "jdbcUrl": "jdbc:parquet:URI=s3://<bucket>/<path>;AuthScheme=AwsEC2Roles";...
  },
  "targetTableName": "<parquet file name>"
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

S3 JSON Export Example

{
  "url": "s3://com-tamr-test-users/connect/na_ud_102.json",
  "datasetName": "Nasdaq Categorization_unified_dataset",
  "accessKey": "<secret>",
  "secretKey": "<key>",
  "awsRegion": "us-east-1",
  "awsRoleArn": "<role-arn>",
  "awsExternalId": "<external-id>",
  "awsStsRegion": "us-east-1"
}

GCS Parquet Examples

When importing or exporting Parquet files to GCS, you must set the OAuthAccessToken field to a token that has read/write permissions to the specified bucket. If you run the cURL command from an instance that has these permissions, you can embed the gcloud auth application-default print-access-token command into the cURL to get the token, as shown in the examples that follow.
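
As a variation, you can capture the token in a shell variable first and then interpolate it into the jdbcUrl. This is a minimal sketch, assuming the gcloud CLI is installed and authorized on the instance running the request:

# Fetch an access token from the instance's application-default credentials.
TOKEN=$(gcloud auth application-default print-access-token)
# Reference it in the jdbcUrl, for example: ...;OAuthAccessToken=$TOKEN;...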

GCS Parquet Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>=' -d "{
   \"queryConfig\": {
     \"jdbcUrl\": 
     \"jdbc:parquet:URI=gs://<bucket>/<path>;ProjectId=<project id>;
     OAuthAccessToken=$(gcloud auth application-default print-access-token);...\"
   },
   \"datasetName\": \"<dataset name>\",
   \"query\": \"select * from [<filename (no .parquet extension)>]\"
 }" 'http://<host>:9100/api/connect/jdbcIngest/ingest'

GCS Parquet File Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds YWRtaW46ZHQ=' -d "{
  \"unifyDatasetName\": \"<dataset name>\",
  \"queryConfig\": {
    \"jdbcUrl\": \"jdbc:parquet:URI=gs://<bucket>/<path>;ProjectId=tamr-gce-dev;
    OAuthAccessToken=$(gcloud auth application-default print-access-token);...\"
  },
  \"targetTableName\": \"<target parquet file name>\"
 }" 'http://<host>:9100/api/connect/urlExport/jdbc'

Azure SQL Export Example with Target Table Containing Spaces

{
  "unifyDatasetName": "employee names.txt",
  "queryConfig": {
    "jdbcUrl": "jdbc:sqlserver://connect.database.windows.net:1433;DatabaseName=test;EncryptionMethod=ssl;ValidateServerCertificate=0;",
    "dbPassword": "<password>", 
    "dbUsername": "<username>", 
    "fetchSize": 0
  },
  "truncateBeforeLoad": true,
  "targetTableName": "dbo.\"test period ai\"",
  "batchInsertSize": 10000,
  "exportDataConfig": {
    "mergedArrayValuesDelimiter": "|",
    "columnsExcludeRegex": "",
    "limitRecords": 0
  }
}

Delta Export Examples

You can export only the new or modified records of a Tamr Core dataset between two materialization versions. Use the same API endpoints as for a full export (to either file systems or databases), and in exportDataConfig set exportDelta to true, as shown in the example below.

Delta Export to RedShift for Latest Change

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbUsername": "<dbuser>",
    "dbPassword": "<dbpassword>",
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
  "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "exportDelta": true
  }
}

Delta Export to Redshift for Specific Versions

By default, when you set exportDelta to true, Core Connect attempts to export the set of new or modified records between the two latest versions of the specified dataset.

To export changes between two specific versions of the dataset, specify fromVersion and/or toVersion in deltaConfig, as shown below.

If the change between two specified versions or two latest versions of the requested dataset is not incremental (that is, the data is truncated and reloaded), the API call with exportDelta set to true returns an error.

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.site.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbUsername": "<dbusername>",
    "dbPassword": "<dbpassword>",
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
  "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "exportDelta": true,
    "deltaConfig": {
        "fromVersion": 100,
        "toVersion": 102
    }
  }
}

Retrieve Metadata Example

Some JDBC sources store metadata properties for datasets in the source.

To retrieve metadata for services or for the dataset (currently available only for Snowflake), you can set the following body keys to true:

  • retrieveConnectMetadata: Core Connect retrieves the metadata for services.
  • retrieveSourceMetadata: Core Connect retrieves the metadata for the dataset stored in the Snowflake source. All source metadata properties have the prefix connect.source.

The following example imports both service and dataset metadata from Snowflake.

{
  "datasetName": "snowflake_dataset_CUSTOMER",
  "metadataConfig": {
    "retrieveConnectMetadata": true,
    "retrieveSourceMetadata": true
  },
  "primaryKey": ["C_CUSTKEY"],
  "query": "select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER",
  "queryConfig": {
    "dbPassword": "<password>",
    "dbUsername": "<username>",
    "fetchSize": 0,
    "jdbcUrl": "jdbc:snowflake://rx83695.us-central1.gcp.snowflakecomputing.com/?db=SNOWFLAKE_SAMPLE_DATA&warehouse=COMPUTE_WH&schema=TPCH_SF1&application=tamr"
  }
}

Profiling Example

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
     "dbUsername": "<userName>", \ 
     "dbPassword": "<password>", \ 
     "fetchSize": 1 \ 
   }, \ 
   "queryTargetList": [ \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST", \ 
       "datasetName": "profile_a", \ 
       "primaryKey": [] \ 
     }, \ 
     { \ 
       "query": "SELECT * FROM tamr.ATTRIBUTE_MAPPINGS", \ 
       "datasetName": "profile_a", \ 
       "primaryKey": [] \ 
     } \ 
   ] \ 
 }' 'http://localhost:9100/api/jdbcIngest/profile'

This call uploads profiling information as records to the specified datasets.

Note: Tamr ignores any value you supply for primaryKey. This endpoint generates its own primary key by concatenating the table name and the column name.

The resulting dataset contains the following attributes based on the profile:

{"TableName", "Source Table Name"},
      {"ColumnName", "Column Name"},
      {"ColumnType", "Column Data Type"},
      {"DistinctValueCount", "Number of distinct values"},
      {"EmptyValueCount", "Number of empty values"},
      {"MaxValue", "Numerical or Date/Time Max"},
      {"MinValue", "Numerical or Date/Time Min"},
      {"MeanValue", "Numerical Mean"},
      {"StdDevValue", "Standard Deviation"},
      {"RecordCount", "Number of records"},
      {"TotalValueCount", "Number of values count values in array in case of multi-value delimiter"},
      {"Top100Values", "Top 100 most frequently occurring values"},
      {"Top100Frequencies", "Frequency of Top100Values in a string"},
      {"Top100Counts", "Absolute counts of Top100Values in a string"},
      {primaryKeyName, "Primary Key"}
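
For illustration only, a single profiling record might look like the following sketch. The values and the field subset are hypothetical, and the generated primary key is shown here as the table name and column name concatenated, as described in the note above; the exact separator and primary-key attribute name may differ in your environment.

{
  "TableName": "TAMR_TEST",
  "ColumnName": "FIRST_NAME",
  "ColumnType": "VARCHAR2",
  "DistinctValueCount": 87,
  "EmptyValueCount": 3,
  "RecordCount": 100,
  "<primaryKeyName>": "TAMR_TEST_FIRST_NAME"
}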