
Core Connect API Example Requests

Examples of Core Connect requests, including a jdbcUrl example for each JDBC driver.

Example jdbcUrl Values for Supported Drivers

For a list of the supported JDBC driver versions and links to related resources, see
Core Connect.

Amazon Redshift

jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb

Azure Synapse

jdbc:cdata:azuresynapse:user=<user_email_id>@<user_server>;password=<user_pass>;Server=<user_server>.database.windows.net;Database=<user_db>;

DataStax (DSE) Cassandra

jdbc:cassandra://;AuthMech=2;UID=issoadmin;PWD=issodt;SecureConnectionBundlePath=/tmp/secure-connect-af-isso.zip;TunableConsistency=6

Google BigQuery

jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;

Hive

jdbc:hive2://localhost:10000/default;?hive.resultset.use.unique.column.names=false

H2

jdbc:h2:mem:db;DB_CLOSE_DELAY=-1

Microsoft SQL Server / Azure SQL

jdbc:sqlserver://connect.database.windows.net:1433;DatabaseName=test;EncryptionMethod=ssl;ValidateServerCertificate=0;

MySQL

jdbc:mysql://localhost:3306/mydata?characterEncoding=utf8

Oracle RDBMS

jdbc:oracle:thin:@192.168.99.101:1521:orcl

Parquet

Examples of a jdbcUrl for each of the cloud storage providers follow.

ADLS2

jdbc:parquet:URI=abfss://[email protected]/parquet_test/alltypes_dictionary.parquet;AzureStorageAccount=tamrcoreconnect;AggregateFiles=true;AuthScheme=AzureServicePrincipal;AzureTenant={{AZURE_TENANT_ID}};OAuthGrantType=CLIENT;InitiateOAuth=GETANDREFRESH;OAuthClientId={{AZURE_CLIENT_ID}};OAuthClientSecret={{AZURE_CLIENT_SECRET}}

AWS

jdbc:parquet:URI=s3://my-datastore-dev/applications/tamr_data/;AWSAccessKey=<key-value>;AWSSecretKey=<secret-value>;AWSRegion=OHIO;InsertMode=overwrite;

GCS

jdbc:parquet:URI=gs://tamr-qe/cc-export/alltypes_dictionary/;ProjectId=tamr-gce-dev;OAuthAccessToken=...

PostgreSQL

jdbc:postgresql://procurify-demo.tamrdev.com:5432/myData

Presto

jdbc:presto://prod.company.net:8446/hive/default?SSL=true&SSLTrustStorePath=/data/temp/Prod_key_files/truststore.jks&SSLTrustStorePassword=<password>&SSLKeyStorePath=/data/temp/Prod_key_files/keystorepresto.jks&SSLKeyStorePassword=<password>

Salesforce

jdbc:salesforce:Security Token=2lNEamAqnX5sT0;UseSandbox=true

SAP HANA

jdbc:cdata:saphana:User=TAMR_TEST;Password=Welcome01;Server=96fe5bb4-1b88-4b3c-ac17-7f289f9d13fb.hana.prod-us10.hanacloud.ondemand.com;Port=443;UseSSL=True;

Snowflake

jdbc:snowflake://rx83695.us-central1.gcp.snowflakecomputing.com/?db=SNOWFLAKE_SAMPLE_DATA&schema=TPCH_SF1&warehouse=DEMO_WH&application=tamr

Teradata RDBMS

jdbc:teradata://199.166.99.102/CHARSET=UTF8,TMODE=ANSI

Tibco TDV

jdbc:compositesw:dbapi@exampleDomain?domain=composite&dataSource=MyData&encrypt=true

Jinja Replacement for Passwords

Jinja templating is part of the API request processing and can be used to specify environment variable names instead of sending password literals. See Using a Jinja Template.

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query":"SELECT * FROM tamr.TAMR_TEST", \ 
 "datasetName":"oracle_test", \ 
 "queryConfig":{ \ 
 "jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
 "dbUsername":"{{ CONNECT_DB_USERNAME }}", \ 
 "dbPassword":"{{ CONNECT_DB_PASSWORD }}", \ 
 "fetchSize":1 \ 
 } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest'

Single Dataset Query Import (Oracle jdbcUrl)

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query":"SELECT * FROM tamr.TAMR_TEST", \ 
 "datasetName":"oracle_test", \ 
 "queryConfig":{ \ 
 "jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
 "dbPassword": "<password>", \
 "dbUsername": "<username>", \
 "truncateTamrDataset": false,
 "fetchSize":1 \ 
 } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest'

Batch Dataset Query Import (Oracle jdbcUrl)

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
     "dbPassword": "<password>", \
     "dbUsername": "<username>", \
     "fetchSize": 0 \ 
   }, \ 
   "queryTargetList": [ \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST where rownum<2", \ 
       "datasetName": "oracle_test1", \ 
       "truncateTamrDataset": true,
       "primaryKey": [] \ 
     }, \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST where rownum>=2", \ 
       "datasetName": "oracle_test2", \ 
       "truncateTamrDataset": false,
       "primaryKey": [] \ 
     } \ 
   ] \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/batch'

Presto Import

A JDBC driver is provided for importing from Presto DB with SSL authentication.

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{ \ 
   "query": "select supplier_id, supplier_number, primary_address_state, primary_address_country_code, po_method, payment_method from reg_prod.supplier", \ 
   "datasetName": "suppliers", \ 
   "primaryKey": [ \ 
     "supplier_id" \ 
   ], \ 
   "truncateTamrDataset": false,
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:presto://prod.company.net:8446/hive/default?SSL=true&SSLTrustStorePath=/data/temp/Prod_key_files/truststore.jks&SSLTrustStorePassword=<password>&SSLKeyStorePath=/data/temp/Prod_key_files/keystorepresto.jks&SSLKeyStorePassword=<password>", \      
     "dbUsername": "<kerberos_user_name>", \
     "dbPassword": "<password>", \
     "fetchSize": 10000 \ 
   } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Google BigQuery Examples

The BigQuery JDBC driver is provided by CData. See the full CData BigQuery documentation.

In the examples below, authentication is performed through the instance account. Other methods of authentication are available. See Authenticating to Google BigQuery and Using OAuth Authentication.

Google BigQuery Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query": "SELECT * from my_target_table limit 100000000;",  \ 
 "datasetName": "example_source_dataset",  \ 
 "primaryKey": ["record_id"],  \ 
 "queryConfig": { "jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;"} \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Google BigQuery Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "createTable": true, \ 
 "exportDataConfig": { \ 
     "mergedArrayValuesDelimiter": "|", \ 
     "renameFields":{"col-with-dashes or with spaces": "a_valid_name"} \ 
 }, \ 
 "queryConfig": { \ 
     "jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;" \ 
 }, \ 
 "unifyDatasetName": "example_unified_dataset" \ 
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

For more JDBC examples, see Example jdbcUrl Values for Supported Drivers.

Salesforce Examples

The Salesforce JDBC driver is provided by CData. See the CData documentation for authentication options and connection string options.

If your organization has customized the Salesforce login URL, set the LoginURL property in the jdbcUrl request body key.
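
For example, a jdbcUrl that includes a custom login URL might resemble the following sketch; the domain, security token, and sandbox setting are placeholders, not values from your organization:

jdbc:salesforce:LoginURL=https://<your_custom_domain>.my.salesforce.com;Security Token=<security_token>;UseSandbox=false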

Salesforce Import

{
  "datasetName": "ingest_salesforce_example",
  "primaryKey": [ "Id" ],
  "query": "SELECT * FROM Tamr_Customer__c",
  "queryConfig": {
    "dbPassword": "<password>", \
    "dbUsername": "<username>", \
    "jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
  }
}

Salesforce Export

{
  "createTable": false,
  "truncateBeforeLoad": true,
  "exportDataConfig": {},
  "queryConfig": {
    "dbPassword": "<password>",
    "dbUsername": "<username>",
    "jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
  },
  "targetTableName": "Tamr_Customer__c",
  "unifyDatasetName": "GoldenAccounts_golden_records"
}

Local Filesystem Parquet Examples

Local Filesystem Parquet Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/<local file path>;..."
   },
   "datasetName": "<dataset name>",
   "query": "select * from [<filename (no .parquet extension)>]"
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Local Filesystem Parquet Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/<local file path>"
   },
   "unifyDatasetName": "<dataset name>",
   "targetTableName": "<parquet file name>"
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

Parquet Multi-file Examples

To import multiple Parquet files from a given directory, add the AggregateFiles=true specification to the queryConfig/jdbcUrl parameter and set the query to "query": "select * from [folder_name]", where “folder_name” is replaced with the folder containing the Parquet files you would like to upload. Learn more about the CData JDBC Driver for Parquet AggregateFiles option.

To export multiple Parquet files, specify the maximum number of records in a file with batchInsertSize and, in the queryConfig/jdbcUrl input parameter, define InsertMode=batch. Each resulting filename starts with a timestamp and ends with a part number. Learn more about the CData JDBC Driver for Parquet InsertMode option.

To show these changes in context, examples of multi-file calls to a local file system follow.

Local File System Multi-file Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/full/path/to/my-target-folder;AggregateFiles=true;"
   },
   "datasetName": "my_tamr_dataset",
   "query": "select * from [my-target-folder]"
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Local File System Multi-file Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <creds>' -d '{
   "queryConfig": {
     "jdbcUrl": "jdbc:parquet:URI=/full/path/to/my-target-folder;InsertMode=batch;BatchSize=10000;"
   },
   "unifyDatasetName": "my_tamr_dataset",
   "targetTableName": "cleaned_data"
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

For more JDBC examples, see Example jdbcUrl Values for Supported Drivers.

Server Local Filesystem CSV Example

The following example imports a delimited file from the server filesystem. It also includes specifications for additional formatting characters in the source file. The base for the file URL is relative to the path specified by the TAMR_CONNECT_FILE_DROP_PATH configuration variable.

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
   "resourceConfig": { \ 
     "columnDelimiter": ",", \ 
     "quoteCharacter": "\"", \ 
     "recordSeparator": "\n", \
     "multiValueDelimiter": "|", \
     "escapeCharacter": "\\" \ 
   }, \ 
   "url": "file:///example_delimited.csv", \ 
   "datasetName": "example_delimited.csv" \ 
 }' 'http://localhost:9050/api/urlIngest/serverfs/delimited'

For more information about specifying formatting characters, see the keys for the resourceConfig object.

ADLS2 CSV Examples

ADLS2 CSV Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{ 
   "url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \ 
   "accountName": "adls2store", \
   "datasetName": "myDelimitedDataset", \ 
   "primaryKey": [], \ 
   "clientId": "d08d0a03-40ce-4a7b-9367-3000000002", \ 
   "clientSecret": "qIKeyKeyKeywzmqepcxlcxtiY=", \ 
   "tenantId": "8d0a03-40ce-4a7b-9367-3000000002", \
 }' 'http://localhost:9100/api/connect/urlIngest/adls2/delimited'

ADLS2 CSV Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{ 
   "url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \ 
   "accountName": "adls2store", \
   "datasetName": "myDelimitedDataset", \ 
   "primaryKey": [], \ 
   "clientId": "<clientId>", \ 
   "clientSecret": "<clientSecret>", \ 
   "tenantId": "<tenantId>", \
   "sinkConfig": { \
    "mergedArrayValuesDelimiter": "", \
    "columnsExcludeRegex": "", \
    "renameFields": {}, \
    "mergeArrayValuesEnabled": false, \
    "limitRecords": 0 \
   } \ 
 }' 'http://localhost:9100/api/connect/urlExport/adls2/delimited'

ADLS2 Parquet Examples

ADLS2 Parquet Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <Base64EncodedPassword>=' -d '{
  "queryConfig": {
    "jdbcUrl": 
"jdbc:parquet:URI=abfss://[email protected]/parquet_test/alltypes_dictionary.parquet;AzureStorageAccount=tamrcoreconnect;AggregateFiles=true;AuthScheme=AzureServicePrincipal;AzureTenant={{AZURE_TENANT_ID}};OAuthGrantType=CLIENT;InitiateOAuth=GETANDREFRESH;OAuthClientId={{AZURE_CLIENT_ID}};OAuthClientSecret={{AZURE_CLIENT_SECRET}}"
  },
  "datasetName": "alltypes_dictionary-azure",
  "query": "select * from alltypes_dictionary"
}' 'http://localhost:9100/api/connect/jdbcIngest/ingest' | jq

ADLS2 Parquet Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <Base64EncodedPassword>=' -d '{
  "unifyDatasetName": "alltypes_dictionary-azure",
 "queryConfig": {
    "jdbcUrl": 
"jdbc:parquet:URI=abfss://[email protected]/test-export;AzureStorageAccount=tamrcoreconnect;AggregateFiles=true;AuthScheme=AzureServicePrincipal;AzureTenant={{AZURE_TENANT_ID}};OAuthGrantType=CLIENT;InitiateOAuth=GETANDREFRESH;OAuthClientId={{AZURE_CLIENT_ID}};OAuthClientSecret={{AZURE_CLIENT_SECRET}}"
 },
 "targetTableName": "alltypes_dictionary"
}' 'http://localhost:9100/api/connect/urlExport/jdbc' | jq

ADLS2 Multi-file Examples

ADLS2 Avro Multi-file Import

curl -X POST 'http://localhost:9050/api/urlIngest/adls2/avro?async=true' \
-H 'Content-Type: application/json;charset=utf-8' \
-H 'Authorization: BasicCreds <EncodedCredentials>' \
-d '{"url":"adls2://tamrcoreconnect.blob.core.windows.net/coreconnect/core-connect-unit-tests/small-avro/same-schema/",
"primaryKey":["DoesNotExist"],
"resourceConfig":{},
"datasetName":"ingest-adls2-avro"}'

ADLS2 Avro Multi-file Export

curl -X POST 'http://localhost:9050/api/urlExport/adls2/avro?async=true' \
-H 'Content-Type: application/json;charset=utf-8' \
-H 'Authorization: BasicCreds <EncodedCredentials>' \
-d '{"url":"adls2://tamrcoreconnect.blob.core.windows.net/coreconnect/test-export/export-adls2.avro",
"sinkConfig":{},"resourceConfig":{},"datasetName":"test"}'

Redshift Export Example

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbPassword": "<password>", 
    "dbUsername": "<username>", 
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
 "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "mergedArrayValuesDelimiter": "||",
    "limitRecords": 0,
    "columnsExcludeRegex": "(last|first|middle).*"
  }
}

S3 Examples

S3 Parquet Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <Base64EncodedPassword>=' -d '{
  "queryConfig": {
    "jdbcUrl": "jdbc:parquet:URI=s3://tamr-core-connect-test/parquet-variety/alltypes_dictionary.parquet;AuthScheme=AwsCredentialsFile"
  },
  "datasetName": "alltypes_dictionary-s3",
  "query": "select * from alltypes_dictionary"
}' 'http://localhost:9100/api/connect/jdbcIngest/ingest' | jq

S3 Parquet Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -H 'Authorization: BasicCreds <Base64EncodedPassword>=' -d '{
  "unifyDatasetName": "alltypes_dictionary-s3",
 "queryConfig": {
    "jdbcUrl": 
"jdbc:parquet:URI=s3://tamr-core-connect-test/export/;AuthScheme=AwsCredentialsFile"
 },
 "targetTableName": "alltypes_dictionary"
}' 'http://localhost:9100/api/connect/urlExport/jdbc' | jq

S3 JSON Export Example

{
  "url": "s3://com-tamr-test-users/connect/na_ud_102.json",
  "datasetName": "Nasdaq Categorization_unified_dataset",
  "accessKey": "<secret>",
  "secretKey": "<key>",
  "awsRegion": "us-east-1",
  "awsRoleArn": "<role-arn>",
  "awsExternalId": "<external-id>",
  "awsStsRegion": "us-east-1"
}

GCS Parquet Examples

When importing or exporting Parquet files to GCS, you must set the OAuthAccessToken field to a token that has read/write permissions to the specified bucket. If you run the cURL command from an instance that has these permissions, you can embed the gcloud auth application-default print-access-token command into the cURL to get the token, as shown in the examples that follow.
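
For example, one way to embed the command is with shell command substitution inside the double-quoted request body. This is a sketch only; the bucket, path, and project ID are placeholders:

\"jdbcUrl\": \"jdbc:parquet:URI=gs://<bucket>/<path>/;ProjectId=<project_id>;OAuthAccessToken=$(gcloud auth application-default print-access-token)\"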

GCS Parquet Import

curl -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'Authorization: BasicCreds <Base64EncodedPassword>=' -d "{
  \"queryConfig\": {
    \"jdbcUrl\": 
\"jdbc:parquet:URI=gs://tamr-qe/parquet-files-testing/data/alltypes_dictionary.parquet;ProjectId=tamr-gce-dev;OAuthAccessToken=..."
  },
  \"datasetName\": \"alltypes_dictionary\",
  \"query\": \"select * from alltypes_dictionary\"
}" 'http://localhost:9100/api/connect/jdbcIngest/ingest' | jq

GCS Parquet File Export

curl -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'Authorization: BasicCreds <Base64EncodedPassword>=' -d "{
 \"unifyDatasetName\": \"alltypes_dictionary\",
 \"queryConfig\": {
    \"jdbcUrl\": \"jdbc:parquet:URI=gs://tamr-qe/cc-export/alltypes_dictionary/;ProjectId=tamr-gce-dev;OAuthAccessToken=..."
 },
 \"targetTableName\": \"alltypes_dictionary\"
}" 'http://localhost:9100/api/connect/urlExport/jdbc?async=true' | jq

Azure SQL Export Example with Target Table Containing Spaces

{
  "unifyDatasetName": "employee names.txt",
  "queryConfig": {
    "jdbcUrl": "jdbc:sqlserver://connect.database.windows.net:1433;DatabaseName=test;EncryptionMethod=ssl;ValidateServerCertificate=0;",
    "dbPassword": "<password>", 
    "dbUsername": "<username>", 
    "fetchSize": 0
  },
  "truncateBeforeLoad": true,
  "targetTableName": "dbo.\"test period ai\"",
  "batchInsertSize": 10000,
  "exportDataConfig": {
    "mergedArrayValuesDelimiter": "|",
    "columnsExcludeRegex": "",
    "limitRecords": 0
  }
}

Delta Export Examples

You can export the set of new or modified records of a Tamr Core dataset between two materialization versions. Using the same API endpoints as for a full export (to either a file system or a database), set exportDelta to true in exportDataConfig, as shown in the example below.

Delta Export to RedShift for Latest Change

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbUsername": "<dbuser>",
    "dbPassword": "<dbpassword>",
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
  "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "exportDelta": true
  }
}

Delta Export to Redshift for Specific Versions

By default, when you set exportDelta to true, Core Connect attempts to export the set of new or modified records between the two latest versions of the specified dataset.

For changes between two versions of the dataset, specify fromVersion and/or toVersion in deltaConfig as shown below.

If the change between the two specified versions, or between the two latest versions, of the requested dataset is not incremental (that is, the data was truncated and reloaded), the API call with exportDelta set to true returns an error.

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.site.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbUsername": "<dbusername>",
    "dbPassword": "<dbpassword>",
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
  "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "exportDelta": true,
    "deltaConfig": {
        "fromVersion": 100,
        "toVersion": 102
    }
  }
}

Retrieve Metadata Example

Some JDBC sources store metadata properties for datasets in the source.

To retrieve metadata for services or for the dataset (currently available only for Snowflake), set the following body keys to true:

  • retrieveConnectMetadata: Core Connect retrieves the metadata for services.
  • retrieveSourceMetadata: Core Connect retrieves the metadata for the dataset stored in the Snowflake source. All source metadata properties have the prefix connect.source.

The following example imports both service and dataset metadata from Snowflake.

{
  "datasetName": "snowflake_dataset_CUSTOMER",
  "metadataConfig": {
    "retrieveConnectMetadata": true,
    "retrieveSourceMetadata": true
  },
  "primaryKey": ["C_CUSTKEY"],
  "query": "select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER",
  "queryConfig": {
    "dbPassword": "<password>",
    "dbUsername": "<username>",
    "fetchSize": 0,
    "jdbcUrl": "jdbc:snowflake://rx83695.us-central1.gcp.snowflakecomputing.com/?db=SNOWFLAKE_SAMPLE_DATA&warehouse=COMPUTE_WH&schema=TPCH_SF1&application=tamr"
  }
}

Profiling Example

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
     "dbUsername": "<userName>", \ 
     "dbPassword": "<password>", \ 
     "fetchSize": 1 \ 
   }, \ 
   "queryTargetList": [ \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST", \ 
       "datasetName": "profile_a", \ 
       "primaryKey": [] \ 
     }, \ 
     { \ 
       "query": "SELECT * FROM tamr.ATTRIBUTE_MAPPINGS", \ 
       "datasetName": "profile_a", \ 
       "primaryKey": [] \ 
     } \ 
   ] \ 
 }' 'http://localhost:9100/api/jdbcIngest/profile'

This call uploads profiling information as records to the specified datasets.

Note: Tamr ignores any value you supply for primaryKey. This endpoint generates its own primary key by concatenating the table name and the column name.

The resulting dataset contains the following attributes based on the profile:

{"TableName", "Source Table Name"},
      {"ColumnName", "Column Name"},
      {"ColumnType", "Column Data Type"},
      {"DistinctValueCount", "Number of distinct values"},
      {"EmptyValueCount", "Number of empty values"},
      {"MaxValue", "Numerical or Date/Time Max"},
      {"MinValue", "Numerical or Date/Time Min"},
      {"MeanValue", "Numerical Mean"},
      {"StdDevValue", "Standard Deviation"},
      {"RecordCount", "Number of records"},
      {"TotalValueCount", "Number of values count values in array in case of multi-value delimiter"},
      {"Top100Values", "Top 100 most frequently occurring values"},
      {"Top100Frequencies", "Frequency of Top100Values in a string"},
      {"Top100Counts", "Absolute counts of Top100Values in a string"},
      {primaryKeyName, "Primary Key"}
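
As a rough illustration only, a single profiling record for one column might look like the following sketch; all values are invented placeholders, only a subset of the attributes is shown, and the actual value types and primary key format may differ:

{
  "TableName": "TAMR_TEST",
  "ColumnName": "FIRST_NAME",
  "ColumnType": "VARCHAR2",
  "DistinctValueCount": 1042,
  "EmptyValueCount": 17,
  "RecordCount": 1200,
  "Top100Values": "...",
  "<primaryKeyName>": "TAMR_TEST.FIRST_NAME"
}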