User GuidesAPI ReferenceRelease NotesEnrichment APIs
Doc HomeSupportLog In

Core Connect API Example Requests

Examples of requests for the Core Connect API.

Single Dataset Query Import (Oracle jdbc url)

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query":"SELECT * FROM tamr.TAMR_TEST", \ 
 "datasetName":"oracle_test", \ 
 "queryConfig":{ \ 
 "jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
 "dbPassword": "<password>", \
 "dbUsername": "<username>", \
 "truncateTamrDataset": false,
 "fetchSize":1 \ 
 } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest'

Jinja Replacement for Passwords

Jinja templating is part of the API request processing and can be used to specify environment variable names instead of sending password literals. See Using a Jinja Template.

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query":"SELECT * FROM tamr.TAMR_TEST", \ 
 "datasetName":"oracle_test", \ 
 "queryConfig":{ \ 
 "jdbcUrl":"jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
 "dbUsername":"{{ CONNECT_DB_USERNAME }}", \ 
 "dbPassword":"{{ CONNECT_DB_PASSWORD }}", \ 
 "fetchSize":1 \ 
 } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest'

Batch Dataset Query Import (Oracle jdbc url)

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
     "dbPassword": "<password>", \
     "dbUsername": "<username>", \
     "fetchSize": 0 \ 
   }, \ 
   "queryTargetList": [ \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST where rownum<2", \ 
       "datasetName": "oracle_test1", \ 
       "truncateTamrDataset": true,
       "primaryKey": [] \ 
     }, \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST where rownum>=2", \ 
       "datasetName": "oracle_test2", \ 
       "truncateTamrDataset": false,
       "primaryKey": [] \ 
     } \ 
   ] \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/batch'

Presto Import

JDBC driver added for import from Presto DB with SSL authentication.

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{ \ 
   "query": "select supplier_id, supplier_number, primary_address_state, primary_address_country_code, po_method, payment_method from reg_prod.supplier", \ 
   "datasetName": "suppliers", \ 
   "primaryKey": [ \ 
     "supplier_id" \ 
   ], \ 
   "truncateTamrDataset": false,
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:presto://prod.company.net:8446/hive/default?SSL=true&SSLTrustStorePath=/data/temp/Prod_key_files/truststore.jks&SSLTrustStorePassword=<password>&SSLKeyStorePath=/data/temp/Prod_key_files/keystorepresto.jks&SSLKeyStorePassword=<password>", \      
     "dbUsername": "<kerberos_user_name>", \
     "dbPassword": "<password>", \
     "fetchSize": 10000 \ 
   } \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Google BigQuery Examples

The BigQuery driver is provided by CData. See the full CData BigQuery documentation.

In the examples below, authentication is performed through the instance account. Other methods of authentication are available. See Authenticating to Google BigQuery and Using OAuth Authentication.

Google BigQuery Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "query": "SELECT * from my_target_table limit 100000000;",  \ 
 "datasetName": "example_source_dataset",  \ 
 "primaryKey": ["record_id"],  \ 
 "queryConfig": { "jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;"} \ 
 }' 'http://localhost:9100/api/connect/jdbcIngest/ingest'

Google BigQuery Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
 "createTable": true, \ 
 "exportDataConfig": { \ 
     "mergedArrayValuesDelimiter": "|", \ 
     "renameFields":{"col-with-dashes or with spaces": "a_valid_name"} \ 
 }, \ 
 "queryConfig": { \ 
     "jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=tamr-staging;" \ 
 }, \ 
 "unifyDatasetName": "example_unified_dataset" \ 
 }' 'http://localhost:9100/api/connect/urlExport/jdbc'

Salesforce Examples

The Salesforce driver provided by CData. See the CData documentation for authentication options and connection string options.

If your organization has customized the Salesforce login URL, you need to set the LoginURL in the jdbcUrl request body key.

Salesforce Import

{
  "datasetName": "ingest_salesforce_example",
  "primaryKey": [ "Id" ],
  "query": "SELECT * FROM Tamr_Customer__c",
  "queryConfig": {
    "dbPassword": "<password>", \
    "dbUsername": "<username>", \
    "jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
  }
}

Salesforce Export

{
  "createTable": false,
  "truncateBeforeLoad": true,
  "exportDataConfig": {},
  "queryConfig": {
    "dbPassword": "<password>",
    "dbUsername": "<username>",
    "jdbcUrl": "jdbc:salesforce:Security Token=SecureToken;UseSandbox=true"
  },
  "targetTableName": "Tamr_Customer__c",
  "unifyDatasetName": "GoldenAccounts_golden_records"
}

ALDS2 CSV Examples

ALDS2 CSV Import

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{ 
   "url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \ 
   "accountName": "adls2store", \
   "datasetName": "myDelimitedDataset", \ 
   "primaryKey": [], \ 
   "clientId": "d08d0a03-40ce-4a7b-9367-3000000002", \ 
   "clientSecret": "qIKeyKeyKeywzmqepcxlcxtiY=", \ 
   "tenantId": "8d0a03-40ce-4a7b-9367-3000000002", \
 }' 'http://localhost:9100/api/connect/urlIngest/adls2/delimited'

ADLS2 CSV Export

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{ 
   "url": "adls2://adls2store.blob.core.windows.net/con1/dir1/subDir1/cars.csv", \ 
   "accountName": "adls2store", \
   "datasetName": "myDelimitedDataset", \ 
   "primaryKey": [], \ 
   "clientId": "<clientId>", \ 
   "clientSecret": "<clientSecret>", \ 
   "tenantId": "<tenantId>", \
   "sinkConfig": { \
    "mergedArrayValuesDelimiter": "", \
    "columnsExcludeRegex": "", \
    "renameFields": {}, \
    "mergeArrayValuesEnabled": false, \
    "limitRecords": 0 \
   } \ 
 }' 'http://localhost:9100/api/connect/urlExport/adls2/delimited'

Redshift Export Example

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbPassword": "<password>", 
    "dbUsername": "<username>", 
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
 "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "mergedArrayValuesDelimiter": "||",
    "limitRecords": 0,
    "columnsExcludeRegex": "(last|first|middle).*"
  }
}

Azure SQL Export Example with Target Table Containing Spaces

{
  "unifyDatasetName": "employee names.txt",
  "queryConfig": {
    "jdbcUrl": "jdbc:sqlserver://connect.database.windows.net:1433;DatabaseName=test;EncryptionMethod=ssl;ValidateServerCertificate=0;",
    "dbPassword": "<password>", 
    "dbUsername": "<username>", 
    "fetchSize": 0
  },
  "truncateBeforeLoad": true,
  "targetTableName": "dbo.\"test period ai\"",
  "batchInsertSize": 10000,
  "exportDataConfig": {
    "mergedArrayValuesDelimiter": "|",
    "columnsExcludeRegex": "",
    "limitRecords": 0
  }
}

Delta Export Examples

You can choose to export a set of new or modified records of a Tamr Core dataset between two materialization versions. Using the same API endpoints for full export (either to file systems or database), in exportDeltaConfig set exportDelta to true, as shown in the example below).

Delta Export to RedShift for Latest Change

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.cyap0dqxp5xb.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbUsername": "<dbuser>",
    "dbPassword": "<dbpassword>",
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
  "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "exportDelta": true
  }
}

Delta Export to Redshift for Specific Versions

By default, when you set exportDelta to true, Core Connect attempts to export the set of new or modified records between the two latest versions of the specified dataset.

For changes between two versions of the dataset, specify fromVersion and/or toVersion in deltaConfig as shown below.

If the change between two specified versions or two latest versions of the requested dataset is not incremental (that is, the data is truncated and reloaded), the API call with exportDelta set to true returns an error.

{
  "unifyDatasetName": "lb_lineage.csv",
  "queryConfig": {
    "jdbcUrl": "jdbc:redshift://field-testing.site.us-east-1.redshift.amazonaws.com:5439/mydb",
    "dbUsername": "<dbusername>",
    "dbPassword": "<dbpassword>",
    "fetchSize": 0
  },
  "targetTableName":"staging.output",
  "batchInsertSize": 10000,
  "truncateBeforeLoad": false,
  "exportDataConfig": {
    "exportDelta": true,
    "deltaConfig": {
        "fromVersion": 100,
        "toVersion": 102
  }
}

Retrieve Metadata Example

Some JDBC sources store metadata properties for datasets in the source.

To retrieve metadata for services or for the dataset (currently available only for Snowflake) you can set the following body keys to true:

  • retrieveConnectMetadata: Core Connect retrieves the metadata for services.
  • retrieveSourceMetadata: Core Connect retrieves the metadata for the dataset stored in the Snowflake source. All source metadata properties have the prefix connect.source.

The following example imports both service and dataset metadata from Snowflake.

{
  "datasetName": "snowflake_dataset_CUSTOMER",
  "metadataConfig": {
    "retrieveConnectMetadata": true,
    "retrieveSourceMetadata": true
  },
  "primaryKey": ["C_CUSTKEY"],
  "query": "select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER",
  "queryConfig": {
    "dbPassword": "<password>",
    "dbUsername": "<username>",
    "fetchSize": 0,
    "jdbcUrl": "jdbc:snowflake://rx83695.us-central1.gcp.snowflakecomputing.com/?db=SNOWFLAKE_SAMPLE_DATA&warehouse=COMPUTE_WH&schema=TPCH_SF1&application=tamr"
  }
}

Profiling Example

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: BasicCreds <Base64EncodedPassword>' -d '{ \ 
   "queryConfig": { \ 
     "jdbcUrl": "jdbc:oracle:thin:@192.168.99.101:1521:orcl", \ 
     "dbUsername": "<userName>", \ 
     "dbPassword": "<password>", \ 
     "fetchSize": 1 \ 
   }, \ 
   "queryTargetList": [ \ 
     { \ 
       "query": "SELECT * FROM tamr.TAMR_TEST", \ 
       "datasetName": "profile_a", \ 
       "primaryKey": [] \ 
     }, \ 
     { \ 
       "query": "SELECT * FROM tamr.ATTRIBUTE_MAPPINGS", \ 
       "datasetName": "profile_a", \ 
       "primaryKey": [] \ 
     } \ 
  \ 
   ] \ 
 }' 'http://localhost:9100/api/jdbcIngest/profile'

This call uploads profiling information as records to the specified datasets.

Note: Tamr ignores any value you supply for primaryKey. This endpoint generates its own primary key by concatenating the table name and the column name.

The resulting dataset contains the following attributes based on the profile:

{"TableName", "Source Table Name"},
      {"ColumnName", "Column Name"},
      {"ColumnType", "Column Data Type"},
      {"DistinctValueCount", "Number of distinct values"},
      {"EmptyValueCount", "Number of empty values"},
      {"MaxValue", "Numerical or Date/Time Max"},
      {"MinValue", "Numerical or Date/Time Min"},
      {"MeanValue", "Numerical Mean"},
      {"StdDevValue", "Standard Deviation"},
      {"RecordCount", "Number of records"},
      {"TotalValueCount", "Number of values count values in array in case of multi-value delimiter"},
      {"Top100Values", "Top 100 most frequently occurring values"},
      {"Top100Frequencies", "Frequency of Top100Values in a string"},
      {"Top100Counts", "Absolute counts of Top100Values in a string"},
      {primaryKeyName, "Primary Key"}