Tamr Documentation

Batch Operation of a Mastering Project

Run Tamr in batch mode from dataset truncate through mastering and exporting.

API calls to run a mastering project in batch operation (truncate and load mode).

Batch Operation

  1. Truncate Dataset: DELETE /v1/datasets/{datasetId}/records
    Truncate the records of the dataset {datasetId}. This removes all existing records from the dataset.
  2. Update Dataset: POST /v1/datasets/{datasetId}:updateRecords
    Update the records of the dataset {datasetId} using the CREATE command. Because Step 1 emptied the dataset, every record in this step is effectively an insert; no updates occur.
  3. Update Unified Dataset: POST /v1/projects/{project}/unifiedDataset:refresh
    Update the unified dataset using its associated project's ID. Additionally, capture the id of the operation from the response.
  4. Wait For Operation: GET /v1/operations/{operationId}
    Poll the status of the operation submitted in Step 3 using the captured id until state=SUCCEEDED is received.
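Steps 1-4 can be sketched as a small script. This is a minimal illustration, not a supported client: the base URL, credentials, dataset and project IDs, and the assumption that each record carries its primary key in an "id" field are all placeholders you must replace for your deployment.

```python
import json
import time
import urllib.request

# Hypothetical deployment details -- substitute your own host, credentials,
# dataset ID, and project ID.
BASE = "http://localhost:9100/api/versioned/v1"
HEADERS = {
    "Authorization": "BasicCreds <base64-encoded user:password>",
    "Content-Type": "application/json",
}

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}

def call(method, path, body=None):
    """Issue one versioned-API call and decode the JSON response, if any."""
    req = urllib.request.Request(
        BASE + path,
        method=method,
        headers=HEADERS,
        data=body.encode("utf-8") if body is not None else None,
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None

def create_commands(records):
    """Serialize records as newline-delimited CREATE commands for Step 2.

    Because Step 1 empties the dataset, every command is effectively an
    insert. Assumes each record's primary key is in an "id" field.
    """
    return "\n".join(
        json.dumps({"action": "CREATE", "recordId": r["id"], "record": r})
        for r in records
    )

def wait_for_operation(op_id, interval=5):
    """Step 4: poll /operations/{operationId} until a terminal state."""
    while True:
        state = call("GET", f"/operations/{op_id}")["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)

def truncate_and_load(dataset_id, project_id, records):
    call("DELETE", f"/datasets/{dataset_id}/records")              # Step 1
    call("POST", f"/datasets/{dataset_id}:updateRecords",          # Step 2
         body=create_commands(records))
    op = call("POST",                                              # Step 3
              f"/projects/{project_id}/unifiedDataset:refresh")
    return wait_for_operation(op["id"])                            # Step 4
```

For example, `truncate_and_load("5", "1", [{"id": "r1", "name": "Acme"}])` replaces the dataset's contents with a single record and blocks until the unified dataset refresh finishes.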

If the machine learning (ML) configuration has been edited (see Configuring Inclusion in Machine Learning), proceed to Step 5; otherwise, skip to Step 7.

  5. Generate Record Pairs, using an internal API: POST /v1/projects/{project}/recordPairs:refresh
    Generate record pairs using the pairs {recipeId} and the operation keyword pairs. Additionally, capture the id of the submitted job from the response.
  6. Wait For Operation: GET /v1/operations/{operationId}
    Poll the status of the job submitted in Step 5 using the captured id until state=SUCCEEDED is received.
  7. Update Results, using an internal API (contact your Tamr representative for assistance): POST /v1/projects/{project}/recordClustersWithData:refresh
    Apply the entity resolution model using its {recipeId} and the operation keyword trainPredictCluster. Additionally, capture the id of the submitted job from the response.
  8. Wait For Operation: GET /v1/operations/{operationId}
    Poll the status of the job submitted in Step 7 using the captured id until state=SUCCEEDED is received.
  9. Materialize Clusters Dataset: POST /export
    Materialize the clusters dataset {datasetId} to a specified export configuration and data format, and capture the id of the submitted job from the response.
  10. Fetch UUID: GET /export/{exportId}
    Using the id captured in Step 9, fetch the job details and capture the uuid.
  11. Wait For Operation: GET /export/read/{exportId}/
    Poll the status of the job submitted in Step 9 using the captured id until state=SUCCEEDED or state=FAILED is received.
  12. Export Dataset: GET /export/read/dataset/{datasetId}
    Export the materialized clusters dataset {datasetId}.
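Steps 5-12 can be sketched the same way. Treat this strictly as a sketch: the recordPairs and recordClustersWithData endpoints are internal, the /export endpoints are unversioned and deployment-specific, and the request body for POST /export as well as the "uuid" and "state" fields read from its responses are assumptions, not a documented contract.

```python
import json
import time
import urllib.request

# Hypothetical deployment details -- substitute your own values.
HOST = "http://localhost:9100"
V1 = HOST + "/api/versioned/v1"
HEADERS = {
    "Authorization": "BasicCreds <base64-encoded user:password>",
    "Content-Type": "application/json",
}

def call(method, url, body=None):
    """Issue one HTTP call and decode the JSON response, if any."""
    req = urllib.request.Request(url, method=method, headers=HEADERS,
                                 data=body.encode("utf-8") if body else None)
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None

def is_terminal(state):
    """True once a polled job can no longer change state."""
    return state in {"SUCCEEDED", "FAILED", "CANCELED"}

def poll(url, interval=10):
    """Poll a job resource until its state is terminal (Steps 6, 8, 11)."""
    while True:
        state = call("GET", url)["state"]
        if is_terminal(state):
            return state
        time.sleep(interval)

def master_and_export(project_id, dataset_id, export_config):
    # Step 5: regenerate record pairs (internal API).
    op = call("POST", f"{V1}/projects/{project_id}/recordPairs:refresh")
    # Step 6: wait for pair generation to finish.
    poll(f"{V1}/operations/{op['id']}")
    # Step 7: apply the model and rebuild clusters (internal API).
    op = call("POST", f"{V1}/projects/{project_id}/recordClustersWithData:refresh")
    # Step 8: wait for clustering to finish.
    poll(f"{V1}/operations/{op['id']}")
    # Step 9: materialize the clusters dataset; export_config's shape is
    # deployment-specific and hypothetical here.
    job = call("POST", f"{HOST}/export", body=json.dumps(export_config))
    # Step 10: fetch the job details and capture the uuid.
    uuid = call("GET", f"{HOST}/export/{job['id']}")["uuid"]
    # Step 11: wait for the export to reach SUCCEEDED or FAILED.
    state = poll(f"{HOST}/export/read/{job['id']}/")
    # Step 12: stream the materialized clusters dataset.
    if state == "SUCCEEDED":
        return call("GET", f"{HOST}/export/read/dataset/{dataset_id}")
    raise RuntimeError(f"export {uuid} ended in state {state}")
```

Skipping Steps 5-6 when the ML configuration is unchanged amounts to omitting the two recordPairs calls and starting at the recordClustersWithData:refresh request.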
