Batch Operation of a Mastering Project

Run Tamr in batch mode from dataset truncate through mastering and exporting.

API calls to run a mastering project in batch operation (truncate-and-load mode).

Batch Operation

  1. Truncate Dataset: POST /dataset/datasets/{name}/truncate
    Truncate the records of the dataset {name}. This removes all existing records from the dataset.
  2. Update Dataset: POST /dataset/datasets/{name}/update
    Update the records of the dataset {name} using the command CREATE. Because of Step 1, all records in this step are effectively inserted (that is, no updates occur).
  3. Update Unified Dataset: POST /recipe/recipes/{recipeId}/run/records
    Update the unified dataset using its {recipeId} and the operation keyword records. Additionally, capture the uuid of the submitted task from the response.
  4. Wait For Task: GET /dataset/tasks/uuid/{uuid}
    Poll the storage status of the task submitted in Step 3 using the captured uuid until storageStatus=COMMITTED is received.

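Steps 4, 7, 9, and 12 below all poll the same task endpoint. The polling loop can be sketched in Python as follows; the base URL is an illustrative assumption (substitute your deployment's address and add whatever authentication your instance requires), and the `fetch_status` hook exists only so the loop can be exercised without a live server.

```python
import json
import time
import urllib.request

# Hypothetical base URL; replace with your Tamr deployment's address.
BASE_URL = "http://localhost:9100/api"

def get_storage_status(uuid):
    """GET /dataset/tasks/uuid/{uuid} and return the storageStatus field."""
    with urllib.request.urlopen(f"{BASE_URL}/dataset/tasks/uuid/{uuid}") as resp:
        return json.load(resp)["storageStatus"]

def wait_for_task(uuid, poll_seconds=5, fetch_status=get_storage_status):
    """Poll the task's storage status until it reaches COMMITTED."""
    while True:
        status = fetch_status(uuid)
        if status == "COMMITTED":
            return status
        time.sleep(poll_seconds)
```

A single helper like this keeps the wait logic in one place instead of repeating it at each "Wait For Task" step.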
If the machine learning (ML) configuration has been edited (Toggle Inclusion in Machine Learning), proceed to Step 5; otherwise, skip to Step 6.
  5. Update Recipe: POST /recipe/recipes/{id}/populate
    Repopulate the recipe {id} so that the edited machine learning configuration takes effect.
  6. Generate Record Pairs: POST /recipe/recipes/{recipeId}/run/pairs
    Generate record pairs using its {recipeId} and the operation keyword pairs. Additionally, capture the uuid of the submitted task from the response.
  7. Wait For Task: GET /dataset/tasks/uuid/{uuid}
    Poll the storage status of the task submitted in Step 6 using the captured uuid until storageStatus=COMMITTED is received.
  8. Update Results: POST /recipe/recipes/{recipeId}/run/trainPredictCluster
    Apply the entity resolution model using its {recipeId} and the operation keyword trainPredictCluster. Additionally, capture the uuid of the submitted task from the response.
  9. Wait For Task: GET /dataset/tasks/uuid/{uuid}
    Poll the storage status of the task submitted in Step 8 using the captured uuid until storageStatus=COMMITTED is received.
  10. Materialize Clusters Dataset: POST /export
    Materialize the clusters dataset {datasetId} to a specified export configuration and data format, and capture the id of the submitted job from the response.
  11. Fetch UUID: GET /job/jobs/{id}
    Using the id captured in Step 10, fetch the job details and capture the uuid.
  12. Wait For Task: GET /dataset/tasks/uuid/{uuid}
    Using the uuid captured in Step 11, poll the storage status of the task until storageStatus=COMMITTED is received.
  13. Export Dataset: GET /export/read/dataset/{datasetId}
    Export the materialized clusters dataset {datasetId}.
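The end-to-end sequence above can be sketched as a single driver function. This is an illustrative sketch, not part of the Tamr API: `call(method, path)` is assumed to perform an HTTP request and return parsed JSON, `wait(uuid)` is assumed to poll GET /dataset/tasks/uuid/{uuid} until storageStatus=COMMITTED (as in the steps above), and the request bodies (the records payload in Step 2 and the export configuration in Step 10) are elided.

```python
def run_mastering_batch(call, wait, dataset_name, recipe_id, dataset_id):
    """Drive the batch mastering flow (Steps 1-13) end to end.

    `call(method, path)` performs an HTTP request and returns parsed JSON;
    `wait(uuid)` blocks until the task reaches storageStatus=COMMITTED.
    """
    call("POST", f"/dataset/datasets/{dataset_name}/truncate")         # Step 1
    call("POST", f"/dataset/datasets/{dataset_name}/update")           # Step 2 (records payload elided)
    task = call("POST", f"/recipe/recipes/{recipe_id}/run/records")    # Step 3
    wait(task["uuid"])                                                 # Step 4
    # Step 5 (POST /recipe/recipes/{id}/populate) applies only when the
    # ML configuration has been edited; omitted from this sketch.
    task = call("POST", f"/recipe/recipes/{recipe_id}/run/pairs")      # Step 6
    wait(task["uuid"])                                                 # Step 7
    task = call("POST", f"/recipe/recipes/{recipe_id}/run/trainPredictCluster")  # Step 8
    wait(task["uuid"])                                                 # Step 9
    job = call("POST", "/export")                                      # Step 10 (export config elided)
    uuid = call("GET", f"/job/jobs/{job['id']}")["uuid"]               # Step 11
    wait(uuid)                                                         # Step 12
    return call("GET", f"/export/read/dataset/{dataset_id}")           # Step 13
```

Injecting `call` and `wait` keeps the step ordering testable without a live Tamr instance; in production they would wrap an authenticated HTTP client and the polling loop described in Steps 4, 7, 9, and 12.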