Batch Operation of a Categorization Project
Run Tamr in batch mode, from dataset truncation through categorization and export.

API calls to run a categorization project in batch operation (truncate mode).
Batch Operation
1. Truncate Dataset: POST /dataset/datasets/{name}/truncate
Truncate the records of the dataset {name}. This removes all existing records from the dataset.
2. Update Dataset: POST /dataset/datasets/{name}/update
Update the records of the dataset {name} using the command CREATE. Because of Step 1, all records in this step are effectively inserted (i.e. no updates occur).
3. Update Unified Dataset: POST /recipe/recipes/{recipeId}/run/records
Update the unified dataset using its {recipeId} and the operation keyword records. Additionally, capture the uuid of the submitted task from the response.
4. Wait For Task: GET /dataset/tasks/uuid/{uuid}
Poll the storage status of the task submitted in Step 3 using the captured uuid until storageStatus=COMMITTED is received.
If the machine learning (ML) configuration has been edited (Toggle Inclusion in Machine Learning), proceed to Step 5; otherwise, skip to Step 6.
5. Update Recipe: POST /recipe/recipes/{id}/populate
6. Update Categorizations: POST /recipe/recipes/{recipeId}/run/categorizations
Update the categorizations using its {recipeId} and the operation keyword categorizations. Additionally, capture the uuid of the submitted task from the response.
7. Wait For Task: GET /dataset/tasks/uuid/{uuid}
Poll the storage status of the task submitted in Step 6 using the captured uuid until storageStatus=COMMITTED is received.
8. Materialize Categorized Dataset: POST /export
Materialize the categorized dataset {datasetId} to a specified export configuration and data format, and capture the id of the submitted job from the response.
9. Fetch UUID: GET /job/jobs/{id}
Using the id captured in Step 8, fetch the job details and capture the uuid.
10. Wait For Task: GET /dataset/tasks/uuid/{uuid}
Using the uuid captured in Step 9, poll the storage status of the task until storageStatus=COMMITTED is received.
11. Export Dataset: GET /export/read/dataset/{datasetId}
Export the materialized categorized dataset {datasetId}.
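The eleven steps above can be sketched as a single Python routine. This is a minimal illustration, not a reference client. The function names run_batch and wait_for_task, the caller-supplied request(method, path, body) callable, and the polling interval and limit are all hypothetical. The sketch also assumes that uuid, id, and storageStatus are top-level fields of the JSON responses, and that {id} in Step 5 is the same recipe ID used in Steps 3 and 6. The update payload (Step 2) and export configuration (Step 8) are passed through untouched, since their formats are not described here.

```python
import time


def wait_for_task(request, uuid, interval=5.0, max_polls=720):
    """Poll GET /dataset/tasks/uuid/{uuid} until storageStatus is COMMITTED.

    Assumes the response is a dict with a top-level "storageStatus" key.
    """
    for _ in range(max_polls):
        task = request("GET", f"/dataset/tasks/uuid/{uuid}")
        if task.get("storageStatus") == "COMMITTED":
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {uuid} not COMMITTED after {max_polls} polls")


def run_batch(request, name, recipe_id, dataset_id, update_body, export_body,
              ml_config_edited=False, interval=5.0):
    """Run Steps 1-11 in order; `request` is a hypothetical HTTP helper."""
    # Steps 1-2: truncate the dataset, then load records (CREATE commands).
    request("POST", f"/dataset/datasets/{name}/truncate")
    request("POST", f"/dataset/datasets/{name}/update", update_body)

    # Steps 3-4: update the unified dataset and wait for the task.
    task = request("POST", f"/recipe/recipes/{recipe_id}/run/records")
    wait_for_task(request, task["uuid"], interval)

    # Step 5: only needed when the ML configuration has been edited.
    if ml_config_edited:
        request("POST", f"/recipe/recipes/{recipe_id}/populate")

    # Steps 6-7: update the categorizations and wait for the task.
    task = request("POST", f"/recipe/recipes/{recipe_id}/run/categorizations")
    wait_for_task(request, task["uuid"], interval)

    # Steps 8-10: materialize the export, look up the job's uuid, and wait.
    job = request("POST", "/export", export_body)
    job = request("GET", f"/job/jobs/{job['id']}")
    wait_for_task(request, job["uuid"], interval)

    # Step 11: read the materialized categorized dataset.
    return request("GET", f"/export/read/dataset/{dataset_id}")
```

Here request is expected to wrap whichever HTTP client is in use, adding the base URL and authentication and decoding each response. Passing it in as a parameter keeps those deployment-specific details out of the step sequence and lets the ordering be checked against a stub before it is pointed at a live instance.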