Tamr Documentation

Batch Operation of a Categorization Project

Run Tamr in batch mode, from dataset truncation through categorization and export.

API calls to run a categorization project in batch operation (truncate mode).

Batch Operation

  1. Truncate Dataset: POST /dataset/datasets/{name}/truncate
    Truncate the records of the dataset {name}. This removes all existing records from the dataset.
  2. Update Dataset: POST /dataset/datasets/{name}/update
    Update the records of the dataset {name} using the command CREATE. Because of Step 1, all records in this step are effectively inserted. In other words, no updates occur.
  3. Update Unified Dataset: POST /projects/{project}/unifiedDataset:refresh
    Update the unified dataset using its associated project's ID. Additionally, capture the id of the operation from the response.
  4. Wait For Task: GET /operations/{operationId}
    Poll the status of the task submitted in Step 3 using the captured id until state=SUCCEEDED is received.
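
The polling described in Step 4 can be sketched as follows. This is a minimal illustration, not Tamr's client code: the names `wait_for_operation`, `extract_state`, and `fetch_operation` are hypothetical, the response shapes handled by `extract_state` are assumptions, and the failure states are assumed because this page only names SUCCEEDED. The HTTP call itself is passed in as a function so the sketch stays independent of any particular HTTP library.

```python
import time

# Assumed terminal failure states; this page only names SUCCEEDED.
FAILURE_STATES = {"FAILED", "CANCELED"}


def extract_state(operation):
    """Return the state string from an operation response.

    Assumed shapes: {"status": {"state": ...}}, {"state": ...},
    or {"status": "..."}.
    """
    status = operation.get("status")
    if isinstance(status, dict):
        return status.get("state")
    return operation.get("state") or status


def wait_for_operation(fetch_operation, operation_id,
                       interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll GET /operations/{operationId} until state is SUCCEEDED.

    fetch_operation is a caller-supplied function that performs the GET
    request and returns the parsed JSON response as a dict.
    """
    waited = 0.0
    while True:
        operation = fetch_operation(operation_id)
        state = extract_state(operation)
        if state == "SUCCEEDED":
            return operation
        if state in FAILURE_STATES:
            raise RuntimeError(
                f"operation {operation_id} ended in state {state}")
        if waited >= timeout:
            raise TimeoutError(
                f"operation {operation_id} still {state} after {timeout}s")
        sleep(interval)
        waited += interval
```

The same helper can be reused for every Wait For Task step, since each one polls the same endpoint with a different captured id.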

If the machine learning (ML) configuration has been edited (Toggle Inclusion in Machine Learning), proceed to Step 5; otherwise, skip to Step 6.

  5. Update Recipe (internal API): POST /recipe/recipes/{id}/populate
  6. Update Categorizations (internal API): POST /recipe/recipes/{recipeId}/run/categorizations
    Update the categorizations using the recipe's {recipeId} and the operation keyword categorizations. Additionally, capture the id of the submitted job from the response.
  7. Wait For Task: GET /operations/{operationId}
    Poll the status of the task submitted in Step 6 using the captured id until state=SUCCEEDED is received.
  8. Materialize Categorized Dataset: POST /export
    Materialize the categorized dataset {datasetId} to a specified export configuration and data format, and capture the jobId of the export from the response.
  9. Wait For Task: GET /operations/{operationId}
    Using the id captured in Step 8, poll the status of the task until state=SUCCEEDED is received.
  10. Export Dataset: GET /export/read/dataset/{datasetId}
    Export the materialized categorized dataset {datasetId}.
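
To make the ordering and the ML-configuration branch concrete, the full sequence above can be laid out as a list of calls. This is only a sketch: `batch_plan` and its parameter names are hypothetical, and it assumes the {id} in the Update Recipe path is the same recipe id used by Update Categorizations. It builds the list of methods and paths without sending any requests; the `{operationId}` placeholder is left unfilled because its value is only known from the preceding response at run time.

```python
def batch_plan(dataset_name, project, recipe_id, dataset_id,
               ml_config_edited=False):
    """Return the ordered (method, path) calls for one batch run."""
    # The operation id is captured from the response of the previous call.
    wait = ("GET", "/operations/{operationId}")

    steps = [
        ("POST", f"/dataset/datasets/{dataset_name}/truncate"),
        ("POST", f"/dataset/datasets/{dataset_name}/update"),
        ("POST", f"/projects/{project}/unifiedDataset:refresh"),
        wait,
    ]
    # Update Recipe runs only when the ML configuration has been edited.
    if ml_config_edited:
        # Assumption: {id} here is the recipe id.
        steps.append(("POST", f"/recipe/recipes/{recipe_id}/populate"))
    steps += [
        ("POST", f"/recipe/recipes/{recipe_id}/run/categorizations"),
        wait,
        ("POST", "/export"),
        wait,
        ("GET", f"/export/read/dataset/{dataset_id}"),
    ]
    return steps


if __name__ == "__main__":
    # Example values are placeholders, not real dataset or project names.
    for method, path in batch_plan("my_dataset", "1", "2", "3",
                                   ml_config_edited=True):
        print(method, path)
```

Keeping the plan as data makes it easy to log or review the sequence before wiring it to an HTTP client.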



