Tamr Documentation

Continuous Operation of a Categorization Project

Run Tamr continuously from dataset updating through categorization and exporting.

Checklist before proceeding

  • At least one Categorization project exists (Creating a Project).
  • At least one dataset is added to the project and is schema-mapped to the project's unified dataset (Adding a Dataset).
  • The Update Unified Dataset and Update Categorizations jobs have both been executed.
  • An external dataset for export has been created using POST /v1/datasets, and its id has been captured from the response for use in Step 8 below. Use, for example, _classification_with_records as the upstream dataset, that is, list it in upstreamDatasetIds.

Continuous Operation

The following API calls run a categorization project in continuous operation (upsert mode).

  1. Update Dataset: POST /v1/datasets/{datasetId}:updateRecords?header=false
    Update the records of the dataset {datasetId} using the command CREATE.

  2. Refresh Unified Dataset: POST /v1/projects/{project}/unifiedDataset:refresh
    Update the unified dataset of the project using its project id {project}. Additionally, capture the id from the response.

  3. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 2, poll the operation until its status.state is "SUCCEEDED".

  4. Refresh Model: POST /v1/projects/{project}/categorizations/model:refresh
    Refresh the categorization model of the project {project}. Additionally, capture the id of the submitted operation from the response.

  5. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 4, poll the operation until its status.state is "SUCCEEDED".

  6. Refresh Categorizations: POST /v1/projects/{project}/categorizations:refresh
    Apply the categorization model for the project {project}. Additionally, capture the id of the submitted operation from the response.

  7. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 6, poll the operation until its status.state is "SUCCEEDED".

  8. Refresh Dataset Export: POST /v1/datasets/{datasetId}:refresh
    Export and materialize the dataset created in the checklist above, using its id as {datasetId}. Additionally, capture the id of the submitted operation from the response.

  9. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 8, poll the operation until its status.state is "SUCCEEDED".
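
The nine steps above can be sketched as a small Python script. This is a minimal illustration under stated assumptions, not an official client: the function names (make_http_request, wait_for_operation, run_pipeline), the base URL, the Authorization header value, the Content-Type, the polling interval, and the failure state names "FAILED" and "CANCELED" are all hypothetical choices made for this example. The record payload for Step 1 is passed through as opaque bytes because its format is not described on this page.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Transport signature assumed for this sketch: (method, path, body) -> parsed JSON.
Request = Callable[[str, str, Optional[bytes]], dict]


def make_http_request(base_url: str, auth_header: str) -> Request:
    """Build a urllib-based transport. base_url and auth_header are
    deployment-specific assumptions, not values taken from this page."""
    def request(method: str, path: str, body: Optional[bytes] = None) -> dict:
        req = urllib.request.Request(
            base_url + path,
            data=body,
            method=method,
            headers={"Authorization": auth_header,
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return json.loads(raw) if raw else {}
    return request


def wait_for_operation(request: Request, operation_id: str,
                       poll_seconds: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Steps 3, 5, 7, 9: poll GET /v1/operations/{operationId} until
    status.state is "SUCCEEDED"."""
    while True:
        op = request("GET", f"/v1/operations/{operation_id}", None)
        state = op["status"]["state"]
        if state == "SUCCEEDED":
            return op
        # Assumption: these state names mark a terminal failure.
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"operation {operation_id} ended in state {state}")
        sleep(poll_seconds)


def run_pipeline(request: Request, source_dataset_id: str, project_id: str,
                 export_dataset_id: str, records_body: bytes,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    # Step 1: push record updates to the source dataset.
    request("POST",
            f"/v1/datasets/{source_dataset_id}:updateRecords?header=false",
            records_body)
    # Steps 2, 4, 6, 8: each refresh submits an operation whose id is captured
    # from the response and then polled (Steps 3, 5, 7, 9).
    refresh_paths = [
        f"/v1/projects/{project_id}/unifiedDataset:refresh",
        f"/v1/projects/{project_id}/categorizations/model:refresh",
        f"/v1/projects/{project_id}/categorizations:refresh",
        f"/v1/datasets/{export_dataset_id}:refresh",
    ]
    for path in refresh_paths:
        operation = request("POST", path, None)
        wait_for_operation(request, operation["id"], sleep=sleep)
```

To run it against a real instance, build a transport with make_http_request(base_url, auth_header) and pass it to run_pipeline together with the dataset and project ids. Injecting the transport and the sleep function keeps the sequencing logic testable without a live server.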

Updated 3 months ago


