Continuous Operation of a Categorization Project
Run a Tamr categorization project continuously, from updating a dataset through categorization to export.
Checklist before proceeding
- At least one Categorization project exists (see Creating a Project).
- At least one dataset has been added to the project and schema mapped to the project's unified dataset (see Adding a Dataset).
- The Update Unified Dataset and Update Categorizations jobs have both been executed.
- An external dataset has been created for export using POST /v1/datasets, and its id has been captured from the response for use in Step 8 below. For example, use _classification_with_records as the upstream dataset by listing it in upstreamDatasetIds.
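The last checklist item can be sketched with the Python standard library: build the POST /v1/datasets request, then take id from the JSON response. This is a minimal sketch under assumptions, not the Tamr client: the base URL is a hypothetical placeholder, the name field in the request body is assumed (only upstreamDatasetIds comes from this page), and authentication headers are omitted.

```python
import json
import urllib.request


def build_create_export_request(base_url: str, name: str,
                                upstream_dataset_id: str) -> urllib.request.Request:
    """Build POST /v1/datasets for the export dataset (auth headers omitted)."""
    # Only upstreamDatasetIds comes from this page; "name" is an ASSUMED field.
    body = {"name": name, "upstreamDatasetIds": [upstream_dataset_id]}
    return urllib.request.Request(
        f"{base_url}/v1/datasets",  # base_url is a hypothetical placeholder
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dataset_id_from_response(response_body: str) -> str:
    """Capture the new dataset's id from the JSON response, for use in Step 8."""
    return json.loads(response_body)["id"]
```

To send the request, pass it to urllib.request.urlopen and hand the decoded response body to dataset_id_from_response; keep the returned id for the export refresh.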
Continuous Operation

The following API calls run a categorization project in continuous operation (upsert mode).
1. Update Dataset: POST /v1/datasets/{datasetId}:updateRecords?header=false
   Update the records of the input dataset {datasetId} using the CREATE command.
2. Refresh Unified Dataset: POST /v1/projects/{project}/unifiedDataset:refresh
   Update the unified dataset of the project with id {project}. Capture the id of the submitted operation from the response.
3. Wait For Operation: GET /v1/operations/{operationId}
   Using the id captured in Step 2, poll the operation until status.state is "SUCCEEDED".
4. Refresh Model: POST /v1/projects/{project}/categorizations/model:refresh
   Refresh the categorization model of the project {project}. Capture the id of the submitted operation from the response.
5. Wait For Operation: GET /v1/operations/{operationId}
   Using the id captured in Step 4, poll the operation until status.state is "SUCCEEDED".
6. Refresh Categorizations: POST /v1/projects/{project}/categorizations:refresh
   Apply the categorization model for the project {project}. Capture the id of the submitted operation from the response.
7. Wait For Operation: GET /v1/operations/{operationId}
   Using the id captured in Step 6, poll the operation until status.state is "SUCCEEDED".
8. Refresh Dataset Export: POST /v1/datasets/{datasetId}:refresh
   Export and materialize the external dataset created in the checklist above; here {datasetId} is that dataset's id, not the input dataset from Step 1. Capture the id of the submitted operation from the response.
9. Wait For Operation: GET /v1/operations/{operationId}
   Using the id captured in Step 8, poll the operation until status.state is "SUCCEEDED".
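The nine steps can be chained in a short script. The Python below is a sketch under stated assumptions, not an official client: the caller supplies a send function that performs one HTTP request against the Tamr API base URL (authentication is omitted), the updateRecords body is assumed to be newline-delimited JSON objects with action, recordId, and record fields, and FAILED and CANCELED are assumed to be the terminal failure states of an operation. Check those names against the Tamr API reference before relying on them.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional

# A transport sends (method, path, body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[bytes]], dict]


def urllib_transport(base_url: str) -> Transport:
    """Standard-library transport; base_url is hypothetical and auth is omitted."""
    def send(method: str, path: str, body: Optional[bytes]) -> dict:
        request = urllib.request.Request(base_url + path, data=body, method=method)
        if body is not None:
            request.add_header("Content-Type", "application/json")
        with urllib.request.urlopen(request) as response:
            text = response.read().decode("utf-8")
        return json.loads(text) if text.strip() else {}
    return send


def wait_for_operation(send: Transport, operation_id: str, interval: float = 5.0,
                       timeout: float = 3600.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET /v1/operations/{operationId} until status.state is SUCCEEDED."""
    waited = 0.0
    while True:
        operation = send("GET", f"/v1/operations/{operation_id}", None)
        state = operation["status"]["state"]
        if state == "SUCCEEDED":
            return operation
        # ASSUMED terminal failure states; any other state counts as still running.
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"operation {operation_id} ended in state {state}")
        if waited >= timeout:
            raise TimeoutError(f"operation {operation_id} still {state} after {waited}s")
        sleep(interval)
        waited += interval


def run_pipeline(send: Transport, dataset_id: str, project: str,
                 export_dataset_id: str, records: Dict[str, dict],
                 sleep: Callable[[float], None] = time.sleep) -> None:
    # Step 1: upsert records with the CREATE command, one JSON object per line.
    # The action/recordId/record field names are an ASSUMPTION.
    commands = "\n".join(
        json.dumps({"action": "CREATE", "recordId": record_id, "record": record})
        for record_id, record in records.items()
    )
    send("POST", f"/v1/datasets/{dataset_id}:updateRecords?header=false",
         commands.encode("utf-8"))

    # Steps 2-9: each refresh submits an operation; capture its id and poll it.
    refresh_paths = [
        f"/v1/projects/{project}/unifiedDataset:refresh",         # Steps 2-3
        f"/v1/projects/{project}/categorizations/model:refresh",  # Steps 4-5
        f"/v1/projects/{project}/categorizations:refresh",        # Steps 6-7
        f"/v1/datasets/{export_dataset_id}:refresh",              # Steps 8-9
    ]
    for path in refresh_paths:
        operation = send("POST", path, None)
        wait_for_operation(send, operation["id"], sleep=sleep)
```

Passing the transport in as a function keeps the step ordering and polling testable without a live Tamr instance; for a real run, one would pass urllib_transport(base_url) with the instance's API base URL. Each wait blocks until its operation succeeds, so a failure in one refresh stops the run before later steps consume stale data.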