Continuous Operation of a Categorization Project

Run Tamr continuously, from updating datasets through categorization to export.

Checklist before proceeding

  • At least one Categorization project exists (Creating a Project).
  • At least one dataset is added to the project and is schema mapped to the project's unified dataset (Adding a Dataset).
  • The Update Unified Dataset and Update Categorizations jobs have both been executed.
  • An external dataset is created for export using POST /v1/datasets, and its id is captured from the response for use in Step 8 below. For example, use _classification_with_records as the upstream dataset, specified in the upstreamDatasetIds list.

Continuous Operation

API calls to run a categorization project in continuous operation (upsert mode).

  1. Update Dataset: POST /v1/datasets/{datasetId}:updateRecords?header=false
    Update the records of the dataset {datasetId} using the command CREATE.

  2. Refresh Unified Dataset: POST /v1/projects/{project}/unifiedDataset:refresh
    Update the unified dataset of the project using its project id {project}. Additionally, capture the id of the submitted operation from the response.

  3. Wait For Operation: GET /v1/operations/{operationId}
    Using the captured id from Step 2, poll the operation until its status.state is "SUCCEEDED".

  4. Refresh Model: POST /v1/projects/{project}/categorizations/model:refresh
    Refresh the categorization model for the project {project}. Additionally, capture the id of the submitted operation from the response.

  5. Wait For Operation: GET /v1/operations/{operationId}
    Using the captured id from Step 4, poll the operation until its status.state is "SUCCEEDED".

  6. Refresh Categorizations: POST /v1/projects/{project}/categorizations:refresh
    Apply the categorization model for the project {project}. Additionally, capture the id of the submitted operation from the response.

  7. Wait For Operation: GET /v1/operations/{operationId}
    Using the captured id from Step 6, poll the operation until its status.state is "SUCCEEDED".

  8. Refresh Dataset Export: POST /v1/datasets/{datasetId}:refresh
    Export and materialize the dataset created in the checklist above using its {datasetId}. Additionally, capture the id of the submitted operation from the response.

  9. Wait For Operation: GET /v1/operations/{operationId}
    Using the captured id from Step 8, poll the operation until its status.state is "SUCCEEDED".
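
The nine steps above can be sketched as one function that submits each refresh and polls its operation before moving on. This is a minimal sketch, not a reference client. The names run_continuous_cycle, wait_for_operation, post, and get are hypothetical. The caller supplies post and get as callables that take an API path and return the decoded JSON response, so the base URL, authentication, and HTTP library are left open. The failure states FAILED and CANCELED, the polling interval, and the timeout are assumptions. The record payload for Step 1 is passed through unchanged because its format is not described on this page.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport signatures: each takes an API path (and, for POST,
# an optional request body) and returns the decoded JSON response as a dict.
Post = Callable[[str, Optional[bytes]], Dict[str, Any]]
Get = Callable[[str], Dict[str, Any]]

# Assumption: states other than SUCCEEDED that mean the operation is finished.
FAILURE_STATES = {"FAILED", "CANCELED"}


def wait_for_operation(get: Get, operation_id: str,
                       interval: float = 5.0,
                       timeout: float = 3600.0) -> Dict[str, Any]:
    """Poll GET /v1/operations/{operationId} until status.state is SUCCEEDED."""
    deadline = time.monotonic() + timeout
    while True:
        op = get(f"/v1/operations/{operation_id}")
        state = op["status"]["state"]
        if state == "SUCCEEDED":
            return op
        if state in FAILURE_STATES:
            raise RuntimeError(f"operation {operation_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} did not finish in {timeout}s")
        time.sleep(interval)


def run_continuous_cycle(post: Post, get: Get, project: str,
                         source_dataset_id: str, export_dataset_id: str,
                         records_body: bytes, interval: float = 5.0) -> None:
    """Run Steps 1 to 9 once, in order."""
    # Step 1: update the records of the source dataset.
    post(f"/v1/datasets/{source_dataset_id}:updateRecords?header=false", records_body)

    # Steps 2, 4, 6, 8: each refresh submits an operation whose id is captured.
    refresh_paths = [
        f"/v1/projects/{project}/unifiedDataset:refresh",
        f"/v1/projects/{project}/categorizations/model:refresh",
        f"/v1/projects/{project}/categorizations:refresh",
        f"/v1/datasets/{export_dataset_id}:refresh",
    ]
    for path in refresh_paths:
        operation = post(path, None)
        # Steps 3, 5, 7, 9: wait for that operation before submitting the next.
        wait_for_operation(get, operation["id"], interval=interval)
```

Waiting for each operation before submitting the next keeps the steps in the order listed above, so a refresh never starts while the operation it depends on is still running.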