Tamr Documentation

Continuous Operation of a Mastering Project

Run a Mastering project continuously from dataset updating through clustering.

Checklist before proceeding

  • At least one Mastering project exists (Creating a Project).
  • At least one dataset is added to the project and its schema is mapped to the project's unified dataset (Adding a Dataset).
  • The Update Unified Dataset job has been run at least once.
  • The Review and Update Clusters job has been run at least once.
  • An external dataset for export has been created using POST /v1/datasets, and its id has been captured from the response for use in Step 9 below. For example, specify _clusters_with_data as the upstream dataset in the upstreamDatasetIds list.

Continuous Operation

The following API calls run a Mastering project in continuous operation (upsert mode).

  1. Update Dataset: POST /v1/datasets/{datasetId}:updateRecords?header=false
    Update the records of the dataset {datasetId} using the command CREATE.

  2. Refresh Unified Dataset: POST /v1/projects/{project}/unifiedDataset:refresh
    Update the unified dataset of the project using its project id {project}. Additionally, capture the id of the submitted operation from the response.

  3. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 2, poll the operation until status.state="SUCCEEDED" is received.

If the machine learning (ML) configuration has been edited (Toggle Inclusion in Machine Learning), proceed to Step 4; otherwise, skip to Step 5.

  4. Update Recipe: POST /recipe/recipes/{id}/populate

  5. Refresh Record Pairs: POST /v1/projects/{project}/recordPairs:refresh
    Generate record pairs using the latest blocking model. Additionally, capture the id of the submitted operation from the response.

  6. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 5, poll the operation until status.state="SUCCEEDED" is received.

  7. Refresh Record Clusters: POST /v1/projects/{project}/recordClusters:refresh
    Apply the latest mastering model for the project {project} and generate clusters. Additionally, capture the id of the submitted operation from the response.

  8. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 7, poll the operation until status.state="SUCCEEDED" is received.

  9. Refresh Dataset Export: POST /v1/datasets/{datasetId}:refresh
    Export and materialize the dataset created in the checklist above (_clusters_with_data), using its {datasetId}. Additionally, capture the id of the submitted operation from the response.

  10. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 9, poll the operation until status.state="SUCCEEDED" is received.
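
The refresh-and-wait portion of the sequence (everything after the Update Dataset call) can be sketched in Python as below. This is a minimal illustration, not a reference client. The `post` and `get` arguments are hypothetical caller-supplied functions that send an authenticated request to the Tamr host and return the parsed JSON body. The failure states `FAILED` and `CANCELED` are assumptions, since only `SUCCEEDED` is named above. The sketch also assumes the Update Dataset call has already been made and that the Update Recipe call does not need to be polled.

```python
import time
from typing import Callable, Optional

# Hypothetical transport signatures: each takes a request path and returns
# the parsed JSON response body as a dict.
PostFn = Callable[[str], dict]
GetFn = Callable[[str], dict]

# Assumption: terminal failure states. Only "SUCCEEDED" appears in the steps.
FAILURE_STATES = {"FAILED", "CANCELED"}


def wait_for_operation(get: GetFn, operation_id: str,
                       poll_seconds: float = 5.0,
                       timeout_seconds: float = 3600.0) -> dict:
    """Poll GET /v1/operations/{operationId} until status.state is SUCCEEDED."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        op = get(f"/v1/operations/{operation_id}")
        state = op.get("status", {}).get("state")
        if state == "SUCCEEDED":
            return op
        if state in FAILURE_STATES:
            raise RuntimeError(f"operation {operation_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} did not finish in time")
        time.sleep(poll_seconds)


def submit_and_wait(post: PostFn, get: GetFn, path: str, **wait_kwargs) -> dict:
    """Submit a refresh job, capture the operation id, and wait for it."""
    operation = post(path)
    return wait_for_operation(get, operation["id"], **wait_kwargs)


def run_continuous_cycle(post: PostFn, get: GetFn, project: str,
                         export_dataset_id: str,
                         recipe_id: Optional[str] = None,
                         **wait_kwargs) -> None:
    """Run the refresh steps once; pass recipe_id only if the ML
    configuration has been edited."""
    # Refresh Unified Dataset, then Wait For Operation
    submit_and_wait(post, get,
                    f"/v1/projects/{project}/unifiedDataset:refresh", **wait_kwargs)
    # Update Recipe (conditional)
    if recipe_id is not None:
        post(f"/recipe/recipes/{recipe_id}/populate")
    # Refresh Record Pairs, then Wait For Operation
    submit_and_wait(post, get,
                    f"/v1/projects/{project}/recordPairs:refresh", **wait_kwargs)
    # Refresh Record Clusters, then Wait For Operation
    submit_and_wait(post, get,
                    f"/v1/projects/{project}/recordClusters:refresh", **wait_kwargs)
    # Refresh Dataset Export, then Wait For Operation
    submit_and_wait(post, get,
                    f"/v1/datasets/{export_dataset_id}:refresh", **wait_kwargs)
```

Passing the transport in as functions keeps the sequencing logic independent of any particular HTTP library or authentication scheme. Each refresh is awaited before the next one is submitted, so a failed or timed-out operation stops the cycle rather than feeding stale data into the next job.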

Updated 2 months ago


