Continuous Operation of a Mastering Project

Run a Mastering project continuously, from dataset updates through clustering and export.

Checklist before proceeding

  • At least one Mastering project exists (see Creating a Project).
  • At least one dataset is added to the project and its schema is mapped to the project's unified dataset (see Adding a Dataset).
  • The Update Unified Dataset job has been run at least once.
  • The Review and Update Clusters job has been run at least once.
  • An external dataset is created for export using POST /v1/datasets, and its id is captured from the response for use in Step 9 below. For example, use the _clusters_with_data dataset as its upstream dataset by listing that dataset's id in upstreamDatasetIds.
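The export-dataset item in the checklist above can be sketched in Python as follows. This is a minimal sketch, not the documented request schema: it assumes the POST /v1/datasets body accepts name and upstreamDatasetIds fields, and that the response carries the new dataset's id in an id field, possibly as a resource URI whose trailing segment is the {datasetId} used in later request paths. The function names and the example dataset name are hypothetical.

```python
import json


def build_export_dataset_body(name: str, upstream_dataset_ids: list) -> str:
    """Return a JSON body for POST /v1/datasets.

    Assumption: the endpoint accepts "name" and "upstreamDatasetIds";
    add any other fields your deployment requires.
    """
    return json.dumps({"name": name, "upstreamDatasetIds": list(upstream_dataset_ids)})


def capture_dataset_id(response_json: dict) -> str:
    """Return the created dataset's id from the POST response.

    Assumption: the id may be a resource URI ending in ".../datasets/<n>";
    only the trailing path segment is kept for use as {datasetId}.
    """
    return str(response_json["id"]).rstrip("/").rsplit("/", 1)[-1]


# Hypothetical usage: "123" stands in for the _clusters_with_data dataset id.
body = build_export_dataset_body("my_project_export", ["123"])
export_dataset_id = capture_dataset_id({"id": "unify://unified-data/v1/datasets/42"})
```

Keeping only the trailing segment means the same helper works whether the response returns a bare id or a full resource URI.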

Continuous Operation

API calls to run a Mastering project in continuous operation (upsert mode).

  1. Update Dataset: POST /v1/datasets/{datasetId}:updateRecords?header=false
    Update the records of the dataset {datasetId} using the CREATE command.

  2. Refresh Unified Dataset: POST /v1/projects/{project}/unifiedDataset:refresh
    Update the unified dataset of the project {project}. Additionally, capture the id of the submitted operation from the response.

  3. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 2, poll the operation until status.state="SUCCEEDED" is returned.
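Steps 1 through 3 can be sketched in Python as below. This is a minimal sketch under assumptions not stated in this guide: the updateRecords body is taken to be newline-delimited JSON in which each line holds an "action" of "CREATE", a "recordId", and the "record" itself, and operations are assumed to end in FAILED or CANCELED when they do not succeed. The helper names are hypothetical, and get_operation is any caller-supplied function that performs GET /v1/operations/{operationId} and returns the parsed JSON.

```python
import json
import time
from typing import Callable, Iterable


def build_update_records_body(records: Iterable[dict], key: str) -> str:
    """Build a newline-delimited JSON body for :updateRecords (Step 1).

    Assumption: one line per record, each with action CREATE, a recordId
    taken from the record's key attribute, and the record itself.
    """
    lines = [
        json.dumps({"action": "CREATE", "recordId": str(r[key]), "record": r})
        for r in records
    ]
    return "\n".join(lines)


def wait_for_operation(get_operation: Callable[[str], dict], operation_id: str,
                       interval_s: float = 5.0, timeout_s: float = 3600.0) -> dict:
    """Poll an operation until status.state is SUCCEEDED (Step 3).

    Raises RuntimeError on an assumed failure state (FAILED or CANCELED)
    and TimeoutError if the operation has not finished within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        op = get_operation(operation_id)
        state = op["status"]["state"]
        if state == "SUCCEEDED":
            return op
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"operation {operation_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} still {state} after {timeout_s}s")
        time.sleep(interval_s)
```

For Step 2, submit the refresh, read the operation id from the id field of its response, and pass that id to wait_for_operation. The timeout keeps an unattended continuous run from hanging indefinitely on a stuck operation.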

If the machine learning (ML) configuration has been edited (Toggle Inclusion in Machine Learning), proceed to Step 4; otherwise, skip to Step 5.
  4. Update Recipe: POST /recipe/recipes/{id}/populate

  5. Refresh Record Pairs: POST /v1/projects/{project}/recordPairs:refresh
    Generate record pairs using the latest blocking model. Additionally, capture the id of the submitted operation from the response.

  6. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 5, poll the operation until status.state="SUCCEEDED" is returned.

  7. Refresh Record Clusters: POST /v1/projects/{project}/recordClusters:refresh
    Apply the latest mastering model for the project {project} and generate clusters. Additionally, capture the id of the submitted operation from the response.

  8. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 7, poll the operation until status.state="SUCCEEDED" is returned.

  9. Refresh Dataset Export: POST /v1/datasets/{datasetId}:refresh
    Export and materialize the dataset created in the checklist above, identified by {datasetId}. Additionally, capture the id of the submitted operation from the response.

  10. Wait For Operation: GET /v1/operations/{operationId}
    Using the id captured in Step 9, poll the operation until status.state="SUCCEEDED" is returned.
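One pass of the procedure above can be sketched in Python as a single function that issues the calls in order. This is a sketch under stated assumptions, not a client for any particular library: request is a hypothetical caller-supplied function that sends an HTTP request to the given path (handling the base URL, authentication, and headers) and returns the parsed JSON response; wait is a caller-supplied function that polls GET /v1/operations/{operationId} until status.state is SUCCEEDED; and each refresh response is assumed to carry the operation id in an id field. The recipe call in Step 4 is made only when recipe_id is supplied, and the sketch does not wait on it because the procedure lists no wait step for it.

```python
from typing import Callable, Optional

# Hypothetical client signature: request(method, path, body) -> parsed JSON dict.
Request = Callable[[str, str, Optional[str]], dict]


def run_continuous(request: Request, wait: Callable[[str], object], *,
                   dataset_id: str, project: str, export_dataset_id: str,
                   records_body: str, recipe_id: Optional[str] = None) -> None:
    """Run Steps 1-10 once, in order."""
    # Step 1: update the source dataset's records.
    request("POST", f"/v1/datasets/{dataset_id}:updateRecords?header=false", records_body)

    # Steps 2-3: refresh the unified dataset and wait for the operation.
    op = request("POST", f"/v1/projects/{project}/unifiedDataset:refresh", None)
    wait(op["id"])

    # Step 4 (only after an ML configuration change): update the recipe.
    if recipe_id is not None:
        request("POST", f"/recipe/recipes/{recipe_id}/populate", None)

    # Steps 5-8: refresh record pairs, then record clusters, waiting on each.
    for action in ("recordPairs:refresh", "recordClusters:refresh"):
        op = request("POST", f"/v1/projects/{project}/{action}", None)
        wait(op["id"])

    # Steps 9-10: materialize the export dataset and wait for it.
    op = request("POST", f"/v1/datasets/{export_dataset_id}:refresh", None)
    wait(op["id"])
```

Passing request and wait in as functions keeps the step ordering separate from transport details, so the same sequence can be driven by any HTTP client or exercised with a fake one.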