Continuous Operation of a Mastering Project
Run a Mastering project continuously from dataset updating through clustering and exporting.
Checklist before proceeding
- At least one Mastering project exists (Creating a Project).
- At least one dataset is added to the project and is schema mapped to the project's unified dataset (Adding a Dataset).
- The Update Unified Dataset and Update Categorizations jobs have both been executed at least once.
- An external dataset is created for export using POST /v1/datasets, and its id is captured from the response for use in Step 9 below. Use, for example, _clusters_with_data as the upstream dataset, that is, as an entry in the upstreamDatasetIds list.
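The last checklist item can be sketched as follows. This is a minimal illustration, not the documented request format: only the endpoint, the upstreamDatasetIds field, and the id field of the response come from the text above. The `name` field, the `extra_fields` parameter, and the `post` callable (a stand-in for whatever authenticated HTTP client you use) are assumptions, so check the API reference for the fields your version requires.

```python
from typing import Callable, Iterable, Optional


def create_export_dataset(
    post: Callable[[str, dict], dict],
    name: str,
    upstream_dataset_ids: Iterable[str],
    extra_fields: Optional[dict] = None,
) -> str:
    """Create the export dataset and return the id captured from the response.

    `post(path, body)` is a hypothetical callable that sends an authenticated
    POST request and returns the parsed JSON response as a dict.
    """
    body = {
        "name": name,  # assumed field name, not confirmed by the text above
        "upstreamDatasetIds": list(upstream_dataset_ids),
    }
    # Any further required fields are passed through unchanged (assumption).
    body.update(extra_fields or {})
    response = post("/v1/datasets", body)
    return response["id"]  # keep this id for Step 9
```

The returned id is the value later substituted for {datasetId} in the Refresh Dataset Export call.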
Continuous Operation

The following API calls run a Mastering project in continuous operation (upsert mode).
1. Update Dataset: POST /v1/datasets/{datasetId}:updateRecords?header=false
   Update the records of the dataset {datasetId} using the command CREATE.
2. Refresh Unified Dataset: POST /v1/projects/{project}/unifiedDataset:refresh
   Update the unified dataset of the project using its project id {project}. Additionally, capture the id of the submitted operation from the response.
3. Wait For Operation: GET /v1/operations/{operationId}
   Using the captured id from Step 2, poll the status state of the operation until status.state="SUCCEEDED" is received.
   If the machine learning (ML) configuration has been edited (Toggle Inclusion in Machine Learning), proceed to Step 4; otherwise, skip to Step 5.
4. Update Recipe: POST /recipe/recipes/{id}/populate
5. Refresh Record Pairs: POST /v1/projects/{project}/recordPairs:refresh
   Generate record pairs using the latest binning model. Additionally, capture the id of the submitted operation from the response.
6. Wait For Operation: GET /v1/operations/{operationId}
   Using the captured id from Step 5, poll the status state of the operation until status.state="SUCCEEDED" is received.
7. Refresh Record Clusters: POST /v1/projects/{project}/recordClusters:refresh
   Apply the latest mastering model for the project {project} and generate clusters. Additionally, capture the id of the submitted operation from the response.
8. Wait For Operation: GET /v1/operations/{operationId}
   Using the captured id from Step 7, poll the status state of the operation until status.state="SUCCEEDED" is received.
9. Refresh Dataset Export: POST /v1/datasets/{datasetId}:refresh
   Export and materialize the dataset created in the checklist above, using its id as {datasetId}. Additionally, capture the id of the submitted operation from the response.
10. Wait For Operation: GET /v1/operations/{operationId}
   Using the captured id from Step 9, poll the status state of the operation until status.state="SUCCEEDED" is received.
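The sequence above could be scripted roughly as follows. This is a sketch under stated assumptions, not a reference client: `call` is a hypothetical callable wrapping your own HTTP client and authentication, the function and parameter names are illustrative, the failure states FAILED and CANCELED are assumed, and {id} in the Update Recipe path is assumed to be a recipe id. The request body for Step 1 is passed through as-is, since its format is not described here.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: call(method, path, body) returns the parsed JSON response.
ApiCall = Callable[[str, str, Optional[str]], dict]


def wait_for_operation(call: ApiCall, operation_id: str,
                       poll_seconds: float = 5.0,
                       timeout_seconds: float = 3600.0) -> dict:
    """Poll GET /v1/operations/{operationId} until status.state is SUCCEEDED."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = call("GET", f"/v1/operations/{operation_id}", None)
        state = operation.get("status", {}).get("state")
        if state == "SUCCEEDED":
            return operation
        if state in ("FAILED", "CANCELED"):  # assumed terminal failure states
            raise RuntimeError(f"operation {operation_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} is still {state}")
        time.sleep(poll_seconds)


def refresh_and_wait(call: ApiCall, path: str, poll_seconds: float = 5.0) -> dict:
    """Submit a refresh, capture the operation id, and wait for it to succeed."""
    operation = call("POST", path, None)
    return wait_for_operation(call, operation["id"], poll_seconds)


def run_continuous_operation(call: ApiCall, project: str, source_dataset_id: str,
                             export_dataset_id: str, records_body: str,
                             recipe_id: Optional[str] = None,
                             poll_seconds: float = 5.0) -> None:
    # Step 1: update the records of the source dataset.
    call("POST", f"/v1/datasets/{source_dataset_id}:updateRecords?header=false",
         records_body)
    # Steps 2-3: refresh the unified dataset and wait.
    refresh_and_wait(call, f"/v1/projects/{project}/unifiedDataset:refresh", poll_seconds)
    # Step 4: only when the ML configuration was edited (signalled by recipe_id).
    if recipe_id is not None:
        call("POST", f"/recipe/recipes/{recipe_id}/populate", None)
    # Steps 5-6: regenerate record pairs and wait.
    refresh_and_wait(call, f"/v1/projects/{project}/recordPairs:refresh", poll_seconds)
    # Steps 7-8: regenerate clusters and wait.
    refresh_and_wait(call, f"/v1/projects/{project}/recordClusters:refresh", poll_seconds)
    # Steps 9-10: materialize the export dataset and wait.
    refresh_and_wait(call, f"/v1/datasets/{export_dataset_id}:refresh", poll_seconds)
```

Injecting `call` keeps the step ordering separate from transport and authentication details, and each refresh blocks until its operation succeeds, so a later step never starts on stale output.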