Mastering Pipeline
Automate running Tamr Core jobs and common user tasks in a mastering project.
Before You Begin
Verify the following before completing the procedures in this topic:
- At least one mastering project exists (Creating a Project).
- The project includes at least one dataset, and you have performed schema mapping on the project’s unified dataset (Adding a Dataset).
- You have run the Update Unified Dataset, Generate Pairs, and Apply Feedback and Update Results jobs at least once.
- You have run the Review and Update Clusters job at least once.
Updating the Input Datasets
Complete this procedure for each input dataset in your project.
To update an input dataset:
1. Find the id of your input dataset: `GET /v1/datasets`
   When this API is used with a filter parameter such as `name==my-dataset.csv`, it returns the dataset definition of the named dataset. From the API response, capture the numeric value after `datasets/` in the `id` of the desired dataset to use as the `datasetId` in subsequent steps.
2. (Optional) Delete (truncate) records in the input dataset: `DELETE /v1/datasets/{datasetId}/records`
   Complete this step if you want to remove all records currently in your input dataset and only include source records added during the next step. Use the `datasetId` you obtained in step 1.
3. Update the dataset: `POST /v1/datasets/{datasetId}:updateRecords`
   Update the records of dataset `{datasetId}` using the command `CREATE` for new records and record updates, and the command `DELETE` to delete individual records. Use the `datasetId` you obtained in step 1. If you completed step 2, all records in this step are effectively inserted; in other words, no updates occur. A scripted sketch of these steps follows this list.
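The sketch below strings these three calls together in Python using the requests package. The base URL, the credentials and `BasicCreds` authorization scheme, and the exact shape of the `updateRecords` command payload are illustrative assumptions; confirm them against the API reference for your release.

```python
# A minimal sketch of steps 1-3, assuming Python 3 and the third-party
# "requests" package. The base URL, credentials, authorization scheme, and
# the updateRecords command schema are placeholders, not confirmed values.
import json

import requests

BASE_URL = "https://tamr.example.com/api/versioned"  # hypothetical host
AUTH = {"Authorization": "BasicCreds <base64-encoded user:password>"}

# Step 1: look up the dataset definition by name and extract its numeric id.
resp = requests.get(
    f"{BASE_URL}/v1/datasets",
    headers=AUTH,
    params={"filter": "name==my-dataset.csv"},
)
resp.raise_for_status()
# The response is assumed to be a list of dataset definitions whose "id"
# ends in "datasets/<number>"; keep the numeric part.
dataset_id = resp.json()[0]["id"].split("datasets/")[-1]

# Step 2 (optional): truncate the dataset so step 3 only inserts records.
requests.delete(
    f"{BASE_URL}/v1/datasets/{dataset_id}/records", headers=AUTH
).raise_for_status()

# Step 3: send one JSON command per line; CREATE inserts or updates a
# record, DELETE removes one. The command schema below is illustrative.
commands = [
    {"action": "CREATE", "recordId": "1", "record": {"name": "Alice"}},
    {"action": "CREATE", "recordId": "2", "record": {"name": "Bob"}},
    {"action": "DELETE", "recordId": "3"},
]
resp = requests.post(
    f"{BASE_URL}/v1/datasets/{dataset_id}:updateRecords",
    headers={**AUTH, "Content-Type": "application/json"},
    data="\n".join(json.dumps(c) for c in commands),
)
resp.raise_for_status()
```

Run this once per input dataset before moving on to the cluster pipeline below.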
Generating, Publishing, and Exporting Clusters
Complete this procedure after updating your input datasets.
To generate, publish, and export clusters:
1. Find the id of your project: `GET /v1/projects`
   When this API is used with a filter parameter such as `name==my-project`, it returns the project definition of the named project. From the API response, capture the numeric value after `projects/` in the `id` to use as the project id in subsequent steps, as well as the `unifiedDatasetName` of the desired project.
2. Update the unified dataset: `POST /v1/projects/{project}/unifiedDataset:refresh`
3. Wait for the operation: `GET /v1/operations/{operationId}`
   Using the operation id captured in step 2, poll the operation until `status.state=SUCCEEDED` is returned.
4. Generate pairs: `POST /v1/projects/{project}/recordPairs:refresh`
   Generate pairs using the project’s `{projectId}`.
5. (Optional) Generate high-impact pairs: `POST /v1/projects/{project}/highImpactPairs:refresh`
6. Wait for the operation: `GET /v1/operations/{operationId}`
7. Predict matching pairs: `POST /v1/projects/{project}/recordPairsWithPredictions:refresh`
8. Wait for the operation: `GET /v1/operations/{operationId}`
9. Generate record clusters: `POST /v1/projects/{project}/recordClusters:refresh`
   Apply the latest mastering model for the project `{project}` and generate clusters.
10. Wait for the operation: `GET /v1/operations/{operationId}`
11. (Optional) To continuously monitor model performance:
    a. Generate test records and clusters for users to curate: `POST /v1/projects/{project}/testRecords:refresh`
       Tamr Core uses cluster verification on these test records to compute cluster accuracy metrics.
    b. Wait for the operation: `GET /v1/operations/{operationId}`
    c. Generate high-impact training clusters for curation: `POST /v1/projects/{project}/trainingClusters:refresh`
       Users can filter to high-impact clusters in the UI to review and verify them.
    d. Wait for the operation: `GET /v1/operations/{operationId}`
    e. Compute cluster accuracy metrics: `POST /v1/projects/{project}/clustersAccuracy:refresh`
       Use this endpoint after users provide feedback on most of the test records. (Learn more about cluster accuracy metrics.)
    f. Wait for the operation: `GET /v1/operations/{operationId}`
12. Publish clusters: `POST /v1/projects/{project}/publishedClusters:refresh`
13. Wait for the operation: `GET /v1/operations/{operationId}`
14. Download cluster records: `GET /v1/projects/{project}/publishedClustersWithData/records`
    The response is a CSV file if `text/csv` is specified in the `Accept` header. Other supported types are `avro/binary` for an Avro file and `application/json` for JSON. A scripted sketch of the full sequence follows this list.
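The sketch below runs the non-optional steps end to end, again in Python with requests. The `wait_for` helper polls `GET /v1/operations/{operationId}` until `status.state` is `SUCCEEDED`. As in the previous example, the host, credentials, and the exact shape of the operation responses are assumptions to adapt to your deployment.

```python
# A minimal sketch of the non-optional pipeline steps, under the same
# assumptions as the previous example (host, credentials, and response
# shapes are placeholders, not confirmed values).
import time

import requests

BASE_URL = "https://tamr.example.com/api/versioned"  # hypothetical host
AUTH = {"Authorization": "BasicCreds <base64-encoded user:password>"}


def refresh(path: str) -> str:
    """POST a :refresh endpoint and return the id of the operation it starts."""
    resp = requests.post(f"{BASE_URL}{path}", headers=AUTH)
    resp.raise_for_status()
    # The operation "id" is assumed to end in "operations/<number>".
    return resp.json()["id"].split("operations/")[-1]


def wait_for(operation_id: str, poll_seconds: int = 10) -> None:
    """Poll the operation until status.state is SUCCEEDED."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/v1/operations/{operation_id}", headers=AUTH
        )
        resp.raise_for_status()
        state = resp.json()["status"]["state"]
        if state == "SUCCEEDED":
            return
        if state in ("FAILED", "CANCELED"):  # assumed terminal failure states
            raise RuntimeError(f"operation {operation_id} ended as {state}")
        time.sleep(poll_seconds)


# Step 1: resolve the project id from the project name.
resp = requests.get(
    f"{BASE_URL}/v1/projects", headers=AUTH, params={"filter": "name==my-project"}
)
resp.raise_for_status()
project_id = resp.json()[0]["id"].split("projects/")[-1]
project = f"/v1/projects/{project_id}"

# Steps 2-13, skipping the optional jobs: run each job in order and wait
# for it to succeed before starting the next.
for path in (
    f"{project}/unifiedDataset:refresh",
    f"{project}/recordPairs:refresh",
    f"{project}/recordPairsWithPredictions:refresh",
    f"{project}/recordClusters:refresh",
    f"{project}/publishedClusters:refresh",
):
    wait_for(refresh(path))

# Step 14: download the published clusters as CSV.
resp = requests.get(
    f"{BASE_URL}{project}/publishedClustersWithData/records",
    headers={**AUTH, "Accept": "text/csv"},
)
resp.raise_for_status()
with open("published_clusters.csv", "wb") as f:
    f.write(resp.content)
```

Running the jobs strictly in sequence keeps the example simple; each `:refresh` job depends on the output of the one before it, so there is nothing to gain from issuing them concurrently.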
Alternate Methods of Data Ingestion and Export
This topic provides one method of data ingestion and data export. For other methods, see Exporting a Dataset to a Local File System and Uploading a Dataset into a Project.