Deleting Projects and Associated Downstream Datasets from Tamr

WARNING

  • Once a project has been deleted, it cannot be recovered.

  • Be careful when deleting projects within a chain of projects, e.g.
    Schema Mapping Project → Mastering Project → Golden Record Project

  • Deleting an upstream project (in this case, the schema mapping or mastering project) can have unintended and irreversible consequences for the projects downstream of it.

    Recommendation: Take a backup before deleting any project, especially one that is part of a chain of projects.
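Before deleting a project in a chain, it can help to list everything that sits downstream of it. The following is a minimal sketch of that check, not a Tamr feature: the CHAIN mapping and the downstream_projects helper are hypothetical names introduced for illustration, and the chain itself is written out by hand from the example above.

```python
from collections import deque

# Hypothetical, hand-written description of the chain from the example above.
# Each key is a project; its value lists the projects it feeds directly.
CHAIN = {
    "Schema Mapping Project": ["Mastering Project"],
    "Mastering Project": ["Golden Record Project"],
    "Golden Record Project": [],
}


def downstream_projects(chain, project):
    """Return every project that directly or indirectly depends on `project`."""
    affected = []
    seen = {project}
    queue = deque(chain.get(project, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already recorded; also guards against cycles
        seen.add(current)
        affected.append(current)
        queue.extend(chain.get(current, []))
    return affected


# Projects affected if the schema mapping project were deleted.
print(downstream_projects(CHAIN, "Schema Mapping Project"))
```

If the returned list is not empty, the deletion affects other projects, and a backup is worth taking first.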

Completely removing a project and the datasets derived from it involves two steps:

  1. Deleting the Projects
  2. Deleting the Datasets

Note: you must delete the project before you can delete the datasets derived from it.

Deleting Projects

Deleting Projects not within Chained Projects

In this example, we are trying to delete the project called Customer 360. Before deleting a project, note down the name of the unified dataset that is associated with the project.

You can check this by opening the unified dataset tab within your project. The unified dataset name is shown at the top left of the browser window. In this case, the name of the unified dataset is customer_view.
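The name can also be read from a project listing rather than from the UI. The sketch below assumes the listing is a JSON array of project objects with "name" and "unifiedDatasetName" fields. That shape is an assumption to verify against your Tamr version, and both sample_projects and the unified_dataset_name helper are hypothetical. No request is sent, because the function only inspects data you have already retrieved.

```python
def unified_dataset_name(projects, project_name):
    """Return the unified dataset name recorded for `project_name`.

    `projects` is assumed to be a list of dicts with "name" and
    "unifiedDatasetName" keys (an assumption about the listing format).
    """
    for project in projects:
        if project.get("name") == project_name:
            return project.get("unifiedDatasetName")
    raise LookupError(f"No project named {project_name!r}")


# Hypothetical listing that mirrors the example in this article.
sample_projects = [
    {"id": "1", "name": "Customer 360", "unifiedDatasetName": "customer_view"},
]

print(unified_dataset_name(sample_projects, "Customer 360"))
```

Record the value somewhere outside Tamr before deleting the project, since it is needed later to find the dataset ID.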

Proceed with deleting the project - refer to this link for deleting projects.

Deleting Mastering Projects with a Golden Records Project

This example covers what to consider when deleting a mastering project that has a golden record project downstream, as shown in the figure below.

Deleting the mastering project in this example makes the source data for the golden record project (the published clusters dataset from the mastering project) static. You will therefore no longer be able to adjust your data, e.g. by merging or splitting clusters, amending the mastering model, or importing fresh data.

You will, however, still be able to amend your golden record rules and make value overrides.

Deleting Datasets

Completely removing all the datasets associated with a mastering project requires the use of APIs. The same APIs can also be used to remove all datasets associated with other project types, e.g. schema mapping, categorization or golden record projects.

WARNING:

  • Extreme caution must be taken when deleting datasets using APIs.
  • Ensure you are deleting the correct dataset before triggering any APIs.
  • Once this is triggered, the changes are irreversible. As with all serious modifications in Tamr, ensure that a backup is taken before executing this action.

  1. Log in to Tamr using a web browser and go to the Swagger docs:

<http://hostname:9100/docs>.

Click on the versioned tab at the top so you can see the screenshot above.

  2. Use the GET /v1/datasets API to get a list of all the datasets in Tamr. Click Try it out!

When you obtain a response, press Ctrl + F and find the unified dataset whose name you noted down before deleting the project.

  3. Note down the ID corresponding to that dataset, as shown below:

In this case, the ID is 9.

  4. Now scroll down to the following API and click on it: DELETE /v1/datasets/{datasetId}

  5. Enter the ID that you noted in Step 3 (in this case, 9) in the Dataset ID field. Make sure to set the cascade option to True. Click Try it out!

This will delete all the datasets that are dependent on the unified dataset (these are the datasets you had trouble deleting via the UI). Once the call completes, those datasets will no longer be listed in the dataset catalog.
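The same lookup and delete can be scripted instead of clicked through in Swagger. The following is a minimal sketch, not an official client. It makes several assumptions: that the versioned API is served under /api/versioned on the same host and port as the docs page, that cascade is passed as a query parameter, and that a dataset's "id" field may be either a bare number or a URI ending in the number. BASE_URL, find_dataset_id, build_delete_request and the sample listing are hypothetical names and data. Authentication is deliberately left out, and the request is only built, never sent.

```python
import urllib.parse
import urllib.request

# Assumed API prefix; adjust to match your deployment.
BASE_URL = "http://hostname:9100/api/versioned"


def find_dataset_id(datasets, name):
    """Return the ID of the single dataset called `name` in a dataset listing."""
    matches = [d for d in datasets if d.get("name") == name]
    if len(matches) != 1:
        # Refuse to guess: deleting the wrong dataset is irreversible.
        raise LookupError(f"Expected exactly one dataset named {name!r}, found {len(matches)}")
    # Accept both a bare ID ("9" or 9) and a URI whose last segment is the ID.
    return str(matches[0]["id"]).rsplit("/", 1)[-1]


def build_delete_request(base_url, dataset_id, cascade=True):
    """Build (but do not send) the DELETE request for a dataset."""
    query = urllib.parse.urlencode({"cascade": str(cascade).lower()})
    url = f"{base_url}/v1/datasets/{dataset_id}?{query}"
    return urllib.request.Request(url, method="DELETE")


# Hypothetical listing, shaped like a response from GET /v1/datasets.
sample_listing = [
    {"id": "8", "name": "customers_source"},
    {"id": "9", "name": "customer_view"},
]

dataset_id = find_dataset_id(sample_listing, "customer_view")
request = build_delete_request(BASE_URL, dataset_id)
print(request.get_method(), request.full_url)
```

Review the printed method and URL before doing anything else. To actually perform the deletion, you would add your credentials to the request and pass it to urllib.request.urlopen, and only after taking a backup.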