How to Clear and Reindex Elasticsearch

This article is relevant for Tamr versions 2019.023 and above.

Level of complexity: Complex (for advanced users)

Use case: Clearing and reindexing Elasticsearch (ES) ensures that the data in ES has the correct content and structure and is in sync with what is in Postgres and/or HBase. These systems can fall out of sync when the data content or structure changes dramatically (for example, during an upgrade or while troubleshooting complex issues).

Important: Take a Tamr backup before applying the resolution suggested below.

How-to guide for clearing and reindexing Elasticsearch

Option 1: Clearing all ES data

This process manually deletes all ES indexes.

This process applies only to single-node ES clusters and does not work in cloud-native deployments (for example, Tamr on GCP) where a shared ES cluster is used.

  1. Stop Tamr (do not stop dependencies).
  2. Run the following delete command:
    curl -X DELETE localhost:9200/_all
    If ES is not running on the default port, replace 9200 with the value of the port in the TAMR_ES_APIHOST configuration variable.
    Important: This command leaves the Tamr UI in an unusable state until ES is repopulated via the reindex APIs (step 4).
  3. Start Tamr.
  4. Run the following two reindexing APIs to repopulate Elasticsearch, in this order:
    • reindex/all-datascale
    • reindex/all-humanscale

  5. Update golden records, following these instructions.
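
The delete step above can be wrapped in a small shell helper so that the port is set in one place and the destructive call needs an explicit confirmation. This is a minimal sketch, not part of Tamr: ES_HOST, ES_PORT, es_url, list_indexes, and delete_all_indexes are illustrative names, and ES_PORT should be set to whatever port your ES instance actually uses. The listing call relies on the standard Elasticsearch _cat/indices API.

```shell
#!/usr/bin/env bash
# Sketch only: ES_HOST and ES_PORT are illustrative shell variables,
# not Tamr configuration variables.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"   # change if ES is not on the default port

# Build the URL for an ES path, e.g. "es_url _all" prints localhost:9200/_all
es_url() {
  printf '%s:%s/%s\n' "$ES_HOST" "$ES_PORT" "$1"
}

# List the indexes currently held in ES (standard _cat/indices API).
list_indexes() {
  curl -sS "$(es_url '_cat/indices?v')"
}

# Delete every index. Refuses to run unless the caller passes "yes".
delete_all_indexes() {
  if [ "${1:-}" != "yes" ]; then
    echo "refusing to delete: pass 'yes' to confirm" >&2
    return 1
  fi
  curl -sS -X DELETE "$(es_url _all)"
}

# Usage (only after taking a backup and stopping Tamr):
#   list_indexes            # see what is about to be removed
#   delete_all_indexes yes  # same request as the curl command in step 2
#   list_indexes            # should now show no indexes
```

Listing the indexes before and after the delete gives a quick check that the command reached the intended ES instance before Tamr is restarted.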

Option 2: Clearing ES data for individual projects

This process cleans up ES indexes for individual projects.

  1. Stop Tamr (do not stop dependencies).
  2. Run the following delete command, replacing <id> with the numeric ID of the project:
    curl -X DELETE localhost:9200/tamr_project_<id>
    If ES is not running on the default port, replace 9200 with the value of the port in the TAMR_ES_API_PORT configuration variable.
  3. Start Tamr.
  4. Make a small, insignificant change so that the unified dataset is re-materialized. For example:
    select *;
  5. Rerun the pipelines of the project.
  6. Update golden records, following these instructions.
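
For the per-project delete in step 2, a small guard can help avoid sending a malformed index name to ES. The following is a hedged sketch: project_index_url and delete_project_index are hypothetical helper names, ES_PORT is an illustrative shell variable, and the project ID 12 in the usage comment is made up. It assumes the index naming shown above (tamr_project_<id>).

```shell
#!/usr/bin/env bash
# Sketch only: ES_PORT is an illustrative shell variable.
ES_PORT="${ES_PORT:-9200}"   # change if ES is not on the default port

# Print the index URL for a project, or fail if the ID is not purely numeric.
project_index_url() {
  case "$1" in
    ''|*[!0-9]*)
      echo "project ID must be numeric, got: '$1'" >&2
      return 1
      ;;
  esac
  printf 'localhost:%s/tamr_project_%s\n' "$ES_PORT" "$1"
}

# Delete the ES index for a single project.
delete_project_index() {
  local url
  url="$(project_index_url "$1")" || return 1
  curl -sS -X DELETE "$url"
}

# Usage (only after taking a backup and stopping Tamr; 12 is a made-up ID):
#   delete_project_index 12
```

Rejecting empty or non-numeric IDs means a typo cannot turn the request into a delete against a different or wildcard index name.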