How to Clear and Reindex Elasticsearch
This article is relevant for Tamr versions 2019.023 and above.
Level of complexity: Complex (for advanced users)
Use case: Clearing and reindexing Elasticsearch (ES) ensures that the data in Elasticsearch has the correct content and structure and is in sync with the data in Postgres and/or HBase. These systems can fall out of sync when the data content or structure changes dramatically (for example, during an upgrade or while troubleshooting complex issues).
Important: Take a Tamr backup before applying the resolution suggested below.
How-to-guide for clearing and reindexing Elasticsearch
Option 1: Clearing all ES data
This process manually deletes all ES indexes.
This process applies only to single-node ES clusters and does not work in cloud-native deployments (for example, Tamr on GCP) where a shared ES cluster is used.
- Stop Tamr (do not stop dependencies).
- Run the following delete command:
curl -X DELETE localhost:9200/_all
If ES is not running on the default port, replace 9200 with the value of the port in the TAMR_ES_APIHOST configuration variable.
Confirm that the response is "true" (on success, Elasticsearch returns {"acknowledged":true}). If the response is "false", run the delete command again.
Important: This command leaves the Tamr UI in an unusable state until ES is repopulated via the reindex APIs (step 4).
- Start Tamr.
- Run the following two reindexing APIs to repopulate Elasticsearch, in this order:
reindex/all-datascale
reindex/all-humanscale
- Update golden records, following these instructions.
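The delete-and-confirm step above can be sketched as a small shell helper. This is a minimal sketch, not part of Tamr: the ES_HOST and ES_PORT variables and the function names are illustrative assumptions, and the check assumes the response body is the {"acknowledged":true} JSON that the Elasticsearch delete index API returns on success.

```shell
#!/bin/sh
# Sketch only: ES_HOST, ES_PORT, and the function names are hypothetical.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"

# Build the URL for an index name (or _all for every index).
es_url() {
  printf 'http://%s:%s/%s' "$ES_HOST" "$ES_PORT" "$1"
}

# Succeed only if the response body reports "acknowledged": true.
is_acknowledged() {
  printf '%s' "$1" | grep -Eq '"acknowledged"[[:space:]]*:[[:space:]]*true'
}

# Delete all indexes, retrying up to three times if the delete
# is not acknowledged (the "attempt the delete command again" step).
delete_all_indexes() {
  for attempt in 1 2 3; do
    response="$(curl -sS -X DELETE "$(es_url _all)")" || response=""
    if is_acknowledged "$response"; then
      echo "Delete acknowledged on attempt ${attempt}"
      return 0
    fi
    echo "Attempt ${attempt} not acknowledged: ${response}" >&2
    sleep 2
  done
  return 1
}
```

After stopping Tamr, calling delete_all_indexes would issue the same request as the curl command above. The script only defines functions, so sourcing it does not delete anything by itself.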
Option 2: Clearing ES data for individual projects
This process cleans up ES indexes for individual projects.
- Stop Tamr (do not stop dependencies).
- Run the following delete command, supplying the numeric ID of the project as <id>:
curl -X DELETE localhost:9200/tamr_project_<id>
If ES is not running on the default port, replace 9200 with the value of the port in the TAMR_ES_API_PORT configuration variable.
- Start Tamr.
- Make a small, insignificant change so that the unified dataset is re-materialized. For example:
select *;
- Rerun the pipelines of the project.
- Update golden records, following these instructions.
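Because the per-project delete targets an index named after the project ID, it may be worth validating the ID before building the request. The following is a minimal sketch under stated assumptions: ES_PORT and both function names are hypothetical, and only the tamr_project_<id> naming comes from the command above.

```shell
#!/bin/sh
# Sketch only: ES_PORT and the function names are hypothetical.
ES_PORT="${ES_PORT:-9200}"

# Print the index name for a numeric project ID; fail on anything else.
project_index_name() {
  case "$1" in
    ''|*[!0-9]*)
      echo "Project ID must be numeric: '$1'" >&2
      return 1
      ;;
    *)
      printf 'tamr_project_%s' "$1"
      ;;
  esac
}

# Delete the ES index for a single project.
delete_project_index() {
  index="$(project_index_name "$1")" || return 1
  curl -sS -X DELETE "localhost:${ES_PORT}/${index}"
}
```

For example, delete_project_index 12 would send a DELETE request for tamr_project_12. Rejecting non-numeric input keeps a stray wildcard or path fragment from widening the delete beyond one project.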