Factory Reset for Azure Cloud-Native Tamr Deployments

Follow these instructions if you want to reset Tamr to a clean, empty state in an Azure Cloud-Native deployment, without destroying and recreating all of the Azure resources.

Tamr’s configuration and state are kept in the following places. Each must be reset or at least renamed.

  • Tamr configuration
  • Postgres
  • Elasticsearch
  • HDInsight HBase
  • ADLS

Tamr Configuration

Tamr’s configuration in Zookeeper is managed by the unify-admin.sh utility. Tamr’s dependencies must be running to perform this step; Tamr itself may be either running or stopped.

Unless otherwise specified, all steps must be executed as the Tamr functional user.

First, back up the current userDefined configuration and the full configuration so that you can refer to the settings later:

cd <tamr_home>/tamr/utils

./unify-admin.sh config:get --userDefined > config_custom_<date>.yaml

./unify-admin.sh config:get > config_all_<date>.yaml
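Before resetting anything, it may be worth confirming that both backup files were actually written. The following is a minimal sketch, not a Tamr utility: check_backup is a hypothetical helper name, and it only tests that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper (not part of Tamr): succeed only if every file
# passed as an argument exists and is non-empty.
check_backup() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "backup missing or empty: $f" >&2
      return 1
    fi
  done
  return 0
}
```

For example, running check_backup config_custom_<date>.yaml config_all_<date>.yaml returns a non-zero status if either file is missing or empty.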

To be able to access the HBase shell later, create a directory on the Tamr VM named hbase_conf_noenv and extract the HBase configuration files from Tamr’s Zookeeper into it:

cd <tamr_home>/tamr/utils

./unify-admin.sh zk:get --zk-path zk://localhost:21281/tamr/unify001/hbase-conf/hbase-site.xml > /path/to/hbase_conf_noenv/hbase-site.xml

./unify-admin.sh zk:get --zk-path zk://localhost:21281/tamr/unify001/hbase-conf/hbase-policy.xml > /path/to/hbase_conf_noenv/hbase-policy.xml

Once you have verified that the configuration is backed up, reset the configuration:

./unify-admin.sh config:reset

Then stop Tamr and its dependencies.

cd ..

./stop-unify.sh

./stop-dependencies.sh

Postgres

Clear out the metadata in Postgres by dropping and recreating the doit database.

Switch to the Postgres user and run psql (Postgres command line interface):

sudo su - postgres

psql

In psql, drop the database and recreate it.

DROP DATABASE doit;

CREATE DATABASE doit WITH OWNER tamr;

exit

Log out of the postgres user session to switch back to the Tamr functional user.

exit

Elasticsearch

Clear out Tamr’s data that has been indexed to Elasticsearch. Check the Tamr configuration variable TAMR_ES_APIHOST in config_all_<date>.yaml to find the host and port.

curl -X DELETE http://<TAMR_ES_APIHOST>/_all
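One way to look up a variable such as TAMR_ES_APIHOST in the backed-up configuration is sketched below. This is an illustration only: get_config_value is a hypothetical helper name, and it assumes the backup file lists one KEY: "value" pair per line, as in the configuration examples in this document.

```shell
# Hypothetical helper (not part of Tamr): print the value of KEY from a
# flat "KEY: value" YAML file, with surrounding double quotes removed.
# Assumes one key per line; prints nothing if the key is absent.
get_config_value() {
  key=$1
  file=$2
  sed -n "s/^${key}:[[:space:]]*//p" "$file" \
    | sed 's/^"//; s/"[[:space:]]*$//' \
    | head -n 1
}
```

For example, get_config_value TAMR_ES_APIHOST config_all_<date>.yaml prints the host and port to use in the curl command. The same helper can be used to look up TAMR_HBASE_NAMESPACE.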

HDInsight HBase

Tamr stores its data in HBase. You can either:

  • Disable and drop the old tables in HBase. This is the recommended option because it removes the old data.
  • Configure Tamr to use a new HBase namespace. This ignores the previous data but does not remove it. Even if you drop the previous tables, you can still choose to create a new namespace.

To drop the previous tables without ssh access:

If you don’t have ssh access to HBase, you can run the HBase shell from the Tamr VM.

To log into the HBase shell, run the following commands. The hbase_conf_noenv directory was created in the Tamr Configuration section, and it must contain the files hbase-site.xml and hbase-policy.xml. It must NOT contain the file hbase-env.sh.

cd <tamr_home>/hbase-1.3.1/bin

export HBASE_CONF_DIR=/path/to/hbase_conf_noenv

export JAVA_HOME=<tamr_home>/openjdk-8u222

./hbase shell

Note: For Tamr Core v2024.002.0 and later, replace openjdk-8u222 with openjdk-17.0.11+9.
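Before launching the shell, a quick check of the configuration directory can catch a missing file or a stray hbase-env.sh. The sketch below is an assumption-laden illustration: check_hbase_conf is a hypothetical helper name, not part of Tamr or HBase, and it only checks file presence.

```shell
# Hypothetical helper (not part of Tamr or HBase): verify that the
# directory holds the two required files and does not hold hbase-env.sh.
check_hbase_conf() {
  dir=$1
  [ -s "$dir/hbase-site.xml" ]   || { echo "missing hbase-site.xml" >&2; return 1; }
  [ -s "$dir/hbase-policy.xml" ] || { echo "missing hbase-policy.xml" >&2; return 1; }
  [ ! -e "$dir/hbase-env.sh" ]   || { echo "hbase-env.sh must not be present" >&2; return 1; }
  return 0
}
```

For example, check_hbase_conf /path/to/hbase_conf_noenv returns a non-zero status and prints the reason if the directory is not set up as described.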

Use the HBase shell to first disable all the tables in the namespace, then to drop all the tables. Note that disabling and dropping all the tables may take a long time. The HBase namespace is given in the Tamr configuration variable TAMR_HBASE_NAMESPACE.

disable_all '<TAMR_HBASE_NAMESPACE>.*'

drop_all '<TAMR_HBASE_NAMESPACE>.*'

exit

To drop the previous tables using ssh access:

If you do have ssh access to HDInsight HBase, you can use that directly to run the HBase commands.

First ssh into an HBase master node. You can find the names of the master nodes in the Ambari UI, or try the standard ssh URL with your CLUSTERNAME (example below). The username and password for the Ambari UI were specified in the Terraform configuration as gateway_username and gateway_password. The ssh username and ssh key were specified in the Terraform configuration as username and ssh_public_key.

ssh <username>@<CLUSTERNAME>-ssh.azurehdinsight.net

Use the HBase shell to first disable all the tables in the namespace, then to drop all the tables. Note that disabling and dropping all the tables may take a long time. The HBase namespace is given in the Tamr configuration variable TAMR_HBASE_NAMESPACE.

hbase shell

disable_all '<TAMR_HBASE_NAMESPACE>.*'

drop_all '<TAMR_HBASE_NAMESPACE>.*'

exit

Then log out of the node.

To use a new HBase namespace:

When configuring Tamr, set a new value for TAMR_HBASE_NAMESPACE. See the section below on Configuring and Starting Tamr.

ADLS

Tamr uses ADLS to share files and logs with Databricks Spark, and also to store dataset exports requested in the Tamr UI. You can either delete the existing directories and files (from the Azure portal if you have access), or define new, unique paths in the Tamr configuration to ignore the previous files.

You can retrieve these values from config_custom_<date>.yaml. The Tamr configuration variables containing ADLS paths are:

TAMR_UNIFY_DATA_DIR

TAMR_JOB_SPARK_EVENT_LOGS_DIR

TAMR_JOB_DATABRICKS_WORKINGSPACE

For example, if these were your previous values, you could change “dev” to “dev2”:

TAMR_UNIFY_DATA_DIR: "/mnt/<container_name>/tamr/unify-data-dev"

TAMR_JOB_SPARK_EVENT_LOGS_DIR: "dbfs:/mnt/<container_name>/tamr/unify-data-dev/job/sparkEventLogs"

TAMR_JOB_DATABRICKS_WORKINGSPACE: "/FileStore/dev/jars"

See the section below on Configuring and Starting Tamr.
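If you choose new paths, the renaming can be applied to a copy of the backed-up configuration rather than edited by hand. The sketch below assumes the paths follow the example layout above (unify-data-<suffix> and /FileStore/<suffix>/); rename_adls_paths is a hypothetical helper name, and the output should be reviewed before use.

```shell
# Hypothetical helper (not part of Tamr): print a copy of a config file
# with the old suffix replaced by the new one, touching only the three
# ADLS-related variables. Assumes the path layout shown in the example.
rename_adls_paths() {
  old=$1
  new=$2
  src=$3
  sed -e "/^TAMR_UNIFY_DATA_DIR:/s#unify-data-${old}#unify-data-${new}#" \
      -e "/^TAMR_JOB_SPARK_EVENT_LOGS_DIR:/s#unify-data-${old}#unify-data-${new}#" \
      -e "/^TAMR_JOB_DATABRICKS_WORKINGSPACE:/s#/FileStore/${old}/#/FileStore/${new}/#" \
      "$src"
}
```

For example, rename_adls_paths dev dev2 config_custom_<date>.yaml writes the modified configuration to standard output, leaving the backup file unchanged. Lines for other variables pass through untouched.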

Configuring and Starting Tamr

At this point, you have a fresh installation of Tamr and need to set Tamr’s configuration. Follow the Tamr public documentation starting with Step 11: Configure Tamr.

Zookeeper must be running before you can set Tamr’s configuration, so start it first:

cd <tamr_home>/tamr

./start-zk.sh

You can make a copy of config_custom_<date>.yaml and use the previous custom configuration values as a starting point.

Make sure to change the values related to HBase namespace and ADLS paths if needed.

IMPORTANT: To change the Tamr system user password with TAMR_SYSTEM_PASSWORD, do NOT set that variable before first starting Tamr. Instead, follow the directions in the Tamr public docs to set the system user password first in the Tamr UI, then in Tamr’s configuration variable. If you previously had TAMR_SYSTEM_PASSWORD set in your custom configuration YAML file, comment it out or remove it when doing the first-time configuration of Tamr.
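When reusing the previous custom configuration, the password line can be commented out in the copy rather than in the backup. The following is a minimal sketch: comment_out_system_password is a hypothetical helper name, and it assumes the variable appears on its own line as TAMR_SYSTEM_PASSWORD: followed by the value.

```shell
# Hypothetical helper (not part of Tamr): print a copy of a config file
# with any TAMR_SYSTEM_PASSWORD line commented out. Other lines are
# passed through unchanged.
comment_out_system_password() {
  sed 's/^\([[:space:]]*\)\(TAMR_SYSTEM_PASSWORD:\)/\1# \2/' "$1"
}
```

For example, redirect the output of comment_out_system_password config_custom_<date>.yaml to a new file and use that file as the starting point for the first-time configuration.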