Tamr Documentation

Utilities for Validation and System-Wide Processes

Use utilities to run health checks and other system-wide processes on demand.

Validation Health Checks

You can use <tamr-home-directory>/tamr/utils/unify-admin.sh validate to check the current state of the Tamr software and configuration, and diagnose possible problems.

Validation Command Flags

The validate option has the following flags.

validate [-f <file>] [-h] [-l] [-v]

  • -f,--file <file> Writes the validation results to the specified file.
  • -h,--help Displays help for this command.
  • -l,--list Lists the validation groups, such as pre-dependency or pre-start, and the validation checks that run in each group.
  • -v,--verbose Enables verbose logging to the console.

List of Validation Checks

Tamr validation scripts include, but are not limited to, the following checks:

  • Tamr license. Verifies that the Tamr license is valid and fails if the license has expired. The script reports license information, such as the license key, the license type (evaluation or not), the creation and expiration dates, the customer name, and the description.
  • Operating System. Verifies that the current user is not the root user and prints diagnostic information about the operating system on which Tamr runs.
  • Memory usage. Verifies that the memory settings Tamr requires are sufficient for the operating system on which it runs.
  • Permissions, symbolic links, and disk space. For each directory configured for Tamr, the check walks down the path and checks permissions, the presence of symbolic links, and available storage space. It reports any directory it cannot access and warns you if any directory has insufficient storage space (typically, at least 1 GB should be available).
  • HBase external dependency. Verifies that HBase servers are running on the expected ports, that the ZooKeeper cluster required by HBase is running as expected, and that Tamr can connect to this cluster and retrieve its status. The script logs the diagnostic information that HBase provides.
  • PostgreSQL external dependency. Verifies that Tamr can connect to the PostgreSQL database, checks that the database version is compatible, and runs simple queries. The script also logs diagnostic information about whether the persistence and dataset schemas that Tamr requires are present, including their PostgreSQL migration status.
  • Elasticsearch external dependency. Verifies connectivity to the Elasticsearch cluster that powers the Tamr user interface and checks that this cluster is running. The script also verifies version compatibility and checks that the Elasticsearch data directory exists, has sufficient free space, and is readable and writable.
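Tamr runs these checks itself. If you want a quick manual approximation of a few of them on a host (for example, before installing), the following sketch checks for a non-root user, directory access, and free space. It is not Tamr's implementation: the function names are hypothetical, it checks only the directory itself rather than walking every component of the path, and the 1 GiB default is an assumption based on the "typically 1GB" guidance above.

```shell
#!/usr/bin/env bash
# Manual approximation of some checks listed above; not Tamr's implementation.

# Succeeds when the current user is not root (compare the Operating System check).
check_not_root() {
  [ "$(id -u)" -ne 0 ]
}

# Succeeds when directory $1 exists and is readable and writable.
# Prints a note on stderr if the path itself is a symbolic link.
check_dir_access() {
  local dir=$1
  [ -L "$dir" ] && echo "note: $dir is a symbolic link" >&2
  [ -d "$dir" ] && [ -r "$dir" ] && [ -w "$dir" ]
}

# Succeeds when the filesystem holding $1 has at least $2 KiB available.
# The default of 1048576 KiB (1 GiB) is an assumed threshold, not Tamr's exact limit.
check_free_space() {
  local dir=$1 min_kb=${2:-1048576} avail_kb
  # With -P (POSIX format) and -k (1024-byte units), field 4 of the
  # second line of df output is the available space.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

For example, `check_free_space /path/to/tamr/data || echo "less than 1 GiB free"`, where the path is a placeholder for one of your configured directories.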

Note: If set to true, the --skipEnvironmentValidation flag lets the upgrade process skip the system validation checks at the start of the upgrade command. Use this flag with caution: it lets the upgrade proceed with a potentially invalid configuration, which could cause the upgrade to fail. For more information, see Upgrading Tamr.

The following example runs the post-start validation group with verbose logging. The group name, post-start, is passed as an argument:

./unify-admin.sh validate -v post-start

DEBUG: LogEvents - Checking whether ZK config cluster running at localhost:21281 is healthy
DEBUG: LogEvents - Trying to run ZooKeeper command 'ruok' against zk://localhost:21281
DEBUG: LogEvents - Finished trying to run 'ruok'.
DEBUG: ValidateCommand - Loading all system configuration
DEBUG: ValidateCommand - Finished loading all system configuration
INFO : ValidateCommand - Running validation groups: [post-start]
DEBUG: ValidateCommand - Running post-start validations:
DEBUG: ValidateCommand - Running HealthChecksValidator.
DEBUG: ValidateCommand - Description: Runs the healthchecks for each of Tamr's microservices
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: procurify
DEBUG: HealthChecksValidator - procurify service is healthy!
DEBUG: HealthChecksValidator - procurify service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: transform
DEBUG: HealthChecksValidator - transform service is healthy!
DEBUG: HealthChecksValidator - transform service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: dataset
DEBUG: HealthChecksValidator - dataset service is healthy!
DEBUG: HealthChecksValidator - dataset service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: auth
DEBUG: HealthChecksValidator - auth service is healthy!
DEBUG: HealthChecksValidator - auth service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: dedup
DEBUG: HealthChecksValidator - dedup service is healthy!
DEBUG: HealthChecksValidator - dedup service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: preview
DEBUG: HealthChecksValidator - preview service is healthy!
DEBUG: HealthChecksValidator - preview service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: recipe
DEBUG: HealthChecksValidator - recipe service is healthy!
DEBUG: HealthChecksValidator - recipe service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: pubapi
DEBUG: HealthChecksValidator - pubapi service is healthy!
DEBUG: HealthChecksValidator - pubapi service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: taxonomy
DEBUG: HealthChecksValidator - taxonomy service is healthy!
DEBUG: HealthChecksValidator - taxonomy service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: persist
DEBUG: HealthChecksValidator - persist service is healthy!
DEBUG: HealthChecksValidator - persist service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: match
DEBUG: HealthChecksValidator - match service is healthy!
DEBUG: HealthChecksValidator - match service healthcheck finished.
INFO : HealthChecksValidator -
DEBUG: ValidateCommand - Finished HealthChecksValidator.
DEBUG: ValidateCommand - Finished post-start validations.
INFO : ValidateCommand -
INFO : ValidateCommand - *=======================*
INFO : ValidateCommand - | Validation succeeded! |
INFO : ValidateCommand - *=======================*
INFO : ValidateCommand -

In contrast, this example shows the output when the health check for Elasticsearch fails:

...
DEBUG: HealthChecksValidator - Checking health of service: dedup
WARN : HealthChecksValidator - Unhealthy service: dedup
INFO : HealthChecksValidator - 
WARN : HealthChecksValidator - Healthcheck failed for service ElasticSearchCluster: = Result{isHealthy=false, message=java.net.ConnectException: Connection refused, error=java.lang.Throwable: java.net.ConnectException: Connection refused, timestamp=2020-03-11T04:54:21.731Z}
DEBUG: HealthChecksValidator - dedup service healthcheck finished.
...
INFO : ValidateCommand - *===============================================*
INFO : ValidateCommand - | Encountered 5 warnings and 0 critical errors. |
INFO : ValidateCommand - *===============================================*
INFO : ValidateCommand - 
WARN : ValidateCommand - Validation 'post-start':
WARN : ValidateCommand - [WARNING] Unhealthy service: procurify
WARN : ValidateCommand - [WARNING] Healthcheck failed for service UnifyElasticSearchDataCluster: = Result{isHealthy=false, message=java.net.ConnectException: Connection refused, error=java.lang.Throwable: java.net.ConnectException: Connection refused, timestamp=2020-03-11T04:54:21.577Z}
WARN : ValidateCommand - [WARNING] Unhealthy service: dedup
WARN : ValidateCommand - [WARNING] Healthcheck failed for service ElasticSearchCluster: = Result{isHealthy=false, message=java.net.ConnectException: Connection refused, error=java.lang.Throwable: java.net.ConnectException: Connection refused, timestamp=2020-03-11T04:54:21.731Z}
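When a run produces warnings, it can help to save the console output to a file (for example, `./unify-admin.sh validate -v post-start 2>&1 | tee validate.log`) and extract the failing names. The sketch below assumes the message formats shown in the example output above ("Unhealthy service: <name>" and "Healthcheck failed for service <name>:"); these are log messages, not a documented interface, and could change between Tamr versions. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize a saved validate log read from stdin.

# Prints each unhealthy service name once, in order of first appearance.
unhealthy_services() {
  sed -n 's/.*Unhealthy service: \([A-Za-z0-9_-]*\).*/\1/p' | awk '!seen[$0]++'
}

# Prints each failed health check name (for example, the Elasticsearch
# clusters in the output above) once, in order of first appearance.
failed_healthchecks() {
  sed -n 's/.*Healthcheck failed for service \([A-Za-z0-9_-]*\):.*/\1/p' | awk '!seen[$0]++'
}
```

Usage: `unhealthy_services < validate.log` and `failed_healthchecks < validate.log`. Removing duplicates matters because, as in the example, the same warning appears both inline and again in the final summary.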

Running Validation Checks

Validation checks run by default before an upgrade, but you can also run them at any time.

To run health check validation scripts:

  1. Run the administrative utility with the validate option:
<tamr-home-directory>/tamr/utils/unify-admin.sh validate
  2. Optional: To obtain detailed output, run <tamr-home-directory>/tamr/utils/unify-admin.sh validate -v.
  3. Optional: To write a report of the validation checks to a specific file, run <tamr-home-directory>/tamr/utils/unify-admin.sh validate -f [filename], where [filename] is the path to the output file.
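If you run validation regularly, a small wrapper can keep a timestamped copy of each run. This is a sketch under stated assumptions: TAMR_HOME is a hypothetical variable standing in for <tamr-home-directory> (Tamr does not set it), and success is judged by the "Validation succeeded!" banner from the example output above, because this page does not document the script's exit status.

```shell
#!/usr/bin/env bash
# Sketch: run validate verbosely and keep a copy of the console output.
# TAMR_HOME is a hypothetical variable standing in for <tamr-home-directory>.

run_tamr_validation() {
  if [ -z "${TAMR_HOME:-}" ]; then
    echo "TAMR_HOME is not set (expected <tamr-home-directory>)" >&2
    return 2
  fi
  local admin="$TAMR_HOME/tamr/utils/unify-admin.sh"
  # Optional first argument: log path. Default name includes a timestamp.
  local log_file="${1:-validate-$(date +%Y%m%dT%H%M%S).log}"

  # Merge stderr into stdout so the saved log matches what the console shows.
  "$admin" validate -v 2>&1 | tee "$log_file"

  # Judge success by the banner shown in the example output, not the exit status.
  grep -q 'Validation succeeded!' "$log_file"
}
```

Usage: `TAMR_HOME=<tamr-home-directory> run_tamr_validation` returns 0 only if the success banner appears in the saved log.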

Dataset Cleanup

You can use the CleanupIncompletelyDeletedProjects maintenance utility to convert derived datasets for deleted projects into source datasets. Tamr administrators can then either delete these datasets or materialize them for use in other projects in the Tamr UI. The utility also garbage collects orphaned recipes associated with deleted projects.

Note: This script can only be used to clean up datasets backed by HBaseStorageDriver.

To run the dataset cleanup utility:

  1. Run the administrative utility with the maintenance option and specify the CleanupIncompletelyDeletedProjects script as follows:
./unify-admin.sh maintenance --script CleanupIncompletelyDeletedProjects

CleanupIncompletelyDeletedProjects is interactive. A log showing a typical series of interactions follows.

~/tamr/utils$ ./unify-admin.sh maintenance --script CleanupIncompletelyDeletedProjects

Executing script: CleanupIncompletelyDeletedProjects
Attempting to run CleanupIncompletelyDeletedProjects.Analyze

Script: CleanupIncompletelyDeletedProjects
Script Added at: 2020-09-24T12:00:00Z
Description: Garbage Collect all orphaned recipes associated with deleted projects and detach derived datasets associated with deleted projects into source datasets

Gathering orphaned recipes and datasets
Project with ID 3 is incompletely deleted with the following orphaned dataset(s):
Dataset "categ_test_unified_dataset" with ID: 60
Project with ID 7 is incompletely deleted with the following orphaned dataset(s):
Dataset "sm_test_2_unified_dataset" with ID: 86
Would you like to specify inputs and execute the script? [Y/n]
y
Deleted Project with ID: 3 still has derived datasets associated with it, would you like to turn the derived datasets into source datasets? [Y/n]
y
Deleted Project with ID: 7 still has derived datasets associated with it, would you like to turn the derived datasets into source datasets? [Y/n]
y
Attempting to run CleanupIncompletelyDeletedProjects.Execute

Script: CleanupIncompletelyDeletedProjects
Script Added at: 2020-09-24T12:00:00Z
Description: Garbage Collect all orphaned recipes associated with deleted projects and detach derived datasets associated with deleted projects into source datasets

Gathering orphaned recipes and datasets
Project with ID 3 is incompletely deleted with the following orphaned dataset(s):
Dataset "categ_test_unified_dataset" with ID: 60
Project with ID 7 is incompletely deleted with the following orphaned dataset(s):
Dataset "sm_test_2_unified_dataset" with ID: 86
Cleaned up project with ID: 3, its following derived datasets are converted to source datasets:
[60]
Cleaned up project with ID: 7, its following derived datasets are converted to source datasets:
[86]
The following orphaned Recipes associated with deleted projects have been garbage collected:
[]
The following orphaned Recipes failed to be garbage collected:
[]
Derived datasets associated with the following deleted projects have been turned into source datasets:
[3, 7]
The following projects failed to be cleaned up:
[]
Finished running: CleanupIncompletelyDeletedProjects. See recipe logs for verbose logging.
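Because the utility is interactive, you may want to save the session output to a file and confirm afterward that nothing failed. The sketch below reads that saved output from stdin and relies on the summary headers shown in the example above, each followed by a bracketed list on the next line; the function names are hypothetical, and the headers are log messages rather than a documented interface.

```shell
#!/usr/bin/env bash
# Sketch: inspect the summary printed by CleanupIncompletelyDeletedProjects.

# Prints the line that follows the exact header line $1 (the bracketed list).
list_after_header() {
  awk -v header="$1" 'found { print; exit }; $0 == header { found = 1 }'
}

# Succeeds when the summary reports at least one failed project or recipe.
cleanup_had_failures() {
  local log projects recipes
  log=$(cat)
  projects=$(printf '%s\n' "$log" | list_after_header 'The following projects failed to be cleaned up:')
  recipes=$(printf '%s\n' "$log" | list_after_header 'The following orphaned Recipes failed to be garbage collected:')
  [ -n "$projects" ] && [ "$projects" != "[]" ] && return 0
  [ -n "$recipes" ] && [ "$recipes" != "[]" ]
}
```

For example, `cleanup_had_failures < cleanup.log && echo "review the failed items"`, where cleanup.log is a hypothetical file holding the saved session.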

For information about other maintenance utilities that are available for unify-admin.sh, see Managing Primary Keys.

Transformation Tools

You can use the ./tamr/libs/transform-tools.jar file to make system-wide updates to transformations.

For example, suppose your organization decides to round numbers down instead of up across the whole system. Rather than manually changing every transformation that uses the round() function, you can use the function-replacer option of transform-tools.jar to replace this function in every transformation.

Another example: Tamr releases an improved version of a function that changes its syntax or results, and makes the previous version available under a new name. With function-replacer, you can change all instances of top() to legacy.top() to keep using the previous version.

To run the function replacer:

Run java -jar ./tamr/libs/transform-tools.jar function-replacer and specify the old and new function names, the Tamr URL (--tamr-url), and the password option (-p), all of which are required. The following example replaces the top() function with legacy.top().

java -jar ./tamr/libs/transform-tools.jar function-replacer --new=legacy.top --old=top --tamr-url=http://localhost:9100 -p
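If you need to apply several replacements, you can loop over old=new pairs and run function-replacer once per pair. This sketch uses only the options shown in the usage output below; TRANSFORM_TOOLS_JAR, TAMR_URL, and DRY_RUN are hypothetical variables for this example. Because -p is passed without a value, the tool presumably prompts for the password on each run; that is an assumption based on the -p[=<password>] usage line.

```shell
#!/usr/bin/env bash
# Sketch: run function-replacer once for each "old=new" argument.
# TRANSFORM_TOOLS_JAR, TAMR_URL, and DRY_RUN are hypothetical variables.

replace_functions() {
  local jar="${TRANSFORM_TOOLS_JAR:-./tamr/libs/transform-tools.jar}"
  local url="${TAMR_URL:-http://localhost:9100}"
  local pair old new
  for pair in "$@"; do
    case $pair in
      *=*) ;;
      *) echo "expected old=new, got: $pair" >&2; return 2 ;;
    esac
    old=${pair%%=*}   # text before the first "="
    new=${pair#*=}    # text after the first "="
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Print the command instead of running it.
      echo java -jar "$jar" function-replacer --new="$new" --old="$old" --tamr-url="$url" -p
    else
      # Stop at the first replacement that fails.
      java -jar "$jar" function-replacer --new="$new" --old="$old" --tamr-url="$url" -p || return
    fi
  done
}
```

For example, `DRY_RUN=1 replace_functions top=legacy.top` prints the same command as the example above, so you can review it before running without DRY_RUN.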

To get help for the function replacer, run java -jar ./tamr/libs/transform-tools.jar function-replacer without any options.

java -jar ./tamr/libs/transform-tools.jar function-replacer

Missing required options [--password, --old=<oldFunc>, --new=<newFunc>, --tamr-url=<host>]
Usage: tools function-replacer [-hV] -p[=<password>] -n=<newFunc> -o=<oldFunc>
                               --tamr-url=<host> [-u=<username>]
Replaces a transformation function from one function to another in all projects
with unified datasets.
  -h, --help              Show this help message and exit.
  -n, --new=<newFunc>     New function name to replace old function name'
  -o, --old=<oldFunc>     Old function name to be replaced
  -p, --password[=<password>]

      --tamr-url=<host>   Full url in unify - Ex:'http://localhost:9100'
  -u, --user=<username>   User name in unify
  -V, --version           Print version information and exit.
