Utilities for Validation and System Processes

Use utilities to run health checks and other system-wide processes on demand.

Validation Health Checks

You can use <tamr-home-directory>/tamr/utils/unify-admin.sh validate to check the current state of the Tamr Core software and configuration, and diagnose possible problems.

Validation Command Flags

The validate option has the following flags.

validate [-f <file>] [-h] [-l] [-v]

  • -f,--file <file> The file to which you would like to write validation results.
  • -h,--help Provides help for this command (this information).
  • -l,--list Lists the validation groups, such as pre-dependency, or pre-start, and the validation checks that run in those groups.
  • -v,--verbose Enables verbose logging to the console.

List of Validation Checks

The validation scripts include, but are not limited to, the following checks:

  • Tamr Core license. Verifies license validity and fails if the provided license is expired. The script returns the license information, such as the license key and type (evaluation or not), creation and expiration dates, customer name, and description.
  • Operating System. Verifies that the user is not the root user and prints diagnostic information about the operating system running Tamr Core.
  • Memory usage. Verifies that the memory settings Tamr Core requires are sufficient for the operating system it is running on.
  • Permissions, symbolic links, and disk space. Walks the path of each Tamr Core-configured subdirectory and checks permissions, the presence of symbolic links, and available storage space. The process reports any directory it cannot access, and warns you if storage space in any directory is insufficient (typically, less than 1 GB).
  • ulimits. Checks the open file ulimit against the required minimum of 1000000.
  • vm.max_map_count. Checks that vm.max_map_count (which specifies the maximum number of memory map areas) is at least the required value of 262144.
  • HBase external dependency. Verifies that HBase servers are running on expected ports, the Zookeeper cluster required by HBase is running as expected, and that Tamr Core can establish a connection to this cluster and retrieve its status. The script logs diagnostic information that HBase provides.
  • PostgreSQL external dependency. Verifies that Tamr Core can connect to the PostgreSQL database, ensures that its version is compatible, and runs simple queries. The script also logs diagnostic information about the presence of persistence and dataset schemas required by Tamr Core, such as their Postgres-specific migration status.
  • Elasticsearch external dependency. Verifies the connectivity to the Elasticsearch cluster used to power the user interface in Tamr Core, and ensures that this cluster is running. The script also verifies version compatibility, and that the Elasticsearch data directory exists, has sufficient free space, and is readable and writable.
  • Disk space available. Verifies that at least 20% of disk space is free in the directories in which you install Tamr Core.
  • Valid Tamr backup URI. Verifies that the TAMR_BACKUP_URI points to a valid path. An invalid path does not prevent Tamr from starting. However, if the path does not exist or cannot be created, you receive an error when attempting to start a backup.
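
The ulimit and vm.max_map_count minimums listed above can also be spot-checked by hand. The following is a minimal shell sketch, not part of Tamr Core: the function name meets_minimum is hypothetical, and reading /proc/sys/vm/max_map_count assumes a Linux host. It does not replace the checks that validate runs.

```shell
# Minimums quoted in the list of validation checks.
REQUIRED_NOFILE=1000000
REQUIRED_MAP_COUNT=262144

# meets_minimum NAME ACTUAL REQUIRED (hypothetical helper)
# Prints a one-line verdict. Returns 0 if ACTUAL >= REQUIRED,
# 1 if it is lower, and 2 if ACTUAL is not a number.
meets_minimum() {
  name=$1
  actual=$2
  required=$3
  if [ "$actual" = "unlimited" ]; then
    echo "OK   $name: unlimited"
    return 0
  fi
  case $actual in
    ''|*[!0-9]*)
      echo "SKIP $name: could not read a numeric value"
      return 2
      ;;
  esac
  if [ "$actual" -ge "$required" ]; then
    echo "OK   $name: $actual (minimum $required)"
    return 0
  fi
  echo "LOW  $name: $actual (minimum $required)"
  return 1
}

# Check the current shell's open file limit and the kernel setting.
meets_minimum "open files" "$(ulimit -n)" "$REQUIRED_NOFILE" || true
meets_minimum "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count 2>/dev/null)" "$REQUIRED_MAP_COUNT" || true
```

Run the sketch as the user that starts Tamr Core, because ulimit values are set per user and per session.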

Note: If you set the --skipEnvironmentValidation flag to true, the upgrade process skips the system validation checks at the start of an upgrade. Use this flag with caution as it allows the upgrade process to proceed with a potentially invalid configuration, which could cause it to fail. For more information, see Upgrading Tamr Core.

The following example runs the post-start validation group with verbose logging:

./unify-admin.sh validate -v post-start

DEBUG: LogEvents - Checking whether ZK config cluster running at localhost:21281 is healthy
DEBUG: LogEvents - Trying to run ZooKeeper command 'ruok' against zk://localhost:21281
DEBUG: LogEvents - Finished trying to run 'ruok'.
DEBUG: ValidateCommand - Loading all system configuration
DEBUG: ValidateCommand - Finished loading all system configuration
INFO : ValidateCommand - Running validation groups: [post-start]
DEBUG: ValidateCommand - Running post-start validations:
DEBUG: ValidateCommand - Running HealthChecksValidator.
DEBUG: ValidateCommand - Description: Runs the healthchecks for each of Tamr Core's microservices
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: procurify
DEBUG: HealthChecksValidator - procurify service is healthy!
DEBUG: HealthChecksValidator - procurify service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: transform
DEBUG: HealthChecksValidator - transform service is healthy!
DEBUG: HealthChecksValidator - transform service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: dataset
DEBUG: HealthChecksValidator - dataset service is healthy!
DEBUG: HealthChecksValidator - dataset service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: auth
DEBUG: HealthChecksValidator - auth service is healthy!
DEBUG: HealthChecksValidator - auth service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: dedup
DEBUG: HealthChecksValidator - dedup service is healthy!
DEBUG: HealthChecksValidator - dedup service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: preview
DEBUG: HealthChecksValidator - preview service is healthy!
DEBUG: HealthChecksValidator - preview service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: recipe
DEBUG: HealthChecksValidator - recipe service is healthy!
DEBUG: HealthChecksValidator - recipe service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: pubapi
DEBUG: HealthChecksValidator - pubapi service is healthy!
DEBUG: HealthChecksValidator - pubapi service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: taxonomy
DEBUG: HealthChecksValidator - taxonomy service is healthy!
DEBUG: HealthChecksValidator - taxonomy service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: persist
DEBUG: HealthChecksValidator - persist service is healthy!
DEBUG: HealthChecksValidator - persist service healthcheck finished.
INFO : HealthChecksValidator -
INFO : HealthChecksValidator -
DEBUG: HealthChecksValidator - Checking health of service: match
DEBUG: HealthChecksValidator - match service is healthy!
DEBUG: HealthChecksValidator - match service healthcheck finished.
INFO : HealthChecksValidator -
DEBUG: ValidateCommand - Finished HealthChecksValidator.
DEBUG: ValidateCommand - Finished post-start validations.
INFO : ValidateCommand -
INFO : ValidateCommand - *=======================*
INFO : ValidateCommand - | Validation succeeded! |
INFO : ValidateCommand - *=======================*
INFO : ValidateCommand -

In contrast, this example shows the output when the health check for Elasticsearch fails:

...
DEBUG: HealthChecksValidator - Checking health of service: dedup
WARN : HealthChecksValidator - Unhealthy service: dedup
INFO : HealthChecksValidator - 
WARN : HealthChecksValidator - Healthcheck failed for service ElasticSearchCluster: = Result{isHealthy=false, message=java.net.ConnectException: Connection refused, error=java.lang.Throwable: java.net.ConnectException: Connection refused, timestamp=2020-03-11T04:54:21.731Z}
DEBUG: HealthChecksValidator - dedup service healthcheck finished.
...
INFO : ValidateCommand - *===============================================*
INFO : ValidateCommand - | Encountered 5 warnings and 0 critical errors. |
INFO : ValidateCommand - *===============================================*
INFO : ValidateCommand - 
WARN : ValidateCommand - Validation 'post-start':
WARN : ValidateCommand - [WARNING] Unhealthy service: procurify
WARN : ValidateCommand - [WARNING] Healthcheck failed for service UnifyElasticSearchDataCluster: = Result{isHealthy=false, message=java.net.ConnectException: Connection refused, error=java.lang.Throwable: java.net.ConnectException: Connection refused, timestamp=2020-03-11T04:54:21.577Z}
WARN : ValidateCommand - [WARNING] Unhealthy service: dedup
WARN : ValidateCommand - [WARNING] Healthcheck failed for service ElasticSearchCluster: = Result{isHealthy=false, message=java.net.ConnectException: Connection refused, error=java.lang.Throwable: java.net.ConnectException: Connection refused, timestamp=2020-03-11T04:54:21.731Z}
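
When a run produces many warnings, it can help to reduce the output to the affected services. The following shell sketch is one way to do that. It assumes that you saved the validation output to a file and that the file uses the same line format as the console output shown above. The function names unhealthy_services and count_warnings are hypothetical.

```shell
# unhealthy_services FILE (hypothetical helper)
# Prints each service named in an "Unhealthy service:" line, once, sorted.
unhealthy_services() {
  sed -n 's/.*Unhealthy service: \([A-Za-z0-9_-]*\).*/\1/p' "$1" | sort -u
}

# count_warnings FILE (hypothetical helper)
# Prints the number of summary lines tagged [WARNING].
count_warnings() {
  grep -c '\[WARNING\]' "$1"
}

# Demonstration on a few lines copied from the example output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
WARN : HealthChecksValidator - Unhealthy service: dedup
WARN : ValidateCommand - [WARNING] Unhealthy service: procurify
WARN : ValidateCommand - [WARNING] Unhealthy service: dedup
EOF
unhealthy_services "$sample"   # prints dedup, then procurify
count_warnings "$sample"       # prints 2
rm -f "$sample"
```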

Running Validation Checks

Validation checks run by default before an upgrade, and certain validation checks run every time you start Tamr Core, but you can also run them at any time.

To run health check validation scripts:

  1. Run the administrative utility with the validate option, such as:
<tamr-home-directory>/tamr/utils/unify-admin.sh validate
  2. To obtain detailed output, run <tamr-home-directory>/tamr/utils/unify-admin.sh validate -v.
  3. To write a report of the validation checks to a specific location, run <tamr-home-directory>/tamr/utils/unify-admin.sh validate -f [filename], where [filename] is the path to the output file.
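
If you run validation regularly, you might wrap the -f form in a small script that writes each report to a timestamped file. The following is a sketch under stated assumptions: TAMR_HOME is a hypothetical variable standing in for <tamr-home-directory>, and the report file name pattern and the function names are illustrative only. When the utility is not found, the sketch prints the command instead of running it.

```shell
# TAMR_HOME is a hypothetical variable standing in for <tamr-home-directory>.
TAMR_HOME=${TAMR_HOME:-/path/to/tamr-home}
ADMIN="$TAMR_HOME/tamr/utils/unify-admin.sh"

# report_path DIR STAMP (hypothetical helper)
# Prints DIR/tamr-validate-STAMP.log.
report_path() {
  printf '%s/tamr-validate-%s.log\n' "$1" "$2"
}

# run_validation DIR (hypothetical helper)
# Runs validate -f with a timestamped report file in DIR.
run_validation() {
  report=$(report_path "$1" "$(date +%Y%m%dT%H%M%S)")
  if [ -x "$ADMIN" ]; then
    "$ADMIN" validate -f "$report"
  else
    # Dry run: the utility was not found at the assumed location.
    echo "would run: $ADMIN validate -f $report"
  fi
}

# Example: show the report path for a fixed timestamp.
report_path /tmp 20200311T045421   # prints /tmp/tamr-validate-20200311T045421.log
```

Call run_validation with the directory in which you want to keep reports, for example run_validation /var/tmp.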

Dataset Cleanup

You can use the CleanupIncompletelyDeletedProjects maintenance utility to convert derived datasets for deleted projects into source datasets. Team members with the admin role can then either delete these datasets or profile them for use in other projects.

To run the dataset cleanup utility:

Run the administrative utility with the maintenance option and specify the CleanupIncompletelyDeletedProjects script as follows:

./unify-admin.sh maintenance --script CleanupIncompletelyDeletedProjects

CleanupIncompletelyDeletedProjects is interactive. A log showing a typical series of interactions follows.

ubuntu@mitul-maintenance:~/tamr/utils$ ./unify-admin.sh maintenance --script CleanupIncompletelyDeletedProjects

Executing script: CleanupIncompletelyDeletedProjects
Attempting to run CleanupIncompletelyDeletedProjects.Analyze

Script: CleanupIncompletelyDeletedProjects
Script Added at: 2020-09-24T12:00:00Z
Description: Garbage Collect all orphaned recipes associated with deleted projects and detach derived datasets associated with deleted projects into source datasets

Gathering orphaned recipes and datasets
Project with ID 3 is incompletely deleted with the following orphaned dataset(s):
Dataset "categ_test_unified_dataset" with ID: 60
Project with ID 7 is incompletely deleted with the following orphaned dataset(s):
Dataset "sm_test_2_unified_dataset" with ID: 86
Would you like to specify inputs and execute the script? [Y/n]
y
Deleted Project with ID: 3 still has derived datasets associated with it, would you like to turn the derived datasets into source datasets? [Y/n]
y
Deleted Project with ID: 7 still has derived datasets associated with it, would you like to turn the derived datasets into source datasets? [Y/n]
y
Attempting to run CleanupIncompletelyDeletedProjects.Execute

Script: CleanupIncompletelyDeletedProjects
Script Added at: 2020-09-24T12:00:00Z
Description: Garbage Collect all orphaned recipes associated with deleted projects and detach derived datasets associated with deleted projects into source datasets

Gathering orphaned recipes and datasets
Project with ID 3 is incompletely deleted with the following orphaned dataset(s):
Dataset "categ_test_unified_dataset" with ID: 60
Project with ID 7 is incompletely deleted with the following orphaned dataset(s):
Dataset "sm_test_2_unified_dataset" with ID: 86
Cleaned up project with ID: 3, its following derived datasets are converted to source datasets:
[60]
Cleaned up project with ID: 7, its following derived datasets are converted to source datasets:
[86]
The following orphaned Recipes associated with deleted projects have been garbage collected:
[]
The following orphaned Recipes failed to be garbage collected:
[]
Derived datasets associated with the following deleted projects have been turned into source datasets:
[3, 7]
The following projects failed to be cleaned up:
[]
Finished running: CleanupIncompletelyDeletedProjects. See recipe logs for verbose logging.
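
After the cleanup, an admin needs to know which datasets to delete or profile. The following shell sketch extracts the dataset IDs and names from a saved copy of the console output. It assumes that you captured the output to a file and that the lines match the format in the log above. The function name orphaned_datasets is hypothetical.

```shell
# orphaned_datasets FILE (hypothetical helper)
# Prints "ID NAME" once for each dataset line in the saved output.
orphaned_datasets() {
  sed -n 's/^Dataset "\(.*\)" with ID: \([0-9][0-9]*\)$/\2 \1/p' "$1" | sort -u
}

# Demonstration on lines copied from the log above. The utility prints each
# dataset in both the Analyze and Execute phases, so duplicates are removed.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Project with ID 3 is incompletely deleted with the following orphaned dataset(s):
Dataset "categ_test_unified_dataset" with ID: 60
Dataset "sm_test_2_unified_dataset" with ID: 86
Dataset "categ_test_unified_dataset" with ID: 60
EOF
orphaned_datasets "$sample"
# prints:
# 60 categ_test_unified_dataset
# 86 sm_test_2_unified_dataset
rm -f "$sample"
```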

Transformation Tools

You can use the ./tamr/libs/transform-tools.jar file to make system-wide updates to transformations.

For example, suppose your leadership decides on a system-wide change from rounding numbers up to rounding them down. Rather than manually editing every transformation that uses the round() function, you can use the function-replacer option in transform-tools.jar to change the function in every transformation.

Another example is when Tamr releases an improved version of a function that changes its syntax or result. Tamr also makes the previous version available under a new name. Using function-replacer, you can change all instances of one function to a different function.

To run the function replacer:

Run java -jar ./tamr/libs/transform-tools.jar function-replacer and specify the new and old function names.

java -jar ./tamr/libs/transform-tools.jar function-replacer --new=<new_function> --old=<old_function> --tamr-url=http://localhost:9100 -p

To get help for the function replacer, run java -jar ./tamr/libs/transform-tools.jar function-replacer without any options.

java -jar ./tamr/libs/transform-tools.jar function-replacer

Missing required options [--password, --old=<oldFunc>, --new=<newFunc>, --tamr-url=<host>]
Usage: tools function-replacer [-hV] -p[=<password>] -n=<newFunc> -o=<oldFunc>
                               --tamr-url=<host> [-u=<username>]
Replaces a transformation function from one function to another in all projects
with unified datasets.
  -h, --help              Show this help message and exit.
  -n, --new=<newFunc>     New function name to replace old function name'
  -o, --old=<oldFunc>     Old function name to be replaced
  -p, --password[=<password>]

      --tamr-url=<host>   Full url in unify - Ex:'http://localhost:9100'
  -u, --user=<username>   User name in unify
  -V, --version           Print version information and exit.
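
Because the function replacer changes every project with a unified dataset, it can be worth checking the arguments before you run it. The following shell sketch builds the command line from the options in the help output above and prints it for review without running it. The function name replacer_command, the TAMR_URL variable, and the placeholder names old_func and new_func are hypothetical.

```shell
# TAMR_URL is a hypothetical variable; the default matches the example above.
TAMR_URL=${TAMR_URL:-http://localhost:9100}
JAR=./tamr/libs/transform-tools.jar

# replacer_command OLD NEW (hypothetical helper)
# Prints the function-replacer command line for review. Fails if either
# name is missing or if both names are the same.
replacer_command() {
  old=$1
  new=$2
  if [ -z "$old" ] || [ -z "$new" ]; then
    echo "error: both an old and a new function name are required" >&2
    return 1
  fi
  if [ "$old" = "$new" ]; then
    echo "error: old and new function names are identical" >&2
    return 1
  fi
  echo "java -jar $JAR function-replacer --old=$old --new=$new --tamr-url=$TAMR_URL -p"
}

# old_func and new_func are placeholders, not real function names.
replacer_command old_func new_func
```

Printing the command rather than running it leaves the final decision, and the password prompt from -p, to the person at the terminal.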

See Tips for Troubleshooting Transformations.

Custom Toolbar Buttons

After you define custom buttons in a YAML file for your implementation of Tamr Core, you enable them by running the ui:config --extensionConfig command.

./unify-admin.sh ui:config --extensionConfig /path/to/example-extension.yaml

See Adding a Custom Toolbar Button.