System Health Status
You can check the health status of Tamr Core and its supporting services from the Tamr Core API or command line. See the How to Check Health Status of Tamr article for instructions and more information.
Storage Space Monitoring and Alerts
To prevent errors and issues that can arise when disk utilization approaches 100%, the Tamr system monitors free space in all storage locations to which Tamr Core and its supporting services are configured to write. If any of these storage locations drops below 20% free space, the health check reports the system status as unhealthy and an alert displays in the user interface.
The health check message and the Tamr logs describe which storage system or systems have less than 20% free space, and which configured directories are associated with them.
If the free space drops below 10%, Tamr cancels any running jobs, including backups, project imports, and so on. This behavior is also triggered if any storage system drops below 10 GB of free space.
You must increase free space above 10% for the system to resume running jobs, and above 20% for it to return to healthy.
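The thresholds described above can be sketched as a small shell function. This is only an illustration of the documented rules, not Tamr's internal implementation: the function name `storage_state` is hypothetical, and treating 10 GB as 10 × 1024 × 1024 KB is an assumption.

```shell
#!/bin/sh
# Illustrative only: mirrors the documented default thresholds,
# not Tamr's internal code.
# Usage: storage_state FREE_KB TOTAL_KB
# Prints HEALTHY, UNHEALTHY, or STOP.
storage_state() {
    free_kb=$1
    total_kb=$2
    # 10 GB expressed in KB (assumption: 1 GB = 1024 * 1024 KB)
    min_free_kb=$((10 * 1024 * 1024))

    # Integer math: free/total < 10% is the same as free*100 < total*10
    if [ $((free_kb * 100)) -lt $((total_kb * 10)) ] || [ "$free_kb" -lt "$min_free_kb" ]; then
        echo STOP        # running jobs would be canceled
    elif [ $((free_kb * 100)) -lt $((total_kb * 20)) ]; then
        echo UNHEALTHY   # health check reports unhealthy
    else
        echo HEALTHY
    fi
}
```

For a rough check of a single mount point, you could feed it the available and total 1K-block counts from `df`, for example `storage_state $(df -Pk /data | awk 'NR==2 {print $4, $2}')`, where `/data` is a placeholder path. How Tamr itself measures free space may differ from what `df` reports.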
Changing the Storage Space Thresholds and Polling Interval
Tamr Core is configured by default to warn that the system is unhealthy when storage space is below 20% and to automatically cancel jobs when storage space is below 10%. Tamr Core polls for storage space every minute, in all storage locations to which Tamr Core and its supporting services are configured to write. You can change these utilization thresholds and the polling interval using the following variables:
- TAMR_STORAGE_LEVEL_WARN: Fraction of free space below which the system will become unhealthy. Default: 0.2 (20%).
- TAMR_STORAGE_LEVEL_STOP: Fraction of free space below which the system automatically cancels running jobs. Default: 0.1 (10%).
- TAMR_STORAGE_LEVEL_POLL_INTERVAL: The interval at which Tamr Core checks the storage level of configured storage locations. Default: 1m (1 minute).
Note: The TAMR_STORAGE_SPACE_CHECK_DIRS variable determines which directories and backing services are included in this health check. Contact Support at [email protected] for assistance if you need to change this value.
To change the storage space thresholds and polling interval:
- Set the following configuration variables to the new thresholds and polling interval:
<tamr-home-directory>/tamr/utils/unify-admin.sh config:set TAMR_STORAGE_LEVEL_WARN="threshold" TAMR_STORAGE_LEVEL_STOP="threshold" TAMR_STORAGE_LEVEL_POLL_INTERVAL="interval"
- Restart Tamr Core and its dependencies. See Restarting Tamr Core.
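Because the thresholds are fractions rather than percentages, it can help to sanity-check new values before applying them. The helper below is a hypothetical sketch, not part of Tamr. It assumes each threshold should be a decimal fraction strictly between 0 and 1, and that the stop threshold should be lower than the warn threshold, which matches the defaults but is not stated as a requirement in this article.

```shell
#!/bin/sh
# Hypothetical helper, not part of Tamr: checks a pair of thresholds
# before you pass them to config:set.
# Usage: check_thresholds WARN STOP
# Exit status 0 if both look like fractions in (0, 1) and STOP < WARN.
check_thresholds() {
    awk -v warn="$1" -v stop="$2" 'BEGIN {
        # Accept forms such as 0.2 or .25; reject 20, 1.0, or plain 0
        ok = (warn ~ /^0?\.[0-9]+$/) && (stop ~ /^0?\.[0-9]+$/) \
             && (stop + 0 > 0) && (stop + 0 < warn + 0)
        exit (ok ? 0 : 1)
    }'
}
```

For example, `check_thresholds 0.2 0.1` succeeds, while `check_thresholds 20 10` fails because the values are written as percentages instead of fractions.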