Debugging Disk Utilization

Tamr Core tends not to work well, if at all, once disk utilization exceeds 80%. So, if your Tamr instance is running at 60 or 70% disk utilization, you may soon run into trouble. The question then becomes, "How do you fix it?"

The basics

We already have some public docs on checking disk utilization and steps you can take to reduce it. But suppose you have done all that and your disk utilization is still over 60%. What should you do next? Well, it's time to dive in and see which folders are hogging the available disk space!
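If you are not sure where the instance currently stands, df gives a quick overall picture of utilization per filesystem (no Tamr-specific paths assumed here):

$ df -h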

The process

One command above all is your friend in these scenarios, and it is:

$ du -d 1 -h

The command stands for "disk usage." The options specified are -d 1 to report at a depth of 1 and -h for human-readable sizes. Use this command (along with grep for G and T to reduce the number of entries returned) to work through the folders and files systematically and identify where most of your disk space is being used.
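For example, a pipeline along these lines (run from the directory you want to inspect) keeps only the entries in the GB or TB range and lists the largest first:

$ du -d 1 -h | grep -E '[0-9](G|T)' | sort -rh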

Some Gold Mines

Temp files from Canceled / Incomplete jobs

Sometimes, when a job is canceled or otherwise fails to complete (for example, due to insufficient disk space), it does not clean up after itself on the instance. In such cases, a lot of data can end up being written to the following directory: {$TAMR_HOME}/tamr/unify-data/job/workspaces/wms/jobs/<job-id>/tmp.

These can be safely deleted.
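Before deleting anything, it can help to see how much space these leftover tmp directories actually occupy. A minimal sketch, assuming TAMR_HOME is set in your shell and the default directory layout shown above:

# size of each leftover job tmp directory, largest first
$ du -sh ${TAMR_HOME}/tamr/unify-data/job/workspaces/wms/jobs/*/tmp | sort -rh | head

# once you have confirmed a job ID is no longer needed, remove its tmp directory
$ rm -rf ${TAMR_HOME}/tamr/unify-data/job/workspaces/wms/jobs/<job-id>/tmp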

Old Exports

It is also possible that some old generated exports could be cleared out to free up space. Depending on the number of records, these exports can run to several tens, if not hundreds, of GB. The exports themselves will be present under {$TAMR_HOME}/tamr/unify-data/procurify/exports. It is critical to confirm which exports can safely be cleared, as some scripts may rely on old exports. An added complication is that these exports cannot be browsed by name, so use a combination of size and last-modified date to tell which exports to clear and which to keep. In the worst case, if a necessary export is accidentally deleted, it can always be recreated via the UI or APIs.
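Listing the exports with their sizes and last-modified times makes that comparison easier; a sketch, assuming TAMR_HOME is set and using the path above:

$ ls -lht ${TAMR_HOME}/tamr/unify-data/procurify/exports | head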

Old Backups

While this is covered in the documented steps for freeing up disk space, old backups sometimes live in multiple places on an instance. This may not be immediately obvious without a detailed investigation into the disk utilization. Removing old backups from all such locations can free up a significant amount of disk space.
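One way to hunt for stray backup copies is a filesystem-wide search for directories whose names contain "backup", sized and sorted. The name pattern here is an assumption; adjust it to match how your backups are actually named:

$ sudo find / -type d -name '*backup*' -exec du -sh {} + 2>/dev/null | sort -rh | head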

Smaller Nuggets

When trying to free up space on a disk that runs into the TB range, clearing items that are only a few GB or smaller can feel futile. But depending on how many such items there are, their cumulative size can be significant. Some examples:

  1. Binaries of previously installed versions of Tamr, or binaries used to upgrade Tamr. It may make sense to retain the binary for the current version, but older versions are unlikely to be useful (see the example command after this list).
  2. The temp directory used for installation. Technically, the installation script has an option to clear out the temp directory when the installation is complete. But sometimes this option is skipped or missed. This directory can be cleared out safely.
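For the first item, a search for large archive files under the Tamr home directory can surface old installer binaries. The extensions and depth below are assumptions about how the binaries were stored, so adjust them as needed:

$ find ${TAMR_HOME} -maxdepth 2 -type f \( -name '*.zip' -o -name '*.tar.gz' \) -exec du -sh {} + | sort -rh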

Hitting a wall

HBase-Data

It is possible that this exercise will simply show that most of the disk utilization is in {$TAMR_HOME}/tamr/hbase-data. Unfortunately, this is where the data on the Tamr instance lives. Setting the config variable TAMR_DATASET_NUMBER_OF_VERSIONS_TO_KEEP down to 2 from the default of 5 may help. Otherwise, if this repository is what is straining disk utilization, the only option may be to increase the disk size. For Tamr-hosted customers, Tamr support can help file that request; On-Prem customers can take the recommended action directly.
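If you do lower the version retention, the change goes through Tamr's admin utility and requires a restart to take effect. The invocation below assumes a default installation layout; confirm the exact procedure against the documentation for your version:

$ ${TAMR_HOME}/tamr/utils/unify-admin.sh config:set TAMR_DATASET_NUMBER_OF_VERSIONS_TO_KEEP=2
# restart Tamr afterwards so the new setting takes effect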