Spark Jobs or Tasks in Tamr Fail with Errors Mentioning the Snappy Library

Problem: Spark jobs (such as profiling) or other tasks in Unify (such as dataset upload or preview) fail with errors that mention the snappy library. Here are some examples:

Upload error:

Upload failed. com.tamr.common.except.ServiceException: Error while processing input stream for dataset record updates.
Caused by: java.net.SocketException: Connection reset.
Please check the logs for more information.

Dataset log error:

! java.lang.UnsatisfiedLinkError: /tmp/snappy-unknown-be6020a8-095c-4ccd-ac30-21653d120bb2-libsnappyjava.so: /tmp/snappy-unknown-be6020a8-095c-4ccd-ac30-21653d120bb2-libsnappyjava.so:
failed to map segment from shared object: Operation not permitted
…
...
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
...

Cause: Tamr and Spark use the snappy JNI compression library when reading and writing data. On first use, snappy extracts its native library (libsnappyjava.so) into the JVM temp directory, which is /tmp by default, and loads it from there. The load fails when /tmp is mounted as non-executable (the noexec mount option).

To check, run the following command to list all mounts:

mount

Then look for a line like the following, with the noexec flag on /tmp:

tmpfs   /tmp    tmpfs   size=2048M,mode=1777,nodev,nosuid,noexec        0 0
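
The check above can also be scripted. The following is a minimal sketch, assuming the findmnt utility from util-linux is installed on the host; the function name has_noexec is hypothetical and is not part of Tamr. It reads the mount options for the filesystem that holds /tmp and reports whether noexec is among them.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the comma-separated mount option
# list passed as $1 contains the noexec option, 1 otherwise.
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Look up the options of the filesystem that contains /tmp.
# Assumes findmnt (util-linux) is available; skipped otherwise.
if command -v findmnt >/dev/null 2>&1; then
  opts=$(findmnt -n -o OPTIONS --target /tmp) || opts=""
  if has_noexec "$opts"; then
    echo "/tmp is mounted noexec: $opts"
  else
    echo "/tmp does not have the noexec flag: $opts"
  fi
fi
```

If findmnt is not available, the same function can be applied by hand to the options column of the mount output shown above.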

Resolution: If you cannot remount /tmp as executable, you can configure Tamr and Spark to use a non-default location for the snappy temp directory. The following workaround applies to v2019.007 and later, which contain the required environment variables. For versions older than v2019.007, please contact support for help.

  1. Create a folder on a mount that allows execution, for example:
${TAMR_UNIFY_HOME}/temp2/
  2. Go to ${TAMR_UNIFY_HOME}/tamr/utils/ and set the configuration variables by executing the following commands:
./unify-admin.sh config:set TAMR_JOB_SPARK_PROPS="{spark.executor.extraJavaOptions: -Dorg.xerial.snappy.tempdir=<path from step 1>}"
./unify-admin.sh config:set TAMR_UNIFY_EXTRA_ARGS="-Dorg.xerial.snappy.tempdir=<path from step 1>"
  3. Restart Tamr and Tamr dependencies:

cd "${TAMR_UNIFY_HOME}/tamr"

# Stop Tamr and Tamr dependencies
./stop-unify.sh
./stop-dependencies.sh

# Check whether any Java process is still running
# (the grep command itself may also appear in the output)
ps -ef | grep java

# If a Java process is still running, kill it; replace <PID> with its process ID
kill <PID>

# Start Tamr and Tamr dependencies
./start-dependencies.sh
./start-unify.sh
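
Before restarting, it can help to confirm that the folder from step 1 really allows execution, since a folder on another noexec mount would lead to the same error. The following is a rough sketch of steps 1 and 2 under stated assumptions: the names dir_allows_exec, print_snappy_config, and SNAPPY_TMP are hypothetical and not part of Tamr. The script creates the folder, runs a throwaway probe script inside it, and only then prints the two config:set commands from step 2 with the path filled in. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 only if a file inside directory $1
# can be executed (this fails on a noexec mount).
dir_allows_exec() {
  dir=$1
  [ -d "$dir" ] || return 1
  probe=$(mktemp "$dir/exec-probe.XXXXXX") || return 1
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod +x "$probe"
  status=0
  "$probe" 2>/dev/null || status=$?
  rm -f "$probe"
  return "$status"
}

# Hypothetical helper: prints the two commands from step 2 with the
# directory passed as $1 substituted for <path from step 1>.
print_snappy_config() {
  printf './unify-admin.sh config:set TAMR_JOB_SPARK_PROPS="{spark.executor.extraJavaOptions: -Dorg.xerial.snappy.tempdir=%s}"\n' "$1"
  printf './unify-admin.sh config:set TAMR_UNIFY_EXTRA_ARGS="-Dorg.xerial.snappy.tempdir=%s"\n' "$1"
}

# SNAPPY_TMP is an assumed variable name; nothing happens unless it is set.
if [ -n "${SNAPPY_TMP:-}" ]; then
  mkdir -p "$SNAPPY_TMP"
  if dir_allows_exec "$SNAPPY_TMP"; then
    print_snappy_config "$SNAPPY_TMP"
  else
    echo "$SNAPPY_TMP does not allow execution; choose a folder on another mount" >&2
  fi
fi
```

For example, saving the script under a name of your choice (here check-snappy-tmp.sh) and running SNAPPY_TMP="${TAMR_UNIFY_HOME}/temp2" sh check-snappy-tmp.sh prints the commands to run from ${TAMR_UNIFY_HOME}/tamr/utils/.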