"Unsafe Projection" Error with Many Transformations

Problem: In Tamr versions earlier than v2020.016, jobs may fail with a Spark error that mentions “Unsafe Projection”, such as the following:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 7, localhost, executor driver): org.codehaus.janino.JaninoRuntimeException: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "evalExpr$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB

Cause: A large number of transformations is a likely cause, especially when many of them reference the same attribute. Spark compiles the chained expressions into a single generated Java method, and when that method grows beyond the JVM's 64 KB per-method limit, compilation fails and the job is aborted.

Resolution: There are two recommended ways to fix this issue:

  1. Upgrade Tamr to v2020.016 or later; or
  2. Add a checkpoint statement periodically between transformations. A checkpoint tells Spark to compute everything up to that point before continuing with the transformations that follow, so later transformations are not compiled together with everything before them. Use checkpoints sparingly, only when this error occurs or transformations run very slowly, because in other cases they can decrease performance.
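As an illustration of placing checkpoints "periodically", here is a minimal Python sketch. The helper `insert_checkpoints` is hypothetical. It takes a list of transformation statements and places a checkpoint after every `every` statements. The `CHECKPOINT;` keyword, the group size, and the placeholder statements are assumptions, so confirm the exact checkpoint syntax in the transformation documentation for your Tamr version.

```python
from typing import List

# ASSUMPTION: illustrative checkpoint keyword; verify the exact statement
# syntax for your Tamr version before using it in a real transformation script.
CHECKPOINT_STATEMENT = "CHECKPOINT;"


def insert_checkpoints(statements: List[str], every: int,
                       checkpoint: str = CHECKPOINT_STATEMENT) -> List[str]:
    """Return a copy of `statements` with `checkpoint` after every `every` statements.

    No checkpoint is added after the final statement, because nothing
    follows it that would benefit from the intermediate result.
    """
    if every < 1:
        raise ValueError("every must be a positive integer")
    result: List[str] = []
    for index, statement in enumerate(statements, start=1):
        result.append(statement)
        # Close each full group with a checkpoint, except at the very end.
        if index % every == 0 and index < len(statements):
            result.append(checkpoint)
    return result


if __name__ == "__main__":
    # Hypothetical placeholders standing in for real transformation statements.
    script = [f"<transformation {i}>" for i in range(1, 8)]
    for line in insert_checkpoints(script, every=3):
        print(line)
```

Because checkpoints should be used sparingly, a reasonable approach is to start with a large `every` value and lower it only if the error persists. Each checkpoint forces Spark to compute an intermediate result, which adds work to the job.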