Configuring the Spark Environment

Configure the Apache Spark analytics engine for Tamr Core.

You can set the following Spark environment variables:

TAMR_SPARK_MEMORY
  The total memory to use for the Spark cluster. For information on calculating this value, see YARN Cluster Manager Jobs.

TAMR_SPARK_CORES
  The total number of cores to use for the Spark cluster.

TAMR_JOB_SPARK_CLUSTER
  The full URL of the Spark cluster being used. The default value is yarn.

TAMR_JOB_SPARK_CONFIG_OVERRIDES
  A list of named sets of Spark configuration overrides. See the Tamr Core Help Center for more details on the overrides; a sketch of the general shape appears after this table.

TAMR_JOB_SPARK_DRIVER_MEM
  The amount of memory a Tamr Core job uses for the driver process, such as 1G or 2G.

TAMR_JOB_SPARK_EVENT_LOGS_DIR
  The directory for storing logs for Spark jobs.

TAMR_JOB_SPARK_EXECUTOR_MEM
  The amount of memory a Tamr Core job uses per executor process, such as 2G or 8G.

TAMR_JOB_SPARK_EXECUTOR_CORES
  The number of cores to use per executor process, such as 2.

TAMR_SPARK_WORKDIR
  The directory to use as the Spark working directory.

TAMR_SPARK_LOGS
  The directory to use for Spark log files.

TAMR_JOB_SPARK_SUBMIT_TIMEOUT_SECONDS
  The timeout period, in seconds, for Spark submitters. The default is 300 (five minutes).
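
You set these variables with Tamr Core's unify-admin.sh utility rather than exporting them in the shell. The following is a minimal sketch, assuming a default installation under /data/tamr (adjust the path for your deployment); the values shown are illustrative, not sizing recommendations.

  # Set cluster-wide Spark resource limits (example values only)
  /data/tamr/tamr/utils/unify-admin.sh config:set TAMR_SPARK_MEMORY="50G"
  /data/tamr/tamr/utils/unify-admin.sh config:set TAMR_SPARK_CORES=10

  # Confirm the value Tamr Core is now using
  /data/tamr/tamr/utils/unify-admin.sh config:get TAMR_SPARK_MEMORY

Configuration changes take effect after you restart Tamr Core and its dependencies.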
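
For TAMR_JOB_SPARK_CONFIG_OVERRIDES, the value is a JSON array of named override sets, each pairing a name with standard Spark properties. The field names in this sketch ("name", "sparkConfig") and the set name "largeJobs" are assumptions for illustration; consult the Help Center article referenced in the table above for the exact schema.

  # Hypothetical override set; field names are assumed, not the confirmed schema
  /data/tamr/tamr/utils/unify-admin.sh config:set \
    TAMR_JOB_SPARK_CONFIG_OVERRIDES='[{"name":"largeJobs","sparkConfig":{"spark.driver.memory":"4G","spark.executor.memory":"8G"}}]'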

See Configuration Variable Reference for a complete list of Tamr Core configuration variables.