
Configuring the Spark Environment

Configure the Apache Spark analytics engine for Tamr Core.

You can set the following Spark environment variables. An example of setting them appears after the list.

TAMR_SPARK_MEMORY
The total memory to use for the Spark cluster. For information on calculating this value, see YARN Cluster Manager Jobs.

TAMR_SPARK_CORES
The total number of cores to use for the Spark cluster.

TAMR_JOB_SPARK_CLUSTER
The full URL of the Spark cluster being used. The default value is yarn.

TAMR_JOB_SPARK_CONFIG_OVERRIDES
A list of named sets of Spark configuration overrides. See the Tamr Core Help Center for more details on the overrides.

TAMR_JOB_SPARK_DRIVER_MEM
The amount of memory a Tamr Core job uses for the driver process, such as 1G or 2G.

TAMR_JOB_SPARK_EVENT_LOGS_DIR
The directory for storing logs for Spark jobs.

TAMR_JOB_SPARK_EXECUTOR_MEM
The amount of memory a Tamr Core job uses per executor process, such as 2G or 8G.

TAMR_JOB_SPARK_EXECUTOR_CORES
The number of cores to use per executor process, such as 2.

TAMR_SPARK_WORKDIR
The directory to use as the Spark working directory.

TAMR_SPARK_LOGS
The directory to use for Spark log files.

TAMR_JOB_SPARK_SUBMIT_TIMEOUT_SECONDS
The timeout period, in seconds, for Spark submitters. The default is 300.
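
For example, you might set the cluster totals with the Tamr administrative utility. This is a sketch only: it assumes an installation under /opt/tamr and uses the unify-admin.sh config:set command; the installation path and the values shown are illustrative assumptions, not sizing recommendations for your deployment.

# Total resources available to the Spark cluster (example values only)
/opt/tamr/tamr/utils/unify-admin.sh config:set TAMR_SPARK_MEMORY=50G
/opt/tamr/tamr/utils/unify-admin.sh config:set TAMR_SPARK_CORES=20

# Per-executor sizing for Tamr Core jobs (example values only)
/opt/tamr/tamr/utils/unify-admin.sh config:set TAMR_JOB_SPARK_EXECUTOR_MEM=8G
/opt/tamr/tamr/utils/unify-admin.sh config:set TAMR_JOB_SPARK_EXECUTOR_CORES=2

After changing configuration variables, restart Tamr Core and its dependencies so the new values take effect.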
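TAMR_JOB_SPARK_CONFIG_OVERRIDES takes a JSON array of named override sets, each mapping standard Spark properties to values; the authoritative schema is in the Tamr Core Help Center. The shape below is an assumption for illustration only (the set name largeJobs and the field names name and sparkConfig are not confirmed by this page), and it would be passed to config:set as a single-line value.

[
  {
    "name": "largeJobs",
    "sparkConfig": {
      "spark.driver.memory": "4G",
      "spark.executor.memory": "12G"
    }
  }
]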

See Configuration Variable Reference for a complete list of Tamr Core configuration variables.