
Configuring the Spark Environment

Configure the Apache Spark analytics engine for Tamr Core.

You can set the following Spark configuration variables (examples of setting them follow the table):

| Configuration Variable | Description |
| --- | --- |
| `TAMR_SPARK_MEMORY` | The total memory to allocate to the Spark cluster. For information on calculating this value, see YARN Cluster Manager Jobs. |
| `TAMR_SPARK_CORES` | The total number of cores to allocate to the Spark cluster. |
| `TAMR_JOB_SPARK_CLUSTER` | The full URL of the Spark cluster to use. The default value is `yarn`. |
| `TAMR_JOB_SPARK_CONFIG_OVERRIDES` | A list of named sets of Spark configuration overrides. See the Tamr Core Help Center for more detail on the overrides, and the second example after this table. |
| `TAMR_JOB_SPARK_DRIVER_MEM` | The amount of memory a Tamr Core job uses for the driver process, such as `1G` or `2G`. |
| `TAMR_JOB_SPARK_EVENT_LOGS_DIR` | The directory in which logs for Spark jobs are stored. |
| `TAMR_JOB_SPARK_EXECUTOR_MEM` | The amount of memory a Tamr Core job uses for each executor process, such as `2G` or `8G`. |
| `TAMR_JOB_SPARK_EXECUTOR_CORES` | The number of cores to use for each executor process, such as `2`. |
| `TAMR_SPARK_WORKDIR` | The Spark working directory. |
| `TAMR_SPARK_LOGS` | The directory for Spark log files. |
| `TAMR_JOB_SPARK_SUBMIT_TIMEOUT_SECONDS` | The timeout period, in seconds, for Spark job submission. The default is 300 (5 minutes). |
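For reference, the sketch below shows one way to set several of these variables at once using the `unify-admin.sh` administrative utility. The installation path, the sizing values, and passing multiple variables in a single `config:set` call are all assumptions; check the Configuration Variable Reference and your own installation before applying them.

```bash
# A minimal sketch of setting Spark configuration variables. The path to
# unify-admin.sh and all values shown are illustrative assumptions; adjust
# them for your installation and cluster sizing.
TAMR_HOME=/opt/tamr   # assumed install location

"${TAMR_HOME}/tamr/utils/unify-admin.sh" config:set \
  TAMR_SPARK_MEMORY="32G" \
  TAMR_SPARK_CORES="12" \
  TAMR_JOB_SPARK_DRIVER_MEM="2G" \
  TAMR_JOB_SPARK_EXECUTOR_MEM="8G" \
  TAMR_JOB_SPARK_EXECUTOR_CORES="2"

# Read a value back to confirm it took effect.
"${TAMR_HOME}/tamr/utils/unify-admin.sh" config:get TAMR_SPARK_MEMORY
```

Configuration changes typically require restarting Tamr Core and its dependencies before they take effect.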
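`TAMR_JOB_SPARK_CONFIG_OVERRIDES` holds a list of named override sets, which lets particular jobs run with Spark settings that differ from the cluster-wide defaults. The JSON shape below, including the `sparkConfigName` and `properties` field names, is an illustrative assumption only; the Tamr Core Help Center documents the authoritative schema.

```bash
# Illustrative assumption: the field names "sparkConfigName" and "properties"
# are not confirmed here; see the Tamr Core Help Center for the real schema.
"${TAMR_HOME}/tamr/utils/unify-admin.sh" config:set \
  TAMR_JOB_SPARK_CONFIG_OVERRIDES='[
    {
      "sparkConfigName": "largeJobs",
      "properties": {
        "spark.driver.memory": "4G",
        "spark.executor.memory": "16G",
        "spark.executor.cores": "4"
      }
    }
  ]'
```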

See Configuration Variable Reference for a complete list of Tamr Core configuration variables.

