Configuring the Spark Environment
Configure the Apache Spark analytics engine for Tamr Core.
You can set the following Spark environment variables:
| Configuration Variable | Description |
|---|---|
| `TAMR_SPARK_MEMORY` | The total memory to use for the Spark cluster. For information on calculating this value, see YARN Cluster Manager Jobs. |
| `TAMR_SPARK_CORES` | The total number of cores to use for the Spark cluster. |
| `TAMR_JOB_SPARK_CLUSTER` | The full URL of the Spark cluster in use. The default value is `yarn`. |
| `TAMR_JOB_SPARK_CONFIG_OVERRIDES` | A list of named sets of Spark configuration overrides. See the Tamr Core Help Center for more details on the overrides. |
| `TAMR_JOB_SPARK_DRIVER_MEM` | The amount of memory a Tamr Core job uses for the driver process, such as `1G` or `2G`. |
| `TAMR_JOB_SPARK_EVENT_LOGS_DIR` | The directory for storing logs for Spark jobs. |
| `TAMR_JOB_SPARK_EXECUTOR_MEM` | The amount of memory a Tamr Core job uses per executor process, such as `2G` or `8G`. |
| `TAMR_JOB_SPARK_EXECUTOR_CORES` | The number of cores to use per executor process, such as `2`. |
| `TAMR_SPARK_WORKDIR` | The directory to use as the Spark working directory. |
| `TAMR_SPARK_LOGS` | The directory to use for Spark log files. |
| `TAMR_JOB_SPARK_SUBMIT_TIMEOUT_SECONDS` | The timeout period, in seconds, for Spark job submission. The default is `300`. |
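As a minimal sketch of how these variables are typically applied, the following example sets several Spark values with the Tamr administrative utility and then reads them back. The installation path (`/data/tamr`) and the specific memory and core values are illustrative assumptions; choose values appropriate for your deployment, and restart Tamr afterward so the new configuration takes effect.

```bash
# Assumption: Tamr is installed under /data/tamr; adjust for your environment.
TAMR_HOME=/data/tamr

# Set Spark cluster and per-job resource values (example values only).
"${TAMR_HOME}/tamr/utils/unify-admin.sh" config:set \
    TAMR_SPARK_MEMORY=16G \
    TAMR_SPARK_CORES=6 \
    TAMR_JOB_SPARK_DRIVER_MEM=2G \
    TAMR_JOB_SPARK_EXECUTOR_MEM=8G \
    TAMR_JOB_SPARK_EXECUTOR_CORES=2

# Confirm the values Tamr now reports for these variables.
"${TAMR_HOME}/tamr/utils/unify-admin.sh" config:get \
    TAMR_SPARK_MEMORY TAMR_SPARK_CORES
```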
See Configuration Variable Reference for a complete list of Tamr Core configuration variables.