Important: The configuration described in this topic is supported only for scalable cloud-native deployments; it is not supported for single-node deployments.
Tamr Core runs Spark jobs with the YARN cluster manager instead of the Spark standalone cluster manager. The YARN cluster manager starts the ResourceManager and NodeManager servers.
To see the list of all Spark jobs that have been submitted to the cluster manager, open the YARN ResourceManager Web UI on its configured port.
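The same job list that the Web UI shows can also be read programmatically. The following is a minimal sketch, assuming the ResourceManager exposes the standard Hadoop YARN REST endpoint `/ws/v1/cluster/apps` on its Web UI port. The function names are hypothetical helpers, not part of Tamr Core, and the host and port must come from your own deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_apps_url(host, port, states=None):
    """Build the ResourceManager REST URL that lists applications."""
    url = f"http://{host}:{port}/ws/v1/cluster/apps"
    if states:
        # Optional filter, for example ["RUNNING", "FINISHED"].
        url += "?" + urlencode({"states": ",".join(states)})
    return url


def summarize_apps(payload):
    """Return (id, name, state) for each Spark application in a response."""
    # The ResourceManager returns {"apps": null} when there are no applications.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        (app["id"], app["name"], app["state"])
        for app in apps
        if app.get("applicationType") == "SPARK"
    ]


def list_spark_jobs(host, port, states=None):
    """Fetch and summarize applications from a live ResourceManager."""
    with urlopen(build_apps_url(host, port, states), timeout=10) as resp:
        return summarize_apps(json.load(resp))
```

Calling `list_spark_jobs(host, port, ["RUNNING"])` with the ResourceManager hostname and Web UI port of your deployment would return only the Spark jobs that are currently running. This sketch assumes an unsecured HTTP endpoint; a cluster with authentication enabled needs additional handling.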
Server logs are stored in the TAMR_LOG_DIR, which defaults to
The logs for Spark jobs are stored in the
By default, Tamr Core uses the following ports for YARN:
Note: The next section lists additional ports.
You can specify different YARN ports. For information, see Configuring Tamr Core.
The YARN cluster manager uses the following configuration properties in Tamr Core. You can optionally specify your own values for these properties on a cloud-native deployment. For single-node deployments, Tamr recommends that you use the defaults listed below.
This list also includes some of the YARN ports. The previous section lists the defaults for these ports.
| Property | Description |
| --- | --- |
| TAMR_YARN_RESOURCE_MANAGER_HOST | The hostname of the Spark YARN ResourceManager. The default is the same hostname as the HOST_IP that you can determine by running |
| TAMR_YARN_NODE_MANAGER_HOST | The hostname of the Spark YARN NodeManager. The default is the same hostname as the HOST_IP that you can determine by running |
| TAMR_YARN_NODE_MANAGER_PORT | The port of the Spark YARN NodeManager. The default port is 8042. |
| TAMR_YARN_TEMP_DIR | The directory for storing temporary files produced by YARN. Specify this location if you need to control access to it. The default value is |
| TAMR_JOB_SPARK_YARN_QUEUE | The name of the YARN queue for submitting Spark jobs. No default is provided; when this property is not set, Spark jobs are submitted with an empty queue name. |
| TAMR_YARN_SCHEDULER_CAPACITY_MAXIMUM_AM_RESOURCE_PERCENT | The maximum percentage of cluster resources that can be used to run application masters (AMs) in the YARN cluster. This property lets you control the number of applications that run concurrently. Possible values are between 0.0 and 1.0, inclusive. The default is 1.0, which allows AMs to use up to 100% of the total memory. Use the default for single-node Tamr deployments running on the YARN cluster. |
| TAMR_JOB_SPARK_LOCAL_YARN_JARS | A list of paths to JAR files that the YARN cluster manager uses with a local filesystem. You can list multiple paths with a semicolon separator, and use |
Note: Do not change the
See Configuration Variable Reference for a complete list of Tamr Core configuration variables.
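Before applying custom values, it can help to check them against the ranges stated in the property list. The following is a minimal sketch, not a Tamr Core utility: `yarn_settings` is a hypothetical helper, the spelling TAMR_YARN_SCHEDULER_CAPACITY_MAXIMUM_AM_RESOURCE_PERCENT is an assumed reading of the application master property name, and the resulting dictionary still has to be applied with whatever configuration mechanism your deployment uses.

```python
def yarn_settings(am_resource_percent=1.0, node_manager_port=8042, queue=None):
    """Validate a few YARN-related values and return them as strings.

    Defaults mirror the documented ones: 1.0 for the AM resource percent
    and 8042 for the NodeManager port. No queue is set unless one is given.
    """
    # The documented range is 0.0 to 1.0, inclusive.
    if not 0.0 <= am_resource_percent <= 1.0:
        raise ValueError("AM resource percent must be between 0.0 and 1.0")
    # Generic TCP port range check (an assumption, not a Tamr rule).
    if not 1 <= node_manager_port <= 65535:
        raise ValueError("NodeManager port must be between 1 and 65535")

    settings = {
        "TAMR_YARN_NODE_MANAGER_PORT": str(node_manager_port),
        # Assumed spelling of the AM resource percent property name.
        "TAMR_YARN_SCHEDULER_CAPACITY_MAXIMUM_AM_RESOURCE_PERCENT": str(am_resource_percent),
    }
    if queue:
        settings["TAMR_JOB_SPARK_YARN_QUEUE"] = queue
    return settings
```

For example, `yarn_settings(am_resource_percent=0.5, queue="tamr")` returns three entries, while a value such as 1.5 raises `ValueError` before anything is applied. The queue name `tamr` is only an illustration.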
You can adjust the Spark memory resources in TAMR_SPARK_MEMORY based on the following formula. By default, this property accounts for the necessary overhead for running Spark jobs in the YARN cluster manager.
TAMR_SPARK_MEMORY >= 1.1 * x * TAMR_JOB_SPARK_EXECUTOR_INSTANCES + 1.1 * y
- x represents TAMR_JOB_SPARK_EXECUTOR_MEM per instance, in GB.
- y represents TAMR_JOB_SPARK_DRIVER_MEM per instance, in GB.
Tamr Core rounds every computation up to a whole number of GB. This formula also applies to the Spark resource properties that you can specify in the TAMR_JOB_SPARK_CONFIG_OVERRIDES configuration parameter for Tamr Core.
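As a worked illustration of the formula, the following minimal sketch computes the smallest TAMR_SPARK_MEMORY value that satisfies it. It assumes that "rounds every computation up" means the executor term and the driver term are each rounded up separately before they are added; Tamr Core's exact rounding may differ. The function name is a hypothetical helper, and exact fractions are used so that the 1.1 factor does not introduce floating-point error before rounding.

```python
from fractions import Fraction
from math import ceil

# The 1.1 overhead factor from the formula, kept exact.
OVERHEAD = Fraction(11, 10)


def min_spark_memory_gb(executor_mem_gb, executor_instances, driver_mem_gb):
    """Smallest whole-GB TAMR_SPARK_MEMORY that satisfies the formula.

    executor_mem_gb    -- x, TAMR_JOB_SPARK_EXECUTOR_MEM per instance, in GB
    executor_instances -- TAMR_JOB_SPARK_EXECUTOR_INSTANCES
    driver_mem_gb      -- y, TAMR_JOB_SPARK_DRIVER_MEM per instance, in GB
    """
    # Assumption: each term is rounded up to a whole GB on its own.
    executors = ceil(OVERHEAD * Fraction(executor_mem_gb) * executor_instances)
    driver = ceil(OVERHEAD * Fraction(driver_mem_gb))
    return executors + driver
```

With 4 executors of 8 GB each and a 5 GB driver, the executor term is 1.1 * 8 * 4 = 35.2 GB, rounded up to 36 GB, and the driver term is 1.1 * 5 = 5.5 GB, rounded up to 6 GB, so `min_spark_memory_gb(8, 4, 5)` returns 42.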