Tamr Documentation

Single-Node Deployments

In single-node deployments, the Tamr platform is deployed on a single server.

Tamr must be deployed on a dedicated server. As of v.2019.013, Tamr does not support deployments in which the server also runs other applications: only Tamr and its dependencies can run on it. This requirement applies to both on-premises and cloud deployments, whether Tamr is installed on a physical server or on a virtual machine (VM).

Specifications for On-Premises Deployments

Tamr can run entirely on a single server.

Large

Resource   Specification
CPU        32 cores
Memory     256 GB
Disk       5 TB SSD

Medium

Resource   Specification
CPU        16 cores
Memory     128 GB
Disk       2 TB SSD

Small (Minimum)

Resource   Specification
CPU        8 cores
Memory     64 GB
Disk       1 TB SSD
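
As a rough illustration of how the three tiers relate, the following sketch maps a server's resources to the largest tier it satisfies. The function name `sizing_tier` and the rule that all three minimums must be met are assumptions made for this example; they are not part of Tamr.

```python
# Tier minimums taken from the tables above:
# (tier name, CPU cores, memory in GB, SSD disk in TB), largest tier first.
TIERS = [
    ("large", 32, 256, 5),
    ("medium", 16, 128, 2),
    ("small", 8, 64, 1),
]


def sizing_tier(cpu_cores, memory_gb, disk_tb):
    """Return the largest tier whose CPU, memory, and disk minimums are all met.

    Returns None when the server is below the Small (Minimum) tier.
    This helper is hypothetical and only illustrates the tables.
    """
    for name, min_cpu, min_memory_gb, min_disk_tb in TIERS:
        if (cpu_cores >= min_cpu
                and memory_gb >= min_memory_gb
                and disk_tb >= min_disk_tb):
            return name
    return None
```

For example, `sizing_tier(16, 128, 2)` returns `"medium"`, while a server with 32 cores but only 64 GB of memory falls back to `"small"` because every resource must meet the tier minimum.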

XFS is the recommended filesystem for all Tamr deployments.
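
One way to confirm which filesystem backs a given directory on Linux is to read `/proc/mounts`. The sketch below parses that file's text and returns the filesystem type of the mount point containing a path. The function name `filesystem_type` and the example path `/data/tamr` are hypothetical, and the sketch assumes an absolute, normalized path with no spaces in mount points.

```python
def filesystem_type(mounts_text, path):
    """Return the filesystem type of the mount point that contains `path`.

    `mounts_text` is the content of /proc/mounts on Linux. Each line has the
    fields: device, mount point, filesystem type, options, dump, pass.
    Returns None if no mount point matches.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Prefer the longest matching mount point; on ties, the later entry wins.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    # "/data/tamr" is a placeholder; substitute the actual installation path.
    with open("/proc/mounts") as mounts:
        print(filesystem_type(mounts.read(), "/data/tamr"))
```

If the printed value is not `xfs`, the directory is on a filesystem other than the recommended one.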

Specifications for Cloud Deployments

AWS Sizing and Limits

Tamr recommends the following configurations for small, medium, and large single-node deployments on AWS.

Recommended AWS configurations

For more information, see Amazon EC2 On-Demand Pricing.

GCP Sizing and Limits

Check your Google Compute Engine (GCE) resource quota limits. For more information, see Resource quotas in the Google documentation.

Tamr has the following minimum requirements for a single-node deployment:

  • 3 CPU cores and 64 GB RAM.
  • For up to 20 million records, Tamr recommends an n1-highmem-8 instance deployment.
  • For larger numbers of records, Tamr recommends n1-highmem-16 or n1-highmem-32 instance deployments.
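
The record-count guidance above can be expressed as a small lookup. This is only a sketch: the function name `recommended_gcp_machine_types` is hypothetical, and the choice between the two larger instance types is left to the reader because the guidance does not give a threshold for it.

```python
def recommended_gcp_machine_types(record_count):
    """Return the GCE machine types recommended above for a record count.

    Up to 20 million records maps to n1-highmem-8; larger counts map to the
    two larger instance types, with no further threshold between them.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count <= 20_000_000:
        return ["n1-highmem-8"]
    return ["n1-highmem-16", "n1-highmem-32"]
```

For example, `recommended_gcp_machine_types(5_000_000)` returns `["n1-highmem-8"]`.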

Tamr recommends the following configurations for small, medium, and large single-node deployments.

Sizing and capacity recommendations for GCP

For more information, see GCP Machine Types and Persistent disks in the GCP documentation.

Azure Sizing and Limits

Tamr recommends the following configurations for small, medium, and large single-node deployments.

Sizing and capacity recommendations for Azure

For more information, see the Microsoft documentation about the Ev3 series and disk types.

PostgreSQL Deployment Requirements

For single-node Azure and GCP deployments, Tamr supports installing PostgreSQL only on the same server as Tamr.

For single-node AWS deployments, Tamr recommends installing PostgreSQL on the same server as Tamr. If required, you can instead use a separate AWS RDS PostgreSQL instance, provisioned with the Tamr AWS RDS Terraform module. If you deploy with RDS, you must follow the Terraform module instructions in Deploying Tamr on AWS and ensure that there is a route between the Tamr VM and the RDS network.

Included Services and Ports

A single-node deployment provides access to the following Tamr microservices and external services at their default ports.

Tamr Microservices


The following microservices are available at the listed default ports. They are also available through a proxy at the default Tamr port (TCP 9100).

Optional administrative ports are located at each listed port +1. For example, the administrative port for 9020 is 9021. Administrative ports expose endpoints for operational information and ensure that a heavy load of user requests cannot block administrative requests.

Service                 Default Port   Description
Auth                    9020           User authentication
Dataset                 9150           Dataset management
Data Movement Service   9155           Dataset movement between Tamr and cloud storage destinations
Dedup                   9140           Deduplication service for mastering projects
Match                   9170           Low Latency Match service
Persistence             9080           Database persistence
Preview                 9040           Spark runner for preview of mappings and data transformations
Public API              9180           Public APIs for working with Tamr
Recipe                  9190           Orchestration service for tracking tasks and their dependencies
Taxonomy                9400           Taxonomy service for classification projects
Transform               9160           Transformation service for schema mapping and running transformations
Unify                   9100           Tamr front-end application
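
To illustrate the default ports and the +1 administrative port rule, the following sketch checks whether each microservice port accepts TCP connections on a host. The names `TAMR_SERVICE_PORTS`, `admin_port`, `is_port_open`, and `check_services` are hypothetical helpers written for this example, not part of Tamr, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Default ports from the Tamr Microservices table.
TAMR_SERVICE_PORTS = {
    "Auth": 9020,
    "Dataset": 9150,
    "Data Movement Service": 9155,
    "Dedup": 9140,
    "Match": 9170,
    "Persistence": 9080,
    "Preview": 9040,
    "Public API": 9180,
    "Recipe": 9190,
    "Taxonomy": 9400,
    "Transform": 9160,
    "Unify": 9100,
}


def admin_port(service_port):
    """Return the optional administrative port for a service port (port + 1)."""
    return service_port + 1


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host, ports=TAMR_SERVICE_PORTS):
    """Map each service name to whether its port accepts connections on host."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_services("localhost").items():
        print(f"{name}: {'open' if reachable else 'closed'}")
```

The same `is_port_open` helper can be pointed at an administrative port, for example `is_port_open("localhost", admin_port(9020))`.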

External Services

Service                   Ports                       Description
Elasticsearch             9200, 9300                  Elasticsearch
Elasticsearch Front End   9130                        Elasticsearch for Tamr front-end application
elasticsearch_exporter    9135                        Instrument Elasticsearch for Prometheus metrics gathering
elasticsearch_logging     9250, 9350                  Elasticsearch for logging
Grafana                   31101                       Monitoring dashboard
graphite_exporter         31108, 31109                Spark metrics for Prometheus
Kibana                    5601                        Logging dashboard
node_exporter             9110                        System metrics for Prometheus
postgres_exporter         31187                       PostgreSQL metrics for Prometheus
PostgreSQL                5432                        Internal database for Tamr application metadata
Prometheus                31390                       Monitoring and alerting framework
ZooKeeper                 21281                       Tamr HBase ZooKeeper client
HBase                     16010                       HBase Master
HBase                     9113                        HBase Master Exporter
HBase                     60010                       HBase Master JMX
HBase                     9114                        HBase Region Server Exporter
HBase                     60030                       HBase Region Server
HBase                     2181                        HBase ZooKeeper (as distinct from Tamr HBase ZooKeeper); port configured by TAMR_HBASE_ZK_CLIENT_PORT
YARN                      8088 (HTTP), 8090 (HTTPS)   YARN Resource Manager dashboard
YARN                      8031, 8032, 8033, 8042      YARN Resource Manager and its admin and resource tracker ports
YARN                      8030                        YARN Resource Manager scheduler



