Single-Node Deployments

In single-node deployments, the Tamr Core platform is deployed on a single server.

Tamr Core must be deployed on a dedicated server. Only Tamr Core and its dependencies can run on this server. This requirement applies to on-premises and cloud-native deployments, where Tamr Core is deployed on a physical server or a virtual machine (VM) environment.

Tamr does not support deployments in which Tamr Core is installed on a server that also runs other applications.

Important: Tamr Core cannot operate on disks that are more than 80% full. Starting with release v2022.001.0, Tamr Core validation scripts verify that at least 20% of disk space is available.
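For a quick pre-flight check of the disk requirement above, a minimal sketch in Python might look like the following. It is not the Tamr Core validation script; the function names and the path you pass in are assumptions for illustration, and only the 20% threshold comes from this guide.

```python
import shutil

# Threshold from this guide: at least 20% of disk space must be available.
MIN_FREE_FRACTION = 0.20


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the share of the disk that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def disk_has_headroom(path: str) -> bool:
    """Return True if the disk that holds `path` has at least 20% free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return free_fraction(usage.total, usage.free) >= MIN_FREE_FRACTION


if __name__ == "__main__":
    # "/" is a placeholder; point this at the disk that holds your Tamr Core data.
    print("enough free space:", disk_has_headroom("/"))
```

Run the check against every disk that Tamr Core writes to, not only the root volume.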

Sizing Guidelines

In general, Tamr Core performance scales linearly with computational resources. The specifications below use these "t-shirt" sizes:

  • Large: 1M to 10M records
  • Medium: 100k to 1M records
  • Small: up to 100k records

For larger volumes, Tamr offers cloud-native deployments that can be scaled to meet your needs.
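The size bands above can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, and because the guide does not say which band a boundary value such as exactly 100k records belongs to, the code assumes each upper bound is inclusive.

```python
from typing import Optional


def tshirt_size(record_count: int) -> Optional[str]:
    """Map a record count to the guide's t-shirt size.

    Returns None when the volume exceeds the Large band, where the guide
    points to cloud-native deployments instead of a single node.
    Assumption: upper bounds are inclusive (100k records counts as Small).
    """
    if record_count < 0:
        raise ValueError("record_count must not be negative")
    if record_count <= 100_000:
        return "Small"
    if record_count <= 1_000_000:
        return "Medium"
    if record_count <= 10_000_000:
        return "Large"
    return None


if __name__ == "__main__":
    for count in (50_000, 750_000, 4_000_000, 25_000_000):
        print(count, tshirt_size(count))
```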

Specifications for On-Premises Deployments

Tamr Core can run entirely on a single server.

Deployment        Resource   Specification
Large             CPU        32 cores
                  Memory     256 GB
                  Disk       5 TB SSD
Medium            CPU        16 cores
                  Memory     128 GB
                  Disk       2 TB SSD
Small (minimum)   CPU        8 cores
                  Memory     64 GB
                  Disk       1 TB SSD
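To compare a candidate server against these specifications, something like the following sketch could be used. The numbers are copied from the table above; the `Spec` class, the `shortfalls` function, and the message format are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Spec:
    cpu_cores: int
    memory_gb: int
    disk_tb: float


# Values from the on-premises specification table in this guide.
ON_PREM_SPECS = {
    "Small": Spec(cpu_cores=8, memory_gb=64, disk_tb=1),
    "Medium": Spec(cpu_cores=16, memory_gb=128, disk_tb=2),
    "Large": Spec(cpu_cores=32, memory_gb=256, disk_tb=5),
}


def shortfalls(size: str, cpu_cores: int, memory_gb: int, disk_tb: float) -> List[str]:
    """List every resource that falls below the specification for `size`."""
    spec = ON_PREM_SPECS[size]
    have = Spec(cpu_cores, memory_gb, disk_tb)
    problems = []
    for field in ("cpu_cores", "memory_gb", "disk_tb"):
        actual, needed = getattr(have, field), getattr(spec, field)
        if actual < needed:
            problems.append(f"{field}: have {actual}, need {needed}")
    return problems


if __name__ == "__main__":
    # A server with too few cores for a Medium deployment.
    print(shortfalls("Medium", cpu_cores=8, memory_gb=128, disk_tb=2))
```

An empty list means the server meets or exceeds the specification for that size.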

XFS is the recommended file system for all Tamr Core deployments.
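On Linux, one way to confirm which file system backs a mount point is to read the mount table in `/proc/mounts`. The sketch below parses that format (device, mount point, file system type, options). The `/data` mount point in the sample is a placeholder, not a path that Tamr Core requires.

```python
from typing import Optional


def filesystem_type(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the file system type for `mount_point`, or None if it is not mounted.

    `mounts_text` is the content of /proc/mounts. If a mount point appears
    more than once, the last entry is the one currently in effect.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]
    return found


if __name__ == "__main__":
    # Hypothetical sample; on a real host, read the file instead:
    #   with open("/proc/mounts") as f: text = f.read()
    sample = (
        "/dev/sda1 / ext4 rw,relatime 0 0\n"
        "/dev/sdb1 /data xfs rw,noatime 0 0\n"
    )
    print(filesystem_type(sample, "/data") == "xfs")
```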

Specifications for Cloud Deployments

AWS Sizing and Limits

Tamr recommends the following configurations for single-node deployments on AWS.

Deployment   Recommended Sizing   AWS EC2 Capacity/Storage*
Large        CPU 32 cores         r6g.8xlarge
             Memory 256 GB
             Disk 5 TB SSD        5 TB EBS SSD
Medium       CPU 16 cores         r6g.4xlarge
             Memory 128 GB
             Disk 2 TB SSD        2 TB EBS SSD
Small        CPU 8 cores          r6g.2xlarge
             Memory 64 GB
             Disk 1 TB SSD        1 TB EBS SSD

*Recommended AWS capacity or better. Do not use an AWS Graviton processor for the VM that runs Tamr Core.

For more information, see Amazon EC2 On-Demand Pricing.

GCP Sizing and Limits

Check your Google Compute Engine (GCE) resource quota limits. For more information, see Resource quotas in the Google Cloud documentation.

Tamr recommends the following configurations for single-node deployments on GCP.

Deployment   Recommended Sizing   GCP Type/Storage Option*
Large        CPU 32 cores         N2-highmem-32
             Memory 256 GB
             Disk 5 TB SSD        5 TB pd-balanced
Medium       CPU 16 cores         N2-highmem-16
             Memory 128 GB
             Disk 2 TB SSD        2 TB pd-balanced
Small        CPU 8 cores          N2-highmem-8
             Memory 64 GB
             Disk 1 TB SSD        1 TB pd-balanced

*Recommended GCP capacity or better

For more information, see GCP Machine Types and Persistent disks in the GCP documentation.

Azure Sizing and Limits

Tamr recommends the following configurations for single-node deployments on Azure.

Deployment   Recommended Sizing   Ev3 Series/Storage*
Large        CPU 32 cores         Standard_E32_v3
             Memory 256 GB
             Disk 5 TB SSD        Premium/Standard SSD 5 TB
Medium       CPU 16 cores         Standard_E16_v3
             Memory 128 GB
             Disk 2 TB SSD        Premium/Standard SSD 2 TB
Small        CPU 8 cores          Standard_E8_v3
             Memory 64 GB
             Disk 1 TB SSD        Premium/Standard SSD 1 TB

*Recommended Azure capacity or better

For more information, see the Microsoft documentation about the Ev3 series and disk types.

PostgreSQL Deployment Requirements

For single-node Azure and GCP deployments, Tamr only supports installing PostgreSQL on the same server as Tamr Core.

For single-node AWS deployments, Tamr recommends installing PostgreSQL on the same server as Tamr Core. If required, you can install PostgreSQL on a separate AWS RDS PostgreSQL instance using the Tamr AWS RDS Terraform module. If deploying via RDS, you must follow the Terraform module instructions in Deploying Tamr on AWS and ensure that there is a route between the Tamr Core VM and the RDS network.

NGINX Deployment Requirements

NGINX is a reverse proxy server configured to allow clients to access Tamr Core securely over HTTPS, and is a critical component in the Tamr Core network security layer. For more information, see Requirements for NGINX version support, Installing NGINX, and Configuring HTTPS.

For non-production environments, configuring a firewall (below), NGINX, and HTTPS is strongly recommended but not required.

Important: If you do not configure a firewall, NGINX, and HTTPS in a non-production deployment, all users on the network will have access to the data. Use a unique password for this deployment.

Firewall Requirements

For cloud deployments on AWS, Azure, or Google Cloud Platform, use the firewall provided by the cloud provider. This gives you control over and visibility into the firewall from the cloud console.

For on-premises VM deployments, use the firewall provided by the operating system.

Firewall configuration requirements:

  • Implement least-privilege principles. Block all traffic to the Tamr Core host by default and allow only the specific traffic you need. This includes limiting each rule to just the protocols and ports you need.
  • If you are restricting traffic to Tamr Core based on IP addresses, try to minimize the number of rules. It's easier to track one rule that allows traffic from a range of 16 IPs than it is to track 16 separate rules. Specify IPv4 addresses in CIDR notation, such as 1.2.3.4/32.
  • Allow only internal access to Tamr Core default port 9100 (via TCP).
  • Open port 443 for HTTPS.
    Note: If you plan to forward HTTP traffic to HTTPS, also open port 80.
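To illustrate the advice about minimizing rules, the sketch below uses Python's standard `ipaddress` module to merge individual IPv4 addresses into the fewest CIDR blocks. The function name is hypothetical, and the addresses come from the 192.0.2.0/24 documentation range rather than any real deployment.

```python
import ipaddress
from typing import Iterable, List


def collapse_to_cidr(addresses: Iterable[str]) -> List[str]:
    """Merge IPv4 addresses or networks into the smallest set of CIDR blocks."""
    networks = [ipaddress.ip_network(a) for a in addresses]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # Sixteen consecutive addresses, 192.0.2.16 through 192.0.2.31.
    sixteen = [f"192.0.2.{i}" for i in range(16, 32)]
    print(collapse_to_cidr(sixteen))
```

Because the 16 addresses in the example are contiguous and aligned on a /28 boundary, they merge into a single block that one firewall rule can cover. Addresses that are not contiguous remain as separate entries.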

Included Services and Ports

Securing Tamr Core Ports

When configuring the firewall, open only port 443, and any ports required by your system administrators, to outside traffic. For example, port 22 is often used for SSH access.

You can configure the host firewall through your cloud provider console or through the host’s operating system.
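After the firewall is in place, you may want to confirm from a machine outside the network that only the intended ports accept connections. The sketch below is one possible way to do that with a plain TCP connect test. The function names are assumptions, and `tamr.example.com` is a placeholder host name.

```python
import socket
from typing import Iterable, List, Set


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unexpected_open_ports(host: str, candidates: Iterable[int],
                          allowed: Set[int]) -> List[int]:
    """Return the candidate ports that accept connections but are not allowed."""
    return [p for p in candidates if p not in allowed and port_is_open(host, p)]


if __name__ == "__main__":
    # Placeholder host; 443 and 22 are the ports this guide mentions opening,
    # and 9100 is the Tamr Core default port that should stay internal.
    leaked = unexpected_open_ports("tamr.example.com", [22, 80, 443, 9100],
                                   allowed={22, 443})
    print("unexpectedly reachable:", leaked)
```

A connect test only shows what is reachable from where it runs, so run it from outside the trusted network to see what the firewall exposes.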

Tamr Core Microservices

A single-node deployment provides access to these microservices at the following default ports. In addition, they are available via a proxy at the default Tamr Core port (TCP 9100).

Optional administrative ports are available at each default port listed below plus one. For example, the administrative port for the service on port 9020 is 9021. These ports expose endpoints for operational information and ensure that a heavy user request load cannot prevent administrative requests from getting through.

Service        Default Port   Description
Auth           9020           User authentication
Dataset        9150           Dataset management
Core Connect   9050           Dataset movement between Tamr Core and cloud storage destinations
Dedup          9140           Deduplication service for mastering projects
Match          9170           Low Latency Match services
Persistence    9080           Database persistence
Preview        9040           Spark runner for preview of mappings and data transformations
Public API     9180           Public APIs for working with Tamr Core
Recipe         9190           Orchestration service for tracking tasks and their dependencies
Taxonomy       9400           Taxonomy service for classification projects
Transform      9160           Transformation service for schema mapping and running transformations
Unify          9100           Tamr front-end application
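The "default port plus one" rule for administrative ports can be sketched as follows. The port numbers are copied from the microservices table in this guide; the dictionary and function names are hypothetical.

```python
from typing import Dict, List

# Default ports from the Tamr Core microservices table in this guide.
MICROSERVICE_PORTS: Dict[str, int] = {
    "Auth": 9020,
    "Dataset": 9150,
    "Core Connect": 9050,
    "Dedup": 9140,
    "Match": 9170,
    "Persistence": 9080,
    "Preview": 9040,
    "Public API": 9180,
    "Recipe": 9190,
    "Taxonomy": 9400,
    "Transform": 9160,
    "Unify": 9100,
}


def admin_port(service: str) -> int:
    """Return the optional administrative port: the default port plus one."""
    return MICROSERVICE_PORTS[service] + 1


def admin_port_conflicts() -> List[str]:
    """List services whose administrative port equals another service's default port."""
    in_use = set(MICROSERVICE_PORTS.values())
    return [name for name in MICROSERVICE_PORTS if admin_port(name) in in_use]


if __name__ == "__main__":
    for name in MICROSERVICE_PORTS:
        print(f"{name}: service {MICROSERVICE_PORTS[name]}, admin {admin_port(name)}")
```

A helper like `admin_port_conflicts` can be useful if you change any default port and want to confirm that the derived administrative ports still do not overlap with service ports.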

External Services

Service                  Ports                      Description
Elasticsearch            9200, 9300                 Elasticsearch
Elasticsearch Front End  9130                       Elasticsearch for Tamr Core front-end application
elasticsearch_exporter   9135                       Instrument Elasticsearch for Prometheus metrics gathering
elasticsearch_logging    9250, 9350                 Elasticsearch for logging
Grafana                  31101                      Monitoring dashboard
graphite_exporter        31108, 31109               Spark metrics for Prometheus
Kibana                   5601                       Logging dashboard
node_exporter            9110                       System metrics for Prometheus
postgres_exporter        31187                      PostgreSQL metrics for Prometheus
PostgreSQL               5432                       Internal database for Tamr Core application metadata
Prometheus               31390                      Monitoring and alerting framework
ZooKeeper                21281                      Tamr Core HBase ZooKeeper client
HBase                    16010                      HBase Master
HBase                    9113                       HBase Master Exporter
HBase                    60010                      HBase Master JMX
HBase                    9114                       HBase Region Server Exporter
HBase                    60030                      HBase Region Server
HBase                    2181                       HBase ZooKeeper (as distinct from Tamr HBase ZooKeeper)
  (configured by TAMR_HBASE_ZK_CLIENT_PORT)
YARN                     8088 (HTTP), 8090 (HTTPS)  YARN Resource Manager dashboard
YARN                     8031, 8032, 8033, 8042     YARN Resource Manager resource port, and its admin and tracker ports
YARN                     8030                       YARN Resource Manager scheduler