Single-Node Deployments
In single-node deployments, the Tamr Core platform is deployed on a single server.
Tamr Core must be deployed on a dedicated server: only Tamr Core and its dependencies can run on this server. This requirement applies to both on-premises and cloud-native deployments, whether Tamr Core runs on a physical server or in a virtual machine (VM).
Tamr does not support deployments in which Tamr Core is installed on a server that also runs other applications.
Important: Tamr Core cannot operate on disks that are more than 80% full. Starting with release v2022.001.0, Tamr Core validation scripts verify that at least 20% of disk space is available.
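The validation scripts perform this check for you. As a quick manual check, a sketch like the following reports whether the free-space requirement is met; the `/data` mount point is an assumption, so substitute the disk that holds your Tamr Core installation.

```python
import shutil

# Disk that holds the Tamr Core installation; "/data" is an assumed example.
TAMR_DISK = "/data"

usage = shutil.disk_usage(TAMR_DISK)
percent_free = usage.free / usage.total * 100

# Tamr Core requires the disk to be less than 80% full (at least 20% free).
if percent_free < 20:
    print(f"FAIL: only {percent_free:.1f}% free on {TAMR_DISK} (20% required)")
else:
    print(f"OK: {percent_free:.1f}% free on {TAMR_DISK}")
```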
Sizing Guidelines
In general, Tamr Core performance scales linearly with computational resources. The specifications below use these "t-shirt" sizes:
- Large: 1M to 10M records
- Medium: 100k to 1M records
- Small: up to 100k records
For larger volumes, Tamr offers cloud-native deployments that can be scaled to meet your needs.
Specifications for On-Premises Deployments
Tamr Core can run entirely on a single server.
Deployment | Resource | Specification |
---|---|---|
Large | CPU | 32 cores |
 | Memory | 256 GB |
 | Disk | 5 TB SSD |
Medium | CPU | 16 cores |
 | Memory | 128 GB |
 | Disk | 2 TB SSD |
Small (minimum) | CPU | 8 cores |
 | Memory | 64 GB |
 | Disk | 1 TB SSD |
XFS is the recommended file system for all Tamr Core deployments.
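To confirm the file system type of the data disk before installing, you can read it from `/proc/mounts` on Linux. A minimal sketch; the `/data` mount point is an assumption.

```python
# Report the file system type of the Tamr data mount by reading /proc/mounts.
MOUNT_POINT = "/data"  # assumed mount point -- substitute your own

with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mount, fstype, *_ = line.split()
        if mount == MOUNT_POINT:
            status = "OK" if fstype == "xfs" else "XFS is recommended"
            print(f"{mount} on {device} is {fstype} ({status})")
            break
    else:
        print(f"{MOUNT_POINT} not found in /proc/mounts")
```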
Specifications for Cloud Deployments
AWS Sizing and Limits
Tamr recommends the following configurations for single-node deployments on AWS.
Deployment | Resource | Recommended Sizing | AWS EC2 Capacity/Storage* |
---|---|---|---|
Large | CPU | 32 cores | r6a.8xlarge |
 | Memory | 256 GB | |
 | Disk | 5 TB SSD | 5 TB EBS SSD |
Medium | CPU | 16 cores | r6a.4xlarge |
 | Memory | 128 GB | |
 | Disk | 2 TB SSD | 2 TB EBS SSD |
Small | CPU | 8 cores | r6a.2xlarge |
 | Memory | 64 GB | |
 | Disk | 1 TB SSD | 1 TB EBS SSD |
*Recommended AWS capacity or better. Do not use an AWS Graviton processor for the VM that runs Tamr Core.
For more information, see Amazon EC2 On-Demand Pricing.
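As an illustration only, a Large deployment could be provisioned with a boto3 call along the following lines. This is a sketch, not a supported installer step: the region, AMI ID, key pair, device name, and `gp3` volume type are all assumptions to confirm against your own environment.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Large single-node Tamr Core host: r6a.8xlarge (32 vCPUs, 256 GB memory),
# an x86 instance type (not Graviton), with a 5 TB EBS SSD volume.
response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder: your approved Linux AMI
    InstanceType="r6a.8xlarge",
    KeyName="your-key-pair",          # placeholder
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/xvda",    # device name depends on the AMI
        "Ebs": {"VolumeSize": 5120,   # 5 TB, specified in GiB
                "VolumeType": "gp3"}, # an EBS SSD volume type
    }],
)
print(response["Instances"][0]["InstanceId"])
```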
GCP Sizing and Limits
Check your Google Compute Engine (GCE) resource quota limits. For more information, see Resource quotas in the Google Cloud documentation.
Tamr recommends the following configurations for single-node deployments on GCP.
Deployment | Resource | Recommended Sizing | GCP Type/Storage Option* |
---|---|---|---|
Large | CPU | 32 cores | n2-highmem-32 |
 | Memory | 256 GB | |
 | Disk | 5 TB SSD | 5 TB pd-balanced |
Medium | CPU | 16 cores | n2-highmem-16 |
 | Memory | 128 GB | |
 | Disk | 2 TB SSD | 2 TB pd-balanced |
Small | CPU | 8 cores | n2-highmem-8 |
 | Memory | 64 GB | |
 | Disk | 1 TB SSD | 1 TB pd-balanced |
*Recommended GCP capacity or better
For more information, see GCP Machine Types and Persistent disks in the GCP documentation.
Azure Sizing and Limits
Tamr recommends the following configurations for single-node deployments on Azure.
Deployment | Resource | Recommended Sizing | Ev3 Series/Storage* |
---|---|---|---|
Large | CPU | 32 cores | Standard_E32_v3 |
 | Memory | 256 GB | |
 | Disk | 5 TB SSD | Premium/Standard SSD, 5 TB |
Medium | CPU | 16 cores | Standard_E16_v3 |
 | Memory | 128 GB | |
 | Disk | 2 TB SSD | Premium/Standard SSD, 2 TB |
Small | CPU | 8 cores | Standard_E8_v3 |
 | Memory | 64 GB | |
 | Disk | 1 TB SSD | Premium/Standard SSD, 1 TB |
*Recommended Azure capacity or better
For more information, see the Microsoft documentation about the Ev3 series and disk types.
PostgreSQL Deployment Requirements
For single-node Azure and GCP deployments, Tamr supports installing PostgreSQL only on the same server as Tamr Core.
For single-node AWS deployments, Tamr recommends installing PostgreSQL on the same server as Tamr Core. If required, you can install PostgreSQL on a separate AWS RDS PostgreSQL instance using the Tamr AWS RDS Terraform module. If deploying via RDS, you must follow the Terraform module instructions in Deploying Tamr on AWS and ensure that there is a route between the Tamr Core VM and the RDS network.
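If you use the RDS option, one quick way to confirm that there is a route from the Tamr Core VM to the RDS network is a TCP connection test against the PostgreSQL port. A minimal sketch, run from the Tamr Core VM; the endpoint name is a placeholder for your RDS instance's endpoint.

```python
import socket

RDS_HOST = "your-instance.abc123.us-east-1.rds.amazonaws.com"  # placeholder endpoint
RDS_PORT = 5432  # default PostgreSQL port

try:
    with socket.create_connection((RDS_HOST, RDS_PORT), timeout=5):
        print(f"OK: route to {RDS_HOST}:{RDS_PORT} exists")
except OSError as exc:
    print(f"FAIL: cannot reach {RDS_HOST}:{RDS_PORT}: {exc}")
```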
NGINX Deployment Requirements
NGINX is a reverse proxy server configured to allow clients to access Tamr Core securely over HTTPS, and is a critical component in the Tamr Core network security layer. For more information, see Requirements for NGINX version support, Installing NGINX, and Configuring HTTPS.
For non-production environments, configuring a firewall (see Firewall Requirements below), NGINX, and HTTPS is strongly recommended but not required.
Important: If you do not configure a firewall, NGINX, and HTTPS in a non-production deployment, all users on the network will have access to the data. Use a unique password for this deployment.
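Once NGINX and HTTPS are configured, you can verify from a client machine that Tamr Core answers over HTTPS and that plain HTTP is forwarded. A sketch with an assumed hostname; it also presumes your certificate is trusted by the client.

```python
import urllib.request

HOST = "tamr.example.com"  # placeholder: your Tamr Core hostname

# The NGINX reverse proxy should serve Tamr Core over HTTPS (port 443).
with urllib.request.urlopen(f"https://{HOST}/", timeout=10) as resp:
    print(f"HTTPS status: {resp.status}")

# If HTTP-to-HTTPS forwarding is configured, a request to port 80 should end
# on an https:// URL (urllib follows the redirect automatically).
with urllib.request.urlopen(f"http://{HOST}/", timeout=10) as resp:
    print(f"HTTP request resolved to: {resp.geturl()}")
```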
Firewall Requirements
For cloud deployments on AWS, Azure, or Google Cloud Platform, use the firewall provided by the cloud provider. This allows you to have control over and visibility into the firewall from the cloud console. See the following:
- AWS: Getting Started with AWS Network ACLs in the AWS documentation
- Azure: Deploy and Configure Azure Firewall Using the Azure Portal in the Azure documentation
- GCP: VPC Firewall Rules Overview in the GCP documentation
For on-premises VM deployments, use the firewall provided by the operating system.
Firewall configuration requirements:
- Implement least-privilege principles. Block all traffic to the Tamr host by default and allow only the specific traffic you need. This includes limiting each rule to just the protocols and ports it requires.
- If you are restricting traffic to Tamr based on IP addresses, try to minimize the number of rules. It is easier to track one rule that allows traffic from a range of 16 IPs than it is to track 16 separate rules. Specify IPv4 addresses in CIDR notation, such as `1.2.3.4/32`.
- Allow only internal access to the Tamr Core default port `9100` (via TCP).
- Open port `443` for HTTPS. Note: If you plan to forward HTTP traffic to HTTPS, also open port `80`. (A connectivity check for these rules appears after this list.)
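One way to verify these rules from a machine outside the network is a simple TCP probe of the ports that should and should not be reachable. The sketch below makes assumptions: `tamr.example.com` is a placeholder hostname, and the expectations mirror the rules above (443 and 80 open, 9100 internal-only).

```python
import socket

HOST = "tamr.example.com"  # placeholder: probe your Tamr Core host from outside the network

# Per the rules above: 443 (and port 80, if you forward HTTP to HTTPS) should
# be reachable externally; the Tamr Core default port 9100 should not be.
EXPECTED = {443: True, 80: True, 9100: False}

for port, should_be_open in EXPECTED.items():
    try:
        with socket.create_connection((HOST, port), timeout=3):
            is_open = True
    except OSError:
        is_open = False
    verdict = "OK" if is_open == should_be_open else "CHECK FIREWALL RULES"
    print(f"port {port}: {'reachable' if is_open else 'unreachable'} ({verdict})")
```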
Included Services and Ports
Securing Tamr Core Ports
When configuring the firewall, open only port `443` and any ports required by your system administrators to outside traffic. For example, port `22` is often used for SSH access.
The host firewall can be configured through your cloud provider console or through the host's operating system.
Note: If you plan to forward HTTP traffic to HTTPS, also open port `80`.
Tamr Core Microservices
A single-node deployment provides access to these microservices at the following default ports. In addition, they are available via a proxy at the default Tamr Core port (TCP `9100`).
Each service also exposes an optional administrative port at its default port + 1. For example, the administrative port for `9020` is `9021`. Administrative ports include endpoints for operational information and ensure that a heavy user request load cannot prevent administrative requests from getting through. (A quick connectivity check for these ports appears after the table below.)
Service | Default Port | Description |
---|---|---|
Auth | 9020 | User authentication |
Dataset | 9150 | Dataset management |
Core Connect | 9050 | Dataset movement between Tamr Core and cloud storage destinations |
Dedup | 9140 | Deduplication service for mastering projects |
Match | 9170 | Low Latency Match services |
Persistence | 9080 | Database persistence |
Preview | 9040 | Spark runner for preview of mappings and data transformations |
Public API | 9180 | Public APIs for working with Tamr Core |
Recipe | 9190 | Orchestration service for tracking tasks and their dependencies |
Taxonomy | 9400 | Taxonomy service for classification projects |
Transform | 9160 | Transformation service for schema mapping and running transformations |
Unify | 9100 | Tamr front-end application |
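The port-plus-one convention can be spot-checked from the Tamr host itself: each default port in the table above should accept a TCP connection, and so should the administrative port one above it. A minimal sketch, assuming it runs on the Tamr Core server with all services started.

```python
import socket

# Default service ports from the table above; each administrative port is +1.
SERVICE_PORTS = {
    "Auth": 9020, "Dataset": 9150, "Core Connect": 9050, "Dedup": 9140,
    "Match": 9170, "Persistence": 9080, "Preview": 9040, "Public API": 9180,
    "Recipe": 9190, "Taxonomy": 9400, "Transform": 9160, "Unify": 9100,
}

def is_listening(port: int) -> bool:
    """Return True if something accepts a TCP connection on localhost:port."""
    try:
        with socket.create_connection(("localhost", port), timeout=2):
            return True
    except OSError:
        return False

for name, port in SERVICE_PORTS.items():
    print(f"{name}: {port} {'up' if is_listening(port) else 'down'}, "
          f"admin {port + 1} {'up' if is_listening(port + 1) else 'down'}")
```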
External Services
Service | Ports | Description |
---|---|---|
Elasticsearch | 9200, 9300 | Elasticsearch |
Elasticsearch Front End | 9130 | Elasticsearch for Tamr Core front-end application |
elasticsearch_exporter | 9135 | Instruments Elasticsearch for Prometheus metrics gathering |
elasticsearch_logging | 9250, 9350 | Elasticsearch for logging |
Grafana | 31101 | Monitoring dashboard |
graphite_exporter | 31108, 31109 | Spark metrics for Prometheus |
Kibana | 5601 | Logging dashboard |
node_exporter | 9110 | System metrics for Prometheus |
postgres_exporter | 31187 | PostgreSQL metrics for Prometheus |
PostgreSQL | 5432 | Internal database for Tamr Core application metadata |
Prometheus | 31390 | Monitoring and alerting framework |
ZooKeeper | 21281 | Tamr Core HBase ZooKeeper client |
HBase | 16010 | HBase Master |
HBase | 9113 | HBase Master Exporter |
HBase | 60010 | HBase Master JMX |
HBase | 9114 | HBase Region Server Exporter |
HBase | 60030 | HBase Region Server |
HBase | 2181 (set by `TAMR_HBASE_ZK_CLIENT_PORT`) | HBase ZooKeeper (as distinct from Tamr HBase ZooKeeper) |
YARN | 8088 (HTTP), 8090 (HTTPS) | YARN Resource Manager dashboard |
YARN | 8031, 8032, 8033, 8042 | YARN Resource Manager, and its admin and tracker ports |
YARN | 8030 | YARN Resource Manager scheduler |