Deploying Single-Node Tamr Core on AWS (Commercial Marketplace)

Deployment steps for launching and accessing Tamr Core on a single instance using the Amazon Web Services (AWS) commercial marketplace, with basic network and security requirements, status checks, costs, backups, and support.

Note: The Tamr AWS Marketplace Image is available from multiple AWS Marketplaces. These instructions are intended for use with the commercial marketplace.

Requirements for Users Deploying on AWS

Users deploying Tamr Core on AWS must have PostgreSQL experience and basic Linux experience. Users also must have permission to run bash scripts and commands on the AWS Tamr Core EC2 instance.

What Is Included in the Tamr Marketplace Image on AWS?

The Tamr Core software offered on Amazon Web Services (AWS) Marketplace is a preconfigured and fully integrated package. It allows you to run expert-assisted machine learning jobs on vast amounts of data.

The Tamr AWS Marketplace Image (AMI) has the following characteristics:

  • It is a single virtual machine (VM) image for the Linux Platform (Ubuntu 22.04).
  • The image includes:
    • A version of the Tamr Core software, optimized for use with Amazon Web Services (AWS).
    • Software dependencies that Tamr Core requires, including HBase, Spark on Yarn, Elasticsearch, a PostgreSQL instance, and ZooKeeper. For a diagram that shows these components, see Deployments. The list of licenses for Open Source software used in the Tamr package is included in the /home/ubuntu/tamr/licenses/unify-licenses/licenses directory. After you deploy the instance, you can access this directory by connecting to the instance via SSH.
  • If you deploy the Tamr BYOL (Bring Your Own License) image from AWS Marketplace, you must have a valid license to use it. You are responsible for purchasing and managing your own license from Tamr. For information, see Accessing the Tamr Core Instance below. Alternatively, images that are purchased using the Hourly/Annual model include the license as part of the purchase.

Before You Begin

Before you deploy Tamr Core, complete or verify the following:

  • You have an AWS account. To create an AWS account, see How do I create and activate a new AWS account?.
  • You have sufficient Amazon EC2 service quota limits, and your quota limits allow you to create instances with characteristics that Tamr Core requires. For more information, see AWS Sizing and Limits in this guide.
  • You have installed the AWS Command Line Interface (AWS CLI). You can use it to manage your instance after it is deployed. See AWS Command Line Interface in the AWS documentation.
  • You have generated SSH keys. You will need the keys to connect to your instance with SSH. See Amazon EC2 key pairs and Linux instances in the AWS documentation.
  • You have created a security group for your instance. See Working with security groups. The firewall rules in the security group allow you to specify the type of access--internal access via a VPC network, or secure public access over HTTPS--and to specify ports for each type of connection that must be kept open. You will use these ports to access the Tamr Core user interface and run commands to check the health of the Tamr Core instance.
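
The local prerequisites above can be checked with a short preflight script. This is a minimal sketch, not an official Tamr checklist: the tool list (aws, ssh, ssh-keygen) is an assumption drawn from the items above, and missing_tools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Preflight sketch: report which required local tools are missing.
# The tool list below is an assumption based on the prerequisites above.

# missing_tools <tool>... -> prints each tool that is not on the PATH
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools aws ssh ssh-keygen)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  echo "All required tools found."
fi
```

Once the AWS CLI is installed, running aws sts get-caller-identity is a quick way to confirm that it is configured with valid credentials.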

Working with AWS Security Groups

See Creating a Security Group and Adding Firewall Rules for instructions. You add firewall rules to the security group to control traffic as follows:

  • Allow only internal access to the Tamr Core default port 9100, protocol TCP.
  • Allow traffic on port 443 for HTTPS connections, and on port 80 for HTTP connections if you plan to redirect HTTP traffic to HTTPS. Tamr strongly advises that the source be set to a restrictive IP range that you specify using the CIDR notation. Avoid setting the source to allow traffic from Anywhere (0.0.0.0/0 for IPv4 and ::/0 for IPv6).

Accessing Tamr Core via HTTPS

If you are not using a VPC, secure external access to Tamr Core over HTTPS by using NGINX as a reverse proxy. For more information, see Installing NGINX and Configuring HTTPS. If required, you can set up SSL via an application load balancer. See Tamr Core AWS Network Reference Architecture.

For non-production environments, configuring a firewall, NGINX, and HTTPS is strongly recommended but not required.

Important: If you do not configure a firewall, NGINX, and HTTPS in a non-production deployment, all users on the network will have access to the data. Use a unique password for this deployment.

Creating a Security Group and Adding Firewall Rules

To create a security group for Tamr Core and configure its firewall rules:

  1. To create a security group, see Create a security group in the AWS documentation.
  2. To add firewall rules to the security group, see Add rules to a security group in the AWS documentation. Create the following ingress rules in the security group:
    • Tamr Core default port 9100:
      • Type: Custom TCP
      • Port range: 9100
      • Source: Custom
        Note: Tamr recommends only allowing ingress traffic from a private VPC. Tamr does not recommend allowing access via the public Internet on port 9100.
      • Description: default-allow-9100
    • Tamr Web Application (HTTPS):
      • Type: HTTPS
      • Port range: 443
      • Source: Custom
        Note: Tamr recommends allowing ingress traffic only from a private VPC.
      • Description: tamr-web-https
    • Tamr Web Application (HTTP):
      • Type: HTTP
      • Port range: 80
      • Source: Custom
        Note: Tamr recommends only allowing ingress traffic from a private VPC. Tamr does not recommend allowing access via the public Internet on port 80.
      • Description: tamr-web-http
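
If you prefer the AWS CLI to the console, the three ingress rules above could be added roughly as follows. This is a sketch under stated assumptions: the security group ID and the CIDR range are placeholders, the per-rule descriptions are omitted for brevity, and the run and add_ingress helpers are hypothetical names. The script defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: add the three ingress rules with the AWS CLI.
# DRY_RUN defaults to 1, so commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# add_ingress <group-id> <port> <cidr>
add_ingress() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port "$2" --cidr "$3"
}

SG_ID="${SG_ID:-sg-0123456789abcdef0}"  # placeholder security group ID (assumption)
VPC_CIDR="${VPC_CIDR:-10.0.0.0/16}"     # placeholder private VPC range (assumption)

for port in 9100 443 80; do
  add_ingress "$SG_ID" "$port" "$VPC_CIDR"
done
```

Set DRY_RUN=0 to execute the commands once the printed output looks right for your account.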

Deploying a Tamr Core Instance from the AWS Console

To deploy a Tamr Core instance:

  1. Sign in to your AWS account.
  2. Navigate to AWS Marketplace Subscriptions.
  3. Select Discover products and then search for Tamr using the keyword Tamr.
  4. Select Customer-hosted Tamr Mastering BYOL and then select Continue to subscribe.
  5. Review the terms and select Accept Terms. It can take some time for AWS to add the subscription to the account.
  6. When the subscription is active, select Continue to configuration.
  7. Supply the following values to configure Tamr Core:
    • Fulfillment option: 64-bit (x86) Amazon Machine Image (AMI)
    • Software version: Select the version of Tamr Core.
    • Region: Select the AWS region to deploy Tamr Core.
  8. Select Continue to Launch, then make the following selections on the Launch this software page:
    • Choose Action: Select Launch from Website or Launch through EC2.
    • EC2 Instance Type: See AWS Sizing and Limits
    • VPC Settings: Select the VPC to launch the instance.
    • Key Pair Settings: Select the SSH key created earlier.
  9. Select Launch. Tamr Core begins deploying. This process can take several minutes.
  10. Select the instance to retrieve its public IPv4 address. This is the host address of your Tamr Core instance. Substitute it for <hostname> wherever these instructions refer to http://<hostname>:9100.
  11. If you selected the BYOL option, obtain a license key, username, and password by contacting Tamr Support at [email protected]. You must provide the license key and these credentials when you access the Tamr Core instance from a browser.
  12. Change the default database password. See Configuring PostgreSQL for instructions.
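
The address lookup in step 10 can also be done from the command line. The following is a minimal sketch: INSTANCE_ID is a placeholder that you supply, and tamr_url and public_ip are hypothetical helper names. The lookup runs only when INSTANCE_ID is set.

```shell
#!/usr/bin/env bash
# Sketch: look up the instance's public IPv4 address and build the Tamr Core URL.

# tamr_url <hostname> -> prints the Tamr Core URL on the default port 9100
tamr_url() {
  printf 'http://%s:9100\n' "$1"
}

# public_ip <instance-id> -> prints the instance's public IPv4 address
public_ip() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text
}

if [ -n "${INSTANCE_ID:-}" ]; then
  tamr_url "$(public_ip "$INSTANCE_ID")"
else
  echo "Set INSTANCE_ID (for example, i-0123456789abcdef0) to look up the address."
fi
```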

Accessing the Tamr Core Instance

This procedure assumes that users in your team are already connected via your own VPC network.

  • To access the Tamr Core instance for the first time after it has been deployed, you need to have a license key and an initial set of credentials (username and password). If you selected the Hourly/Annual option, the license is already included.
  • To access the Tamr Core instance on a regular basis (after you have provided the license key) inside a VPC, use: http://<hostname>:9100.

To access the Tamr instance for the first time and provide the license key (BYOL):

  1. In your browser, sign in to your AWS account and go to the EC2 service.
  2. Select the Tamr Core EC2 instance and then from the Actions dropdown select Connect. Connect to the instance via SSH using one of the options displayed.
  3. When you connect, use the command line to provide the Tamr Core license key: ${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_LICENSE_KEY="<license-key-value>". See Setting the License Key.
  4. Restart Tamr Core:
    cd ${TAMR_UNIFY_HOME}/tamr
    ./stop-unify.sh
    ./stop-dependencies.sh
    ./start-dependencies.sh
    ./start-unify.sh
    
    See Restarting Tamr Core.
  5. Go to the following URL to access the Tamr Core instance: http://<hostname>:9100.
  6. Enter the credentials you received from Tamr. Change the default password immediately.

Now you are able to use Tamr Core. To verify your installation, see Tamr Installation Verification Steps.
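
Steps 3 and 4 of the procedure above can be combined into one script that you run on the EC2 instance. This is a sketch, assuming TAMR_UNIFY_HOME is set in your shell; TAMR_LICENSE_KEY_VALUE, restart_tamr, and set_license_and_restart are hypothetical names introduced for illustration. Nothing runs unless TAMR_LICENSE_KEY_VALUE is set.

```shell
#!/usr/bin/env bash
# Sketch: set the license key, then restart Tamr Core in the documented order.

# restart_tamr: stop Tamr Core, stop and start its dependencies, start Tamr Core.
# Stops at the first script that fails.
restart_tamr() {
  (
    cd "${TAMR_UNIFY_HOME}/tamr" || exit 1
    ./stop-unify.sh &&
      ./stop-dependencies.sh &&
      ./start-dependencies.sh &&
      ./start-unify.sh
  )
}

# set_license_and_restart <license-key>
set_license_and_restart() {
  "${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh" config:set "TAMR_LICENSE_KEY=$1" &&
    restart_tamr
}

if [ -n "${TAMR_LICENSE_KEY_VALUE:-}" ]; then
  set_license_and_restart "$TAMR_LICENSE_KEY_VALUE"
else
  echo "Set TAMR_LICENSE_KEY_VALUE to apply a license key and restart."
fi
```

Chaining the scripts with && means a failed stop or start halts the sequence, so you can inspect the failure before continuing.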

Checking Tamr Core Health and License Status

You can use the Tamr Core health check API to check Tamr Core health status. The health API endpoint returns health checks for the service and for ZooKeeper, which Tamr Core uses for configuration management.

To check Tamr Core health status:

  1. Open the health API endpoint at: http://<hostname>:9100/docs#!/.
  2. Navigate to service/health.
  3. Select Try it out, or use the following cURL command:
    curl -X GET --header 'Accept: application/json' 'http://<hostname>:9100/api/service/health'

If a health check failure occurs, restart the Tamr Core instance to recover from the failure. To help troubleshoot the instance, access the Tamr Core Help Center.
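
After a restart, it can be convenient to poll the health endpoint until the service responds. The sketch below treats an HTTP 200 response as healthy, which is an assumption; inspect the JSON body if you need the individual service and ZooKeeper results. TAMR_HOST, health_status, and wait_for_health are hypothetical names, and polling runs only when TAMR_HOST is set.

```shell
#!/usr/bin/env bash
# Sketch: poll the Tamr Core health endpoint until it answers with HTTP 200.

# health_status <hostname> -> prints the HTTP status code (000 if unreachable)
health_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    --header 'Accept: application/json' \
    "http://$1:9100/api/service/health"
}

# wait_for_health <hostname> [attempts] [delay-seconds]
wait_for_health() {
  local host="$1" attempts="${2:-30}" delay="${3:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    if [ "$(health_status "$host")" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Wait between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

if [ -n "${TAMR_HOST:-}" ]; then
  wait_for_health "$TAMR_HOST"
fi
```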

If you receive a "Tamr license is not valid" error on login, see the Tamr Core Help Center and To update a license key set by a configuration file.

Starting and Stopping Tamr Core

Start and stop Tamr Core using these scripts located in the ${TAMR_UNIFY_HOME}/tamr directory:

cd ${TAMR_UNIFY_HOME}/tamr
# To start: dependencies first, then Tamr Core
./start-dependencies.sh
./start-unify.sh
# To stop: Tamr Core first, then dependencies
./stop-unify.sh
./stop-dependencies.sh

For information, see Restarting Tamr.

Security

To ensure secure access, users in your team must have access to your team’s VPC in which the Tamr Core EC2 instance was launched.

Tamr recommends that you use storage volume encryption to protect your data. See Amazon EBS encryption in the AWS documentation.

Note: Encryption at rest: Because a standard Tamr Core deployment on cloud infrastructure uses dedicated service instances, Tamr Core depends on service-level encryption for encryption at rest. This ensures that Tamr Core’s data is encrypted using keys that are distinct from those used by other applications, while also allowing keys to be managed using the cloud provider’s standard key management service.

Costs

The cost of running the Tamr Core instance is a combination of:

  • Tamr Core cost. Tamr Core cost is per license, with additional cost for optional services and support. To obtain a license key, contact Tamr Support at [email protected].
  • AWS infrastructure costs for the virtual machine on which you are running Tamr Core. See Amazon EC2 pricing.
  • AWS storage costs. Optionally, you can choose to store Tamr Core backups in AWS S3 storage. See Amazon S3 pricing.

Scaling Up

To scale up your Tamr Core deployment on AWS, move your EC2 instance to a larger instance size. If you need additional storage, attach an external storage drive to the EC2 instance. To scale out your deployment, contact your Tamr account representative or Tamr Support at [email protected].
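
Changing the instance type of an EBS-backed EC2 instance requires stopping the instance first. The following sketch shows one way to do that with the AWS CLI. The instance ID and target instance type are placeholders (assumptions), run and resize_instance are hypothetical helper names, and the script defaults to a dry run that only prints the commands. Stop Tamr Core and its dependencies on the instance before you stop the instance itself, and note that the public IPv4 address may change after a stop and start.

```shell
#!/usr/bin/env bash
# Sketch: stop the instance, change its type, and start it again.
# DRY_RUN defaults to 1, so commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# resize_instance <instance-id> <new-instance-type>
resize_instance() {
  run aws ec2 stop-instances --instance-ids "$1" &&
    run aws ec2 wait instance-stopped --instance-ids "$1" &&
    run aws ec2 modify-instance-attribute --instance-id "$1" \
      --instance-type "{\"Value\": \"$2\"}" &&
    run aws ec2 start-instances --instance-ids "$1"
}

# Placeholder values (assumptions); choose a type per AWS Sizing and Limits.
resize_instance "${INSTANCE_ID:-i-0123456789abcdef0}" "${NEW_TYPE:-m5.4xlarge}"
```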

Backups and Disaster Recovery

Take regular backups of Tamr Core and keep the backups in AWS object storage (Amazon S3) outside of your AWS instance. To create backups, use the Tamr Core backup API. For information, see Backup.
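
Once a backup exists on the instance, copying it to Amazon S3 can be scripted. This sketch does not create the backup itself (that is done through the Tamr Core backup API); it only uploads a local directory. The backup directory, bucket name, and tamr-backups prefix are placeholders (assumptions), and run, s3_destination, and upload_backup are hypothetical helper names. The script defaults to a dry run.

```shell
#!/usr/bin/env bash
# Sketch: copy a local backup directory to a dated prefix in Amazon S3.
# DRY_RUN defaults to 1, so the command is printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# s3_destination <bucket> <date> -> prints s3://<bucket>/tamr-backups/<date>/
s3_destination() {
  printf 's3://%s/tamr-backups/%s/\n' "$1" "$2"
}

# upload_backup <local-dir> <bucket> [date]
upload_backup() {
  local day="${3:-$(date +%Y-%m-%d)}"
  run aws s3 sync "$1" "$(s3_destination "$2" "$day")"
}

# Placeholder directory and bucket (assumptions); substitute your own.
upload_backup "${BACKUP_DIR:-/path/to/tamr/backups}" "${BUCKET:-example-tamr-backups}"
```

Writing each day's backup under its own prefix keeps earlier copies intact, which helps when you need to restore from a specific date.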

Upgrades

Tamr releases new software versions frequently. While Tamr strives to maintain the most recent version available on AWS Marketplace, your instance version may not be the latest and is not automatically upgraded. To upgrade to the most recent version, or to create a custom deployment in AWS, contact Tamr Support at [email protected].

Configuring Core Connect

To move data files from cloud storage into Tamr Core, and export datasets from Tamr Core to cloud storage, you use the Core Connect service. See Configuring Core Connect.

Monitoring

AWS CloudWatch provides monitoring for AWS cloud resources. You can create alarms that notify you when a metric crosses a threshold that you define. See the CloudWatch alarms documentation for details.
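
As an illustration, a CPU alarm for the Tamr Core EC2 instance could be created with the AWS CLI as sketched below. The instance ID, the 80 percent threshold, the alarm name, and the SNS topic ARN are all placeholders (assumptions), not Tamr recommendations; run and cpu_alarm are hypothetical helper names. The script defaults to a dry run.

```shell
#!/usr/bin/env bash
# Sketch: create a CloudWatch alarm on average CPU utilization for one instance.
# DRY_RUN defaults to 1, so the command is printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# cpu_alarm <instance-id> <threshold-percent> <sns-topic-arn>
# Fires when average CPU stays above the threshold for two 5-minute periods.
cpu_alarm() {
  run aws cloudwatch put-metric-alarm \
    --alarm-name "tamr-core-cpu-high-$1" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold "$2" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$3"
}

# Placeholder values (assumptions); substitute your own.
cpu_alarm "${INSTANCE_ID:-i-0123456789abcdef0}" 80 \
  "${SNS_TOPIC_ARN:-arn:aws:sns:us-east-1:123456789012:example-topic}"
```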

Logging

See Logging in Single-Node Deployments.

Support

For technical support, contact Tamr Support at [email protected].