Deploying Single-Node Tamr Core on AWS (ICMP)
This topic describes how to launch and access Tamr Core on a single instance using the Amazon Web Services (AWS) Intelligence Community Marketplace (ICMP). It covers basic network and security requirements, status checks, costs, backups, and support.
Note: The Tamr AWS Marketplace Image is available from multiple AWS Marketplaces. These instructions are intended for use with the ICMP.
Requirements for Users Deploying on AWS
Users deploying Tamr Core on AWS must have PostgreSQL experience and basic Linux experience. Users also must have permission to run bash scripts and commands on the AWS Tamr Core EC2 instance.
Users must also be familiar with AWS procedures and have access to guidance published by Amazon, including:
- AWS re:Post Knowledge Center
- Amazon EC2 User Guide for Linux Instances
- Amazon CloudWatch User Guide
What Is Included in the Tamr Marketplace Image on AWS?
The Tamr Core software offered on Amazon Web Services (AWS) Marketplace is a preconfigured and fully integrated package. It allows you to run expert-assisted machine learning jobs on vast amounts of data.
The Tamr AWS Marketplace Image (AMI) has the following characteristics:
- It is a single virtual machine (VM) image for the Linux Platform (Ubuntu 22.04).
- The image includes:
  - A version of the Tamr Core software, optimized for use with Amazon Web Services (AWS).
  - Software dependencies that Tamr Core requires, including HBase, Spark on Yarn, Elasticsearch, a PostgreSQL instance, and ZooKeeper. For a diagram that shows these components, see Deployments in this guide. The list of licenses for Open Source software used in the Tamr package is included in the home/ubuntu/tamr/licenses/unify-licenses/licenses directory. After you deploy the instance, you can access this directory by connecting to the instance via SSH.
- If you deploy the Tamr BYOL (Bring Your Own License) image from AWS Marketplace, you must have a valid license to use it. You are responsible for purchasing and managing your own license from Tamr. For more information, see Accessing the Tamr Core Instance below. Alternatively, images that are purchased using the Hourly/Annual model include the license as part of the purchase.
Before You Begin
Before you deploy Tamr Core, complete or verify the following:
- You have an AWS account. To create an AWS account, see How do I create and activate a new AWS account? in the AWS re:Post Knowledge Center.
- You have sufficient Amazon EC2 service quota limits as described in the Amazon EC2 User Guide for Linux Instances, and your quota limits allow you to create instances with characteristics that Tamr Core requires. For more information, see AWS Sizing and Limits in this guide.
- You have installed the AWS Command Line Interface (AWS CLI). You can use it to manage your instance after it is deployed.
- You have generated SSH keys. You will need the keys to connect to your instance with SSH. See Amazon EC2 key pairs and Linux instances in the Amazon EC2 User Guide for Linux Instances.
- You have created a security group for your instance. See Working with security groups in this guide. The firewall rules in the security group allow you to specify the type of access--internal access via a VPC network, or secure public access over HTTPS--and to specify ports for each type of connection that must be kept open. You will use these ports to access the Tamr Core user interface and run commands to check the health of the Tamr Core instance.
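The AWS CLI and SSH key prerequisites above can be checked from a terminal. The following is a minimal sketch, not a Tamr-provided script: the function names and the key name in the usage comment are hypothetical, and it assumes your AWS CLI is already configured with credentials.

```shell
# Return success only if every named command is on the PATH.
require_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "missing: $cmd" >&2; return 1; }
  done
}

# Create an EC2 key pair and save the private key with owner-only permissions.
# Argument: a key pair name of your choice (hypothetical placeholder).
create_key_pair() (
  key_name="$1"
  require_cmds aws || exit 1
  umask 077
  aws ec2 create-key-pair --key-name "$key_name" \
    --query 'KeyMaterial' --output text > "${key_name}.pem"
)

# Example (not run here):
#   require_cmds aws ssh && create_key_pair tamr-core-key
```

The private key is written to the current directory. Select the same key pair name later under Key Pair Settings when you launch the instance.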
Working with AWS Security Groups
You add firewall rules to the security group to control traffic as follows:
- Allow only internal access to the Tamr Core default port 9100, protocol TCP.
- Allow traffic on port 443 for HTTPS connections, and on port 80 for HTTP connections if you plan to redirect HTTP traffic to HTTPS. Tamr strongly advises that the source be set to a restrictive IP range that you specify using Classless Inter-Domain Routing (CIDR) notation. Avoid setting the source to allow traffic from Anywhere (0.0.0.0/0 for IPv4 and ::/0 for IPv6).
Creating a Security Group and Adding Firewall Rules
To create a security group for Tamr Core and configure its firewall rules:
- To create a security group, see Create a security group in the Amazon EC2 User Guide for Linux Instances.
- To add firewall rules to the security group, see Add rules to a security group in the Amazon EC2 User Guide for Linux Instances. Create the following ingress rules in the security group:
  - Tamr Core default port 9100:
    - Type: Custom TCP
    - Port range: 9100
    - Source: Custom
      Note: Tamr recommends allowing ingress traffic only from a private VPC. Tamr does not recommend allowing access via the public Internet on port 9100.
    - Description: default-allow-9100
  - Tamr Web Application (HTTPS):
    - Type: HTTPS
    - Port range: 443
    - Source: Custom
      Note: Tamr recommends allowing ingress traffic only from a private VPC.
    - Description: tamr-web-https
  - Tamr Web Application (HTTP):
    - Type: HTTP
    - Port range: 80
    - Source: Custom
      Note: Tamr recommends allowing ingress traffic only from a private VPC. Tamr does not recommend allowing access via the public Internet on port 80.
    - Description: tamr-web-http
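If you prefer the AWS CLI to the console, the same ingress rules can be added with `aws ec2 authorize-security-group-ingress`. This is a sketch under stated assumptions: the function names, the security group ID, and the 10.0.0.0/16 range are hypothetical placeholders, and the helper handles IPv4 sources only.

```shell
# Reject sources that would open a port to the whole Internet.
is_restricted_cidr() {
  case "$1" in
    0.0.0.0/0|::/0) return 1 ;;
    *) return 0 ;;
  esac
}

# Add one TCP ingress rule (IPv4 source) to an existing security group.
# Arguments: group ID, port, source CIDR, description (no spaces).
add_ingress_rule() (
  group_id="$1"; port="$2"; cidr="$3"; desc="$4"
  is_restricted_cidr "$cidr" || { echo "refusing open source: $cidr" >&2; exit 1; }
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --ip-permissions "IpProtocol=tcp,FromPort=$port,ToPort=$port,IpRanges=[{CidrIp=$cidr,Description=$desc}]"
)

# Example (not run here); replace the group ID and CIDR with your own values:
#   add_ingress_rule sg-0123456789abcdef0 9100 10.0.0.0/16 default-allow-9100
#   add_ingress_rule sg-0123456789abcdef0 443  10.0.0.0/16 tamr-web-https
#   add_ingress_rule sg-0123456789abcdef0 80   10.0.0.0/16 tamr-web-http
```

The guard refuses 0.0.0.0/0 and ::/0 before the AWS CLI is called, which matches the recommendation to restrict the source to a private range.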
Deploying a Tamr Core Instance from the AWS Console
To deploy a Tamr Core instance:
- Sign in to your AWS account.
- Navigate to AWS Marketplace Subscriptions.
- Select Discover products and then search for Tamr using the keyword Tamr.
- Select Customer-hosted Tamr Mastering BYOL and then select Continue to subscribe.
- Review the terms and select Accept Terms. It can take some time for AWS to add the subscription to the account.
- When the subscription is active, select Continue to configuration.
- Supply the following values to configure Tamr Core:
  - Fulfillment option: 64-bit (x86) Amazon Machine Image (AMI)
  - Software version: Select the version of Tamr Core.
  - Region: Select the AWS region in which to deploy Tamr Core.
- Select Continue to Launch, then make the following selections on the Launch this software page:
  - Choose Action: Select Launch from Website or Launch through EC2.
  - EC2 Instance Type: See AWS Sizing and Limits in this guide.
  - VPC Settings: Select the VPC in which to launch the instance.
  - Key Pair Settings: Select the SSH key created earlier.
- Select Launch. Tamr Core begins deploying. This process can take several minutes.
- Select the instance to retrieve the public IPv4 address for accessing Tamr Core. This is the host address of your Tamr instance. Use it in place of <hostname> in http://<hostname>:9100.
- If you selected the BYOL option, obtain a license key, username, and password by contacting Tamr Support by email at [email protected]. You must provide the license key and these credentials when you access the Tamr Core instance from a browser.
- Change the default database password. See Configuring PostgreSQL for instructions.
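After the launch, the instance address can also be retrieved with the AWS CLI instead of the console. The following is a sketch only: the function names and the instance ID in the usage comment are hypothetical.

```shell
# Block until the instance passes its EC2 status checks.
wait_for_instance() {
  aws ec2 wait instance-status-ok --instance-ids "$1"
}

# Print the public IPv4 address of an instance, given its ID.
instance_public_ip() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}

# Build the Tamr Core URL for a host on the default port 9100.
tamr_url() {
  printf 'http://%s:9100\n' "$1"
}

# Example (not run here):
#   wait_for_instance i-0123456789abcdef0
#   tamr_url "$(instance_public_ip i-0123456789abcdef0)"
```

If the instance has no public address, for example in a private subnet, the query prints None. In that case use the private address reachable from your VPC.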
Accessing the Tamr Core Instance
This procedure assumes that users in your team are already connected via your own VPC network.
- To access the Tamr Core instance for the first time after it has been deployed, you need to have a license key and an initial set of credentials (username and password). If you selected the Hourly/Annual option, the license is already included.
- To access the Tamr Core instance on a regular basis (after you have provided the license key) inside a VPC, use: http://<hostname>:9100.
To access the Tamr instance for the first time and provide the license key (BYOL):
- In your browser, sign in to your AWS account and go to the EC2 service.
- Select the Tamr Core EC2 instance and then from the Actions dropdown select Connect. Connect to the instance over SSH using one of the options displayed.
- When you connect, use the command line to provide the Tamr Core license key:
  ${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_LICENSE_KEY="<license-key-value>"
  See Setting the License Key in this guide.
- Restart Tamr Core:
  cd ${TAMR_UNIFY_HOME}/tamr
  ./stop-unify.sh
  ./stop-dependencies.sh
  ./start-dependencies.sh
  ./start-unify.sh
  See Restarting Tamr Core in this guide.
- Go to the following URL to access the Tamr Core instance: http://<hostname>:9100.
- Enter the credentials you received from Tamr. Change the default password immediately.
Now you are able to use Tamr Core. To verify your installation, see Tamr Installation Verification Steps in this guide.
Checking Tamr Core Health and License Status
You can use the Tamr Core health check API to check Tamr Core health status. The health API endpoint returns health checks for the service and for ZooKeeper, which Tamr Core uses for configuration management.
To check Tamr Core health status:
- Open the health API endpoint at: http://<hostname>:9100/docs#!/.
- Navigate to service/health.
- Select Try it out, or use the following cURL command:
  curl -X GET --header 'Accept: application/json' 'http://<hostname>:9100/api/service/health'
If a health check failure occurs, restart the Tamr Core instance to recover from the failure.
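The cURL check can be wrapped in a small function for use in a scheduled job. This is a minimal sketch: the function names are hypothetical, and it only confirms that the endpoint answers with a successful HTTP status. The response schema is not described here, so inspect the JSON body yourself to confirm what each health check reports.

```shell
# Build the health endpoint URL for a host on the default port 9100.
health_url() {
  printf 'http://%s:9100/api/service/health\n' "$1"
}

# Succeed if the health endpoint returns a successful HTTP status.
check_health() (
  url=$(health_url "$1")
  # -f fails on HTTP errors; --max-time bounds the wait to 10 seconds.
  if curl -fsS --max-time 10 --header 'Accept: application/json' "$url" >/dev/null; then
    echo "reachable: $url"
  else
    echo "health check failed: $url" >&2
    exit 1
  fi
)

# Example (not run here):
#   check_health tamr.example.internal || echo "consider restarting Tamr Core"
```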
Starting and Stopping Tamr Core
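Start and stop Tamr Core using the scripts located in the ${TAMR_UNIFY_HOME}/tamr directory. Start the dependencies before Tamr Core, and stop Tamr Core before the dependencies.
To start Tamr Core:
cd ${TAMR_UNIFY_HOME}/tamr
./start-dependencies.sh
./start-unify.sh
To stop Tamr Core:
cd ${TAMR_UNIFY_HOME}/tamr
./stop-unify.sh
./stop-dependencies.sh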
For more information, see Restarting Tamr in this guide.
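A restart runs the four scripts in the order this guide uses: stop Tamr Core, stop the dependencies, start the dependencies, start Tamr Core. The wrapper below is a sketch, not a script shipped with Tamr Core. The function name is hypothetical, and it assumes TAMR_UNIFY_HOME is set in your shell on the instance.

```shell
# Restart Tamr Core, stopping at the first script that fails.
restart_tamr() (
  # Abort with a message if TAMR_UNIFY_HOME is unset or empty.
  cd "${TAMR_UNIFY_HOME:?TAMR_UNIFY_HOME is not set}/tamr" || exit 1
  ./stop-unify.sh &&
  ./stop-dependencies.sh &&
  ./start-dependencies.sh &&
  ./start-unify.sh
)

# Example (not run here):
#   restart_tamr && echo "Tamr Core restarted"
```

Chaining with && means a failed stop or start is not masked by later steps, which makes it easier to see where a restart went wrong.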
Security
To ensure secure access, users in your team must have access to your team’s VPC in which the Tamr Core EC2 instance was launched.
Tamr recommends that you use storage volume encryption to protect your data. See Amazon EBS encryption in the Amazon EC2 User Guide for Linux Instances.
Note: Encryption at rest: Because a standard Tamr Core deployment on cloud infrastructure uses dedicated service instances, Tamr Core depends on service-level encryption for encryption at rest. This ensures that Tamr Core’s data is encrypted using keys that are distinct from those used by other applications, while also allowing keys to be managed using the cloud provider’s standard key management service.
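To confirm that the volumes attached to the instance are encrypted, you can query them with the AWS CLI. This is a sketch under stated assumptions: the function names and instance ID are hypothetical, and the filter assumes the CLI text output prints booleans as True and False.

```shell
# List each volume attached to an instance with its encryption flag.
# Output: one line per volume, "<volume-id><TAB><True|False>".
list_volume_encryption() {
  aws ec2 describe-volumes \
    --filters "Name=attachment.instance-id,Values=$1" \
    --query 'Volumes[].[VolumeId,Encrypted]' --output text
}

# Read the listing on stdin and print the IDs of unencrypted volumes.
unencrypted_volumes() {
  awk '$2 == "False" { print $1 }'
}

# Example (not run here):
#   list_volume_encryption i-0123456789abcdef0 | unencrypted_volumes
```

An empty result means every attached volume reports encryption enabled.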
Costs
The cost of running the Tamr Core instance is a combination of:
- Tamr Core cost. Tamr Core cost is per license, with additional cost for optional services and support. To obtain a license key, contact Tamr Support by email at [email protected].
- AWS infrastructure costs for the virtual machine on which you are running Tamr Core. Refer to Amazon for current EC2 pricing.
- AWS storage costs. Optionally, you can choose to store Tamr Core backups in AWS S3 storage. Refer to Amazon for current S3 pricing.
Scaling Up
To scale your Tamr Core deployment on AWS, use individual sizing increases for your AWS EC2 instance. If you need additional storage, attach an external storage drive to the EC2 instance. For scaling out your deployment, contact your Tamr account representative or Tamr Support at [email protected].
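Changing the instance type requires the instance to be stopped first. The sequence below is a sketch using standard AWS CLI commands: the function name, instance ID, and instance type are hypothetical placeholders. Stop Tamr Core on the instance before running it, and choose a type that meets AWS Sizing and Limits in this guide.

```shell
# Stop an instance, change its type, and start it again.
# Arguments: instance ID, new instance type.
resize_instance() (
  id="$1"; type="$2"
  aws ec2 stop-instances --instance-ids "$id" &&
  aws ec2 wait instance-stopped --instance-ids "$id" &&
  aws ec2 modify-instance-attribute --instance-id "$id" \
    --instance-type "{\"Value\": \"$type\"}" &&
  aws ec2 start-instances --instance-ids "$id"
)

# Example (not run here):
#   resize_instance i-0123456789abcdef0 m5.4xlarge
```

Unless the instance uses an Elastic IP address, its public IPv4 address can change after a stop and start, so confirm the hostname before reconnecting.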
Backups and Disaster Recovery
Take regular backups of Tamr Core and keep the backups in AWS object storage (Amazon S3) outside of your AWS instance. To create backups, use the Tamr Core backup API. See AWS Backup and Restore and Backup in this guide.
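If your backups are written to a directory on the instance, one way to move them off the instance is `aws s3 sync`. This sketch does not call the Tamr Core backup API. It assumes a backup already exists in a local directory, and the function names, directory path, bucket name, and key prefix are all hypothetical.

```shell
# Build a dated S3 destination for a set of backup files.
backup_s3_uri() {
  printf 's3://%s/tamr-core-backups/%s/\n' "$1" "$2"
}

# Copy a local backup directory to S3 with server-side encryption.
# Arguments: local backup directory, bucket name.
copy_backups_to_s3() (
  backup_dir="$1"; bucket="$2"
  [ -d "$backup_dir" ] || { echo "no such directory: $backup_dir" >&2; exit 1; }
  dest=$(backup_s3_uri "$bucket" "$(date -u +%Y-%m-%d)")
  aws s3 sync "$backup_dir" "$dest" --sse AES256
)

# Example (not run here):
#   copy_backups_to_s3 /path/to/tamr/backups my-tamr-backup-bucket
```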
Upgrades
Tamr releases new software versions frequently. While Tamr strives to maintain the most recent version available on AWS Marketplace, your instance version may not be the latest and is not automatically upgraded. To upgrade to the most recent version, or to create a custom deployment in AWS, contact Tamr Support by email at [email protected].
Configuring Core Connect
To move data files from cloud storage into Tamr Core, and export datasets from Tamr Core to cloud storage, you use the Core Connect service. See Configuring Core Connect in this guide.
Monitoring
AWS CloudWatch provides monitoring for AWS cloud resources, and you can create alarms on the metrics it collects. See the CloudWatch alarms documentation in the Amazon CloudWatch User Guide for details.
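As one example of an alarm, the sketch below watches the EC2 StatusCheckFailed metric for the Tamr Core instance and notifies an SNS topic. The function name, alarm name, instance ID, and topic ARN are hypothetical. The period and threshold are illustrative starting values rather than Tamr recommendations.

```shell
# Create an alarm that fires when the instance fails its EC2 status checks
# for two consecutive one-minute periods.
# Arguments: instance ID, SNS topic ARN to notify.
create_status_alarm() (
  id="$1"; topic_arn="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name "tamr-core-status-check-$id" \
    --namespace AWS/EC2 --metric-name StatusCheckFailed \
    --dimensions "Name=InstanceId,Value=$id" \
    --statistic Maximum --period 60 --evaluation-periods 2 \
    --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "$topic_arn"
)

# Example (not run here):
#   create_status_alarm i-0123456789abcdef0 arn:aws:sns:us-east-1:111122223333:ops-alerts
```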
Logging
See Logging in Single-Node Deployments in this guide.
Support
For technical support, contact Tamr Support by email at [email protected].