Intro to Nginx and Secure Connections

What is Nginx?

Nginx is a widely used web server that can also act as a reverse proxy, HTTP cache, and load balancer. In our deployments, we primarily use Nginx as a reverse proxy to provide a secure, encrypted connection via HTTPS.

As of v2021.015, Nginx is listed as a requirement for Tamr deployments.

Below we refer to SSL, but because nearly all "SSL" certificates in use today are actually TLS certificates, we use the two terms interchangeably (you will most likely see the same usage elsewhere online).

What does Nginx do for Tamr deployments?

A reverse proxy forwards traffic from a URL to a specific port within the target domain, based on routing logic or load-balancing requirements. Load balancing is not usually a consideration for Tamr deployments, so we often use only the traffic-forwarding capability, for example ‘forward tamr-dev.com to machine-IP-address:9100’.

HTTPS requires a valid SSL certificate and provides a secure connection between the outside world and the Tamr VM (requests and responses are encrypted). Together with the traffic-forwarding capability, this makes it possible to restrict Tamr users to HTTPS connections only.

Best practices for single-node deployments

The diagram on our docs page shows how, with HTTPS, port 443 is used to speak to the outside world, while port 9100 is used inside the server for microservices. This means that the only ports on the machine that need to send and receive external traffic are 443 (the standard HTTPS port) and 22 (SSH). All other ports can and should be closed to outside traffic; this is required for effective security.
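One way to spot-check this from outside the machine is to attempt a TCP connection to each port of interest. The following is a minimal sketch using only the Python standard library; the function names are illustrative and the allowed set of 22 and 443 simply mirrors the text above. A port that is filtered by a firewall will time out and is reported as closed.

```python
import socket
from typing import Iterable, List


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unexpected_open_ports(
    host: str,
    ports: Iterable[int],
    allowed: Iterable[int] = (22, 443),
    timeout: float = 2.0,
) -> List[int]:
    """Return the ports that accept connections but are not in the allowed set."""
    allowed_set = set(allowed)
    return [
        port
        for port in ports
        if port not in allowed_set and is_port_open(host, port, timeout)
    ]


# Example call (hypothetical host name; run it from a machine outside the server):
# unexpected_open_ports("tamr.example.com", [80, 9100])
```

An empty list from `unexpected_open_ports` suggests that none of the probed ports is reachable from where the script runs. It says nothing about ports you did not probe.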

Securing connections

What happens to HTTP traffic when we are using HTTPS?

HTTP uses port 80 by default. There are three potential cases:

  1. Port 80 is closed. No HTTP traffic to the server is possible on this port.
  2. Port 80 is open but HTTP traffic is redirected to HTTPS. The Nginx config would need to include a section like this:
    server {
        listen 80 default_server;
        server_name _;
        return 301 https://$host$request_uri;
    }
  3. Port 80 is open and HTTPS is on, but there are no HTTP restrictions. Tamr can then be accessed over both HTTP and HTTPS. This is unusual; it might happen if you wanted a secure UI for users and did not care about other traffic. Make no mistake, though: this is a major security gap (it completely undermines having HTTPS) and should be resolved.
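To find out which of the three cases applies to a given server, you can send a single plain-HTTP request and inspect the response. The following is a minimal sketch using only the Python standard library; the function names and the returned labels are illustrative and not part of any Tamr tooling.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def classify_http_response(status: int, location: Optional[str]) -> str:
    """Map a plain-HTTP response onto case 2 or case 3 above."""
    is_redirect = 300 <= status < 400
    if is_redirect and location and urlsplit(location).scheme == "https":
        return "redirected-to-https"  # case 2
    return "served-over-http"  # case 3


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Send one GET over plain HTTP (redirects are not followed) and classify it."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return classify_http_response(response.status, response.getheader("Location"))
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, or no valid HTTP reply.
        return "port-closed-or-unreachable"  # case 1
    finally:
        conn.close()


# Example call (hypothetical host name):
# probe_http("tamr.example.com")
```

With the redirect section from case 2 in place, `probe_http` should see a 301 response with an `https://` Location header. A result of `served-over-http` points to case 3.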

How can I interact with Tamr via Python using HTTPS?

The Tamr Unify client relies on the Python Requests library for API calls. Both the toolbox and the Unify client let you send traffic over HTTPS by passing protocol="https" when creating a client (you will likely also need to set port=None). By default, Requests (and therefore your Tamr client) verifies SSL certificates for HTTPS requests, just like a web browser.

import os

import tamr_toolbox as tbox

# Credentials are read from environment variables rather than hard-coded.
tamr = tbox.utils.client.create(
    protocol='https',
    host='tamr.example.com',
    port=None,
    username=os.environ['TAMR_USERNAME'],
    password=os.environ['TAMR_PASSWORD'],
)
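A likely reason for port=None is that Nginx listens on 443, the default HTTPS port, so the base URL should not carry an explicit port such as 9100. The sketch below illustrates how a base URL might be assembled from these arguments. `build_origin` is a hypothetical helper written for this illustration; it is not taken from the client's source code.

```python
from typing import Optional


def build_origin(protocol: str, host: str, port: Optional[int]) -> str:
    """Assemble a base URL, omitting the port when none is given."""
    if port is None:
        # No explicit port: HTTPS clients then use the default port, 443.
        return f"{protocol}://{host}"
    return f"{protocol}://{host}:{port}"


print(build_origin("https", "tamr.example.com", None))
print(build_origin("https", "tamr.example.com", 9100))
```

The first call yields a URL that reaches Nginx on 443. The second targets port 9100 directly, which should be closed to outside traffic in a locked-down deployment.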

If you are having issues with API calls over HTTPS (e.g. an SSL or certificate error such as [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate), you can either point verification at a certificate from a trusted CA or turn verification off. If verification is set to False, Requests will accept any certificate and will ignore hostname mismatches and expired certificates.

The easiest way to do this is to modify the requests.Session used by your Tamr client:

tamr.session.verify = "/path/to/cert/file"

or

tamr.session.verify = False
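Because setting verify to False removes certificate checking entirely, it may be safer to fall back to default verification when no CA file is available. The following is a minimal sketch of that idea; `resolve_verify` and the `TAMR_CA_BUNDLE` environment variable are hypothetical names chosen for this example.

```python
import os
from typing import Optional, Union


def resolve_verify(ca_bundle: Optional[str]) -> Union[bool, str]:
    """Return a value suitable for assigning to session.verify.

    An existing CA bundle file is returned as a path. Anything else falls
    back to True (default verification) instead of disabling checks.
    """
    if ca_bundle and os.path.isfile(ca_bundle):
        return ca_bundle
    return True


# Example use with an existing client (TAMR_CA_BUNDLE is a hypothetical variable):
# tamr.session.verify = resolve_verify(os.environ.get("TAMR_CA_BUNDLE"))
```

With this approach, a mistyped or missing path results in a certificate error rather than a silently unverified connection.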

Other resources

Tamr public docs on installing and configuring Nginx