OpenClinica User Manual/UsingAReverseProxy

From Wikibooks, open books for an open world

Using a secure reverse proxy for OpenClinica

Introduction

Out of the box, OpenClinica is served by Apache Tomcat. While this works nicely, we can get a performance boost by adding a "reverse proxy" to the equation. A reverse proxy server takes weight off the shoulders of the Tomcat server: it can send static files such as CSS, JavaScript and images without involving Tomcat at all, so Tomcat only needs to serve the dynamic study content, i.e. the generated pages with our study data. Furthermore, a reverse proxy allows us to send compressed responses, or to tell the client that a file is not going to change for a long while, so that the already downloaded copy can be used instead of downloading it again from the server. These modifications reduce bandwidth use. The reverse proxy can also handle SSL encryption, again taking workload away from Tomcat.

The main body of this page is a tutorial using Nginx (pronounced "Engine X"). Nginx is very fast and, more importantly, very easy to install and configure. It is open source, released under the "2-clause BSD license".

Nginx is only in Beta on Windows

Whilst Nginx has release versions for Linux/BSD, it is only in Beta on Windows operating systems (see the nginx for Windows page). Alternative proxy software for Windows includes Apache HTTP Server (detailed below) and Squid.

Another option - don't use a proxy but set up Tomcat for compression and SSL encryption

If you just want your pages to be encrypted and compressed, but don't need the caching effects of a reverse proxy such as Nginx, Tomcat itself can be set up for SSL encryption by following the SSL Configuration HOW-TO, and compression can also be set up in Tomcat. This thread suggests setting up Tomcat for SSL encryption and compression rather than using a proxy because, when configured optimally, OpenClinica is primarily limited by the performance of Postgres' caching (see page on Performance).

Install Nginx

Prerequisites:

First use your package manager (apt-get, yum) to check whether the following libraries are installed on your system. Also make sure the development files for these libraries are available; these packages have the suffix '-dev' or '-devel', depending on your system.

  • zlib library (for gzip module)
  • pcre library (for rewrite module)
  • openssl library (for ssl support)

RedHat/CentOS example:

  $ sudo yum install zlib zlib-devel pcre pcre-devel openssl openssl-devel

Debian/Ubuntu example:

  $ sudo apt-get install zlib1g zlib1g-dev libpcre3 libpcre3-dev openssl libssl-dev

Because we are going to compile the source code, the development tools and development libraries for your system need to be installed.

RedHat/CentOS example:

  $ sudo yum groupinstall "Development Tools"
  $ sudo yum groupinstall "Development Libraries"

Debian/Ubuntu example:

  $ sudo apt-get install build-essential

Download and install Nginx

Download the stable source from http://wiki.nginx.org/Install#Source_Releases.
At the time of writing, the stable version is 1.0.5.
After downloading, unpack the archive, configure Nginx with SSL support and gzip compression, then build and install it.
Here are the steps:

  $ wget http://nginx.org/download/nginx-1.0.5.tar.gz
  $ tar xvfz nginx-1.0.5.tar.gz
  $ cd nginx-1.0.5
  $ ./configure --with-http_ssl_module --with-http_gzip_static_module
  $ make
  $ sudo make install

If all went well Nginx is now installed in /usr/local/nginx/.
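
Before moving on, it can help to confirm that the binary was really built with both modules. The sketch below assumes the default install prefix /usr/local/nginx; `has_flag` is a hypothetical helper name, not part of Nginx. It relies on `nginx -V`, which prints the version and the configure arguments.

```shell
# Hypothetical helper: succeeds if the build information in $1 mentions the flag in $2.
has_flag() {
    printf '%s\n' "$1" | grep -qF -- "$2"
}

# Assumed location of the freshly installed binary
NGINX_BIN=/usr/local/nginx/sbin/nginx

if [ -x "$NGINX_BIN" ]; then
    # nginx -V writes its report to stderr, so redirect it to capture it
    build_info=$("$NGINX_BIN" -V 2>&1)
    for flag in --with-http_ssl_module --with-http_gzip_static_module; do
        if has_flag "$build_info" "$flag"; then
            echo "ok: $flag"
        else
            echo "missing: $flag"
        fi
    done
fi
```

If a flag is reported as missing, re-run ./configure with that flag and rebuild.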

Installing the init script

RedHat/CentOS

Get the init script for RedHat/CentOS here: http://wiki.nginx.org/RedHatNginxInitScript
We need to modify this file slightly to get it working with our installation.
Change: nginx="/usr/sbin/nginx" to: nginx="/usr/local/nginx/sbin/nginx"
Change: NGINX_CONF_FILE="/etc/nginx/nginx.conf" to: NGINX_CONF_FILE="/usr/local/nginx/conf/nginx.conf"

Debian/Ubuntu

Get the init script from: http://wiki.nginx.org/Nginx-init-ubuntu
Change: DAEMON=/usr/local/sbin/nginx to DAEMON=/usr/local/nginx/sbin/nginx

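When you have modified the file to your needs, copy the init script `nginx` to /etc/init.d/ and make it executable. On RedHat/CentOS, make sure it starts automatically by running these commands (on Debian/Ubuntu, `sudo update-rc.d nginx defaults` serves the same purpose):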

  $ sudo chkconfig --del nginx
  $ sudo chkconfig --level 2345 nginx on

Check that everything works by starting the server. Before you do, make sure no other web server is using port 80.

  $ sudo /etc/init.d/nginx start

When you browse to http://localhost, you should see the message `Welcome to Nginx`.

Secure connection

We want to create a secure connection to our study system. If you do not already have a security certificate, create one using this excellent guide on creating a self-signed certificate: http://www.akadia.com/services/ssh_test_certificate.html. Follow the guide up to step 4. Let's say your security files are called `my-server.crt` and `my-server.key`. Create a directory called ssl in /usr/local/ and copy the two files to /usr/local/ssl/. You will see these file names again in the configuration file.
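
As an alternative to the multi-step guide, a self-signed certificate can also be produced with a single `openssl req` call. This is only a sketch: the helper name `make_self_signed_cert`, the 2048-bit key size and the 365-day lifetime are assumptions chosen for illustration, and the file names match the ones used on this page.

```shell
# Hypothetical helper: create a self-signed certificate and an unencrypted key.
# $1: host name used as the certificate's common name
# $2: existing directory that receives my-server.crt and my-server.key
make_self_signed_cert() {
    # -x509 makes a self-signed certificate instead of a signing request;
    # -nodes leaves the key without a passphrase so Nginx can start unattended
    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -subj "/CN=$1" \
        -keyout "$2/my-server.key" \
        -out "$2/my-server.crt" &&
    # the private key should only be readable by its owner
    chmod 600 "$2/my-server.key"
}
```

For example, from a root shell, after creating /usr/local/ssl: `make_self_signed_cert my-server.org /usr/local/ssl`. Browsers will warn about a self-signed certificate, just as they do for one created with the guide.
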
Now we can configure Nginx.

Configure Nginx

The main configuration file can be found in /usr/local/nginx/conf/nginx.conf

Here is an example configuration file that gives us compression, encryption, expiry headers and a proxy to our OpenClinica installation. Read the comments in the file for explanation. Copy this to your nginx.conf file.
Be aware that OpenClinica was installed as a webapp with the name `studies`. That is why you will find URLs like http://127.0.0.1:8080/studies in the configuration file. This configuration also hardcodes the location of the "includes" and "images" directories as local directories.

Nginx configuration file

user  nobody;
# the number of worker_processes should at least be equal to the number of CPU cores of the server
worker_processes  2;

error_log  logs/error.log;
pid        logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    # set max upload size
    client_max_body_size 3M;   

    include       mime.types;
    default_type  application/octet-stream;
	
    # set log file format
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  logs/access.log  main;

    sendfile on;

    # enable compression
    gzip  on;
    gzip_static on;
    gzip_http_version 1.1;
    gzip_disable "MSIE [1-6]\.";

    # specify what data will be compressed
    # (text/html is always compressed by nginx; listing it only triggers a "duplicate MIME type" warning)
    gzip_types text/plain text/css text/javascript image/png image/x-icon application/x-javascript application/xml;

    # optimization for ssl sessions:
    # the ssl_session_cache stores session parameters so that returning clients can skip the computationally expensive full SSL handshake
    # one megabyte of the cache (shared:SSL:1m;) contains about 4000 sessions, 100k about 400
    ssl_session_cache    shared:SSL:100k;
    # reuse SSL session parameters to avoid SSL handshakes, time in minutes
    ssl_session_timeout  10m;

    # set keepalive connections to send several requests via one connection, time in seconds
    keepalive_timeout  120;
    
    # set the time that nginx will wait for the proxy connection
    proxy_connect_timeout 120s;
    proxy_read_timeout 120s;

    # HTTP server
    # define server on port 80 (http)
    server {
        listen       80;
        server_name  my-server.org;

        access_log  logs/host.access.log  main;

        # force the client to use a secure connection
        location /studies {
            rewrite ^/(.*)$ https://my-server.org/$1 redirect;
        }

        error_page  404              /404.html;
        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }

    # HTTPS server
    # define server on port 443 (https)
    server {
        listen       443;
        server_name  my-server.org;

        # turn on data encryption for secure connections
        ssl                  on;
        ssl_certificate      /usr/local/ssl/my-server.crt;
        ssl_certificate_key  /usr/local/ssl/my-server.key;

        # directly serve the static files in the `includes` directory
        location ~ ^/studies/includes/(.*)$ {
            # add future expiry date to force caching of the file on the client
            expires max;
            add_header Cache-Control "public";
            alias /usr/local/tomcat/webapps/studies/includes/$1;
        }
        # directly serve the static files in the `images` directory
        location ~ ^/studies/images/(.*)$ {
            # add future expiry date to force caching of the file on the client   
            expires max;
            add_header Cache-Control "public";
            alias /usr/local/tomcat/webapps/studies/images/$1;
        }
        # pass all other requests to Tomcat
        location /studies {
            proxy_pass http://127.0.0.1:8080/studies;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}

After we have made changes to the configuration file, we must restart Nginx, like this:

$ sudo /etc/init.d/nginx restart
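
A typo in nginx.conf can leave the server down after a restart, so it is worth testing the file first. The sketch below uses `nginx -t`, which checks the configuration without touching the running server; `safe_restart` is a hypothetical helper name, and both paths are passed in as arguments rather than assumed.

```shell
# Hypothetical helper: restart only when the configuration test passes.
# $1: path to the nginx binary, $2: path to the init script
safe_restart() {
    if "$1" -t; then
        "$2" restart
    else
        echo "configuration test failed; nginx was not restarted" >&2
        return 1
    fi
}
```

With the paths used on this page, the call from a root shell would be `safe_restart /usr/local/nginx/sbin/nginx /etc/init.d/nginx`.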

Try the new configuration by going to http://localhost/studies/.
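
The redirect and the compression can also be checked from the command line by looking at the response headers. The helper below is a sketch; `header_value` is a hypothetical name, and it simply extracts one header from a block of HTTP response headers such as the output of `curl -I`.

```shell
# Hypothetical helper: print the value of the response header named in $1
# (matched case-insensitively) from the HTTP headers read on stdin.
header_value() {
    awk -v name="$1" '
        {
            line = $0
            sub(/\r$/, "", line)              # strip the carriage return of HTTP line endings
            i = index(line, ":")              # split at the first colon only
            if (i > 0 && tolower(substr(line, 1, i - 1)) == tolower(name)) {
                value = substr(line, i + 1)
                sub(/^[ \t]+/, "", value)     # drop the whitespace after the colon
                print value
                exit
            }
        }'
}
```

Assuming curl is installed and my-server.org resolves to the proxy, the first command should print the https URL the client is redirected to, and the second should print gzip when the response was compressed (-k accepts the self-signed certificate):

  $ curl -sI http://my-server.org/studies/ | header_value Location
  $ curl -sk -o /dev/null -D - -H 'Accept-Encoding: gzip' https://my-server.org/studies/ | header_value Content-Encoding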

Proxy for more than one OpenClinica installation

The part that is responsible for passing requests to OpenClinica is:

  location /studies {
            proxy_pass http://127.0.0.1:8080/studies;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

If you have more than one OpenClinica installation, you can still have one access point but as many back-ends as you need.
Consider for example this configuration:

  location /study_1 {
            proxy_pass http://127.0.0.1:8080/OpenClinica_1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
  location /study_2 {
            proxy_pass http://192.168.1.100:8080/OpenClinica_2;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
  location /study_3 {
            proxy_pass http://192.168.1.101:8080/OpenClinica_3;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

In this configuration one Nginx server is a proxy for three OpenClinica servers, which can be real or virtual servers. As you can see, the first one is running on the same machine as the Nginx server. The other two are servers on the local network.
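
Because the three blocks differ only in the public path and the back-end URL, they can be generated from a short list instead of being typed by hand. This is a sketch only: `proxy_location` is a hypothetical helper name, and the output still has to be pasted into the https server section of nginx.conf.

```shell
# Hypothetical helper: print one proxy location block.
# $1: public path (e.g. /study_1), $2: back-end URL
proxy_location() {
    # \$ keeps the nginx variables literal instead of expanding them in the shell
    cat <<EOF
  location $1 {
            proxy_pass $2;
            proxy_set_header Host \$host;
            proxy_set_header X-Real-IP \$remote_addr;
            proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
  }
EOF
}

# Path and back-end pairs taken from the example configuration
while read -r path backend; do
    proxy_location "$path" "$backend"
done <<'EOF'
/study_1 http://127.0.0.1:8080/OpenClinica_1
/study_2 http://192.168.1.100:8080/OpenClinica_2
/study_3 http://192.168.1.101:8080/OpenClinica_3
EOF
```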

Restricting access

Most of the time data will only be entered from a limited number of known locations. We can instruct Nginx to only allow access to our server from those locations based on their public IP addresses. All other IP addresses are blocked. Have a look at this snippet:

location /studies {
    # grant access to the following ip addresses
    allow 80.78.17.10;   #gabon satellite
    allow 41.211.145.61; #gabon adsl
    allow 41.220.12.34;  #uganda office
    # disallow all other ip addresses
    deny all;

    proxy_pass http://127.0.0.1:8080/studies;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

Connections from unauthorized IP addresses will get a 403 error. We can catch this error and send a more meaningful message to the client. To do so create a file called 403.html in the folder /usr/local/nginx/html/. Add something like this to the file:

<html>
    <head>
        <title>Error 403 - IP Address Blocked</title>
    </head>
    <body>
        <h2>Your IP Address is not registered, therefore you do not have access to this site.</h2>
    </body>
</html>

Add the next line to the http section of the Nginx configuration file.

error_page 403 http://my-server.org/403.html;

Now unauthorized clients will get an informative message when they try to access our study system.

Configuring OpenClinica

Ensure that OpenClinica's sysURL setting (in the file OpenClinica/WEB-INF/classes/datainfo.properties) is set to the public URL of the system, e.g.

sysURL=https://my-server.org/${WEBAPP}/MainMenu

This ensures that URLs in system emails and some internal messages correctly point to the proxy, rather than anywhere else.
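
A quick way to check the current setting is to read it back from the properties file. In the sketch below `get_sysurl` is a hypothetical helper name, and the file path is an assumption that combines the Tomcat directory from the Nginx configuration above with the `studies` webapp name; adjust it to your installation.

```shell
# Hypothetical helper: print the value of sysURL from the properties file in $1.
# Commented-out lines (starting with #) are ignored.
get_sysurl() {
    sed -n 's/^[[:space:]]*sysURL[[:space:]]*=[[:space:]]*//p' "$1"
}

# Assumed location of the file for a webapp called `studies`
PROPS=/usr/local/tomcat/webapps/studies/WEB-INF/classes/datainfo.properties

if [ -f "$PROPS" ]; then
    case "$(get_sysurl "$PROPS")" in
        https://my-server.org/*) echo "sysURL points at the proxy" ;;
        *) echo "sysURL does not start with https://my-server.org/" ;;
    esac
fi
```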

See the difference

When you install and run the Firefox extension YSlow, you can see the improvement of the reverse proxy over the standard OpenClinica installation. Use port 8080 to bypass the proxy. Here is an example of the Notes & Discrepancies page from OC 3.0.4.1:

Before optimization: [image: YSlow Statistics Before]

After optimization: [image: YSlow Statistics After]

Explanation: on the left we see the anatomy of the page. To load this page, the browser sends no fewer than 76 requests to the server. These requests result in a total download size of 462.7K. If we add gzip compression, the size is reduced to 162.6K. The first time we contact the OpenClinica server, all static content such as images, JavaScript and CSS files is cached on the client machine. On the right side we see the effect of caching: the amount of data downloaded from the server is greatly reduced. The 'expires max' directive tells the browser that it should not check the server for a new version of a file for a long time. That is why we only hit the server twice in the optimized situation. This is a major improvement, especially in situations where bandwidth is limited and latency is high. The number of requests could even be reduced to one by also setting 'expires max' for the favicon file, which this configuration does not do.
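
To put the compression figures in proportion, the saving can be worked out from the two sizes quoted above. The helper name `saving_pct` is hypothetical; the numbers are the ones from the YSlow report.

```shell
# Hypothetical helper: percentage saved when a download shrinks from $1 to $2.
saving_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Sizes in K for the Notes & Discrepancies page, without and with gzip
saving_pct 462.7 162.6   # prints 64.9
```

In other words, gzip alone removes roughly two thirds of the data on a first visit, before client-side caching has any effect.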

Setting up Apache HTTP Server as a reverse proxy

A similar result can be achieved by using Apache HTTP Server as a reverse proxy. Note that the configuration below doesn't yet include SSL:

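#Ensure correct modules are uncommented
LoadModule deflate_module modules/mod_deflate.so
# expires_module is needed for the <IfModule expires_module> section below to take effect
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so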

<IfModule proxy_module>

# Disable forward proxy requests
ProxyRequests Off

# Allow requests from selected hosts or domains
<Proxy *>
Order Allow,Deny
Allow from all
</Proxy>

# Configure reverse proxy requests for OpenClinica with a long timeout for lots of data
ProxyPass / http://localhost:8080/ timeout=1800
ProxyPassReverse / http://localhost:8080/

</IfModule>

<IfModule expires_module>
<IfModule headers_module>
# Add long expires headers and caching for images and includes (javascript) directories
# based on http://cjohansen.no/en/apache/using_a_far_future_expires_header
<LocationMatch "/.*/images/.*$">
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
    Header set Cache-Control "public"
</LocationMatch>
<LocationMatch "/.*/includes/.*$">
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
    Header set Cache-Control "public"
</LocationMatch>
</IfModule>
</IfModule>

#compress output to relevant browsers
<IfModule deflate_module>
<Location />
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary

#SetEnvIfNoCase Request_URI \
#"/.*/CreateCRFVersion.*$" no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</Location> 
</IfModule>

<IfModule headers_module>
#Optional section to avoid issues with OpenClinica 3.1.2 (and possibly later) and Microsoft's TMG
#Delete this whole section if you aren't using TMG:
<LocationMatch "/.*/AdministrativeEditing.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/CreateCRFVersion.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/DataEntry.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/DoubleDataEntry.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/InitialDataEntry.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/PrintCRF.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/PrintDataEntry.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/PrintEventCRF.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/PrintAllEventCRF.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/PrintAllSiteEventCRF.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/SectionPreview.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
<LocationMatch "/.*/ViewSectionDataEntry.*$" >
    RequestHeader unset Accept-Encoding
    Header unset Content-Encoding
</LocationMatch>
</IfModule>

Conclusion

Installing and configuring a reverse proxy is not very difficult. With a reverse proxy in place we let Tomcat do what it does best: serve dynamic content. The reverse proxy takes care of everything else: the remaining requests, SSL, content compression and caching on the client. We have seen that the optimizations reduce bandwidth use and the number of requests sent to the server. This adds to the responsiveness of the application, in particular on networks that have high latency and low bandwidth.