- Determine what level of load each proxy comfortably handles
- Collect the latency percentile distribution, which we find is the metric most directly correlated with user experience
Testing Protocols and Metrics Collected
We used the load-generation program wrk2 to emulate a client, making continuous requests over HTTPS during a defined period. The system under test – HAProxy or NGINX – acted as a reverse proxy, establishing encrypted connections with the clients simulated by the wrk2 threads, forwarding requests to a backend web server running NGINX Plus R22, and returning the response generated by the web server (a file) to the client.
Each of the three components (client, reverse proxy, and web server) ran Ubuntu 20.04.1 LTS on its own c5n.2xlarge instance in Amazon EC2.
As mentioned, we collected the full latency percentile distribution from each test run. Latency is defined as the amount of time between the client generating the request and receiving the response. A latency percentile distribution sorts the latency measurements collected during the testing period from highest (most latency) to lowest.
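For illustration, here is a minimal sketch (in Python, using invented sample values, not our measured data) of how percentile values are read off a sorted list of latency measurements:

import math

# Sketch: nearest-rank percentiles over a list of latency samples.
# The sample values below are invented for illustration only.
def percentile(sorted_samples, p):
    idx = max(0, math.ceil(len(sorted_samples) * p / 100) - 1)
    return sorted_samples[idx]

latencies_ms = sorted([4.2, 4.7, 4.8, 4.9, 5.0, 5.1, 5.5, 6.3, 98.0, 120.0])
for p in (50, 90, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.1f} ms")

Note how a handful of slow outliers dominate the upper percentiles even though the median stays low.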
Testing Methodology
Client
Using wrk2 (version 4.0.0), we ran the following script on the Amazon EC2 instance:
taskset -c 0-3 wrk -t 4 -c 100 -d 30s -R requests_per_second --latency https://adc.domain.com:443/
To simulate many clients accessing a web application, 4 wrk2 threads were spawned that together established 100 connections to the reverse proxy. During the 30-second test run, the script generated a specified number of RPS. These parameters correspond to the following wrk2 options (a sketch of a driver script that sweeps through RPS levels follows the list):
- -t option – Number of threads to create (4)
- -c option – Number of TCP connections to create (100)
- -d option – Number of seconds in the testing period (30 seconds)
- -R option – Number of RPS issued by the client
- --latency option – Output includes corrected latency percentile information
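Because each wrk2 run tests a single RPS level, sweeping through load levels means re-running the command with a different -R value each time. Here is a minimal Python sketch of such a driver; the RPS values and output file names are placeholders, not our actual test plan:

# Sketch: re-running wrk2 at increasing RPS levels and saving each run's output.
# The RPS values and file names are placeholders.
import subprocess

URL = "https://adc.domain.com:443/"

for rps in (10000, 25000, 50000, 85000):
    cmd = [
        "taskset", "-c", "0-3",         # pin the load generator to CPUs 0-3
        "wrk", "-t", "4", "-c", "100",  # 4 threads, 100 connections
        "-d", "30s", "-R", str(rps),    # 30-second run at this request rate
        "--latency", URL,               # include corrected percentile output
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(f"wrk2_{rps}rps.txt", "w") as f:
        f.write(result.stdout)          # keep the full output for later plotting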
Encryption was accomplished with TLSv1.3 and the TLS_AES_256_GCM_SHA384 cipher suite. (Because TLSv1.2 is still commonly used on the Internet, we re-ran the tests with it as well; the results were so similar to those for TLSv1.3 that we don't include them here.)
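One quick way to confirm which protocol and cipher suite a proxy actually negotiates is a client-side handshake check. A sketch in Python (the host name is the placeholder from the test command, and certificate verification is disabled on the assumption that the rig uses a self-signed certificate):

# Sketch: report the TLS version and cipher suite the proxy negotiates.
# Host name is a placeholder; verification is off for a self-signed test cert.
import socket
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

with socket.create_connection(("adc.domain.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="adc.domain.com") as tls:
        print(tls.version())   # e.g. 'TLSv1.3'
        print(tls.cipher())    # e.g. ('TLS_AES_256_GCM_SHA384', 'TLSv1.3', 256)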
HAProxy: Configuration and Versioning
We provisioned HAProxy version 2.3 (stable) as the reverse proxy. To use more than one CPU core, HAProxy traditionally runs in multi-process mode, which has several drawbacks because state cannot be shared across the processes:
- Configuration parameters – including limits, statistics, and rates – must be defined separately for each process.
- Performance metrics are collected per‑process; combining them requires additional config, which can be quite complex.
- Each process handles health checks separately, so target servers are probed per process rather than per server as expected.
- Session persistence is not possible.
- A dynamic configuration change made via the HAProxy Runtime API applies to a single process, so you must repeat the API call for each process.
In the words of the HAProxy configuration manual, "USING MULTIPLE PROCESSES IS HARDER TO DEBUG AND IS REALLY DISCOURAGED."
HAProxy introduced multi-threading in version 1.8 as an alternative to multi-processing. Multi-threading mostly solves the state-sharing problem, but as we discuss in Performance Results, in multi-thread mode HAProxy does not perform as well as in multi-process mode. Our HAProxy configuration included provisioning for both multi-thread mode (HAProxy MT) and multi-process mode (HAProxy MP). To alternate between modes at each RPS level during the testing, we commented and uncommented the appropriate set of lines and restarted HAProxy for the configuration to take effect:

$ sudo service haproxy restart
Here’s the configuration with HAProxy MT provisioned: four threads are created under one process, and each thread is pinned to a CPU. For HAProxy MP (commented out here), four processes are created, each pinned to a CPU.
global
    #Multi-thread mode
    nbproc 1
    nbthread 4
    cpu-map auto:1/1-4 0-3
    #Multi-process mode
    #nbproc 4
    #cpu-map 1 0
    #cpu-map 2 1
    #cpu-map 3 2
    #cpu-map 4 3
    ssl-server-verify none
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    maxconn 4096

defaults
    log global
    mode http    # needed so the frontend and backend both run in HTTP mode
    option httplog
    option http-keep-alive

frontend Local_Server
    bind 172.31.12.25:80
    bind 172.31.12.25:443 ssl crt /etc/ssl/certs/bundle-hapee.pem
    redirect scheme https code 301 if !{ ssl_fc }
    default_backend Web-Pool
    http-request set-header Connection keep-alive

backend Web-Pool
    mode http
    server server1 backend.workload.1:80 check
NGINX: Configuration and Versioning
We deployed NGINX Open Source version 1.18.0 as the reverse proxy. To match the number of worker processes to the number of available CPUs, we passed the auto parameter to the worker_processes directive, which is also the setting in the default nginx.conf file distributed from our repository. Additionally, the worker_cpu_affinity directive was included to pin each worker process to a CPU (each 1 in the second parameter denotes a CPU in the machine).
user nginx;
worker_processes auto;
worker_cpu_affinity auto 1111;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    keepalive_timeout 65;
    keepalive_requests 100000;

    server {
        listen 443 ssl reuseport;
        ssl_certificate /etc/ssl/certs/hapee.pem;
        ssl_certificate_key /etc/ssl/private/hapee.key;
        ssl_protocols TLSv1.3;

        location / {
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            proxy_pass http://backend;
        }
    }

    upstream backend {
        server backend.workload.1:80;
        keepalive 100;
    }
}
Performance Results
With your reverse proxy acting as the front end to your application, its performance is critical. We tested each reverse proxy (NGINX, HAProxy MP, and HAProxy MT) at increasing numbers of RPS until one of them reached 100% CPU utilization. All three performed similarly at the RPS levels where CPU was not exhausted. HAProxy MT was the first to reach 100% CPU utilization, at 85,000 RPS, and at that point performance worsened dramatically for both HAProxy MT and HAProxy MP. Here we present the latency percentile distribution of each reverse proxy at that load level. The chart was plotted from the output of the wrk2 script using the HdrHistogram program available on GitHub.

[Chart: latency percentile distribution for NGINX, HAProxy MP, and HAProxy MT at 85,000 RPS]
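If you want to reproduce the chart, note that with --latency, wrk2 prints its percentile data in HdrHistogram's text format, which the plotter accepts directly. A Python sketch of pulling just that section out of a saved run (the file name is a placeholder, and the section markers reflect wrk2's usual output format):

# Sketch: extract the HdrHistogram "Detailed Percentile spectrum" section
# from saved wrk2 output so it can be pasted into the HdrHistogram plotter.
# File name is a placeholder; markers assume wrk2's usual output format.
def extract_spectrum(path):
    lines, keep = [], False
    with open(path) as f:
        for line in f:
            if "Detailed Percentile spectrum" in line:
                keep = True                  # section header found
            elif keep and line.startswith("#[Mean"):
                lines.append(line)           # summary footer: stop here
                break
            if keep:
                lines.append(line)
    return "".join(lines)

print(extract_spectrum("wrk2_85000rps.txt"))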
In this kind of testing it is critical to account for coordinated omission, in which (as explained in the wrk2 README) "high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods". Luckily, wrk2 corrects for coordinated omission by default (for more details about coordinated omission, see the README).
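To make the correction concrete, consider a toy example: at a fixed schedule of one request every 100 ms, a server stall delays the send times of later requests, and the corrected latency charges each request from when it should have been sent, not from when it finally went out. A sketch with invented numbers:

# Toy sketch of coordinated-omission correction (all numbers invented).
# At 10 RPS the client should send a request every 100 ms; during a server
# stall the queued requests go out late, and the wait must be counted.
interval_ms = 100
actual_send_ms = [0, 100, 700, 710, 720]   # requests 3-5 were queued by a stall
service_ms = [5, 5, 5, 5, 5]               # time from actual send to response

for i, (sent, svc) in enumerate(zip(actual_send_ms, service_ms)):
    scheduled = i * interval_ms            # when the request should have gone out
    corrected = (sent - scheduled) + svc   # queueing delay + service time
    print(f"req {i}: uncorrected {svc} ms, corrected {corrected} ms")

Uncorrected, every request looks like a 5 ms success; corrected, the stalled requests show latencies of hundreds of milliseconds.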
When HAProxy MT exhausts the CPU at 85,000 RPS, many requests experience high latency. They are rightfully included in the data because we are correcting for coordinated omission. It just takes one or two high‑latency requests to delay a page load and result in the perception of poor performance. Given that a real system is serving multiple users at a time, even if only 1% of requests have high latency (the value at the 99th percentile), a large proportion of users are potentially affected.
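A back-of-the-envelope calculation shows why: if a page load issues N requests and each independently has a 1% chance of exceeding the 99th-percentile latency (independence is a simplifying assumption), the chance that at least one request on the page is slow grows quickly with N:

# Sketch: probability that a page load hits at least one request slower than
# the 99th percentile, assuming requests are independent (a simplification).
for n in (1, 10, 30, 100):
    p_affected = 1 - 0.99 ** n
    print(f"{n:3d} requests per page -> {p_affected:.0%} of page loads affected")

With 30 requests per page, roughly a quarter of page loads would include at least one 99th-percentile-or-worse request.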