[2019.11.19]
The other day, I was asked to come up with a solution to calm some websites' most zealous visitors down and have them behave more kindly towards their World Wide Web fellows.
(I say “other day” to let you believe this particular stroke of genius hit me overnight rather than through the course of several weeks of tedious work)
Well, at first, it was just “throw 429 at them jerks!”. But after some thinking, it seemed a bit harsh. In this wonderful world we live in, visitors = customers = money. We like money!
Also, in the beginning, it was just “let’s consider everyone equal!”. But that would have been so lame! Some visitors or websites aren’t that bent on contributing to our Top500 fortune. So let’s make sure these are less equal than those who are more inclined to fill our bank account!
In the end, it all came down to:
Gracefully throttle the traffic down without dropping requests
Be able to throttle categories of traffic differently
(Easy peasy!)
I started to look at HAProxy and its exquisite rate-limiting examples… which were just too overwhelming for my feeble brain!
Then I looked at NGINX and its rate-limiting module… which just left me asking for more.
I could have looked at Apache and its rate-limiting module… but no (it wasn’t looked at kindly enough by the websites’ operator).
Given the requirements, I had no choice but to go back to HAProxy and prepare for blinding sessions of headache.
Generally speaking, throttling comes down to:
multiple buckets which are used to track the requests rate for each category of traffic you’re interested in
a Time-to-Live (TTL) period - sliding window - during which the requests rate is observed for each throttling bucket
a burst count of initial requests that are able to pass through a throttling bucket unaffected (at any rate)
a target rate to gracefully slow requests down past the initial burst in a throttling bucket
ultimately dropping requests with HTTP status code 429 when not able to keep up with graceful throttling
From the category point of view, one may be interested to differentiate traffic based on:
its source (IP address)
the visited website
the visited website owner
The definition of the burst and rate limits for a given category of traffic corresponds to its Service Level Objective (SLO).
(All things re-considered, not so easy peasy…)
Leveraging HAProxy Stick Tables, the idea is to:
count the overall quantity of requests <-> burst limit; the http_req_cnt counter is readily available for that purpose
compute the requests rate, past the burst limit; we’ll use the first General Purpose Counter (gpc0_rate)
compute the successfully throttled (non-429-dropped) requests rate <-> rate limit; we’ll use the second General Purpose Counter (gpc1_rate)
keep track of the estimated quantity of clients sharing the same throttling bucket (see the throttling algorithm below); we’ll use the General Purpose Tag (gpt0)
Throttling can be achieved by delaying requests according to:
delay = N * 1/slo * min(1, rate/slo)
Where:
N is the quantity of parallel clients sharing the same throttling bucket
slo is the target rate
1/slo is the corresponding target delay
rate is the actual rate
min(1, rate/slo) is a damping factor, aiming at delaying requests only when needed (no need to delay the requests of clients that are not “challenging” the SLO)
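The delay formula above can be sketched in a few lines of plain Python (the function name and the example numbers are mine):

```python
def throttle_delay(n_clients, slo, rate):
    """Compute the throttling delay (in seconds) for one request.

    n_clients -- estimated quantity of parallel clients in the bucket
    slo       -- target rate, in requests per second
    rate      -- observed rate, in requests per second
    """
    # min(1, rate/slo) is the damping factor: clients below their SLO
    # (rate < slo) only get a fraction of the target delay.
    return n_clients * (1.0 / slo) * min(1.0, rate / slo)

# One client at twice its 10 r/s SLO gets the full 100 ms target delay;
# at half its SLO, the damping factor halves that delay.
print(throttle_delay(1, 10, 20))  # 0.1
print(throttle_delay(1, 10, 5))   # 0.05
```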
Obtaining the quantity N of parallel clients is non-trivial when multiple clients concurrently share multiple throttling buckets (at least in the context of HAProxy).
However, N can be estimated by observing the actual rate of the throttled traffic.
(Bear with me!)
Let’s start with N = 1. If, despite the proper delay being applied, the actual rate is still significantly above the targeted SLO - rate/slo >> 1.0 - one can reasonably assume the quantity of clients for the next iteration is actually proportional to rate/slo:
N' = rate/slo
At the next iteration, the delay will be adapted according to that new estimated quantity of clients and the actual rate brought back to the target SLO: rate/slo ~ 1.0.
This approach can be generalized to re-evaluating the quantity of clients at each iteration:
N' = N * rate/slo
if N' < 1: N' = 1
The throttling algorithm will thus dynamically adapt itself to the varying quantity of clients.
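A toy simulation of that feedback loop in plain Python (I model each client as sending one request per delay period, which is an assumption of mine, not something the setup guarantees):

```python
def estimate_clients(n, slo, rate):
    # N' = N * rate/slo, floored at 1 client
    return max(1.0, n * rate / slo)

# 4 real clients sharing a bucket with a 10 r/s SLO: starting from N = 1,
# the estimate converges in one iteration and then stays put.
n, slo, true_clients = 1.0, 10.0, 4
for _ in range(5):
    delay = n * (1.0 / slo)       # delay applied to every request
    rate = true_clients / delay   # one request per client per delay period
    n = estimate_clients(n, slo, rate)
print(n)  # 4.0
```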
Given that the stick table’s gpt0 32-bit unsigned integer register shall be used to track the quantity of clients, we have to account for its inherent lack of numeric precision by tracking the quantity M of milli-clients:
M = N * 1000
Also, in order to improve the stability of the N' ~ N feedback loop, the feedback shall be attenuated by a factor - 1/N - inversely proportional to the quantity of clients:
N' = N * (1 + 1/N * (rate-slo)/slo)
(This comes down to in-/de-creasing the quantity of clients only by unit steps)
Finally, in order to flatten the actual rate over the sampling period and thus lessen the negative impact of its oscillations on the feedback loop, some random jitter shall be added to the throttling delay:
delay = N * 1/slo * min(1, rate/slo) * random([1-ε; 1+ε])
(Wicked!)
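Putting the refinements together - milli-client bookkeeping, attenuated feedback, random jitter - one iteration could look like this in plain Python (a sketch under my own assumptions; in particular the 10% jitter half-width is mine, as is every name):

```python
import random

def throttle(m_clients, slo, rate, jitter=0.1):
    """One throttling iteration.

    m_clients -- quantity of milli-clients (M = N * 1000), as held in gpt0
    slo, rate -- target and observed rates, in requests per second
    jitter    -- half-width of the random factor (assumed: 10%)
    Returns (delay in seconds, updated quantity of milli-clients).
    """
    n = max(1.0, m_clients / 1000.0)
    # Damped delay, spread out with some random jitter
    delay = n * (1.0 / slo) * min(1.0, rate / slo)
    delay *= random.uniform(1.0 - jitter, 1.0 + jitter)
    # Attenuated feedback: N' = N * (1 + 1/N * (rate-slo)/slo),
    # which simplifies to N' = N + (rate-slo)/slo
    n_next = max(1.0, n + (rate - slo) / slo)
    return delay, int(n_next * 1000)
```

Starting from M = 1000 (one client) with a 10 r/s SLO observed at 40 r/s, the update yields M = 4000, matching the N' = N * rate/slo estimate above.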
In the context of throttling, one should differentiate download from upload patterns:
download patterns typically imply small requests and large responses
upload patterns typically imply large requests and small responses
Thus, in order to reduce the memory pressure on HAProxy, one ought to:
delay requests for download traffic; GET, HEAD, etc.
delay responses for upload traffic; POST, PUT, etc.
(…)
Ready? Warning! Here be dragons!…
First, have a look at the entire configuration. You’ll notice that it relies in particular on custom Lua extensions. I’ll pass on the banalities of HAProxy configuration and concentrate on the bits-‘n-pieces of the matter at hand.
HAProxy < 2.1 requires a patch to allow gpt0 to be set - sc-set-gpt0 - to a value derived from an expression rather than a constant. This patch was merged during the development cycle of the 2.1 version.
A single stick table is used to keep track of the various throttling variables:
# Stick table
table tb_throttling type binary len 8 size 1m expire 30s store http_req_cnt,gpc0_rate(1s),gpc1_rate(1s),gpt0
Each category of traffic is associated with its Service Level Objective (SLO), thanks to HAProxy Maps:
# SLO mapping
# ... source+website
http-request set-var(txn.slo) lua.thr-src-add(txn.website),map_str_int('/etc/haproxy/source+website-slo.map',-1)
# [...]
# ... owner
http-request set-var(txn.slo) var(txn.owner),map_str_int('/etc/haproxy/owner-slo.map',0) if { var(txn.slo) eq -1 }
For performance reasons, each SLO is an integer ID from 0 to 5999, where the thousands digit (0 to 5) indicates its categorization type:
0 (0 to 999): global
1 (1000 to 1999): per source (IP address)
2 (2000 to 2999): per website
3 (3000 to 3999): per source and website
4 (4000 to 4999): per owner
5 (5000 to 5999): per source and owner
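In other words, the categorization type is just the SLO ID’s thousands digit, which decodes trivially (plain Python; the type labels are mine):

```python
SLO_TYPES = {
    0: "global",
    1: "per source",
    2: "per website",
    3: "per source and website",
    4: "per owner",
    5: "per source and owner",
}

def slo_type(slo_id):
    # Integer division by 1000 extracts the thousands digit
    return SLO_TYPES[slo_id // 1000]

print(slo_type(3042))  # per source and website
```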
In order to associate each connection with its corresponding throttling bucket, according to its throttling type and using a single stick table, a “universal” key is computed from the parameters that define it (source, website and/or owner). This key is then XXH64-hashed to minimize memory usage and guarantee uniform lookup performance.
# SLO type <-> throttling key
http-request set-var(txn.thr_key) lua.thr-key(txn.slo,txn.bucket,txn.tenant),xxh64()
(Nifty, ain’t it?!)
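The same idea can be illustrated outside HAProxy (plain Python sketch; I substitute BLAKE2b for XXH64 since the latter isn’t in the standard library, and the field layout is my own guess at what such a key could contain):

```python
import hashlib

def throttling_key(slo, source="", website="", owner=""):
    """Build a 'universal' 8-byte key for a throttling bucket.

    Only the parameters relevant to the SLO's categorization type are
    included, so buckets of different types can never collide.
    """
    slo_type = slo // 1000      # thousands digit = categorization type
    parts = [str(slo)]
    if slo_type in (1, 3, 5):   # per-source variants
        parts.append(source)
    if slo_type in (2, 3):      # per-website variants
        parts.append(website)
    if slo_type in (4, 5):      # per-owner variants
        parts.append(owner)
    # 8-byte digest, akin to the 64-bit XXH64 hash in the stick table
    return hashlib.blake2b("|".join(parts).encode(), digest_size=8).hexdigest()
```

The same parameters always hash to the same key, while buckets that differ in type or in any relevant parameter get distinct keys.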
The parameters - burst and rate - of each SLO have been defined in the configuration itself:
# SLO (burst/rate-limit)
# ... 0 (global): burst=10000, rate=5000r/s
http-request set-var(txn.slo_burst) int(10000) if { var(txn.slo) eq 0 }
http-request set-var(txn.slo_rate) int(5000) if { var(txn.slo) eq 0 }
# [...]
Those could also be stored in a corresponding map, especially if many different SLOs are to be used.
The current state of the throttling bucket matching the established key can now be retrieved:
# Throttling (tracking/state)
http-request track-sc0 var(txn.thr_key) table pe_throttling/tb_throttling
http-request set-var(txn.thr_count) sc_http_req_cnt(0)
http-request set-var(txn.thr_rate) sc_gpc0_rate(0)
http-request set-var(txn.thr_rate_ok) sc_gpc1_rate(0)
http-request set-var(txn.thr_clients) sc_get_gpt0(0)
And used to determine if (graceful) throttling is needed, in which case we either delay download requests or upload responses:
# Throttling
http-request set-var(txn.thr_do_rate) bool(1) if { var(txn.thr_count),sub(txn.slo_burst),add(txn.slo_rate) gt 0 }
acl acl_methods_upload method PUT POST
http-request lua.thr-delay txn.slo_rate txn.thr_rate txn.thr_clients if { var(txn.thr_do_rate) -m bool } !acl_methods_upload
http-response lua.thr-delay txn.slo_rate txn.thr_rate txn.thr_clients if { var(txn.thr_do_rate) -m bool } acl_methods_upload
Now that the appropriate delay has been applied, based on the estimated quantity of clients, we can re-estimate it and update the corresponding tracking register, gpt0:
# Re-estimate the quantity of clients
http-request set-var(txn.thr_clients) lua.thr-clients(txn.slo_rate,txn.thr_rate,txn.thr_clients)
http-request sc-set-gpt0(0) var(txn.thr_clients)
Finally, we update the rate tracking counters - gpc0 and gpc1 - before and after we 429-drop requests, should the actual rate grow to twice our objective (we must give the graceful throttling algorithm some leeway to do its job):
# Rate-limiting
http-request set-var(txn.thr_over_rate) bool(1) if { var(txn.thr_rate_ok),div(2),sub(txn.slo_rate) gt 0 }
http-request sc-inc-gpc0(0) if { var(txn.thr_do_rate) -m bool }
http-request deny deny_status 429 if { var(txn.thr_over_rate) -m bool }
http-request sc-inc-gpc1(0) if { var(txn.thr_do_rate) -m bool }
(Yeah!!!)