Home
Got Linux ?

Blah blah blah... Mostly technical thoughts, rants and gibberish


Graceful, Adaptive Throttling with HAProxy

[2019.11.08]

The other day, I was asked to come up with a solution to calm some wesbites most zealous visitors down and have them behave more friendly towards their World Wide Web fellows.

(I say “other day” to let you believe this particular strain of genius stroke me overnight rather than through the course of several weeks of tedious work)

Requirements

Well, at first, it was just “throw 429 at them jerks!”. But after some thinking, it seemed a bit harsh. In this wonderful world we live in, visitors = customers = money. We like money!

Also, in the beginning, it was just “let’s consider everyone equal!”. But that would have been so lame! Some visitors or websites aren’t that bent on contributing to our Top500 fortune. So let’s make sure these are less equal than those who are more inclined to fill our bank account!

In the end, it all came down to:

(Easy peasy!)

Contenders

I started to look at HAProxy and its exquisite rate-limiting examples… which were just too overwhelming for my feeble brain!

Then I look at NGINX and its rate-limiting module… which just made me want to ask for more.

I could have looked at Apache and its rate-limiting module… but no (it wasn’t looked at kindly enough by the websites operator).

Given the requirements, I had no choice to go back to HAProxy and prepare for blinding sessions of headache.

Design

Generally speaking, throttling comes do:

From the category point of view, one may be interested to differentiate traffic based on:

The definition of the burst and rate limits for a given category of traffic corresponds to its Service Level Objective (SLO).

(All things re-considered, not so easy peasy…)

Leveraging HAProxy Stick Tables, the idea is to:

Throttling algorithm

Throttling can be achieved by delaying requests according to:

delay = N * 1/slo * min(1, rate/slo)

Where:

Obtaining the quantity N of parallel clients is non-trivial when multiple clients concurrently share multiple throttling buckets (at least in the context of HAProxy).

However, N can be estimated by observing the actual rate of the throttled traffic.

(Bear with me!)

Let’s start with N = 1. If, despite the proper delay being applied, the actual rate is still significantly above the targeted SLO - rate/slo >> 1.0 - one can reasonably assume the number of clients for the next iteration is actually proportional to rate/slo:

N' = rate/slo

At the next iteration, the delay will be adapted according to that new estimated quantity of clients and the actual rate brought back to the target SLO; rate/slo ~ 1.0.

This approach can be generalized to re-evaluating the quantity of clients at each iteration:

N' = N * rate/slo
if N' < 1: N' = 1

The throttling algorithm will thus dynamically adapt itself to the varying quatity of clients.

Given the stick table gpt0 32-bit unsigned integer register shall be used to track the quantity of clients, we have to account for its inherent lack of numeric precision by tracking the quantity M of milli-clients:

M = N * 1000

Also, in order to improve the stability of the N' ~ N feedback loop, the feedback shall be attenuated by a factor inversely proportional - 1/N - to the quantity of clients:

N' = N * (1 + 1/N * (rate-slo)/slo)

(This comes down to in-/de-creasing the quantity of clients only by unit steps)

Finally, in order to flatten the actual rate over the sampling period and thus lessen the negative impact of its oscillations on the feedback loop, some random jitter shall be added to the throttling delay:

delay = N * 1/slo * min(1, rate/slo) * random([1-;1+])

(Wicked!)

Delaying requests vs responses

In the context of throttling, one should differentiate download from upload patterns:

Thus, in order to reduce the memory pressure on HAProxy, one ought to:

(…)

Implementation

Ready? Warning! Here be dragons!…

First have a look at the entire configuration. You’ll notice that it relies in particular on custom LUA extensions. I’ll pass on the banalities of HAProxy configuration and concentrate on the bits-‘n-pieces of the matter at hand.

Patch required

At the time of writing, haproxy (2.0.8) requires a patch to allow GPT0 to be set - sc-set-gpt0 - to a value derived from an expression rather than a constant.

Stick table

A single stick stable table is used to keep track of the various throttling variables:

# Stick table
table tb_throttling type binary len 8 size 1m expire 30s store http_req_cnt,gpc0_rate(1s),gpc1_rate(1s),gpt0

Service Level Objectives (SLO) mapping

Each category of traffic is associated with its Service Level Objective (SLO), thanks to HAProxy Maps:

# SLO mapping
# ... source+website
http-request set-var(txn.slo) lua.thr-src-add(txn.website),map_str_int('/etc/haproxy/source+website-slo.map',-1)
# [...]
# ... owner
http-request set-var(txn.slo) var(txn.owner),map_str_int('/etc/haproxy/owner-slo.map',0) if { var(txn.slo) eq -1 }

For performances reasons, each SLO is an integer ID from 0 to 5999, where the millenia digit (0 to 5) indicates its categorization type:

SLO type <-> throttling key

In order to associate each connection to its corresponding throttling bucket, according to its throttling type and using a single stick table, a “universal” key is computed based on the parameters that define it (source, website and/or owner). This key is then XXH64-hashed to minimize memory usage and guarantee uniform lookup performances.

# SLO type <-> throttling key
http-request set-var(txn.thr_key) lua.thr-key(txn.slo,txn.bucket,txn.tenant),xxh64()

(Nifty, ain’t it?!)

SLO parameters

The parameters - burst and rate - of each SLO have been defined in the configuration itself:

# SLO (burst/rate-limit)
# ... 0 (global): burst=10000, rate=5000r/s
http-request set-var(txn.slo_burst) int(10000) if { var(txn.slo) eq 0 }
http-request set-var(txn.slo_rate) int(5000) if { var(txn.slo) eq 0 }
# [...]

Those could also be stored in a corresponding map, especially if many different SLOs are to be used.

Throttling

The current state of the throttling bucket matching the established key can now be retrieved:

# Throttling (tracking/state)
http-request track-sc0 var(txn.thr_key) table pe_throttling/tb_throttling
http-request set-var(txn.thr_count) sc_http_req_cnt(0)
http-request set-var(txn.thr_rate) sc_gpc0_rate(0)
http-request set-var(txn.thr_rate_ok) sc_gpc1_rate(0)
http-request set-var(txn.thr_clients) sc_get_gpt0(0)

And used to determine if (graceful) throttling is needed, in which case we either delay download requests or upload responses:

# Throttling
http-request set-var(txn.thr_do_rate) bool(1) if { var(txn.thr_count),sub(txn.slo_burst),add(txn.slo_rate) gt 0 }
acl acl_methods_upload method PUT POST
http-request lua.thr-delay txn.slo_rate txn.thr_rate txn.thr_clients if { var(txn.thr_do_rate) -m bool } !acl_methods_upload
http-response lua.thr-delay txn.slo_rate txn.thr_rate txn.thr_clients if { var(txn.thr_do_rate) -m bool } acl_methods_upload

Now that the appropriate delay has been applied, based on the estimated quantity of clients, we can re-estimate it and update the corresponding tracking register, gpt0:

# Re-estimate the quantity of clients
http-request set-var(txn.thr_clients) lua.thr-clients(txn.slo_rate,txn.thr_rate,txn.thr_clients)
http-request sc-set-gpt0(0) var(txn.thr_clients)

Finally, we update the rate tracking counters - gpc0 and gpc1 - before and after we 429-drop requests, if the actual rate gets two times bigger than what our objective is (we must give the graceful throttling algorithm some leeway to do its job):

# Rate-limiting
http-request set-var(txn.thr_over_rate) bool(1) if { var(txn.thr_rate_ok),div(2),sub(txn.slo_rate) gt 0 }
http-request sc-inc-gpc0(0) if { var(txn.thr_do_rate) -m bool }
http-request deny deny_status 429 if { var(txn.thr_over_rate) -m bool }
http-request sc-inc-gpc1(0) if { var(txn.thr_do_rate) -m bool }

(Yeah!!!)