[2019.11.19]
The other day, I was asked to come up with a solution to calm some websites' most zealous visitors down and have them behave more kindly towards their World Wide Web fellows.
(I say “other day” to let you believe this particular stroke of genius hit me overnight rather than through the course of several weeks of tedious work)
Well, at first, it was just “throw 429 at them jerks!”. But after some thinking, it seemed a bit harsh. In this wonderful world we live in, visitors = customers = money. We like money!
Also, in the beginning, it was just “let’s consider everyone equal!”. But that would have been so lame! Some visitors or websites aren’t that bent on contributing to our Top500 fortune. So let’s make sure these are less equal than those who are more inclined to fill our bank account!
In the end, it all came down to:
Gracefully throttle the traffic down without dropping requests
Be able to throttle categories of traffic differently
(Easy peasy!)
I started to look at HAProxy and its exquisite rate-limiting examples… which were just too overwhelming for my feeble brain!
Then I looked at NGINX and its rate-limiting module… which just left me asking for more.
I could have looked at Apache and its rate-limiting module… but no (it wasn’t looked at kindly enough by the websites’ operator).
Given the requirements, I had no choice but to go back to HAProxy and prepare for blinding sessions of headache.
Generally speaking, throttling comes down to:
multiple buckets which are used to track the requests rate for each category of traffic you’re interested in
a Time-to-Live (TTL) period - sliding window - during which the requests rate is observed for each throttling bucket
a burst count of initial requests that are able to pass through a throttling bucket unaffected (at any rate)
a target rate to gracefully slow requests down past the initial burst in a throttling bucket
ultimately dropping requests with HTTP status code 429 when not able to keep up with graceful throttling
From the category point of view, one may be interested to differentiate traffic based on:
its source (IP address)
the visited website
the visited website owner
The definition of the burst and rate limits for a given category of traffic corresponds to its Service Level Objective (SLO).
(All things re-considered, not so easy peasy…)
Leveraging HAProxy Stick Tables, the idea is to:
count the overall quantity of requests <-> burst limit; the http_req_cnt counter is readily available for that purpose
compute the requests rate, past the burst limit; we’ll use the first General Purpose Counter (gpc0_rate)
compute the successfully throttled (non-429-dropped) requests rate <-> rate limit; we’ll use the second General Purpose Counter (gpc1_rate)
keep track of the estimated quantity of clients sharing the same throttling bucket (see the throttling algorithm below); we’ll use the General Purpose Tag (gpt0)
Throttling can be achieved by delaying requests according to:
delay = N * 1/slo * min(1, rate/slo)
Where:
N is the quantity of parallel clients sharing the same throttling bucket
slo is the target rate
1/slo is the corresponding target delay
rate is the actual rate
min(1, rate/slo) is a damping factor, aiming at delaying requests only when needed (no need to delay the requests of clients that are not “challenging” the SLO)
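The delay formula above can be sketched in a few lines of plain Python (the function name and the example numbers are mine):

```python
def throttle_delay(n_clients, slo, rate):
    """Compute the throttling delay (in seconds) for one request.

    n_clients -- estimated quantity of parallel clients in the bucket
    slo       -- target rate, in requests per second
    rate      -- observed rate, in requests per second
    """
    # min(1, rate/slo) is the damping factor: clients below their SLO
    # (rate < slo) only get a fraction of the target delay.
    return n_clients * (1.0 / slo) * min(1.0, rate / slo)

# One client at twice its 10 r/s SLO gets the full 100 ms target delay;
# at half its SLO, the damping factor halves that delay.
print(throttle_delay(1, 10, 20))  # 0.1
print(throttle_delay(1, 10, 5))   # 0.05
```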
Obtaining the quantity N of parallel clients is non-trivial when multiple clients concurrently share multiple throttling buckets (at least in the context of HAProxy).
However, N can be estimated by observing the actual rate of the throttled traffic.
(Bear with me!)
Let’s start with N = 1. If, despite the proper delay being applied, the actual rate is still significantly above the targeted SLO - rate/slo >> 1.0 - one can reasonably assume the quantity of clients for the next iteration is actually proportional to rate/slo:
N' = rate/slo
At the next iteration, the delay will be adapted according to that new estimated quantity of clients and the actual rate brought back to the target SLO: rate/slo ~ 1.0.
This approach can be generalized to re-evaluating the quantity of clients at each iteration:
N' = N * rate/slo
if N' < 1: N' = 1
The throttling algorithm will thus dynamically adapt itself to the varying quantity of clients.
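A toy simulation of that feedback loop in plain Python (I model each client as sending one request per delay period, which is an assumption of mine, not something the setup guarantees):

```python
def estimate_clients(n, slo, rate):
    # N' = N * rate/slo, floored at 1 client
    return max(1.0, n * rate / slo)

# 4 real clients sharing a bucket with a 10 r/s SLO: starting from N = 1,
# the estimate converges in one iteration and then stays put.
n, slo, true_clients = 1.0, 10.0, 4
for _ in range(5):
    delay = n * (1.0 / slo)       # delay applied to every request
    rate = true_clients / delay   # one request per client per delay period
    n = estimate_clients(n, slo, rate)
print(n)  # 4.0
```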
Given that the stick table’s gpt0 32-bit unsigned integer register shall be used to track the quantity of clients, we have to account for its inherent lack of numeric precision by tracking the quantity M of milli-clients:
M = N * 1000
Also, in order to improve the stability of the N' ~ N feedback loop, the feedback shall be attenuated by a factor - 1/N - inversely proportional to the quantity of clients:
N' = N * (1 + 1/N * (rate-slo)/slo)
(This comes down to in-/de-creasing the quantity of clients only by unit steps)
Finally, in order to flatten the actual rate over the sampling period and thus lessen the negative impact of its oscillations on the feedback loop, some random jitter shall be added to the throttling delay:
delay = N * 1/slo * min(1, rate/slo) * random([1-ε; 1+ε])
(Wicked!)
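Putting the refinements together - milli-client bookkeeping, attenuated feedback, random jitter - one iteration could look like this in plain Python (a sketch under my own assumptions; in particular the 10% jitter half-width is mine, as is every name):

```python
import random

def throttle(m_clients, slo, rate, jitter=0.1):
    """One throttling iteration.

    m_clients -- quantity of milli-clients (M = N * 1000), as held in gpt0
    slo, rate -- target and observed rates, in requests per second
    jitter    -- half-width of the random factor (assumed: 10%)
    Returns (delay in seconds, updated quantity of milli-clients).
    """
    n = max(1.0, m_clients / 1000.0)
    # Damped delay, spread out with some random jitter
    delay = n * (1.0 / slo) * min(1.0, rate / slo)
    delay *= random.uniform(1.0 - jitter, 1.0 + jitter)
    # Attenuated feedback: N' = N * (1 + 1/N * (rate-slo)/slo),
    # which simplifies to N' = N + (rate-slo)/slo
    n_next = max(1.0, n + (rate - slo) / slo)
    return delay, int(n_next * 1000)
```

Starting from M = 1000 (one client) with a 10 r/s SLO observed at 40 r/s, the update yields M = 4000, matching the N' = N * rate/slo estimate above.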
In the context of throttling, one should differentiate download from upload patterns:
download patterns typically imply small requests and large responses
upload patterns typically imply large requests and small responses
Thus, in order to reduce the memory pressure on HAProxy, one ought to:
delay requests for download traffic; GET, HEAD, etc.
delay responses for upload traffic; POST, PUT, etc.
(…)
Ready? Warning! Here be dragons!…
First, have a look at the entire configuration. You’ll notice that it relies in particular on custom Lua extensions. I’ll pass on the banalities of HAProxy configuration and concentrate on the bits-‘n-pieces of the matter at hand.
HAProxy < 2.1 requires a patch to allow gpt0 to be set - sc-set-gpt0 - to a value derived from an expression rather than a constant. This patch was merged during the development cycle of the 2.1 version.
A single stick table is used to keep track of the various throttling variables:
# Stick table
table tb_throttling type binary len 8 size 1m expire 30s store http_req_cnt,gpc0_rate(1s),gpc1_rate(1s),gpt0
Each category of traffic is associated with its Service Level Objective (SLO), thanks to HAProxy Maps:
# SLO mapping
# ... source+website
http-request set-var(txn.slo) lua.thr-src-add(txn.website),map_str_int('/etc/haproxy/source+website-slo.map',-1)
# [...]
# ... owner
http-request set-var(txn.slo) var(txn.owner),map_str_int('/etc/haproxy/owner-slo.map',0) if { var(txn.slo) eq -1 }
For performance reasons, each SLO is an integer ID from 0 to 5999, where the thousands digit (0 to 5) indicates its categorization type:
0 (0 to 999): global
1 (1000 to 1999): per source (IP address)
2 (2000 to 2999): per website
3 (3000 to 3999): per source and website
4 (4000 to 4999): per owner
5 (5000 to 5999): per source and owner
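In other words, the categorization type is just the SLO ID’s thousands digit, which decodes trivially (plain Python; the type labels are mine):

```python
SLO_TYPES = {
    0: "global",
    1: "per source",
    2: "per website",
    3: "per source and website",
    4: "per owner",
    5: "per source and owner",
}

def slo_type(slo_id):
    # Integer division by 1000 extracts the thousands digit
    return SLO_TYPES[slo_id // 1000]

print(slo_type(3042))  # per source and website
```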
In order to associate each connection with its corresponding throttling bucket, according to its throttling type and using a single stick table, a “universal” key is computed from the parameters that define it (source, website and/or owner). This key is then XXH64-hashed to minimize memory usage and guarantee uniform lookup performance.
# SLO type <-> throttling key
http-request set-var(txn.thr_key) lua.thr-key(txn.slo,txn.bucket,txn.tenant),xxh64()
(Nifty, ain’t it?!)
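The same idea can be illustrated outside HAProxy (plain Python sketch; I substitute BLAKE2b for XXH64 since the latter isn’t in the standard library, and the field layout is my own guess at what such a key could contain):

```python
import hashlib

def throttling_key(slo, source="", website="", owner=""):
    """Build a 'universal' 8-byte key for a throttling bucket.

    Only the parameters relevant to the SLO's categorization type are
    included, so buckets of different types can never collide.
    """
    slo_type = slo // 1000      # thousands digit = categorization type
    parts = [str(slo)]
    if slo_type in (1, 3, 5):   # per-source variants
        parts.append(source)
    if slo_type in (2, 3):      # per-website variants
        parts.append(website)
    if slo_type in (4, 5):      # per-owner variants
        parts.append(owner)
    # 8-byte digest, akin to the 64-bit XXH64 hash in the stick table
    return hashlib.blake2b("|".join(parts).encode(), digest_size=8).hexdigest()
```

The same parameters always hash to the same key, while buckets that differ in type or in any relevant parameter get distinct keys.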
The parameters - burst and rate - of each SLO have been defined in the configuration itself:
# SLO (burst/rate-limit)
# ... 0 (global): burst=10000, rate=5000r/s
http-request set-var(txn.slo_burst) int(10000) if { var(txn.slo) eq 0 }
http-request set-var(txn.slo_rate) int(5000) if { var(txn.slo) eq 0 }
# [...]
Those could also be stored in a corresponding map, especially if many different SLOs are to be used.
The current state of the throttling bucket matching the established key can now be retrieved:
# Throttling (tracking/state)
http-request track-sc0 var(txn.thr_key) table pe_throttling/tb_throttling
http-request set-var(txn.thr_count) sc_http_req_cnt(0)
http-request set-var(txn.thr_rate) sc_gpc0_rate(0)
http-request set-var(txn.thr_rate_ok) sc_gpc1_rate(0)
http-request set-var(txn.thr_clients) sc_get_gpt0(0)
And used to determine if (graceful) throttling is needed, in which case we either delay download requests or upload responses:
# Throttling
http-request set-var(txn.thr_do_rate) bool(1) if { var(txn.thr_count),sub(txn.slo_burst),add(txn.slo_rate) gt 0 }
acl acl_methods_upload method PUT POST
http-request lua.thr-delay txn.slo_rate txn.thr_rate txn.thr_clients if { var(txn.thr_do_rate) -m bool } !acl_methods_upload
http-response lua.thr-delay txn.slo_rate txn.thr_rate txn.thr_clients if { var(txn.thr_do_rate) -m bool } acl_methods_upload
Now that the appropriate delay has been applied, based on the estimated quantity of clients, we can re-estimate it and update the corresponding tracking register, gpt0:
# Re-estimate the quantity of clients
http-request set-var(txn.thr_clients) lua.thr-clients(txn.slo_rate,txn.thr_rate,txn.thr_clients)
http-request sc-set-gpt0(0) var(txn.thr_clients)
Finally, we update the rate tracking counters - gpc0 and gpc1 - before and after we 429-drop requests, should the actual rate grow to twice our objective (we must give the graceful throttling algorithm some leeway to do its job):
# Rate-limiting
http-request set-var(txn.thr_over_rate) bool(1) if { var(txn.thr_rate_ok),div(2),sub(txn.slo_rate) gt 0 }
http-request sc-inc-gpc0(0) if { var(txn.thr_do_rate) -m bool }
http-request deny deny_status 429 if { var(txn.thr_over_rate) -m bool }
http-request sc-inc-gpc1(0) if { var(txn.thr_do_rate) -m bool }
(Yeah!!!)