Limit concurrent TLS handshakes #136386
Conversation
Introduces a per-event-loop limit on the number of TLS handshakes running at once. When at the limit, subsequent TLS handshakes are delayed and processed in LIFO order.
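The mechanism described above could be sketched roughly as follows (a minimal plain-Java sketch with hypothetical names, not the PR's actual code, which integrates with Netty event loops): at most `maxInProgress` handshakes run at once, excess ones are parked on a stack, and the most recently arrived one is resumed first (LIFO), since the oldest delayed handshakes are the most likely to have already timed out on the client side.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-event-loop throttle sketch: admit up to maxInProgress
// handshakes, delay up to maxDelayed more on a LIFO stack, reject the rest.
class HandshakeThrottle {
    private final int maxInProgress;
    private final int maxDelayed;
    private int inProgress = 0;
    private final Deque<Runnable> delayed = new ArrayDeque<>(); // used as a stack

    HandshakeThrottle(int maxInProgress, int maxDelayed) {
        this.maxInProgress = maxInProgress;
        this.maxDelayed = maxDelayed;
    }

    /** Returns false if the handshake is rejected outright (both limits full). */
    boolean submit(Runnable startHandshake) {
        if (inProgress < maxInProgress) {
            inProgress++;
            startHandshake.run();
            return true;
        } else if (delayed.size() < maxDelayed) {
            delayed.push(startHandshake); // LIFO: newest waits at the top
            return true;
        } else {
            return false; // shed load: caller should close the connection
        }
    }

    /** Called when an in-progress handshake completes (success or failure). */
    void onHandshakeComplete() {
        Runnable next = delayed.poll(); // pops the most recently delayed one
        if (next != null) {
            next.run(); // the in-progress slot transfers to the next handshake
        } else {
            inProgress--;
        }
    }

    int inProgress() { return inProgress; }
    int delayedCount() { return delayed.size(); }
}
```

In this sketch the in-progress count is transferred directly to the next delayed handshake on completion, so the limit is never exceeded and no slot is ever leaked.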
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Hi @DaveCTurner, I've created a changelog YAML for you.

Oh well CI didn't like this very much at all :/
I think using a throttler per event loop is smart, but an absolute number of in-flight handshakes looks hard to tune. Ultimately we want handshakes not to exhaust the CPU (stating the obvious), hence the number of handshakes derives from CPU usage. Why not use a metric that is a better proxy for CPU and requires less tuning? For example, how much time we spent handshaking in the last N seconds. I think there are too many factors that can impact the number of connections: hardware, JDK version, CPU used for other needs, etc.
It's not obvious, and indeed that's not actually the goal at all. We want to avoid the situation where we're doing work on TLS handshakes that ultimately time out, because that work is a pointless waste of CPU. I added a comment in ead6075 to describe things in more detail, and to give some justification for the defaults I settled on.

You're right that the actual numbers depend on all sorts of factors, but a maximum handshake rate of 400Hz seems like a reasonable starting point. I can see some value in auto-tuning: these defaults assume we can achieve a handshake rate of at least 100Hz, but if the CPU is more than 75% busy with other things then that isn't a valid assumption. With a handshake rate lower than 100Hz we will start to hit client timeouts on the 1000

My hunch is that this isn't a case which happens (ignoring bugs that just clog up the transport worker thread, which are a separate thing) and it adds quite some complexity, so I'd rather wait until we see evidence it's needed before doing it. I could however be convinced that we want a different split than 1000+1000 by default. For instance if we went with a 200+1800 split then we'd be able to cope with a 20Hz average handshake rate without any
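As a sanity check on the figures in this discussion, the back-of-envelope arithmetic is straightforward: at roughly 2.5ms of CPU per handshake, one busy core sustains about 400 handshakes per second, and a backlog drains at whatever rate the node actually achieves. A tiny sketch (the drain-time scenarios below are my own illustrations, not figures from the PR):

```java
// Back-of-envelope arithmetic behind the handshake-rate discussion above.
class HandshakeBudget {
    /** Max sustainable handshake rate for one fully busy core. */
    static double maxRateHz(double cpuMillisPerHandshake) {
        return 1000.0 / cpuMillisPerHandshake;
    }

    /** Seconds to work through a backlog at a given achieved rate. */
    static double secondsToDrain(int backlog, double rateHz) {
        return backlog / rateHz;
    }
}
```

So 2.5ms per handshake gives 400Hz, and at an achieved rate of only 100Hz a backlog of 1000 delayed handshakes takes 10 seconds to drain, which is in the territory of typical client-side timeouts.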
No questions about the implementation and reasoning; some nits, but those can come later. The 200+1800 split sounds good. But I'm still not very clear on how this solution solves our existing problems.

The whole design is plausible based on the assumptions that a handshake costs 2.5ms of CPU time and that transport threads are scheduled more or less evenly across CPUs. That's reasonable when the system is not under stress, but neither assumption is strong. CPU time per handshake varies a lot as CPU usage approaches its limit, and it would be interesting to see how you chose 2.5ms per handshake. Also, our over-threading can lead to multiple transport threads running on a single core, so a throttler per thread might not be as good as a global throttler, especially on smaller nodes (2 CPUs, ~100 threads).

My thinking about the requirements: there are two parts, prevention and reaction. We try not to over-commit, but over-commit can happen due to lots of external factors, so we need to react when overload happens. I would expect we do these:
So exposing the number of connections as a setting might not be the right move. There should be a component that can tell how many connections are OK right now, looking at its surroundings, and change the limit dynamically. I'm afraid there is no simple solution (a single knob) that will work across our range of cases.
I still think you're trying to solve a different problem here. We're not trying to shed load in order to free up CPU for other work. We're only trying to avoid doing work that we can tell in advance will turn out to be pointless. |
Waiting for ClientHello and the per-thread throttler smell like YAGNI. They add complexity to the code and make it harder to read, but the benefits are not obvious unless measured in a real case. My hunch is that if we simply admit 400 (or 200) handshakes × transport_workers and reject the others, that would be good enough; accepted ones will be queued anyway. That would remove the whole action-listener dance: a simple atomic counter to accept or not accept a connection. I understand the theory behind the current proposal (thanks for explaining it on the call); I think a simple counter will work about the same.

And it's easier to think about and tweak. Just a number for how many handshakes are in flight. Too many? Clients see timeouts? CPU is high? Reduce the number. Too many rejections and CPU running free? Increase the number.
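The counter-based alternative being proposed here might look roughly like this (a sketch with hypothetical names, not code from the PR):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the simpler global-counter alternative: admit a handshake only
// while the number in flight is below the limit, otherwise reject.
class GlobalHandshakeGate {
    private final int limit;
    private final AtomicInteger inFlight = new AtomicInteger();

    GlobalHandshakeGate(int limit) {
        this.limit = limit;
    }

    /** Try to claim a handshake slot; false means close the connection. */
    boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false;
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Release the slot when the handshake completes or the channel closes. */
    void release() {
        inFlight.decrementAndGet();
    }
}
```

The compare-and-set loop keeps the check-then-increment atomic across threads, which a plain `incrementAndGet`-then-check would not (it could briefly overshoot the limit).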
```java
acceptChannelPredicate.setBoundAddress(boundAddress());
}

tlsHandshakeThrottleManager.start();
```
Should this start before binding the server? It might trip up some tests in future where a connection is accepted before the manager starts.
Starting the manager is just about registering settings handlers and metrics, and the tests all wait for everything to start before opening any connections, so I don't think it matters in practice, but it makes sense conceptually so I moved it in 6f1c4d3.
```java
 * {@link #handleHandshakeCompletion} to trigger either the processing of another handshake or a decrement of
 * {@link #inProgressHandshakesCount}.
 */
private static class HandshakeCompletionWatcher extends SimpleUserEventChannelHandler<SslCompletionEvent> {
```
`SslHandshakeCompletionEvent` is more precise; `SslCompletionEvent` includes close_notify too. A follow-up question from chat: does `SslHandshakeCompletionEvent` wait for another round-trip, or does it fire after ClientHello?
Thanks, yes, that does seem slightly better. Presumably someone could otherwise trick us into considering the handshake complete early by fabricating a pre-handshake close_notify?
Rather, it's future-proofing. It works fine because the handler is removed, but it's a loose relationship.
```java
protected void eventReceived(ChannelHandlerContext ctx, SslCompletionEvent evt) {
    ctx.pipeline().remove(HandshakeCompletionWatcher.this);
    if (evt.isSuccess()) {
        handshakeCompletePromise.onResponse(null);
```
Should we remove this handler from the pipeline too?
Isn't that what `ctx.pipeline().remove(HandshakeCompletionWatcher.this);` does?
```java
 * Promise which accumulates the messages received until we receive a full handshake. Completed when we receive a full
 * handshake, at which point all the delayed messages are pushed down the pipeline for actual processing.
 */
private final SubscribableListener<Void> handshakeStartedPromise = new SubscribableListener<>();
```
I would prefer the throttle manager to return a handshakeStartFuture, which then provides the handshakeCompletePromise, so it's clear that when the start fails there is nothing to complete. Something like:

```java
Future<Promise<HandshakeComplete>> canStartFuture = manager.register();
canStartFuture.addListener(f -> {
    if (f.isFailure()) {
        closeChannel();
    } else {
        promise = f.get();
        channel.pipeline.add(new SslHandler()); // don't add the handler until ready to proceed
        channel.pipeline.addAfter(ssl, HandshakeCompleteHandler(promise));
    }
});
```
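For illustration, this two-stage shape can be modeled with plain `CompletableFuture`s (a hypothetical sketch with no Netty dependency; the real code uses Netty futures and promises): the outer future completes when the handshake may start, handing back an inner promise the channel completes when the handshake finishes, and a failed outer future means nothing was ever started.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical two-stage registration: outer future = permission to start,
// inner future = handshake completion, to be completed by the channel.
class ThrottleManager {
    CompletableFuture<CompletableFuture<Void>> register() {
        CompletableFuture<Void> handshakeComplete = new CompletableFuture<>();
        // A real manager might delay or fail this completion under load;
        // here permission is granted immediately for illustration.
        return CompletableFuture.completedFuture(handshakeComplete);
    }
}
```

A caller would then chain `register().whenComplete(...)`, closing the channel on failure and only installing the SSL machinery (and eventually completing the inner future) on success.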
Hmm yeah it's a little more Netty-esque to do it that way, but then should we be delaying the creation of the whole rest of the pipeline too? Would be a bit weird to come back and just insert the SslHandler in the middle of a mostly-pre-existing pipeline. Likewise should we wait for the AcceptChannelHandler to accept the channel before setting anything else up?
I don't see problems with adding/removing a handler after the init phase. Ideally we should drop the connection before constructing the channel; that doesn't work with the current solution.
mhl-b left a comment:
LGTM. My previous comment about using a counter might not work, and LIFO is a good choice here. Left another comment about max_in_progress.
```java
 */
public static final Setting<Integer> SETTING_HTTP_NETTY_TLS_HANDSHAKES_MAX_IN_PROGRESS = intSetting(
    "http.netty.tls_handshakes.max_in_progress",
    1000, // See [NOTE: TLS Handshake Throttling] above
```
Can max_in_progress be greater than the number of transport workers? If we assume that a ClientHello produces a successful handshake event without doing a round-trip, then won't every in-progress handshake be completed in the same thread execution?

Maybe use only max_delayed, without max_in_progress?
This is the max per transport worker.
Well, same question. Can it be larger than 1 at any time for a worker?
```java
// watch for the completion of this channel's initial handshake at which point we can release one for another
// channel
.addLast(
    "initial-tls-handshake-completion-watcher",
    tlsHandshakeThrottle.newHandshakeCompletionWatcher(handshakeCompletePromise)
);
```
For my education:

- Is it possible to use `SslHandler#handshakeFuture()` for the completion watching, without the need for a separate handler?
- Instead of handling the specific `ClientHello` message, is it possible to accumulate generic bytes when there is no capacity for the TLS handling and forward them when a handshake from a different channel finishes?

Thanks!
- In principle yes, but we also have to watch for the channel closing before processing a handshake, and yet we want to remove all this machinery once the handshake completes. This is all surmountable (Netty promises do in fact support `removeListener()`), but IMO it's not really any simpler than just having a single handler to remove on completion.
- If we just forwarded generic bytes without first checking that they comprised a complete `ClientHello` message, we'd have to count that as an in-flight handshake, and then it would be possible for a malicious client to consume all the available in-flight handshake slots by sending partial handshakes and never completing them.
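For context on what "a complete ClientHello" means at the byte level: every TLS message arrives inside a 5-byte record header (content type, version, length), so a first-pass check is simply whether the buffered bytes cover at least one full handshake-type record. This sketch is my own illustration, not code from the PR, and it deliberately ignores the case of a ClientHello spanning multiple records:

```java
// Sketch: does buf start with one complete TLS record of handshake type?
// Real code must also cope with a ClientHello split across several records.
class TlsRecordCheck {
    static final int HANDSHAKE_CONTENT_TYPE = 0x16; // TLS record type "handshake"
    static final int RECORD_HEADER_LENGTH = 5;      // type(1) + version(2) + length(2)

    static boolean hasCompleteHandshakeRecord(byte[] buf) {
        if (buf.length < RECORD_HEADER_LENGTH) {
            return false; // header itself is still incomplete
        }
        if ((buf[0] & 0xFF) != HANDSHAKE_CONTENT_TYPE) {
            return false; // not handshake data at all
        }
        int recordLength = ((buf[3] & 0xFF) << 8) | (buf[4] & 0xFF); // big-endian
        return buf.length >= RECORD_HEADER_LENGTH + recordLength;
    }
}
```

Until this check passes, the buffered bytes cost the server only memory, not an in-flight handshake slot, which is exactly the property the reply above relies on.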
Closes ES-12457