At the risk of sounding like a crustacean cult member, I really hope the skeptics read this post. No hype, no drama, just slow, steady, high-performance incremental improvement in a crucially important area, without any feet blown off.
I feel bad for other/new systems languages; you get so much for the steeper learning curve with Rust (cult membership optional). And I think it’s genuinely difficult to reproduce Rust’s feature set.
I stole these graphs from a branch of that thread ffmpeg started on Twitter, the one where they were flaming rav1d vs dav1d performance to attack Rust generally.
I don't like the RiiR cult. I do like smart use of a safer language and think long-term it can get better than C++ with the right work.
I'm the person who is running the rav1d bounty, also involved with the Rustls project.
In many (most?) situations I think Rust is effectively as fast as C, but it's not a given. They're close enough that depending on the situation, one can be faster than the other.
If you told me I had to make a larger and more complex piece of code fast though, I'd pick Rust. Because of the rules that the Rust compiler enforces, it's easier to have confidence in the correctness of your code and that really frees you up when you're dealing with multi-threaded or algorithmic complexity. For example, you can be more confident about how little locking you can get away with, what the minimum amount of time is that you need to keep some memory around, or what exact state is possible and thus needs to be handled at any given time.
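To make that concrete, here is a tiny sketch (all names invented for illustration) of the two guarantees mentioned above: the set of possible states is an exhaustive enum, and the shared data lives inside the Mutex, so "forgot to lock" is a compile error rather than a data race.

```rust
use std::sync::Mutex;

// The possible states are spelled out, and every `match` must handle all of them.
enum ConnState {
    Handshaking { transcript: Vec<u8> },
    Open { session_key: [u8; 32] },
    Closed,
}

// The shared data lives *inside* the Mutex, so it cannot be touched without locking.
struct Shared {
    conn: Mutex<ConnState>,
}

fn describe(shared: &Shared) -> &'static str {
    // The guard (and therefore the lock) is held only for the duration of this function.
    let guard = shared.conn.lock().unwrap();
    match &*guard {
        ConnState::Handshaking { .. } => "still handshaking",
        ConnState::Open { .. } => "established",
        ConnState::Closed => "closed",
    }
}
```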
There are some things that make rav1d performance particularly challenging. For example, unlike Rustls, which was written from the start in Rust, rav1d is largely the result of C-to-Rust translation. This means the code is mostly not idiomatic Rust, and the Rust compiler is generally more optimized for idiomatic Rust. How much this particular issue contributes to the performance gap I don't know, but it's a suspect. To the extent that it's worth pursuing, one would probably want to figure out where being idiomatic matters most instead of blanket rewriting everything.
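As a toy illustration of the idiomatic-vs-translated difference (this is not code from rav1d): mechanical C-to-Rust translation tends to come out as raw pointers and index loops wrapped in unsafe, which the compiler treats much like the original C, while the idiomatic version hands the optimizer bounds and aliasing information for free.

```rust
// Roughly the shape a C-to-Rust translator emits: raw pointer, manual
// length, C-style loop, and `unsafe`.
unsafe fn sum_translated(xs: *const i32, len: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < len {
        total += unsafe { *xs.add(i) };
        i += 1;
    }
    total
}

// The idiomatic equivalent: the slice carries its own length, no unsafe,
// and the iterator gives the optimizer a clean loop to vectorize.
fn sum_idiomatic(xs: &[i32]) -> i32 {
    xs.iter().sum()
}
```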
For certain types of people, Rust has a way of just feeling better to use. After learning Rust I just can't imagine choosing to use C or C++ for a future project ever again. Rust is just too good.
I think it’s not a good indication of the success of the language.
In Rustls, TLS is implemented entirely in Rust. It uses aws-lc-rs [1] for cryptography, and aws-lc-rs uses assembly for core cryptographic routines, which are wrapped in some C code, which in turn exposes a Rust API that Rustls uses.
It's not practical right now to write high performance cryptographic code in a secure way (e.g. without side channels) in anything other than assembly.
[1] https://github.com/aws/aws-lc-rs
Regarding crypto operations, I know that as of now, for Rust projects, assembly is a must to have constant-time guarantees.
Maybe there could be a way, with intrinsics and a constant-time marker similar to unsafe, to use pure Rust.
In the meantime I think there is still too much C code.
It’s a great step in the right direction, by the way.
https://github.com/rustls/webpki
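To make "constant time" concrete, here is the kind of pure-Rust routine people write and why it isn't considered sufficient on its own (a sketch; `subtle` is a real crate, everything else is illustrative):

```rust
// A "constant-time looking" comparison: accumulate the XOR of every byte
// pair and only inspect the result at the end, so the running time does
// not depend on where the first mismatch is.
fn ct_eq(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for i in 0..32 {
        diff |= a[i] ^ b[i];
    }
    // The catch: nothing in the language guarantees the optimizer keeps this
    // branch-free; it is allowed to rewrite it as an early-exit compare.
    // That's why production crypto drops to assembly, or uses crates like
    // `subtle` that add optimization barriers.
    diff == 0
}
```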
I wish they included details on how they ran these benchmarks, like they did last year [1].
I'd like to take a look and try to understand why there's such a big difference in handshake performance. I wouldn't expect single-threaded handshake performance to vary so much between stacks; it should be mostly limited by crypto operations. Last time, they did say something about having a CPU optimization for handshaking that the other stack might not have, but this is on a different platform and they didn't mention that.
I'd also be interested in seeing what it looks like with OpenSSL 1.1.1, given the recent article from HAProxy about difficulties with OpenSSL 3 [2]
[1] https://www.memorysafety.org/blog/rustls-performance-outperf...
[2] https://www.haproxy.com/blog/state-of-ssl-stacks
https://rustls.dev/perf/2024-11-28-threading/
I'm not a Rust guy and I probably won't be any time soon, but Rustls is such an exciting project in my eyes. Projects like BoringSSL are cool and noble in their intentions, but having something that's not just a hygienic codebase but an implicitly safer one feels deeply satisfying. I'm eagerly looking forward to this finding its way into production use cases.
However, I tried rustls with Redis for my axum application and for some reason it was not working, even though my self-signed CA certificate was added to my system's local CA store.
After a lot of trying I gave up, then thought about trying native-tls, and it worked on the first go.
Was there no way to provide a custom CA store (that only included your self-signed one)?
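For reference, here is roughly what pointing rustls at a custom CA looks like (a sketch assuming the rustls 0.23 builder API and the rustls-pemfile crate; whether the Redis client exposes a hook to inject such a config is a separate question):

```rust
use std::{fs::File, io::BufReader, sync::Arc};

fn tls_config_with_custom_ca(ca_path: &str) -> Arc<rustls::ClientConfig> {
    // Load the self-signed CA into a dedicated root store instead of
    // relying on the system store.
    let mut roots = rustls::RootCertStore::empty();
    let mut reader = BufReader::new(File::open(ca_path).expect("open CA file"));
    for cert in rustls_pemfile::certs(&mut reader) {
        roots.add(cert.expect("parse PEM")).expect("add CA");
    }
    let config = rustls::ClientConfig::builder()
        .with_root_certificates(roots)
        .with_no_client_auth();
    Arc::new(config)
}
```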
TL;DR: OpenSSL's days seem to be coming to an end, but the Rustls C bindings are not production-ready yet.
There are two sets of C bindings. The first is C bindings for the native Rustls API. This should work great for anyone who wants to use Rustls from C, but it means writing to the Rustls API.
The second is C bindings that provide OpenSSL compatibility. This only supports a subset of the OpenSSL API (enough for Nginx but not yet HAProxy, for example), so not everything that uses OpenSSL will work with the Rustls OpenSSL compatibility layer yet. We are actively improving the amount of OpenSSL API surface that we support.
Would love to see compliance and accreditation coming through for native Rustls, like FIPS. That'll unlock a large potential market, which can in turn unlock other markets.
You can get FIPS by using one of the third-party back-end integrations, via aws-lc-rs.
The default cryptographic back-end for Rustls, aws-lc-rs, is FIPS compliant and integrated in a FIPS-compliant way so it's easy to get FIPS compliance with Rustls.
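For anyone curious what that looks like in code, here is a sketch of pinning the aws-lc-rs provider explicitly (rustls 0.23-style API; for FIPS you would additionally build rustls with its `fips` cargo feature so aws-lc-rs runs in FIPS mode; treat the exact feature and method names as assumptions to check against the docs):

```rust
use std::sync::Arc;

fn client_config(roots: rustls::RootCertStore) -> rustls::ClientConfig {
    let config = rustls::ClientConfig::builder_with_provider(Arc::new(
        rustls::crypto::aws_lc_rs::default_provider(),
    ))
    .with_safe_default_protocol_versions()
    .expect("supported TLS versions")
    .with_root_certificates(roots)
    .with_no_client_auth();
    // With the `fips` feature enabled, rustls can report whether the
    // resulting config actually runs on FIPS-mode crypto:
    // assert!(config.fips());
    config
}
```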
I wonder if replacing the encryption key every 6 hours would be a good use case for crossbeam-epoch, though this may be premature optimization, and that library requires writing unsafe code as far as I can tell.
I think it is worth optimizing; there's a noticeable, but small, dip in handshakes per second going from 1 to 2 threads.
If I were to optimize it, and the cycling rate is fixed and long, I would put the global storage behind a simple Mutex holding something like (expiration, old value, new value). On use, check a thread-local copy and use it if it hasn't expired; otherwise lock the global. If the global hasn't expired, copy it to the thread-local. If the global has expired, generate a new value, keeping the old one so that tickets from the previous generation are still valid.
You can use a simple Mutex, because contention is limited to the expiration window. You could generate a new ticket secret outside the lock to reduce the time spent while locked, at the expense of generating a ticket secret that's immediately discarded for each thread except the winning thread. Not a huge difference either way, unless you cycle tickets very frequently, or run a very large number of threads.
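For concreteness, here is a sketch of that scheme; the types and the 6-hour constant are placeholders, and a real implementation would generate the secret from a CSPRNG rather than zeroes:

```rust
use std::cell::RefCell;
use std::sync::Mutex;
use std::time::{Duration, Instant};

const ROTATION: Duration = Duration::from_secs(6 * 60 * 60);

// Placeholder ticket secret type.
#[derive(Clone)]
struct Secret([u8; 32]);

fn fresh_secret() -> Secret {
    Secret([0u8; 32]) // stand-in for random generation
}

// Global copy: expiry plus the previous and current secrets, so tickets
// issued under the previous secret stay decryptable after a rotation.
struct Keys {
    expires: Instant,
    old: Secret,
    current: Secret,
}

static GLOBAL: Mutex<Option<Keys>> = Mutex::new(None);

thread_local! {
    // Per-thread cache of (expiry, old, current); refreshed lazily.
    static LOCAL: RefCell<Option<(Instant, Secret, Secret)>> = RefCell::new(None);
}

fn ticket_secrets() -> (Secret, Secret) {
    let now = Instant::now();
    LOCAL.with(|cache| {
        // Fast path: the thread-local copy is still valid, no locking at all.
        if let Some((expires, old, current)) = cache.borrow().as_ref() {
            if now < *expires {
                return (old.clone(), current.clone());
            }
        }
        // Slow path: take the Mutex; contention only happens around expiry.
        let mut global = GLOBAL.lock().unwrap();
        let keys = global.get_or_insert_with(|| Keys {
            expires: now + ROTATION,
            old: fresh_secret(),
            current: fresh_secret(),
        });
        if now >= keys.expires {
            // Rotate: current becomes old, and a new current is generated.
            keys.old = std::mem::replace(&mut keys.current, fresh_secret());
            keys.expires = now + ROTATION;
        }
        *cache.borrow_mut() = Some((keys.expires, keys.old.clone(), keys.current.clone()));
        (keys.old.clone(), keys.current.clone())
    })
}
```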
AIUI epoch GC doesn't require Arc's atomic increment/decrement operations, which can be slower than naive loads (https://codeberg.org/nyanpasu64/cachebash), but at this point we're getting into nano-optimization territory.
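If someone did reach for crossbeam-epoch here, this is roughly the shape it takes, including the unsafe mentioned above (names invented; for a rotation measured in hours this is almost certainly overkill):

```rust
use std::sync::atomic::Ordering;
use crossbeam_epoch::{self as epoch, Atomic, Owned};

// Hypothetical shared ticket key. Readers never touch a reference count;
// they pin an epoch and load a pointer, so the hot path is a plain load.
struct KeySlot {
    key: Atomic<[u8; 32]>,
}

impl KeySlot {
    fn new(initial: [u8; 32]) -> Self {
        KeySlot { key: Atomic::new(initial) }
    }

    fn read(&self) -> [u8; 32] {
        let guard = epoch::pin();
        let shared = self.key.load(Ordering::Acquire, &guard);
        // unsafe: we promise the pointer is valid and only freed via the
        // epoch scheme in `rotate`.
        unsafe { *shared.deref() }
    }

    fn rotate(&self, new: [u8; 32]) {
        let guard = epoch::pin();
        let old = self.key.swap(Owned::new(new), Ordering::AcqRel, &guard);
        // The old key is freed only after every thread has exited the
        // epoch in which it could still be observed.
        unsafe { guard.defer_destroy(old) };
    }
}
```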