tbh "trait" feels like a very problematic name for that type, for this kind of educational purpose - `trait` is already an established concept and keyword: https://doc.rust-lang.org/book/ch10-02-traits.html
It's especially problematic because traits don't have memory behaviors like this article in most cases - by default they're unsized, because it's a description of behavior, not data, and you can't even use them as a struct field without extra work.
Like, replace "trait" in here with "box" and see how confusing it would be to be describing how you saved memory by boxing your box, because option doesn't box like many other languages do.
Are there any tools that help finding these kinds of things? Like a profiler that says "80% of the allocated bytes are objects of this type, with 95% of those having that field set to None"
It would be super useful since I think this is pretty likely to be surprising to many users. But the profiler would need to be a particularly-specific refinement of even that: you need to make it obvious that it's not "95% of your Option<Thing>s are None, and your Option<Things> are using X bytes", but that "95% of the bytes used for your Option<Thing>s are used for None versions." Otherwise you could just assume that your non-None ones are just that chunky, or you have that many of them... I haven't seen a profiler with that level of insight, unfortunately.
Perhaps because this feels like a fairly rust-specific gotcha. Especially if you're coming from languages where there's often not much syntactical distinction made between "this is a pointer because I don't want to be copying it" and "this is a pointer because it's optional."
For instance, it's not until now that I actually understood what the sibling comment about the Enum type size discrepancy lint meant: "This lint obviously cannot take the distribution of variants in your running program into account. It is possible that the smaller variants make up less than 1% of all instances, in which case the overhead is negligible and the boxing is counter-productive. Always measure the change this lint suggests." I had always accidentally read this backwards, thinking it meant something more to the effect of "if most of the instances are actually small, then it's not a problem here, but be aware that some of them are much larger so some of your calls to things with this could end up passing much larger types."
The closest I am aware of is clippy (`cargo clippy` in a standard Rust project will run it with default configurations).
Clippy is essentially a linter; and one of its checks catches cases where different enum variants have a significantly different size; with a suggestion to Box the larger variant.
Since this is just a linter, it doesn't actually have any knowledge of how frequently each variant is actually used. It also doesn't address the situation in the article at all.
I wonder from time to time whether you can decide the best “schema shape” beforehand, ie before you can run real workloads that stress the memory implications of such things. This can be very useful if you are trying to decide the boundary of some public facing API, but for whatever reason can’t run benchmarks (lack of impl, data, time, etc).
Without that, if you try to suggest a transformation like this when the schema is first conceived, it will likely be considered premature optimization.
Box<str> is still two words (length and pointer). That's better than the 3 words (length, pointer, capacity) for strings, but Box<String> is one word (not including the heap allocation).
It's especially problematic because traits don't have memory behaviors like this article in most cases - by default they're unsized, because it's a description of behavior, not data, and you can't even use them as a struct field without extra work.
Like, replace "trait" in here with "box" and see how confusing it would be to be describing how you saved memory by boxing your box, because option doesn't box like many other languages do.
Perhaps because this feels like a fairly rust-specific gotcha. Especially if you're coming from languages where there's often not much syntactical distinction made between "this is a pointer because I don't want to be copying it" and "this is a pointer because it's optional."
For instance, it's not until now that I actually understood what the sibling comment about the Enum type size discrepancy lint meant: "This lint obviously cannot take the distribution of variants in your running program into account. It is possible that the smaller variants make up less than 1% of all instances, in which case the overhead is negligible and the boxing is counter-productive. Always measure the change this lint suggests." I had always accidentally read this backwards, thinking it meant something more to the effect of "if most of the instances are actually small, then it's not a problem here, but be aware that some of them are much larger so some of your calls to things with this could end up passing much larger types."
Clippy is essentially a linter; and one of its checks catches cases where different enum variants have a significantly different size; with a suggestion to Box the larger variant.
Since this is just a linter, it doesn't actually have any knowledge of how frequently each variant is actually used. It also doesn't address the situation in the article at all.
Without that, if you try to suggest a transformation like this when the schema is first conceived, it will likely be considered premature optimization.