Time to first token is a very important performance metric, as I found out using a Mac Studio M3 Ultra (which is quite slow in this respect).
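For anyone who wants to put a number on it, here is a minimal sketch of how time to first token could be measured against any streaming token source. The `measure_ttft` helper and the `fake_stream` generator are names made up for illustration; in practice you would pass in whatever streaming iterator your runtime exposes.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[Optional[float], int, float]:
    """Consume a token stream and return
    (seconds until the first token, token count, total seconds).
    The first value is None if the stream yields nothing."""
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    for _ in stream:
        if first is None:
            # Everything before this point is prompt processing (prefill).
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, count, total


def fake_stream(prefill_s: float, n_tokens: int, per_token_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: waits for 'prefill', then emits tokens."""
    time.sleep(prefill_s)
    for i in range(n_tokens):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, n, total = measure_ttft(fake_stream(0.2, 20, 0.01))
    print(f"TTFT {ttft:.3f}s, {n} tokens in {total:.3f}s")
```

Separating the first-token delay from the overall rate matters because a machine can decode quickly and still feel slow on long prompts.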
But 32GB at a TDP of 230W is perhaps not super interesting, especially because you probably want more than one card. It's a lot of heat. You could use the cards to heat a building, but heat pumps exist.
A lot of the TDP is reserved for running the shader units at full power. My RTX 3070 Ti only pulls ~110W of its 320W running CUDA inference on Gemma 26b and E4B.
It's not that it's reserving power, but rather that you hit some other bottleneck on a 3070 Ti before running into thermal limits. It's likely limited by either tensor core saturation or RAM throughput. Running the workload with Nvidia's profiling tools should make the bottleneck obvious.
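Before reaching for a full profiler, you can get a rough first look by sampling `nvidia-smi` while the inference job runs. This is a minimal sketch, assuming `nvidia-smi` is on your PATH; `parse_sample` and `sample_gpu` are names made up here. Note that `utilization.memory` is the share of time the memory controller was busy, so it is only a coarse hint and not a bandwidth measurement.

```python
import subprocess
from typing import Dict, List, Optional

# Query fields understood by `nvidia-smi --query-gpu`.
FIELDS = ["power.draw", "power.limit", "utilization.gpu", "utilization.memory"]


def _to_float(text: str) -> Optional[float]:
    """Convert one CSV cell to float; unsupported fields (e.g. '[N/A]') become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line: str) -> Dict[str, Optional[float]]:
    """Parse one line of `--format=csv,noheader,nounits` output into a dict."""
    values = [v.strip() for v in line.split(",")]
    return {name: _to_float(v) for name, v in zip(FIELDS, values)}


def sample_gpu() -> List[Dict[str, Optional[float]]]:
    """Take one sample per GPU. Requires an NVIDIA driver with nvidia-smi installed."""
    out = subprocess.run(
        ["nvidia-smi",
         f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_sample(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for i, gpu in enumerate(sample_gpu()):
        print(i, gpu)
```

If power draw sits well below the limit while memory utilization stays high, that would at least be consistent with the RAM throughput theory; a proper profiler run is still needed to confirm it.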
Don't think that's true. The drivers are bad (not sure terrible is fair, they have improved a lot), especially for older DirectX games and the like. But Vulkan support is pretty good, and that's all you really need for LLMs.
I would like one for the VRAM, but I am sure they will be unobtainable after the initial stock sells out, as I assume they were produced before the RAM prices went up.
They'll always have iGPUs, so whether or not they stay in the dGPU market depends mostly on whether or not people buy them. So they might not; the whole market seems to be moving to SoCs/APUs/whatever you want to call them.
The drivers often need per-game optimisations, and these will be missing, but I doubt Intel would nerf them. They'd just rely on you not paying a lot for RAM the game won't use.
I actually meant it in a different way. I would get it for local AI stuff, but being able to game on it would be a huge plus, otherwise I would need two different machines.
Intel looks like they'll leave the dedicated GPU space, so it's a bit doubtful if the drivers will ever catch up.
Or the makers intentionally nerf them, in order to better segment the markets/product lines?