We are scarily close to real-time personalization of video, which, if you agree with this NeurIPS paper [1], may lead to someone inadvertently creating “digital heroin”.
> We further urge the machine learning community to act proactively by establishing robust design guidelines, collaborating with public health experts, and supporting targeted policy measures to ensure responsible and ethical deployment
We’ve seen this play out before, when social media first came to prominence. I’m too old and cynical to believe anything will happen. But I really don’t know what to do about it at a personal level. Even if I refuse to engage with this content, am able to identify it, and keep my family away from it… it feels like a critical mass of people in my community/city/country are going to be engaging with it. It feels hopeless.
I tend to think this leads to censorship, and then to censorship at a broader level in the name of protecting our kids. See social networks, where you now have to give your ID card to protect kids.
The best approach in that case is to educate kids and people, automatically flag potentially harmful or disgusting content, and let the owner of the device set the level of filtering they want.
As with LLMs: they should be somewhat neutral in default mode, but they should never refuse a request if the user asks.
Otherwise the line between technology provider and content moderator becomes too blurry, and tomorrow Silicon Valley people are going to abuse that power (or be coerced by money or politics).
At a personal / parental level, time limits (like you can already set with a web-filtering device for TikTok) and a content policy would go a long way, along with spending as much time as possible with the kids and talking to them, so they don’t get dumber and dumber from short videos.
But I’m totally opposed to doing this at the public-policy level: “you now have the right to watch pornography, but only after you give your ID to prove you are an adult” (this is already the case in France, for example).
It can quickly become: “now, to watch or generate controversial content, you have to show ID.”
Too stupid to even contemplate, and it would only serve to remove what little control the individual has left over their own computer in the name of safety.
We need to take the threat of companies wresting control of our privacy and autonomy from us a lot more seriously, and not engage with ridiculous hyperbole from “ai ethics” types.
Having the ability to do real-time video generation on a single workstation GPU is mind-blowing.
I'm currently hosting a video generation website, also on a single GPU (with a queue), which is also something I didn't even think possible a few years ago (my show HN from earlier today, coincidentally: https://news.ycombinator.com/item?id=46388819). Interesting times.
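For anyone wondering about the single-GPU-with-a-queue part: the basic pattern is to serialize generation so only one job touches the GPU at a time, while the web layer just enqueues requests and polls for results. Here is a minimal sketch of that pattern in Python; the names (generate_video, submit, etc.) are mine for illustration, not the site's actual code:

    import queue
    import threading
    import uuid

    # One worker thread owns the GPU; web handlers only enqueue jobs and poll results.
    jobs = queue.Queue()
    results = {}

    def generate_video(prompt: str) -> bytes:
        # Placeholder for the actual model call (whatever pipeline runs on the GPU).
        raise NotImplementedError

    def gpu_worker():
        while True:
            job_id, prompt = jobs.get()  # blocks until a request arrives
            try:
                results[job_id] = generate_video(prompt)
            except Exception as exc:
                results[job_id] = exc
            finally:
                jobs.task_done()

    threading.Thread(target=gpu_worker, daemon=True).start()

    def submit(prompt: str) -> str:
        # Called by the web layer; returns a job id the client can poll.
        job_id = uuid.uuid4().hex
        jobs.put((job_id, prompt))
        return job_id

The nice property is that VRAM usage stays bounded to a single job no matter how many requests come in; everyone else just waits in line.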
Looks like there is some quality reduction, but nonetheless 2s to generate a 5s video on a 5090 for WAN 2.1 is absolutely crazy. Excited to see more optimizations like this moving into 2026.
Those streams of text are often conditioned on the prompts - people are using it to learn about new concepts, and as a hyper-personalised version of search. It can not only tell you about tools you didn't know existed, it can also show you how to use them.
I do like my buttons to stay where I left them - but that can be conditioned. Instead of GNOME "designers" telling me the button needs to be wide enough to hit with my left foot, I could tell the system I want this button to be small and in that corner - and add it to my prompt.
Nevertheless, it does seem that generation will fairly soon become fast enough to extend a video clip in real time, autoregressively, second by second. Integrated with a multimodal input model, you would be very close to an AI avatar that would be extremely compelling.
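The "extend a clip in real time" idea is essentially a loop that keeps generating the next short chunk conditioned on the last few frames plus the prompt. Rough sketch of the control loop only, with generate_next_chunk standing in for a hypothetical streaming video model (not any specific system):

    import time
    from collections import deque

    CHUNK_SECONDS = 1.0   # generate one second of video per step
    CONTEXT_FRAMES = 16   # trailing frames that condition the next chunk

    def generate_next_chunk(context_frames, prompt):
        # Placeholder for a hypothetical streaming/autoregressive video model.
        raise NotImplementedError

    def stream_video(prompt, display):
        context = deque(maxlen=CONTEXT_FRAMES)
        while True:
            start = time.monotonic()
            frames = generate_next_chunk(list(context), prompt)
            context.extend(frames)
            display(frames)
            elapsed = time.monotonic() - start
            # Real time only holds while generation outpaces playback.
            if elapsed > CHUNK_SECONDS:
                print(f"falling behind: {elapsed:.2f}s to produce {CHUNK_SECONDS}s of video")

The whole thing lives or dies on that last check: the moment a chunk takes longer to generate than it takes to play back, the avatar stalls, which is why per-clip latency numbers like the ones upthread matter so much.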
I mean, the baselines were deliberately worse than how anyone would actually use these to begin with (maybe noobs would), and the quoted number is only for the DiT steps, not the other encoding and decoding steps, which are still quite expensive. No actual use of FA4/CUTLASS-based kernels nor TRT at any point.
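If you want to verify the DiT-only vs. end-to-end point on your own setup, the usual approach is to time the denoising loop and the VAE decode separately with CUDA events. Sketch below: it assumes a diffusers-style pipeline exposing pipe.vae, and denoise_step is a hypothetical stand-in for one transformer/scheduler step, so adapt the names to whatever you actually run:

    import torch

    def timed(fn):
        # Time a GPU-side call with CUDA events; returns (result, milliseconds).
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        out = fn()
        end.record()
        torch.cuda.synchronize()
        return out, start.elapsed_time(end)

    def denoise_step(pipe, latents, cond, t):
        # Hypothetical helper: one DiT forward pass + scheduler update.
        raise NotImplementedError

    def profile(pipe, latents, cond, timesteps):
        dit_ms = 0.0
        for t in timesteps:
            latents, ms = timed(lambda: denoise_step(pipe, latents, cond, t))
            dit_ms += ms
        frames, vae_ms = timed(lambda: pipe.vae.decode(latents))
        print(f"DiT steps: {dit_ms:.0f} ms, VAE decode: {vae_ms:.0f} ms")
        return frames

That split usually makes it clear how much latency the decode step adds on top of the quoted DiT time, especially for video VAEs.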
Now if someone could release an optimization like this for the M4 Max I would be so happy. Last time I tried generating a video it was something like an hour for a 480p 5-second clip.
[1] https://neurips.cc/virtual/2025/loc/san-diego/poster/121952
I actually think we are already there with quality, but nobody is going to wait 10 minutes to do a task with video that takes 2 seconds with text.
If Sora/Kling/whatever ran cool locally 24/7 at 60FPS, would anyone ever build a UI? Or a (traditional) OS?
I think it's worth watching the scaling graph.
I like my buttons to stay where I left them.
It will never beat a GPU at parallelization, so optimizations like this are either not possible or, at best, less effective.