CUDA Tile Open Sourced

(github.com)

149 points | by JonChesterfield 6 days ago

8 comments

fooblaster 4 hours ago
Let's see if developers sleepwalk into another trap to keep us locked into nvidia's hardware for the next decade.
[-]
- trueismywork 13 minutes ago
  TileIR is Apache licensed so AMD can implement it as well.
- the__alchemist 2 hours ago
  IMO it's not Nvidia's fault the competing APIs are high friction.
  [-]
  - flyingcoder 2 hours ago
    AMD screwed up so badly.
    [-]
    - fooblaster 2 hours ago
      That is true, but that doesn't mean Nvidia is not engaging in engineering to intentionally kneecap competition. Triton and other languages like that are a huge threat and CUtile is a means to combat that threat and prevent a hardware abstraction layer.
    - positron26 1 hour ago
      Hundreds of thousands of developers with access to a global communication network were not stopped by AMD. Why act like dependents or wait for some bright star of consensus unless the intent is really about getting the work for free?
      We don't have to wait for singular companies or foundations to fix ecosystem problems. Only the means of coordination are needed. https://prizeforge.com isn't there yet, but it is already capable of bootstrapping its own development. Matching funds, joining the team, or contributing on MuTate will all make the ball pick up speed faster.
opan 4 hours ago
>The CUDA Tile IR project is under the Apache License v2.0 with LLVM Exceptions
boywitharupee 6 hours ago
shouldn't the title be "CUDA Tile IR Open Sourced"?
xmorse 6 hours ago
Writing this in Mojo would have been so much easier
[-]
- 3abiton 6 hours ago
  It's barely gaining adoption though. The lack of buzz is a chicken and egg issue for Mojo. I fiddled shortly with it (mainly to get it working some of my pythong scripts), and it was suprisingly easy. It'll shoot up one day for sure if Latner doesn't give up early on it.
  [-]
  - ronsor 4 hours ago
    Isn't the compiler still closed source? I and many other ML devs have no interest in a closed-source compiler. We have enough proprietary things from NVIDIA.
    [-]
    - 3abiton 1 hour ago
      Yes, but Latner said multiple time it's closed until it matures (he apparently did this with llvm and swift too). So not unusal. His open source target is end of 2026. In all fairness, I have 0 doubts that he would deliver.
    - 0x696C6961 1 hour ago
      Yeah, the mojo pitch is so good, but I don't think anyone has an appetite for the potential fuckery that comes with a closed source platform.
  - boredatoms 2 hours ago
    I feel like its in AMD/Intel/G’s interest to pile a load of effort into (an open source) mojo
- llmslave2 3 hours ago
  I really want Mojo to take off. Maybe in a few years. The lack of an stdlib holds it back more than they think, and since their focus is narrow atm it's not useful for the vast majority of work.
- bigyabai 5 hours ago
  Use-cases like this are why Mojo isn't used in production, ever. What does Nvidia gain from switching to a proprietary frontend for a compiler backend they're already using? It's a legal headache.
  Second-rate libraries like OpenCL had industry buy-in because they were open. They went through standards committees and cooperated with the rest of the industry (even Nvidia) to hear-out everyone's needs. Lattner gave up on appealing to that crowd the moment he told Khronos to pound sand. Nobody should be wondering why Apple or Nvidia won't touch Mojo with a thirty-nine and a half foot pole.
  [-]
  - xmorse 4 hours ago
    Kernels now written in Mojo were all in hand written in MLIR like in this repo. They made a full language because that's not scalable, a sane language is totally worth it. Nvidia will probably end up buying them in a few years.
    [-]
    - saagarjha 38 minutes ago
      Nobody is writing MLIR by hand, what are you on about? There are so many MLIR frontends
    - bigyabai 2 hours ago
      I don't think Nvidia would acquire Mojo when the Triton compiler is open source, optimized for Nvidia hardware and considered a industry standard.
  - oedemis 1 hour ago
    how mojo with max optimize the process?
  - itsthecourier 3 hours ago
    what about a fourty feet pole? would it be viable?
- pjmlp 5 hours ago
  It would help if they were not so much macOS and Linux focused.
  Julia, Python GPU JITs work great on Windows, and many people only get Windows systems as default at work.
  [-]
  - saagarjha 37 minutes ago
    Approximately nobody writing high performance code for AI training is using Windows. Why should they target it?
  - bigyabai 1 hour ago
    I've commissioned a board of MENSA members to devise a workaround for this issue; they've identified two potential solutions.
    1) Install Linux
    2) Summon Chris Lattner to play you a sad song on the world's smallest violin in honor of the Windows devs that refuse to install WSL.
jauntywundrkind 7 hours ago
Will be interesting to see if Nvidia and other have any interest & energy getting this used by others, if there actually is an ecosystem forming around it.
Google leading XLA & IREE, with awesome intermediate representations, used by lots of hardware platforms, and backing really excellent Jax & Pytorch implementations, having tools for layout & optinization folks can share: they really build an amazing community.
There's still so much room for planning/scheduling, so much hardware we have yet to target. RISC-V has really interesting vector instructions, for example, and it seems like there's so much exploration / work to do to better leverage that.
Nvidia has partners everywhere now. Nvlink is used by Intel, AWS Tritanium, others. Yesterday the Groq exclusive license that Nvidia paid to give to Groq?! Seeing how and when CUDA Tiles emerges: will be interesting. Moving from fabric partnerships, up up up the stack.
[-]
- pjmlp 6 hours ago
  For NVidia it suffices this is a Python JIT allowing programming CUDA compute kernels directly in Python instead of C++, yet another way how Intel and AMD, alongside Khronos APIs, lag behind in great developer experiences for GPU compute programming.
  Ah, and Nsight debugging also supports Python CUDA Tiles debugging.
  https://developer.nvidia.com/blog/simplify-gpu-programming-w...
  [-]
  - saagarjha 36 minutes ago
    Nsight does not have a debugger.
  - Q6T46nT668w6i3m 6 hours ago
    Slang is a fantastic developer experience.
    [-]
    - Conscat 2 hours ago
      I work at Nvidia, and my team is using Slang for all of our (numerous and non-trivial) kernels because its automatic differentiation type system is so nice.
    - pjmlp 5 hours ago
      Especially when using the tooling from who created it, before offering it to Khronos as GLSL replacement, NVIDIA.
- Moosdijk 6 hours ago
  > There's still so much room for planning/scheduling, so much hardware we have yet to target
  this is nicely illustrated by this recent article:
  https://news.ycombinator.com/item?id=46366998
  [-]
  - saagarjha 35 minutes ago
    Wrong type of scheduling.
- turtletontine 7 hours ago
  On the RISC-V vector instructions, could you elaborate? Are the vector extensions substantially different from those in ARM or x86?
  [-]
  - adgjlsfhk1 6 hours ago
    it's fairly similar to Arm's sve2, but very different from the x86 side in that the instructions are variable length rather than fixed
toolboxg1x0 5 hours ago
NVIDIA tensor core units, where the second column in kernel optimization is producing a test suite.
CamperBob2 6 hours ago
Fun game: see how many clicks it takes you to learn what MLIR stands for.
I lost count at five or six. Define your acronyms on first use, people.
[-]
- saagarjha 40 minutes ago
  This is a GitHub repo for compiler engineers.
  [-]
  - CamperBob2 28 minutes ago
    Cool. This is a site for hackers of all stripes.
    [-]
    - saagarjha 5 minutes ago
      Yes, so given that you clearly had trouble figuring out what it was, maybe you could have shared with the class?
- ipnon 1 hour ago
  GPU programming definitely is not beginner friendly. There's a much higher learning curve than most open source projects. To learn basic Python you need to know about definitions and loops and variables, but to learn CUDA kernels you need to know maybe an order of magnitude more concepts to write anything useful. It's just not worth the time to cater to people who don't RTFM, the README would be twice as long and be redundant to the target audience of the library.
  [-]
  - CamperBob2 27 minutes ago
    That's the whole problem. I had to "R" multiple "FMs" before one of them bothered to define the acronym.
    Stop carrying water for poor documentation practice.
- roughly 5 hours ago
  The ol’ TMA problem.
- piskov 5 hours ago
  If only there was a chat-based app that you could ask questions to.
- fragmede 6 hours ago
  I did it in three. I selected it in your comment, and then had to hit "more" to get to the menu to ask Google about it, which brought me to https://www.google.com/search?q=MLIR which says: MLIR is an open-source compiler infrastructure project developed as a sub-project of the LLVM project. Hopefully
  Get better at computers and stop needing to be spoon-fed information, people!
  [-]
  - reactordev 6 hours ago
    In this day and age, asking questions about what something is is a minefield of “just ask AI” and “You should know this”. Let’s stop putting down people who ask questions and root out those that have shitty answers.
    [-]
    - ThrowawayTestr 5 hours ago
      Google is nearly 30 years old
      [-]
      - pjmlp 5 hours ago
        And we are not counting Yahoo, Altavista, Ask Jeeves, MSN,...
    - fragmede 5 hours ago
      I get why it feels frustrating when someone snaps "just google it." Nobody likes feeling dumb. That said, there’s a meaningful difference between asking a genuine question and demanding that every discussion be padded to accommodate readers who won’t even type four letters into a search bar. Expecting complete spoon-feeding in technical threads isn’t curiosity; it’s a refusal to engage. Learning requires participation.
      [-]
      - VTimofeenko 3 hours ago
        > Learning requires participation
        I won't argue, but there is a middle ground between articles consisting of pure JAFAs and this:
        > accommodate readers who won’t even type four letters into a search bar
        I think it helps if acronyms are expanded at least once or in a footnote so that the potential new reader can follow along and does not need to guess what ACMV^ means.
        ^: Awesome Combobulating Method by VTimofeenko, patent pending.
      - reactordev 3 hours ago
        Easy, if that’s how you feel, skip the comment and don’t engage.
        Telling people who want to have that participation and discussion to “RTFM” is not a good response.
        Often you’ll come across the authors on these posts that can shed direct, 1st person evidence, of what we’re talking about.
        So please, when someone asks “what is that?” Don’t respond with “RTFM”.
      - CamperBob2 5 hours ago
        You're posting a spirited defense of substandard technical writing. Just curious -- why is that?
        [-]
        guipsp 4 hours ago
        You cannot explain everything to everyone all the time. Besides, this is not even a paper. Sometimes you are not the target audience and have to put some words into Google.
  - iaebsdfsh 5 hours ago
    From Wikipedia: The name "Multi-Level Intermediate Representation" reflects the system’s ability to model computations at various abstraction levels and progressively lower them toward machine code.
  - poita66 6 hours ago
    And yet you didn’t tell us what it stands for, just what it is. The person you’re responding to was specifically talking about finding out what it stands for