The Bitter Lesson of LLM Extensions

(sawyerhood.com)

64 points | by sawyerjhood 4 hours ago

6 comments

mkagenius 3 hours ago
> Skills are the actualization of the dream that was set out by ChatGPT Plugins .. But I have a hypothesis that it might actually work now because the models are actually smart enough for it to work.
and earlier Simon Willison argued[1] that Skills are even bigger deal than MCP.
But I do not see as much hype for Skills as it was for MCP - it seems people are in the MCP "inertia" and having no time to shift to Skills.
1. https://simonwillison.net/2025/Oct/16/claude-skills/
[-]
- sawyerjhood 23 minutes ago
  I agree with you. I don't see people hyping them and I think a big part of this is that we have sort of hit an LLM fatigue point right now. Also Skills require that your agent can execute arbitrary code which is a bigger buy-in cost if your app doesn't have this already.
- zby 2 hours ago
  I still don't get what is special about the skills directory - since like forever I instructed Claud Code - "please read X and do Y" - how skills are different from that?
  [-]
  - simonw 1 hour ago
    They're not. They are just a formalization of that pattern, with a very tiny extra feature where the model harness scans that folder on startup and loads some YAML metadata into the system prompt so it knows which ones to read later on.
  - mkagenius 2 hours ago
    The difference is that the code in the directory (and the markdown) are hardcoded and known to work beforehand.
    [-]
    - munk-a 32 minutes ago
      But we are still reliant on the LLM correctly interpreting the choice to pick the right skill. So "known to work" should be understood in the very limited context of "this sub-function will do what it was designed to do reliably" rather than "if the user asks to use this sub-function it will do was it was designed to do reliably".
      Skills feel like a non-feature to me. It feels more valuable to connect a user to the actual tool and let them familiarize themselves with it (and not need the LLM to find it in the future) rather than having the tool embedded in the LLM platform. I will carve out a very big exception of accessibility here - I love my home device being an egg timer - it's a wonderful egg timer (when it doesn't randomly play music) and I could buy an egg timer but having a hands-free egg timer is actually quite valuable to me while cooking. So I believe there is real value in making these features accessible through the LLM over media that the feature would normally be difficult to use in.
  - bavell 2 hours ago
    Not really special, just officially supported and I'm guessing how best to use it baked in via RL. Claude already knows how skills work vs learning your own home-rolled solution.
- fzysingularity 2 hours ago
  I definitely see the value and versatility of Claude Skills (over what MCP is today), but I find the sandboxed execution to be painfully inefficient.
  Even if we expect the LLMs to fully resolve the task, it'll heavily rely on I/O and print statements sprinkled across the execution trace to get the job done.
  [-]
  - mkagenius 2 hours ago
    > but I find the sandboxed execution to be painfully inefficient
    sandbox is not mandatory here. You can execute the skills on your host machine too (with some fidgeting) but it's a good practice and probably for the better to get in to the habit of executing code in an isolated environment for security purposes.
    [-]
    - munk-a 29 minutes ago
      The better practice is, if it isn't a one-off, being introduced to the tool (perhaps by an LLM) and then just running the tool yourself with structured inputs when it is appropriate. I think the 2015 era novice coding habit of copying a blob of twenty shell scripts off of stack overflow and blindly running them in your terminal (while also not good for obvious reasons) was better than that essentially happening but you not being able to watch and potentially learn what those commands were.
      [-]
      - fzysingularity 23 minutes ago
        I do think that if the agents can successfully resolve these tasks in a code execution environment, it can likely come up with better parametrized solutions with structured I/O - assuming these are workflows we want to run over and over again.
- robot-wrangler 2 hours ago
  Skills are like the "end-user" version of MCP at best, where MCP is for people building systems. Any other point of view raises a lot of questions.
  Aren't skills really just a collection of tagged MCP prompts, config resources, and tools, except with more lock-in since only Claude can use it? About that "agent virtual environment" that runs the scripts.. how is it customized, and.. can it just be a container? Aren't you going to need to ship/bundle dependencies for the tools/libraries those skills require/reference, and at that point why are we avoiding MCP-style docker/npx/uvx again?
  Other things that jump out are that skills are supposed to be "composable", yet afaik it's still the case that skills may not explicitly reference other skills. Huge limiting factors IMHO compared to MCP servers that can just use boring inheritance and composition with, you know, programming languages, or composition/grouping with namespacing and such at the server layer. It's unclear how we're going to extend skills, require skills, use remote skills, "deploy" reusable skills etc etc, and answering all these questions gets us most of the way back to MCP!
  That said, skills do seem like a potentially useful alternate "view" on the same data/code that MCP is covering. If it really catches on, maybe we'll see skill-to-MCP converters for serious users that want to be able do the normal stuff (like scaling out, testing in isolation, doing stuff without being completely attached to the claude engine forever). Until there's interoperability I personally can't see getting interested though
- CuriouslyC 2 hours ago
  Skills do something you could already do with folder level readme files and hyperlinks inside source, but in a vendor-locked-in way. Not a fan.
  [-]
  - pluralmonad 2 hours ago
    They are just text files though. I'm sensitive to vendor lock-in and do not perceive a standard folder structure and bare text files to be that.
    [-]
    - bavell 2 hours ago
      Yeah, the reason I like Skills better than MCP is specifically because skills are just plain text.
  - mkagenius 2 hours ago
    It's definitely not vendor locked. For instance, I have made it work with Gemini with Open-Skills[1].
    It is after all a collection of instructions and code that any other llm can read and understand and then do a code execution (via tool call / mcp call)
    1. Open-Skills: https://github.com/BandarLabs/open-skills
Der_Einzige 1 minute ago
Zero discussion around LLM sampling. How do you leave such a gaping hole in such a written piece? I know it's not AI cus AI wouldn't be that sloppy.
j2kun 1 hour ago
I don't see how "they improved the models" is related to the bitter lesson. You are still injecting human-level expertise (whether it is by prompts or a structured API) to compensate for the model's failures. A "bitter lesson" would be that the model can do better without any injection, but more compute power, than it could with human interference.
[-]
- idle_zealot 1 hour ago
  > A "bitter lesson" would be that the model can do better without any injection, but more compute power, than it could with human interference.
  This is what I expected the post to be about before clicking.
zby 1 hour ago
I believe that what we need is treating prompts as stochastic programs and using a special shell for calling them. Claude Code and Codex and other coding agents are like that - now everybody understands that they are not just coding assistants they are a general shell that can use LLM for executing specs. I would like to have this extracted from IDE tools - this is what I am working on in llm-do.
dsign 3 hours ago
I don't know, even ChatGPT 5.1 hallucinates API's that don't exist, though it's a step forward in that it also hallucinates the non existence of APIs that exist.
But I reckon that every time that humans have been able to improve their information processing in any way, the world has changed. Even if all we get is to have an LLM be right more times than it is wrong, the world will change again.
vessenes 3 hours ago
```
  > "If I could short MCP, I would"
```
I mean, MCP is hard to work with. But there's a very large set of things that we want a hardened interface to out there - if not MCP, it will be something very like it. In particular, MCP was probably overly complicated at the design phase to deal with the realities of streaming text / tokens back and forth live. That is, it chose not to abstract these realities in exchange for some nice features, and we got a lot of implementation complexity early.
To quote the Systems Bible, any working complex system is only the result of the growth of a working simple system -- MCP seems to me to be right on the edge of what you'd define as a "working simple system" -- but to the extent it's all torn down for something simpler, that thing will inevitably evolve to allow API specifications, API calls, and streaming interaction modes.
Anyway, I'm "neutral" on MCP, which is to say I don't love it. But I don't have a better system in mind, and crucially, because these models still need fine-tuning to deal properly with agent setups, I think it's likely here to stay.
[-]
- robot-wrangler 2 hours ago
  I always see the hard/complex criticism but find it confusing.. what is the perceived difficulty with MCP at the implementation level? (I do understand the criticism about exhausting tokens with tool-descriptions and stuff, but that's a different challenge)
  Doesn't seem like implementation could be more simple. Just JSON-RPC and API stuff. For example the MCP hello-world with python and FastMCP is practically 1-to-1 with a http/web flavored hello-world in flask
  [-]
  - vessenes 27 minutes ago
    There is a LOT under the surface. custom routes, bidirectional streaming choices (it started as a "local first" protocol). Implementing an endpoint from scratch is not easy, and the spec documentation moves very quickly, and generally doesn't have simple-to-digest updates for implementation.
    I haven't looked in a few months, so my information might be a bit out of date, but at the time - if you wanted to use a python server from the modelcontextprotocol GitHub, fine. If you wanted to, say, build a proxy server in rust or golang, you were looking at a set of half-implemented server implementations targeting two-versions-old MCP specs while clients like claude obscure even which endpoints they use for discovery.
    It's an immature spec, moderately complicated, and moving really quickly with only a few major 'subscribers' to the server side; I found it challenging to work with.
- zby 2 hours ago
  MCP is another middleware story - this always fails (hat tip Benetict Evans).