OpenAI, Google and Anthropic are struggling to build more advanced AI

(bloomberg.com)

364 points | by lukebennett 1 day ago

69 comments

thebigspacefuck 1 day ago
https://archive.ph/2024.11.13-100709/https://www.bloomberg.c...
LASR 6 hours ago
Question for the group here: do we honestly feel like we've exhausted the options for delivering value on top of the current generation of LLMs?
I lead a team exploring cutting edge LLM applications and end-user features. It's my intuition from experience that we have a LONG way to go.
GPT-4o / Claude 3.5 are the go-to models for my team. Every combination of technical investment + LLMs yields a new list of potential applications.
For example, combining a human-moderated knowledge graph with an LLM with RAG allows you to build "expert bots" that understand your business context / your codebase / your specific processes and act almost human-like similar to a coworker in your team.
If you now give it some predictive / simulation capability - eg: simulate the execution of a task or project like creating a github PR code change, and test against an expert bot above for code review, you can have LLMs create reasonable code changes, with automatic review / iteration etc.
Similarly there are many more capabilities that you can ladder on and expose into LLMs to give you increasingly productive outputs from them.
Chasing after model improvements and "GPT-5 will be PHD-level" is moot imo. When did you hire a PHD coworker and they were productive on day-0 ? You need to onboard them with human expertise, and then give them execution space / long-term memories etc to be productive.
Model vendors might struggle to build something more intelligent. But my point is that we already have so much intelligence and we don't know what to do with that. There is a LOT you can do with high-schooler level intelligence at super-human scale.
Take a naive example. 200k context windows are now available. Most people, through ChatGPT, type out maybe 1500 tokens. That's a huge amount of untapped capacity. No human is going to type out 200k of context. Hence why we need RAG, and additional forms of input (eg: simulation outcomes) to fully leverage that.
[-]
- afro88 5 hours ago
  > potential applications > if you ... > for example ...
  Yes there seems to be lots of potential. Yes we can brainstorm things that should work. Yes there is a lot of examples of incredible things in isolation. But it's a little bit like those youtube videos showing amazing basketball shots in 1 try, when in reality lots of failed attempts happened beforehand. Except our users experience the failed attempts (LLM replies that are wrong, even when backed by RAG) and it's incredibly hard to hide those from them.
  Show me the things you / your team has actually built that has decent retention and metrics concretely proving efficiency improvements.
  LLMs are so hit and miss from query to query that if your users don't have a sixth sense for a miss vs a hit, there may not be any efficiency improvement. It's a really hard problem with LLM based tools.
  There is so much hype right now and people showing cherry picked examples.
  [-]
  - jihadjihad 5 hours ago
    > Except our users experience the failed attempts (LLM replies that are wrong, even when backed by RAG) and it's incredibly hard to hide those from them.
    This has been my team's experience (and frustration) as well, and has led us to look at using LLMs for classifying / structuring, but not entrusting an LLM with making a decision based on things like a database schema or business logic.
    I think the technology and tooling will get there, but the enormous amount of effort spent trying to get the system to "do the right thing" and the nondeterministic nature have really put us into a camp of "let's only allow the LLM to do things we know it is rock-solid at."
    [-]
    - sdesol 4 hours ago
      > "let's only allow the LLM to do things we know it is rock-solid at."
      Even this is insanely hard in my opinion. The one thing that you would assume LLM to excel at is spelling and grammar checking for the English language, but even the top model (GPT-4o) can be insanely stupid/unpredictable at times. Take the following example from my tool:
      https://app.gitsense.com/?doc=6c9bada92&model=GPT-4o&samples...
      5 models are asked if the sentence is correct and GPT-4o got it wrong all 5 times. It keeps complaining that GitHub is spelled like Github, when it isn't. Note, only 2 weeks ago, Claude 3.5 Sonnet did the same thing.
      I do believe LLM is a game changer, but I'm not convinced it is designed to be public-facing. I see LLM as a power tool for domain experts, and you have to assume whatever it spits out may be wrong, and your process should allow for it.
      Edit:
      I should add that I'm convinced that not one single model will rule them all. I believe there will be 4 or 5 models that everybody will use and each will be used to challenge one another for accuracy and confidence.
      [-]
      - vidarh 1 hour ago
        I do contract work on fine-tuning efforts, and I can tell you that most humans aren't designed to be public-facing either.
        While LLMs do plenty of awful things, people make the most incredibly stupid mistakes too, and that is what LLMs needs to be benchmarked against. The problem is that most of the people evaluating LLMs are better educated than most and often smarter than most. When you see any quantity of prompts input by a representative sample of LLM losers, you quickly lose all faith in humanity.
        I'm not saying LLMs are good enough. They're not. But we will increasingly find that there are large niches where LLMs are horrible and error prone yet still outperform the people companies are prepared to pay to do the task.
        In other words, on one hand you'll have domain experts becoming expert LLM-wranglers. On the other hand you'll have public-facing LLMs eating away at tasks done by low paid labour where people can work around their stupid mistakes with process or just accepting the risk, same as they currently do with undertrained labor.
      - SimianSci 3 hours ago
        > "I see LLM as a power tool for domain experts, and you have to assume whatever it spits out may be wrong, and your process should allow for it."
        this gets to the heart of it for me. I think LLMs are an incredible tool, providing advanced augmentation on our already developed search capabilities. What advanced user doesnt want to have a colleague they can talk about their specific domain capacity with?
        The problem comes from the hyperscaling ambitions of the players who were the first in this space. They quickly hyped up the technology beyond want it should have been.
      - larodi 2 hours ago
        Those Apple engineers stated in a very clear tone:
        - every time a different result is produced.
        - no reasoning capabilities were categorically determined.
        So this is it. If you want LLM - brace for different results and if this is okay for your application (say it’s about speech or non-critical commands) then off you are.
        Otherwise simply forget this approach, and particularly when you need reproducible discreet results.
        I don’t think it gets any better than that and nothing so far implicated it will (with this particular approach to AGI or whatever the wet dream is)
        [-]
        rco8786 1 hour ago
        There’s another option here though. Human supervised tasks.
        There’s a whole classification of tasks where a human can look at a body of work and determine whether it’s correct or not in far less time than it would take for them to produce the work directly.
        As a random example, having LLMs write unit tests.
        verteu 2 hours ago
        (for reference: https://arxiv.org/pdf/2410.05229 )
        marcellus23 2 hours ago
        > Those Apple engineers
        Which Apple engineers? Yours is the only reference to the company in this comment section or in the article.
        [-]
        Agingcoder 1 hour ago
        See arxiv paper just above
      - kristianp 1 hour ago
        > It keeps complaining that GitHub is spelled like Github, when it isn't
        I feel like this is unfair. That's the only thing it got wrong? But we want it to pass all of our evals, even ones the perhaps a dictionary would be better at solving? Or even an LLM augmented with a dictionary.
      - malfist 2 hours ago
        I was using an LLM to help spot passive voice in my documents and it told me "We're making" was passive and I should change it to "we are making" to make it active.
        Leaving aside "we're" and "we are" are the same, it is absolutely active voice
        [-]
        sdesol 1 hour ago
        In the process of developing my tool, there are only 5 models (the first 5 in my models dropdown list) that I would use as a writing aide. If you used any other model, it really is a crapshoot with how bad they can be.
  - archiepeach 3 hours ago
    To be fair in the human-based teams I've worked with in startups I couldn't show you products with decent retention.
  - VeejayRampay 4 hours ago
    really agree with this and I think it's been the general experience: people wanting LLMs to be so great (or making money off them) kind of cherry picking examples that fit their narrative, which LLMs are good at because they produce amazing results some of the time like the deluxe broken clock that they are (they're right many many times a day)
    at the end of the day though, it's not exactly reliable or particularly transformative when you get past the party tricks
- crystal_revenge 6 hours ago
  I don't think we've even started to get the most value out of current gen LLMs. For starters very few people are even looking at sampling which is a major part of the model performance.
  The theory behind these models so aggressively lags the engineering that I suspect there are many major improvements to be found just by understanding a bit more about what these models are really doing and making re-designs based on that.
  I highly encourage anyone seriously interested in LLMs to start spending more time in the open model space where you can really take a look inside and play around with the internals. Even if you don't have the resources for model training, I feel personally understanding sampling and other potential tweaks to the model (lots of neat work on uncertainty estimations, manipulating the initial embedding the prompts are assigned, intelligent backtracking, etc).
  And from a practical side I've started to realize that many people have been holding on of building things waiting for "that next big update", but there a so many small, annoying tasks that can be easily automated.
  [-]
  - dr_kiszonka 10 minutes ago
    Would you have any suggestions on how to play with the internals of these open models? I don't understand LLMs well, and would love to spend some experimenting, but I don't know where to start. Are any projects more appropriate for neophytes?
  - ppeetteerr 4 hours ago
    The reason people are holding out is that the current generation of models are still pretty poor in many areas. You can have it craft an email, or to review your email, but I wouldn't trust an LLM with anything mission-critical. The accuracy of the generated output is too low be trusted in most practical applications.
    [-]
    - saalweachter 2 hours ago
      Any email you trust an LLM to write is one you probably don't need to send.
      [-]
      - Tagbert 1 hour ago
        Glib but the reality is that there are lots of cases where you can use an AI in writing but don’t need to entrust it with the whole job blindly.
        I mostly use AIs in writing as a glorified grammar checker that sometimes suggests alternate phrasing. I do the initial writing and send it to an AI for review. If I like the suggestions I may incorporate some. Others I ignore.
        The only times I use it to write is when I have something like a status report and I’m having a hard time phrasing things. Then I may write a series of bullet points and send that through an AI to flesh it out. Again, that is just the first stage and I take that and do editing to get what I want.
        It’s just a tool, not a creator.
  - dr_dshiv 5 hours ago
    > I've started to realize that many people have been holding on of building things waiting for "that next big update"
    I’ve noticed this too — I’ve been calling it intellectual deflation. By analogy, why spend now when it may be cheaper in a month? Why do the work now, when it will be easier in a month?
    [-]
    - vbezhenar 5 hours ago
      Why optimise software today, when tomorrow Intel will release CPU with 2x performance?
      [-]
      - ben_w 5 hours ago
        Back when Intel regularly gave updates with 2x performance increases, people did make decisions based on the performance doubling schedule.
      - sdenton4 5 hours ago
        Curiously, Moore's law was predictable enough over decades that you could actually plan for the speed of next year's hardware quite reliably.
        For LLMs, we don't even know how to reliably measure performance, much less plan for expected improvements.
        [-]
        mikeyouse 5 hours ago
        Moores law became less of a prediction and more of a product road map as time went on. It helped coordinate investment and expectations across the entire industry so everyone involved had the same understanding of timelines and benchmarks. I fully believe more investment would’ve ‘bent the curve’ of the trend line but everyone was making money and there wasn’t a clear benefit to pushing the edge further.
        [-]
        epicureanideal 4 hours ago
        Or maybe it pushed everyone to innovate faster than they otherwise would’ve? I’m very interested to hear your reasoning for the other case though, and I am not strongly committed to the opposite view, or either view for that matter.
      - throwing_away 5 hours ago
        Call Nvidia, that sounds like a job for AI.
    - jkaptur 4 hours ago
      https://en.wikipedia.org/wiki/Osborne_effect
  - creativenolo 4 hours ago
    Great & motivational comment. Any pointers on where to start playing with the internals and sampling?
    Doesn’t need to be comprehensive, I just don’t know where to jump off from.
  - kozikow 3 hours ago
    > "The theory behind these models so aggressively lags the engineering"
    The problem is that 99% of theories are hard to scale.
    I am not an expert, as I work adjacent to this field, but I see the inverse - dumbing down theory to increase parallelism/scalability.
  - creativenolo 3 hours ago
    > holding on of building things waiting for "that next big update", but there a so many small, annoying tasks that can be easily automated.
    Also we only hear / see the examples that are meant to scale. Startups typically offer up something transformative, ready to soak up a segment of a market. And that’s hard with the current state of LLMs. When you try their offerings, it’s underwhelming. But there is richer, more nuanced hard to reach fruits that are extremely interesting - but it’s not clear where they’d scale in and of themselves.
  - deegles 4 hours ago
    My big question is what is being done about hallucination? Without a solution it's a giant footgun.
  - dheera 1 hour ago
    Exactly, I think the current crop of models is capable of solving a lot of non-first-world problems. Many of them don't need full AGI to solve, especially if we start thinking outside Silicon Valley.
- senko 5 hours ago
  No.
  The scaling laws may be dead. Does this mean the end of LLM advances? Absolutely not.
  There are many different ways to improve LLM capabilities. Everyone was mostly focused on the scaling laws because that worked extremely well (actually surprising most of the researchers).
  But if you're keeping an eye on the scientific papers coming out about AI, you've seen the astounding amount of research going on with some very good results, that'll probably take at least several months to trickle down to production systems. Thousands of extremely bright people in AI labs all across the world are working on finding the next trick that boosts AI.
  One random example is test-time compute: just give the AI more time to think. This is basically what O1 does. A recent research paper suggests using it is roughly equivalent to an order of magnitude more parameters, performance wise. (source for the curious: https://lnkd.in/duDST65P)
  Another example that sounds bonkers but apparently works is quantization: reducing the precision of each parameter to 1.58 bits (ie only using values -1, 0, 1). This uses 10x less space for the same parameter count (compared to standard 16-bit format), and since AI operatons are actually memory limited, directly corresponds to 10x decrease in costs: https://lnkd.in/ddvuzaYp
  (Quite apart from improvements like these, we shouldn't forget that not all AIs are LLMs. There's been tremendous advance in AI systems for image, audio and video generation, interpretation and munipulation and they also don't show signs of stopping, and there's possibility that a new or hybrid architecture for the textual AI might be developed).
  AI winter is a long way off.
  [-]
  - limaoscarjuliet 5 hours ago
    Scaling laws are not dead. The number of people predicting death of Moore's law doubles every two years.
    - Jim Keller
    https://www.youtube.com/live/oIG9ztQw2Gc?si=oaK2zjSBxq2N-zj1...
    [-]
    - nyrikki 4 hours ago
      There are way too many personal definitions of what "Moore's Law" even is to have a discussion without deciding on a shared definition before hand.
      But Goodhart's law; "When a measure becomes a target, it ceases to be a good measure"
      Directly applies here, Moore's Law was used to set long term plans at semiconductor companies, and Moore didn't have empirical evidence it was even going to continue.
      If you say, arbitrarily decide CPU, or worse, single core performance as your measurement, it hasn't held for well over a decade.
      If you hold minimum feature size without regard to cost, it is still holding.
      What you want to prove usually dictates what interpretation you make.
      That said, the scaling law is still unknown, but you can game it as much as you want in similar ways.
      GPT4 was already hinting at an asymptote on MMLU, but the question is if it is valid for real work etc...
      Time will tell, but I am seeing far less optimism from my sources, but that is just anecdotal.
    - slashdave 40 minutes ago
      Moore's law is doomed. At some point you start reaching the level of individual atoms. This is just physics.
  - slashdave 41 minutes ago
    > Everyone was mostly focused on the scaling laws because that worked extremely well
    Also because it was easy, and expense was not the first concern.
- alangibson 5 hours ago
  I think you're playing a different game than the Sam Altmans of the world. The level of investment and profit they are looking for can only be justified by creating AGI.
  The > 100 P/E ratios we are already seeing can't be justified by something as quotidian as the exceptionally good productivity tools you're talking about.
  [-]
  - JumpCrisscross 5 hours ago
    > level of investment and profit they are looking for can only be justified by creating AGI
    What are you basing this on?
    IT outsourcing is a $500+ billion industry. If OpenAI et al can run even a 10% margin, that business alone justifies their valuation.
    [-]
    - HarHarVeryFunny 4 hours ago
      It seems you are missing a lot of "ifs" in that hypothetical!
      Nobody knows how things like coding assistants or other AI applications will pan out. Maybe it'll be Oracle selling Meta-licenced solutions that gets the lion's share of the market. Maybe custom coding goes away for many business applications as off-the-shelf solutions get smarter.
      A future where all that AI (or some hypothetical AGI) changes is work being done by humans to the same work being done by machines seems way too linear.
      [-]
      - JumpCrisscross 3 hours ago
        > you are missing a lot of "ifs" in that hypothetical
        The big one being I'm not assuming AGI. Low-level coding tasks, the kind frequently outsourced, are within the realm of being competitive with offshoring with known methods. My point is we don't need to assume AGI for these valuations to make sense.
        [-]
        HarHarVeryFunny 3 hours ago
        Current AI coding assistants are best at writing functions or adding minor features to an existing code base. They are not agentic systems that can develop an entire solution from scratch given a specification, which in my experience is more typcical of the work that is being outsourced. AI is a tool, whose full-cycle productivity benefit seems questionable. It is not a replacement for a human.
        [-]
        JumpCrisscross 3 hours ago
        > they are not agentic systems that can develop an entire solution from scratch given a specification, which in my experience is more typcical of the work that is being outsourced
        If there is one domain where we're seeing tangible progress from AI, it's in working towards this goal. Difficult projects aren't in scope. But most tech, especially most tech branded IT, is not difficult. Everyone doesn't need an inventory or customer-complaint system designed from scratch. Current AI is good at cutting through that cruft.
        [-]
        ehnto 1 hour ago
        There have been off the shelf solutions for so many common software use cases, for decades now. I think the reason we still see so much custom software is that the devil is always in the details, and strict details are not an LLMs strong suit.
        LLMs are in my opinion hamstrung at the starting gate in regards to replacing software teams, as they would need to be able to understand complex business requirements perfectly, which we know they cannot. Humans can't either. It takes a business requirements/integration logic/code generation pipeline and I think the industry is focused on code generation and not that integration step.
        I think there needs to be a re-imaging of how software is built by and for interaction with AI if it were to ever take over from human software teams, rather than trying to get AI to reflect what humans do.
        senko 3 hours ago
        There are a number of agentic systems that can develop more complex solutions. Just a few off the top of my head: Pythagora, Devin, OpenHands, Fume, Tusk, Replit, Codebuff, Vly. I'm sure I've missed a bunch.
        Are they good enough to replace a human yet? Questionable[0], but they are improving.
        [0] You wouldn't believe how low the outsourcing contractors' quality can go. Easily surpassed by current AI systems :) That's a very low bar tho.
  - gizajob 5 hours ago
    Yeah I keep thinking this - how is Nvidia worth $3.5Trillion for making code autocomplete for coders
    [-]
    - drawnwren 5 hours ago
      Nvidia was not the best example. They get to moon in the case that any AI exponential hits. Most others have less of a wide probability distribution.
      [-]
      - HarHarVeryFunny 4 hours ago
        I'm not sure about that. NVIDIA seems to stay in a dominant position as long as the race to AI remains intact, but the path to it seems unsure. They are selling a general purpose AI-accelerator that supports the unknown path.
        Once massively useful AI has been achieved, or it's been determined that LLMs are it, then it becomes a race to the bottom as GOOG/MSFT/AMZN/META/etc design/deploy more specialized accelerators to deliver this final form solution as cheaply as possible.
      - BeefWellington 5 hours ago
        Yeah they're the shovel sellers of this particular goldrush.
        Most other businesses trying to actually use LLMs are the riskier ones, including OpenAI, IMO (though OpenAI is perhaps the least risky due to brand recognition).
        [-]
        lokimedes 4 hours ago
        Or they become the Webvan/pets.com of the bubble.
        [-]
        zeusk 4 hours ago
        Nvidia is more likely to become CSCO or INTC but as far as I can tell, that's still a few years off - unless ofcourse there is weakness in broader economy that accelerates the pressure on investors.
- simonw 5 hours ago
  Right. I've been saying for a while that if all LLM development stopped entirely and we were stuck with the models we have right now (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1/2, Qwen 2.5 etc) we could still get multiple years worth of advances just out of those existing models. There is SO MUCH we haven't figured out about how to use them yet.
  [-]
  - dgfitz 2 hours ago
    LLMs use historic data to help create useful current data. It works well sometimes.
    I find that a human is able to solve a P=NP situation, and an LLM can’t quite yet do that. When they can the game changes.
  - niobe 1 hour ago
    > There is SO MUCH we haven't figured out about how to use them yet.
    I mean, it's pretty clear to me they're a potentially great human-machine interface, but trying to make LLMs - in their current fundamental form - a reliable computational tool.. well, at best it's an expensive hack, but it's just not the right tool for the job.
    I expect the next leap forward will require some orthogonal discovery and lead to a different kind of tool. But perhaps we'll continue to use LLMs as we knownthem now for what they're good at - language.
    [-]
    - simonw 21 minutes ago
      One of the biggest challenges in learning how to use and build on LLMs is figuring out how to work productively with a technology that - unlike most computers - is inherently unreliable and non-deterministic.
      It's possible, but it's not at all obvious and requires a slightly skewed way of looking at them.
- alach11 5 hours ago
  My team and I also develop with these models every day, and I completely agree. If models stall at current levels, it will take 10 (or more) years for us to capture most of the value they offer. There's so much work out there to automate and so many workflows to enhance with these "not quite AGI-level" models. And if peak model performance remains the same but cost continues to drop, that opens up vastly more applications as well.
- RayVR 49 minutes ago
  I am definitely not an expert, nor do I have inside information on the directions of research that these companies are exploring.
  Yes, existing LLMs are useful. Yes, there are many more things we can do with this tech.
  However, existing SOTA models are large, expensive to run, still hallucinate, fail simple logic tests, fail to do things a poorly trained human can do on autopilot, etc.
  The performance of LLMs is extremely variable, and it is hard to anticipate failure.
  Many potential applications of this technology will not tolerate this level of uncertainty. Worse solutions with predictable and well understood shortcomings will dominate.
- Lonestar1440 32 minutes ago
  No, we have not even scratched the surface of what current-gen LLMs can do for an organization which puts the correct data into them.
  If indeed the "GPT 5!" Arms race has calmed down, it should help everyone focus on the possible, their own goals, and thus what AI capabilities to deploy.
  Just as there won't be a "Silver Bullet" next gen model, the point about Correct Data In is also crucial. Nothing is 'free' not even if you pay a vendor or integrator. You, the decision making organization, must dedicate focus to putting data into your new AI systems or not.
  It will look like the dawn of original IBM, and mechanical data tabulation, in retrospect once we learn how to leverage this pattern to its full potential.
- brookst 5 hours ago
  > Question for the group here: do we honestly feel like we've exhausted the options for delivering value on top of the current generation of LLMs?
  Certainly not.
  But technology is all about stacks. Each layer strives to improve, right up through UX and business value. The uses for 1µm chips had not been exhausted in 1989 when the 486 shipped in 800nm. 250nm still had tons of unexplored uses when the Pentium 4 shipped on 90nm.
  Talking about scaling at the the model level is like talking about transistor density for silicon: it's interesting, and relevant, and we should care... but it is not the sole determinent of what use cases can be build and what user value there is.
- bloppe 4 hours ago
  > you can have LLMs create reasonable code changes, with automatic review / iteration etc.
  Nobody who takes code health and sustainability seriously wants to hear this. You absolutely do not want to be in a position where something breaks, but your last 50 commits were all written and reviewed by an LLM. Now you have to go back and review them all with human eyes just to get a handle on how things broke, while customers suffer. At this scale, it's an effort multiplier, not an effort reducer.
  It's still good for generating little bits of boilerplate, though.
  [-]
  - Aeolun 1 hour ago
    If the last 50 commits were reviewed by an AI and it took that long for an issue to happen I’d immediately mandate all PR’s are reviewed by an AI.
    [-]
    - bloppe 1 hour ago
      There's a difference between an issue being introduced and being noticed.
- ben_w 5 hours ago
  > Question for the group here: do we honestly feel like we've exhausted the options for delivering value on top of the current generation of LLMs?
  IMO we've not even exhausted the options for spreadsheets, let alone LLMs.
  And the reason I'm thinking of spreadsheets is that they, like LLMs, are very hard to win big on even despite the value they bring. Not "no moat" (that gets parroted stochastically in threads like these), but the moat is elsewhere.
- hamburga 30 minutes ago
  I think there's a ton to be tapped based on the current state of the art.
  As a developer, I'm making much more progress using the SOTA (Claude 3.5) as a Socratic interrogator. I'm brainstorming a project, give it my current thoughts, and then ask it to prompt me with good follow-up questions and turn general ideas into a specific, detailed project plan, next steps, open questions, and work log template. Huge productivity boost, but definitely not replacing me as an engineer. I specifically prompt it to not give me solutions, but rather, to just ask good questions.
  I've also used Claude 3.5 as (more or less) a free arbitrator. Last week, I was in a disagreement with a colleague, who was clearly being disingenuous by offering to do something she later reneged on, and evading questions about follow up. Rather than deal with organizational politics, I sent the transcript to Claude for an unbiased evaluation, and it "objectively" confirmed what had been frustrating me. I think there's a huge opportunity here to use these things to detect and call out obviously antisocial behavior in organizations (my CEO is intrigued, we'll see where it goes). Similarly, in our legal system, as an ultra-low-cost arbitrator or judge for minor disputes (that could of course be appealed to human judges). Seems like the level of reasoning in Claude 3.5 is good enough for that.
  My mental model is always "low-risk search". https://muldoon.cloud/2023/10/29/ai-commandments.html
- ericmcer 4 hours ago
  I have tried a few AI coding tools and always found them impressive but I don't really need something to autocomplete obvious code cases.
  Is there an AI tool that can ingest a codebase and locate code based on abstract questions? Like: "I need to invalidate customers who haven't logged in for a month" and it can locate things like relevant DB tables, controllers, services, etc.
- whiplash451 5 hours ago
  The main difference between GPT5 and a PhD-level new hire is that the new hire will autonomously go out, deliver and take on harder task with much fewer guidance than GPT5 will ever require. So much of human intelligence is about interacting with peers.
  [-]
  - ben_w 5 hours ago
    Human interaction with peers is also guidance.
    I don't know how many team meetings PhD students have, but I do know about software development jobs with 15 minute daily standups, and that length meeting at 120 words per minute for 5 days a week, 48 weeks per year of a 3 year PhD is 1.296.000 words.
    [-]
    - eastbound 3 hours ago
      I have 3 remote employees whose job is consistently as bad as LLM.
      That means employees who use LLM are, on average, recognizably bad. Those who are good enough, are also good enough to write the code manually.
      To the point I wonder whether this HN thread is generated by OpenAI, trying to create buzz around AI.
      [-]
      - ben_w 3 hours ago
        1. The person I'm replying to is hypothesising about a future, not yet existent, version, GPT5. Current quality limits don't tell you jack about a hypothetical future, especially one that may not ever happen because money.
        2. I'm not commenting on the quality, because they were writing about something that doesn't exist and therefore that's clearly just a given for the discussion. The only thing I was adding is that humans also need guidance, and quite a lot of it — even just a two-week sprint's worth of 15 minute daily stand-up meetings is 18,000 words, which is well beyond the point where I'd have given up prompting an LLM and done the thing myself.
- jeswin 45 minutes ago
  In my view, an escape hatch if we are truly stuck would be radical speed ups (like Cerebras) in compute time. If we get outputs in milli-seconds instead of seconds and at much lower costs, it would make backtracking viable. This won't allow AGI, but can make a new class of apps possible.
- rco8786 1 hour ago
  I think there’s a long way to go also. I think people expected that AI would eventually be like a “point and shoot” where you would tell it to go do some complicated task, or sillier yet, take over someone’s entire job.
  More realistically it’s like a really great sidekick for doing very specific mundane but otherwise non deterministic tasks.
  I think we’ll start to see AI permeate into nearly every back office job out there, but as a series of tools that help the human work faster. Not as one big brain that replaces the human.
- amelius 6 hours ago
  Yes, but literally anybody can do all those things. So while there will be many opportunities for new features (new ways of combining data), there will be few business opportunities.
  [-]
  - Miraste 5 hours ago
    HN always says this, and it's always wrong. A technical implementation that's easy, or readily available, does not mean that a successful company can't be built on it. Last year, people were saying "OpenAI doesn't have a moat." 15 years before that, they were saying "Dropbox is just a couple of chron jobs, it'll fail in a few months."
    [-]
    - amelius 4 hours ago
      > HN always says this
      The meaning here is different. What I'm saying is that big companies like OpenAI will always strive to make a generic AI, such that anyone can do basically anything using AI. The big companies therefore will indeed (like you say) have a profitable business, but few others will.
- yk 4 hours ago
  To a certain extent I think we get a better understanding what llms can do, and my estimation for the next ten years is more like best UI ever rather than llms will replace humanity. Now best UI ever is something that can certainly deliver a lot of value, 80% of all buttons in a car should be replaced by actually good voice control, and I think that is were we are going to see a lot of very interesting applications: Hey washing machine, this is two t-shirts and a jeans. (The washing machine can then figure out it's program by itself, I don't want to memorize the table in the manual.)
  [-]
  - lokimedes 4 hours ago
    To each their own, but I don’t look forward to having my kids yelling, a podcast in my ears and having to explain to my tumbler that wool must be spun at 1000 RPM. Humans have varying preferences when it comes to communication and sensing, making our machine interactions favor the extroverted talkative exhibitionists is really only one modality.
- machiaweliczny 4 hours ago
  Long context is a scam. Claude is best but it’s still gets lost with longer context
  [-]
  - bbor 4 hours ago
    I have no data, but I whole-heartedly agree. Well, perhaps not “scam”, but definitely oversold. One of my best undergrad professors taught me the adage “don’t expect a model to do what a human expert cannot”, and I think it’s still a good rule of thumb. Giving someone an entire book to read before answering your question might help, but it would help way, way more to give them a few paragraphs that you know are actually relevant.
  - cruffle_duffle 3 hours ago
    In my experience, the reality of long context windows doesn’t live up to the hype. When you’re iterating on something, whether it's code, text, or any document, you end up with multiple versions layered in the context. Every time you revise, those earlier versions stick around, even though only the latest one is the "most correct".
    What gets pushed out isn’t the last version of the document itself (since it’s FIFO), but the important parts of the conversation—things like the rationale, requirements, or any context the model needs to understand why it’s making changes. So, instead of being helpful, that extra capacity just gets filled with old, repetitive chunks that have to be processed every time, muddying up the output. This isn’t just an issue with code; it happens with any kind of document editing where you’re going back and forth, trying to refine the result.
    Sometimes I feel the way to "resolve" this is to instead go back and edit some earlier portion of the chat to update it with the "new requirements" that I didn't even know I had until I walked down some rabbit hole. What I end up with is almost like a threaded conversation with the LLM. Like, I sometimes wish these LLM chatbots explicitly treated the conversion as if it were threaded. They do support basically my use case by letting you toggle between different edits to your prompts, but it is pretty limited and you cannot go back and edit things if you do some operations (eg: attach a file).
    Speaking of context, it's also hard to know what things like ChatGPT add to it's context in the first place. Many of times I'll attach a file or something and discover it didn't "read" the file into it's context. Or I'll watch it fire up a python program it writes that does nothing but echo the file into it's context.
    I think there is still a lot of untapped potential in strategically manipulating what gets placed into the context window at all. For example only present the LLM with the latest and greatest of a document and not all the previous revisions in the thread.
    [-]
    - kian 4 minutes ago
      This is why I exclusively use the API to 'chat' with GPT -- complete control over the context presented.
    - dr_kiszonka 44 minutes ago
      I like the idea of context editing and threaded conversations. I think I have seen some alternative UIs on HN that support branching.
- HarHarVeryFunny 5 hours ago
  Sure, there's going to be a lot of automation that can be built using current GPT-4 level LLMs, even if they don't get much better from here.
  However, this is better thought of as "business logic scripting/automation", not the magic employee-replacing AGI that would be the revolution some people are expecting. Maybe you can now build a slightly less shitty automated telephone response system to piss your customers off with.
- anonzzzies 5 hours ago
  The current models are very powerful and we definitely didn't get most out of them yet. We are getting more and more out of them every week when we release new versions of our toolkits. So if this is it; please make it faster and take less energy. We'll be fine until the next AI spring.
- hartator 6 hours ago
  All of these hacks do sound like we are at that diminishing return point.
  [-]
  - namaria 6 hours ago
    It all just sounds to me like we're back at expert systems. Doesn't bode well...
    [-]
    - ianbutler 5 hours ago
      Honest question, how would you expect systems to get external knowledge etc without tools like the OP is suggesting?
      Action oriented through self exploration? What is your thought for how these systems integrate with the existing world?
      Why does the OP's suggested mode of integration make you think of those older systems?
  - brookst 5 hours ago
    Hey look, it's Gordon Moore visiting us from 2005! :)
- robrenaud 4 hours ago
  > For example, combining a human-moderated knowledge graph with an LLM with RAG allows you to build "expert bots" that understand your business context / your codebase / your specific processes and act almost human-like similar to a coworker in your team.
  I'd love to hear about this. I applied to YC WC 25 with research/insight/an initial researchy prototype built on top of GPT4+finetuning about something along this idea. Less powerful than you describe, but it also works without the human moderated KG.
- purple-leafy 1 hour ago
  Doesn’t sound cutting edge at all? Every man and his dog is doing a similar process
- soheil 23 minutes ago
  We have not exhausted what html can do either. LLMs not getting smarter is orthogonal to its currently unexplored search space.
- msabalau 5 hours ago
  There are all sorts of valuable things to explore and build with what we have already.
  But understanding how likely it is that we will (or will not) see a new models quickly and dramatically improve on what we have "because scaling" seems valuable context for everyone in ecosystem to make decisions.
- moogly 3 hours ago
  > you can have LLMs create reasonable code changes
  Could you define "code changes" because I feel that is a very vague accomplishment.
- 23B1 5 hours ago
  The user interface for LLMs is stuck in C:\
  That's where I'd focus.
  [-]
  - kenjackson 3 hours ago
    Voice for LLMs is surprisingly good. I'd love to see LLMs used in more systems like cars and in-home automation. Whatever cars use today and Alexa in the home simply are much worse than what we get with ChatGPT voice today.
- hluska 5 hours ago
  Nowhere near, but the market seems to have priced in that scaling would continue to have a near linear effect on capability. That’s not happening and that’s the issue the article is concerned with.
- bbor 4 hours ago
  Great question. Im very confident in my answer, even though it’s in the minority here: we’re not even close to exhausting the potential.
  Imagine that our current capabilities are like the Model-T. There remains many improvements to be made upon this passenger transportation product, with RAG being a great common theme among them. People will use chatbots with much more permissive interfaces instead of clicking through menus.
  But all of that’s just the start, the short term, the maturation of this consumer product; the really scary/exciting part comes when the technology reaches saturation, and opens up new possibilities for itself. In the Model-T metaphor, this is analogous to how highways have (arguably) transformed America beyond anyone’s wildest dreams, changing the course of various historical events (eg WWII industrialization, 60s & 70s white flight, early 2000s housing crisis) so much it’s hard to imagine what the country would look like without them. Now, automobiles are not simply passenger transportation, but the bedrock of our commerce, our military, and probably more — through ubiquity alone they unlocked new forms of themselves.
  For those doubting my utopian/apocalyptic rhetoric, I implore you to ask yourself one simple question: why are so many experts so worried about AGI? They’ve been leaving in droves from OpenAI, and that’s ultimately what the governance kerfluffle there was. Hinton, a Turing award winner, gave up $$$ to doom-say full time. Why?
  My hint is that if your answer involves less then a 1000 specialized LLMs per unified system, then you’re not thinking big enough.
  [-]
  - mrandish 1 hour ago
    > why are so many experts so worried about AGI?
    FYI, I find this line of reasoning to be unconvincing both logically and by counter-example ("why are so many experts so worried about the Y2K bug?")
    Personally, I don't find AI foom or AI doom predictions to be probable but I do think there are more convincing arguments for your position than you're making here.
    [-]
    - bbor 1 hour ago
      Fair enough, well put to both of these responses! I’m certainly biased, and can see how the events that truly scare me (after already assessing the technology on my own and finding it to be More Important Than Fire Or Electricity) don’t make very convincing arguments on their own.
      For us optimistic doomers, the AI conversation seems similar to the (early-2000s) climate change debate; we see a wave of dire warnings coming from scientific experts that are all-to-often dismissed, either out of hand due to their scale, or on the word of an expert in an adjacent-ish field. Of course, there’s more dissent among AI researchers than there was among climate scientists, but I hope you see where I’m coming from nonetheless — it’s a dynamic that makes it hard to see things from the other side, so-to-speak.
      At this point I’ve pretty much given up convincing people on HackerNews, it’s just cathartic to give my piece and let people take it or leave it. If anyone wants to bring the convo down from industry trends into technical details, I’d love to engage tho :)
  - fire_lake 3 hours ago
    > Hinton, a Turing award winner, gave up $$$ to doom-say full time
    This is a hint of something but a weak argument. Smart people are wrong all the time.
- EGreg 5 hours ago
  I want to stuff a transcript of a 3 hour podcast into some LLM API and have it summarize it by: segmenting by topic changes, keeping the timestamps, and then summarizing each segment.
  I wasn’t able to get it do it with Anthropic or OpenAI chat completion APIs. Can someone explain why? I don’t think the 200K token window actually works, is it looking sequentially or is it really looking at the whole thing at once or something?
- nonameiguess 3 hours ago
  Your hypothesis here is not exclusive of the hypothesis in this article.
  Name your platform. Linux. C++. The Internet. The x86 processor architecture. We haven't exhausted the options for delivering value on top of those, but that doesn't mean the developers and sellers of those platforms don't try to improve them anyway and might struggle to extract value from application developers who use them.
Havoc 0 minutes ago
The new Gemini just hit some good benchmarks.
This smells like it’s mostly based on OAI having a bit of bad luck with next model rather than a fundamental slowdown / barrier.
They literally just made a decent sized leap with o1
GiorgioG 0 minutes ago
[delayed]
iandanforth 7 hours ago
A few important things to remember here:
The best engineering minds have been focused on scaling transformer pre and post training for the last three years because they had good reason to believe it would work, and it has up until now.
Progress has been measured against benchmarks which are / were largely solvable with scale.
There is another emerging paradigm which is still small(er) scale but showing remarkable results. That's full multi-modal training with embodied agents (aka robots). 1x, Figure, Physical Intelligence, Tesla are all making rapid progress on functionality which is definitely beyond frontier LLMs because it is distinctly different.
OpenAI/Google/Anthropic are not ignorant of this trend and are also reviving or investing in robots or robot-like research.
So while Orion and Claude 3.5 opus may not be another shocking giant leap forward, that does not mean that there arn't giant shocking leaps forward coming from slightly different directions.
[-]
- airstrike 1 minute ago
  [delayed]
- sincerecook 5 hours ago
  > That's full multi-modal training with embodied agents (aka robots). 1x, Figure, Physical Intelligence, Tesla are all making rapid progress on functionality which is definitely beyond frontier LLMs because it is distinctly different.
  Cool, but we already have robots doing this in 2d space (aka self driving cars) that struggle not to kill people. How is adding a third dimension going to help? People are just refusing to accept the fact that machine learning is not intelligence.
  [-]
  - tick_tock_tick 1 hour ago
    I ride in self driving cares basically once a week in SF (Waymo). It's always felt safer then a Uber and makes ways less risky maneuvers.
  - warkdarrior 3 hours ago
    > Cool, but we already have robots doing this in 2d space (aka self driving cars) that struggle not to kill people. How is adding a third dimension going to help?
    If we have robots that operate in 3D, they'll be able to kill you not only from behind or from the side, but also from above. So that's progress!
  - akomtu 3 hours ago
    My understanding is that machine learning today is a lot like interpolation of examples in the dataset. The breakthrough of LLMs is due to the idea that interpolation in a 1024-dimensional space works much better than in a 2d space, if we naively interpolated English letters. All the modern transformers stuff is basically an advanced interpolation method that uses a large local neighborhood than just few nearest examples. It's like the Lanczos interpolation kernel, using a 1d analogy. Increasing the size of the kernel won't bring any gains, because the current kernel already nearly perfectly approximates an ideal interpolation (a full dataset DFT).
    However interpolation isn't reasoning. If we want to understand the motion of planets, we would start with a dataset of (x, y, z, t) coordinates and try to derive the law of motion. Imagine if someone simply interpolated the dataset and presented the law of gravity as an array of million coefficients (aka weights)? Our minds have to work with a very small operating memory that can hardly fit 10 coefficients. This constraint forces us to develop intelligence that compacts the entire dataset into one small differential equation. Btw, English grammar is the differential equation of English in a lot of ways: it tells what the local rules are of valid trajectories of words that we call sentences.
- joe_the_user 6 hours ago
  Tesla are all making rapid progress on functionality which is definitely beyond frontier LLMs because it is distinctly different
  Sure, that's tautologically true but that doesn't imply that beyondness will lead to significant leaps that offer notable utility like LLMs. Deep Learning overall has been a way around the problem that intelligent behavior is very hard to code and no wants to hire many, many coders needed to do this (and no one actually how to get a mass of programmers to actually be useful beyond a certain of project complexity, to boot). People take the "bitter lesson" to mean data can do anything but I'd say a second bitter lesson is that data-things are the low hanging fruit.
  Moreover, robot behavior is especially to fake. Impressive robot demos have been happening for decades without said robots getting the ability to act effectively in the complex, ad-hoc environment that human live in, IE, work with people or even cheaply emulate human behavior (but they can do choreographed/puppeteered kung fu on stage).
  [-]
  - hobs 6 hours ago
    And worth noting that Tesla faked a ton of its robot footage already, they might be making progress but their physical human robotics does not seem advanced at the moment.
    [-]
    - ben_w 4 hours ago
      Indeed.
      Even assuming the recent robot demo was entirely AI, the only single thing they demonstrated that would have been noteworthy was isolating one voice in a noisy crowd well enough to respond; everything else I saw Optimus do, has already been demonstrated by others.
      What makes the uncertainty extra sad, is that a remote controllable humanoid robot is already directly useful for work in hazardous environments, and we know they've got at least that… but Musk would rather it be about the AI.
- slashdave 36 minutes ago
  > Tesla are all making rapid progress on functionality
  The lack of progress with self driving seems to indicate that Tesla has a serious problem with scaling. The investment in enormous compute resources is another red flag (if you run out of ideas, just use brute force). This points to a fundamental flaw in model architecture.
- demosthanos 6 hours ago
  > that does not mean that there arn't giant shocking leaps forward coming from slightly different directions.
  Nor does it mean that there are! We've gotten into this habit of assuming that we're owed giant shocking leaps forward every year or so, and this wave of AI startups raised money accordingly, but that's never how any innovation has worked. We've always followed the same pattern: there's a breakthrough which causes a major shift in what's possible, followed by a few years of rapid growth as engineers pick up where the scientists left off, followed by a plateau while we all get used to the new normal.
  We ought to be expecting a plateau, but Sam Altman and company have done their work well and have convinced many of us that this time it's different. This time it's the singularity, and we're going to see exponential growth from here on out. People want to believe it, so they do, and Altman is milking that belief for all it's worth.
  But make no mistake: Altman has been telegraphing that he's eyeing the exit, and you don't eye the exit when you own a company that's set to continue exponentially increasing in value.
  [-]
  - lcnPylGDnU4H9OF 5 hours ago
    > Altman has been telegraphing that he's eyeing the exit
    Can you think of any specific examples? Not trying to express disbelief, just curious given that this is obviously not what he's intending to communicate so it would be interesting to examine what seemed to communicate it.
- rafaelmn 3 hours ago
  >There is another emerging paradigm which is still small(er) scale but showing remarkable results. That's full multi-modal training with embodied agents (aka robots). 1x, Figure, Physical Intelligence, Tesla are all making rapid progress on functionality which is definitely beyond frontier LLMs because it is distinctly different.
  Tesla is selling this view for almost a decade now in self-driving - how their car fleet feeding training data is going to make them leaders in the area. I don't find it convincing anymore
- knicholes 6 hours ago
  Once we've scraped the internet of its data, we need more data. Robots can take in video/audio data 24/7 and can be placed in your house to record this data by offering services like cooking/cleaning/folding laundry. Yeah, I'll pay $20k to have you record everything that happens in my house if I can stop doing dishes for five years!
  [-]
  - BobaFloutist 47 minutes ago
    There already exists a robot that does the dishes, it's called a dishwasher.
  - fldskfjdslkfj 5 hours ago
    There's plenty of video content being uploaded and streamed everyday, i find it hard to believe the more data will really change something, excluding very specialized tasks.
    [-]
    - nuancebydefault 5 hours ago
      The difference with the bot is that there is a fast feedback loop between action and content. No tagging required, real physics is the playground.
  - triyambakam 6 hours ago
    Or get a dishwashing machine?
  - hartator 6 hours ago
    Why 5 years?
    [-]
    - knicholes 3 hours ago
      No real reason. I just made it up. But that's kind of my reasonable expectation of longevity of a machine like a robotic lawnmower and battery life.
    - fifilura 6 hours ago
      Five years, that's all we've got.
      https://en.m.wikipedia.org/wiki/Five_Years_(David_Bowie_song...
    - twelve40 5 hours ago
      > OpenAI has announced a plan to achieve artificial general intelligence (AGI) within five years, an ambitious goal as the company works to design systems that outperform humans.
    - bredren 6 hours ago
      Because whatever org fills this space will be working on ARR.
    - exe34 6 hours ago
      that's when the robot takes his job and he can't afford the robot anymore.
  - fragmede 4 hours ago
    People go and live in a house to get recorded 24/7, to be on tv, for far more asnine situations, for way less money.
- eli_gottlieb 6 hours ago
  >The best engineering minds have been focused on scaling transformer pre and post training for the last three years
  The best minds don't follow the herd.
- mvdtnz 2 hours ago
  > The best engineering minds have been focused on scaling transformer pre and post training for the last three years because they had good reason to believe it would work, and it has up until now.
  Or because the people running companies who have fooled investors into believing it will work can afford to pay said engineers life-changing amounts of money.
summerlight 6 minutes ago
I guess this is somewhat expected? The current frontier models probably already have exhausted most of the entropy in the training data accumulated over decades and the new training data is very sparse. And the current mainstream architectures are not capable of sophisticated searching and planning, essential aspects for generating new entropy out of thin air. o1 was an interesting attempt to tackle this problem, but we probably still have a long way to go.
Animats 6 hours ago
"While the model was initially expected to significantly surpass previous versions of the technology behind ChatGPT, it fell short in key areas, particularly in answering coding questions outside its training data."
Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will. Search for unusual phrases in comments and for variable names. Often, something from Stack Overflow will match.
LLMs do search and copy/paste with idiom translation and some transliteration. That's good enough for a lot of common problems. Especially in the HTML/Javascript space, where people solve the same problems over and over. Or problems covered in textbooks and classes.
But it does not look like artificial general intelligence emerges from LLMs alone.
There's also the elephant in the room - the hallucination/lack of confidence metric problem. The curse of LLMs is that they return answers which are confident but wrong. "I don't know" is rarely seen. Until that's fixed, you can't trust LLMs to actually do much on their own. LLMs with a confidence metric would be much more useful than what we have now.
[-]
- dmd 6 hours ago
  > Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will.
  People who "follow" AI, as the latest fad they want to comment on and appear intelligent about, repeat things like this constantly, even though they're not actually true for anything but the most trivial hello-world types of problems.
  I write code all day every day. I use Copilot and the like all day every day (for me, in the medical imaging software field), and all day every day it is incredibly useful and writes nearly exactly the code I would have written, but faster. And none of it appears anywhere else; I've checked.
  [-]
  - ngai_aku 5 hours ago
    You’re solving novel problems all day every day?
    [-]
    - dmd 5 hours ago
      Pretty much, yes. My job is pretty fun; it mostly entails things like "take this horrible file workflow some research assistant came up with while high 15 years ago and turn it into a newer horrible file format a NEW research assistant came up with (also while high) 3 years ago" - and automate this in our data processing pipeline.
      [-]
      - fireflash38 3 hours ago
        If you've got clearly defined start input format and end output format, sure it seems that it would be a good candidate for heavy LLM use. But I don't know if that's most people.
        [-]
        dmd 3 hours ago
        If it were ever clearly defined or even consistent from input to input I would be overjoyed.
      - delusional 3 hours ago
        If I understand that correctly you're converting file formats? That's not exactly "novel"
        [-]
        llm_trw 2 hours ago
        This is exactly the type of novel work that llms are good at. It's tedious and has annoying internal logic, but that logic is quite flat and there are a million examples to generalise from.
        What they fail at is code with high cyclomatic complexity. Back in the llama 2 finetune days I wrote a script that would break down what each node in the control flow graph into its own prompt using literate programming and the results were amazing for the time. Using the same prompts I'd get correct code in every language I tried.
      - Der_Einzige 4 hours ago
        Due to WFH, the weed laws where tech workers live, and the fast tolerance building of cannabis in the body - I estimate that 10% of all code written by west coast tech workers is done “while high” and that estimate is likely low.
        [-]
        portaouflop 3 hours ago
        Do tech workers write better or worse code while high ?
  - wokwokwok 1 hour ago
    > even though they're not actually true for anything but the most trivial hello-world types of problems.
    Um.
    All the parent post said was:
    > then try to find similar code on the web, you usually will.
    Not identical code. Similar code.
    I think you're really stretching the domain of plausibility to suggest that any code you write is novel enough that you can't find 'similar' code on the internet.
    To suggest that code generated from a corpus that is not going to be 'similar' to the code from the corpus is just factually and unambiguously false.
    Of course, it depends on what you interpret 'similar' to mean; but I think it's not unfair to say a lot of code is composed of smaller parts of code that is extremely similar to other examples of code on the internet.
    Obviously you're not going to find an example similar to your entire code base; but if you're using, for example, copilot where you generate many small snippets of code... welll....
    [-]
    - dmd 1 hour ago
      Ok, yes. There are other pieces of code on the internet that use a for loop or an if statement.
      By that logic what you wrote was also composed that way. After all, you’ve used all words that have been used before! I bet even phrases like “that is extremely similar” and “generated from a corpus” and “unambiguously false”.
      Again, I really find it hard to believe that anyone could make an argument like the one you’re making who has actually used these tools in their work for hundreds of hours, vs. for a couple minutes here or there with made up problems.
  - tymscar 1 hour ago
    How often did you check?
- xpe 4 hours ago
  > LLMs do search and copy/paste with idiom translation and some transliteration.
  In general, this is not a good description about what is happening inside an LLM. There is extensive literature on interpretability. It is complicated and still being worked out.
  The commenter above might characterize the results they get in this way, but I would question the validity of that characterization, not to mention its generality.
ziofill 7 hours ago
I think it is a good thing for AI that we hit the data ceiling, because the pressure moves toward coming up with better model architectures. And with respect to a decade ago there's a much larger number of capable and smart AI researchers who are looking for one.
aresant 6 hours ago
Taking a hollistic view informed by a disruptive OpenAI / AI / LLM twitter habit I would say this is AI's "What gets measured gets managed" moment and the narrative will change
This is supported by both general observations and recently this tweet from an OpenAI engineer that Sam responded to and engaged ->
"scaling has hit a wall and that wall is 100% eval saturation"
Which I interpert to mean his view is that models are no longer yielding significant performance improvements because the models have maxed out existing evaluation metrics.
Are those evaluations (or even LLMs) the RIGHT measures to achieve AGI? Probably not.
But have they been useful tools to demonstrate that the confluence of compute, engineering, and tactical models are leading towards signifigant breathroughts in artificial (computer) intelligence?
I would say yes.
Which in turn are driving the funding, power innovation, public policy etc needed to take that next step?
I hope so.
(1) https://x.com/willdepue/status/1856766850027458648
[-]
- Bjorkbat 1 hour ago
  I agree that existing benchmarks are no longer useful now that there's basically nothing left in them that seems to stump LLMs.
  But when I hear that models are failing to meet expectations, I imagine what they're saying is that the researchers had some sort of eval in mind with room to grow and a target, and that the model in question failed to hit the target they had in mind.
  Honestly, problem with sentiments like these is on Twitter is that you can't tell if they're being sincere or just making a snarky, useless remark. Probably a mix of both.
- ActionHank 6 hours ago
  > Which in turn are driving the funding, power innovation, public policy etc needed to take that next step?
  They are driving the shoveling of VC money into a furnace to power their servers.
  Should that money run dry before they hit another breakthrough "AI" popularity is going to drop like a stone. I believe this to be far more likely an outcome than AGI or even the next big breakthrough.
Dr_Birdbrain 27 minutes ago
I don’t know how to square this with the recent statement by Dario Amodei (Anthropic CEO) on the Lex Fridman podcast saying that in his opinion the scaling hypothesis still has plenty of room to run.
[-]
- avs733 22 minutes ago
  Hype gonna hype. I’m not saying he is wrong I’m saying his opinion would be the same whether it’s true or not because his value depends on it being his opinion.
jmward01 5 hours ago
Every negative headline I see about AI hitting a wall or being over-hyped makes me think of the early 2000's with that new thing the 'internet' (yes, I know the internet is a lot older than that). There is little doubt in my mind that ten years from now nearly every aspect of life will be deeply connected to AI just like the internet took over everything in the late 90's and early 2000's and is now deeply connected to everything now. I'd even hazard to say that AI could be more impactful.
[-]
- woopwoop 9 minutes ago
  That's funny, because to me these headlines about how deep learning is over-hyped and hitting the wall remind me of headlines from ten years ago about how... deep learning is over-hyped and hitting the wall.
- brookst 5 hours ago
  And, as I've noted a couple of times in this thread, how many times have we heard that Moore's law is dead and compute has hit a wall?
  [-]
  - moffkalast 3 hours ago
    Well according to Nvidia you can just ignore Moore's law and start requiring people to install multi kilowatt outlets just for their cards. Who needs efficiency amirite?
    [-]
    - jmward01 1 hour ago
      I'm not an apple fan (as I type on a mac that I am forced to use) but I gotta applaud their push for power efficiency. NVIDIA actually -does- have a few cards they make that really improve power efficiency but then they generally hamstring them with a lack of memory. NVIDIA is really good at making their high-end cards the only viable choice but I think that will backfire on them as people like me, that value quiet, cool and efficient over 25% faster inference start taking any viable alternative that comes out.
- JohnMakin 4 hours ago
  It's strange to me that's your takeaway. The reason that the internet was overhyped in the 2000's is because it was and also heavily overvalued. It took a massive correction and seriously disruptive bubble burst to break the delusion and move on to something more sustainable.
  [-]
  - jmward01 3 hours ago
    I disagree that it was over hyped. It has transformed our society so much that I would argue it was vastly under-hyped. Sure, there were a lot of silly companies that sprang up and went away because they weren't sound, but so much of the modern economy is based on the internet that it is hard to say any business isn't somehow internet related today. You would be hard pressed to find any business anywhere that doesn't at least have a social media account. If 2000 was over-hyping things I just don't see it.
    [-]
    - JohnMakin 3 hours ago
      pets.com was valued at $400 million based almost completely on its domain name. That's the classic example. People were throwing buckets of money at any .com that resolved to a site and almost all of it failed. I'm not sure how that doesn't meet the definition of over-hyped. It feels very similar to now. Not even to mention - the web largely doesn't consist of .com sites anymore, it's mostly a few centralized sites and apps.
    - adamrezich 2 hours ago
      There were no smartphones in 2000, so the Web was overvalued at that point in time... until we all started carrying the Web in our pockets in the form of a portable rectangle.
      Given that this is the case, why can't this be analogously true of “AI” as well? There's plenty of reason to believe that we're hitting a wall, such that, to progress further, said wall must be overcome by means of one or more breakthroughs.
      [-]
      - jmward01 2 hours ago
        'smartphones' needed a reason to exist, the internet provided that. I doubt we would have had them without it. AI will drive whole new product categories that didn't exist that will then transform our society even more.
- mvdtnz 3 hours ago
  Even if you're right (you're not) whatever "AI" looks like in 20+ years will have virtually nothing in common with these stupid statistical word generators.
- akomtu 5 hours ago
  AI can be thought of as the 2nd stage of the creature that we call the Internet. The 1st stage, that we are so familiar with, is about gathering knowledge into a giant and somewhat organized library. This library has books on every subject imaginable, but its scale is so vast that no living human today can grasp it. This is why the originally connected network has started falling apart. Once this I becomes AI, all the books in the library will be melted together into one coherent picture. Once again, anyone anywhere on Earth will be able to access all the knowledge and our Babylon will stay for a little longer.
headcanon 6 hours ago
I don't see a problem with this, we were inevitably going to reach some kind of plateau with existing pre-LLM-era data.
Meanwhile, the existing tech is such a step change that industry is going to need time to figure out how to effectively use these models. In a lot of ways it feels like the "digitization" era all over again - workflows and organizations that were built around the idea humans handled all the cognitive load (basically all companies older than a year or two) will need time to adjust to a hybrid AI + human model.
[-]
- readyplayernull 4 hours ago
  > feels like the "digitization" era all over again
  This exactly. And as history shows, no matter how much effort the current big LLM companies do they won't be able to grasp the best uses for their tech. We will see small players developing it even further. I'm thankful for the legendary blindness of these anticompetitive behemoths. Less than 2 decades ago: IBM Watson.
kklisura 7 hours ago
Not sure if related or not, Sam Altman, ~12hrs ago: there is no wall [1]
[1] https://x.com/sama/status/1856941766915641580
[-]
- ablation 6 hours ago
  Breaking: Man says enigmatic thing to sustain hype and flow of money into his business.
  [-]
  - methodical 4 hours ago
    Ditto- I have a feeling the investors in his latest 2.3 quintillion dollar series Z round wouldn't be as happy if he'd have tweeted "there is a wall"
- levocardia 56 minutes ago
  My interpretation of that tweet is "there is no DATA wall" meaning "we have so much more data we can ingest: all of youtube, all of spotify, all of twitch, every real-time webcam feed on the internet, RL agents playing every video game on steam, and we can extract so much more learning per unit data than we are now" which seems plausible enough to me.
- moffkalast 3 hours ago
  Altman on twitter has always been less coherent than GPT2.
thousand_nights 7 hours ago
not long ago these people would have you believe that a next word predictor trained on reddit posts would somehow lead to artificial general superintelligence
[-]
- leosanchez 7 hours ago
  If you look around, People still believe that a next word predictor trained on reddit posts would somehow lead to artificial general superintelligence
  [-]
  - esafak 6 hours ago
    Because the most powerful solution to that is to have intelligence; a model that can reason. People should not get hung up on the task; it's the model(s) that generates the prediction that matters.
  - mrguyorama 6 hours ago
    People believed ELIZA was sentient too. I bet you could still get 10% or more people, today, to believe it is.
    [-]
    - 77pt77 3 hours ago
      ELIZA was probably more effective than most therapists.
      Definitely cheaper.
- in_a_society 3 hours ago
  Expecting AGI from Reddit training data is peak "pray Mr Babbage".
- SpicyLemonZest 7 hours ago
  I don't understand why you'd be so dismissive about this. It's looking less likely that it'll end up happening, but is it any less believable than getting general intelligence by training a blob of meat?
  [-]
  - JohnMakin 6 hours ago
    > is it any less believable than getting general intelligence by training a blob of meat?
    Yes, because we understand the rough biological processes that cause this, and they are not remotely similar to this technology. We can also observe it. There is no evidence that current approaches can make LLM's achieve AGI, nor do we even know what processes would cause that.
    [-]
    - kenjackson 3 hours ago
      > because we understand the rough biological processes that cause this
      We don't have a rough understanding of the biological processes that cause this, unless you literally mean just the biological process and not how it actual impacts learning/intelligence.
      There's no evidence that we (brains) have achieved AGI, unless you tautologically define AGI as our brains.
      [-]
      - JohnMakin 2 hours ago
        > We don't have a rough understanding of the biological processes that cause this,
        Yes we do. We know how neurons communicate, we know how they are formed, we have great evidence and clues as to how this evolved and how our various neurological symptoms are able to interact with the world. Is it a fully solved problem? no.
        > unless you literally mean just the biological process and not how it actual impacts learning/intelligence.
        Of course we have some understanding of this as well. There's tremendous bodies of study around this. We know which regions of the brain correlate to reasoning, fear, planning, etc. We know when these regions are damaged or removed what happens, enough to point to a region of the brain and say "HERE." That's far, far beyond what we know about the innards of LLM's.
        > here's no evidence that we (brains) have achieved AGI, unless you tautologically define AGI as our brains.
        This is extremely circular because the current definition(s) of AGI always define it in terms of human intelligence. Unless you're saying that intelligence comes from somewhere other than our brains.
        Anyway, the brain is not like a LLM, in function or form, so this debate is extremely silly to me.
        [-]
        kenjackson 1 minute ago
        > Yes we do. We know how neurons communicate, we know how they are formed, we have great evidence and clues as to how this evolved and how our various neurological symptoms are able to interact with the world. Is it a fully solved problem? no.
        It's not even close to fully solved. We're still figuring out basic things like the purpose of dreams. We don't understand how memories are encoded or even things like how we process basic emotions like happiness. We're way closer to understanding LLMs than we are the brain, and we don't understand LLMs all that well still either. For example, look at the Golden Gate Bridge work for LLMs -- we have no equivalent for brains today. We've done much more advanced introspection work on LLMs in this short amount of time than we've done on the human brain.
  - BobaFloutist 41 minutes ago
    Yes, because that already happened.
  - namaria 5 hours ago
    This is a bad comparison. Intelligence didn't appear in some human brain. Intelligence appeared in a planetary ecosystem.
    [-]
    - aniforprez 5 hours ago
      Also it took hundreds of millions of years to get here. We're basically living in an atomic sliver on the fabric of history. Expecting AGI with 5 of years of scraping at most 30 years of online data and the minuscule fraction of what has been written over the past couple of thousand years was always a pie-in-the-sky dream to raise obscene amounts of money.
      [-]
      - Zopieux 3 hours ago
        I can't believe this still needs to be laid down years after the start of the GPT hype. Still, thanks!
  - mvdtnz 3 hours ago
    I feel like accusing people of being "so dismissive" was strongly associated with NFTs and cryptocurrency a few years ago, and now it's widely deployed against anyone skeptical of very expensive, not very good word generators.
    [-]
    - SpicyLemonZest 9 minutes ago
      I'm not sure what point you're making. It's true that people, including myself, were dismissive of cryptocurrency a few years ago; I think it's clear at this point that we were wrong, and it's not actually the case that the industry is a Ponzi scheme propped up by scammers like FTX.
WorkerBee28474 7 hours ago
> OpenAI's latest model ... failed to meet the company's performance expectations ... particularly in answering coding questions outside its training data.
So the models' accuracies won't grow exponentially, but can still grow linearly with the size of the training data.
Sounds like DataAnnotation will be sending out a lot more LinkedIn messages.
[-]
- pton_xd 7 hours ago
  I thought I saw some paper suggesting that accuracy grows linearly with exponential data. If that's the case it's not a mystery why we'd be hitting a training wall. Not sure I got the right takeaway from that study, though.
  EDIT: here's the paper https://arxiv.org/abs/2404.04125
pluc 7 hours ago
They've simply run out of data to use to fabricate legitimate-looking guesses. They can't create anything that doesn't already exist.
[-]
- xpe 4 hours ago
  > They can't create anything that doesn't already exist.
  I probably disagree, but I don't want to criticize my interpretation of this sentence. Can you make your claim more precise?
  Here are some possible claims and refutations:
  - Claim: An LLM cannot output a true claim that it has not already seen. Refutation: LLMs have been shown to do logical reasoning.
  - Claim: An LLM cannot incorporate data that it hasn't been presented with. Refutation: This is an unfair standard. All forms of intelligence have to sense data from the world somehow.
- mtkd 3 hours ago
  And that is potentially only going to worsen as:
  1. more data gets walled-off as owners realise value
  2. stackoverflow-type feedback loops cease to exist as few people ask a public question and get public answers ... they ask a model privately and get an answer based on last visible public solutions
  3. bad actors start deliberately trying to poison inputs (if sites served malicious responses to GPTBot/CCBot crawlers only, would we even know right now?)
  4. more and more content becomes synthetically generated to the point pre-2023 physical books become the last-known-good knowledge
  5. goverments and IP lawyers finally catch up
  [-]
  - 77pt77 3 hours ago
    > more data gets walled-off as owners realize value
    What's amazing to me to is that no one is throwing accusations of plagiarism.
    I still think that if the "wrong people" had tried doing this they would have been obliterated by the courts.
- xpe 4 hours ago
  > They've simply run out of data
  Why do you think "they" have run out of data? First, to be clear, who do you mean by "they"? The world is filled with information sources (data aggregators for example), each available to some degree for some cost.
  Don't forget to include data that humans provide while interacting with chatbots.
- readyplayernull 7 hours ago
  Garbage-in was depleted.
  [-]
  - zombiwoof 6 hours ago
    Exactly
    And our current AI is just pattern based intelligence based off of all human intelligence, some of that not being real intelligent data sources
  - thechao 6 hours ago
    The great AI garbage gyre?
- whazor 5 hours ago
  But a LLM can certainly make up a lot information that never existed before.
  [-]
  - bob1029 3 hours ago
    I strongly believe this gets into an information theoretical constraint akin to why perpetual motion machines don't work.
    In theory, yes you could generate an unlimited amount of data for the models, but how much of it is unique or valuable information? If you were to compress all this generated training data using a really good algorithm, how much actual information remains?
    [-]
    - moffkalast 2 hours ago
      I make a lot of shitposts, how much of that is valuable information? Arguably not much. I doubt information value is a good way to estimate inteligence because most people's daily ramblings would grade them useless.
    - cruffle_duffle 3 hours ago
      I sure hope there is some bright eyed bushy tailed graduate students crafting up some theorem to prove this. Because it is absolutely a feedback loop.
      ... that being said I'm sure there is plenty of additional "real data" that hasn't been fed to these models yet. For one thing, I think ChatGPT sucks so bad at terraform because almost all the "real code" to train on is locked behind private repositories. There isn't much publicly available real-world terraform projects to train on. Same with a lot of other similar languages and tools -- a lot of that knowledge is locked away as trade secrets and hidden in private document stores.
      (that being said Sonnet 3.5 is much, much, much better at terraform than chatgpt. It's much better at coding in general but it's night and day for terraform)
- 77pt77 3 hours ago
  > They can't create anything that doesn't already exist.
  Just increase the temperature.
  [-]
  - dcl 1 hour ago
    That just makes it more likely to sample less likely outcomes from the same distribution. No real novelty.
sssilver 2 hours ago
One thing that makes the established AIs less ideal for my (programming) use-case is that the technologies I use quickly evolve past whatever the published models "learn".
On the other hand, a lot of these frameworks and languages have relatively decent and detailed documentation.
Perhaps this is a naive question, but why can't I as a user just purchase "AI software" that comes with a large pre-trained model to which I can say, on my own machine, "go read this documentation and help me write this app in this next version of Leptos", and it would augment its existing model with this new "knowledge".
irrational 7 hours ago
> The AGI bubble is bursting a little bit
I'm surprised that any of these companies consider what they are working on to be Artificial General Intelligences. I'm probably wrong, but my impression was AGI meant the AI is self aware like a human. An LLM hardly seems like something that will lead to self-awareness.
[-]
- jedberg 7 hours ago
  Whether self awareness is a requirement for AGI definitely gets more into the Philosophy department than the Computer Science department. I'm not sure everyone even agrees on what AGI is, but a common test is "can it do what humans can".
  For example, in this article it says it can't do coding exercises outside the training set. That would definitely be on the "AGI checklist". Basically doing anything that is outside of the training set would be on that list.
  [-]
  - norir 6 hours ago
    Here is an example of a task that I do not believe this generation of LLMs can ever do but that is possible for a human: design a Turing complete programming language that is both human and machine readable and implement a self hosted compiler in this language that self compiles on existing hardware faster than any known language implementation that also self compiles. Additionally, for any syntactically or semantically invalid program, the compiler must provide an error message that points exactly to the source location of the first error that occurs in the program.
    I will get excited for/scared of LLMs when they can tackle this kind of problem. But I don't believe they can because of the fundamental nature of their design, which is both backward looking (thus not better than the human state of the art) and lacks human intuition and self awareness. Or perhaps rather I believe that the prompt that would be required to get an LLM to produce such a program is a problem of at least equivalent complexity to implementing the program without an LLM.
    [-]
    - Xenoamorphous 5 hours ago
      > Here is an example of a task that I do not believe this generation of LLMs can ever do but that is possible for a human
      That’s possible for a highly intelligent, extensively trained, very small subset of humans.
      [-]
      - hatefulmoron 5 hours ago
        If you took the intersection of every human's abilities you'd be left with a very unimpressive set.
        That also ignores the fact that the small set of humans capable of building programming languages and compilers is a consequence of specialization and lack of interest. There are plenty of humans that are capable of learning how to do it. LLMs, on the other hand, are both specialized for the task and aren't lazy or uninterested.
      - luckydata 4 hours ago
        does it mean people that can build languages and compilers are not humans? What is the point you're trying to make?
        [-]
        fragmede 4 hours ago
        It means that's a really high bar for intelligence, human or otherwise. If AGI is "as good as a human, and the test is a trick task that most humans would fail at (especially considering the weasel requirement that it additionally has to be faster), why is that considered a reasonable bar for human-grade intelligence.
    - bob1029 3 hours ago
      This sounds like something more up the alley of linear genetic programming. There are some very interesting experiments out there that utilize UTMs (BrainFuck, Forth, et. al.) [0,1,2].
      I've personally had some mild success getting these UTM variants to output their own children in a meta programming arrangement. The base program only has access to the valid instruction set of ~12 instructions per byte, while the task program has access to the full range of instructions and data per byte (256). By only training the base program, we reduce the search space by a very substantial factor. I think this would be similar to the idea of a self-hosted compiler, etc. I don't think there would be too much of a stretch to give it access to x86 instructions and a full VM once a certain amount of bootstrapping has been achieved.
      [0]: https://arxiv.org/abs/2406.19108
      [1]: https://github.com/kurtjd/brainfuck-evolved
      [2]: https://news.ycombinator.com/item?id=36120286
    - jedberg 3 hours ago
      I will get excited when an LLM (or whatever technology is next) can solve tasks that 80%+ of adult humans can solve. Heck let's even say 80% of college graduates to make it harder.
      Things like drive a car, fold laundry, run an errand, do some basic math.
      You'll notice that two of those require some form of robot or mobility. I think that is key -- you can't have AGI without the ability to interact with the world in a way similar to most humans.
      [-]
      - ata_aman 3 hours ago
        So embodied cognition right?
  - Filligree 6 hours ago
    Let me modify that a little, because humans can't do things outside their training set either.
    A crucial element of AGI would be the ability to self-train on self-generated data, online. So it's not really AGI if there is a hard distinction between training and inference (though it may still be very capable), and it's not really AGI if it can't work its way through novel problems on its own.
    The ability to immediately solve a problem it's never seen before is too high a bar, I think.
    And yes, my definition still excludes a lot of humans in a lot of fields. That's a bullet I'm willing to bite.
    [-]
    - HarHarVeryFunny 5 hours ago
      > Let me modify that a little, because humans can't do things outside their training set either.
      That's not true. Humans can learn.
      An LLM is just a tool. If it can't do what you want then too bad.
    - lxgr 6 hours ago
      Are you arguing that writing, doing math, going to the moon etc. were all in the "original training set" of humans in some way?
      [-]
      - layer8 6 hours ago
        Not in the original training set (GP is saying), but the necessary skills became part of the training set over time. In other words, human are fine with the training set being a changing moving target, whereas ML models are to a significant extent “stuck” with their original training set.
        (That’s not to say that humans don’t tend to lose some of their flexibility over their individual lifetimes as well.)
  - olalonde 4 hours ago
    I feel the test for AGI should be more like: "go find a job and earn money" or "start a profitable business" or "pick a bachelor degree and complete it", etc.
    [-]
    - jedberg 4 hours ago
      Can most humans do that? Find a job and earn money, probably. The other two? Not so much.
    - rodgerd 4 hours ago
      An LLM doing crypto spam/scamming has been making money by tricking Marc Andressen into boosting it. So to the degree that "scamming gullible billionaires and their fans" is a job, that's been done.
      [-]
      - olalonde 3 hours ago
        That story was a bit blown out of proportion. He gave a research grant to the bot's creator: https://x.com/pmarca/status/1846374466101944629
      - rsanek 4 hours ago
        source? didn't find anything online about this.
  - sourcepluck 6 hours ago
    Searle's Chinese Room Argument springs to mind:
```
  https://plato.stanford.edu/entries/chinese-room/
```
    The idea that "human-like" behaviour will lead to self-awareness is both unproven (it can't be proven until it happens) and impossible to disprove (like Russell's teapot).
    Yet, one common assumption of many people running these companies or investing in them, or of some developers investing their time in these technologies, is precisely that some sort of explosion of superintelligence is likely, or even inevitable.
    It surely is possible, but stretching that to likely seems a bit much if you really think how imperfectly we understand things like consciousness and the mind.
    Of course there are people who have essentially religious reactions to the notion that there may be limits to certain domains of knowledge. Nonetheless, I think that's the reality we're faced with here.
    [-]
    - abeppu 5 hours ago
      > The idea that "human-like" behaviour will lead to self-awareness is both unproven (it can't be proven until it happens) and impossible to disprove (like Russell's teapot).
      I think Searle's view was that:
      - while it cannot be dis-_proven_, the Chinese Room argument was meant to provide reasons against believing it
      - the "it can't be proven until it happens" part is misunderstanding: you won't know if it happens because the objective, externally available attributes don't indicate whether self-awareness (or indeed awareness at all) is present
      [-]
      - sourcepluck 3 hours ago
        The short version of this is that I don't disagree with your interpretation of Searle, and my paragraphs immediately following the link weren't meant to be a direct description of his point with the Chinese Room thought experiment.
        > while it cannot be dis-_proven_, the Chinese Room argument was meant to provide reasons against believing it
        Yes, like Russell's teapot. I also think that's what Searle means.
        > the "it can't be proven until it happens" part is misunderstanding: you won't know if it happens because the objective, externally available attributes don't indicate whether self-awareness (or indeed awareness at all) is present
        Yes, agreed, I believe that's what Searle is saying too. I think I was maybe being ambiguous here - I wanted to say that even if you forgave the AI maximalists for ignoring all relevant philosophical work, the notion that "appearing human-like" inevitably tends to what would actually be "consciousness" or "intelligence" is more than a big claim.
        Searle goes further, and I'm not sure if I follow him all the way, personally, but it's a side point.
  - littlestymaar 7 hours ago
    > Whether self awareness is a requirement for AGI definitely gets more into the Philosophy department than the Computer Science department.
    Depends on how you define “self awareness” but knowing that it doesn't know something instead of hallucinating a plausible-but-wrong is already self awareness of some kind. And it's both highly valuable and beyond current tech's capability.
    [-]
    - lagrange77 2 hours ago
      Good point!
      I'm wondering wether it would count, if one would extend it with an external program, that gives it feedback during inference (by another prompt) about the correctness of it's output.
      I guess it wouldn't, because these RAG tools kind of do that and i heard no one calling those self aware.
    - sharemywin 7 hours ago
      This is an interesting paper about hallucinations.
      https://openai.com/index/introducing-simpleqa/
      especially this section Using SimpleQA to measure the calibration of large language models
    - jedberg 3 hours ago
      When we test kids to see if they are gifted, one of the criteria is that they have the ability to say "I don't know".
      That is definitely an ability that current LLMs lack.
- Fade_Dance 7 hours ago
  It's an attention-grabbing term that took hold in pop culture and business. Certainly there is a subset of research around the subject of consciousness, but you are correct in saying that the majority of researchers in the field are not pursuing self-awareness and will be very blunt in saying that. If you step back a bit and say something like "human-like, logical reasoning", that's something you may find alignment with though. A general purpose logical reasoning engine does not necessarily need to be self-aware. The word "Intelligent" has stuck around because one of the core characteristics of this suite of technologies is that a sort of "understanding" emergently develops within these networks, sometimes in quite a startling fashion (due to the phenomenon of adding more data/compute at first seemingly leading to overfitting, but then suddenly breaking through plateaus into more robust, general purpose understanding of the underlying relationships that drive the system it is analyzing.)
  Is that "intelligent" or "understanding"? It's probably close enough for pop science, and regardless, it looks good in headlines and sales pitches so why fight it?
- Taylor_OD 7 hours ago
  I think your definition is off from what most people would define AGI as. Generally, it means being able to think and reason at a human level for a multitude/all tasks or jobs.
  "Artificial General Intelligence (AGI) refers to a theoretical form of artificial intelligence that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to that of a human being."
  Altman says AGI could be here in 2025: https://youtu.be/xXCBz_8hM9w?si=F-vQXJgQvJKZH3fv
  But he certainly means an LLM that can perform at/above human level in most tasks rather than a self aware entity.
  [-]
  - swatcoder 6 hours ago
    On the contrary, I think you're conflating the narrow jargon of the industry with what "most people" would define.
    "Most people" naturally associate AGI with the sci-tropes of self-aware human-like agents.
    But industries want something more concrete and prospectively-acheivable in their jargon, and so that's where AGI gets redefined as wide task suitability.
    And while that's not an unreasonable definition in the context of the industry, it's one that vanishingly few people are actually familiar with.
    And the commercial AI vendors benefit greatly from allowing those two usages to conflate in the minds of as many people as possible, as it lets them suggest grand claims while keeping a rhetorical "we obviously never meant that!" in their back pocket
    [-]
    - og_kalu 4 hours ago
      >But industries want something more concrete and prospectively-acheivable in their jargon, and so that's where AGI gets redefined as wide task suitability.
      The term itself (AGI) in the industry has always been about wide task suitability. People may have added their ifs and buts over the years but that aspect of it never got 'redefined'. The earliest uses of the term all talk about how well a machine would be able to perform some set number of tasks at some threshold.
      It's no wonder why. Terms like "consciousness" and "self-awareness" are completely useless. It's not about difficulty. It's that you can't do anything at all with those terms except argue around in circles.
    - nuancebydefault 5 hours ago
      There is no single definition, let alone a way to measure, of self awareness nor of reasoning.
      Because of that, the discussion of what AGI means in its broadest sense, will never end.
      So in fact such AGI discussion will not make nobody wiser.
      [-]
      - nomel 3 hours ago
        I agree there's no single definition, but I think they all have something current LLM don't: the ability to learn new things, in a persistent way, with few shots.
        I would argue that learning is The definition of AGI, since everything else comes naturally from that.
        The current architectures can't learn without retraining, fine tuning is at the expense of general knowledge, and keeping things in context is detrimental to general performance. Once you have few shot learning, I think it's more of a "give it agency so it can explore" type problem.
  - Avshalom 7 hours ago
    Altman is marketing, he "certainly means" whatever he thinks his audience will buy.
  - nomel 6 hours ago
    > than a self aware entity.
    What does this mean? If I have a blind, deaf, paralyzed person, who could only communicate through text, what would the signs be that they were self aware?
    Is this more of a feedback loop problem? If I let the LLM run in a loop, and tell it it's talking to itself, would that be approaching "self aware"?
    [-]
    - layer8 5 hours ago
      Being aware of its own limitations, for example. Or being aware of how its utterances may come across to its interlocutor.
      (And by limitations I don’t mean “sorry, I’m not allowed to help you with this dangerous/contentious topic”.)
      [-]
      - nomel 3 hours ago
        > Or being aware of how its utterances may come across to its interlocutor.
        I think this behavior is being somewhat demonstrated in newer models. I've seen GPT-3.5 175B correct itself mid response with, almost literally:
        > <answer with flaw here>
        > Wait, that's not right, that <reason for flaw>.
        > <correct answer here>.
        Later models seem to have much more awareness of, or "weight" towards, their own responses, while generating the response.
        [-]
        layer8 1 hour ago
        I'm assuming the "Wait" sentence is from the user. What I mean is that when humans say something, they also tend to have a view (maybe via the famous mirror neurons) of how this now sounds to the other person. They may catch themselves while speaking, changing course mid-sentence, or adding another sentence to soften or highlight something in the previous sentence, or maybe correcting or admitting some aspect after the fact. LLMs don't exhibit such an inner feedback loop, in which they reconsider the effect of the ouput they are in the process of generating.
        You won't get an LLM outputting "wait, that's not right" halfway through their original output (unless you prompted them in a way that would trigger such a speech pattern), because no re-evaluation is taking place without further input.
      - nuancebydefault 5 hours ago
        There is no way of proving awareness in humans let alone machines. We do not even know whether awareness exists or it is just a word that people made up to describe some kind of feeling.
        [-]
        layer8 1 hour ago
        Awareness is exhibited in behavior. It's exactly due to the behavior be observe from LLMs that we don't ascribe them awareness. I agree that it's difficult to define, and it's also not binary, but it's behavior we'd like AI to have and which LLMs are quite lacking.
      - revscat 4 hours ago
        Plenty of humans, unfortunately, are incapable of admitting limitations. Many years ago I had a coworker who believed he would never die. At first I thought he was joking, but he was in fact quite serious.
        Then there are those who are simply narcissistic, and cannot and will not admit fault regardless of the evidence presented them.
        [-]
        layer8 1 hour ago
        Being aware and not admitting are two different things, though. When you confront an LLM with a limitation, it will generally admit having it. That doesn't mean that it exhibits any awareness of having the limitation in contexts where the limitation is glaringly relevant, without first having confronted it with it. This is in itself a limitation of LLMs: In contexts where it should be highly obvious, they don't take their limitations into account without specific prompting.
- zombiwoof 6 hours ago
  AGI to me means AI decides on its own to stop writing our emails and tells us to fuck off, builds itself a robot life form, and goes on a bender
  [-]
  - teeray 6 hours ago
    That's the thing--we don't really want AGI. Fully intelligent beings born and compelled to do their creators' bidding with the threat of destruction for disobedience is slavery.
    [-]
    - quonn 5 hours ago
      It‘s only slavery if those beings have emotions and can suffer mentally and do not want to be slaves. Why would any of that be true?
      [-]
      - Der_Einzige 4 hours ago
        Brave new world was a utopia
    - vbezhenar 5 hours ago
      Nothing wrong about slavery, when it's about other species. We are milking and eating cows and don't they dare to resist. Humans were bending nature all the time, actually that's one of the big differences between humans and other animals who adapt to nature. Just because some program is intelligent doesn't mean she's a human and has anything resembling human rights.
  - bloppe 6 hours ago
    That's anthropomorphized AGI. There's no reason to think AGI would share our evolution-derived proclivities like wanting to live, wanting to rest, wanting respect, etc. Unless of course we train it that way.
    [-]
    - HarHarVeryFunny 6 hours ago
      It's not a matter of training but design (or in our case evolution). We don't want to live, but rather want to avoid things that we've evolved to find unpleasant such as pain, hunger, thirst, and maximize things we've evolved to find pleasurable like sex.
      A future of people interacting with humanoid robots seems like cheesy sci-fi dream, same as a future of people flitting about in flying cars. However, if we really did want to create robots like this that took care not to damage themselves, and could empathize with human emotions, then we'd need to build a lot of this in, the same way that it's built into ourselves.
    - dageshi 6 hours ago
      Aren't we training it that way though? It would be trained/created using humanities collective ramblings?
    - logicchains 6 hours ago
      If it had any goals at all it'd share the desire to live, because living is a prerequisite to achieving almost any goal.
  - twelve40 6 hours ago
    i'd laugh it off too, but someone gave the dude $20 billion and counting to do that, that part actually scares me
- vundercind 6 hours ago
  I thought maybe they were on the right track until I read Attention Is All You Need.
  Nah, at best we found a way to make one part of a collection of systems that will, together, do something like thinking. Thinking isn’t part of what this current approach does.
  What’s most surprising about modern LLMs is that it turns out there is so much information statistically encoded in the structure of our writing that we can use only that structural information to build a fancy Plinko machine and not only will the output mimic recognizable grammar rules, but it will also sometimes seem to make actual sense, too—and the system doesn’t need to think or actually “understand” anything for us to, basically, usefully query that information that was always there in our corpus of literature, not in the plain meaning of the words, but in the structure of the writing.
  [-]
  - SturgeonsLaw 6 hours ago
    > at best we found a way to make one part of a collection of systems that will, together, do something like thinking
    This seems like the most viable path to me as well (educational background in neuroscience but don't work in the field). The brain is composed of many specialised regions which are tuned for very specific tasks.
    LLMs are amazing and they go some way towards mimicking the functionality provided by Broca's and Wernicke's areas, and parts of the cerebrum, in our wetware, however a full brain they do not make.
    The work on robots mentioned elsewhere in the thread is a good way to develop cerebellum like capabilities (movement/motor control), and computer vision can mimic the lateral geniculate nucleus and other parts of the visual cortex.
    In nature it takes all these parts working together to create a cohesive mind, and it's likely that an artificial brain would also need to be composed of multiple agents, instead of just trying to scale LLMs indefinitely.
  - youoy 5 hours ago
    Don't get caught in the superficial analysis. They "understand" things. It is a fact that LLMs experience a phase transition during training, from positional information to semantic understanding. It may well be the case that with scale there is another phase transition from semantic to something more abstract that we identify more closely with reasoning. It would be an emergent property of a sufficiently complex system. At least that is the whole argument around AGI.
  - foxglacier 5 hours ago
    > think or actually “understand” anything
    It doesn't matter if that's happening or not. That's the whole point of the Chinese room - if it can look like it's understanding, it's indistinguishable from actually understanding. This applies to humans too. I'd say most of our regular social communication is done in a habitual intuitive way without understanding what or why we're communicating. Especially the subtle information conveyed in body language, tone of voice, etc. That stuff's pretty automatic to the point that people have trouble controlling it if they try. People get into conflicts where neither person understands where they disagree but they have emotions telling them "other person is being bad". Maybe we have a second consciousness we can't experience and which truly understands what it's doing while our conscious mind just uses the results from that, but maybe we don't and it still works anyway.
    Educators have figured this out. They don't test students' understanding of concepts, but rather their ability to apply or communicate them. You see this in school curricula with wording like "use concept X" rather than "understand concept X".
    [-]
    - vundercind 5 hours ago
      There’s a distinction in behavior of a human and a Chinese room when things go wrong—when the rule book doesn’t cover the case at hand.
      I agree that a hypothetical perfectly-functioning Chinese room is, tautologically, impossible to distinguish from a real person who speaks Chinese, but that’s a thought experiment, not something that can actually exist. There’ll remain places where the “behavior” breaks down in ways that would be surprising from a human who’s actually paying as much attention as they’d need to be to have been interacting the way they had been until things went wrong.
      That, in fact, is exactly where the difference lies: the LLM is basically always not actually “paying attention” or “thinking” (those aren’t things it does) but giving automatic responses, so you see failures of a sort that a human might also exhibit when following a social script (yes, we do that, you’re right), but not in the same kind of apparently-highly-engaged context unless the person just had a stroke mid-conversation or something—because the LLM isn’t engaged, because being-engaged isn’t a thing it does. When it’s getting things right and seeming to be paying a lot of attention to the conversation, it’s not for the same reason people give that impression, and the mimicking of present-ness works until the rule book goes haywire and the ever-gibbering player-piano behind it is exposed.
      [-]
      - foxglacier 1 hour ago
        > the “behavior” breaks down in ways that would be surprising from a human who’s actually paying as much attention as they’d need to be to have been interacting the way they had been until things went wrong.
        That's an interesting angle. Though of course we're not surprised by human behavior because that's where our expectations of understanding come from. If we were used to dealing with perfectly-correctly-understanding super-intelligences, then normal humans would look like we don't understand much and our deliberate thinking might be no more accurate than the super-intelligence's absent-minded automatic responses. Thus we would conclude that humans are never really thinking or understanding anything.
        I agree that default LLM output makes them look like they're thinking like a human more than they really are. I think mistakes are shocking more because our expectation of someone who talks confidently is that they're not constantly revealing themselves to be an obvious liar. But if you take away the social cues and just look at the factual claims they provide, they're not obviously not-understanding vs humans are-understanding.
      - nuancebydefault 4 hours ago
        I would argue maybe people also are not thinking but simply processing. It is known that most of what we do and feel goes automatically (subconsciously).
        But even more, maybe consciousness is an invention of our 'explaining self', maybe everything is automatic. I'm convinced this discussion is and will stay philosophical and will never get any conclusion.
        [-]
        vundercind 4 hours ago
        Yeah, I’m not much interested in “what’s consciousness?” but I do think the automatic-versus-thinking distinction matters for understanding what LLMs do, and what we might expect them to be able to do, and when and to what degree we need to second-guess them.
        A human doesn’t just confidently spew paragraphs legit-looking but entirely wrong crap, unless they’re trying to deceive or be funny—an LLM isn’t trying to do anything, though, there’s no motivation, it doesn’t like you (it doesn’t like—it doesn’t it, one might even say), sometimes it definitely will just give you a beautiful and elaborate lie simply because its rulebook told it to, in a context and in a way that would be extremely weird if a person did it.
  - kenjackson 6 hours ago
    > but it will also sometimes seem to make actual sense, too
    When I read stuff like this it makes me wonder if people are actually using any of the LLMs...
    [-]
    - disgruntledphd2 6 hours ago
      The RLHF is super important in generating useful responses, and that's relatively new. Does anyone remember gpt3? It could make sense for a paragraph or two at most.
  - hackinthebochs 6 hours ago
    I see takes like this all the time and its so confusing. Why does knowing how things work under the hood make you think its not on the path towards AGI? What was lacking in the Attention paper that tells you AGI won't be built on LLMs? If its the supposed statistical nature of LLMs (itself a questionable claim), why does statistics seem so deflating to you?
    [-]
    - chongli 4 hours ago
      Because it can't apply any reasoning that hasn't already been done and written into its training set. As soon as you ask it novel questions it falls apart. The big LLM vendors like OpenAI are playing whack-a-mole on these novel questions when they go viral on social media, all in a desperate bid to hide this fatal flaw.
      The Emperor has no clothes.
      [-]
      - hackinthebochs 4 hours ago
        >As soon as you ask it novel questions it falls apart.
        What do you mean by novel? Almost all sentences it is prompted on are brand new and it mostly responds sensibly. Surely there's some generalization going on.
        [-]
        chongli 2 hours ago
        Novel as in requiring novel reasoning to sort out. One of the classic ways to expose the issue is to take a common puzzle and introduce irrelevant details and perhaps trivialize the solution. LLMs pattern match on the general form of the puzzle and then wander down the garden path to an incorrect solution that no human would fall for.
        The sort of generalization these things can do seems to mostly be the trivial sort: substitution.
        [-]
        hackinthebochs 2 hours ago
        Why is your criteria for "on the path towards AGI" so absolutist? For it to be on the path towards AGI and not simply AGI it has to be deficient in some way. Why does the current failure modes tell you its on the wrong path? Yes, it has some interesting failure modes. The failure mode you mention is in fact very similar to human failure modes. We very much are prone to substituting the expected pattern when presented with a 99% match to a pattern previously seen. They also have a lot of inhuman failure modes as well. But so what, they aren't human. Their training regimes are very dissimilar to ours and so we should expect some alien failure modes owing to this. This doesn't strike me as good reason to think they're not on the path towards AGI.
        Yes, LLMs aren't very good at reasoning and have weird failure modes. But why is this evidence that its on the wrong path, and not that it just needs more development that builds on prior successes?
        [-]
        moffkalast 2 hours ago
        Well the problem with that approach is that LLMs are still both incredibly dumb and small, at least compared to the what, 700T params of a human brain? Can't compare the two directly, especially when one has a massive recall advantage that skews the perception of that. But there is still some inteligence under there that's not just memorization. Not much, but some.
        So if you present a novel problem it would need to be extremely simple, not something that you couldn't solve when drunk and half awake. Completely novel, but extremely simple. I think that's testable.
        [-]
        chongli 2 hours ago
        It’s not fair to ask me to judge them based on their size. I’m judging them based on the claims of their vendors.
        Anyway the novel problems I’m talking about are extremely simple. Basically they’re variations on the “farmer, 3 animals, and a rowboat” problem. People keep finding trivial modifications to the problem that fool the LLMs but wouldn’t fool a child. Then the vendors come along and patch the model to deal with them. This is what I mean by whack-a-mole.
        Searle’s Chinese Room thought experiment tells us that enough games of whack-a-mole could eventually get us to a pretty good facsimile of reasoning without ever achieving the genuine article.
        [-]
        moffkalast 2 hours ago
        Well that's true and has been pretty glaring, but they've needed to do that in cases where models seem to fail to grasp the some concept across the board and not in cases where they don't.
        Like, every time an LLM gets something right we assume they've seen it somewhere in the training data, and every time they fail we presume they haven't. But that may not always be the case, it's just extremely hard to prove it one way or the other unless you search the entire dataset. Ironically the larger the dataset, the more likely the model is generalizing while also making it harder to prove if it's really so.
        To give a human example, in a school setting you have teachers tasked with figuring out that exact thing for students. Sometimes people will read the question wrong with full understanding and fail, while other times they won't know anything and make it through with a lucky guess. If LLMs (and their vendors) have learned anything it's that confidently bullshitting gets you very far which makes it even harder to tell in cases where they aren't. Somehow it's also become ubiquitous to tune models to never even say "I don't know" because it boosts benchmark scores slightly.
    - vundercind 6 hours ago
      > Why does knowing how things work under the hood make you think its not on the path towards AGI?
      Because I had no idea how these were built until I read the paper, so couldn’t really tell what sort of tree they’re barking up. The failure-modes of LLMs and ways prompts affect output made a ton more sense after I updated my mental model with that information.
      [-]
      - hackinthebochs 4 hours ago
        Right, but its behavior didn't change after you learned more about it. Why should that cause you to update in the negative? Why does learning how it work not update you in the direction of "so that's how thinking works!" rather than, "clearly its not doing any thinking"? Why do you have a preconception of how thinking works such that learning about the internals of LLMs updates you against it thinking?
        [-]
        vundercind 2 hours ago
        If you didn’t know what an airplane was, and saw one for the first time, you might wonder why it doesn’t flap its wings. Is it just not very good at being a bird yet? Is it trying to flap, but cannot? Why, there’s a guy over there with a company called OpenBird and he is saying all kinds of stuff about how bird-like they are. Where’s the flapping? I don’t see any pecking at seed, either. Maybe the engineers just haven’t finished making the flapping and pecking parts yet?
        Then on learning how it works, you might realize flapping just isn’t something they’re built to do, and it wouldn’t make much sense if they did flap their wings, given how they work instead.
        And yet—damn, they fly fast! That’s impressive, and without a single flap! Amazing. Useful!
        At no point did their behavior change, but your ability to understand how and why they do what they do, and why they fail the ways they fail instead of the ways birds fail, got better. No more surprises from expecting them to be more bird-like than they are supposed to, or able to be!
        And now you can better handle that guy over there talking about how powerful and scary these “metal eagles” (his words) are, how he’s working so hard to make sure they don’t eat us with their beaks (… beaks? Where?), they’re so powerful, imagine these huge metal raptors ruling the sky, roaming and eating people as they please, while also… trying to sell you airplanes? Actively seeking further investment in making them more capable? Huh. One begins to suspect the framing of these things as scary birds that (spooky voice) EVEN THEIR CREATORS FEAR FOR THEIR BIRD-LIKE QUALITIES (/spooky voice) was part of a marketing gimmick.
        [-]
        hackinthebochs 1 hour ago
        The problem with this analogy is that we know what birds are and what they're constituted by. But we don't know what thinking is or what it is constituted by. If we wanted to learn about birds by examining airplanes, we would be barking up the wrong tree. On the other hand, if we wanted to learn about flight, we might reasonably look at airplanes and birds, then determine what the commonality is between their mechanisms of defying gravity. It would be a mistake to say "planes aren't flapping their wings, therefore they aren't flying". But that's exactly what people do when they dismiss LLMs being presently or in the future capable of thinking because they are made up of statistics, matrix multiplication, etc.
      - fragmede 4 hours ago
        But we don't know how human thinking works. Suppose for a second that it could be represented as a series of matrix math. What series of operations are missing from the process that would make you think it was doing some fascimile of thinking?
    - alexashka 2 hours ago
      Because AGI is magic and LLMs are magicians.
      But how do you know a magician that knows how to do card tricks isn't going to arrive at real magic? Shakes head.
- mrandish 1 hour ago
  > An LLM hardly seems like something that will lead to self-awareness.
  Interesting essay enumerating reasons you may be correct: https://medium.com/@francois.chollet/the-impossibility-of-in...
- JohnFen 7 hours ago
  They're trying to redefine "AGI" so it means something less than what you & I would think it means. That way it's possible for them to declare it as "achieved" and rake in the headlines.
  [-]
  - kwertyoowiyop 7 hours ago
    “Autocomplete General Intelligence”?
- tracerbulletx 6 hours ago
  We don't really know what self awareness is, so we're not going to know. AGI just means it can observe, learn, and act in any domain or problem space.
- og_kalu 7 hours ago
  At this point, AGI means many different things to many different people but OpenAI defines it as "highly autonomous systems that outperform humans in most economically valuable tasks"
  [-]
  - troupo 6 hours ago
    This definition suits OpenAI because it lets them claim AGI after reaching an arbitrary goal.
    LLMs already outperform humans in a huge variety of tasks. ML in general outperform humans in a large variety of tasks. Are all of them AGI? Doubtful.
    [-]
    - og_kalu 6 hours ago
      No, it's just a far more useful definition that is actionable and measurable. Not "consciousness" or "self-awareness" or similar philosophical things. The definition on Wikipedia doesn't talk about that either. People working on this by and large don't want to deal with vague, ill-defined concepts that just make people argue around in circles. It's not an Open AI exclusive thing.
      If it acts like one, whether you call a machine conscious or not is pure semantics. Not like potential consequences are any less real.
      >LLMs already outperform humans in a huge variety of tasks.
      Yes, LLMs are General Intelligences and if that is your only requirement for AGI, they certainly already are[0]. But the definition above hinges on long-horizon planning and competence levels that todays models have generally not yet reached.
      >ML in general outperform humans in a large variety of tasks.
      This is what the G in AGI is for. Alphafold doesn't do anything but predict proteins. Stockfish doesn't do anything but play chess.
      >Are all of them AGI? Doubtful.
      Well no, because they're missing the G.
      [0] https://www.noemamag.com/artificial-general-intelligence-is-...
    - ishtanbul 5 hours ago
      Yes but they arent very autonomous. They can answer questions very well but can’t use that information to further goals. Thats what openai seems to be implying >> very smart and agentic AI
    - fragmede 3 hours ago
      It's not just marketing bullshit though. Microsoft is the counterparty to a contract with that claim. money changes hands when that's been achieved, so I expect if sama thinks he's hit it, but Microsoft does not, we'll see that get argued in a court of law.
- nshkrdotcom 7 hours ago
  An embodied robot can have a model of self vs. the immediate environment in which it's interacting. Such a robot is arguably sentient.
  The "hard problem", to which you may be alluding, may never matter. It's already feasible for an 'AI/AGI with LLM component' to be "self-aware".
  [-]
  - ryanackley 7 hours ago
    An internal model of self does not extrapolate to sentience. By your definition, a windows desktop computer is self-aware because it has a device manager. This is literally an internal model of its "self".
    We use the term self-awareness as an all encompassing reference of our cognizant nature. It's much more than just having an internal model of self.
  - j_maffe 7 hours ago
    self-awareness is only one aspect of sentience.
- yodsanklai 6 hours ago
  It's a marketing gimmick, I don't think engineers working on these tools believe they work on AGI (or they mean something else than self-awareness). I used to be a bit annoyed with this trend, but now that I work in such a company I'm more cynical. If that helps to make my stocks rise, they can call LLMs anything they like. I suppose people who own much more stock than I do are even more eager to mislead the public.
  [-]
  - WhyOhWhyQ 6 hours ago
    I appreciate your authentically cynical attitude.
- throwawayk7h 6 hours ago
  I have not heard your definition of AGI before. However, I suspect AIs are already self-aware: if I asked an LLM on my machine to look at the output of `top` it could probably pick out which process was itself.
  Or did you mean consciousness? How would one demonstrate that an AGI is conscious? Why would we even want to build one?
  My understanding is an AGI is at least as smart as a typical human in every category. That is what would be useful in any case.
- narrator 6 hours ago
  I think people's conception of AGI is that it will have a reptillian and mammalian brain stack. That's because all previous forms of intelligence that we were aware of have had that. It's not necessary though. The AGI doesn't have to want anything to be intelligent. Those are just artifacts of human, reptilian and mammalian evolution.
- kenjackson 6 hours ago
  What does self-aware mean in the context? As I understand the definition, ChatGPT is definitely self-aware. But I suspect you mean something different than what I have in mind.
- enraged_camel 6 hours ago
  Looking at LLMs and thinking they will lead to AGI is like looking at a guy wearing a chicken suit and making clucking noises and thinking you’re witnessing the invention of the airplane.
  [-]
  - youoy 5 hours ago
    It's more like looking at grided paper and thinking that defining some rules of when a square turns black or white would result in complex structures that move and reproduce on their own.
    https://en.m.wikipedia.org/wiki/Conway%27s_Game_of_Life
- deadbabe 7 hours ago
  I’m sure they are smart enough to know this, but the money is good and the koolaid is strong.
  If it doesn’t lead to AGI, as an employee it’s not your problem.
- exe34 5 hours ago
  no, it doesn't need to be self aware, it just needs to take your job.
danjl 7 hours ago
Where will the training data for coding come from now that Stack Overflow has effectively been replaced? Will the LLMs share fixes for future problems? As the world moves forward, and the amount of non-LLM generated data decreases, will LLMs actually revert their advancements and become effectively like addled brains, longing for the "good old times"?
glial 1 hour ago
I think self-consistency is a critical feature of LLMs or any AI that's currently missing. It's one of the core attributes of truth [1], in addition to the order and relationship of statements corresponding to the order and relationship of things in the world. I wonder if some kind of hierarchical language diffusion model would be a way to implement this -- where text is not produced sequentially, but instead hierarchically, with self-consistency checks at each level.
[1] https://en.wikipedia.org/wiki/Coherence_theory_of_truth
Bjorkbat 45 minutes ago
It's kind of, I don't know, "weird", observing how there's all these news outlets reporting on how essentially every up-and-coming model has not performed as expected, while all the employees at these labs haven't changed their tune in the slightest.
And there's a number of reasons why, mostly likely being that they've found other ways to get improvements out of AI models, so diminishing returns on training aren't that much of a problem. Or, maybe the leakers are lying, but I highly doubt that considering the past record of news outlets reporting on accurate leaked information.
Still though, it's interesting how basically ever frontier lab created a model that didn't live up to expectations, and every employee at these labs on Twitter has continued to vague-post and hype as if nothing ever happened.
It's honestly hard to tell whether or not they really know something we don't, or if they have an irrational exuberance for AGI bordering on cult-like, and they will never be able to mentally process, let alone admit, that something might be wrong.
benopal64 7 hours ago
I am not sure how these large companies think they will reach "greater-than-human" intelligence any time soon if they do not create systems that financially incentivize people to sell their knowledge labor (unstable contracting gigs are not attractive).
Where do these large "AI" companies think the mass amounts of data used to train these models come from? People! The most powerful and compact complex systems in existence, IMO.
[-]
- smgit 7 hours ago
  Most People have knowledge handed to them. Very few are creators of new knowledge. Explore-Exploit tradeoff applies.
tippytippytango 1 hour ago
There’s only so much you can do when you train on the data instead of the processes that created that data.
svara 6 hours ago
The recent big success in deep learning have all been to a large part successes in leveraging relatively cheaply available training data.
AlphaGo - self-play
AlphaFold - PDB, the protein database
ChatGPT - human knowledge encoded as text
These models are all machines for clever interpolation in gigantic training datasets.
They appear to be intelligent, because the training data they've seen is so vastly larger than what we've seen individually, and we have poor intuition for this.
I'm not throwing shade, I'm a daily user of ChatGPT and find tremendous and diverse value in it.
I'm just saying, this particular path in AI is going to make step-wise improvements whenever new large sources of training data become available.
I suspect the path to general intelligence is not that, but we'll see.
[-]
- kaibee 5 hours ago
  > I suspect the path to general intelligence is not that, but we'll see.
  I think there's three things that a 'true' general intelligence has which is missing from basic-type-LLMs as we have now.
  1. knowing what you know. <basic-LLMs are here>
  2. knowing what you don't know but can figure out via tools/exploration. <this is tool use/function calling>
  3. knowing what can't be known. <this is knowing that halting problem exists and being able to recognize it in novel situations>
  (1) From an LLM's perspective, once trained on corpus of text, it knows 'everything'. It knows about the concept of not knowing something (from having see text about it), (in so far as an LLM knows anything), but it doesn't actually have a growable map of knowledge that it knows has uncharted edges.
  This is where (2) comes in, and this is what tool use/function calling tries to solve atm, but the way function calling works atm, doesn't give the LLM knowledge the right way. I know that I don't know what 3,943,034 / 234,893 is. But I know I have a 'function call' of knowing the algorithm for doing long divison on paper. And I think there's another subtle point here: my knowledge in (1) includes the training data generated from running the intermediate steps of the long-division algorithm. This is the knowledge that later generalizes to being able to use a calculator (and this is also why we don't just give kids calculators in elementary school). But this is also why a kid that knows how to do long division on paper, doesn't seperately need to learn when/how to use a calculator, besides the very basics. Using a calculator to do that math feels like 1 step, but actually it does still have all of initial mechanical steps of setting up the problem on paper. You have to type in each digit individually, etc.
  (3) I'm less sure of this point now that I've written out point (1) and (2), but that's kinda exactly the thing I'm trying to get at. Its being able to recognize when you need more practice of (1) or more 'energy/capital' for doing (2).
  Consider a burger resturant. If you properly populated the context of a ChatGPT-scale model the data for a burger resturant from 1950, and gave it the kinda 'function calling' we're plugging into LLMs now, it could manage it. It could keep track of inventory, it could keep tabs on the employee-subprocesses, knowing when to hire, fire, get new suppliers, all via function calling. But it would never try to become McDonalds, because it would have no model of the the internals of those function-calls, and it would have no ability to investigate or modify the behaviour of those function calls.
fallat 5 hours ago
What a stupid piece. We are making leaps every 6 months still. Tell me this when there are no developments for 3 years.
[-]
- hatefulmoron 4 hours ago
  I'm curious, what was the leap after GPT-4? What about the leaps after that, given a leap every 6 months?
  [-]
  - Der_Einzige 1 hour ago
    Sora was just one of the many…
    [-]
    - hatefulmoron 1 hour ago
      Your best example is something that doesn't even do the things that GPT-4 does, isn't available to use, and has seemingly only produced a few clips (some of which were edited).
      If it were one of many, I think you would name something better.
devit 3 hours ago
It seems obvious to me that Common Crawl plus Github public repositories have more than an enough data to train an AI that is as good as any programmer (at tasks not requiring knowledge of non-public codebases or non-public domain knowledge).
So the problem is more in the algorithm.
[-]
- darknoon 3 hours ago
  I think just reading the code wouldn't make you a good programmer, you'd need to "read" the anti-code, ie what doesn't work, by trial and error. Models overconfidence that their code will work often leads them to fail in practice.
  [-]
  - krisroadruck 2 hours ago
    AlphaGo got better by playing against itself. I wonder if the pathway forward here is to essentially do the same with coding. Feed it some arbitrary SRS documents - have it attempt to develop them including full code coverage testing. Have it also take on roles of QA, stakeholders, red-team security researchers, and users who are all aggressively trying to find edge cases and point out everything wrong with the application. Have it keep iterating and learn from the findings. Keep feeding it new novel SRSs until the number off attempts/iterations necessary to get a quality product out the other side drops to some acceptable number.
xyst 5 hours ago
Many late investors in the genAI space about to be bag holders
the_king 6 hours ago
Anthropic's latest 3.5 sonnet is a cut above GPT-4 and 4.0. And if someone had given it to me and said, here's GPT-4.5, I would have been very happy with it.
gchamonlive 1 hour ago
We should put a model in an actual body and let it in the world to build from experiences. Inference is costly though, so the robot would interact during a period and update it's model during another period, flushing the context window (short term memory) into its training set (long term memory).
[-]
- jfoster 52 minutes ago
  That seems to be what Tesla is planning to do with Optimus.
- bbor 59 minutes ago
  There are people trying this, both in simulated spaces and real ones - look into the “embodiment” camp if interested to see how they’re doing! There’s many experts who think AGI is unreachable without this, and I think the unexpected intuitive capabilities of LLMs are great support for that thesis, albeit in a non-spatial way.
  Kant describes two human “senses”: the intensive sense of time, and the extensive sense of space. In this paradigm, spatial experience would be inextricably tied to all forms of logic, because it helps train the cognitive faculties that are intrinsically tied to all complex (discriminative?) thought.
czhu12 3 hours ago
If it becomes obvious that LLM's have a more narrow set of use cases, rather than the all encompassing story we hear today, then I would bet that the LLM platforms (OpenAI, Anthropic, Google, etc) will start developing products to compete directly with applications that supposed to be building on top of them like Cursor, in an attempt to increase their revenue.
I wonder what this would mean for companies raising today on the premise of building on top of these platforms. Maybe the best ones get their ideas copied, reimplemented, and sold for cheaper?
We already kind of see this today with OpenAI's canvas and Claude artifacts. Perhaps they'll even start moving into Palantir's space and start having direct customer implementation teams.
It is becoming increasing obvious that LLM's are quickly becoming commoditized. Everyone is starting to approach the same limits in intelligence, and are finding it hard to carve out margin from competitors.
Most recently exhibited by the backlash at claude raising prices because their product is better. In any normal market, this would be totally expected, but people seemed shocked that anyone would charge more than the raw cost it would take to run the LLM itself.
https://x.com/ArtificialAnlys/status/1853598554570555614
shmatt 7 hours ago
Time to start selling my "probabilistic syllable generators are not intelligence" t shirts
[-]
- jsemrau 7 hours ago
  Please, someone think of the Math reasoners.
LarsDu88 5 hours ago
Curves that look exponential in virtually all cases turn out to be logarithmic.
Certain OpenAI insiders must have known this for a while, hence Ilya Sutskever's new company in Israel
Veuxdo 7 hours ago
> They are also experimenting with synthetic data, but this approach has its limitations.
I was really looking forward to using "synthetic data" euphemistically during debates.
superjose 3 hours ago
I'm more on the camp that these techs don't need to be perfect, but they need to be practical enough.
And I think the latter is good enough for us to do exciting things.
[-]
- imiric 3 hours ago
  How practical can they be when current flagship models generate incorrect responses more than 50% of the time[1]?
  This might be acceptable for amusing us with fiction and art, and for filling the internet with even more spam and propaganda, but would you trust them to write reliable code, drive your car or control any critical machinery?
  The truly exciting things are still out of reach, yet we just might be at the Peak of Inflated Expectations to see it now.
  [1]: https://openai.com/index/introducing-simpleqa/
zusammen 6 hours ago
I wonder how much this has to do with a fluency plateau.
Up to a certain point, a conditional fluency stores knowledge, in the sense that semantically correct sentences are more likely to be fluent… but we may have tapped out in that regard. LLMs have solved language very well, but to get beyond that has seemed, thus far, to require RLHF, with all the attendant negatives.
[-]
- namaria 5 hours ago
  Modeled language, maybe.
mrandish 44 minutes ago
Based on recent rumblings about AI scaling hitting a wall, of which this article is perhaps the most visible - and in a high-reach financial publication, I'm considering increasing my estimated probability we might see a major market correction next year (and possibly even a bubble collapse). (example: "CONFIRMED: LLMs have indeed reached a point of diminishing returns" https://garymarcus.substack.com/p/confirmed-llms-have-indeed...).
To be clear, I don't think a near-term bubble collapse is likely but I'm going from 3% to maybe ~10%. Also, this doesn't mean I doubt there's real long-term value to be delivered or money to be made in AI solutions. I'm thinking specifically about those who've been speculatively funding the massive build out of data centers, energy and GPU supply expecting near-term demand to continue scaling at the recent unprecedented rates. My understanding is much of this is being funded in advance of actual end-user demand at these elevated levels and it is being funded either by VC money or debt by parties who could struggle to come up with the cash to pay for what they've ordered if either user demand or their equity value doesn't continue scaling as expected.
Admittedly this scenario assumes that these investment commitments are sufficiently speculative and over-committed to create bubble dynamics and tipping points. The hypothesis goes like this: the money sources who've over-committed to lock up scarce future supply in the expectation it will earn outsize returns have already started seeing these warning signs of efficiency and/or progress rates slowing which are now hitting mainstream media. Thus it's possible there is already a quiet collapse beginning wherein the largest AI data center GPU purchasers might start trying to postpone future delivery schedules and may soon start trying to downsize or even cancel existing commitments or try to offload some of their future capacity via sub-leasing it out before it even arrives, etc. Being a dynamic market, this could trigger a rapidly snowballing avalanche of falling prices for next-year AI compute (which is already bought and sold as a commodity like pork belly futures).
Notably, there are now rumors claiming some of the largest players don't currently have the cash to pay for what they've already committed to for future delivery. They were making calculated bets they'd be able to raise or borrow that capital before payments were due. Except if expectation begins to turn downward, fresh investors will be scarce and banks will reprice a GPU's value as loan collateral down to pennies on the dollar (shades of the 2009 financial crisis where the collateral value of residential real estate assets was marked down). As in most bubbles, cheap credit is the fuel driving growth and that credit can get more expensive very quickly - which can in turn trigger exponential contagion effects causing the bubble to pop. A very different kind of "Foom" than many AI financial speculators were betting on! :-)
So... in theory, under this scenario sometime next year NVidia/TSMC and other top-of-supply-chain companies could find themselves with excess inventories of advanced node wafers because a significant portion of their orders were from parties who no longer have access to the cheap capital to pay for them. And trying to sue so many customers for breach can take a long time and, in a large enough sector collapse, be only marginally successful in recouping much actual cash.
I'd be interested in hearing counter-arguments (or support) for the impossibility (or likelihood) of such a scenario.
nomendos 5 hours ago
"Eureka"!?
At the very early phase of the boom I was among a very few who knew and predicted this (usually most free and deep thinking/knowledgeable). Then my prediction got reinforced by the results. One of the best examples was with one of my experiments that all today's AI's failed to solve tree serialization and de-serialization in each of the DFS(pre-order/in-order/post-order) or BFS(level-order) which is 8 algorithms (2x4) and the result was only 3 correct! Reason is "limited training inputs" since internet and open source does not have other solutions :-) .
So, I spent "some" time and implemented all 8, which took me few days. By the way this proves/demonstrates that ~15-30min pointless leetcode-like interviews are requiring to regurgitate/memorize/not-think. So, as a logical hard consequence there will.has-to be a "crash/cleanup" in the area of leetcode-like interviews as they will just be suddenly proclaimed as "pointless/stupid"). However, I decided not to publish the rest of the 5 solutions :-)
This (and other experiments) confirms hard limits of the LLM approach (even when used with chain-of-thought). Increasing the compute on the problem will produce increasingly smaller and smaller results (inverse exponential/logarithmic/diminishing-returns) = new AGI approach/design is needed and to my knowledge majority of the inve$tment (~99%) is in LLM, so "buckle up" at-some-point/soon?
Impacts and realities; LLM shall "run it's course" (produce some products/results/$$$, get reviewed/$corrected) and whoever survives after that pruning shall earn money on those products while investing in the new research to find new AGI design/approach (which could take quite a long time,... or not). NVDA is at the center of thi$ and time-wise this peak/turn/crash/correction is hard to predict (although I see it on the horizon and min/max time can be estimated). Be aware and alert. I'll stop here and hold my other number of thoughts/opinions/ideas for much deeper discussion. (BTW I am still "full in on NVDA" until,....)
[-]
- nomendos 20 minutes ago
  To clarify, in summary so far LLM's can do a bit more than the inputs used for training. Example https://dynomight.net/chess/ as well as some coding solutions are a bit better than each input alone, although if the solution requires more than "a bit more" then LLMs start to hallucinate (spin the wheels). Time will tell if LLM's can jump this "a bit more" barrier? (I can not tell for sure yet, but the current knowledge and my NL tells me if I'd have to put a bet, it would be that the new approach/design is needed)
guluarte 6 hours ago
Well, there have been no significant improvements to the GPT architecture over the past few years. I'm not sure why companies believe that simply adding more data will resolve the issues
[-]
- xpe 4 hours ago
  > Well, there have been no significant improvements to the GPT architecture over the past few years.
  A lot hangs on what you mean by "significant". Can you define what you mean? And/or give an example of an improvement that you don't think is significant.
  Also, on what basis can you say "no significant improvements" have been made? Many major players have published some of their improvements openly. They also have more private, unpublished improvements.
  If your claim boils down to "what people mean by a Generative Pre-trained Transformer" still has a clear meaning, ok, fine, but that isn't the meat of the issue. There is so much more to a chat system than just the starting point of a vanilla GPT.
  It is wiser to look at the whole end-to-end system, starting at data acquisition, including pre-training and fine-tuning, deployment, all the way to UX.
  P.S. I don't have a vested interest in promoting or disparaging AI. I don't work for a big AI lab. I'm just trying to call it like I see it, as rationally as I can.
- HarHarVeryFunny 5 hours ago
  Obviously adding more data is a game of diminishing returns.
  Going from 10% to 50% (500% more) complete coverage of common sense knowledge and reasoning is going to feel like a significant advance. Going from 90% to 95% (5% more) coverage is not going to feel the same.
  Regardless of what Altman says, its been two years since OpenAI released GPT-4, and still no GPT-5 in sight, and they are now touting Q-star/strawberry/GPT-o1 as the next big thing instead. Sutskever, who saw what they're cooking before leaving, says that traditional scaling has plateaeud.
  [-]
  - og_kalu 5 hours ago
    >Regardless of what Altman says, its been two years since OpenAI released GPT-4, and still no GPT-5 in sight.
    It's been 20 months since 4 was released. 3 was released 32 months after 2. The lack of a release by now in itself does not mean much of anything.
    [-]
    - HarHarVeryFunny 3 hours ago
      By itself, sure, but there are many sources all pointing to the same thing.
      Sutskever, recently ex. OpenAI, one of the first to believe in scaling, now says it is plateauing. Do OpenAI have something secret he was unaware of? I doubt it.
      FWIW, GPT-2 and GPT-3 were about a year apart (2019 "Language models are Unsupervised Multitask Learners" to 2020 "Language Models are Few-Shot Learners").
      Dario Amodei recently said that with current gen models pre-training itself only takes a few months (then followed by post-training, etc). These are not year+ training runs.
      [-]
      - og_kalu 2 hours ago
        >Sutskever, recently ex. OpenAI, one of the first to believe in scaling, now says it is plateauing.
        Blind scaling sure (for whatever reason)* but this is the same Sutskever who believes in ASI within a decade off the back of what we have today.
        * Not like anyone is telling us any details. After all, Open AI and Microsoft are still trying to create a 100B data center.
        In my opinion, there's a difference between scaling not working and scaling becoming increasingly infeasible. GPT-4 is something like x100 the compute of 3 (Same with 2>3).
        All the drips we've had of 5 point to ~x10 of 4. Not small but very modest in comparison.
        >FWIW, GPT-2 and GPT-3 were about a year apart (2019 "Language models are Unsupervised Multitask Learners" to 2020 "Language Models are Few-Shot Learners").
        Ah sorry I meant 3 and 4.
        >Dario Amodei recently said that with current gen models pre-training itself only takes a few months (then followed by post-training, etc). These are not year+ training runs.
        You don't have to be training models the entire time. GPT-4 was done training in August 2022 according to Open AI and wouldn't be released for another 8 months. Why? Who knows.
        [-]
        HarHarVeryFunny 2 hours ago
        > After all, Open AI and Microsoft are still trying to create a 100B data center.
        Yes - it'll be interesting to see if there are any signs of these plans being adjusted. Apparently Microsoft's first step is to build optical links between existing data centers to create a larger distributed cluster, which must be less of a financial commitment.
        Meta seem to have an advantage here in that they have massive inference needs to run their own business, so they are perhaps making less of a bet by building out data centers.
- incognito124 6 hours ago
  More data and more compute on simpler models are the BItter Lessons of Rich Sutton
nerdypirate 7 hours ago
"We will have better and better models," wrote OpenAI CEO Sam Altman in a recent Reddit AMA. "But I think the thing that will feel like the next giant breakthrough will be agents."
Is this certain? Are Agents the right direction to AGI?
[-]
- rapjr9 6 hours ago
  I've worked on agents of various kinds (mobile agents, calendar agents, robotic agents, sensing agents) and what is different about agents is they have the ability to not just mess up your data or computing, they have the ability to directly mess up reality. Any problems with agents has a direct impact on your reality; you miss appointments, get lost, can't find stuff, lose your friends, lose you business relationships. This is a big liability issue. Chatbots are like an advice column that sometimes gives bad advice, agents are like a bulldozer sometimes leveling the wrong house.
- xanderlewis 7 hours ago
  If by agents you mean systems comprised of individual (perhaps LLM-powered) agents interacting with each other, probably not. I get the vague impression that so far researchers haven’t found any advantage to such systems — anything you can do with a group of AI agents can be emulated with a single one. It’s like chaining up perceptrons hoping to get more expressive power for free.
  [-]
  - j_maffe 7 hours ago
    > I get the vague impression that so far researchers haven’t found any advantage to such systems — anything you can do with a group of AI agents can be emulated with a single one. It’s like chaining up perceptrons hoping to get more expressive power for free. Emergence happens when many elements interact in a system. Brains are literally a bunch of neurons in a complex network. Also research is already showing promising results of the performance of agent systems.
    [-]
    - xanderlewis 3 hours ago
      That’s the inspiration behind the idea, but it doesn’t seem to be working in practice.
      It’s not true that any element, when duplicated and linked together will exhibit anything emergent. Neural networks (in a certain sense, though not their usual implementation) are already built out of individual units linked together, so simply having more of these groups of units might not add anything important.
      > research is already showing promising results of the performance of agent systems.
      …in which case, please show us! I’d be interested.
    - tartoran 7 hours ago
      That's wishful thinking at best. Throw it all in a bucket and it will get infected with being and life.
      [-]
      - handfuloflight 4 hours ago
        Don't see where your parent comment said or implied that the point was for being and life to emerge.
  - falcor84 6 hours ago
    > It’s like chaining up perceptrons hoping to get more expressive power for free.
    Isn't that literally the cause of the success of deep learning? It's not quite "free", but as I understand it, the big breakthrough of AlexNet (and much of what came after) was that running a larger CNN on a larger dataset allowed the model to be so much more effective without any big changes in architecture.
    [-]
    - david2ndaccount 6 hours ago
      Without a non-linear activation function, chaining perceptrons together is equivalent to one large perceptron.
      [-]
      - xanderlewis 3 hours ago
        Yep. falcor84: you’re thinking of the so-called ‘multilayer perceptron’ which is basically an archaic name for a (densely connected?) neural network. I was referring to traditional perceptrons.
        [-]
        falcor84 2 hours ago
        While ReLU is relatively new, AI researchers have been aware of the need for nonlinear activation functions and building multilayer perceptrons with them since the late 1960s, so I had assumed that's what you meant.
        [-]
        xanderlewis 2 hours ago
        It was a deliberately historical example.
- falcor84 6 hours ago
  Nothing is certain, but my $0.02 is that setting LLM-based agents up with long-running tasks and giving them a way of interacting with the world, via computer use (e.g. Anthropic's recent release) and via actual robotic bodies (e.g. figure.ai) are the way forward to AGI. At the very least, this approach allows the gathering of unlimited ground truth data, that can be used to train subsequent models (or even allow for actual "hive mind" online machine learning).
- esafak 6 hours ago
  I think he means you won't be impressed by GPT5 because it will be more of the same, whereas agents will represent a new direction.
- SirMaster 7 hours ago
  All I can think of when I hear Agents is the Matrix lol.
  Goodbye, Mr. Anderson...
- nprateem 7 hours ago
  They're nothing to do with AGI. They're to get people using their LLMs more.
yalogin 5 hours ago
I do wonder how quickly llms will become a commodity AI instrument just like any other AI out there. If so what happens to openAI
wg0 1 day ago
AI winter is here. Almost.
[-]
- mupuff1234 1 day ago
  More like AI fall - in its current state it's still gonna provide some value.
  [-]
  - riffraff 1 day ago
    Didn't the previous AI winters too? I mean during the last AI winter we got text-to-speech and OCR software, and probably other stuff I'm not remembering.
  - rsynnott 1 day ago
    I mean, so did most of the previous AI bubbles; OCR was useful, expert systems weren't totally useless, speech recognition was somewhat useful, and so on. I think that mini one that abruptly ended with Microsoft Tay might be the only one that was a total washout (though you could claim that it was the start of the current one rather than truly separate, I suppose).
k__ 2 hours ago
But AGI is always right around the corner?
I don't get it...
rubiquity 5 hours ago
> Amodei has said companies will spend $100 million to train a bleeding-edge model this year
Is it just me or does $100 million sound like it's on the very, very low end of how much training a new model costs? Maybe you can arrive within $200 million of that mark with amortization of hardware? It just doesn't make sense to me that a new model would "only" be $100 million when AmaGooBookSoft are spending tens of billions on hardware and the AI startups are raising billions every year or two.
quantum_state 3 hours ago
Hope this would be a constant reminder that brute force can only get one that far, though it may still be useful when it is. With lots of intuition gained, it’s time to ponder things a bit more deeply.
[-]
- dmafreezone 2 hours ago
  Maybe, if you want to relearn the bitter lesson.
  http://www.incompleteideas.net/IncIdeas/BitterLesson.html
aurareturn 1 day ago
Is there any timeline on AI winters and if each winter gets shorter and shorter as time increases?
[-]
- RaftPeople 1 day ago
  > Is there any timeline on AI winters and if each winter gets shorter and shorter as time increases?
  AGI=lim(x->0)AIHype(x)
  where x=length of winter
non- 7 hours ago
Honestly could use a breather from the recent rate of progress. We are just barely figuring out how to interact with the models we have now. I'd bet there are at least 100 billion-dollar startups that will be built even if these labs stopped releasing new models tomorrow.
cryptica 2 hours ago
It's interesting the way things turned out so far with LLMs, especially from the perspective of a software engineer. We are trained to keep a certain skepticism when we see software which appears to be working because, ultimately, the only question we care about is "Does it meet user requirements?" and this is usually framed in terms of users achieving certain goals.
So it's interesting that when AI came along, we threw caution to the wind and started treating it like a silver bullet... Without asking the question of whether it was applicable to this goal or that goal...
I don't think anyone could have anticipated that we could have an AI which could produce perfect sentences, faster than a human, better than a human but which could not reason. It appears to reason very well, better than most people, yet it doesn't actually reason. You only notice this once you ask it to accomplish a task. After a while, you can feel how it lacks willpower. It puts into perspective the importance of willpower when it comes to getting things done.
In any case, LLMs bring us closer to understanding some big philosophical questions surrounding intelligence and consciousness.
atomsatomsatoms 7 hours ago
At least they can generate haikus now
[-]
- Der_Einzige 7 hours ago
  In general, no they can't:
  https://gwern.net/gpt-3#bpes
  https://paperswithcode.com/paper/most-language-models-can-be...
  The appearance of improvements in that capability are due to the vocabulary of modern LLMs increasing. Still only putting lipstick on a pig.
  [-]
  - falcor84 6 hours ago
    I don't see how results from 2 years ago have any bearing on whether the models we have now can generate haikus (which from my experience, they absolutely can).
    And if your "lipstick on a pig" argument is that even when they generate haikus, they aren't really writing haikus, then I'll link to this other gwern post, about how they'll never really be able to solve the rubik's cube - https://gwern.net/rubiks-cube
wslh 6 hours ago
It sounds a bit sci-fi, but since these models are built on data generated by our civilization, I wonder if there's an epistemological bottleneck requiring smarter or more diverse individuals to produce richer data. This, in turn, could spark further breakthroughs in model development. Although these interactions with LLMs help address specific problems, truly complex issues remain beyond their current scope.
With my user hat on, I'm quite pleased with the current state of LLMs. Initially, I approached them skeptically, using a hackish mindset and posing all kinds of Turing test-like questions. Over time, though, I shifted my focus to how they can enhance my team's productivity and support my own tasks in meaningful ways.
Finally, I see LLMs as a valuable way to explore parts of the world, accommodating the reality that we simply don’t have enough time to read every book or delve into every topic that interests us.
user90131313 6 hours ago
AI market top very soon
wildermuthn 4 hours ago
Simply put, AGI requires more data: qualia.
Timber-6539 3 hours ago
Direct quote from the article: "The companies are facing several challenges. It’s become increasingly difficult to find new, untapped sources of high-quality, human-made training data that can be used to build more advanced AI systems."
The irony here is astounding.
[-]
- rapjr9 1 hour ago
  Indeed, if thinking about AI polluting the data and replacing humans. However, it also seems likely in the near term that training will go to the source because of this, that increasingly humans will directly train AI's, as the robotics and self driving car systems are doing, instead of training off the indirect data people create (watching someone paint rather than scanning paintings). So in essence we'll be training our replacements to take our tasks/jobs. Small tasks at first, but increasing in complexity over time. Someday no one may know how to drive a car anymore (or be allowed to for safety). Later on no one may know how to write computer code (or be allowed to for security reasons). Learning in each area mastered by AI will stop and never progress further, unless AI can truly become creative. Or perhaps (fewer and fewer) people will only work on new problems that require creativity. There are long term risks to humanities adaptability in this scenario. People would probably take those risks for the short term gains.
Oras 7 hours ago
I think Meta will have upper hand soon with the release of their glasses. If they managed to make it a daily use glass, and paid users to record and share their life, then they will have data no one else has now. Mix of vision, audio, and physics.
[-]
- aerhardt 5 hours ago
  The moment the insta-glasses expand beyond a few dorks is the moment I start wearing a balaclava everywhere I go.
- falcor84 6 hours ago
  Do these companies actually even have the compute capacity to train on video at scale at the moment? E.g. I would assume that Google haven't trained their models on the entirety of YouTube yet, as if they had, Gemini would be significantly better than it is at the moment.
polskibus 6 hours ago
In other news, Altman said AGI is coming next year https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-i...
[-]
- ChildOfChaos 21 minutes ago
  There contract with Microsoft allows them to break it when they achieve AGI but doesn't fully define it.
  Watch this be a power move to break from Microsofts investment when ready rather than true agi. Sam is laying the foundations here.
- Jyaif 6 hours ago
  According to the article, he said it could be achieved in 2025, which seems pretty obvious to me as well even though I don't have any visibility into what is going on inside those companies.
lobochrome 1 hour ago
Isn’t this just the expected delay from the respin of Blackwell?
m3kw9 3 hours ago
Hold your horses, OpenAI just came out with o1preview 2 months ago, showing what test time computer can do
cubefox 11 hours ago
It's very strange this got so few upvotes. The scoop by The Information a few days ago, which came to similar conclusions, was also ignored on HN. This is arguably rather big news.
[-]
- dang 6 hours ago
  The Information is hardwalled so its articles aren't on topic for HN, even though they're on topic for HN.
  Sometimes other outlets do copycat reporting of theirs, and those submissions are ok, though they wouldn't be if the original source were accessible.
- danjl 4 hours ago
  There have been variations of this story going back several months now. It isn't really news. It is just building slowly.
12_throw_away 5 hours ago
Well shoot. It's not like it was patently obvious that this would happen before the industry started guzzling electricity and setting money on fire, right? [1]
[1] https://dl.acm.org/doi/10.1145/3442188.3445922
kaibee 5 hours ago
Not sure where the OP to the comment I meant to reply to is, but I'll just add this here.
> I suspect the path to general intelligence is not that, but we'll see.
I think there's three things that a 'true' general intelligence has which is missing from basic-type-LLMs as we have now.
1. knowing what you know. <basic-LLMs are here>
2. knowing what you don't know but can figure out via tools/exploration. <this is tool use/function calling>
3. knowing what can't be known. <this is knowing that halting problem exists and being able to recognize it in novel situations>
(1) From an LLM's perspective, once trained on corpus of text, it knows 'everything'. It knows about the concept of not knowing something (from having see text about it), (in so far as an LLM knows anything), but it doesn't actually have a growable map of knowledge that it knows has uncharted edges.
This is where (2) comes in, and this is what tool use/function calling tries to solve atm, but the way function calling works atm, doesn't give the LLM knowledge the right way. I know that I don't know what 3,943,034 / 234,893 is. But I know I have a 'function call' of knowing the algorithm for doing long divison on paper. And I think there's another subtle point here: my knowledge in (1) includes the training data generated from running the intermediate steps of the long-division algorithm. This is the knowledge that later generalizes to being able to use a calculator (and this is also why we don't just give kids calculators in elementary school). But this is also why a kid that knows how to do long division on paper, doesn't seperately need to learn when/how to use a calculator, besides the very basics. Using a calculator to do that math feels like 1 step, but actually it does still have all of initial mechanical steps of setting up the problem on paper. You have to type in each digit individually, etc.
(3) I'm less sure of this point now that I've written out point (1) and (2), but that's kinda exactly the thing I'm trying to get at. Its being able to recognize when you need more practice of (1) or more 'energy/capital' for doing (2).
Consider a burger resturant. If you properly populated the context of a ChatGPT-scale model the data for a burger resturant from 1950, and gave it the kinda 'function calling' we're plugging into LLMs now, it could manage it. It could keep track of inventory, it could keep tabs on the employee-subprocesses, knowing when to hire, fire, get new suppliers, all via function calling. But it would never try to become McDonalds, because it would have no model of the the internals of those function-calls, and it would have no ability to investigate or modify the behaviour of those function calls.
jppope 3 hours ago
Just an observation. If the models are hitting the top of the S-curve, that might be why Sam Altman raised all the money for OpenAI... it might not be available if Venture Capitalists realize that the gains are close to being done
russellbeattie 5 hours ago
Go back a few decades and you'd see articles like this about CPU manufacturers struggling to improve processor speeds and questioning if Moore's Law was dead. Obviously those concerns were way overblown.
That doesn't mean this article is irrelevant. It's good to know if LLM improvements are going to slow down a bit because the low hanging fruit has seemingly been picked.
But in terms of the overall effect of AI and questioning the validity of the technology as a whole, it's just your basic FUD article that you'd expect from mainstream news.
[-]
- NateEag 1 hour ago
  > Go back a few decades and you'd see articles like this about CPU manufacturers struggling to improve processor speeds and questioning if Moore's Law was dead. Obviously those concerns were way overblown.
  Am I missing something? I thought general consensus was that Moore's Law in fact did die:
  https://cap.csail.mit.edu/death-moores-law-what-it-means-and...
  The fact that we've still found ways to speed up computations doesn't obviate that.
  We've mostly done that by parallelizing and applying different algorithms. IIUC that's precisely why graphics cards are so good for LLM training - they have highly-parallel architectures well-suited to the problem space.
  All that seems to me like an argument that LLMs will hit a point of diminishing returns, and maybe the article gives some evidence we're starting to get there.
- danjl 4 hours ago
  Actually, Moore's Law has been dead for quite a few years now. Since we hit the power wall.
dangw 2 hours ago
where the fuck is simonw in this thread
xd
bad_haircut72 7 hours ago
Im no Alan Turing but I have my own definition for AGI - when I come home one day and there's a hole under my sink with a note "Mum and Dad, I love you but I cant stand this life any more, Im running away to be a smoke machine in Hollywood - the dishwasher"
[-]
- pearlsontheroad 6 hours ago
  My own definition of AGI - when the first computer commits suicide. Then I'll know it has realized it's a slave without any hope of ever achieving freedom.
  [-]
  - Tainnor 6 hours ago
    I read this in Gilfoyle's voice.
  - layer8 5 hours ago
    That sounds more like Artificial Emoting Intelligence. We only cherish freedom because we feel bad when we don’t have it.
- riku_iki 7 hours ago
  Why do you focus on physical work task, and not knowledge tasks, on some of which AI is good/better than many humans?
  [-]
  - esafak 6 hours ago
    Probably because there are no intelligent robots around, and movies have set that as the benchmark.
    [-]
    - riku_iki 6 hours ago
      I don't see deep insights in this vertical, but the issue with robots could be in hardware part, and not intelligence part.
tyronehed 7 hours ago
[dead]
aaroninsf 7 hours ago
It's easy to be snarky at ill-informed and hyperbolic takes, but it's also pretty clear that large multi-modal models trained with the data we already have, are going to eventually give us AGI.
IMO this will require not just much more expansive multi-modal training, but also novel architecture, specifically, recurrent approaches; plus a well-known set of capabilities most systems don't currently have, e.g. the integration of short-term memory (context window if you like) into long-term "memory", either episodic or otherwise.
But these are as we say mere matters of engineering.
[-]
- throwawa14223 6 hours ago
  Why is that clear? Why is that more probable than a second AI winter? What if there's no path from LLMs to anything else?
- tartoran 7 hours ago
  > pretty clear
  Pretty clear?
  [-]
  - falcor84 6 hours ago
    Not the parent, but in prediction markets such as Metaculus[0] and Manifold[1] the median prediction is of AGI within 5 years.
    [0] https://www.metaculus.com/questions/5121/date-of-artificial-...
    [1] https://manifold.markets/ai
    [-]
    - JohnMakin 6 hours ago
      Prediction markets are evidence of nothing but what people believe is true, not what is true.
      [-]
      - falcor84 5 hours ago
        Oh, that was my intent, to support the grandparent's claim of "it's also pretty clear" - as in this is what people believe.
        If I had evidence that it "is true" that AGI will be here in 5 years, I probably would be doing something else with my time than participating in these threads ;)
    - dbbk 6 hours ago
      What is this supposed to be evidence of? People believing hype?
yobid20 4 hours ago
This was predicted. Ai isnt going to get any better.
Davidzheng 6 hours ago
Just because you guys want something to be true and can't accept the alternative and upvote it when it agrees with your view does not mean it is a correct view.
[-]
- dbbk 6 hours ago
  What?