Amateur armed with ChatGPT solves an Erdős problem

(scientificamerican.com)

170 points | by pr337h4m 10 hours ago

19 comments

  • adamgordonbell 2 hours ago
    Here is the chat:

        don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem.
    
        {{problem}}
    
        REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
    
    Then "Thought for 80m 17s"

    https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...

    • cryptoegorophy 1 hour ago
      Mine took 20min. Pro. https://chatgpt.com/share/69ed83b1-3704-8322-bcf2-322aa85d7a... But I wish I were math-smart enough to know whether it worked or not.
    • nycdatasci 26 minutes ago
      [dead]
    • ipaddr 1 hour ago
      Tried the same prompt and ended up nowhere close on the free plan.
      • jasonfarnon 1 hour ago
        Is there a known lag before the Pro plan's abilities migrate to the free plans?
        • brianjking 1 hour ago
          As far as consumer access goes, GPT 5.5 Pro is not available on any plan outside the ChatGPT Pro ($100 or $200) tier or the API.
          • jasonfarnon 1 hour ago
            Yes, but don't we expect GPT 5.5 Pro will eventually make it to the free tier? Maybe I'm missing something because I only use the free tier. But the free tier has gotten way better over the last few years. I'm pretty sure, based on descriptions on this site from paid subscribers, that the free tier now is better than the paid tier of, say, 2 years ago. That's the lag I'm wondering about.
            • manfromchina1 14 minutes ago
              Free ChatGPT is like a fast car with a barely responsive steering wheel. The guardrails on that thing are insane, even for math. It won't let you think. It will try to fix mistakes you haven't even made yet, based on intent that was ascribed to you for no reason. It veers off in some crazy directions thinking that's what you meant, and trying to address even a little bit of that creates almost a combinatorial explosion of even more wrong things. That's why I stick with Claude. The latter is chill and only addresses what you typed, isn't verbose, and actually asks what you're getting at with your post. That said, ChatGPT is more technical and can easily solve math problems that stump Claude.
            • vessenes 23 minutes ago
              I do not think this is true. You will continue to get smaller, cheaper-to-host models in the free tier that are distilled from current and former frontier models. They will continue to improve, but I’d be very surprised if, e.g., 5.4-mini (I think this is the free tier model) beat o3 on many benchmarks, or real world use cases.

              I won’t even leave chatGPT on “Auto” under any circumstances - it’s vastly worse on hallucinations, sycophancy, everything, basically.

              Anyway, your needs may be met perfectly fine on the free tier product, but you’re using a very different product than the Pro tier gets.

            • hyraki 48 minutes ago
              You should pay for it if you find value in it.
              • amazingman 6 minutes ago
                They pay for it with their personal data.
        • andai 1 hour ago
          Tangential but I learned today that GPT-5.5 in ChatGPT (Plus) has a smaller context window than the one in the API. (Or at least it thinks it does.)

          I'd guess / hope the Pro one has the full context window.

          • refulgentis 27 minutes ago
            Notably, on the API, 5.5 charges a higher price for context beyond what ChatGPT gets, and 5.5 Pro on the API does not differentiate based on context size (it's eye-bleedingly expensive already :)
        • vessenes 1 hour ago
          Do not use the free plan. It is not good.
      • Someone1234 1 hour ago
        Does the free plan even have access to thinking models?
        • jychang 1 hour ago
          Technically yes, gpt-5.4-mini is available on the free plan
      • Matticus_Rex 1 hour ago
        Was this a surprise?
    • ArtIntoNihonjin 4 minutes ago
      When shall we concede that the AI has real intelligence? All we'll ever need is ChatGPT + Coq = mathematics solved.
  • userbinator 1 hour ago
    The LLM took an entirely different route, using a formula that was well known in related parts of math, but which no one had thought to apply to this type of question.

    Of course LLMs are still absolutely useless at actual maths computation, but I think this is one area where AI can excel --- the ability to combine many sources of knowledge and synthesise may sometimes yield very useful results.

    Also reminds me of the old saying, "a broken clock is right twice a day."

    • jaggederest 1 hour ago

          > Every Mathematician Has Only a Few Tricks
          > 
          > A long time ago an older and well-known number theorist made some disparaging remarks about Paul Erdös’s work.
          > You admire Erdös’s contributions to mathematics as much as I do,
          > and I felt annoyed when the older mathematician flatly and definitively stated
          > that all of Erdös’s work could be “reduced” to a few tricks which Erdös repeatedly relied on in his proofs.
          > What the number theorist did not realize is that other mathematicians, even the very best,
          > also rely on a few tricks which they use over and over.
          > Take Hilbert. The second volume of Hilbert’s collected papers contains Hilbert’s papers in invariant theory.
          > I have made a point of reading some of these papers with care.
          > It is sad to note that some of Hilbert’s beautiful results have been completely forgotten.
          > But on reading the proofs of Hilbert’s striking and deep theorems in invariant theory,
          > it was surprising to verify that Hilbert’s proofs relied on the same few tricks.
          > Even Hilbert had only a few tricks!
          > 
          > - Gian-Carlo Rota - "Ten Lessons I Wish I Had Been Taught"
      
      https://www.ams.org/notices/199701/comm-rota.pdf
      • yayachiken 12 minutes ago
        I think that when thinking about progress as a society, people need to internalize better that all of us, without exception, are in this world for the first time.

        We may have collectively filled libraries full of books, and created yottabytes of digital data, but in the end to create something novel somebody has to read and understand all of this stuff. Obviously this is not possible. Read one book per day from birth to death and you still only get to consume like 80*365=29200 books in the best case, from the millions upon millions of books that have been written.

        So these "few tricks" are the accumulation of a lifetime of mathematical training, the culmination of the slice of knowledge that the respective mathematician immersed themselves into. To discover new math and become famous you need both the talent and skill to apply your knowledge in novel ways, but also be lucky that you picked a field of math that has novel things with interesting applications to discover plus you picked up the right tools and right mental model that allows you to discover these things.

        This goes not only for math but for pretty much every other non-trivial field. There is a reason why history repeats itself.

        And it's actually a compelling argument why AI is still a big deal even though it's at its core a parrot. It's a parrot yes, but compared to a human, it actually was able to ingest the entirety of human knowledge.

    • nopinsight 39 minutes ago
      > "a broken clock is right twice a day."

      The combinatorial nature of trying things randomly means that it would take millennia or longer for light-speed monkeys typing at a keyboard, or GPUs, to solve such a problem without direction.

      By now, people should stop dismissing RL-trained reasoning LLMs as stupid, aimless text predictors or combiners. They wouldn’t say the same thing about high-achieving, but non-creative, college students who can only solve hard conventional problems.

      Yes, current LLMs likely still lack some major aspects of intelligence. They probably wouldn’t be able to come up with general relativity on their own with only training data up to 1905.

      Neither did the vast majority of physicists back then.

      • amazingman 0 minutes ago
        > Yes, current LLMs likely still lack some major aspects of intelligence.

        Indeed, and so do humans! And just like LLMs, humans are bad at keeping this fact in view.

        On a more serious note, we're going to have a hard time until we can psychologically decouple the concepts of intelligence and consciousness. Like, an existentially hard time.

    • y0eswddl 1 hour ago
      Yeah, they're great at interpolation - they'll just never be worth much at extrapolation.
      • SR2Z 48 minutes ago
        Luckily for us, whole fortunes can be made by filling in the blanks between what we know and what we realize.
    • keyle 1 hour ago
      The ultimate generalist
    • karlgkk 1 hour ago
      Also just the sheer value of brute force.

      80 hours! 80 hours of just trying shit!

      • FrasiertheLion 1 hour ago
        It's 80 minutes, not 80 hours.
        • jasonfarnon 1 hour ago
          and you can be sure mathematicians spent way more than 80 hrs on it
        • ChrisGreenHeur 1 hour ago
          80 minutes! 80 minutes of just trying shit!
          • peteforde 1 hour ago
            ... shit that solved an apparently significant Erdős problem.

            That is not nothing, no matter how much you hate AI.

            • userbinator 1 hour ago
              It shows that AI is apparently very good at brute-forcing.
              • TOMDM 18 minutes ago
                Are the human mathematicians who wanted to solve this problem just too stupid to brute force for 80 minutes?
              • alex_sf 38 minutes ago
                This isn't brute force.
      • brokencode 1 hour ago
        How long do you figure it’d take to solve the problem yourself?
    • tptacek 1 hour ago
      Wait, what do you mean "LLMs are still absolutely useless at actual maths computation"? I rely on them constantly for maths (linear algebra, multivariable calc, stat) --- literally thousands of problems run through GPT5 over the last 12 months, and to my recollection zero failures. But maybe you're thinking of something more specific?
      • nozzlegear 6 minutes ago
        > literally thousands of problems run through GPT5 over the last 12 months, and to my recollection zero failures

        Damn, I'd start buying lottery tickets if I had that kind of luck.

      • schneems 1 hour ago
        They are bad at math. But they are good at writing code, and as an optimization some providers have the model secretly write code to answer the problem, run it, and give you the answer without telling you what it did in the middle.
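
        Roughly, the pattern looks like the sketch below (the `ask_model` helper and the example problem are hypothetical stand-ins; the real provider plumbing isn't public):

            # Hypothetical tool-use loop: the model emits a small program instead of
            # doing the arithmetic itself, the harness runs it, and only the final
            # answer is shown to the user.

            def ask_model(prompt: str) -> str:
                # Stand-in for whatever chat API the provider calls internally.
                return "result = sum(i * i for i in range(1, 101))"

            def solve_with_tool(question: str) -> int:
                code = ask_model(f"Write Python that computes: {question}")
                scope = {}
                exec(code, scope)       # executed out of sight of the user...
                return scope["result"]  # ...only the number comes back

            print(solve_with_tool("the sum of the squares of 1..100"))  # 338350
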
        • avaer 1 hour ago
          Someone should tell the mathematicians that if they use a calculator or a whiteboard or, heaven forbid, a computer, they are "bad at math".
        • tptacek 22 minutes ago
          What would I do to demonstrate that they are bad at math? If by "maths" we mean things like working out a double integral for a joint probability problem, or anything simpler than that, GPT5 has been flawless.
        • tempaccount5050 54 minutes ago
          Are they bad at math? Or are they bad at arithmetic?
          • tptacek 2 minutes ago
            Neither.
          • lacunary 39 minutes ago
            if you don't know much math, it's easy to confuse the two
      • jasonfarnon 1 hour ago
        What tier are you using? I have run lots of problems and am very impressed, but I find stupid errors a lot more frequently than that, e.g., arithmetic errors buried in a derivation or a bad definition, say 1/15 times. I would love to get zero failures out of thousands of (what sounds like college-level math) posed problems.
        • tptacek 21 minutes ago
          I have a standard OpenAI/ChatGPT Pro account; GPT5 is my daily driver for math, and Claude for code.
  • ripped_britches 1 hour ago
    At this point we should make a GitHub repo with a huge list of unsolved “dry lab” problems and spin up a harness to try and solve them all every new release.
    • abdullahkhalids 35 minutes ago
      There is in fact just such a repo maintained by Terence Tao and other mathematicians [1] who are actively using LLMs to try to find solutions to them.

      [1] https://github.com/teorth/erdosproblems

      • vessenes 21 minutes ago
        …and this problem was in fact sourced directly from that list!
    • johntopia 45 minutes ago
      that's actually a brilliant idea
  • resident423 1 hour ago
    I wonder if the rationalizations people come up with for why this isn't real intelligence will be as creative as ChatGPT's solution.
    • vatsachak 0 minutes ago
      Well it still gets easy problems wrong

      With real general intelligence you'd expect it to solve problems above a certain difficulty at a good clip

    • thesmtsolver2 46 minutes ago
      Remember when people thought multiplying numbers, remembering a large number of facts, and being good at rote calculations was intelligence?

      Some people still think that multiplying numbers, remembering a large number of facts, and being good at calculations is intelligence.

      Most intelligent people do not think that.

      Eventually, we will arrive at the same conclusion for what LLMs are doing now.

      • resident423 26 minutes ago
        Remember when people thought solving Erdős problems required intelligence? Is there anything an LLM could ever do that would count as intelligence? Surely the trend has to break at some point; if so, what would be the thing that crosses the line into real intelligence?
        • noosphr 18 minutes ago
          I've spent a good chunk of time formalising mathematics.

          Doing formalized mathematics is as intelligent as multiplying numbers together.

          The only reason why it's so hard now is that the standard notation is the equivalent of Roman numerals.

          When you start using a sane metalanguage, and not just augmented English, to do proofs, you gain the same increase in capabilities as going from word equations to algebra.
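
          For readers who haven't seen formalized mathematics, here is roughly what it looks like: a toy statement and proof in Lean 4 (just an illustration of the idea; I'm not claiming any particular proof assistant here):

              -- Commutativity of addition on the naturals, discharged by a
              -- library lemma; the checker verifies every step mechanically.
              theorem my_add_comm (a b : Nat) : a + b = b + a :=
                Nat.add_comm a b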

    • famouswaffles 18 minutes ago
      None of it is really from logical thought. The rationalizations don't make any sense, but they haven't for a while. It's an emotional response. Honestly, it's to be expected.
    • bsder 4 minutes ago
      Everybody who retried the problem on ChatGPT spent roughly the same amount of time on it.

      Do you not see the issue?

    • walrus01 1 hour ago
      For one, everything its 'intelligence' knows about solving the problem is contained within the finite context window of the particular model and session. Unless the contents of the context window are saved to storage and reloaded later, then unlike a human, it won't "remember" that it solved the problem or save its work somewhere to be easily referenced later.
      • jychang 1 hour ago
        There are humans who have memory issues, or full-blown anterograde amnesia.
        • emp17344 40 minutes ago
          There are humans who can’t read. That doesn’t mean Grammarly is “intelligent”. These things are tools - nothing more, nothing less.
      • resident423 1 hour ago
        What you're describing sounds more like the model lacking awareness than lacking intelligence? Why does it need to know it solved the problem to be intelligent?
        • walrus01 1 hour ago
          We say African elephants are intelligent for a number of reasons, one of which is that they remember where sources of water are in very dry conditions and can successfully navigate back to them across relatively large distances. An intelligent being that can't remember its own past is at a significant disadvantage compared to others that can, which is exactly one of the reasons why Alzheimer's patients often require full-time caregivers.
          • resident423 48 minutes ago
            There's probably a limit to how intelligent something can be with no long term memory, but solving Erdos problems in 80 minutes is clearly not above it, and I think the true limit is probably much higher than that.
          • peteforde 1 hour ago
            You are confusing lack of intelligence with the presence of impairment.
      • bpodgursky 37 minutes ago
        All modern harnesses write memory files for context later.
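
        A minimal sketch of that idea, with a hypothetical file name and layout (real harnesses each have their own format):

            import json
            from pathlib import Path

            MEMORY = Path("memory.json")  # hypothetical per-project memory file

            def recall() -> list[str]:
                # Notes from earlier sessions, to be prepended to the next prompt.
                return json.loads(MEMORY.read_text()) if MEMORY.exists() else []

            def remember(note: str) -> None:
                # Persist something learned this session so a later one can reuse it.
                notes = recall()
                notes.append(note)
                MEMORY.write_text(json.dumps(notes, indent=2))
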
    • techblueberry 1 hour ago
      "This is real intelligence" is the bear position, so I think it's real intelligence.
    • tomlockwood 59 minutes ago
      I think one day the VCs will have given the monkeys on typewriters enough money that these kinds of comments can be generated without human intervention.
    • otabdeveloper3 5 minutes ago
      [dead]
    • catcowcostume 35 minutes ago
      You're really telling on yourself if you think an LLM is intelligence
    • 0xBA5ED 1 hour ago
      And how about the creative rationalizations about how statistical text generation is actual intelligence? As if there is any intent or motive behind the words that are generated or the ability to learn literally any new thing after it has been trained on human output?
      • tptacek 3 minutes ago
        2022 called, wants this argument back. When you're "statistically generating text" to find zero-day vulnerabilities in hard targets, building Linux kernel modules, assembly-optimizing elliptic curve signature algorithms, and solving arbitrary undergraduate math problems instantaneously --- not to mention apparently solving Erdős problems --- the "statistical text" stuff has stopped being a useful description of what's happening; it's closer to "it's made of atoms and obeys the laws of thermodynamics" than to a real boundary condition on what it can accomplish.
      • resident423 32 minutes ago
        Solving open math problems is strong evidence of intelligence so there's not really any need for rationalization? I don't understand why intelligence would require intent or motive? Isn't intent just the behaviour of making a specific thing happen rather than other things?
        • x3ro 27 minutes ago
          I'm curious, do you think that this also applies to stable diffusion? Are these models "creative" too?
          • resident423 14 minutes ago
            I haven't used stable diffusion enough to have a strong opinion on it. But my thinking is LLMs have only recently started contributing novel solutions to problems, so maybe there is some threshold above which there's less sloppy remixing of training data and more ability to form novel insights, and image generators haven't crossed this line yet.
          • famouswaffles 17 minutes ago
            Yeah? Those models are creative.
        • 0xBA5ED 23 minutes ago
          The LLM did not solve the problem.
  • LPisGood 28 minutes ago
    Some Erdős problems are basically trivial using sophisticated techniques that were developed later.

    I remember one of my professors, a coauthor of Erdős, boasting to us after a quiz about how proud he was that he had been able to assign an Erdős problem that went unsolved for a while as just a quiz problem for his undergrads.

    • vessenes 20 minutes ago
      Tao mentions that the conventional approach for this problem seems to be a dead end, but it's apparently a super 'obvious' first step. This seems very hopeful to me — in that we now have a new line of approach to evaluate / assess for related problems.
  • debo_ 1 hour ago
    > “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says.

    This is how I feel when I read any mathematics paper.

  • winwang 14 minutes ago
    Obviously nowhere near Erdős-problem complexity, but I've been using GPT (in Codex) to prove a couple of theorems (for algos), and I've found it a bit better than Claude (Code) in this respect.
  • jzer0cool 13 minutes ago
    Could someone share a bit about the problem and the key part of the proof? For someone who just knows the basics of proofs.
  • ravenical 1 hour ago
  • Eufrat 1 hour ago
    Humans, and very often the machines we create, solve problems additively, meaning we build on top of existing foundations, and we can get stuck in a way of thinking as a result, because people are loath to reinvent the wheel. So I don't think it's surprising to take a naïve LLM and find that, because of the way it's trained, it came up with something that many experts in the field didn't try.

    I think LLMs can help in limited cases like this by just coming up with a different way of approaching a problem. It doesn’t have to be right, it just needs to give someone an alternative and maybe that will shake things up to get a solution.

    That said, I have no idea what the practical value of this Erdős problem is. If you asked me whether this demonstrates that LLMs are not junk, my general impression is that it's like asking me in 1928 whether we should spend millions of dollars of research money on number theory. The answer is no, and get out of my office.

  • iqihs 1 hour ago
    referring to Tao as just a 'mathematician' gave me a good chuckle
  • homo__sapiens 1 hour ago
    Big if true.
  • wizardforhire 1 hour ago
    WTF!?
  • haricomputer 1 hour ago
    [dead]
  • tomlockwood 1 hour ago
    My big question with all these announcements is: how many other people were using the AI on problems like this, and failing? Given the excitement around AI at the moment, I think the answer is: a lot.

    Then my second question is how much VC money did all those tokens cost.

    • ecshafer 43 minutes ago
      I've tried my hand at a few of the Erdős problems and came up short; you didn't hear about them. But if a mathematician at Harvard solved one, you would probably still hear about it a bit. Just the possibility that a Pro subscription running for 80 minutes solved an Erdős problem is astounding. Maybe we get some researchers to get a grant and burn a couple of data centers' worth of tokens for a day/week/month and see what it comes up with?
    • peteforde 1 hour ago
      Can you imagine how many bags of chips we could buy if we stopped funding cancer research?

      It's so expensive!

      • tomlockwood 56 minutes ago
        Can you imagine how much ChatGPT cancer research we could fund if we stopped funding cancer research?
    • gdhkgdhkvff 1 hour ago
      Why do you care about either of those questions?
      • tomlockwood 1 hour ago
        Because it could be a massive waste of time and money.
      • Eufrat 1 hour ago
        I think we should at least ask the latter: if it turned out it cost $100,000 to generate this solution, I would question the value of it. Erdős problems are usually pure math curiosities AFAIK. They often have no meaningful practical applications.
        • jasonfarnon 1 hour ago
          Also, it's one thing if the AI age means we all have to adapt to using AI as a tool, another thing entirely if it means the only people who can do useful research are the ones with huge budgets.
          • peteforde 1 hour ago
            Your logic undoes your point, because the kid who "solved" this technically didn't even have to invest in a degree.
            • tomlockwood 52 minutes ago
              America should fund tertiary education better, and that would solve even more problems.
              • peteforde 23 minutes ago
                Getting off-topic, but as a successful high-school dropout I am compelled to remind anyone reading this that [the American] college [system] is a scam.

                That's not to say that there aren't benefits to tertiary education, for many people in different contexts. It's just not the golden path that it's made out to be.

                Many people currently in college are just wasting their money and should enroll in trades programs instead.

                Meanwhile, nothing about being in or out of school is mutually exclusive to using LLMs as a force multiplier for learning - or solving math problems, apparently.

        • anematode 1 hour ago
          Neither do the Collatz conjecture, Fermat's last theorem, ....

          (Of course, those problems are on another plane than this one.)

          • Eufrat 1 hour ago
            But that’s exactly my point.

            These are absolutely worth studying, but being what they are, nobody should be dumping massive amounts of money on them. I would not find it persuasive if researchers used LLMs to solve the Collatz conjecture or finally decode Etruscan. These are extremely valuable, but it is unlikely to be worth it for an LLM just grinding tokens like crazy to do it.

            • anematode 1 hour ago
              Maybe... but I would love if 1% of the investment in AI were redirected to the mathematics education and professional research that would allow progress on any of these problems...
            • mhb 1 hour ago
              Is it worth it to buy a super-yacht?
        • inerte 1 hour ago
          I would question it at $60k. At $100k it's a steal.
        • dinkumthinkum 20 minutes ago
          No meaningful practical applications? You realize that sounds incredibly naive in the history of mathematics, right? People thought this way about number theory in general, and about many other things that turned out to have quite important practical applications. Your statement is also a bit odd in that researchers are already paid throughout their whole careers to solve such problems. I don't know.
  • mhb 1 hour ago
    > He’s 23 years old and has no advanced mathematics training.

    How is he even posing the question and having even a vague idea of what the proof means or how to understand it?

    • hx8 1 hour ago
      > “I didn’t know what the problem was—I was just doing Erdős problems as I do sometimes, giving them to the AI and seeing what it can come up with,” he says. “And it came up with what looked like a right solution.” He sent it to his occasional collaborator Kevin Barreto, a second-year undergraduate in mathematics at the University of Cambridge.

      Seems like standard 23-year-old behavior. You're spending $100-$200/mo on the pro subscription and want to get your money's worth, so you burn some tokens on this legendarily hard math problem sometimes. You've seen enough wrong answers to know that this one looks interesting, so you pass it on to a friend who actually knows math, who is at a place where experts can recognize it as correct.

      Seems like a classic example of an inexpert human labeling ML output.

    • ChrisGreenHeur 1 hour ago
      my guess would be due to having an interest in the field
  • ghstinda 1 hour ago
    Scientific American going out of business next, lol; weak headline. ChatGPT, let's have a better headline for the god among men who realized the capability of the new tool, which many underestimate or puff up needlessly. Fun times we live in. One love all.