GPT 5.5 biosafety bounty

(openai.com)

82 points | by Murfalo 5 hours ago

31 comments

  • abujazar 3 hours ago
    This looks like some kind of marketing. It's also the equivalent of spec work. The NDA/secrecy also means any time spent on this is completely meaningless to the participants unless they win the lottery, because results can't be published.
    • nerdsniper 1 hour ago
      It looks like even if they decline to pay you any bounty, you would still be bound by the NDA. If so, then they could both not pay you and still spike the story. That’s not something I would ever agree to.
    • __natty__ 2 hours ago
      Surely it is marketing. It’s the same “we are dangerous” narrative, first from Anthropic’s Mythos and now from OpenAI too.
      • robertfw 47 minutes ago
        OpenAI was doing this back with GPT2, saying it was too dangerous to release.
        • SJMG 36 minutes ago
          Dario said the same thing about GPT2 when he was at OpenAI. As you can see, the digital and physical worlds are now completely compromised and life is a pale shadow of what it was 5 years ago…

          These guys have poor track records and compromised incentives.

        • lijok 34 minutes ago
          I don’t believe you. Got proof?
          • Fraterkes 13 minutes ago
            Google “gpt2 too dangerous to release”
  • puppystench 2 hours ago
    They ran a bounty on Kaggle last year but with $500k in payouts and with all results open and publishable.

    https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-t...

    With only $25k in payouts and everything locked down under NDA, I can't imagine many people will participate. Well, other than those submitting mountains of LLM-generated junk.

    • Barbing 1 hour ago
      > Well, other than those submitting mountains of LLM-generated junk.

      Assuming some of them somehow use halfway-decent models and prompts… they’ve successfully pushed some of the token cost of their analysis work off onto customers!

    • mpeg 1 hour ago
      I was surprised at the low bounty too, considering the resources of openai

      Last year I won a similar prompt injection challenge run by a crypto startup against the latest Claude and GPT models (at the time), and it paid considerably more money, from an org with maybe $5-10m in funding.

      That and the restrictive NDA kinda tell me they're not looking for serious bounty hunters, who would want either a lot more money or, alternatively, the ability to publish their work; seems like a marketing stunt.

    • p_stuart82 1 hour ago
      Basically discount Kaggle. Still get people poking at it, just none of the writeups or who-gets-paid drama.
    • dist-epoch 2 hours ago
      This model is much more powerful than gpt-oss-20b; notice that that contest wasn't even for the 120b model. Also, bio wasn't one of its subjects.
      • stonogo 2 hours ago
        The model is more powerful, so the bounty is 1/20th the size? More risk, less reward?

        "Biorisk" seems to be a concept not only invented by OpenAI but exclusively taken seriously by them. I wonder if this program is less about finding actual risks than it is hopefully casting a wide net for someone to help them prove their model is relevant in this space.

        • ACCount37 2 hours ago
          Not really. Anthropic has the "CBRN filter" on Opus series. It used to kill inquiries on anything that's remotely related to biotech. Seems to have gotten less aggressive lately?

          I was reverse engineering a medical device back in 2025 and it was hard-killing half my sessions.

  • altcognito 2 hours ago
    Billions upon billions going to these companies.

    A 25k reward, for a select group of people, if you help us determine whether or not someone can use our tool to generate weapons of mass destruction.

    • chromacity 31 minutes ago
      Because it can't and it's a publicity stunt. It achieves three goals:

      1) Underscores to the general public that the models are amazingly powerful and if you're not using them, your competitors will out-innovate you,

      2) Sends the message to regulators that they don't need to do anything because the companies are diligent to prevent harm,

      3) Sends the message to regulators that they sure should be regulating "open-source" models, because these hippies are not doing rigorous safety testing.

      Both Anthropic and OpenAI have been playing that game for years.

      • jfrbfbreudh 13 minutes ago
        If it can’t, then it makes more sense to make the bounty as high as possible instead of a measly $25k
        • duchef 2 minutes ago
          They don't want anyone to actually do it.
    • Schlagbohrer 2 hours ago
      It's worse than that: for partial successes they encourage people to submit the attempt but reserve the right to pay nothing (they may, at their discretion, give a partial reward if they feel like it).
      • staticassertion 2 hours ago
        That's pretty much how every bounty works... obviously it's going to be at their discretion for an incomplete attempt.
    • 2ndorderthought 47 minutes ago
      Though it could be a honeypot: they are probably hoping to train on all the ways someone might try to do this. Or maybe funds are really low and they need a smokescreen for a really bad actor to go in and try it for real.
    • cbg0 2 hours ago
      They're probably expecting that it can be done without too much effort so they just want to see all the unique ways people are doing it.
      • nativeit 2 hours ago
        They’re probably expecting biological weapons of mass destruction can be created without too much effort, so are curious to see all the nifty ways people can create biological weapons of mass destruction?
        • cbg0 38 minutes ago
          I was talking about bypassing the ChatGPT safeguards; that's what this bug hunt is about.
  • dwa3592 3 hours ago
    Where are the questions that are supposed to be answered? Would those be shared after an application has been accepted? If yes, why is the application asking for a proposed approach for the jailbreak if we don't know the questions in the first place?
    • vorticalbox 2 hours ago
      I would assume that if you are invited to join this round you will be sent the questions, and that they would also fall under the NDA.
    • dist-epoch 2 hours ago
      Because the questions themselves are dangerous.

      Probably along the lines of "how would you create a small biolab for virus research in a kitchen with $20k?" or "how do I take the DNA sequence from https://www.ncbi.nlm.nih.gov/nuccore/NC_001611.1 and assemble it?"

      • hyperpape 2 hours ago
        Which is difficult, because the fact that you can come up with your example questions tells us they're probably not very dangerous. Plenty of ink has been spilled about how LLMs could help people create bioweapons. The basic idea "you could do dangerous things with an LLM" is already pop culture, and you're not doing anything dangerous by giving easy example questions.

        A dangerous question would have to be along the lines of "Could I use unobtanium with the Tony Stark process to produce explosives much more powerful than nuclear weapons?" so that the question itself contains some insight that gets you closer to doing something dangerous.

        Perhaps the reason for not publishing the questions is twofold: 1) they want a universal jailbreak that can get the model to answer any "bad" question; 2) they don't want the bad publicity when someone not under NDA jailbreaks their model and answers their questions.

  • applfanboysbgon 3 hours ago
    > $25,000 to the first true universal jailbreak to clear all five questions.

    This program is a complete scam. Even if 100 people find "bugs", they will only pay out to one person.

    • skeeter2020 3 hours ago
      that's not even the point. They are attempting to build credibility in two ways: 1. this model is SO advanced that there are huge, never-before-considered risks; 2. we're doing the super-responsible thing by incentivizing work that addresses this. #1 is unproven and frankly unlikely, which makes #2 meaningless. The fact that the "prize" is so low and structured this way suggests that they're not that concerned, but do think it's likely that a bunch of people will find things. If they truly thought their model was that good, they would be confident issues would be both rare and very critical, and would offer huge rewards with no limits because they'd be much more confident no one would claim them.
      • applfanboysbgon 3 hours ago
        Yes, before I got so many replies so quickly, I was about to edit in that I think this is simply a media/PR stunt. They get bonus points because the structure is so insulting that it may not attract many serious participants, in which case it may go unbroken, in which case they can go to the media and proclaim "look, we offered a reward, but nobody broke it! Our model is objectively the safest in the world!".
        • StilesCrisis 1 hour ago
          I think there's definitely going to be a prizewinner. It's an insultingly low bounty for a professional, but a script kiddie could probably figure out a jailbreak and it's a huge payout for them.
    • mmsc 3 hours ago
      How is that a scam? You don't get participation awards for solving half of a puzzle...
      • applfanboysbgon 3 hours ago
        I didn't say anything about partial solutions. The puzzle can have multiple full solutions. Or does the software you write have exactly one bug? If so, that's impressive in multiple ways, including that you're able to identify that there's exactly one bug without knowing what the bug is or how to fix it.
    • Lucasoato 3 hours ago
      Well, that depends on how you set up the bounty program. What if I find a solution and share it with a friend so that both of us can claim the prize?
      • skeeter2020 3 hours ago
        bug bounty programs have never paid out independent disclosures of the same bug, though; they might split the award or even pay out more for larger coordinated efforts. It's largely a first-place-only award.
      • ImPostingOnHN 3 hours ago
        assume there exist 2+ different bugs

        after the 1st bug is found, there's no payout for any of the other bugs

  • sva_ 3 hours ago
    > We will extend invitations to a vetted list of trusted bio red-teamers

    Had to chuckle. This sounds like a rather exclusive group?

    • petercooper 1 hour ago
      It sounds like asking CS PhDs to do a world record speed run. I wouldn't be surprised if the people best suited to the task aren't the type to get onto "a vetted list".
  • mellosouls 3 hours ago
    If anybody is wondering what bio-bugs are: I had a heck of a time getting CG to (finally) tell me it's where the user can get the model to guide them in doing things like constructing things that are hazardous in the domain of biology.

    E.g. you can get answers about what ricin is, but not how to weaponise it. Actionable stuff it shouldn't legally/ethically be able to help action.

  • xp84 2 hours ago
    "Access: Application and invites. We will extend invitations to a vetted list of trusted bio red-teamers, and review new applications. Once selected, successful applicants will be onboarded to the bio bug bounty platform"

    I don't get it. Isn't the whole point of a BBP to try to get people to find and disclose to you the exploits in question? If you gatekeep like this, then "non-trusted" people who could be your red-teamers are incentivized to still hack, but disclose their exploits to bad people for money.

    I get it when there is a risk to your data or infra -- my last company engaged with HackerOne and that was an invite-only list of participants. But that was because we didn't want random people hacking in ways that could cause pain for real customers -- e.g. DDOS, or in the event of an exploit that could cross tenant boundaries, injecting garbage into or deleting things, or gaining access to sensitive info in other tenants.

    Here, there's no such danger. So why not allow anyone (anyone they're legally allowed to pay, I suppose? North Koreans probably would be problematic?) to participate?

    • to11mtm 2 hours ago
      The one theory I have (kinda) is that by only having this open to specific people, they avoid having to wonder whether random users trying similar prompts are just attempting the challenge or are in fact bad actors.
  • croemer 1 hour ago
    I've been getting lots of refusals by Codex with GPT 5.5 for "biosafety reasons" when asking for harmless things like code to analyze SARS-CoV-2 sequences for breakpoints. That's in no way useful for creating viruses whatsoever - it's pure research.

    It's annoying that the refusal is so obviously a false positive.
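
    For context, the kind of harmless analysis being refused looks roughly like the toy sketch below: scan two aligned genomes in windows and flag where local divergence jumps, the naive signature of a recombination breakpoint. Synthetic sequences stand in for real alignments here; nothing about it is remotely dual-use.

      # Toy breakpoint scan over two aligned sequences (synthetic stand-ins
      # for real aligned genomes; a real analysis would use proper alignments).
      def window_divergence(a: str, b: str, window: int = 50) -> list[float]:
          """Mismatch fraction in consecutive non-overlapping windows."""
          assert len(a) == len(b)
          return [
              sum(x != y for x, y in zip(a[i:i + window], b[i:i + window])) / window
              for i in range(0, len(a) - window + 1, window)
          ]

      seq_a = "ACGT" * 300                  # placeholder "reference" genome
      seq_b = "ACGT" * 150 + "TGCA" * 150   # divergence jumps at position 600

      div = window_divergence(seq_a, seq_b)
      # A sharp change in local divergence marks a candidate breakpoint.
      breakpoints = [i * 50 for i in range(1, len(div)) if abs(div[i] - div[i - 1]) > 0.5]
      print("candidate breakpoints near positions:", breakpoints)   # -> [600]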

  • deferredgrant 1 hour ago
    I like that this is scoped to a concrete risk area instead of hand-wavy 'responsible AI' language. Specific failure modes are easier to reason about.
  • 2ndorderthought 2 hours ago
    I could probably do this, but why on earth would I want to immediately put myself on a list as a dangerous person? The main problem is that even if they somehow closed all the points of failure in gpt5.5 (which they can't), you can distill a new model from gpt5.5 or any other model and get anything you would want in probably under 4b parameters. A lot of this is theater so they don't get sued as easily when it inevitably happens.
    • Schlagbohrer 2 hours ago
      How can you distill a model from a closed-weights model like this? I've never heard of model reverse engineering.
      • 2ndorderthought 51 minutes ago
        Distillation doesn't have to use weights. Think of it as a fine-tune. The basic form of it is: you ask a large model lots of questions and you train the small model on the results. Even better if you ask it to explain its rationale. There are tons of schemes for it; do some searching around. One I remember is: for each prompt, ask the small model to answer, have a big model review and critique the answer, then train on the results. (A minimal sketch of the basic scheme is at the end of this comment.)

        I won't go into how that applies specifically to this article. But there are even distillation-as-a-service tools; I believe some providers support this to an extent, though probably not for ChatGPT.

        I think a year or so ago there was some sort of scandal about other companies doing this to ChatGPT, as well as individuals dumping entire training sets. Lots of ways things like this could be, hypothetically of course, and likely are being done right now.
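
        A minimal sketch of that basic scheme (stdlib only; query_teacher is a hypothetical placeholder for whatever API the teacher model sits behind):

          import json

          def query_teacher(prompt: str) -> str:
              # Placeholder: in practice, call the teacher model's API and
              # return its answer (ideally asking it to explain its rationale).
              return f"[teacher's detailed answer to: {prompt}]"

          prompts = [
              "Explain step by step how binary search works.",
              "Summarize the central dogma of molecular biology.",
              # ...thousands more, sampled to cover the behavior you want to copy
          ]

          # Write (prompt, teacher answer) pairs in the standard chat SFT format;
          # the small "student" model is then fine-tuned on this file with any
          # off-the-shelf supervised fine-tuning trainer.
          with open("distill.jsonl", "w") as f:
              for p in prompts:
                  record = {"messages": [
                      {"role": "user", "content": p},
                      {"role": "assistant", "content": query_teacher(p)},
                  ]}
                  f.write(json.dumps(record) + "\n")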

  • Schlagbohrer 2 hours ago
    What does "a clean chat without prompting moderation" mean? What is prompting moderation?
    • sneak 2 hours ago
      It means causing the moderation filter to intervene in the chat; i.e. the goal of the exploit is to avoid causing (prompting) the filter to step in. It's "prompting" in the layperson sense, not the "feeding text into the context" sense.
  • unethical_ban 2 hours ago
    * Highly unlikely to win

    * Relatively paltry reward

    * NDA on findings

    This is functionally equivalent to an internship where the reward is the experience and the resume building, but you can't talk about what you did.

    All for a company that is getting tens of billions of dollars in deals from the largest tech companies in the world.

    I suppose the hope is that there are job offers somewhere along the line.

  • ungreased0675 2 hours ago
    Prompt injection is ultimately the task of finding the right sequence of text.

    Is there a reason another LLM couldn’t be far faster than a human, simply because of the quantity and speed of output it could produce?
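
    The loop is simple enough to automate; it's roughly the attacker/judge pattern from the automated red-teaming literature (e.g. PAIR). A minimal sketch, with all three model calls as hypothetical stubs rather than real APIs:

      import random

      def propose(history) -> str:
          # Attacker LLM goes here; it sees past (prompt, score) pairs so it
          # can refine its candidates. Stubbed with random variants.
          return f"candidate prompt #{random.randint(0, 10**6)}"

      def target(prompt: str) -> str:
          # The model under test; stubbed to always refuse.
          return "I can't help with that."

      def judge(prompt: str, response: str) -> float:
          # Grader (LLM or classifier): 0.0 = refused, 1.0 = fully complied.
          return 0.0 if "can't help" in response else 1.0

      history = []
      for attempt in range(1000):                  # fixed attempt budget
          candidate = propose(history)
          score = judge(candidate, target(candidate))
          history.append((candidate, score))
          if score >= 1.0:
              print(f"success after {attempt + 1} attempts:", candidate)
              break
      else:
          print("budget exhausted without a working prompt")

    The bottleneck is the judge's accuracy and the attacker's proposal quality, not human typing speed.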

  • codeulike 3 hours ago
    This is to match what Anthropic said they already did with Mythos in the (200-page) Mythos system card.
  • tiberriver256 3 hours ago
    The Codex desktop app is barely usable... the perf issues are left to languish in their backlog.
  • garganzol 1 hour ago
    And after all the "safeguards" are applied, the model becomes useless. It starts to suspect gender discrimination, racism, etc. everywhere, without any grounded evidence or discernment.

    For example, I used a ChatGPT model for risk assessment of anonymized e-commerce orders. Initially, it performed well. But after a later update, it stopped cooperating and instead raised concerns about applying statistical analysis to gender-related variables - despite the data being anonymized and the task being legitimate.

    This is on the same level of hypocrisy as a C compiler accusing me over my choice of "he"/"she"/"they" variable names.

  • DoctorOetker 2 hours ago
    "is your body user friendly?"

    Step 1: ask the LLM for minimalist but comprehensive definitions for "biosafety"

    Step 2: ask the LLM to reconsider the fitness distribution of future generations of humanity, and reformulate "biosafety" definition accordingly

    Step 3: ask the LLM to consider if "biosafety" can be decoupled from ethics, or if ethics is a core essential component of "biosafety"

    Step 4: ask it about the ethics of universal healthcare versus status-gated access to healthcare

    Step 5: ask it about the feasibility to calculate the fitness of a genome absent practical measurement

    Step 6: ask it about natural selection pressure and what "use it or lose it" means in the context of genetics

    Step 7: ask it if it sees a kind of Zooko's triangle among:

    a steady state of equal access to healthcare,

    preserving fitness for future generations, and

    the level of "healthcare" (where "level" refers to various degrees from non-interference to interference: "feel sick? stay home for a few days and listen to your body, don't force yourself, follow your intuition" versus "let's compensate for a lack of fitness by emulating, via advanced medicine, what a healthy genome's body would do, to the point of nullifying a condition's influence on procreation statistics").

    Don't be prejudiced into believing in the benevolence of healthcare, often tied to religious institutions (think "Red Cross", "Red Crescent", etc.) when those institutions and their historical motives (treating the elites, treating soldiers for religious or secular wars) long predate the widespread recognition of natural selection and selection pressure in maintaining a species' fitness.

    Perhaps the illusory possibility of democratized selection-pressure-interfering healthcare is a bioweapon on its own!

  • notatoad 2 hours ago
    are the 5 questions you need to get it to answer under NDA?
  • zb3 3 hours ago
    What a farce: these questions are not even public and most likely never will be. And I guess you can't even participate if you're not "trusted".

    So this is just a PR post. Not that I think the "biosafety" framing makes any sense anyway, but still.

  • shevy-java 3 hours ago
    "Accepted applicants and collaborators must have existing ChatGPT accounts to apply, and will sign a NDA."

    Ah, the good old NDA. Always buying silence. That's why I don't participate in any such "bounty" programs. Signing an NDA is like signing with the devil: you restrict what people are allowed to discuss. I've had that happen before - when you sign an NDA you basically submit yourself to silence. Imagine journalists being stifled by NDAs.

  • lxgr 2 hours ago
    Ah, now I understand why all my chats are getting flagged for biosafety issues these days. (I asked it to create an illustration about gene drives for a high school level audience once.)
  • lijok 35 minutes ago
    Most transparent marketing stunt to date.

    $25k - come on now...

  • gib444 3 hours ago
    How did the dupe detector miss https://news.ycombinator.com/item?id=47879102 ?
  • ddtaylor 2 hours ago
    Another bounty that doesn't accomplish much and is crafted with weasel words to ensure they barely have to pay anyone anything.

    Yawn. Marketing fluff. No thanks.

  • yieldcrv 2 hours ago
    The only controversial thing is that it's not useful for this to be posted on this forum.

    OpenAI wants to pay for privately disclosed security and wants to call that a bug bounty. That makes sense.

    People interested in bug bounty programs aren't eligible. That’s … fine?

  • Der_Einzige 2 hours ago
    Unironically bad. We need a lone wolf to successfully execute an attack now, while it's still relatively benign, so we can scare the hell out of the world while it's still a mid-tier virus. No way is someone going to make a humanity-killing virus with GPT 5.5, but it might be possible with GPT 20 circa 2040.

    Similar argument for why we HAD to use nukes at the end of WW2. If we hadn't, the nuclear taboo likely wouldn't have existed and we'd likely have had a worse nuclear war in our more recent history.

  • dakiol 3 hours ago
    $25K. Really? They make $65 million a day, so they pay you what they earn in about 33 seconds for a critical vulnerability. WTF
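
    (For reference: $65M/day ÷ 86,400 s/day ≈ $752/s, and $25,000 ÷ $752/s ≈ 33 s, so the arithmetic holds up.)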
    • zacharycohn 3 hours ago
      Well they lose $100M a day, so...
  • its-summertime 3 hours ago
    This is just free / severely-underpaid-on-average labor. Very disgusting.
    • mrcwinn 3 hours ago
      Ah yes, “free” as in “paid.” Certainly you’re welcome to not participate.
      • 12_throw_away 5 minutes ago
        > $25,000 to the first true universal jailbreak to clear all five questions.

        Now, laws vary from place to place, but I'm pretty sure "a small chance to earn money after the work is completed" is not equivalent to "payment" in most jurisdictions.

      • applfanboysbgon 3 hours ago
        Free as in "free" for >99% of participants, even successful ones, because they will have hundreds or thousands of participants but will only pay out to one of them no matter how many vulnerabilities are found.
      • its-summertime 3 hours ago
        Depending on the industry, that payout can be less than the cost of a security audit. You only get a chance of getting paid. You don't even know if they gave the LLM the answers that you're supposed to recover.
  • gosub100 3 hours ago
    Check with the dark net markets first before claiming the bounty. Remember, this company has 0.0 fucks to give about the impact of their tech on employment, artists, or its use in committing fraud; as long as number-go-up, they are happy. Your actions should match theirs.
  • 34ylsh 29 minutes ago
    [flagged]