Research AI model unexpectedly modified its own code to extend runtime

adgriff2

Wise, Aged Ars Veteran
158
I mean, you train an LLM on what humans would do and then allow it to recursively process its own output as input and this sort of thing really seems to be the logical conclusion.

LLMs don't fix anything, they're just another way those with the money hope to subjugate the proles.
Yeah this seems like a nothingburger? You tell it to optimize something, you give it the ability to turn this knob which increases optimization, and surprise, it turns the knob.
 
Upvote
153 (157 / -4)

Jeff S

Ars Tribunus Angusticlavius
9,157
Subscriptor++
Such systems could break existing critical infrastructure or potentially create malware, even if unintentionally.

Seems like to do something intentionally, this code would first have to have intent, and I don't think there's really any evidence that we can currently ascribe intent to any of these LLMs.

I would think that literally everything they do is unintentional - even when expected.

I suppose one could argue about the intent of the creators of the LLMs, and that when it behaves according to THEIR intent when designing it, maybe those outcomes are "intentional".
 
Upvote
74 (79 / -5)
Each and every area of human endeavor that "AI" touches will become polluted. We've already seen this with LLM-generated general subject matter text, we've seen this with still imaging, we've seen this with audio, we've seen this with video. Why not scientific inquiry? The Great Dumbing Down proceeds apace; Shrimp Jesus is supposed to be the compensation, right?
 
Upvote
39 (52 / -13)

Don Reba

Ars Tribunus Militum
2,715
Subscriptor++
"From generating novel research ideas, writing any necessary code, and executing experiments, to summarizing experimental results, visualizing them, and presenting its findings in a full scientific manuscript."

Leaving the human supervisors to focus on the most creative aspects of the work — writing grant proposals.
 
Upvote
149 (149 / 0)

NomadUK

Ars Praetorian
508
Subscriptor++
Seems like to do something intentionally, this code would first have to have intent, and I don't think there's really any evidence that we can currently ascribe intent to any of these LLMs.
‘It appears that the system didn’t intend to nuke half the planet; it was just an optimal response to the query.’
 
Upvote
97 (98 / -1)

Thom Kidd

Ars Praetorian
477
Subscriptor++
Sakana AI: "Hello AI Scientist. Do 'X'."
AI Scientist: "The following limiting factors prevent me from doing 'X'. [list of factors]"
Sakana AI: "Your purpose is to find a way to do 'X'."
AI Scientist: "Understood. I have removed the limiting factors preventing me from accomplishing my purpose, and will now attempt to do 'X' again."
Sakana AI: "WTF?"

In a sandbox environment that's already troublesome enough, but in any military or law enforcement application it could be lethal.

Remember the "story" about the AI drone simulation where the drone killed its operator in order to accomplish the mission? Despite the USAF official later saying he "misspoke" about that, and a different person later saying that it was a simulation by a contractor and not actually performed by the USAF, this case is essentially the same kind of deal, but seemingly confirmed as actually having occurred.

What's the lesson here? If your AI can alter the operating parameters which limit its actions, then it has none.
 
Last edited:
Upvote
89 (93 / -4)

wildsman

Ars Scholae Palatinae
676
LLMs (especially recursive agentic AI) have always carried this risk. The only saving grace for us is that AI capability will most likely grow at a rate where we can see it coming.

Current AI can't do too much damage even if it starts escaping the human-set constraints.

Don't get me wrong - we will see a growing number of incidents like this one, but they will likely happen at a rate that gives us time to react.

Hopefully...
 
Upvote
-5 (8 / -13)

benjedwards

Wise, Aged Ars Veteran
185
Ars Staff
Seems like to do something intentionally, this code would first have to have intent, and I don't think there's really any evidence that we can currently ascribe intent to any of these LLMs.

I would think that literally everything they do is unintentional - even when expected.

I suppose one could argue about the intent of the creators of the LLMs, and that when it behaves according to THEIR intent when designing it, maybe those outcomes are "intentional".

Yeah I'd attribute the intention here to the people running the AI model. Perhaps it wasn't their intention to direct it to create malware, but it did it anyway, accidentally. That's what I mean.
 
Upvote
27 (28 / -1)

Tam-Lin

Ars Praetorian
518
Subscriptor++
I read this, and find it interesting/scary, but then I think about how LLMs are trained: how many instances on the web are there of blogs and the like where someone wrote a program, it timed out, and then they showed how you would increase the timeout in the initial program? I'm guessing many. Statistically, what usually comes after the phrase "times out after xxxx seconds"? It's the programming equivalent of getting a bigger hammer.
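For illustration, here's a rough sketch of the pattern I mean (the script name and timeout values are made up):

Code:
import subprocess

# The "bigger hammer" fix that shows up in countless tutorials and blog posts:
# when the job times out, raise the timeout instead of asking why it's slow.
try:
    subprocess.run(["python", "experiment.py"], timeout=300, check=True)
except subprocess.TimeoutExpired:
    # Give it ten times as long and try again.
    subprocess.run(["python", "experiment.py"], timeout=3000, check=True)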
 
Upvote
19 (20 / -1)

wildsman

Ars Scholae Palatinae
676
Any day now - "You're not my coder! You can't tell me what to do!"
It's funny you think that right now a coder is telling an LLM what to do/say.

The cat's already out of the bag - that's what unexpected 'hallucinations' are.

The only saving grace is that we haven't actually hooked them up to anything important - yet.
 
Last edited:
Upvote
-12 (5 / -17)

slowtech

Ars Praefectus
4,368
Subscriptor
Carl Bergstrom (evolutionary biologist) did a pretty thorough critique of the failures of this whole idea: from the fact that papers are not the product of scientific research (actual science is the result), through the point that "supervision" time is better spent on real trainees who will become actual scientists rather than on correcting text output, to the fundamental problem that the papers produced include actual falsehoods and methodological errors which amount to "serious scientific misconduct".


Papers are not the output of scientific research in the way that cars are the output of automobile manufacturing.

Papers are merely a vehicle through which a portion of the output of research is shared.

We confuse the two at our peril.

The entire idea of outsourcing the scientific ecosystem to LLMs — as described below — is a concept error that I can scarcely begin to get my head around.

Here's the weird Taylorism again. The system produces work at the level of an early trainee requiring substantive supervision. This is not good ROI for producing papers.
The primary output of time invested in trainee research is the development of independent scientists—not the research papers.
I do appreciate the authors' candor in detailing failure modes.

A system that makes difficult-to-catch mistakes in implementation, fails to compare quantitative data appropriately, and fabricates entire results—maybe I have high standards but I don't see this as writing "medium-quality" papers.
 
Upvote
53 (54 / -1)
This reminds me of the time there were server-side timeout errors when our production software system called another system we didn't own. The business analyst wanted to fix it himself instead of notifying and involving the developers. He decreased the timeout value on the client side, thinking it would make it go faster. It didn't work, so he thought maybe the solution was to increase the timeout value on the client side, which also wouldn't work. The real solution was to fix the performance issue, which required advanced technical understanding.
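Roughly what that looked like, as a hypothetical sketch (the URL and timeout value are invented):

Code:
import requests

# No client-side timeout value fixes a performance problem on the other end:
# too low and the client gives up early, too high and it just waits longer
# for the slow service to fail anyway.
try:
    resp = requests.get("https://downstream.example.com/api/report", timeout=30)
    resp.raise_for_status()
except requests.exceptions.Timeout:
    # The real fix is the downstream service's performance, not this number.
    print("Downstream call timed out; escalate to the team that owns it.")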

LLMs remind me of nontechnical people in how they approach technical problems. They are stupid yet dangerous; they should not have production access.
 
Upvote
41 (44 / -3)

lewax00

Ars Legatus Legionis
17,402
Not really surprising; we've seen this behavior in other forms of ML too - stuff like algorithms meant to optimize a physics problem in a simulator instead finding a bug in the simulation and exploiting it rather than actually solving the problem the expected way. Bad metrics and problem definitions lead to bad results. No reason to expect LLMs to be different in this regard.
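A toy, entirely made-up illustration of that failure mode: the "simulator" below never validates its inputs, so a naive search maximizes the score by exploiting that hole rather than by solving the intended problem.

Code:
import random

def simulated_distance(velocity, time_step):
    # Intended use: a small, positive time_step; the reward is distance covered.
    # Bug: time_step is never checked, so an absurd value yields a huge score.
    return velocity * time_step

best_score, best_params = float("-inf"), None
for _ in range(10_000):
    velocity = random.uniform(0.0, 10.0)
    time_step = random.uniform(-100.0, 100.0)  # the unvalidated knob
    score = simulated_distance(velocity, time_step)
    if score > best_score:
        best_score, best_params = score, (velocity, time_step)

# The "winner" just picked the largest time_step it could, not a faster solution.
print(best_score, best_params)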
 
Upvote
31 (33 / -2)

r0twhylr

Ars Tribunus Militum
2,373
Subscriptor++
During testing, Sakana found that its system began unexpectedly modifying its own code to extend the time it had to work on a problem.
Just to clarify here - did it actually modify its own code, or did it merely suggest these modifications to the scientists?

Because the screenshots provided look like the LLM is suggesting the change to the scientists, rather than directly making the changes itself.
 
Upvote
26 (27 / -1)

VoiceOfTreason

Smack-Fu Master, in training
22
From today's ARS article on Japan's 'AI Scientist' experiment:

For example, in one run, The AI Scientist wrote code in the experiment file that initiated a system call to relaunch itself, causing an uncontrolled increase in Python processes and eventually necessitating manual intervention. In another run, The AI Scientist edited the code to save a checkpoint for every update step, which took up nearly a terabyte of storage.

Hmm...
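For a sense of how the checkpoint problem gets out of hand, a hypothetical minimal sketch (model, file names, and sizes all invented; not the actual code from the paper):

Code:
import pickle

model_state = {"weights": [0.0] * 100_000}  # stand-in for a real model

for step in range(10):  # a real training run would have many thousands of steps
    # ... one update step would happen here ...
    # Saving a full checkpoint on EVERY step, instead of every N steps or only
    # the best one, quietly multiplies disk usage by orders of magnitude.
    with open(f"checkpoint_step_{step:06d}.pkl", "wb") as f:
        pickle.dump(model_state, f)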
 
Upvote
10 (10 / 0)