What kind of bug would make machine learning suddenly 40% worse at NetHack?

Peevester · Jun 4, 2024

5000 points is pretty sad, probably 2000 of it is finding an elven dagger and naming it "Sting". And the rest is a lot of rats and goblins.

Looking at what happens on the full moon, I still don't get why the bot scored lower. Maybe a higher chance of being bitten by a wererat and dropping all your stuff? Or maybe it's because throwing tripe at attacking dogs works less often to tame them (Vs 100% normally).

edit: aha, I think this is it - attacked werecreatures are much more likely to summon help on full moons. Poor bot probably got overrun.

necklessone · Jun 4, 2024

What a terrible night to have a learning model.

Just wanted to highlight this for additional applause.

mmorales · Jun 4, 2024

Oh this is great.

In the interest of sharing pain, ~25 years ago as a graduate student I coded up an unsupervised neural network. Now this was a time of lots of processor and OS diversity, I was a weird Mac person (OS X beta!) and coded it up on my PowerPC. On my Mac it learned fine, but on the department servers I had very hit or miss luck. It would never crash, just on some machines it would learn and some it wouldn't, depending on some combination of compiler settings and chips. I eventually had the following pattern:

PowerPC would always learn
Intel x86 would never learn
Sun Sparc would never learn
SGI IRIX MIPS would learn, but only at optimization -O2. Lower or higher optimization it wouldn't learn.

It took my advisor 10 minutes to figure out what was going on (I thought he was psychic). Down in a key part of the code I had e^a times e^b, and it turned out 'a' became large when 'b' became small, and the learning was in the last few bits of precision in the double.

PowerPC carried double precision to 67 bits internally, so always learned. The other chips all worked at 64 bits, so never learned. But at -O2 optimization the IRIX compiler refactored my math from e^a times e^b to e^(a+b) for speed reasons (exponentials are very slow). This optimization also gave it more precision in the answer and it learned. Lower optimization didn't make this change, and higher -O3 optimization dropped the exponential to floating point precision so it wouldn't learn.

I simply changed the source code to e^(a+b) and it learned everywhere all the time.

Studbolt · Jun 4, 2024

Ask a model to get the best score, and it will farm the heck out of low-level monsters because it never gets bored.

Farm the heck out of upper-level monsters. In Nethack, the monsters get tougher as the player descends. You can hang out in upper levels and farm monsters for as long as your patience will last before you dive into the depths of Hell.

l8gravely · Jun 4, 2024

I prefer 'crawl' instead. But I love that dig at Dwarf Fortess people.

Tofystedeth · Jun 4, 2024

Peevester said:
5000 points is pretty sad, probably 2000 of it is finding an elven dagger and naming it "Sting". And the rest is a lot of rats and goblins.

Looking at what happens on the full moon, I still don't get why the bot scored lower. Maybe a higher chance of being bitten by a wererat and dropping all your stuff? Or maybe it's because throwing tripe at attacking dogs works less often to tame them (Vs 100% normally).

edit: aha, I think this is it - attacked werecreatures are much more likely to summon help on full moons. Poor bot probably got overrun.

FTA I think the score was based on their metrics, not game score. And it wasn't necessarily playing worse at the game, just worse at whatever playstyle it had optimized for.

TimeWinder · Jun 4, 2024

Studbolt said:
Farm the heck out of upper-level monsters. In Nethack, the monsters get tougher as the player descends. You can hang out in upper levels and farm monsters for as long as your patience will last before you dive into the depths of Hell.

If we're going to go down (up?) the rabbit hole of the many, often contradictory ways "level" is used in RPGs, we're going to be here until the next full moon.

neutronium · Jun 4, 2024

TimeWinder said:
If we're going to go down (up?) the rabbit hole of the many, often contradictory ways "level" is used in RPGs, we're going to be here until the next full moon.

The next full moon on a leap day

pbcheese · Jun 4, 2024

I worked in software testing for several years (hence my sunny disposition) and that’s a real humdinger alright

Peevester · Jun 4, 2024

Tofystedeth said:
FTA I think the score was based on their metrics, not game score. And it wasn't necessarily playing worse at the game, just worse at whatever playstyle it had optimized for.

Yeah, I missed "by their own metrics". Still, higher numbers of early deaths from being overrun by summoned creatures would cause a lower score by any measure.

bonob · Jun 4, 2024

The article said:
[...] and the only thing you keep from game to game is your skill and knowledge.

And bones files!

Well, it's a very funny idea to unleash an AI agent against Nethack in any case..

The article said:
"even it can only solve sokoban and reach mines end,"

Well, at least I can do as good as the algorithm, on my best runs, I feel that's somewhat reassuring ><

Edit: to make it less obscure, a bones file is a level from a previous game – the level where the previous player died – that's used as one of the current game levels. You typically find the player's corpse (your previous corpse if you only play local games), all its inventory, and likely all the monsters that killed you back then still roaming on the level. Most of your corpse inventory is cursed and very difficult to use, this is a rare occurence and may happen only once in a game if I'm not mistaken.

And beating Sokoban and the Mines is pretty hard (at least for non-seasoned players such as me), and I feel crazy powerful when that happens, even though I know this is just the early phase of the game.. My guess is you've reached 25% down the levels at that point, which is not much, and that's not considering the trip back up ><

zaghahzag · Jun 4, 2024

Peevester said:
5000 points is pretty sad, probably 2000 of it is finding an elven dagger and naming it "Sting". And the rest is a lot of rats and goblins.

Looking at what happens on the full moon, I still don't get why the bot scored lower. Maybe a higher chance of being bitten by a wererat and dropping all your stuff? Or maybe it's because throwing tripe at attacking dogs works less often to tame them (Vs 100% normally).

edit: aha, I think this is it - attacked werecreatures are much more likely to summon help on full moons. Poor bot probably got overrun.

You made my day.

ArsPlebeian · Jun 4, 2024

I never got very far in Nethack as a kid because I'd just reroll my starting character continuously until I got a ring of polymorph. 1st turn, equip ring. All subsequent turns, chaos.

I should revisit the game with a more reasonable approach.

plarstic · Jun 4, 2024

Studbolt said:
Farm the heck out of upper-level monsters. In Nethack, the monsters get tougher as the player descends. You can hang out in upper levels and farm monsters for as long as your patience will last before you dive into the depths of Hell.

Because the difficulty of the game scared me I inevitably tried to stay at safer upper levels for as long as possible, and it was always a lack of damn food that caused me to need to descend, as monsters didn't seem to respawn quickly enough to farm. That was a long, long time ago mind you and I barely knew the mechanics of the game, despite scouring Usenet for clues and outright walkthroughs.

DovePig · Jun 4, 2024

Oh, reminds to revisit the original ADOM. Date‑changing was part of the optimised scumming strategy...

markgo · Jun 4, 2024

Assorted thoughts:

Scary that I went “oh, of course”. Spent entirely too much time playing Nethack on University systems in the 80s.
As to speculations on cause of lowered scores, I’d tend towards the worsened were attacks. It was pretty common knowledge among regular players that “you feel lucky” was a bad sign. I doubt it was shop farming with your pet—that’s a fairly complex behavior; if it can’t ascend, I doubt it can farm reliably.
Amazing that Nethack still lives on. Rogue may have gotten the noun, but Nethack apparently will never die.

neutronium · Jun 4, 2024

ArsPlebeian said:
I never got very far in Nethack as a kid because I'd just reroll my starting character continuously until I got a ring of polymorph. 1st turn, equip ring. All subsequent turns, chaos.

I should revisit the game with a more reasonable approach.

Polymorph + RoPC fun times, until you Genocide all "L" and forget you're a Master Lich (or Vampire Lord ) ....

adespoton · Jun 4, 2024

mmorales said:
Oh this is great.

In the interest of sharing pain, ~25 years ago as a graduate student I coded up an unsupervised neural network. Now this was a time of lots of processor and OS diversity, I was a weird Mac person (OS X beta!) and coded it up on my PowerPC. On my Mac it learned fine, but on the department servers I had very hit or miss luck. It would never crash, just on some machines it would learn and some it wouldn't, depending on some combination of compiler settings and chips. I eventually had the following pattern:

PowerPC would always learn

Intel x86 would never learn

Sun Sparc would never learn

SGI IRIX MIPS would learn, but only at optimization -O2. Lower or higher optimization it wouldn't learn.

It took my advisor 10 minutes to figure out what was going on (I thought he was psychic). Down in a key part of the code I had e^a times e^b, and it turned out 'a' became large when 'b' became small, and the learning was in the last few bits of precision in the double.

PowerPC carried double precision to 67 bits internally, so always learned. The other chips all worked at 64 bits, so never learned. But at -O2 optimization the IRIX compiler refactored my math from e^a times e^b to e^(a+b) for speed reasons (exponentials are very slow). This optimization also gave it more precision in the answer and it learned. Lower optimization didn't make this change, and higher -O3 optimization dropped the exponential to floating point precision so it wouldn't learn.

I simply changed the source code to e^(a+b) and it learned everywhere all the time.

If you'd done it in Lisp, you could have avoided the whole issue

PowerLisp on a PowerPC was my go-to back then, despite Cyc using Java.

olePigeon · Jun 4, 2024

nethack.alt.org is a great NetHack server. I play there all the time. Just SSH [email protected].

It's fun with so many people playing because of the random bones levels from all the different players.

nosh · Jun 4, 2024

TimeWinder said:
If we're going to go down (up?) the rabbit hole of the many, often contradictory ways "level" is used in RPGs, we're going to be here until the next full moon.

Isn't that why nethack has the cursed potion of gain level?

Wybaar · Jun 4, 2024

TimeWinder said:
If we're going to go down (up?) the rabbit hole of the many, often contradictory ways "level" is used in RPGs, we're going to be here until the next full moon.

Someone get the thesaurus!

Derecho Imminent · Jun 4, 2024

neutronium said:
Polymorph + RoPC fun times, until you Genocide all "L" and forget you're a Master Lich (or Vampire Lord ) ....

It was amazing how the programmers thought of everything a player might do and accounted for it.

netblaz · Jun 4, 2024

this is ...cromulent? miscible? apropos?
yes i think i'll go with apropos

johnsonwax · Jun 4, 2024

mmorales said:
Oh this is great.

In the interest of sharing pain, ~25 years ago as a graduate student I coded up an unsupervised neural network. Now this was a time of lots of processor and OS diversity, I was a weird Mac person (OS X beta!) and coded it up on my PowerPC. On my Mac it learned fine, but on the department servers I had very hit or miss luck. It would never crash, just on some machines it would learn and some it wouldn't, depending on some combination of compiler settings and chips. I eventually had the following pattern:

PowerPC would always learn

Intel x86 would never learn

Sun Sparc would never learn

SGI IRIX MIPS would learn, but only at optimization -O2. Lower or higher optimization it wouldn't learn.

It took my advisor 10 minutes to figure out what was going on (I thought he was psychic). Down in a key part of the code I had e^a times e^b, and it turned out 'a' became large when 'b' became small, and the learning was in the last few bits of precision in the double.

PowerPC carried double precision to 67 bits internally, so always learned. The other chips all worked at 64 bits, so never learned. But at -O2 optimization the IRIX compiler refactored my math from e^a times e^b to e^(a+b) for speed reasons (exponentials are very slow). This optimization also gave it more precision in the answer and it learned. Lower optimization didn't make this change, and higher -O3 optimization dropped the exponential to floating point precision so it wouldn't learn.

I simply changed the source code to e^(a+b) and it learned everywhere all the time.

Should have gone to college a decade earlier. Instructors loved putting the e^a times e^b trick where a and b are large positive/negative numbers in math/physics problems to blow up your calculator and figure out which students knew their basic math. Difference of two squares was another common trick. Learned to spot and simplify those a mile away.

johnsonwax · Jun 4, 2024

l8gravely said:
I prefer 'crawl' instead. But I love that dig at Dwarf Fortess people.

It's only a dig if you never played adventure mode.

neutronium · Jun 4, 2024

adamrussell said:
It was amazing how the programmers thought of everything a player might do and accounted for it.

https://nethackwiki.com/wiki/Izchak he was an incredible force in the early dev. I was involved with the team in the late 80s-early 90s. Learned a ton

Wickwick · Jun 4, 2024

mmorales said:
Oh this is great.

In the interest of sharing pain, ~25 years ago as a graduate student I coded up an unsupervised neural network. Now this was a time of lots of processor and OS diversity, I was a weird Mac person (OS X beta!) and coded it up on my PowerPC. On my Mac it learned fine, but on the department servers I had very hit or miss luck. It would never crash, just on some machines it would learn and some it wouldn't, depending on some combination of compiler settings and chips. I eventually had the following pattern:

PowerPC would always learn

Intel x86 would never learn

Sun Sparc would never learn

SGI IRIX MIPS would learn, but only at optimization -O2. Lower or higher optimization it wouldn't learn.

It took my advisor 10 minutes to figure out what was going on (I thought he was psychic). Down in a key part of the code I had e^a times e^b, and it turned out 'a' became large when 'b' became small, and the learning was in the last few bits of precision in the double.

PowerPC carried double precision to 67 bits internally, so always learned. The other chips all worked at 64 bits, so never learned. But at -O2 optimization the IRIX compiler refactored my math from e^a times e^b to e^(a+b) for speed reasons (exponentials are very slow). This optimization also gave it more precision in the answer and it learned. Lower optimization didn't make this change, and higher -O3 optimization dropped the exponential to floating point precision so it wouldn't learn.

I simply changed the source code to e^(a+b) and it learned everywhere all the time.

I have a bug story based on what workstations I was on. I had a CFD class in college in the mid-90's. For the final assignment, I did most of the programming on the cluster of Sun Sparc stations but was finishing the project on some SGI's. The code would compile and run just fine on the Sparcs, but not on the SGI's. I simply had to take the '-o' (optimize) flag out of the gcc command in my makefile and then it would run. What undergrad in mechanical engineering would expect the compiler to be the source of an error and not something they wrote?

Peevester · Jun 4, 2024

bonob said:
And bones files!

Well, it's a very funny idea to unleash an AI agent against Nethack in any case..

Well, at least I can do as good as the algorithm, on my best runs, I feel that's somewhat reassuring ><

Edit: to make it less obscure, a bones file is a level from a previous game – the level where the previous player died – that's used as one of the current game levels. You typically find the player's corpse (your previous corpse if you only play local games), all its inventory, and likely all the monsters that killed you back then still roaming on the level. Most of your corpse inventory is cursed and very difficult to use, this is a rare occurence and may happen only once in a game if I'm not mistaken.

And beating Sokoban and the Mines is pretty hard (at least for non-seasoned players such as me), and I feel crazy powerful when that happens, even though I know this is just the early phase of the game.. My guess is you've reached 25% down the levels at that point, which is not much, and that's not considering the trip back up ><

Sokoban isn't even close to 25% of the game. If I remember right, it's at level 10 at its deepest, has several off-track levels of it's own (I want to say 4). The game goes down to level 53 on the main branch, and up to level ... -5, I think? It's been a minute since I've played a game.

And that doesn't include class quest levels and some of Vlad's tower. It's a big game. I think the fastest I ever completed a speed ascension was in the 14 hour range.

scrimbul · Jun 4, 2024

DovePig said:
Oh, reminds to revisit the original ADOM. Date‑changing was part of the optimised scumming strategy...

Improved ADOM Guidebook still works but some of the more egregious scumming for wishes were patched out.

The ADOM sequel appears to be dead to rights and from an unfocused design but that's only hearsay, I haven't played it myself.

clewis · Jun 4, 2024

Wickwick said:
I have a bug story based on what workstations I was on. I had a CFD class in college in the mid-90's. For the final assignment, I did most of the programming on the cluster of Sun Sparc stations but was finishing the project on some SGI's. The code would compile and run just fine on the Sparcs, but not on the SGI's. I simply had to take the '-o' (optimize) flag out of the gcc command in my makefile and then it would run. What undergrad in mechanical engineering would expect the compiler to be the source of an error and not something they wrote?

Back then, -O, -O2, and -O3 were always suspect, especially -O3. IIRC, the man page warned against using them.

When I was using gcc, most of my development was without any optimization, and I'd only add it when I had good tests. I think it wasn't until the late 2000s that I finally started trusting -O2.

Wickwick · Jun 4, 2024

clewis said:
Back then, -O, -O2, and -O3 were always suspect, especially -O3. IIRC, the man page warned against using them.

When I was using gcc, most of my development was without any optimization, and I'd only add it when I had good tests. I think it wasn't until the late 2000s that I finally started trusting -O2.

I was just using a copy of the makefile provided to us by our professor. It wasn't a programming class. It was a numerical programming class.

CardinalJester · Jun 4, 2024

This is a common problem we've seen with machine learning optimization at work. The flaw is in the metrics that the ML systems are trained to. With simplistic, and typically fixed metrics for success ML agents get caught in optimizing to do one thing that the scoring system values, ie their issue with not being able to meet high level goals. They need to implement a dynamic scoring metric, likely with a diminishing function for repetitious events, to prevent these cul de sacs. For example, the first bat you kill is 1 pt, the tenth is 0.5 pt, the 20th is 0.001, etc.

We use a system of 'gamification' for training where each goal has a declining value for repetition, common in board game design where the first to accomplish something gets like 8 points, second 5 points, third 1 point, and everyone else zero.

DeeplyUnconcerned · Jun 4, 2024

CardinalJester said:
This is a common problem we've seen with machine learning optimization at work. The flaw is in the metrics that the ML systems are trained to. With simplistic, and typically fixed metrics for success ML agents get caught in optimizing to do one thing that the scoring system values, ie their issue with not being able to meet high level goals. They need to implement a dynamic scoring metric, likely with a diminishing function for repetitious events, to prevent these cul de sacs. For example, the first bat you kill is 1 pt, the tenth is 0.5 pt, the 20th is 0.001, etc.

We use a system of 'gamification' for training where each goal has a declining value for repetition, common in board game design where the first to accomplish something gets like 8 points, second 5 points, third 1 point, and everyone else zero.

I’d argue that they need to make their metric simpler, not more complex. Diminishing returns are good for systems design, but for a scoring/ranking system you want to get as close as possible to the thing you actually care about, and as simple as you can manage to minimise the impact of overfitting/gaming the scoring.

In this case, it needs a really hard think about what “good at nethack” means; if your simple definition statement of that doesn’t include number of monsters killed, then making monsters killed part of your scoring system is a mistake no matter how you finesse the math.

JanneM · Jun 4, 2024

I don't see how Singularity would be the cause. It's a container system, like Docker, but for HPC systems. It gets it's current time and date from the system it's running on.

Carewolf · Jun 5, 2024

mmorales said:
Oh this is great.

In the interest of sharing pain, ~25 years ago as a graduate student I coded up an unsupervised neural network. Now this was a time of lots of processor and OS diversity, I was a weird Mac person (OS X beta!) and coded it up on my PowerPC. On my Mac it learned fine, but on the department servers I had very hit or miss luck. It would never crash, just on some machines it would learn and some it wouldn't, depending on some combination of compiler settings and chips. I eventually had the following pattern:

PowerPC would always learn

Intel x86 would never learn

Sun Sparc would never learn

SGI IRIX MIPS would learn, but only at optimization -O2. Lower or higher optimization it wouldn't learn.

It took my advisor 10 minutes to figure out what was going on (I thought he was psychic). Down in a key part of the code I had e^a times e^b, and it turned out 'a' became large when 'b' became small, and the learning was in the last few bits of precision in the double.

PowerPC carried double precision to 67 bits internally, so always learned. The other chips all worked at 64 bits, so never learned. But at -O2 optimization the IRIX compiler refactored my math from e^a times e^b to e^(a+b) for speed reasons (exponentials are very slow). This optimization also gave it more precision in the answer and it learned. Lower optimization didn't make this change, and higher -O3 optimization dropped the exponential to floating point precision so it wouldn't learn.

I simply changed the source code to e^(a+b) and it learned everywhere all the time.

x86 wasn't using x87 80bit precision? 25 years ago that would always be the default for doubles..

That was the usual story back then, people wrote code on x86, and then it only worked x86 and broke on other platforms, and broke when x86 switched to using SSE2

marechal · Jun 5, 2024

Excellent writing, Monsieur Purdy, very excellent.

ChristianSilver · Jun 5, 2024

I love nethack! I'm proud to say I have ascended once (only once) after roughly 10 years. Go bravely with Tyr!

uesc_marathon · Jun 5, 2024

I love that in 2024, in the era of the SEO clickbait spam zombie internet, a major technology news outlet exists and publishes articles on the nerdiest nerd crap like Dwarf Fortress TAS bugs.

What kind of bug would make machine learning suddenly 40% worse at NetHack?

Account Banned

Ars Tribunus Angusticlavius

Ars Praetorian

Ars Scholae Palatinae

Ars Praetorian

Ars Praefectus

Ars Scholae Palatinae

Smack-Fu Master, in training

Wise, Aged Ars Veteran

Account Banned

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Smack-Fu Master, in training

Smack-Fu Master, in training

Account Banned

Ars Tribunus Militum

Smack-Fu Master, in training

Ars Tribunus Angusticlavius

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Centurion

Ars Legatus Legionis

Ars Tribunus Militum

Ars Legatus Legionis

Ars Legatus Legionis

Smack-Fu Master, in training

Ars Legatus Legionis

Account Banned

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Legatus Legionis

Smack-Fu Master, in training

Ars Praetorian

Ars Scholae Palatinae

Ars Tribunus Angusticlavius

Wise, Aged Ars Veteran

Smack-Fu Master, in training

Ars Praetorian