ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Lifecoach5000@lemmy.world · 21 days ago

ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

arc99@lemmy.world · 20 days ago

Hardly surprising. Llms aren’t -thinking- they’re just shitting out the next token for any given input of tokens.

Steve Dice@sh.itjust.works · 19 days ago

That’s exactly what thinking is, though.

arc99@lemmy.world · edit-2 18 days ago

An LLM is an ordered series of parameterized / weighted nodes which are fed a bunch of tokens, and millions of calculations later result generates the next token to append and repeat the process. It’s like turning a handle on some complex Babbage-esque machine. LLMs use a tiny bit of randomness (“temperature”) when choosing the next token so the responses are not identical each time.

But it is not thinking. Not even remotely so. It’s a simulacrum. If you want to see this, run ollama with the temperature set to 0 e.g.

ollama run gemma3:4b
>>> /set parameter temperature 0
>>> what is a leaf

You will get the same answer every single time.

AlecSadler@sh.itjust.works · 21 days ago

ChatGPT has been, hands down, the worst AI coding assistant I’ve ever used.

It regularly suggests code that doesn’t compile or isn’t even for the language.

It generally suggests AC of code that is just a copy of the lines I just wrote.

Sometimes it likes to suggest setting the same property like 5 times.

It is absolute garbage and I do not recommend it to anyone.

Mobiuthuselah@lemm.ee · 20 days ago

I don’t use it for coding. I use it sparingly really, but want to learn to use it more efficiently. Are there any areas in which you think it excels? Are there others that you’d recommend instead?

Uairhahs@lemmy.world · 20 days ago

Use Gemini (2.5) or Claude (3.7 and up). OpenAI is a shitshow

j4yt33@feddit.org · 20 days ago

I find it really hit and miss. Easy, standard operations are fine but if you have an issue with code you wrote and ask it to fix it, you can forget it

Blackmist@feddit.uk · 20 days ago

It’s the ideal help for people who shouldn’t be employed as programmers to start with.

I had to explain hexadecimal to somebody the other day. It’s honestly depressing.

AlecSadler@sh.itjust.works · 20 days ago

I’ve found Claude 3.7 and 4.0 and sometimes Gemini variants still leagues better than ChatGPT/Copilot.

Still not perfect, but night and day difference.

I feel like ChatGPT didn’t focus on coding and instead focused on mainstream, but I am not an expert.

DragonTypeWyvern@midwest.social · edit-2 20 days ago

Gemini will get basic C++, probably the best documented language for beginners out there, right about half of the time.

I think that might even be the problem, honestly, a bunch of new coders post bad code and it’s fixed in comments but the LLM CAN’T realize that.

NoiseColor @lemmy.world · 20 days ago

I like tab coding, writing small blocks of code that it thinks I need. Its On point almost all the time. This speeds me up.

whoisearth@lemmy.ca · 20 days ago

Bingo. If anything what you’re finding is the people bitching are the same people that if given a bike wouldn’t know how to ride it, which is fair. Some people understand quicker how to use the tools they are given.

nutsack@lemmy.dbzer0.com · 20 days ago

my favorite thing is to constantly be implementing libraries that don’t exist

arc99@lemmy.world · 20 days ago

It’s even worse when AI soaks up some project whose APIs are constantly changing. Try using AI to code against jetty for example and you’ll be weeping.

jj4211@lemmy.world · 20 days ago

Oh man, I feel this. A couple of times I’ve had to field questions about some REST API I support and they ask why they get errors when they supply a specific attribute. Now that attribute never existed, not in our code, not in our documentation, we never thought of it. So I say “Well, that attribute is invalid, I’m not sure where you saw to do that”. They get insistent that the code is generated by a very good LLM, so we must be missing something…

Blackmist@feddit.uk · 20 days ago

You’re right. That library was removed in ToolName [PriorVersion]. Please try this instead.

*makes up entirely new fictitious library name*

Etterra@discuss.online · 20 days ago

That’s because it doesn’t know what it’s saying. It’s just blathering out each word as what it estimates to be the likely next word given past examples in its training data. It’s a statistics calculator. It’s marginally better than just smashing the auto fill on your cell repeatedly. It’s literally dumber than a parrot.

AnUnusualRelic@lemmy.world · 20 days ago

Parrots are actually intelligent though.

arc99@lemmy.world · 20 days ago

All AIs are the same. They’re just scraping content from GitHub, stackoverflow etc with a bunch of guardrails slapped on to spew out sentences that conform to their training data but there is no intelligence. They’re super handy for basic code snippets but anyone using them anything remotely complex or nuanced will regret it.

NateNate60@lemmy.world · 20 days ago

One of my mates generated an entire website using Gemini. It was a React web app that tracks inventory for trading card dealers. It actually did come out functional and well-polished. That being said, the AI really struggled with several aspects of the project that humans would not:

It left database secrets in the code
The design of the website meant that it was impossible to operate securely
The quality of the code itself was hot garbage—unreadable and undocumented nonsense that somehow still worked
It did not break the code into multiple files. It piled everything into a single file

ILikeBoobies@lemmy.ca · 20 days ago

I’ve had success with splitting a function into 2 and planning out an overview, though that’s more like talking to myself

I wouldn’t use it to generate stuff though

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · 21 days ago

There was a chess game for the Atari 2600? :O

I wanna see them W I D E pieces.

FMT99@lemmy.world · 21 days ago

Prepare to be delighted. Full disclosure, my Atari isn’t hooked up and also I don’t have the Video Chess cart even if it was, so this was fetched from Google Images.

NotMyOldRedditName@lemmy.world · 21 days ago

I’m annoyed the pieces are bottom adjusted…

Beacon@fedia.io · 21 days ago

I bet that’s a slightly unfair representation of what it actually looked like. Graphics back then were purposely designed for how they would look on CRT tvs which add a lot of specific distortions to images. So taking a screenshot of a game running in an emulator without using a high quality crt filter added to the image will be a very untrue representation of what the game actually looked like.

(Don’t get me wrong, I’m not saying it actually looked great when displayed correctly, but i am saying it would’ve looked considerably better than this emulator screenshot)

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · 21 days ago

Those are some funky looking knights lol

Rhaedas@fedia.io · 21 days ago

There’s some very odd pieces on high dollar physical chess sets too.

SinningStromgald@lemmy.world · 21 days ago

Never seen a snake/horse hybrid before?

SpaceNoodle@lemmy.world · 21 days ago

Snorse

Optional@lemmy.world · 21 days ago

Can confirm.

And if you play it on expert mode, you can leave for college and get your degree before it’s your turn again.

youngalfred@lemm.ee · 21 days ago

Here you go (online emulator): https://www.retrogames.cz/play_716-Atari2600.php

over_clox@lemmy.world · 21 days ago

I wasn’t aware of that either, now I’m kinda curious to try to find it in my 512 Atari 2600 ROMs archive…

Endymion_Mallorn@kbin.melroy.org · 21 days ago

I mean, that 2600 Chess was built from the ground up to play a good game of chess with variable difficulty levels. I bet there’s days or games when Fischer couldn’t have beaten it. Just because a thing is old and less capable than the modern world does not mean it’s bad.

vane@lemmy.world · 21 days ago

It’s not that hard to beat dumb 6 year old who’s only purpose is mine your privacy to sell you ads or product place some shit for you in future.

untakenusername@sh.itjust.works · 19 days ago

this is because an LLM is not made for playing chess

NotMyOldRedditName@lemmy.world · 21 days ago

Okay, but could ChatGPT be used to vibe code a chess program that beats the Atari 2600?

GreenKnight23@lemmy.world · 21 days ago

no.

the answer is always, no.

NotMyOldRedditName@lemmy.world · 21 days ago

The answer might be no today, but always seems like a stretch.

FMT99@lemmy.world · 21 days ago

Did the author thinks ChatGPT is in fact an AGI? It’s a chatbot. Why would it be good at chess? It’s like saying an Atari 2600 running a dedicated chess program can beat Google Maps at chess.

adhdplantdev@lemm.ee · 21 days ago

Articles like this are good because it exposes the flaws with the ai and that it can’t be trusted with complex multi step tasks.

Helps people see that think AI is close to a human that its not and its missing critical functionality

FMT99@lemmy.world · 20 days ago

The problem is though that this perpetuates the idea that ChatGPT is actually an AI.

Broken@lemmy.ml · 21 days ago

I agree with your general statement, but in theory since all ChatGPT does is regurgitate information back and a lot of chess is memorization of historical games and types, it might actually perform well. No, it can’t think, but it can remember everything so at some point that might tip the results in it’s favor.

Eagle0110@lemmy.world · 21 days ago

Regurgitating am impression of, not regurgitating verbatim, that’s the problem here.

Chess is 100% deterministic, so it falls flat.

Raltoid@lemmy.world · edit-2 20 days ago

I’m guessing it’s not even hard to get it to “confidently” violate the rules.

saltesc@lemmy.world · 21 days ago

I like referring to LLMs as VI (Virtual Intelligence from Mass Effect) since they merely give the impression of intelligence but are little more than search engines. In the end all one is doing is displaying expected results based on a popularity algorithm. However they do this inconsistently due to bad data in and limited caching.

TowardsTheFuture@lemmy.zip · 21 days ago

I think that’s generally the point is most people thing chat GPT is this sentient thing that knows everything and… no.

NoiseColor @lemmy.world · 21 days ago

Do they though? No one I talked to, not my coworkers that use it for work, not my friends, not my 72 year old mother think they are sentient.

merdaverse@lemm.ee · edit-2 20 days ago

OpenAI has been talking about AGI for years, implying that they are getting closer to it with their products.

https://openai.com/index/planning-for-agi-and-beyond/

https://openai.com/index/elon-musk-wanted-an-openai-for-profit/

Not to even mention all the hype created by the techbros around it.

FMT99@lemmy.world · 20 days ago

Hey I didn’t say anywhere that corporations don’t lie to promote their product did I?

suburban_hillbilly@lemmy.ml · 21 days ago

Most people do. It’s just called AI in the media everywhere and marketing works. I think online folks forget that something as simple as getting a Lemmy account by yourself puts you into the top quintile of tech literacy.

Opinionhaver@feddit.uk · 20 days ago

Yet even on Lemmy people can’t seem to make sense of these terms and are saying things like “LLM’s are not AI”

iAvicenna@lemmy.world · 20 days ago

well so much hype has been generated around chatgpt being close to AGI that now it makes sense to ask questions like “can chatgpt prove the Riemann hypothesis”

snooggums@lemmy.world · 21 days ago

AI including ChatGPT is being marketed as super awesome at everything, which is why that and similar AI is being forced into absolutely everything and being sold as a replacement for people.

Something marketed as AGI should be treated as AGI when proving it isn’t AGI.

pelespirit@sh.itjust.works · 21 days ago

Not to help the AI companies, but why don’t they program them to look up math programs and outsource chess to other programs when they’re asked for that stuff? It’s obvious they’re shit at it, why do they answer anyway? It’s because they’re programmed by know-it-all programmers, isn’t it.

driving_crooner@lemmy.eco.br · 21 days ago

If you pay for chatgpt you can connect it with wolfrenalpha and it’s relays the maths to it

NoiseColor @lemmy.world · 21 days ago

…or a simple counter to count the r in strawberry. Because that’s more difficult than one might think and they are starting to do this now.

four@lemmy.zip · 21 days ago

I think they’re trying to do that. But AI can still fail at that lol

fmstrat@lemmy.nowsci.com · 20 days ago

This is where MCP comes in. It’s a protocol for LLMs to call standard tools. Basically the LLM would figure out the tool to use from the context, then figure out the order of parameters from those the MCP server says is available, send the JSON, and parse the response.

rebelsimile@sh.itjust.works · 21 days ago

Because they’re fucking terrible at designing tools to solve problems, they are obviously less and less good at pretending this is an omnitool that can do everything with perfect coherency (and if it isn’t working right it’s because you’re not believing or paying hard enough)

MrJgyFly@lemmy.world · 21 days ago

Or they keep telling you that you just have to wait it out. It’s going to get better and better!

CileTheSane@lemmy.ca · 20 days ago

why don’t they program them to look up math programs and outsource chess to other programs when they’re asked for that stuff?

Because the AI doesn’t know what it’s being asked, it’s just a algorithm guessing what the next word in a reply is. It has no understanding of what the words mean.

“Why doesn’t the man in the Chinese room just use a calculator for math questions?”

NobodyElse@sh.itjust.works · 21 days ago

Because the LLMs are now being used to vibe code themselves.

veroxii@aussie.zone · 21 days ago

They are starting to do this. Most new models support function calling and can generate code to come up with math answers etc

ImplyingImplications@lemmy.ca · 21 days ago

why don’t they program them

AI models aren’t programmed traditionally. They’re generated by machine learning. Essentially the model is given test prompts and then given a rating on its answer. The model’s calculations will be adjusted so that its answer to the test prompt will be closer to the expected answer. You repeat this a few billion times with a few billion prompts and you will have generated a model that scores very high on all test prompts.

Then someone asks it how many R’s are in strawberry and it gets the wrong answer. The only way to fix this is to add that as a test prompt and redo the machine learning process which takes an enormous amount of time and computational power each time it’s done, only for people to once again quickly find some kind of prompt it doesn’t answer well.

There are already AI models that play chess incredibly well. Using machine learning to solve a complexe problem isn’t the issue. It’s trying to get one model to be good at absolutely everything.

NoiseColor @lemmy.world · 21 days ago

I don’t think ai is being marketed as awesome at everything. It’s got obvious flaws. Right now its not good for stuff like chess, probably not even tic tac toe. It’s a language model, its hard for it to calculate the playing field. But ai is in development, it might not need much to start playing chess.

BassTurd@lemmy.world · 21 days ago

Marketing does not mean functionality. AI is absolutely being sold to the public and enterprises as something that can solve everything. Obviously it can’t, but it’s being sold that way. I would bet the average person would be surprised by this headline solely on what they’ve heard about the capabilities of AI.

NoiseColor @lemmy.world · 21 days ago

I don’t think anyone is so stupid to believe current ai can solve everything.

And honestly, I didn’t see any marketing material that would claim that.

BassTurd@lemmy.world · 21 days ago

You are both completely over estimating the intelligence level of “anyone” and not living in the same AI marketed universe as the rest of us. People are stupid. Really stupid.

NoiseColor @lemmy.world · 21 days ago

I don’t understand why this is so important, marketing is all about exaggerating, why expect something different here.

BassTurd@lemmy.world · 20 days ago

It’s not important. You said AI isn’t being marketed to be able to do everything. I said yes it is. That’s it.

petrol_sniff_king@lemmy.blahaj.zone · 21 days ago

The Zoom CEO, that is the video calling software, wanted to train AIs on your work emails and chat messages to create AI personalities you could send to the meetings you’re paid to sit through while you drink Corona on the beach and receive a “summary” later.

The Zoom CEO, that is the video calling software, seems like a pretty stupid guy?

Yeah. Yeah, he really does. Really… fuckin’… dumb.

jubilationtcornpone@sh.itjust.works · 21 days ago

Same genius who forced all his own employees back into the office. An incomprehensibly stupid maneuver by an organization that literally owes its success to people working from home.

4am@lemm.ee · 21 days ago

Really then why are they cramming AI into every app and every device and replacing jobs with it and claiming they’re saving so much time and money and they’re the best now the hardest working most efficient company and this is the future and they have a director of AI vision that’s right a director of AI vision a true visionary to lead us into the promised land where we will make money automatically please bro just let this be the automatic money cheat oh god I’m about to

NoiseColor @lemmy.world · 21 days ago

Those are two different things.

they are craming ai everywhere because nobody wants to miss the boat and because it plays well in the stock market.
the people claiming it’s awesome and that they are doing I don’t know what with it, replacing people are mostly influencers and a few deluded people.

Ai can help people in many different roles today, so it makes sense to use it. Even in roles that is not particularly useful, it makes sense to prepare for when it is.

petrol_sniff_king@lemmy.blahaj.zone · 21 days ago

it makes sense to prepare for when it is.

Pfft, okay.

vinnymac@lemmy.world · 21 days ago

What the tech is being marketed as and what it’s capable of are not the same, and likely never will be. In fact all things are very rarely marketed how they truly behave, intentionally.

Everyone is still trying to figure out what these Large Reasoning Models and Large Language Models are even capable of; Apple, one of the largest companies in the world just released a white paper this past week describing the “illusion of reasoning”. If it takes a scientific paper to understand what these models are and are not capable of, I assure you they’ll be selling snake oil for years after we fully understand every nuance of their capabilities.

TL;DR Rich folks want them to be everything, so they’ll be sold as capable of everything until we repeatedly refute they are able to do so.

NoiseColor @lemmy.world · 21 days ago

I think in many cases people intentionally or unintentionally disregard the time component here. Ai is in development. I think what is being marketed here, just like in the stock market, is a piece of the future. I don’t expect the models I use to be perfect and not make mistakes, so I use them accordingly. They are useful for what I use them for and I wouldn’t use them for chess. I don’t expect that laundry detergent to be just as perfect in the commercial either.

Empricorn@feddit.nl · 20 days ago

You’re not wrong, but keep in mind ChatGPT advocates, including the company itself are referring to it as AI, including in marketing. They’re saying it’s a complete, self-learning, constantly-evolving Artificial Intelligence that has been improving itself since release… And it loses to a 4KB video game program from 1979 that can only “think” 2 moves ahead.

FMT99@lemmy.world · 20 days ago

That’s totally fair, the company is obviously lying, excuse me “marketing”, to promote their product, that’s absolutely true.

x00z@lemmy.world · 21 days ago

In all fairness. Machine learning in chess engines is actually pretty strong.

AlphaZero was developed by the artificial intelligence and research company DeepMind, which was acquired by Google. It is a computer program that reached a virtually unthinkable level of play using only reinforcement learning and self-play in order to train its neural networks. In other words, it was only given the rules of the game and then played against itself many millions of times (44 million games in the first nine hours, according to DeepMind).

https://www.chess.com/terms/alphazero-chess-engine

jeeva@lemmy.world · 20 days ago

Sure, but machine learning like that is very different to how LLMs are trained and their output.

malwieder@feddit.org · 20 days ago

Google Maps doesn’t pretend to be good at chess. ChatGPT does.

whaleross@lemmy.world · 20 days ago

A toddler can pretend to be good at chess but anybody with reasonable expectations knows that they are not.

MelodiousFunk@startrek.website · 20 days ago

Plot twist: the toddler has a multi-year marketing push worth tens if not hundreds of millions, which convinced a lot of people who don’t know the first thing about chess that it really is very impressive, and all those chess-types are just jealous.

xavier666@lemm.ee · 20 days ago

Have you tried feeding the toddler gallons of baby-food? Maybe then it can play chess

baggachipz@sh.itjust.works · 20 days ago

They’ve been feeding the toddler everybody else’s baby food and claiming they have the right to.

xavier666@lemm.ee · 20 days ago

“If we have to ask every time before stealing a little baby food, our morbidly obese toddler cannot survive”

FartMaster69@lemmy.dbzer0.com · 21 days ago

I mean, open AI seem to forget it isn’t.

seven_phone@lemmy.world · 21 days ago

You say you produce good oranges but my machine for testing apples gave your oranges a very low score.

wizardbeard@lemmy.dbzer0.com · 20 days ago

No, more like “Your marketing team, sales team, the news media at large, and random hype men all insist your orange machine works amazing on any fruit if you know how to use it right. It didn’t work my strawberries when I gave it all the help I could, and was outperformed by my 40 year old strawberry machine. Please stop selling the idea it works on all fruit.”

This study is specifically a counter to the constant hype that these LLMs will revolutionize absolutely everything, and the constant word choices used in discussion of LLMs that imply they have reasoning capabilities.

IsaamoonKHGDT_6143@lemmy.zip · 21 days ago

They used ChatGPT 4o, instead of using o1 or o3.

Obviously it was going to fail.

wizardbeard@lemmy.dbzer0.com · edit-2 20 days ago

Other studies (not all chess based or against this old chess AI) show similar lackluster results when using reasoning models.

Edit: When comparing reasoning models to existing algorithmic solutions.

Steve Dice@sh.itjust.works · 19 days ago

2025 Mazda MX-5 Miata ‘got absolutely wrecked’ by Inflatable Boat in beginner’s boat racing match — Mazda’s newest model bamboozled by 1930s technology.

oni ᓚᘏᗢ@lemmy.world · 21 days ago

This made my day

hogmomma@lemmy.world · 21 days ago

Get your booty on the floor tonight.

OBJECTION!@lemmy.ml · 21 days ago

Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.

wewbull@feddit.uk · 20 days ago

It does

andallthat@lemmy.world · edit-2 20 days ago

Machine learning has existed for many years, now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as “AI” and attributing every ML win ever to “AI”.

Yes, ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in news that “AI helps cure cancer”, it makes it sound like a bunch of researchers just spent a few minutes engineering the right prompt for Copilot.

That’s why, yes a specifically-designed and finely tuned ML program can now beat the best human chess player, but calling it “AI” and bundling it together with the latest Gemini or Claude iteration’s “reasoning capabilities” is intentionally misleading. That’s why articles like this one are needed. ML is a useful tool but far from the “super-human general intelligence” that is meant to replace half of human workers by the power of wishful prompting

bier@feddit.nl · 21 days ago

Yeah its like judging how great a fish is at climbing a tree. But it does show that it’s not real intelligence or reasoning

13igTyme@lemmy.world · 20 days ago

Don’t call my fish stupid.

Zenith@lemm.ee · 20 days ago

I forgot which airline it is but one of the onboard games in the back of a headrest TV was a game called “Beginners Chess” which was notoriously difficult to beat so it was tested against other chess engines and it ranked in like the top five most powerful chess engines ever

Furbag@lemmy.world · 21 days ago

Can ChatGPT actually play chess now? Last I checked, it couldn’t remember more than 5 moves of history so it wouldn’t be able to see the true board state and would make illegal moves, take it’s own pieces, materialize pieces out of thin air, etc.

skisnow@lemmy.ca · edit-2 20 days ago

It can’t, but that didn’t stop a bunch of gushing articles a while back about how it had an ELO of 2400 and other such nonsense. Turns out you could get it to have an ELO of 2400 under a very very specific set of circumstances, that include correcting it every time it hallucinated pieces or attempted to make illegal moves.

Robust Mirror@aussie.zone · edit-2 20 days ago

It could always play it if you reminded it of the board state every move. Not well, but at least generally legally. And while I know elites can play chess blind, the average person can’t, so it was always kind of harsh to hold it to that standard and criticise it not being able to remember more than 5 moves when most people can’t do that themselves.

Besides that, it was never designed to play chess. It would be like insulting Watson the Jeopardy bot for losing against the Atari chess bot, it’s not what it was designed to do.

Pamasich@kbin.earth · 20 days ago

There are custom GPTs which claim to play at a stockfish level or be literally stockfish under the hood (I assume the former is still the latter just not explicitly). Haven’t tested them, but if they work, I’d say yes. An LLM itself will never be able to play chess or do anything similar, unless they outsource that task to another tool that can. And there seem to be GPTs that do exactly that.

As for why we need ChatGPT then when the result comes from Stockfish anyway, it’s for the natural language prompts and responses.

bountygiver [any]@lemmy.ml · 20 days ago

and still lose to stockfish even after conjuring 3 queens out of thin air lol

ToastedRavioli@midwest.social · 21 days ago

ChatGPT must adhere honorably to the rules that its making up on the spot. Thats Dallas

nednobbins@lemm.ee · 20 days ago

Sometimes it seems like most of these AI articles are written by AIs with bad prompts.

Human journalists would hopefully do a little research. A quick search would reveal that researches have been publishing about this for over a year so there’s no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.

LLMs on the other hand, are very good at producing clickbait articles with low information content.

Lovable Sidekick@lemmy.world · edit-2 20 days ago

In this case it’s not even bad prompts, it’s a problem domain ChatGPT wasn’t designed to be good at. It’s like saying modern medicine is clearly bullshit because a doctor loses a basketball game.

nednobbins@lemm.ee · 20 days ago

I imagine the “author” did something like, “Search http://google.scholar.com/ find a publication where AI failed at something and write a paragraph about it.”

It’s not even as bad as the article claims.

Atari isn’t great at chess. https://chess.stackexchange.com/questions/24952/how-strong-is-each-level-of-atari-2600s-video-chess
Random LLMs were nearly as good 2 years ago. https://lmsys.org/blog/2023-05-03-arena/
LLMs that are actually trained for chess have done much better. https://arxiv.org/abs/2501.17186

Lovable Sidekick@lemmy.world · 20 days ago

Wouldn’t surprise me if an LLM trained on records of chess moves made good chess moves. I just wouldn’t expect the deployed version of ChatGPT to generate coherent chess moves based on the general text it’s been trained on.

nednobbins@lemm.ee · 20 days ago

I wouldn’t either but that’s exactly what lmsys.org found.

That blog post had ratings between 858 and 1169. Those are slightly higher than the average rating of human users on popular chess sites. Their latest leaderboard shows them doing even better.

https://lmarena.ai/leaderboard has one of the Gemini models with a rating of 1470. That’s pretty good.

nova_ad_vitum@lemmy.ca · 20 days ago

Gotham chess has a video of making chatgpt play chess against stockfish. Spoiler: chatgpt does not do well. It plays okay for a few moves but then the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn’t help.

This sort of gets to the heart of LLM-based “AI”. That one example to me really shows that there’s no actual reasoning happening inside. It’s producing answers that statistically look like answers that might be given based on that input.

For some things it even works. But calling this intelligence is dubious at best.

JacksonLamb@lemmy.world · 20 days ago

ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.

Schadrach@lemmy.sdf.org · 20 days ago

So they are both masters of troll chess then?

See: King of the Bridge

Ultraviolet@lemmy.world · edit-2 20 days ago

Because it doesn’t have any understanding of the rules of chess or even an internal model of the game state, it just has the text of chess games in its training data and can reproduce the notation, but nothing to prevent it from making illegal moves, trying to move or capture pieces that don’t exist, incorrectly declaring check/checkmate, or any number of nonsensical things.

interdimensionalmeme@lemmy.ml · 20 days ago

I think the biggest problem is it’s very low ability to “test time adaptability”. Even when combined with a reasonning model outputting into its context, the weights do not learn out of the immediate context.

I think the solution might be to train a LoRa overlay on the fly against the weights and run inference with that AND the unmodified weights and then have an overseer model self evaluate and recompose the raw outputs.

Like humans are way better at answering stuff when it’s a collaboration of more than one person. I suspect the same is true of LLMs.

Noodle07@lemmy.world · 20 days ago

Hallucinating 100% of the time 👌