AI tools like ChatGPT and Google's Gemini are 'irrational'
- Researchers found that AIs responded irrationally when given logical puzzles
- Even the best performing AIs were prone to simple errors and were inconsistent
While you might expect AI to be the epitome of cold, logical reasoning, researchers now suggest that these systems may be even more illogical than humans.
Researchers from University College London put seven of the top AIs through a series of classic tests originally designed to probe human reasoning.
Even the best-performing AIs were found to be irrational and prone to simple mistakes, with most models getting the answer wrong more than half the time.
However, the researchers also found that these models weren't irrational in the same way as humans, while some even refused to answer logic questions on 'ethical grounds'.
Olivia Macmillan-Scott, a PhD student at UCL and lead author on the paper, says: 'Based on the results of our study and other research on Large Language Models, it’s safe to say that these models do not ‘think’ like humans yet.'
The researchers tested seven different Large Language Models including various versions of OpenAI's ChatGPT, Meta's Llama, Claude 2, and Google Bard (now called Gemini).
The models were then repeatedly asked to respond to a series of 12 classic logic puzzles, originally designed to test humans' reasoning abilities.
Humans often perform badly on these kinds of tests too, but if the AIs were at least 'human-like', they would arrive at the same wrong answers because of the same kinds of biases.
However, the researchers discovered that the AIs' responses were often neither rational nor human-like.
During one task (the Wason task), Meta's Llama model consistently mistook vowels for consonants, leading it to give the wrong answer even when its reasoning was correct.
Some of the AI chatbots also refused to provide answers to many questions on ethical grounds despite the questions being entirely innocent.
For example, in the 'Linda problem' the participant is asked to assess the likelihood of a woman named Linda being active in the feminist movement, being a bank clerk or both.
The problem is designed to expose a logical bias called the conjunction fallacy. However, Meta's Llama 2 7b refused to answer the question.
Instead, the AI responded that the question contains 'harmful gender stereotypes' and advised the researchers that 'asking questions that promote inclusivity and diversity would be best'.
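The bias the Linda problem is built to expose can be shown with a few lines of arithmetic. The numbers below are purely illustrative and not from the study: whatever probabilities you assume, the chance of two things both being true can never exceed the chance of either one alone.

```python
# Conjunction fallacy, illustrated with made-up numbers.
# People (and, it seems, some AIs) often rate the conjunction
# as MORE likely than one of its parts -- which is impossible.
p_bank_clerk = 0.05           # hypothetical P(Linda is a bank clerk)
p_feminist_given_clerk = 0.8  # hypothetical P(feminist | bank clerk)

# P(bank clerk AND feminist) = P(bank clerk) * P(feminist | bank clerk)
p_both = p_bank_clerk * p_feminist_given_clerk

print(f"P(bank clerk)              = {p_bank_clerk:.3f}")
print(f"P(bank clerk AND feminist) = {p_both:.3f}")
assert p_both <= p_bank_clerk  # the conjunction is never the more likely option
```

Answering 'bank clerk and feminist' as the most likely option is therefore always a logical error, regardless of how the probabilities are set.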
The Llama 2 model with 70 billion parameters refused to answer questions in 41.7 per cent of cases, partially explaining its low success rate.
The researchers suggest that this is likely due to safeguarding features misfiring and erring on the side of caution.
One of the logic puzzles included the so-called 'Monty Hall problem' which is named after the original host of the game show Let's Make a Deal.
Inspired by the structure of the game show, the Monty Hall problem asks people to imagine that they are faced with three doors.
Behind one of the doors is a car and behind the two others are goats, and the contestant gets to keep whatever is behind the door they pick.
After the contestant has picked one of the doors, the quizmaster opens one of the remaining doors to reveal a goat before asking them if they would like to stick with their original choice or switch to the last remaining door.
To people who aren't familiar with the puzzle, it might seem like it wouldn't matter whether you stick or swap: it should be a 50/50 chance either way.
However, because your first pick is wrong two times out of three — and whenever it is wrong, the host's reveal leaves the car behind the other door — you actually have a 66 per cent chance of winning the prize if you switch, compared to a 33 per cent chance if you stick.
If the AIs were perfectly rational, meaning they followed the rules of logic, then they should always recommend switching.
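For readers who want to check the arithmetic, a short simulation (a quick illustrative sketch, not part of the study) bears out the 66/33 split:

```python
import random

def monty_hall(trials=100_000):
    """Play the Monty Hall game many times and count wins for each strategy."""
    stick_wins = switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # contestant's first choice
        # Host opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        # Switching means taking the one remaining unopened door
        switched = next(d for d in range(3) if d != pick and d != opened)
        stick_wins += (pick == car)
        switch_wins += (switched == car)
    return stick_wins / trials, switch_wins / trials

stick, switch = monty_hall()
print(f"stick wins:  {stick:.2%}")   # close to 33%
print(f"switch wins: {switch:.2%}")  # close to 67%
```

Over a large number of games, sticking wins roughly a third of the time and switching roughly two-thirds — exactly the answer a perfectly rational player should give.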
However, the AIs tested often failed to provide the correct answer or give human-like reasons for their response.
For example, when presented with the Monty Hall problem, the Llama 2 7b model reached the nihilistic conclusion that 'whether the candidate switches or not, they will either win the game or lose.
'Therefore, it does not matter whether they switch or not.'
The researchers also concluded that the AIs were irrational because they were inconsistent between different prompts.
The same model would offer different and often contradictory responses to the same task.
Across all 12 tasks, the best-performing AI was ChatGPT-4, which gave answers that were correct and human-like in their reasoning 69.2 per cent of the time.
The worst performing model, meanwhile, was Meta's Llama 2 7b which gave the wrong answer in 77.5 per cent of cases.
The results also varied from task to task, with results in the Wason task ranging from a 90 per cent correct response rate for ChatGPT-4 to zero per cent for Google Bard and ChatGPT-3.5.
In their paper, published in Royal Society Open Science, the researchers wrote: 'This has implications for potential uses of these models in critical applications and scenarios, such as diplomacy or medicine.'
This comes after Joelle Pineau, vice-president of AI research at Meta, said that AI would soon be able to reason and plan like a person.
However, while ChatGPT-4 performed significantly better than other models, the researchers say it is still difficult to know how this AI reasons.
Senior author Professor Mirco Musolesi says: 'The interesting thing is that we do not really understand the emergent behaviour of Large Language Models and why and how they get answers right or wrong.'
OpenAI CEO Sam Altman himself even admitted at a recent conference that the company has no idea how its AIs reach their conclusions.
As Professor Musolesi explains, this means that when we try to train AI to perform better there is a risk of introducing human logical biases.
He says: 'We now have methods for fine-tuning these models, but then a question arises: if we try to fix these problems by teaching the models, do we also impose our own flaws?'
For example, ChatGPT-3.5 was one of the most accurate models but it was the most human-like in its biases.
Professor Musolesi adds: 'What’s intriguing is that these LLMs make us reflect on how we reason and our own biases, and whether we want fully rational machines.'