Do you ask AI about your symptoms? Study finds medical errors in more than 20% of responses

A Penn State study found that AI answered medical queries with 76.2% accuracy but got more than 20% of cases wrong

News Desk
Thu, Jun 11 2026 09:24:45 UTC

More and more people are asking artificial intelligence what a pain, a spot on the skin, a fever that doesn't go down, or a symptom they don't understand can mean. It's fast, convenient and often sounds convincing. Therein lies the problem.

A new study led by researchers at Penn State's College of Health and Human Development (HHD) draws a clear line: chatbots may get many health questions right, but they still get too many wrong to serve as a replacement for a doctor's consultation.

According to the analysis, the responses generated by language models had an overall accuracy of 76.2%. But the error rate exceeded 20%, about double that seen in human doctors, according to the researchers.

What the study analyzed

The work sought to measure how an ordinary person uses artificial intelligence when they have a medical question. It did not rely on an idealized exam or on questions designed only for specialists, but on queries similar to those any user might type online.

The researchers organized a competition called Diagnose-a-thon at Penn State. Thirty-four people participated, including professors, employees, and undergraduate and graduate students. In total, they submitted 212 pairs of questions and AI-generated answers about real and imagined health problems.

Participants could choose between four models: ChatGPT-4o, ChatGPT-3.5, Gemini-1.5 Pro and Llama3-8b. Nine certified doctors then evaluated the accuracy of the answers and the potential harm they could cause, using a six-point scale.

The AI got a lot right, but not enough

The result has two readings. The first is that AI can already answer many basic medical questions quite accurately. The second, less comfortable, is that “enough” is not enough in health.

The researchers found that 76.2% of the AI-generated responses offered accurate information. The best areas were obstetrics and gynecology, and otorhinolaryngology, where the models had high validity scores and lower risk.

But performance fell in internal medicine, neurology and dermatology. In those areas, the models showed lower validity and a greater possibility of harmful responses. This matters because many real consultations start right there: persistent pain, dizziness, neurological symptoms, skin lesions or discomfort that is difficult to describe.

Why the error can be serious

An AI medical error does not always mean an absurd or easy-to-detect phrase. Sometimes the problem is more subtle: an answer that sounds safe, seems reasonable, and leaves out important information.

It can minimize a symptom that requires urgent attention. It may suggest the wrong cause. It may give an incomplete recommendation. Or it can mix correct information with other information that does not apply to the case.

That is one of the most difficult risks for the common user: they do not always know how to distinguish which part of the response is useful and which part can lead them astray.

AI works best in the hands of doctors

The researchers do not claim that AI is useless. On the contrary, they see real opportunities to improve health care. The question is who uses it and with what judgment. The study suggests that these tools may be more useful for health professionals than for patients looking to diagnose their own symptoms.

A doctor can detect an incomplete response, ask for more data, correct a hypothesis, or decide when a symptom requires immediate attention. A patient, on the other hand, may take a convincing response as if it were a medical indication.

Which questions gave the best results?

The team observed that very specific questions and queries between 60 and 250 characters tended to generate more accurate answers. That does not mean that it is advisable to use AI to self-diagnose. But it does leave a useful clue: vague questions often increase the risk of poor answers.

Writing “my head hurts” is not the same as explaining the age, duration of the pain, associated symptoms, history, and whether there was fever, bumps, vomiting, blurred vision, or loss of strength.

Even so, in the event of intense, persistent or new symptoms, the recommendation does not change: you must consult a professional.

What a patient can do with AI without putting themselves at risk

AI can be used to organize ideas before a consultation. It can help you prepare questions for the doctor, understand terms in a report, or make a list of symptoms so you don't forget anything.

It can also be useful to ask for general explanations: what a test means, what questions to ask at an appointment, what warning signs to watch for, or how to prepare for a study.

What it should not do is replace a clinical evaluation. The AI does not examine the patient, does not listen to the heart, does not palpate the abdomen, does not see a real injury in context and does not know the full medical history unless the user describes it thoroughly.

What other studies say

Penn State's work is not the only one calling for caution. A study published in BMJ Open analyzed five popular chatbots and found that half of the answers to evidence-based health questions were “somewhat” or “very” problematic. Researchers warned that deploying these systems without public education or oversight can amplify medical misinformation.

Another study led by the University of Oxford and published in Nature Medicine concluded that language models did not improve the decisions of ordinary users compared to traditional methods, such as online searches or their own judgment. The problem, according to the researchers, is that the models can give inconsistent answers and users do not always know what information they need to provide to receive useful guidance.

The uncomfortable conclusion

Artificial intelligence is already part of everyday life, including in health. People use it and will continue to use it. Denying that doesn't work. But treating it like a 24-hour doctor is something else. The study shows that it can guide, but it can also fail at moments when an error weighs more heavily than in any other area.

This news has been taken from authentic news syndicates and agencies, and only the wording has been changed, keeping the meaning intact. We have not yet done our own research and do not guarantee its complete accuracy; please verify it with other sources too.
