
Do you ask AI about your symptoms? Study finds medical errors in more than 20% of responses

A Penn State study found that AI answers medical queries with 76.2% accuracy, but fails in more than 20% of cases


More and more people are asking artificial intelligence what a pain, a spot on the skin, a fever that doesn't go down, or a symptom they don't understand can mean. It's fast, convenient and often sounds convincing. Therein lies the problem.

A new study led by researchers at Penn State's College of Health and Human Development (HHD) draws a clear line: chatbots may answer many health questions correctly, but they still get too many wrong to replace a visit to the doctor.

According to the analysis, the responses generated by language models had an overall accuracy of 76.2%. But the error rate exceeded 20%, about double that seen in human doctors, according to the researchers.

What the study analyzed

The work sought to measure how an ordinary person uses artificial intelligence when they have a medical question. It was not an idealized exam or questions designed only for specialists, but rather queries similar to those that any user could write on the Internet.

The researchers organized a competition called Diagnose-a-thon at Penn State. A total of 34 people participated, including professors, employees, and undergraduate and graduate students. Together they submitted 212 questions, each paired with an AI-generated answer, about real and imagined health problems.

Participants could choose between four models: ChatGPT-4o, ChatGPT-3.5, Gemini-1.5 Pro and Llama3-8b. Nine certified doctors then evaluated the accuracy of the answers and the potential harm they could cause, using a six-point scale.

The AI was often right, but not often enough

The result has two readings. The first is that AI can already answer many basic medical questions quite accurately. The second, less comfortable, is that “enough” is not enough in health.

The researchers found that 76.2% of the AI-generated responses offered accurate information. The best areas were obstetrics and gynecology, and otorhinolaryngology, where the models had high validity scores and lower risk.

But performance fell in internal medicine, neurology and dermatology. In those areas, the models showed lower validity and a greater possibility of harmful responses. This matters because many real consultations start right there: persistent pain, dizziness, neurological symptoms, skin lesions or discomfort that is difficult to describe.

Why the error can be serious

An AI medical error does not always mean an absurd or easy-to-detect phrase. Sometimes the problem is more subtle: an answer that sounds safe, seems reasonable, and leaves out important information.

It can minimize a symptom that requires urgent attention. It may suggest the wrong cause. It may give an incomplete recommendation. Or it can mix correct information with other information that does not apply to the case.

That is one of the hardest risks for ordinary users to manage: they do not always know how to distinguish which part of the response is useful and which part can lead them astray.

AI works best in the hands of doctors

The researchers do not claim that AI is useless. On the contrary, they see real opportunities to improve health care. The question is who uses it and with what judgment. The study suggests that these tools may be more useful for health professionals than for patients looking to diagnose their own symptoms.

A doctor can detect an incomplete response, ask for more data, correct a hypothesis, or decide when a symptom requires immediate attention. A patient, on the other hand, may take a convincing response as if it were medical advice.


Which questions gave the best results?

The team observed that very specific questions and queries between 60 and 250 characters tended to generate more accurate answers. That does not mean that it is advisable to use AI to self-diagnose. But it does leave a useful clue: vague questions often increase the risk of poor answers.

Writing “my head hurts” is not the same as giving your age, how long the pain has lasted, any associated symptoms, your medical history, and whether there was fever, a blow to the head, vomiting, blurred vision, or loss of strength.

Even so, in the event of intense, persistent or new symptoms, the recommendation does not change: you must consult a professional.
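For readers who want to apply the length finding to their own questions, the short Python sketch below checks whether a draft falls within the 60 to 250 character range mentioned above. The function name `length_feedback` and the sample questions are invented for illustration, and treating the range as inclusive is an assumption; none of this comes from the study itself.

```python
# Character range the article reports as associated with more accurate answers.
# Treating both bounds as inclusive is an assumption made for this sketch.
MIN_CHARS = 60
MAX_CHARS = 250


def length_feedback(question: str) -> str:
    """Classify a draft question as 'too short', 'in range', or 'too long'."""
    n = len(question.strip())
    if n < MIN_CHARS:
        return "too short"
    if n > MAX_CHARS:
        return "too long"
    return "in range"


# Hypothetical example questions, not taken from the study.
vague = "my head hurts"
detailed = (
    "I am 34 and have had a throbbing headache for three days, "
    "with blurred vision and vomiting but no fever or head injury."
)

print(length_feedback(vague))     # too short
print(length_feedback(detailed))  # in range
```

A question that lands in the range is not guaranteed a correct answer. The check only flags drafts that are likely too vague or too long.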

What a patient can do with AI without putting themselves at risk

AI can be used to organize ideas before a consultation. It can help you prepare questions for the doctor, understand terms in a report, or make a list of symptoms so you don't forget anything.

It can also be useful to ask for general explanations: what a test means, what questions to ask at an appointment, what warning signs to watch for, or how to prepare for a study.

What it should not do is replace a clinical evaluation. The AI does not examine the patient, does not listen to the heart, does not palpate the abdomen, does not see a real injury in context, and does not know the patient's medical history unless the user describes it thoroughly.

What other studies say

Penn State's work is not the only one calling for caution. A study published in BMJ Open analyzed five popular chatbots and found that half of the answers to evidence-based health questions were “somewhat” or “very” problematic. Researchers warned that deploying these systems without public education or oversight can amplify medical misinformation.

Another study led by the University of Oxford and published in Nature Medicine concluded that language models did not improve the decisions of ordinary users compared to traditional methods, such as online searches or their own judgment. The problem, according to the researchers, is that the models can give inconsistent answers and users do not always know what information they need to provide to receive useful guidance.

The uncomfortable conclusion

Artificial intelligence is already part of everyday life, including in health. People use it and will continue to use it. Denying that doesn't work. But treating it like a 24-hour doctor is something else. The study shows that it can guide, but it can also fail in situations where an error carries more weight than in almost any other area.

This news has been taken from authentic news syndicates and agencies, and only the wording has been changed, keeping the meaning intact. We have not yet done our own research, do not guarantee its complete authenticity, and ask you to verify it with other sources too.
