Teres is a critical care physician and clinical instructor in public health and community medicine. Oppenlander is an associate professor of statistics and bioethics. Strosberg is a professor of healthcare policy and bioethics.
Science is like baseball. To score a run — to convincingly show that a drug or intervention is safe and effective — all the bases have to be rounded. In science, the bases frequently include, in stepwise progression: studies in test tubes (in vitro studies), animal studies, randomized double-blind placebo-controlled trials, publication in peer-reviewed journals, meta-analyses of all high-quality randomized trials, FDA approval, real-world studies, and consensus guidelines produced by professional societies. As in baseball, the great majority of interventions never get to first base. And there are close plays at home plate, where many vehemently disagree with the umpire's call. So too with science.
But unlike baseball, science is always open to challenge and revision, at any base, in light of new and more definitive evidence. Take, for example, the emergency use authorizations (EUAs) issued by the FDA in times of public health emergencies. While the medications and treatments are rigorously monitored and evaluated, they may not always turn out to do what they were believed to do. One instance of this? Hydroxychloroquine: after receiving an EUA to treat COVID, it was found to have no benefit, and the EUA was rescinded.
One drug that never got past first base was ivermectin, which was touted as both a preventive and a treatment for COVID on an outpatient and inpatient basis. At the peak of COVID, over 84,000 prescriptions for this drug were filled at pharmacies in just one week, even though it had been thoroughly debunked through extensive evaluation. However, because the drug is FDA-approved, physicians can prescribe it off-label.
Advocates for the use of ivermectin, including physician groups such as Front Line COVID-19 Critical Care Alliance, continue to advance their claims using their websites and social media. Many of the physicians in these organizations are well-credentialed and reference randomized clinical trials, observational studies, meta-analyses and other studies to support their claims.
Science is of course a complicated and often slow-moving process. The results are difficult to communicate to the public. Risk-benefit arguments based on an evolving understanding of a disease and its treatments are not easy to comprehend. The main message can be drowned out by one eye-catching headline that has gone viral or overwhelmed by misinformation or disinformation. A recent JAMA Network article identified 52 physicians who promoted COVID misinformation on various social media platforms with a reach of millions of followers.
People who do their own research on the web may find slick patient guidelines on prevention and treatment of COVID that include zinc and other supplements, oral and nasal rinses, and melatonin for restful sleep. Yet these guidelines fail to recommend proven, FDA-approved treatments like nirmatrelvir-ritonavir (Paxlovid) and vaccines. Furthermore, the guidelines are often exactly the same for influenza, yet they fail to recommend oseltamivir (Tamiflu) and vaccines. Medical guidelines from respected hospital systems and traditional information sources often cannot compete.
The Challenge
Where can a confused or perplexed citizen find an umpire to help make the call? Could it be AI? This past year, ChatGPT burst upon the world scene, readily available to the general public. It was startling to see how quickly it was adopted and put to so many uses.
As AI becomes increasingly accessible, can it make a difference for people who have an open mind and are genuinely searching for answers? Can AI serve as a source of credible information on the effectiveness of treatments, information that is often unavailable on mainstream medical websites and, when it is available, is not trusted by a certain segment of the population?
The Promise of AI
We asked the freely available ChatGPT (version 3.5) about its capability to answer questions about the quality of the research that the Front Line COVID-19 Critical Care Alliance used to support its claims for the effectiveness of ivermectin.
In response, ChatGPT (3.5) mentioned the following capabilities:
Sentiment Analysis: AI can analyze the sentiment expressed in research papers, press releases, or statements made by the organization. This analysis can help determine if there is a bias in how the information is presented.
Fact-Checking: AI can assist in fact-checking claims made in research papers or public statements against existing scientific literature and reliable sources. It can help identify discrepancies or inaccuracies in the information presented.
Data Analysis: AI can assess the quality and integrity of the data used in research studies. It can flag potential issues such as small sample sizes, flawed study designs, or data manipulation.
Publication Bias: AI can analyze whether there is a publication bias in the research cited. It can check if negative studies on ivermectin’s effectiveness are ignored or if only positive results are highlighted.
Peer Review Analysis: AI can examine whether the research papers have undergone a rigorous peer-review process, which is an important indicator of research quality.
Not Ready for the Major Leagues
With regard to claims for the effectiveness of ivermectin against COVID, we experimented with several freely available AI chatbots (computer applications that use large language models trained on vast amounts of data and interact with humans through natural language), including ChatGPT (versions 3.5 and 4.0), Bing Chat, and Google Bard. Bing Chat and Google Bard generally performed better than ChatGPT because they can access information currently available on the internet, including fact-checking sites such as Snopes.com; ChatGPT's knowledge extends only to January 2022.
It is our assessment that the technology is not yet ready to help the general public sort out complicated or controversial healthcare questions. We found that different applications returned different results for the same questions, with many results being simplistic or inconclusive. In some instances, the data upon which the application was based were not up to date, and some of the references were not verifiable.
With regard to queries about the reliability and quality of specific studies cited by the Front Line COVID-19 Critical Care Alliance, we found that, although AI in some cases provided an accurate summary of an article, the interpretation of the results was often ambiguous, leaving it to the user to struggle with difficult technical concepts such as risk-benefit and study design.
In short, we believe AI cannot yet satisfactorily deliver the goods based on the capabilities asserted above. At this time, its use by the general public to referee even seemingly straightforward questions like the effectiveness of ivermectin for COVID is quite limited.
No doubt, AI will make progress in this area. Of course, even if (when) AI is ultimately successful in fulfilling its promise, the next question is what percentage of the population would be willing to accept AI’s evaluation as convincing evidence. To be continued.
Daniel Teres, MD, is a critical care physician and clinical instructor in public health and community medicine at Tufts University School of Medicine in Boston. Jane Oppenlander, PhD, is associate professor and chair in the Bioethics Department at Clarkson University in Schenectady, New York. Martin A. Strosberg, MPH, PhD, is emeritus professor of healthcare policy and bioethics at Union College and Clarkson University.