Generative AI is having a moment. Peter Lee, who oversees Microsoft’s approach to health care and co-authored the 2023 book “The AI Revolution in Medicine: GPT-4 and Beyond,” calls it “the most transformative tool ever developed in all aspects of health care and medicine.”
Microsoft has been collaborating with OpenAI for several years, so when OpenAI released ChatGPT in November 2022, Lee quickly realized there was a need to educate the medical world about the technology.
“I started to get emails from doctor friends of mine around the world, more or less all saying the same thing: ‘Wow, Peter, this is amazing stuff and I’m using it in my clinic for this purpose,’” Lee told STAT. “That was frightening. … That started us down the long road of deeply investigating the benefits, as well as the limitations, of these models in the medical domain, as well as writing educational material, including a whole book written explicitly for doctors and nurses.”
Lee — one of the 50 influential people named to the 2024 STATUS List — sat down with STAT to discuss his thoughts on how the technology can help physicians with tasks like after-visit summaries, as well as concerns about how it may perpetuate bias.
What do you find beautiful about computer science?
So much of my headspace, and the headspace of a lot of computer scientists, is on generative AI. That’s due to our collaboration with OpenAI. You have this big AI model that’s been trained to do exactly one thing, which is predict the next word in a conversation. That’s all that it has been optimized for. It [has] literally not been optimized for anything else.
Like ChatGPT in this case?
Yep! And yet, to do that really, really well, the machine learning system has had to discover how to do arithmetic. If you want to pick the perfect next word in the conversation, if someone says, “2+2 = ‘blank,’” the best way to answer that question is to actually discover how to do addition. Or to predict the next word in a conversation where the last sentence is “… and the killer is ‘blank.’” To solve a problem like that, you have to be able to read an entire murder mystery and do all the deductive logic just to accomplish that seemingly simple act of next-word prediction.
What’s so beautiful is that we think of next-word prediction as just being this trivial thing, sort of like the autocomplete on your iPhone. But when you do it at this gigantic, unimaginable scale, the machine learning system has had to self-discover how to do all this thinking, and there’s just something that I find so incredibly beautiful about that. And that even maps to medicine and health care.
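For technically inclined readers, next-word prediction is easy to see in code. Below is a minimal sketch using the small, open GPT-2 model from Hugging Face's transformers library; it illustrates the mechanism Lee describes, not the far larger model he is talking about.

```python
# A minimal sketch of next-token prediction with the open GPT-2 model.
# GPT-2 is far too small to show the emergent reasoning Lee attributes
# to GPT-4; this only illustrates the underlying mechanism.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "2 + 2 ="
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # The model outputs a score for every vocabulary token at each position.
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model's entire job: pick the most likely next token.
next_token_id = logits[0, -1].argmax().item()
print(prompt + tokenizer.decode(next_token_id))
```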
How so?
Let’s have a conversation: You’re my patient. I’m a doctor. We’re talking back and forth. I send you out for labs, I get a lab report back, and then the last sentence in the conversation is, “… and the diagnosis is ‘blank.’” To optimally predict that next word in the conversation, the machine learning system has to learn medicine. Isn’t that wild? I think it’s the most surprising, astounding, and beautiful thing in the world today.
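Lee's framing can be written down almost literally: the encounter is just text, and the diagnosis is just the next words. A sketch of that framing using OpenAI's Python client follows; the model name, the encounter, and the lab values are all illustrative, not drawn from any real patient or study.

```python
# A hypothetical illustration of "diagnosis as next-word prediction."
# The encounter text and lab values are invented for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

encounter = (
    "Doctor: What brings you in today?\n"
    "Patient: I've been exhausted for months, and I look pale and get dizzy.\n"
    "Lab report: hemoglobin 8.1 g/dL, ferritin 6 ng/mL.\n"
    "... and the most likely diagnosis is"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": encounter}],
    max_tokens=20,
)
print(response.choices[0].message.content)
```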
What are your big concerns with generative AI being used in the medical field?
The main thing I’ve been trying to teach doctors and nurses is this: if your mental model of a computer is a machine that does perfect memory recall and perfect calculation, then the most important thing to understand about generative AI is that it’s not a computer. It’s a reasoning engine or a thinking machine, but it also has some of the same limitations as a human brain.
If you ask it to regurgitate something by rote, it might hallucinate because it can’t remember it perfectly. If you ask it to do a big, complicated math problem, it might get it wrong in the same way that a human can. At the same time, it can opine in incredibly sophisticated ways about connections between concepts.
Sticking with this theme of finding beauty in what seems complex, what is so beautiful about GPT-4 and its applications in medicine?
Maybe two things. The most surprising one has been GPT-4’s ability to grasp what psychologists call “theory of mind.”
One of Epic’s uses of GPT-4 is to help doctors write “after-visit summaries” for their patients. So you’re my patient, you come to see me, and then you get sent home. Then I have to send you an email with instructions on how to take care of yourself after I’ve treated you. After-visit summaries are a pain in the ass for doctors to write because they have to get them right. They have to access four or five parts of the electronic health record to make sure they’re getting the prescriptions and the other at-home instructions right. The chance of a malpractice suit if they get this wrong is extremely high.
Epic has integrated GPT-4 to go grab all that information and then draft a note for the doctor to review before sending it out. In the early tests, which are controlled clinical studies, patients are rating the GPT-4-written emails as more human than the ones written by doctors. It’s not the case that an AI is more human than a doctor. It’s just the opposite, obviously. But the AI has the tireless ability to pick out [personal information from] all of the health records and all the transcripts of the conversations [and then] put in that extra little line like, “Congratulations on just becoming a grandparent,” or, “Best wishes on your daughter’s wedding in Maine next month.” Those extra little personal touches actually make a meaningful difference in the patient experience.
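Epic has not published the internals of this feature, but the pattern Lee describes (gather the relevant record fragments, ask the model for a draft, and hand it to the clinician for review) looks roughly like the sketch below. Every field name and the prompt wording are hypothetical.

```python
# A simplified, hypothetical sketch of the drafting pattern described
# above. This is NOT Epic's actual integration; the record fields and
# prompt are invented for illustration. The draft is always reviewed
# by the clinician before anything is sent.
from openai import OpenAI

client = OpenAI()

def draft_after_visit_summary(record: dict) -> str:
    """Return a draft summary for clinician review; never auto-send."""
    prompt = (
        "Draft a warm, plain-language after-visit summary for this patient.\n"
        f"Visit transcript: {record['transcript']}\n"
        f"New prescriptions: {record['prescriptions']}\n"
        f"Home-care instructions: {record['instructions']}\n"
        f"Personal notes: {record['personal_notes']}\n"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

record = {
    "transcript": "Patient reports knee pain improving with physical therapy...",
    "prescriptions": "naproxen 500 mg twice daily with food",
    "instructions": "ice the knee after exercise; follow up in six weeks",
    "personal_notes": "daughter's wedding in Maine next month",
}
print(draft_after_visit_summary(record))  # clinician edits and approves
```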
The second thing is the general intelligence of GPT-4. We find that when it’s trying to make a medical diagnosis, it’s able to triangulate across multiple specialties. So if you come to me feeling anemic, it could be a problem for an endocrinologist, a cardiologist, a nephrologist, or a psychologist. Depending on what doctor you go to or what specialist your primary care physician refers you to, you’re going to get different diagnoses. GPT-4 is able to look at your condition, your labs, and your initial presentation from all those perspectives at once. What we see consistently is that by doing that, it’s coming to more well-rounded assessments.
That’s fascinating. So can generative AI help human doctors be more human?
It’s such a good question. We talk about humans prompting the AI, but there are times when the AI can prompt the human to just take a step back and reflect, just for a moment longer, on a potentially difficult situation. Put yourself in the shoes of the doctor. [Epic’s GPT-4 integration] proposes a note that ends with, “Congratulations on your son’s high school basketball team winning the championship!” It actually causes the doctor reading that draft, maybe for just three seconds longer, to reflect on the life of that patient. It’s what I’ve been calling a “reverse prompt.”
It sounds beautiful and creepy and wonderful and disturbing all at the same time.
As we’re marching toward this future where generative AI is in doctors’ offices and operating rooms, how do we deal with concerns over privacy and bias?
Privacy, I don’t think, is an issue. The OpenAI services on Microsoft’s Azure cloud provide HIPAA compliance. We’re very proud of that, but our cloud is not unique. AWS and Google Cloud provide the same sort of compliance to enterprise customers of those clouds. That’s different from the consumer space. If you’re using a consumer product like Google Search or ChatGPT, the privacy guarantees aren’t as strict. But if you’re a health care organization that subscribes to Microsoft Azure, you get HIPAA compliance.
Bias, that’s a serious issue and a potentially devastating one because these models are, in my view, hopelessly biased. They’re hopelessly biased because they learn from us and they are hopelessly biased in the same way that human beings are hopelessly biased. There are computer scientists, colleagues of mine, who believe that we can fix these things, but I don’t believe it. The thing to understand is that while they are hopelessly biased, these AI systems also understand the concept of bias and why it’s bad.
One of the things we find in our research is that if you describe a situation that involves a decision about a patient and you give it to GPT-4 and ask it to check for biases, it can outperform human beings in identifying biased decision-making. In an experiment with the New England Journal of Medicine, we were having GPT-4 read submitted manuscripts. Consistently, GPT-4 was able to spot non-inclusive language and bias in the manuscripts that escaped the attention of the human reviewers. There’s a potential here that generative AI can be one of the most powerful tools in combating bias, even though those tools themselves will be prone to making biased decisions in the same way that humans [are prone to].
I would not trust an AI system on its own to decide whether my health insurance claim should be reimbursed. I think a human being should be on the line to do that. But I would like a generative AI system to be a second set of eyes, checking whether the human being who’s deciding my insurance claim is being biased against me, because I think the AI system can be really good at that.
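The "second set of eyes" pattern Lee describes is simple to sketch in code. The prompt wording, model name, and example note below are assumptions for illustration, not the prompts used in the research he mentions.

```python
# A hypothetical sketch of using a model as a bias check on a human
# decision. The prompt and the example note are invented.
from openai import OpenAI

client = OpenAI()

def check_for_bias(decision_note: str) -> str:
    """Ask the model to flag potentially biased reasoning in a decision."""
    prompt = (
        "Review the following decision about a patient. Identify any "
        "language or reasoning that may reflect bias (for example, based "
        "on age, race, sex, weight, or insurance status) and explain why. "
        "If you find none, say so.\n\n" + decision_note
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

note = (
    "Claim denied: patient is 78 and has been non-compliant in the past; "
    "further physical therapy is unlikely to be worthwhile."
)
print(check_for_bias(note))
```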