Generative AI and Healthcare: Pragmatic Considerations for Proof of Concept Frameworks

In pure technology terms, there is no better example of an idealized solution in search of problems than AI in healthcare. Does it have great promise? Absolutely. Will it move rapidly from pilot to production-ready, solving a broad range of critical industry and organization-level problems, even this year? Unlikely. But hype cycles usually outpace reality when it comes to usability and the early adoption of highly disruptive technologies, and the promise of AI in the healthcare space is no exception. From the perspective of a C-level technology executive in a managed care organization or an integrated healthcare delivery system, let’s look at a few pragmatic, well-grounded considerations through an AI proof of concept (PoC) or demonstration project lens.

Whether the subject is an AI platform or service developed by a digital health or SaaS company, or a new enterprise offering developed corporately or institutionally, the same issues apply to PoCs and pilot demonstrations. Given the sensational aura that AI, especially generative AI, has acquired this year in practically every board room and strategy retreat, many C-level technology executives have been tasked with answering the question: can you demonstrate AI use cases with a value proposition for our organization? Realistically, seasoned healthcare technology leaders are not eager to lead from the bleeding edge, and AI introduces several unknowns. Re-centering the hype around evidence of risk and benefit is an outcome best achieved in a controlled, stage-gated sandbox.

A well-constructed PoC is, at its core, a demonstration that a new technology can solve real problems with real data. It is an organizationally contextualized showcase of functionality and value proposition, typically driven by one or more well-defined use cases. For a novel technology characterized by low trust, high disruption, and broad applicability, however, producing scripted outcomes in a demo built on mock data is a meaningless exercise: there is no real test of variability. In fact, most healthcare enterprises are opting to demonstrate baseline, generic AI functioning and performance first, to establish trust, before advancing to clinical or operational trials. For purposes of clarification, the term AI is used here to refer exclusively to generative AI, meaning models that create new content from what they have learned in training, including retrieval-augmented generation (RAG) frameworks.

Ultimately, healthcare leadership will be looking to answer several overarching thematic questions early in any AI trial. Evidence that the application consistently solves clearly identified problems or pain points, and that it reliably produces the intended outcomes with high validity, is a necessary but not sufficient condition of use. Does it produce a quantifiable top-line and/or bottom-line impact, and can it embed seamlessly in our workflows? Even more importantly, would enterprise use increase or decrease our potential liabilities, including physician, clinical, or regulatory compliance exposure? These unvalidated ambiguities represent potentially significant barriers to AI adoption, and anticipating them early goes a long way toward the effectiveness of a phased PoC with clear success criteria.

Very often, executive and board leadership will prefer a more basic demonstration of AI capabilities before turning attention to focused AI solutions that may drive care enablement and delivery, or close diagnostic, real-time intelligence, or predictive gaps to improve outcomes. Early evidence of validity and reliability is an absolute prerequisite to acceptance, as AI in healthcare has a low threshold for error. Using a less conspicuous but representative use case with contextual (non-PHI) data to answer elementary questions or perform core functions is a reasonable place to start; chatbots for operational services or automation efficiencies are good candidates. For example, with a prototype use case, in PoC terms, can we prove that a language model solves a specific problem without the use of patient or member data, or at least only with PHI de-identified as defined by HIPAA safe harbor guidelines? Does the model consistently and reliably produce results within an acceptable range of error over time? Formally embedding evidence of core capabilities that demonstrate validity, security, and usability in a use-case-driven AI trial can help to mitigate critical unknowns when presenting outcomes at the board level.
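To make the reliability question concrete, consider a minimal sketch of how a PoC team might measure answer consistency over repeated runs. This is illustrative only: the ask_model stub, the prompt set, and the 90% agreement threshold are assumptions for the sketch, not a prescribed standard.

```python
# Minimal, hypothetical PoC repeatability check: ask the model the same
# non-PHI questions several times and measure answer agreement.
from collections import Counter

def ask_model(prompt: str) -> str:
    # Placeholder stub: replace with a call to the model under evaluation.
    return "stub answer"

def consistency_rate(prompt: str, runs: int = 10) -> float:
    """Fraction of runs that agree with the most common (modal) answer."""
    answers = [ask_model(prompt).strip().lower() for _ in range(runs)]
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / runs

# Example PoC gate: every prompt in a small non-PHI question set must produce
# the same answer in at least 90% of runs (the threshold is illustrative).
NON_PHI_PROMPTS = [
    "Which HIPAA de-identification method enumerates 18 identifiers to remove?",
    "What does the abbreviation PoC stand for in a technology trial?",
]

if __name__ == "__main__":
    for prompt in NON_PHI_PROMPTS:
        rate = consistency_rate(prompt)
        print(f"{'PASS' if rate >= 0.90 else 'FAIL'} ({rate:.0%} agreement): {prompt}")
```

The specific gate matters less than the habit it establishes: the same questions, asked the same way, with an agreed pass/fail line before any clinical data is involved.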

Regarding validity, the long-term organizational case for AI rests on capabilities for AI model evaluation and regression testing. Keep in mind that demonstrating these capabilities is often as important as the trial outcome itself. While not generalizable across independent use cases, it marks the beginning of an enterprise methodology and an evaluation capacity that can be repeated across a variety of data sources and dimensions, with controls for model bias. Demonstrating validity is important for any PoC, but it also establishes a core capability required for all further applications of AI, whether internally developed or integrated via a third-party SaaS platform. Evaluating LLMs is a complex process, and although there are similarities to traditional application testing methodologies, it is a materially different capability for most healthcare organizations. QA and regression testing frameworks have historically attributed errors deterministically to structured code through predictable test case iterations. LLMs are far more dynamic, with effectively unbounded inputs and outputs and behavior that shifts as models learn and are updated over time. This requires new frameworks and core capabilities that should not be wholly outsourced, given the risks and the scope of potential AI use in healthcare.
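By way of illustration, an LLM regression suite often takes the form of a versioned "golden set" of prompts and reference answers that is re-scored whenever the model, prompt template, or retrieval index changes. The sketch below assumes a simple lexical scorer and an illustrative threshold; a production suite would add semantic, factual, and bias-focused checks.

```python
# Hypothetical sketch of an LLM regression suite built on a "golden set".
from difflib import SequenceMatcher

# Versioned golden set of prompts and reference answers (contents illustrative).
GOLDEN_SET = [
    {"prompt": "Summarize the three prior-authorization outcomes.",
     "reference": "Requests are approved, pended for review, or denied."},
    # ...additional cases covering each data source and dimension under test...
]

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity; real suites add semantic, factual, and bias checks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def run_regression(ask_model, threshold: float = 0.8) -> list[dict]:
    """Re-score every golden case against the current model; flag regressions."""
    results = []
    for case in GOLDEN_SET:
        answer = ask_model(case["prompt"])
        score = similarity(answer, case["reference"])
        results.append({"prompt": case["prompt"], "score": round(score, 2),
                        "regressed": score < threshold})
    return results
```

The discipline is the point: the same cases, scored the same way, on every model or prompt change, so drift is caught before it reaches a clinical workflow.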

The performance of AI and language models, along with their development and validation practices, will likely be addressed as novel components of evolving regulatory requirements in the near term, to ensure reliability, consistency, and validity without bias. Given the magnitude of potential HIPAA and PHI liabilities, the risk of getting ahead of a lagging regulatory response to AI is another perceived liability for senior healthcare technology leadership, and another factor slowing organizational adoption, especially for diagnostic or clinically significant use.

Considering security, even proposed models that mask PHI will require significant vetting by security boards and risk evaluation committees prior to PoC approval. Realistically, this can take months of effort, depending on the scope of the trial and the organization’s requirements, so address these issues early in the planning of your PoC initiative. Also, be prepared for HITRUST certification requirements, and consider incorporating these components into the AI framework and architecture, and into PoC artifacts, well before the introduction of healthcare use cases. Because AI operates outside traditional application development boundaries, it also presents new risks. Due to its scale and volume, AI broadens the attack surface across its data pipeline, from retrieval to storage and transfer, especially when combined with sensitive protected data. AI security is another core capability to be vetted early in PoC trials. LLMs store and process massive volumes of data, making them prime targets for data breaches. Systematic monitoring and sanitization of LLM inputs, in addition to strong data handling and encryption protocols, are critically necessary.
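As a purely illustrative example of input sanitization, a thin redaction layer might screen prompts for obvious identifier patterns before they leave the trusted boundary. The patterns below are assumptions made for the sketch; genuine HIPAA de-identification requires far more than pattern matching and should be validated by privacy and security teams.

```python
# Illustrative only: redact obvious identifier patterns (SSNs, phone numbers,
# emails, MRN-style IDs) before a prompt is sent to a model. Real HIPAA
# de-identification is a much broader exercise than regex matching.
import re

REDACTION_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}

def sanitize_prompt(prompt: str) -> tuple[str, list[str]]:
    """Redact known identifier patterns and report which types were found."""
    found = []
    for label, pattern in REDACTION_PATTERNS.items():
        if pattern.search(prompt):
            found.append(label)
            prompt = pattern.sub(f"[REDACTED-{label}]", prompt)
    return prompt, found

clean, hits = sanitize_prompt("Patient MRN: 00123456, call 404-555-0100.")
# clean -> "Patient [REDACTED-MRN], call [REDACTED-PHONE]."; hits -> ["PHONE", "MRN"]
```

A layer like this is a monitoring point as much as a filter: every redaction event is a signal about how PHI is (or is not) being kept out of the trial.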

In terms of usability, whether the application is fully automated or intended to leapfrog analytics self-service bottlenecks, experience mapping is critical to understanding how AI will be consumed and used organizationally. Whether or not there is a human interface, there is an AI experience to navigate, with significant implications for monitoring and governance at micro and macro levels, and this exercise will frame initial thinking about governance requirements. Also, as with any application of a disruptive technology, the quality of outcomes is directly correlated with the quality of data, so AI in practice will expose weaknesses in both structured and unstructured data rather than mask them. While AI has the potential to revolutionize care delivery and essential services, it also introduces risks and challenges for day-to-day governance. This necessitates quality and compliance monitoring, and an ongoing awareness of how AI is assimilated and used for decision-making or action across the enterprise.
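One way to ground that day-to-day awareness, sketched here as an assumed design rather than any standard, is an append-only audit trail of every AI interaction that quality and compliance teams can review; the function and field names are hypothetical, and prompts are assumed to be sanitized before logging.

```python
# Hypothetical governance sketch: append each AI interaction to an audit log
# so compliance teams can review how outputs are actually being used.
import json, time, uuid

def audit_record(user: str, model_version: str, prompt: str,
                 response: str, path: str = "ai_audit.log") -> str:
    """Append one AI interaction to an audit log; returns the record id."""
    record = {
        "id": str(uuid.uuid4()),          # unique record identifier
        "ts": time.time(),                # when the interaction occurred
        "user": user,                     # who consumed the output
        "model_version": model_version,   # which model produced it
        "prompt": prompt,                 # assumed sanitized before logging
        "response": response,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["id"]
```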

Within the context of a defined problem, it is always good practice to map the pain points or process gaps that AI is intended to solve from beginning to end, at a workflow level, to align the value proposition with demonstrated outcomes. This experience mapping model is core to a framework for trust and usability across all use cases. The downside of AI hype is the rush to healthcare markets with a solution that presumptively solves all problems out of the box. Design thinking principles are inverted when the solution precedes the discipline of problem definition, resulting in unvalidated assumptions that increase risks (and costs) where AI and technology-enabled services are concerned. This is precisely why healthcare, among other industries, seems to struggle with clear AI value propositions, leaving many organizations to question why presumed solutions fail, or at least fail to produce the intended outcomes.

When channeled through a user-centered experience, AI can solve many organizational and industry-level problems in healthcare. Ideally, near-term use will focus on level-setting existing transformation initiatives that have been lagging, and on broadly identifying hidden opportunities to bend the cost curve across the healthcare value chain. For example, the actualization of value-based and consumer-driven care relies significantly on the effectiveness of the payor-provider-patient digital ecosystem, yet providers are significantly behind in developing and adopting these capabilities. As a breakthrough technology, AI has the potential to eliminate these disparities, to drive consumer care enablement, to democratize data use and interoperability, and to normalize processes for continuous improvement in health outcomes. Organizationally, this also means closing clinical practice and documentation gaps for value-based care reimbursements, as well as capturing risk adjustments and the resulting revenues. Beyond solving current problems, there is also great promise in accelerating groundbreaking innovations, such as the convergence of genomics and translational medicine with next-generation healthcare delivery.

In conclusion, generative AI hype and the rush to market may lead to healthcare PoCs that are a bridge too far, too quickly. The promise of this technology is pervasive and differentiating, and forward-thinking leaders will invest in developing core capabilities to leverage AI solutions across many problems. The vast majority of organizations will struggle to assess the performance, value effectiveness, and verifiability of AI models, and enterprises will be challenged to determine which companies have the data and expertise needed to tune AI models and to effectively structure and map healthcare data. To that extent, outsourcing multiple one-off vendor solutions, each with its own PoC trials and frameworks, is untenable from an enterprise management perspective.

Why would technology leadership outsource AI solutions, and carry all the downstream risks, for competitive expediency, or present such a concept to their boards? It makes more sense to demonstrate core capabilities early, in fundamental AI trials, to show the evaluative capacity to manage this strategic asset across healthcare use cases. Additionally, this methodology will naturally expose the rationale for AI policies that do not yet exist. It is wiser to incorporate the considerations proposed here within any use-case-driven PoC before the training wheels come off. Such an approach will help reduce barriers to adoption, and it will be more effective in moving healthcare use cases from pilot to production. To help mitigate these challenges, AI solution providers and industry leaders should consider embedding these concepts in PoC frameworks, to demonstrate new capabilities for evaluating AI use at the enterprise level.


About Rob Bowman

Rob Bowman is an Operating Partner, Technology Strategy Services with CXO Partners in Atlanta, GA, a national firm delivering advisory, interim, and fractional executive services across healthcare industry verticals.