From Algorithm to Real Care: The Importance of Clinical Validation in AI

Summary:

Artificial intelligence is advancing rapidly in medicine, but a statistic from MIT reveals that approximately 95% of AI pilot projects fail when implemented in real-world settings. The reason is simple: outstanding technical accuracy in controlled environments does not necessarily translate into safety at the bedside.

The third article in the editorial series “AI in Healthcare: Trust, Safety, and Impact on Clinical Practice” explores the critical pillar of clinical validation. Learn why testing algorithms across diverse care settings and addressing hidden biases in healthcare datasets is essential to transforming AI from a promising technology into a truly reliable and safe clinical tool.

Key Topics Covered:

The gap between theory and real-world practice
Technical validation vs. clinical validation
The risks of hidden biases
The practical approach behind the Epimed Prediction Models
The importance of representative datasets
Clinical responsibility as a commitment

Content:

In recent years, advances in artificial intelligence (AI) have generated considerable excitement and expectation. New models emerge every day, capable of processing vast volumes of data and identifying patterns that previously went unnoticed. Healthcare, historically one of the most cautious sectors in adopting new technologies, has now become one of the three leading industries where AI is being adopted at an accelerated pace.¹

Over the past ten years, nearly 300,000 scientific articles on the subject have been indexed in PubMed, the leading life sciences research database, reflecting exponential growth in the field.

Figure: Publications indexed in PubMed containing “artificial intelligence,” “machine learning,” and “AI agents” (search conducted on May 28th, 2026).

However, when the conversation moves from scientific publications to the real world, the definition of success changes dramatically. An algorithm may achieve strong results in a study, often conducted using retrospective datasets or controlled environments. In hospital settings, what is at stake is not the computational power of a tool, but patient safety and the accuracy of clinical decision-making. A recent study from MIT² found that approximately 95% of AI pilot projects fail when implemented in real-world settings.

In healthcare, this challenge is even more critical. When a new AI model is introduced, the most common question is, “What is its accuracy?” Yet this is not the most important question. The question that truly matters is, “Does this model work reliably, consistently, and safely in everyday clinical practice and real-world decision-making?”

The distinction between a model that performs well in a controlled environment and one that can be trusted in clinical practice is what we call clinical validation. It is the most important—and often the most overlooked—criterion when evaluating any AI solution in healthcare.

Technical Validation and Clinical Validation Are Not the Same

Virtually all AI models undergo some form of validation before being released. However, in most cases, this validation occurs under conditions that do not reflect real-world practice, as discussed in the previous article in this series.³

During technical validation, a model’s performance is evaluated using a test dataset. The primary objective is to measure metrics such as sensitivity, specificity, and accuracy, with overall discrimination often summarized by the area under the ROC curve. These metrics are undoubtedly important, but they answer a retrospective question: Did the model learn effectively from the data it was given? These datasets may also consist of information collected many years ago and may contain biases or patterns that no longer reflect current clinical realities.
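As a minimal sketch of what these technical metrics measure, the following Python example computes sensitivity, specificity, accuracy, and the area under the ROC curve on a toy test set. The function names, the 0.5 decision threshold, and the eight example patients are illustrative assumptions and do not come from any Epimed model.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for outcome, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and outcome == 1:
            tp += 1
        elif predicted == 1 and outcome == 0:
            fp += 1
        elif predicted == 0 and outcome == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test set (hypothetical): 1 = outcome occurred, 0 = it did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

tp, fp, tn, fn = confusion_counts(y_true, y_prob)
sensitivity = tp / (tp + fn)   # share of true events that were flagged
specificity = tn / (tn + fp)   # share of non-events correctly not flagged
accuracy = (tp + tn) / len(y_true)
auc = roc_auc(y_true, y_prob)

print(sensitivity, specificity, accuracy, auc)  # 0.75 0.75 0.75 0.9375
```

All four numbers describe performance on the data the model was tested on; none of them, by itself, says anything about how the model behaves in a different hospital or patient population.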

Clinical validation is a much more demanding stage. It answers a different question: Does the model maintain strong performance when the context changes or when patients differ from those on whom the algorithm was originally trained?

Real-world clinical data are heterogeneous. They often contain missing variables, implausible values, inconsistent documentation, and populations with different epidemiological profiles. A model trained on data from academic hospitals in the United States may perform very differently in a general ICU in Brazil. Clinical validation is the rigorous process of determining whether a model performs effectively in the environment where it will actually be used.
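The data-quality problems described above (missing variables and implausible values) can be made concrete with a small audit routine. This is only a sketch: the field names and plausibility ranges below are hypothetical placeholders chosen for illustration, not clinical reference limits and not the checks used by any specific product.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical plausibility ranges, for illustration only.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "age_years": (0, 120),
    "heart_rate_bpm": (20, 300),
    "temperature_c": (25.0, 45.0),
}


def audit_record(record: Dict[str, Optional[float]]) -> List[str]:
    """List the data-quality issues found in one patient record."""
    issues = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            issues.append(f"{field}: missing")
        elif not (low <= value <= high):
            issues.append(f"{field}: implausible value {value}")
    return issues


# Two made-up records: one clean, one with typical real-world problems.
clean = {"age_years": 67, "heart_rate_bpm": 88, "temperature_c": 36.8}
messy = {"age_years": 254, "heart_rate_bpm": None, "temperature_c": 37.1}

print(audit_record(clean))  # []
print(audit_record(messy))
# ['age_years: implausible value 254', 'heart_rate_bpm: missing']
```

A model evaluated only on records like the first one may behave unpredictably when records like the second reach it in production, which is one reason validation has to happen on data from the setting of use.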

The Risk of Hidden Bias

For many years, discussions about the risks of AI in healthcare focused primarily on hallucinations—instances in which language models generate incorrect or entirely fabricated information. This is a genuine concern, but it is not necessarily the most prevalent risk in healthcare AI.

More recently, biases within training datasets have emerged as one of the most significant concerns. Unlike hallucinations, these biases do not appear as obvious errors. Instead, they may cause a model to systematically underestimate risk in certain patient populations while overestimating it in others. The model may also perform poorly among groups that are underrepresented in the training data. Compounding the problem, these hidden biases often accumulate silently. By the time they are detected, real harm may already have occurred.⁴
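One simple way to look for this kind of systematic under- or overestimation is to compare observed and expected event counts within each subgroup. The sketch below assumes a hypothetical list of (group, predicted risk, observed outcome) rows; the group labels and numbers are invented for illustration, and a real analysis would also need confidence intervals and adequate sample sizes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def observed_expected_by_group(
    rows: Iterable[Tuple[str, float, int]]
) -> Dict[str, float]:
    """Observed-to-expected event ratio per subgroup.

    Each row is (group, predicted_risk, outcome), with outcome 1 or 0.
    A ratio near 1 suggests predictions match reality on average;
    above 1 means the model underestimates risk in that group,
    below 1 means it overestimates risk.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for group, risk, outcome in rows:
        observed[group] += outcome
        expected[group] += risk
    return {g: observed[g] / expected[g] for g in observed if expected[g] > 0}


# Invented example: the model looks well calibrated in group A
# but predicts far too little risk in group B.
rows = [
    ("A", 0.25, 0), ("A", 0.25, 0), ("A", 0.25, 1), ("A", 0.25, 0),
    ("B", 0.125, 1), ("B", 0.125, 0), ("B", 0.125, 1), ("B", 0.125, 0),
]

ratios = observed_expected_by_group(rows)
print(ratios)  # {'A': 1.0, 'B': 4.0}
```

An aggregate check over all eight patients would give 3 observed events against 1.5 expected, hiding the fact that the miscalibration is concentrated entirely in one group.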

Rigorous clinical validation remains the most effective safeguard against this risk. It requires testing models across diverse populations and healthcare settings, combined with prospective monitoring of outcomes.

A Real-World Example: The Epimed Prediction Models

Epimed Solutions was founded in 2008 by intensive care physicians with a clear understanding of the safety and evidence requirements that guide healthcare decision-making. As a pioneer in deploying AI-driven healthcare models through Epimed Monitor Performance over the past decade, Epimed views AI not as a recent technological trend but as a natural evolution of its analytical solutions.

The Epimed Prediction Models are predictive models that use machine learning techniques to estimate key clinically relevant outcomes in critical care. They have been implemented at scale throughout Brazil and Latin America—not as pilot projects or proof-of-concept initiatives, but as real-world solutions embedded within ICU workflows across hospitals of different sizes and profiles.

These models were developed using the world’s largest database of critically ill patients, built over nearly eighteen years through continuous scientific and technical curation by intensive care physicians with deep knowledge of the Brazilian healthcare landscape.

The database includes:

More than 9 million hospital admissions
Over 900 hospitals
Institutions of varying sizes and profiles
Coverage across all 27 Brazilian states
Approximately 50% of the country’s ICU bed capacity

This history is not a minor detail; it is the key differentiator that enables robust clinical validation based on reliable, structured, and representative data.

When healthcare professionals receive a mortality risk estimate, a prediction of prolonged hospitalization, a forecast of extended mechanical ventilation, or an ICU readmission risk alert generated by an AI model, they need to trust that information. Not because the system claims high accuracy, but because evidence generated in real-world settings demonstrates that the model works.

Trust is not declared. It is built through high-quality data, rigorous validation, monitored implementation, and a commitment to continuous improvement. Clinical validation is not bureaucracy: it is clinical responsibility. That is the commitment of Epimed Solutions.

_____________________________________________________________________

¹ AI Adoption by the Numbers
https://www.a16z.news/p/ai-adoption-by-the-numbers

² The GenAI Divide: State of AI in Business 2025
https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf

³ The Role of Data Curation in Reliable Healthcare AI
https://www.epimedsolutions.com/en/the-role-of-data-curation-in-reliable-healthcare-ai/

⁴ Bias recognition and mitigation strategies in artificial intelligence healthcare applications
https://www.nature.com/articles/s41746-025-01503-7

_____________________________________________________________________

This is the third publication in the editorial series “AI in Healthcare: Trust, Safety, and Impact on Clinical Practice,” produced by Epimed Solutions.

Author: Dr. Marcio Soares, physician-scientist and senior researcher in Intensive Care at IDOR, co-founder and vice president of Research and Development at Epimed Solutions, associate professor in the Graduate Program in Internal Medicine at UFRJ; ranked among the top 2% of the world’s most influential scientists (Stanford–Elsevier, 2020–2025).