By Mike Phillips


The numbers look impressive. As of mid-2025, the FDA had authorized more than 1,250 AI-enabled medical devices for marketing in the United States — up from roughly 950 just a year earlier. In 2025 alone, the agency issued nearly 300 AI/ML device clearances, a pace that would have been unthinkable five years ago. Week after week, new tools are clearing the regulatory pipeline, covering everything from neurology and cardiology to dental imaging and diabetes management.

By any measure, clinical AI has arrived. What hasn’t arrived yet — not fully, not consistently — is the clinical reality that was supposed to come with it.

That gap is becoming the defining story of AI in healthcare heading into the back half of this decade.


The Clearance Pipeline Is Not the Same as the Clinic

A cleared device and a deployed device are not the same thing. FDA clearance, in most cases, establishes that a tool is substantially equivalent to a device already on the market. It says far less about whether the tool will function as intended inside the chaotic, legacy-infrastructure reality of an American hospital system — where EHRs from different vendors don’t communicate, where nurses are already running three documentation tools simultaneously, and where clinicians are skeptical of new software for reasons that have nothing to do with the software.

A January 2026 report from the ARISE network — drawing on researchers across Stanford, Harvard, and affiliated health systems — put it plainly: the field is moving faster than its own evaluation practices. The report found a consistent gap between what AI tools demonstrate in controlled studies and what holds up in actual medical workflows. In simulated EHR environments, even reasoning models that performed well on benchmarks showed failures that were, as the report put it, more informative than the wins.

That’s a diplomatic way of saying the tools work until they don’t, and understanding when and why they don’t is still an open research question.


Where AI Is Actually Working

That said, there are places where clinical AI is delivering real, measurable results — and the pattern of where it works is instructive.

The strongest outcomes are concentrated in high-volume, image-heavy, pattern-recognition tasks: reading chest X-rays for lung nodule triage, flagging abnormalities suggestive of stroke on CT scans, detecting diabetic retinopathy in retinal images. These are settings where the AI is doing one thing well, in a defined domain, with a human expert available to review its output. The FDA cleared Aidoc’s CARE1 foundation model for radiology triage in February 2025 — the first foundation model clearance in the imaging space — and it represents the direction the technology is heading: broader, more generalizable, but still anchored in image analysis.

The other area showing consistent results is administrative and documentation automation. Ambient AI scribes — tools that listen to patient-physician conversations and draft clinical notes automatically — have become the sleeper success story of the last two years. Tampa General Hospital reported that physicians who had been on the verge of retirement decided to extend their careers after implementation, citing the elimination of what clinicians call “pajama time”: the late-night charting sessions that have become a leading driver of physician burnout. That’s a significant finding that goes well beyond efficiency metrics.


Where the Gap Stays Wide

Step outside those domains, and the picture gets more complicated.

On the treatment recommendation and clinical decision support side — where AI might suggest diagnoses, flag drug interactions, or guide therapeutic choices — adoption remains limited and clinician confidence low. The American Hospital Association tracked a substantial year-over-year increase in AI use for billing automation and appointment scheduling, but found that usage for treatment recommendations and patient monitoring remained relatively low. The reason, the AHA concluded, was a lack of clinical confidence in the accuracy and reliability of those tools. That’s not technophobia. That’s earned skepticism.

The deployment gap also runs along familiar institutional fault lines. Large academic medical centers and well-resourced health systems are piloting and scaling AI tools. Community hospitals, rural facilities, and safety-net institutions — which often serve patients with the highest disease burden and the fewest clinical resources — are not keeping pace. The infrastructure required to deploy, monitor, and govern AI tools is itself expensive and expertise-intensive. A hospital system that is already understaffed and running on thin margins is not in a position to build an AI oversight committee.


The Regulatory Picture Is Shifting

On the regulatory side, two developments from early 2026 are worth watching.

In January, the FDA clarified that many low-risk AI-enabled software tools — particularly those where a clinician can independently review the AI’s recommendations before acting — fall outside the full scope of medical device regulation. The intent was to reduce friction for developers and expand access to lower-risk digital health tools. The practical effect is a larger category of clinical AI operating with less regulatory oversight, at a moment when the clinical community is still working out how to evaluate and govern the tools it already has.

The EU is moving in the opposite direction. The AI Act, which entered into force in August 2024, classifies AI-enabled medical devices as high-risk under its framework, with full compliance required by August 2027. That divergence — lighter oversight in the U.S., stricter in Europe — will create different market incentives and different evidentiary standards on each side of the Atlantic. American health systems would do well to watch what the EU’s framework produces in terms of outcome data over the next few years.

Meanwhile, Congress has been moving slowly on a dedicated Medicare reimbursement pathway for AI diagnostic devices, and the CPT 2026 code set added 288 new codes covering digital health and AI services. These are the less-reported but operationally important mechanisms through which AI tools either get paid for or don’t — and payment structures, more than clinical evidence, tend to determine what actually gets deployed at scale.


What to Watch

The ARISE report’s framing is the right one: the question is no longer whether AI can perform clinical tasks. In controlled settings, many tools can. The question is whether AI performs those tasks reliably in real clinical environments, with real patient populations, under real working conditions — and whether the health systems deploying these tools have the governance infrastructure to catch the failures when it doesn’t.

Healthcare leaders surveyed heading into 2026 were nearly unanimous on one point: this is the year the conversation shifts from deployment to validation. Governance committees are being stood up. Evaluation frameworks are being built. The tools that will survive the transition from pilot to production are the ones whose developers can answer the harder questions: not just whether the algorithm works, but whether it works equitably, what happens when it fails, and who is responsible when it does.

That’s where the interesting work is happening now. Not in the clearance announcements — in the messy, unglamorous infrastructure of making AI safe at scale.


Sources: Figures on FDA AI/ML device clearances and the 1,250+ authorized devices as of mid-2025 are drawn from IntuitionLabs’ FDA AI Medical Device Tracker (intuitionlabs.ai, March 2026) and the Bipartisan Policy Center’s issue brief on FDA oversight of health AI tools (bipartisanpolicy.org, November 2025). The ARISE network report — The State of Clinical AI (2026) — was released in January 2026 and is summarized via Stanford Medicine’s Department of Medicine news coverage (medicine.stanford.edu). The Aidoc CARE1 foundation model clearance is documented in IntuitionLabs’ AI Medical Device tracker. The Tampa General Hospital ambient scribe findings are reported in Do Systems Inc.’s healthcare AI analysis (dosystemsinc.com, March 2026). AHA data on AI adoption patterns and the gap between operational and clinical use is drawn from the AHA Center for Health Innovation Market Scan (aha.org, November 2025). The FDA’s January 2026 guidance on low-risk AI software oversight is covered by Telehealth.org (January 2026). EU AI Act compliance timelines and the CPT 2026 code expansion are documented in IntuitionLabs’ FDA AI Medical Device Tracker. Industry outlook on governance and the shift to validation draws on expert surveys published by Wolters Kluwer Health (wolterskluwer.com, December 2025) and Healthcare IT Today (healthcareittoday.com, December 2025).
