Hawaii Medical Journal

ISSN 2026-XXXX | Volume 1 | March 2026

Radiologists Can't Spot AI Deepfake X-Rays, Study Finds

A 2026 Radiology study reveals board-certified radiologists cannot reliably detect AI-generated chest X-rays, raising serious patient safety concerns.

7 min read

Evidence emerging from the medical imaging community points to a vulnerability in diagnostic radiology that carries substantial implications for patient safety, clinical decision-making, and the integrity of health information systems. A study published in March 2026 in the journal Radiology demonstrates that board-certified radiologists struggle to distinguish artificially generated chest radiographs from authentic clinical images, with detection rates well below thresholds acceptable for reliable diagnostic practice.

The findings arrive at a moment when generative artificial intelligence (AI) tools have achieved widespread accessibility. The researchers, an international team, constructed a controlled evaluation using images produced by ChatGPT, the multimodal AI system developed by OpenAI. Using straightforward text prompts specifying anatomical location, target pathology, and image noise level, they generated synthetic radiographs that were, by the study's own assessment, sufficiently realistic to mislead trained specialists.

Detection Performance Among Radiologists

The study enrolled 17 radiologists and presented them with a diagnostic scenario in which they reviewed radiographic images and rendered clinical assessments. Under initial conditions, in which participants were not forewarned that synthetic images might be present, only 41% of radiologists identified that anything was irregular. Put another way, the majority of trained clinicians proceeded through a diagnostic workflow without recognizing that the images under review had been computationally generated rather than captured through clinical imaging equipment.

The performance improved, though not to levels that would satisfy most quality assurance benchmarks, once participants were explicitly informed that deepfake radiographs could be present in the image set. Under these conditions, accurate differentiation between authentic and synthetic images rose to 75%. That figure, while a marked improvement over the unsuspecting baseline, still implies that one in four synthetic radiographs evaded detection even when radiologists were actively seeking to identify fabricated content.
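
How much statistical weight can a cohort of 17 readers bear? The short sketch below computes Wilson score confidence intervals for the two headline figures, under our own illustrative assumption that 41% corresponds to 7 of the 17 radiologists and 75% to roughly 13 of 17; the published report may compute these percentages over images rather than readers, so treat the counts as hypothetical.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    halfwidth = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - halfwidth, center + halfwidth

# Hypothetical counts chosen to match the reported percentages
# (7/17 is about 41%; 13/17 is about 76%).
for label, hits in [("unsuspecting", 7), ("forewarned", 13)]:
    lo, hi = wilson_interval(hits, 17)
    print(f"{label}: point estimate {hits/17:.0%}, 95% CI {lo:.0%} to {hi:.0%}")
```

With only 17 readers, both intervals span tens of percentage points, which underscores the replication point the study's authors themselves raise (discussed under research priorities below).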

These detection rates carry direct clinical relevance. Radiologists function as a critical checkpoint within the diagnostic pipeline. When a synthetic image passes through that checkpoint undetected and is interpreted as a genuine clinical finding, the downstream consequences extend to diagnostic conclusions, treatment plans, medication decisions, and procedural interventions. A patient whose imaging study is unknowingly replaced with a fabricated radiograph reflecting a false pathology could receive unnecessary treatment. Conversely, a synthetic image that obscures or replaces evidence of genuine disease could delay or prevent appropriate care.

AI Systems Evaluated as Detectors

Given that a generative AI system produced these synthetic images, the researchers also examined whether AI might serve as a countermeasure, functioning as an automated detector of fabricated radiographs. Four multimodal AI models were evaluated, including the same ChatGPT-based model that had produced the synthetic images under scrutiny. The results did not support optimism.

Detection accuracy across the four models ranged from 57% to 85%. The lower bound of that range, 57%, represents performance only marginally better than chance for a binary classification task. The upper bound, 85%, is notable but still permits a substantial proportion of synthetic images to pass through automated review undetected. These figures indicate that neither human expertise nor current AI-based detection tools offer reliable safeguards against the introduction of fabricated radiographic content into clinical or research workflows.
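
To translate those accuracy figures into operational terms, a back-of-the-envelope calculation helps. The sketch below assumes, purely for illustration, that the reported accuracies apply to synthetic images specifically; the study's figures describe overall accuracy on a mixed set, so real miss rates on fakes could differ.

```python
# Expected synthetic images slipping past automated review, under the
# simplifying assumption that overall accuracy equals sensitivity to fakes.
SYNTHETIC_IMAGES = 1_000

for label, accuracy in [("chance", 0.50), ("weakest model", 0.57), ("strongest model", 0.85)]:
    missed = round(SYNTHETIC_IMAGES * (1 - accuracy))
    print(f"{label} ({accuracy:.0%}): ~{missed} of {SYNTHETIC_IMAGES} fakes undetected")
```

Even the strongest model in this scenario waves roughly 150 of every 1,000 fabricated images through.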

The inability of the generating model to reliably detect its own output is a particularly consequential finding. It suggests that the image generation process does not embed detectable artifacts in a manner that can be consistently identified, even by a system with direct knowledge of the generative process. This dynamic complicates proposals for deploying AI detection tools as a practical solution within clinical radiology departments or hospital imaging networks.

Generation Methodology and Accessibility

One of the most clinically consequential aspects of the study concerns the simplicity of the image generation process. The researchers did not employ specialized software, custom-trained models, or technical expertise beyond the capacity to construct basic natural language prompts. They specified an anatomical region, identified a target pathology, and indicated a preferred noise level. ChatGPT produced radiographic images meeting those specifications.

This accessibility removes a barrier that, in prior years, might have constrained the misuse of fabricated medical imaging to sophisticated actors with technical resources. The threshold for producing a convincing synthetic radiograph now sits within reach of any individual with access to a commercially available generative AI platform. The public health implications of this accessibility extend beyond individual patient encounters to encompass clinical trial integrity, insurance documentation, disability adjudication, and the evidentiary basis for medicolegal proceedings.

From an epidemiological perspective, the population potentially exposed to harms arising from synthetic medical imaging is not confined to a narrow clinical subgroup. Any patient whose care depends on radiographic documentation, any institution that relies on submitted imaging data, and any regulatory body that accepts imaging as evidence operates within a system that the current study suggests is not equipped to verify image authenticity at acceptable reliability levels.

Contextualizing the Threat Within Health Information Security

The radiographic deepfake problem does not exist in isolation. It represents one dimension of a broader challenge confronting health information systems as generative AI capabilities advance. Medical imaging constitutes a substantial portion of diagnostic data generated within health systems. The American College of Radiology and national health surveillance bodies have historically treated imaging archives as reliable objective records. The current findings complicate that assumption.

Prior research has examined adversarial manipulation of AI-assisted diagnostic tools, including studies demonstrating that subtle pixel-level alterations to digital images can cause AI classification systems to generate incorrect diagnoses. The mechanism identified in the Radiology study differs in that it operates not at the level of adversarial perturbation but at the level of wholesale image fabrication using commercially available tools. The fabricated images are not modified authentic images but entirely synthetic constructs designed from a prompt.
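
To make the contrast concrete, the sketch below implements the perturbation mechanism described in that prior research, in the style of the fast gradient sign method (FGSM), against a toy stand-in classifier. It is not the diagnostic model from any cited study, and with untrained weights the predicted label may or may not flip; the point is the mechanism, a pixel-level nudge in the gradient direction, as opposed to generating an image from nothing.

```python
import torch
import torch.nn as nn

# Toy stand-in for an AI diagnostic classifier (two classes: finding / no finding).
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))
model.eval()

image = torch.rand(1, 1, 64, 64, requires_grad=True)  # stand-in radiograph
label = torch.tensor([0])                              # assumed true class

# FGSM: step each pixel by epsilon in the direction that increases the loss.
loss = nn.functional.cross_entropy(model(image), label)
loss.backward()
epsilon = 0.02  # small enough to be visually subtle
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("before:", model(image).argmax(dim=1).item())
print("after: ", model(adversarial).argmax(dim=1).item())
```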

This distinction matters for detection strategy. Methods developed to identify adversarial perturbations in authentic images may have limited applicability to the detection of entirely synthetic radiographs. The field of medical image forensics will require expanded research investment to address generation modalities that current detection frameworks were not designed to evaluate.

Implications for Clinical Radiology Practice

Hawaii’s clinical radiology infrastructure, like that of health systems nationally, operates under assumptions of image provenance that the current study places under scrutiny. Teleradiology services, which are particularly relevant in Hawaii given the geographic distribution of patient populations across the state’s islands, transmit imaging data through digital networks. The chain of custody for those images and the verification mechanisms applied at each transmission point represent areas requiring review in light of these findings.

The Hawaii Department of Health and hospital credentialing bodies may find it productive to examine current policies governing the verification of submitted imaging data. Institutions that accept radiographic images through electronic health record (EHR) portals, patient-submitted uploads, or third-party imaging platforms should consider whether existing technical safeguards are adequate to identify synthetically generated content.
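
One class of technical safeguard worth evaluating, offered here as our own illustration rather than a recommendation from the study, is cryptographic signing of image data at the point of acquisition, so downstream systems can verify that a submitted file is byte-identical to what the imaging equipment produced. Below is a minimal sketch using a shared-secret HMAC; production systems would use public-key signatures and DICOM-aware tooling.

```python
import hashlib
import hmac

SECRET = b"institution-signing-key"  # hypothetical key material

def sign(image_bytes: bytes) -> str:
    """Tag computed by the acquiring institution over the raw image bytes."""
    return hmac.new(SECRET, image_bytes, hashlib.sha256).hexdigest()

def verify(image_bytes: bytes, tag: str) -> bool:
    """Check run by the receiving system before the image enters the workflow."""
    return hmac.compare_digest(sign(image_bytes), tag)

original = b"...DICOM pixel data..."          # stand-in for real image bytes
tag = sign(original)
print(verify(original, tag))                  # True: provenance intact
print(verify(b"...tampered bytes...", tag))   # False: reject or flag for review
```

Note the limitation: signing verifies integrity from the signing step onward, but it cannot authenticate content fabricated upstream of that step, which is why detection research remains necessary alongside provenance controls.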

At the level of radiology training, the study's findings carry curriculum implications. If 41% detection under unsuspecting conditions is the current baseline for trained radiologists, incorporating detection training into residency programs and continuing medical education (CME) requirements would be a reasonable near-term response. The improvement from 41% to 75% observed when radiologists were alerted to the possibility of synthetic images suggests that awareness itself functions as a partial countermeasure, though not a sufficient one.

Research Priorities and Surveillance Gaps

The study cohort of 17 radiologists, while sufficient to generate findings of concern, represents a limited sample from which to draw population-level conclusions about radiologist detection performance. The researchers acknowledge that broader replication with larger and more demographically diverse cohorts of imaging specialists would strengthen the evidence base. Detection performance may vary by subspecialty training, years of experience, institutional volume, or familiarity with AI-generated image artifacts.

Surveillance data on the actual prevalence of synthetic radiographs entering clinical or administrative workflows do not currently exist in a systematic form. The absence of such data reflects both the novelty of the threat and the absence of detection infrastructure capable of generating reliable prevalence estimates. Establishing baseline surveillance would be a productive priority for health information security researchers and radiology professional societies.
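
Even once detection infrastructure exists, raw flag rates will not equal prevalence, because detectors err in both directions. The standard epidemiological correction, the Rogan-Gladen estimator, applied below to hypothetical numbers, shows why a detector's sensitivity and specificity must be characterized before surveillance counts can be interpreted.

```python
def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Correct an apparent prevalence for an imperfect test (Rogan & Gladen, 1978)."""
    return (apparent + specificity - 1) / (sensitivity + specificity - 1)

# Hypothetical scenario: a detector with assumed 85% sensitivity and 95%
# specificity flags some fraction of an imaging archive as synthetic.
for apparent in (0.05, 0.08):
    true_prev = rogan_gladen(apparent, sensitivity=0.85, specificity=0.95)
    print(f"flag rate {apparent:.0%} -> estimated true prevalence {true_prev:.2%}")
```

At a 5% flag rate the corrected estimate is zero: false positives alone can account for every flag, so a raw rate by itself says nothing about how many synthetic images are actually circulating.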

The publication of this study in Radiology in March 2026 should be understood as an early signal within what is likely to be a sustained and evolving research conversation. The generative capabilities of large language models and multimodal AI systems will continue to advance. Detection methodologies must advance in parallel, and the clinical and public health communities will require updated evidence as both the threat landscape and the technical response capabilities develop.

The core finding remains: the tools required to generate convincing synthetic radiographs are now accessible to a broad population of users, trained clinicians cannot reliably identify those images under typical diagnostic conditions, and current AI-based detection systems do not close that gap with adequate reliability. For a health system that depends on the integrity of imaging data to make consequential decisions about patient care, those three conditions together constitute a challenge that warrants immediate and systematic attention.

Priya Patel

Public Health Correspondent
