Bridging performance and practice: the next step for artificial intelligence in basic life support education

Frontiers in Emergency Medicine (Tehran University of Medical Sciences), ISSN 2717-3593, 9(4):e37, 2025-12-05. Language: EN.

Hamideh Akbari
Emergency Medicine Department, Tehran University of Medical Sciences, Tehran, Iran.

Dates: 2025-11-07; 2025-11-24.

Recent studies show that artificial intelligence (AI) performs well on standardized basic life support (BLS) examinations. King et al. report that GPT-4V achieved 96% and 90% accuracy on the 2016 AHA BLS and advanced cardiac life support (ACLS) exams, respectively, including competent electrocardiogram (ECG) interpretation. This finding reflects substantial progress in multimodal model reasoning and suggests potential use in assessment and personalized learning.

Nevertheless, multiple evaluations of large language models demonstrate highly variable accuracy in BLS scenarios, ranging from approximately 48% in question-based assessments to 85% in adult cardiac-arrest simulations, with poor performance in pediatric and infant cases. Even GPT-4, the most consistent performer (κ ≈ 0.65), exhibits incomplete guideline adherence and limited reliability for unsupervised application. Thus, success in static examinations does not ensure reliable or safe behavior in dynamic clinical settings.

In contrast to the strong examination results, Semeraro et al. highlight persistent weaknesses of current multimodal systems such as Qwen 2.5-Max and ChatGPT-4o, whose automatically generated cardiopulmonary resuscitation (CPR) training materials often lack anatomical accuracy, clinical validity, and adherence to professional standards. This discrepancy underscores the translational gap between algorithmic performance and genuine educational reliability.

The broader literature indicates that AI, while capable of improving early cardiac arrest detection, compression precision, and feedback interactivity in simulation-based training, still yields inconsistent educational results.
These mixed findings indicate that high exam scores do not guarantee pedagogically sound or clinically applicable training outcomes.

To enable responsible integration of AI in resuscitation education, three priorities should be addressed. First, structured collaboration between AI developers and certified resuscitation educators is required to align algorithmic outputs with American Heart Association (AHA) and European Resuscitation Council (ERC) standards. Second, curated, medically verified multimodal datasets, including high-fidelity ECG and procedural imagery, should be expanded to support model training and validation. Third, independent quality-assurance frameworks are essential to evaluate AI-generated educational content for factual, ethical, and pedagogical integrity before dissemination.

Artificial intelligence demonstrates significant potential to augment BLS education and improve preparedness for cardiac arrest. However, this promise will be realized only through rigorous interdisciplinary oversight, transparent evaluation, and sustained commitment to evidence-based implementation.

https://fem.tums.ac.ir/index.php/fem/article/view/1686
https://fem.tums.ac.ir/index.php/fem/article/download/1686/538