A research lab in medical AI, built inside a medical school.
Our team at the University of St Andrews has published peer-reviewed research on AI assessment validity, responsible generation of clinical content, demographic representation in AI imagery, and the future of teleconsultation training. Every claim on this site is grounded in work you can read.
published in BMC, JMIR, PLOS ONE, Advances in Simulation & more
7
Open access
every link on this page is a verifiable DOI
700+
Medical students
active at the University of St Andrews and ScotGEM
Flagship · The Diversity Engine
We quantified AI’s diversity problem. Then we built the fix.
In our landmark study in JMIR AI, our team found that DALL-E and Midjourney under-represent darker skin tones when generating medical imagery. The bias was statistically significant (P < .001) and held across hundreds of generated images.
Our method overcomes this bias such that there is no statistically significant difference between AI-generated images and the real-life distribution of skin tones. This approach is now the basis of the SimPatient Diversity Engine.
Source paper
Ensuring appropriate representation in AI-generated medical imagery: a methodological approach to address skin tone bias
O’Malley AS, Veenhuizen MA, Ahmed A. · JMIR AI 2024;3:e58275
The Diversity Engine has since been extended to gender, age, and ethnicity dimensions, and is calibrated against Scotland 2022 Census benchmarks in addition to US demographic data.
Headline finding
P < .001 → P = .04
The Diversity Engine reduced the demographic bias of standard generative models from highly significant to near-representative.
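For readers who want to see how a comparison like this can be run, here is a minimal sketch. It assumes a chi-square goodness-of-fit test of generated skin-tone counts against a demographic benchmark; the paper's exact test, categories, and counts are not reproduced here, and every number and name below (`benchmark`, `skewed`, `balanced`, `goodness_of_fit`) is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = e^-y
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * e^-y / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed, benchmark):
    """Return (chi-square statistic, P value) for counts vs. benchmark shares."""
    n = sum(observed)
    expected = [n * share for share in benchmark]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, len(observed) - 1)


# Hypothetical benchmark shares for three skin-tone groups (not census data).
benchmark = [0.60, 0.25, 0.15]

# Hypothetical counts from 300 generated images each (not study data).
skewed = [270, 25, 5]      # darker tones heavily under-represented
balanced = [183, 74, 43]   # close to the benchmark

for label, counts in (("skewed", skewed), ("balanced", balanced)):
    stat, p = goodness_of_fit(counts, benchmark)
    print(f"{label}: chi-square = {stat:.2f}, P = {p:.4g}")
```

A small P value means the generated distribution differs from the benchmark by more than chance would explain; a larger one means the sample is consistent with it. In practice a statistics library such as SciPy would replace the hand-written `chi2_sf`.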
The only AI simulation software underpinned by scientific research.
Authored by the St Andrews team. Published in BMC Medical Education, the JMIR family, PLOS ONE, Advances in Simulation, Simulation in Healthcare and more.
AI in assessment · Open access
Quality assurance and validity of AI-generated single best answer questions
Ahmed A, Kerr E, O'Malley AS
BMC Medical Education · 2025
Quality-assured AI-authored exam questions performed no differently from human-authored questions when tested on 142 St Andrews students.
Enhancing diagnostic accuracy of ophthalmological conditions with complex prompts in GPT-4
M'gadzah S, O'Malley AS
JMIR Formative Research · 2025
Complex prompting raised GPT-4 diagnostic accuracy from 60.4% to 90.1% (P < .001), with the largest gains on conditions prevalent in low- and middle-income countries.
Ensuring appropriate representation in AI-generated medical imagery: a methodological approach to address skin tone bias
O'Malley AS, Veenhuizen MA, Ahmed A
JMIR AI · 2024
Standard generative models under-represented darker skin tones (P < .001 vs US demographics). A custom model reduced the gap to P = .04. Foundation of the Diversity Engine.
Medical students' and educators' opinions of teleconsultation in practice and undergraduate education: a UK-based mixed-methods study
Wetzlmair-Kephart LC, O'Malley A, O'Carroll V
PLOS ONE · 2025
248 questionnaire participants and 23 interviews across UK medical schools revealed a clear gap in teleconsultation training and an appetite to close it.
Teleconsultation in health and social care professions education: a systematic review
Wetzlmair L, O'Carroll V, O'Malley AS, Murray SW
The Clinical Teacher · 2022
Systematic review of 14 studies (JBI methodology). Teleconsultation education increases student knowledge, confidence and satisfaction; further high-quality research and educator guidance are warranted.
Investigating and combating gender bias in generative large language models
Veenhuizen MA, O'Malley AS
Medicine & Health (AIMEC 2024 proceedings) · 2024
Conference paper at the 1st International Conference on AI in Medical Education (AIMEC 2024). Extended the diversity work to gender representation in clinical LLMs.
Demographic biases in AI-generated simulated patient cohorts: a comparative analysis against census benchmarks
Veenhuizen MA, O'Malley AS
Advances in Simulation · 2025
Compared AI-generated simulated patient cohorts against census benchmarks, extending the representation work from imagery to whole patient populations.
Solving educational capacity challenges with an AI-powered patient simulator
O'Malley AS, Duggal S, Gordon I, Murad S, Wang X
SESAM 2025 · Society for Simulation in Europe, Valencia
Enhancing undergraduate clinical communication teaching and learning through AI simulation
Duggal S, Hughes A, O'Malley AS, Wang X, Murad S, Zachariou M, Gordon I
UKCCC 2025 · UK Council of Clinical Communications
Currently under review
Large language models display human-like social desirability biases in health screening questionnaires
O'Malley AS
Under review at Computers in Medicine
Deployed at scale
Already in use across two flagship Scottish medical programmes.
SimPatient is being rolled out across the University of St Andrews School of Medicine and the Scottish Graduate-Entry Medicine (ScotGEM) programme. Early data shows strong learner engagement.
Primary deployment
University of St Andrews School of Medicine
700+
Medical students active on the platform
AY 25/26
Live in the curriculum this academic year
SimPatient is integrated into the medical-school curriculum at St Andrews. The School of Medicine’s Education Division is co-evaluating learning gain through the academic year.
Programme partner
ScotGEM · Scottish Graduate-Entry Medicine
Rural
Trains doctors for remote and rural Scotland
GMC-aligned
Scenarios mapped to the Medical Licensing Assessment
The St Andrews lead of the ScotGEM programme (Dr Andrew O’Malley, Senior Lecturer in the School of Medicine) is using SimPatient to scale consultation practice for students training to serve rural and remote communities across Scotland.
The frameworks we ship against
Built on the frameworks your faculty already trust.
SimPatient’s rubric system isn’t invented in-house. It implements the published frameworks that medical schools and the GMC’s accreditation process already use, with citations on every page.
Calgary-Cambridge
Clinical communication
Calgary-Cambridge model
The five-stage clinical-communication model: initiating the session, gathering information, physical examination, explanation & planning, and closing the session. Twenty named sub-skills across the consultation.
Kurtz SM, Silverman J, Draper J. Teaching and Learning Communication Skills in Medicine (2nd ed.). Radcliffe, 2005.
In product: Built-in rubric in SimPatient. Every consultation can be graded against the 20 sub-skills with section scores and transcript citations.
CRI-HT
Clinical reasoning
Clinical Reasoning Indicators for History Taking (CRI-HT)
Eight indicators rated 1–5 covering lead-taking, recognising relevant information, symptom specification, pathophysiological thinking, logical questioning, checking with the patient, summarising, and overall data quality.
Fürstenberg S, et al. Med Teach. 2020;42(8):914–921.
In product: Built-in rubric in SimPatient. Used by programmes that grade clinical reasoning explicitly alongside communication.
LCSAS
Validated assessment
Liverpool Communication Skills Assessment Scale (LCSAS)
A validated assessment scale developed at the University of Liverpool, with calibrated rating-scale labels and per-anchor descriptors for high-stakes communication-skills assessment.
University of Liverpool Communication Skills Assessment Scale.
In product: LCSAS-style anchors are first-class in SimPatient's custom-rubric editor, for programmes running validated, calibrated rubrics.
GMC MLA
UK regulatory
GMC Medical Licensing Assessment
The General Medical Council's Medical Licensing Assessment content map. The regulatory framework all UK medical schools must align to from 2024–25 onward.
General Medical Council, UK.
In product: Every SimPatient scenario maps to GMC MLA categories. Twelve curriculum categories aligned at launch.
Scottish Doctor
Scottish curriculum
Scottish Doctor learning outcomes
The agreed Scottish-medical-schools curriculum framework defining learning outcomes for undergraduate medical education in Scotland.
Scottish Deans' Medical Curriculum Group.
In product: AI-generated feedback maps performance to Scottish Doctor outcomes, alongside GMC MLA.
The path in
Bring SimPatient into your programme.
Primary
Book a demo
A 30-minute call with a clinician on our team. We’ll show the wizard, run a live consultation in your preferred mode, and walk through how rubric grading would map to your existing curriculum.
Run a 4-week pilot with a single cohort. We’ll help you set up your org, import your existing marking scheme, and share a written report at the end measuring usage, learner sentiment, and rubric performance.