Research

A research lab in medical AI, built inside a medical school.

Our team at the University of St Andrews has published peer-reviewed research on AI assessment validity, responsible generation of clinical content, demographic representation in AI imagery, and the future of teleconsultation training. Every claim on this site is grounded in work you can read.


Peer-reviewed papers

published in BMC, JMIR, PLOS ONE, Advances in Simulation & more

Open access

every link on this page is a verifiable DOI

700+

Medical students

active at the University of St Andrews and ScotGEM

Flagship · The Diversity Engine

We quantified AI’s diversity problem. Then we built the fix.

In our landmark study in JMIR AI, we found that DALL-E and Midjourney under-represent darker skin tones when generating medical imagery. The bias was statistically significant (P < .001) and held across hundreds of generated images.

Our method overcomes this bias such that there is no statistically significant difference between AI-generated images and the real-life distribution of skin tones. This approach is now the basis of the SimPatient Diversity Engine.

Source paper

Ensuring appropriate representation in AI-generated medical imagery: a methodological approach to address skin tone bias

O’Malley AS, Veenhuizen MA, Ahmed A. · JMIR AI 2024;3:e58275

Read in JMIR AI

The Diversity Engine has since been extended to gender, age, and ethnicity dimensions, and is calibrated against Scotland 2022 Census benchmarks in addition to US demographic data.
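As an illustration only, a comparison of generated imagery against a demographic benchmark can be sketched as a chi-square goodness-of-fit test. This is a minimal sketch under stated assumptions: the category names, counts, proportions and function names below are hypothetical, and the published study may use a different statistical test.

```python
from typing import Dict

# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_gof(observed: Dict[str, int], benchmark: Dict[str, float]) -> float:
    """Chi-square statistic of observed counts against benchmark proportions."""
    if set(observed) != set(benchmark):
        raise ValueError("observed and benchmark must share the same categories")
    if abs(sum(benchmark.values()) - 1.0) > 1e-9:
        raise ValueError("benchmark proportions must sum to 1")
    total = sum(observed.values())
    stat = 0.0
    for category, share in benchmark.items():
        expected = total * share  # count expected if images matched the benchmark
        stat += (observed[category] - expected) ** 2 / expected
    return stat


def differs_from_benchmark(observed: Dict[str, int], benchmark: Dict[str, float]) -> bool:
    """True if the observed distribution differs significantly at alpha = 0.05."""
    degrees_of_freedom = len(benchmark) - 1
    return chi_square_gof(observed, benchmark) > CRITICAL_05[degrees_of_freedom]


# Hypothetical benchmark proportions and image counts, for illustration only.
benchmark = {"light": 0.60, "medium": 0.25, "dark": 0.15}
skewed = {"light": 85, "medium": 10, "dark": 5}
balanced = {"light": 61, "medium": 24, "dark": 15}

print(differs_from_benchmark(skewed, benchmark))
print(differs_from_benchmark(balanced, benchmark))
```

In this sketch, a set of images counts as representative when the test cannot distinguish its distribution from the benchmark. Swapping in a different benchmark (for example, census proportions for another region) changes only the `benchmark` dictionary.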

Headline finding

P < .001 → P = .04

The Diversity Engine reduced the demographic bias of standard generative models from highly significant (P < .001) to near-representative (P = .04).

300

AI-generated medical images analysed

Citations on Google Scholar

Our research

The only AI simulation software underpinned by scientific research.

Authored by the St Andrews team. Published in BMC Medical Education, the JMIR family, PLOS ONE, Advances in Simulation, Simulation in Healthcare and more.

AI in assessment · Open access

Quality assurance and validity of AI-generated single best answer questions

Ahmed A, Kerr E, O'Malley AS

BMC Medical Education · 2025

Quality-assured AI-authored exam questions performed no differently from human-authored questions when tested on 142 St Andrews students.

Read in BMC Medical Education

Responsible AI · Open access

Enhancing diagnostic accuracy of ophthalmological conditions with complex prompts in GPT-4

M'gadzah S, O'Malley AS

JMIR Formative Research · 2025

Complex prompting raised GPT-4 diagnostic accuracy from 60.4% to 90.1% (P < .001), with the largest gains on conditions prevalent in low- and middle-income countries.

Read in JMIR Formative Research

Responsible AI · Open access

Ensuring appropriate representation in AI-generated medical imagery: a methodological approach to address skin tone bias

O'Malley AS, Veenhuizen MA, Ahmed A

JMIR AI · 2024

Standard generative models under-represented darker skin tones (P < .001 vs US demographics). A custom model reduced the gap to P = .04. Foundation of the Diversity Engine.

Read in JMIR AI

Teleconsultation · Open access

Medical students' and educators' opinions of teleconsultation in practice and undergraduate education: a UK-based mixed-methods study

Wetzlmair-Kephart LC, O'Malley A, O'Carroll V

PLOS ONE · 2025

248 questionnaire participants and 23 interviews across UK medical schools revealed a clear gap in teleconsultation training and an appetite to close it.

Read in PLOS ONE

Teleconsultation · Open access

Teleconsultation in health and social care professions education: a systematic review

Wetzlmair L, O'Carroll V, O'Malley AS, Murray SW

The Clinical Teacher · 2022

Systematic review of 14 studies (JBI methodology). Teleconsultation education increases student knowledge, confidence and satisfaction; further high-quality research and educator guidance are warranted.

Read in The Clinical Teacher

Responsible AI

Investigating and combating gender bias in generative large language models

Veenhuizen MA, O'Malley AS

Medicine & Health (AIMEC 2024 proceedings) · 2024

Conference paper at the 1st International Conference on AI in Medical Education (AIMEC 2024). Extended the diversity work to gender representation in clinical LLMs.

Read in Medicine & Health (AIMEC 2024 proceedings)

Responsible AI · Open access

Demographic biases in AI-generated simulated patient cohorts: a comparative analysis against census benchmarks

Veenhuizen MA, O'Malley AS

Advances in Simulation · 2025

Compared AI-generated simulated patient cohorts against census benchmarks, extending the representation work from imagery to whole patient populations.

Read in Advances in Simulation

AI in assessment · Open access

Plan-Do-Study-Act (PDSA) prompting: a structured approach to prompt engineering in medical education

O'Malley A, Veenhuizen M

Journal of the Academy of Medical Educators · 2026

Introduces a structured PDSA cycle for prompt engineering, giving educators a repeatable method for building reliable AI teaching tools.

Read in Journal of the Academy of Medical Educators

AI in assessment

Reflections on confronting a capacity challenge with an AI-powered patient simulator (SimPatient)

O'Malley A

Simulation in Healthcare · 2026

A reflective account of deploying SimPatient to meet a real teaching-capacity challenge in undergraduate medical education.

Read in Simulation in Healthcare

Conference papers · 2025

Solving educational capacity challenges with an AI-powered patient simulator
O'Malley AS, Duggal S, Gordon I, Murad S, Wang X
SESAM 2025 · Society for Simulation in Europe, Valencia

Enhancing undergraduate clinical communication teaching and learning through AI simulation
Duggal S, Hughes A, O'Malley AS, Wang X, Murad S, Zachariou M, Gordon I
UKCCC 2025 · UK Council of Clinical Communications

Currently under review

Large language models display human-like social desirability biases in health screening questionnaires
O'Malley AS
Under review at Computers in Medicine

Deployed at scale

Already in use across two flagship Scottish medical programmes.

SimPatient is being rolled out across the University of St Andrews School of Medicine and the Scottish Graduate-Entry Medicine (ScotGEM) programme. Early data shows strong learner engagement.

Primary deployment

University of St Andrews
School of Medicine

700+

Medical students
active on the platform

AY 25/26

Live in the curriculum
this academic year

SimPatient is integrated into the medical-school curriculum at St Andrews. The School of Medicine’s Education Division is co-evaluating learning gain through the academic year.

Programme partner

ScotGEM
Scottish Graduate-Entry Medicine

Rural

Trains doctors for remote
and rural Scotland

GMC-aligned

Scenarios mapped to the
Medical Licensing Assessment

The St Andrews lead of the ScotGEM programme (Dr Andrew O’Malley, Senior Lecturer in the School of Medicine) is using SimPatient to scale consultation practice for students training to serve rural and remote communities across Scotland.

The frameworks we ship against

Built on the frameworks your faculty already trust.

SimPatient’s rubric system isn’t invented in-house. It implements published frameworks already used by medical schools and in GMC accreditation, with citations on every page.

Calgary-Cambridge

Clinical communication

Calgary-Cambridge model

The five-stage clinical-communication model: initiating, gathering information, physical examination, explanation & planning, and closing. Twenty named sub-skills across the consultation.

Kurtz SM, Silverman J, Draper J. Teaching and Learning Communication Skills in Medicine (2nd ed.). Radcliffe, 2005.

In product: Built-in rubric in SimPatient. Every consultation can be graded against the 20 sub-skills with section scores and transcript citations.

CRI-HT

Clinical reasoning

Clinical Reasoning Indicators for History Taking (CRI-HT)

Eight indicators rated 1–5 covering lead-taking, recognising relevant information, symptom specification, pathophysiological thinking, logical questioning, checking with the patient, summarising, and overall data quality.

Fürstenberg S, et al. Med Teach. 2020;42(8):914–921.

In product: Built-in rubric in SimPatient. Used by programmes that grade clinical reasoning explicitly alongside communication.
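For readers who think in code, the eight-indicator, 1–5 structure described above could be represented roughly as follows. This is a hypothetical sketch, not SimPatient's implementation: the function name, the validation rules and the choice to summarise the ratings as a simple mean are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict

# The eight CRI-HT indicators, as listed in the framework description.
CRI_HT_INDICATORS = (
    "lead-taking",
    "recognising relevant information",
    "symptom specification",
    "pathophysiological thinking",
    "logical questioning",
    "checking with the patient",
    "summarising",
    "overall data quality",
)


def score_cri_ht(ratings: Dict[str, int]) -> float:
    """Validate one set of CRI-HT ratings and return their mean (assumed summary)."""
    missing = [name for name in CRI_HT_INDICATORS if name not in ratings]
    if missing:
        raise ValueError(f"missing indicators: {missing}")
    unknown = [name for name in ratings if name not in CRI_HT_INDICATORS]
    if unknown:
        raise ValueError(f"unknown indicators: {unknown}")
    for name, value in ratings.items():
        # Each indicator is rated on an integer scale from 1 to 5.
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name!r} must be an integer from 1 to 5, got {value!r}")
    return float(mean(ratings[name] for name in CRI_HT_INDICATORS))


# Hypothetical ratings for a single consultation.
example_ratings = {name: 4 for name in CRI_HT_INDICATORS}
example_ratings["summarising"] = 3
example_ratings["checking with the patient"] = 3

print(score_cri_ht(example_ratings))
```

Rejecting incomplete or out-of-range ratings up front keeps a partially graded consultation from producing a misleading summary score.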

LCSAS

Validated assessment

Liverpool Communication Skills Assessment Scale (LCSAS)

A validated assessment scale developed at the University of Liverpool, with calibrated rating-scale labels and per-anchor descriptors for high-stakes communication-skills assessment.

University of Liverpool Communication Skills Assessment Scale.

In product: LCSAS-style anchors are first-class in SimPatient's custom-rubric editor, for programmes running validated, calibrated rubrics.

GMC MLA

UK regulatory

GMC Medical Licensing Assessment

The General Medical Council's Medical Licensing Assessment content map. The regulatory framework all UK medical schools must align to from 2024–25 onward.

General Medical Council, UK.

In product: Every SimPatient scenario maps to GMC MLA categories. Twelve curriculum categories aligned at launch.

Scottish Doctor

Scottish curriculum

Scottish Doctor learning outcomes

The agreed Scottish-medical-schools curriculum framework defining learning outcomes for undergraduate medical education in Scotland.

Scottish Deans' Medical Curriculum Group.

In product: AI-generated feedback maps performance to Scottish Doctor outcomes, alongside GMC MLA.

The path in

Bring SimPatient into your programme.

Primary

Book a demo

A 30-minute call with a clinician on our team. We’ll show the wizard, run a live consultation in your preferred mode, and walk through how rubric grading would map to your existing curriculum.

Book a demo

Lower-stakes

Pilot programme

Run a 4-week pilot with a single cohort. We’ll help you set up your organisation, import your existing marking scheme, and share a written report at the end measuring usage, learner sentiment, and rubric performance.

Request a pilot

All pricing is per-organisation and includes unlimited learners. Get tailored pricing on the call.