Can a large language model inform patients about auditory brainstem implants?


PARLAK KOCABAY A., İKİZ BOZSOY M., EROĞLU E.

European Archives of Oto-Rhino-Laryngology, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI: 10.1007/s00405-025-09789-9
  • Journal Name: European Archives of Oto-Rhino-Laryngology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, BIOSIS, CAB Abstracts, EMBASE, MEDLINE, Veterinary Science Database
  • Keywords: Artificial intelligence, Auditory brainstem implant, ChatGPT, Large language models
  • Anadolu University Affiliated: Yes

Abstract

Objective: This study evaluated the performance of ChatGPT-4o in responding to patient-centered questions concerning auditory brainstem implantation (ABI), with a focus on content quality and readability.

Methods: A total of 51 real-world patient questions related to ABI were reviewed and grouped into five thematic categories: diagnosis and candidacy, surgical procedures and complications, device function and mapping, rehabilitation and expected outcomes, and daily life and long-term concerns. Responses were independently assessed by two audiologists and one otologist across four domains (accuracy, comprehensiveness, clarity, and credibility) using a 5-point Likert scale. Readability was evaluated using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas. Kruskal-Wallis and Friedman tests were used to examine statistical differences across question categories and evaluation dimensions.

Results: ChatGPT-4o achieved consistently high scores across all evaluative domains, with mean values exceeding 4.5. Clarity received the highest average score (4.72). No significant differences were found between thematic categories or across dimensions. However, readability analysis revealed that most responses required college-level reading proficiency (FKGL = 13.3 ± 2), particularly in the domains of diagnosis and surgical content, and were rated as "difficult" according to Flesch Reading Ease scores (FRE < 50).

Conclusion: ChatGPT-4o shows potential as a supportive communication tool in the context of ABI patient education. However, its application in clinical practice remains limited by issues of readability and clinical specificity. Ongoing refinement and medical oversight will be essential to ensure safe and effective integration into healthcare settings.
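The FRE and FKGL metrics used in the study are simple closed-form functions of average sentence length and average syllables per word. The sketch below shows the standard Flesch formulas; it is not the authors' tooling, and the vowel-group syllable counter is a naive heuristic assumed here for illustration (published analyses typically use dictionary-based syllabification).

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per group of consecutive vowels,
    # with a minimum of one syllable per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Under these formulas, an FKGL around 13.3 (as reported for the ChatGPT-4o responses) corresponds to college-level text, while FRE scores below 50 fall in the "difficult" band.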