Artificial intelligence in practice: measuring its medical accuracy in oculoplastics consultations

Adam J. Neuhouser; Alisha Kamboj; Ali Mokhtarzadeh; Andrew R. Harrison

doi:10.35119/maio.v6i1.137

Vol. 6 No. 1 (2024), Original Articles

Published 2024-05-10

Adam J. Neuhouser

Department of Ophthalmology and Visual Neurosciences, University of Minnesota, Minneapolis, MN, USA

Alisha Kamboj

Department of Ophthalmology and Visual Neurosciences, University of Minnesota, Minneapolis, MN, USA

https://orcid.org/0000-0003-2162-4560

Ali Mokhtarzadeh

Department of Ophthalmology and Visual Neurosciences, University of Minnesota, Minneapolis, MN, USA

Andrew R. Harrison

Department of Ophthalmology and Visual Neurosciences, University of Minnesota, Minneapolis, MN, USA; Department of Otolaryngology and Head and Neck Surgery, University of Minnesota, Minneapolis, MN, USA

https://orcid.org/0000-0003-4005-2772

Supplementary Files

MAIO 137 Neuhouser Appendix

How to Cite

Neuhouser AJ, Kamboj A, Mokhtarzadeh A, Harrison AR. Artificial intelligence in practice: measuring its medical accuracy in oculoplastics consultations. MAIO [Internet]. 2024 May 10 [cited 2025 Apr. 26];6(1):1-11. Available from: https://www.maio-journal.com/index.php/MAIO/article/view/137

Copyright notice

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Keywords

artificial intelligence; Chat GPT; DALLE; oculoplastics; patient information

Abstract

Purpose: This study investigated the medical accuracy of content generated by Chat Generative Pretrained Transformer 4 (Chat GPT-4) and DALLE-2 in response to common questions encountered during oculoplastic consultations.

Methods: The 5 oculoplastic procedures most frequently discussed on social media were selected for evaluation with Chat GPT-4 and DALLE-2. Questions based on common patient concerns were entered into Chat GPT-4, and the responses were rated on a 3-point scale. For procedure imagery, descriptions were submitted to DALLE-2, and the resulting images were graded for anatomical and surgical accuracy. Grading was completed by 5 oculoplastic surgeons through a 110-question survey.

Results: Overall, 87.3% of Chat GPT-4’s responses achieved a score of 2 or 3 points, denoting good to high accuracy. Across all procedures, questions about pain, bruising, procedure risk, and adverse events received high scores. Conversely, responses regarding specific case scenarios, procedure longevity, and procedure definitions were less accurate. Images produced by DALLE-2 were notably subpar, often failing to depict surgical outcomes accurately or to render realistic detail.
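The headline figure in the Results is a simple proportion: ratings of 2 or 3 divided by all ratings. The Python sketch below illustrates that arithmetic under stated assumptions. Scores are taken to be integers from 1 to 3, and ratings from all graders are pooled. The abstract does not specify either choice. The function name `share_good_or_high` and the toy data are hypothetical and are not the study's method or data.

```python
from collections import Counter

# Assumption: each rating is an integer on a 1-3 scale, where 2 and 3
# count as "good to high" accuracy. This is an illustration, not the authors' code.
VALID_SCORES = {1, 2, 3}


def share_good_or_high(ratings: list[int]) -> float:
    """Return the fraction of individual ratings that score 2 or 3.

    Ratings from all graders are pooled into one list; the study's
    exact pooling method is not stated in the abstract (assumption).
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    bad = [r for r in ratings if r not in VALID_SCORES]
    if bad:
        raise ValueError(f"scores outside 1-3: {bad}")
    counts = Counter(ratings)
    return (counts[2] + counts[3]) / len(ratings)


# Hypothetical toy data: 5 graders x 2 responses (NOT study data).
toy_ratings = {
    "response_a": [3, 3, 2, 3, 2],
    "response_b": [1, 2, 1, 3, 2],
}
pooled = [score for scores in toy_ratings.values() for score in scores]
print(f"{share_good_or_high(pooled):.1%}")  # 8 of 10 ratings are 2 or 3
```

Pooling individual ratings weights every grader equally. An alternative would be to average the scores per response first and then apply a threshold, which can give a different percentage.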

Conclusions: Chat GPT-4 demonstrated a creditable level of accuracy in addressing common concerns about oculoplastic procedures. However, its limitations in handling case-based scenarios suggest that it is best suited as a supplementary source of information rather than as a primary diagnostic or consultative tool. Medical imagery currently generated by artificial intelligence lacks anatomical accuracy, and significant technological advances are needed before such imagery can effectively complement oculoplastic consultations.
