Healthcare Economist October 2, 2024
The time may be fast approaching. A paper by Goh et al. (2024) sampled 50 physicians and compared three approaches: physicians alone (using conventional resources), physicians with access to GPT-4, and GPT-4 alone. The primary outcome was how well each group diagnosed the cases, as measured by a diagnostic reasoning score. The authors found that:
The median diagnostic reasoning score per case was 76.3 percent (IQR 65.8 to 86.8) for the GPT-4 group and 73.7 percent (IQR 63.2 to 84.2) for the conventional resources group, with an adjusted difference of 1.6 percentage points (95% CI −4.4 to 7.6; p=0.60). The median time spent on cases for the GPT-4 group was 519 seconds (IQR 371 to 668 seconds), compared to 565 seconds (IQR 456 to...