PD-L1 inhibitors have shown remarkable results in oncology, yet many patients fail to respond, underscoring the importance of reliable assessment of PD-L1 expression for patient selection. PD-L1 scoring, especially the Combined Positive Score (CPS), is hindered by inter-observer variability, complex staining patterns, and technical discrepancies across platforms and antibody clones. These challenges may affect therapeutic decisions. Artificial intelligence (AI) offers a solution by standardizing PD-L1 evaluation. This study evaluates the Diadeep PD-L1 CPS AI solution, which is designed to provide reproducible and robust PD-L1 scoring across diverse tumor types and staining conditions.
AI performance was validated on 142 formalin-fixed, paraffin-embedded samples spanning multiple tumor types (gastrointestinal, head and neck, breast, uterine cervix) and sourced from four centers, reflecting diverse staining protocols (22C3 and QR001 clones; BenchMark ULTRA and Omnis/Dako platforms). Routine scores were available for these cases. A Gold Standard was established through independent retrospective scoring by three blinded senior pathologists, which allowed the intraclass correlation coefficient (ICC) to be computed. The scoring was followed by collegial discussions to resolve discordant cases and reach a medical consensus. After a washout period, the pathologists re-evaluated the cases with AI assistance. AI-computed scores and routine manual scores were each compared against the Gold Standard, using the organ-specific recommended cut-offs.
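The inter-observer agreement measure mentioned above can be illustrated with a minimal sketch. The abstract does not state which ICC form was used; the code below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), and the function name `icc_2_1` and the toy scores are hypothetical, not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per case, each row holding
    one score per rater (same rater order in every row).
    NOTE: the ICC form is an assumption; the abstract does not specify it.
    """
    n = len(ratings)        # number of cases
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Toy CPS values invented for illustration: 5 cases scored by 3 raters.
toy_scores = [
    [1, 2, 1],
    [5, 8, 6],
    [20, 25, 22],
    [0, 1, 0],
    [45, 50, 40],
]
print(round(icc_2_1(toy_scores), 2))
```

Computing the same quantity on the unassisted and the AI-assisted readings would give the kind of before/after comparison reported in the results, under the stated assumption about the ICC form.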
AI assistance improved inter-observer agreement among pathologists, with the ICC increasing from 0.62 to 0.74. This effect was particularly pronounced for challenging cases with CPS < 20 (n = 91), where the ICC improved from 0.19 to 0.62, underscoring the AI's value in reducing variability near clinical decision thresholds. Moreover, the AI-based scoring tool showed higher accuracy (88%) than routine manual scoring (75%) in classifying PD-L1 expression according to clinical cut-offs. Sensitivity was significantly higher with AI (96% vs. 78%, p < 0.001), while the positive predictive value was comparable (88% vs. 87%), indicating an improved ability to detect true positive cases.
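The classification metrics reported above (accuracy, sensitivity, positive predictive value) can be sketched as follows. This is an illustrative outline only: the function names, the cut-off value, and the toy scores are hypothetical, and the abstract does not list the organ-specific cut-offs that were applied.

```python
def binarize(cps_scores, cutoff):
    """Label a case PD-L1 positive when its CPS reaches the cut-off."""
    return [score >= cutoff for score in cps_scores]


def classification_metrics(reference, predicted):
    """Accuracy, sensitivity, and PPV of `predicted` against `reference`.

    Both arguments are equal-length lists of booleans (True = positive).
    A metric is reported as NaN when its denominator is zero.
    """
    tp = sum(r and p for r, p in zip(reference, predicted))
    tn = sum((not r) and (not p) for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    fn = sum(r and (not p) for r, p in zip(reference, predicted))

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # detected share of true positives
        "ppv": ratio(tp, tp + fp),          # share of positive calls that are correct
    }


# Toy values invented for illustration; the cut-off is a placeholder,
# not one of the organ-specific thresholds used in the study.
HYPOTHETICAL_CUTOFF = 1
gold_standard_cps = [0, 5, 12, 30, 0.5]
candidate_cps = [2, 4, 0, 25, 0]

metrics = classification_metrics(
    binarize(gold_standard_cps, HYPOTHETICAL_CUTOFF),
    binarize(candidate_cps, HYPOTHETICAL_CUTOFF),
)
print(metrics)
```

Running the same function once on the AI-computed scores and once on the routine manual scores, each against the Gold Standard, would mirror the comparison described in the text.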
This study highlights the potential of an AI-driven tool to enhance PD-L1 scoring by improving accuracy and reducing inter-observer variability, particularly in cases near clinical decision thresholds where consistency is critical. By delivering reliable and reproducible results, the AI algorithm addresses key challenges in PD-L1 evaluation and supports more precise patient stratification for immunotherapy. Beyond accuracy, the integration of such tools into clinical workflows could optimize patient selection and improve therapeutic outcomes, offering oncologists greater confidence in treatment decisions.
Céline Bossard, Claire Magois, Hélène Roussel, Nathalie Rioux-Leclercq, Florian Thomas, Baptiste Gourdin, Bénédicte Cormier, Alexandre Collin, Valérie Lemerle, Ilham Chokri, Laëtitia Lambros, Frédérique Jossic, Francois Leclair, Jean-François Jazeron, Caroline Eymerit-Morin, Abdelmajid Dhouibi, Nizar Labaied, Aurore Mensah, Yahia Salhi, Jérôme Chetritt