ESTRO 2020 Abstract book


criteria, 10% low-dose threshold, and the EPID image of the first treatment fraction as reference. The model inputs were the average γ value, the standard deviation, and the average value of the top 1% of γ values, each averaged over all beams in a fraction. The HMM classified each fraction into one of three categories: no anatomical change (Cat1), some change (no clinical action needed, Cat2) and severe change (clinical action needed, Cat3). The external validation dataset consisted of 760 TI EPID images from 266 fractions (31 patients) treated at institute B with VMAT or hybrid plans (static beams and VMAT arcs). Features in both datasets were extracted in the same way. For patients in the validation set, a cone beam CT (CBCT) scan was made before each fraction. Contours were propagated from the planning CT to the CBCTs using Mirada (Mirada Medical Ltd., Oxford, UK), and the dose was recalculated. DVH metrics for targets and organs-at-risk (OARs) were extracted for each fraction and compared to the planned dose. Mann-Whitney U tests were performed to evaluate the statistical significance of deviations in DVH metrics between each pair of HMM categories.

Results
The HMM achieved 78.9% accuracy compared to threshold classification based on the average γ value alone (a surrogate for clinical classification). The confusion matrix (Fig.1) shows that the HMM overestimates the number of fractions in Cat2 compared to both Cat1 and Cat3. Fig.2 shows that for lungs-GTV, heart and mediastinum, there is a trend towards higher deviations in DVH metrics with classification into higher categories by the HMM.
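
The three γ-derived model inputs described in the Methods (average γ, standard deviation, and average of the top 1% of γ values, averaged over all beams in a fraction) could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the function name `gamma_features`, the nested-list representation of a γ map, the use of NaN for pixels excluded by the low-dose threshold, and the choice of the population standard deviation are all assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def gamma_features(gamma_maps):
    """Return (mean, std, top-1% mean) of gamma, averaged over all beams.

    gamma_maps: one 2D gamma map (list of rows) per beam in the fraction.
    Assumption: NaN marks pixels that were excluded from the gamma
    analysis (for example, pixels below the low-dose threshold).
    """
    per_beam = []
    for gamma_map in gamma_maps:
        # Flatten the map, drop excluded pixels, and sort ascending.
        values = sorted(
            v for row in gamma_map for v in row if not math.isnan(v)
        )
        if not values:
            raise ValueError("gamma map contains no evaluated pixels")
        # Number of pixels in the top 1% (at least one pixel).
        n_top = max(1, len(values) // 100)
        per_beam.append(
            (fmean(values), pstdev(values), fmean(values[-n_top:]))
        )
    # Average each feature over the beams of the fraction.
    return tuple(fmean(feature) for feature in zip(*per_beam))
```

Calling `gamma_features([beam_1_map, beam_2_map])` yields one three-element feature vector per fraction, which is the form of input the abstract describes for the HMM.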

Fig.2: Boxplots for the deviations in the DVH metrics, excluding outliers. x-axis: HMM classification, y-axis: change in DVH metric, *: p<0.05, **: p<0.01.

Conclusion
The HMM performs well on an external dataset considering accuracy, showing that it can be transferred between institutes. However, underestimation of categories can lead to relevant fractions not being flagged, potentially missing anatomical changes (false negatives), while overestimation leads to unnecessary flagging (false positives), thus increasing workload. Model fine-tuning may resolve this. Relating HMM classification based on γ features to increasing DVH differences is possible for some OARs, but not for the target volumes.

PH-0652 Standardization influences repeatability and prognostic value of radiomic features
I. Fornacon-Wood¹, J. O'Connor¹, C. Faivre-Finn¹, G. Price¹
¹The University of Manchester, Division of Cancer Sciences, Manchester, United Kingdom

Purpose or Objective
Radiomic features have shown potential as prognostic biomarkers for personalizing radiotherapy treatment. However, standardization issues hinder clinical translation. The image biomarker standardization initiative (IBSI) makes recommendations to address these issues. In this work we assessed the benefit of IBSI compliance for radiomic feature calculation packages in lung and head and neck (H&N) cancer datasets. The impact of package choice on predicting survival for patients with H&N cancer was also investigated.

Material and Methods
Four widely used radiomics software packages were evaluated: LIFEx, PyRadiomics and CERR (all of which are IBSI-compliant) and IBEX (not IBSI-compliant). 17 radiomic features common to all packages were calculated on the planning CT scan for the GTV of small cell lung cancer (N=37) and H&N cancer (N=111) patients treated with radiotherapy (Table 1). Agreement between features across packages was assessed using the intraclass

Fig.1: Confusion matrix comparing HMM classification to threshold classification based on the average γ value.
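
An accuracy figure such as the one reported for the HMM is the share of fractions on the diagonal of a confusion matrix like Fig.1, and the over- or underestimation of a category is the difference between its column and row totals. The sketch below illustrates that arithmetic only. The function name `summarize_confusion`, the row/column orientation (rows: threshold classification, columns: HMM classification), and the counts in the usage example are hypothetical placeholders, not the values shown in Fig.1.

```python
def summarize_confusion(matrix):
    """Return (accuracy, surplus per category) for a square confusion matrix.

    Assumption: matrix[i][j] counts fractions with reference category i
    (threshold classification) that the model assigned to category j.
    A positive surplus means the model assigns more fractions to that
    category than the reference does (overestimation).
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    reference_totals = [sum(row) for row in matrix]
    predicted_totals = [sum(col) for col in zip(*matrix)]
    surplus = [p - r for p, r in zip(predicted_totals, reference_totals)]
    return correct / total, surplus


# Made-up counts for Cat1, Cat2, Cat3 (illustration only).
example = [
    [50, 10, 0],
    [5, 20, 5],
    [0, 4, 6],
]
accuracy, surplus = summarize_confusion(example)
print(accuracy)  # 0.76
print(surplus)   # [-5, 4, 1]
```

In this invented example the middle category has a positive surplus, which is the pattern the Results describe qualitatively for Cat2.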
