ESTRO 2021 Abstract Book

S522

Conclusion
pCTs with lower MAE were linked to higher pass rates and reduced standard deviations. Extreme pCT generation errors were considered here, and the results showed that moderate generation errors had a relatively small impact on dosimetric quality for photon-based rotational treatments in our cohort. 3D treatments are strongly affected by such deviations. Further analysis of the correlation between MAE, tumor volume and dosimetric results is in progress.

PH-0653 Deep learning based auto-segmentation: how much training data do we really need?

J. Kleijnen 1, J. Penninkhof 1, H. Akhiat 2, M. Hoogeman 1, S. Petit 1
1 Erasmus MC Cancer Institute, University Medical Center Rotterdam, Department of Radiotherapy, Rotterdam, The Netherlands; 2 Elekta AB, Stockholm, Sweden

Purpose or Objective
Deep learning (DL) is a promising technique for auto-segmentation in radiotherapy. The performance of DL-based auto-segmentation (DLAS) is known to depend on the training dataset size, but the exact relationship is unknown. Therefore, the aim of this study was to determine the impact of the training dataset size on DLAS performance for clinical use. Additionally, the DLAS performance was benchmarked against our clinically used atlas-based auto-segmentation (ABAS).

Materials and Methods
CT scans and structure sets of a clinical cohort of 400 breast cancer patients were included. Structure sets consisted of the clinical manual delineations of the heart, left lung, right lung and body contour. Data of 80 patients were used for evaluation, and the remaining 320 patients were used for DLAS model training. The training dataset was randomly split in half 6 times, resulting in training datasets of 320, 160, 80, 40, 20, 10, and 5 patients. In this way, all patients in a smaller dataset were also included in every larger training dataset. Using a research tool in Admire 3.7 (Elekta AB), a DLAS model was trained on each of these datasets, resulting in 7 DLAS models.
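The nested halving described above can be sketched as follows. This is only an illustration of one way to obtain nested random subsets of 320, 160, 80, 40, 20, 10 and 5 patients; the abstract does not describe the actual implementation, and the function name `nested_training_subsets`, the integer patient IDs and the fixed seed are all hypothetical.

```python
import random


def nested_training_subsets(patient_ids, n_halvings=6, seed=0):
    """Return nested subsets obtained by repeatedly halving a shuffled list.

    Assumption: shuffling once and keeping the first half at every step is
    used here as a simple way to make each smaller set a random subset of
    the larger one. The study's own procedure may differ.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)

    subsets = [shuffled]
    for _ in range(n_halvings):
        previous = subsets[-1]
        # Keep the first half of the previous (already random) subset.
        subsets.append(previous[: len(previous) // 2])
    return subsets


# Hypothetical patient identifiers 0..319 stand in for the 320 training patients.
subsets = nested_training_subsets(range(320))
sizes = [len(s) for s in subsets]
print(sizes)
```

Because every subset is a prefix of the one before it, any patient used to train a small model is also seen by every larger model, which is the nesting property stated in the text.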
These models were evaluated by applying them to the evaluation dataset and comparing the output to the clinical delineations. As a measure of segmentation error, average and near-maximum (99th percentile) contour-to-contour distances were calculated. Dice similarity coefficients (DSC) were determined as a measure of overlap. In addition, the DLAS output was compared to our clinically used multi-atlas ABAS (Admire 2.5, Elekta AB) using 10 atlases.

Results
All DLAS models performed equally well or better (i.e. higher DSC and lower average and near-maximum errors) compared to ABAS, except for segmentations of the heart by the models trained on 5 and 10 patients. Overall segmentation quality of DLAS was very good, with mean DSC ≥ 0.95 in 93% of the cases (figure 1). Average segmentation errors were below 1 mm for both lungs and the body contour and below 2 mm for the heart, except for the model trained on 5 patients (figure 2). For the heart, which has moderate image contrast on CT, increasing the training dataset from 5 to 320 patients improved accuracy only up to a training set size of 20 patients. Conversely, for the lungs and body contour, which have good image contrast on CT, increasing the training dataset size did not result in performance improvements.
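The evaluation metrics named in Materials and Methods (DSC, and average and 99th percentile contour-to-contour distance) can be illustrated with a minimal pure-Python sketch. It assumes segmentations are given as sets of voxel coordinates and contours as lists of points, that distances are pooled over both directions, and that the percentile uses the nearest-rank rule; none of these details are stated in the abstract, and all function names and toy data are hypothetical.

```python
import math


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two voxel sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def contour_distances(points_a, points_b):
    """Nearest-point distances between two contours, pooled over both directions."""
    def one_way(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]
    return one_way(points_a, points_b) + one_way(points_b, points_a)


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition of 'near maximum')."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Toy masks: two 4x4 voxel squares shifted by one voxel along x.
reference_mask = {(x, y) for x in range(4) for y in range(4)}
predicted_mask = {(x + 1, y) for x in range(4) for y in range(4)}
dsc = dice(reference_mask, predicted_mask)

# Toy contours (arbitrary units): one predicted point lies far from the reference.
reference_contour = [(0, 0), (0, 1), (0, 2)]
predicted_contour = [(1, 0), (1, 1), (3, 2)]
distances = contour_distances(reference_contour, predicted_contour)
average_error = sum(distances) / len(distances)
near_max_error = percentile(distances, 99)
```

In this toy case the single outlying point dominates the 99th percentile distance while the average stays much lower, which is why reporting both gives a fuller picture of segmentation error than either alone.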
