neural networks (CNN). Deformable registration strategies may be used to minimize registration errors; however, they are resource-intensive. We aim to develop an efficient end-to-end solution without compromising segmentation quality.
Materials and Methods We included planning CT, PET, and MRI (T1-weighted and T2-weighted) from 154 HNSCC patients treated with primary curative (chemo-)radiotherapy. Clinical delineations of the gross tumor volume (GTVt) and involved lymph nodes (GTVn) on CT were considered ground truth. All modalities were resampled to an isotropic 1 mm voxel grid, and MRI images were registered to PET/CT with Elastix using either rigid registration (RR) or deformable registration (DR). We used a 3D UNet with deep supervision as the baseline CNN segmentation model. We designed two approaches, one architectural and one data-driven: (1) deformable convolution networks (DCN) and (2) channel translation augmentation (CHTL). DCN uses learned offsets that let the convolution sample from freely deformed locations rather than the fixed 3D sampling grid. We replaced the convolutions in the MRI path of the UNet's first block with DCN to enhance its capacity to model geometric transformations. For CHTL augmentation, we randomly shifted the MRI channels by up to 3 mm along the x, y, and z axes. Six segmentation groups were compared: RR images with UNet, DCN, and CHTL; and DR images with UNet, DCN, and CHTL. The data were split uniformly into training (93 patients), validation (31 patients), and test (30 patients) sets, and each group was trained independently for 200 epochs. Test-set results were evaluated on the union of GTVt and GTVn using the Dice similarity coefficient (Dice), the 95th percentile Hausdorff distance (HD95), the mean surface distance (MSD), and training time. The networks were trained with the sum of Dice and Top-k losses, using a stochastic gradient descent optimizer with a batch size of 2, an initial learning rate of 0.01 with decay, and sampled patches of 128 × 128 × 128 voxels. Augmentations common to all groups included scaling, rotation, and mirroring.
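As an illustration of the CHTL augmentation described above, the following is a minimal NumPy sketch, not the implementation used in this study. It assumes a channel-first array of shape (C, Z, Y, X) on the 1 mm isotropic grid (so 3 voxels correspond to 3 mm), integer-voxel shifts drawn independently for each MRI channel, and zero filling of the vacated border. The names `shift_zero_pad` and `channel_translation` are hypothetical.

```python
import numpy as np


def shift_zero_pad(img, shift):
    """Translate an array by integer voxel offsets, filling vacated voxels with zeros."""
    out = np.zeros_like(img)
    src, dst = [], []
    for s, n in zip(shift, img.shape):
        s = int(s)
        if s >= 0:
            src.append(slice(0, n - s))
            dst.append(slice(s, n))
        else:
            src.append(slice(-s, n))
            dst.append(slice(0, n + s))
    out[tuple(dst)] = img[tuple(src)]
    return out


def channel_translation(volume, mri_channels, max_shift_vox=3, rng=None):
    """Randomly translate only the listed channels of a (C, Z, Y, X) volume.

    Assumption: each MRI channel gets its own shift, sampled uniformly from
    [-max_shift_vox, max_shift_vox] along every axis. CT and PET channels
    are left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = volume.copy()
    for c in mri_channels:
        shift = rng.integers(-max_shift_vox, max_shift_vox + 1, size=3)
        out[c] = shift_zero_pad(volume[c], shift)
    return out


# Example: channels 0-1 stand for CT and PET, channels 2-3 for T1w and T2w MRI.
rng = np.random.default_rng(0)
patch = rng.random((4, 16, 16, 16))
augmented = channel_translation(patch, mri_channels=[2, 3], rng=rng)
```

Only the MRI channels move relative to CT and PET, so during training the network sees small simulated misregistrations while the ground-truth delineation stays aligned with CT.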
Results Table 1 shows the median segmentation scores for all six groups. The MSD was significantly reduced for DR UNet compared with RR UNet. A comparable MSD could also be achieved on RR data using either DCN or CHTL. A further improvement was achieved by using DCN or CHTL on DR data, as shown in Figure 1. There were no significant differences in either Dice or HD95. Training DCN took three times as long and required more GPU memory than the other methods.
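For reference, the Dice score on the union of GTVt and GTVn can be sketched as follows. This is a minimal illustration assuming binary NumPy masks of identical shape, not the evaluation code used in this study. The function name `union_dice` and the convention of returning 1.0 when both unions are empty are assumptions.

```python
import numpy as np


def union_dice(pred_t, pred_n, ref_t, ref_n):
    """Dice similarity coefficient between the predicted and reference GTVt/GTVn unions."""
    pred = np.logical_or(pred_t, pred_n)
    ref = np.logical_or(ref_t, ref_n)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, ref).sum() / denom


# Toy 1D example: 2 overlapping voxels, 3 predicted and 3 reference voxels.
pred_t = np.array([1, 1, 0, 0], dtype=bool)
pred_n = np.array([0, 0, 1, 0], dtype=bool)
ref_t = np.array([1, 0, 0, 0], dtype=bool)
ref_n = np.array([0, 0, 1, 1], dtype=bool)
score = union_dice(pred_t, pred_n, ref_t, ref_n)  # 2 * 2 / (3 + 3)
```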

