SPADA Meeting Book

most computational models of PCR perform satisfactorily within the “well-behaved” limits of perfect match primers, short amplicons, no-template near neighbors, and single-plex PCR, many real-world applications do not fall into this category. Thus, a “ground truth” dataset is needed to help determine model accuracy. The dataset could be used to objectively evaluate algorithms from different research groups. The experimental, “ground truth” PCR dataset would need to capture many details including: (a) the target genome, (b) the presence of contaminating organisms (determined through NGS sequencing), (c) the enzyme and buffer compositions, (d) the primer and probe concentrations, (e) the composition of the amplicon products (by NGS sequencing to reveal the concentrations of the desired amplicon and off-target amplicons, primer dimers, etc.), and (f) the composition of the PCR reaction at each cycle of PCR (e.g., real time monitoring of the fluorescence, along with quantification of primer concentrations and enzyme activity). For this training dataset, both the PCR inputs and outputs would be publicly revealed to enable the user community to improve and validate their in silico methods.

5.3 PCR Datasets in Support of Competitions to Spur the Community Forward

Similar to the Critical Assessment of Protein Structure Prediction (CASP), there is a need for an open competition to assess the performance of different computational approaches for in silico PCR using experimental data. Competitions could provide a quantitative ranking of models by accuracy and spur the development of improved in silico models. For the training set, both the PCR inputs and outputs (described above in section 5.2 Assessing Model Accuracy) would be publicly revealed. For the validation sets, only the PCR inputs would be revealed. Outputs would be used for evaluation of contestant methods (i.e., PCR predictions from different research groups) by independent referees. The final goal would be to evaluate quantitatively the

