SPADA Draft Documents

alignments of the final design region, however, are helpful for displaying the variations present, and they provide a useful reality check after a design region is discovered with a k-mer approach. Thus, we do recommend using an MSA that is restricted to the design region of interest, but not one that spans the entire genome.

A superior approach for determining the optimal design region(s) is to analyze targets using k-mers (i.e., substrings of length k, usually 14-25, depending on application; the rationale is described in the publications listed below). Such k-mer algorithms are computationally efficient for large databases and long sequences and can be applied to databases of pathogenic viruses and bacteria. An optimal design region from a pathogen would show high conservation among the variants of the desired target (e.g., clinical isolates of a pathogen) and a lack of conservation to near-neighbor organisms or to contaminating organisms that could cause false positives. Thus, we recommend the use of k-mer algorithms to analyze inclusivity and exclusivity genome databases to determine optimal locations of signature design regions. One such algorithm is described in the literature by Yuriy Fofanov’s group (8) and was applied to the development of an assay for the 2009 pandemic H1N1 influenza A. Such a k-mer algorithm is also available in the commercial PanelPlex-Consensus program (DNA Software, Inc., Ann Arbor, MI). Alternative approaches include Uniquemer (11), BioVelocity (9), and core/pan-genome analyses to identify unique genes that can be assay targets (12, 13). In all of these approaches, the key first step is to create the inclusivity, exclusivity and environmental background panels.
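The selection criterion described above (conserved across the inclusivity panel, absent from the exclusivity panel) can be illustrated with a minimal Python sketch. This is not the algorithm of reference (8) or of any of the named programs. The function names `kmers` and `signature_kmers` and the toy sequences are hypothetical, and the sketch assumes exact matching on a single strand with no handling of reverse complements, mismatches, or ambiguous bases. The value k = 4 is used only to keep the example readable; practical values are in the 14-25 range noted above.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(inclusivity: Iterable[str],
                    exclusivity: Iterable[str],
                    k: int) -> Set[str]:
    """k-mers found in every inclusivity sequence and in no exclusivity sequence.

    Assumption: exact, single-strand matching only (illustrative simplification).
    """
    inclusivity = list(inclusivity)
    if not inclusivity:
        return set()

    # Keep only k-mers conserved across all target variants.
    conserved = kmers(inclusivity[0], k)
    for seq in inclusivity[1:]:
        conserved &= kmers(seq, k)

    # Discard any k-mer that also occurs in a near-neighbor or background sequence.
    for seq in exclusivity:
        conserved -= kmers(seq, k)
    return conserved


# Toy panels (hypothetical sequences, not real genomes).
inclusivity_panel = ["ACGTTGCA", "TACGTTGC", "ACGTTGCC"]
exclusivity_panel = ["GGACGTAA"]

candidates = signature_kmers(inclusivity_panel, exclusivity_panel, k=4)
print(sorted(candidates))
```

Runs of overlapping candidate k-mers mark stretches that are conserved in the targets and absent from the exclusivity panel, which is where a signature design region would be sought. A restricted MSA of that region can then serve as the reality check described earlier.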


