SPADA Docs

with the smallest number of assays, including unmodified partial sequences will force assays to cover the regions that have been sequenced most often, rather than focusing on the regions of the genome that are actually most conserved. While this can be a good strategy when dealing with highly variable genomes for which strain diversity is better represented by available amplicon sequences than by available whole-genome sequences, it is a poor strategy if the available amplicon sequences are generated from a hypervariable region or a region that is perfectly conserved in near neighbors. An alternative strategy is to “fill in” and “extend” missing sequence data by interpolating and extrapolating partial and incomplete sequences (10).
Bacteria also present challenges for many design algorithms since they usually have circular genomes without a defined starting point, and they code for proteins on both strands. As a result, different sequencing labs can publish the genomes with different strands and/or starting points. Thus, it is useful to perform work up front to include the same strand in the inclusivity database for all members of the set. Bacterial genomes are also roughly 100 to 1000 times larger than viral genomes, which places demands on CPU (central processing unit) and memory resources for signature analysis algorithms (below, we describe efficient k-mer algorithms that are capable of handling bacterial genomes). For bacterial inclusivity databases, it is recommended that partial genomes be segregated into a separate database from the full-length genomes. Partial genomes can then be excluded from design but later included when testing for coverage with an algorithm such as Primer-BLAST or ThermoBLAST, using a combined database of full-length and partial genomes.
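As an illustration of the up-front work described above, the following Python sketch shows one possible way to bring circular genomes to a common strand and starting point, and to separate partial from full-length entries. It is not the method used by any particular tool. The function names, the choice of a short anchor sequence to define the start, and the length threshold used to call a genome "full-length" are all assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_anchor(genome: str, anchor: str) -> Optional[str]:
    """Rotate a circular genome so that it begins with `anchor`.

    Returns None if the anchor is absent from this strand.
    """
    # Append the first len(anchor)-1 bases so that a match spanning
    # the arbitrary linearization point is still found.
    doubled = genome + genome[: len(anchor) - 1]
    idx = doubled.find(anchor)
    if idx == -1:
        return None
    return genome[idx:] + genome[:idx]


def normalize_circular(genome: str, anchor: str) -> str:
    """Put a circular genome on the anchor's strand, starting at the anchor.

    `anchor` is a hypothetical marker (for example, the start of a gene
    present in every member of the set) assumed to occur on one strand only.
    """
    if not anchor:
        raise ValueError("anchor must be non-empty")
    genome, anchor = genome.upper(), anchor.upper()
    for candidate in (genome, reverse_complement(genome)):
        rotated = rotate_to_anchor(candidate, anchor)
        if rotated is not None:
            return rotated
    raise ValueError("anchor not found on either strand")


def split_full_and_partial(
    records: Dict[str, str], min_full_length: int
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split records into (full_length, partial) by an assumed length cutoff."""
    full = {k: v for k, v in records.items() if len(v) >= min_full_length}
    partial = {k: v for k, v in records.items() if len(v) < min_full_length}
    return full, partial
```

In this sketch, two submissions of the same circular genome that differ only in strand and starting point normalize to the same string, so downstream design steps see a consistent orientation. Real genomes would need a more robust anchor search (for example, alignment-based rather than exact matching) and a better completeness criterion than length alone.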
In instances where there is an abundance of sequencing for a particular gene (e.g., 16S ribosomal RNA, a particular conserved virulence factor, or a toxin gene) from an organism, it is
