SPADA Meeting Book

Figure 6. Plot showing the total number of bacterial genomes in GenBank and the percentage that are complete as a function of year. These plots were made by parsing the “prokaryotes.txt” file that NCBI provides as an inventory of all bacterial genomes. Data accessed on May 24, 2019.


4.3.3 Exclusivity and Background Databases


To check for false positive amplifications, it is useful to construct exclusivity and environmental background databases. For computational efficiency, we recommend populating the exclusivity database with near-neighbor sequences (i.e., organisms that are phylogenetically distinct from, but closely related to, those in the inclusivity dataset). All other, more distantly related organisms that may be present in the sample matrix and might cause false positives can be placed into the background database. Typically, the background database consists of unrelated genomes that are contaminants in the sample matrix preparation, such as the human genome, human microbiome, soil microbiome, etc. For both the exclusivity and background databases, sequence quality is generally not an issue, and we recommend including partial sequences as well as complete genomes. A common practice is to check primers for reactivity with all known organisms (such as the GenBank non-redundant (nr) or nucleotide (nt) databases) using a program such as BLAST or Primer-BLAST to detect all off-target hits and amplicons. However, we do not recommend using such exhaustive databases during the design stage: the nt and nr databases contain many sequences that have no possibility of ever occurring in the sample matrix, so including them would impose design constraints that are unnecessarily restrictive and could result in a sub-optimal design.
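As an illustration of screening a primer pair against curated exclusivity and background databases, the following is a minimal Python sketch and not a method prescribed by this document. It assumes each database is held in memory as a dictionary of sequence name to DNA string, and it uses a naive mismatch-tolerant scan in place of BLAST or Primer-BLAST. The function names, the mismatch threshold, and the maximum amplicon length are hypothetical choices made for the example.

```python
# Minimal sketch: flag database sequences that a primer pair could amplify.
# Assumptions (not from the source document): sequences contain only A/C/G/T,
# only the plus strand of each database sequence is scanned, and a simple
# mismatch count stands in for a real alignment tool such as BLAST.

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(primer, template, max_mismatches):
    """Return start positions where primer matches template
    with at most max_mismatches substitutions."""
    n = len(primer)
    hits = []
    for i in range(len(template) - n + 1):
        mismatches = sum(1 for a, b in zip(primer, template[i:i + n]) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def predicted_amplicons(fwd, rev, template, max_mismatches=2, max_len=1000):
    """Return (start, end) coordinates of predicted amplicons on the template.

    The forward primer must bind upstream of the reverse-primer site, and the
    whole product must be no longer than max_len bases.
    """
    fwd_sites = find_sites(fwd, template, max_mismatches)
    rev_sites = find_sites(reverse_complement(rev), template, max_mismatches)
    amplicons = []
    for f in fwd_sites:
        for r in rev_sites:
            end = r + len(rev)
            if r >= f + len(fwd) and end - f <= max_len:
                amplicons.append((f, end))
    return amplicons


def screen_database(fwd, rev, database, max_mismatches=2, max_len=1000):
    """Return {sequence name: amplicons} for every sequence with a predicted hit."""
    results = {}
    for name, seq in database.items():
        amps = predicted_amplicons(fwd, rev, seq.upper(), max_mismatches, max_len)
        if amps:
            results[name] = amps
    return results


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real genomes.
    exclusivity_db = {"near_neighbor_1": "TTTACGTACAAAAAAAACCAAGGTT"}
    background_db = {"background_1": "A" * 30}
    for label, db in (("exclusivity", exclusivity_db), ("background", background_db)):
        print(label, screen_database("ACGTAC", "CCTTGG", db, max_mismatches=0))
```

Any sequence reported by such a screen is a candidate false positive to examine further. In practice the two databases would be loaded from sequence files and searched with an alignment tool; the point of the sketch is that the design-stage search space is limited to organisms plausibly present in the sample matrix.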
