SPADA Draft Documents
[Figure 6 plot: x-axis year (1990 to 2030); left y-axis "Genome Count" (log scale), series "Number of Bacterial Genomes in GenBank"; right y-axis "%Complete Genomes", series "%Complete Bacterial Genomes".]
Figure 6. Plot showing the total number of bacterial genomes in GenBank and the percentage of complete bacterial genomes as a function of year. These plots were made by parsing the "prokaryotes.txt" file (ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/) that NCBI provides as an inventory of all bacterial genomes.
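The caption describes deriving both curves from NCBI's prokaryotes.txt inventory. The Python sketch below shows one way such a tally could be done; it is not the authors' script. It assumes a tab-separated file whose header row contains "Release Date" (formatted YYYY/MM/DD) and "Status" columns, with finished assemblies marked "Complete Genome"; these names should be checked against the header of the copy actually downloaded. It also assumes both curves are cumulative (all genomes released up to and including each year) and applies no filtering by domain, so any non-bacterial entries in the file would need to be removed first. The function name `yearly_genome_stats` is hypothetical.

```python
import csv
import io
from collections import Counter


def yearly_genome_stats(lines):
    """Return (year, cumulative_total, cumulative_percent_complete) tuples.

    `lines` is any iterable of text lines from a tab-separated genome
    inventory. ASSUMPTION: the header includes "Release Date" (YYYY/MM/DD)
    and "Status" columns, and finished assemblies have Status
    "Complete Genome".
    """
    reader = csv.DictReader(lines, delimiter="\t")
    total_by_year = Counter()
    complete_by_year = Counter()
    for row in reader:
        year_text = (row.get("Release Date") or "")[:4]
        if not year_text.isdigit():
            continue  # skip rows whose release date is missing or unparseable
        year = int(year_text)
        total_by_year[year] += 1
        if row.get("Status") == "Complete Genome":
            complete_by_year[year] += 1

    # ASSUMPTION: both curves are cumulative, i.e. every genome released
    # up to and including a given year counts toward that year's point.
    stats = []
    cum_total = cum_complete = 0
    for year in sorted(total_by_year):
        cum_total += total_by_year[year]
        cum_complete += complete_by_year[year]
        stats.append((year, cum_total, 100.0 * cum_complete / cum_total))
    return stats


if __name__ == "__main__":
    # Tiny made-up sample, NOT real NCBI data, just to show the output shape.
    sample = (
        "#Organism/Name\tRelease Date\tStatus\n"
        "Org A\t1995/07/28\tComplete Genome\n"
        "Org B\t2010/01/05\tContig\n"
        "Org C\t2010/03/12\tScaffold\n"
        "Org D\t2015/06/30\tComplete Genome\n"
    )
    for year, total, pct in yearly_genome_stats(io.StringIO(sample)):
        print(f"{year}\t{total}\t{pct:.1f}%")
```

For the made-up sample it prints 1995 1 100.0%, 2010 3 33.3% and 2015 4 50.0%. On the real inventory one would pass an open file handle instead (for example `open("prokaryotes.txt", newline="")`) and plot the second tuple element on a log axis and the third on a linear percentage axis against the year.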
Exclusivity and Background Databases
For the purposes of checking for false-positive amplifications, it is useful to construct exclusivity and environmental background databases. For computational efficiency, we recommend populating the exclusivity database with near-neighbor sequences (i.e., organisms that are phylogenetically distinct from, but closely related to, those in the inclusivity dataset). All other, more distantly related organisms that may be present in the sample matrix and might cause false positives can be placed into the background database. Typically, the background database consists of unrelated genomes that are contaminants in the sample matrix preparation such as the