The MgDb class in the metagenomeFeatures package holds the sequences and taxonomic information for a 16S rRNA reference database. This vignette demonstrates the class methods for exploring and subsetting an MgDb-class object. It uses gg85, the Greengenes 13.8 85% OTU database included in the metagenomeFeatures package. MgDb-class objects containing full databases are provided in separate packages, such as greengenes13.5MgDb.
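This is a minimal setup sketch, assuming metagenomeFeatures and its Bioconductor dependencies are installed. It attaches the package and loads the bundled gg85 object with get_gg13.8_85MgDb().

```r
# Attach metagenomeFeatures (assumed to be installed from Bioconductor).
library(metagenomeFeatures)

# Load the Greengenes 13.8 85% OTU database bundled with the package as an MgDb object.
gg85 <- get_gg13.8_85MgDb()
```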
MgDb-class Object
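Printing the object dispatches to its show method. The method summarizes the metadata, the sequence store, a preview of the taxonomy table, and the phylogenetic tree. This sketch assumes gg85 was created with get_gg13.8_85MgDb(). The exists() guard reloads it only when needed, so the chunk can run on its own.

```r
library(metagenomeFeatures)

# Reuse gg85 if it is already in the session; otherwise load the bundled database.
if (!exists("gg85")) gg85 <- get_gg13.8_85MgDb()

# Auto-printing an S4 object at top level calls its show() method.
gg85
```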
## MgDb object:[1] "Metadata"
## |ACCESSION_DATE: Mon Apr 2 13:30:09 2018
## |URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_8_otus
## |DB_TYPE_NAME: GreenGenes
## |DB_VERSION: 13.8 85% OTUS
## |DB_TYPE_VALUE: MgDb
## |DB_SCHEMA_VERSION: 2.0
## [1] "Sequence Data:"
## [1] "DECIPHER formatted seqDB"
## [1] "Taxonomy Data:"
## # Source: table<Seqs> [?? x 11]
## # Database: sqlite 3.22.0
## # [/tmp/Rtmp941GRZ/Rinst75cf5201c4be/metagenomeFeatures/extdata/gg13.8_85.sqlite]
## row_names identifier description Keys Kingdom Phylum Class Ord Family
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 MgDb 1111561 1111… k__Bac… p__Pr… c__G… o__L… f__
## 2 2 MgDb 1111421 1111… k__Bac… p__Pr… c__A… o__R… f__
## 3 3 MgDb 1111090 1111… k__Bac… p__Ac… c__N… o__N… f__Ni…
## 4 4 MgDb 1110893 1110… k__Bac… p__Ba… c__[… o__[… f__Sa…
## 5 5 MgDb 1110814 1110… k__Bac… p__BR… c__ o__ f__
## 6 6 MgDb 1110088 1110… k__Bac… p__Pr… c__G… o__ f__
## 7 7 MgDb 1109993 1109… k__Bac… p__Ch… c__D… o__ f__
## 8 8 MgDb 1109948 1109… k__Bac… p__Pl… c__[… o__B… f__W4
## 9 9 MgDb 1109493 1109… k__Bac… p__Pl… c__v… o__ f__
## 10 10 MgDb 1109328 1109… k__Bac… p__Ch… c__A… o__S… f__
## # ... with more rows, and 2 more variables: Genus <chr>, Species <chr>
## [1] "Tree Data:"
##
## Phylogenetic tree with 5088 tips and 5087 internal nodes.
##
## Tip labels:
## 4479984, 540377, 811993, 823988, 4397176, 4446470, ...
##
## Rooted; includes branch lengths.
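taxa_keytypes, taxa_columns, and taxa_keys
These accessors describe the taxonomy table. The three outputs that follow show:
the full list of table columns;
the key column together with the taxonomic ranks (Kingdom through Species);
the first six values of the Kingdom column.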
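This is a sketch of a call that lists every column of the taxonomy table. It assumes taxa_keytypes() needs only the MgDb object and returns a character vector.

```r
library(metagenomeFeatures)
if (!exists("gg85")) gg85 <- get_gg13.8_85MgDb()

# Character vector naming every column of the taxonomy table.
taxa_keytypes(gg85)
```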
## [1] "row_names" "identifier" "description" "Keys" "Kingdom"
## [6] "Phylum" "Class" "Ord" "Family" "Genus"
## [11] "Species"
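A similar sketch lists the key and rank columns. It assumes taxa_columns() also takes only the MgDb object.

```r
library(metagenomeFeatures)
if (!exists("gg85")) gg85 <- get_gg13.8_85MgDb()

# The key column plus the taxonomic ranks, Kingdom through Species.
taxa_columns(gg85)
```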
## [1] "Keys" "Kingdom" "Phylum" "Class" "Ord" "Family" "Genus"
## [8] "Species"
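This sketch retrieves the values of one taxonomic level. It assumes taxa_keys() takes a keytype argument and returns a table. head() keeps the first six rows. dplyr::collect() ensures the result is an in-memory tibble even if the table is backed by SQLite.

```r
library(metagenomeFeatures)
if (!exists("gg85")) gg85 <- get_gg13.8_85MgDb()

# Values of the Kingdom column, limited to the first six rows and
# collected into memory.
kingdom_keys <- dplyr::collect(head(taxa_keys(gg85, keytype = "Kingdom")))
kingdom_keys
```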
## # A tibble: 6 x 1
## Kingdom
## <chr>
## 1 k__Bacteria
## 2 k__Bacteria
## 3 k__Bacteria
## 4 k__Bacteria
## 5 k__Bacteria
## 6 k__Bacteria
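mgDb_select
mgDb_select retrieves database entries for a specified taxonomic group or list of ids. Depending on the type argument, it returns taxonomic information, sequences, or both. The combined result shown last also includes the matching portion of the phylogenetic tree.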
mgDb_select(gg85, type = "taxa",
            keys = c("Vibrionaceae", "Enterobacteriaceae"),
            keytype = "Family")
## # A tibble: 27 x 8
## Keys Kingdom Phylum Class Ord Family Genus Species
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 10479… k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__ s__
## 2 818108 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__ s__
## 3 651366 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__ s__
## 4 592303 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__Pro… s__
## 5 575794 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__ s__
## 6 559954 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__ s__
## 7 368586 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__ s__
## 8 289174 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__Ple… s__shig…
## 9 268585 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__Cit… s__
## 10 232927 k__Bact… p__Prote… c__Gamma… o__Vibr… f__Vibri… g__ s__
## # ... with 17 more rows
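This sketch repeats the same family-level query with type = "seq", which presumably produces the sequence output that follows. The result is a Biostrings DNAStringSet whose names are the database keys.

```r
library(metagenomeFeatures)
if (!exists("gg85")) gg85 <- get_gg13.8_85MgDb()

# Same taxonomic query as the taxa example, but returning the matching
# 16S sequences instead of the taxonomy table.
vib_ent_seqs <- mgDb_select(gg85, type = "seq",
                            keys = c("Vibrionaceae", "Enterobacteriaceae"),
                            keytype = "Family")
vib_ent_seqs
```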
## A DNAStringSet instance of length 27
## width seq names
## [1] 1366 ATTGAACGCTGGCGGCAGGC...GTGAATACGTTCCCGGGCCT 1047956
## [2] 1410 ACGGTACACAGAGAGCTTGC...TTCGGGAGGGCGCTTACCAC 818108
## [3] 1421 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 651366
## [4] 1453 AGTCGAGCGGTAACAGTGGG...CATGACTGGGGGAAGTCGTA 592303
## [5] 1419 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 575794
## ... ... ...
## [23] 1383 TGGGAAACTGCCTGATGGAG...AACCTTCGGGAGGGCGGTTT 4336809
## [24] 1443 GGGTGAGTAATGTCTGGGAA...GGTTGCAAAAGAAGTAGGTA 656881
## [25] 1563 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4371215
## [26] 1392 GCGGCGGACGGGTGAGTAAT...TGGGTAGTTTAACCTTCGGG 4375861
## [27] 1389 TCGTGCGGTAATAGAGGAAC...AGCAAGTAGTTTAACCTAAA 4443068
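The call that produced the combined result below is not shown in this copy of the vignette. As an illustration of selecting by an id list, this sketch requests the two database keys that appear in that output. It assumes that keytype = "Keys" matches database ids. It also assumes that type = "all" returns a list with taxa, seq, and tree elements, as the printed result suggests. Only the key values come from the output; the call itself is an assumption.

```r
library(metagenomeFeatures)
if (!exists("gg85")) gg85 <- get_gg13.8_85MgDb()

# Illustrative id-list query (assumed usage): select two entries by their
# database keys and return taxonomy, sequences, and tree together.
two_otus <- mgDb_select(gg85, type = "all",
                        keys = c("661785", "4375861"),
                        keytype = "Keys")
two_otus
```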
## $taxa
## # A tibble: 2 x 8
## Keys Kingdom Phylum Class Ord Family Genus Species
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 661785 k__Bacte… p__Proteo… c__Gammap… o__Vibr… f__Vibri… g__Vi… s__
## 2 43758… k__Bacte… p__Proteo… c__Gammap… o__Vibr… f__Vibri… g__Vi… s__
##
## $seq
## A DNAStringSet instance of length 2
## width seq names
## [1] 1420 AGAGTTTGATCATGGCTCAGA...TTCATGACTGGGGTGAAGTC 661785
## [2] 1392 GCGGCGGACGGGTGAGTAATG...TGGGTAGTTTAACCTTCGGG 4375861
##
## $tree
##
## Phylogenetic tree with 2 tips and 1 internal nodes.
##
## Tip labels:
## [1] "661785" "4375861"
##
## Rooted; includes branch lengths.
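The session details below record the environment used to build this vignette. Base R's sessionInfo() regenerates the same report for your own session, which helps when comparing package versions.

```r
# Report the R version, platform, locale, and attached/loaded package versions.
sessionInfo()
```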
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] metagenomeFeatures_2.2.0 Biobase_2.42.0
## [3] BiocGenerics_0.28.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.19 XVector_0.22.0 compiler_3.5.1
## [4] pillar_1.3.0 dbplyr_1.2.2 bindr_0.1.1
## [7] zlibbioc_1.28.0 tools_3.5.1 digest_0.6.18
## [10] bit_1.1-14 nlme_3.1-137 RSQLite_2.1.1
## [13] evaluate_0.12 memoise_1.1.0 tibble_1.4.2
## [16] lattice_0.20-35 pkgconfig_2.0.2 rlang_0.3.0.1
## [19] cli_1.0.1 DBI_1.0.0 yaml_2.2.0
## [22] bindrcpp_0.2.2 stringr_1.3.1 dplyr_0.7.7
## [25] knitr_1.20 IRanges_2.16.0 Biostrings_2.50.0
## [28] S4Vectors_0.20.0 stats4_3.5.1 rprojroot_1.3-2
## [31] bit64_0.9-7 grid_3.5.1 tidyselect_0.2.5
## [34] glue_1.3.0 R6_2.3.0 fansi_0.4.0
## [37] rmarkdown_1.10 DECIPHER_2.10.0 purrr_0.2.5
## [40] blob_1.1.1 magrittr_1.5 backports_1.1.2
## [43] htmltools_0.3.6 assertthat_0.2.0 ape_5.2
## [46] utf8_1.1.4 stringi_1.2.4 lazyeval_0.2.1
## [49] crayon_1.3.4