Isaac Scientific Publishing

Journal of Advanced Statistics

Shannon Entropy Ratio, a Bayesian Biodiversity Index Used in the Uncertainty Mixtures of Metagenomic Populations

PP. 23-34. Pub. Date: December 15, 2019

DOI: 10.22606/jas.2019.44001


  • Toni Monleón-Getino*
    Section of Statistics. Department of Genetics, Microbiology and Statistics. University of Barcelona, Barcelona, Spain; GRBIO. Research Group in Biostatistics and Bioinformatics; BIOST3. Research Group in Clinical Statistics, Bioinformatics and Computational Biodiversity
  • Clara I Rodríguez-Casado
    Section of Statistics. Department of Genetics, Microbiology and Statistics. University of Barcelona, Barcelona, Spain; BIOST3. Research Group in Clinical Statistics, Bioinformatics and Computational Biodiversity
  • Pablo Emilio Verde
    Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany


Microbial communities have a unique complexity that makes their diversity difficult to study. Because culturing microorganisms in order to identify them is very complex, microbial communities must be studied from samples taken directly from their natural environment, using a metagenomic approach. An important problem in metagenomics is measuring the diversity within a population (entropy) and the variation between subpopulations (beta-diversity) under conditions of uncertainty. We propose estimating these quantities with the Bayesian Shannon index and the Shannon entropy ratio (SER), using prior information based on a previous phylogenetic estimate. Bayesian methods improve the precision of parameter estimates, and the uncertainty in those estimates can be easily propagated through calculations. The Bayesian diversity estimates were higher than their frequentist counterparts and had lower standard errors. This approach is presented here as a mixed diversity index computed by Markov chain Monte Carlo (MCMC) simulation using JAGS with R.
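
The idea of a Bayesian Shannon index can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses JAGS with R and a prior derived from a phylogenetic estimate, whereas this sketch assumes a conjugate Dirichlet prior on the taxon proportions and samples the posterior directly in Python. The counts, the flat prior, and all function names are hypothetical, and the SER is not computed because its definition is not given in this abstract.

```python
import math
import random


def shannon_entropy(p):
    """Shannon entropy H = -sum(p_i * ln p_i), skipping zero-probability taxa."""
    return -sum(x * math.log(x) for x in p if x > 0)


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def posterior_entropy(counts, prior, n_draws=5000, seed=1):
    """Monte Carlo posterior mean and SD of H under a Dirichlet-multinomial model."""
    rng = random.Random(seed)
    # Conjugacy: multinomial counts plus a Dirichlet prior give a Dirichlet posterior.
    alpha_post = [c + a for c, a in zip(counts, prior)]
    draws = [shannon_entropy(sample_dirichlet(alpha_post, rng)) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (n_draws - 1))
    return mean, sd


counts = [50, 30, 15, 5, 0]   # hypothetical read counts for five taxa
prior = [1.0] * len(counts)   # hypothetical flat prior, standing in for a phylogenetic one

n = sum(counts)
h_plugin = shannon_entropy([c / n for c in counts])   # frequentist plug-in estimate
h_mean, h_sd = posterior_entropy(counts, prior)       # Bayesian posterior summary

print(f"plug-in H = {h_plugin:.3f}")
print(f"posterior mean H = {h_mean:.3f} (SD {h_sd:.3f})")
```

In this toy example the unobserved fifth taxon receives some posterior mass from the prior, which is one reason a Bayesian estimate can exceed the plug-in value; the posterior draws also give the uncertainty of the index directly.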


Entropy, Bayesian methods, biodiversity, probability, categorical data, metagenomics, microbiology.

