Mean Variance Relationships of Genome Size and GC Content

Sunil Kanti Mondal

Department of Biotechnology, University of Burdwan, Burdwan, W. B., India.

Rabindra Nath Das

Department of Statistics, University of Burdwan, Burdwan, West Bengal, India.

Sudip Kundu

Department of Biophysics, Molecular Biology and Bioinformatics, Calcutta University, Kolkata, W.B., India.

Jinseog Kim *

Department of Statistics and Information Science, Dongguk University, Gyeongju, Korea.

Gurprit Grover

Department of Statistics, University of Delhi, Delhi, India.

Shamim Akhtar Ansari

Tropical Forest Research Institute, RFRC, Jabalpur, India.

*Author to whom correspondence should be addressed.


Abstract

This article examines how genome size and GC content can be explained by codon and amino-acid usage. The study aims to identify the statistically significant factors of genome size and GC content using statistical modeling. The analyses show that habitat (P = 0.08), taxonomy (P = 0.02), genome GC content (P < 0.01), isolation temperature (P < 0.01), GC% of the 2nd codon position in the protein-coding part (P < 0.01), total number of tRNA genes in the genome (P < 0.01), lower (P < 0.01) and upper (P = 0.01) boundaries of GC% for tRNA-encoding genes, and average frequency (per 100) of amino acids with non-polar aliphatic (P < 0.01), aromatic (P < 0.01), and positively charged (P < 0.01) R groups are statistically significant factors of genome size. On the other hand, taxonomy (P = 0.03), genome size (P < 0.01), isolation temperature (P = 0.02), GC% of the protein-coding part of the genome (P < 0.01), GC% of the 1st (P < 0.01), 2nd (P < 0.01), and 3rd (P < 0.01) codon positions in the protein-coding part, total number of tRNA genes in the genome (P < 0.01), lower (P < 0.01) and upper (P < 0.01) boundaries of GC% for tRNA-encoding genes, and average frequency (per 100) of amino acids with non-polar aliphatic (P < 0.01), aromatic (P < 0.01), and negatively charged (P = 0.01) R groups are statistically significant factors of genome GC content. These analyses support many earlier research findings and attempt to resolve some conflicts among them. In addition, the analyses identify causal factors in the variance models, all of which are new, and many additional causal factors in the mean models of genome size and genome GC content, none of which were reported by earlier investigators.

Keywords: Amino acid, codon, genome size, genome GC content, joint generalized linear models, log-normal model, gamma model, non-constant variance.


How to Cite

Mondal, Sunil Kanti, Rabindra Nath Das, Sudip Kundu, Jinseog Kim, Gurprit Grover, and Shamim Akhtar Ansari. 2015. “Mean Variance Relationships of Genome Size and GC Content”. Annual Research & Review in Biology 7 (4):206-21. https://doi.org/10.9734/ARRB/2015/16709.
