Informatica

Local Symmetry of Non-Coding Genetic Sequences
Volume 30, Issue 3 (2019), pp. 553–571
Marijus Radavičius   Tomas Rekašius   Jurgita Židanavičiūtė  

https://doi.org/10.15388/Informatica.2019.218
Pub. online: 1 January 2019      Type: Research Article      Open Access

Received: 1 November 2018
Accepted: 1 May 2019
Published: 1 January 2019

Abstract

The simplest hypothesis of DNA strand symmetry states that the proportions of nucleotides of the same base pair are approximately equal within single DNA strands. Results of extensive empirical studies using asymmetry measures and various visualization tools show that for long DNA sequences (approximate) strand symmetry generally holds, with rather rare exceptions. In this paper, a formal definition of DNA strand local symmetry is presented, characterized in terms of generalized logits, and tested for the longest non-coding sequences of bacterial genomes. The data are assumed to have a special regression-type probabilistic structure. This structure is compatible with the probability distribution of random nucleotide sequences at a steady state of a context-dependent reversible Markov evolutionary process. The null hypothesis of strand local symmetry is rejected in the majority of bacterial genomes, suggesting that even neutral mutations are skewed with respect to leading and lagging strands.
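
As an informal illustration of the simplest strand symmetry hypothesis stated above (not the local symmetry test based on generalized logits that the paper develops), the following minimal sketch computes single-strand nucleotide proportions and two elementary asymmetry measures. The function names `base_proportions` and `strand_skews` and the toy sequences are hypothetical and chosen only for this example; the skew formulas are the commonly used AT and GC skews, assumed here as a stand-in for the asymmetry measures mentioned in the abstract.

```python
from collections import Counter


def base_proportions(seq):
    """Return the proportion of each base A, C, G, T within one strand."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}


def strand_skews(seq):
    """Return (AT skew, GC skew) = ((A-T)/(A+T), (G-C)/(G+C)).

    Both values are 0 when the proportions of the two nucleotides of
    each base pair are exactly equal within the strand.
    """
    p = base_proportions(seq)
    at = p["A"] + p["T"]
    gc = p["G"] + p["C"]
    at_skew = (p["A"] - p["T"]) / at if at else 0.0
    gc_skew = (p["G"] - p["C"]) / gc if gc else 0.0
    return at_skew, gc_skew


# Hypothetical toy strands, not real genomic data.
symmetric = "AATTCCGG"
skewed = "AAAT"
print(base_proportions(symmetric), strand_skews(symmetric))
print(base_proportions(skewed), strand_skews(skewed))
```

Skews near zero are consistent with approximate strand symmetry at the single-nucleotide level. They say nothing about context-dependent (local) symmetry, which requires the formal treatment described in the paper.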


Biographies

Radavičius Marijus
marijus.radavicius@mii.vu.lt

M. Radavičius, Assoc. Prof. Dr., is a senior researcher at the Institute of Data Science and Digital Technologies and a professor at the Institute of Applied Mathematics, Vilnius University. He received a PhD degree (probability and statistics) in 1982 from the Steklov Institute of Mathematics of the Russian Academy of Sciences (St. Petersburg Department). His major research interests include asymptotic statistics, nonparametric and adaptive estimation, dimension reduction and data sparsity, cluster analysis, and applications of statistics in life sciences, medicine, linguistics and education.

Rekašius Tomas
tomas.rekasius@vgtu.lt

T. Rekašius, Assoc. Prof. Dr., works at the Department of Mathematical Statistics, Vilnius Gediminas Technical University. He received a PhD degree (mathematics) in 2007 from Vilnius Gediminas Technical University and the Institute of Mathematics and Informatics, Vilnius. His major research interests include bioinformatics and applications of statistics in life sciences and medicine.

Židanavičiūtė Jurgita
jurgita.zidanaviciute@vgtu.lt

J. Židanavičiūtė, Dr., received a master’s degree in statistics in 2003 and a PhD degree in mathematics in 2010 from Vilnius Gediminas Technical University. She has been working at Vilnius Gediminas Technical University for 15 years. Her major research interests are applications of statistics in engineering, medicine and other fields.



Copyright
© 2019 Vilnius University
Open access article under the CC BY license.

Keywords
generalized logit, DNA strand symmetry, Markov random field, characterization, hypothesis testing

INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952