Fractal Analysis of DNA by Nonlinear Genome Signal Processing for Exon and Intron Separation
Ali Karmi
Molecular Biology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran.
Ali Najafi *
Molecular Biology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran.
Peyman Gifani
Molecular Biology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran.
Sahand Khakabimamaghani
Molecular Biology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran.
*Author to whom correspondence should be addressed.
Abstract
Aims: To provide a new, reasonable measure for distinguishing between coding and non-coding regions of DNA sequences based on their fractal nature and self-similarity.
Study Design: After conducting background studies on the fractal structure of DNA sequences, the application of Detrended Fluctuation Analysis for identifying coding and non-coding regions in those sequences was investigated. Finally, the propositions were tested on a standard dataset of 195 genes.
Place and Duration of Study: The sample was the common data set “HMR 195”, which has been used in conventional tools. The study was conducted between December 2012 and July 2013.
Methodology: Each DNA string was converted to a numerical signal via a number mapping algorithm, and the Fractal Scaling Exponent (FSE) of that signal was calculated for the exons and introns of 195 genes. The calculation was performed twice: once to compute the optimal values of FSE, and once for non-optimal FSEs. Analysis of Variance (ANOVA) was used to test the significance of the difference between the average FSE of exons and that of introns in both the optimal and non-optimal cases.
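The pipeline described in the Methodology (map a DNA string to a numerical signal, then estimate a scaling exponent by Detrended Fluctuation Analysis) can be illustrated with a minimal sketch. This is not the authors' implementation. The purine/pyrimidine mapping, the function names `map_dna` and `dfa_exponent`, and the window sizes are all assumptions made for illustration, since the abstract does not specify the mapping or the DFA settings.

```python
import math
import random

# Assumed mapping (purine = +1, pyrimidine = -1); the abstract does not name
# the number mapping algorithm actually used.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def map_dna(seq, mapping=PURINE_PYRIMIDINE):
    """Convert a DNA string to a numerical signal, skipping unknown symbols."""
    return [mapping[base] for base in seq.upper() if base in mapping]


def _linear_fit(xs, ys):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(signal, window_sizes=(8, 16, 32, 64, 128, 256)):
    """Estimate the scaling exponent with first-order DFA."""
    # Step 1: integrate the mean-subtracted signal to obtain the profile.
    mean = sum(signal) / len(signal)
    profile, total = [], 0.0
    for value in signal:
        total += value - mean
        profile.append(total)

    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        if n < 4 or n_windows < 1:
            continue
        xs = list(range(n))
        squared_residuals = 0.0
        # Step 2: remove a local linear trend in each non-overlapping window.
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = _linear_fit(xs, segment)
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment)
            )
        # Step 3: root-mean-square fluctuation F(n) at this window size.
        fluctuation = math.sqrt(squared_residuals / (n_windows * n))
        if fluctuation > 0:
            log_n.append(math.log(n))
            log_f.append(math.log(fluctuation))

    if len(log_n) < 2:
        raise ValueError("signal too short for the requested window sizes")
    # Step 4: the exponent is the slope of log F(n) versus log n.
    alpha, _ = _linear_fit(log_n, log_f)
    return alpha


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, uncorrelated sequence used only to exercise the code.
    demo_seq = "".join(rng.choice("ACGT") for _ in range(4000))
    print(dfa_exponent(map_dna(demo_seq)))
```

For an uncorrelated random sequence the estimate should lie near 0.5; values above 0.5 indicate long-range correlation, which is the regime of the exon and intron means reported in the Results.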
Results: ANOVA indicated a significant gap between the optimal mean FSE of exons (0.65) and introns (0.72). The difference, although smaller, was significant for non-optimal values as well.
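The exon-versus-intron comparison above rests on a one-way ANOVA, whose F statistic can be sketched as follows. The function name `one_way_anova_f` and the input values are hypothetical placeholders rather than data from the study. Converting F to a p-value requires the F distribution, which this standard-library sketch omits (SciPy's `scipy.stats.f_oneway` returns both).

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Placeholder FSE values, invented purely to show the call.
    exon_fse = [0.63, 0.66, 0.64, 0.67]
    intron_fse = [0.71, 0.73, 0.70, 0.74]
    print(one_way_anova_f(exon_fse, intron_fse))
```

A large F indicates that the difference between the group means is large relative to the variation within each group.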
Conclusion: Based on our experiments, the FSE proved to be a reliable measure for distinguishing between coding and non-coding regions of gene sequences. Accordingly, this metric can be used for predicting exons/introns when embedded within current tools such as TestCode. However, its contribution to the predictive accuracy of current methods requires further investigation in future work.
Keywords: DNA sequence, fractal scaling exponent, exon, intron.