Summary
This study analyzed the multifractal behavior of the human genome T2T-CHM13v2.0 using Chaos Game Representation (CGR) alongside two alternative representations, revealing distinct distributions and structural patterns in individual chromosomes and in the entire genome.
Highlights
- The study applied CGR to the complete human genomic sequence T2T-CHM13v2.0, analyzing the entire chromosome assembly and each chromosome separately.
- Multifractal spectra were determined using two types of box-counting coverage, revealing slight variations across most chromosomes.
- Chromosomes 9 and Y exhibited the greatest differences in singularity (Hölder exponent), with minor variations in their fractal support.
- The CGR distributions generally showed an approximate separation between coding and non-coding regions, as well as CpG or GpC islands.
- A base-by-base analysis of the fractal support of the CGR uncovered characteristic structural bands in chromosome sequences, which align with patterns identified in cytogenetic studies.
- The study compared two alternative representations: the Binary Genomic Representation (RGB) and the Markov Chain (MC) representation.
- The optimal fit was achieved using MC for twelve-base chains, yielding an average percentage error of 2% relative to the full genomic assembly.
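As a rough illustration of the CGR and box-counting steps mentioned in the highlights, the following minimal sketch maps a DNA string to points in the unit square and then computes the fraction of points falling in each cell of a 2^k x 2^k grid. The corner layout (A, C, G, T), the function names `cgr_points` and `box_measure`, and the choice to skip ambiguous bases such as N are assumptions made for this example; the paper's own conventions and box-counting coverages may differ.

```python
import numpy as np

# Assumed corner layout for the four bases; the paper's layout may differ.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of `seq` as an (n, 2) array of points."""
    pts = []
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:  # assumption: skip N and other ambiguity codes
            continue
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts.append((x, y))
    return np.array(pts, dtype=float).reshape(-1, 2)


def box_measure(points, k):
    """Fraction of CGR points in each cell of a 2**k x 2**k grid."""
    if len(points) == 0:
        raise ValueError("no valid bases in sequence")
    n = 2 ** k
    # Convert coordinates in [0, 1) to integer cell indices.
    idx = np.minimum((points * n).astype(int), n - 1)
    counts = np.zeros((n, n))
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
    return counts / counts.sum()


pts = cgr_points("ACGT")
mu = box_measure(pts, 1)
```

The normalized cell counts in `mu` are the kind of box measure from which a multifractal spectrum can be estimated by repeating the calculation over several grid depths `k`. For a full chromosome the grid at large `k` becomes very large, so a sparse or streaming count would likely be needed in practice.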
Key Insights
- The multifractal analysis revealed that the human genome exhibits a complex and non-uniform structure, with distinct distributions and patterns in individual chromosomes and the entire genome.
- The CGR method was effective in capturing the multifractal behavior of the genome, but had limitations in terms of memory usage and computational complexity.
- The RGB and MC representations offered alternative approaches to analyzing the genome, with MC providing a better fit to the data for longer sequence lengths.
- The study's findings have implications for our understanding of the structure and function of the human genome, and may inform future research in genomics and epigenomics.
- The use of multifractal analysis and CGR may also have applications in the study of other complex biological systems and datasets.
- The study highlights the importance of considering the complex and non-uniform nature of biological systems when analyzing and interpreting genomic data.
- The findings of this study may also have implications for the development of new methods and tools for analyzing and visualizing large-scale genomic data.
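To make the Markov chain idea above more concrete, here is a minimal sketch of a first-order chain estimated from a sequence and used to assign a probability to a short chain of bases. This is only one simple way to build such a model and is not necessarily the construction used in the paper; the helper names `transition_matrix` and `kmer_probability`, the first-order assumption, and the uniform starting frequencies in the usage lines are all hypothetical choices for this example.

```python
import numpy as np

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}


def transition_matrix(seq):
    """Row-normalised first-order transition counts between bases."""
    counts = np.zeros((4, 4))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # assumption: ignore ambiguous bases
            counts[INDEX[a], INDEX[b]] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # Leave rows with no observed transitions as all zeros.
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


def kmer_probability(kmer, base_freq, P):
    """Probability of `kmer` under the chain: p(s1) * prod P[s_i, s_(i+1)]."""
    p = base_freq[INDEX[kmer[0]]]
    for a, b in zip(kmer, kmer[1:]):
        p *= P[INDEX[a], INDEX[b]]
    return p


P = transition_matrix("ACGTACGT")
base_freq = np.full(4, 0.25)  # assumed uniform starting frequencies
p_acg = kmer_probability("ACG", base_freq, P)
```

Probabilities computed this way for all chains of a given length could then be compared with the observed chain frequencies, for example through an average percentage error, to judge how well the model fits.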
Citation
Alvarez-Ballesteros, Y. A., Quiroz-Juarez, M. A., Del-Rio-Correa, J. L., & Escobar-Ruiz, A. M. (2024). Exploring the Multifractal Behavior of the Human Genome T2T-CHM13v2.0: Graphical Representations and Cytogenetics (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2412.16705