From an Information Theory POV, We Are Mere Approximations of Ourselves Because We Are the Total Sum of Our Information Transfer Errors
Not feeling quite yourself today? From an information theory standpoint, all of us are mere inaccurate approximations of who we "are"... and there are interesting variations on this theme!
The human genome is estimated to contain approximately 3 billion base pairs of DNA. With four possible bases, each position carries at most 2 bits, so this is equivalent to roughly 6 billion bits (about 750 megabytes) of raw information. However, the information is layered and coded via the genetic code, which is the codex all of life uses to transform sequences in genomes into proteins.
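The raw capacity figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming every position is independent and equally likely to be any of the four bases (real genomes are repetitive, so this is an upper bound); the function and constant names are illustrative only.

```python
import math

BASES = "ACGT"
GENOME_BP = 3_000_000_000  # approximate genome size quoted in the text


def genome_capacity_bits(n_bp: int, alphabet_size: int = len(BASES)) -> float:
    """Upper bound on raw storage: log2(alphabet size) bits per position."""
    return n_bp * math.log2(alphabet_size)


bits = genome_capacity_bits(GENOME_BP)
megabytes = bits / 8 / 1_000_000  # 8 bits per byte, 10^6 bytes per MB
print(f"{bits:.0f} bits, about {megabytes:.0f} MB")
```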
One layer of information is the wrapping of information about proteins into protein-coding genes. The human genome project originally estimated that we had around 40,000 protein-coding genes. A closer look told us that there are only roughly 26,000 protein-coding genes in our genomes.
Another layer of information is how the pieces of protein-coding genes that actually code for segments of proteins are stitched together. Most protein-coding genes are arranged as exons, which are the pieces retained in the mature transcript and translated into protein, separated by introns, which are not. When a protein-coding gene’s DNA is “read off” as messenger RNA, the non-coding introns are spliced out of the gene transcript. If a gene has more than one intron, the exons can be assembled into mRNA in a number of different combinations. The 26K annotated genes in the human genome contain on the order of 230,000 exons and 207,000 introns. Genes typically have 8-9 exons and 7-8 introns.
Intron splicing is the process by which introns, the non-coding regions of a gene, are removed from the gene transcript. The introns are cut out and the exons (coding regions) are joined together to form a continuous mRNA molecule. This mRNA molecule is then exported from the nucleus and translated into a protein at the ribosome. Splicing lets one gene code for different proteins because exons can be alternatively spliced: different combinations of exons - from the same gene - can be used to create different proteins.
As a result, we have far more proteins than protein-coding genes. Estimates range as high as 6 million possible different proteins encoded by the human genome due to the combined effects of alternative splicing (AS), single amino acid polymorphisms (SAPs), and posttranslational modifications (PTMs).
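To get a feel for how quickly exon combinations multiply, here is a small sketch. It assumes a deliberately simple model that is not from the studies cited here: the first and last exon are always kept, exon order is preserved, and each internal exon is independently included or skipped. Real splicing is regulated and uses other modes too, so most of these combinations are never actually produced. The exon labels are hypothetical.

```python
from itertools import combinations


def candidate_isoforms(exons):
    """Yield exon combinations under a simple exon-skipping model.

    The first and last exon are always kept, order is preserved, and each
    internal exon is either included or skipped. Needs at least two exons.
    """
    first, *internal, last = exons
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            yield (first, *kept, last)


exons = ["E1", "E2", "E3", "E4"]  # hypothetical four-exon gene
isoforms = list(candidate_isoforms(exons))
for iso in isoforms:
    print("-".join(iso))

# A gene with 8 exons has 6 internal exons, so 2**6 candidate combinations.
eight_exon_count = len(list(candidate_isoforms([f"E{i}" for i in range(1, 9)])))
print(eight_exon_count)
```

Under this model the count doubles with every additional internal exon, which is why a modest number of genes can, in principle, yield a far larger number of distinct transcripts.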
The final layering of information is methylation. Specific tissues can express their own typical constituent proteome based on which proteins are made accessible to the transcriptional process. Methylation patterns are laid down during tissue development and specialization, which very much resembles a phylogenetic process. During tissue specialization, tissues inherit their methylation patterns from the tissue from which they derive. This is (in part) how our tissues and organs express the correct proteins. We don’t want liver proteins, for example, expressed in the brain, or vice-versa. DNA methylation is a fine-tuned programming of cellular activity that even helps different parts of our body participate in and with the immune system (Morales-Nebreda et al., 2019).
We Are Our Imperfect Selves…
Transcription transfers information very accurately: the genetic information encoded in the DNA is copied into mRNA with an accuracy of about 99.9%. Translation is less faithful: the mRNA is translated into a protein with an accuracy of around 95-98%. This is surprisingly low, and it means that we are, on average (across proteins), something like 95-98% of what our genetic code predicts we are likely to be.
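If the two steps are treated as independent stages, the combined accuracy is simply the product of the per-stage figures quoted above. This is a rough sketch under that independence assumption; the function name is illustrative.

```python
def end_to_end_fidelity(*stage_accuracies: float) -> float:
    """Multiply per-stage accuracies, assuming the stages are independent."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result


TRANSCRIPTION = 0.999                          # figure quoted in the text
TRANSLATION_LOW, TRANSLATION_HIGH = 0.95, 0.98  # range quoted in the text

low = end_to_end_fidelity(TRANSCRIPTION, TRANSLATION_LOW)
high = end_to_end_fidelity(TRANSCRIPTION, TRANSLATION_HIGH)
print(f"combined accuracy: {low:.1%} to {high:.1%}")
```

Because the transcription figure is so close to 1, the result is dominated almost entirely by the translation step.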
The information fidelity of the transfer of genetic information between parents and offspring (humans) is quite high. The error rate of replication in mitosis and meiosis is very low, estimated to be around 1 in 10 million base pairs. The size of the human genome is also relatively large, with approximately 3 billion base pairs. Additionally, meiosis involves the duplication of a genome that has been through multiple rounds of mitosis before forming germline stem cells, which further reduces the chances of errors occurring during replication.
Finally, oocytes begin to form while females are still in utero, pausing in prophase of Meiosis I, while male gametes undergo Meiosis I only after puberty, which allows for additional time for errors to be corrected before gametes are formed. Overall, thanks to a variety of DNA repair enzymes and the low error rate of replication in mitosis and meiosis, the information fidelity of the transfer of genetic information between parents and offspring is quite high. But it’s not perfect.
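A quick calculation shows why "quite high" still leaves room for change. The sketch below takes the quoted rate of 1 error per 10 million base pairs at face value and assumes it applies uniformly to one 3-billion-base-pair copy of the genome per round of replication. Whether that rate is before or after repair is not specified here, so treat the output as an illustration rather than a measured value.

```python
GENOME_BP = 3_000_000_000   # genome size quoted in the text
BP_PER_ERROR = 10_000_000   # "1 in 10 million base pairs"


def expected_errors(n_bp: int, bp_per_error: int, rounds: int = 1) -> float:
    """Expected copying errors after a number of replication rounds,
    assuming a uniform, independent per-base error rate."""
    return n_bp * rounds / bp_per_error


per_round = expected_errors(GENOME_BP, BP_PER_ERROR)
after_ten = expected_errors(GENOME_BP, BP_PER_ERROR, rounds=10)
print(per_round, after_ten)
```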
Females are more closely related to their offspring than males
The female gamete (the egg) is formed in utero and remains dormant until fertilization, while the male gamete (the sperm) is formed after puberty and must travel to the egg for fertilization. This means that the female germ cells that lead to gametes have been dormant during childhood until puberty and oocyte formation, while the male germ cells have not. Male germline stem cells have undergone more rounds of genomic replication than those in females. As a result of the biological differences between male and female germline derivations and succession, female germ cells acquire far fewer mutations per year than male germ cells (0.74 and 2.7 mutations per year, respectively). This makes males the driver of diversity in evolution (no social or gender importance implications here).
As a result, the egg carries the genetic information the mother inherited from her own parents with fewer changes than the sperm carries the father's, making females slightly more closely related to their offspring than males.
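The per-year rates above can be turned into a rough comparison by parental age. This sketch assumes mutations accumulate linearly from birth with no starting baseline, which is a simplification introduced here for illustration; the function name is hypothetical.

```python
FEMALE_RATE = 0.74  # germline mutations per year, as quoted in the text
MALE_RATE = 2.7     # germline mutations per year, as quoted in the text


def germline_mutations(age_years: float, rate_per_year: float) -> float:
    """Mutations accumulated by a given age under a simple linear model."""
    return age_years * rate_per_year


for age in (20, 30, 40):
    female = germline_mutations(age, FEMALE_RATE)
    male = germline_mutations(age, MALE_RATE)
    print(f"age {age}: female ~{female:.0f}, male ~{male:.0f}")

ratio = MALE_RATE / FEMALE_RATE
print(f"male-to-female ratio: {ratio:.1f}")
```

Under this linear model the ratio between the two stays constant with age, while the absolute gap keeps growing.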
We Become Less and Less Like Ourselves As We Age
As we age, our tissues accumulate somatic mutations. The rate varies with tissue type. One study found that the number of mutations that accumulate in different tissues ranged from 9 substitutions per year in bile ductular cells to 56 substitutions per year in appendiceal crypts (Moore et al., 2021). Thus, we are (somatically) evolving away from our original selves every moment from the instant of our first cellular division following conception. This also means that some of our tissues - the ones that undergo more cellular divisions per unit of time - are, from a genomic standpoint, older than others. Overall, our intestines are among the oldest types of tissues, whereas satellite cells - the most abundant skeletal muscle stem cells - undergo the fewest divisions, making them the youngest types of tissues in humans.
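To see how far apart two tissues can drift, the two rates quoted above can be projected to a given age. This is only a sketch: it assumes a constant yearly rate per tissue over a lifetime, and the age of 50 is an arbitrary example chosen here.

```python
# Substitutions per year, as quoted from Moore et al. (2021) in the text.
SUBSTITUTIONS_PER_YEAR = {
    "bile ductular cells": 9,
    "appendiceal crypts": 56,
}


def somatic_burden(age_years: int) -> dict:
    """Accumulated substitutions per tissue, assuming a constant yearly rate."""
    return {tissue: rate * age_years
            for tissue, rate in SUBSTITUTIONS_PER_YEAR.items()}


burden_at_50 = somatic_burden(50)  # hypothetical example age
gap = burden_at_50["appendiceal crypts"] - burden_at_50["bile ductular cells"]
for tissue, count in burden_at_50.items():
    print(f"{tissue}: {count} substitutions")
print(f"difference between tissues: {gap}")
```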
Caloric Restriction Keeps Us Young
What drives our cells to divide? Metabolism. One of the determinants of how much metabolism we perform every 24 hours is the number of calories we consume. Burn, store, or excrete, we still have to use cellular metabolism to handle calories. Carbs, sugar, protein, fats… all sources of calories put our bodies to work somehow. As cells become “used up”, they are replaced by new cells derived from stem cells. Our chromosomes lose parts of their telomeres with every cell division, activating a type of built-in “self-destruct” mechanism leading to genomic instability in cells that have undergone large numbers of divisions. There are immunologic consequences that should also be noted. Metabolic conservation of self is real.
A phase 2, multicenter, randomized controlled trial published in The Lancet Diabetes & Endocrinology found that healthy, non-obese young and middle-aged adults (aged 21–50 years, BMI 22.0–27.9 kg/m²) who underwent 2 years of moderate calorie restriction had significantly reduced cardiometabolic risk factors. The findings point to the “potential for a substantial advantage for cardiovascular health of practicing moderate calorie restriction in young and middle-aged healthy individuals, and they offer promise for pronounced long-term population health benefits” (Kraus et al., 2019).
From an information theory standpoint, when you’re feeling off, you can rightly say “I’m probably not an accurate representation of myself today” - and you’d be accurate.
Feeling fuzzy today? Not quite yourself? You’re not alone. Every last one of us is an approximate self.
Learn more about biology and yourself with Dr. James Lyons-Weiler and other instructors @ IPAK-EDU!
Citations
Kraus WE, Bhapkar M, Huffman KM, Pieper CF, Krupa Das S, Redman LM, Villareal DT, Rochon J, Roberts SB, Ravussin E, Holloszy JO, Fontana L; CALERIE Investigators. 2 years of calorie restriction and cardiometabolic risk (CALERIE): exploratory outcomes of a multicentre, phase 2, randomised controlled trial. Lancet Diabetes Endocrinol. 2019 Sep;7(9):673-683. doi: 10.1016/S2213-8587(19)30151-2. Epub 2019 Jul 11. PMID: 31303390; PMCID: PMC6707879.
Manders F, van Boxtel R, Middelkamp S. (2021) The Dynamics of Somatic Mutagenesis During Life in Humans. Front Aging. 2:802407. doi: 10.3389/fragi.2021.802407. PMID: 35822044; PMCID: PMC9261377.
Morales-Nebreda L, McLafferty FS, Singer BD. DNA methylation as a transcriptional regulator of the immune system. Transl Res. 2019 Feb;204:1-18. doi: 10.1016/j.trsl.2018.08.001. Epub 2018 Aug 9. PMID: 30170004; PMCID: PMC6331288. https://pubmed.ncbi.nlm.nih.gov/30170004/
Moore L, Cagan A, Coorens THH, Neville MDC, Sanghvi R, Sanders MA, et al. (2021) The Mutational Landscape of Human Somatic and Germline Cells. Nature. 597(7876):381–386. doi: 10.1038/s41586-021-03822-7.