Genetic Information

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Genetic_Information

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Introduction to Genetic Information

Introduction

There is growing interest in the mathematical study of the succession of letters that represents the basic structure of a DNA strand. In molecular genetics, these letters are T, G, C and A, representing thymine, guanine, cytosine and adenine.

Electropherogram printout from an automated sequencer, showing part of a DNA sequence

DNA is an information molecule whose code is a sequence of deoxyribonucleotides, whereas messenger RNA (mRNA) is an information molecule whose code is a sequence of ribonucleotides. Structurally, mRNA is a single-stranded molecule in which uracil, lettered U, takes the place of the thymine found in double-stranded DNA.
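
The letter substitution described above can be sketched in a few lines of Python. This is a minimal sketch that assumes the input is the DNA coding (sense) strand written 5' to 3', so the mRNA carries the same letters with U in place of T; in the cell, the mRNA is actually synthesized from the complementary template strand. The function name `dna_to_mrna` is hypothetical.

```python
VALID_DNA = set("ACGT")

def dna_to_mrna(coding_strand: str) -> str:
    """Return the mRNA letters for a DNA coding strand (hypothetical helper).

    Assumes the input is the coding (sense) strand, 5' to 3', so the mRNA
    reads the same except that uracil (U) replaces thymine (T).
    """
    seq = coding_strand.upper()
    invalid = set(seq) - VALID_DNA
    if invalid:
        raise ValueError(f"not DNA bases: {sorted(invalid)}")
    return seq.replace("T", "U")

print(dna_to_mrna("GATCTAC"))  # GAUCUAC
```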

The letters in information molecules are normally written without gaps, and when they are taken three at a time they are called codons. With four possible bases at each of three positions, there are $4^3 = 64$ different codons. It is this coded sequence of letters, for example GATCTAC, that we refer to here as genetic information.
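
A short Python sketch makes the count of 64 and the three-at-a-time reading concrete. It assumes the sequence is read from its first letter (one of the three possible reading frames) and that leftover letters which do not complete a codon are dropped; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"

def all_codons(bases: str = BASES) -> list[str]:
    """Every ordered triple of bases: 4 ** 3 = 64 codons."""
    return ["".join(triple) for triple in product(bases, repeat=3)]

def split_codons(seq: str) -> list[str]:
    """Read a gap-free sequence three letters at a time from the first letter.

    Leftover letters that do not fill a complete codon are dropped.
    """
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

print(len(all_codons()))        # 64
print(split_codons("GATCTAC"))  # ['GAT', 'CTA']
```

In the example, the final C of GATCTAC is left over because it does not complete a third codon.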

Information in this case is about genes. It may also concern gene products, or our understanding of how inherited traits combine within families.

Modern study of genetic information includes information about carrier status and information derived from laboratory tests. These laboratory tests are often intended to identify mutations or changes in specific genes or chromosomes. Genetic information is gathered in a scientific study in three ways: through a series of physical medical examinations, through family histories and, lastly, through direct analysis of genes or chromosomes by detailed biometric processes.

The end result in the flow of genetic information is protein, that is, a functional molecule consisting of a sequence of amino acids.
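
The step from codons to amino acids can be sketched with the standard genetic code. This minimal Python sketch assumes the standard (nuclear) code, reading from the first letter of an mRNA and stopping at the first stop codon; other genetic codes, such as mitochondrial ones, differ. The table is written in the common compact form of 64 one-letter amino-acid codes, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code in compact form: the 64 one-letter amino-acid codes
# listed with each codon position running through U, C, A, G in that order.
# '*' marks the three stop codons (UAA, UAG, UGA).
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # a stop codon ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCCUAA"))  # MA
```

Here AUG codes for methionine (M), GCC for alanine (A), and UAA ends the chain.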

Overview of Sequence Statistics

General History of Statistics

The historical development of sequence statistics discussed in this chapter represents the common denominator of genetic biometry. From its onset at the end of the pre-dynastic period (about 3100 BCE), associated with Nilotic African graphemics, to the present genomic age, sequence statistics has treated probability as the core of information.

Among the southern Luo, one finds on this subject the proverb Obaro nyar Wasare, which is intended to conceal the meaning of probability and information research. Kwame Gyekye of Ghana points to the importance of many proverbs of the Akan people. This Nilotic proverb, however, is the origin of limiting relative frequency; it was often recited by wise people to reveal variation, frequency and estimation in living beings. In the work of Odhiambo Akoko (Paul Mboya 1953), Luo Kitgi gi Timbegi, for example, the science is revealed in the idea that one’s character and general behavior largely reflect the customs and traditions within which one operates.
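
In modern notation, limiting relative frequency defines the probability of an outcome A as the limit of $n_A/n$ as the number of trials $n$ grows. The Python sketch below illustrates the idea with a hypothetical, randomly generated DNA sequence in which each base is assumed equally likely (real genomes are not uniform): the relative frequency of A in longer and longer prefixes settles near 1/4. The function name and the fixed seed are illustrative choices.

```python
import random

def relative_frequency(seq: str, base: str) -> float:
    """Fraction of positions in seq occupied by the given base."""
    return seq.count(base) / len(seq)

rng = random.Random(42)  # fixed seed so the run is repeatable
sequence = "".join(rng.choice("ACGT") for _ in range(100_000))

# Relative frequency of A in growing prefixes of the same sequence.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency(sequence[:n], "A"))
```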

The proverb teaches about kit Wasare, concreted by rieko te in linear and circular order. It means the amount or measure of understanding, as well as the totality of what can be known to come from a parent, Wasare, in linear form and to be taught by a teacher to the daughter of Wasare ma adiera.

Similarly, in decoded Egyptian hieroglyphics, Wasare is Osiris or Ra, whose daughter, mesia, is ma adier, the truth: ma’at, who is nyar re, the daughter of Ra. In effect, from the Luo yud jaot (jaodi), which means finding a consort (Tehuty), and kido, which means heritability, come the pharaonic udjat, the “sound eye”, and khed, building a mathematical process with a living yardstick. It is from this living yardstick, through Latin metiri and Old French mesure, that we get the English word measure; biological mathematics as we know it today therefore ultimately comes from the Luo.

Genetic information relies heavily on modern conventional concepts of statistics and probability, expressed in symbols, to which we now turn.

Probability

History of probability and information

One of the ancient expressions of information, dealing with entropy, also deals with the relationship between information and the number of possible outcomes of an event.
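
One standard modern way to tie information to the number of possible outcomes is Hartley's measure: an event with $N$ equally likely outcomes carries $\log_2 N$ bits. The Python sketch below applies it to DNA, assuming all four bases are equally likely at every position, so a sequence of length $L$ has $4^L$ possible outcomes and carries $2L$ bits. The function names are hypothetical.

```python
import math

def possible_sequences(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length

def hartley_bits(n_outcomes: int) -> float:
    """Hartley information: log2 of the number of equally likely outcomes."""
    return math.log2(n_outcomes)

# One base, one codon, and a seven-letter sequence such as GATCTAC.
for length in (1, 3, 7):
    n = possible_sequences(length)
    print(length, n, hartley_bits(n))
# 1 4 2.0
# 3 64 6.0
# 7 16384 14.0
```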

The sound eye, udjat, shows it clearly.

Here, for example, we find the history of African ways of combining a system of two with a system of three compartments, applied by artists in graphic form.

Expressed in mathematical language, these parts of the destroyed eye of Horus result in six compartments that exhibit exponential equations of the form $y=ac^{bx}$, often written in the West with the Greek letter $\phi$ as $\phi(x)=ac^{bx}$, where $c$ is the base and in this case equals two. In other useful mathematical expressions the same $c$ can be 10 or, more often, $e \approx 2.71828$, the base of the natural logarithm. Why the logarithm?
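
As a worked illustration (the constants are a choice made here, not taken from the original sources), setting $a = 1$, $b = -1$ and $c = 2$ in the form above gives the halving sequence of the udjat parts. Taking the base-2 logarithm undoes the exponential and returns the exponent, which is one answer to "why the logarithm?". The last line shows how the same exponential is rewritten with base $e$.

```latex
\phi(x) = a\,c^{bx}\Big|_{a=1,\;b=-1,\;c=2} = 2^{-x},
\qquad \phi(1)=\tfrac12,\quad \phi(2)=\tfrac14,\quad \phi(3)=\tfrac18

\log_2 \phi(x) = -x
\quad\Longleftrightarrow\quad
x = \log_2 \frac{1}{\phi(x)}

a\,c^{bx} = a\,e^{(b \ln c)\,x}
```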

The ancient forms of the udjat parts are $2^{-1}$, $2^{-2}$, $2^{-3}$, and so on. Because the base 2 is raised to the powers $-1, -2, -3, \ldots$, which we call exponents, the exponents are the base-2 logarithms of the parts: $\log_2 2^{-n} = -n$. Together with their base, they yield unit fractions (fractions with numerator 1), $\tfrac12, \tfrac14, \tfrac18, \ldots$, each equal to the probability of one outcome among $2^n$ equally likely outcomes of an event, as exemplified by the udjat parts.
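
The udjat parts can be listed exactly with Python's standard `fractions` module, together with their base-2 logarithms (the exponents) and the number of equally likely outcomes each part corresponds to. Taking six parts follows the six compartments mentioned earlier; the variable names are illustrative. The six parts sum to 63/64, one sixty-fourth short of a whole.

```python
import math
from fractions import Fraction

# The six udjat parts 2**-1 ... 2**-6 as exact unit fractions.
parts = [Fraction(1, 2 ** n) for n in range(1, 7)]

for part in parts:
    # The exponent is the base-2 logarithm of the part, e.g. log2(1/8) = -3.
    # The part is also the probability of one outcome among
    # part.denominator equally likely outcomes.
    print(part, math.log2(part), part.denominator)

total = sum(parts)
print(total)  # 63/64
```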

The hieroglyph P for this particular event was initially represented in art by a grid of 64 squares, each small square having unit area. Of all 64 squares, the debate centered on one, an outcome of probability $1/64$. The information therefore varied inversely with probability, as it does in modern communication theory, where an outcome of probability $P$ carries $\log(1/P)$ units of information. Words such as magic and beauty were employed to mean infinity and outcome. The area study was called seked, S, from which the integral symbol S in calculus would be derived.
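
The inverse relation between information and probability can be made concrete with the modern definition of self-information, $\log_2(1/P)$ bits for an outcome of probability $P$. This Python sketch assumes base-2 logarithms (bits) and treats the single square singled out of the 64-square grid as an outcome of probability 1/64; the function name is hypothetical.

```python
import math

def self_information(p: float) -> float:
    """Information in bits of an outcome with probability p: log2(1/p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p)

# One square singled out of a 64-square grid of equal unit areas.
print(self_information(1 / 64))  # 6.0
# Larger probability, less information; a certain outcome carries none.
print(self_information(1 / 2))   # 1.0
print(self_information(1.0))     # 0.0
```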

Since ancient times, the weighted information of each class has always involved multiplying $\log(1/P_i)$ by the probability $P_i$ of that class.
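
Summing these weighted terms over all classes gives the average information per outcome, which modern communication theory calls Shannon entropy, $H = \sum_i P_i \log_2(1/P_i)$. The Python sketch below assumes base-2 logarithms (bits) and, as one example of classes, uses the base composition of a DNA sequence; the function names are hypothetical.

```python
import math
from collections import Counter

def weighted_information(probabilities: list[float]) -> float:
    """Sum over classes of P_i * log2(1 / P_i), i.e. Shannon entropy in bits.

    Classes with probability 0 contribute nothing and are skipped.
    """
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

def base_composition_entropy(seq: str) -> float:
    """Weighted information of the base frequencies observed in seq."""
    counts = Counter(seq)
    total = len(seq)
    return weighted_information([c / total for c in counts.values()])

print(weighted_information([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(weighted_information([0.5, 0.25, 0.25]))         # 1.5
print(base_composition_entropy("GATCTAC"))
```

Four equally likely bases give the maximum of 2 bits; any imbalance in the composition lowers the weighted information.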

Properties of DNA

Fundamental Properties

Two fundamental properties of the DNA molecule are often studied with sequence statistics: local properties and long-range properties. These properties are usually analyzed in terms of displaced correlations at the nucleotide and batch levels. The levels are then studied as statistical models involving spectral analysis, in which they are noted to exhibit entropy.
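
As a sketch of what a displaced correlation and a level-wise entropy can look like, the Python code below assumes two common choices that the text does not specify: the autocovariance, at displacement $k$, of a 0/1 indicator sequence marking one base, and the Shannon entropy of overlapping words (blocks) of length $n$, used here as one possible reading of the batch level. Both function names are hypothetical, and the spectral analysis step (for example a Fourier transform of the indicator sequence) is not shown.

```python
import math
from collections import Counter

def displaced_correlation(seq: str, base: str, k: int) -> float:
    """Autocovariance at displacement k of the 0/1 indicator for one base.

    x_i = 1 where seq[i] == base, else 0; returns
    mean(x_i * x_{i+k}) - mean(x)**2 over the pairs that fit in seq.
    """
    x = [1 if b == base else 0 for b in seq]
    pairs = len(x) - k
    if k < 1 or pairs < 1:
        raise ValueError("displacement must be between 1 and len(seq) - 1")
    mean_x = sum(x) / len(x)
    joint = sum(x[i] * x[i + k] for i in range(pairs)) / pairs
    return joint - mean_x ** 2

def block_entropy(seq: str, n: int) -> float:
    """Shannon entropy in bits of overlapping words (blocks) of length n."""
    words = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(words.values())
    return sum((c / total) * math.log2(total / c) for c in words.values())

s = "ATATATAT"
print(displaced_correlation(s, "A", 1))  # -0.25 (A never follows A)
print(displaced_correlation(s, "A", 2))  # 0.25  (A recurs every 2 sites)
print(block_entropy("ACGTACGT", 1))      # 2.0
```

A positive value at displacement $k$ means the base tends to recur $k$ sites apart; scanning $k$ from small to large values separates local structure from long-range structure.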

Local-Range

Long-Range

Glossary

codon: a sequence of three consecutive nucleotides in a strand of DNA or RNA that codes for a specific amino acid or for a stop signal.