Genetics for Computer Scientists

Aug 1, 2021

Genetics and epigenetics are a fascinating area of scientific discovery that computer scientists should be paying attention to. I have no formal training in biology or genetics, yet one afternoon of crash courses on DNA, RNA and the cell cycle gave me a fundamental understanding of this science. I wrote down some new insights comparing biological and man-made technology. I have not seen this perspective anywhere else and wanted to share it.

Computers vs Biology

Let’s contrast man-made computers with biology.

Computer Technology

Highly centralized: one “copy” of the software runs at a time (somewhat oversimplified, but true in the abstract)

Has access to all information in the system at all times (through a very wide bus and caches)

Light use of composition (a single “routine” that performs thousands of functions)

Self-replication, self-repair, and evolutionary capability are nonexistent, or a last resort

Weak virus/intrusion detection: a last resort, or nonexistent

Information is encoded using just one building block type: the electron

Advanced compression and random access to information

Utilizes electrons exclusively for communication and storage.

Biological Organism Technology

Highly decentralized. An animal contains trillions of copies of the source code (nearly every cell carries a complete copy of the DNA)

Runs trillions of “threads” (cells) in parallel

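Performs an enormous number of different functions (the human genome holds roughly 20,000 protein-coding genes, and alternative splicing and chemical modifications produce many more distinct protein variants from them)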

Each cell has access to very limited information, mostly through chemical signals and physical contact with neighboring cells; some cells, such as neurons, also communicate with electrical signals

Very heavy use of composition (a vast number of “routines”, each performing one highly specialized task)

Self-replication, self-repair, and evolutionary capability are primary functions

Highly advanced anti-virus/intrusion detection, prevention and destruction

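Information is stored using chemistry: a four-letter alphabet of nucleotide bases (A, C, G, T), built mostly from carbon, hydrogen, oxygen and nitrogen (CHON) plus phosphorus in the DNA backbone, with electrical signals used for some transmission.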

Very little compression; mostly sequential access. Two exceptions I identified are (1) DNA: bidirectional encoding in the double helix, where both strands can carry genes, read in opposite directions, and (2) protein folding: a linear chain of amino acids folding itself into a 3D structure, which is mind-bogglingly complex.

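Utilizes CHON-based molecules primarily for storage and communication, with the exception of the nervous system, which also signals electrically, mostly by moving ions such as sodium and potassium rather than electrons.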
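DNA's information is carried by a four-letter alphabet of bases (A, C, G, T), which makes it digital in a very literal sense: each base is worth exactly 2 bits. The sketch below treats a strand as a plain A/C/G/T string. That is a simplifying assumption, since it ignores chemically modified bases and all 3D structure. It packs the sequence into 2 bits per base and computes the partner strand, the reverse complement, which is what the "read in opposite directions" remark about the double helix amounts to. The bit assignment and function names are illustrative choices, not a standard file format.

```python
# A minimal sketch: DNA as a base-4 code, 2 bits per base.
# The bit assignment below is an arbitrary, illustrative choice.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed form."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # lowest 2 bits = last base
        value >>= 2
    return "".join(reversed(bases))


def reverse_complement(seq: str) -> str:
    """The partner strand, read in its own direction (opposite to seq)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


strand = "ATGCGTAC"
packed = pack(strand)
print(unpack(packed, len(strand)))  # ATGCGTAC
print(reverse_complement(strand))   # GTACGCAT
```

The length has to be passed to `unpack` because leading A bases pack to zero bits and would otherwise be lost.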

Traits in Common

Both are, in fact, highly evolutionary, meaning they are the results of many experiments. Very little data science is truly deterministic: millions of computer engineers have performed millions of controlled experiments to arrive at the solutions we use today. Biology performs trillions of experiments through mutation, and natural selection keeps the ones that work.
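This "experiments through mutation" loop is exactly what evolutionary algorithms borrow from biology. Below is a minimal (1+1) evolutionary algorithm on a classic toy problem, OneMax, where the goal is to maximize the number of 1 bits. The fitness function, genome length, and seed are arbitrary illustrative choices. Also, unlike this sketch, real evolution has no fixed target.

```python
import random


def fitness(genome):
    """Toy fitness function: the number of 1 bits (the OneMax problem)."""
    return sum(genome)


def evolve(length=32, max_generations=10_000, seed=0):
    """(1+1) evolutionary algorithm: mutate one parent, keep the child if no worse."""
    rng = random.Random(seed)
    mutation_rate = 1 / length
    parent = [rng.randint(0, 1) for _ in range(length)]
    history = [fitness(parent)]
    for generation in range(1, max_generations + 1):
        # Mutation: flip each bit independently with probability 1/length.
        child = [1 - bit if rng.random() < mutation_rate else bit for bit in parent]
        # Selection: keep whichever variant fits at least as well.
        if fitness(child) >= fitness(parent):
            parent = child
        history.append(fitness(parent))
        if fitness(parent) == length:
            return parent, generation, history
    return parent, max_generations, history


best, generations, history = evolve()
print(f"fitness {fitness(best)} of 32 after {generations} generations")
```

No step in the loop "knows" the answer. Random variation plus a keep-what-works filter is enough to climb to the optimum.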

Complexity

Biological organisms are highly constrained. Any individual process is quite simple and self-contained. The complexity comes in two forms: (1) while each process is simple, the sheer number of processes is absurdly large, and (2) observing biology is awkward, to say the least, because it happens at a microscopic scale.

A computer scientist should have no trouble understanding individual processes but will find it challenging to learn enough of them to build a useful base of knowledge. The language of genomics borrows little from everyday English, so memorizing the terminology will be challenging. Finally, the actual study of genomics requires an expert who can think in a microscopic world while using macroscopic tools.

In essence, I believe that getting from zero to the state of the art in biology and genomics is a rapid process, while advancing the science is incredibly challenging. Computer science is the opposite: getting from zero to the state of the art is very challenging, but advancing it is relatively easy.

Electrons are a wonderful thing. As a building block, they enable rapid iteration compared to the atoms that biological organisms are limited to. Electrons are subatomic (a size advantage) and require very little energy to move (there are no chemical bonds to break).

Common Misconceptions About DNA

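Let’s talk about DNA. It’s often described as “the source code” of the human body, but this isn’t quite right. DNA by itself does nothing. That’s because DNA is not a single “source code” that can be “executed”. Rather, it’s a library of roughly 20,000 protein-coding genes plus a great deal of regulatory and noncoding sequence. Even a single “gene” is not ready to run: in humans, most genes are split into coding snippets (exons) separated by noncoding stretches (introns), and the cell has to copy the gene and stitch the useful snippets together before anything can be built from it.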

Let’s review what you need.

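1. The full DNA chain (about 6 billion base pairs in a typical human cell, split across 46 chromosomes)
2. A “gene”: a stretch of that DNA, commonly around 27,000 base pairs long and up to more than 2 million
3. A mechanism to copy the gene into RNA, then “select” and “stitch” together its useful segments (exons) while cutting out the rest (introns)
4. Machinery (the ribosome) that reads the finished RNA copy and produces a molecule, a protein, by literally chaining together amino acids, which are built mostly from C-H-O-N atoms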
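The pipeline above, from gene to finished protein, can be sketched in a few lines. This is a toy model with heavy simplifying assumptions. The 50-base gene is made up, the exon coordinates are hard-coded instead of being found from the splice signals a real cell recognizes, and the codon table contains only the handful of standard-code entries the example needs. It copies the gene into RNA, splices the exons together, and translates the result three bases (one codon) at a time into amino acids.

```python
# Toy gene on the coding strand: exon, intron, exon, intron, exon.
# The introns follow the common GT...AG pattern, but the gene itself is made up.
GENE = (
    "ATGTTTGGC"     # exon 1   (positions 0-8)
    "GTAAGTCTCCAG"  # intron 1 (positions 9-20)
    "GCTAAATGG"     # exon 2   (positions 21-29)
    "GTATGTTTTAG"   # intron 2 (positions 30-40)
    "GAACTGTAA"     # exon 3   (positions 41-49)
)
EXONS = [(0, 9), (21, 30), (41, 50)]  # half-open [start, end) coordinates

# Only the standard-code entries this example needs ("*" marks a stop codon).
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "GCU": "A", "AAA": "K",
    "UGG": "W", "GAA": "E", "CUG": "L", "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_strand):
    """RNA copy of a gene: same sequence as the coding strand, with U for T."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna, exons):
    """Keep the exons, in order, and drop everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Read codons (3 bases) from the start until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = splice(transcribe(GENE), EXONS)
protein = translate(mrna)
print(mrna)     # AUGUUUGGCGCUAAAUGGGAACUGUAA
print(protein)  # MFGAKWEL
```

The output letters are the standard one-letter amino acid codes (M = methionine, F = phenylalanine, and so on). Skipping the splice step would feed intron bases into the ribosome and produce garbage, which here shows up as a codon missing from the toy table.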

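The mechanism in (3) is transcription and splicing: an enzyme (RNA polymerase) copies the gene into RNA, and the spliceosome, a machine built from RNA and proteins, cuts out the introns and joins the exons. Which genes get copied in the first place is controlled by the epigenetic system: chemical marks on the DNA and on the proteins it is wrapped around, plus regulatory RNAs. Unlike software, which uses highly static code (self-modifying code has existed for decades but is rarely used), gene expression is highly dynamic. A skin cell on a warm sunny day switches on a slightly different set of genes than a skin cell on a cold winter day. The epigenetic system responds to the environment, and alternative splicing lets the same gene be stitched together from different selections of exons, producing related but different proteins from one stretch of DNA.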
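Alternative splicing is the part of this process that most resembles composition in software: one gene, several assembled products. The minimal sketch below assumes a hypothetical four-exon gene where the first and last exons are always kept and the middle two may be included or skipped. It ignores the regulatory proteins and RNA signals that make these decisions in a real cell, and simply enumerates the possible mRNAs.

```python
from itertools import combinations

# Hypothetical gene with four exons (6 bases each, so skipping an exon
# keeps the reading frame intact). E1 and E4 are always kept; E2 and E3
# are "cassette" exons that can be included or skipped.
EXONS = {"E1": "AUGGCU", "E2": "AAAGAA", "E3": "UUUGGC", "E4": "UGGUAA"}
FIRST, LAST = "E1", "E4"
OPTIONAL = ["E2", "E3"]


def isoforms():
    """Yield (exon names, mRNA) for every in-order choice of optional exons."""
    for k in range(len(OPTIONAL) + 1):
        for chosen in combinations(OPTIONAL, k):
            names = [FIRST, *chosen, LAST]
            yield names, "".join(EXONS[name] for name in names)


for names, mrna in isoforms():
    print("-".join(names), mrna)
```

This prints four different mRNAs (E1-E4, E1-E2-E4, E1-E3-E4 and E1-E2-E3-E4) from the same stretch of DNA. With n optional exons, the count grows as 2^n.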

This interplay is where chemistry, biology, genetics, and physics come together to produce Life with a capital L.

Is genetics digital, or analog?

The lines here start to get blurry, and there is no right or wrong answer. All living organisms, from single-celled organisms to complex mammals, have machinery that assembles molecules from C-H-O-N atoms. We can go a step further and say this machinery both disassembles (breaking down food with the oxygen we breathe) and reassembles (chaining amino acids into proteins that fold into shape).

This machinery solves problems in materials science (stretchy skin, hard bone) as well as logic processing (enzymes that can cut, insert, and copy DNA segments).

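Computers are not so different. At the physical level they also operate on atoms and electrons, and the voltages inside a chip are analog quantities; “digital” is an abstraction created by treating any voltage above a threshold as 1 and any voltage below it as 0. The lines between digital and analog become very blurry at the subatomic scale.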
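The digital-on-analog point can be made concrete: a bit is an interpretation imposed on a continuous physical quantity. In this sketch, the nominal 0 V and 1 V levels, the ±0.3 V noise range, and the 0.5 V threshold are all made-up illustrative values, not figures for any real logic family.

```python
import random


def read_bits(voltages, threshold=0.5):
    """The digital abstraction: any voltage above the threshold counts as 1."""
    return [1 if v > threshold else 0 for v in voltages]


rng = random.Random(42)
sent = [1, 0, 1, 1, 0, 0, 1, 0]
# Physically, each bit is an analog voltage: nominal 0.0 V or 1.0 V,
# plus noise that always stays inside the 0.5 V margin to the threshold.
received = [bit + rng.uniform(-0.3, 0.3) for bit in sent]
print([round(v, 2) for v in received])
print(read_bits(received) == sent)  # True
```

The received voltages are never exactly 0 or 1, yet the decoded bits are exact, because the noise never crosses the threshold. If the noise could exceed the 0.5 V margin, bits would start to flip.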

Epigenetics and Aging

What would happen if the epigenetic system that “selects” which snippets of DNA a cell uses was damaged? It would select the wrong segments. Your skin cell “program” would start using genes meant for the liver, or for bone. Recent research from David Sinclair and his colleagues suggests this is what happens as we age: in their view, aging is driven less by damage to the DNA sequence itself and more by the gradual loss of epigenetic information, the marks that tell each cell which genes to use. This makes sense, since the epigenetic system is designed to be highly dynamic and influenced by the environment.
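A toy model shows how random damage to the "selection" layer erodes a cell's identity even when the underlying sequence is untouched. Everything in the sketch below is a hypothetical simplification, not a model from the aging literature. Each cell type is reduced to an on/off mark for each of 100 genes, marks flip at random with a made-up yearly rate, and identity is measured as the fraction of marks that still match.

```python
import random

N_GENES = 100
rng = random.Random(7)

# Hypothetical on/off "marks" for two cell types: which genes each keeps active.
SKIN = [rng.random() < 0.3 for _ in range(N_GENES)]
LIVER = [rng.random() < 0.3 for _ in range(N_GENES)]


def similarity(a, b):
    """Fraction of genes whose on/off state matches between two profiles."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def age(profile, years, flip_rate, rng):
    """Each year, every mark independently flips with probability flip_rate."""
    cell = list(profile)
    for _ in range(years):
        cell = [not on if rng.random() < flip_rate else on for on in cell]
    return cell


old_skin = age(SKIN, years=80, flip_rate=0.005, rng=rng)
print(f"old skin vs young skin: {similarity(old_skin, SKIN):.2f}")
print(f"old skin vs liver:      {similarity(old_skin, LIVER):.2f}")
```

With these made-up parameters, each mark has roughly a 28% chance of ending up flipped after 80 years, so the old cell's profile drifts away from "skin" even though the `SKIN` list it started from, standing in for the DNA, never changed. The model says nothing about which specific wrong genes switch on in real cells.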

Wrapping it up: Pop Quiz

Could you rebuild a perfect replica of a human being from a dead cell?

Answer: Yes and no. Assuming the DNA in that cell was intact, which it very well could be, you’d end up growing a human with your exact DNA sequence, all 6 billion base pairs. However, the clone’s epigenetic system could produce a very different individual. If your clone went through a famine at a young age, they would likely grow weaker and shorter, with a smaller brain, because the epigenetic system reallocates resources to critical functions and reduces the need for energy (a smaller body needs less food energy to survive).

Alive, Digital, or Dead?

If you’ve read this far, you may start asking: what does it mean to be alive? At the molecular level, we are mostly carbon, hydrogen, oxygen and nitrogen atoms bonded together chemically, interacting with each other to produce more cells, and reacting to the environment by changing how the hard-coded DNA sequence is used to produce them. To me, this world looks highly digital, with no clear indication of “life”, or even “biology”, to differentiate us from a computer, a tree, or a rock.

The difference between a rock and a living organism is that the rock’s atoms and molecules haven’t developed the ability to clone themselves, which is where the entirety of “Life” originates.