Human Genome Sequence and Its Unexpected "Small" Size?
The
Human Genome Project (HGP) was launched in 1990
with the goal of obtaining a highly accurate sequence of the vast majority of
the euchromatic portion of the human genome. The initial work followed a
two-pronged approach: (1) the mapping of the human
and mouse genomes to allow the study of inherited disease and provide
a crucial scaffold for genome assembly; and (2) the
sequencing of organisms with smaller, simpler genomes to serve as a
testbed for method development and assist in interpreting the human genome. With
success along both paths, the sequencing of the human genome itself eventually
became feasible. The
International Human Genome Sequencing Consortium (IHGSC), an open
collaboration involving twenty genome centers in six different countries, was
formed to carry out this component of the HGP.
The
sequence of the human genome encodes the genetic instructions for human
physiology, as well as rich information about human evolution. In
2001, the International Human Genome Sequencing
Consortium and Celera Genomics each reported a
first draft sequence of the
euchromatic portion of the human genome.
In April 2003, the IHGSC published a complete draft of the sequence
in the April 24 issue of the journal Nature. The issue coincided with
the 50th anniversary of Nature's publication of the landmark paper by
Nobel Laureates James Watson and Francis Crick that described DNA's
double helix.
Since then, the international collaboration has worked to convert
these drafts into a genome sequence with high accuracy and nearly complete
coverage. The IHGSC reported in
Nature on October 21, 2004, the result of this finishing process. The
current genome sequence (Build 35) contains 2.85 billion nucleotides
interrupted by only 341 gaps not yet sequenced. It covers about 99% of
the euchromatic genome, with an error rate of about 1 nucleotide per
100,000 bases sequenced. Many of the remaining euchromatic gaps are
associated with segmental duplications and will require focused work
with new methods.
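To put the Build 35 figures in perspective, the short Python sketch below works out what the stated error rate and gap count imply in absolute terms. It uses only the numbers quoted above; the function names and the "bases per gap" summary are illustrative assumptions, not quantities reported by the IHGSC.

```python
# Figures quoted in the text for Build 35 (Nature, 21 October 2004).
SEQUENCED_BASES = 2_850_000_000   # 2.85 billion nucleotides
REMAINING_GAPS = 341              # gaps not yet sequenced
BASES_PER_ERROR = 100_000         # about 1 error per 100,000 bases


def expected_errors(total_bases: int, bases_per_error: int) -> int:
    """Expected number of erroneous bases at the stated error rate."""
    return total_bases // bases_per_error


def mean_bases_per_gap(total_bases: int, gaps: int) -> float:
    """Average number of sequenced bases per remaining gap.

    Hypothetical summary statistic for illustration only; real gaps
    are not evenly spaced along the chromosomes.
    """
    return total_bases / gaps


if __name__ == "__main__":
    print("Expected erroneous bases:",
          expected_errors(SEQUENCED_BASES, BASES_PER_ERROR))
    print("Mean sequenced bases per gap:",
          round(mean_bases_per_gap(SEQUENCED_BASES, REMAINING_GAPS)))
```

At the quoted rate, the whole sequence would be expected to contain on the order of 28,500 erroneous bases, and the 341 gaps correspond to one gap per roughly 8.4 million sequenced bases on average.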
The near-complete sequence, the first for a vertebrate,
greatly improves the precision of biological analyses of the human genome,
including studies of gene number and of gene birth and death. Notably,
the human genome seems to encode only 20,000–25,000
protein-coding genes.
a COMPARATIVE FIGURE
Because some 100,000 proteins were suspected to be involved in cell function, the human genome was also expected to contain about 100,000 genes. The first draft of the genome sequence was found to have about 35,000 genes. The most complete sequence to date (Build 35) has at most about 25,000 genes.
Nature 431, 931 - 945 (21 October 2004); doi:10.1038/nature03001
copyright - c.mallery - Oct.2004