The Genome and Genome Evolution:
   - GENOME             is an organism's complete set of genes (all of its DNA), containing all the information needed to build that organism.
   - GENOMICS           studies whole sets of genes and their interactions within and between species.
   - BIOINFORMATICS     is the application of computer methodologies to analyze vast amounts of genomic data.

     What can GENOMIC ANALYSES tell us?       Sequencing can tell us what genes can make

       "Sequencing of the Human genome is comparable to the establishment of the
                    Periodic Table of elements by a Russian chemist, Dmitri Mendeleev, 1869".
   Genomics began with the Human Genome Project, a 15-year 'JFK moon-like' project...
           HGP was initiated in 1990 by NIH & DOE and involved 20 sequencing labs in 6 countries.
           A 1st sequence was announced in 2003 and a more complete sequence was published in 2006.
Genome Sequencing Procedures:     
       1.  Frederick Sanger (Cambridge U.) in 1977 produced the 1st complete sequence of a viral
                 genome, that of phage ΦΧ-174 (5,375 nucleotides), followed by the human mitochondrion with ~17K np's.
       2.  Sanger's dideoxy* method* of "Chain Termination" -->  reading*  +   sequencing labs*
                 uses terminator nucleotides (dideoxynucleotides), which randomly stop the action of
                 polymerase when they are incorporated into the growing chain, marking its end.
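
A toy sketch of the chain-termination idea, not the lab protocol: each simulated copy of the new strand stops at a random position where a terminator is incorporated, and sorting the fragments by length (as a gel does) lets the sequence be read off from the terminal base of each fragment. The template, the stop probability, and the function names are assumptions made up for illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination_fragments(template_3_to_5, n_copies=2000, p_stop=0.1, seed=0):
    """Simulate many terminated copies; return {fragment_length: terminal_base}.

    The template is written 3'->5', so its complement read left to right
    is the new strand in the 5'->3' direction.
    """
    rng = random.Random(seed)
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    fragments = {}
    for _ in range(n_copies):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < p_stop:      # a terminator nucleotide went in here
                fragments[length] = base   # the chain ends with this base
                break
    return fragments

def read_sequence(fragments):
    """Read the new strand by ordering fragments from shortest to longest."""
    return "".join(fragments[length] for length in sorted(fragments))

if __name__ == "__main__":
    frags = chain_termination_fragments("ATGCCGTA")
    print(read_sequence(frags))
```

With enough copies, every possible fragment length shows up at least once, which is why the full complementary sequence can be recovered.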

              Strategy   the Shotgun procedure*    &     Celera Genomics shotgun - video*       
                  random DNA fragments (500-800 n long, from a library) are sequenced using automated
                  sequencing machines & then ordered relative to each other via their overlaps & supercomputing.
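
A minimal sketch of how overlapping reads can be ordered, assuming a simple greedy merge of the best suffix/prefix overlap. This is a teaching toy, not the assembler used by Celera or the HGP; the reads and the `min_len` threshold are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))          # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break                               # nothing overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

if __name__ == "__main__":
    print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

Real genomes contain repeats longer than a read, which is one reason actual assemblers need far more than this greedy rule.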

Next Generation Sequencing, then MinION sequencing*
MinION portable DNA/RNA sequencer (~$1,000 to $50K+)*








Timeline:  a 1st draft of the Human genome sequence was announced* in June 2000 and
                   in 2003 (13th yr) the 1st human genome sequence was published (cost = ~ $3 billion).

                   2007 - Craig Venter's genome (dideoxy method)  ~  $7mil ($10 per 1,000 bases).
                   2008 - James Watson's genome  ~  $1mil  (20 per 1,000 bases)
                   2009 - an eBay bidder paid $68,000 for sequencing his genome.

  Costs:     NHGRI cost estimates: -   per million bases*     &    per genome    [table of costs]

$1,000 genomes and Genome Testing?  =  
                   Genealogy Testing* = Your Ancestry DNA + 23and me   &   future of medicine* 
                            Medical whole genome analysis    &    Novogene Sequencing
                            Even a Human fetal genome has been sequenced         

  27 May, 2021:  
A more Complete Human Genome Sequence*

  10 May 2023:    a draft Human Pangenome reference*
Genome Sequencing Standards  -->  NIST Reference Standards

  Organisms:    Early list of organisms whose genomes are sequenced*
  Databases:    GenBank - an NIH database of publicly available DNA sequences

Nat'l. Ctr. Biotech. Information Database video




What we've learned from sequencing the Human Genome...
     Using methods for cloning DNA fragments, automated DNA sequencing techniques,
     & computer algorithms to piece the fragments together, the entire genome sequences of humans,
     many viruses, bacteria, archaea, yeast, C. elegans, Drosophila, & mouse are now known.

          Surprising Size Estimates of active Human Genome*
             Originally estimated to be 100,000 genes,
                                 current estimates list only some 20,687 protein coding loci.
             Comparison of genome sizes for complexity in model organisms
             the Human Genome differs by only 0.1% from person to person.

The number of Human genes will depend upon how we DEFINE a "GENE"

Some Definitions of a Gene include
   Mendel's Particles... unit of heredity responsible for phenotype
           the term Gene was coined by Wilhelm Johannsen (1909) to describe whatever it was that parents
           passed to offspring to develop the same traits (a definition completely free of any hypothesis).

   Morgan's Loci... he placed genes on a chromosome, i.e.,
           a gene is a cellular entity that is part of a chromosome & is mappable to a gene LOCUS.
   Watson & Crick... sequence of specific nucleotides along length of double helical DNA

Molecular Definition...
            length:  1 nucleotide = 0.34 nm,   thus a tRNA of 81n  =  81 x 0.34 nm  = 27.5 nm
            mass:    1 nucleotide = 340 amu,  thus a tRNA of 81n  =  81 x 340 amu = 27,540 amu
            base %, ex:     20% A  :   20% T   :   30% G   :   30% C      [A:T  &  G:C]
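
The length, mass, and base-pairing arithmetic above can be checked with a few lines of Python. This is a sketch using only the per-nucleotide figures given in these notes (0.34 nm and 340 amu); the function names are illustrative.

```python
NM_PER_NUCLEOTIDE = 0.34    # rise per nucleotide along the helix
AMU_PER_NUCLEOTIDE = 340    # average nucleotide mass

def length_nm(n_nucleotides):
    """Approximate length of a nucleic acid n nucleotides long."""
    return n_nucleotides * NM_PER_NUCLEOTIDE

def mass_amu(n_nucleotides):
    """Approximate mass of a nucleic acid n nucleotides long."""
    return n_nucleotides * AMU_PER_NUCLEOTIDE

def base_composition(percent_a):
    """For double-stranded DNA, A pairs with T and G pairs with C,
    so %A = %T and %G = %C, and the four add up to 100%."""
    percent_g = (100 - 2 * percent_a) / 2
    return {"A": percent_a, "T": percent_a, "G": percent_g, "C": percent_g}

if __name__ == "__main__":
    print(round(length_nm(81), 1), "nm")
    print(mass_amu(81), "amu")
    print(base_composition(20))
```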

  a Functional GENE Definition...   a DNA sequence coding for specific polypeptide** 


But any definition of a gene would also need to include...

   Split Genes...  presence of Introns & Exons* - eukaryotic genes contain non-coding segments
                              (introns) and coding segments (exons - that make proteins)
   Other DNA pieces... any definition of a gene should also include:
                              DNA segments that code for rRNA,  tRNA,  snRNP's,  miRNA's  &
                              also promoters,  enhancer segments,  regulator genes,  operator sites?
   Pseudogene...  mutated DNA segments, no longer making a protein [≈ 10-20K segments];
                              were once active, but evolution has made them effectively inactive now.
   Non-coding DNA... 98% of Human genome does not code* for protein (regulatory functions?).

     ENCODE Project - (ENCyclopedia Of DNA Elements) is a long term [begun 2003]
                 research effort involving 440 researchers in 32 research labs that has produced 30 papers and
                 1,600 data sets describing the functions of many DNA pieces; it has
                 identified protein coding genes, non-coding RNAs, regulatory genes, enhancers,
                 and promoters, along with DNA and histone modifications.

   the most commonly used definition of a GENE...
          ... traditionally... traces back to a mRNA transcript & back to its DNA...
        ... the smallest unit for an inherited trait, a gene --> a COLLECTION of EXONS.
             or    "a segment of DNA corresponding to a single protein
                                    (or set of alternate protein variants)
                          or a single catalytic or structural RNA molecule"







Genome Organization...   differs between Prokaryotes and Eukaryotes*  

                        genes DNA DNA2 chromatin

Size of Human genome:
        3 billion+ base pairs, equaling some 500,000 pages of the journal Nature.

              yet, there are only about 20,687 protein coding genes.
              in fact, only about 1.5% of the genome codes for proteins, i.e., about 45,000,000 bases
            Gene Density* is less in Humans than in many species.
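
A quick check of these size figures, as a sketch: it takes the rounded numbers from these notes (3 billion base pairs, 1.5% coding, 20,687 protein coding genes) and works out the number of coding bases and the gene density. Note that 1.5% of 3 billion comes to 45 million bases. The function names are illustrative.

```python
GENOME_BP = 3_000_000_000        # "3 billion+ base pairs", rounded down
CODING_PERCENT = 1.5             # share of the genome that codes for protein
PROTEIN_CODING_GENES = 20_687

def coding_bases(genome_bp, coding_percent):
    """Number of bases that code for protein."""
    return round(genome_bp * coding_percent / 100)

def genes_per_megabase(genes, genome_bp):
    """Gene density: genes per million base pairs."""
    return genes / (genome_bp / 1_000_000)

if __name__ == "__main__":
    print(coding_bases(GENOME_BP, CODING_PERCENT), "coding bases")
    print(round(genes_per_megabase(PROTEIN_CODING_GENES, GENOME_BP), 1), "genes per Mb")
```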

        the chromosome*      How genes are represented on chromosome maps*View@Home   
              the structural organization of the genome in eukaryotes influences gene expression.
               - some unique human DNA folded structures may regulate gene action (i-motifs vs. hairpins)
               - packing & unpacking of chromosomal DNA by special proteins generates loops & coils & opens
                 it to be copied, repaired, or expressed into proteins.


       Types of Human DNA*    Mobile Genetic Elements*        Human Genome Statistics*
       Gene models for studying Human genetic diseases
        U.M. Biology Core Zebrafish Facility















BIOINFORMATICS...   the application of statistics and information theory, computational methodologies
                              and algorithms to store and analyze biological data.

        BIOINFORMATICS*View   is a way to decipher DNA*View
                                                to understand life's diversity & genetic diseases.  

       The vast amount of genomic sequence data is stored & organized in 2 data banks:
       the GenBank at the NIH in Bethesda, MD and the
       EMBL Sequence Database at the European Molecular Biology Laboratory in Heidelberg,
                  Germany. Both of these databases are available to all via the internet.

       other databases include:
          NCBI                          -   National Center for Biotechnology Information
          CCDS                          -   NCBI Consensus human & mouse protein coding regions
          UCSC Genome Browser           -   UC Santa Cruz Genome Browser
          Ensembl Browser               -   a genome browser for vertebrate genomes

    the Wonder of DNA...


















    - using bioinformatics to assess gene actions across a whole genome tells us how they work in
    - comparison of different species reveals information on the evolutionary history of life
    - comparisons of embryonic gene action in different species may tell us how the great diversity of life
             may have arisen.

    DNA divergence may infer relatedness or diversity... Tree of Life - fig 21.17*
             sequence divergence via genotyping
             conservation can infer what it means to be a particular type of organism, e.g.,
             - some 47% of 414 genes common in yeast and humans have similar functions, making
                         yeast a model organism for human genetics.
             - similar genes from closely related species (humans, chimps, mice, etc) can provide
                         clues about what characteristics it takes to be a
             - while finding genes shared by chimps/humans, but not rodents tells us about being a

        - humans & chimps show 99% similar chromosome banding patterns*
             - they differ by ~ 1.2% of N's, but humans have many more insertions and
                             repetitive duplications, 1/3 of which are not in chimp DNA.
             - Humans have more ALU elements (DNA pieces cut by the ALU restriction endonuclease),
                             and chimps have more retroviral provirus DNA not found in humans.
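
A percent-divergence figure like the ~1.2% above is, at its simplest, the share of aligned positions that differ. The sketch below assumes sequences that are already aligned and of equal length; the three short sequences are invented toy data, not real human, chimp, or mouse DNA.

```python
from itertools import combinations

def percent_divergence(seq_a, seq_b):
    """Percent of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * differences / len(seq_a)

def closest_pair(sequences):
    """Return the two names whose sequences diverge the least."""
    return min(
        combinations(sorted(sequences), 2),
        key=lambda pair: percent_divergence(sequences[pair[0]], sequences[pair[1]]),
    )

if __name__ == "__main__":
    toy = {                      # hypothetical aligned sequences
        "human": "ATGCCGTAAGCT",
        "chimp": "ATGCCGTAAGCC",
        "mouse": "ATGACGTTAGAC",
    }
    for a, b in combinations(sorted(toy), 2):
        print(a, b, round(percent_divergence(toy[a], toy[b]), 1))
    print("closest:", closest_pair(toy))
```

This counts substitutions only; the insertions, duplications, and mobile elements mentioned above need an alignment step that this sketch skips.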


Mechanisms that lead to Gene Diversity* favor evolution. 

Role of transcription    factors: some examples:   
  Genes that code for transcription factors are involved with many functions & seem to evolve faster...

FOXP2  gene (fig 21.8*)?  it may be involved in speech in humans and other vertebrates:
               Researchers used knock-out gene mechanisms to silence mouse genes involved in vocalizations,
               producing 3 genotypes: homozygous normal (2 normal copies), heterozygous, and homozygous knock-out
                       EXP 1:    the homozygous knock-out had brain abnormalities
                                      the heterozygote had fewer brain abnormalities
                                      the mice with 2 normal genes showed no abnormalities
                       EXP 2:    new mouse pups' squeak/whistle vocalizations were analyzed and absence
                                          of a functional gene reduced these vocalizations
FOXP2 transcription factor gene in Humans shows rapid change & regulates vocalization genes
           Mutations in the Human FOXP2 gene result in language impairment
           Human/Chimp FOXP2 proteins differ by only 2 amino acids,
                                           which presumably may be involved in 'speech development'
           Recently sequenced Neanderthal DNA showed a FOXP2 gene... Could they have spoken?

     Human history is only ~ 200,000 years, thus human DNA variations are small...
           Most changes are SNPs (Single Nucleotide Polymorphisms, fig 20.15*),
                                  which occur in at least 1% of the human population & at about 1 in every 100-300 NPs
           Human genomes also show many regions of inversions, deletions, & duplications.
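
A small sketch of the two ideas behind a SNP: where one sequence differs from a reference by a single base, and how common the less frequent allele is in a population. The sequences and the 1% cutoff parameter are illustrative assumptions, and the inputs are taken to be already aligned.

```python
from collections import Counter

def snp_positions(reference, sample):
    """List (position, reference_base, sample_base) for each single-base difference (1-based)."""
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]

def minor_allele_frequency(alleles):
    """Frequency of the second most common allele at one position across a population."""
    counts = Counter(alleles).most_common(2)
    if len(counts) < 2:
        return 0.0                    # everyone carries the same base
    return counts[1][1] / len(alleles)

def is_polymorphism(alleles, cutoff=0.01):
    """Treat a variant as a polymorphism when its minor allele reaches the cutoff."""
    return minor_allele_frequency(alleles) >= cutoff

if __name__ == "__main__":
    print(snp_positions("ATGCATGC", "ATGTATGA"))
    population = ["C"] * 98 + ["T"] * 2   # made-up allele calls at one position
    print(minor_allele_frequency(population), is_polymorphism(population))
```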


EVOlutionary DEVelOpmental biology (Evo-Devo) is the area of genomics that compares the developmental
           processes of different organisms to infer the ancestral relationships between them and how
           developmental processes evolved.

           Genomic analyses have revealed homology where dissimilar organs, such as the EYES of
           insects, vertebrates and molluscs (long thought to have evolved separately), are controlled by
           similar genes, such as Pax-6. Such genes are ancient, being highly conserved among phyla, and
           generate the patterns which shape an embryo, and ultimately form the body plan of the organism.

Pax-6 is one of the Pax gene family, which codes for the transcription factor protein Pax-6,
               found in neural ectodermal cells of the forebrain/hindbrain/spinal cord & midbrain.
               It helps "control" the development of eyes* and other sensory organs across many species.
                     e.g., Mouse Pax6 can trigger normal compound eye development in Drosophila melanogaster,
                                         and fly Pax6 genes can trigger normal eye development in frogs.
                             Mouse and Human PAX6 have identical amino acid sequences.

           Many Evo-Devo genes are regulatory genes (rather than structural genes coding for enzymes) common to
           many organisms, being expressed in different parts of the embryo and at different stages of development,
           forming a cascade of control that switches other regulatory genes and structural genes on and off in a precise pattern.
           New morphological features and ultimately new species are produced by variations either when
           these genes are expressed in a new pattern, or when these genes acquire additional functions.











a short Pictorial Chronology of the Gene... &  Genetic Milestones from Peas to now...


















 Genes and Evolution:      Modern synthetic Darwinian evolution predicts that Natural Selection
                                 over time can lead to permanent changes in the DNA that are heritable.


   but recent studies have evoked a new science called EPIGENETICS...
                       the study of changes in gene activity that do not involve alteration of the genetic code,
                       but that are still passed to at least one successive generation.
      epigenetics marks the DNA (without changing its sequence) as a biological response to an environmental stressor,
                       changes that can be inherited through many generations via epigenetic marks,
                       but if you remove the stressor, the epigenetic marks fade & gene activity will revert to normal.
      Lamarckian idea (Lamarck, 1744-1829): animals acquire traits within their life span & pass them on (giraffe).

ex: 1. the drug geldanamycin produces outgrowths of Drosophila eyes that can last for 13 generations;
                        there is no change in DNA sequences and generations 2-13 were not exposed to the drug.
             2. 166 fathers who smoked before age 11 had sons who had a significantly higher BMI
                        than control kids of 14,024 fathers.
             3. a diet rich in B-vitamins (folic acid & B12 - CH3 donors) fed to pregnant agouti mice produces
                        normal pups, but those w/o B-vitamins produce pups with yellow coats & diabetes.

analogy: the genome is the hardware and epigenetics is the software:
                         one can load Windows on a Mac; you'll have the same chip in the Mac (same genome)
                         but the software will produce a different outcome - a different cell type.




































The index case for AMPD deficiency was an 18-year-old female with calf pain, in whom a mis-sense mutation was found at nucleotide 143 in codon 48 of exon 3, where C changed to T, resulting in a change of proline to leucine. This mutant allele is found in 12% of Caucasians and 19% of African Americans; it produces the AMPD deficiency seen in muscle biopsies and results in exercise-induced metabolic myopathy in humans.

ref: A Life Decoded by J. Craig Venter, Viking Press, 2007, C2, pg 28


















 Gene expressions in   pharmacogenomics    &   toxicogenomics  via microarrays

  1 cM = about 1 Mb

  TRANSPOSONS - pieces of DNA prone to moving & creating repeat sequences
                 LINE - long interspersed nuclear element holds promoter & 2 genes: RT &

    an anomaly - RNA Recoding*


          Simple Tandem Repeats (short - 5n to 6n)  or trinucleotide (3n) repeats can undergo an increase in copy
                     number by a process of dynamic mutation; the # of tandem repeats is unique to a genetic individual.
                     The length of these repeats is polymorphic (it varies between individuals).     figure*
                                   individual A has ACA repeated 65 times @ loci 121, 118, and 129
                                   individual B has a different repeat pattern at these loci
                     STRs can cause genetic diseases as well:

                                      CCG trinucleotide occur in fragile sites on human chromosomes (folate-sensitive group).
                                            fragile X (FRAXA) is responsible for familial mental retardation.
                                            another FRAXE is responsible for a rarer mild form of mental retardation.
                                            mutations of AGC repeats give rise to a number of neurological disorders.
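
Counting the copy number of a tandem repeat is simple to sketch: find the longest unbroken run of the repeat unit at each locus and compare the resulting profiles between individuals. The locus names and sequences below are hypothetical, and real STR typing measures fragment lengths rather than scanning raw sequence like this.

```python
import re

def longest_tandem_repeat(sequence, unit):
    """Largest number of back-to-back copies of `unit` found anywhere in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)

def str_profile(sequences_by_locus, unit):
    """Copy number of the repeat unit at each locus for one individual."""
    return {
        locus: longest_tandem_repeat(seq, unit)
        for locus, seq in sequences_by_locus.items()
    }

if __name__ == "__main__":
    individual_a = {"locus1": "GG" + "ACA" * 5 + "TT", "locus2": "ACA" * 3}
    individual_b = {"locus1": "GG" + "ACA" * 7 + "TT", "locus2": "ACA" * 3}
    profile_a = str_profile(individual_a, "ACA")
    profile_b = str_profile(individual_b, "ACA")
    print(profile_a, profile_b, "match" if profile_a == profile_b else "differ")
```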

 3.  Forensics - DNA fingerprinting is the vogue judicial modus operandi
 a murder case* &  a rape case*  + DNA prints in Health & Society & DNA Forensic Science
                     DNA fingerprinting usually looks at 5 RFLP markers, and blood is tested via
                     Southern Blotting (20.10) using probes for these alleles

 4.  Environmental Clean-up...
            bacteria can extract heavy metals (Cu, Pb, Ni) from the environment
            & convert them into non-toxic compounds
                          genetically modified bacteria may be the "miner's" of the future



 5.  Franken Food...  genetically modified (GM) animals & agricultural crops
        Transgenics - organisms with inserted foreign DNA in their genomes*  -  GFP novelties*  +  Dolly
                                -  animal cloning companies   --->   mammalian cloning success?
                                -  "pharm" animals (20.18*)  --->   transgenic animal movie
                                              sheep carry a human blood protein gene that inhibits enzymes in cystic fibrosis;
                                              artificial insemination, microinjection of the human gene, fertilized ova are put
                                              into a surrogate sheep;
                                              chimeras are mated to produce a homozygote - milk is tested for active protein.

Plants      - genetically modified crop plants - fig 20.19*
                               -  to get Ti plasmids in = a DNA gun*   Purdue University Gene Gun movie
                          -  Frankenfood  &  Edible Vaccines
-  National Plant Genome Initiative Plan    update  future   

 6.  Synthetic Biology...   artificially manufactured biological systems
                                - virus models
*                            (synthetic Biology)

                                            An overview of biotechnology  
                                                 History of Biotechnology    
                                                 Human Genome Project & Biotech Companies
                                      HHMI  funded  DNA  Interactive  tutorial





What are Introns? and What is the Role of Intron DNA?
don't really know, but Percentage of non-coding DNA during evolution*  goes up.


   summer & fall 2006: skip this material
 INTRONS - DNA Junk or sophisticated Genetic Control Elements?
Current dogma of Molecular Biology
           DNA --> RNA --> Proteins,   (proteins supposedly regulate gene expression)     figure
           in 1977 Phillip Sharp & Richard Roberts discovered that genes contain introns,
                intervening DNA segments that do NOT code for proteins;
                a primary RNA transcript is processed by splicing to assemble the protein coding exons
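
Splicing can be pictured as cutting the intron out of the primary transcript and joining the exons that remain. A minimal sketch, assuming the exon coordinates are already known; the transcript and coordinates are made up, and the real spliceosome finds these boundaries from sequence signals.

```python
def splice(primary_transcript, exons):
    """Join the exon segments of a primary transcript.

    `exons` is a list of (start, end) pairs, 0-based with `end` exclusive.
    """
    return "".join(primary_transcript[start:end] for start, end in sorted(exons))

def intron_fraction(primary_transcript, exons):
    """Share of the primary transcript that is removed by splicing."""
    exon_length = sum(end - start for start, end in exons)
    return 1 - exon_length / len(primary_transcript)

if __name__ == "__main__":
    exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"   # hypothetical segments
    transcript = exon1 + intron + exon2
    exon_coords = [(0, 6), (18, 24)]
    print(splice(transcript, exon_coords))
    print(intron_fraction(transcript, exon_coords))
```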

   Presence of Introns:  Largely absent in prokaryotes: they have few non-coding DNA sequences
                as eukaryotic complexity grows so does non-coding DNA    [figure]
                in humans it makes up greater than 95% of the DNA
                less than 1.5% of the human genome encodes proteins, but much of the DNA is transcribed
                40% of the human genome is Transposons & repeat genetic elements.

   Evolutionary Origins?  may have been self-splicing mobile genetic elements
                                    that inserted themselves into host genomes 
                           Advent of Spliceosomes: catalytic RNA/protein complexes
                                    that snip intron RNAs out of pre-mRNAs,
                            would encourage introns to proliferate, mutate, evolve


   fall 2005 skip ALL OF THIS MATERIAL
Role of Introns?  Not Junk, but rather Genetic Control Elements          [figure*]
Micro RNAs - derived from introns? - occur in plants, animals, & fungi
                         a) help control timing of developmental processes as cell proliferation,
                                    apoptosis, and stem cell maintenance
                         b) help tag chromatin with methyl and acetyl groups
                         c) may help in alternative splicing mechanisms              

   COMPLEXITY:    to build a complex structure one must have bricks & mortar,
                      as well as an architectural plan.
DNA, therefore should contain both - the materials and the plan:
                      a) component molecules - proteins, carbs, lipids, and nucleic acids:
                                     all known living organism use the same bricks and mortar
                      b) the difference between Man & Monkey is the architectural plan

            Where is the Architectural Information?   we've always assumed in the regulatory proteins
                      Maybe it's in the non-coding micro-RNAs (intronic elements)

            Thus the greater proportion of the genome of complex organisms, the introns, isn't junk,
                      but rather, it is functional RNA that regulates time dependent complexity?

                           A Gene Sweep betting pool winner with closest bet (25,947) was Lee Rowen.


Visualizing Restriction Fragments...    to a probe of a gene (cDNA)
Southern Blotting   fig*       Sumanas animation - DNA electrophoresis & blotting*view@home
                    one can detect a specific gene sequence by binding labeled probes to DNA fragments

       d.  'Human Microbiome Project'... sequencing human microbes to correlate with human health  


---> RFLP markers to disease* 

& Newer VECTORS -    [ animation of translation*view@home]

                                [ Virus infection - phage pics - Replication  role of tubulins ] 

                      DNA replication animation quiz*view@home  

   Review of the Events:
          1.  DNA pol III binds at the origin of replication site in the template strand
          2.  DNA is unwound by the replisome complex using helicase & topoisomerase
          3.  all DNA polymerases require a preexisting strand (PRIMER) to start replication,
                  thus Primase adds a single short RNA primer to the LEADING strand
                  and also adds many primers to the LAGGING strand
          4.  DNA pol III is a dimer adding new nucleotides to the primers on both strands
                  direction of reading is 3' ---> 5' on template
                  direction of synthesis of new strand is 5' ---> 3'
                  rate of synthesis is substantial: 400 nucleotides/sec
          5.  DNA pol I removes the primer at the 5' end, replacing it with DNA bases; leaves a 3' hole (nick)
          6.  DNA ligase seals the 3' holes between Okazaki fragments on the lagging strand
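
The direction rules in step 4 can be sketched in code: because the template is read 3' ---> 5' while the new strand grows 5' ---> 3', the new strand written 5' ---> 3' is the reverse complement of the template written 5' ---> 3'. The time estimate uses the 400 nucleotides/sec figure from these notes; the sequence, the lengths, and the `forks` parameter are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def new_strand_5_to_3(template_5_to_3):
    """New strand, written 5'->3', made by copying a template given 5'->3'.

    Complement every base, then reverse, because the two strands are antiparallel.
    """
    return template_5_to_3.translate(COMPLEMENT)[::-1]

def replication_seconds(n_nucleotides, rate_per_sec=400, forks=1):
    """Time to copy n nucleotides at a fixed rate, shared across `forks` replication forks."""
    return n_nucleotides / (rate_per_sec * forks)

if __name__ == "__main__":
    print(new_strand_5_to_3("ATGCCG"))
    print(replication_seconds(4000), "sec for 4,000 nucleotides at one fork")
```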
Review of the sequence of events in detail - lagging strand   and   BioFlix anim. of Replication*view@home
         DNA Repair mechanisms - 2015 Nobel Prize in Chemistry*

         Linear DNA replication, Telomeres, and Aging

          Sumanas advanced animation

            - de novo gene origins         

                              current events [Ebola poster]                        

                             DNA @ 60 Poster & sequencing costs  

   rmaceuticals...   Recombinant bacteria*Humulin pen  &  protropin (an ethical dilemma)

                                   How the Maize genome was sequenced by NSF's Plant Genome Initiative

Forensic Analyses     

  e.  Genetically Modified Organisms   [1st by FDA =  animationAtlantic Salmon*picture)
2.  control of the Human genome...   How is gene expression regulated???



  SKIP this one page 
Micro RNAs (& siRNA):  function in gene silencing & gene regulation
   are single-stranded RNA molecules of 20-24 nucleotides in length that are not translated
   into protein (non-coding RNA), that form short stem-loop structures, and that are partially
   complementary to mRNAs & can down-regulate or possibly activate gene expression.

   some 400 miRNAs are known in the human genome & function as small interfering RNAs
   (siRNA) or miRISC's by hybridizing to mRNA, forming a dsRNA duplex that blocks translation
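
Hybridizing to mRNA means finding a stretch of the mRNA that is complementary to the miRNA. A minimal sketch, assuming the common simplification that pairing is scored only over the miRNA "seed" (nucleotides 2-8), which fits the partial complementarity described above; the miRNA and mRNA sequences are hypothetical examples.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement_rna(seq):
    """Sequence that base-pairs with `seq`, written 5'->3'."""
    return seq.translate(RNA_COMPLEMENT)[::-1]

def seed_sites(mirna, mrna):
    """0-based positions in the mRNA that are complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]                       # nucleotides 2 through 8
    target = reverse_complement_rna(seed)   # what the mRNA must contain to pair
    return [
        i for i in range(len(mrna) - len(target) + 1)
        if mrna[i:i + len(target)] == target
    ]

if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"        # example 22-nucleotide miRNA
    mrna = "AAAACUACCUCGGGG"                # toy mRNA with one matching site
    print(seed_sites(mirna, mrna))
```

A site found this way is only a candidate; whether translation is actually blocked depends on much more than a seed match.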

         found in MODEL eukaryotic organisms as: roundworms, fruit flies, mice, humans, &  plants (arabidopsis);
miRNA -->    1) BARR Body*     &     2)  CCD     &      3) heart disease
                                             2) TMK-ebola siRNA Drug
   FDA approved Sep 22, 2014

     The ability of transfected synthetic small interfering miRNAs to suppress the expression of
      specific transcripts is a useful technique to probe gene function in mammalian cells.

      Such non-protein coding RNA transcripts likely regulate much of the genome via - miRNA's.

        new roles* for siRNA, miRNA,  and Crispr-gene editing...


   Sequencing Genes   &

procedures*   =