Bioinformatics Tutorial
Exploring BLAST with Dino-DNA
"Jurassic Park" Dino-DNA Analysis
In 1990, Michael Crichton published the book "Jurassic Park" about the resurrection of dinosaurs using blood from the stomachs of insects that had been encased in tree sap, which later fossilized into amber.
At one point in the book, Dr. Henry Wu is asked to explain some of the DNA techniques used in reconstructing the extinct dinosaur genomes. Dr. Wu describes the use of restriction enzymes and how the fragmented pieces of dino DNA can be spliced together with these enzymes. He also alludes to the fact that they don't have the entire genome but that they "fill in the gaps" with modern-day frog DNA.
At one point during his discussion he points to a computer screen and remarks "Here you see the actual structure of a small fragment of dinosaur DNA."
>JurassicPark DinoDNA from the book Jurassic Park
gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa aaatcgacgc
ggtggcgaaa cccgacagga ctataaagat accaggcgtt tccccctgga agctccctcg
tgttccgacc ctgccgctta ccggatacct gtccgccttt ctcccttcgg gaagcgtggc
tgctcacgct gtaggtatct cagttcggtg taggtcgttc gctccaagct gggctgtgtg
ccgttcagcc cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa
agtaggacag gtgccggcag cgctctgggt cattttcggc gaggaccgct ttcgctggag
atcggcctgt cgcttgcggt attcggaatc ttgcacgccc tcgctcaagc cttcgtcact
ccaaacgttt cggcgagaag caggccatta tcgccggcat ggcggccgac gcgctgggct
ggcgttcgcg acgcgaggct ggatggcctt ccccattatg attcttctcg cttccggcgg
cccgcgttgc aggccatgct gtccaggcag gtagatgacg accatcaggg acagcttcaa
cggctcttac cagcctaact tcgatcactg gaccgctgat cgtcacggcg atttatgccg
caagtcagag gtggcgaaac ccgacaagga ctataaagat accaggcgtt tcccctggaa
gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc tcccttcggg
ctttctcatt gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg
acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt cttgagtcca
acacgactta acgggttggc atggattgta ggcgccgccc tataccttgt ctgcctcccc
gcggtgcatg gagccgggcc acctcgacct gaatggaagc cggcggcacc tcgctaacgg
ccaagaattg gagccaatca attcttgcgg agaactgtga atgcgcaaac caacccttgg
ccatcgcgtc cgccatctcc agcagccgca cgcggcgcat ctcgggcagc gttgggtcct
gcgcatgatc gtgctagcct gtcgttgagg acccggctag gctggcgggg ttgccttact
atgaatcacc gatacgcgag cgaacgtgaa gcgactgctg ctgcaaaacg tctgcgacct
atgaatggtc ttcggtttcc gtgtttcgta aagtctggaa acgcggaagt cagcgccctg
In 1992, a young scientist, Dr. Mark Boguski, at the National Center for Biotechnology Information (NCBI), having read the book "Jurassic Park", entered this sequence into a text editor and searched it against all of the DNA sequences known at the time. This collection of sequences makes up a database referred to as GenBank. Mark wrote up his findings and submitted a manuscript to the journal BioTechniques as a tongue-in-cheek joke. His manuscript was accepted and published (Boguski, M.S. A Molecular Biologist Visits Jurassic Park. (1992) BioTechniques 12(5):668-669).
Any person reading "Jurassic Park" who had access to a web browser could have performed this pure bioinformatics experiment and submitted an article for publication. You will reproduce Mark's experiment by using select, copy, and paste to send this sequence for comparison against the GenBank database, just as Mark did in 1992.
Exercise 1
- Select, copy, and paste the sequence shown above into the web form named: Nucleotide-Nucleotide BLAST Search. The search will be opened in a new window so that you can refer back to these instructions.
- Familiarize yourself with the results by following the various links to the alignments and GenBank database entries. Follow the links as far as you can or as far as seems reasonable.
- Repeat the search using only portions of the sequence above. After pasting, replace some bases with random A, C, G, or T letters to mess the sequence up a bit. How does this affect the results?
- From your results, do you really believe what Dr. Wu is claiming? Did Michael Crichton just type random A, C, G, T letters, or do you think he knew about GenBank and borrowed some DNA?
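If you would rather scramble the sequence reproducibly than by hand, the step can be sketched in a few lines of Python. This is only an illustrative helper, not part of the original exercise: the function names `clean` and `mutate`, the number of changes, and the seed are all assumptions chosen for the example. The short fragment is the first 30 bases of the sequence above.

```python
import random

BASES = "ACGT"

def clean(seq: str) -> str:
    """Upper-case the text and keep only the letters A, C, G and T."""
    return "".join(ch for ch in seq.upper() if ch in BASES)

def mutate(seq: str, n_changes: int, seed: int = 0) -> str:
    """Replace n_changes randomly chosen positions with a different base."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), n_changes):
        # Pick one of the three bases that differ from the current one.
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)

# First 30 bases of the "Jurassic Park" fragment, pasted with its spaces.
fragment = clean("gcgttgctgg cgtttttcca taggctccgc")
scrambled = mutate(fragment, n_changes=3, seed=1)

print(fragment)
print(scrambled)
```

Paste the scrambled output into the search form in place of the original, and increase `n_changes` step by step to see how many substitutions the search tolerates before the hits change.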
"The Lost World" Dino-DNA Analysis
Mark's published article was brought to Michael Crichton's attention. In his second book, "The Lost World", Mr. Crichton used Mark as a consultant. Mark chose a DNA sequence from a living organism which is much more closely related to the dinosaurs.
Mark also mixed in some frog (Xenopus) DNA, just as Dr. Wu described, to fill in the holes in their dino-genomes. However, Mark played a little trick on Mr. Crichton by embedding a message in the protein translation of the DNA sequence which he submitted for use in the book.
Here is the sequence Mark gave Michael Crichton for the book "The Lost World":
>LostWorld DinoDNA from the book The Lost World
gaattccgga agcgagcaag agataagtcc tggcatcaga tacagttgga gataaggacg
gacgtgtggc agctcccgca gaggattcac tggaagtgca ttacctatcc catgggagcc
atggagttcg tggcgctggg ggggccggat gcgggctccc ccactccgtt ccctgatgaa
gccggagcct tcctggggct gggggggggc gagaggacgg aggcgggggg gctgctggcc
tcctaccccc cctcaggccg cgtgtccctg gtgccgtggg cagacacggg tactttgggg
accccccagt gggtgccgcc cgccacccaa atggagcccc cccactacct ggagctgctg
caaccccccc ggggcagccc cccccatccc tcctccgggc ccctactgcc actcagcagc
gggcccccac cctgcgaggc ccgtgagtgc gtcatggcca ggaagaactg cggagcgacg
gcaacgccgc tgtggcgccg ggacggcacc gggcattacc tgtgcaactg ggcctcagcc
tgcgggctct accaccgcct caacggccag aaccgcccgc tcatccgccc caaaaagcgc
ctgcgggtga gtaagcgcgc aggcacagtg tgcagccacg agcgtgaaaa ctgccagaca
tccaccacca ctctgtggcg tcgcagcccc atgggggacc ccgtctgcaa caacattcac
gcctgcggcc tctactacaa actgcaccaa gtgaaccgcc ccctcacgat gcgcaaagac
ggaatccaaa cccgaaaccg caaagtttcc tccaagggta aaaagcggcg ccccccgggg
gggggaaacc cctccgccac cgcgggaggg ggcgctccta tggggggagg gggggacccc
tctatgcccc ccccgccgcc ccccccggcc gccgcccccc ctcaaagcga cgctctgtac
gctctcggcc ccgtggtcct ttcgggccat tttctgccct ttggaaactc cggagggttt
tttggggggg gggcgggggg ttacacggcc cccccggggc tgagcccgca gatttaaata
ataactctga cgtgggcaag tgggccttgc tgagaagaca gtgtaacata ataatttgca
cctcggcaat tgcagagggt cgatctccac tttggacaca acagggctac tcggtaggac
cagataagca ctttgctccc tggactgaaa aagaaaggat ttatctgttt gcttcttgct
gacaaatccc tgtgaaaggt aaaagtcgga cacagcaatc gattatttct cgcctgtgtg
aaattactgt gaatattgta aatatatata tatatatata tatatctgta tagaacagcc
tcggaggcgg catggaccca gcgtagatca tgctggattt gtactgccgg aattc
Exercise 2
- Select, copy, and paste the "Lost World" sequence shown above into the web form named: Nucleotide-Nucleotide BLAST Search.
- In your results, follow the link to the GenBank entry; this link is along the left. On that page, find the ORGANISM keyword and click on the species link. This will bring up the species category of this organism. Do any of the terms seem to imply a relationship to the dinosaurs? Click there to look at other descendants of the dinosaurs. What organism did Mark choose as his living dinosaur? What is its common name?
- Select, copy, and paste the "Lost World" sequence again into the web form named: Translating BLAST Search. This type of search 'translates' the DNA sequence into six protein sequences (three reading frames on each strand) and searches the protein database. This search takes longer but is much more informative about the relationship between the probe DNA sequence and the hits in the database. Proteins use 20 letters instead of 4, which made it easier for Mark to create a hidden message. When the analysis is finished, look at the best pairwise alignment by clicking on the score value in the right-hand column, or scroll down past the hit list to the first alignment. Can you find Mark's hidden message?
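To make the "six protein sequences" idea concrete, here is a minimal Python sketch of the translation step alone. It assumes the standard genetic code and does not search any database, so it is not how the Translating BLAST Search itself is implemented. The function names and the frame labels (+1 to +3 for the pasted strand, -1 to -3 for its reverse complement) are illustrative choices, and the nine-base input is a toy sequence rather than one from the books.

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order; "*" is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def clean(seq: str) -> str:
    """Remove all whitespace from a pasted sequence and upper-case it."""
    return "".join(seq.split()).upper()

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate complete codons; unrecognised codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translate(seq: str) -> dict:
    """Translate three frames on each strand of a DNA sequence."""
    forward = clean(seq)
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

# Toy input; replace it with a pasted sequence to inspect all six frames.
for label, protein in six_frame_translate("atggcc taa").items():
    print(label, protein)
```

Reading each frame as ordinary text shows why a message is easier to hide at the protein level: the 20 amino acid letters cover much more of the alphabet than A, C, G and T.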
Dr. Mark Boguski is currently Senior Vice President, Research and Development, at Rosetta Inpharmatics in Kirkland, WA.