python (sequence analysis)

In 1990, Michael Crichton published the novel Jurassic Park, about the resurrection of dinosaurs
using blood from the stomachs of insects that had been encased in amber. At one point in
the book, Dr. Henry Wu is asked to explain some of the DNA techniques used in reconstructing
the extinct dinosaur genomes. Dr. Wu describes the use of restriction enzymes and how the
fragmented pieces of dino DNA can be spliced together with these enzymes. He also alludes
to the fact that they don’t have the entire genome but that they “fill in the gaps” with
modern-day frog DNA. At one point during his discussion he points to a computer screen and
remarks, “Here you see the actual structure of a small fragment of dinosaur DNA.”

In 1992, Dr. Mark Boguski at NCBI entered this sequence into a text editor and searched it
against all of the DNA sequences known at the time. Dr. Boguski wrote up his findings and
submitted a manuscript to the journal BioTechniques as a tongue-in-cheek joke. His manuscript
was accepted and published (Boguski, M.S. (1992) A Molecular Biologist Visits Jurassic Park.
BioTechniques 12(5):668-669).

You will reproduce this experiment using BLAST. ([url removed, login to view])
Submit the “dinosaur DNA” sequence you can find in the file [url removed, login to view] to a
Nucleotide-nucleotide BLAST (blastn) search. How many of the top ten matches are artificial
sequences? Identify any actual organisms in the top ten.
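
The search can also be scripted instead of being run through the web form, which makes it
easier to repeat. The following is only a minimal sketch under several assumptions: Biopython
is installed, network access to NCBI is available, and the first sequence has been saved
locally as dino1.fasta (a hypothetical file name, since the original link is not shown here).
read_fasta and run_blastn are illustrative helper names, and "nt" is an assumed database
choice. The request does not ask for megablast, but it is worth checking which algorithm the
returned report names.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def run_blastn(sequence, database="nt", hitlist_size=10):
    """Submit a blastn search to NCBI and return the BLAST XML report as text.

    Biopython is imported here, not at module level, so that read_fasta
    stays usable without it. The megablast argument of qblast is left at
    its default, i.e. megablast is not requested.
    """
    from Bio.Blast import NCBIWWW  # third-party: pip install biopython

    handle = NCBIWWW.qblast(
        "blastn",
        database,
        sequence,
        hitlist_size=hitlist_size,
    )
    return handle.read()


# Example use (needs network access; file names are assumptions):
#   with open("dino1.fasta") as fh:
#       header, seq = read_fasta(fh.read())[0]
#   with open("dino1_blastn.xml", "w") as out:
#       out.write(run_blastn(seq))
```

Saving the raw XML report alongside the query file keeps the exact hit list that was seen on
the day of the search, which matters because the public databases change over time.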

Mark Boguski’s published article was brought to Crichton’s attention. For the sequel,
“The Lost World”, Mr. Crichton used Dr. Boguski as a consultant. Dr. Boguski constructed
an interesting sequence from existing species and also embedded a message in the protein
translation of the DNA sequence, which he submitted for use in the book.

Once again, invoke Nucleotide-nucleotide BLAST (blastn), this time with the second “dinosaur
DNA” sequence, which you can find in the file dino2.fasta. Identify the organisms of all top
ten matches. Are any of these organisms related to dinosaurs?

Now use Translated query vs. protein database BLAST (blastx) with the same sequence and
the Swiss-Prot database. Look at the amino acid sequence of the query aligned to the
best hit. What is the hidden message Dr. Boguski included in this sequence?
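
blastx translates the nucleotide query in all six reading frames before comparing it with the
protein database, so the embedded message can also be looked for locally. The following is a
minimal sketch that assumes the standard genetic code; the function names are illustrative,
and the frame labels are plain offsets that may not match BLAST's own frame numbering in every
detail.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; '*' marks a stop codon.

    Codons containing anything other than A, C, G, T become 'X'.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations of the three forward and three reverse frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# translate("ATGGCCTAA") returns "MA*"
```

Printing each frame of the dino2.fasta sequence and reading the one-letter amino acid codes as
text is one way to cross-check what the blastx alignment shows.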

Hand in a well-documented exercise that contains the sequences, sources, output alignments
and scores, and the parameters used for the algorithms. One major criterion for the grading of
this exercise is reproducibility.
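
One way to support reproducibility is to store a small record next to each search result. The
following is a minimal sketch using only the Python standard library; the field names and the
helpers make_run_record and save_run_record are assumptions made for illustration, not a
required format.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_run_record(query_header, query_sequence, program, database, parameters):
    """Collect the details needed to repeat a search in one dictionary."""
    sequence = query_sequence.upper()
    return {
        # Public databases change, so the search date belongs in the record.
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "program": program,
        "database": database,
        # Sorted so that two records are easy to compare by eye.
        "parameters": dict(sorted(parameters.items())),
        "query_header": query_header,
        "query_length": len(sequence),
        # The checksum shows later that exactly this sequence was searched.
        "query_sha256": hashlib.sha256(sequence.encode("ascii")).hexdigest(),
    }


def save_run_record(record, path):
    """Write the record as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
        fh.write("\n")
```

A record of this kind can be written once per search (blastn for each sequence, blastx for the
second) and handed in together with the alignments and scores.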

Hint: Use the blastn option, not megablast. “PREDICTED” sequences count as hits.

