CS 177 Hands-on lab with databases


1 CS 177 Hands-on lab with databases
- Quiz #1
- Summary: Nucleotide and protein databases
- Sequence formats
- Lab exercises

2 Quiz #1 Homework #1

3 The International Nucleotide Sequence Database Collaboration
- GenBank (NCBI, NIH): retrieval via Entrez; submissions and updates via Sequin, BankIt, ftp
- EMBL (EBI): retrieval via SRS; submissions and updates
- DDBJ (NIG, CIB): retrieval via getentry; submissions and updates

4 Primary vs. Derivative Databases
[Figure: sequences from sequencing centers and labs are deposited in GenBank (primary database); UniGene, RefSeq and genome assemblies (derivative databases) are built from them by algorithms and curators.]

5 The Entrez Databases

6 The (ever) Expanding Entrez System
Entrez links: Nucleotide, Protein, Structure, PubMed, PopSet, Genome, OMIM, Taxonomy, Books, ProbeSet, 3D Domains, UniSTS, SNP, CDD, UniGene, Journals, PubMed Central

7 GenBank: Search and retrieval of sequences
- Entrez is a retrieval system for searching several linked databases. It provides access to PubMed, Nucleotide, Protein, Structure, Genome, PopSet, OMIM, Taxonomy and more.
- BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases, regardless of whether the query is protein or DNA.

8 BLAST selections

9 GenBank format

10 Fasta format

11 Sequence formats
ASN.1, DNAStrider, EMBL, Fitch, GCG, GenBank/GB, IG/Stanford, MSF, NBRF, Olsen, PAUP/NEXUS, Pearson/Fasta, Phylip, PIR/CODATA, Plain/Raw, Pretty, Zuker
- FASTA is a popular sequence format
- it is also a sequence similarity and homology search tool (similar to BLAST) used by EMBL-EBI
NOTE: Formats are convertible with ReadSeq (Web based, http://bimas.dcrt.nih.gov/molbio/readseq/) or ForCon (stand-alone application, http://www.hgmp.mrc.ac.uk/embnet.news/vol6_1/ForCon/forcon.html)
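To make the FASTA format concrete, here is a minimal Python sketch of a reader. It assumes the common convention that each record starts with a ">" header line followed by one or more sequence lines. The function name `parse_fasta` and the two example records are hypothetical and not part of the lab material.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Hypothetical example records, for illustration only.
example = """>seq1 hypothetical example record
ATTGACTA
TATAGCCG
>seq2 another hypothetical record
ACGTGC
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Keeping the whole header line as the key is a simplification; many tools treat only the text up to the first space as the identifier.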

12

13 Lab exercises
1) How many sequences are available in GenBank for Neanderthals? Depends on your search strategy …
2) Go to Entrez nucleotide. Find all sequences for the following terms:
- neander: 1
- Neanderthals: 0
- Neanderthal: 1
- neanderthal: 1
- neanderthal*: 6
- Homo sapiens neanderthalensis: 6
3) Go to Entrez taxonomy. Try to find all sequences for Neanderthals! 6
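The same search terms can also be tried outside the web form. The sketch below only builds query URLs for NCBI's E-utilities ESearch service; it does not contact the server. Treat the details as assumptions: the lab itself uses the Entrez web pages, and the database name `nuccore` and the helper `esearch_count_url` are chosen here for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search terms from the lab exercise.
TERMS = [
    "neander",
    "Neanderthals",
    "Neanderthal",
    "neanderthal",
    "neanderthal*",
    "Homo sapiens neanderthalensis",
]


def esearch_count_url(term: str, db: str = "nuccore") -> str:
    """Build an ESearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


for term in TERMS:
    print(esearch_count_url(term))
```

Opening each printed URL in a browser should return a small XML document with the hit count. Counts today will differ from those on the slide, since the databases have grown.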

14 4) How many nucleotide sequences are available for the house mouse Mus musculus? Try both Entrez nucleotides and Entrez taxonomy. How do you explain the difference?
Entrez taxonomy: 5,403,701
Entrez nucleotides:
- 5,458,506 (Mus musculus)
- 5,393,552 (house mouse)
- 5,458,527 (Mus musculus OR house mouse)
5) A man is found murdered in Yellowstone National Park. A few hairs of unidentified origin are recovered from the victim’s clothes. The samples arrive in the lab and DNA is isolated and sequenced:
CCATGCATATAAGCATGTACATAATATTATATTCTTACATAGGACATATTAACTCAATCTCATAATTCAT
Formulate a hypothesis regarding the origin of the recovered hairs and potential links with the killing!
Canis lupus (Gray Wolf)
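Before pasting the hair sequence into a BLAST search window, it can help to wrap it as a FASTA record and look at its length and base composition. The following is a small sketch along those lines. The record name `unknown_hair_sample` and the helper `to_fasta` are hypothetical; only the sequence itself comes from the exercise.

```python
from collections import Counter

# Query sequence from the exercise.
QUERY = (
    "CCATGCATATAAGCATGTACATAATATTATATTCTTACATAGGACATATTAACTCAATCTCATAATTCAT"
)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"


fasta = to_fasta("unknown_hair_sample", QUERY)  # hypothetical record name
composition = Counter(QUERY)
at_fraction = (composition["A"] + composition["T"]) / len(QUERY)

print(fasta)
print("length:", len(QUERY))
print("composition:", dict(composition))
print("A+T fraction:", round(at_fraction, 2))
```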

15 The Poliovirus Problem
VOL 297, 9 August 2002
Cello, J.; Paul, A.V. & Wimmer, E.: Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template
- they generated about 7.7 kilobases of single-stranded RNA genome based on the known genetic map
- DNA fragments were synthesized from purified oligonucleotides (average length: 69 bases)
- the cDNA was then transcribed into highly infectious RNA

16 The Poliovirus Problem
17 July 2002
Weiss, R.: Mail-Order Molecules Brew a Terrorism Debate
- mail-order oligonucleotides can be used to manufacture a deadly virus
- because they are so small, most oligos lack a “fingerprint”
- call for more control and/or institutional oversight

17 The Poliovirus Problem
Are these oligos so small that they lack a “fingerprint”?
- search in GenBank for nucleotide sequences of the poliovirus
- copy about 100 bp from a sequence of your choice and paste it into the search window of blastn. Is the fragment identifiable as poliovirus?
- if so, do a blastn search with a 90 bp, 80 bp, 70 bp … fragment
- what is the length of the shortest fragment still identifiable as poliovirus?
- is this fragment shorter than the average length of 69 bp used to synthesize the poliovirus?
- do these oligos have a “fingerprint” (i.e. can ‘typical’ oligos with lengths of 20-50 be assigned to a particular organism)?
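The fragment series for the blastn searches above can be prepared in one step rather than by trimming by hand. This is a minimal sketch under stated assumptions: `STAND_IN` is a random placeholder, not a poliovirus sequence, and should be replaced by the roughly 100 bp you copied from GenBank. The function name `shrinking_fragments`, the 10 bp step and the 20 bp lower limit are illustrative choices.

```python
import random


def shrinking_fragments(seq, start=100, step=10, minimum=20):
    """Return prefixes of seq with lengths start, start-step, ... down to minimum."""
    longest = min(start, len(seq))
    return [seq[:n] for n in range(longest, minimum - 1, -step)]


# Placeholder only: a random 100-base string, NOT real poliovirus sequence.
random.seed(0)
STAND_IN = "".join(random.choice("ACGT") for _ in range(100))

fragments = shrinking_fragments(STAND_IN)
for frag in fragments:
    # Each block can be pasted into the blastn search window as a FASTA record.
    print(">fragment_%dbp" % len(frag))
    print(frag)
```

Taking prefixes keeps every shorter fragment inside the longer one, so any loss of identifiability can be attributed to length alone.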

18 Homework assignment lecture #4
Explain in your own words and in simple terms the basics of the BLAST tool!
- assignment is due on 6 Oct 2003, 3:30 PM
- send your assignment as an e-mail attachment to mtmtxw@gwumc.edu (type your name and the term “homework” in the subject line)
- maximum size: 500 words

