WWW.NCBI.NLM.NIH.GOV PubMed: Scientific Journals Entrez: Keyword Search of Database BLAST: Sequence Queries OMIM: Online Mendelian Inheritance in Man Books.


PubMed: Scientific Journals
Entrez: Keyword Search of Database
BLAST: Sequence Queries
OMIM: Online Mendelian Inheritance in Man
Books
TaxBrowser
Structure: 3D Molecular Structures

Sequence Files
Since the information relevant to biological processes is contained in the gene or protein sequence, all genetic and protein data are stored in “sequence” files. Importantly, the “directionality” that exists in nature is conserved in the sequence file.
Nucleic acids are always written 5’ to 3’ (from the end with the free 5’ phosphate to the end with the free 3’ hydroxyl; these are the groups joined by phosphodiester bonds).
nucleic acids (genes): 5’-AGCTCGTGTAGACCATTC-3’
Amino acids are always written with the free amino group (N-terminus) first and the free carboxylic acid group (C-terminus) last.
amino acids (proteins): amino-IPKERYRGQIESIWA-carboxy

DNA is Double Stranded… Anti-parallel Configuration
The top strand is ALWAYS written 5’ to 3’. When DNA is written in a file, only the top strand is represented and the bottom strand is assumed.
5’-AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG-3’
   ||||||||||||||||||||||||||||||||||||||
3’-TCAGCACTAGACGATTTACAGAGCTTCAAGCTACGATC-5’
Courier (or another fixed-width font) is preferred for writing sequence data, since every character takes the same width and the two strands stay aligned.
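Because the bottom strand is assumed rather than stored, it has to be derived from the top strand when needed. The following is a minimal Python sketch of that derivation; the function names are illustrative and not part of any NCBI tool, and only the four unambiguous bases are handled.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Return the paired bases in the same left-to-right order,
    i.e. the bottom strand as drawn under the top strand (3' to 5')."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the bottom strand written in the conventional 5' to 3' direction."""
    return complement(seq)[::-1]


top = "AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG"
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
print("bottom strand, 5' to 3':", reverse_complement(top))
```

Applying `reverse_complement` twice returns the original top strand, which is a quick sanity check when handling sequence files.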

>gi| |emb|X |HSMYOSIE Homo sapiens partial mRNA for myosin-IF CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTCTATGCCCGCCTCTTCGACTT CCTCGTGGAGGCCATCAACCGTGCTATGCAGAAACCCCAGGAAGAGTACAGCATCGGTGTGCTGGACATT TACGGCTTCGAGATCTTCCAGAAAAATGGCTTCGAGCAGTTTTGCATCAACTTCGTCAATGAGAAGCTGC AGCAAATCTTTATCGAACTTACCCTGAAGGCCGAGCAGGAGGAGTATGTGCAGGAAGGCATCCGCTGGAC TCCAATCCAGTACTTCAACAACAAGGTCGTCTGTGACCTCATCGAAAACAAGCTGAGCCCCCCAGGCATC ATGAGCGTCTTGGACGACGTGTGCGCCACCATGCACGCCACGGGCGGGGGAGCAGACCAGACACTGCTGC AGAAGCTGCAGGCGGCTGTGGGGACCCACGAGCATTTCAACAGCTGGAGCGCCGGCTTCGTCATCCACCA CTACGCTGGCAAGGTCTCCTACGACGTCAGCGGCTTCTGCGAGAGGAACCGAGACGTTCTCTTCTCCGAC CTCATAGAGCTGATGCAGTCCAGTGACCAGGCCTTCCTCCGGATGCTCTTCCCCGAGAAGCTGGATGGAG ACAAGAAGGGGCGCCCCAGCACCGCCGGCTCCAAGATCAAGAAACAAGCCAACGACCTGGTGGCCACACT GATGAGGTGCACACCCCACTACATCCGCTGCATCAAACCCAACGAGACCAAGCACGCCCGAGACTGGGAG GAGAACAGAGTCCAGCACCAGGTGGAATACCTGGGCCTGAAGGAAAACATCAGGGTGCGCAGAGCCGGCT TCGCCTACCGCCGCCAGTTCGCCAAATTCCTGCAGAGGTATGCCATTCTGACCCCCGAGACGTGGCCGCG GTGGCGTGGGGACGAACGCCAGGGCGTCCAGCACCTGCTTCGGGCGGTCAACATGGAGCCCGACCAGTAC CAGATGGGGAGCACCAAGGTCTTTGTCAAGAACCCAGAGTCGCTTTTCCTCCTGGAGGAGGTGCGAGAGC GAAAGTTCGATGGCTTTGCCCGAACCATCCAGAAGGCCTGGCGGCGCCACGTGGCTGTCCGGAAGTACGA GGAGATGCGGGAGGAAGCTTCCAACATCCTGCTGAACAAGAAGGAGCGGAGGCGCAACAGCATCAATCGG AACTTCGTCGGGGACTACCTGGGGCTGGAGGAGCGGCCCGAGCTGCGTCAGTTCCTGGGCAAGAAGGAGC GGGTGGACTTCGCCGATTCGGTCACCAAGTACGACCGCCGCTTCAAGCCCATCAAGCGGGACTTGATCCT GACGCCCAAGTGTGTGTATGTGATTGGGCGAGAGAAGATGAAGAAGGGACCTGAGAAAGGTCCAGTGTGT GAAATCTTGAAGAAGAAATTGGACATCCAGGCTCTGCGGGGGGTCTCCCTCAGCACGCGACAGGACGACT TCTTCATCCTCCAAGAGGATGCCGCCGACAGCTTCCTGGAGAGCGTCTTCAAGACCGAGTTTGTCAGCCT TCTGTGCAAGCGCTTCGAGGAGGCGACGCGGAGGCCCCTGCCCCTCACCTTCAGCGACACACTACAGTTT CGGGTGAAGAAGGAGGGCTGGGGCGGTGGCGGCACCCGCAGCGTCACCTTCTCCCGCGGCTTCGGCGACT TGGCAGTGCTCAAGGTTGGCGGTCGGACCCTCACGGTCAGCGTGGGCGATGGGCTGCCCAAGAACTCCAA GCCTACCGGAAAGGGATTGGCCAAGGGTAAACCTCGGAGGTCGTCCCAAGCCCCTACCCGGGCGGCCCCT 
GGCGCCCCCCAAGGCATGGATCGAAATGGGGCCCCCCTCTGCCCACAGGGGGGGGCCCCCTGCCCCCTGG AGAAATTCATTTGGCCCAGGGGGCACCCACAGGCCTCCCCGGCCCTCCGTCCACATCCCTGGGATGCCAG CAGACGACCCCGGGCACGTCCGCCCTCAGAGCACAACACAGAATTCCTCAACGTGCCTGACCAGGGGATG GCCGGCATGCAGAGGAAGCGCAGCGTGGGGCAACGGCCAGTGCCTGTGGGCCGACCCAAGCCCCAGCCTC GGACACATGGTCCCAGGTGCCGGGCCCTATACCAGTACGTGGGCCAAGATGTGGACGAGCTGAGCTTCAA CGTGAACGAGGTCATTGAGATCCTCATGGAAGATCCCTCGGGCTGGTGGAAGGGCCGGCTTCACGGCCAG GAGGGCCTTTTCCCAGGAAACTACGTGGAGAAGATCTGAGCTGGGCCCTGGGATACTGCCTTCTCTTTCG CCCGCCTATCTGCCTGCCGGCCTGGTGGGGAGCCAGGCCCTGCCAATGAAAGCCTCGTTTACCTGGGCTG CAATAGCCTAAAAGTCCAATCCTTTGGCCTCCAGTCCTTGCCCAGGCCCTGGGTCACCAGGTCACTGGTG CAGCCCCCGCCCCTGGGCCCTGGTTTTCCTCCAACATCACACCTGCTGCCCATTGTCCAAAACTGTGTGT GTCAAAGGGGACTAACAGCAGAATTTACCTCCCAACTGCCATGTGATTAAGAAATGGGTCTTGAGTCCTG TGCTGTTGGCAAAGTTCCAGGCACAGTTGGGGAGGGGGGGCCGGAATCCGC FASTA File Format

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
1) The description line starts with a greater-than symbol (">").
2) The word immediately following the ">" is the "ID" (name) of the sequence; the rest of the line is the description. Both the "ID" and the description are optional.
3) All lines of text should be shorter than 80 characters.
4) A sequence ends at the end of the file, or when another line begins with ">" and the next sequence starts.
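The rules above can be turned into a small writer. This is a sketch only: `format_fasta` and its `width` parameter are hypothetical names chosen for illustration, and the default of 70 characters per line is an assumption that simply stays under the 80-character limit.

```python
def format_fasta(seq_id, description, sequence, width=70):
    """Return one FASTA record as text: a '>' header line followed by
    sequence lines of at most `width` characters each."""
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79 characters")
    # The ID and the description are both optional; join whichever are present.
    header = ">" + " ".join(part for part in (seq_id, description) if part)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


peptide = "HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS"
record = format_fasta("Example2", "synthetic peptide", peptide, width=20)
print(record, end="")
```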

The following example contains two protein sequences (Example1, Example2):
>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
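A file holding several records, like the example above, can be split back into individual sequences by watching for lines that begin with ">". The parser below is a minimal sketch under that assumption; `parse_fasta` is an illustrative name, and the second record in the demo text is hypothetical.

```python
def _make_record(header, chunks):
    """Split a header into (ID, description) and join the sequence lines."""
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description, "".join(chunks)


def parse_fasta(text):
    """Return a list of (ID, description, sequence) tuples from FASTA text."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ">" line closes the previous record, if there was one.
            if header is not None:
                records.append(_make_record(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


EXAMPLE = """>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIW
AAELDRYKLVKTNCSNVS
>demo_dna hypothetical record for illustration
AGCTCGTGTAGACCATTC
"""

for seq_id, description, sequence in parse_fasta(EXAMPLE):
    print(seq_id, "|", description, "|", len(sequence), "residues")
```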

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions:
1) Lower-case letters are accepted and are mapped into upper-case.
2) A single hyphen or dash can be used to represent a gap of indeterminate length.
3) In amino acid (protein) sequences, U and * are acceptable letters.
4) “N” stands for an unknown nucleic acid residue and “X” for an unknown amino acid residue.
5) mRNA is often listed as cDNA, with the U replaced by T.

The nucleic acid codes supported are:
A --> adenosine
C --> cytidine
G --> guanosine
T --> thymidine
U --> uridine
R --> G A (purine)
Y --> T C (pyrimidine)
K --> G T (keto)
M --> A C (amino)
S --> G C (strong)
W --> A T (weak)
B --> G T C
D --> G A T
H --> A C T
V --> G C A
N --> A G C T (any)
“-” --> gap of indeterminate length
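The ambiguity codes can be handled in software by mapping each letter to the set of bases it stands for. The sketch below assumes exactly the codes listed above; `normalize` and `matches` are hypothetical helper names, and converting U to T follows the cDNA convention in point 5.

```python
# Each IUPAC nucleotide code and the bases it can represent.
IUPAC_NUCLEOTIDE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "-": "-",
}


def normalize(seq):
    """Upper-case the sequence, write U as T, and reject unknown letters."""
    seq = seq.upper().replace("U", "T")
    bad = sorted(set(seq) - set(IUPAC_NUCLEOTIDE))
    if bad:
        raise ValueError("not a nucleic acid code: " + ", ".join(bad))
    return seq


def matches(pattern, seq):
    """True if every base of `seq` is allowed by the code at the same
    position of `pattern` (both must have the same length)."""
    pattern, seq = normalize(pattern), normalize(seq)
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC_NUCLEOTIDE[code] for code, base in zip(pattern, seq))


print(matches("AGYN", "agca"))  # Y allows C or T, N allows any base
print(matches("AGR", "AGT"))    # R allows only A or G
```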

For those programs that use amino acid (protein) query sequences (e.g. BLASTP and TBLASTN), the accepted amino acid codes are:
A --> alanine
B --> aspartate or asparagine
C --> cysteine
D --> aspartate
E --> glutamate
F --> phenylalanine
G --> glycine
H --> histidine
I --> isoleucine
K --> lysine
L --> leucine
M --> methionine
N --> asparagine
P --> proline
Q --> glutamine
R --> arginine
S --> serine
T --> threonine
U --> selenocysteine
V --> valine
W --> tryptophan
Y --> tyrosine
Z --> glutamate or glutamine
X --> any
“*” --> translation stop
“-” --> gap of indeterminate length
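A lookup table is a simple way to check a protein query before submitting it. The sketch below is illustrative only: the helper names are hypothetical, and B and Z are treated as the IUPAC ambiguity codes (aspartate or asparagine, glutamate or glutamine).

```python
AMINO_ACID_NAMES = {
    "A": "alanine", "B": "aspartate or asparagine", "C": "cysteine",
    "D": "aspartate", "E": "glutamate", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine", "K": "lysine",
    "L": "leucine", "M": "methionine", "N": "asparagine", "P": "proline",
    "Q": "glutamine", "R": "arginine", "S": "serine", "T": "threonine",
    "U": "selenocysteine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamate or glutamine", "X": "any",
    "*": "translation stop", "-": "gap of indeterminate length",
}


def invalid_letters(protein):
    """Return the letters in `protein` that are not in the table above."""
    return sorted({ch for ch in protein.upper() if ch not in AMINO_ACID_NAMES})


def spell_out(protein):
    """Return the full name for each one-letter code, N-terminus first."""
    return [AMINO_ACID_NAMES[ch] for ch in protein.upper()]


print(invalid_letters("IPKERYRGQIESIWA"))
print(spell_out("ipk"))
```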


FASTA File Format
>gi| |emb|CAA | myosin-IF [Homo sapiens]
QEKLTSRKMDSRWGGRSESINVTLNVEQAAYTRDALAKGLYARLFDFLVEAINRAMQKPQEEYSIGVLDI
YGFEIFQKNGFEQFCINFVNEKLQQIFIELTLKAEQEEYVQEGIRWTPIQYFNNKVVCDLIENKLSPPGI
MSVLDDVCATMHATGGGADQTLLQKLQAAVGTHEHFNSWSAGFVIHHYAGKVSYDVSGFCERNRDVLFSD
LIELMQSSDQAFLRMLFPEKLDGDKKGRPSTAGSKIKKQANDLVATLMRCTPHYIRCIKPNETKHARDWE
ENRVQHQVEYLGLKENIRVRRAGFAYRRQFAKFLQRYAILTPETWPRWRGDERQGVQHLLRAVNMEPDQY
QMGSTKVFVKNPESLFLLEEVRERKFDGFARTIQKAWRRHVAVRKYEEMREEASNILLNKKERRRNSINR
NFVGDYLGLEERPELRQFLGKKERVDFADSVTKYDRRFKPIKRDLILTPKCVYVIGREKMKKGPEKGPVC
EILKKKLDIQALRGVSLSTRQDDFFILQEDAADSFLESVFKTEFVSLLCKRFEEATRRPLPLTFSDTLQF
RVKKEGWGGGGTRSVTFSRGFGDLAVLKVGGRTLTVSVGDGLPKNSKPTGKGLAKGKPRRSSQAPTRAAP
GAPQGMDRNGAPLCPQGGAPCPLEKFIWPRGHPQASPALRPHPWDASRRPRARPPSEHNTEFLNVPDQGM
AGMQRKRSVGQRPVPVGRPKPQPRTHGPRCRALYQYVGQDVDELSFNVNEVIEILMEDPSGWWKGRLHGQ
EGLFPGNYVEKI

TinySeq XML
X
Homo sapiens
Homo sapiens partial mRNA for myosin-IF
2711
CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGT……

FASTA File Format… (note: U = T)
>gi|1234|my name from genetic code in DNA
ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA
>gi|1234|my name translated into protein
MICHAEL-KANE*
Each group of three bases (a codon) is translated into one amino acid letter from the one-letter codes listed earlier. The gap “-” carries through unchanged, and the stop codon TAA is written as “*”.
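The translation in this example can be reproduced with a codon lookup. The sketch below assumes the standard genetic code and reading frame 1; the `translate` function is an illustrative name, and passing a single "-" through as a gap is a simplification chosen to match the example, not a general rule for gapped sequences.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA (or mRNA) string codon by codon, starting at the
    first base. Stop codons become '*', a lone '-' is copied as a gap, and
    codons that cannot be looked up become 'X'. Translation does not halt
    at a stop codon; any bases after it are translated as well."""
    dna = dna.upper().replace("U", "T")
    protein = []
    i = 0
    while i < len(dna):
        if dna[i] == "-":
            protein.append("-")
            i += 1
            continue
        codon = dna[i:i + 3]
        if len(codon) < 3:
            break  # ignore an incomplete trailing codon
        protein.append(CODON_TABLE.get(codon, "X"))
        i += 3
    return "".join(protein)


print(translate("ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA"))
```

The same function applied to the first twelve bases of the myosin-IF mRNA, `CAGGAGAAGCTG`, gives the first four residues of the corresponding protein record.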

Where do we get DNA sequence information?
DNA sequencing methods convert biological/bioanalytical data into sequence information. Automated, high-throughput sequencing centers use robotics and information systems to COMPLETELY automate DNA sequencing, preliminary identification, and publishing.

DNA Sequencing (old method)
Template: 5’-AAACCAGGCCGATAAGGTACTACACGAAAAAAA-3’
Primer: TTTTTTT, paired with the run of A’s at the 3’ end of the template.
Reagents: dATP, dCTP, dTTP, dGTP + 32P-labeled ddATP, ddCTP, ddTTP, ddGTP (gel lanes A, G, C, T).
Step 1. Extend the complementary sequence using “free” nucleotides with limiting amounts of radioactive “terminating” nucleotides.
Step 2. Run the products out on an electrophoresis gel.
Step 3. Place the gel against radiographic film and develop.
Extension products, drawn aligned under the template (each is one nucleotide shorter than the one above):
TTTGGTCCGGCTATTCCATGATGTGCTTTTTTT
 TTGGTCCGGCTATTCCATGATGTGCTTTTTTT
  TGGTCCGGCTATTCCATGATGTGCTTTTTTT
   GGTCCGGCTATTCCATGATGTGCTTTTTTT
    GTCCGGCTATTCCATGATGTGCTTTTTTT
     TCCGGCTATTCCATGATGTGCTTTTTTT
      CCGGCTATTCCATGATGTGCTTTTTTT
       CGGCTATTCCATGATGTGCTTTTTTT
        GGCTATTCCATGATGTGCTTTTTTT
         GCTATTCCATGATGTGCTTTTTTT
          CTATTCCATGATGTGCTTTTTTT
           TATTCCATGATGTGCTTTTTTT
            ATTCCATGATGTGCTTTTTTT
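The logic of reading a sequence from the gel can be sketched in a few lines of Python. This is an idealized model, not a simulation of the chemistry: it assumes one reaction (lane) per terminating nucleotide, a 7-base primer paired with the 3’ end of the template, and exactly one terminated product of every possible length. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_lanes(template, primer_len):
    """Return {terminating base: [fragment lengths]} for a template written
    5' to 3'. The new strand grows 5' to 3' from the primer, so it reads the
    template backwards, from its 3' end toward its 5' end."""
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template))
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length in range(primer_len + 1, len(new_strand) + 1):
        # A fragment of this length ends in the dideoxy base added last.
        lanes[new_strand[length - 1]].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest fragment to the longest, noting which
    lane each band is in. The result is the newly made strand, 5' to 3',
    starting just after the primer."""
    band_to_base = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(band_to_base[length] for length in sorted(band_to_base))


template = "AAACCAGGCCGATAAGGTACTACACGAAAAAAA"
lanes = sanger_lanes(template, primer_len=7)
read = read_gel(lanes)
print("read from gel (new strand, 5' to 3'):", read)
```

The read is the complement of the template, so taking its reverse complement recovers the template sequence upstream of the primer site.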

DNA Sequencing (new method)

GGATCCTGCAAGGAGGGATACAAATTACATACATTTGTCAAAACCCACAGCATGTTGACCACCAGGAGGAG ACCCCATGTGACTCCAGGACCCTGGTTGATAACAACGTATCGAGATTCCTCACATGGAACCAGTGCGCTCC TGTGGTGGAGGGTGTACCTGTGTCAGGGCAGGGGGTACGTGGACATTTTCTGCAGTTTTTGATCAATTTT GCAATGAACTAAATCTGTGGTATAAAAATAAAGTCTATTAAAAGAATCCAAGGCTCCCTCTCATCTCACGATA AGATAAAGTCCCCATCCATTTTACTCCTCTCAGCCCTGGAGAAAGGAGAGGCCAGGTCCCACCACCTTCC ACCAGCATGGACCCCCAGTCCAGACCCCACGCCTTTTCTCAGCATCCTCAGACCAGCAGGACTTGCAGCA ATGGGGAATTAGGCACCTGACTTCTCCTTCATCTACCTTTGGCTGGGGGCCTCCAGCCTTGACCTTCGCT CTGAGAGTCTCAGGCAGGTCCAGAGCCAGTTCTCCCATGACGTGATATGTTTCCAGAGCAGGTTCCTGGG TGAGATAAAAGGATTTGGGCTGAACAGGGTGGAGGGAGCATTGGAATGGCACTCAGGGCAAAGGCAGAG GTGTGCGTGGCAGCGCCCTGGCTGTCCCTGCAAAGGGCACGGGCACTGGGCACTAGAGCCGCTCGGGC CCCTAGGACGGTGCTGCCGTTTGAAGCCATGCCCCAGCATCCAGGCAACAGGTGGCTGAGGCTGCTGCA GATCTGGAGGGAGCAGGGTTATGAGCACCTGCACCTGGAGATGCACCAGACCTTCCAGGAGCTGGGGCC CATTTTCAGGTAAAGCCCTCCCTGGCCCTCGCTGGGAACACCCAGATCCCTGCCCCTGCTGCCCAGGAC CCTGCCAGGCACTCAGCACTGCCATTCCCAGCAGGTCCCGGCACTCTGCATCCTTTGGAGGATGGGGAA GGAGTGCAGCACATGCTGGTCTGTGGTGCTGCCAGGGCAGGGGATAGTGCAGAGAAAACCCCAGCTCAC TGCAGAGAGGGCAGGACTCAGAAGCACTAAAGTTGAAAGGTTCCAGGGAGCCAGCAGGAGGGCTTTAGC TGTGAAGCCGCTAATCCAGGAGCAGGGAGGGTGGACAGGAGACACTTTGGATTGGGACTGCAGGGTGG GGCCACGAGGGACATGACCCCGTCCAGCAGGGCCTCCTGCTTGGCCCCACAGGTACAACTTGGGAGGA CCACGCATGGTGTGTGTGATGCTGCCGGAGGATGTGGAGAAGCTGCAACAGGTGGACAGCCTGCATCCC TGCAGGATGATCCTGGAGCCCTGGGTGGCCTACAGACAACATCGTGGGCACAAATGTGGCGTGTTCTTGT TGTAAGCGGCGAGTTGGGAGCTGAGAGCTGGGAGCAGGGTGGGCAGCCTGGGTGTAGGGGGGAGGCG AGAGAGGTAGGACCCAAAAGCACATCTGCCCTGGGCCCCTGTGGTGGGCAGTGAGGGTGAGCACCCGG CCCAGAGGACGGCCATCCTGTGGGGTCGCGTCTGCACTGTGGGTTGGGGAAGCAGGGCGGTGGTGGA GAAATGGGCACGGGCACCTCTGCAGAGAAGACGCAGAGCAATGAGCCCTTCTGTGTAGTGAGAACCCGC TCTGCACCAACCTCGGCGGCTGCTTTCTCTTGCGGTCTGGGGACTGTCCTTCCCATAGGTCAGAAAACTG AGGCCCTGAGAAGGGGACTTCCACTGGCCCAGGTCACAGGCTGAGTGCTGAGCCTGGTGTTCGCCGGG GCCGCAGCCTCCCTCAGGGCGCTCAGGGTCCCTGCAGTCCTGGCAAACCTTCCTGATGGGGACAGTCC GGGGCAGGAGGCAGGTGGGGACGCAGGTGGCTGGTGGTTCCGTTGTTCTCAGAAGCAAGGCACAAGGT 
GGGGCGGTTGATGGCACTGGGGAGGATGTTTCCTGGCCCGTGGAGAGGGTGGCGCCTGGTCAGGTGG GCAGGGAGAGGCTGATGCTTGGAGTCGGTCACCTGCAGGGATGTTGTCATTAGGACGGGGGAAGGACT GGATGAGGATGTCACAGTGGTGACAGCCCCCACTCCATGGTAGGAAGGGAACGCTATTGGGAATAGTGG GGTTTAGGTAAAAGGGCACCCGTGGGTCGGGGCCTTCACTGAGGCTGGCCTATAGATGACATCTGGGAG AGAGTCAGGACCCAGGAAGGCAGGTCCAGGA