LSM2241 P1 & P2 – Extra Discussion Questions. Features of major databases (PubMed and NCBI Protein Db)

Presentation transcript:

1 LSM2241 P1 & P2 – Extra Discussion Questions

Features of major databases (PubMed and NCBI Protein Db) 2

Anatomy of PubMed Db 3

Epub ahead of print and journal impact factor 4
How to get the impact factor of any journal:
1) Direct source – Web of Science database (free for NUS students)
2) Indirect sources, e.g. blogs, sites, etc. (do a Google search)

Anatomy of a PubMed record 5 Extra information compared to slide 3

Demo on downloading articles 6
See AccessingOnlineJournalArticles.ppt for details

Anatomy of a Protein Db 7

8 Accession numbers and GenInfo Identifiers
Popular data sources:
- dbj – DDBJ (DNA Data Bank of Japan database)
- emb – The European Molecular Biology Laboratory (EMBL) database
- prf – Protein Research Foundation database
- sp – SwissProt
- gb – GenBank
- pir – Protein Information Resource
[Figure labels on an example NM_ record: Accession, Version, GI (or GenInfo Identifier)]

9 Why do we need an accession number and a GI for one record?
1) What is the difference between an accession number and a GI?
2) Why do we need both when both seem to be accession numbers?

10 Why do we need accession numbers and GIs?
Q1) Which revision will NCBI show if you were to search by the accession only, without the version number?
[Diagram: Sequence_v1, Sequence_v2 and Sequence_v3 of one NM_ record, separated by two sequence updates; each revision is labelled with its GI and its version.]

11 Accession numbers
- The unique identifier for a sequence record.
- An accession number applies to the complete record.
- Accession numbers do not change, even if information in the record is changed at the author's request.
- Sometimes, however, an original accession number might become secondary to a newer accession number, if the authors make a new submission that combines previous sequences, or if for some reason a new submission supersedes an earlier record.

12 GenInfo Identifiers
- GenInfo Identifier (GI): sequence identification number.
- If a sequence changes in any way, a new GI number will be assigned.
- A separate GI number is also assigned to each protein translation within a nucleotide sequence record.
- A new GI is assigned if the protein translation changes in any way.
- GI sequence identifiers run parallel to the new accession.version system of sequence identifiers.

13 Version
- A nucleotide sequence identification number that represents a single, specific sequence in the GenBank database.
- If there is any change to the sequence data (even a single base), the version number will be increased, e.g., U → U , but the accession portion will remain stable.
- The accession.version system of sequence identifiers runs parallel to the GI number system, i.e., when any change is made to a sequence, it receives a new GI number AND an increase to its version number.
- A Sequence Revision History tool is available to track the various GI numbers, version numbers, and update dates for sequences that appeared in a specific GenBank record.
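The accession.version rule on this slide can be illustrated with a minimal Python sketch. The identifiers `AB000001.1` and `AB000001.2` are hypothetical placeholders, not real records, and the helper name `split_accession_version` is an assumption made for this example.

```python
from typing import NamedTuple, Optional


class SeqId(NamedTuple):
    accession: str          # stable part, e.g. "AB000001"
    version: Optional[int]  # increases with every sequence change; None if absent


def split_accession_version(identifier: str) -> SeqId:
    """Split 'ACCESSION.VERSION' into its parts; version is None when missing."""
    accession, dot, version = identifier.strip().partition(".")
    return SeqId(accession, int(version) if dot and version.isdigit() else None)


# Hypothetical identifiers, used only for illustration.
old = split_accession_version("AB000001.1")
new = split_accession_version("AB000001.2")
bare = split_accession_version("AB000001")

same_record = old.accession == new.accession    # the accession portion stays stable
sequence_changed = old.version != new.version   # the version differs after an update
```

Comparing the two parts separately shows whether two identifiers point to the same record and whether the sequence was revised between them.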

14 Anatomy of a Protein Db record

15 Fasta Sequence

Fasta Format
Text-based format for representing nucleic acid sequences or peptide sequences (single-letter codes). Easy to manipulate, and easy for programs to parse.

>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

Each record consists of a description line/row (starting with ">") followed by one or more sequence data line(s).

Fasta Format (cont.)
Begins with a single-line description, followed by lines of sequence data.
Description line
– Distinguished from the sequence data by a greater-than (">") symbol.
– The word following the ">" symbol in the same row is the identifier of the sequence.
– There should be no space between the ">" and the first letter of the identifier.
– Keep the identifier short and clear; some old programs accept identifiers of at most 10 characters. For example: >gi| |Human or >HumanP53
Sequence line(s)
– Ensure that the sequence data starts in the row following the description row (be careful of the word wrap feature).
– The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence.
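The rules above are enough to read a FASTA file by hand. The following is a minimal Python sketch, not a replacement for an established parser; the function name `parse_fasta` is an assumption made for this example, and the input is a shortened copy of the sequences shown on the slides.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each identifier (first word after '>') to its joined sequence."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):          # description line starts a new record
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = ""         # a repeated identifier would overwrite
        elif current is not None:         # sequence data line(s)
            records[current] += line
    return records


example = """>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
"""

records = parse_fasta(example.splitlines())
for identifier, sequence in records.items():
    print(identifier, len(sequence))
```

Wrapped sequence lines are joined into one string per record, which is why an accidental word wrap inside the description line would corrupt the record.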

Amino acids 18

IUPAC One Letter Amino Acid Code

Code  Amino acid                           Mnemonic
A     Alanine
B     Asx (asparagine or aspartic acid)
C     Cysteine
D     Aspartic acid                        Aspar(D)ic acid
E     Glutamic acid
F     Phenylalanine                        (F)enylalanine
G     Glycine
H     Histidine
I     Isoleucine
J     Xle (leucine or isoleucine)
K     Lysine
L     Leucine
M     Methionine
N     Asparagine                           Asparagi(N)e
O     Pyrrolysine (22nd, Pyl)              Pyrr(O)lysine
P     Proline
Q     Glutamine                            (Q)lutamine
R     Arginine                             (R)ginine
S     Serine
T     Threonine
U     Selenocysteine (21st, Sec)
V     Valine
W     Tryptophan                           T(W)ptophan
X     Xaa (unspecified or unknown)
Y     Tyrosine                             T(Y)rosine
Z     Glx (glutamine or glutamic acid)

Note

Amino acid                          Three-letter code   Single-letter code
Asparagine or aspartic acid         Asx                 B
Glutamine or glutamic acid          Glx                 Z
Leucine or isoleucine               Xle                 J
Unspecified or unknown amino acid   Xaa                 X
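The one-letter codes can be kept in a small lookup table, sketched below in Python. The three-letter codes for the 20 standard residues are the usual IUPAC abbreviations and are not listed on the slides; the names `ONE_TO_THREE`, `to_three_letter` and `has_ambiguity` are assumptions made for this example.

```python
# One-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "U": "Sec", "O": "Pyl",                           # 21st and 22nd amino acids
    "B": "Asx", "Z": "Glx", "J": "Xle", "X": "Xaa",   # ambiguity codes
}

AMBIGUOUS = set("BZJX")


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence to hyphen-joined three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


def has_ambiguity(sequence: str) -> bool:
    """True if the sequence contains any of the ambiguity codes B, Z, J or X."""
    return any(residue in AMBIGUOUS for residue in sequence.upper())


print(to_three_letter("MTEI"))
print(has_ambiguity("MTEIX"))
```

A letter outside the table raises a `KeyError`, which is a simple way to catch characters that are not valid residue codes.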

Advice
We highly recommend that you memorize the amino acid codes and their structures (covered in lectures on 3D structures). Memorizing the codes, and in particular the structures, will be very useful for this module and other modules, especially for research purposes. It is not compulsory that you memorize these for this module.

Features of major databases (Gene Db) 23

24 Anatomy of Gene Db

25 Anatomy of a Gene Db record

A section of a Gene Db record: Reference Sequences 26
[Figure labels: mRNA accession number; protein accession number]

Questions 27

A) Problem Scenario 28
Mr. Tan Yong Liang, Benjamin has just joined Prof. Tan Tin Wee’s lab to do his PhD. He is to continue the project done by Dr. Asif M. Khan, who recently graduated from Prof. Tan’s lab with a PhD. To better understand Dr. Khan’s project, Prof. Tan asked Benjamin to read all the papers that Dr. Khan published. Benjamin, being new to bioinformatics, needs your help in finding the papers. Can you help him answer the following questions?

A) Questions 29
Q1. Which database(s) should he search?
Q2. Help him formulate his search query based on the following available information:
1. Corresponding authors: Vladimir Brusic, Thomas J August, Tan Tin Wee
2. In one of the papers, Dr. Asif M. Khan’s name was incomplete: Asif Khan
3. Prof. August has a paper with Rosati M, which is also co-authored by someone with the same incomplete abbreviation as Dr. Khan
Q3. On the results page, you will see two tabs, “All” and “Review”. What is the difference between them?
Q4. Is PubMed comprehensive?

B) Questions 30
p53: 12 records total
cancer: 15 records total
Both terms: 5 records
p53 AND cancer: returns how many records?
p53 OR cancer: returns how many records?
p53 NOT cancer: returns how many records?
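One way to reason about these counts is plain set arithmetic. The Python sketch below assumes that each "records total" figure already includes the 5 records containing both terms; the function name `boolean_counts` is made up for this example.

```python
def boolean_counts(a_total: int, b_total: int, both: int) -> dict:
    """Record counts for 'A AND B', 'A OR B' and 'A NOT B'."""
    return {
        "AND": both,                      # records containing both terms
        "OR": a_total + b_total - both,   # inclusion-exclusion: count the overlap once
        "NOT": a_total - both,            # records with A, minus those that also have B
    }


counts = boolean_counts(a_total=12, b_total=15, both=5)
print(counts)  # {'AND': 5, 'OR': 22, 'NOT': 7}
```

AND narrows a search, OR widens it, and NOT removes every record that mentions the second term, including records that are relevant to the first.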

31 C) Questions
Q1) When you perform a search for P53 in the protein database, you observe 4 tabs on top, namely All, Bacteria, RefSeq and Related Structures. What do you think is the difference between the “RefSeq” and “All” tabs?

32 D) Questions
Q1) Using the skills you have learned and the databases that have been introduced to you, can you find out where in the p53 protein the Nuclear Localization Signal is located? i.e., what is the sequence range?
Q2) Does the entry (P04637) belong to the RefSeq database? (Hint: analyze the alphanumeric identifiers of the entry)
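The hint in Q2 can be explored with a rough format check. RefSeq accessions start with a two-letter prefix followed by an underscore (for example NM_ for mRNA and NP_ for protein). The Python sketch below tests only that shape; the pattern and the name `looks_like_refseq` are assumptions made for this example, the numeric parts of the sample identifiers are hypothetical, and a match does not confirm that a record exists.

```python
import re

# Heuristic only: two uppercase letters, an underscore, an alphanumeric body,
# and an optional ".version" suffix.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def looks_like_refseq(accession: str) -> bool:
    """True if the identifier has the general shape of a RefSeq accession."""
    return REFSEQ_PATTERN.match(accession.strip()) is not None


for identifier in ["NM_000001.1", "NP_000001", "P04637"]:
    print(identifier, looks_like_refseq(identifier))
```

An identifier without the prefix and underscore, such as P04637, fails this check, which suggests it comes from a different data source.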

Summary of items covered today 33
- Intro to Practicals – logistics
- Search strategies: exercise and discussion
- Explored basic bioinformatics resources: exercise and discussion
- Tips/Tricks to improve productivity
  - “Libproxy1” suffix shortcut
  - WizFolio