Introduction to Bioinformatics, Fall 2008. Administration: Adi Doron, Nimrod Rubinstein, Dudu Burstein.


1 Introduction to Bioinformatics Fall 2008

2 Administration  Adi Doron  Nimrod Rubinstein  Dudu Burstein  Reception hours: by appointment Britania 405,

3 Course Website

4 Exercises  Each student participates once in 2 weeks: Sunday 16:00-18:00 Monday 12:00-14:00 Monday 14:00-16:00 Computer classroom Sherman 03

5 Requirements  Exam – 80% of final grade  Assignments – 20% of final grade (Compulsory) Assignments include class and home works: Assignments include class and home works: Class works are planned to be completed during the exercise. They should be mailed to the TA. They will be checked but not graded.Class works are planned to be completed during the exercise. They should be mailed to the TA. They will be checked but not graded. Home works should be handed in the following exercise (2 weeks after the hand out date). They will be checked and graded.Home works should be handed in the following exercise (2 weeks after the hand out date). They will be checked and graded.

6 Goals  To familiarize the students with research topics in bioinformatics, and with bioinformatic tools  The emphasis will be on tools and their use Prerequisites  Familiarity with topics in molecular biology (cell biology and genetics)  Basic familiarity with computers & internet

7 BIOINFORMATIC DATABASES

8 What’s in a database?
- Sequences: genes, proteins, etc.
- Full genomes
- Annotation, i.e. information about the gene/protein:
  - function
  - cellular location
  - chromosomal location
  - introns/exons
  - protein structure
  - phenotypes, diseases
- Publications

9 NCBI and Entrez  One of the largest and most comprehensive databases belonging to the NIH – national institute of health (USA)  Entrez is the search engine of NCBI  Search for : genes, proteins, genomes, structures, diseases, publications and more. 

10 Search for published papers  Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells “, J Virol May;80(9):

11 Use fields!
- Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]
- For the full list of field tags, go to Help -> Search Field Descriptions and Tags
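A field-tagged query is just each term followed by its tag in square brackets, joined by Boolean operators. A minimal sketch that rebuilds the query from this slide; the function name `tagged_query` is hypothetical and it supports only a single operator per call.

```python
def tagged_query(*terms: tuple[str, str], op: str = "AND") -> str:
    """Join (term, field_tag) pairs into one Entrez query string.

    Each pair becomes term[TAG]; all pairs are combined with one
    Boolean operator (AND by default).
    """
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported Boolean operator: {op}")
    return f" {op} ".join(f"{term}[{tag}]" for term, tag in terms)


query = tagged_query(
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # word in the title
    ("2006", "DP"),          # date of publication
    ("J virol", "TA"),       # journal title abbreviation
)
print(query)  # Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]
```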

12 Exercise  Retrieve all publications in which the first author is: Pe'er I and the last author is: Shamir R

13 Using Limits: Retrieve the publications of Friedman N in the journals Bioinformatics and Journal of Computational Biology from the last 5 years.
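The same restriction can also be typed directly as a query instead of using the Limits page. This sketch assumes an OR group of journal abbreviations tagged [TA] and a year range written as START:END[DP]; "J Comput Biol" as the journal's abbreviation and the explicit `end_year` parameter (instead of today's date) are our assumptions, and the function name is hypothetical.

```python
def limited_query(author: str, journals: list[str], end_year: int, years: int = 5) -> str:
    """Author query restricted to a set of journals and a range of publication years."""
    # Parentheses keep the journal OR-group separate from the surrounding ANDs.
    journal_group = " OR ".join(f"{j}[TA]" for j in journals)
    # Inclusive range: 5 years ending in 2008 means 2004..2008.
    start_year = end_year - years + 1
    return f"{author}[AU] AND ({journal_group}) AND {start_year}:{end_year}[DP]"


print(limited_query("Friedman N", ["Bioinformatics", "J Comput Biol"], end_year=2008))
# Friedman N[AU] AND (Bioinformatics[TA] OR J Comput Biol[TA]) AND 2004:2008[DP]
```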

14 Google Scholar

15

16 NCBI gene & protein databases: GenBank  GenBank is an annotated collection of all publicly available DNA sequences.  Holds 65 billion bases (Oct. 2007)  GenPept is a database of translated coding sequences from GenBank

17 Searching for human CD4 using Entrez (search demonstration)

18

19 Using Field Descriptions, Qualifiers, and Boolean Operators
- Cd4[GENE] AND human[ORGN], or equivalently Cd4[gene name] AND human[organism]
- List of field codes:
- Boolean operators: AND, OR, NOT
- Note: do not use the Protein Name field [PROT]; use GENE only!
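It helps to think of each search term as returning a set of record IDs: AND keeps records found by both terms (intersection), OR keeps records found by either (union), and NOT removes the second set from the first (difference). The IDs below are invented for illustration.

```python
# Hypothetical record IDs returned by three single-term searches.
cd4_gene = {101, 102, 103, 104}
human = {102, 103, 200, 201}
mouse = {104, 300}

print(sorted(cd4_gene & human))  # Cd4[GENE] AND human[ORGN]  -> [102, 103]
print(sorted(human | mouse))     # human[ORGN] OR mouse[ORGN] -> [102, 103, 104, 200, 201, 300]
print(sorted(cd4_gene - mouse))  # Cd4[GENE] NOT mouse[ORGN]  -> [101, 102, 103]
```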

20

21 RefSeq  REFSEQ: sub-collection of NCBI databases with only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein products)

22

23 An explanation of GenBank records

24 Accession Numbers
- GenBank/EMBL: two letters followed by six digits (e.g. AY), or one letter followed by five digits (e.g. U12345)
- GenPept (amino-acid translations of GenBank): three letters and five digits (e.g. AAA12345)
- RefSeq: accessions are distinguished from GenBank accessions by their distinct prefix of two letters plus an underscore: NM_ for nucleotide records, NP_ for protein records
- SWISS-PROT (another protein database): six characters, by position: 1 [O,P,Q] | 2 [0-9] | 3-5 [A-Z,0-9] | 6 [0-9], e.g. P12345 and Q9JJS7
- PDB (Protein Data Bank, the structure database): one digit followed by three alphanumeric characters, e.g. 1hxw
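The formats above can be written as regular expressions. A sketch that returns every database whose format an accession matches; the patterns cover only the formats listed on this slide (UniProt, for instance, also uses other formats), and the names `PATTERNS` and `candidate_databases` are hypothetical.

```python
import re

# Accession formats as listed on the slide; real databases have more variants.
PATTERNS = {
    "GenBank/EMBL": re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}"),
    "GenPept": re.compile(r"[A-Z]{3}\d{5}"),
    "RefSeq": re.compile(r"[A-Z]{2}_\d+"),
    "Swiss-Prot": re.compile(r"[OPQ]\d[A-Z0-9]{3}\d"),
    "PDB": re.compile(r"\d[A-Za-z0-9]{3}"),
}


def candidate_databases(accession: str) -> list[str]:
    """Return every database whose accession format matches the whole string."""
    return [db for db, pattern in PATTERNS.items() if pattern.fullmatch(accession)]


for acc in ["U12345", "AAA12345", "Q9JJS7", "P01730", "1G9M"]:
    print(acc, candidate_databases(acc))
```

The formats overlap: P01730 fits both the one-letter-plus-five-digits pattern and the Swiss-Prot pattern, so the format alone cannot always tell the databases apart.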

25 Swissprot  A protein sequence database which strives to provide a high level of annotation: * the function of a protein * domains structure * post-translational modifications * variants  One entry for each protein

26

27 GenBank vs. Swiss-Prot: GenBank results and Swiss-Prot results

28 Downloading & FASTA format
FASTA format:
>sp|P01730|CD4_HUMAN T-cell surface glycoprotein CD4 precursor
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIK
ILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQL
LVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSG
TWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWW
QAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLA
LEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWV
LNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
Save accession numbers for future use (makes searching quicker): RefSeq: NP_ ; Swiss-Prot: P01730
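A FASTA file is a sequence of records, each a '>' header line followed by sequence lines. A minimal parser sketch, plus a helper for the `db|accession|entry_name description` header layout seen in the Swiss-Prot record above; both function names are hypothetical, and the example uses only the first two sequence lines of the CD4 entry.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    A header line starts with '>'; the lines that follow it, up to the
    next header, are concatenated into one sequence.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_uniprot_header(header: str) -> dict[str, str]:
    """Split a 'db|accession|entry_name description' header."""
    ids, _, description = header.partition(" ")
    db, accession, entry_name = ids.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


example = """>sp|P01730|CD4_HUMAN T-cell surface glycoprotein CD4 precursor
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIK
ILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQL"""

header, seq = parse_fasta(example)[0]
print(split_uniprot_header(header)["accession"])  # P01730
print(len(seq))  # 120
```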

29

30 PDB: Protein Data Bank  Main database of 3D structures.  Includes ~47,000 entries (proteins, nucleic acids, others).  Proteins organized in groups, families etc.  Is highly redundant. 

31 CD4 in complex with gp120 (PDB ID 1G9M)
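A structure such as 1G9M can be downloaded in the legacy PDB text format, where each atom is stored on a fixed-width ATOM (or HETATM) line. A sketch that reads the chain, residue and x, y, z coordinates from those columns and counts atoms per chain; the sample line is invented rather than taken from 1G9M, and the function names are hypothetical.

```python
from collections import Counter


def parse_atom(line: str) -> dict:
    """Extract fields from one fixed-width ATOM/HETATM record (PDB format columns)."""
    return {
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "residue": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],               # column 22: chain identifier
        "res_seq": int(line[22:26]),     # columns 23-26: residue number
        # columns 31-38, 39-46, 47-54: x, y, z in angstroms
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def atoms_per_chain(pdb_text: str) -> Counter:
    """Count ATOM records per chain identifier (HETATM lines are skipped)."""
    return Counter(
        parse_atom(line)["chain"]
        for line in pdb_text.splitlines()
        if line.startswith("ATOM  ")
    )


sample = "ATOM      1  N   ASN A   1      10.000  20.000  30.000"
print(parse_atom(sample)["xyz"])  # (10.0, 20.0, 30.0)
print(atoms_per_chain(sample))    # Counter({'A': 1})
```

In a complex such as CD4 with gp120, each molecule is usually a separate chain, so per-chain counts give a quick overview of the file's contents.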

32 Model organisms have their own independent databases. Organism-specific example: the HIV database

33 Genecards  All in one database of human genes (a project by Weizmann institute)  Attempts to integrate as many as possible databases, publications and all available knowledge 

34

35 Summary  General and comprehensive databases: NCBI, EMBL, DDBJ NCBI, EMBL, DDBJ  Genome specific databases: ENSEMBL, UCSC genome browser ENSEMBL, UCSC genome browser  Highly annotated databases: Human genes Human genes GenecardsGenecards Proteins: Proteins: Swissprot, RefseqSwissprot, Refseq Structures: Structures: PDBPDB

36 The MOST important of all: 1. Google (or any search engine)

37 And always remember: 2. RT(F)M – Read the manual!!

38 Help!  Read the Help section  Read the FAQ section  Google the question!