Bioinformatics for biomedicine More annotation, Gene Ontology and pathways Lecture 6, 2006-10-24 Per Kraulis

Slides:



Advertisements
Similar presentations
Genomes and Proteomes genome: complete set of genetic information in organism gene sequence contains recipe for making proteins (genotype) proteome: complete.
Advertisements

Serial Analysis of Gene Expression Velculescu, V., Zhang, L., Vogelstein, B. Kinzler, K. (1995) Science.
Proteomics Examination Yvonne (Bonnie) Eyler Technology Center 1600 Art Unit 1646 (703)
Microarray Data Analysis Day 2
Gene Ontology John Pinney
Bioinformatics for biomedicine Seminar: Sequence analysis of a favourite gene Lecture 5, Per Kraulis
Bioinformatics for biomedicine Summary and conclusions. Further analysis of a favorite gene Lecture 8, Per Kraulis
Bioinformatics for biomedicine Gene expression data, methods of analysis Lecture 7, Per Kraulis
Gene Expression Chapter 9.
Microarray analysis Golan Yona ( original version by David Lin )
COG and GO tutorial.
Biology 224 Dr. Tom Peavy Sept 27 & 29 Protein Structure & Analysis- part 2.
Bacterial Physiology (Micr430)
Proteomics: A Challenge for Technology and Information Science CBCB Seminar, November 21, 2005 Tim Griffin Dept. Biochemistry, Molecular Biology and Biophysics.
Sequence-Structure-Function Sequence Structure Function Threading Ab initio BLAST Folding: impossible but for the smallest structures Function prediction.
CISC667, F05, Lec24, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) DNA Microarray, 2d gel, MSMS, yeast 2-hybrid.
The bioinformatics of biological processes The challenge of temporal data Per J. Kraulis CMCM, Tartu University.
Modeling Functional Genomics Datasets CVM Lesson 1 13 June 2007Bindu Nanduri.
Microarrays: Theory and Application By Rich Jenkins MS Student of Zoo4670/5670 Year 2004.
Protein and Function Databases
Genomics I: The Transcriptome RNA Expression Analysis Determining genomewide RNA expression levels.
Applications of protomic Presented By: Muhammad Rizwan Roll no: Department of Bioinformatics.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
DEMO CSE fall. What is GeneMANIA GeneMANIA finds other genes that are related to a set of input genes, using a very large set of functional.
and analysis of gene transcription
with an emphasis on DNA microarrays
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES Day2: Specialized Databases Lecturer: Junaid Gamieldien, PhD
Bioinformatics for biomedicine Protein domains and 3D structure Lecture 4, Per Kraulis
MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group Week 5: Proteomics.
Ch10. Intermolecular Interactions and Biological Pathways
Automatic methods for functional annotation of sequences Petri Törönen.
Mapping protein-DNA interactions by ChIP-seq Zsolt Szilagyi Institute of Biomedicine.
Intralab Workshop - Reactome CMAP Chang-Feng Quo June 29 th, 2006.
Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.
Finish up array applications Move on to proteomics Protein microarrays.
Proteomics and annotation. Definition of proteomics Study of all the proteins in an organism Derived from genomics all the DNA in an organsim On some.
1 Bio-Trac 40 (Protein Bioinformatics) October 8, 2009 Zhang-Zhi Hu, M.D. Associate Professor Department of Oncology Department of Biochemistry and Molecular.
Verna Vu & Timothy Abreo
Bioinformatics: Theory and Practice – Striking a Balance (a plea for teaching, as well as doing, Bioinformatics) Practice (Molecular Biology) Theory: Central.
BIOINFORMATIK I UEBUNG 2 mRNA processing.
Monday, November 8, 2:30:07 PM  Ontology is the philosophical study of the nature of being, existence or reality as such, as well as the basic categories.
Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.
Protein and RNA Families
Getting Started: a user’s guide to the GO GO Workshop 3-6 August 2010.
Systems Biology through Pathway Statistics Chris Evelo BiGCaT Bioinformatics Group – BMT-TU/e & UM Diepenbeek; May
Other biological databases and ontologies. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and.
Genomics II: The Proteome Using high-throughput methods to identify proteins and to understand their function.
Motif discovery and Protein Databases Tutorial 5.
Proteomics Session 1 Introduction. Some basic concepts in biology and biochemistry.
Epidemiology 217 Molecular and Genetic Epidemiology Bioinformatics & Proteomics John Witte.
Central dogma: the story of life RNA DNA Protein.
Bioinformatics and Computational Biology
Microarray (Gene Expression) DNA microarrays is a technology that can be used to measure changes in expression levels or to detect SNiPs Microarrays differ.
Overview of Microarray. 2/71 Gene Expression Gene expression Production of mRNA is very much a reflection of the activity level of gene In the past, looking.
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.
Proteome and Gene Expression Analysis Chapter 15 & 16.
ESTs Ian Keller Laboratory Techniques in Molecular Bio.
DNA Microarray Overview and Application. Table of Contents Section One : Introduction Section Two : Microarray Technique Section Three : Types of DNA.
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
Protein databases Petri Törönen Shamelessly copied from material done by Eija Korpelainen and from CSC bio-opas
Biotechnology and Bioinformatics: Bioinformatics Essential Idea: Bioinformatics is the use of computers to analyze sequence data in biological research.
Microarray: An Introduction
Functional organization of the yeast proteome by systematic analysis of protein complexes Presented by Nathalie Kirshman and Xinyi Ma.
Basics of Comparative Genomics
“Proteomics is a science that focuses on the study of proteins: their roles, their structures, their localization, their interactions, and other factors.”
Annotation: linking literature to gene products
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.
Presentation transcript:

Bioinformatics for biomedicine More annotation, Gene Ontology and pathways Lecture 6, Per Kraulis

Course design 1.What is bioinformatics? Basic databases and tools 2.Sequence searches: BLAST, FASTA 3.Multiple alignments, phylogenetic trees 4.Protein domains and 3D structure 5.Seminar: Sequence analysis of a favourite gene 6.More annotation, Gene Ontology and pathways 7.Gene expression data, methods of analysis 8.Seminar: Further analysis of a favourite gene Note: 6 and 7 swapped!

Task from previous lecture Unknown sequences Which analysis? How interpret results? Medical relevance?

Unk4 Danio rerio (zebrafish) sequence BLAST –Ensembl (species specific): wee1 –UniProt: Q1LYE1, Q6DRI0 (WEE1 homolog) Pfam: Protein kinase Cell cycle regulation –Negative regulator of G2 to M transition –Orthologues in human, and many other species –S. pombe: “wee”=small; mutant goes fast into M –Against cancer: activation wanted (difficult)

Unk5 Human, fragment BLAST –Ensembl (species specific) –UniProt: SRBP1_HUMAN Pfam: basic helix-loop-helix Transcription factor –Regulation of lipid synthesis, colesterol –Orthologues in many species –Very interesting pathway –Not itself a good drug target; maybe indirectly through protease regulation

Annotation, so far Similarity, homology, orthology,… Domains, families –Structure, function Individual annotation (in a certain sense) –Each protein considered on its own –Functional relationships: only limited (indirect) information

Functional relationships Interactions –Genetic: indirect or direct –Physical: direct Regulation –Structural: Cell or tissue type –Temporal: Developmental stage –Co-regulation patterns Processes –Involvement, role

How to obain functional data? Traditional single-gene approaches –Works: good data, cross-checked –But: does not scale easily Genome-wide screens are now possible –Allows novel types of investigations –Approximate, rarely cross-checked

Genome-wide screening 1 Genomes specify all genes –List of all genes is available, in principle Technologies for screening –Design a specific setup to identify all genes that are active This allows genome-wide screening

Genome-wide screening 2 Different from one-gene investigations –All genes investigated at once; overview –“Fishing expedition” Produces novel annotation –Genes X,Y,Z were “active” in condition W –Co-annotates proteins of unrelated sequence –Previously un-annotated genes highlighted

Screens give functional data Annotation from screens is functional Sequence only indirectly involved Experimental setup is crucial –Materials –Conditions –Detection scheme, measurement Requires new types of data analysis

Function (or, biology) of a gene = Activity of the gene product Functional properties –Chemistry –Interaction partners Biological process(es) –Location: tissues, cells –Temporal: development, conditions

Gene activity depends on… 1 Activity requires gene product –Protein (occasionally RNA) Expression (mRNA synthesis) is a –necessary, –but not sufficient condition mRNA abundance: Sum of –Synthesis (expression) –Degradation

Gene activity depends on… 2 Activity requires a functional state –mRNA synthesized, spliced, located,… –Protein synthesized, modified, located, folded,… Activity requires context –Interaction partners –Conditions –Signals

Gene expression 1 Expression = mRNA level –Definition, or approximation? First genome-wide screen approach mRNA is technically easy to –Harvest –Identify –Quantify

Gene expression 2 Note: mRNA levels not really of intrinsic biological relevance! –mRNA is just a messenger, after all Focus would be on proteins, if and when those as easy to handle

Gene expression is a proxy Reasonable first approximation –Expression required for activity –Protein vs. mRNA levels: correlated –Evolutionary optimization: why synthesize mRNA if not needed? Several confounding factors –Protein synthesis rate –mRNA degradation rate –Protein modification, folding, transport,…

Gene expression technologies 1 EST –Expressed Sequence Tag –Short ( bp) sequence of DNA –DNA derived from mRNA samples –Sequence analysis required for identification –Coarse quantification of mRNA levels –Large databases exist SAGE –Serial Analysis of Gene Expression –Very short (10-14 bp) sequences –Some data, not as popular as ESTs

Gene expression technologies 2 Microarrays –Predefined sets of sequences –Many spots on a slide/chip, each of which contains a specific sequence cDNA: spotted glass slides (Brown, Stanford) Oligonucleotide: microchips (Affymetrix) –Usually relative quantification (up, down) –Many different applications Other technologies exist –New inventions quite possible

Gene expression data Analysis is easy, but difficult… –Relative differences between samples Statistically significant? Comparison depends on technology Compare between genes, or just gene to itself? –Experimental design Crucial for proper analysis; statistics Databases combine data sets for analysis –Many issues; watch out

Gene Ontology 1 Keywords in annotation –Labels for properties, function, etc Problem: different sets of keywords –Comparison difficult, or impossible 1998 GO Consortium –Project to specify a consistent set of keywords for gene annotation

Gene Ontology 2 Ontology: Existence and relationships Controlled vocabulary –Artificial Intelligence –Knowledge Representation Example: MeSH (Medical Subjects Headings) at NLM

Gene Ontology 3 Hierarchy of keywords (=terms) –Specific terms are instances of general terms –Something may be part of something else Why? –Natural description in biology –To reflect level of knowledge Example: DNA metabolism –DNA integration –DNA ligation

Gene Ontology 4 Three separate hierarchies Molecular function –Chemical activity Cellular component –Location: nucleus, cytoplasm,… Biological process –Role in cellular events

Gene Ontology: applied For each gene, give the GO terms for which there is evidence: mapping As specific as evidence allows Evidence codes: source of statement Mapping produced by genome projects, not GO consortium Available in Ensembl, UniProt, etc

GO limitations Does not describe process, function or cellular component –Little spatial info –No temporal info Lacks many specific details Strictly limited: Is about terms (keywords) Extensions have been discussed –Should be done in independent way...

Proteomics 1 Protein instead of mRNA Intrinsically biologically more interesting Technically much more difficult –Chemical properties Different: no single protocol applicable Same: hard to separate –How to identify? Sequencing not as simple –Quantification difficult

Proteomics: 2D gels Gel to separate proteins 2D: isoelectric point vs. mass Each spot detected, identified –Sequencing –Mass spectrometry Issues: many –Sensitivity –Comparison SwissProt

Proteomics: in situ Detect protein in tissue sample –Issue: Tissue sample treatment? Antibodies against peptides –Issue: Which peptide? Immunostaining to visualize Protein Atlas

Protein interactions Use bait to fish out all interaction partners –Two-hybrid method –Coimmunoprecipitation –Colocalization, GFP –Other techniques… IntAct at EBI

Pathways Metabolic –Conversion of “small” molecules –Energy, synthesis, degradation –KEGG –BioCyc Signalling, regulation –Reactome Work in progress…

Task Locate gene/protein in GO, Reactome, etc Wee1 SREBP1 Your own