BIOINFORMATIK I UEBUNG 2 (Bioinformatics I, Exercise 2): mRNA processing

mRNA processing

splicing

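Spliceosome assembly (figure): the snRNPs U1, U2, U4, U5 and U6, together with U2AF, assemble on the pre-mRNA at the 5' splice site (GU), the branch point (A) and the 3' splice site (YAG).
+ ~200 non-snRNP proteins: hnRNP proteins, SR proteins, RNA helicases, kinases and phosphatases, cyclophilins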

Different levels of regulation

Regulation of transcription

ChIP procedure (Farnham, Nature Rev Genetics, 2009)
Figure: PPAR and RXR (receptor domains A/B, C, E/F) binding to a PPRE on the DNA, e.g. AACTAGGTCAAAGGTCA

microRNAs

Ensembl BioMart

UCSC Table Browser

Notepad++ and regular expressions

Example pattern: ^>.*\r\n
  ^     begin of line
  >     the character >
  .     any symbol
  *     0 or more times
  \r    carriage return (CR)
  \n    line feed (LF)

Notepad++ and regular expressions

  character   meaning
  \           escape; used to make specials non-special
  ()          group; you can retrieve its contents e.g. with \1 for the first occurrence
  []          any character inside is considered a match
  .           matches any character
  *           match the previous character 0 or more times
  +           match the previous character 1 or more times
  {n}         match the previous character n times
  ^           if the first character in the regex, means “beginning of line”; inside [] means “not”
  $           if the last character in the regex, means “end of line”
  \s          any space character (space, tab)
  \t          tab
  \r          carriage return (CR)
  \n          line feed (LF)

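Notepad++ and regular expressions

  search for ^[ACGT].*\r\n        replace with (nothing)
  search for ^(.{20}).*\r\n       replace with \1\r\n
  search for ^>.*\r\n             replace with (nothing)

  search for \r\n                 replace with (nothing)
  search for >                    replace with \r\n>
  search for repeatMasking=none   replace with \r\n
  search for ^>.*\r\n             replace with (nothing)
  search for .*(.{20})$           replace with \1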
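The search-and-replace steps above can also be scripted. The following is a minimal Python sketch, not part of the original exercise: it applies the second group of replacements to a made-up FASTA text (the record names, coordinates and sequences are invented for illustration) and then keeps the first or last 20 characters of each sequence. The function names are hypothetical.

```python
import re

# Toy FASTA text in the style of a table export; all records are made up.
fasta = (
    ">toy_record_1 range=chr0:1-30 repeatMasking=none\r\n"
    "acgtacgtacgtacg\r\n"
    "TTTTTGGGGGCCCCC\r\n"
    ">toy_record_2 range=chr0:31-60 repeatMasking=none\r\n"
    "ggggggggggaaaaa\r\n"
    "AAAAACCCCCTTTTT\r\n"
)


def one_line_records(text):
    """Return one sequence string per FASTA record, headers removed."""
    text = text.replace("\r\n", "")                      # remove all line breaks
    text = text.replace(">", "\r\n>")                    # each header starts a new line
    text = text.replace("repeatMasking=none", "\r\n")    # break after the header
    text = re.sub(r"^>.*\r\n", "", text, flags=re.M)     # delete the header lines
    return [line for line in text.split("\r\n") if line]


def first_20(seq):
    # Same idea as: search ^(.{20}).* and replace with \1
    return re.sub(r"^(.{20}).*$", r"\1", seq)


def last_20(seq):
    # Same idea as: search .*(.{20})$ and replace with \1
    return re.sub(r".*(.{20})$", r"\1", seq)


sequences = one_line_records(fasta)
starts = [first_20(s) for s in sequences]
ends = [last_20(s) for s in sequences]
print(starts)
print(ends)
```

Sequences shorter than 20 characters do not match either pattern and are returned unchanged, which mirrors what the editor would do.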

Sequence Logo

KEGG

Protein domains: UniProt, PROSITE, InterPro, Pfam, CD, SMART

Gene Ontology

The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn, and a GO term.

3 organizing principles
  cellular component (e.g. mitochondrion)
  biological process (e.g. lipid metabolism)
  molecular function (e.g. hydrolase activity)

Directed acyclic graph (DAG) with different levels and 2 relations (part_of, is_a)

Evidence code
  ISS   Inferred from Sequence Similarity
  IEP   Inferred from Expression Pattern
  IMP   Inferred from Mutant Phenotype
  IGI   Inferred from Genetic Interaction
  IPI   Inferred from Physical Interaction
  IDA   Inferred from Direct Assay
  RCA   Inferred from Reviewed Computational Analysis
  TAS   Traceable Author Statement
  NAS   Non-traceable Author Statement
  IC    Inferred by Curator
  ND    No biological Data available
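To make the DAG idea concrete, here is a small Python sketch that stores terms with their is_a and part_of parents and collects all ancestors of a term. The graph is a simplified toy, not actual GO content: the term names and their relations are assumptions chosen for illustration, and real GO identifiers are deliberately left out.

```python
# Toy graph: term -> list of (relation, parent term). Simplified, not real GO data.
toy_dag = {
    "mitochondrial inner membrane": [
        ("part_of", "mitochondrion"),
        ("is_a", "membrane"),
    ],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "membrane": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, dag):
    """Collect every term reachable from `term` via is_a or part_of."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in dag.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


result = ancestors("mitochondrial inner membrane", toy_dag)
print(sorted(result))
```

Because a term may have more than one parent, the structure is a graph rather than a tree; the `seen` set ensures that a shared ancestor such as the root is visited only once.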

Orthologs

  Homologs:     A – B – C
  Orthologs:    B1 – C1
  Paralogs:     C1 – C2 – C3
  Inparalogs:   C2 – C3
  Outparalogs:  B2 – C1
  Xenologs:     A1 – AB1

(Figure: Protein A)

Ortholog prediction

Ortholog databases

  YOGY (eukarYotic OrtholoGY): a web-based resource that integrates 5 independent resources (Sanger)
  COG (Clusters of Orthologous Groups of proteins) and KOG for 7 eukaryotic genomes (NCBI)
  Inparanoid (Stockholm Bioinformatics Center)
  HomoloGene (NCBI)
  OrthoMCL: uses the Markov Clustering algorithm (University of Pennsylvania)

Multiple sequence alignment (CLUSTALW): progressive tree alignment; visualization with Jalview

Exercise 2-1: REGULATORY GENOMICS
Pyruvate carboxylase as example

Ensembl BioMart
1.1 For the human transcript NM_ (pyruvate carboxylase) find the official gene symbol, number of exons, Ensembl transcript ID, Ensembl gene ID, 3'UTR sequence as FASTA file, and length of the 3'UTR.

microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8 of the microRNA hsa-mir-182?

UCSC genome browser
1.3 Find the positions of the transcription start site and transcription end of pyruvate carboxylase (NM_000920) in the hg19 assembly.
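The check in 1.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the intended solution path of the exercise: the microRNA and 3'UTR strings below are made up (they are not hsa-mir-182 or the PC 3'UTR), the function names are hypothetical, and only strict Watson-Crick pairing is considered, so G:U wobble pairs are ignored.

```python
def seed_site(mirna):
    """DNA motif a 3'UTR must contain to pair with miRNA positions 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]        # positions 2-8 (1-based)
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}     # RNA base -> pairing DNA base
    return "".join(pair[base] for base in reversed(seed))


def find_seed_sites(utr, mirna):
    """Return 0-based start positions of the seed-complementary motif in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    return [
        i for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


toy_mirna = "UACGGAUCCAUGGCAUUAGCUA"   # made-up sequence, for illustration only
toy_utr = "AAAGATCCGTAAA"              # made-up 3'UTR fragment

print(seed_site(toy_mirna))
print(find_seed_sites(toy_utr, toy_mirna))
```

For the actual exercise, the two toy strings would be replaced by the mature microRNA sequence and the 3'UTR sequence exported as FASTA in 1.1.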

Exercise 2-1: REGULATORY GENOMICS

Find splicing signals
1.4 Get sequences (+10bp/-10bp) around intron-exon borders and exon-intron borders from pyruvate carboxylase using the UCSC Table Browser and Notepad++.
1.5 Construct in both cases a sequence logo and a frequency plot. Can you identify (regulatory) sequence motifs?

Regulatory motifs (transcription factor binding sites)
1.6 We know from chromatin immunoprecipitation (ChIP-seq) experiments in a mouse cell line that the transcription factor Pparg binds near the pyruvate carboxylase gene and hence potentially regulates its transcription (ppar.wig). Show the binding region as a custom track in the UCSC genome browser and extract the sequence.
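A frequency plot and a sequence logo are both derived from a per-position base count. The sketch below shows that calculation in Python on four made-up, equal-length sequences (they are not taken from pyruvate carboxylase). The information content per position is computed as 2 bits minus the Shannon entropy, without the small-sample correction that logo tools usually apply; the function names are hypothetical.

```python
import math
from collections import Counter


def frequency_matrix(seqs):
    """Relative frequency of A, C, G, T at each position of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        matrix.append({b: counts[b] / len(seqs) for b in "ACGT"})
    return matrix


def information_content(freqs):
    """Bits of information at one position: 2 minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy


toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTATGT"]  # made-up examples
matrix = frequency_matrix(toy_sites)
bits = [information_content(column) for column in matrix]

for position, (column, value) in enumerate(zip(matrix, bits), start=1):
    print(position, column, round(value, 2))
```

Positions where every sequence carries the same base reach the maximum of 2 bits, which is what shows up as a tall letter in a sequence logo.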

Exercise 2-2: PROTEIN FUNCTION

Identify function/processes/pathways for a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and processes is this enzyme involved? Show pathway maps and find the Enzyme ID (EC) using KEGG. Identify functional domains and the Gene Ontology annotation of the protein sequence using UniProt, PROSITE, Pfam.

Find orthologs and perform multiple sequence alignment
2.2 Find ortholog protein sequences in Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and visualize it with Jalview.