Bioinformatics I, Exercise 2: mRNA processing

1 Bioinformatics I, Exercise 2 http://icbi.at/bioinf

2 mRNA processing

3 Splicing

4 Spliceosome assembly (figure): the snRNPs U1, U2, U4, U5 and U6 and U2AF assemble on the intron (5' splice site GU, branch point A, 3' splice site YAG), together with ~200 non-snRNP proteins: hnRNP and SR proteins, RNA helicases, kinases and phosphatases, and cyclophilins.

5 Different levels of regulation

6 Regulation of transcription

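7 ChIP procedure (figure; Farnham, Nature Rev Genetics, 2009). Example: a PPAR/RXR heterodimer bound to a PPAR response element (PPRE) in DNA with the sequence AACTAGGTCAAAGGTCA; receptor domains labelled A/B, C and E/F.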
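The PPRE sequence on this slide, AACTAGGTCAAAGGTCA, contains two copies of the half-site AGGTCA separated by a single base (a "DR1" direct repeat, the arrangement bound by PPAR/RXR heterodimers). A minimal Python sketch that scans a sequence for exact DR1 matches; it is a simplification that searches only the given strand and allows no mismatches, whereas real PPREs are often degenerate.

```python
import re

# DR1 = half-site AGGTCA, one arbitrary base, half-site AGGTCA.
# The lookahead (?=...) lets overlapping matches be reported as well.
DR1 = re.compile(r"(?=(AGGTCA[ACGT]AGGTCA))")


def find_dr1(seq):
    """Return 0-based start positions of exact DR1 motifs on the given strand."""
    return [m.start() for m in DR1.finditer(seq.upper())]


print(find_dr1("AACTAGGTCAAAGGTCA"))  # [4]
```

To cover the opposite strand, the same function can be run on the reverse complement of the sequence.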

8 microRNAs http://www.mirbase.org/

9 Ensembl BioMart

10 UCSC Table Browser
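For exercise 1.3 it helps to know how the Table Browser stores coordinates: tables such as refGene use 0-based, half-open intervals (columns txStart and txEnd), the genome browser displays 1-based positions, and on the minus strand transcription starts at the right-hand end. A small sketch of the conversion; the function name is illustrative, and its inputs are the strand, txStart and txEnd values of one refGene row.

```python
def tss_and_end(strand, tx_start, tx_end):
    """Return (transcription start site, transcription end) as 1-based positions.

    tx_start/tx_end are UCSC txStart/txEnd values (0-based, half-open).
    """
    leftmost = tx_start + 1  # first transcribed base, 1-based
    rightmost = tx_end       # a half-open end equals the 1-based last base
    if strand == "+":
        return leftmost, rightmost
    if strand == "-":
        return rightmost, leftmost  # minus strand: transcribed right to left
    raise ValueError("strand must be '+' or '-'")


print(tss_and_end("-", 100, 200))  # (200, 101)
```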

11

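12 Notepad++ and regular expressions
Example: ^>.*\r\n matches an entire FASTA header line, including its line break.
^   beginning of line
>   the character >
.   any character
*   0 or more times
\r  carriage return (CR)
\n  line feed (LF)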

13 Notepad++ and regular expressions
Character   Meaning
\     escape; makes special characters non-special (e.g. \. matches a period)
()    group; its contents can be retrieved in the replacement, e.g. \1 for the first group
[]    matches any one of the characters inside the brackets
.     matches any character
*     matches the previous character 0 or more times
+     matches the previous character 1 or more times
{n}   matches the previous character exactly n times
^     as the first character of the regex, means “beginning of line”; as the first character inside [], means “not” (e.g. [^ACGT])
$     as the last character of the regex, means “end of line”
\s    any whitespace character (e.g. space, tab)
\t    tab (-->)
\r    carriage return (CR)
\n    line feed (LF)

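14 Notepad++ and regular expressions
^[ACGT].*\r\n     replace with (nothing)     deletes every line that starts with A, C, G or T
^(.{20}).*\r\n    replace with \1\r\n        keeps only the first 20 characters of each line
^>.*\r\n          replace with (nothing)     deletes the FASTA header lines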

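15 Notepad++ and regular expressions (continued)
\r\n                 replace with (nothing)     joins all lines into one
>                    replace with \r\n>         starts a new line before each FASTA header
repeatMasking=none   replace with \r\n          breaks the line after each header (Table Browser FASTA headers end with repeatMasking=none)
^>.*\r\n             replace with (nothing)     deletes the header lines
.*(.{20})$           replace with \1            keeps only the last 20 characters of each line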
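The same steps can be reproduced in Python, whose re module also supports the \1 back-reference. This sketch assumes that each FASTA record is one exon with 10 bp of flanking intron added on both sides (for example by padding each exon in the Table Browser), so the first 20 characters span the intron-exon border and the last 20 the exon-intron border. The record in the usage line is made up.

```python
import re


def linearize_fasta(text):
    """Split FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records = []
    for block in re.split(r"^>", text, flags=re.MULTILINE):
        lines = block.splitlines()
        if not lines:
            continue  # text before the first '>'
        records.append((lines[0], "".join(line.strip() for line in lines[1:])))
    return records


def border_windows(text, width=20):
    """Return (first `width` bases, last `width` bases) for every record."""
    first_re = re.compile(r"^(.{%d}).*$" % width)  # like ^(.{20}).* -> \1
    last_re = re.compile(r"^.*(.{%d})$" % width)   # like .*(.{20})$ -> \1
    return [
        (first_re.sub(r"\1", seq), last_re.sub(r"\1", seq))
        for _, seq in linearize_fasta(text)
    ]


# Hypothetical exon record wrapped over two lines.
example = ">exon1\n" + "A" * 10 + "C" * 10 + "\n" + "G" * 10 + "T" * 10 + "\n"
print(border_windows(example))
```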

16 Sequence Logo http://icbi.at/logo
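A sequence logo stacks the bases at each position of an alignment, scaled by their frequency, with the total column height equal to the information content: log2(4) = 2 bits minus the Shannon entropy of the column. A minimal sketch of these per-column numbers for equal-length DNA sequences, such as the 20 bp border windows of exercise 1.4; it omits the small-sample correction that logo tools commonly apply.

```python
import math
from collections import Counter


def logo_columns(seqs, alphabet="ACGT"):
    """Return, per alignment column, (base frequencies, information content in bits)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    columns = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            raise ValueError(f"column {i} contains no {alphabet} characters")
        freqs = {b: counts[b] / total for b in alphabet}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        columns.append((freqs, max_bits - entropy))
    return columns


# Letter height in the logo = frequency * information content of the column.
for freqs, bits in logo_columns(["AG", "AG", "AG", "AC"]):
    print(round(bits, 3), {b: round(f * bits, 3) for b, f in freqs.items() if f > 0})
```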

17 KEGG

18 Protein domains: UniProt, Prosite, InterPro, Pfam, CDD (NCBI Conserved Domains), SMART
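Prosite describes many sites as patterns in its own syntax: x is any residue, [..] lists allowed residues, {..} forbidden residues, (n) or (n,m) gives a repetition, < and > anchor the pattern to the N- and C-terminus, elements are separated by - and the pattern ends with a period. A minimal sketch that translates such a pattern into a Python regular expression; it handles only this basic syntax (not, for example, > inside brackets), and the protein sequence in the usage lines is made up.

```python
import re

# One Prosite pattern element: x, a residue, [allowed], or {forbidden},
# optionally followed by a repetition count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple Prosite pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


glyco = prosite_to_regex("N-{P}-[ST]-{P}.")  # N-glycosylation site pattern
print(glyco)  # N[^P][ST][^P]
print([m.start() for m in re.finditer(glyco, "MKNGSAPNPTA")])  # [2]
```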

19 Gene Ontology
The Gene Ontology (GO) project provides a controlled vocabulary to describe gene and gene product attributes in any organism. Each GO entry has a unique numerical identifier of the form GO:nnnnnnn and a term name.
Three organizing principles (ontologies):
cellular component (e.g. mitochondrion)
biological process (e.g. lipid metabolism)
molecular function (e.g. hydrolase activity)
Structure: a directed acyclic graph (DAG) with different levels and two core relations (is_a, part_of)
Evidence codes:
ISS  Inferred from Sequence Similarity
IEP  Inferred from Expression Pattern
IMP  Inferred from Mutant Phenotype
IGI  Inferred from Genetic Interaction
IPI  Inferred from Physical Interaction
IDA  Inferred from Direct Assay
RCA  Inferred from Reviewed Computational Analysis
TAS  Traceable Author Statement
NAS  Non-traceable Author Statement
IC   Inferred by Curator
ND   No biological Data available
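Two of these principles are easy to make concrete: identifiers follow a fixed format, and because the ontology is a DAG a term can have several parents, with an annotation to a term implying annotation to all of its ancestors (GO's "true path rule"). A minimal Python sketch; the three root IDs are the real GO namespace roots, while the mini-ontology term names are hypothetical.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")  # "GO:" followed by exactly seven digits

# Root terms of the three GO namespaces.
GO_ROOTS = {
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
    "GO:0003674": "molecular_function",
}


def is_go_id(text):
    return GO_ID.fullmatch(text) is not None


def ancestors(term, parents):
    """Collect every term reachable from `term` by following is_a/part_of edges upwards.

    `parents` maps a term to a list of (parent_term, relation) pairs.
    """
    found, stack = set(), [term]
    while stack:
        for parent, _relation in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Hypothetical mini-ontology: term_C has two parents, which a tree could not express.
toy_parents = {
    "term_C": [("term_A", "is_a"), ("term_B", "part_of")],
    "term_A": [("root", "is_a")],
    "term_B": [("root", "is_a")],
}
print(sorted(ancestors("term_C", toy_parents)))  # ['root', 'term_A', 'term_B']
```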

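20 Orthologs (figure: gene tree of protein A; examples from the figure)
Homologs: A – B – C (genes descending from a common ancestor)
Orthologs: B1 – C1 (separated by a speciation event)
Paralogs: C1 – C2 – C3 (separated by a gene duplication)
Inparalogs: C2 – C3 (paralogs from a duplication after the speciation event)
Outparalogs: B2 – C1 (paralogs from a duplication before the speciation event)
Xenologs: A1 – AB1 (related by horizontal gene transfer)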
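These classes depend on the event at the last common ancestor of two genes in a gene tree: speciation gives orthologs, duplication gives paralogs and horizontal transfer gives xenologs. A minimal sketch on a hypothetical gene tree (not the one in the figure) in which a duplication precedes a speciation, so B1 and C1 come out as orthologs while B2 and C1 are paralogs (outparalogs).

```python
# Hypothetical gene tree: internal node -> (event that produced its children, children).
# Leaves are named <species><copy>, e.g. "B2" = copy 2 of the gene in species B.
GENE_TREE = {
    "dup": ("duplication", ["spec1", "spec2"]),
    "spec1": ("speciation", ["B1", "C1"]),
    "spec2": ("speciation", ["B2", "C2"]),
}

RELATION = {"speciation": "ortholog", "duplication": "paralog", "transfer": "xenolog"}


def path_to_root(node, parent_of):
    path = [node]
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path


def classify(gene_a, gene_b, tree=GENE_TREE):
    """Classify two distinct leaves by the event at their last common ancestor."""
    parent_of = {child: node for node, (_, children) in tree.items() for child in children}
    ancestors_a = set(path_to_root(gene_a, parent_of))
    lca = next(n for n in path_to_root(gene_b, parent_of) if n in ancestors_a)
    return RELATION[tree[lca][0]]


print(classify("B1", "C1"))  # ortholog
print(classify("B2", "C1"))  # paralog (duplication before speciation: outparalogs)
```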

21 Ortholog prediction

22 Ortholog databases
YOGY (eukarYotic OrtholoGY, Sanger) is a web-based resource that integrates five independent resources:
COG (Clusters of Orthologous Groups of proteins) and KOG for 7 eukaryotic genomes (NCBI)
Inparanoid (Stockholm Bioinformatics Center)
HomoloGene (NCBI)
OrthoMCL, which uses the Markov Clustering algorithm (University of Pennsylvania)
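OrthoMCL groups proteins by running Markov Clustering (MCL) on a graph of sequence-similarity scores. The core of MCL alternates expansion (squaring the column-stochastic matrix, i.e. two steps of a random walk) and inflation (raising entries to a power and renormalizing, which strengthens strong transitions) until the matrix stops changing. A bare sketch of that iteration on a small unweighted graph; the inflation value of 2.0 is an assumption, and OrthoMCL's own score normalization and pipeline are not reproduced here.

```python
def _normalize_columns(m):
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Clustering of a small graph given as a symmetric adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then make every column sum to 1.
    m = _normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that keep non-zero entries are attractors; their non-zero columns form a cluster.
    clusters = {tuple(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    return sorted(c for c in clusters if c)


two_triangles = [
    [0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [(0, 1, 2), (3, 4, 5)]
```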

23 Multiple sequence alignment: ClustalW (progressive alignment along a guide tree), visualization with Jalview
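ClustalW works in three stages: it aligns all pairs of sequences, builds a guide tree from the pairwise distances, and then aligns the sequences progressively in the order given by the tree. A minimal sketch of the building block, a global pairwise alignment (Needleman-Wunsch) with a simple assumed scoring scheme (match +1, mismatch -1, gap -1); ClustalW itself uses substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```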

24 Exercise 2-1: REGULATORY GENOMICS
Pyruvate carboxylase as an example
Ensembl BioMart
1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene symbol, the number of exons, the Ensembl transcript ID, the Ensembl gene ID, the 3'UTR sequence (as a FASTA file) and the length of the 3'UTR.
microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8 (the seed region) of the microRNA hsa-mir-182?
UCSC genome browser
1.3 Find the positions of the transcription start site and the transcription end of pyruvate carboxylase (NM_000920) in the hg19 assembly.
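For 1.2, the target site in the mRNA is the reverse complement of miRNA positions 2-8, written in DNA letters when searching a 3'UTR downloaded as DNA. A minimal sketch that counts only perfect Watson-Crick matches (no G:U pairs); the test input in the usage line is illustrative and is not hsa-mir-182, whose mature sequence should be taken from miRBase.

```python
def seed_sites(mirna, utr):
    """Return (target site, 0-based positions in utr) for miRNA positions 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]  # positions 2-8 (1-based)
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    site = "".join(complement[b] for b in reversed(seed))  # reverse complement
    utr = utr.upper()
    hits, start = [], utr.find(site)
    while start != -1:  # also reports overlapping occurrences
        hits.append(start)
        start = utr.find(site, start + 1)
    return site, hits


print(seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAAACTACCTCGGG"))  # ('CTACCTC', [4])
```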

25 Exercise 2-1: REGULATORY GENOMICS
Find splicing signals
1.4 Using the UCSC Table Browser and Notepad++, get the sequences around the intron-exon and exon-intron borders of pyruvate carboxylase (10 bp on each side of the border).
1.5 In both cases, construct a sequence logo and a frequency plot. Can you identify (regulatory) sequence motifs?
Regulatory motifs (transcription factor binding sites)
1.6 Chromatin immunoprecipitation followed by sequencing (ChIP-seq) in a mouse cell line has shown that the transcription factor Pparg binds near the pyruvate carboxylase gene and may therefore regulate its transcription (ppar.wig). Show the binding region as a custom track in the UCSC genome browser and extract its sequence.
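If ppar.wig is in UCSC wiggle format with variableStep or fixedStep sections (an assumption about the file, which could also be bedGraph), the window with the strongest signal, a first guess at the binding region, can be found with a short parser. Positions in wiggle data lines are 1-based. The data in the usage lines are made up.

```python
def parse_wig(lines):
    """Yield (chrom, start, end, value), 1-based inclusive, from variableStep/fixedStep data."""
    chrom, mode, pos, step, span = None, None, None, 1, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] in ("variableStep", "fixedStep"):
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom, span = opts["chrom"], int(opts.get("span", 1))
            if mode == "fixedStep":
                pos, step = int(opts["start"]), int(opts.get("step", 1))
            continue
        if mode == "variableStep":
            start, value = int(fields[0]), float(fields[1])
        elif mode == "fixedStep":
            start, value = pos, float(fields[0])
            pos += step
        else:
            raise ValueError("data line before a variableStep/fixedStep declaration")
        yield chrom, start, start + span - 1, value


def top_region(lines):
    """Return the window with the highest signal value."""
    return max(parse_wig(lines), key=lambda rec: rec[3])


example = [
    "track type=wiggle_0 name=example",
    "variableStep chrom=chr11 span=10",
    "100 1.5",
    "200 7.0",
    "fixedStep chrom=chr11 start=300 step=50 span=25",
    "2.0",
    "3.0",
]
print(top_region(example))  # ('chr11', 200, 209, 7.0)
```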

26 Exercise 2-2: PROTEIN FUNCTION
Identify the functions, processes and pathways of a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and processes is this enzyme involved? Show the pathway maps and find the enzyme ID (EC number) using KEGG. Identify the functional domains and the Gene Ontology annotation of the protein sequence using UniProt, Prosite and Pfam.
Find orthologs and perform a multiple sequence alignment
2.2 Find ortholog protein sequences in Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and visualize it with Jalview.

