Introduction to Bioinformatics Tuesday, 19 March

Are genes encoding proteins with all the universal motifs of cytosine methyltransferases commonly found in phages?
  Define motifs (known proteins)
  Find motifs (unknown proteins)

Motifs – not only for proteins!
  Position-specific scoring matrices (PSSMs)

Nature of Regulatory Sites
  [Diagram labels: Known sites, Sequence Filter, Genomic sequence, Unknown sites, Predicted sites]

Nature of Sequence Filters
  Hidden Markov model-based methods
  Ad hoc methods
  Position-dependent scoring matrix (PSSM) = Position-specific frequency table = Weight table

Making a PSSM

Some of 106 aligned human promoter sequences (near -26)
  CCCTATATAAGGC...   histone H1t
  CGCTATAAAAACT...   HMG-17
  GGGTATATAAGCG...   β-tubulin β2
  GGCTATATAAAAC...   α-actin skel-m.
  TTCTATAAAGCGG...   α-cardiac actin
  CCCTATAAAACCC...   β-actin
  GAGTATAAAGCAC...   keratin I 50K
  GGTTATAAAAACA...   vimentin
  CAGTATAAAAGGG...   α1(I) collagen
  CCGTATAAATAGG...   α2(I) collagen
  TCCCATATAAGCC...   fibronectin

Consensus: TATAAA
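The step from aligned sites to a frequency table can be sketched in a few lines of Python. This is only an illustration using the eleven fragments listed above, not the full set of 106 sequences; the function names `count_matrix` and `consensus` are made up for this sketch.

```python
from collections import Counter

# The eleven aligned promoter fragments from the slide, in the same order
# (13 columns each; the trailing "..." on the slide is dropped).
SITES = [
    "CCCTATATAAGGC",
    "CGCTATAAAAACT",
    "GGGTATATAAGCG",
    "GGCTATATAAAAC",
    "TTCTATAAAGCGG",
    "CCCTATAAAACCC",
    "GAGTATAAAGCAC",
    "GGTTATAAAAACA",
    "CAGTATAAAAGGG",
    "CCGTATAAATAGG",
    "TCCCATATAAGCC",
]


def count_matrix(sites):
    """Return one Counter of base counts per alignment column."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    return [Counter(column) for column in zip(*sites)]


def consensus(counts):
    """Most frequent base per column (ties go to the base seen first)."""
    return "".join(column.most_common(1)[0][0] for column in counts)


counts = count_matrix(SITES)
core = consensus(counts)[3:9]  # columns 4 to 9 hold the TATA box

# Print the position-specific count table, one row per alignment column.
print("pos  A  C  G  T")
for position, column in enumerate(counts, start=1):
    print(f"{position:>3} " + " ".join(f"{column[base]:>2}" for base in "ACGT"))
print("core consensus:", core)
```

Dividing each count by the number of sequences turns the table into position-specific frequencies; columns 4 to 9 reproduce the consensus shown on the slide.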

Making a PSSM

Where to get a training set?
  Experimentally proven regulatory sites
  Orthologs of genes in different organisms
    Not too far (divergence of binding sites)
    Not too close (hidden amidst overall similarity)
  Experimentally indicated coregulated genes
  Suspected coregulated genes

Using a PSSM

Unknown start site (?)
  aceB  ACTATGGAGCATCTGCACATGAAAACC

Experimentally proven start sites
  atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
  bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
  glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
  glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
  lacZ  TTCACACAGGAAACAGCTATGACCATG
  rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
  serC  GCAACGTGGTGAGGGGAAATGGCTCAA
  sucA  GATGCTTAAGGGATCACGATGCAGAAC
  trpE  CAAAATTAGAGAATAACAATGCAAACA
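One way to use such a table as a sequence filter is to convert the counts to log-odds weights and sum them over a candidate window. The sketch below does this for the nine proven start sites and the aceB candidate. The pseudocount of 1, the uniform background of 0.25 per base, and the function names are assumptions made for illustration; the slides do not specify a scoring scheme.

```python
import math

# Proven start sites from the slide, aligned at the start codon (columns 19-21).
START_SITES = {
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}
ACE_B = "ACTATGGAGCATCTGCACATGAAAACC"  # candidate with unknown start site


def log_odds_pssm(sites, pseudocount=1.0, background=0.25):
    """Build log2(frequency / background) weights for every column.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pssm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm


def score(pssm, window):
    """Sum the column weights of the bases found in the window."""
    return sum(column[base] for column, base in zip(pssm, window))


pssm = log_odds_pssm(list(START_SITES.values()))

# Scores of the training sites give a rough scale for judging the candidate.
for name, site in START_SITES.items():
    print(f"{name}  {score(pssm, site):6.2f}")
print(f"aceB  {score(pssm, ACE_B):6.2f}  (candidate)")
```

A candidate that scores within the range of the training sites is a plausible start site under this model; where to set the cutoff is a separate decision that the sketch does not address.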

Using a PSSM

  aceB  ACCACATAACTATGGAGCATCT.GCACATGAAAACC
  atpI      ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
  bioB        ACGTTTTGGAGAAGC...CCCATGGCTCAC
  glnA          ATCCAGGAGAGTTA.AAGTATGTCCGCT
  glnH      TAGAAAAAAGGAAATG.....CTATGAAGTCT
  lacZ       TTCACACAGGAAACAG....CTATGACCATG
  rpsJ           AATTGGAGCTCTGGTCTCATGCAGAAC
  serC        GCAACGTGGTGAGGG...GAAATGGCTCAA
  sucA       GATGCTTAAGGGATCA....CGATGCAGAAC
  trpE        CAAAATTAGAGAATA...ACAATGCAAACA

[Table with rows A, C, G, T; the cell values are not preserved in this transcript]
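The count table for a gapped alignment like this one can be rebuilt with a short script. The sketch assumes the rows are right-justified, so that the start codons line up, because the transcript does not preserve leading whitespace; that assumption and the function names are not from the slides. Gap characters are ignored when counting.

```python
# Gapped rows for the nine proven start sites, as given on the slide.
GAPPED = {
    "atpI": "ACCTCGAAGGGAGCAG.....GAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGC...CCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTA.AAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATG.....CTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAG....CTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGG...GAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCA....CGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATA...ACAATGCAAACA",
}


def right_justify(rows, gap="."):
    """Pad rows on the left with gaps so they share one width (assumed layout)."""
    width = max(len(row) for row in rows)
    return [row.rjust(width, gap) for row in rows]


def frequency_table(aligned, alphabet="ACGT"):
    """Count each base per column; gap characters are skipped."""
    width = len(aligned[0])
    table = {base: [0] * width for base in alphabet}
    for row in aligned:
        for index, char in enumerate(row):
            if char in table:
                table[char][index] += 1
    return table


aligned = right_justify(list(GAPPED.values()))
table = frequency_table(aligned)

# One output row per base, one number per alignment column.
for base in "ACGT":
    print(base, " ".join(str(count) for count in table[base]))
```

Under the right-justification assumption, the start codon columns and one G-rich column upstream of the gaps come out fully conserved, which suggests the gaps were placed to line up an upstream signal with the start codon.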

What to do with no training set?
  New pattern discovery (MEME, Gibbs sampler, BioProspector)

Human sequences 5' to transcriptional start
  snRNA U1 (pU1-6)    AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
  histone H1t         GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
  HMG-14              CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
  TP1                 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
  protamine P1        CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
  nucleolin           GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
  snRNP E             TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
  rp S14              GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
  rp S17              TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
  ribosomal p. S19    ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
  α-tubulin bα1       GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
  β-tubulin β2        GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
  α-actin skel-m.     CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
  α-cardiac actin     TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
  β-actin             CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
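As a much simpler stand-in for the tools named above, the idea of looking for an over-represented pattern can be illustrated by counting how many of the unaligned sequences contain each k-mer. This is not how MEME, a Gibbs sampler, or BioProspector work; it is exact-match counting only, and the function name `kmer_support` and the choice of k = 5 are assumptions for this sketch.

```python
from collections import Counter

# The fifteen human upstream sequences from the slide, in the same order.
UPSTREAM = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",
    "GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG",
    "TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT",
    "GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC",
    "TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT",
    "ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT",
    "GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG",
    "GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA",
    "CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC",
    "TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC",
    "CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA",
]


def kmer_support(sequences, k):
    """Count, for each k-mer, the number of sequences that contain it."""
    support = Counter()
    for sequence in sequences:
        # A set keeps repeated hits within one sequence from being counted twice.
        support.update({sequence[i:i + k] for i in range(len(sequence) - k + 1)})
    return support


support = kmer_support(UPSTREAM, k=5)

# List the most widely shared 5-mers; GC-rich words may rank high as well,
# which is one reason real tools model the background composition.
for kmer, hits in support.most_common(10):
    print(kmer, hits)
print("TATAA is present in", support["TATAA"], "of", len(UPSTREAM), "sequences")
```

Exact matching misses degenerate sites and cannot report a position-specific profile, which is what the probabilistic methods listed on the slide are designed to recover.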

Things to do

ME