Algorithms in Bioinformatics Morten Nielsen Department of Systems Biology, DTU.

Course objective

Algorithms are black-boxes
–No one knows how a neural network is trained
–No one knows how a PSSM is constructed
Often no software exists that does exactly what you need

Where conventional algorithms fail: sequence alignment
1PMY._  4 VKMLNSGPGGMMVFDPALVRLKPGDSIKFLPTDKG--HNVETIKGMAPDG
1PLC._  0 IDVLLGADDGSLAFVPSEFSISPGEKIVF-KNNAGFPHNIVFDEDSIPSG
1PMY._ 54 ADYVKTTVGQEAV VKFDKEGVYGFKCAPHYMMGMVALVVV
1PLC._ 50 VDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQGAGMVGKVTV
Gaps should more likely be placed in loops and not in secondary structure elements
–No conventional alignment algorithm can do this
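One way to act on the idea that gaps belong in loops is to let the gap penalty depend on the position in the sequence whose structure is known. The C sketch below is an illustration under stated assumptions, not the course's implementation: the +1/-1 substitution score, the linear (non-affine) gap cost, the 'L' code for loop residues and the names `ss_gap_penalties` and `align_score` are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical substitution score: +1 for identical residues, -1 otherwise. */
static int match_score(char a, char b) { return a == b ? 1 : -1; }

static int max3(int x, int y, int z)
{
    int m = x > y ? x : y;
    return m > z ? m : z;
}

/*
 * Fill gap[0..n] from a secondary-structure string ss of length n
 * ('L' = loop, anything else = helix or strand; an assumed encoding).
 * gap[0] covers gaps placed before the first residue and is treated as loop.
 */
void ss_gap_penalties(const char *ss, int loop_pen, int sse_pen, int *gap)
{
    size_t n = strlen(ss);
    gap[0] = loop_pen;
    for (size_t i = 1; i <= n; i++)
        gap[i] = (ss[i - 1] == 'L') ? loop_pen : sse_pen;
}

/*
 * Global (Needleman-Wunsch) alignment score of a against b with a linear,
 * position-specific gap penalty. gap must hold strlen(a) + 1 entries;
 * gap[i] is charged for every gap column at DP row i (residue i of a, 1-based).
 */
int align_score(const char *a, const char *b, const int *gap)
{
    size_t n = strlen(a), m = strlen(b);
    int *prev = malloc((m + 1) * sizeof *prev);
    int *cur = malloc((m + 1) * sizeof *cur);
    assert(prev != NULL && cur != NULL);

    prev[0] = 0;
    for (size_t j = 1; j <= m; j++)
        prev[j] = prev[j - 1] - gap[0];

    for (size_t i = 1; i <= n; i++) {
        cur[0] = prev[0] - gap[i];
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + match_score(a[i - 1], b[j - 1]);
            int del = prev[j] - gap[i];     /* a[i-1] aligned to a gap */
            int ins = cur[j - 1] - gap[i];  /* b[j-1] inserted after a[i-1] */
            cur[j] = max3(diag, del, ins);
        }
        int *tmp = prev;
        prev = cur;
        cur = tmp;
    }

    int score = prev[m];
    free(prev);
    free(cur);
    return score;
}
```

With a uniform penalty this reduces to the conventional algorithm. Raising the penalty inside helices and strands makes the same recursion prefer gaps in loop positions.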

Sequence motif identification
Say you have 10 ligands known to bind a given receptor. Can you accurately characterize the binding motif from so few data?
HMMs and Gibbs samplers might do this, but what if you know a priori that some positions are more important than others for the binding?
–Then no conventional method will work
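Prior knowledge about important positions can enter a weight-matrix method as a per-position factor on the score. The C sketch below shows only that scoring step and rests on assumptions: a log-odds weight matrix `w` is taken as already estimated and stored row-major with 20 columns, and the name `weighted_score`, the `pos_weight` array and the alphabet order are hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAA 20

/* Assumed column order of the weight matrix. */
static const char ALPHABET[] = "ACDEFGHIKLMNPQRSTVWY";

/* Index of amino acid c in ALPHABET, or -1 if c is not a standard residue. */
static int aa_index(char c)
{
    const char *p = (c != '\0') ? strchr(ALPHABET, c) : NULL;
    return p ? (int)(p - ALPHABET) : -1;
}

/*
 * Position-weighted matrix score of the first len residues of core:
 *   sum over p of pos_weight[p] * w[p * NAA + index of core[p]].
 * w is a row-major len x NAA matrix of (assumed precomputed) log-odds values.
 */
double weighted_score(const double *w, const double *pos_weight,
                      const char *core, size_t len)
{
    double s = 0.0;
    assert(strlen(core) >= len);
    for (size_t p = 0; p < len; p++) {
        int a = aa_index(core[p]);
        assert(a >= 0);
        s += pos_weight[p] * w[p * NAA + (size_t)a];
    }
    return s;
}
```

Setting every entry of `pos_weight` to 1 gives the conventional weight-matrix score. Larger values at known anchor positions let those positions dominate the prediction.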

Artificial neural networks
PEPTIDE          IC50(nM)
VPLTDLRIPS
GWPYIGSRSQIIGRS
ILVQAGEAETMTPSG
HNWVNHAVPLAMKLI  120
SSTVKLRQNEFGPAR  8045
NMLTHSINSLISDNL
LSSKFNKFVSPKSVS  4
GRWDEDGAKRIPVDV
ACVKDLVSKYLADNE  86
NLYIKSIQSLISDTQ  67
IYGLPWMTTQTSALS  11
QYDVIIQHPADMSWC
Could an ANN be trained to simultaneously identify the binding motif and binding strength of a given peptide?

The Bioinformatical approach. NN-align
Method
PEPTIDE          Pred  Meas
VPLTDLRIPS
GWPYIGSRSQIIGRS
ILVQAGEAETMTPSG
HNWVNHAVPLAMKLI
SSTVKLRQNEFGPAR
NMLTHSINSLISDNL
LSSKFNKFVSPKSVS
GRWDEDGAKRIPVDV
ACVKDLVSKYLADNE
NLYIKSIQSLISDTQ
IYGLPWMTTQTSALS
QYDVIIQHPADMSWC
Iterative cycle
–Predict binding and core
–Calculate prediction error
–Update method to minimize prediction error
–Refine
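The predict, error, update cycle can be sketched in C as follows. This is a simplified illustration, not the NN-align method itself: a linear 9-mer scoring model stands in for the neural network, the measured values are assumed to be already rescaled to a convenient range, and the names `Model`, `predict` and `train_epoch` and the learning rate `lr` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAA 20
#define CORE 9

static const char ALPHABET[] = "ACDEFGHIKLMNPQRSTVWY";

/* Linear stand-in for the network: one weight per core position and residue. */
typedef struct {
    double w[CORE][NAA];
    double bias;
} Model;

static int aa_index(char c)
{
    const char *p = (c != '\0') ? strchr(ALPHABET, c) : NULL;
    return p ? (int)(p - ALPHABET) : -1;
}

/* Score of the 9-mer starting at win; unknown residues contribute nothing. */
static double score_window(const Model *m, const char *win)
{
    double s = m->bias;
    for (int p = 0; p < CORE; p++) {
        int a = aa_index(win[p]);
        if (a >= 0)
            s += m->w[p][a];
    }
    return s;
}

/* Predict binding as the best 9-mer score; its offset is the predicted core. */
double predict(const Model *m, const char *pep, size_t *core_start)
{
    size_t len = strlen(pep), best = 0;
    assert(len >= CORE);
    double best_s = score_window(m, pep);
    for (size_t i = 1; i + CORE <= len; i++) {
        double s = score_window(m, pep + i);
        if (s > best_s) {
            best_s = s;
            best = i;
        }
    }
    if (core_start != NULL)
        *core_start = best;
    return best_s;
}

/*
 * One pass over the data: predict binding and core, calculate the error,
 * take a gradient step on the weights of the selected core.
 * Returns the mean squared error measured before each update.
 */
double train_epoch(Model *m, const char *const *peps, const double *target,
                   size_t n, double lr)
{
    double sse = 0.0;
    for (size_t k = 0; k < n; k++) {
        size_t c;
        double err = predict(m, peps[k], &c) - target[k];
        sse += err * err;
        m->bias -= lr * err;
        for (int p = 0; p < CORE; p++) {
            int a = aa_index(peps[k][c + (size_t)p]);
            if (a >= 0)
                m->w[p][a] -= lr * err;
        }
    }
    return n ? sse / (double)n : 0.0;
}
```

Calling `train_epoch` repeatedly corresponds to the refine step: as the weights change, the predicted core of a peptide may move to a different window, so binding motif and binding strength are fitted together.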

Course objective
To provide the student with an overview and in-depth understanding of bioinformatics machine-learning algorithms. To enable the student first to evaluate which algorithm(s) are best suited for answering a given biological question, and next to implement and develop prediction tools based on such algorithms to describe complex biological problems such as immune system reactions, vaccine discovery, disease gene finding, protein structure and function, post-translational modifications, etc.

Course program
–Weight matrices
–Sequence alignment
–Hidden Markov Models
–Sequence redundancy
–Gibbs sampling
–Stabilization matrix method
–Artificial neural networks
–Project

The Mission
When you have completed the course, you will have
–Worked in great detail on all the most essential algorithms used in bioinformatics
–A folder with program templates implementing these algorithms
–A solid starting point for when, in your future scientific career, you need to implement modifications to conventional algorithms

Course structure
Mornings
–Lectures and small exercises introducing the algorithms
Afternoons
–Exercises where the algorithms are implemented
Project work in groups of 2-3
–One week of project work in which a biological problem is analyzed using one or more of the algorithms introduced in the course

Programming language
C
Why C?
–Nothing compares to C in speed, least of all Perl and Python
–Ex. A Gibbs sampler coded in Perl runs 50 times slower than the same method coded in C

Course material
Lund et al., MIT, chapters 3 and 4
Research papers
–Check the course program website for updates to course material

Course Evaluation
Oral examination and report
Evaluation of report (50%), oral examination (50%)
Exam form
–Group presentation of project
–Individual “Portfolio exam” based on weekly exercises