Week 8

Homework 7
2-state HMM
– State 1: neutral
– State 2: conserved
Emissions: alignment columns
– Alignment of human, dog, mouse sequences
[Figure: example alignment columns (human, dog, mouse) labeled with states 1 (neutral) and 2 (conserved)]

Homework 7 tips
Do just one Viterbi parse (no training).
Ambiguous bases have been changed to "A".
Make sure you look up hg18 positions.
[Figure: example alignment columns (human, dog, mouse) labeled with states 1 and 2]
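A single Viterbi parse (as the tips call for) can be sketched as follows. The states match the homework's two-state model, but the transition and emission probabilities here are made up for illustration, and alignment columns are reduced to "match"/"mismatch" symbols — the real assignment emits full human/dog/mouse columns.

```python
import math

# Hypothetical parameters for a 2-state HMM (neutral vs. conserved).
# Real values come from the model provided in the homework.
states = ["neutral", "conserved"]
log_init = {s: math.log(0.5) for s in states}
log_trans = {
    ("neutral", "neutral"): math.log(0.95), ("neutral", "conserved"): math.log(0.05),
    ("conserved", "conserved"): math.log(0.90), ("conserved", "neutral"): math.log(0.10),
}
# Emissions are alignment columns; here collapsed to match/mismatch.
log_emit = {
    "neutral":   {"match": math.log(0.4), "mismatch": math.log(0.6)},
    "conserved": {"match": math.log(0.8), "mismatch": math.log(0.2)},
}

def viterbi(columns):
    """Return the single most probable state path (no training)."""
    v = [{s: log_init[s] + log_emit[s][columns[0]] for s in states}]
    back = []
    for col in columns[1:]:
        row, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: v[-1][p] + log_trans[(p, s)])
            ptr[s] = best_prev
            row[s] = v[-1][best_prev] + log_trans[(best_prev, s)] + log_emit[s][col]
        v.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

obs = ["match", "match", "mismatch", "match", "match", "match"]
print(viterbi(obs))
```

Working in log space, as above, avoids underflow on chromosome-length sequences.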

Homework 8
Use logistic regression to predict gene expression from genomics assays in GM.
Train using gradient descent.
Label: CAGE gene expression – "expressed"/"non-expressed"
Features: histone modifications and DNA accessibility.
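The training loop for this homework can be sketched as below: a minimal batch-gradient-descent logistic regression. The feature matrix X (e.g. histone marks, DNA accessibility) and the 0/1 labels y are stand-ins; the toy data at the bottom is invented purely to show the loop converging.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.1, n_iters=1000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)           # predicted P(expressed | features)
        grad_w = X.T @ (p - y) / len(y)  # gradient of the mean log-loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy stand-in data: 200 "genes", 3 "assays", linearly separable labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w, b = train(X, y)
acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```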

Homework 8 backstory

Model complexity: interpretation and generalization

Two goals for machine learning: prediction or interpretation

Generative methods model the joint distribution of features and labels
[Figure: sequence models for translation start sites (e.g. AGACAAGG) vs. background]
Generative models are usually more interpretable.
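A toy version of the slide's example: model P(x | y) for 8-mers at translation start sites and for background with independent per-position base frequencies (a position weight matrix), then classify with Bayes' rule under equal priors. The training sequences here are invented for illustration.

```python
import numpy as np

BASES = "ACGT"

def pwm(seqs, pseudocount=1.0):
    """Per-position base frequencies with a pseudocount (a position weight matrix)."""
    counts = np.full((len(seqs[0]), 4), pseudocount)
    for s in seqs:
        for i, base in enumerate(s):
            counts[i, BASES.index(base)] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_lik(seq, p):
    """log P(seq | class) under the independent-positions model."""
    return sum(np.log(p[i, BASES.index(b)]) for i, b in enumerate(seq))

starts = ["AGACAAGG", "AGACAAGA", "AAACAAGG"]      # invented positives
background = ["TTGCATCC", "CGTTACGA", "GGCTTTAC"]  # invented negatives
p_start, p_bg = pwm(starts), pwm(background)

query = "AGACAAGG"
# Equal priors, so the joint model reduces to comparing P(x | y).
label = "start" if log_lik(query, p_start) > log_lik(query, p_bg) else "background"
print(label)
```

The learned parameters are directly readable as base preferences at each position, which is what makes this generative model interpretable.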

Discriminative methods model the conditional distribution of the label given the features.

Generative models are more data-efficient

Simpler models generalize better and are more interpretable
Simple models have a "strong inductive bias."

Regularization decreases the complexity of a model
L2 regularization improves the generalizability of a model: add a penalty λ‖w‖₂² to the loss
L1 regularization improves the interpretability of a model: add a penalty λ‖w‖₁, which drives many weights to exactly zero
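The two penalties change a gradient-descent update in a small but important way, sketched below (lam is the regularization strength λ; the data gradient is zeroed out to isolate the penalty's effect).

```python
import numpy as np

def l2_step(w, grad, lr, lam):
    # L2 adds 2*lam*w to the gradient: shrinks every weight toward zero
    # proportionally, which improves generalization.
    return w - lr * (grad + 2 * lam * w)

def l1_step(w, grad, lr, lam):
    # L1 adds a subgradient lam*sign(w): a constant pull that drives small
    # weights to exactly zero, giving a sparse, interpretable model.
    return w - lr * (grad + lam * np.sign(w))

w = np.array([0.5, -0.2, 0.0])
grad = np.zeros(3)  # pretend the data gradient is zero for illustration
print(l2_step(w, grad, lr=0.1, lam=1.0))
print(l1_step(w, grad, lr=0.1, lam=1.0))
```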

L2 regularization
[Figure: true curve, noisy samples, and L2-regularized fits for λ = 8, 3, 1]

L2 regularization
[Figure: true curve, noisy samples, and L2-regularized fits for λ = 10, 7, 4]

L1 regularization
[Figure: true curve, noisy samples, and L1-regularized fits for λ = 10, 8, 5]
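The kind of experiment behind these plots can be reproduced in a few lines: fit a high-degree polynomial to noisy samples of a true curve with an L2 penalty of varying strength. All settings here (true function, noise level, λ values) are illustrative, not the ones used on the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 15)
y_true = np.sin(2 * np.pi * x)                       # the "True" curve
y_noisy = y_true + rng.normal(scale=0.2, size=x.size)  # "True+noise" samples

X = np.vander(x, 10)  # degree-9 polynomial features: prone to overfitting
for lam in (1e-6, 1e-2, 1.0):
    # Closed-form ridge solution: w = (X^T X + lam I)^{-1} X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y_noisy)
    err = np.mean((X @ w - y_true) ** 2)
    print(f"lambda={lam:g}  error vs. true curve: {err:.3f}  |w| = {np.linalg.norm(w):.1f}")
```

Larger λ shrinks the weight vector, trading a small amount of fit to the noisy samples for a smoother curve that tracks the true function better.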