Gene Structure Prediction Using Neural Networks and Hidden Markov Models
June 18, 2001
2000-30460 권동섭, 2000-30474 신수용, 2000-30478 조동연



Data Sets
- UCSC data
- Preprocessing: multiple-exon genes; 7-fold cross-validation
- Data flow: Multi_exon_GB.dat → pre-processor → SNNS pattern definition file
- Excerpt of a generated pattern file:
    SNNS pattern definition file V3.2
    generated at Wed May 16 17:00:00 2001
    No. of patterns : 16
    No. of input units : 48
    No. of output units : 4
    # Input pattern 1 : 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 ...
    # Output pattern 1 : 1 0 0 0
    # Input pattern 2 : 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 ...
    # Output pattern 2 : 0 0 0 0
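
For illustration only, a minimal Python sketch of a pre-processor routine that writes the SNNS V3.2 pattern file layout excerpted above. The function and argument names are hypothetical, and the encoded input/output vectors are assumed to come from the scheme described on the Training Data slide.

    # Sketch of a pre-processor step that writes an SNNS V3.2 pattern definition
    # file from already-encoded 0/1 vectors. The header layout mimics the excerpt
    # shown above; function and argument names are illustrative, not from the
    # original pre-processor.
    from datetime import datetime

    def write_snns_patterns(path, inputs, outputs):
        # inputs: list of 48-element 0/1 lists; outputs: list of 4-element 0/1 lists
        with open(path, "w") as f:
            f.write("SNNS pattern definition file V3.2\n")
            f.write("generated at " + datetime.now().ctime() + "\n\n")
            f.write("No. of patterns : %d\n" % len(inputs))
            f.write("No. of input units : %d\n" % len(inputs[0]))
            f.write("No. of output units : %d\n\n" % len(outputs[0]))
            for i, (x, y) in enumerate(zip(inputs, outputs), start=1):
                f.write("# Input pattern %d :\n" % i)
                f.write(" ".join(str(b) for b in x) + "\n")
                f.write("# Output pattern %d :\n" % i)
                f.write(" ".join(str(b) for b in y) + "\n")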

Classification Problem
Five classes:
1. Start – Exon
2. Exon – Intron
3. Intron – Exon
4. Exon – End
5. Others
Imbalanced data problem: Boundary : Others = 1 : 9
(Figure: gene structure diagram marking boundary positions 1–4.)

Training Data
Input data
- Boundary sequences, e.g. ATGCGA | GCATGA
- Others, e.g. GCAGCCAGCTAC or GA | CATGATTTCA
- Encoding: A = 0001, C = 0010, G = 0100, T = 1000
Output data
- Boundary classes: 1 = 0001, 2 = 0010, 3 = 0100, 4 = 1000
- Internal (others): 0000
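
A minimal sketch of this encoding, assuming a 12-base window around each candidate boundary (48 input units / 4 bits per base, matching the 6|6 example above); the helper names are illustrative, not from the original pre-processor.

    # One-hot encoding as described on this slide: 4 bits per base, 4-bit
    # boundary-class code on the output side, all-zero output for "others".
    BASE_CODE = {"A": [0, 0, 0, 1], "C": [0, 0, 1, 0],
                 "G": [0, 1, 0, 0], "T": [1, 0, 0, 0]}
    CLASS_CODE = {1: [0, 0, 0, 1], 2: [0, 0, 1, 0],   # 1: Start-Exon, 2: Exon-Intron
                  3: [0, 1, 0, 0], 4: [1, 0, 0, 0],   # 3: Intron-Exon, 4: Exon-End
                  0: [0, 0, 0, 0]}                    # 0: Others (internal)

    def encode_window(seq_12bp, label):
        """Return (48-bit input vector, 4-bit output vector) for one window."""
        x = [bit for base in seq_12bp for bit in BASE_CODE[base]]
        return x, CLASS_CODE[label]

    x, y = encode_window("ATGCGAGCATGA", 1)   # e.g. a Start-Exon boundary window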

Neural Networks
- Tool: SNNS (version 4.2)
- Structure: 48 input, 96 hidden, 4 output units
- Learning: standard backpropagation with momentum
  - Learning rate: 0.2
  - Momentum: 0.1
  - Maximum difference: 0.1
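
The experiments used SNNS 4.2; purely as a sketch of the same 48-96-4 topology and learning parameters, a roughly equivalent network in TensorFlow/Keras might look as follows (Keras is an assumption here, and SNNS's "maximum difference" parameter has no direct counterpart).

    # Sketch of a 48-96-4 feed-forward network roughly matching the SNNS setup
    # above (assumption: the original work used SNNS 4.2, not Keras). Sigmoid
    # units, plain SGD with momentum, squared-error loss.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(48,)),                       # 48 inputs (12 bases x 4 bits)
        tf.keras.layers.Dense(96, activation="sigmoid"),   # 96 hidden units
        tf.keras.layers.Dense(4, activation="sigmoid"),    # 4 outputs (boundary code)
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.2, momentum=0.1),
        loss="mse",  # mean squared error; SSE in the slides differs only by a constant factor
    )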

Experimental Setup
Training
- Groups 0–5
- Boundary patterns: 3068, Others: 27612
- Online learning, random presentation order
Test
- Group 6
- 2 genes: HUMELAFIN and HSCPH
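
A hedged sketch of this setup, reusing the hypothetical Keras model above: concatenate groups 0–5 for training, hold out group 6, and present patterns one at a time in random order (batch size 1 approximates SNNS online learning).

    # Sketch of the online-learning setup: train on groups 0-5, test on group 6,
    # one pattern at a time in random order. `model` is the hypothetical Keras
    # network from the previous sketch; X_groups/Y_groups are assumed to hold
    # the 7 cross-validation groups as NumPy arrays.
    import numpy as np

    def train_online(model, X_groups, Y_groups, test_group=6, epochs=1, seed=0):
        rng = np.random.default_rng(seed)
        X_train = np.concatenate([X_groups[g] for g in range(7) if g != test_group])
        Y_train = np.concatenate([Y_groups[g] for g in range(7) if g != test_group])
        for _ in range(epochs):
            order = rng.permutation(len(X_train))            # random presentation order
            model.fit(X_train[order], Y_train[order],
                      batch_size=1, shuffle=False, verbose=0)  # batch_size=1 ~ online updates
        return model.evaluate(X_groups[test_group], Y_groups[test_group], verbose=0)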

Results – Training Performance
- Early stopping: 260 (0.85%)
- (Figure: SSE training curve.)

Results – Test Performance
- HUMELAFIN (6 boundaries): Re = 4/6, Pre = 4/48
- HSCPH70 (8 boundaries): Re = 5/10, Pre = 5/136
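
Re and Pre appear to denote recall (correct boundaries / true boundaries) and precision (correct boundaries / predicted boundaries); the small helper below just makes the fractions explicit. This reading of the abbreviations is an assumption based on the values shown.

    # Recall/precision as they appear to be used on this slide:
    # Re = correct boundaries / true boundaries,
    # Pre = correct boundaries / all predicted boundaries.
    def recall_precision(n_correct, n_true, n_predicted):
        return n_correct / n_true, n_correct / n_predicted

    print(recall_precision(4, 6, 48))    # HUMELAFIN: ~(0.667, 0.083)
    print(recall_precision(5, 10, 136))  # HSCPH70:   ~(0.5, 0.037)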

Hidden Markov Models
- Simple structure
Training
- Construct one HMM for each of the 4 boundary classes
- Input: fixed-size sequences for each class
Test
- Compare generation probabilities
- Apply a threshold value
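
A minimal sketch of the decision rule described on this slide: score a fixed-size sequence under each boundary-class HMM with the (scaled) forward algorithm, pick the best-scoring class, and fall back to "others" when no model clears a threshold. The HMM topology, parameters, and threshold below are placeholders, since the slides do not specify them.

    # Score a sequence under each boundary-class HMM and take the best class,
    # or class 0 ("others") if no model exceeds a threshold. All parameters
    # here are placeholders for illustration.
    import numpy as np

    BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

    def log_forward(obs, pi, A, B):
        """Log-probability of an observation sequence under a discrete HMM.
        pi: (S,) initial probs, A: (S,S) transitions, B: (S,4) emissions."""
        alpha = pi * B[:, obs[0]]
        c = alpha.sum()
        log_p = np.log(c)
        alpha = alpha / c
        for t in obs[1:]:
            alpha = (alpha @ A) * B[:, t]
            c = alpha.sum()
            log_p += np.log(c)
            alpha = alpha / c
        return log_p

    def classify(seq, models, threshold=-30.0):
        """models: dict class -> (pi, A, B); returns best class or 0 ('others')."""
        obs = [BASES[b] for b in seq]
        scores = {c: log_forward(obs, *m) for c, m in models.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > threshold else 0

    # Toy 2-state model with placeholder parameters, reused for all 4 classes:
    toy = (np.array([0.5, 0.5]),
           np.array([[0.9, 0.1], [0.1, 0.9]]),
           np.array([[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]))
    print(classify("ATGCGAGCATGA", {1: toy, 2: toy, 3: toy, 4: toy}))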