Published by Jeroen Gerritsen. Modified over 5 years ago.
1
Gene Structure Prediction Using Neural Networks and Hidden Markov Models
June 18, 2001 권동섭 신수용 조동연
2
Data Sets
UCSC data
  Multiple exon genes
  7-fold cross-validation
Preprocessing
  Multi_exon_GB.dat, pre-processor, SNNS pattern file (header below)

  SNNS pattern definition file V3.2
  generated at Wed May 16 17:00:
  No. of patterns : 16
  No. of input units : 48
  No. of output units : 4
  # Input pattern 1 :
  # Output pattern 1:
  # Input pattern 2 :
  # Output pattern 2:
3
Classification Problem
5 classes
  1. Start – Exon
  2. Exon – Intron
  3. Intron – Exon
  4. Exon – End
  5. Others
Imbalanced data problem
  Boundary : Others = 1 : 9
(Figure: boundary positions labeled 1 to 4)
4
Training Data
Input Data
  Boundary sequences: ATGCGA | GCATGA
  Others: GCAGCCAGCTAC or GA | CATGATTTCA
  Encoding: A: 0001, C: 0010, G: 0100, T: 1000
Output Data
  Boundary: 1 – 0001, 2 – 0010, 3 – 0100, 4 – 1000
  Internal: 0000
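The encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the 12-nucleotide window length is inferred from the 48 input units (12 x 4 bits), the function names are hypothetical, and the label 0 for the "0000" target is an assumption made for the example.

```python
# One-hot nucleotide codes as listed on the slide.
NUCLEOTIDE_CODE = {"A": "0001", "C": "0010", "G": "0100", "T": "1000"}

# Target codes: boundary classes 1-4 from the slide.
# Label 0 for the "0000" (internal / others) target is an assumption.
CLASS_CODE = {0: "0000", 1: "0001", 2: "0010", 3: "0100", 4: "1000"}


def encode_window(seq):
    """Encode a 12-nucleotide window as a list of 48 binary inputs."""
    if len(seq) != 12:
        raise ValueError("expected a 12-nucleotide window")
    return [int(bit) for base in seq.upper() for bit in NUCLEOTIDE_CODE[base]]


def encode_class(label):
    """Encode a class label as the 4 target outputs."""
    return [int(bit) for bit in CLASS_CODE[label]]


# The boundary example from the slide, ATGCGA | GCATGA, with the bar removed.
inputs = encode_window("ATGCGAGCATGA")
target = encode_class(2)  # class 2 chosen arbitrarily for illustration
print(len(inputs), target)
```

Each nucleotide contributes exactly one set bit, so a valid window always has 12 ones among its 48 inputs.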
5
Neural Networks
SNNS (version 4.2)
Structure
  Input: 48
  Hidden: 96
  Output: 4
Learning: standard BP with momentum
  Learning rate: 0.2
  Momentum: 0.1
  Maximum difference: 0.1
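As a rough illustration of these settings, the sketch below implements a 48-96-4 sigmoid network with a momentum update in plain Python. It is not SNNS code. The weight initialization range, the helper names, and the reading of "maximum difference" as a tolerance below which an output error is treated as zero are assumptions made for the example.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 48, 96, 4   # layer sizes from the slide
ETA, MU, D_MAX = 0.2, 0.1, 0.1      # learning rate, momentum, maximum difference


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    # One weight row per unit; the last entry of each row is the bias.
    # The range [-0.5, 0.5] is an assumption, not taken from the slides.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    ext = inputs + [1.0]  # append the constant bias input
    return [sigmoid(sum(w * x for w, x in zip(row, ext))) for row in layer]


def update(weights, velocity, deltas, inputs):
    # Momentum rule: dw(t) = ETA * delta * input + MU * dw(t - 1)
    for j, d in enumerate(deltas):
        for i, x in enumerate(inputs):
            velocity[j][i] = ETA * d * x + MU * velocity[j][i]
            weights[j][i] += velocity[j][i]


def train_step(w_h, w_o, v_h, v_o, x, target):
    """One online backpropagation step; returns the outputs before the update."""
    hidden = forward(w_h, x)
    out = forward(w_o, hidden)
    # Output errors smaller than D_MAX are treated as zero (assumed meaning).
    err = [t - o if abs(t - o) >= D_MAX else 0.0 for t, o in zip(target, out)]
    delta_o = [e * o * (1.0 - o) for e, o in zip(err, out)]
    delta_h = [h * (1.0 - h) * sum(delta_o[k] * w_o[k][j] for k in range(N_OUT))
               for j, h in enumerate(hidden)]
    update(w_o, v_o, delta_o, hidden + [1.0])
    update(w_h, v_h, delta_h, x + [1.0])
    return out


def sse(out, target):
    return sum((t - o) ** 2 for t, o in zip(target, out))


rng = random.Random(0)
w_h = init_layer(N_IN, N_HIDDEN, rng)
w_o = init_layer(N_HIDDEN, N_OUT, rng)
v_h = [[0.0] * (N_IN + 1) for _ in range(N_HIDDEN)]
v_o = [[0.0] * (N_HIDDEN + 1) for _ in range(N_OUT)]

# A single toy pattern: a window of twelve A's (0001 each), target class 3 (0100).
x = [0.0, 0.0, 0.0, 1.0] * 12
target = [0.0, 1.0, 0.0, 0.0]

first = train_step(w_h, w_o, v_h, v_o, x, target)
for _ in range(50):
    last = train_step(w_h, w_o, v_h, v_o, x, target)
print(sse(first, target), sse(last, target))
```

Repeating the step on one pattern only shows that the update reduces the error; the actual experiment presents all training patterns online.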
6
Experimental Setup
Training
  Groups 0 to 5
  Boundary: 3068
  Others: 27612
  Online learning
  Random order
Test
  Group 6
  2 genes: HUMELAFIN and HSCPH70
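The split and presentation order described on this slide might look like the following sketch. The tuple layout `(group, inputs, target)` and both function names are hypothetical; only the group numbers and the random presentation order come from the slide.

```python
import random


def split_by_group(patterns, test_group=6):
    """Split (group, inputs, target) tuples into training and test lists."""
    train = [p for p in patterns if p[0] != test_group]
    test = [p for p in patterns if p[0] == test_group]
    return train, test


def online_epochs(train, n_epochs, seed=0):
    """Yield training patterns one at a time, reshuffled every epoch."""
    rng = random.Random(seed)
    for _ in range(n_epochs):
        order = train[:]      # copy so the caller's list keeps its order
        rng.shuffle(order)
        for pattern in order:
            yield pattern


# Toy data: 14 placeholder patterns spread over groups 0..6.
toy = [(i % 7, [0] * 48, [0] * 4) for i in range(14)]
train, test = split_by_group(toy)
presented = list(online_epochs(train, n_epochs=2))
print(len(train), len(test), len(presented))
```

Holding out a different group each time would give the other six folds of the 7-fold scheme.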
7
Results – Training Performance
Early Stopping: 260 (0.85%) SSE
8
Results – Test Performance
HUMELAFIN (6 boundaries): Re = 4/6, Pre = 4/48
HSCPH70 (8 boundaries): Re = 5/10, Pre = 5/136
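To make the fractions easier to compare, the sketch below converts them to decimals. It assumes "Re" means recall (correct predictions over true boundaries) and "Pre" means precision (correct predictions over all predicted boundaries). The counts are taken exactly as printed on the slide, and the function name is hypothetical.

```python
from fractions import Fraction


def recall_precision(n_correct, n_true, n_predicted):
    """Return (recall, precision) as exact fractions."""
    return Fraction(n_correct, n_true), Fraction(n_correct, n_predicted)


# (correct, true boundaries, predicted boundaries), as printed on the slide.
results = {"HUMELAFIN": (4, 6, 48), "HSCPH70": (5, 10, 136)}

for gene, counts in results.items():
    rec, prec = recall_precision(*counts)
    print(f"{gene}: recall = {float(rec):.3f}, precision = {float(prec):.3f}")
```

Recall is 0.5 or higher for both genes, while precision stays below 0.1, so most predicted boundaries are false positives.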
9
Hidden Markov Models
Simple structure
Training
  Construct one HMM for each of the 4 boundary classes
  Input: fixed-size sequences for each class
Test
  Compare generation probabilities
  Threshold value
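One way to read the test procedure is: score a sequence under each class HMM, pick the class with the highest generation probability, and fall back to "others" when even the best score is below a threshold. The sketch below follows that reading using the scaled forward algorithm. The two toy models, their parameters, the threshold, and the label 0 for "others" are all invented for illustration and are not the models from the presentation.

```python
import math


def log_likelihood(seq, start, trans, emit):
    """Scaled forward algorithm: log P(seq | model) for a discrete HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][seq[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][seq[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # the model cannot generate this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(seq, models, threshold):
    """Return the best-scoring class, or 0 ("others") if below the threshold."""
    scores = {label: log_likelihood(seq, *m) for label, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else 0


def emission(favored):
    # Hypothetical emission table: 0.7 for the favored base, 0.1 for the rest.
    return {base: (0.7 if base == favored else 0.1) for base in "ACGT"}


# Two toy left-to-right models: (start, transition, emission) per class.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    1: ([1.0, 0.0], LEFT_TO_RIGHT, [emission("A"), emission("G")]),
    2: ([1.0, 0.0], LEFT_TO_RIGHT, [emission("C"), emission("T")]),
}

# Assumed threshold: log-probability of a length-6 sequence under a uniform model.
THRESHOLD = 6 * math.log(0.25)

for seq in ("AAAGGG", "CCCTTT", "GGGAAA"):
    print(seq, classify(seq, MODELS, THRESHOLD))
```

Comparing against a uniform background is just one simple way to set the threshold; the slides do not say how the value was chosen.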