Modeling of Spliceosome 김동민 이경준 임종윤
Gene Finding
Transcription: multi-step process
  Long sequence in one Que (X)
  Several steps like many enzymes (O)
promoter, 3’-processing, splice site, coding exon
Splicing Site
Splice site dinucleotide frequencies (Burset et al., 2000):
  GT-AG: 99.24%
  GC-AG: 0.69%
  AT-AC: 0.05%
Site recognition (Chiara et al., 1996):
  25 bases upstream of the GT splice site
  GT, AG splice sites
  branchpoint sequence
Problem Description
Given a binary-encoded sequence of a specific window size centered on a GT or AG site, determine whether that site is an exon-intron splicing site.
Modeling of spliceosome
Training Data
UCSC data
40 bases before and after each GT or AG site

            Correct   False
Donor       1149      3813
Acceptor    1143      6021
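A minimal sketch of how candidate windows of this shape might be cut from a sequence is shown below. The function name `candidate_windows` and the rule of skipping sites too close to either end are assumptions; labeling each window as correct or false would still require the gene annotation, which is not shown here.

```python
def candidate_windows(sequence, dinucleotide, flank=40):
    """Yield (position, window) for every occurrence of the dinucleotide
    that has `flank` bases available on both sides."""
    sequence = sequence.upper()
    # Start searching at index `flank` so the left flank always exists.
    start = sequence.find(dinucleotide, flank)
    # Stop once the right flank would run past the end of the sequence.
    while start != -1 and start + 2 + flank <= len(sequence):
        yield start, sequence[start - flank : start + 2 + flank]
        start = sequence.find(dinucleotide, start + 1)


toy = "A" * 40 + "GT" + "C" * 40
donor_candidates = list(candidate_windows(toy, "GT"))
```

Each window has 2 * flank + 2 bases, so the default of 40 gives 82-base windows for both GT (donor) and AG (acceptor) candidates.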
Neural Network
Parameter Values
  input nodes: 328
  hidden nodes: 70
  output nodes: 1
  learning rate: 10
  slope parameter: 0.02 (sigmoid activation function)
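The forward pass of a network with these layer sizes can be sketched as below. This is an illustration, not the presenters' implementation: the bias handling, the weight initialization range, and the reading of the slope parameter as a multiplier inside the logistic function are all assumptions, and training (where the learning rate would apply) is omitted.

```python
import math
import random


def sigmoid(x, slope=0.02):
    """Logistic function with an assumed slope form: 1 / (1 + exp(-slope * x))."""
    return 1.0 / (1.0 + math.exp(-slope * x))


def init_layer(n_in, n_out, rng):
    """Small random weights; the extra last weight of each unit is its bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(weights, inputs, slope):
    outputs = []
    for unit in weights:
        # zip stops at len(inputs), so the bias (last weight) is added separately.
        total = unit[-1] + sum(w * x for w, x in zip(unit, inputs))
        outputs.append(sigmoid(total, slope))
    return outputs


def predict(hidden_w, output_w, inputs, slope=0.02):
    """Return the single output activation for one encoded window."""
    hidden = layer_forward(hidden_w, inputs, slope)
    return layer_forward(output_w, hidden, slope)[0]


rng = random.Random(0)
hidden_w = init_layer(328, 70, rng)   # 328 inputs -> 70 hidden nodes
output_w = init_layer(70, 1, rng)     # 70 hidden nodes -> 1 output node
score = predict(hidden_w, output_w, [0] * 328)
```

The output is a value between 0 and 1; a threshold such as 0.5 (again an assumption) would turn it into a splice site / non-site decision.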
Prediction Ratio
  Donor site: 96.33%
  Acceptor site: 95.25%
HMM architecture
[Figure: HMM state diagram with states labeled S, E, d_i, i_i, m_i]
HMM architecture (2)
  The number of states
  The number of distinct observation symbols per state
  The state transition probability distribution
  The observation symbol probability distribution in each state
  The initial state distribution
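The five elements listed above can be written down concretely for a toy model. The sketch below uses the conventional symbols A (transitions), B (emissions), and pi (initial distribution), which the slides do not name, and a made-up two-state model with invented probabilities; it is not the architecture from the presentation. The forward algorithm is included only to show how the elements combine to score a sequence.

```python
# Hypothetical two-state model; every number below is invented for illustration.
STATES = ["state0", "state1"]          # number of states N = 2
SYMBOLS = "ACGT"                       # number of observation symbols M = 4

A = [[0.9, 0.1],                       # A[i][j]: P(next state j | current state i)
     [0.2, 0.8]]
B = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},   # B[i][s]: P(symbol s | state i)
     {"A": 0.10, "C": 0.10, "G": 0.40, "T": 0.40}]
pi = [0.5, 0.5]                        # pi[i]: P(first state is i)


def forward_probability(A, B, pi, observations):
    """Forward algorithm: probability of the observation sequence under the model."""
    n_states = len(pi)
    # Initialization with the first symbol.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction over the remaining symbols.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
            for j in range(n_states)
        ]
    # Termination: sum over all possible final states.
    return sum(alpha)


p_gt = forward_probability(A, B, pi, "GT")
```

Comparing such probabilities between a model trained on true sites and one trained on false sites is one common way to use an HMM as a classifier, though the slides do not state how their model was scored.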