Published by Stephanie Robbins. Modified over 9 years ago.
1
Artificial Intelligence
Project 1: Neural Networks
Biointelligence Lab
School of Computer Sci. & Eng., Seoul National University
2
(C) 2000-2002 SNU CSE BioIntelligence Lab
Outline
  Classification Problems
    Two data sets
      Bioinformatics: DNA
      Medical diagnosis: Diabetes
  Generalization performance
    Epochs
    Number of hidden units
    Cross validation
    Confusion matrix
3
Bioinformatics: Finding Coding Regions of DNA Sequences
4
Bioinformatics
  What is Bioinformatics?
    Bio: molecular biology
    Informatics: computer science
    Bioinformatics: solving problems arising from biology using methodology from computer science
5
DNA Structure
  Double helix: base pairs
  4 nucleotides
    A - Adenine
    T - Thymine
    G - Guanine
    C - Cytosine
  Example sequence:
    AACCTGCGGAAGGATCATTA
    CCGAGTGCGGGTCCTTTGGG
    CCCAACCTCCCATCCGTGTCT
    ATTGTACCCGTTGCTTCGGCG
    GGCCCGCCGCTTGTCGGCCG
    CCGGGGGGGCGCCTCTGCCC
    CCCGGGCCCGTGCCCGCCGG
    AGACCCCAACACGAACACTG
    TCTGAAAGCGTGCAGTCTGA
    GTTGATTGA
6
Central Dogma
  Information flow from DNA to protein
  Proteins are synthesized based on the information in DNA
    DNA: information storage
    RNA: information intermediate
    Protein: various cellular functions
7
Finding Coding Regions of DNA Sequences
  RNA synthesis and processing
    Exon: coding sequence
    Intron: non-coding sequence
  Given a sequence of DNA, recognize the boundaries between exons and introns.
    Acceptor: intron/exon boundary
    Donor: exon/intron boundary
8
Neural Networks (1/2)
  Input (180 units) and output
  Input: a DNA sequence of length 60, with each nucleotide encoded by 3 units (60 x 3 = 180)
    A  1 0 0
    C  0 1 0
    G  0 0 1
    T  0 0 0
  Output: decide whether the middle of the input sequence is a
    Donor     1
    Acceptor  2
    Neither   3
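The input encoding on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `encode_sequence` and the 60-base example window are assumptions made here, and only the A/C/G/T table comes from the slide.

```python
# Encoding table from the slide: three input units per nucleotide.
ENCODING = {
    "A": (1, 0, 0),
    "C": (0, 1, 0),
    "G": (0, 0, 1),
    "T": (0, 0, 0),
}


def encode_sequence(seq):
    """Flatten a DNA string into a list of 0/1 inputs, three per base."""
    inputs = []
    for base in seq.upper():
        # Raises KeyError for any symbol other than A, C, G, T.
        inputs.extend(ENCODING[base])
    return inputs


# Hypothetical 60-base window (a 20-base fragment repeated three times).
window = "AACCTGCGGAAGGATCATTA" * 3
x = encode_sequence(window)
print(len(x))  # 60 bases x 3 units = 180 inputs
```

Note that T is encoded as all zeros, so the network cannot distinguish T from "no input" by activation alone; a 4-unit one-hot code would be an alternative design, at the cost of 240 inputs.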
9
Neural Networks (2/2)
  Data (3186 examples)
    Training: 2000
    Test: 1186
  Class distribution
    Class   Train            Test
    1       464 (23.20%)     303 (25.55%)
    2       485 (24.25%)     280 (23.61%)
    3       1051 (52.55%)    603 (50.84%)
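As a quick sanity check, the class percentages can be recomputed from the counts. The sketch below uses only the numbers on this slide; the helper name `percentages` is an assumption introduced for illustration.

```python
# Class counts as given on the slide (class label -> number of examples).
train = {1: 464, 2: 485, 3: 1051}
test = {1: 303, 2: 280, 3: 603}


def percentages(counts):
    """Return each class's share of the total, in percent, rounded to 2 places."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


print(sum(train.values()), percentages(train))
print(sum(test.values()), percentages(test))
```

Because class 3 (Neither) makes up about half of each set, a classifier that always answers 3 would already be right on 603 of the 1186 test examples, which is a useful baseline when reading the results.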
10
Results (1/3)
  Number of Epochs
11
Results (2/3)
  Number of Hidden Units
  At least 10 runs for each setting

  # Hidden Units   Train                           Test
                   Average  SD  Best  Worst        Average  SD  Best  Worst
  Setting 1
  Setting 2
  Setting 3
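One way to fill in a row of this table is to collect the score from each run and summarize it. The sketch below is an illustration under stated assumptions: the slide does not say whether the table reports error or accuracy, so the `higher_is_better` flag, the function name `summarize`, and the example numbers are all hypothetical.

```python
import statistics


def summarize(scores, higher_is_better=False):
    """Summarize the scores of repeated runs for one setting.

    With error rates (the default), the best run is the lowest score;
    pass higher_is_better=True when the scores are accuracies.
    """
    best = max(scores) if higher_is_better else min(scores)
    worst = min(scores) if higher_is_better else max(scores)
    return {
        "average": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample standard deviation
        "best": best,
        "worst": worst,
    }


# Made-up test error rates for 10 runs of one setting (illustration only).
example_errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.10, 0.12, 0.11]
print(summarize(example_errors))
```

`statistics.stdev` computes the sample standard deviation (dividing by n - 1); use `statistics.pstdev` if the population form is wanted instead.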
12
Results (3/3)
13
Medical Diagnosis: Diabetes
14
Pima Indian Diabetes Data (768 examples)
  8 attributes
    Number of times pregnant
    Plasma glucose concentration in an oral glucose tolerance test
    Diastolic blood pressure (mm Hg)
    Triceps skin fold thickness (mm)
    2-hour serum insulin (mu U/ml)
    Body mass index (kg/m^2)
    Diabetes pedigree function
    Age (years)
  Positive: 268, negative: 500
15
Cross Validation (1/2)
  K-fold Cross Validation
    The data set is randomly divided into k subsets.
    Each of the k subsets is used in turn as the test set, while the other k-1 subsets are put together to form the training set.
  [Figure: the data split into six subsets D1 to D6, labelled 128; each row holds out a different subset (D6, then D5, ..., then D1) as the test set and trains on the other five.]
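The k-fold procedure described here can be sketched as an index generator. This is a minimal illustration rather than a prescribed implementation: the function name `k_fold_splits` and the `seed` parameter are assumptions, and the 768-example, 6-fold call is used only because it yields subsets of 128.

```python
import random


def k_fold_splits(n_examples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # random division into subsets
    # Deal the shuffled indices into k subsets of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Example: 768 examples and k = 6 give test sets of 128 examples each.
for fold, (train_idx, test_idx) in enumerate(k_fold_splits(768, 6), start=1):
    print(fold, len(train_idx), len(test_idx))
```

Every example appears in exactly one test set, so averaging the k test errors uses each example once for testing.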
16
Cross Validation (2/2)
  Calculation of the error
  Confusion Matrix

                     True
  Predict        Positive   Negative
  Positive
  Negative
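The four cells of the confusion matrix, and the error computed from them, can be sketched as follows. The coding of labels (1 for positive, 0 for negative) and the function names are assumptions made for this illustration.

```python
def confusion_matrix(true_labels, predicted_labels):
    """Count the four outcomes, assuming 1 = positive and 0 = negative."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for t, p in zip(true_labels, predicted_labels):
        if t == 1 and p == 1:
            counts["TP"] += 1  # true positive, predicted positive
        elif t == 1:
            counts["FN"] += 1  # true positive, predicted negative
        elif p == 1:
            counts["FP"] += 1  # true negative, predicted positive
        else:
            counts["TN"] += 1  # true negative, predicted negative
    return counts


def error_rate(counts):
    """Fraction of examples that were misclassified."""
    total = sum(counts.values())
    return (counts["FP"] + counts["FN"]) / total


# Made-up labels for illustration only.
true_labels = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(true_labels, predicted)
print(cm, error_rate(cm))
```

Reporting the four counts separately matters for a medical task such as this one, because a missed positive (FN) and a false alarm (FP) contribute equally to the error rate but are not equally costly.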
17
Results
  Cross validation and Confusion Matrix
  At least 10 runs for your k value.
  Show the confusion matrix for the best result of your experiments.

  Run       Test Error
  1
  2
  ...
  10
  Average
18
References
  Source code
    Free software
      NN libraries (C, C++, Java, …)
      MATLAB toolbox
  Web sites
19
Pay Attention!
  Due (October 7, 2001): by the beginning of class
  Submission
    Results obtained from your experiments
      Compress the data
      Via e-mail
    Report: hardcopy!!
      Software used and running environment
      Results for many experiments with various parameter settings
      Analysis and explanation of the results in your own way
20
Optional Experiments
  Various learning rates
  Number of hidden layers
  Different k values
  Output encoding
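For the output-encoding experiment, one possible alternative to a single numeric output is a one-of-three target vector for the DNA classes (1 = Donor, 2 = Acceptor, 3 = Neither). The slides do not prescribe any particular encoding, so the sketch below, including the names `one_of_n` and `decode`, is purely an assumption for illustration.

```python
# Class labels from the DNA task: 1 = Donor, 2 = Acceptor, 3 = Neither.
CLASSES = (1, 2, 3)


def one_of_n(label):
    """Encode a class label as a target vector with a single 1."""
    return [1 if label == c else 0 for c in CLASSES]


def decode(outputs):
    """Map output-unit activations back to the class with the largest one."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best]


print(one_of_n(2))              # target vector for Acceptor
print(decode([0.1, 0.2, 0.7]))  # class whose output unit is largest
```

With this encoding the network needs three output units instead of one, and the predicted class is read off as the unit with the highest activation.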