Artificial Intelligence Project 1 Neural Networks Biointelligence Lab School of Computer Sci. & Eng. Seoul National University.


1 Artificial Intelligence Project 1 Neural Networks Biointelligence Lab School of Computer Sci. & Eng. Seoul National University

2 Outline
Classification Problems
  - Two data sets
    - Bioinformatics: DNA
    - Medical diagnosis: Diabetes
  - Generalization performance
    - Epochs
    - Number of hidden units
    - Cross validation
    - Confusion matrix
(C) 2000-2002 SNU CSE BioIntelligence Lab

3 Bioinformatics: Finding Coding Regions of DNA Sequences

4 Bioinformatics
What is Bioinformatics?
  - Bio – molecular biology
  - Informatics – computer science
  - Bioinformatics – solving problems arising from biology using methodology from computer science

5 DNA Structure
Double Helix – Base pairs
  - 4 nucleotides
    - A - Adenine
    - T - Thymine
    - G - Guanine
    - C - Cytosine

    AACCTGCGGAAGGATCATTA
    CCGAGTGCGGGTCCTTTGGG
    CCCAACCTCCCATCCGTGTCT
    ATTGTACCCGTTGCTTCGGCG
    GGCCCGCCGCTTGTCGGCCG
    CCGGGGGGGCGCCTCTGCCC
    CCCGGGCCCGTGCCCGCCGG
    AGACCCCAACACGAACACTG
    TCTGAAAGCGTGCAGTCTGA
    GTTGATTGA

6 Central Dogma
Information Flow from DNA to Protein
  - Proteins are synthesized based on the information in DNA
    - DNA: information storage
    - RNA: information intermediate
    - Protein: various cellular functions

7 Finding Coding Regions of DNA Sequences
RNA Synthesis and Processing
  - Exon: coding sequences
  - Intron: non-coding sequences
Given a sequence of DNA, recognize the boundaries between exons and introns.
  - Acceptor: intron/exon boundary
  - Donor: exon/intron boundary

8 Neural Networks (1/2)
Input (180 units) and Output
  - Input: a DNA sequence of length 60, with each nucleotide encoded by 3 units (60 × 3 = 180).
    - A → 1 0 0
    - C → 0 1 0
    - G → 0 0 1
    - T → 0 0 0
  - Output: decide whether the middle of the input sequence is a
    - Donor → 1
    - Acceptor → 2
    - Neither → 3
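The input encoding above can be sketched in a few lines of Python. This is only an illustration, not the lab's code: the names `NUCLEOTIDE_CODE` and `encode_sequence` are made up here, and the 60-base window is built by repeating the first 20 bases of the sample sequence on slide 5.

```python
# Three-unit code per nucleotide, as listed on the slide.
NUCLEOTIDE_CODE = {
    "A": (1, 0, 0),
    "C": (0, 1, 0),
    "G": (0, 0, 1),
    "T": (0, 0, 0),
}

def encode_sequence(sequence):
    """Return the flat 0/1 input vector for a DNA string (3 units per base)."""
    vector = []
    for base in sequence.upper():
        # Raises KeyError for anything other than A, C, G, T.
        vector.extend(NUCLEOTIDE_CODE[base])
    return vector

# Hypothetical 60-base window: the first 20 bases of the slide-5 sample, repeated.
window = "AACCTGCGGAAGGATCATTA" * 3
inputs = encode_sequence(window)
print(len(inputs))  # 180
print(inputs[:6])   # [1, 0, 0, 1, 0, 0]
```

Note that T maps to all zeros, so it is distinguished from the other bases only by the absence of an active unit.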

9 Neural Networks (2/2)
Data (3186)
  - Training: 2000
  - Test: 1186
  - Class distribution

    Class   Train            Test
    1       464 (23.20%)     303 (25.55%)
    2       485 (24.25%)     280 (23.61%)
    3       1051 (52.55%)    603 (50.84%)
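A useful reference point when reading test accuracies is the score of a classifier that always predicts the most frequent training class. The sketch below computes it from the counts in the class-distribution table; the function name `majority_baseline` is an assumption made for this example.

```python
# Class counts copied from the class-distribution table (classes 1, 2, 3).
train_counts = {1: 464, 2: 485, 3: 1051}
test_counts = {1: 303, 2: 280, 3: 603}

def majority_baseline(train, test):
    """Accuracy on `test` of always predicting the most frequent class in `train`."""
    majority_class = max(train, key=train.get)
    return majority_class, test[majority_class] / sum(test.values())

label, accuracy = majority_baseline(train_counts, test_counts)
print(label, round(100 * accuracy, 2))  # 3 50.84
```

A trained network should score well above this figure on the test set.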

10 Results (1/3)
Number of Epochs

11 Results (2/3)
Number of Hidden Units
  - At least 10 runs for each setting

    # Hidden Units   Train                            Test
                     Average ± SD   Best   Worst      Average ± SD   Best   Worst
    Setting 1
    Setting 2
    Setting 3
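One row of the table above can be filled in from the per-run scores roughly as follows. This is a sketch under assumptions: `summarize_runs` is a hypothetical helper, the SD is taken to be the sample standard deviation, and the ten accuracies are invented placeholders, not results.

```python
import statistics

def summarize_runs(accuracies):
    """Average, sample SD, best, and worst over repeated runs of one setting."""
    return {
        "average": statistics.mean(accuracies),
        "sd": statistics.stdev(accuracies),  # sample SD; needs at least 2 runs
        "best": max(accuracies),
        "worst": min(accuracies),
    }

# Placeholder test accuracies (%) for 10 runs, invented only to show the call.
example_runs = [91.0, 92.5, 90.0, 93.0, 91.5, 92.0, 90.5, 91.0, 92.5, 91.0]
row = summarize_runs(example_runs)
print("{average:.2f} ± {sd:.2f}  best {best:.1f}  worst {worst:.1f}".format(**row))
```

The same helper would be called once for the training scores and once for the test scores of each hidden-unit setting.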

12 Results (3/3)

13 Medical Diagnosis: Diabetes

14 Pima Indian Diabetes
Data (768)
  - 8 attributes
    - Number of times pregnant
    - Plasma glucose concentration in an oral glucose tolerance test
    - Diastolic blood pressure (mm Hg)
    - Triceps skin fold thickness (mm)
    - 2-hour serum insulin (mu U/ml)
    - Body mass index (kg/m^2)
    - Diabetes pedigree function
    - Age (years)
  - Positive: 268, negative: 500

15 Cross Validation (1/2)
K-fold Cross Validation
  - The data set is randomly divided into k subsets.
  - One of the k subsets is used as the test set, and the other k-1 subsets are combined to form the training set.
  - This is repeated so that each subset serves once as the test set.
[Figure: six subsets D1 to D6, labeled 128; each row of the diagram moves a different subset into the held-out position.]
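The splitting step can be sketched with the standard library alone. The function name `k_fold_splits` and the fixed `seed` are assumptions for illustration; with 768 examples and k = 6 each subset holds 128 examples, which matches the label in the figure.

```python
import random

def k_fold_splits(n_examples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)       # random division into subsets
    folds = [indices[i::k] for i in range(k)]  # k subsets, sizes differ by at most 1
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test

for train, test in k_fold_splits(768, 6):
    print(len(train), len(test))  # 640 128 on every fold
```

Changing `seed` gives a different random division, which is one way to obtain the repeated runs the assignment asks for.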

16 Cross Validation (2/2)
  - Calculation of the error
Confusion Matrix

    True \ Predicted   Positive   Negative
    Positive
    Negative
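For a two-class problem, the four cells of the confusion matrix and the resulting error can be computed as in the sketch below. The helper names and the choice of `1` as the positive label are assumptions, and the eight labels are toy values.

```python
def confusion_matrix(true_labels, predicted_labels, positive=1):
    """Count TP, FN, FP, TN for a two-class problem."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            counts["TP" if guess == positive else "FN"] += 1
        else:
            counts["FP" if guess == positive else "TN"] += 1
    return counts

def error_rate(counts):
    """Fraction of misclassified examples: (FN + FP) / total."""
    return (counts["FN"] + counts["FP"]) / sum(counts.values())

# Toy labels, invented only to show the call.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
guess = [1, 0, 1, 0, 0, 1, 0, 0]
cm = confusion_matrix(truth, guess)
print(cm)              # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 4}
print(error_rate(cm))  # 0.25
```

The first row of the table corresponds to TP and FN, the second to FP and TN.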

17 Results
Cross validation and Confusion Matrix
  - At least 10 runs for your k value.
  - Show the confusion matrix for the best result of your experiments.

    Run       Test Error
    1
    2
    ...
    10
    Average

18 References
Source Codes
  - Free software
    - NN libraries (C, C++, Java, …)
    - MATLAB toolbox
Web sites

19 Pay Attention!
Due (October 7, 2001): by the beginning of class
Submission
  - Results obtained from your experiments
    - Compress the data
    - Via e-mail
  - Report: hardcopy!
    - Software used and running environment
    - Results for many experiments with various parameter settings
    - Your own analysis and explanation of the results

20 Optional Experiments
  - Various learning rates
  - Number of hidden layers
  - Different k values
  - Output encoding

