Artificial Intelligence Project 1: Neural Networks
Biointelligence Lab, School of Computer Sci. & Eng., Seoul National University

(C) SNU CSE BioIntelligence Lab

Outline

Classification Problems
- Two data sets
  - Bioinformatics: DNA
  - Medical diagnosis: Diabetes
- Generalization performance
  - Epochs
  - Number of hidden units
- Cross validation
- Confusion matrix

Bioinformatics: Finding Coding Regions of DNA Sequences

Bioinformatics

What is Bioinformatics?
- Bio: molecular biology
- Informatics: computer science
- Bioinformatics: solving problems arising from biology using methodology from computer science

DNA Structure

Double Helix: Base pairs
- 4 nucleotides
  - A - Adenine
  - T - Thymine
  - G - Guanine
  - C - Cytosine

AACCTGCGGAAGGATCATTA
CCGAGTGCGGGTCCTTTGGG
CCCAACCTCCCATCCGTGTCT
ATTGTACCCGTTGCTTCGGCG
GGCCCGCCGCTTGTCGGCCG
CCGGGGGGGCGCCTCTGCCC
CCCGGGCCCGTGCCCGCCGG
AGACCCCAACACGAACACTG
TCTGAAAGCGTGCAGTCTGA
GTTGATTGA

Central Dogma

Information Flow from DNA to Protein
- Proteins are synthesized based on the information in DNA
  - DNA: information storage
  - RNA: information intermediate
  - Protein: various cellular functions

Finding Coding Regions of DNA Sequences

RNA Synthesis and Processing
- Exon: coding sequences
- Intron: non-coding sequences

Given a sequence of DNA, recognize the boundaries between exons and introns.
- Acceptor: intron/exon boundary
- Donor: exon/intron boundary

Neural Networks (1/2)

Input (180 units) and Output
- Input: a DNA sequence of length 60, with each nucleotide (A, C, G, T) encoded by 3 input units (60 × 3 = 180).
- Output: decide whether the middle of the input sequence is a
  - Donor: class 1
  - Acceptor: class 2
  - Neither: class 3
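The input encoding above can be sketched as follows. This is a minimal illustration, not the project's reference code: the 3-unit codes in `ASSUMED_CODES` and the function name `encode_sequence` are assumptions made for the example, since only the sizes (60 nucleotides, 180 units) are given here. Substitute the codes specified for the assignment.

```python
# ASSUMPTION: these 3-unit codes are illustrative placeholders only.
ASSUMED_CODES = {
    "A": (1, 0, 0),
    "C": (0, 1, 0),
    "G": (0, 0, 1),
    "T": (0, 0, 0),
}

# Class numbering taken from the slide.
CLASS_LABELS = {1: "Donor", 2: "Acceptor", 3: "Neither"}

SEQUENCE_LENGTH = 60


def encode_sequence(sequence):
    """Turn a 60-nucleotide string into a flat list of 180 input values."""
    if len(sequence) != SEQUENCE_LENGTH:
        raise ValueError("expected a sequence of length 60")
    units = []
    for base in sequence.upper():
        if base not in ASSUMED_CODES:
            raise ValueError("unexpected symbol: " + base)
        units.extend(ASSUMED_CODES[base])
    return units


# First 60 bases of the example sequence on the DNA Structure slide.
example = ("AACCTGCGGAAGGATCATTA" "CCGAGTGCGGGTCCTTTGGG" "CCCAACCTCCCATCCGTGTCT")[:60]
vector = encode_sequence(example)
print(len(vector))  # 60 positions x 3 units each
```

A flat vector keeps the network input layer simple: unit `3 * i` to `3 * i + 2` always describes position `i` of the window.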

Neural Networks (2/2)

Data (3186)
- Training: 2000
- Test: 1186
- Class distribution

Class   Train            Test
1       464 (23.20%)     303 (25.55%)
2       485 (24.25%)     280 (23.61%)
3       1051 (52.55%)    603 (50.84%)

Results (1/3)

Number of Epochs

Results (2/3)

Number of Hidden Units
- At least 10 runs for each setting

# Hidden Units   Train                            Test
                 Average ± SD   Best   Worst      Average ± SD   Best   Worst
Setting 1
Setting 2
Setting 3
...
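One way to fill a row of this table from repeated runs is sketched below. The helper name `summarize_runs` is hypothetical, the numbers are placeholders rather than real results, and the sketch assumes the table reports error rates, so "best" means lowest. If you report accuracy instead, swap `min` and `max`.

```python
import statistics


def summarize_runs(error_rates):
    """Return average, sample SD, best (lowest) and worst (highest) error."""
    return {
        "average": statistics.mean(error_rates),
        "sd": statistics.stdev(error_rates),  # sample standard deviation
        "best": min(error_rates),
        "worst": max(error_rates),
    }


# ASSUMPTION: placeholder values for 10 runs of one setting, NOT measured results.
placeholder_test_errors = [0.08, 0.07, 0.09, 0.08, 0.10, 0.07, 0.08, 0.09, 0.08, 0.07]

summary = summarize_runs(placeholder_test_errors)
print("{average:.4f} ± {sd:.4f}  best={best:.4f}  worst={worst:.4f}".format(**summary))
```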

Results (3/3)

Medical Diagnosis: Diabetes

Pima Indian Diabetes

Data (768)
- 8 Attributes
  - Number of times pregnant
  - Plasma glucose concentration in an oral glucose tolerance test
  - Diastolic blood pressure (mm Hg)
  - Triceps skin fold thickness (mm)
  - 2-hour serum insulin (mu U/ml)
  - Body mass index (kg/m^2)
  - Diabetes pedigree function
  - Age (years)
- Positive: 268, negative: 500

Cross Validation (1/2)

K-fold Cross Validation
- The data set is randomly divided into k subsets.
- One of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set.

[Figure: six subsets D1 to D6 of 128 examples each, with a different subset held out as the test set in each round]
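The splitting procedure can be sketched in plain Python as below. The function name `k_fold_splits` and the fixed `seed` are illustrative choices, not part of the assignment. With the 768 diabetes examples and k = 6, each test fold holds 128 examples.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division into subsets
    folds = [indices[i::k] for i in range(k)]  # k near-equal subsets
    for i in range(k):
        test = folds[i]
        # The other k-1 subsets are put together to form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for round_number, (train, test) in enumerate(k_fold_splits(768, 6), start=1):
    print(round_number, len(train), len(test))
```

Every example appears in exactly one test fold, so averaging the k test errors uses each example once for testing.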

Cross Validation (2/2)

- Calculation of the error

Confusion Matrix

                 Predict
True             Positive   Negative
Positive
Negative
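A minimal sketch of building this matrix and deriving an error rate from it follows. The function names and the toy labels are made up for illustration. Averaging the per-fold error rates to get the cross-validation error is the usual convention and is assumed here, not taken from the slide.

```python
LABELS = ("Positive", "Negative")


def confusion_matrix(true_labels, predicted_labels):
    """Count examples as matrix[true][predicted], matching the table layout."""
    matrix = {t: {p: 0 for p in LABELS} for t in LABELS}
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def error_rate(matrix):
    """Fraction of examples that fall off the diagonal."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in LABELS)
    return (total - correct) / total


def cross_validation_error(fold_errors):
    """ASSUMPTION: the overall error is the plain mean of the fold errors."""
    return sum(fold_errors) / len(fold_errors)


# Toy labels for illustration only, not real predictions.
true = ["Positive", "Positive", "Negative", "Negative", "Negative", "Positive"]
pred = ["Positive", "Negative", "Negative", "Negative", "Positive", "Positive"]
m = confusion_matrix(true, pred)
print(m)
print(round(error_rate(m), 3))
```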

Results

Cross validation and Confusion Matrix
- At least 10 runs for your k value.
- Show the confusion matrix for the best result of your experiments.

Run       Test Error
1
2
...
10
Average

References

Source Codes
- Free software
  - NN libraries (C, C++, JAVA, …)
  - MATLAB toolbox

Web sites

Pay Attention!

Due (October 7, 2001): By the beginning of class

Submission
- Results obtained from your experiments
  - Compress the data
  - Via
- Report: Hardcopy!!
  - Software used and running environment
  - Results for many experiments with various parameter settings
  - Analysis and explanation of the results in your own way

Optional Experiments

- Various learning rates
- Number of hidden layers
- Different k values
- Output encoding