DNA Microarray Data Analysis using Artificial Neural Network Models
by Venkatanand Venkatachalapathy ("Venkat")
ECE/CS/ME 539 Course Project

Genetic information flow
Genes (DNA) → transcription → RNA intermediate → translation → Protein
Gene expression refers to both transcription and translation.
• Genes (information molecules) code for RNA and proteins (functional molecules that determine the properties of the cell).
• "Gene expression level" is the amount of protein or RNA produced per gene. Expression varies dynamically over time, depending on the environment, the cell's stage of development, etc.
• When the expression level is "high" or "low" with respect to a reference condition (the "normal state"), the gene is said to be switched "ON" or "OFF".
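
The ON/OFF call can be sketched as a threshold on the expression ratio. The slide does not say how "high" or "low" is defined, so the 2-fold cut-off below (FOLD_CHANGE_THRESHOLD) is a hypothetical choice. The function name expression_state and the third state "unchanged" (for genes between the two cut-offs) are also illustrative additions.

```python
import math

# Hypothetical cut-off: a 2-fold change (|log2 ratio| >= 1) counts as "high"/"low".
FOLD_CHANGE_THRESHOLD = 2.0


def expression_state(current: float, reference: float,
                     fold: float = FOLD_CHANGE_THRESHOLD) -> str:
    """Call a gene 'ON', 'OFF', or 'unchanged' relative to a reference condition."""
    if current <= 0 or reference <= 0:
        raise ValueError("expression levels must be positive")
    log_ratio = math.log2(current / reference)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "ON"        # expressed well above the reference ("high")
    if log_ratio <= -cutoff:
        return "OFF"       # expressed well below the reference ("low")
    return "unchanged"     # within the fold-change band around the reference


# Hypothetical (current, reference) expression levels for three genes.
for gene, (cur, ref) in {"A": (3000, 10), "B": (10, 30), "C": (1, 1)}.items():
    print(gene, expression_state(cur, ref))
```

With these values, gene A is called ON, B is called OFF and C is unchanged. Working on the log scale makes the rule symmetric: a 3-fold increase and a 3-fold decrease lie the same distance from zero.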

Microarray experiments and data
• A microarray measures the expression levels of thousands of genes in a single experiment.
• In a single experiment, each gene yields one data point: the ratio of its expression in the current state to its expression in the reference state. E.g., expression levels for genes [A B C] = [3000/10, 10/30, 1/1]. (Conventionally, these ratios are normalized on a log scale.)
• N such experiments on M genes give an M × N gene expression matrix G, where G(i, j) = expression level of the i-th gene in the j-th experiment. Each row of G is the expression vector of one gene.
• This has enormous significance in biotechnology and medicine. Why? The genome projects are complete, so the genetic code is known; the next task is to find the function of each gene.
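
Building the gene expression matrix from raw ratios can be sketched in a few lines. This is a minimal example, not the project's code. The slide only says "log scale", so base 2 is an assumption, and the function name log_ratio_matrix is hypothetical. The example reuses the three genes from the slide as a single experiment (N = 1).

```python
import math


def log_ratio_matrix(current, reference):
    """Build the M x N expression matrix G with G[i][j] = log2(current / reference).

    current[i][j] and reference[i][j] are the expression levels of gene i in
    experiment j, in the current and reference states respectively.
    """
    if len(current) != len(reference):
        raise ValueError("current and reference must have the same number of genes")
    G = []
    for cur_row, ref_row in zip(current, reference):
        if len(cur_row) != len(ref_row):
            raise ValueError("each gene needs one reference value per experiment")
        G.append([math.log2(c / r) for c, r in zip(cur_row, ref_row)])
    return G


# Genes A, B, C in one experiment: ratios 3000/10, 10/30, 1/1.
current = [[3000], [10], [1]]
reference = [[10], [30], [1]]
G = log_ratio_matrix(current, reference)
for name, row in zip("ABC", G):
    print(name, [round(x, 3) for x in row])   # A ≈ 8.229, B ≈ -1.585, C = 0.0
```

The log transform is the reason for the normalization convention. The raw ratio 10/30 is squeezed into (0, 1), while 3000/10 reaches 300. After the transform, down-regulation and up-regulation are mirror images around 0.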

Project problem and methodology
OBJECTIVE:
• Classify "unknown" genes into functional classes based on:
  - microarray gene expression data, and
  - knowledge about the function of "well-known" genes.
• Provide a graphical user interface (GUI) for the analysis.
SOLUTION STRATEGY:
• Functionally related genes have similar expression patterns. The approach has two steps:
  1. For well-known genes, correlate each gene's expression vector with its functional class. This correlation can be encoded in a neural network.
  2. Use this neural network to classify each unknown gene from its expression vector.
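
The two-step strategy can be illustrated with the smallest possible neural network. The sketch below uses a single sigmoid neuron (i.e., logistic regression) trained by gradient descent. This is a deliberately simplified stand-in for the MLP the project used, not that MLP itself. All function names and the toy expression vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Output of one sigmoid neuron; w has one weight per experiment plus a bias (last)."""
    return sigmoid(sum(wk * v for wk, v in zip(w, x)) + w[-1])


def train_neuron(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Step 1: learn weights from well-known genes (y = 1 for class members, else 0)
    by batch gradient descent on the cross-entropy loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi   # derivative of the loss w.r.t. the pre-activation
            for k, v in enumerate(xi):
                grad[k] += err * v
            grad[-1] += err
        w = [wk - lr * g / len(X) for wk, g in zip(w, grad)]
    return w


def classify(w: Sequence[float], x: Sequence[float]) -> int:
    """Step 2: assign an unknown gene to the class if the neuron's output is >= 0.5."""
    return 1 if predict_proba(w, x) >= 0.5 else 0


# Hypothetical log-ratio vectors over three experiments.
known_X = [[2.0, 1.5, 0.1], [1.8, 2.2, -0.2], [2.5, 1.9, 0.0],     # class members
           [-0.3, 0.1, 0.2], [0.2, -0.4, 0.1], [-0.1, 0.0, -0.3]]  # non-members
known_y = [1, 1, 1, 0, 0, 0]
unknown: Dict[str, List[float]] = {"geneX": [2.1, 1.7, 0.3], "geneY": [0.0, -0.2, 0.1]}

w = train_neuron(known_X, known_y)
for name, vec in unknown.items():
    print(name, "in class" if classify(w, vec) == 1 else "not in class")
```

Gene X, whose profile resembles the known members, is assigned to the class; gene Y is not. A hidden layer (as in an MLP) would let the same procedure learn classes that are not linearly separable.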

ANN models and program features
• Models chosen:
  - MLP (using bp.m)
  - SVM with linear, polynomial, and radial basis kernels (using svmdemo.m)
• A GUI that accepts comma-delimited gene expression data files (.csv).
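
The three SVM kernels and the .csv input can be sketched together. The slide does not give the file layout, so this sketch assumes a header row, then one row per gene: the gene name followed by one value per experiment. The polynomial offset c = 1 and the RBF width sigma are common conventions, not necessarily what svmdemo.m uses. Function names are hypothetical.

```python
import csv
import io
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def load_expression_csv(text: str) -> Dict[str, List[float]]:
    """Parse comma-delimited expression data (assumed layout: header row, then
    gene name followed by one value per experiment on each line)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row (experiment names)
    return {row[0]: [float(v) for v in row[1:]] for row in reader if row}


def linear_kernel(x: Vector, z: Vector) -> float:
    # Plain dot product x . z
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Vector, z: Vector, order: int = 2, c: float = 1.0) -> float:
    # (x . z + c)^order; the offset c = 1 is an assumed convention
    return (linear_kernel(x, z) + c) ** order


def rbf_kernel(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    # exp(-||x - z||^2 / (2 sigma^2)); equals 1.0 for identical vectors
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


data = load_expression_csv("gene,exp1,exp2\nA,1.0,2.0\nB,0.5,-1.0\n")
a, b = data["A"], data["B"]
print(linear_kernel(a, b))          # 1.0*0.5 + 2.0*(-1.0) = -1.5
print(polynomial_kernel(a, b, 3))   # (-1.5 + 1)^3 = -0.125
print(rbf_kernel(a, a))             # 1.0
```

Each kernel replaces the dot product inside the SVM. The linear kernel keeps a linear decision boundary, while the polynomial and radial basis kernels allow curved boundaries in the original expression space.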

Preliminary results
• Data source: Stanford Microarray Database.
• Task: classify 2467 genes into a "TCA" (tricarboxylic acid cycle) class and a "non-TCA" class, tested by 3-way cross-validation.
• For comparison, Brown et al. used an SVM with a radial basis kernel: 99.5%.

MLP results (C-rate = classification rate)
Architecture    Train C-rate    Test C-rate
80-15-?         ?               96.5%
80-15-5-1       99.6%           97%

SVM results
Kernel (order)    Train C-rate    Test C-rate
Linear            100%            99.1%
Polynomial (2)    To be done
Polynomial (3)    To be done
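
The evaluation protocol (3-way cross-validation reporting train and test C-rates) can be sketched generically. The round-robin fold assignment and the nearest-centroid placeholder classifier are assumptions made only to keep the example self-contained. Any train/predict pair, such as an MLP or SVM, could be passed in instead. All names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def k_fold_indices(n: int, k: int = 3) -> List[List[int]]:
    """Split sample indices 0..n-1 into k folds of near-equal size (round-robin)."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds


def classification_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples assigned to the correct class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def cross_validate(X: Sequence[Vector], y: Sequence[int],
                   train: Callable, predict: Callable, k: int = 3) -> Tuple[float, float]:
    """Mean (train C-rate, test C-rate) over k folds; each fold is held out once."""
    train_rates, test_rates = [], []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        for idx, rates in ((train_idx, train_rates), (test_idx, test_rates)):
            preds = [predict(model, X[i]) for i in idx]
            rates.append(classification_rate(preds, [y[i] for i in idx]))
    return sum(train_rates) / k, sum(test_rates) / k


# Hypothetical placeholder classifier (nearest class centroid), not the MLP or SVM.
def train_centroids(X, y):
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_centroid(centroids, x):
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [-0.3, 0.1], [0.2, -0.4], [-0.1, 0.0]]
y = [1, 1, 1, 0, 0, 0]
print(cross_validate(X, y, train_centroids, predict_centroid, k=3))
```

Round-robin assignment does not guarantee that every fold contains both classes. When one class is much smaller than the other, a stratified split is safer. In that situation, a high overall C-rate can also be reached by always predicting the majority class, so per-class rates are worth reporting alongside it.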