SVM Classification of Multiple Tumor Types: DNA Microarray Data, Oracle Data Mining


SVM Classification of Multiple Tumor Types: DNA microarray data, Oracle Data Mining. Multiple examples of tumor tissue (public data from Whitehead/MIT). 78.25% accuracy (green = correct, red = errors). We load data for multiple cancer types into the Oracle database: 16,063 genes and 144 cancer patients. We then mine the data using Support Vector Machines and create the confusion matrix.

SVM Classification of Multiple Tumor Types: 78.25% accuracy (green = correct, red = errors). Oracle Data Mining's SVM models predict the tumor type in this multi-class problem with 78.25% accuracy.
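As a rough illustration of how a confusion matrix and an overall accuracy figure like the one above are derived, here is a minimal Python sketch. It is not Oracle Data Mining code: the function names and the toy labels are assumptions made up for this example, and the data is not the Whitehead/MIT set.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs; rows are actual, columns predicted."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels invented for illustration only (hypothetical, not the slide's data).
actual = ["breast", "breast", "lung", "lung", "colon", "colon"]
predicted = ["breast", "lung", "lung", "lung", "colon", "breast"]

labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(f"{label:>6}: {row}")
print(f"accuracy = {accuracy(matrix):.2%}")
```

The diagonal cells correspond to the "green" (correct) entries on the slide and the off-diagonal cells to the "red" (error) entries.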

Identify Biomarkers for DLBC Lymphoma Treatment Outcome: Attribute Importance identifies genes correlated with lymphoma.
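To give a feel for what ranking genes by their association with an outcome involves, the sketch below scores each gene with a simple signal-to-noise ratio between two outcome groups. This is a stand-in technique chosen for illustration, not the algorithm behind Oracle's Attribute Importance. The gene names, expression values, and outcome labels are all hypothetical.

```python
import statistics


def signal_to_noise(values, outcomes):
    """(mean of group 1 - mean of group 0) / (stdev of group 1 + stdev of group 0)."""
    pos = [v for v, o in zip(values, outcomes) if o == 1]
    neg = [v for v, o in zip(values, outcomes) if o == 0]
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    if spread == 0:
        return 0.0
    return (statistics.mean(pos) - statistics.mean(neg)) / spread


def rank_genes(expression, outcomes):
    """Return gene names ordered by absolute score, strongest association first."""
    scores = {gene: signal_to_noise(vals, outcomes) for gene, vals in expression.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical data: 1 = cured, 0 = fatal; one expression value per patient.
outcomes = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "GENE_B": [2.0, 3.0, 4.0, 2.5, 3.5, 4.5],
    "GENE_C": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
}

for gene in rank_genes(expression, outcomes):
    print(gene, round(signal_to_noise(expression[gene], outcomes), 2))
```

Genes at the top of such a ranking are candidates for closer study, not confirmed biomarkers.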

Find a Cure for Lymphoma
• Literature search on lymphoma
• Set up a project workspace
• Set up a meeting
• Check lab protocols
• Store cell histology images
• Analyze gene expression results
• Study the markers
• Find a lead

Study the Markers
• Statistical analysis
• Protein sequence analysis (SwissProt)
• BLAST search
• Protein secondary structure study
• Search of genes and genetic disorders (OMIM)
• Pathway modeling

Data Analysis with JDeveloper

PKC Distribution Difference

Statistical Analysis: create an external table to read data from lymphoma.txt.

Statistical Analysis: calculate the mean and standard deviation. The t-test shows that PKC expression levels differ significantly between cured patients and patients with a fatal outcome.
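The slide does not say which variant of the t-test was used or how it was computed in the database. As a minimal sketch, assuming Welch's unequal-variance form, the statistic can be built from the group means and standard deviations as follows. The expression values are invented for illustration and are not taken from lymphoma.txt.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard error of each group's mean (sample variance / n).
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical PKC expression levels for the two outcome groups.
cured = [1.0, 2.0, 3.0]
fatal = [4.0, 5.0, 6.0]

t, df = welch_t(cured, fatal)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The resulting t value is then compared against the t distribution with the computed degrees of freedom to decide whether the difference is significant.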

Protein Sequence Analysis: load SwissProt into Oracle XML DB to learn more about the expressed genes of interest.

Load SwissProt into XML DB: transfer the SwissProt data and schema into Oracle XML DB over FTP.

Load SwissProt into XML DB: access the XML schema using XML Spy (an XML editor), which connects to the database via WebDAV.

Load SwissProt into XML DB

Register the XML Schema: once the schema is registered, XML DB automatically generates tables.

Describe the Table Generated