9th Annual "Humies" Awards 2012, Philadelphia, Pennsylvania. Uday Kamath, Amarda Shehu, Kenneth A. De Jong, Department of Computer Science, George Mason University.

Genetic Programming Based Feature Generation for Automated DNA Sequence Analysis
Uday Kamath, Amarda Shehu, Kenneth A. De Jong
Department of Computer Science, George Mason University, Fairfax, VA
{ukamath, amarda,

Bioinformatics and Molecular Biology
Larrañaga P et al. Brief Bioinform 2006;7:86-112

Promoter Site Identification
Copyright 2012 the British Journal of Anaesthesia
Background
- Promoters signal the beginning of a coding region
- They are important signals for the initiation of DNA->RNA transcription
Challenges
- Complex
- Gene-specific
- Many decoys

DNA Splice Site Identification
Asa Ben-Hur, Cheng Soon Ong, Sören Sonnenburg, Bernhard Schölkopf, and Gunnar Rätsch. Tutorial: Support Vector Machines and Kernels for Computational Biology [2008]
Background
- Splice sites mark the boundaries between exons and introns in a gene
Challenges
- No known sequence pattern
  i. Diverse sequence length
  ii. Diverse exon lengths
  iii. Diverse number and lengths of introns
- 0.1 to 1% true splice sites, the rest decoys
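The 0.1 to 1% rate of true splice sites is what makes precision hard to achieve. A minimal sketch of the arithmetic follows, assuming a hypothetical classifier with 90% sensitivity and 99% specificity; these two rates are illustrative assumptions, not results from the talk.

```python
def precision(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Expected precision of a binary classifier at a given class prevalence."""
    true_pos = prevalence * sensitivity                 # true sites correctly flagged
    false_pos = (1.0 - prevalence) * (1.0 - specificity)  # decoys wrongly flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier rates (assumptions for illustration only).
SENSITIVITY = 0.90
SPECIFICITY = 0.99

# The two ends of the 0.1% to 1% range quoted on the slide.
for prevalence in (0.01, 0.001):
    p = precision(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence={prevalence:.3f} -> precision={p:.3f}")
```

Even with a 1% false-positive rate on decoys, most predicted sites are wrong once true sites are rare enough, which is why the later slides report precision rather than accuracy alone.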

Evolutionary (GP) Approach

Finding Functional Features
GP Functional Features
Terminals
- A, C, T, G
- Integers for position/region
Basic Non-Terminals
- Motif (combination of A, C, T, G)
- Position-based motifs
- Correlation-based motifs
- Region-based motifs
- Composition-based motifs
Complex Non-Terminals
- Conjunctions
- Disjunctions
- Negations
Features evolved by combining accuracy and precision
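The building blocks listed on this slide can be sketched as small composable predicates over a DNA string. This is a minimal sketch, not the authors' implementation: every function name below is hypothetical, and the exact semantics of each non-terminal (for example, reading "correlation-based" as two motifs a fixed distance apart, and "composition-based" as a minimum base fraction in a region) are assumptions made for illustration. The GP search itself and its fitness function are not shown.

```python
from typing import Callable, List

# Hypothetical representation: a feature maps a DNA string to True/False.
Feature = Callable[[str], bool]


def motif(m: str) -> Feature:
    """Motif occurs anywhere in the sequence."""
    return lambda s: m in s


def motif_at(m: str, pos: int) -> Feature:
    """Position-based motif: motif starts exactly at index pos."""
    return lambda s: s[pos:pos + len(m)] == m


def motif_in_region(m: str, start: int, end: int) -> Feature:
    """Region-based motif: motif occurs within s[start:end]."""
    return lambda s: m in s[start:end]


def correlated(m1: str, m2: str, distance: int) -> Feature:
    """Correlation-based motif (assumed meaning): m2 starts `distance` after m1 starts."""
    return lambda s: any(
        s.startswith(m1, i) and s.startswith(m2, i + distance)
        for i in range(len(s))
    )


def composition(base: str, start: int, end: int, min_fraction: float) -> Feature:
    """Composition-based feature (assumed meaning): base fraction in a region."""
    def check(s: str) -> bool:
        region = s[start:end]
        return bool(region) and region.count(base) / len(region) >= min_fraction
    return check


# Complex non-terminals: conjunction, disjunction, negation.
def and_(a: Feature, b: Feature) -> Feature:
    return lambda s: a(s) and b(s)


def or_(a: Feature, b: Feature) -> Feature:
    return lambda s: a(s) or b(s)


def not_(a: Feature) -> Feature:
    return lambda s: not a(s)


def to_vector(features: List[Feature], seq: str) -> List[int]:
    """Turn a list of features into a 0/1 vector for a downstream classifier."""
    return [int(f(seq)) for f in features]


# Toy sequence and hand-written features (made up for illustration).
seq = "CCAGGTAAGTTT"
features = [
    and_(motif_in_region("GT", 3, 8), not_(motif_at("AAA", 0))),
    composition("T", 8, 12, 0.5),
    correlated("AG", "AAG", 4),
    or_(motif("TATA"), motif_at("GG", 0)),
]
vector = to_vector(features, seq)
```

In a GP setting, expression trees built from these nodes would be generated and recombined automatically rather than written by hand, and the resulting 0/1 vectors would be passed to a classifier.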

Why Human Competitive?
- B) The result is equal to or better than a result that was accepted as a new scientific result
- E) The result is equal to or better than the most recent human-created solution to a long-standing problem
- F) The result is equal to or better than a result that was considered an achievement when it was first discovered
- G) The result solves a problem of indisputable difficulty in its field

Why Human Competitive?
B) The result is equal to or better than a result that was accepted as a new scientific result
Splice Site Prediction
- Research compares against state-of-the-art enumeration, iterative, probabilistic, and kernel methods, among others
- Best precision, with statistically significant improvements on most datasets
Promoter Prediction
- Research compares results with 7 state-of-the-art algorithms, ranging from enumeration and iterative methods to neural networks and kernel-based methods
- Best precision, with statistically significant improvements on different datasets
F) The result is equal to or better than a result that was considered an achievement when it was first discovered

Why Human Competitive?
F) The result is equal to or better than a result that was considered an achievement when it was first discovered
On the Promoter Identification Problem
What was considered an achievement
Where we stand
Uday Kamath, Kenneth A. De Jong, and Amarda Shehu. "An Evolutionary-based Approach for Feature Generation: Eukaryotic Promoter Recognition." IEEE Congress on Evolutionary Computation (IEEE CEC), New Orleans, LA, pg , 2011

Why Human Competitive?
F) The result is equal to or better than a result that was considered an achievement when it was first discovered
On the Splice Site Identification Problem
What was considered an achievement
Where we stand
Uday Kamath, Jack Compton, Rezarta Islamaj Dogan, Kenneth A. De Jong, and Amarda Shehu. "An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and its Application to DNA Splice-Site Prediction." Trans Comp Biol and Bioinf 2012

Why Human Competitive?
E) The result is equal to or better than the most recent human-created solution to a long-standing problem
Long-Standing Problem(s)
- Genome sequence prediction and annotation of splice sites and promoters
Computational Results Equal or Better
- Around 7 datasets and 10 algorithms compared
Advancing Understanding in Genomics
- Our top features contain signals painstakingly determined by biologists through decades of wet-lab research
- More importantly, new features are found that may help biologists further advance their understanding of DNA architecture
- All our features are available online for experts to analyze and to spur further wet-lab research

Why Human Competitive?
G) The result solves a problem of indisputable difficulty in its field
- Estimated 10-25K human protein-coding genes (only 1.5% of the entire genome)
- Wet-lab models of discovery are costly and prone to errors
  - Cannot keep pace with growing genomic sequences
- Computational models are good complements, but
  - Black-box models: little or no help to biologists
  - White-box models: lower precision/accuracy and reliant on manual steps
- Decades of research into DNA function and architecture
  - "Gene finding" on PubMed returns > 80,000 research articles
- Progress is crucial to speed up our understanding of disease and the development of targeted treatments

Why Is This the Best Entry?
- Addresses problems central to molecular biology and health research
- Finding functional signals in genome sequences is complex and NP-hard
- Improvements over the state of the art are statistically significant
- Extensive statistical analysis validates the usefulness of GP features
  - F-score and information gain techniques
- Advances understanding to motivate further research
  - Features found by GP reproduce results of decades of research by biologists
  - Novel, interesting features also reported
  - Features, data sets, and software publicly available for the community
- Far-reaching implications, spurring research beyond genomics
  - Example: finding which features determine anti-microbial activity, for the purpose of generating novel peptides to combat drug resistance
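Information gain, one of the two feature-ranking techniques named on this slide, can be sketched for 0/1 features and 0/1 class labels as follows. This is a minimal illustration of the standard definition (label entropy minus the expected label entropy after splitting on the feature); the toy data are made up and the function names are hypothetical, not taken from the authors' software.

```python
import math
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = list(labels).count(cls) / n
        h -= p * math.log2(p)
    return h


def information_gain(feature_values: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy obtained by splitting on the feature value."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature_values):
        subset: List[int] = [y for x, y in zip(feature_values, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


# Toy data (made up): 1 = true site, 0 = decoy.
labels = [1, 1, 0, 0]
perfect_feature = [1, 1, 0, 0]   # fires exactly on the true sites
useless_feature = [1, 0, 1, 0]   # fires independently of the label

gain_perfect = information_gain(perfect_feature, labels)
gain_useless = information_gain(useless_feature, labels)
```

A feature that separates the classes perfectly recovers all of the label entropy, while a feature independent of the label recovers none, so ranking evolved features by this score highlights the ones worth inspecting.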