Session I: Database & Data Mining
Speaker: Mehmet M. Dalkilic
Content of Talk & Notes: http://www.informatics.indiana.edu/dalkilic/retreat07
Bioinformatics Retreat 02.03.07 © M.M. Dalkilic
Bioinformatics Retreat @ Bradford Woods © Indiana University 2007
“Systems biology is the science of discovering, modeling, understanding and ultimately engineering at the molecular level the dynamic relationships between the biological molecules that define living organisms”
Leroy Hood, Institute for Systems Biology, http://www.systemsbiology.org/
Saturday, December 29, 2018
Outline
(I) A cursory overview of Database and Data Mining
(II) Examples (a few)
(III) Sundry important research questions
(IV) Summary & Prelude to Discussion
Perspectives
"There's millions and millions of unsolved problems. Biology is so digital, and incredibly complicated, but incredibly useful. …. It is hard for me to say confidently that, after fifty more years of explosive growth of computer science, there will still be a lot of fascinating unsolved problems at peoples' fingertips, that it won't be pretty much working on refinements of well-explored things. Maybe all of the simple stuff and the really great stuff has been discovered. It may not be true, but I can't predict an unending growth. I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level."
Donald Knuth
Perspectives
"Computer science is no more about computers than astronomy is about telescopes."
Edsger Dijkstra
Database
Database
SQL → Algebra → Optimized Algebra → Process → Table
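The pipeline on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relations `GENE` and `EXPR`, their attributes, and the helper names `select`, `project`, and `natural_join` are hypothetical and do not come from the talk. The "optimization" shown is the textbook rewrite of pushing a selection below a join, which is one of many rewrites a real optimizer might apply.

```python
# Toy relations as lists of dicts (hypothetical data, for illustration only).
GENE = [
    {"gene_id": 1, "name": "geneA"},
    {"gene_id": 2, "name": "geneB"},
    {"gene_id": 3, "name": "geneC"},
]
EXPR = [
    {"gene_id": 1, "level": 0.5},
    {"gene_id": 2, "level": 3.1},
    {"gene_id": 3, "level": 2.7},
]

# Step 1, SQL (declarative):
#   SELECT name FROM gene NATURAL JOIN expr WHERE level > 2.0


def select(pred, rel):
    """Relational selection: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]


def project(attrs, rel):
    """Relational projection with duplicate elimination."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left, right):
    """Nested-loop natural join on all shared attribute names."""
    out = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                out.append({**l_row, **r_row})
    return out


def high(row):
    return row["level"] > 2.0


def naive_plan():
    # Step 2, algebra: project(select(join)), a direct translation of the SQL.
    return project(["name"], select(high, natural_join(GENE, EXPR)))


def optimized_plan():
    # Step 3, optimized algebra: the selection is pushed below the join,
    # so the join sees fewer rows.
    return project(["name"], natural_join(GENE, select(high, EXPR)))


if __name__ == "__main__":
    # Steps 4 and 5, process and table: evaluating the plan yields a relation.
    print(optimized_plan())
```

Both plans return the same table; they differ only in how much intermediate data the join has to process.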
Database
SQL is essentially a form of first-order predicate calculus. It differs from the general field of mathematical logic in two ways:
* We don’t focus on the use of functions (they are omitted in SQL)
* We focus on finite models
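To make the correspondence concrete, here is a minimal sketch using Python's standard `sqlite3` module. The tables `gene`, `tissue`, and `expressed` and their contents are hypothetical, chosen only for illustration. The query asks for genes expressed in every tissue, a universally quantified formula that SQL writes as a doubly negated existential; the quantifiers range only over the finite stored tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene(g TEXT);
    CREATE TABLE tissue(t TEXT);
    CREATE TABLE expressed(g TEXT, t TEXT);
    INSERT INTO gene VALUES ('geneA'), ('geneB');
    INSERT INTO tissue VALUES ('brain'), ('liver');
    INSERT INTO expressed VALUES
        ('geneA', 'brain'), ('geneA', 'liver'), ('geneB', 'brain');
""")

# Predicate calculus:
#   { g | Gene(g) AND FORALL t (Tissue(t) -> Expressed(g, t)) }
# which is equivalent to
#   { g | Gene(g) AND NOT EXISTS t (Tissue(t) AND NOT Expressed(g, t)) }
QUERY = """
    SELECT gene.g FROM gene
    WHERE NOT EXISTS (
        SELECT 1 FROM tissue
        WHERE NOT EXISTS (
            SELECT 1 FROM expressed
            WHERE expressed.g = gene.g AND expressed.t = tissue.t
        )
    )
"""

rows = [r[0] for r in conn.execute(QUERY)]
print(rows)
```

Only `geneA` qualifies, because `geneB` has no expression record for `liver`.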
Database
Why can’t I ask any question I’d like in a relational database?
Dirk Van Gucht, DSI
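The slide's own answer did not survive in the text. One standard illustration, offered here as an assumption about what the question points to, is reachability: "which proteins can be reached from p1 through any chain of interactions?" A query of this kind (transitive closure) cannot be expressed in plain relational algebra or first-order SQL, because any fixed query performs only a fixed number of joins. SQL:1999 added recursive common table expressions for this purpose. The sketch below uses Python's `sqlite3` module; the `interacts` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interacts(src TEXT, dst TEXT);
    INSERT INTO interacts VALUES ('p1', 'p2'), ('p2', 'p3'), ('p3', 'p4');
""")

# A fixed number of self-joins only follows paths of a fixed length.
# The recursive CTE keeps joining until no new nodes are found.
REACHABLE = """
    WITH RECURSIVE reach(node) AS (
        SELECT dst FROM interacts WHERE src = ?
        UNION
        SELECT interacts.dst
        FROM interacts JOIN reach ON interacts.src = reach.node
    )
    SELECT node FROM reach ORDER BY node
"""

reachable = [r[0] for r in conn.execute(REACHABLE, ("p1",))]
print(reachable)
```

Using `UNION` rather than `UNION ALL` discards rows already seen, so the recursion also terminates on cyclic interaction data.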
Datamining
Dirk Van Gucht, DSI
Biological processes can be modeled as complex networks of interconnected components. …
Data Integration Problem
How are data meaningfully integrated?
Messy issues of database & datamining
* How are the data related?
* What kind of model?
* What kind of inference?
* Are the data validated?
* Is there sufficient reason to use the network?
Significant Challenges
* The relational database currently ignores domains.
* The relational model is poor at modeling biological data and their uncertain nature; querying has no probabilistic means.
* There has been no advance in querying.
* Other successes in dealing with large repositories should be incorporated.
* Databases have no casual user in mind; they are designed by experts.
* Datamining has focused almost exclusively on relationally modeled data and has ignored actionable results.
* Viewing and search are still in their infancy.
Acknowledgements (no special order)
Justen Andrews, Haixu Tang, Sun Kim, Jim Costello, Rupali Patwardhan, Junguk Hur, Sumit Middha, Brian Ead, Esfandiar Haghverdi, John Colbourne, Scott Beason, Pedja Radivojac
Thanks to the organizers. Email me if you’d like to discuss anything.