John Darrell Van Horn, Ph.D. Associate Professor
Discovery Science (also known as “discovery-based science”) is a scientific methodology that emphasizes the analysis of large volumes of experimental data with the goal of finding new patterns or correlations, leading to hypothesis formation and new scientific results. Discovery-based methodologies are often contrasted with traditional scientific practice, where hypotheses are formed before close examination of experimental data. However, from a philosophical perspective in which all or most of the observable “low-hanging fruit” has already been plucked, examining the phenomenological world more closely opens a new source of knowledge for hypothesis formation. Data mining is the most common tool used in discovery science, and is applied to data from diverse fields of study such as brain imaging, DNA analysis, and proteomics. The use of data mining in discovery science follows a general trend of increasing use of computers and computational theory in all fields of science. Following this trend further, the cutting edge of data mining employs specialized machine learning algorithms for automated hypothesis formation and automated theorem proving.
Aim 1: Courses. Establish short course offerings focused on large-scale biomedical data informatics. These will be offered at the University of Southern California (USC), the University of Michigan (UM), and the University of Chicago (UC).
Aim 2: Fellowships. Establish graduate and postdoctoral fellowships for trainees who elect to work and study in our BDDS Center(s).
Aim 3: Visitors. Create a visiting professorship series providing space, training, and support so that professors from various institutions can come to our BDDS Center institutions for training and experience.
Aim 4: Seminars and Workshops. Create topic-specific and audience-appropriate hands-on workshops. Our goal is to create a suite of tutorials on best practices and Big Data solutions, illustrating hands-on use of BDDS workflows, data management tools, data resources, and expertise to address concrete biomedical problems, and to disseminate this suite to broad multidisciplinary audiences.
Aim 5: Training Materials. Develop interactive training materials for Big Data informatics. These will include online software documentation and tutorials, test datasets, Big Data use cases, educational papers, books, videos, and webcasts, with instructional aids to assist in training and spark interest in large-scale biomedical informatics.
Big Biomedical Data Roundtable (August 4th, 2015): Join leading experts in large-scale biomedicine and computer science for a roundtable discussion of what the future of medical science research looks like from the point of view of “big data”. Featured speakers will include Carl Kesselman (USC), Ian Foster (Chicago), Arthur Toga (USC), and Lee Hood (Institute for Systems Biology). Held at the University Park Campus of the University of Southern California, this intimate discussion will reveal new insights into 21st-century biomedicine and the computational needs for new understanding in the brain, genomics, and proteomics, and their combination for advancing science and curing disease.
Big Data Analysis using the LONI Pipeline: Advanced Neuroimaging, Informatics, and Genomics Computing (September 11, 2015): This day-long event will include paired training and application demonstrations on using different graphical and script-based pipeline workflow architectures to manage, process, analyze, and visualize large volumes of neuroimaging and genetics data. Attendees will learn to use several concrete end-to-end pipeline workflow solutions for brain imaging (sMRI, fMRI, DTI), proteomics, and phenotypic (demographic, genetic, clinical) data in development, aging, and pathology.
Proteomics Informatics Course in Vancouver, B.C., Canada (September 22-25, 2015, prior to the HUPO World Congress in Vancouver, B.C., Canada): Our BDDS partner, the Seattle Proteome Center (SPC), is pleased to offer a four-day intensive course in the use of a suite of open-source software tools designed for the analysis, validation, storage, and interpretation of data obtained from large-scale quantitative proteomics experiments using stable isotope labeling methods, multi-dimensional chromatography, and tandem mass spectrometry. This will include a detailed introduction to the LONI Pipeline, the construction of scientific workflows, and the use of the Proteomics Toolkit.
Through daily lectures and tutorials, each course participant should become proficient in the use of these BDDS-supported tools.
Big Data for Discovery Science (Toga, USC)*
ENIGMA (Thompson, USC)*
Center for Big Data in Translational Genomics (Haussler, Santa Cruz)
Center for Expanded Data Annotation and Retrieval (Musen, Stanford)
Mobility Data Integration to Insight (Delp, Stanford)
Translate Protein Data to Knowledge (Ping, UCLA)
BIOCADDIE (Ohno-Machado, San Diego)
Integrated Active Learning Framework for Biomedical BD2K (Pevzner, UCSD)
The BD2K Concept Network (Lee, UCLA)
Palm Springs, CA - October 9-11, 2015
Key Topics: Among the many topics relevant to BD2K to be discussed are:
– Identifying California-wide thematic linkages on brain research between BD2K research centers
– Organizing the computational needs for multi-site, large-scale brain research in CA
– Reviewing and exploring neuroinformatics concepts, tools, ontologies, challenges, and best practices
– Featuring examples of large-scale neuroscience applications, results, visualization, and clinical outcomes
– Examining the needs for graduate, post-doctoral, etc. training in large-scale biomedical data methods
As the breadth and depth of BDDS tools for Big Data increase, we plan to develop the following:
– “Boot camp”-style multi-day “experiences”
– TED-style talks and further Round Table events
– Short courses: university classes, workshops, and programs for conferences and satellite events