The Future of Scientific Computing at Harvard
Alyssa A. Goodman, Professor of Astronomy; Director, Initiative in Innovative Computing



“The Heavy Red Bag” How can computers advance (my) science?

A new collaborative scientific initiative at Harvard.

Computational challenges are common across scientific disciplines
How to:
- Acquire, transmit, organize, and query new kinds of data?
- Apply distributed computing resources to solve complex problems?
- Derive meaningful insight from large datasets?
- Share, integrate and analyze knowledge across geographically dispersed researchers?
- Visually represent scientific results so as to maximize understanding?
Opportunity to collaborate and apply insights from one field to another

Filling the “Gap” between Science and Computer Science
Scientific disciplines:
- Increasingly, core problems in science require computational solution
- Typically hire/“home grow” computationalists, but often lack the expertise or funding to go beyond the immediate pressing need
Computer Science departments:
- Focused on finding elegant solutions to basic computer science challenges
- Often see specific, “applied” problems as outside their interests

“Workflow” & “Continuum”

Workflow Examples (Astronomy | Public Health)
“Collect”: Telescope | Microscope, Stethoscope, Survey
COLLECT: “National Virtual Observatory”/COMPLETE | CDC Wonder
“Analyze”: Study the density structure of a star-forming glob of gas | Find a link between one factory’s chlorine runoff & disease
ANALYZE: Study the density structure of all star-forming gas in… | Study the toxic effects of chlorine runoff in the U.S.
“Collaborate”: Work with your student
COLLABORATE: Work with 20 people in 5 countries, in real-time
“Respond”: Write a paper for a Journal.
RESPOND: Write a paper, the quantitative results of which are shared globally, digitally.

IIC contact: AG, FAS Workflow

Workflow a.k.a. The Scientific Method (in the Age of High-Speed Networks, Fast Processors, Mass Storage, and Miniature Devices)
IIC contact: Matt Welsh, FAS

Workflow: The Harvard Virtual Brain
Establishing a Harvard-wide Neuroscience Infrastructure
- Faculty of Arts and Sciences: Harvard College, Division of Engineering
- Harvard School of Public Health
- Faculty of Medicine: Harvard Medical School, Affiliated Teaching Hospitals
- Data Acquisition: MRI, PET, Microscopy, etc.
- Distributed Data Storage
- Data Processing: Analysis, Visualization, Integration, etc.
- Information Access: Query, Statistical Analysis, Knowledge Management, etc.
Harvard IIC
IIC contact: David Kennedy, HMS/MGH

New technologies for measurement and simulation are transforming the “workflow.”
Biomedicine, pre-genomics: Manual/low throughput; Solitary; Limited by two hands; Analog
Biomedicine, genomics era: High throughput; Automated/networked; Highly scalable; Digital

Continuum “Pure” Discipline Science (e.g. Galileo) “Pure” Computer Science (e.g. Turing) “Computational Science” Missing at Most Universities

Workflow & Continuum For any particular scientific investigation: Where does, and could, “computational science” make improvements in this cycle?

Harvard Public Health “NOW” (Oct. 2004)
"In the past, experiments did not involve such large data sets," observed Dyann Wirth, professor of infectious diseases in the Department of Immunology and Infectious Diseases and member of the advisory group for the core. "There has been a dramatic change in the past five to 10 years in the amount and availability of genomic data [or the DNA sequences themselves] and functional genomic data, [or the sequences’ purpose]." In the past five years alone, the genomes of humans, rats, and the malaria parasite Plasmodium falciparum have been published, for example.
Dyann Wirth
"One of the purposes of bioinformatics is to reduce the number of experiments that need to be done to achieve reliable information," said L.J. Wei, professor of biostatistics in the Department of Biostatistics and member of the advisory group for the core. "However, an issue right now is that there are huge data sets that can be run through different kinds of software programs, ending up with many data points. Unless we understand and use bioinformatics well, we may not even know which of those data points are important."
L.J. Wei

Filling the “computational science” gap: IIC
- Problem-driven approach …focusing effort on solving problems that will have greatest impact & educational value
- Collaborative projects …combining disciplinary knowledge with computer science expertise
- Interdisciplinary effort …to ensure that best practices are shared across fields and that new tools and methodologies will be broadly applicable
- Links with industry …to draw on and learn from experience in applied computation
- Institutional funding …to ensure effort is directed towards key needs and not driven solely by narrow priorities of funding agencies

IIC at Harvard

Numerical Simulation of Star Formation
Bate, Bonnell & Bromm 2002 (UKAFF)
- MHD turbulence gives “t=0” conditions; Jeans mass = 1 M_sun
- 50 M_sun, 0.38 pc, n_avg = 3 x 10^5 ptcls/cc
- forms ~50 objects
- T = 10 K
- SPH, no B or
- movie = 1.4 free-fall times

Simulations & Public Health

Goal: Statistical Comparison of “Real” and “Synthesized” Star Formation Figure based on work of Padoan, Nordlund, Juvela, et al. Excerpt from realization used in Padoan & Goodman 2002.

Measuring Motions: Molecular Line Maps

Alves, Lada & Lada 1999 Radio Spectral-Line Survey Radio Spectral-line Observations of Interstellar Clouds

Velocity from Spectroscopy
Telescope, Spectrometer, Observed Spectrum (axes: Intensity vs. "Velocity")
All thanks to Doppler
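The "velocity" axis of a spectral-line map comes from the Doppler shift of the observed line frequency relative to its rest frequency. The sketch below is an illustration rather than anything taken from the talk: it uses the standard radio-convention Doppler formula, v = c (nu_rest - nu_obs) / nu_rest, and the function name, the channel frequencies, and the approximate 13CO J=1-0 rest frequency of 110.201 GHz are assumptions chosen for the example.

```python
# Minimal sketch: convert observed line frequencies to radio-convention
# Doppler velocities. All names and sample values are illustrative assumptions.

C_KM_S = 299_792.458  # speed of light in km/s


def radio_velocity_km_s(nu_obs_ghz: float, nu_rest_ghz: float) -> float:
    """Radio-convention Doppler velocity in km/s (positive = receding)."""
    return C_KM_S * (nu_rest_ghz - nu_obs_ghz) / nu_rest_ghz


# Approximate rest frequency of the 13CO J=1-0 line, in GHz (assumed value).
NU_REST_GHZ = 110.201

# Hypothetical spectrometer channel frequencies, in GHz.
channels_ghz = [110.203, 110.201, 110.199]

velocities = [radio_velocity_km_s(nu, NU_REST_GHZ) for nu in channels_ghz]

for nu, v in zip(channels_ghz, velocities):
    print(f"{nu:.3f} GHz -> {v:+.2f} km/s")
```

A channel observed below the rest frequency maps to a positive (receding) velocity, which is how each spectrometer channel becomes one slice of a position-position-velocity map.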


Barnard’s Perseus
COMPLETE/FCRAO W(13CO)

IRAS N_dust; H-α emission, WHAM/SHASSA Surveys (see Finkbeiner 2003); HH; 2MASS/NICER Extinction

“Astronomical Medicine” Excerpts from Junior Thesis of Michelle Borkin (Harvard College); IIC Contacts: AG (FAS) & Michael Halle (HMS/BWH/SPL)

IC 348

“Astronomical Medicine”

After “Medical Treatment” Before “Medical Treatment”

3D Slicer Demo (available after talk) IIC contacts: Michael Halle & Ron Kikinis

IIC: Five Research Branches
- Visualization: Physically meaningful combination of diverse data types.
- Distributed Computing: e-Science aspects of large collaborations. Sharing of data and computational resources and tools in real-time.
- Databases/Provenance: Management, and rapid retrieval, of data. “Research reproducibility” …where did the data come from? How?
- Analysis & Simulations: Development of efficient algorithms. Cross-disciplinary comparative tools (e.g. statistical).
- Instrumentation: Improved data acquisition. Novel hardware approaches (e.g. GPUs, sensors).

IIC: Innovative Organizational Model
- Culture: No “class” distinctions made between teaching and non-teaching faculty, scientists and engineers, artists and designers working in the visualization program
- Staffing: Highly accomplished academics and senior experts whose careers have been primarily in industry, working together
- Promotion/career path: Criteria for promotion will give equal weight to scholarly activities, and to technological invention

How IIC will Function: Overview
IIC Objectives:
- Enable new research for specific scientific discipline
- Generate new computational tools for broader application
Project selection: Identify and fund projects that are likeliest to have the greatest and broadest impact
Project execution: Pursue projects in a way that will yield best outcome, enable shared learning, etc.
Dissemination of knowledge

Project Selection
Project proposals
- Role: Submit proposal in response to call for ideas
- Who participates: Any Harvard researcher (e.g., in genomics, fluid dynamics, epidemiology, neuroscience, nanoscience, comp bio, chemical biology, optics, geology, astronomy, quantum mechanics, etc.)
Program Advisory Committee
- Role: Evaluate/rank proposals for scientific merit: should this be a priority for IIC?
- Who participates: Harvard researchers representing broad interests of IIC stakeholders, plus IIC Director & Dir. of Research
IIC Management Team
- Role: Evaluate/prioritize proposals according to technical feasibility, assess resource needs
- Who participates: Consists of IIC Director, Dirs. of Res. & Adm/Ops, Heads of IIC branches

Project Execution
IIC Project Team A, IIC Project Team B, IIC Project Team C, etc. Each team has:
- Project Manager: Responsible for project execution and metrics for tracking progress/performance; interfaces with IIC branch heads
- Discipline scientists: Scientists who “own” the problem and are committed to working with IIC staff to tackle it
- IIC staff: IIC staff scientists assigned to work on project by relevant IIC branch heads. The same IIC staff member may serve on multiple IIC project teams

Dissemination of Knowledge
Seminars/colloquia; Publications; Knowledge management system; Communities of practice; Scientific journals; IIC white papers
Internal... External…
New tools; IIC process

Education is central to IIC’s mission
At Harvard:
- Undergraduate & graduate courses focused on “data-intensive science”
- New graduate certificate program, within existing Ph.D. programs
- Research opportunities at undergraduate, graduate, and postdoctoral levels
Beyond Harvard:
- New museum, highlighting the kind of science done at the IIC

IIC organization: research and education
Org chart: Provost; Assoc Provost; Dean, Physical Sciences; IIC Director; Dir of Research; Dir of Admin & Operations; Dir of Education & Outreach
Assoc Dirs: Instrumentation; Visualization; Analysis & Simulation; Databases/Data Provenance; Distributed Computing
Projects: Project 1 (Proj Mgr 1); Project 2 (Proj Mgr 2); Project 3 (Proj Mgr 3); Etc.
CIO (systems); Knowledge mgmt; Education & Outreach staff

IIC organization: admin and operations
Org chart: Provost; Assoc Provost; Dean, Physical Sciences; IIC Director; Dir of Research; Dir of Admin & Operations; Dir of Education & Outreach
Admin; Finance; Development; Facilities; HR
Note: admin roles expected to be played by 1-2 staff members at outset; staff will grow with overall IIC growth

IIC: Examples
- Visualization: Physically meaningful combination of diverse data types.
- Distributed Computing: e-Science aspects of large collaborations. Sharing of data and computational resources and tools in real-time.
- Databases/Provenance: Management, and rapid retrieval, of data. “Research reproducibility” …where did the data come from? How?
- Analysis & Simulations: Development of efficient algorithms. Cross-disciplinary comparative tools (e.g. statistical).
- Instrumentation: Improved data acquisition. Novel hardware approaches (e.g. GPUs, sensors).

Visualization: 3D Slicer (BWH Surgical Planning Lab) IIC contacts: Michael Halle & Ron Kikinis

IIC contact: Felice Frankel (MIT) Work: Garstecki/Whitesides (FAS) “Image and Meaning” (Visualization)

Distributed Computing: Semantics, Ontologies IIC Contact: Tim Clark (HMS/MGH)

Distributed Computing & Large Databases: Large Synoptic Survey Telescope
- Optimized for time domain: scan mode, deep mode
- 7 square degree field
- 6.5m effective aperture
- 24th mag in 20 sec
- > 5 Tbyte/night
- Real-time analysis
- Simultaneous multiple science goals
IIC contact: Christopher Stubbs (FAS)
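To give a feel for what "> 5 Tbyte/night" implies for storage, here is a back-of-the-envelope sketch. Only the 5 TB/night figure comes from the slide; the number of observing nights per year and the 10-year span are assumptions made for illustration (365 nights is an upper bound that ignores weather and downtime), and the function name is hypothetical.

```python
# Rough sketch: cumulative raw data volume for a survey telescope.
# Only TB_PER_NIGHT is taken from the slide; the rest are assumptions.

TB_PER_NIGHT = 5        # lower bound quoted on the slide (> 5 Tbyte/night)
NIGHTS_PER_YEAR = 365   # assumption: every night is observed
SURVEY_YEARS = 10       # assumption: ten-year survey


def total_volume_tb(tb_per_night: float, nights_per_year: int, years: int) -> float:
    """Total accumulated data volume in TB, assuming a constant nightly rate."""
    return tb_per_night * nights_per_year * years


total_tb = total_volume_tb(TB_PER_NIGHT, NIGHTS_PER_YEAR, SURVEY_YEARS)
print(f"{total_tb:,.0f} TB over {SURVEY_YEARS} years")
```

Under these assumptions the total is on the order of 10^4 TB, which is why the slide pairs the data rate with distributed computing and large databases.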

Relative optical survey power based on AΩ = 270 LSST design

Astronomy: LSST, SDSS, 2MASS, MACHO, DLS
High Energy Physics: BaBar, Atlas, RHIC
First year of operation
Run-time data rate to storage (MB/sec): 5000 Peak 500 Avg (zero-suppressed) 6* 540* 120* (’03) 250* (’04)
Daily average data rate (TB/day): (’03) 10 (’04)
Annual data store (TB): (’03) 500 (’04)
Total data store capacity (TB): 20,000 (10 yrs) ,000100,000 (10 yrs) 10,000 (10 yrs)
Peak computational load (GFLOPS): 140, ,000100,0003,000
Average computational load (GFLOPS): 140, ,000100,0003,000
Data release delay acceptable: 1 day moving 3 months static; 2 months; 6 months; 1 year; 6 hrs (trans) 1 yr (static); 1 day (max) <1 hr (typ); Few days; 100 days
Real-time alert of event: 30 sec; none; <1 hour; 1 hr; none
Type/number of processors: TBD 1GHz Xeon MHz Sparc MHz Sparc MHz Pentium 5 Mixed/ GHz/ 10,000 Pentium/ 2500

Analysis & Simulations Figure based on work of Padoan, Nordlund, Juvela, et al. Excerpt from realization used in Padoan & Goodman 2002.

Analysis & Simulations: Neural Net Models of Intelligence Does Speed of Convergence in Neural Nets Predict Scores on Measures of “General Intelligence”? Select from the lower 8 the one that completes the pattern in the top 9 IIC contact: Stephen Kosslyn (Psychology)

(Easier) Analysis of Large Data Sets: Mendelian Disease Genes
OMIM on the genome: Location of every known disease gene on the human genome (figure axes: Position (MB), Chromosome)
Hello world
189 Large data files reformat, merge, and filter
Can a biologist get from here to there? Without programming?
IIC contact: Eitan Rubin (FAS/CGR)

Instrumentation IIC contact: Matt Welsh, FAS

IIC: Mission The Institute for Innovative Computing (IIC) will make Harvard a world leader in the innovative and creative use of computational resources to address forefront scientific problems. We will focus on developing capabilities that are applicable to multiple disciplines, by undertaking specific, well-defined projects, thereby developing tools and approaches that can be generalized and shared. We will foster the flow of ideas and inventions along the continuum from basic science to scientific computation to computational science to computer science. We will train a next generation of creative and computationally capable scientists, build linkages to industry, and communicate with the public at large.

Why Here?
Diverse group of senior faculty and accomplished scientists…
…spanning a wide range of relevant disciplines, e.g.,
- Computer science
- Physics, Chemistry, Astronomy, Statistics, Biology, Medicine, etc.
- Psychology, Graphic Design
…with backgrounds in both academia and industry…
…deeply committed to the vision of a collaborative approach to solving the most compelling computing challenges facing scientists today

Who are IIC’s “competitors”?
- Caltech Center for Advanced Scientific Computing Research
- Computation Institute at the University of Chicago
- Cornell Theory Center
- MIT Media Lab
- Scientific Computing and Imaging Institute (University of Utah)
- UK National eScience Center of the Universities of Glasgow and Edinburgh
IIC is unique in its collaborative, comprehensive, interdisciplinary approach

IIC will evolve over three phases
Phases: Phase I, Phase II, Phase III
Timing
IIC staffing level (combo of new faculty, senior scientists, admin staff): Total ~25 to ~100
Number of projects: ~3 to ~15
Educational mission (New courses offered, Outreach programs): New courses to museum
Other key milestones: Evaluation schedule (internal, external committees)

Challenges In “Phase I” (Startup)
- Result of “Allston” Science & Technology Task Force
- IIC intended to be a “University” (not a single school) initiative
- FAS Constraints
- Faculty Appointments
- Non-Faculty Appointments
- Startup Space
- “Chicken-and-Egg” Problem with Recruiting
- Good, but not certain, Funding Prospects
- Role of DEAS Computer Science