Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular Systems Initiative Pacific Northwest National Laboratory (www.sysbio.org)
2 Information Intensive Science
Goals of IIS
- Understanding systems versus individual phenomena
- Strengthening/automating links between different types of data from different scales
Examples
- Biology: Cell Signaling
- Biology: BIRN
- Chemistry: CMCS
- Homeland Defense
Complexity of systems is becoming pervasive
Challenges
- Efficient federation, graph-based queries
- Continuous data correlation
- Managing complex experiments, data provenance using multiple independent data and analysis resources
Priorities
- High-performance federation, data mining, semantic query capabilities (software, hardware architecture)
- Knowledge environments (lightweight, evolvable, powerful, …)
- Organization and visualization of large-scale, complex information
3 Combustion is a Multi-scale Chemical Science Challenge
A systems-science approach to address complex problems
- New knowledge is assimilated from different data, tools, and disciplines at each scale
- Real-time bi-directional information flow
- Deep analysis across scales
- Multiple applications for the same information
Challenges
- Data, provenance, annotation publication
- Syntactic and semantic federation
- Standardization versus innovation
Examples
- IUPAC – update of radical thermochemistry reference values by global expert group
- PrIMe – community-developed optimized reaction mechanisms guiding experimental plans across scales, providing community resources for applied research
4 Homeland Security: Pulling insight out of information overload
- Volume of data, orders of magnitude larger and at different levels of abstraction
- Complexity of information spaces into very high dimensions, 200 the norm
- Information often out of context, incomplete, fuzzy
- Deception
- Information in all media types: text, imagery, video, voice, web, sensor data
- Time and temporal dynamics fundamentally change the approach
- Spatial, yet non-spatial abstract data
- Multiple ontologies, languages, cultures
- Privacy issues
Immigration, Financial, Sensors, Shipping, Communications
Is there a domestic terrorist plot? Can we detect and prevent a terrorist attack BEFORE it happens?
For homeland security and science we now turn to data-intensive visual analytics
6 Systems Biology of Cells
Molecular parameters: protein levels / states / locations / interactions / activities
Cell function: death, proliferation, differentiation, migration, ...
Ultimate aim: understanding and prediction of effects of component properties
9 What, Where, Quantity, Quality?
To successfully model a complex biological system, one must minimally know the following information:
- What parts are being made? (identity)
- How is the regulatory network structured? (interactions)
- Where are the proteins located in the cell? (location)
- What are their levels? (quantity)
- How do they interact with their partners? (activity)
  - As a function of covalent modification
  - Contribution of steric restrictions
  - Forward and reverse rate constants
10 Cells as Input-Output Systems
- Biologists look at their experiments as input-output systems
- We start with a “defined” system to which we apply a stimulus (input: independent variable)
- We then look for a specific response (output: dependent variable)
- The relationship between the input and output provides insight into the workings of the system
[Diagram: Input -> System -> Output, with unknown context acting on the system]
So unless we control the experimental context, we cannot interpret our experiments
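The point about unknown context can be made concrete with a toy calculation. The sketch below is purely illustrative and not from the presentation: the saturating response function, the `receptor_level` parameter standing in for "context", and all numeric values are assumptions chosen for simplicity.

```python
def response(stimulus: float, receptor_level: float, k_half: float = 1.0) -> float:
    """Hypothetical saturating input-output curve.

    receptor_level plays the role of uncontrolled experimental context;
    k_half is the stimulus giving a half-maximal response. Both are
    illustrative assumptions, not values from the slides.
    """
    return receptor_level * stimulus / (k_half + stimulus)


# The same input applied to two "identical" systems that differ only in context.
same_input = 1.0
out_a = response(same_input, receptor_level=1.0)
out_b = response(same_input, receptor_level=4.0)

if __name__ == "__main__":
    print(f"context A: {out_a}, context B: {out_b}")
```

With the input held fixed, the two outputs differ only because the hidden context differs, which is why an input-output measurement cannot be interpreted unless the context is controlled or recorded.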
11 The Two Greatest Challenges of Systems Biology
1. Working with indeterminate systems
2. Understanding context - what it is and how to control and capture it
12 Defining the composition of living systems is driving analytical technologies
- Genomics
- Proteomics
- Metabonomics
- Expression profiling
- Imaging
- Etc.
All of these technologies seek to rigorously define the composition of living systems
13 Global simultaneous quantitative proteome measurements
Proteins identified and quantified using accurate mass and time (AMT) tags
[Figure: 2-D display of detected peptides. Dimension one - separation time: LC elution time (min), axis from 0 to 126. Dimension two - accurate mass: m/z, axis from 750 to 1500.]
14 High Throughput Proteomics
9.4 Tesla High Throughput Mass Spectrometer
- 1 experiment per hour
- 5000 spectra per experiment
- 4 MByte per spectrum
Per instrument:
- 20 Gbytes per hour
- 480 Gbytes per day
These are based on today's technologies.
- Time to analyze offsite: 1 week
- Time to analyze onsite: 48 hours
- Time to analyze onsite with smart storage: 2 hours
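The per-instrument rates follow directly from the three figures quoted on the slide. A minimal check, assuming decimal units (1 GB = 1000 MB); the constant and function names are illustrative only:

```python
# Figures quoted on the slide.
EXPERIMENTS_PER_HOUR = 1
SPECTRA_PER_EXPERIMENT = 5000
MB_PER_SPECTRUM = 4

MB_PER_GB = 1000  # assumption: decimal units


def gbytes_per_hour() -> float:
    """Data produced by one instrument in one hour, in GB."""
    return EXPERIMENTS_PER_HOUR * SPECTRA_PER_EXPERIMENT * MB_PER_SPECTRUM / MB_PER_GB


def gbytes_per_day() -> float:
    """Data produced by one instrument running continuously for 24 hours, in GB."""
    return gbytes_per_hour() * 24


if __name__ == "__main__":
    print(gbytes_per_hour(), "GB/hour")
    print(gbytes_per_day(), "GB/day")
```

The daily figure assumes the instrument runs around the clock, which is what the slide's 480 Gbytes per day implies.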
15 Integrated, High-throughput Experiments will Generate Enormous Amounts of Data
17 Trey Ideker The Molecular Interaction Scaffold is Huge
18 Cell Imaging New multispectral, multidimensional imaging techniques can generate enormous amounts of data
19 Cell Imaging Workflow Complex set of metadata collected here
20 How Much Data From Imaging?
- Currently, a high-quality image of a single cell field is 4 MB, acquired at 4 fps (16 MB/s)
- Following a cell through one cell cycle (24 h) generates approximately 1.4 TB
- New hyperspectral microscopes analyzing only 10 wavelengths would generate 7 TB/day
- Characterizing the dynamics of the most abundant set of genes (4000) would require 5.5 PB
This is for a single instrument and a single experiment using today’s technology
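The single-wavelength figures can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units and assumes that the 5.5 PB estimate corresponds to one 24-hour time course per gene; that reading is an inference from the numbers, not something the slide states. The hyperspectral 7 TB/day figure is not recomputed, because the slide does not give the acquisition settings behind it.

```python
MB_PER_IMAGE = 4          # from the slide
FRAMES_PER_SECOND = 4     # from the slide
SECONDS_PER_DAY = 24 * 60 * 60
GENES = 4000              # from the slide


def mb_per_second() -> int:
    """Sustained acquisition rate for one cell field."""
    return MB_PER_IMAGE * FRAMES_PER_SECOND


def tb_per_cell_cycle() -> float:
    """Data from following one field for a 24-hour cell cycle, in TB (1 TB = 1e6 MB)."""
    return mb_per_second() * SECONDS_PER_DAY / 1e6


def pb_for_gene_set() -> float:
    """Assumed: one 24-hour time course per gene, in PB (1 PB = 1000 TB)."""
    return GENES * tb_per_cell_cycle() / 1000


if __name__ == "__main__":
    print(mb_per_second(), "MB/s")
    print(round(tb_per_cell_cycle(), 2), "TB per cell cycle")
    print(round(pb_for_gene_set(), 2), "PB for the gene set")
```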
21 Understanding the influence of cell context is driving experimental and computational biology
- Cell signaling
- Developmental biology
- Cancer and growth control
- Host-pathogen interactions
- Dynamics of microbial communities
- Cellular responses to stress
22 Computational Modeling Approaches -- Diverse Spectrum
[Diagram: a spectrum running from SPECIFIED to ABSTRACTED. Approaches shown: differential equations, Markov chains, Boolean models, Bayesian networks, statistical mining. What each captures: mechanisms*, influences, relationships (* including structure).]
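To give a feel for how differently two of the listed approaches describe the same biology, here is a minimal sketch of a toy ligand-receptor system written once as a differential equation and once as a Boolean model. The pathway, the rate constants `k_on` and `k_off`, the forward-Euler integrator, and the node names are all illustrative assumptions, not taken from the presentation.

```python
def ode_active_fraction(ligand: float, k_on: float = 1.0, k_off: float = 1.0,
                        dt: float = 0.01, steps: int = 2000) -> float:
    """Differential-equation view: integrate
        dr/dt = k_on * ligand * (1 - r) - k_off * r
    with forward Euler, where r is the fraction of active receptor.
    Requires explicit forward and reverse rate constants."""
    r = 0.0
    for _ in range(steps):
        r += dt * (k_on * ligand * (1.0 - r) - k_off * r)
    return r


def boolean_step(state: dict) -> dict:
    """Boolean view: synchronous update of a toy chain
    ligand -> receptor -> kinase. Only on/off influences are kept;
    no rate constants or concentrations are needed."""
    return {
        "ligand": state["ligand"],
        "receptor": state["ligand"],
        "kinase": state["receptor"],
    }


if __name__ == "__main__":
    print("ODE steady-state active fraction:", ode_active_fraction(1.0))
    state = {"ligand": True, "receptor": False, "kinase": False}
    for _ in range(2):
        state = boolean_step(state)
    print("Boolean state after two updates:", state)
```

The differential-equation version yields a quantitative answer but needs measured rate constants; the Boolean version needs only the wiring, at the cost of discarding all quantitative detail.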
23 Computer Models Allow Reconstruction of Processes Across Different Scales
[Diagram: MODEL DATABASE schema. Hierarchy labels: Organ 1 ... Organ N, Tissue ... Tissue N, Cell, Species 1 ... Species N, Model 1, Data Set N. Model record fields: Unique ID, Model Name, Model Descr., Default Par., Default Comp., Timestamp, Security. Linked tables: Solution Par., Chemical Par., Geometric Par., Compute Par., Initial Conditions, Equation Docs., Parameter Docs. Field labels in those tables: Input_par ID, Input_fld ID, React. Rates, Concen. Val., Value_par, Symbolic, Source, References, Limits.]
27 Obstacles preventing scientists from utilizing available data
- Data is distributed across many repositories with various ontologies and data formats
- Analysis tools do not address integration of heterogeneous data sets
- Few informatics-based analysis tools support a systems biology approach
- Collaboration capabilities are too primitive to support shared knowledge among researchers
28 The Challenge for Data Handling is Two-fold
1. Managing the massive amounts of compositional data necessary to define all of the relevant experimental systems
2. Capturing all of the data on the relationships between context, composition and response
Integration of the analytical and experimental methodologies into a single system is necessary to link all of the data in a useful way
29 END
30 Understanding Living Cells
- Cell responses are multiphasic
- Different classes of stimulants (information) are processed at characteristic time scales
- Processing nodes within cells are spatially segregated
- Each cell responds independently depending on its specific context
- A response generally induces a reprogramming of the cell machinery
To create cell simulations, we must “abstract” this information to create a reference model which can then be modified