Computational Tools for Population Biology
Tanya Berger-Wolf, Computer Science, UIC; Daniel Rubenstein, Ecology and Evolutionary Biology, Princeton; Jared Saia, Computer Science, U New Mexico

Presentation transcript:

Computational Tools for Population Biology
Tanya Berger-Wolf, Computer Science, UIC; Daniel Rubenstein, Ecology and Evolutionary Biology, Princeton; Jared Saia, Computer Science, U New Mexico
Supported by NSF

Problem Statement and Motivation
Of the three existing species of zebra, one, the Grevy's zebra, is endangered while another, the plains zebra, is extremely abundant. The two species are similar in almost all but one key characteristic: their social organization. Finding patterns of social interaction within a population has applications ranging from epidemiology and marketing to conservation biology and behavioral ecology. One of the intrinsic characteristics of societies is their continual change, yet there are few analysis methods that are explicitly dynamic. Our goal is to develop a novel conceptual and computational framework that accurately describes the social context of an individual at time scales matching changes in individual and group activity.

Technical Approach
Collect explicitly dynamic social data: sensor collars on animals, disease logs, synthetic population simulations, and cellphone and communication data.
Represent a time series of observation snapshots as a layered graph. Questions about the persistence and strength of social connections, and about the criticality of individuals and times, can be answered using standard and novel graph connectivity algorithms.
Validate theoretical predictions derived from the abstract graph representation through simulations on collected data and controlled experiments on real populations.

Key Achievements and Future Goals
A formal computational framework for the analysis of dynamic social interactions.
Valid and tested computational criteria for identifying: individuals critical for spreading processes in a population; times of social and behavioral transition; implicit communities of individuals.
Preliminary results on Grevy's zebra and wild donkey data show that accounting for the dynamics of the population produces more accurate conclusions.
Future goal: extend and test our framework and computational tools on other problems and other data.

[Figures: a zebra with a sensor collar; a snapshot of the zebra population and the corresponding abstract representation]
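
To make the layered-graph idea concrete, here is a minimal sketch (not the project's actual framework) that treats each observation snapshot as a small graph over the same individuals and applies two elementary connectivity computations: how persistent each tie is across layers, and how often an individual is an articulation point of a snapshot. The snapshot data, animal IDs, and the use of the networkx library are illustrative assumptions.

```python
# Minimal sketch, assuming invented snapshot data and the networkx library:
# each observation period is a small graph over the same individuals, and two
# simple computations across the layers flag persistent ties and individuals
# whose removal would disconnect part of the group.
from collections import Counter
import networkx as nx

snapshots = [                      # one edge list per observation snapshot
    [("z1", "z2"), ("z2", "z3"), ("z4", "z5")],
    [("z1", "z2"), ("z2", "z3"), ("z3", "z4"), ("z4", "z5")],
    [("z1", "z2"), ("z3", "z4"), ("z4", "z5")],
]

# Persistence: fraction of snapshots in which a tie is observed.
edge_counts = Counter(tuple(sorted(e)) for snap in snapshots for e in snap)
persistence = {e: n / len(snapshots) for e, n in edge_counts.items()}

# Criticality: how often an individual is an articulation point of its
# snapshot graph, i.e. removing it would disconnect part of the group.
critical = Counter()
for snap in snapshots:
    critical.update(nx.articulation_points(nx.Graph(snap)))

print("most persistent ties:", sorted(persistence.items(), key=lambda kv: -kv[1])[:3])
print("most frequently critical individuals:", critical.most_common(2))
```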

Collaborative Research: Information Integration for Locating and Querying Geospatial Data
Lead PI: Isabel F. Cruz (Computer Science), in collaboration with Nancy Wiegand (U. Wisconsin-Madison)
Prime Grant Support: NSF

Problem Statement and Motivation
Geospatial data are complex and highly heterogeneous, having been developed independently by various levels of government and the private sector.
Portals created by the geospatial community disseminate data but lack the capability to support complex queries on heterogeneous data.
Complex queries on heterogeneous data will support information discovery, decision making, and emergency response.

Technical Approach
Data integration using ontologies: ontology representation; algorithms for the alignment and merging of ontologies; semantic operators and indexing for geospatial queries.
User interfaces for ontology alignment and for the display of geospatial data.

Key Achievements and Future Goals
Create a geospatial cyberinfrastructure for the web to automatically locate data and match it semantically to other relevant data sources using automatic methods.
Provide an environment for exploring and querying heterogeneous data for emergency managers and government officials.
Develop a robust and scalable framework that encompasses techniques and algorithms for integrating heterogeneous data sources using an ontology-based approach.
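
As a rough illustration of one ingredient of ontology-based integration, the sketch below aligns concept labels from two invented geospatial vocabularies by simple string similarity. The concept names, the threshold, and the greedy matching are assumptions for illustration only; the project's alignment and merging algorithms rely on much richer structural and semantic information than label matching.

```python
# Minimal sketch, assuming two invented geospatial vocabularies: align concept
# labels by string similarity and keep greedy one-to-one matches above a
# threshold. Real ontology alignment uses far richer structural and semantic cues.
from difflib import SequenceMatcher

state_onto  = ["Wetland", "Flood Zone", "Land Parcel", "Road Segment"]
county_onto = ["wetlands", "floodplain", "parcel", "street segment", "zoning district"]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def align(source, target, threshold=0.6):
    """Greedily pair each source concept with at most one target concept."""
    pairs = sorted(((similarity(s, t), s, t) for s in source for t in target),
                   reverse=True)
    used_s, used_t, mapping = set(), set(), []
    for score, s, t in pairs:
        if score >= threshold and s not in used_s and t not in used_t:
            mapping.append((s, t, round(score, 2)))
            used_s.add(s)
            used_t.add(t)
    return mapping

for s, t, score in align(state_onto, county_onto):
    print(f"{s:12s} <-> {t:15s} similarity={score}")
```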

Learning from Positive and Unlabeled Examples
Investigator: Bing Liu, Computer Science
Prime Grant Support: National Science Foundation

Problem Statement and Motivation
Given a set of positive examples P and a set of unlabeled examples U, we want to build a classifier. The key feature of this problem is that we do not have labeled negative examples, which makes traditional classification learning algorithms not directly applicable. The main motivation for studying this learning model is to solve the many practical problems where it is needed: labeling negative examples can be very time consuming.

Technical Approach
We have proposed three approaches.
Two-step approach: the first step finds some reliable negative data from U; the second step uses an iterative algorithm based on naïve Bayesian classification and support vector machines (SVM) to build the final classifier.
Biased SVM: this method models the problem with a biased SVM formulation and solves it directly. A new evaluation method is also given, which allows us to tune the biased SVM parameters.
Weighted logistic regression: the problem can be regarded as a one-sided error problem, and thus a weighted logistic regression method is proposed.

Key Achievements and Future Goals
In (Liu et al. ICML-2002), it was shown theoretically that P and U provide sufficient information for learning, and that the problem can be posed as a constrained optimization problem.
Some of our algorithms are reported in (Liu et al. ICML-2002; Liu et al. ICDM-2003; Lee and Liu ICML-2003; Li and Liu IJCAI-2003).
Our future work will focus on two aspects: dealing with the case where P is very small, and applying the methods to the bioinformatics domain, where many problems require this type of learning.

[Diagram: positive training data and unlabeled data feed a learning algorithm that produces a classifier]
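
The biased-SVM idea can be sketched with off-the-shelf tools: treat every unlabeled example as a provisional negative, but penalize errors on the labeled positives much more heavily. The synthetic data and the specific class weights below are illustrative assumptions and do not reproduce the parameter-tuning procedure described above.

```python
# Minimal sketch of the biased-SVM idea, assuming synthetic data and
# illustrative class weights: all unlabeled examples are treated as
# provisional negatives, but errors on the known positives cost much more.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, size=(50, 5))                 # labeled positive set P
X_unl = np.vstack([rng.normal(loc=2.0, size=(100, 5)),    # hidden positives in U
                   rng.normal(loc=0.0, size=(400, 5))])   # hidden negatives in U

X = np.vstack([X_pos, X_unl])
y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])

# Asymmetric penalties: the classifier would rather misclassify some
# "negatives" (really unlabeled points) than any known positive.
clf = LinearSVC(C=1.0, class_weight={1: 10.0, 0: 0.5}, max_iter=20000)
clf.fit(X, y)

predicted_positive = (clf.decision_function(X_unl) > 0).sum()
print(f"{predicted_positive} of {len(X_unl)} unlabeled examples classified as positive")
```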

Gene Expression Programming for Data Mining and Knowledge Discovery
Investigators: Peter Nelson, CS; Xin Li, CS; Chi Zhou, Motorola Inc.
Prime Grant Support: Physical Realization Research Center of Motorola Labs

Problem Statement and Motivation
Real-world data mining tasks involve large data sets, high-dimensional feature sets, and non-linear forms of hidden knowledge, and are in need of effective algorithms.
Gene Expression Programming (GEP) is a new evolutionary computation technique for the creation of computer programs, capable of producing solutions of any possible form.
Research goal: applying and enhancing the GEP algorithm to fulfill complex data mining tasks.

Technical Approach
Overview: improve the problem-solving ability of the GEP algorithm by preserving and utilizing the self-emergence of structures during its evolutionary process.
Constant creation methods for GEP: local optimization of constant coefficients, given the evolved solution structures, to speed up the learning process.
A new hierarchical genotype representation: natural hierarchy in forming the solution and more protective genetic operations for functional components.
Dynamic substructure library: defining and reusing self-emergent substructures in the evolutionary process.

Key Achievements and Future Goals
We have finished the initial implementation of the proposed approaches. Preliminary testing has demonstrated the feasibility and effectiveness of the implemented methods: the constant creation methods have achieved significant improvement in the fitness of the best solutions, and the dynamic substructure library helps identify meaningful building blocks that incrementally form the final solution following a faster fitness convergence curve.
Future work includes investigation of parametric constants, exploration of higher-level emergent structures, and comprehensive benchmark studies.

[Figure 1. Representations of solutions in GEP: the genotype sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d, its phenotype (expression tree), and the corresponding mathematical form]
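
The genotype shown in Figure 1 can be decoded in the usual GEP (Karva) fashion: read the linear gene left to right and fill an expression tree level by level according to each symbol's arity. The sketch below is an illustrative decoder and evaluator with made-up variable values; it is not the investigators' implementation.

```python
# Minimal sketch, assuming standard Karva-style decoding: the linear genotype
# from Figure 1 is read left to right into an expression tree, level by level,
# and then evaluated. Variable values below are made up for illustration.
import math
from collections import deque

ARITY = {'sqrt': 1, '+': 2, '-': 2, '*': 2, '/': 2}   # terminals have arity 0

def decode(genotype):
    """Build an expression tree [symbol, children] breadth-first."""
    nodes = [[tok, []] for tok in genotype.split('.')]
    queue, i = deque([nodes[0]]), 1
    while queue:
        tok, children = queue.popleft()
        for _ in range(ARITY.get(tok, 0)):
            children.append(nodes[i])
            queue.append(nodes[i])
            i += 1
    return nodes[0]

def evaluate(node, env):
    tok, children = node
    if not children:                       # terminal: a variable or a constant
        return env[tok] if tok in env else float(tok)
    args = [evaluate(c, env) for c in children]
    if tok == 'sqrt':
        return math.sqrt(abs(args[0]))     # protected square root
    if tok == '/':
        return args[0] / args[1] if args[1] != 0 else 1.0   # protected division
    return {'+': args[0] + args[1], '-': args[0] - args[1], '*': args[0] * args[1]}[tok]

tree = decode("sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d")
print(evaluate(tree, {'a': 2.0, 'b': 3.0, 'c': 4.0, 'd': 1.0}))
```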

Massive Effective Search from the Web
Investigator: Clement Yu, Department of Computer Science
Primary Grant Support: NSF

Problem Statement and Motivation
Retrieve, on behalf of each user request, the most accurate and most up-to-date information from the Web.
The Web is estimated to contain 500 billion pages; Google has indexed 8 billion pages. A search engine based on crawling technology cannot access the Deep Web and may not return the most up-to-date information.
A metasearch engine connects to numerous search engines and can retrieve any information that is retrievable by any of these search engines.

Technical Approach
On receiving a user request, the metasearch engine automatically selects just a few search engines that are most suitable to answer the query.
It connects to search engines automatically and maintains the connections automatically.
It extracts the results returned from search engines automatically.
It merges results from multiple search engines automatically.

Key Achievements and Future Goals
Optimal selection of search engines to answer a user's request accurately.
Automatic connection to search engines to reduce labor cost.
Automatic extraction of query results to reduce labor cost.
A prototype retrieves news from 50 news search engines.
The project has received 2 regular NSF grants and 1 phase 1 NSF SBIR grant, and has just submitted a phase 2 NSF SBIR grant proposal to connect to at least 10,000 news search engines.
Plans include extending the system to cross-language (English-Chinese) retrieval.

[Diagram: user queries flow to the metasearch engine, which dispatches them to Search Engine 1 through Search Engine N and merges the returned results]
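
Result merging is one step that can be sketched compactly: normalize each engine's raw scores to a common scale and combine them, CombSUM-style. The engine names, scores, and equal weights below are invented for illustration and do not reflect the project's actual selection and merging algorithms.

```python
# Minimal sketch of CombSUM-style result merging, assuming invented engine
# names, raw scores, and equal weights: each engine's scores are min-max
# normalized to [0, 1] and summed per document.
def normalize(results):
    """results: list of (doc_id, raw_score) pairs from a single engine."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in results}

def merge(engine_results, weights=None):
    merged = {}
    for engine, results in engine_results.items():
        w = (weights or {}).get(engine, 1.0)
        for doc, score in normalize(results).items():
            merged[doc] = merged.get(doc, 0.0) + w * score
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

engine_results = {
    "news_engine_A": [("url1", 12.0), ("url2", 9.5), ("url3", 4.0)],
    "news_engine_B": [("url2", 0.93), ("url4", 0.80), ("url1", 0.42)],
}
for doc, score in merge(engine_results):
    print(doc, round(score, 3))
```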

Automatic Analysis and Verification of Concurrent Hardware/Software Systems
Investigator: A. Prasad Sistla, CS Dept.
Prime Grant Support: NSF

Problem Statement and Motivation
The project develops tools for the debugging and verification of hardware/software systems.
Errors in hardware/software systems occur frequently, can have enormous economic and social impact, and can cause serious security breaches. Such errors need to be detected and corrected.

Technical Approach
Model checking based approach: correctness is specified in a suitable logical framework.
Employs state space exploration; different techniques for containing state space explosion are used.

Key Achievements and Future Goals
Developed SMC (Symmetry Based Model Checker).
Employed SMC to find bugs in the FireWire protocol; also employed it in the analysis of security protocols.
Future work: extend to embedded systems and general software systems, and combine static analysis methods with model checking.

[Diagram: a concurrent system specification and a correctness specification are fed to the model checker, which answers yes/no and produces a counterexample]
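
State space exploration can be illustrated on a toy example: a breadth-first search over the reachable states of a deliberately flawed two-process lock protocol, reporting a counterexample trace when mutual exclusion is violated. The protocol, state encoding, and property below are assumptions for illustration; this is a sketch of the general technique, not the SMC tool.

```python
# Minimal sketch of explicit-state model checking by breadth-first state space
# exploration, assuming a toy (deliberately broken) two-process lock protocol:
# each process checks the other's flag and only then sets its own, so both can
# slip into the critical section. The search returns a counterexample trace.
from collections import deque

INIT = ('idle', 'idle', False, False)          # (pc0, pc1, flag0, flag1)

def successors(state):
    pc, flag = [state[0], state[1]], [state[2], state[3]]
    for p in (0, 1):
        q = 1 - p
        if pc[p] == 'idle' and not flag[q]:    # test the other flag ...
            yield step(pc, flag, p, 'ready', flag[p])
        elif pc[p] == 'ready':                 # ... then set own flag: not atomic!
            yield step(pc, flag, p, 'crit', True)
        elif pc[p] == 'crit':                  # leave and clear the flag
            yield step(pc, flag, p, 'idle', False)

def step(pc, flag, p, new_pc, new_flag):
    pc, flag = list(pc), list(flag)
    pc[p], flag[p] = new_pc, new_flag
    return (pc[0], pc[1], flag[0], flag[1])

def check_mutual_exclusion():
    parent, queue = {INIT: None}, deque([INIT])
    while queue:
        s = queue.popleft()
        if s[0] == 'crit' and s[1] == 'crit':  # safety property violated
            trace = []
            while s is not None:
                trace.append(s)
                s = parent[s]
            return list(reversed(trace))       # counterexample from INIT
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None                                # property holds

for state in check_mutual_exclusion() or []:
    print(state)
```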

The OptIPuter Project
Tom DeFanti, Jason Leigh, Maxine Brown, Tom Moher, Oliver Yu, Bob Grossman, Luc Renambot, Electronic Visualization Laboratory, Department of Computer Science, UIC; Larry Smarr, California Institute of Telecommunications and Information Technology, UCSD
National Science Foundation Award #SCI

Problem Statement and Motivation
The OptIPuter, so named for its use of Optical networking, Internet Protocol, and computer storage, processing and visualization technologies, is an infrastructure that tightly couples computational resources and displays over parallel optical networks using the IP communication mechanism. The OptIPuter exploits a new world in which the central architectural element is optical networking, not computers. This paradigm shift requires large-scale, applications-driven system experiments and a broad multidisciplinary team to understand and develop innovative solutions for a "LambdaGrid" world. The goal of this new architecture is to enable scientists who are generating terabytes of data to interactively visualize, analyze, and correlate their data from multiple storage sites connected to optical networks.

Technical Approach—UIC OptIPuter Team
Develop high-bandwidth distributed applications in geoscience, medical imaging and digital cinema, engaging NASA, NIH, ONR, USGS and DOD scientists.
Design, build and evaluate ultra-high-resolution displays; transmit ultra-high-resolution still and motion images.
Design, deploy and test high-bandwidth collaboration tools.
Procure and provide experimental high-performance network services.
Research distributed optical backplane architectures; create and deploy lightpath management methods; implement novel data transport protocols.
Design performance metrics, analysis and protocol parameters.
Create outreach mechanisms benefiting scientists and educators.
Assure interoperability of software developed at UIC with OptIPuter partners (Univ of California, San Diego; Northwestern Univ; San Diego State Univ; Univ of Southern California; Univ of Illinois at Urbana-Champaign; Univ of California, Irvine; Texas A&M Univ; USGS; Univ of Amsterdam; SARA/Amsterdam; CANARIE; and KISTI/Korea).

Key Achievements and Future Goals—UIC Team
Deployed tiled displays and clusters at partner sites.
Procured a 10 Gigabit Ethernet (GigE) private network from UIC to UCSD.
Connected 1GigE and 10GigE metro, regional, national and international research networks into the OptIPuter project.
Developed software and middleware to interconnect and interoperate heterogeneous network domains, enabling applications to set up on-demand private networks using electronic-optical and fully optical switches.
Developed advanced data transport protocols to move large data files quickly.
Developed a two-month earthquake instructional unit, tested in a fifth-grade class at Lincoln School.

Invention and Applications of ImmersiveTouch™, a High-Performance Haptic Augmented Virtual Reality System
Investigator: Pat Banerjee, MIE, CS and BioE Departments
Prime Grant Support: NIST-ATP

Problem Statement and Motivation
A high-performance interface enables the development of medical, engineering or scientific virtual reality simulation and training applications that appeal to many stimuli: audio, visual, tactile and kinesthetic.

Technical Approach
First system that integrates a haptic device, a head and hand tracking system, and a cost-effective, high-resolution, high-pixel-density stereoscopic display.

Key Achievements and Future Goals
Patent application by the University of Illinois.
Depending upon its future popularity, the invention could become as fundamental as the microscope.
Continue adding technical capabilities to enhance the usefulness of the device.