The Claremont Report on Database Research, 2009-10-28, Tamkang University (淡江大學), 周清江.

Presentation transcript:

The Claremont Report on Database Research, Tamkang University (淡江大學), 周清江

2 Background
- Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus.
  - Laguna Beach, Calif. in 1989
  - Palo Alto, Calif. (“Lagunita”) in 1990 and 1995
  - Cambridge, Mass. in 1996
  - Asilomar, Calif. in 1998
  - Lowell, Mass. in 2003
  - Claremont, Calif. in 2008

3 New Focus Areas
- New database engine architectures
- Declarative programming languages
- Interplay of structured and unstructured data
- Cloud data services
- Mobile and virtual worlds

4 A Turning Point in Database Research
- Unusually rich opportunities for technical advances, intellectual achievement, entrepreneurship, and impact on science and society
- Sense of change as a function of several factors
  - Breadth of excitement about Big Data
  - Data analysis as a profit center
  - Ubiquity of structured and unstructured data
  - Expanded development demand
  - Architecture shift in computing

5 Research Portfolio Change
- Impact and breadth
- Evaluated by external measures
  - Helping new classes of users
  - Powering new computing platforms
  - Making conceptual breakthroughs across computing

6 Two Promising Approaches
- Reformation
  - Deconstructing core data-centric ideas and systems
  - Reforming them for new applications and architectural realities
- Synthesis
  - Leverage good research ideas that have yet to develop identifiable, agreed-upon system architectures
  - Data integration, information extraction, data privacy, etc.

7 Research Opportunities
- Revisiting Database Engines
- Declarative Programming for Emerging Platforms
- The Interplay of Structured and Unstructured Data
- Cloud Data Services
- Mobile Applications and Virtual Worlds

8 Research Opportunities
- Main issues that cut across the above topics
  - Management of uncertain information
  - Data privacy and security
  - E-science and other scholarly applications
  - Human-centric interactions with data
  - Social networks and Web 2.0
  - Personalization and contextualization of query- and search-related tasks
  - Streaming and networked data
  - Self-tuning and adaptive systems
  - The challenges raised by new hardware technologies and energy constraints

9 Revisiting Database Engines
- Data-intensive tasks for which relational DBs provide poor price/performance
  - Ex: text indexing, serving web pages, media delivery
- Room for significant innovation within traditional application domains
  - Analytics for business and science
  - The cost of software and management relative to hardware is exorbitant
  - OLTP
- Need to address data lifecycle issues
  - Data provenance, schema evolution, and versioning
- Good time to try radical ideas

10 Revisiting Database Engines
- Two directions of research projects
- Revolutionary steps in DB system architecture
  - Broadening the range of applicability
  - Radically improving performance by designing special-purpose DB systems for specific domains
- These efforts may be synergistic

11 Revisiting Database Engines
- Important research topics in the core DB engine
  - Designing systems for clusters of many-core processors
  - Exploiting remote RAM and Flash as persistent media
  - Treating query optimization and physical data layout as a unified, adaptive, self-tuning task to be carried out continuously
  - Compressing and encrypting data at the storage layer, integrated with data layout and query optimization
  - Designing systems for non-relational data models
  - Trading off consistency and availability for better performance and scale-out to thousands of machines
  - Designing power-aware DBMSs that limit energy costs without sacrificing scalability

12 Declarative Programming for Emerging Platforms
- The urgency of programmer productivity is increasing exponentially as programmers target ever more complex environments
- Non-expert programmers need to be able to write robust code that scales out across processors in both loosely- and tightly-coupled architectures

13 Declarative Programming for Emerging Platforms
- Example: Map-Reduce
- New declarative languages, based on Datalog, have been developed for a variety of domain-specific systems
  - Network and distributed systems, computer games, machine learning and robotics, compilers, security protocols, and information extraction
- Enterprise application programming
  - Ruby on Rails
  - LINQ (Language-Integrated Query)
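
The Map-Reduce example named on this slide can be illustrated with a minimal, single-process sketch. It is not taken from the report or from any particular framework. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. The point is that the programmer writes only the map and reduce steps, while grouping by key (and, in a real system, distribution across machines) is left to the runtime.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step (hypothetical name): emit a (word, 1) pair per word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a runtime would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step (hypothetical name): sum the counts for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["cloud data services", "cloud data"]))
```

In a real deployment the shuffle step would move data between machines; here it is a plain dictionary so the sketch stays self-contained.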

14 Declarative Programming for Emerging Platforms
- Research questions
  - Language design
    - Fairly expressive
    - Attractive syntax, typing and modularity, development tools, smooth interactions with the rest of the computing ecosystem
  - Efficient compilers and runtimes
    - Techniques to optimize code automatically
    - Across both the horizontal distribution of parallel processors and the vertical distribution of tiers
    - Should extend techniques behind parallel and distributed DBs

15 The Interplay of Structured and Unstructured Data
- Within enterprises, heterogeneous collections of structured data linked with unstructured data
- On the Web, structured data from
  - Millions of DBs hidden behind forms (deep web)
  - High-quality data items in HTML tables on web pages, and mashups providing dynamic views on structured data
  - Data contributed by Web 2.0 services
    - Photo and video sites
    - Collaborative annotation services
    - On-line structured data repositories

16 The Interplay of Structured and Unstructured Data
- Challenges of managing dataspaces
  - Managing a rich collection of structured, semi-structured, and unstructured data
- On the Web, previous contributions
  - Techniques for domain-specific search engines
  - Domain-independent techniques for crawling through forms and surfacing the resulting HTML pages in a search-engine index
- Within enterprises, enterprise search and discovery of relationships between structured and unstructured data

17 The Interplay of Structured and Unstructured Data
- Challenge 1: Extract structure and meaning from unstructured and semi-structured data
  - Applying and managing predictions from large numbers of independently developed extractors
  - Need algorithms to introspect about the correctness of extractions
  - Better technology to manage data in context
    - Discover data sources
    - Discover implicit relationships
    - Determine the weight of an object’s context when assigning it semantics
    - Maintain data provenance
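
One simple way to picture "managing predictions from independently developed extractors" is a majority vote that also records which extractors agreed. This is an illustrative sketch only. The report does not prescribe this method, and the function name combine_extractions and the extractor names in the usage example are hypothetical.

```python
from collections import Counter


def combine_extractions(predictions):
    """Combine one field's value as proposed by several extractors.

    predictions maps a (hypothetical) extractor name to the value it
    extracted, or None if it produced nothing. Returns a tuple of
    (winning value, agreement ratio, names of supporting extractors).
    The list of supporters is a very small form of provenance.
    """
    votes = Counter(v for v in predictions.values() if v is not None)
    if not votes:
        return None, 0.0, []
    value, count = votes.most_common(1)[0]
    supporters = sorted(name for name, v in predictions.items() if v == value)
    # Agreement is measured against all extractors, including silent ones.
    return value, count / len(predictions), supporters


if __name__ == "__main__":
    result = combine_extractions({
        "regex_extractor": "Claremont",
        "table_extractor": "Claremont",
        "ner_extractor": "Lowell",
        "form_extractor": None,
    })
    print(result)
```

A low agreement ratio is a crude signal that an extraction may be wrong. The slide's call for algorithms that "introspect about the correctness of extractions" goes well beyond this kind of voting.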

18 The Interplay of Structured and Unstructured Data
- Challenge 2: Develop methods for effectively querying and deriving insight from the resulting sea of heterogeneous data
  - Analyze keyword query to extract its intended semantics
  - Route the query to relevant sources
    - Do not assume we have semantic mappings for the data sources
    - Cannot assume that the domain of the query or data sources is known
  - The system should provide best-effort service and improve over time
  - Develop index structures to support querying hybrid data
  - Need new notions of correctness and consistency to provide metrics and to make cost/quality tradeoffs
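
Best-effort query routing without semantic mappings can be pictured with a small sketch that ranks sources by how many query terms appear in a vocabulary sampled from each source. This is an assumption-laden illustration, not the report's method. The function route_query, the top_k parameter, and the source names are all hypothetical.

```python
def route_query(query, source_vocabularies, top_k=2):
    """Rank data sources for a keyword query by simple term overlap.

    source_vocabularies maps a (hypothetical) source name to a set of
    lowercase terms sampled from that source. No schema or semantic
    mapping is assumed. Sources with no overlapping term are skipped.
    """
    terms = set(query.lower().split())
    scored = []
    for name, vocabulary in source_vocabularies.items():
        overlap = len(terms & vocabulary)
        if overlap:
            scored.append((overlap / len(terms), name))
    # Highest score first; ties are broken alphabetically by source name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    sources = {
        "flights": {"flight", "airline", "departure"},
        "hotels": {"hotel", "room", "price"},
        "papers": {"author", "title", "venue"},
    }
    print(route_query("cheap hotel room price", sources))
```

A system that "improves over time" might update the sampled vocabularies from user feedback. That part is left out of this sketch.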

19 The Interplay of Structured and Unstructured Data
- Challenge 2: Innovation about creating data collections
  - Web 2.0
    - Users join ad-hoc communities to create, collaborate, curate, and discuss data online
    - They rarely agree on schemata ahead of time
    - Schemata need to be inferred from the data and will be highly dynamic
    - Schemata will be used to guide users to consensus
  - Need to incorporate visualizations effectively
  - They need to be easy to use
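
The idea that schemata are "inferred from the data" rather than agreed on ahead of time can be sketched as follows. This is a minimal illustration under stated assumptions. The function infer_schema and the sample records are hypothetical, and real schema inference would need to handle nesting, synonyms, and change over time.

```python
from collections import Counter


def infer_schema(records):
    """Infer a loose schema from user-contributed records (dicts).

    For each field, report the Python type names observed and the
    fraction of records that contain the field. Conflicting types or
    low coverage show where contributors have not yet converged.
    """
    field_counts = Counter()
    field_types = {}
    for record in records:
        for field, value in record.items():
            field_counts[field] += 1
            field_types.setdefault(field, set()).add(type(value).__name__)
    total = len(records)
    return {
        field: {
            "types": sorted(field_types[field]),
            "coverage": field_counts[field] / total,
        }
        for field in field_counts
    }


if __name__ == "__main__":
    contributed = [
        {"title": "Claremont Report", "year": 2008},
        {"title": "Lowell Report", "year": "2003"},
        {"title": "Asilomar Report", "tags": ["databases"]},
    ]
    for field, info in infer_schema(contributed).items():
        print(field, info)
```

Re-running the inference as records arrive gives a schema that tracks the data, and the coverage figures could be shown to users as a nudge toward consensus.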

20 Cloud Data Services
- Infrastructure change
  - Service-oriented cloud computing
    - Application services (salesforce.com)
    - Storage services (Amazon S3)
    - Compute services (Google App Engine, Amazon EC2)
    - Data services (Amazon SimpleDB, MS SQL Server Data Services, Google Datastore)
- Trade-off between functionality and operational costs
- Manageability is particularly important
  - Limited human intervention
  - High-variance workloads: elastic provisioning
  - A variety of shared infrastructures: service tuning depends on how the shared infrastructure is virtualized
- Urgency of self-managing DB technologies

21 Cloud Data Services
- Challenges from scale of cloud computing
  - SQL databases cannot scale to thousands of nodes
    - Different transactional implementation techniques?
    - Different storage semantics?
  - More work is needed to synthesize ideas from the literature in cloud computing
  - Limitations on either the plan space or the search will be required
  - How programmers will express their programs in the cloud

22 Cloud Data Services
- Challenges from scale of cloud computing
  - Data security and privacy
    - Key to success: target usage scenarios in the cloud
  - New scenarios will emerge with their own challenges
    - Specialized services pre-loaded with large data sets
    - “Mash up” data from public and private domains
    - Services reaching out across clouds
      - Prevalent in scientific data “grids”
      - Federated cloud architectures will compound the challenges

23 Mobile Applications and Virtual Worlds
- This new class of applications needs to manage diverse user-created data, synthesize it intelligently, and provide real-time services
- Trends in the mobile space
  - Platforms to build mobile applications are mature
  - The emergence of mobile search and social networks suggests a new set of mobile applications
- Virtual worlds, like Second Life, increasingly blur the distinctions with the real world
  - Suggest a more data-rich mixture (co-space)
  - Applications include rich social networking, massive multi-player games, military training, edutainment and knowledge sharing

24 Mobile Applications and Virtual Worlds
- New challenges
  - The need to process heterogeneous data streams to materialize real-world events
  - The need to balance privacy against the collective benefit of sharing personal real-time information
  - The need for more intelligent processing to send interesting events in the co-space to someone in the physical world

25 Moving Forward
- Survey articles and tutorials are becoming an increasingly important contribution
- Risky or speculative papers are not championed effectively
- A need for approachable books on scalable data management algorithms and techniques
- Time is ripe for projects to stimulate collaboration and cross-fertilization of ideas, like information integration
- Two areas are identified for competitions
  - System components for cloud computing
  - Large-scale information extraction