Review of Claremont Report on Database Research Jiaheng Lu Renmin University of China.

Outline
- Five challenges for database research:
  - Revisiting database engines
  - Declarative programming
  - Structured and unstructured data
  - Cloud data management
  - Mobile applications
- Our research to meet those challenges

Database challenges: the senior database researchers' meetings
Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus:
- Laguna Beach, Calif., 1989
- Palo Alto, Calif. ("Lagunita"), 1990 and 1995
- Cambridge, Mass., 1996
- Asilomar, Calif., 1998
- Lowell, Mass., 2003

Claremont Meeting
- About 20 database researchers
- Claremont Resort, Berkeley, CA
- May 29-30, 2008

Revisiting database engines (1)
Traditional database engines do not work well for:
- OLTP systems: data provenance, schema evolution and versioning
- Text indexing
- Media delivery
- …

Revisiting database engines (2)
Research topics:
- Using remote RAM and flash as persistent media
- Treating query optimization and physical data layout as a unified, adaptive, self-tuning task
- Compressing and encrypting data, integrated with query optimization
- Designing systems that embrace non-relational data models

Declarative programming for emerging platforms (1)
A data-centric approach for emerging platforms:
- Manycore chips
- Distributed services
- Cloud computing platforms
- …

Declarative programming for emerging platforms (2)
Good examples:
- MapReduce: data parallelism
- Ruby on Rails: query-like logic
- XQuery
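
MapReduce counts as an example here because the programmer supplies only a map function and a reduce function, and the framework decides how to partition and parallelize the work. The sketch below simulates that contract in a single Python process with word count; the function names, the thread pool standing in for a cluster, and the sample records are illustrative assumptions, not the API of Hadoop or any other MapReduce system.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(line):
    """User-supplied map (hypothetical name): emit (word, 1) for each word in one record."""
    return [(word, 1) for word in line.lower().split()]

def reduce_phase(word, counts):
    """User-supplied reduce (hypothetical name): combine all values emitted for one key."""
    return word, sum(counts)

def map_reduce(records):
    # Map calls are independent of each other, so the "framework" may run them in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, records))
    # Shuffle: group the intermediate values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce calls are likewise independent, one per key.
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))

records = ["cloud data management", "declarative data processing", "data parallelism"]
print(map_reduce(records))
```

The user code never mentions threads, partitions, or grouping; those are the framework's job, which is what makes the style declarative and data-parallel.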

The interplay of structured and unstructured data (1)
We are witnessing a growing amount of structured data on the Web:
- Millions of hidden databases (the Deep Web)
- Millions of HTML tables and mashups
- Web 2.0 services: photo and video websites

The interplay of structured and unstructured data (2)
Research challenges:
- Extracting structured meaning from unstructured data (IR, ML)
- Querying and deriving insight from heterogeneous data
  - Keyword queries
  - Pay-as-you-go fashion

Cloud data management (1)
Cloud services: shared commodity hardware for computing and storage
- Application services (salesforce.com)
- Storage services (Amazon Web Services)
- Computing services (Google App Engine)
- Data services (Microsoft SQL Server data centers)

Cloud data management (2)
Research challenges:
- Self-managing databases: limited human intervention, varied workloads
- Large-scale query processing and optimization
- Data security and privacy under sharing

Mobile applications
- "On the go" interaction
- Location-based services

Our research to meet these challenges
- XML search
- Approximate string search
- Cloud data management
- Mobile data privacy
- DataSpace, …

XML search (1)
XML twig query processing (SIGMOD'05, VLDB'05)
- Problem statement: given an XML twig pattern Q and an XML database D, find ALL the matches of Q in D.
[Figure: an XML tree with nodes s1, s2, t1, t2, f1, p1, and a twig pattern whose Section node has a Title branch and a Figure branch]
Query answers: (s1, t1, f1), (s2, t2, f1), (s1, t2, f1)
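
To make the problem statement concrete, the sketch below gives every node a (start, end) region so that an ancestor-descendant test becomes interval containment, then enumerates every (Section, Title, Figure) match by brute force. The figure did not survive, so the tree shape is an assumption reconstructed to yield exactly the three answers listed on the slide; the edges are assumed to be ancestor-descendant, the tag of p1 is hypothetical, and the nested-loop matcher is a naive baseline, not the algorithms from the cited SIGMOD'05/VLDB'05 work.

```python
from itertools import product

# Hypothetical tree, reconstructed so that it yields the slide's three answers:
# s1 contains t1, p1 and s2; s2 contains t2 and f1. Nodes are (id, tag, children).
TREE = ("s1", "Section", [
    ("t1", "Title", []),
    ("p1", "Paragraph", []),   # hypothetical tag for p1
    ("s2", "Section", [
        ("t2", "Title", []),
        ("f1", "Figure", []),
    ]),
])

def region_encode(tree):
    """Give every node a (start, end) interval from a depth-first traversal."""
    labels, regions = {}, {}
    counter = 0

    def visit(node):
        nonlocal counter
        node_id, tag, children = node
        start = counter
        counter += 1
        for child in children:
            visit(child)
        regions[node_id] = (start, counter)
        counter += 1
        labels[node_id] = tag

    visit(tree)
    return labels, regions

def is_ancestor(regions, a, d):
    """a is a proper ancestor of d iff d's interval lies strictly inside a's."""
    a_start, a_end = regions[a]
    d_start, d_end = regions[d]
    return a_start < d_start and d_end < a_end

def match_twig(labels, regions, root_tag, branch_tags):
    """Naive matcher for root_tag//b1, root_tag//b2, ... (ancestor-descendant edges)."""
    by_tag = {}
    for node_id in sorted(labels, key=regions.get):  # document order
        by_tag.setdefault(labels[node_id], []).append(node_id)
    answers = []
    for root in by_tag.get(root_tag, []):
        branches = [[n for n in by_tag.get(tag, []) if is_ancestor(regions, root, n)]
                    for tag in branch_tags]
        answers.extend((root, *combo) for combo in product(*branches))
    return answers

labels, regions = region_encode(TREE)
print(match_twig(labels, regions, "Section", ["Title", "Figure"]))
```

The nested loops touch every combination of candidate nodes, which is exactly the cost that dedicated twig-join algorithms are designed to avoid on large documents.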

XML search (2)
XML keyword search (ICDE'09)
- Problem statement: how to efficiently rank the results of an XML keyword query
- Contribution: extending TF/IDF by incorporating the structure of XML data
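
For reference, the sketch below shows the plain, structure-unaware TF/IDF baseline (score = sum over keywords of tf × log(N/df)) that such work starts from: each top-level element of a small XML document is treated as one result unit and ranked by its text alone. It does not attempt the structural extension described on the slide; the document, the choice of result unit, and the function name are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical document, not taken from the slides.
DOC = """<bib>
  <book><title>XML search</title><author>Lu</author></book>
  <book><title>Database systems</title><author>Li</author></book>
  <article><title>Keyword search over XML data</title></article>
</bib>"""

def rank_elements(xml_text, keywords):
    """Rank top-level elements by flat TF-IDF over their text; keywords must be lowercase."""
    root = ET.fromstring(xml_text)
    # Each child of the root is one result unit; its words come from all nested text.
    units = [(elem.tag, " ".join(elem.itertext()).lower().split()) for elem in root]
    n = len(units)
    ranked = []
    for tag, words in units:
        score = 0.0
        for kw in keywords:
            df = sum(1 for _, other in units if kw in other)  # units containing kw
            if df:
                score += words.count(kw) * math.log(n / df)
        ranked.append((tag, round(score, 3)))
    # sorted() is stable, so ties keep document order
    return sorted(ranked, key=lambda item: item[1], reverse=True)

print(rank_elements(DOC, ["keyword", "xml"]))
```

Because every element is flattened into a bag of words, nesting and element types play no role in the score; that gap is what a structure-aware extension of TF/IDF addresses.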

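Approximate string search
Approximate string queries (ICDE'08, ICDE'09)
- Problem statement: given a collection of strings, how can we efficiently perform approximate search?
- Example ("Star Search"): the query "Schwarrzenger" against a collection of star names (…, Schwarzenger, Samuel Jackson, Keanu Reeves)
- Output: strings s that satisfy Sim(q, s) ≤ δ, where Sim is a distance-style function such as edit distance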
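
Taking Sim to be edit distance (the measure the slides' main example uses), the simplest correct answer to the problem statement is a linear scan that computes the distance from the query to every string. The sketch below does that for the strings shown on the slide (spelled as they appear there); the threshold δ = 2 and the function names are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or match at no cost
        prev = cur
    return prev[-1]

def approximate_search(collection, query, delta):
    """Linear scan: return every string s with edit_distance(query, s) <= delta."""
    return [s for s in collection if edit_distance(query, s) <= delta]

stars = ["Schwarzenger", "Samuel Jackson", "Keanu Reeves"]
print(approximate_search(stars, "Schwarrzenger", delta=2))  # ['Schwarzenger']
```

The scan costs one quadratic-time distance computation per string, which is why gram-based indexes are used to prune most strings before any distance is computed.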

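Main example
Data (id: string): 0 rich, 1 stick, 2 stich, 3 stuck, 4 static
Inverted lists of 2-grams (gram: string ids): ck: 1,3; ic: 0,1,2,4; st: 1,2,3,4; ta: 4; ti: 1,2,4; …
Query: q = "stick", with ed(s, q) ≤ 1
1. Split the query into grams: stick → (st, ti, ic, ck).
2. Merge the inverted lists of st, ti, ic, ck and keep the ids whose count is ≥ 2, giving candidate string ids {1, 2, 3, 4}. This merge is the performance bottleneck!
3. Double-check the real edit distance of each candidate, giving the final answers {1, 2, 3}.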
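
The main example can be run end to end. The sketch builds the 2-gram inverted lists for the five strings on the slide, merges the lists of the query's grams by counting how many lists each string id appears in, keeps the ids whose count reaches the threshold, and verifies the survivors with the real edit distance. The threshold is computed as (number of query grams) - k·q for edit distance k and gram length q, which reproduces the slide's value of 2; this is a conservative form of the usual count filter, since strings longer than the query would have to share even more grams. The plain counting merge is an illustrative baseline rather than the optimized merging algorithms of the ICDE'08 work, the helper names are hypothetical, and the code assumes no gram repeats within a string, which holds for this data.

```python
from collections import Counter, defaultdict

STRINGS = ["rich", "stick", "stich", "stuck", "static"]  # ids 0..4, as on the slide

def grams(s, q=2):
    """All overlapping q-grams of s (assumed not to repeat within one string)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def build_index(strings, q=2):
    """Inverted lists: gram -> ascending list of ids of strings containing it."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for g in grams(s, q):
            index[g].append(sid)
    return index

def merge_count(index, query_grams, threshold):
    """Count, per string id, how many query-gram lists contain it; keep ids >= threshold."""
    counts = Counter()
    for g in query_grams:
        for sid in index.get(g, []):
            counts[sid] += 1
    return sorted(sid for sid, c in counts.items() if c >= threshold)

def edit_distance(a, b):
    """Levenshtein distance with one rolling row, used for the final double check."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

query, k, q = "stick", 1, 2
index = build_index(STRINGS, q)
qgrams = grams(query, q)                  # ['st', 'ti', 'ic', 'ck']
threshold = len(qgrams) - k * q           # 4 - 1 * 2 = 2
candidates = merge_count(index, qgrams, threshold)
answers = [sid for sid in candidates if edit_distance(STRINGS[sid], query) <= k]
print(candidates, answers)                # [1, 2, 3, 4] [1, 2, 3]
```

String 0 ("rich") shares only the gram ic with the query and is pruned without any distance computation; string 4 ("static") passes the count filter but fails the double check, since its edit distance to "stick" is 3.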

Cloud data management
The WAMDM lab's experimental platform for distributed storage systems

Research topics on cloud data management
- Self-management and self-tuning
- Query optimization on thousands of nodes

Thank you
Q & A
WAMDM lab website: