Course Project Ideas Yanlei Diao University of Massachusetts Amherst.

6/11/2015 Yanlei Diao, University of Massachusetts Amherst
New Directions for DB Research
– Sensor data: new architecture
– XML: new data model
– Streams: new execution model
– Data quality and lineage: new services
– …

Querying in Sensor Networks
Store data locally at sensors and push queries into the sensor network:
– Energy efficiency of flash memory
– Limited capabilities of sensor platforms
[Figure: acoustic and image streams are stored in flash memory at the sensors; queries are pushed from the Internet gateway to the sensors.]

Optimize for Flash and Limited RAM
Flash memory constraints
– Data cannot be overwritten, only erased
– Pages can often only be erased in whole blocks (16-64 KB)
– Unlike magnetic disks, data cannot be modified in place
Challenges
– Energy: organize data on flash to minimize read/write/erase operations
– Memory: minimize the memory used by the flash database
[Figure: an in-place update requires (1) loading a ~16-64 KB block into memory, (2) modifying it in memory, and (3) erasing the block and saving it back, while sensor memory is only ~4-10 KB.]
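To make the cost of an in-place update concrete, the following Python sketch simulates a toy flash device in which a written page must be erased, block by block, before it can be rewritten. The page size, pages per block, and all names are assumptions for illustration, not StonesDB code; updating one page performs the load, modify, erase, and save-back cycle from the figure.

```python
PAGE_SIZE = 512         # bytes per page (assumed)
PAGES_PER_BLOCK = 32    # 32 x 512 B = 16 KB erase block, the low end of 16-64 KB


class ToyFlash:
    """Pages are write-once until their whole block is erased; counts every operation."""

    def __init__(self, num_blocks):
        self.pages = [None] * (num_blocks * PAGES_PER_BLOCK)  # None means erased
        self.reads = self.writes = self.erases = 0

    def read(self, page):
        self.reads += 1
        return self.pages[page]

    def write(self, page, data):
        if self.pages[page] is not None:
            raise ValueError("page already written; erase its block first")
        self.writes += 1
        self.pages[page] = data

    def erase(self, block):
        self.erases += 1
        start = block * PAGES_PER_BLOCK
        self.pages[start:start + PAGES_PER_BLOCK] = [None] * PAGES_PER_BLOCK


def update_in_place(flash, page, data):
    """Read-modify-write: load the block, modify it in memory, erase it, save it back."""
    block = page // PAGES_PER_BLOCK
    start = block * PAGES_PER_BLOCK
    # The buffer holds PAGE_SIZE * PAGES_PER_BLOCK bytes (16 KB) of RAM.
    buffer = [flash.read(p) for p in range(start, start + PAGES_PER_BLOCK)]
    buffer[page - start] = data
    flash.erase(block)
    for offset, contents in enumerate(buffer):
        if contents is not None:
            flash.write(start + offset, contents)
```

With a full block, changing a single page costs 32 page reads, 1 erase, and 32 page writes, and needs a 16 KB buffer, which exceeds the ~4-10 KB of RAM on the slide. Appending the new version to a fresh page (a log-structured layout) would cost a single page write instead; this is one common response to the energy challenge, not a design the slide prescribes.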

StonesDB: System Operation
Image retrieval: “Return images taken last month with at least two birds, one of which is a bird of type A.”
– Identify the “best” sensors to forward the query to.
– Provide hints to reduce search complexity at the sensor.
[Figure: the proxy keeps a cache of image summaries.]

StonesDB: System Operation
Image retrieval: “Return images taken last month with at least two birds, one of which is a bird of type A.”
[Figure: at the sensor, a query engine runs over partitioned access methods.]

Research Issues in StonesDB
Local Database Layer
– Reduce updates for indexing and aging.
– New cost models for self-tuning sensor databases.
– Energy-optimized query processing.
– Query processing over aged data.
Distributed Database Layer
– What summaries are relevant to queries?
– What remainder queries to send to sensors?
– What resolution of summaries to cache?

XML (Extensible Markup Language)
[Example: a book record marked up in XML: “Foundations…”, Abiteboul, Hull, Vianu, Addison Wesley, 1995 …]
XML: a tagging mechanism to describe content.

XML Data Model (Graph)
– Main structure: an ordered, labeled tree
– References between nodes: the tree becomes a graph
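As a small illustration of the two layers of the model, this Python sketch uses the standard xml.etree.ElementTree module to list parent-child edges in document order (the ordered, labeled tree) and then adds reference edges on top. The id and cites attributes are hypothetical stand-ins for ID/IDREF-style references; the slide does not fix a reference mechanism.

```python
import xml.etree.ElementTree as ET

DOC = """<bib>
  <paper id="p1"><title>A</title></paper>
  <paper id="p2" cites="p1"><title>B</title></paper>
</bib>"""


def tree_and_reference_edges(xml_text):
    """Return (tree edges in document order, reference edges between id'd nodes)."""
    root = ET.fromstring(xml_text)
    ids = {e.get("id") for e in root.iter() if e.get("id") is not None}
    tree_edges, ref_edges = [], []
    for node in root.iter():              # pre-order traversal = document order
        for child in node:                # children are kept in their original order
            tree_edges.append((node.tag, child.tag))
        target = node.get("cites")        # hypothetical reference attribute
        if target in ids:                 # dangling references are ignored
            ref_edges.append((node.get("id"), target))
    return tree_edges, ref_edges
```

Without the reference edge the structure is a tree; the p2 → p1 edge is what turns it into a graph.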

XQuery: XML Query Language
A declarative language for querying XML data
– XPath: path expressions
   Patterns to be matched against an XML graph, e.g.
   /bib/paper[author/lastname='Croft']/title
– FLWOR expressions (for, let, where, order by, return)
   Combining matching and restructuring of XML data, e.g.
   for $p in distinct-values(doc("bib.xml")//publisher)
   let $b := doc("bib.xml")/bib/book[publisher = $p]
   where count($b) > 100
   order by $p
   return $p
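Python's xml.etree.ElementTree implements only a subset of XPath (it cannot compare a nested path such as author/lastname inside a predicate), so this sketch hand-codes the two queries above over a small in-memory document. It is meant to show their semantics, not to act as an XQuery engine; the sample document and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <paper><title>T1</title>
    <author><lastname>Croft</lastname></author>
    <author><lastname>Allan</lastname></author></paper>
  <paper><title>T2</title><author><lastname>Smith</lastname></author></paper>
  <book><title>B1</title><publisher>AW</publisher></book>
  <book><title>B2</title><publisher>AW</publisher></book>
  <book><title>B3</title><publisher>MK</publisher></book>
</bib>"""


def croft_titles(bib):
    """/bib/paper[author/lastname='Croft']/title
    The predicate is existential: one matching author is enough."""
    return [title.text
            for paper in bib.findall("paper")
            if any(ln.text == "Croft" for ln in paper.findall("author/lastname"))
            for title in paper.findall("title")]


def busy_publishers(bib, threshold=100):
    """FLWOR query: publishers of more than `threshold` books, in name order."""
    result = []
    for p in sorted({e.text for e in bib.iter("publisher")}):  # for + distinct-values + order by
        books = [b for b in bib.findall("book")                  # let $b := /bib/book[publisher = $p]
                 if any(pub.text == p for pub in b.findall("publisher"))]
        if len(books) > threshold:                                # where count($b) > 100
            result.append(p)                                      # return $p
    return result
```

Here `bib` is the parsed root element, e.g. `ET.fromstring(BIB)`, so `bib.findall("paper")` plays the role of /bib/paper.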

Metadata Management using XML
File systems for large-scale scientific simulations
– File systems: petabytes or even more
– Directory tree (metadata): large, can’t fit in memory
– Links between files: steps in a simulation, data derivation
File searches
– All the files generated on Oct 1, 2005
– All the files whose name is like ‘*simu*.txt’
– All the files that were generated from the file ‘basic-measures.txt’
Build an XML store to manage directory trees!
– XML data model
– XML query language
– XML indices
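A minimal sketch of the three file searches over a directory tree stored as XML, using Python's xml.etree.ElementTree and fnmatch. The element and attribute names (dir, file, created, derived-from) are hypothetical, file names are assumed to be unique, and the derivation search follows links transitively. A real XML store would answer these with XML indices rather than full scans.

```python
import fnmatch
import xml.etree.ElementTree as ET

TREE = """<dir name="run42">
  <file name="basic-measures.txt" created="2005-09-30"/>
  <dir name="step1">
    <file name="simu-a.txt" created="2005-10-01" derived-from="basic-measures.txt"/>
    <file name="notes.txt" created="2005-10-01"/>
  </dir>
  <dir name="step2">
    <file name="simu-b.txt" created="2005-10-02" derived-from="simu-a.txt"/>
  </dir>
</dir>"""


def created_on(root, day):
    return [f.get("name") for f in root.iter("file") if f.get("created") == day]


def name_like(root, pattern):
    # fnmatchcase: shell-style wildcards, case-sensitive on every platform
    return [f.get("name") for f in root.iter("file")
            if fnmatch.fnmatchcase(f.get("name"), pattern)]


def derived_from(root, source):
    """All files generated, directly or indirectly, from `source`."""
    children = {}
    for f in root.iter("file"):
        parent = f.get("derived-from")
        if parent is not None:
            children.setdefault(parent, []).append(f.get("name"))
    result, frontier = [], [source]
    while frontier:
        for child in children.get(frontier.pop(), []):
            if child not in result:
                result.append(child)
                frontier.append(child)
    return result
```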

XML Document Processing
Multi-hierarchical XML markup of text documents
– Multi-hierarchies: part-of-speech, page-line
– Features in different hierarchies overlap in scope
– Need a query language and querying mechanism
– References [Nakov et al., 2005; Iacob & Dekhtyar, 2005]
Querying and ranking of XML data
– XML fragments returned as results
– Fuzzy matches
– Ranking of matches
– References [Amer-Yahia et al., 2005; Luo et al., 2003]
Well-defined problems: identify your contributions!

Data Stream Management
– Traditional database: data at rest; one-shot or periodic queries; query-driven execution
– Data stream processor: data in motion, unending; continuous, long-running queries (queries, rules, event specs, subscriptions); data-driven execution
[Figure: a query over a stored table (Attr1, Attr2, Attr3) versus data flowing through standing queries to produce results.]
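The contrast between query-driven and data-driven execution fits in a few lines of Python: the one-shot query runs once over a stored list, while the continuous query is a generator that produces an updated result every time a tuple arrives. The sliding-window average is only an illustrative choice of continuous query.

```python
from collections import deque
from typing import Iterable, Iterator


def one_shot_avg(table: list[float]) -> float:
    """Query-driven: the query runs once over data at rest."""
    return sum(table) / len(table)


def continuous_avg(stream: Iterable[float], window: int) -> Iterator[float]:
    """Data-driven: each arriving tuple triggers a new result over the last `window` tuples."""
    buf = deque(maxlen=window)
    total = 0.0
    for x in stream:
        if len(buf) == window:
            total -= buf[0]      # the oldest tuple is about to be evicted by append()
        buf.append(x)
        total += x
        yield total / len(buf)
```

The stream argument may be unbounded (for example a socket reader), since the generator never needs the whole input.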

In-Network XML Processing
XML is becoming the wire format for data.
In-network XML processing:
– Authentication
– Authorization
– Routing
– Transformation
– Pattern matching
Goals: expedite traffic, enhance security, real-time monitoring and diagnosis.
XPath is widely used for in-network XML processing:
– Applied directly to streaming XML data
– Requires line-speed performance

6/11/2015 Yanlei Diao, University of Massachusetts Amherst Research Issues  Gigabit rate XPath processing –Take one look, process XPath, buffer data for future use if necessary –Processing needs to be gigabit rate –Memory usage needs to be minimized Time/space complexity of XPath stream processing –Theoretical analysis for common features of XPath Minimizing memory usage of YFilter technolgy –YFilter: state-of-the-art for multi-XPath processing

RFID Technology
[Figure: an RFID reader reads a tag (ID shown as EF.0A D E.001.F0); each reading has the form (reader_id, tag_id, timestamp).]

RFID Stream Processing
– Out of stock: the number of items of product X on the shelf is ≤ 3.
– Shoplifting: an item was taken out of the store without being checked out.
[Figure: tag EF.0A is read by an RFID reader at the shelf and later by the reader at exit1.]
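A minimal sketch of detecting both events from a time-ordered stream of (reader_id, tag_id, timestamp) readings. It assumes hypothetical reader IDs shelf, checkout, and exit1, a hypothetical tag-to-product mapping, and perfect readings, and it treats an item as having left the shelf once it is read at another reader.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    reader_id: str
    tag_id: str
    timestamp: int


def monitor(readings, product_of, threshold=3):
    """Return out-of-stock and shoplifting alerts for a time-ordered stream of readings.
    Assumes hypothetical reader ids "shelf", "checkout" and "exit1"."""
    on_shelf = {}        # product -> tags currently believed to be on the shelf
    checked_out = set()  # tags that passed the checkout reader
    alerts = []
    for r in readings:
        product = product_of[r.tag_id]
        shelf = on_shelf.setdefault(product, set())
        if r.reader_id == "shelf":
            shelf.add(r.tag_id)
            continue
        left_shelf = r.tag_id in shelf   # read elsewhere, so it is no longer on the shelf
        shelf.discard(r.tag_id)
        if r.reader_id == "checkout":
            checked_out.add(r.tag_id)
        elif r.reader_id == "exit1" and r.tag_id not in checked_out:
            alerts.append(("shoplifting", r.tag_id, r.timestamp))
        if left_shelf and len(shelf) <= threshold:
            alerts.append(("out_of_stock", product, r.timestamp))
    return alerts
```

Real readers miss tags and produce duplicates, so in practice a cleaning step would run before rules like these.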

RFID Processing: Global Tracking
[Figure: tag EF.0A is traced from the manufacturer X Ltd. through the distribution network to the retailer CVS; intermediate steps report temperature measurements (90, 95, 80, 85).]
– Counterfeit drugs: a bottle is accepted at the retailer if it came from a legal manufacturer and followed all necessary steps in the distribution network.
– Expired/spoiled drugs: a bottle is accepted at the retailer if it went through the distribution network in less than 3 months and was never exposed to a temperature > 96 °F.
– Missing pallet, expected case, illegally cloned tags…
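The two acceptance rules can be sketched as a check over one bottle's time-ordered trace. Everything beyond the rules themselves is an assumption here: the Step record, the registry of legal manufacturers, the list of required roles, and approximating "3 months" as 90 days.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Step(NamedTuple):
    site: str
    role: str
    day: date
    temperature_f: float


LEGAL_MANUFACTURERS = {"X Ltd."}                              # hypothetical registry
REQUIRED_ROLES = ["manufacturer", "distributor", "retailer"]  # hypothetical required steps
MAX_TRANSIT = timedelta(days=90)                              # "3 months" approximated as 90 days
MAX_TEMP_F = 96.0


def check_bottle(trace):
    """Return (accepted, reasons) for one bottle's time-ordered trace."""
    reasons = []
    if not trace or trace[0].role != "manufacturer" or trace[0].site not in LEGAL_MANUFACTURERS:
        reasons.append("counterfeit: not from a legal manufacturer")
    roles = iter(s.role for s in trace)
    if not all(required in roles for required in REQUIRED_ROLES):  # in-order subsequence test
        reasons.append("counterfeit: missing distribution step")
    if trace and trace[-1].day - trace[0].day >= MAX_TRANSIT:
        reasons.append("expired: transit took 3 months or more")
    if any(s.temperature_f > MAX_TEMP_F for s in trace):
        reasons.append("spoiled: exposed to more than 96 F")
    return not reasons, reasons
```

The subsequence test consumes a single iterator, so each required role must appear after the previous one, which encodes "followed all necessary steps" in order.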

Challenges in RFID Management
Data-information mismatch
– RFID raw data: (tag id, reader id, timestamp)
– Meaningful information: shoplifting, misplaced inventory, out-of-stocks; expired drugs, spoiled drugs…
Incomplete, inaccurate data
– Readers miss tags
– Readers can pick up tags from overlapping areas
High-volume data
– Readers read constantly, from all tags in range, without line-of-sight
– Can create up to millions of terabytes of data in a single day
Low-latency processing
– Up-to-the-second information, time-critical actions

Research Issues
Real-time event stream processing
– Handling duplicate readings/results
– Data cleaning
– Data compression
Handling incomplete readings
– Inferences in event databases
– Inferences over event streams
Distributed processing
– Real-time anomaly detection
– Distributed inferences
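One simple form of data cleaning, sketched here in Python under stated assumptions, collapses duplicate readings and treats short gaps between readings of the same tag at the same reader as missed reads. Time is assumed to be divided into integer epochs, and max_gap is a hypothetical tuning parameter: a larger value fills more missed reads but also reports a tag as present when it may have left. This is a baseline for comparison, not a method taken from the slides.

```python
from collections import defaultdict


def fill_gaps(epochs, max_gap):
    """Collapse duplicate epochs and fill gaps of at most max_gap missed epochs."""
    epochs = sorted(set(epochs))
    present = []
    for prev, cur in zip(epochs, epochs[1:]):
        present.append(prev)
        if cur - prev - 1 <= max_gap:        # short gap: assume the reader missed the tag
            present.extend(range(prev + 1, cur))
    if epochs:
        present.append(epochs[-1])
    return present


def clean(readings, max_gap=2):
    """readings: iterable of (reader_id, tag_id, epoch) with integer epochs."""
    by_pair = defaultdict(list)
    for reader_id, tag_id, epoch in readings:
        by_pair[(reader_id, tag_id)].append(epoch)
    return {pair: fill_gaps(eps, max_gap) for pair, eps in by_pair.items()}
```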

Adaptive Sensing of the Atmosphere
Environmental monitoring: real-time processing of huge-volume meteorological data
Challenges
– Large volume but limited bandwidth
– Real-time processing
– Uncertain data
– Data archiving and querying the history
[Figure: pipeline Sense → Send → Merge → Detection → Prediction.]

Managing Uncertain Data
Sources of data uncertainty
1) Sensing noise and partial scanning
2) Data compression
3) Lossy wireless links
4) Incomplete merging
Managing uncertain data
– Model the sources of data uncertainty
– Develop an uncertainty calculus to combine the effects of these sources
– Augment results with confidence values
[Figure: uncertainty enters at sensing (1), compression (2), wireless transmission (3), and merging (4), before tornado detection and prediction (with what confidence?).]
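As one possible starting point for an uncertainty calculus, the sketch below assumes every source contributes independent, zero-mean Gaussian error. Under that assumption variances add, two independent estimates of the same quantity can be merged by inverse-variance weighting, and a confidence value for a threshold-based detection can be read off the normal CDF. Real sensing, compression, and merging errors need not be Gaussian or independent, so this is a baseline rather than the project's model.

```python
import math


def combined_sigma(*sigmas):
    """Independent, zero-mean errors: variances add."""
    return math.sqrt(sum(s * s for s in sigmas))


def merge(x1, s1, x2, s2):
    """Inverse-variance weighted merge of two independent estimates of one quantity."""
    w1, w2 = 1 / s1 ** 2, 1 / s2 ** 2
    return (w1 * x1 + w2 * x2) / (w1 + w2), math.sqrt(1 / (w1 + w2))


def confidence_above(x, sigma, threshold):
    """P(true value > threshold), treating the true value as Normal(x, sigma^2)."""
    return 0.5 * (1 - math.erf((threshold - x) / (sigma * math.sqrt(2))))
```

For instance, if sensing noise and compression error have standard deviations 3 and 4, the combined standard deviation is 5, and a detection whose estimate sits exactly at the threshold gets confidence 0.5.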

Managing Uncertain Data
Sources of data uncertainty
1) Sensing noise and partial scanning
2) Data compression
3) Lossy wireless links
4) Incomplete merging
Self-diagnosis and tuning
– Compare the prediction at t with the observation at t+1 (no ground truth?!)
– Diagnose the system when confidence values are low
– Automatically tune the system
[Figure: uncertainty enters at sensing (1), compression (2), wireless transmission (3), and merging (4), before tornado detection and prediction (with what confidence?).]

Questions

Outline
– An outside look: DB application
– An inside look: anatomy of a DBMS
– Project ideas: DB application
– Project ideas: DBMS internals

Application: UMass CS Pub DB
UMass Computer Science Publication Database
– All papers on professors’ web pages and in their DBLP records
– All technical reports
Search
– Catalog search (author, title, year, conference, etc.)
– Text search (using SQL “LIKE”)
Navigation
– Overview of the structure of the document collection
– Area-based “drill down” and “roll up” with statistics
[Example interface: Add document, Top hits.]
Deliverables: useful software, user-friendly interface
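A possible starting point for the database side, sketched with Python's built-in sqlite3 module: a two-table schema, a catalog search on structured fields, a text search with LIKE, and counts that roll up to research areas or drill down to (area, year). The schema, the area column, and the function names are assumptions for illustration; a real system would also need document ingestion and a proper full-text index.

```python
import sqlite3

SCHEMA = """
CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                    venue TEXT, area TEXT, abstract TEXT);
CREATE TABLE author (paper_id INTEGER REFERENCES paper(id), name TEXT);
"""


def catalog_search(conn, author=None, year=None):
    """Catalog search on structured fields (only author and year shown here)."""
    sql = "SELECT DISTINCT p.title FROM paper p JOIN author a ON a.paper_id = p.id WHERE 1 = 1"
    args = []
    if author is not None:
        sql += " AND a.name = ?"
        args.append(author)
    if year is not None:
        sql += " AND p.year = ?"
        args.append(year)
    return [row[0] for row in conn.execute(sql + " ORDER BY p.title", args)]


def text_search(conn, word):
    """Text search with LIKE (substring match; ASCII case-insensitive in SQLite)."""
    pattern = f"%{word}%"
    return [row[0] for row in conn.execute(
        "SELECT title FROM paper WHERE title LIKE ? OR abstract LIKE ? ORDER BY title",
        (pattern, pattern))]


def paper_counts(conn, drill_down=False):
    """Roll up to counts per area, or drill down to counts per (area, year)."""
    if drill_down:
        query = "SELECT area, year, COUNT(*) FROM paper GROUP BY area, year ORDER BY area, year"
    else:
        query = "SELECT area, COUNT(*) FROM paper GROUP BY area ORDER BY area"
    return conn.execute(query).fetchall()
```

LIKE with a leading % cannot use an ordinary index, so it scans every row; that is acceptable for a departmental collection but is one reason to consider a full-text index later.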

Application: RFID Database
RFID technology
RFID supply chain
– Locations: manufacturer, supplier DC, retail DC, retail store
– Objects: cases, pallets, trucks

Application: RFID Database
RFID technology
RFID supply chain
Database propagation
– Streams of (reader_id, tag_id, time)
– Semantics: reader_id → location, tag_id → object
– Containment: location-based; items in a case, cases on a pallet, pallets in a truck…; duration of containment
– History of movement: (object, location, time_in, time_out)
– Data compression for duplicate readings
– Integration with sensors: temperature, location…
Track and trace queries
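The history-of-movement table can be populated by compressing raw readings: consecutive readings of the same tag at the same location extend one (object, location, time_in, time_out) row instead of producing new rows. This Python sketch assumes hypothetical reader-to-location and tag-to-object mappings and a hypothetical max_gap tolerance; containment is left out.

```python
from typing import NamedTuple


class Stay(NamedTuple):
    obj: str
    location: str
    time_in: int
    time_out: int


def to_stays(readings, location_of, object_of, max_gap):
    """Compress (reader_id, tag_id, time) readings into (object, location, time_in, time_out)."""
    open_stay = {}   # tag_id -> the stay currently being extended
    stays = []
    for reader, tag, t in sorted(readings, key=lambda r: r[2]):
        loc, obj = location_of[reader], object_of[tag]
        cur = open_stay.get(tag)
        if cur is not None and cur.location == loc and t - cur.time_out <= max_gap:
            open_stay[tag] = cur._replace(time_out=t)    # duplicate reading: extend the stay
        else:
            if cur is not None:
                stays.append(cur)                        # the object moved: close the old stay
            open_stay[tag] = Stay(obj, loc, t, t)
    stays.extend(open_stay.values())
    return sorted(stays, key=lambda s: (s.obj, s.time_in))


def trace(stays, obj):
    """Track-and-trace: the locations an object passed through, in time order."""
    return [s.location for s in stays if s.obj == obj]
```

In a small run, six readings of one tag at three readers collapse into three history rows, and the trace query then reads the path directly from those rows.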

Data Quality
Closed-world assumption: not any more!
Various sources of data loss
1) Sensing noise
2) Data compression
3) Lossy wireless links
4) Incomplete merging
Probabilistic query processing
– Model the sources of data loss
– Quantify the effect on queries: max(), avg(), percentile…
– Output query results with a confidence level
[Figure: data loss at sensing (1), compression (2), wireless transmission (3), and merging (4).]
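One standard way to drop the closed-world assumption is a tuple-independent probabilistic table: each tuple exists with some probability, independently of the others, and each combination of present tuples is a "possible world". The sketch below assumes that model. P(max > c) has a closed form, whereas avg() is computed here by enumerating possible worlds, which is exact but exponential and only suitable for tiny examples.

```python
from itertools import product


def possible_worlds(tuples):
    """tuples: list of (value, probability that the tuple exists), assumed independent.
    Yields (values present in the world, probability of that world)."""
    for mask in product([True, False], repeat=len(tuples)):
        p, values = 1.0, []
        for (v, pv), present in zip(tuples, mask):
            p *= pv if present else 1 - pv
            if present:
                values.append(v)
        yield values, p


def prob_max_above(tuples, c):
    """max() > c unless every tuple with value > c is missing."""
    q = 1.0
    for v, p in tuples:
        if v > c:
            q *= 1 - p
    return 1 - q


def expected_avg(tuples):
    """E[avg()] conditioned on a non-empty result, by enumeration."""
    num = den = 0.0
    for values, p in possible_worlds(tuples):
        if values:
            num += p * sum(values) / len(values)
            den += p
    return num / den
```

Returning such values next to the answer (for example "max temperature above 95 with probability 0.5") is one concrete form of "output query results with a confidence level".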

Some ideas on INFOD/data dissemination