Predictive Modeling in Data Management
Byung S. Lee
Computer Science, University of Vermont



Cost UDF Overview
- Funding: US Department of Energy.
- Title: Generating Cost Functions of User-Defined Functions.
- Phase 1: preliminary studies.
- Phase 2: core modeling techniques.
- Phase 3: applications.

Cost UDF Problem
- How long would this one take to run? [figure: a UDF]

Phase 1
Approaches:
- Off-line training with cost data sets generated in the same batch.
- On-line training with cost data sets generated in incremental batches (a.k.a. self-tuning).
Models:
- Parametric or nonparametric regression.
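The difference between the two training approaches can be sketched with a small parametric model. The example below is only an illustration under stated assumptions: it assumes a single cost variable x (for instance, the number of data points a UDF touches) and a linear cost model, and the class name `LinearCostModel` is hypothetical, not something the slides define. Because the model keeps only running sums, the same `update` call serves both off-line training (one batch) and on-line training (incremental batches).

```python
class LinearCostModel:
    """Least-squares fit of cost = a + b * x, maintained from running sums.

    Hypothetical illustration: the slides do not specify the regression form.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, batch):
        """Fold a batch of (x, observed_cost) pairs into the running sums."""
        for x, y in batch:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (a, b) of the current least-squares line."""
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b

    def predict(self, x):
        a, b = self.coefficients()
        return a + b * x


# Synthetic cost data following cost = 2 + 3 * x (made up for the example).
data = [(float(x), 2.0 + 3.0 * x) for x in range(1, 7)]

offline = LinearCostModel()
offline.update(data)          # off-line: the whole data set in one batch

online = LinearCostModel()
online.update(data[:3])       # on-line: first incremental batch
online.update(data[3:])       # on-line: later batch refines the same model
```

Both models end up with the same fit here; the on-line one simply had a usable (if rougher) model after the first batch.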

Phase 1
UDFs:
- Financial time series aggregate functions:
    median(time series, start date, end date)
    nth moving window average(time series, start date, end date, window size)
- Keyword-based text search functions:
    “dog AND cat”
    “dog OR cat”
    “dog cat” within w words apart
- Spatial search operators:
    range(ref_point, distance)
    window(lower_left_point, upper_right_point)
    KNN(ref_point, K)
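To make the time series UDFs concrete, here is a minimal sketch of the two aggregate functions, plus a helper that turns one UDF call into a cost data point. Everything beyond the argument lists on the slide is an assumption: the series is taken to be a list of (date, value) pairs, the moving window average is assumed to return the average of every full window in the date range, and the names `median_udf`, `moving_window_average`, and `observe_cost` are hypothetical.

```python
import time
from datetime import date
from statistics import median


def _values_in_range(series, start, end):
    """Values whose date falls in [start, end]; series is a list of (date, value)."""
    return [v for d, v in series if start <= d <= end]


def median_udf(series, start, end):
    """Median of the values between start and end (inclusive)."""
    return median(_values_in_range(series, start, end))


def moving_window_average(series, start, end, window_size):
    """Averages of every full window of window_size consecutive values in the range."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    vals = _values_in_range(series, start, end)
    return [
        sum(vals[i:i + window_size]) / window_size
        for i in range(len(vals) - window_size + 1)
    ]


def observe_cost(udf, *args):
    """Run a UDF once and return (result, elapsed_seconds) as one cost observation."""
    t0 = time.perf_counter()
    result = udf(*args)
    return result, time.perf_counter() - t0


# Made-up series: one value per day for a week.
series = [(date(2024, 1, d), float(d)) for d in range(1, 8)]
med = median_udf(series, date(2024, 1, 2), date(2024, 1, 6))
avgs = moving_window_average(series, date(2024, 1, 1), date(2024, 1, 5), 3)
result, seconds = observe_cost(median_udf, series, date(2024, 1, 1), date(2024, 1, 7))
```

A cost model would be trained on pairs such as (number of points in the date range, `seconds`), since the elapsed time depends on the UDF arguments.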

Phase 2
Approaches:
- On-line training with cost data points generated one at a time.
- Assume limited main memory.
Models:
- Nonparametric techniques using multidimensional index structures.

Phase 2
Core modeling techniques:
- Incremental edited k nearest neighbors.
- Memory-limited quadtrees.
- Dr. Zhen He will give a quick overview of the recent phase 2 efforts.
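The slides do not describe the incremental edited k nearest neighbors technique in detail, so the following is only a rough sketch of the general idea under assumptions, not the project's algorithm. It assumes that "edited" means a new cost data point is stored only when the current model mispredicts it by more than a tolerance, and that the memory limit is enforced by evicting the oldest stored point. The class name `EditedKNNCostModel` and its parameters are hypothetical, and a linear scan stands in for the multidimensional index structure mentioned earlier.

```python
import math
from collections import deque


class EditedKNNCostModel:
    """k-NN cost predictor trained one point at a time under a memory cap.

    Assumptions (not from the slides): store a point only if it is
    mispredicted by more than `tolerance` (relative error), and drop the
    oldest stored point when `capacity` is reached.
    """

    def __init__(self, k=3, capacity=100, tolerance=0.1):
        self.k = k
        self.tolerance = tolerance
        self.points = deque(maxlen=capacity)  # deque evicts the oldest entry

    def predict(self, x):
        """Average cost of the k stored points nearest to x, or None if empty."""
        if not self.points:
            return None
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)

    def observe(self, x, cost):
        """Feed one cost data point; return True if it was stored."""
        guess = self.predict(x)
        if guess is None or abs(guess - cost) > self.tolerance * max(abs(cost), 1e-12):
            self.points.append((tuple(x), cost))
            return True
        return False  # already predicted well enough, so memory is saved


model = EditedKNNCostModel(k=1, capacity=2, tolerance=0.1)
stored = [
    model.observe((0, 0), 10.0),   # first point, always stored
    model.observe((0, 1), 10.5),   # predicted as 10.0, within 10%: skipped
    model.observe((5, 5), 50.0),   # badly mispredicted: stored
    model.observe((9, 9), 90.0),   # stored, evicting (0, 0) at capacity 2
]
```

Skipping well-predicted points is what keeps the stored set small; the eviction policy is the simplest possible choice and could be replaced by something smarter.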

Phase 3
- Additional core modeling techniques.
- Abstraction of the problem to “efficient adaptive predictive modeling of incremental data.”
- Applications that need:
    value predictions
    class predictions