Published by Janel Haynes. Modified over 9 years ago.
1
TCU Dept. of Computer Science CRESCENT
Database Issues in Smart Homes
Pervasive Intelligent Environments
Spring 2004
March 2, 2004
2
Topics: Lecture 3
Preparing for prediction & decision making: Data Mining/KDD
An example of some of the issues we’ve discussed
–“Towards Sensor Database Systems”, Bonnet, Gehrke, Seshadri
Data mining taken from Elmasri & Navathe, 4th edition
3
Data Warehouses (1 more thing)
Repositories for data mining activities
–Aggregates/summaries of data help efficiency
Optimized for decision support, not transaction processing
Definition (Elmasri, page 900)
–“A subject-oriented, integrated, non-volatile, time-variant collection of data in support of management’s decisions”
Replace “management” with “smart home agents”
4
Data Mining Definition
Discovery of new information in terms of patterns or rules from vast amounts of data
Extracts patterns that can’t readily be found by asking the right questions (queries)
–TOO MUCH DATA FOR HUMANS
Emerged from
–Artificial Intelligence: machine learning, neural nets, genetic algorithms
–Statistics
–Operations Research
5
6 STEPS TO DM (some may be done as part of warehouse creation)
Data selection: pick the data needed
Data cleansing
–Fix bad data (e.g., spelling, zip codes)
–Hard to deal with missing, erroneous, conflicting, redundant data
Enrichment
–Add data (e.g., age, gender, income)
Data transformation
–Aggregate (e.g., zip codes into regions)
Data mining
Reporting on discovered knowledge
6
Types of results
Association rules
–Buy diapers → buy lots of beer
Sequential patterns
–Buy house → buy furniture within months
Classification trees
–Types of buyers (upscale, bargain-conscious, …)
Why do it?
–Make more money
–Science & medicine
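An association rule such as the diapers/beer example is usually scored by two standard measures: support (the fraction of transactions containing every item in the rule) and confidence (of the transactions containing the left-hand side, the fraction that also contain the right-hand side). The slide does not define either measure, so the sketch below is only an illustration; the function names and the transaction data are made up.

```python
from typing import FrozenSet, List

# Hypothetical market-basket data; each transaction is one shopper's basket.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer"}),
    frozenset({"milk"}),
]


def itemset_support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
                    transactions: List[FrozenSet[str]]) -> float:
    """support(lhs and rhs together) divided by support(lhs); 0.0 if lhs never occurs."""
    lhs_support = itemset_support(lhs, transactions)
    if lhs_support == 0.0:
        return 0.0
    return itemset_support(lhs | rhs, transactions) / lhs_support


diapers = frozenset({"diapers"})
beer = frozenset({"beer"})
print("support:", itemset_support(diapers | beer, TRANSACTIONS))
print("confidence:", rule_confidence(diapers, beer, TRANSACTIONS))
```

With this toy data, two of the five baskets hold both items and three hold diapers, so the rule has support 2/5 and confidence 2/3.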
7
DM/KDD Goals
Find patterns to predict future events
Find major groupings
–Groupings of buyers, stars, diseases …
Find which group something belongs to
–Creditworthiness
8
What are we learning?
Association rules
Classification hierarchies
Clustering
Sequential patterns
Patterns within time series
Type of result, inputs & algorithms vary
Often interested in some combination of these types of knowledge
9
Clustering
–Unsupervised learning techniques
    Training samples are unclassified
    Vs. supervised learning (classification)
–Drug categories for depression
–Categories of TV viewers
–Categories of buyers (likely, unlikely)
–Categories of households?
    Single male, mother/children, conventional (M/D/kids), DINKs
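The slide does not name a clustering algorithm. As one common illustration of grouping unclassified samples, here is a minimal one-dimensional k-means sketch. The choice of k-means, the function names, the starting centers, and the "hours of TV per week" data are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def assign(values: Sequence[float], centers: Sequence[float]) -> List[List[float]]:
    """Put each value in the cluster of its nearest center."""
    clusters: List[List[float]] = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values: Sequence[float], initial_centers: Sequence[float],
              iterations: int = 10) -> Tuple[List[float], List[List[float]]]:
    """Alternate between assigning values and moving each center to its cluster mean."""
    centers = list(initial_centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(values, centers)


# Hypothetical data: hours of TV watched per week by six viewers (no labels given).
hours = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(hours, initial_centers=[0.0, 5.0])
print(centers, clusters)
```

No sample carries a class label here; the "light" and "heavy" viewer groups emerge only from the distances between values, which is the contrast with supervised classification drawn on the slide.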
10
Sequential patterns
Detecting associations among events with certain temporal relationships
Example:
–Cardiac bypass for blocked arteries
–AND within 18 months, high blood urea
–THEN kidney failure likely in next 18 months
Particularly important in smart homes
11
Sequential Pattern Discovery
Sequence of itemsets
–Grocery store purchases by 1 person (3 itemsets)
    {soy milk, bread, chocolate}, {bananas, chocolate}, {lettuce, tomato, chocolate}
2 subsequences
–{soy milk, bread, chocolate}, {bananas, chocolate}
–{bananas, chocolate}, {lettuce, tomato, chocolate}
12
Sequential pattern discovery
The support for a sequence S is the % of the given set U of sequences of which S is a subsequence.
–That is: how often does S show up?
Find all subsequences from the given sets of sequences that have at least a user-defined minimum support.
The sequence S1, S2, …, Sn is a predictor of the “fact” that a customer who buys itemset S1 is likely to buy itemset S2, then S3, …
Prediction support is based on the frequency of this sequence in the past
Many research issues in creating good algorithms
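The support definition above can be sketched in a few lines. One assumption is needed, since the slide does not spell out what "subsequence" means for itemsets: here a pattern matches if its itemsets appear in order (gaps allowed) and each pattern itemset is a subset of the itemset it is matched against. The function names and the three customer histories are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def is_subsequence(pattern: Sequence[Itemset], sequence: Sequence[Itemset]) -> bool:
    """True if the pattern's itemsets occur in order within `sequence`.

    Assumption: gaps are allowed and a pattern itemset matches any itemset
    that contains it. Greedy earliest matching is sufficient for this test.
    """
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def sequence_support(pattern: Sequence[Itemset], sequences: List[Sequence[Itemset]]) -> float:
    """Fraction of the sequences in U (`sequences`) of which `pattern` is a subsequence."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


# Hypothetical purchase histories; the first is the shopper from the earlier slide.
U: List[List[Itemset]] = [
    [frozenset({"soy milk", "bread", "chocolate"}),
     frozenset({"bananas", "chocolate"}),
     frozenset({"lettuce", "tomato", "chocolate"})],
    [frozenset({"bread"}), frozenset({"bananas", "chocolate"})],
    [frozenset({"chocolate"}), frozenset({"lettuce"})],
]

pattern = [frozenset({"bread"}), frozenset({"bananas"})]
print(sequence_support(pattern, U))  # 2 of the 3 histories contain the pattern
```

A mining algorithm would enumerate candidate patterns and keep those whose support meets the user-defined minimum; this sketch only scores one given pattern.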
13
Patterns within time series
Finding 2 patterns that occur over time
–2003 stock prices of Choice Homes and Home Depot
–2 products show same sales pattern in summer but a different one in winter
–Solar magnetic wind patterns may predict earth atmospheric changes
14
Time series pattern discovery
Time series are sequences of events
–Event could be a transaction (closing daily stock price)
–Look at sequences over n days, or
–Longest period in which change is no greater than 1%
Comparing
–Must define similarity measures
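Two of the ideas above can be made concrete with a small sketch. The slide leaves both underspecified, so the following are assumptions: "change is no greater than 1%" is read as the day-to-day relative change between consecutive closing prices, and the similarity measure is plain Euclidean distance between two equal-length series after scaling each by its first value. All names and prices are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def longest_stable_run(prices: Sequence[float], tolerance: float = 0.01) -> Tuple[int, int]:
    """Inclusive (start, end) indices of the longest run in which every
    day-to-day relative change is at most `tolerance`. Assumes positive prices."""
    if not prices:
        raise ValueError("prices must not be empty")
    best = (0, 0)
    start = 0
    for i in range(1, len(prices)):
        change = abs(prices[i] - prices[i - 1]) / abs(prices[i - 1])
        if change > tolerance:
            start = i  # the run is broken; a new one begins at day i
        if i - start > best[1] - best[0]:
            best = (start, i)
    return best


def scale_to_first(series: Sequence[float]) -> List[float]:
    """Express each value relative to the first so that price levels do not matter."""
    return [x / series[0] for x in series]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One possible similarity measure: smaller distance means more similar."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


closing = [100.0, 100.5, 101.0, 110.0, 110.2, 110.1, 110.3, 110.0]
print(longest_stable_run(closing))
print(euclidean_distance(scale_to_first([10.0, 20.0]), scale_to_first([100.0, 200.0])))
```

Other similarity measures (correlation, dynamic time warping) would slot into the same place as `euclidean_distance`; which one is appropriate depends on the application.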
15
Other approaches in DM/KDD
Neural nets
–Infer a function from a set of examples
    Non-parametric curve-fitting
    Interpolates to solve new problems
–Supervised & unsupervised algorithms
–Classification
–Time series
–Can’t see what it learned (not declarative)
16
Other approaches in DM/KDD
Genetic algorithms
–Set up
    Representation (strings over an alphabet)
    Evaluation (fitness) function
    Parameters: # of generations, cross-over rate, mutation rate, etc.
–Randomized (probabilistic operators), parallel search over search space
–Used for problem solving and clustering
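The three set-up ingredients on this slide map directly onto code. The sketch below is a toy under stated assumptions, not a method taken from the lecture: the representation is a bit string (alphabet {0, 1}), the fitness function simply counts 1s, and selection uses a small tournament with the best individual copied unchanged into each new generation. All names and default parameter values are illustrative.

```python
import random
from typing import List, Tuple

Bits = List[int]


def fitness(bits: Bits) -> int:
    """Evaluation function for the toy problem: number of 1s in the string."""
    return sum(bits)


def tournament(pop: List[Bits], rng: random.Random, k: int = 3) -> Bits:
    """Pick k random individuals and return the fittest of them."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a: Bits, b: Bits, rng: random.Random) -> Bits:
    """Single-point crossover: prefix of a joined with suffix of b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits: Bits, rate: float, rng: random.Random) -> Bits:
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(length: int = 20, pop_size: int = 30, generations: int = 40,
           crossover_rate: float = 0.7, mutation_rate: float = 0.02,
           seed: int = 0) -> Tuple[Bits, List[int]]:
    """Return the best string found and the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        next_pop = [best]  # elitism: the current best survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            child = crossover(p1, p2, rng) if rng.random() < crossover_rate else list(p1)
            next_pop.append(mutate(child, mutation_rate, rng))
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
print(fitness(best), history[0], history[-1])
```

The population plays the role of the "parallel search" on the slide: many candidate strings are evaluated and recombined in each generation rather than a single candidate being improved step by step.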
17
Sensor DB Article
Design
–Distributed vs warehouse approach
–Sensor data
    Measurement uncertainty, communications failures
    Data representation
Data model
–Relational + sensor descriptions, including location
–Special representation for sensor sequences
    ADT attribute represents sensor data as output of ADT functions
18
Sensor DB Article: Queries
Sample queries/characteristics (2nd page) and sample extended SQL (3.1)
Long-running (continuous) queries
–Incremental queries retrieve all data over a t-second interval, repeated every t seconds; take the union of the results
–WHERE $every() in SQL
Aggregates over time windows
Virtual joins for ADT (slow) functions
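The incremental-query idea (evaluate over one t-second interval, repeat every t seconds, take the union) and the time-window aggregate can be simulated over an in-memory list of readings. This is only a sketch: it does not reproduce the article's extended SQL or its `$every()` syntax, and the `Reading` type, function names, and sample data are assumptions for the example. Integer timestamps are used to keep window boundaries exact.

```python
from typing import Callable, List, NamedTuple, Optional


class Reading(NamedTuple):
    time: int        # seconds since the query started
    sensor_id: str
    value: float     # e.g. temperature


def incremental_query(readings: List[Reading], predicate: Callable[[Reading], bool],
                      start: int, end: int, t: int) -> List[Reading]:
    """Evaluate `predicate` over consecutive half-open windows [w, w + t)
    and return the union of the per-window results."""
    result: List[Reading] = []
    w = start
    while w < end:
        w_end = min(w + t, end)
        # One "execution" of the query: only data from this interval is examined.
        result.extend(r for r in readings if w <= r.time < w_end and predicate(r))
        w = w_end
    return result


def window_averages(readings: List[Reading], start: int, end: int,
                    t: int) -> List[Optional[float]]:
    """Average reading value per t-second window; None for a window with no data."""
    averages: List[Optional[float]] = []
    w = start
    while w < end:
        w_end = min(w + t, end)
        values = [r.value for r in readings if w <= r.time < w_end]
        averages.append(sum(values) / len(values) if values else None)
        w = w_end
    return averages


READINGS = [
    Reading(0, "s1", 20.0),
    Reading(4, "s1", 22.0),
    Reading(11, "s1", 30.0),
    Reading(15, "s2", 18.0),
    Reading(19, "s1", 32.0),
]

hot = incremental_query(READINGS, lambda r: r.value > 21.0, start=0, end=20, t=10)
print([r.time for r in hot])
print(window_averages(READINGS, start=0, end=20, t=10))
```

In a real sensor database the readings would arrive as a stream and each window would be evaluated as it closes; the loop here stands in for that repetition.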