1
Repository and Mining of Temporal Data
Philip Chan
2
Motivation/Opportunity
Plenty of data, but:
- From different sources
- In different formats
- Not easy to find
Repositories collect different datasets, but:
- Integrating data from different datasets is not easy
- Integrating data from different repositories is not easy
3
Sample Scenario 1
Reports in pdf/xls/… on:

Variable                  Granularity     Source
Enrollment                semester        Enrollment Management
Retention rates           year
Mid-term failure rates                    Academic Support
Degree program outcomes                   Department
SAT/ACT scores                            Admission
Assignment submission     week/month/…    Instructor
Graduation rates
…
4
Sample Scenario 1 (continued)
Consider mid-term failure rates:
- Is there a significant change this semester?
- Which variables are strongly correlated?
  - Assignment submission rates, pre-test scores, class attendance, grades of prerequisite courses, SAT/ACT scores? (potential causes)
  - Degree program outcomes, retention, enrollment, graduation (potential effects)
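The slides do not say how a "significant change" would be tested. As one possible illustration, the sketch below flags the current semester's value when it lies more than a chosen number of sample standard deviations from the mean of past semesters (a simple z-score rule). The function name, the threshold of 2.0, and the sample rates are all assumptions made for this example.

```python
import statistics


def is_significant_change(history, current, threshold=2.0):
    """Flag `current` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score rule)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # needs at least two past values
    if stdev == 0:
        # All past values are identical: any deviation counts as a change.
        return current != mean
    return abs(current - mean) / stdev > threshold


# Made-up mid-term failure rates for past semesters, for illustration only.
past_failure_rates = [0.11, 0.12, 0.10, 0.13, 0.12]

jump = is_significant_change(past_failure_rates, 0.20)
steady = is_significant_change(past_failure_rates, 0.12)
```

A rule like this ignores trend and seasonality, so it is only a starting point for semester-level data.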
5
Sample Scenario 2
Global warming:
- Population, cars, deforestation, industrial emissions, CO2, … (possible causes)
- Extreme storms, sea level rise, … (possible effects)
6
Sample Scenario 3
Approval rating of President Trump:
- Tweets, executive orders, media coverage, unemployment rate, GDP, stock prices, … (possible causes)
- Approval ratings of cabinet members, … (possible effects)
7
Approach
Web-based repository of temporal data (i.e., with timestamps)
- One large, virtually integrated dataset
Users
- Data providers
  - e.g., ASC, Admissions, Registrar, Departments, Instructors, …
  - usually generate/collect data in digital format
- Data customers
  - could be the data providers and/or data scientists/miners/analysts
8
Data Providers
Upload/import data
- In different spreadsheet formats
  - initially CSV
  - timestamps could be in the first column or row
  - initially numeric values
Metadata
- Description of each variable (in a data set)
- Provider/owner
- Access:
  - public (available to all)
  - or private (provider/owner only)
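The upload rules above (CSV, timestamps in the first column or the first row, numeric values only) could be handled by a loader along these lines. This is a minimal sketch, not the system's actual importer: the function name `load_temporal_csv`, the `timestamps_in` parameter, the assumption of ISO 8601 timestamps, and the sample data are all hypothetical.

```python
import csv
import io
from datetime import datetime


def load_temporal_csv(text, timestamps_in="column"):
    """Parse CSV text into {variable_name: [(timestamp, value), ...]}.

    timestamps_in="column": the header row holds variable names and the
                            first column holds timestamps.
    timestamps_in="row":    the first row holds timestamps and the first
                            column holds variable names.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if timestamps_in == "row":
        # Transpose so both layouts share the same code path below.
        rows = [list(col) for col in zip(*rows)]
    elif timestamps_in != "column":
        raise ValueError("timestamps_in must be 'column' or 'row'")

    header, body = rows[0], rows[1:]
    series = {name: [] for name in header[1:]}
    for row in body:
        stamp = datetime.fromisoformat(row[0])  # assumed ISO 8601 timestamps
        for name, cell in zip(header[1:], row[1:]):
            series[name].append((stamp, float(cell)))  # numeric values only
    return series


# Made-up sample data, for illustration only; both layouts hold the same values.
by_column = "date,enrollment,retention\n2016-09-01,5200,0.81\n2017-09-01,5350,0.83\n"
by_row = "date,2016-09-01,2017-09-01\nenrollment,5200,5350\nretention,0.81,0.83\n"

from_column = load_temporal_csv(by_column)
from_row = load_temporal_csv(by_row, timestamps_in="row")
```

Transposing the row-oriented layout first keeps a single parsing path, so any later validation (for example, rejecting non-numeric cells) only has to be written once.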
9
Data Customers
Specify a target variable; tasks include:
- Detect significant changes
- Identify the top-k variables with the highest correlation
- Build models to characterize/forecast the target variable
- Visualization for the above tasks
Search for variables to be included
Download/export data in spreadsheet formats (for tasks not supported by the system)
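The top-k correlation task can be sketched as follows. The slides do not name a correlation measure; this example assumes Pearson correlation ranked by absolute value, and assumes the series are already aligned on the same timestamps. The function names, variable names, and values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def top_k_correlated(series, target, k):
    """Return the k (name, r) pairs with the largest |r| against the target."""
    scores = [
        (name, pearson(values, series[target]))
        for name, values in series.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)[:k]


# Made-up, already time-aligned values, for illustration only.
series = {
    "midterm_failure_rate": [0.10, 0.12, 0.15, 0.20],
    "assignment_submission_rate": [0.95, 0.90, 0.85, 0.70],
    "class_attendance": [0.90, 0.91, 0.80, 0.78],
    "enrollment": [5200, 5100, 5300, 5250],
}
top = top_k_correlated(series, "midterm_failure_rate", k=2)
```

Ranking by absolute value keeps strongly negative relationships (such as falling submission rates alongside rising failure rates) in the result. Correlation alone does not separate potential causes from potential effects.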