Repository and Mining of Temporal Data


1 Repository and Mining of Temporal Data
Philip Chan

2 Motivation/Opportunity
Plenty of data, but:
- From different sources
- In different formats
- Not easy to find
Repositories collect different datasets, but:
- integrating data from different datasets is not easy
- integrating data from different repositories is not easy

3 Sample Scenario 1
Reports in pdf/xls/… on:

Variable                  Granularity     Source
Enrollment                semester        Enrollment Management
Retention rates           year
Mid-term failure rates                    Academic Support
Degree program outcomes                   Department
SAT/ACT scores                            Admission
Assignment submission     week/month/…    Instructor
Graduation rates

4 Sample Scenario 1 (continued)
Consider mid-term failure rates:
- Is there a significant change this semester?
- Which variables are strongly correlated with them?
  - Assignment submission rates, pre-test scores, class attendance, grades of pre-requisite courses, SAT/ACT? (potential causes)
  - Degree program outcomes, retention, enrollment, graduation (potential effects)
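One simple way to make "significant change" concrete is to compare this semester's value with the spread of past semesters. The sketch below is only an illustration: the z-score test, the threshold of 2, the function name `is_significant_change`, and the sample rates are all assumptions, not something the slides specify.

```python
from statistics import mean, stdev

def is_significant_change(history, latest, threshold=2.0):
    """Return (flag, z), where z is the number of sample standard
    deviations by which `latest` differs from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any deviation at all counts as a change.
        changed = latest != mu
        return changed, (float("inf") if changed else 0.0)
    z = (latest - mu) / sigma
    return abs(z) >= threshold, z

# Hypothetical mid-term failure rates (%) for six past semesters.
past_rates = [12.0, 11.5, 12.5, 12.0, 11.0, 13.0]
flag, z = is_significant_change(past_rates, latest=16.0)
```

With this made-up history, a rate of 16% lies well outside the usual spread and is flagged, while a rate of 12.2% is not. A real system might prefer a test that accounts for trend or seasonality.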

5 Sample Scenario 2
Global warming:
- Population, cars, deforestation, industrial emissions, CO2, … (possible causes)
- Extreme storms, sea level rise, … (possible effects)

6 Sample Scenario 3
Approval rating of President Trump:
- Tweets, executive orders, media coverage, unemployment rate, GDP, stock prices, … (possible causes)
- Approval ratings of cabinet members, … (possible effects)

7 Approach
Web-based repository of temporal data (i.e., with timestamps)
- One large virtually integrated dataset
- Users
  - Data providers
    - e.g. ASC, Admissions, Registrar, Departments, Instructors, …
    - usually generate/collect data in digital format
  - Data customers
    - could be the data providers and/or data scientists/miners/analysts
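Treating variables with different granularities (weekly, monthly, per semester) as one integrated dataset implies bringing them to a common time scale at some point. The slides do not say how this would be done. The sketch below assumes one possible approach, averaging finer-grained observations into calendar months; the function name `to_monthly` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def to_monthly(points):
    """Aggregate (date, value) pairs into {(year, month): mean value}."""
    buckets = defaultdict(list)
    for stamp, value in points:
        buckets[(stamp.year, stamp.month)].append(value)
    # Sort the keys so the result is in chronological order.
    return {key: mean(values) for key, values in sorted(buckets.items())}

# Hypothetical weekly assignment-submission rates (%).
weekly = [
    (date(2017, 1, 2), 10.0),
    (date(2017, 1, 9), 20.0),
    (date(2017, 2, 6), 30.0),
]
monthly = to_monthly(weekly)
```

Averaging suits rates. For counts such as enrollment, a sum or the last observed value may be the better choice.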

8 Data Providers
- Upload/import data
  - In different spreadsheet formats (initially csv)
  - timestamps could be in the first column or row
  - initially numeric values
- Meta data
  - Description of each variable (in a data set)
  - Provider/owner
  - Access: public (available to all) or private (provider/owner only)
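Since timestamps may sit in either the first column or the first row, an importer has to detect the orientation before reading values. The following is a minimal sketch using only the standard library. It assumes ISO-formatted dates (YYYY-MM-DD), numeric cells, and a header naming each variable; `load_series` and the sample data are hypothetical.

```python
import csv
import io
from datetime import date

def _is_date(text):
    """True if `text` parses as an ISO date such as 2017-01-15."""
    try:
        date.fromisoformat(text.strip())
        return True
    except ValueError:
        return False

def load_series(csv_text):
    """Parse CSV text into {variable: [(date, value), ...]}."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # If the first row (after its corner cell) holds dates, the timestamps
    # run along that row, so transpose to the column-oriented layout.
    if len(rows[0]) > 1 and all(_is_date(cell) for cell in rows[0][1:]):
        rows = [list(column) for column in zip(*rows)]
    header, body = rows[0], rows[1:]
    names = [name.strip() for name in header[1:]]
    series = {name: [] for name in names}
    for row in body:
        stamp = date.fromisoformat(row[0].strip())
        for name, cell in zip(names, row[1:]):
            if cell.strip():  # skip empty cells
                series[name].append((stamp, float(cell)))
    return series

# The same hypothetical data in both orientations.
by_column = "date,enrollment,retention\n2016-09-01,100,0.8\n2017-01-15,110,0.82\n"
by_row = "variable,2016-09-01,2017-01-15\nenrollment,100,110\nretention,0.8,0.82\n"
```

Both layouts load into the same structure, so later steps need not care how the provider arranged the spreadsheet.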

9 Data Customers
- Specify a target variable; tasks include:
  - Detect significant changes
  - Identify top-k variables with the highest correlation
  - Build models to characterize/forecast the target variable
  - Visualization for the above tasks
- Search for variables to be included
- Download/export data in spreadsheet formats (for tasks not supported by the system)
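The top-k task can be sketched as ranking every other variable by the absolute value of its correlation with the target over their shared timestamps. The slides do not name a correlation measure, so Pearson correlation is an assumption here, as are the function names, the minimum of three shared points, and the sample data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def top_k_correlated(series, target, k=3):
    """Rank variables by |correlation| with `target`.

    `series` maps variable name -> {timestamp: value}."""
    target_values = series[target]
    scores = []
    for name, values in series.items():
        if name == target:
            continue
        shared = sorted(set(target_values) & set(values))
        if len(shared) < 3:  # too few common points to compare
            continue
        r = pearson([target_values[t] for t in shared],
                    [values[t] for t in shared])
        scores.append((name, r))
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:k]

# Hypothetical per-semester values.
semesters = ["2015F", "2016S", "2016F", "2017S"]
data = {
    "midterm_failure_rate": dict(zip(semesters, [10, 12, 14, 16])),
    "assignment_submission": dict(zip(semesters, [90, 85, 80, 75])),
    "class_attendance": dict(zip(semesters, [80, 82, 79, 81])),
    "sat_score": dict(zip(semesters, [1100, 1090, 1095, 1070])),
}
top2 = top_k_correlated(data, "midterm_failure_rate", k=2)
```

Ranking by absolute value keeps strong negative relationships, such as falling submission rates alongside rising failure rates, at the top of the list. Correlation alone does not separate the potential causes from the potential effects.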

