Published by Darlene Malone. Modified over 9 years ago.
1
Rapid End-User Programming and Visualization for the Web
Prof. Jason Hong, Carnegie Mellon University
IDA Session 5
2007 CS Study Panel
24 April 2008
2
Principal Investigator
Jason Hong
Assistant Professor, Human-Computer Interaction Institute, Carnegie Mellon University
PhD: University of California, Berkeley
Research Areas
End-User Programming
–Extracting and visualizing data from web
Usable Privacy and Security
–Anti-phishing (training, detection)
–Managing privacy and security policies
Mobile Computing
–Location-based services
–Context-aware computing
Potential Military Applications
–Tools for rapidly integrating data and web services
–Better visualizations of large data sets
–Effective training for security
–Automated algorithms for detecting phishing scams
–Better interfaces for managing security
Contact Information
School of Computer Science
Carnegie Mellon University
2504D Newell-Simon Hall
5000 Forbes Ave
Tel: (412) 268 1251
Fax: (412) 268 1266
E-mail: jasonh@cs.cmu.edu
Web: http://www.cs.cmu.edu/~jasonh
3
30,000-Foot View
High-level problems observed:
–Stovepipes - Data and services spread over multiple systems
–Agility - Integration takes months or years
–Overload - Too much information to easily process
Goal: Make it easy for people to visualize and process data gathered from a variety of sources
–Information extraction + visualization + machine learning
–No PhD required
Analogies:
–Spreadsheets
–Visual Basic
4
Mashups as Key Focus Area
More specifically, provide an end-user programming tool that makes it easy to create mashups
–Mashups are applications that combine content and services from multiple web sites
–Ex. Craigslist.com + GoogleMaps = Housingmaps.com
8
Other Example Mashups
Other example mashups
–Ex. MySpace child predators
–Ex. Locations of friends on MySpace or Facebook
Common themes
–Aggregating multiple sources (web pages, databases, etc)
–Handling multiple data formats (not designed to be shared)
–Processing the data (filtering, summarizing, etc)
–Supporting multiple forms of output (graphs, maps, lists)
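The common themes above can be illustrated with a minimal sketch in the spirit of the Craigslist + map example. It is not code from Haggis or Housingmaps: the sample listings, the neighborhood coordinates (illustrative values only), and the `build_markers` helper are all hypothetical. It aggregates two sources in different formats (CSV and JSON), filters the data, and produces marker records that a map view could plot.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two independent web sources.
LISTINGS_CSV = """title,price,neighborhood
Sunny 1BR,950,Shadyside
Studio near campus,700,Oakland
Large 3BR house,1800,Squirrel Hill
"""

# Coordinates are illustrative values, not authoritative locations.
NEIGHBORHOODS_JSON = """[
  {"name": "Shadyside", "lat": 40.4549, "lon": -79.9339},
  {"name": "Oakland", "lat": 40.4417, "lon": -79.9562}
]"""


def build_markers(listings_csv, neighborhoods_json, max_price):
    """Join listings with coordinates; keep those at or under max_price."""
    coords = {
        n["name"]: (n["lat"], n["lon"]) for n in json.loads(neighborhoods_json)
    }
    markers = []
    for row in csv.DictReader(io.StringIO(listings_csv)):
        price = int(row["price"])
        location = coords.get(row["neighborhood"])
        # Skip listings that are too expensive or cannot be placed on the map.
        if price <= max_price and location is not None:
            markers.append({
                "label": f'{row["title"]} (${price})',
                "lat": location[0],
                "lon": location[1],
            })
    return markers


markers = build_markers(LISTINGS_CSV, NEIGHBORHOODS_JSON, max_price=1000)
for m in markers:
    print(m["label"], m["lat"], m["lon"])
```

Even this toy version touches parsing, joining, filtering, and output, which is the kind of work an end-user tool would need to hide behind wizards.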
9
Creating Mashups is Difficult
Requires lots of skill to create a mashup
–Ex. Housingmaps creator has PhD in computer science
–Ex. MySpace predator list took months of custom coding
Requires programming expertise in many areas
–Web crawling
–Text parsing and pattern matching
–Web services (WSDL and REST)
–Databases
–HTML
Can we accelerate this process to a matter of days or hours for non-experts?
10
End-User Programming
Haggis, an end-user programming tool
1. Rapidly extract and combine data from multiple sources
2. Quickly create high-quality interfaces and visualizations
3. Use programming-by-example techniques to specify what is normal and what is anomalous
12
1. Extract data from multiple sources
Improved wizards for extracting data from web pages
–Can specify example of desired links, system generalizes
–Better support for other patterns on the web: tables, street addresses, etc
Support for real-time data
–Weather, traffic, stocks, any web page periodically updated
–Sensor Andrew, a sensor network being deployed at CMU: electrical usage, water usage, etc
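One simple way to picture "specify example of desired links, system generalizes" is sketched below. This is an assumption about how such generalization could work, not a description of the Haggis wizard: the `LinkCollector` class, the `generalize` function, and the sample page are hypothetical. Each link gets a structural signature (its chain of enclosing tags plus its `class` attribute), and every link sharing the example's signature is returned.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record each link's href together with a simple structural signature."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open tags from the root down to the current node
        self.links = []   # (href, signature) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            signature = (tuple(self.stack), attrs.get("class"))
            self.links.append((attrs["href"], signature))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def generalize(html, example_href):
    """Return all links that share a signature with the example link."""
    collector = LinkCollector()
    collector.feed(html)
    wanted = {sig for href, sig in collector.links if href == example_href}
    return [href for href, sig in collector.links if sig in wanted]


# Hypothetical page: two navigation links and three listing links.
PAGE = """
<html><body>
<div><a class="nav" href="/home">Home</a><a class="nav" href="/about">About</a></div>
<ul>
<li><a class="listing" href="/apt/1">Sunny 1BR</a></li>
<li><a class="listing" href="/apt/2">Studio</a></li>
<li><a class="listing" href="/apt/3">3BR house</a></li>
</ul>
</body></html>
"""

matches = generalize(PAGE, "/apt/2")
print(matches)
```

The sketch ignores void elements such as `<br>` and pages whose layout varies between items; a real wizard would need a more forgiving notion of "similar".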
13
2. Interfaces and Visualizations
Wizards for supporting common UI patterns
–Table views, maps, graph views, alerts, etc
Programming-by-example techniques
14
2. Interfaces and Visualizations
Output as a web page or desktop widget
–Yahoo Widgets, Google Desktop, Windows Sidebar
16
3. Normal versus Anomalous
Problem: Too much data, so much of it gets dropped on the floor
Solution: “Teach” the system what patterns to look for
–Analyst-in-the-loop: infoviz + machine learning
–Long-term goal
Example:
–eBay “penny sellers”: could create custom software, but slow
–Analyst uses visualization to find some examples of penny sellers and gives hints to system as to why
–System finds more suspects, analyst gives relevance feedback
–As new data streams in, system can flag suspects
Can help address high turnover rate at intelligence agencies and the resulting loss of organizational memory
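The analyst-in-the-loop cycle above can be sketched with a very small learner. This is only an illustration under stated assumptions: the slides do not say which algorithm is intended, so a nearest-centroid rule is swapped in here, and the `FeedbackFlagger` class, the two features (average sale price, sales per day), and all numbers are hypothetical. The analyst labels a few sellers, the system flags others, and each correction shifts what the system treats as normal.

```python
from statistics import mean


def centroid(rows):
    """Component-wise mean of a list of feature tuples."""
    return tuple(mean(col) for col in zip(*rows))


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class FeedbackFlagger:
    """Nearest-centroid flagger that learns from analyst feedback."""

    def __init__(self):
        # True = analyst marked as suspect, False = marked as normal.
        self.examples = {True: [], False: []}

    def add_feedback(self, features, is_suspect):
        self.examples[is_suspect].append(features)

    def is_suspect(self, features):
        if not self.examples[True] or not self.examples[False]:
            raise ValueError("need at least one example of each kind")
        to_suspects = distance(features, centroid(self.examples[True]))
        to_normals = distance(features, centroid(self.examples[False]))
        return to_suspects < to_normals


# Hypothetical features: (average sale price in dollars, sales per day).
flagger = FeedbackFlagger()
flagger.add_feedback((0.01, 40), True)
flagger.add_feedback((0.05, 55), True)
flagger.add_feedback((25.0, 3), False)
flagger.add_feedback((60.0, 1), False)

borderline = (10.0, 20)
before = flagger.is_suspect(borderline)   # system's first guess
flagger.add_feedback(borderline, False)   # analyst says: not a suspect
after = flagger.is_suspect(borderline)    # guess after relevance feedback
print(before, after)
```

The features here are unscaled, so the larger-valued feature dominates the distance; a real system would normalize features and would likely use a stronger model.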
17
Current Progress
First round of interviews completed
–Sensor Andrew team (Civil and Electrical Engineers)
–Mashup Camp
–Programmers around CMU
Initial prototype of “plumbing” in progress
–An Integrated Development Environment (IDE) for programmers, to facilitate extraction and visualization of data
–Low-level support for extracting data from tables, basic visualizations, etc
–Higher-level tools later to be built on top
First round of user tests planned for August
19
Past Work with Marmite
Wizard for extracting data from arbitrary web pages
Combine operators together in a dataflow (Unix)
View the data in multiple ways (table, map)
20
How Marmite Works
Wizard for getting data from web pages
Combine operators together in a dataflow (Unix)
View the data in multiple ways (table, map)
21
How Marmite Works
Operators let you know what operations can be done
Input, processing, output
22
How Marmite Works
Operators are chained together in a dataflow (similar to Unix pipes)
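The dataflow idea can be sketched as a chain of small operators, each consuming the rows produced by the one before it, much like `cmd1 | cmd2 | cmd3` in a Unix shell. This is an illustration only, not Marmite's implementation: the `keep`, `add_field`, and `pipeline` helpers and the sample rows are hypothetical.

```python
def keep(predicate):
    """Processing operator: pass through only rows matching the predicate."""
    def op(rows):
        return (r for r in rows if predicate(r))
    return op


def add_field(name, fn):
    """Processing operator: add a computed field to every row."""
    def op(rows):
        return ({**r, name: fn(r)} for r in rows)
    return op


def pipeline(rows, *operators):
    """Feed rows through each operator in order, like a Unix pipe."""
    for op in operators:
        rows = op(rows)
    return rows


# Hypothetical input rows, standing in for data extracted from a web page.
listings = [
    {"title": "Sunny 1BR", "city": "Pittsburgh", "price": 950},
    {"title": "Loft", "city": "Cleveland", "price": 800},
    {"title": "Studio", "city": "Pittsburgh", "price": 700},
]

result = list(pipeline(
    listings,
    keep(lambda r: r["city"] == "Pittsburgh"),
    keep(lambda r: r["price"] < 900),
    add_field("label", lambda r: f'{r["title"]}: ${r["price"]}'),
))
print(result)
```

Because every operator takes rows in and gives rows out, the intermediate data after any step can be inspected or shown in a table or map view.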
23
How Marmite Works
Current data is shown
24
How Marmite Works
And multiple views too
25
How Marmite Works
A wizard UI for helping people get the data they want
26
Some High-Level Design Issues
Centralized model
–Clean data model: well-managed, well-formatted, common representations, well-known databases, etc
Decentralized model
–“Anarchic”, multiple data formats in multiple places
–Hard to get lots of people to agree on data format and representation
–More likely scenario (look at how databases are used today)
–Haggis is being designed for this model, assuming that a person may have to clean up the data and resolve formats
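What "clean up the data and resolve formats" might involve in the decentralized model is sketched below. The two sources, their field names, date formats, and units are hypothetical, as are `SOURCE_MAPPINGS` and `normalize`; the point is only that a per-source mapping, written once by a person, can bring differently shaped records into one common representation.

```python
from datetime import datetime

# Hypothetical per-source mappings: original field -> (common field, converter).
SOURCE_MAPPINGS = {
    "building_a": {
        "ts": ("time", lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M")),
        "kwh": ("energy_kwh", float),
    },
    "building_b": {
        "Timestamp": ("time", lambda v: datetime.strptime(v, "%m/%d/%Y %H:%M")),
        # This source reports watt-hours, so convert to kilowatt-hours.
        "energy_wh": ("energy_kwh", lambda v: float(v) / 1000),
    },
}


def normalize(source, record):
    """Translate one source-specific record into the common representation."""
    mapping = SOURCE_MAPPINGS[source]
    out = {"source": source}
    for field, value in record.items():
        if field in mapping:  # unmapped fields are dropped
            common_name, convert = mapping[field]
            out[common_name] = convert(value)
    return out


a = normalize("building_a", {"ts": "2008-04-24 09:00", "kwh": "1.5"})
b = normalize("building_b", {"Timestamp": "04/24/2008 09:00", "energy_wh": "1500"})
print(a)
print(b)
```

Once both records share field names, types, and units, they can be combined or visualized together without either source having agreed on a format in advance.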
27
Other High-Level Design Issues
Discovery
–What data sources are available?
–May need some kind of centralized store that describes these (sort of like DNS for Internet)
Security
–Access control, who can access what data sources?
–This is a general problem with sensor data
Privacy
–What kinds of queries / apps should people be able to do?
–Unclear how to restrict those in practice