Free your Data: Instant Gratification with the Semantic Web David Karger.



Why everyone should be their own database administrator, UI designer, application developer, and web site builder, and how they can David Karger

A Semantic Web Vision Autonomous computational agents perform sophisticated information tasks on behalf of their human users Use data that is annotated with rich semantics –Ontologies that explain precisely what the data means –Schema annotations that explain how to align multiple ontologies –Rules that explain how new data can be formally derived from existing –Inference systems that put it all together –Lots of logicians and AI researchers developing tools This vision is frightening –Involves solving problems that have bedeviled AI for decades –Often used to attack the semantic web –Or to argue to slow down deployment *“we can’t put up that data until we have an ontology!”

Aim Lower: the Semimantic Web Not “make computers help” but “make them not hinder” –“First, do no harm” Create a tiny bit of structure: –Name objects (with URLs) –Record named relations between them –No semantics on relations –No schemas –No inference This is both –Technically simple –Immediately useful You should do it –And you can right now

Why Applications? Typical user tasks require interaction with multiple pieces of information –Display –Explore –Query –Manipulate Applications bring together the data, specialized views, and operations necessary to perform tasks

Irrelevant info –Distracting –Covers up more important info Artist –Of dance, not music –ID3v2 added “Composer” –shown in wrong place No “difficulty” field –Place in comment field –Uses field up –Where put “tempo”? Menu of genre choices –My genre (of dance, not music) missing –ID3v2 lets user add

Summary of Problems Application has fixed idea of “right” data –Both properties and values for them And right way to display that data User wants to “stretch” the app to their needs –Cannot hide irrelevant data –Cannot incorporate new kinds of data –Cannot change how data is presented Perhaps just use generic comment field? –Add what you want –Format how you want

Properties have structure –Used for layout –And for browsing

Sometimes, one application isn’t enough Applications inappropriately partition task –Because task wasn’t planned for in application design No application has all the necessary data, operations –Need to launch several to do task Each includes unneeded data, operations –Clutter distracts from what you need to see Can’t work with data “across” application boundaries –Can’t record or view data connections –Have to find it again in second application –Or enter it manually a second time *Type budget numbers on Post-its to move to other application

Why? Building applications is hard –Done by expert few for the many –They determine which data, views, operations are useful Applications are “mass produced” –Everybody gets the same one –And only built for large markets –Word processor, photo album, … Problem: different people want different applications –Basket weaving, UFO sightings, junkyard management –Want to work with unusual information –Want to see, navigate, manipulate it “their way” Developers can’t afford to build these boutique applications

What about the Web? Anything can get a URL Anything can go in a page, linked to anything –Common to “schematize on the fly”, making lists of interesting properties/values Support for orienteering –Scan list of choices –Pick the one that seems to lead in the right direction –Fact: people orienteer even when there’s an easy query that is faster –On web, never bounce off an application boundary

Downside Hard to author –Especially if I want to record lots of complex data Hard to manipulate, do complex queries –HTML loses meaning of data –Can’t “switch to tabular view” That’s why web sites are backed by databases –Data is kept structured to support complex queries –Templating engines convert to human readable presentation End users aren’t going to manage this kind of web site Gives powerful operations, but only “inside” web site –User may discover need to cross site boundaries –Like applications, web sites create (possibly wrong) data partitions –So all the problems with applications apply here too

Not just music Scientific research generates masses of data –E.g. Bioinformatics Others want to access that data Big standards bodies meet to decide on community standard formats and systems under which everyone will distribute data When scientist wants to try or report something new, or needs data from outside the community, stuck.

Information Wants to be Free Applications and Web Sites make assumptions about how their data will be used Those assumptions are hard-coded into the interaction with the data But no developer can predict all uses of the data Fixed interfaces prevent data repurposing Solution: give direct access to the data Just set up a SQL server? –(A long-running screed of the DB community)

But it Can’t be Just about the Data People need to look at the data –(unless we figure out those autonomous agents…) And need to create it in the first place Apps and template-driven web sites give us nice interfaces for interacting with the data they manage But if we use them we can’t repurpose the data And what interface can we use for the repurposed data? Web needed a server (of data) and a client (to show it) How to make viewing, authoring and repurposing arbitrary data as easy as viewing and authoring web pages? –Without knowing precisely what data people will want to view or how they will want to view it?

Example: Piggy Bank I need data from more than one web site And I need to look at it differently than any web site What is the minimum necessary support? Piggy Bank: A Firefox plugin for navigating structured data

Find some movies

Free that data

Show it a different way

Combine it with other sources

Mash Ups? Developer decides to integrate data from multiple sites Writes programmatic “scrapers” –reverse the web site’s templating process to recover data Combines resulting data structures Presents using their own template driven web site –Thus guilty of same sin as the one they are fighting –I only get the mash-ups a programmer decides to create Piggy Bank lets end users do their own mashing

Data Model

RDF W3C standard Minimum data model –URL for arbitrary objects –Arbitrary named links between two objects –No schemas Much like the web, except –URLs need not be web pages –Machine readable “anchor text” in links Yet Powerful –Relations are natural/universal –Represent a semantic network
[Diagram: semantic network with nodes Loew’s, Superman, Kendall Sq., 8PM, Movie, and Theater, joined by links labeled title, venue, type, location, and time]
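The data model on this slide can be sketched in a few lines of plain Python. This is an illustration only: a set of tuples stands in for a real RDF store, the `ex:` identifiers are hypothetical stand-ins for URLs, and the way the movie, theater, and showtime are wired together is an assumption reconstructed from the diagram's labels.

```python
# A minimal sketch of the RDF data model: a graph is a set of
# (subject, property, value) triples, with no schema declared anywhere.
# All "ex:" names are hypothetical stand-ins for real URLs.

triples = {
    ("ex:movie1", "ex:type", "ex:Movie"),
    ("ex:movie1", "ex:title", "Superman"),
    ("ex:movie1", "ex:venue", "ex:theater1"),
    ("ex:movie1", "ex:time", "8PM"),
    ("ex:theater1", "ex:type", "ex:Theater"),
    ("ex:theater1", "ex:title", "Loew's"),
    ("ex:theater1", "ex:location", "Kendall Sq."),
}


def values(graph, subject, prop):
    """Return every value linked from subject by the named property."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


def add(graph, subject, prop, value):
    """Record a new named relation; nothing has to be declared first."""
    graph.add((subject, prop, value))


# Follow two links: movie -> venue -> location.
venue = values(triples, "ex:movie1", "ex:venue")[0]
where = values(triples, venue, "ex:location")

# A user can invent a property on the spot.
add(triples, "ex:movie1", "ex:myRating", "5 stars")
```

Because there is no schema, adding `ex:myRating` needs no migration or declaration; the new link simply sits beside the existing ones.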

Are we done? Is RDF the only answer? –SQL/Tuples, XML can represent same info –So any would do –And user shouldn’t have to know which we’ve chosen –But RDF is easiest to create sloppily, incrementally *So best suited to let enthusiasts create some –And imposes fewest requirements to be “compatible” Is RDF the whole answer? –Still unclear how to interact with it

Visualization

Lenses If data is amorphous, monolithic UI won’t do –Can’t know in advance what kind of data we’ll need to display –Or what user will want to do with that data Let each type come with “view prescription” –“To display a document, show its title, author, and abstract” –“To display a person, show his name and affiliation” –Specifies properties to show, and “decoration” (fonts, layouts) After you get the data, assemble lenses to show it –(recursively) Lenses are described in RDF –So they can be collected, repurposed like any other data
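The idea of a per-type "view prescription" applied recursively can be sketched as follows. This is a hedged illustration in plain Python, not the Haystack or Fresnel implementation: the records, type names, and property names are hypothetical, and "decoration" is reduced to indentation.

```python
# A minimal sketch of lenses: each type names the properties to show,
# and values that are themselves objects are rendered with their own lens.
# The data, type names, and property names are hypothetical.

data = {
    "doc1": {"type": "Document", "title": "Free your Data",
             "author": "person1", "abstract": "Why lenses help."},
    "person1": {"type": "Person", "name": "David Karger",
                "affiliation": "MIT"},
}

# The "view prescription" for each type: which properties to display.
lenses = {
    "Document": ["title", "author", "abstract"],
    "Person": ["name", "affiliation"],
}


def render(obj_id, indent=0):
    """Render an object through the lens for its type, recursively."""
    obj = data[obj_id]
    pad = " " * indent
    lines = []
    for prop in lenses[obj["type"]]:
        value = obj[prop]
        if value in data:  # the value is another object: use its lens
            lines.append(f"{pad}{prop}:")
            lines.extend(render(value, indent + 2))
        else:
            lines.append(f"{pad}{prop}: {value}")
    return lines


print("\n".join(render("doc1")))
```

The lenses live in an ordinary dictionary, so they can be edited or swapped without touching `render`, which mirrors the point that lenses are described rather than programmed.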

Fresnel
dsp:publicationLens rdf:type :Lens;
    :classLensDomain ow:Publication;
    :group gr:group;
    :purpose :defaultLens;
    :showProperties ( dc:description dc:identifier dc:creator dc:contributor dc:date dc:subject dc:type dc:publisher dc:rights ).
dsp:rightsFomat rdf:type :Format;
    :group gr:group;
    :propertyFormatDomain dc:rights;
    :propertyStyle "dspace-rights".

Benefits Data collected from anywhere can be viewed together –Each piece of data with its own lens Lenses are described, not programmed –Enthusiasts can write their own –(especially if we give them wysiwyg tools) –No need to build a template driven web site –Just edit, publish some lenses

Manipulation

Application Development by End Users People want applications to manipulate their data But applications only manipulate developer’s data So let end users build their own Use lenses, but refract in both directions –Lenses describe how to map data to presentation –Invert, interpret manipulation of presentation as manipulation of data *(extend lenses to talk about click, drag, drop) Operations represented as web services –Internal and remote operations –Receive RDF data and act on it
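A lens that "refracts in both directions" can be sketched as a mapping that is read one way to present data and the other way to interpret an edit. The lens, labels, and record below are hypothetical, and a real system would also cover the click, drag, and drop gestures mentioned on the slide.

```python
# A minimal sketch of a lens used in both directions.
# The lens, labels, and record below are hypothetical.

record = {"dc:title": "Draft", "dc:creator": "John"}

# Lens: ordered (display label, data property) pairs.
lens = [("Title", "dc:title"), ("Author", "dc:creator")]


def present(lens, record):
    """Data -> presentation: map each property to a labeled field."""
    return {label: record[prop] for label, prop in lens}


def apply_edit(lens, record, label, new_value):
    """Presentation -> data: interpret an edit to a displayed field
    as an update to the property the lens drew it from."""
    label_to_prop = dict(lens)
    record[label_to_prop[label]] = new_value


view = present(lens, record)
apply_edit(lens, record, "Title", "Final")
```

After the edit, presenting the record again shows the new title, so the same description serves both display and manipulation.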

The Big Picture

Sufficient for Nice Applications? Application design is impoverished –Divide up the screen –Put an object in each piece –Show properties of each object –With pretty formatting –Put operations in menus –And add some toolbars to save time This application “vocabulary” is limited enough –to be described instead of programmed –so it can be edited by end users

Workspace Designer Editing mode for applications Define regions of screen –By splitting existing regions Resize regions Specify content of each region –Object to be shown (drag and drop object) –Lens to use to show object (menu of relevant lenses) –Operations to make available on object (drag operations)
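One way to picture the workspace designer is as a tree of regions, where splitting a region yields two children and each leaf holds an object, a lens, and some operations. The sketch below assumes that representation; the class, field names, and example content are hypothetical and are not taken from the actual tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Region:
    """A screen region: either a leaf with content or a split with children."""
    obj: Optional[str] = None          # object shown in this region
    lens: Optional[str] = None         # lens used to show the object
    operations: List[str] = field(default_factory=list)
    direction: Optional[str] = None    # set once the region is split
    children: List["Region"] = field(default_factory=list)

    def split(self, direction):
        """Split this region in two; existing content moves to the first half."""
        first = Region(self.obj, self.lens, self.operations)
        second = Region()
        self.obj, self.lens, self.operations = None, None, []
        self.direction = direction
        self.children = [first, second]
        return first, second

    def leaves(self):
        """Return the regions that actually display content, left to right."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Start with one region showing a paper, then add a "things to do" region.
screen = Region(obj="paper-draft", lens="document lens")
paper, todo = screen.split("vertical")
todo.obj = "things to do"
todo.lens = "list lens"
todo.operations.append("mark done")
```

Since the whole layout is plain data, it can be saved, shared, and edited like any other description, which is what lets end users rearrange an application without programming.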

Writing a Brain Research Paper

Adding “Things to Do” Region

Revised Application

Lens Designer Specify how a particular object can be shown Similar to workspace designer –Lens is “workspace” for viewed object Subdivide canvas Specify property to show in each region Specify lens for value of each property

Drug Discovery Dashboard Topic: GSK3beta Topic Target: GSK3beta Disease: DiabetesT2 Alt Dis: Alzheimers Cmpd: SB44121 CE: DBP Team: GSK3 Team Person: John Related Set Path: WNT

Lenses can aggregate, accentuate, or even analyze new result sets Behind the lens, the data can be persistently stored as RDF-OWL Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references Bridging Chemistry and Molecular Biology

Pathway Polymorphisms Merge directly onto pathway graph Identify targets with lowest chance of genetic variance Predict parts of pathways with highest functional variability Map genetic influence to potential pathway elements Select mechanisms of action that are minimally impacted by polymorphisms Non-synonymous polymorphisms from db-SNP

Clinical Dashboard Gene Expression Data Additional relations and aspects can be defined: Mendelian Inheritance in Man Diseased Tissue Links to OMIM (RDF)

Bar View Lens for Gene Expression

ClinDash: Clinical Trials Browser Clinical Obs Expression Data Subjects Values can be normalized across all measurables (rows) Samples can be aligned to their subjects using RDF rules Clustering can now be done over all measurables (rows) and types

Shattering Applications Specific lenses may be too complex for end users to create But end users can –Assemble these lenses into “applications” –Decide at which data these lenses point Current application developers can build those views –Much more modular –Instead of building whole application, just build a lens and add to pool –Repurposable lenses for repurposable data Simpler views can be built by non programmers –Embedding the complex lenses as subparts

Sharing

Semantic Bank Tools directly collect and manipulate RDF –So sharing just requires publishing the RDF back Semantic Bank is just a big RDF repository –GET a resource to fetch the (XML encoding of) RDF about it –Similarly, upload an XML encoding of the RDF. An example:
POST /semantic-bank/foo?command=upload&format=rdfxml HTTP/1.1
Host: bank.example.org
Content-Length: 317
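The upload request on this slide can be assembled with Python's standard library. This sketch only builds the request and does not send it, since bank.example.org is a placeholder host. The RDF/XML payload, the `ex:` vocabulary, and the `Content-Type` header are assumptions that do not appear on the slide.

```python
from urllib.request import Request

# The RDF/XML payload; the ex: vocabulary and resource URL are hypothetical.
rdf_xml = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/movie1">
    <ex:title>Superman</ex:title>
  </rdf:Description>
</rdf:RDF>
"""


def build_upload(bank_url, payload):
    """Build (but do not send) a POST that uploads RDF/XML to a bank."""
    return Request(
        bank_url + "?command=upload&format=rdfxml",
        data=payload,
        headers={
            "Content-Type": "application/rdf+xml",  # assumed, not on the slide
            "Content-Length": str(len(payload)),
        },
        method="POST",
    )


request = build_upload("http://bank.example.org/semantic-bank/foo", rdf_xml)
# Passing `request` to urllib.request.urlopen would send it; that step is
# left out because the host is only a placeholder.
```

Fetching works the same way in reverse: a plain GET on the resource URL would return the RDF/XML that the bank holds about it.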

Getting There

What’s wrong? It seems obvious: RDF lets anyone –Ignore web site and application boundaries –Gather data they need –Define their own new attributes and relationships –Look at it the way that they need –Manipulate it –Publish it back for others to use, without having to manage a web site So why don’t we already have it?

Cost of Getting Started? Web: –Download/run a web server (hardest part, happens only once) –Download a web browser –Write a web page Semantic Web –Install database, define schemas –Add middleware layer –Create templating engines –Develop ontologies, data import protocols –… Semimantic Web –Post some RDF (written in N3) to a Semantic Bank –Install Piggy Bank

Absence of Schemas? What good is it to put up RDF without explaining all the properties? What happens when different people put up “mismatched” data with different (explicit or implicit) schemas? What if there are multiple URLs for the same thing, with inconsistent statements about them? How can I use data I collected from somewhere else, if it doesn’t have the same schema as mine? But designing schemas is hard –Requires big committees, lots of meetings, deliberation, buy-in

Data First, Schema later (if ever) Need for schemas is a fallacy, blocking progress Each site is likely consistent with itself And will likely “go with the crowd” and be consistent with others If not, let users (not machines) translate –Mapping properties to properties –As needed, from site to site *(or site to personal repository) –Typically only need to blend a few sites
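The user-driven translation described here, mapping properties to properties as needed, can be sketched as a small renaming table applied when two sites are blended. The site data, property names, and mapping are hypothetical.

```python
# A minimal sketch of "let users translate": a user-written mapping from one
# site's property names to another's, applied when blending the two.
# Site names, properties, and records are hypothetical.

site_a = [("movie1", "title", "Superman"), ("movie1", "venue", "Loew's")]
site_b = [("film9", "name", "Batman"), ("film9", "theater", "Kendall")]

# Written by the user, as needed, for this one pair of sites.
b_to_a = {"name": "title", "theater": "venue"}


def translate(triples, mapping):
    """Rename mapped properties; leave unmapped ones untouched."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


blended = site_a + translate(site_b, b_to_a)
titles = [o for s, p, o in blended if p == "title"]
```

The mapping covers only the properties the user cares about, and unmapped properties pass through unchanged, so no global schema has to be agreed on first.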

There’s no RDF? Database backed servers can easily expose RDF, if they want to –E.g., citeseer.csail.mit.edu –Import into Piggy Bank –Browse, query, search in interesting ways –Maintain collections of references If server won’t cooperate, scrape –Piggy Bank has a scraper repository –One person writes scraper, everyone uses –Or, one scrapes and publishes to Semantic Bank, others get from bank –Also unsupervised machine learning approaches

Clogs and Plogs Much blogging is about recycling content Clogs (Content Blogs) can manually merge data –Blogger locates sources of data that ought to be in their schema –Invests work to align properties and instances –Publishes resulting single (schema unified) blob of data –No front end Plogs (Presentation Blogs) display data –Develop interesting lenses –Point them at clogger content –Someone else’s back end Separate front and back ends into different web sites

Chicken and Egg RDF-aware clients useless without data, and vice versa What can prime the pump?

Research Projects Many of our projects generate interesting data Then present through one interface –E.g., NLP, speech Instead, post it to the Semantic Bank –Others will find new uses for the data Other projects consume data –Get it from the bank Let’s talk…

Conclusion We have the tools to separate data from presentation –RDF repositories –Lenses to display arbitrary data in arbitrary combination Doing so would offer substantial benefits –Application barriers go away –Anyone can create interesting content –People can repurpose it to their own specific needs Semantic Web can be lightweight –Low cost of deployment –Immediate benefit –All we need do is ignore semantics

Haystack.csail.mit.edu Simile.mit.edu