Ivan Herman, W3C, W3C Brazil Office Meeting São Paulo, Brazil, 2010-10-15.

1 Ivan Herman, W3C, W3C Brazil Office Meeting São Paulo, Brazil, 2010-10-15

21 (21)
• You had to consult a large number of sites, all different in style, purpose, possibly language…
• You had to mentally integrate all that information to achieve your goals
• We all know that, sometimes, this is a long and tedious process!

22 (22)
• All those pages are only tips of respective icebergs:
  ◦ the real data is hidden in databases, XML files, Excel sheets, …
  ◦ you only have access to what the Web page designers allow you to see

23 (23)
• Specialized sites (Expedia, TripAdvisor) do a bit more:
  ◦ they gather and combine data from other sources (usually with the approval of the data owners)
  ◦ but they still control how you see those sources
• But sometimes you want to personalize: access the original data and combine it yourself!

28 (28)
• I have to type the same data again and again…
• This is even worse: I feed the icebergs…

29 (29)
• The raw data should be available on the Web
  ◦ let the community figure out what applications are possible…

33 (33)
• Mashup sites are forced to do very ad hoc jobs
  ◦ various data sources expose their data via Web Services, APIs
  ◦ each with a different API, a different logic, a different structure
  ◦ mashup sites are forced to reinvent the wheel many times because there is no standard way of getting to the data!

34 (34)
• The raw data should be available in a standard way on the Web
  ◦ i.e., using URIs to access data
  ◦ dereferencing a URI should lead to something useful
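A minimal sketch of what "using URIs to access data" could look like from a client's side. It only builds the HTTP request and does not send it. The URI and the choice of media types are assumptions made for illustration; the slides do not name a specific format or server.

```python
from urllib.request import Request

# Hypothetical URI identifying one piece of raw data (not from the slides).
DATA_URI = "http://example.org/data/hotels/42"

# Assumption: the server can return a machine-readable representation
# when the client asks for one through the Accept header.
PREFERRED_FORMATS = "text/turtle, application/rdf+xml;q=0.9"


def build_data_request(uri: str) -> Request:
    """Build (but do not send) a GET request asking for data rather than HTML."""
    return Request(uri, headers={"Accept": PREFERRED_FORMATS})


request = build_data_request(DATA_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual dereferencing; whether "something useful" comes back depends on the data publisher.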

35 (35)
• What makes the current (document) Web work?
  ◦ people create different documents
  ◦ they give each document an address (i.e., a URI) and make it accessible to others on the Web

37 (37)
• Others discover the site and they link to it
• The more they link to it, the more important and well known the page becomes
  ◦ remember, this is what, e.g., Google exploits!
• This is the “Network effect”: some pages become important, and others begin to rely on them even if the author did not expect it…

40 (40)
• The same network effect works on the raw data
  ◦ Many people link to the data, use it
  ◦ Many more (and more diverse) applications will be created than the “authors” would ever dream of!

43 (43) Photo credit “nepatterson”, Flickr

44 (44)
• A “Web” where
  ◦ documents are available for download on the Internet
  ◦ but there would be no hyperlinks among them
• This is certainly not what we want!

50 (50)
• The raw data should be available in a standard way on the Web
• There should be links among datasets

51 (51) Photo credit “kxlly”, Flickr

52 (52)
• On the traditional Web, humans are implicitly taken into account
• A Web link has a “context” that a person may use

55 (55)
• A human understands that this is where my office is, i.e., the institution’s home page
• He/she knows what it means
  ◦ realizes that it is a research institute in Amsterdam
• When handling data, something is missing; machines can’t make sense of the link alone

56 (56)
• New lesson learned:
  ◦ extra information (“label”) must be added to a link: “this links to my institution, which is a research institute”
  ◦ this information should be machine readable
  ◦ this is a characterization (or “classification”) of both the link and its target
  ◦ in some cases, the classification should allow for some limited “reasoning”
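The idea of a labeled link can be sketched with plain (subject, label, target) tuples. This is only an illustration under stated assumptions: every URI, the labels `worksFor`, `type`, and `subClassOf`, and the helper functions are hypothetical names, not vocabulary taken from the slides. The "limited reasoning" shown is simply following class hierarchy links.

```python
# All URIs, labels, and class names below are hypothetical.
TYPE = "type"
SUBCLASS_OF = "subClassOf"

PERSON = "http://example.org/people/speaker"
INSTITUTE = "http://example.org/org/institute"

# Each link carries a machine-readable label describing what it means.
triples = {
    (PERSON, "worksFor", INSTITUTE),               # "this links to my institution"
    (INSTITUTE, TYPE, "ResearchInstitute"),        # "...which is a research institute"
    ("ResearchInstitute", SUBCLASS_OF, "Organization"),
}


def link_targets(subject, label, graph):
    """Return the targets of all links from `subject` that carry `label`."""
    return sorted(o for s, p, o in graph if s == subject and p == label)


def types_of(resource, graph):
    """Return every class of `resource`, following subClassOf links transitively."""
    found = set()
    pending = [o for s, p, o in graph if s == resource and p == TYPE]
    while pending:
        cls = pending.pop()
        if cls in found:
            continue
        found.add(cls)
        pending.extend(o for s, p, o in graph if s == cls and p == SUBCLASS_OF)
    return found


employer = link_targets(PERSON, "worksFor", triples)[0]
employer_types = types_of(employer, triples)
print(employer, sorted(employer_types))
```

A program reading these tuples can conclude that the link target is an organization even though no tuple says so directly, which is the kind of small inference a label makes possible.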

57 (57)
• The raw data should be available in a standard way on the Web
• Datasets should be linked
• Links, data, sites, should be characterized, classified, etc.
• The result is a Web of Data

60 (60)
• It is that simple…
• Of course, the devil is in the details
  ◦ a common data model has to be provided
  ◦ the “classification” of the terms can become very complex
  ◦ but these details are fleshed out by experts as we speak!

61 (61)
• A set of core technologies is in place
• Lots of data (billions of relationships) are available in standard formats
  ◦ often referred to as the “Linked Open Data Cloud”

63 (63)
• There is a vibrant community of
  ◦ academics: universities of Southampton, Oxford, Stanford, PUC
  ◦ small startups: Garlik, Talis, C&P, TopQuadrant, Cambridge Semantics, OpenLink, …
  ◦ major companies: Oracle, IBM, SAP, …
  ◦ users of Semantic Web data: Google, Facebook, Yahoo!
  ◦ publishers of Semantic Web data: New York Times, US Library of Congress, open governmental data (US, UK, France,…)

64 (64)
• Companies, institutions begin to use the technology:
  ◦ BBC, Vodafone, Siemens, NASA, BestBuy, Tesco, Korean National Archives, Pfizer, Chevron, …
• see
• Truth must be said: we still have a way to go
  ◦ deployment may still be experimental, or on some specific places only

69 (69)
• Help in finding the best drug regimen for a specific case, per patient
• Integrate data from various sources (patients, physicians, Pharma, researchers, ontologies, etc.)
• Data (e.g., regulation, drugs) change often, but the tool is much more resistant to change
Courtesy of Erick Von Schweber, PharmaSURVEYOR Inc. (SWEO Use Case)

70 (70)
• Integration of relevant data in Zaragoza
• Use rules to provide a proper itinerary
Courtesy of Jesús Fernández, Mun. of Zaragoza, and Antonio Campos, CTIC (SWEO Use Case)

71 (71)
• More and more data should be “published” on the Web
  ◦ this can lead to the “network effect” on data
• New breeds of applications come to the fore
  ◦ “mashups on steroids”
  ◦ better representation and usage of community knowledge
  ◦ new customization possibilities
  ◦ …

72 (72)
• A huge amount of data (“information”) is available on the Web
• Sites struggle with the dual task of:
  ◦ providing quality data
  ◦ providing usable and attractive interfaces to access that data

73 (73) “Raw Data Now!” Tim Berners-Lee, TED Talk, 2009
• Semantic Web technologies allow a separation of tasks:
  1. publish quality, interlinked datasets
  2. “mash-up” datasets for a better user experience
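The two tasks can be illustrated with a small sketch, assuming two independent publishers whose datasets refer to the same hotel by the same URI. The datasets, URIs, property names, and the `ratings_by_hotel_name` helper are all made up for the example.

```python
# Hypothetical dataset 1: a hotel directory published by one party.
hotels = {
    ("http://example.org/hotel/1", "name", "Hotel Central"),
    ("http://example.org/hotel/1", "locatedIn", "http://example.org/city/sao-paulo"),
}

# Hypothetical dataset 2: reviews published elsewhere, linking to the
# hotel by reusing its URI.
reviews = {
    ("http://example.org/review/9", "about", "http://example.org/hotel/1"),
    ("http://example.org/review/9", "rating", 4),
}

# Because both datasets share one data model and one URI for the hotel,
# "mashing up" is a plain set union with no per-source adapter code.
merged = hotels | reviews


def ratings_by_hotel_name(graph):
    """Collect review ratings per hotel name by following the 'about' links."""
    names = {s: o for s, p, o in graph if p == "name"}
    about = {s: o for s, p, o in graph if p == "about"}
    result = {}
    for s, p, o in graph:
        if p == "rating" and about.get(s) in names:
            result.setdefault(names[about[s]], []).append(o)
    return result


mashup = ratings_by_hotel_name(merged)
print(mashup)
```

Neither dataset alone can answer the question; the answer only appears once the linked data is combined, which is the separation of publishing and mashing up that the slide describes.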

74 (74)
• The “network effect” is also valid for data
• There are unexpected usages of data that authors may not even have thought of
• “Curating”, using, exploiting the data requires a different expertise

75 (75)
• W3C
  ◦ was one of the initiators of the Semantic Web (Tim Berners-Lee and others)
  ◦ is the place where Semantic Web Standards are developed and defined
  ◦ is an integral part of the Semantic Web community

76 (76)
• It is done by groups, with W3C members delegating experts
• Each group has at least one W3C staff member to help the process and contribute to the technology
  ◦ there is a formal process that has to be followed
  ◦ the price to pay…

78 (78)
• The public can comment at specific points in the process
• Groups must take all comments into account
  ◦ the number of comments can be in the hundreds...

79 (79)
• Regular teleconferences (usually once a week)
• Possibly 1-2 face-to-face meetings a year
• Lots of email discussions
• Editorial work to get everything properly written down
• Average life-span: 2-3 years

81 (81) Thank you for your attention! These slides are also available on the Web:
