1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department.

Presentation transcript:

1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department of Computer Science and Engineering Wright State University, Dayton, OH-45435, USA

2

Knowledge Enabled Information and Services Science 3

Information Retrieval Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). 4

Evolution of the Web 5

Semantic Web: The Semantic Web is a standards-based extension of the WWW in which the semantics of information and services on the web is defined, so as to satisfy the information needs of people and enable machines to use web content. Machine-comprehensible structured data. 6

Tim Berners-Lee’s Semantic Web Layer Cake 7

Updated Semantic Web Cake 8

9 Trust Issues in Social Media and Sensor Networks T. K. Prasad, Cory Henson, Amit Sheth and Pramod Anantharam Kno.e.sis Center Department of Computer Science and Engineering Wright State University, Dayton, OH-45435, USA

10 Goal: Study semantic issues relevant to trust in social media (data and networks) and in sensors (data and networks). Generic examples involving trust: analyzing online ratings and reviews of TV models before making a purchasing decision on amazon.com; seeking recommendations from neighbors for a handyman, car mechanic, etc.

11 Generic Approach: Propose models of trust and trust metrics that formalize trust aggregation and trust propagation, in order to deal with indirect trust. Develop techniques and tools to glean trust information from social media data (streams) and networks, and from sensor data (streams) and networks.

Trust in Social Media Networks 12

Previous Work: Structure of Trust. Trust between a pair of users is modelled as a real number in the closed interval [0,1] or [-1,1]. Pros: facilitates propagation and computation of aggregated trust. Cons: too fine-grained and imposes a total order; inherent difficulties in initializing, understanding, and justifying computed trust values.

Quote from Guha et al.: While continuous-valued trusts are mathematically clean, from the standpoint of usability, most real-world systems will in fact use discrete values at which one user can rate another. E.g., Epinions, eBay, Amazon, Facebook, etc. all use small sets of (dis)trust/rating values.

Trust-aware Recommender Systems: Collaborative filtering systems exploit user similarity to generate recommendations, but suffer from the data sparsity problem. Adding trust links improves the quality of recommendations, benefits cold-start users who most need it, and is robust w.r.t. spamming via engineered profiles (shilling attacks). 15

16 Our Research Propose a model of trust based on Partially ordered discrete values (with emphasis on relative magnitude) Local but realistic semantics Distinguishes functional and referral trust Distinguishes direct and inferred trust Prefers direct information over conflicting inferred information Represents ambiguity explicitly HOLY GRAIL: Direct Semantics in favor of Indirect Translations

Essential concepts Trust Scope: Context, Action, … Functional Trust: Agent a1 trusts agent a2’s ability in some context or for doing something Referral Trust: Agent a1 trusts agent a2’s ability to recommend another agent in some context or for doing something Trust is a relationship among agents/users, while belief is a relationship between agents/users and statements 17

Semantics : Interpretation Four valued logic {inconsistent information, true, false, no information} Trust / Distrust 4-valued “binary” function among users Belief / Disbelief: 4-valued “binary” function on users and statements
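The four truth values above can be illustrated with a small sketch. This is not the authors' formalization (which appears on later slides that did not survive as text); it only shows one common way, assumed here, of deriving the four values from the presence or absence of supporting and refuting evidence. The names `Trust4` and `combine` are hypothetical.

```python
from enum import Enum


class Trust4(Enum):
    """The four values named on the slide."""
    NO_INFO = "no information"
    TRUE = "true"
    FALSE = "false"
    INCONSISTENT = "inconsistent information"


def combine(has_support: bool, has_refutation: bool) -> Trust4:
    """Map evidence for and against a trust (or belief) link to a four-valued result."""
    if has_support and has_refutation:
        return Trust4.INCONSISTENT  # conflicting evidence is represented explicitly
    if has_support:
        return Trust4.TRUE
    if has_refutation:
        return Trust4.FALSE
    return Trust4.NO_INFO           # nothing known either way
```

Keeping "inconsistent" and "no information" as separate values is what lets a model represent ambiguity explicitly instead of forcing every pair of agents onto a single numeric scale.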

Example: Trust Network - Different Trust Links and Local Ordering on Trust Links Alice trusts Bob for recommending good car mechanic. Bob trusts Dick to be a good car mechanic. Charlie does not trust Dick to be a good car mechanic. Alice trusts Bob more than Charlie, w.r.t. car mechanic context. Alice trusts Charlie more than Bob, w.r.t. baby sitter context. 19

Formalization of Semantics : Basis for Trust Computation Algorithm 20

Formalization Approach Given a trust network (Nodes, Edges with Trust Scopes, Local Orderings), specify when a source agent can trust, distrust, or be ambiguous about another target agent, reflecting: Functional and referral trust links Direct and inferred trust Locality 21

22

23 Similarly for Evidence in support of Negative Functional Trust.

24

Quote summarizing potential bug: The whole problem with the world is that fools and fanatics are always so certain of themselves, but wiser people so full of doubts. --- Bertrand Russell 25

Possible Future Extensions Trust links with trust-scoped exceptions Straddles two extremes involving just trust links and just trust-scoped links Trust values annotated with trust path length, target neighborhood summary, etc. Other forms of trust links formalized using upper ontology 26

Trust in Sensor Networks 27

Sensor Networks Approaches to Trust Reputation-based Trust Based on past behavior Policy-based Trust Based on explicitly stated constraints Evidence-based Trust Based on seeking/verifying evidence 28

Probabilistic basis for reputation-based trust in a Sensor Node Sensor Reputation and Sensor Observation Credibility determined using outlier detection algorithm aggregating results over time Homogeneous sensor networks can exploit spatio-temporal locality and redundancy for this purpose Heterogeneous sensor networks require complex domain models for this purpose 29
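The slide does not name the outlier detection algorithm, so the following is only a sketch under an assumed choice: in a homogeneous network, readings taken at about the same place and time are compared against their median, and a reading is flagged when it deviates by more than a multiple of the median absolute deviation. The function name `flag_outliers` and the threshold of 3.0 are assumptions for illustration.

```python
from statistics import median


def flag_outliers(readings, threshold=3.0):
    """Return {sensor_id: True if the reading looks like an outlier}.

    `readings` maps sensor ids to values observed at roughly the same
    place and time (the spatio-temporal redundancy mentioned above).
    """
    values = list(readings.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that a few bad sensors cannot inflate.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All typical readings agree exactly; anything different is suspicious.
        return {sid: v != center for sid, v in readings.items()}
    return {sid: abs(v - center) / mad > threshold for sid, v in readings.items()}
```

Each flag can then be counted as one erroneous (or correct) observation when the sensor's reputation is aggregated over time.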

(cont’d) Trust/reputation in a sensor node can be modeled as a beta probability distribution function with parameters (α, β) gleaned from the total number of correct (α − 1) and erroneous (β − 1) observations so far. 30

Motivation for using Beta PDF. Computational ease: retain/manipulate just two values (α, β); incremental update after checking whether new data is normal or an outlier. Intuitively satisfactory: initialization not necessary (flat PDF); PDF variation sufficiently expressive, that is, it assimilates updates and large numbers of observations satisfactorily. 31
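A minimal sketch of the two-value bookkeeping described above. It assumes the usual uniform-prior convention, in which both parameters start at 1 (the flat PDF) and each checked observation increments exactly one of them. The class name `BetaReputation` and its methods are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    alpha: float = 1.0  # assumed: 1 + number of correct (normal) observations
    beta: float = 1.0   # assumed: 1 + number of erroneous (outlier) observations

    def update(self, is_outlier: bool) -> None:
        """Incremental update after a new observation has been checked."""
        if is_outlier:
            self.beta += 1
        else:
            self.alpha += 1

    def expected_trust(self) -> float:
        """Mean of the beta distribution, used here as the trust estimate."""
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        """Variance of the beta distribution; it shrinks as observations accumulate."""
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1))
```

With no observations the estimate is 0.5 with maximal spread; after three normal readings and one outlier the parameters are (4, 2), the estimate rises to 2/3, and the variance drops.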

Next few slides shed light on beta probability distribution function (1) Mathematical formulation (2) Graphs for intuitive understanding of its role 32

Role of Beta probability distribution function 33 x is a probability, so it ranges from 0 to 1. If the prior distribution of p is uniform, then the beta distribution gives the posterior distribution of p after observing α − 1 occurrences of an event with probability p and β − 1 occurrences of the complementary event with probability (1 − p).
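For reference, the standard textbook form of the beta density and its first two moments is given below; the formula on the original slide did not survive extraction, so this is the general definition rather than a reproduction of the slide.

```latex
f(x;\alpha,\beta) \;=\; \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\; x^{\alpha-1}(1-x)^{\beta-1},
\qquad 0 \le x \le 1,\quad \alpha,\beta > 0

\mathrm{E}[x] \;=\; \frac{\alpha}{\alpha+\beta},
\qquad
\mathrm{Var}[x] \;=\; \frac{\alpha\beta}{(\alpha+\beta)^{2}\,(\alpha+\beta+1)}
```

Setting α = β = 1 gives the flat density f(x) = 1, and the variance shrinks as α + β grows, so the curve narrows as more observations accumulate.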

34 α = β, so the PDFs are symmetric about 0.5. Note that the graphs get narrower as (α + β) increases.

35 α ≠ β, so the PDFs are asymmetric about 0.5. Note that the graphs get narrower as (α + β) increases.

Advantages: Robust w.r.t. attacks. Bad-mouthing attack (e-commerce analogy: sellers collude with buyers to give bad ratings to other sellers). Ballot-stuffing attack (e-commerce analogy: sellers collude with buyers to receive unfairly good ratings). Sleeper attack (an apparently trusted agent defects). 36

Trust in Tweets 37

Twitter Large network of people Large number of tweets Tweet – 140 character description of an event Problem: How to organize tweets? 38

Exploiting trust information Rank tweets according to trust information Trust in the user who tweets Belief (trust) in the tweet 39

Trust in the person who tweets Popularity of the user Based on count of followers Reputation of the user Based on history of making informed observations Enrich using Pagerank Analogy? Highly trusted followers count more than lowly trusted followers
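The PageRank analogy is posed on the slide as a question, so the following is only an illustrative sketch of what it could mean: a plain power-iteration PageRank over the follower graph, in which a follow from a highly ranked user contributes more than a follow from a lowly ranked one. The function name `follower_rank`, the damping factor 0.85, and the iteration count are assumptions.

```python
def follower_rank(follows, damping=0.85, iterations=50):
    """PageRank-style score per user.

    `follows` maps each user to the set of users they follow, so every
    follow edge passes part of the follower's score to the followed user.
    """
    users = set(follows) | {u for targets in follows.values() for u in targets}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for user in users:
            targets = follows.get(user, set())
            if targets:
                # Split this user's score evenly among the users they follow.
                share = damping * rank[user] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A user who follows nobody spreads their score uniformly.
                share = damping * rank[user] / n
                for target in users:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

In a toy graph where `d` and `f` each have exactly one follower, `d` still scores higher when its follower is itself followed by others, which is the behaviour the slide describes.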

Belief (Trust) in a tweet Belief in a tweet depends on the trust in the user who generates it. Belief in a tweet depends on the content of “similar” tweets (originating from approximately the same location around the same time)

Trust in Linked Open Data 42

Linked Data The Linking Open Data (LOD) project is a community-led effort to create openly accessible, and interlinked, RDF Data on the Web. RDF: Resource Description Framework – graph- based representation language 43

Linked Data 44

Exploiting trust for access and standardization Trust in the creator of the data, and belief (trust) in the data How well connected is the data? Rank LOD according to trust information 45

Sensor Data on LOD MesoWest weather data in US ~20,000+ Sensor Systems ~1 billion Observational Assertions Sensors linked with Geonames on LOD 46

Trust in Active Perception 47

Active Perception Perception is the process of observing, hypothesis generation, and verification 48

Evidence-based Trust Observations (and hypotheses) are more trusted if they can be verified through empirical evidence Sensors are more trusted if their observations are trusted 49

Evidence-based Trust 50 Strengthened Trust Trust

Additional uses of active perception in sensors context Determining actionable intelligence by narrowing set of explanations to one Enable use of a “small” set of always on sensors to bootstrap and selectively turn-on additional sensors in a resource (e.g., power) constrained environment 51

52 References Krishnaprasad Thirunarayan, Dharan Althuru, Cory Henson, and Amit Sheth, “A Local Qualitative Approach to Referral and Functional Trust,” The 4th Indian International Conference on Artificial Intelligence (IICAI-09), December Cory Henson, Joshua Pschorr, Amit Sheth, and Krishnaprasad Thirunarayan, “SemSOS: Semantic Sensor Observation Service,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Sensor Web Enablement (SWE2009), Baltimore, Maryland, Krishnaprasad Thirunarayan, Cory Henson, and Amit Sheth, “Situation Awareness via Abductive Reasoning from Semantic Sensor Data: A Preliminary Report,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Collaborative Trusted Sensing, Baltimore, Maryland, 2009.

53 References A. Sheth and M. Nagarajan, “Semantics-Empowered Social Computing,” IEEE Internet Computing, Jan/Feb 2009. Amit Sheth, Cory Henson, and Satya Sahoo, “Semantic Sensor Web,” IEEE Internet Computing, vol. 12, no. 4, July/August 2008, p

Machine and Citizen Sensor Data Demos Illustrate semantic web and information retrieval techniques -- spatio-temporal-thematic ontologies, mash-ups, machine and citizen sensor data analytics 54

55 High-level Sensor Low-level Sensor How do we check if the three images depict … the same time and same place? same entity? a serious threat? Motivating Scenario: Spatio-temporal-thematic analytics

Semantic Observation Service: Overall Architecture and Details

SemSOS Demo ci/application_domain/sem_sensor/co ry/demos/ssos_demo/ssos_demo.htm Twitris Demo

Situation Awareness : Analysis Situation Awareness Components Physical World: Sensor Data Perception: Entity Metadata Comprehension: Relationship Metadata Semantic Analysis How is the data represented? Sensor Web Enablement What are the sources of the data? Provenance Analysis What objects/events account for the data? Abductive Reasoning Where did the event occur? Spatial Analysis When did the event occur? Temporal Analysis What is the significance of the event? Thematic Analysis What are the reasons for inconsistency? Abductive Reasoning

59 A Unified Approach to Retrieving Web Documents and Semantic Web Data Trivikram Immaneni* and Krishnaprasad Thirunarayan Department of Computer Science and Engineering Wright State University Dayton, OH-45435, USA *Currently at: Technorati, San Francisco

60 Outline Goal (What?) Background and Motivation (Why?) Unified Web Model (What?) Query Language and Examples (What?) Implementation Details (How?) Evaluation and Applications (Why?) Conclusions

61 Goal

62 Integrate HTML Web and Semantic Web by establishing and exploiting connections between them => Unified Web Model Design and implement a language to retrieve data and documents from the Unified Web => Hybrid Query Language Implement the system using mature software components for indexing and search => SITAR

63 Background and Motivation

64 HTML Web Hyperlinked Web of documents Content human comprehensible Search engines and web browsers search, retrieve, navigate, and display information Keyword-based searches have low precision and high recall

65 Semantic Web Standards-based labeled graph of resources and binary properties (data) Content machine accessible Database techniques adapted to store and retrieve Semantic Web data Query formulation by lay users difficult but results are precise XML, RDF, SPARQL, Web Services, etc.

66 Shoehorning HTML Web into Semantic Web Document = Data node + Content as string in RDF graph Regular expressions in SPARQL used to retrieve documents. Drawbacks that IR tries to overcome Ease of query formulation: Keyword-based Dealing with Large datasets: Ranking

67 Formalizing HTML Web as Semantic Web: Manual (re-)authoring of (legacy) documents using Semantic Web technologies is neither feasible nor advisable. State-of-the-art NLP and information extraction techniques are inadequate. Informal description is indispensable for human comprehension. Escape route: traceability via superposition (e.g., RDFa).

68 Shoehorning Semantic Web into HTML Web Currently, Semantic Web documents live on the HTML Web but their components are neither accessible nor reasoned with via keyword-based searches Swoogle attempts to rank Semantic Web documents

69 Unified Web Model (What?)

70 Aim Unified Web integrates the two Webs to enable improved hybrid retrieval of data and documents. Unified Web Model Hybrid Query Language

71 Unified Web Model Graph Node Abstract entity identified by its URI Blank/Literal node names automatically generated Home URI Section URI index words Document Section (optional) Outgoing Links Section Triples Section …

72 (cont’d) Relationships (Edges) hasDocument Relates Node to content string hyperlinksTo Relates Node with another node to which the former node’s document contains a hyperlink Asserts Relates Node with each RDF statement in the document linksTo Relates Node with another node to which the former node’s document contains a hyperlink, or such that the former node’s document contains a triple with the latter node
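A minimal data-structure sketch of the node and its four relationships as described above. The class and field names are hypothetical, and treating every term of an asserted triple (subject, predicate, and object) as a linked node is an assumed reading of "contains a triple with the latter node".

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedWebNode:
    uri: str                                           # abstract entity identified by its URI
    document: str = ""                                 # hasDocument: the content string
    hyperlinks_to: set = field(default_factory=set)    # hyperlinksTo: URIs hyperlinked from the document
    asserts: list = field(default_factory=list)        # asserts: (subject, predicate, object) triples

    def links_to(self) -> set:
        """linksTo: hyperlinked nodes plus every node mentioned in an asserted triple."""
        linked = set(self.hyperlinks_to)
        for subject, predicate, obj in self.asserts:
            linked.update((subject, predicate, obj))
        return linked
```

Deriving `links_to` from the other two relationships, instead of storing it, keeps hyperlinks and triples on an equal footing as evidence that one node refers to another.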

73 Example of Unified Web Model Document contains the RDF fragment: … …

74

75 Data Retrieval from Unified Web: The Unified Web Model can be specified using RDF, in terms of rdfs:Resource, rdf:Property, rdf:Statement, rdfs:Literal, rdf:subject, rdf:predicate, rdf:object, etc. The Unified Web is a reified Semantic Web (user triples). SPARQL is usable as the query language.

76 Information Retrieval from Unified Web Node can be indexed using URI index words Based on name, content, label, triples, etc Node can be ranked using its phrasal / URI-based annotations and its node neighborhood

77 Advantages Semantic Web nodes can be retrieved using (associated) keywords Legacy document recall improved by interpreting hyperlink as Semantic Markup for reasoning. Hyperlink: Triple:

78 Semantics-rich URIs (such as those from dictionary.com) in legacy documents can be incrementally equated with ontologies Document: … Jaguar God of the Underworld … Ontology: …... owl:sameAs

79 Query Language and Examples (What?)

80 Aim Store and retrieve Semantic Web data, and use information in documents to enhance data retrieval Enable use of keywords to deal with lack of complete URI information Peter affiliated-with ?X Enable use of partial information about data being searched Student :: Peter affiliated-with ?X

81 Store and retrieve documents, and use information in the Semantic Web to enhance document retrieval Docsearch( :: Maya God)

82 Sample Queries Wordset queries: -> retrieves all URIs indexed by BOTH peter AND haase Includes document and URIs URIs are indexed by words. The words are obtained by analyzing URIs, from label literals, and anchor text of the URIs.
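The wordset query can be pictured as an intersection over an inverted index from words to URIs. The sketch below is not SITAR's implementation (which the slides say is built on Apache Lucene); it is a plain-Python illustration in which the names `tokenize` and `WordsetIndex`, the camelCase splitting rule, and the example.org URIs are all assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a URI, label, or anchor text into lowercase index words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # break camelCase: PeterHaase -> Peter Haase
    return {w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w}


class WordsetIndex:
    def __init__(self):
        self._index = defaultdict(set)  # word -> set of URIs indexed by that word

    def add(self, uri, *texts):
        """Index a URI by words from the URI itself and from its labels or anchor texts."""
        for text in (uri, *texts):
            for word in tokenize(text):
                self._index[word].add(uri)

    def query(self, *words):
        """Return the URIs indexed by ALL of the given words."""
        sets = [self._index.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()
```

Querying with the words peter and haase then returns only URIs indexed by both, while a single-word query for peter returns every URI that any Peter is attached to.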

83 Wordset Pair queries: :: -> specifies that user is looking for peter, the phd student Transitive closure

84 More Queries Get Peter the Phd student’s home page: getBindings ( [ :: ?x] ) Get Peter Haase’s publications that have “Semantic” in their title: getBindings([ ?x] [?x ]) Get group 1 element which is white in color getBindings( [?x ] [?x ] )

85 Homepages of PhD students named Peter that “talk about” Semantic Grid getDocsByBindingsAndContent ( [ :: ?x] “semantic grid” ) getLinkingNodes ( karlsruhe.de/Personen/viewPerson?id_db=2023 ) getAssertingNodes ([ ?x]). getDocsByIndexOrContent (peter haase)

86 Implementation Details (How?) SITAR : Semantic InformaTion Analysis and Retrieval system

87 Tools Used Apache Lucene 2.0 APIs in Java A high-performance, text search engine library with smart indexing strategies. Cyberneko HTML Parser Jena ARP RDF parser

88 Evaluation and Application (Why?)

89 Experiments DATASETs: AIFB SEAL data. The crawler collected 1665 files (English XHTML pages and RDF/OWL pages). Of these, 610 RDF files and 845 XHTML files were successfully parsed and indexed. A total of triples were parsed and indexed

90 Datasets (cont’d) TAP dataset Periodic table Lehigh University BenchMarks

91 Conclusions

92 Developed a Hybrid Query Language for data and document retrieval that is convenient because it is keyword-based; that can be accurate and flexible because disambiguation information can be provided; that is expressive because it can support inheritance reasoning; and that is pragmatic because it can work with legacy documents. FUTURE WORK: Robust ranking strategy.

93 References T. Immaneni and K. Thirunarayan, A Unified approach To Retrieving Web Documents and Semantic Web Data, In: Proceedings of the 4th European Semantic Web Conference (ESWC 2007), LNCS 4519, pp , June T. Immaneni, and K. Thirunarayan, Hybrid Retrieval from the Unified Web, In: Proceedings of the 22nd Annual ACM Symposium on Applied Computing (ACM SAC 2007), pp , March 2007.

THANK YOU! 94