Research Overview 23 August 2006

1 Research Overview 23 August 2006

2 UMBC and Ebiquity
UMBC is a research-extensive university with a major focus on information technology.
Ebiquity is a large and active research group with the goal of “Building intelligent systems in open, heterogeneous, dynamic, distributed environments”.
Current research includes mobile and pervasive computing, security/trust/privacy, semantic web, multiagent systems, advanced databases, and high performance computing.
5/14/2019


4 People and funding
Faculty: Finin, Yesha, Joshi, Peng, Halem
Colleagues: Oates, desJardins, Pinkston, Segall, …
Students: ~10 PhD, ~10 MS, ~5 undergrad
Funding
  Current: DARPA (Trauma Pod, STTRs), NSF (two ITRs, Cybertrust, NSG, …), Intelligence community, NASA, NIST, Industry (IBM, Fujitsu, …)
  Recent: DARPA (CoABS, GENOA II, DAML), NSF (CAREER)

5 Ebiquity Research Space
Diagram regions: Intelligent Information Systems; Networking & Systems; Security
Topics shown: KR, information extraction, user modeling, semantic web, machine learning, IR, AI, data mining, web services/SOC, DB, knowledge management, wearable computing, mobility, policies, HPCC, assurance, wireless, trust, context awareness, DRM, pervasive computing, intrusion detection, privacy

6 Ebiquity Research Space
Building intelligent systems in open, heterogeneous, dynamic, distributed environments
Diagram regions: Intelligent Information Systems; Networking & Systems; Security
Topics shown: language technology, robotics, HCI, planning, KR, user modeling, semantic web, data mining, machine learning, AI, DB, knowledge management, web services, IR, service oriented computing, wearable computing, policies, wireless, mobility, assurance, context awareness, pervasive computing, intrusion detection, privacy, trust

7 Some Current and Recent Projects
Pervasive and mobile computing
  (1) Trauma Pod
  (2) Context aware pervasive computing
  (3) Mogatu: Tivo for mobile computing
Semantic Web
  (4) Swoogle: searching and indexing Semantic Web data
  (5) Semnews: text understanding and extraction
  (6) Agents and the Semantic Web
  (7) Spire: Semantic Web for data discovery and integration
Security and trust
  (8) Semantic policy languages
  (9) Semdis: Discovering Semantic Links
  (10) Securing ad hoc networks
  (11) Privacy for passive RFID tags
Information extraction and retrieval
  (12) Recognizing spam weblogs
  (13) Extracting opinions from weblogs
  (14) Modeling the Spread of Influence on the Blogosphere

8 Semantic Web
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
-- Berners-Lee, Hendler and Lassila, The Semantic Web, Scientific American, 2001

9 OntoRank for RDF documents
Swoogle
Doc. & Term Ranking
  OntoRank for RDF documents
  TermRank for RDF terms (TermRank: ranking semantic web terms)
Architecture diagram labels: Search, Analysis, Index, Discovery; IR Indexer, SWD Indexer, SWD classifier, Ranking; Search Services, Web Service Server; Semantic Web metadata, document cache, Candidate URLs; Bounded Web Crawler, Google Crawler, SwoogleBot; human, machine, html, rdf/xml; Semantic Web Archive, Ontology Dictionary, Sem. Web Archive, Swoogle Triple Shop
Swoogle Statistics (July 2006)
  SW docs.    1.6M     classes            1.3M
  embedded    400K     properties         175K
  triples     300M     individuals        43.1M
  ontologies  10K      registered users   417
5/14/2019 Filip Perich
Contributors: Tim Finin, Li Ding, Rong Pan, Pavan Reddivari, Pranam Kolari, Akshay Java, Anupam Joshi, Yun Peng, R. Scott Cost, Jim Mayfield, Joel Sachs, and Drew Ogle.
Partial research support was provided by DARPA contract F and by NSF awards NSF-ITR-IIS and NSF-ITR-IDM. April 2005.
Semantic Web Navigation

10 Semantic Prototypes in Research Ecoinformatics
Ebiquity • CAIN • RMBL • Mindswap

ELVIS: Ecosystem Location Visualization and Information System
What are likely predators and prey of an invader in a new environment?
  Species List Constructor: Click a county, get a species list.
  Food Web Constructor: Predict food web links using database and taxonomic reasoning.
  Evidence Provider: Examine evidence for predicted links.
Predictions: In a new estuary, Nile tilapia (Oreochromis niloticus) could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.

Web Ontologies: For intelligent agents (for humans, for machines)
  SpireEcoConcepts: Food web database concepts and results of Food Web Constructor.
  ETHAN (Evolutionary Trees and Natural History): For online scientific databases about species and higher taxonomic levels.

TripleShop: A SPARQL Workshop for Triples
Use the semantic web to find and share body masses of fish that eat fish.
  Query: Enter a SPARQL query.
  Create a dataset: Find semantic web docs that can answer query.
  Get results: Apply query to dataset with semantic reasoning.
  Export results for further analysis (e.g. in Food Web Constructor).
  Name, tag, and share with group members or public.
Example query:
  PREFIX rdf: <
  PREFIX rdfs: <
  PREFIX spec: <
  PREFIX ethan: <
  PREFIX kw: <
  SELECT DISTINCT ?predator ?prey ?preymaxmass ?predatormaxmass
  WHERE {
    ?link rdf:type spec:ConfirmedFoodWebLink .
    ?link spec:predator ?predator .
    ?link spec:prey ?prey .
    ?predator rdfs:subClassOf ethan:Actinopterygii .
    ?prey rdfs:subClassOf ethan:Actinopterygii .
    OPTIONAL { ?prey kw:mass_kg_high ?preymaxmass } .
    OPTIONAL { ?predator kw:mass_kg_high ?predatormaxmass }
  }
Example data sources: Esox_lucius.owl; webs_publisher.php? published_study=11; Actinopterygii.owl; UMich Animal Diversity Web

11 SEMDIS
On Homeland Security and the Semantic Web: a Provenance and Trust Aware Inference Framework
Semantic Association Discovery and Evaluation

Panel 1: Motivation
Semantic association between X and Bin Laden
Provenance
  Multiple sources contribute unique fragments of association
  Multiple sources confirm a fragment in different belief states
Rank: only some discovered associations are interesting
Trust: some information sources are not sufficiently trustworthy

Panel 2: Architecture
Collaborative implementation
University of Georgia
  Extracting knowledge from the Web
  Discovering complex semantic association
  Ranking semantic association by content
UMBC
  Tracking provenance of semantic association
  Trusting semantic association by context
  Enabling best-first search using trust heuristics

12 SEMDIS
On Homeland Security and the Semantic Web: a Provenance and Trust Aware Inference Framework
Semantic Association Discovery and Evaluation

Panel 3: Provenance
Provenance of an RDF graph or sub-graph
Three sources of an RDF graph G:
  where-provenance: the web documents that serialize G
  whom-provenance: the person who created/published G
  why-provenance: the RDF graphs which logically imply G
RDF graph provenance service
  Observations
    provenance information is part of context information
    provenance is not required for most inference tasks
    provenance is useful for context-based trust analysis
    provenance can be used to group knowledge
  Approach
    provide a stand-alone service that queries provenance of a given RDF graph or sub-graph

Panel 4: Trust
Trustworthiness of an RDF graph
The hypothesis “Mr X is associated with Bin Laden” is proved by a four-triple semantic association (SA). How should the SA’s trustworthiness be evaluated?
  S1: eg:MrX eg:isPresidentOf eg:companyA
  S2: eg:organizationB eg:invests eg:companyA
  S3: eg:organizationB eg:isOwnedBy eg:MrY
  S4: eg:MrY eg:relatesTo eg:BinLaden
Trust relation between agents helps propagate belief states
  case 1 (belief concatenation): exactly one source per triple
  case 2 (belief aggregation): multiple sources for a triple
  case 3 (social dependency): sources are dependent through social network
We assume all triples are semantically independent.
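The first two trust cases on this slide can be illustrated numerically. The following is a minimal sketch, not the SEMDIS implementation: it assumes each source attaches a belief value in [0, 1] to a triple, that belief concatenation multiplies beliefs along the chain (relying on the slide's independence assumption), and that belief aggregation combines several sources for one triple with a noisy-OR. The function names and the belief values are hypothetical, and case 3 (socially dependent sources) is not modeled.

```python
from math import prod

def aggregate(beliefs):
    """Case 2, assumed noisy-OR: the triple holds unless every source is wrong."""
    return 1.0 - prod(1.0 - b for b in beliefs)

def concatenate(triple_beliefs):
    """Case 1, assumed product: every triple in the chain must hold."""
    return prod(triple_beliefs)

def association_trust(sources_per_triple):
    """Trust in a semantic association, given a list of source beliefs per triple."""
    return concatenate(aggregate(bs) for bs in sources_per_triple)

# Hypothetical beliefs for the four triples S1..S4; S2 is confirmed by two sources.
sa = [[0.9], [0.5, 0.5], [0.8], [1.0]]
trust = association_trust(sa)  # 0.9 * 0.75 * 0.8 * 1.0, about 0.54
```

Under these assumptions a second confirming source raises the belief in S2 from 0.5 to 0.75, while the weakest link still bounds the trust in the whole association.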

13 SemNews: A Semantic News Framework
Provides an RDF version of the news.
Diagram labels: WWW; Web of documents (Text, Images, Audio, video; Natural Language); Web of data (Ontologies, Instances, triples; RDF/OWL); Facts from NL; Semantic Web; NLP Tools
Intelligent agents need knowledge and information.
The majority of content on the Web remains in natural language.
The SW can benefit NLP tools in their language understanding task.

14 SemNews: A Semantic News Framework
Semanticizing RSS
Pipeline diagram labels: Semantic RSS, Data Aggregators, News Feeds, RSS Aggregator; OntoSem, TMRs, FR, Language Processing; OntoSem2OWL; Dekade Editor, Knowledge Editor Environment; Semantic Web Tools; OntoSem Ontology (OWL), TMR, Inferred Triples; Fact Repository Interface, Ontology & Instance browser, Text Search, RDQL Query, Swoogle Index

Features
  Browsing Facts: View structured representation of RSS news stories. Browse facts, not just news.
  Ontologically linked News: Find news stories by browsing through the OntoSem ontology.
  Semantic Queries (RDQL): Structured queries over text converted to RDF representation.
  Tracking Named Entities: Find stories about a specific named entity.
  Semantic Alerts: Alerts can be specified as ontological concepts/keywords/RDQL queries. Subscribe to the results as an RSS feed.

Ontological Semantics: OntoSem to OWL
Diagram labels: NL Text, OntoSem, OWL Ontology, Lexicon, OntoSem2OWL, Fact Repository, TMR, TMRs in OWL
OntoSem is an NLP system that processes text and converts it into facts. It is supported by a constructed world model encoded in a rich ontology. Its Ontology has > concepts with an average of 16 properties/concept.
OntoSem2OWL is a rule-based conversion engine that maps the frame-based OntoSem ontology and fact repository to OWL. Over triples generated.

15 Analyzing Weblogs
Blogs are an important new communication technology
  Social networking + user generated content
  On the Blogosphere or in an Intranet (e.g., Sun, Intel)
Modeling and understanding blog-based systems
  Extracting and computing metadata
  Topic modeling and splog detection
  Community recognition; influence and information flow
  Opinion extraction (TREC 06)
More to come: fact extraction, recognizing bias, trend mapping, event detection, monitoring for “surprises”
If the Web is our common “brain”, the Blogosphere is its consciousness

16 Detecting Spam Blogs: A Machine Learning Approach
What are Spam Blogs (Splogs)?
  Blogs hosting machine-generated posts, each adding to web spam
  Posts have content hijacked from other blogs and/or stuffed keywords
  Posts with links interspersed between random text
  Blogs with context-based ads to fool users into clicking ads (See 3)
1 A Case of Content Plagiarism
  (1) Original Post by Ebiquity
  (2) Infiltration in Search Results
  (3) A splog result
2 Why are Splogs a Problem?
  Splogs undermine ranking algorithms (See 6)
  Splogs water down search results (See 6)
  Splogs threaten the Web advertising model (See 3)
  Splogs indulge in “plagiarism” (See 1,2,3)
  Splogs skew results of social research tools (See 4)
  Splogs stress the Blogosphere infrastructure (See 4,5)
3 $197 “Holy Grail Of Advertising... “ “Easy Dominate Any Market, Any Search Engine, Any Keyword”
This Work
  Formalizes the Splog Detection Problem
  Supervised Machine Learning Technique
  Training set of 1400 hand-labeled examples
  Effectiveness of Specialized features
  Local Models for Fast Splog Detection
  Global link-based models good for (delayed) detection
  Precision/Recall of 87% for bag-of-words
Blog Features: We, what, was, my, org, flickr, paper, words, me, thank, go, archives
Splog Features: Find, info, news, website, best, articles, perfect, Products, uncategorized, hot, Resources, inc, copyright
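To make the bag-of-words idea concrete, here is a minimal sketch that turns a post into word counts and scores it with hand-set weights of +1 for the slide's splog words and -1 for its blog words. This stands in for the trained classifier and is not the authors' model: the weights, the threshold, and the function names are assumptions; only the two word lists come from the slide.

```python
import re
from collections import Counter

# Indicative words copied from the slide (lowercased). Weights are assumed, not learned.
BLOG_WORDS = {"we", "what", "was", "my", "org", "flickr", "paper",
              "words", "me", "thank", "go", "archives"}
SPLOG_WORDS = {"find", "info", "news", "website", "best", "articles", "perfect",
               "products", "uncategorized", "hot", "resources", "inc", "copyright"}

def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def splog_score(text):
    """Splog-word occurrences minus blog-word occurrences."""
    bag = bag_of_words(text)
    splog_hits = sum(n for w, n in bag.items() if w in SPLOG_WORDS)
    blog_hits = sum(n for w, n in bag.items() if w in BLOG_WORDS)
    return splog_hits - blog_hits

def looks_like_splog(text, threshold=0):
    """Hypothetical decision rule: flag posts whose score exceeds the threshold."""
    return splog_score(text) > threshold
```

A supervised learner would replace the fixed weights with values fitted to the hand-labeled training set; the feature extraction step would stay the same.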

17 Feeds That Matter
General Statistics
  83,204 publicly listed subscribers.
  2,786,687 feeds, of which 496,879 are unique.
  26,2436 users (35%) use folders to organize subscriptions.
  Data collected in May 2006.
Applications
  Feed Recommendations: Two feeds are similar if they are categorized under similar folder names.
  Finding Influential Feeds for a Topic
Figure captions
  The chart shows the feed recommendations and corresponding text-based cosine similarity.
  The number of subscribers per feed follows a power law distribution.
  The distribution of domains in the Bloglines dataset.
  Leading blogs on the topic “Politics”. The seed set is the top blogs in “politics” from Bloglines, and the blog graph used is from the Blogpulse dataset.
  The number of folders per user. Most users tend to use a modest number of folders.
  Scatter plot showing the relation between the number of folders and the number of feeds subscribed. As more feeds are subscribed, users tend to organize feeds into folders.

18 Feeds That Matter
Tag Cloud Before Merge: Tag cloud generated by using the folder names as labels (top 200 folder names).
Tag Cloud After Merge: Tag cloud generated by merging related folders (top 200 folder names).
Folder names are used as a substitute for topics.
A lower-ranked folder is merged into a higher-ranked folder if there is an overlap and high cosine similarity.
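The merge rule on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each folder is represented as a hypothetical mapping from feed to subscription count, folders arrive sorted from highest to lowest rank, and the 0.5 threshold is an arbitrary choice. A nonzero cosine already implies that the two folders share at least one feed, so overlap is not checked separately.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def merge_folders(folders, threshold=0.5):
    """folders: list of (name, {feed: count}), sorted from highest to lowest rank."""
    merged = []  # each entry is [name, vector]
    for name, vec in folders:
        for kept in merged:
            if cosine(kept[1], vec) >= threshold:
                # Fold the lower-ranked folder into the higher-ranked one.
                for feed, n in vec.items():
                    kept[1][feed] = kept[1].get(feed, 0) + n
                break
        else:
            merged.append([name, dict(vec)])  # copy so the input is not mutated
    return [(name, vec) for name, vec in merged]

# Hypothetical folders: "political" overlaps strongly with "politics".
folders = [
    ("politics", {"feedA": 3, "feedB": 2}),
    ("political", {"feedA": 1, "feedB": 1}),
    ("tech", {"feedC": 4}),
]
result = merge_folders(folders)
```

In this toy input "political" is absorbed into "politics" and "tech" stays separate, which mirrors how the merged tag cloud collapses near-duplicate folder names.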

19 Opinion extraction from Blogs
Data: 3M posts from 83K blogs over two weeks
Task: given a topic (e.g., March of the Penguins), find blog posts that express an opinion about it.
Some features of our system:
  Clean data by removing splogs (~12% of posts) and non-content (e.g., ads, headers, footers, blogrolls)
  Use Google to find opinion words relevant to the topic and the induced topic category (e.g., heavy=bad for digital cameras)
  Multiple hand-crafted heuristic scoring functions to measure opinionatedness given the topic phrase
  Train an SVM to learn appropriate scorer weights

20 Security and Trust in Open Environments
Many new information systems are open, heterogeneous and dynamic
  Examples: the web, web services, P2P systems, Grid computing, pervasive computing, MANETs, etc.
Providing security and privacy in such systems is challenging
  We cannot rely on traditional authentication-based schemes
  Recognizing “bad actors” in such systems is hard
We are exploring new approaches using computational policies, trust and reputation.

21 Trust & Security for the Semantic Web
Autonomous agents need policies as “norms of behavior”
  In OS, networking, data management, applications, multiagent systems, pervasive environments, etc.
  Especially to secure complex open, distributed, dynamic environments
Traditional “hard coded” rules like DB access control & file permissions depending on known entities won’t work!
  Trust associations based on attributes are needed
Interesting issues abound, like how to
  Resolve conflicts among agents governed by multiple policies
  Enforce policies via sanctions, reputation, escalation, etc.
  Modify policies dynamically according to context
  Make policy engineering easier than software engineering
An early policy for agents:
  1 A robot may not injure a human being, or, through inaction, allow a human being to come to harm.
  2 A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3 A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

22 Rei Policy Language
Developed several versions of Rei, a policy specification language, encoded in (1) Prolog, (2) RDFS, (3) OWL
Used to model different kinds of policies
  Authorization for services
  Privacy in pervasive computing and the web
  Conversations between agents
  Team formation, collaboration & maintenance
The OWL grounding enables policies that reason over SW descriptions of actions, agents, targets and context

23 Rei Policy Language
Developed several versions of the Rei policy specification language in Prolog, RDFS, & OWL
Used to model different kinds of policies
  Authorization for services
  Privacy in pervasive computing and the web
  Conversations between agents
  Team formation, collaboration & maintenance
The OWL grounding enables policies that reason over SW descriptions of actions, agents, targets and context
Architecture diagram labels: XSB, FLORA, YAJXB, USER, JAVA API, FOWL, REI, REI INTERFACE
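As a rough illustration of attribute-based authorization with conflict resolution, the sketch below evaluates permit and prohibit rules against a requester's attributes. It is written in plain Python, not in Rei syntax, and does not reproduce the Rei engine: the `Rule` class, the `decide` function, the rule that prohibitions override permissions, the deny-by-default fallback, and the printer example are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attrs = Dict[str, str]

@dataclass
class Rule:
    modality: str                        # "permit" or "prohibit"
    action: str                          # action the rule talks about
    condition: Callable[[Attrs], bool]   # test over the requester's attributes

def decide(rules: List[Rule], action: str, requester: Attrs) -> str:
    """Return "allow" or "deny" for a requested action."""
    matched = {r.modality for r in rules
               if r.action == action and r.condition(requester)}
    if "prohibit" in matched:
        return "deny"    # assumed meta-policy: prohibitions override permissions
    if "permit" in matched:
        return "allow"
    return "deny"        # assumed default when no rule applies

# Hypothetical policy: lab members may print, visitors may not.
rules = [
    Rule("permit", "use_printer", lambda a: a.get("affiliation") == "lab_member"),
    Rule("prohibit", "use_printer", lambda a: a.get("role") == "visitor"),
]
```

The decision depends only on attributes of the requester, not on a list of known identities, which is the property the preceding slides argue open environments need.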

24 Applications – past, present & future
  Coordinating access in supply chain management system
  Authorization policies in a pervasive computing environment
  Policies for team formation, collaboration, information flow in multi-agent systems
  Security in semantic web services
  Privacy and trust on the Internet
  Privacy in pervasive computing environments
Timeline: 1999, 2002, 2003, …, 2004, …

25 Enhancing Web Privacy via Policies and Trust
Motivation
Diagram labels: W3C specified P3P Architecture; P3P Compliance (compliant: Trust website policies; Non-compliant: Distrust website policies); Consumer Confidence
Key Points
  Web sites optionally publish P3P policies
  Clients specify privacy preferences using a policy language, for instance Rei
  Privacy Expert enables privacy enhancement by binding together the entities of the system
  Rei Engine evaluates policies of users against website attributes
  Website Recommender Network propagates and builds a model of websites based on reputation
  FOAF enables the creation of the website recommender network
Website Evaluation Ontology
  Properties: serviceType, URI, popularity, hasP3P, hasTextPolicy, hasPrivacyCertifier, subDomainOf, isBasedOutOf, hasPolicyEnforcement, lawEnforcedBy, policySimilarTo, owner
  Example values: DiscussionGroup, 9, --, Yes, US, USA, OSDN
Architecture diagram labels: Web Server, P3P Policy, Website Recommender Network, Ontologies, Trust rules, Personal agents, XSLT Transformer, Rei Engine, Privacy Expert, Intelligent Privacy Proxy*, FOAF, Rei Privacy Policy (RDF based, enhancements over APPEL), Trusted Agent Network#, Clients; edges: publish (optionally), publish

26 Securing Ad-Hoc Networks

27 Monitoring and Response
Active Response Framework
  Nodes snoop locally
  Send signed accusations to other nodes
  Each node makes decisions locally based on policy
  Accusations can be corroborated and lead to an increase in reputation
  False accusations can be flagged and lead to loss of reputation (or even sanctions)
  Nodes can choose not to communicate through suspected nodes
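The bookkeeping behind this framework can be sketched as a per-node reputation table. This is a minimal sketch under assumptions, not the project's protocol: the `Node` class, the unit increments, the routing threshold, and the choice to lower the accused node's score on a corroborated accusation are all hypothetical, and signature verification of accusations is omitted.

```python
from collections import defaultdict

class Node:
    def __init__(self, name, threshold=-2):
        self.name = name
        self.threshold = threshold          # assumed local policy parameter
        self.reputation = defaultdict(int)  # this node's local view of other nodes

    def receive_accusation(self, accuser, accused, corroborated):
        """Update local reputations after an accusation has been checked."""
        if corroborated:
            self.reputation[accuser] += 1   # corroborated accusers gain reputation
            self.reputation[accused] -= 1   # assumption: the accused loses reputation
        else:
            self.reputation[accuser] -= 1   # flagged false accusation costs reputation

    def will_route_through(self, other):
        """Local policy: avoid nodes whose reputation fell to the threshold or below."""
        return self.reputation[other] > self.threshold

# Hypothetical run: node "a" hears two corroborated accusations against "c".
a = Node("a")
a.receive_accusation("b", "c", corroborated=True)
a.receive_accusation("b", "c", corroborated=True)
a.receive_accusation("d", "b", corroborated=False)
```

Each node keeps its own table and applies its own threshold, so the decision not to communicate through a suspected node stays local, as the slide describes.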

28 SWANS: Secure and Adaptive WSNs
A holistic policy-driven approach to designing secure and adaptive wireless sensor networks
Secure self-organization
  Centralized and distributed protocols
State determination
  Parameters to define “raw” state
  Node-level logical construct to identify complete state
  Network-level logical construct to help identify global state
A set of policies to adapt to changes in state

29 SWANS: Secure and Adaptive WSN

30 ORs will be data rich
Diagram labels: Drugs (RFID), Tools (RFID), CAST (RFID), Patient Monitors, Staff
ORs will be awash in low-level data, much of it noisy or incomplete
Challenges include coping with the noise and interpreting the low-level data to recognize high-level events and activities

31 System Architecture
Diagram labels:
  Patient Monitor: Physiological Data
  Stream Processor (TelegraphCQ): Continuous Queries
  Trend Analyzer
  RFID System: Medicines, Tools, Staff (Assert facts)
  Database: Patient History, Medical Supplies, Staff (Assert facts)
  Context Aware Agent
  Rule Base
  Video Clipper
  Events
  Medical Encounter Record


