Network of Epidemiology Digital Objects Naren Sundar, Kui Xu Client: Sandeep Gupta, S.M. Shamimul CS6604 Class Project.

Slides:



Advertisements
Similar presentations
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Advertisements

CS 267: Automated Verification Lecture 10: Nested Depth First Search, Counter- Example Generation Revisited, Bit-State Hashing, On-The-Fly Model Checking.
CS171 Introduction to Computer Science II Graphs Strike Back.
More on Rankings. Query-independent LAR Have an a-priori ordering of the web pages Q: Set of pages that contain the keywords in the query q Present the.
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. 1 The Architecture of a Large-Scale Web Search and Query Engine.
Distributed Search over the Hidden Web Hierarchical Database Sampling and Selection Panagiotis G. Ipeirotis Luis Gravano Computer Science Department Columbia.
1 CS 430 / INFO 430: Information Retrieval Lecture 16 Web Search 2.
Crawling the WEB Representation and Management of Data on the Internet.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
1 MARG-DARSHAK: A Scrapbook on Web Search engines allow the users to enter keywords relating to a topic and retrieve information about internet sites (URLs)
CS 206 Introduction to Computer Science II 03 / 30 / 2009 Instructor: Michael Eckmann.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
WEB SCIENCE: SEARCHING THE WEB. Basic Terms Search engine Software that finds information on the Internet or World Wide Web Web crawler An automated program.
Enhancing Internet Search Engines to Achieve Concept- based Retrieval F. Lu, T. Johnsten, V. Raghavan, and D. Traylor.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
Data Analysis in YouTube. Introduction Social network + a video sharing media – Potential environment to propagate an influence. Friendship network and.
WAES 3308 Numerical Methods for AI
Rate-based Data Propagation in Sensor Networks Gurdip Singh and Sandeep Pujar Computing and Information Sciences Sanjoy Das Electrical and Computer Engineering.
Using Hyperlink structure information for web search.
Web Categorization Crawler Mohammed Agabaria Adam Shobash Supervisor: Victor Kulikov Winter 2009/10 Design & Architecture Dec
Basics of Information Retrieval Lillian N. Cassel Some of these slides are taken or adapted from Source:
Query Routing in Peer-to-Peer Web Search Engine Speaker: Pavel Serdyukov Supervisors: Gerhard Weikum Christian Zimmer Matthias Bender International Max.
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
A Hybrid Approach for Searching in the Semantic Web Guide: Dr. S. N. Sivanandam Dept of Computer science & Engg P.Raja 07MW06 Final Yr ME-Software Engg.
 What is a graph? What is a graph?  Directed vs. undirected graphs Directed vs. undirected graphs  Trees vs graphs Trees vs graphs  Terminology: Degree.
COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.
Understanding Crowds’ Migration on the Web Yong Wang Komal Pal Aleksandar Kuzmanovic Northwestern University
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
Book: Bayesian Networks : A practical guide to applications Paper-authors: Luis M. de Campos, Juan M. Fernandez-Luna, Juan F. Huete, Carlos Martine, Alfonso.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.
Search Engine Architecture
PEERSPECTIVE.MPI-SWS.ORG ALAN MISLOVE KRISHNA P. GUMMADI PETER DRUSCHEL BY RAGHURAM KRISHNAMACHARI Exploiting Social Networks for Internet Search.
Search Engines Reyhaneh Salkhi Outline What is a search engine? How do search engines work? Which search engines are most useful and efficient? How can.
Autumn Web Information retrieval (Web IR) Handout #1:Web characteristics Ali Mohammad Zareh Bidoki ECE Department, Yazd University
Freelib: A Self-sustainable Digital Library for Education Community Ashraf Amrou, Kurt Maly, Mohammad Zubair Computer Science Dept., Old Dominion University.
Algorithmic Detection of Semantic Similarity WWW 2005.
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Post-Ranking query suggestion by diversifying search Chao Wang.
1/16/20161 Introduction to Graphs Advanced Programming Concepts/Data Structures Ananda Gunawardena.
ASSIST: Adaptive Social Support for Information Space Traversal Jill Freyne and Rosta Farzan.
Keyword Searching and Browsing in Databases using BANKS Charuta Nakhe, Arvind Hulgeri, Gaurav Bhalotia, Soumen Chakrabarti, S. Sudarshan Presented by Sushanth.
Liaoruo Wang and John E. Hopcroft Dept. of Computer Engineering & Computer Science, Cornell University In Proc. 7th Annual Conference on Theory and Applications.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
1 Discovering Web Communities in the Blogspace Ying Zhou, Joseph Davis (HICSS 2007)
Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.
Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.
CS 440 Database Management Systems Web Data Management 1.
Search Engine and Optimization 1. Introduction to Web Search Engines 2.
CS 540 Database Management Systems Web Data Management some slides are due to Kevin Chang 1.
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
GUILLOU Frederic. Outline Introduction Motivations The basic recommendation system First phase : semantic similarities Second phase : communities Application.
Crawling When the Google visit your website for the purpose of tracking, Google does this with help of machine, known as web crawler, spider, Google bot,
Enhancing Internet Search Engines to Achieve Concept-based Retrieval
Search Engine Architecture
Submitted By: Usha MIT-876-2K11 M.Tech(3rd Sem) Information Technology
Information Retrieval
CS & CS Capstone Project & Software Development Project
CS 440 Database Management Systems
Keyword Searching and Browsing in Databases using BANKS
Panagiotis G. Ipeirotis Luis Gravano
Search Engine Architecture
Navigation-Aided Retrieval
Junghoo “John” Cho UCLA
3.2 Graph Traversal.
Presentation transcript:

Network of Epidemiology Digital Objects Naren Sundar, Kui Xu Client: Sandeep Gupta, S.M. Shamimul CS6604 Class Project

Outline Problem Statement Requirement Specification Related work What a human would do? Modeling Evaluation

Client Problem Statement CINET – Computational and analytic environment for network science research and education. Goal: A RDF graph building service – Web crawling for contents related to epidemiology – RDF representation to interconnect digital objects

Client Problem Statement Requirement: Build a connected network of: – Papers – Wiki pages – Websites – Videos – Other digital objects pertaining to epidemiology Representing using RDF for future graph analysis.

Refined Problem Statement Strongly connected digital objects & metadata – Include metadata to model the connection – Better represent the underlying correlation among digital objects Given a search request, provide resulting DO network: – Include all related digital objects – Strongly connected DOs are more related

Refined Problem Statement Automated process – Web exploration – Network construction Restricted digital objects – Research papers Web crawling (search engine from dblp)

Meta-data for DO Authors, key words, publisher, year Given papers {Paper1} {Paper2} {Paper3} – P1: author1, author2, publisher 1, 2011, {keywords} – P2: author2, author3, publisher 2, 2012, {keywords} – P3: author 3, publisher 2, 2011, {keywords} The resulting network through the meta-data: author1 author2 author3 Paper2 Paper1 Paper3 publisher1publisher2

Requirement Specification Undirected crawling or normal search engine not sufficient: – Results un-organized – Little specialization – Ambiguity – Relations and connections not available

Requirement Specification What’s availableWhat we want

Related Work Web crawling topics – Building efficient, robust and scalable crawler – Traversal order of the web graph – Re-visitation of previously crawled content – Avoid problematic and undesirable content – Crawling “deep web” content Christopher Olston and Marc Najork Web Crawling. Found. Trends Inf. Retr. 4, 3 (March 2010),

Why can a human do this better? A user is interested in “Disease propagation” 1.Set N = {Disease propagation} 2.Search using N and gets R 3.Splits R into A and B 4.Selects A as relevant; extracts terms and connections; augments N 5.Ignores or stashes away B 6.Repeat from 1 with updated network N

Desired Property 1: Connected Growth A relevant paper shares nodes and edges in the network N Not all of the paper becomes relevant at once What is shared changes over time Author1 Keyword1Abstract1Coauthor1Keyword2

Desired Property 2: Grouping Only a small fraction of a document is shared with network N The rest is used to group papers Paper1 Paper2 Paper3 Paper4 Paper5

Modeling: Digital Object Is represented as a set of edges – D i = {(x 1,y 1 ),…, (x n,y n )} Connections are determined by predicates over kind of vertex – (x,y): Paper p, authored by x, is published in y From a generative point-of-view, an edge in a paper comes from – A shared network – Or, a local network

Modeling: Split-View of Digital Object A paper’s edges can be split into two parts – D i = S i + E i – Shared set: S i – Local set: E i EiEi SiSi SiSi DiDi

Modeling: Grouping Each D i belongs to a group Groups are determined w.r.t. unshared edges E i No need to determine number of groups beforehand E i in G k SiSi SiSi DiDi

Modeling: Network Connectivity – Allow (x,y) only if x is reachable from root nodes Smoothness – Allow (x,y) only if path connecting to x from root has decreasing weights Edge is either shared or local based on satsifying connectivity and smoothness constraints when shared.

The Entire Process Model 1.Start with root nodes, e.g., “Disease propagation” in N 2.Generate search query from N 3.Crawl and add new digital objects 4.Learn random variables for sharing (Si, Ei) and grouping (Gi) 5.Goto step 2 Human 1.Set N = {Disease propagation} 2.Search using N and gets R 3.Splits R into A and B 4.Selects A as relevant; extracts terms and connections; augments N 5.Ignores or stashes away B 6.Repeat from 1 with updated network N

Evaluation: Against a Search Engine Goal: Relevance Set a starting topic Perform k related queries – Let R i be the sets of query results ranked by the search engine – Learn network N – For each paper p selected by N Get best rank of p according to R i – Compare ranks against relevance determined by N Are there lowly ranked results that are higher in N and vice versa?

Evaluation: Against a Digital Library Goal: Completeness Pick a digital library for digital objects with a categorical browsing service Pick a starting point – Learn network N – Evaluate completeness of network N against the digital library

Thank You! Questions?