Human Wayfinding in Information Networks

Presentation transcript:

Human Wayfinding in Information Networks. Presented by Ori Yaish. Jure Leskovec, Computer Science Department, Stanford University, jure@cs.stanford.edu; Robert West, Computer Science Department, Stanford University, west@cs.stanford.edu

Navigation: finding a path between two nodes, from a “start” node to a “target” node, using only local information. Local information: seeing only the current and previously visited nodes, as well as their direct neighbors.

What do you think is a good online resource for understanding how humans navigate?

Why Wikipedia? 1. Hyperlinks. 2. It represents human knowledge. (Speaker notes: the links help us navigate; because the content is human knowledge, it leaves more room for intuition.)

Present work Wikispeedia: we use the online human-computation game Wikispeedia, in which players are given two random articles and aim to navigate from one to the other by clicking as few hyperlinks as possible. http://cs.mcgill.ca/~rwest/wikispeedia/ (Speaker notes: show an example game, from DIK-DIK to ALBERT EINSTEIN.)

Contributions: 1) Providing important insights into the methods used by information seekers and into human efficiency. 2) Predicting what piece of information the information seeker is trying to locate.

EFFICIENCY OF HUMAN SEARCH Black: shortest possible paths. Blue: effective human paths (ignoring back clicks). Red: complete human paths (including back clicks). (Speaker notes: explain the graph and every color in detail. Black is the shortest path; red is the human path with back clicks counted; blue is the human path with back clicks not counted. Point out that human paths show variance, and that humans nevertheless appear efficient. Ask: why the variance?)

What causes the high variance? Difficulty of the mission? Individual skill?

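And the answer is… both. (Speaker notes: the graph shows both. 1) Every game still shows variance even though all of them have a shortest path length (SPL) of 3, which points to individual skill. 2) Green versus blue shows that the green game is harder even though both have an SPL of 3, which points to mission difficulty.)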

The second question: Why is human search so efficient on average?

What about drop-outs? 54% of all games in the data set were canceled before finishing. The probability of giving up at every step is around 10%. (Speaker notes: explain that because players drop out, and the number of drop-outs grows as the path gets longer, a correction is needed.)
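To make the need for a correction concrete, here is a minimal sketch. It assumes a constant 10% chance of giving up at every step (the figure quoted on the slide) and uses simple inverse-survival reweighting; this reweighting is an illustrative assumption, not necessarily the correction the authors applied, and the counts are made up.

```python
def survival_probability(steps, quit_prob=0.10):
    """Probability that a player has not given up after `steps` clicks,
    assuming an independent, constant chance of quitting at each step."""
    return (1.0 - quit_prob) ** steps


def dropout_corrected_counts(finished_counts, quit_prob=0.10):
    """Reweight counts of finished games by inverse survival probability.

    finished_counts maps path length -> number of finished games of that
    length. Longer games are under-represented among finished games, so
    they receive a larger weight. (Illustrative assumption only.)
    """
    return {
        length: count / survival_probability(length, quit_prob)
        for length, count in finished_counts.items()
    }


# Made-up counts, for illustration only.
observed = {3: 100, 5: 100, 10: 100}
corrected = dropout_corrected_counts(observed)
for length in sorted(corrected):
    print(length, round(corrected[length], 1))
```

Under this assumption only about 35% of players would still be playing after 10 clicks, which is why long paths are the ones most affected by the correction.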

Effect of correcting for drop-outs Green: drop-out corrected. The effect on the mode and median is small. (Speaker notes: even after correcting for drop-outs, efficiency is still good.)

EFFICIENCY OF HUMAN SEARCH Another explanation: the Wikipedia graph is efficiently navigable for humans because they have an intuition about what links to expect.

It’s hard to formalize a human node-distance measure… (Speaker notes: but we need to formalize that intuition. What method could we use to compute it?)

TF-IDF (a reminder) We define the similarity of two articles as the cosine of their TF-IDF vectors and the distance as (1-similarity).

TF-IDF (a reminder) Term frequency tf(t, d): the number of times that term t occurs in document d. Inverse document frequency idf(t, D): a measure of how much information the term provides, that is, whether the term is common or rare across all documents.

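TF-IDF (a reminder) Then TF-IDF is calculated as tfidf(t, d, D) = tf(t, d) · idf(t, D), where, in the standard formulation, idf(t, D) = log(N / |{d ∈ D : t ∈ d}|) and N is the number of documents in the corpus D.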
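The TF-IDF distance described in this reminder can be sketched in a few lines. This is a minimal illustration using raw term counts and an unsmoothed logarithmic idf; the tokenized toy documents are hypothetical, and the exact weighting variant used in the paper is not stated in the slides.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    docs is a list of token lists. tf is the raw count of a term in a
    document; idf is log(N / number of documents containing the term).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_distance(u, v):
    """Distance as defined on the slide: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical toy "articles", already tokenized.
docs = [
    "dik dik small antelope africa".split(),
    "antelope mammal africa savanna".split(),
    "einstein physicist relativity theory".split(),
]
vecs = tfidf_vectors(docs)
print(tfidf_distance(vecs[0], vecs[1]))  # shares terms, so below 1
print(tfidf_distance(vecs[0], vecs[2]))  # no shared terms
```

Documents that share rare terms end up close together, while documents with no terms in common are at the maximum distance of 1.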

Elements of human wayfinding The authors investigate how some key features change as games progress from the start towards the target article. List of features: 1) Average degree of an article at every step. 2) Lucrative degree: the number of outgoing links that decrease the SPL (shortest path length) to the target. 3) TF-IDF distance. … (Speaker notes: average degree indicates how central the node is; lucrative degree indicates how many of its links are significant, that is, bring us closer to the target.)

Elements of human wayfinding 1) Average degree. (Speaker notes: already at the first step, the average degree rises to 80-100 across all games.)

Elements of human wayfinding 2) Lucrative degree: the number of outgoing links that decrease the SPL (shortest path length) to the target.
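The lucrative degree can be computed directly from this definition. The sketch below assumes the graph is a dict mapping each article to a list of its outgoing links; the toy graph and the function names are hypothetical.

```python
from collections import deque


def spl_to_target(graph, target):
    """Shortest path length from every node to `target`.

    Runs a breadth-first search from the target over reversed edges, so
    one pass yields the distance of every node that can reach the target.
    """
    reverse = {}
    for source, neighbors in graph.items():
        for neighbor in neighbors:
            reverse.setdefault(neighbor, []).append(source)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for predecessor in reverse.get(node, []):
            if predecessor not in dist:
                dist[predecessor] = dist[node] + 1
                queue.append(predecessor)
    return dist


def lucrative_degree(graph, node, target):
    """Number of outgoing links of `node` that decrease the SPL to target."""
    dist = spl_to_target(graph, target)
    if node not in dist:
        return 0  # the target is unreachable from this node
    return sum(
        1 for neighbor in graph.get(node, [])
        if neighbor in dist and dist[neighbor] < dist[node]
    )


# Hypothetical toy graph: A links to B, C and D; only B and C lead to T.
graph = {"A": ["B", "C", "D"], "B": ["T"], "C": ["T"], "D": ["A"], "T": []}
print(lucrative_degree(graph, "A", "T"))
```

In the toy graph, A has degree 3 but only two of its links move the player closer to T, which is the gap between plain degree and lucrative degree.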

Elements of human wayfinding 3) TF-IDF distance. (Speaker notes: similarity to the target is higher toward the end of the path.)

Elements of human wayfinding So what are the most important factors in human wayfinding in Wikipedia?

Elements of human wayfinding Most important factors: 1) Lucrative and average degree: in the beginning, when finding a good hub is important. 2) Similarity: later, when homing in on the target is important.

Endgame strategy Full category and top-level category (C). Example: the full category of DIK-DIK is SCIENCE/BIOLOGY/MAMMALS, and C(DIK-DIK) = SCIENCE. The endgame strategy corresponding to an endgame (u[n−2], u[n−1], u[n]), the last three articles of a path, is defined as (C(u[n−2]), C(u[n−1]), C(u[n])).
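As a small illustration of this definition, the sketch below maps the last three articles of a path to their top-level categories. Only the DIK-DIK category comes from the slide; the other articles, their categories, and the function names are made up for the example.

```python
def top_level_category(full_category):
    """C(article): the first component of a slash-separated full category."""
    return full_category.split("/")[0]


def endgame_strategy(path, full_categories):
    """Top-level categories of the last three articles on a path."""
    if len(path) < 3:
        raise ValueError("an endgame needs at least three articles")
    return tuple(top_level_category(full_categories[a]) for a in path[-3:])


# Only the DIK-DIK entry is taken from the slide; the rest is hypothetical.
full_categories = {
    "AFRICA": "GEOGRAPHY/AFRICA",
    "ANTELOPE": "SCIENCE/BIOLOGY/MAMMALS",
    "DIK-DIK": "SCIENCE/BIOLOGY/MAMMALS",
}
path = ["ALBERT EINSTEIN", "AFRICA", "ANTELOPE", "DIK-DIK"]
print(endgame_strategy(path, full_categories))
```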

Endgame strategy There are two approaches. ‘Simple’ strategy: people tend to approach the target through articles from the same category as the target. ‘Multi-category’ strategy: people tend to approach the target through articles from a category different from the target's. Which is more efficient? (Speaker notes: ask the lecturer.)

Endgame strategy Overhead = (l−l∗)/l∗, where l is the length of the human path and l∗ is the shortest path length. From left to right: PEOPLE, MUSIC, IT, LANGUAGE AND LITERATURE, HISTORY, SCIENCE, RELIGION, DESIGN AND TECHNOLOGY, CITIZENSHIP, ART, BUSINESS STUDIES, MATHEMATICS, EVERYDAY LIFE, GEOGRAPHY.
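A short worked example of the overhead formula, assuming l is the number of clicks in the human path and l∗ the shortest possible number of clicks for the same start and target (the function name is hypothetical):

```python
def overhead(human_length, shortest_length):
    """Relative overhead (l - l*) / l* of a human path over the optimum."""
    if shortest_length <= 0:
        raise ValueError("shortest path length must be positive")
    return (human_length - shortest_length) / shortest_length


print(overhead(3, 3))  # an optimal game has zero overhead
print(overhead(6, 3))  # twice as many clicks as needed
```

An overhead of 1.0 therefore means the player needed twice the optimal number of clicks.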

Endgame strategy - Conclusion Reaching the target by the ‘simple’ strategy is safer and conceptually simpler, but can prolong the game. It often pays off to think outside the box.

Target prediction

Target prediction Designing a learning algorithm for predicting an information seeker’s target, given only a prefix q of a few clicks.

Target prediction Designing a learning algorithm for predicting an information seeker’s target t, given only a prefix q of k−1 clicks. We define the likelihood of t given the prefix q as P(q|t;Θ).

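Human Markov model The most likely target is t∗ = argmax over t of P(q|t;Θ). The likelihood is obtained by multiplying the local click probabilities: for a prefix q = (u[1], …, u[k]), P(q|t;Θ) = ∏ P(u[i+1] | u[i]; t, Θ) for i = 1, …, k−1.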
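The Markov model on this slide can be sketched as follows: score each candidate target by the product of local click probabilities along the prefix, and pick the best. The sketch sums logarithms to avoid numerical underflow; `click_prob` stands in for any learned click model, and the lookup table of probabilities is entirely made up.

```python
import math


def log_likelihood(prefix, target, click_prob):
    """log P(q | t): sum of log local click probabilities along the prefix.

    click_prob(current, next_article, target) returns the probability of
    clicking from `current` to `next_article` when heading for `target`.
    """
    total = 0.0
    for current, next_article in zip(prefix, prefix[1:]):
        p = click_prob(current, next_article, target)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def most_likely_target(prefix, candidates, click_prob):
    """Candidate target that maximizes the likelihood of the prefix."""
    return max(candidates, key=lambda t: log_likelihood(prefix, t, click_prob))


# Made-up click probabilities, keyed by (current, next, target).
toy_table = {
    ("A", "B", "T1"): 0.6, ("B", "C", "T1"): 0.5,
    ("A", "B", "T2"): 0.2, ("B", "C", "T2"): 0.3,
}


def toy_click_prob(current, next_article, target):
    return toy_table.get((current, next_article, target), 0.0)


prefix = ["A", "B", "C"]
print(most_likely_target(prefix, ["T1", "T2"], toy_click_prob))
```

With these toy numbers the prefix has likelihood 0.6 × 0.5 = 0.30 under T1 and 0.2 × 0.3 = 0.06 under T2, so T1 is predicted.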

Target prediction Two models of click probability: 1) Binomial logistic model. 2) Multinomial logistic model. (Speaker notes: talk about the features; point out that the denominator is the sum over all neighbors of u[i]; discuss the difference between the two equations with the lecturer.)
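The equations for the two models did not survive in this transcript. As a hedged illustration of the multinomial case only, the sketch below uses a generic softmax in which the denominator sums over all neighbors of the current article, as the speaker notes describe. The feature function, the weights, and the toy values are hypothetical and are not the paper's actual feature set.

```python
import math


def multinomial_click_probs(neighbors, target, features, weights):
    """Softmax click probabilities over the neighbors of the current article.

    features(neighbor, target) returns a list of feature values; the score
    of a neighbor is the dot product of that list with `weights`.
    Assumes `neighbors` is non-empty.
    """
    scores = {
        v: sum(w * x for w, x in zip(weights, features(v, target)))
        for v in neighbors
    }
    max_score = max(scores.values())  # subtracted for numerical stability
    exp_scores = {v: math.exp(s - max_score) for v, s in scores.items()}
    total = sum(exp_scores.values())  # denominator: sum over all neighbors
    return {v: e / total for v, e in exp_scores.items()}


# Hypothetical features per neighbor: [similarity to target, is_hub flag].
toy_features = {"B": [0.9, 1.0], "C": [0.1, 1.0], "D": [0.5, 0.0]}


def features(neighbor, target):
    return toy_features[neighbor]  # the toy features ignore the target


probs = multinomial_click_probs(["B", "C", "D"], "T", features, [2.0, 0.5])
print(probs)
```

Because the probabilities are normalized over the neighbors, they sum to 1 for every current article, which is what lets them be multiplied along a path.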

Target prediction Features for learning: Local features:

Target prediction Bold: multinomial. Thin: binomial. Dashed: TF-IDF. Performance of the target prediction algorithms: 1. Given a prefix q and a choice of two targets. (Speaker notes: the left graph is a choice between two articles, one correct and one wrong. The right graph is a choice among all articles that have the same degree. Discuss "cumulative" with the lecturer.)

Target prediction Bold: multinomial. Thin: binomial. Dashed: TF-IDF. Performance of the target prediction algorithms: 2. Given a prefix q and a set of articles. (Speaker notes: the left graph is a choice between two articles, one correct and one wrong. The right graph is a choice among all articles that have the same degree. Discuss "cumulative" with the lecturer.)

Conclusions The work studies more than 30,000 goal-directed human search paths and identifies aggregate strategies people use when navigating information spaces. In the opening of a game it is common to navigate through hubs; later, navigation is guided by similarity to the target. The work also builds a predictive model of human wayfinding that can be applied towards intelligent browsing interfaces.