Lada Adamic, HP Labs, Palo Alto, CA. Talk outline Information flow through blogs Information flow through email Search through email networks Search within.

Presentation transcript:

Lada Adamic, HP Labs, Palo Alto, CA

Talk outline: Information flow through blogs. Information flow through email. Search through email networks. Search within the enterprise. Search in an online community.

Implicit Structure and Dynamics of BlogSpace Eytan Adar, Li Zhang, Lada Adamic, & Rajan Lukose Blog use: –Record real-world and virtual experiences –Note and discuss things “seen” on the net Blog structure: blog-to-blog linking Use + Structure –Great to track “memes” (catchy ideas)

Approaches and uses of blog analysis Patterns of information flow –How does the popularity of a topic evolve over time? –Who is getting information from whom? Ranking algorithms that take advantage of transmission patterns

Tracking popularity over time: Blogdex, BlogPulse, etc. track the most popular links/phrases of the day. [Figure: popularity vs. time, showing the Slashdot effect and the BoingBoing effect]

Different kinds of information have different popularity profiles. [Figure: % of hits received on each day since first appearance. Panels: products, etc.; major-news site (editorial content), back of the paper; Slashdot postings; front-page news]

Micro example: Giant Microbes

Microscale Dynamics: What do we need to track specific info ‘epidemics’? –Timings –Underlying network [Figure: blogs b1, b2, b3 with times of infection t0, t1]

Microscale Dynamics: Challenges –Root may be unknown –Multiple possible paths –Uncrawled space, alternate media (email, voice) –No links [Figure: blogs b1, b2, b3, ..., bn with times of infection t0, t1 and unknown (?) sources]

Microscale Dynamics: who is getting info from whom. Explicit blog-to-blog links (easy) –Via links are even better. Implicit/inferred transfer (harder) –Use ML algorithm for link inference problem: Support Vector Machine (SVM), Logistic Regression –What we can use: full text, blogs in common, links in common, history of infection

Visualization: Zoomgraph tool –Using GraphViz (by AT&T) layouts. Simple algorithm –If single, explicit link exists, draw it –Otherwise use ML algorithm: pick the most likely explicit link; pick the most likely possible link. Tool lets you zoom around space, control threshold, link types, etc.

Giant Microbes epidemic visualization [Figure legend: blog, via link, explicit link, inferred link]

iRank: Find early sources of good information using inferred information paths or timing. [Figure: blogs b1, b2, b3, b4, b5, ..., bn, with a true source and a popular site marked]

iRank Algorithm: Draw a weighted edge for all pairs of blogs that cite the same URL; higher weight for mentions closer together; run PageRank; control for ‘spam’. [Figure: times of infection t0, t1]
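The iRank steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the edge direction (later citer points to earlier citer), the weight `1 / (1 + time gap)`, the damping factor, and the function name `irank` are all assumptions, and the spam control step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def irank(citations, damping=0.85, iterations=50):
    """citations maps url -> list of (blog, time). Returns {blog: score}.

    Assumed conventions (not from the slides): an edge points from the
    later citer to the earlier one, weighted by 1 / (1 + time gap).
    """
    blogs = set()
    weights = defaultdict(float)  # (later_blog, earlier_blog) -> weight
    for mentions in citations.values():
        blogs.update(blog for blog, _ in mentions)
        for (b1, t1), (b2, t2) in combinations(mentions, 2):
            if b1 == b2 or t1 == t2:
                continue  # no direction can be inferred
            early, late = (b1, b2) if t1 < t2 else (b2, b1)
            weights[(late, early)] += 1.0 / (1.0 + abs(t1 - t2))
    if not blogs:
        return {}

    out_total = defaultdict(float)
    for (src, _), w in weights.items():
        out_total[src] += w

    # Weighted PageRank by power iteration; rank of blogs with no
    # outgoing edges is spread uniformly over all blogs.
    n = len(blogs)
    rank = {b: 1.0 / n for b in blogs}
    for _ in range(iterations):
        dangling = sum(rank[b] for b in blogs if out_total[b] == 0)
        new = {b: (1 - damping) / n + damping * dangling / n for b in blogs}
        for (src, dst), w in weights.items():
            new[dst] += damping * rank[src] * w / out_total[src]
        rank = new
    return rank


example = {
    "url1": [("a", 0), ("b", 1), ("c", 5)],
    "url2": [("a", 0), ("c", 2)],
}
scores = irank(example)
```

In the example, blog `a` always cites first, so rank flows toward it from `b` and `c`.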

Do Bloggers Kill Kittens? 02:00 AM Friday Mar. 05, 2004 PST Wired publishes: "Warning: Blogs Can Be Infectious." 7:25 AM Friday Mar. 05, 2004 PST Slashdot posts: "Bloggers' Plagiarism Scientifically Proven" 9:55 AM Friday Mar. 05, 2004 PST Metafilter announces: "A good amount of bloggers are outright thieves."

Information flow in social groups Fang Wu, Bernardo Huberman, Lada Adamic, Joshua Tyler

Spread of disease is affected by the underlying network [Figure: network with nodes labeled co-worker, mike, mom, college friend]

Spread of computer viruses is affected by the underlying network [Figure: network with nodes labeled co-worker, mike, mom, college friend]

Difference between information flow and disease/virus spread: Viruses (computer and otherwise) are shared indiscriminately (involuntarily). Information is passed selectively from one host to another based on knowledge of the recipient’s interests.

Spread of information is affected by its content, potential recipients, and network topology [Figure: network with nodes labeled co-worker, mike, mom, college friend]

homophily: individuals with like interests associate with one another [Figure: personal homepages at Stanford; axis: distance between personal homepages]

The Model: Decay in transmission probability as a function of the distance m between potential target and originating node: T(m) = (m+1)^(-β) T. Power-law implies slowest decay. [Figure: nodes at distance m=0, m=1, m=2 from the source]
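As a small worked example of the decay model, the function below evaluates the transmission probability at distance m from the originating node. The names `transmissibility`, `base_t` (the probability at distance 0), and `beta` (the decay exponent) are chosen here for illustration only.

```python
def transmissibility(m, base_t, beta):
    """Transmission probability at distance m: base_t * (m + 1) ** -beta."""
    if m < 0:
        raise ValueError("distance m must be non-negative")
    return base_t * (m + 1) ** (-beta)


# With base_t = 0.5 and beta = 1, the probability halves at distance 1
# and drops to a third of its original value at distance 2.
decay_profile = [transmissibility(m, 0.5, 1.0) for m in range(3)]
```

Setting `beta = 0` removes the decay entirely, which corresponds to indiscriminate, virus-like spread.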

Virus, information transmission on a scale free network. Degree distribution of all senders of email passing through the HP server. [Figure: P(k) vs. outdegree k]

Epidemics on scale free graphs: critical threshold. Epidemic if 1% (10^4) of nodes infected. [Figure: critical threshold for three settings of κ and β (legible values: β = 0; κ = 100, β = 0; κ = 100). Curves: Pastor-Satorras & Vespignani (2001), Newman (2002), Wu et al. (2004)]

Study of the spread of URLs and attachments: 40 participants (30 within HPL, 10 elsewhere in HP & other orgs). 6370 URLs and 3401 attachments cryptographically hashed. Question: How many recipients in our sample did each item reach? Caveats: messages are deleted (still, the median number of messages > 2000); non-uniform sample.

Only forwarded messages are counted [Figure labels: forwarded URLs, forwarded message]

Results: average = 1.1 for attachments, and 1.2 for URLs. [Figure annotations: short term expense control; ads at the bottom of hotmail & yahoo messages]

Simulate transmission on email log: each message has a probability p of transmitting information from an infected individual to the recipient. (I = internal node, E = external node)
02/19/2003  15:45:33  I-1   I-2
02/19/2003  15:45:33  I-1   I-3
02/19/2003  15:45:40  E-1   I-4
02/19/2003  15:45:52  I-5   E-2
02/19/2003  15:45:55  E-3   I-6
02/19/2003  15:45:58  I-7   I-8
02/19/2003  15:46:00  E-4   I-9
02/19/2003  15:46:05  I-10  I-11
02/19/2003  15:46:10  I-12  I-13
02/19/2003  15:46:10  I-12  I-14
02/19/2003  15:46:10  I-12  I-15
02/19/2003  15:46:14  I-16  E

Simulation of information transmission on the actual HP Labs graph: an individual is infected if they receive a particular piece of information; individuals remain infected for 24 hours; start by infecting one individual at random; every time an infected individual sends an email, they have a probability p of infecting the recipient; track epidemic over the course of a week, most run their course in 1-2 days
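The simulation rules on this slide can be sketched as a replay over a time-ordered message log. This is an illustrative sketch under stated assumptions, not the code used in the study: timestamps are taken to be seconds, the seed is infected at the time of the first logged message, and an individual is infected at most once. The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(log, seed_node, p, infectious_period=24 * 3600, rng=None):
    """Replay a message log and return {node: time of infection}.

    log: list of (timestamp_seconds, sender, recipient), sorted by time.
    Assumption: nodes are infected at most once and stay infectious
    for infectious_period seconds after infection.
    """
    rng = rng or random.Random()
    start = log[0][0] if log else 0
    infected_at = {seed_node: start}
    for timestamp, sender, recipient in log:
        if sender not in infected_at or recipient in infected_at:
            continue
        still_infectious = timestamp - infected_at[sender] <= infectious_period
        if still_infectious and rng.random() < p:
            infected_at[recipient] = timestamp
    return infected_at


toy_log = [
    (0, "I-1", "I-2"),
    (10, "I-2", "I-3"),
    (90000, "I-1", "I-4"),  # sent more than 24 hours after I-1 was infected
]
always = simulate(toy_log, "I-1", p=1.0)
never = simulate(toy_log, "I-1", p=0.0)
```

With `p = 1.0` the information reaches I-2 and I-3 but not I-4, because I-1 is no longer infectious when the last message is sent.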

Introduce a decay in the transmission probability based on the hierarchical distance [Figure: organizational hierarchy with individuals A and B, branches labeled distance 1, distance 2, distance 1; h_AB = 5]

7119 potential recipients [Figure axis: p0]

Conclusions on info flow in social groups: Information spread typically does not reach epidemic proportions. Information is passed on to individuals with matching properties. The likelihood that properties match decreases with distance from the source. Model gives a finite threshold. Results are consistent with observed URL & attachment frequencies in a sample. Simulations following real email patterns also consistent.

How to search in a small world. Milgram’s experiment: Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target. [Figure: map with NE and MA marked]

Small world experiment at Columbia: Dodds, Muhamad, Watts, Science 301 (2003). Email experiment with targets in 13 different countries. 24,163 message chains. 384 reached their targets. Average path length 4.0.

Why study small world phenomena? Curiosity: Why is the world small? How are people able to route messages? Social Networking as a Business: Friendster, Orkut, MySpace, LinkedIn, Spoke, VisiblePath

Six degrees of separation - to be expected. Pool and Kochen (1978) - average person has acquaintances. Ignoring clustering, other redundancy … ~10^3 first neighbors, 10^6 second neighbors, 10^9 third neighbors. But networks are clustered: my friends’ friends tend to be my friends. Watts & Strogatz (1998) - a few random links in an otherwise clustered graph give an average shortest path close to that of a random graph
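The back-of-the-envelope argument on this slide (each step multiplies the reach by the number of acquaintances, ignoring clustering) can be checked with a few lines of arithmetic. The function name `steps_to_cover` and the population figures used below are illustrative assumptions.

```python
def steps_to_cover(population, acquaintances):
    """Steps needed for acquaintances ** steps to reach the population,
    ignoring clustering and any other overlap between neighborhoods."""
    if acquaintances < 2:
        raise ValueError("need at least 2 acquaintances per person")
    reach, steps = 1, 0
    while reach < population:
        reach *= acquaintances
        steps += 1
    return steps


# About 10^3 acquaintances each: 10^3, 10^6, 10^9 people after 1, 2, 3 steps.
third_neighbors = steps_to_cover(10**9, 10**3)
```

Clustering makes the real reach per step smaller than this estimate, which is why the Watts & Strogatz result on random shortcuts matters.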

But how are people able to find short paths? How to choose among hundreds of acquaintances? Strategy: Simple greedy algorithm - each participant chooses correspondent who is closest to target with respect to the given property. Models: geography: Kleinberg (2000); hierarchical groups: Watts, Dodds, Newman (2001), Kleinberg (2001); high degree nodes: Adamic, Puniyani, Lukose, Huberman (2001), Newman (2003)

Spatial search, Kleinberg (2000): nodes are placed on a lattice and connect to nearest neighbors; additional links placed with f(d) ~ d(u,v)^(-r); if r = 2, can search in polylog (< (log N)^2) time. “The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain” S. Milgram, ‘The small world problem’, Psychology Today 1, 61 (1967)
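A minimal sketch of the lattice model and greedy routing described on this slide. It assumes a two-dimensional n-by-n grid with Manhattan distance and exactly one long-range link per node; those choices, and the names `build_long_range_links` and `greedy_route`, are illustrative rather than taken from the talk.

```python
import random


def manhattan(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def build_long_range_links(n, r, rng):
    """Give each node of an n x n grid one long-range contact, chosen
    with probability proportional to d(u, v) ** -r."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    links = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** (-r) for v in others]
        links[u] = rng.choices(others, weights=weights, k=1)[0]
    return links


def greedy_route(source, target, n, links):
    """Forward to whichever contact (grid neighbor or long-range link)
    is closest to the target; return the number of steps taken."""
    current, steps = source, 0
    while current != target:
        x, y = current
        contacts = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
        ]
        contacts.append(links[current])
        current = min(contacts, key=lambda v: manhattan(v, target))
        steps += 1
    return steps


rng = random.Random(0)
links = build_long_range_links(8, 2, rng)
steps = greedy_route((0, 0), (7, 7), 8, links)
```

Because some grid neighbor is always one step closer to the target, the greedy walk never needs more steps than the Manhattan distance between source and target; long-range links can only shorten it.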

Kleinberg: searching hierarchical structures. ‘Small-World Phenomena and the Dynamics of Information’, NIPS 14, 2001. Hierarchical network models: h is the distance between two individuals in hierarchy with branching b; f(h) ~ b^(-αh). If α = 1, can search in O(log n) steps. Group structure models: q = size of smallest group that two individuals belong to; f(q) ~ q^(-α). If α = 1, can achieve in O(log n) steps

Identity and search in social networks, Watts, Dodds, Newman (2001): individuals belong to hierarchically nested groups; multiple independent hierarchies coexist; p_ij ~ exp(-αx)

Identity and search in social networks, Watts, Dodds, Newman (2001): There is an attrition rate r. Network is ‘searchable’ if a fraction q of messages reach the target. [Figure: curves for several network sizes, including N = 204800]

High degree search: Who could introduce me to Richard Gere? Adamic et al., Phys. Rev. E (2001) [Figure: acquaintances Mary, Bob, Jane]
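A rough sketch of a high degree search: at each step the message moves to the highest-degree neighbor not yet visited, on the idea that well-connected people are more likely to know the target. The assumptions here are that a node recognizes the target among its own neighbors and that the walk backtracks when it runs out of unvisited neighbors; the function name and toy graph are hypothetical.

```python
def high_degree_search(graph, start, target):
    """graph: {node: set of neighbors}. Returns the number of steps
    taken to reach target, or None if it cannot be reached."""
    visited = {start}
    path = [start]
    steps = 0
    while path:
        current = path[-1]
        if current == target:
            return steps
        if target in graph[current]:
            return steps + 1  # the target is a direct acquaintance
        unvisited = [v for v in graph[current] if v not in visited]
        if unvisited:
            # Prefer the best-connected acquaintance; the name breaks ties.
            best = max(unvisited, key=lambda v: (len(graph[v]), str(v)))
            visited.add(best)
            path.append(best)
        else:
            path.pop()  # dead end: backtrack one step
        steps += 1
    return None


toy_graph = {
    "me": {"Mary", "Bob"},
    "Mary": {"me"},
    "Bob": {"me", "Jane", "Al"},
    "Jane": {"Bob", "Richard Gere"},
    "Al": {"Bob"},
    "Richard Gere": {"Jane"},
}
hops = high_degree_search(toy_graph, "me", "Richard Gere")
```

In the toy graph the search goes through Bob (three acquaintances) rather than Mary (one), then through Jane, who knows the target.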

[Figure: number of nodes found, power-law graph]

[Figure: number of nodes found, Poisson graph]

Scaling of search time with size of graph. Sharp cutoff at k ~ N^(1/τ); 2nd degree neighbors. [Figure: cover time for half the nodes vs. size of graph; random walk, α = 0.37 fit; degree sequence, α = 0.24 fit]

Testing the models on social networks (w/ Eytan Adar). Use a well defined network: HP Labs email correspondence over 3.5 months. Edges are between individuals who sent at least 6 messages each way. Node properties specified: degree, geographical location, position in organizational hierarchy. Can greedy strategies work?

Strategy 1: High degree search. Degree distribution of all senders of email passing through the HP server. [Figure axis: outdegree]

Filtered network (6 messages sent each way): 450 users, median degree = 10, mean degree = 13, average shortest path = 3. High degree search performance (poor): median # steps = 16, mean = 40. Degree distribution no longer power-law, but Poisson.

Strategy 2: Geography

Communication across corporate geography: 87% of the 4000 links are between individuals on the same floor. [Figure labels: 1U, 2L, 3L, 3U, 2U, 4U, 1L]

Cubicle distance vs. probability of being linked [Figure annotation: optimum for search]

Finding someone in a sea of cubicles: median = 7, mean = 12

Strategy 3: Organizational hierarchy

correspondence scrambled

Actual correspondence

Example of search path: hierarchical distance = 5, search path distance = 4 [Figure: branches labeled distance 1, distance 2, distance 1]

Probability of linking vs. distance in hierarchy: in the ‘searchable’ regime, 0 < α < 2 (Watts 2001)

Results
distance   search      geodesic   org   random
median     4           3          6     28
mean       5.7 (4.7)
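One way to compute a hierarchical distance like the one in this example is to count the reporting links from each person up to their lowest common manager. That definition, the `boss` mapping, and the function names below are assumptions made for illustration; the slides only show the resulting distance.

```python
def ancestors(person, boss):
    """Return [person, their manager, that manager's manager, ...]."""
    chain = [person]
    while person in boss:
        person = boss[person]
        chain.append(person)
    return chain


def hierarchical_distance(a, b, boss):
    """Number of reporting links on the path from a up to the lowest
    common manager and back down to b. Assumes boss describes a tree."""
    levels_above_a = {name: i for i, name in enumerate(ancestors(a, boss))}
    for j, name in enumerate(ancestors(b, boss)):
        if name in levels_above_a:
            return levels_above_a[name] + j
    raise ValueError("no common manager")


# Hypothetical org chart: A sits three levels below the director, B two.
boss = {"A": "m1", "m1": "m2", "m2": "director", "B": "m3", "m3": "director"}
h_ab = hierarchical_distance("A", "B", boss)
```

A greedy search that uses the organizational hierarchy would forward each message to the contact with the smallest such distance to the target.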

Group size vs. probability of linking

Group size and probability of linking [Figure: probability of linking vs. group size g; annotation: optimum for search (Kleinberg 2001)]

Search Conclusions: Individuals associate on different levels into groups. Group structure facilitates decentralized search using social ties. HP Labs as a social network is searchable but not quite optimal. Searching using the organizational hierarchy is faster than using physical location. A fraction of ‘important’ individuals are easily findable. Humans may be much more resourceful in executing search tasks: making use of weak ties, using more sophisticated strategies.

PeopleFinder 2 – a search engine for HP people. Live demo. If live demo fails: current PeopleFinder functionality; PeopleFinder 2 info on a person; extracted topics for a person; social network; social network visualization. Search for individuals by topic. Visualize knowledge network. Find social network paths to experts. Extract & disambiguate names from publicly available documents. Enrich information available about individuals. Search for them by topic. Identify knowledge communities from co-occurrence of names.

To find out more: (papers, slides, other research in the group) Information dynamics group (IDL) at HP Labs: List of publications