1
From Research of Social Media to Socially Mediated Research 2010 HCIL Symposium Workshop - UMD Government Applications of Social Media Networks and Communities May 28, 2010 Natasa Milic-Frayling Microsoft Research Cambridge
2
Outline
– Microsoft Research: Integrated Systems team, research areas and approach
– ‘Social’ as a research topic: Modelling Human to Human Interaction in Technology Mediated Communities
– ‘Social’ as facilitator of research: Leveraging Communities of Practice
3
Microsoft Research (MSR)
MSR Sites
– Redmond, Washington (September 1991)
– San Francisco, California (June 1995)
– Cambridge, United Kingdom (July 1997)
– Beijing, China (November 1998)
– Silicon Valley, California (July 2001)
– Bangalore, India (January 2005)
– Cambridge, Massachusetts (July 2008)
Map labels: MSR New England, MSR Asia, MSR India, Redmond, MSR Cambridge, Silicon Valley
4
WEB AND ON-LINE COMMUNITIES CONTENT ANALYSIS AND RICH UI MOBILE AND CROSS PLATFORM MEDIA Information retrieval & NLP Academic Disciplines Research Areas Machine Learning and Statistics Mathematical Modelling Graph Theory and Analysis HCI and Design Academic Disciplines
5
WEB AND ON-LINE COMMUNITIES CONTENT ANALYSIS AND RICH UI MOBILE AND CROSS PLATFORM MEDIA Information retrieval & NLP Team Research Areas Machine Learning and Statistics Mathematical Modelling Graph Theory and Analysis HCI and Design Gabriella Janez Annika Rachel Gerard Natasa Eduarda Gavin Jamie
6
WEB AND ON-LINE COMMUNITIES CONTENT ANALYSIS AND RICH UI MOBILE AND CROSS PLATFORM MEDIA Information retrieval & NLP Academic Disciplines Research Areas Machine Learning and Statistics Mathematical Modelling Graph Theory and Analysis HCI and Design Gabriella Janez Annika Rachel Gerard Natasa Eduarda Gavin Jamie Vinay Aleks Ignjatovic Ben Shneiderman Elizabeth Bonsignore Cody Dunne Dana Rotman Marc Smith Derek Hansen Tom Lee Team
7
Research Areas: WEB AND ON-LINE COMMUNITIES, CONTENT ANALYSIS AND RICH UI, MOBILE AND CROSS PLATFORM MEDIA
Projects
– InSite Live: Web site structure analysis and decomposition into subsites.
– Social Footprints: Analysis of social interaction in online communities.
– NodeXL: Interactive graph analysis and visualization.
– Research Desktop: Research in information management and tagging practices in the Desktop environment.
– Social IR: Extension of IR models with social network and models of approval, trust and reputation.
– weConnect: Investigating narrow-cast of personalized content in close relationships and potential for mobile advertising.
– VideoSnaps: Investigating concepts and services for cross platform media editing and streaming.
8
Research Platforms
Research Areas: WEB AND ON-LINE COMMUNITIES, CONTENT ANALYSIS AND RICH UI, MOBILE AND CROSS PLATFORM MEDIA
Projects
– InSite Live: Web site structure analysis and decomposition into subsites.
– Social Footprints: Analysis of social interaction in online communities.
– NodeXL: Interactive graph analysis and visualization.
– Research Desktop: Research in information management and tagging practices in the Desktop environment.
– Social IR: Extension of IR models with social network and models of approval, trust and reputation.
– weConnect: Investigating narrow-cast of personalized content in close relationships and potential for mobile advertising.
– VideoSnaps: Investigating concepts and services for cross platform media editing and streaming.
Themes
– Methodology: how to develop mobile and social applications.
– Integration with the ecosystem: pre-requisites for adoption.
– Connect the quantitative analyses with the qualitative analyses.
– Principles, mechanisms, and tools for knowledge management.
– Trust and reputation.
– Shared summaries and overviews.
9
INTERACTIONS IN TECHNOLOGY MEDIATED COMMUNITIES social as a research topic
10
Community Question-Answering
Online Communities: Web Boards, Question Answering, Distribution Lists, Forums, Newsgroups, Blogs
[Figure: timeline of online community services, with year labels from 2002 to 2008]
11
Community Question-Answering Question Answers
15
Content Organization, Browsing and Search Topic categories Tags
16
100 Most Frequent Tags on Live QnA
17
Politics
18
100 Most Frequent Tags on Live QnA Fun, Life, People, Philosophy
19
Community Analysis and Health Index
Towards a sustainable community
– Support novice users in becoming active community participants.
– Support frequent users in increasing the volume and quality of their content contributions.
– Promote high quality contributions (for external exploitation through search).
New-user statistics
– 85% of new users start with a question
– 72% never ask a question again
– 5% will engage in answering
– 61% of questions from new users don’t get more than 1 answer (23% get 0 answers)
20
Example: Investigate QnA Voting Practice
Approach:
– Statistical analysis of the user logs
– Manual inspection of the content: taxonomy of the users’ intent, to be evolved by the community of practice
– Define the basic features of the individuals and governing assumptions
– Derive a mathematical model of the voters metric
– Observe the properties with regard to irregular voting behaviour: random voting or collusion
Social network activities:
– Answer to a question
– Comment on an answer
– Vote on the best answer
[Diagram: nodes Q, A, C and V connected by ‘answer to’, ‘comment to’ and ‘vote on’ links]
21
Which Answer to Vote On?
Different ‘best answer’ connotations
– The notion of the ‘best answer’ depends on the context and nature of the answers, from correctness and usefulness to entertainment value.
Social bias
– Assignment of votes may be influenced by social and personal ties, voter’s perception, familiarity, and preferential treatment of familiar community members.
“Microsoft or Apple? Feel free to argue and point out their good and bad points. Also feel free to rebut or debate on other people's standpoint. Best argument/answer will get my friends’ and my "best answer" reward.”
Self-promotion
– Individuals’ aspirations to excel in their social status can adversely affect the quality of their contribution to the community.
22
Reliability as Conformity?
Reliability of a voter
– The relative reliability of two voters is determined by the proportion of all the voters who made the same choice of the best answer. [Formula shown on the slide is not preserved in this transcript.]
– The reliability scores represent a fixed point of the function F (apply the Brouwer Fixed Point Theorem).
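The slide's function F is not reproduced in the transcript, so the following is only a minimal sketch of the general idea, under stated assumptions: a voter's reliability is taken to be the average reliability-weighted share of voters who picked the same answer, and the scores are renormalised to sum to 1 on each pass. The update rule, the function names and the toy ballots are all hypothetical. Note also that the Brouwer theorem only concerns the existence of a fixed point; the plain iteration used here is not guaranteed to converge in general.

```python
from collections import defaultdict


def fixed_point_reliability(votes, max_iterations=200, tol=1e-9):
    """Iterate an assumed conformity-based update until it stabilises.

    votes: list of (voter, question, answer) tuples.
    Returns a dict voter -> reliability, normalised to sum to 1.
    """
    voters = sorted({voter for voter, _, _ in votes})
    reliability = {voter: 1.0 / len(voters) for voter in voters}
    for _ in range(max_iterations):
        # Reliability-weighted support for each answer, and total per question.
        support = defaultdict(float)
        total = defaultdict(float)
        for voter, question, answer in votes:
            support[(question, answer)] += reliability[voter]
            total[question] += reliability[voter]
        # A voter's raw score is the mean weighted share of like-minded voters.
        raw = defaultdict(float)
        count = defaultdict(int)
        for voter, question, answer in votes:
            raw[voter] += support[(question, answer)] / total[question]
            count[voter] += 1
        updated = {voter: raw[voter] / count[voter] for voter in voters}
        norm = sum(updated.values())
        updated = {voter: score / norm for voter, score in updated.items()}
        change = max(abs(updated[v] - reliability[v]) for v in voters)
        reliability = updated
        if change < tol:
            break
    return reliability


def weighted_best_answers(votes, reliability):
    """Pick, per question, the answer with the largest reliability-weighted support."""
    support = defaultdict(float)
    for voter, question, answer in votes:
        support[(question, answer)] += reliability[voter]
    best = {}
    for (question, answer), score in support.items():
        if question not in best or score > support[(question, best[question])]:
            best[question] = answer
    return best


# Toy ballots (hypothetical): a, b and c always agree, d always dissents.
toy_votes = [
    ("a", "q1", "x"), ("b", "q1", "x"), ("c", "q1", "x"), ("d", "q1", "y"),
    ("a", "q2", "x"), ("b", "q2", "x"), ("c", "q2", "x"), ("d", "q2", "y"),
]
toy_reliability = fixed_point_reliability(toy_votes)
toy_best = weighted_best_answers(toy_votes, toy_reliability)
```

In this toy case the dissenting voter ends up with a much lower score than the three conforming voters, which is the "reliability as conformity" reading of the slide title.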
23
Real Data Analysis
[Figure: Vote Count compared with the FP Method for the tags ‘FUN’ and ‘PHILOSOPHY’]
24
Random Voting
– Simulate random voting by using a uniform distribution in place of Zipf’s Law.
– We vary the percentage of affected questions (from 1% to 10%) and the percentage of voters who voted randomly (from 1% to 10%).
– The number of best answers that change is lower for fixed point scoring (right) than for plurality voting (left).
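The experiment described on this slide can be sketched roughly as follows. This is an illustrative toy, not the study's code: the number of questions, voters and answers, the Zipf exponent, and every name below are assumptions. Honest ballots are drawn with probability proportional to 1/rank, a chosen fraction of voters re-votes uniformly at random on a chosen fraction of questions, and the script counts how many plurality winners change. Only plurality voting is simulated here.

```python
import random
from collections import Counter


def plurality_winner(ballots):
    """Most-voted answer; ties go to the lowest answer id so results are deterministic."""
    counts = Counter(ballots)
    return max(sorted(counts), key=counts.get)


def count_changed_winners(n_questions=200, n_voters=50, n_answers=5,
                          frac_questions=0.10, frac_voters=0.10, seed=0):
    """Return how many plurality winners change once some voters vote uniformly."""
    rng = random.Random(seed)
    answers = list(range(n_answers))
    # Assumed Zipf-like preference: answer at rank r is chosen with weight 1/r.
    zipf_weights = [1.0 / rank for rank in range(1, n_answers + 1)]

    baseline = [
        [rng.choices(answers, zipf_weights)[0] for _ in range(n_voters)]
        for _ in range(n_questions)
    ]
    affected_questions = set(
        rng.sample(range(n_questions), int(frac_questions * n_questions)))
    random_voters = rng.sample(range(n_voters), int(frac_voters * n_voters))

    changed = 0
    for question, honest_ballots in enumerate(baseline):
        ballots = list(honest_ballots)
        if question in affected_questions:
            for voter in random_voters:
                ballots[voter] = rng.choice(answers)  # uniform instead of Zipf
        if plurality_winner(ballots) != plurality_winner(honest_ballots):
            changed += 1
    return changed


changed_with_noise = count_changed_winners()
changed_without_noise = count_changed_winners(frac_questions=0.0)
```

Sweeping `frac_questions` and `frac_voters` from 0.01 to 0.10 would mirror the 1% to 10% ranges on the slide; comparing against a reliability-weighted winner would require plugging in a scoring rule such as the fixed point method.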
25
Ballot Stuffing
– Simulate the collusion: fix the number of involved voters (‘stuffers’, here 4 and 10) and the percentage of questions affected (here 50%).
– Both majority voting and fixed point scoring are susceptible to ballot stuffing.
– Fixed point scoring flags the outliers and helps identify collusion.
26
Detecting Sybil Attack - Leveraging Social Networks
– Social networks are fast mixing: random walks quickly converge to the stationary distribution.
– Sybil attacks induce a bottleneck cut: fast mixing is disrupted.
– Knowledge of an a priori honest node breaks the symmetry.
[Diagram: honest nodes and Sybil nodes connected by attack edges]
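The bottleneck effect can be illustrated with a small toy graph. This is only a sketch under assumptions, not a detection protocol from the talk: two 10-node cliques stand in for the honest and Sybil regions, a single attack edge joins them, and the node names and helper functions are hypothetical. The exact distribution of a short random walk started at a known honest node is computed and compared with the stationary share of the Sybil region.

```python
def build_two_region_graph(n_honest=10, n_sybil=10):
    """Two cliques joined by one attack edge (h0 - s0); a toy stand-in for a social graph."""
    honest = [f"h{i}" for i in range(n_honest)]
    sybil = [f"s{i}" for i in range(n_sybil)]
    adjacency = {node: set() for node in honest + sybil}
    for region in (honest, sybil):
        for a in region:
            for b in region:
                if a != b:
                    adjacency[a].add(b)
    adjacency["h0"].add("s0")
    adjacency["s0"].add("h0")
    return adjacency, honest, sybil


def walk_distribution(adjacency, start, steps):
    """Exact probability distribution of a simple random walk after `steps` steps."""
    dist = {node: 0.0 for node in adjacency}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in adjacency}
        for node, mass in dist.items():
            if mass == 0.0:
                continue
            share = mass / len(adjacency[node])
            for neighbour in adjacency[node]:
                nxt[neighbour] += share
        dist = nxt
    return dist


adjacency, honest, sybil = build_two_region_graph()
dist = walk_distribution(adjacency, start="h5", steps=5)

# Probability that a 5-step walk from the honest seed ends in the Sybil region.
sybil_mass = sum(dist[node] for node in sybil)

# Stationary share of the Sybil region (proportional to total degree).
total_degree = sum(len(neighbours) for neighbours in adjacency.values())
stationary_share = sum(len(adjacency[node]) for node in sybil) / total_degree
```

The short walk places far less mass on the Sybil region than the stationary distribution would, because almost all walks fail to cross the single attack edge. Starting from a known honest node is what makes the two sides distinguishable.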
27
LEVERAGING COMMUNITIES OF PRACTICE social as facilitator of research
28
Issue: the Scale and the Limitations of Humans
– We require user input in order to inform the systems’ design and verify our hypotheses.
– In search we build test collections: a set of topics, a corpus of documents, and relevance judgements for documents in the corpus.
– Question: how do we build test collections for books?
  – Search over Web pages involves a low cost of inspection of individual Web pages.
  – Search over book collections increases the cost due to the size and the coherence of topics across pages.
29
Web scenario
30
Book scenario …
31
Read’n Play Architecture
– Comprises four functional layers: SOCIAL GAME SUPPORT, USER ANNOTATIONS, SEARCH AND NAVIGATION SUPPORT, DATA STORE AND SEARCHABLE INDEX
– Implemented using Web services, with no client based interaction with the content
– Can be repurposed for other research projects
Data components: Image Database (Scanned Document Page), OCR Text Database, Text and Metadata Index
32
Social game
– Explorers: reward for finding relevant content
– Reviewers: reward for finding mistakes in explorers’ work; reward for re-assessment (agreement is not necessary)
– Conflicts: penalty
33
Explore
34
Pilot Study
Participants
– Open to everyone
– 48 registered + 81 INEX participants
– 17 contributed assessments (16 INEX participants)
Collected data
– Relevance assessments: 3,478 judged books with 23,098 judged pages from 29 topics
– Log data: 32,112 navigational events; 45,126 judgement events; 2,970 ‘search inside a book’ events
Incentives for participation
– Tangible, e.g., monetary. Winners: Microsoft hardware and software. All: access to collected data
– Intangible reward, e.g., fun, social gain. Leader board: social status
35
Feasibility
Averages across the 17 assessors
– 7.2 days with activity, out of 42
– 11.4 hours judging time
– 220 judged books
Average effort
– 7.3 minutes per relevant book, 2.7 minutes per irrelevant book (comparable to INEX 2003 ad hoc track)
– 37 seconds per relevant page, 22 seconds per irrelevant page
Extrapolated statistics
– 1000 books takes 52.7 hours, 1 : 9 ratio of relevant : irrelevant
– 33.3 days to judge one topic, with 95 minutes a day
– 70 topics, 200 books per topic with 20 judges takes 36.9 days
– 737 judges to complete task in one hour
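The extrapolated figures can be re-derived from the per-book averages. The short calculation below is a worked check, assuming (as the numbers suggest, though the slide does not say so explicitly) that "one topic" means 1000 books at the 1 : 9 ratio; the variable and function names are illustrative only.

```python
MINUTES_PER_RELEVANT_BOOK = 7.3
MINUTES_PER_IRRELEVANT_BOOK = 2.7
RELEVANT_FRACTION = 0.1          # 1 : 9 ratio of relevant to irrelevant books
MINUTES_PER_DAY = 95             # daily judging time assumed on the slide


def judging_hours(n_books):
    """Total judging time in hours for n_books at the 1 : 9 relevance ratio."""
    relevant = n_books * RELEVANT_FRACTION
    irrelevant = n_books - relevant
    minutes = (relevant * MINUTES_PER_RELEVANT_BOOK
               + irrelevant * MINUTES_PER_IRRELEVANT_BOOK)
    return minutes / 60.0


hours_per_1000_books = judging_hours(1000)                        # about 52.7 hours
days_per_topic = hours_per_1000_books * 60.0 / MINUTES_PER_DAY    # about 33.3 days
total_hours = judging_hours(70 * 200)                             # about 737 hours
hours_per_judge = total_hours / 20                                # about 36.9 hours each
```

Working through the arithmetic: 100 relevant books at 7.3 minutes plus 900 irrelevant books at 2.7 minutes gives 3,160 minutes, or about 52.7 hours, and 3,160 / 95 gives about 33.3 days. For 70 × 200 = 14,000 books the total is about 737 hours, which matches the 737 judges needed to finish in one hour. Split across 20 judges this is about 36.9 hours per judge, so the slide's "36.9 days" appears to correspond to roughly one hour of judging per judge per day rather than 95 minutes.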
36
Productivity Games
38
Summary
– Understanding social media requires a cross-disciplinary approach and new methods of study.
– Defining the characteristics and metrics of ‘healthy communities’ is a challenging task.
– ‘Social’ is increasing its role as an enabler for large scale experiments.
– Generally, we need to reflect on the methods and approaches we take when studying online communities.
39
Thank you Microsoft Research Cambridge https://research.microsoft.com/is