Web Science & Technologies University of Koblenz ▪ Landau, Germany Micro-interactions and Macro-observations Deciding Between Competing Models Steffen.

Slides:



Advertisements
Similar presentations
Chapter 2 Principles Of Science And Systems. What Is Science? Science Depends On Skepticism And Accuracy Deductive And Inductive Reasoning Are Both Useful.
Advertisements

Lecture (11,12) Parameter Estimation of PDF and Fitting a Distribution Function.
Chapter 10.  Real life problems are usually different than just estimation of population statistics.  We try on the basis of experimental evidence Whether.
COURSE: JUST 3900 INTRODUCTORY STATISTICS FOR CRIMINAL JUSTICE Instructor: Dr. John J. Kerbs, Associate Professor Joint Ph.D. in Social Work and Sociology.
Synopsis of “Emergence of Scaling in Random Networks”* *Albert-Laszlo Barabasi and Reka Albert, Science, Vol 286, 15 October 1999 Presentation for ENGS.
4. PREFERENTIAL ATTACHMENT The rich gets richer. Empirical evidences Many large networks are scale free The degree distribution has a power-law behavior.
The Complex Dynamics of Collaborative Tagging Harry Halpin University of Edinburgh Valentin Robu CWI, Netherlands Hana Shepherd Princeton University WWW.
Chapter 9 – Hypothesis Tests concerning One Population Mean.
On-line resources
Evaluating Hypotheses Chapter 9. Descriptive vs. Inferential Statistics n Descriptive l quantitative descriptions of characteristics.
Statistics for the Social Sciences Psychology 340 Fall 2006 Review For Exam 1.
Chapter 3 Hypothesis Testing. Curriculum Object Specified the problem based the form of hypothesis Student can arrange for hypothesis step Analyze a problem.
Aaker, Kumar, Day Seventh Edition Instructor’s Presentation Slides
By Ciro Cattuto, Vittorio Loreto, and Luciano Pietronero Semiotic dynamics and collaborative tagging Present by Diyue Bu.
PSY 307 – Statistics for the Behavioral Sciences
The t Tests Independent Samples.
The problem of sampling error in psychological research We previously noted that sampling error is problematic in psychological research because differences.
Choosing Statistical Procedures
The Characteristics of an Experimental Hypothesis
AM Recitation 2/10/11.
Overview Definition Hypothesis
Introduction to Hypothesis Testing for μ Research Problem: Infant Touch Intervention Designed to increase child growth/weight Weight at age 2: Known population:
Statistical inference: confidence intervals and hypothesis testing.
Section 9.1 Introduction to Statistical Tests 9.1 / 1 Hypothesis testing is used to make decisions concerning the value of a parameter.
Chapter 8 Introduction to Hypothesis Testing
Chapter 8 Hypothesis Testing I. Chapter Outline  An Overview of Hypothesis Testing  The Five-Step Model for Hypothesis Testing  One-Tailed and Two-Tailed.
Hypothesis Testing: One Sample Cases. Outline: – The logic of hypothesis testing – The Five-Step Model – Hypothesis testing for single sample means (z.
Copyright © Cengage Learning. All rights reserved. Hypothesis Testing 9.
1 Statistical Inference. 2 The larger the sample size (n) the more confident you can be that your sample mean is a good representation of the population.
Chapter 8 Introduction to Hypothesis Testing
STA Statistical Inference
Making decisions about distributions: Introduction to the Null Hypothesis 47:269: Research Methods I Dr. Leonard April 14, 2010.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
VI. Evaluate Model Fit Basic questions that modelers must address are: How well does the model fit the data? Do changes to a model, such as reparameterization,
Web Science & Technologies University of Koblenz ▪ Landau, Germany Micro-Macro-Implications Steffen Staab Slides by Klaas Dellschaft
Biostatistics Class 6 Hypothesis Testing: One-Sample Inference 2/29/2000.
1 Virtual COMSATS Inferential Statistics Lecture-16 Ossam Chohan Assistant Professor CIIT Abbottabad.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
The use & abuse of tests Statistical significance ≠ practical significance Significance ≠ proof of effect (confounds) Lack of significance ≠ lack of effect.
Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall 9-1 σ σ.
Class 9: Barabasi-Albert Model-Part I
The Scientific Method: Terminology Operational definitions are used to clarify precisely what is meant by each variable Participants or subjects are the.
Math 4030 – 9a Introduction to Hypothesis Testing
Chapter 12 Introduction to Analysis of Variance PowerPoint Lecture Slides Essentials of Statistics for the Behavioral Sciences Eighth Edition by Frederick.
Understanding Basic Statistics Fourth Edition By Brase and Brase Prepared by: Lynn Smith Gloucester County College Chapter Nine Hypothesis Testing.
Education 793 Class Notes Inference and Hypothesis Testing Using the Normal Distribution 8 October 2003.
15 Sep 2015 EunJeong Cheon i501: introduction to informatics Semiotic Dynamics and Collaborative Tagging Ciro Cattuto, Vittorio Loreto, and Luciano Pietronero.
Chapter 8: Introduction to Hypothesis Testing. Hypothesis Testing A hypothesis test is a statistical method that uses sample data to evaluate a hypothesis.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
Chapter 1 Introduction to Research in Psychology.
Hypothesis Testing. Statistical Inference – dealing with parameter and model uncertainty  Confidence Intervals (credible intervals)  Hypothesis Tests.
BIOL 582 Lecture Set 2 Inferential Statistics, Hypotheses, and Resampling.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 7 Inferences Concerning Means.
Chapter 9 Introduction to the t Statistic
Descriptive Statistics Report Reliability test Validity test & Summated scale Dr. Peerayuth Charoensukmongkol, ICO NIDA Research Methods in Management.
Copyright © Cengage Learning. All rights reserved. Hypothesis Testing 9.
Chapter Nine Hypothesis Testing.
KARL POPPER ON THE PROBLEM OF A THEORY OF SCIENTIFIC METHOD
Statistics for the Social Sciences
Statistics for Psychology
Copyright © Cengage Learning. All rights reserved.
Science vocabulary (12) 8/22/18 quiz
Discrete Event Simulation - 4
Chapter Nine Part 1 (Sections 9.1 & 9.2) Hypothesis Testing
Chapter 3 Probability Sampling Theory Hypothesis Testing.
CHAPTER 9 Testing a Claim
Hypothesis Testing and Confidence Intervals (Part 2): Cohen’s d, Logic of Testing, and Confidence Intervals Lecture 9 Justin Kern April 9, 2018.
What determines Sex Ratio in Mammals?
Science Review Game.
1 Chapter 8: Introduction to Hypothesis Testing. 2 Hypothesis Testing The general goal of a hypothesis test is to rule out chance (sampling error) as.
Presentation transcript:

Web Science & Technologies University of Koblenz ▪ Landau, Germany Micro-interactions and Macro-observations Deciding Between Competing Models Steffen Staab Slides by Klaas Dellschaft

Klaas Dellschaft Introduction to Web Science 2 of 32 WeST Details of Model-based Research  How to represent observables?  Distribution functions  How to compare simulation and reality?  Analytical evaluation  Visual comparison  Goodness-of-fit tests  How to decide between competing models? Today!

Klaas Dellschaft Introduction to Web Science 3 of 32 WeST Use Case: Dynamics in Tagging Systems  How do users influence each other in tagging systems?  Is this influence positive or negative for the indexing quality?

Klaas Dellschaft Introduction to Web Science 4 of 32 WeST Method of Model-based Research Model Reality Micro-interactionsMacro-observations Stochastic Model Assumed rules of interaction Simulated Properties Unknown ModelObserved Properties Compare Unknown rules of interaction

Klaas Dellschaft Introduction to Web Science 5 of 32 WeST Co-occurrence Streams  Co-occurrence Streams:  All tags co-occurring with a given tag in a posting  Ordered by posting time  Example tag assignments for ‘ajax':  {mackz, r1, {ajax, javascript}, 13:25}  {klaasd, r2, {ajax, rss, web2.0}, 13:26}  {mackz, r2, {ajax, php, javascript}, 13:27}  Resulting co-occurrence stream: javascriptrss web2.0 php javascript time

Klaas Dellschaft Introduction to Web Science 6 of 32 WeST Tag Frequencies

Klaas Dellschaft Introduction to Web Science 7 of 32 WeST Vocabulary Growth

Klaas Dellschaft Introduction to Web Science 8 of 32 WeST Method of Model-based Research Model Reality Micro-interactionsMacro-observations Stochastic Model Assumed rules of interaction Simulated Properties Unknown ModelObserved Properties Compare Unknown rules of interaction

Klaas Dellschaft Introduction to Web Science 9 of 32 WeST Word Frequencies vs. Tag Frequencies  Both are heavy-tailed distributions (cf. power-law)  Distributions differ in their average exponent  Higher variance in the exponents for the tag frequencies  Different distributions  But: Possibly a common pattern of explanation word frequencies in texts tag frequencies

Klaas Dellschaft Introduction to Web Science 10 of 32 WeST Delicious Tagging Interface  Hypothesis:  If users would only have the free input field, then tag frequencies and word frequencies would look alike  Tag recommendations change the original frequency distribution

Klaas Dellschaft Introduction to Web Science 11 of 32 WeST Selection Probabilities for Tag Recommendations  Selection probabilities for 25 resources from a larger Delicious data set  Less than 50% of the tags do not occur in the recommendations Popular TagsUser TagsRecommended TagsFree Input 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1

Klaas Dellschaft Introduction to Web Science 12 of 32 WeST Method of Model-based Research Model Reality Micro-interactionsMacro-observations Stochastic Model Assumed rules of interaction Simulated Properties Unknown ModelObserved Properties Compare Unknown rules of interaction

Klaas Dellschaft Introduction to Web Science 13 of 32 WeST The Epistemic Model  P BK : Probability of entering a tag in the free input field  p(W|t): Probability distribution of words w  W in context of topic t  p(W|t) is modeled by the word frequencies in a text corpus about topic t  P I = 1 – P BK : Probability of imitating a tag from the recommendations  Only modeling the selection from the popular tags  n corresponds to the number of recommended tags P BK PIPI 123…n p(W|t)

Klaas Dellschaft Introduction to Web Science 14 of 32 WeST Alternative Models / Hypotheses

Klaas Dellschaft Introduction to Web Science 15 of 32 WeST Yule-Simon-Model with Memory  p: Probability of adding a new tag  1-p: Probability of an already existing tag  Probability Q t (x) of selecting a tag decreases with its age x  Only models the imitation of tags  Extension of linear preferential attachment

Klaas Dellschaft Introduction to Web Science 16 of 32 WeST Semantic Walker Model  Tag assignments of a user:  Random walk through a graph (=semantic space)  Different models for generating the graph:  Watts-Strogatz model  Erdös-Rényi model  …  Only models the entering of tags in the free input field

Klaas Dellschaft Introduction to Web Science 17 of 32 WeST Deciding between Alternative Models

Klaas Dellschaft Introduction to Web Science 18 of 32 WeST Problem of Induction (I)  Logical problem:  Can the claim that a universal theory is true be justified by empirical reason?  No!!!  Reformulation of the problem:  Can the claim that a universal theory is true or false be justified by empirical reason?  Yes!!! We can justify that it is false!  We should prefer theories which have not be shown to be false Karl Popper Source: WikipediaWikipedia David Hume Source: WikipediaWikipedia

Klaas Dellschaft Introduction to Web Science 19 of 32 WeST Problem of Induction (II)  Popper’s Critical Method:  We should try to falsify theories  If a theory is falsified, … … we should define a succeeding theory … which explains the same observations as the predecessor … and which is successful where the predecessor failed  Following this method, we may hit upon a true theory. But in no case the method establishes its truth, even if it is true.  All theories must be regarded as hypothetical

Klaas Dellschaft Introduction to Web Science 20 of 32 WeST Comparison of Tagging Models  Tag Frequencies  Visual comparison: None of the models can be falsified  Apply goodness-of-fit tests for deciding between models  Vocabulary Growth  Visual comparison: Yule-Simon Model is falsified!  Imitation is not sufficient for explaining the dynamics in tagging systems!  Is imitation part of a minimal model of tagging systems? Model Imitation Processes Background Knowledge Tag Frequencies Vocabulary Growth Yule-Simon Model  power-law with cut-off linear Sem. Walker Model  power-law with cut-off sublinear Epistemic Model power-law with cut-off sublinear

Klaas Dellschaft Introduction to Web Science 21 of 32 WeST Method of Model-based Research Model Reality Micro-interactionsMacro-observations Stochastic Model Assumed rules of interaction Simulated Properties Unknown ModelObserved Properties Compare Unknown rules of interaction

Klaas Dellschaft Introduction to Web Science 22 of 32 WeST Evaluation Setup  Comparing simulations with real co-occurrence streams  Delicious: 10 streams  Bibsonomy: 5 streams  Tag Frequencies  Measure distance between simulation and reality  Apply significance test whether it is a valid model (Kolmogorov-Smirnov Test)  Vocabulary Growth  Minimize distance between simulation and reality  Significance tests not applicable because growth functions can not be represented as random variables  Allow maximal distance of  10% to observed vocabulary growth

Klaas Dellschaft Introduction to Web Science 23 of 32 WeST Tag Frequencies – Results Stream Epistemic ModelSemantic WalkerYule-Simon Model DSign.D D ringtones setup boat historical messages decorative costs ff checkbox datawarehouse tools social design analysis blogs Sign.  0.1: Significantly different distributions

Klaas Dellschaft Introduction to Web Science 24 of 32 WeST Tag Frequencies – Ringtones Stream

Klaas Dellschaft Introduction to Web Science 25 of 32 WeST Tag Frequencies – Messages Stream

Klaas Dellschaft Introduction to Web Science 26 of 32 WeST Vocabulary Growth – Results StreamEpistemic ModelSemantic Walker ringtones setup boat historical messages decorative costs ff checkbox datawarehouse tools social design analysis blogs

Klaas Dellschaft Introduction to Web Science 27 of 32 WeST Vocabulary Growth – Ringtones Stream (I) Where is this “step” coming from?

Klaas Dellschaft Introduction to Web Science 28 of 32 WeST Vocabulary Growth – Ringtones Stream (II) Step is caused by User 157 and 455 who are spammers Step is caused by User 10 who is a spammer

Klaas Dellschaft Introduction to Web Science 29 of 32 WeST Vocabulary Growth – Setup Stream

Klaas Dellschaft Introduction to Web Science 30 of 32 WeST The Influence of Spammers (I)  Epistemic Model  Based on assumptions about tagging behavior of regular users  Difficulties to reproduce observable properties, if spammers are present in the data set  Spammers and regular users have different tagging models?  Compare properties between streams before removing spammers (unfiltered) and after removing spammers (filtered)  Kolmogorov-Smirnov Test should detect significant differences!

Klaas Dellschaft Introduction to Web Science 31 of 32 WeST The Influence of Spammers (II)

Klaas Dellschaft Introduction to Web Science 32 of 32 WeST The Influence of Spammers (III)  Compare difference between tag frequencies in filtered and unfiltered co-occurrence streams  Goodness-of-fit Test: Kolmogorov-Smirnov Test  Ringtones Stream:  Maximal distance D:  Significance: 0.0 (= significant differences exist)  Social Stream:  Maximal distance D:  Significance: 0.0 (=significant differences exist)  Spammers and regular users have different tagging models

Klaas Dellschaft Introduction to Web Science 33 of 32 WeST Possible reasons for failed evaluations 1.The model is completely wrong 2.The model is missing influence factors  Semantic Walker Model: Missing influence of imitation  Yule-Simon Model: Missing influence of free tag input 3.The data contains noise  Epistemic Model: Influence coming from spammers