Data analysis Incorporating slides from IS208 (© Yale Braunstein) to show you how 208 and 214 are telling you many of the same things; and how to use 208 methods for usability


1 Data analysis Incorporating slides from IS208 (© Yale Braunstein) to show you how 208 and 214 are telling you many of the same things; and how to use 208 methods for usability

2 Analysis--Introduction The BIG Questions: –What are you trying to discover or show? –How will you present the results?

3 Kinds of data you are likely to have Quantitative –Survey results –Results from usability testing: usability metrics of various kinds –Server log analyses Qualitative –Interview findings –Results from focus groups –Comments people make, e.g., during testing –Transcriptions of interviews Graphics –Videotape, still photos –Examples (e.g. of forms, documents) –[Audiotape usually used only for your purposes, although sound clips are possible]

4 Analysis of Quantitative Data: Topics Covered in 208 Measures of central tendency (various forms of “averages”) Frequencies Histograms Cross-tabulations Recoding & transforming data Missing data or observations

5 Rough categorization of statistical methods Those that in some form report findings –Describe, summarize findings across many instances Those that test –Hypotheses: is this assumption, projection true? –Differences: is this difference significant, or within the range of expected error? –Correlations, causation: is this variable actually related to that one? If so, what is the nature of the relationship?

6 Purposes of Quantitative Data Analysis Summarize or describe what you found: Descriptive statistics Frequencies, histograms Measures of central tendency –Mean, median –Distribution around the mean Compare –E.g. differences across groups, versions of software Cross tabs Time effects – changes over time –Crosstabs and the like compare time periods –Graphing movement over time – time is always the x axis Investigate causality or at least correlation –Cross-tabs –Correlation statistics, graphs, scatterplots

7 Types of data Nominal – names –Male, female Ordinal – ordered –Freshman, sophomore, junior, senior Interval – equal intervals –Fahrenheit scale Ratio – equal intervals, zero point –Ruler http://coe.sdsu.edu/eet/Articles/measurescales/index.htm

8 Converting qualitative data to quantitative Forcing people to choose and quantify –Survey questions: “How would you rate your experience today, on a scale from 1 (low) to 5 (high)…” Measuring occurrences, Coding data –E.g., # of times people complain about x –Categorizing open-ended responses to questions, comments into a limited # of discrete categories and counting
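
The coding-and-counting step on this slide can be sketched in a few lines of Python. This is only an illustration: the comments and the category names (`navigation`, `speed`, `search`) are invented, and it assumes a human coder has already assigned each open-ended comment to exactly one category.

```python
from collections import Counter

# Hypothetical coded comments: each open-ended remark has already been
# assigned to one category by a human coder (categories are invented).
coded_comments = ["navigation", "speed", "navigation", "search",
                  "navigation", "speed"]


def count_categories(codes):
    """Return (category, count) pairs, most frequent category first."""
    return Counter(codes).most_common()


category_counts = count_categories(coded_comments)
for category, count in category_counts:
    print(f"{category}: {count}")
```

Once comments are reduced to counts like these, they can be reported with the same frequency tables and histograms used for any other quantitative data.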

9 Recap of Tues Statistical methods are used to describe or summarize, and to test

10 Testing Tests of interest to us are on the order of: –Is this group different from that one? –Was there a change over time? –Was this design better than that one? Many tests are based on assumptions about data, including: –What you have is a sample from a much larger population –The data is usually assumed to follow a normal distribution (bell-shaped curve). There are methods that don’t make this assumption, but this is common.

11 Some conventions of data presentation Tables, graphics should be able to stand on their own –Include enough information for the reader to be able to interpret them without your text –Label table, columns, rows –Include unit of measure: %, hours, unique users… Ideally, reader should be able to reconstruct the data from your presentation –Graphics lose information Graphs: –Independent variable on x-axis, dependent on y axis –Time is always independent variable, therefore on x axis

12 Univariate descriptive statistics 1 variable at a time Frequency http://cyberatlas.internet.com/big_picture/geographics/article/0,,5911_2174111,00.html http://www.cc.gatech.edu/gvu/user_surveys/survey-1998-10/graphs/general/q54.htm Histograms http://www.shodor.org/interactivate/activities/histogram/ –http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 11, p. 12 Percentages –Pie charts: http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf fig. 4-1 p. 41 –Comparing groups: http://www.ntia.doc.gov/ntiahome/dn/html/Chapter5.htm fig 5-7 p. 54
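
As a rough sketch of frequencies, percentages and a histogram for a single variable, the following Python uses made-up survey ratings on a 1 to 5 scale. The data and the helper name `frequency_table` are assumptions for illustration, not taken from the sources linked on the slide.

```python
from collections import Counter

# Hypothetical survey answers on a 1 (low) to 5 (high) rating scale.
ratings = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]


def frequency_table(values):
    """Return {value: (count, percent)} for each distinct value, sorted."""
    counts = Counter(values)
    total = len(values)
    return {v: (counts[v], 100 * counts[v] / total) for v in sorted(counts)}


table = frequency_table(ratings)
for value, (count, percent) in table.items():
    # One '#' per response gives a crude text histogram.
    print(f"rating {value}: {'#' * count} {count} ({percent:.0f}%)")
```

Reporting the count next to the percentage follows the convention that a reader should be able to reconstruct the data from the presentation.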

13 Measures of Central Tendency Type of data matters: –Distinguish between numerical & categorical data Understand the differences between the different measures of “the average” –Mean –Median (middle value) –Mode (most common value) Consider the dispersion around the center Interactive tutorial http://pse.cs.vt.edu/SoSci/converted/MMM/
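
The difference between the three "averages" is easiest to see on skewed data. A minimal sketch using Python's standard `statistics` module follows; the task-completion times are hypothetical, with one deliberately slow participant.

```python
import statistics

# Hypothetical task-completion times in seconds; the last participant
# was much slower than everyone else.
times = [30, 32, 32, 35, 38, 41, 120]

mean_time = statistics.mean(times)      # pulled upward by the outlier
median_time = statistics.median(times)  # middle value of the sorted data
mode_time = statistics.mode(times)      # most common value

print(f"mean={mean_time:.1f}  median={median_time}  mode={mode_time}")
```

Here the single slow participant drags the mean well above the median, which is one reason usability reports often give the median for timing data.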

14 Measures of dispersion Range (max, min) Quartiles –Box and whiskers http://www.winstat.com/english/function/graphics/boxwhisk.htm Standard Deviation – a measure of how dispersed the observations are http://www.robertniles.com/stats/stdev.shtml http://www-stat.stanford.edu/~naras/jsm/FindProbability.html When distribution is normal, 68% of observations are within 1 S.D. of the mean, and 95% are within 1.96 S.D.s http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html Example of use of confidence interval: –http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 14
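
The measures on this slide can be computed with Python's standard `statistics` module (Python 3.8 or later). The sample below is invented, and `statistics.quantiles` uses its default "exclusive" method, which is only one of several quartile conventions, so other tools may report slightly different quartiles. The last two lines check the 68% and 95% figures against a standard normal distribution.

```python
import statistics

# Hypothetical task-completion times in seconds.
times = [30, 32, 32, 35, 38, 41, 44]

data_range = (min(times), max(times))
q1, q2, q3 = statistics.quantiles(times, n=4)  # quartile cut points
mean = statistics.mean(times)
sd = statistics.stdev(times)                   # sample standard deviation

# Observations that fall within one standard deviation of the mean.
within_one_sd = [t for t in times if abs(t - mean) <= sd]

# Share of a normal distribution within 1 and 1.96 S.D.s of the mean.
normal = statistics.NormalDist()
share_1_sd = normal.cdf(1) - normal.cdf(-1)
share_196_sd = normal.cdf(1.96) - normal.cdf(-1.96)

print(data_range, (q1, q2, q3), round(sd, 2), len(within_one_sd))
print(round(share_1_sd, 3), round(share_196_sd, 3))
```

A sample this small will not match the normal-distribution percentages exactly; the rule describes the underlying distribution, not any particular handful of observations.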

15 Major uses of measures of dispersion Describing a set of observations Comparing two sets of observations; are they actually different? –If you drew repeated samples from the same population and calculated the mean of each, the means would follow an approximately normal distribution. –Comparing two means: Could these sets be simply subsets of the same population, or are they likely to have come from different populations? If likely to be different populations, we say the differences are statistically significant. Confidence intervals: http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 13 Table 1-1
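
One way to ask whether two sets of observations are "actually different" is to put a confidence interval around the difference between their means. The sketch below is a simplification under stated assumptions: it uses the normal approximation with the 1.96 multiplier, whereas for samples this small a t-based interval would normally be wider and more appropriate. The scores for the two designs and the function name `diff_of_means_ci` are hypothetical.

```python
import math
import statistics


def diff_of_means_ci(a, b, z=1.96):
    """Approximate 95% CI for mean(a) - mean(b), normal approximation."""
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of the difference between two independent means.
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se


# Hypothetical usability scores for two versions of a design.
design_a = [41, 38, 45, 50, 39, 44, 47, 42]
design_b = [35, 33, 40, 36, 31, 38, 34, 37]

low, high = diff_of_means_ci(design_a, design_b)
# If the interval excludes zero, the difference is unlikely to be
# explained by sampling error alone (at roughly the 5% level).
differs = not (low <= 0 <= high)
print(f"difference in means: {low:.2f} to {high:.2f}; differs={differs}")
```

If the interval includes zero, the two sets could plausibly be samples from the same population.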

16 Bi-variate: 2 variables at a time Q: How are these 2 related to one another? –Are 2 groups different from one another? –Are 2 variables causally connected? Changes over time Cross-tabs scatterplots Correlations

17 Bivariate I: Changes over Time http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 16 Fig 2-1 http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 17 Fig 2-2

18 Crosstabs http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 40: internet activities as a function of income More complex: p. 30, 31
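
A cross-tabulation is a count of how often each combination of two categorical variables occurs. As a minimal sketch, assuming invented respondents described by an income bracket and whether they use email (these are not the NTIA figures):

```python
from collections import Counter

# Hypothetical respondents: (income bracket, uses email?) pairs.
responses = [("low", "yes"), ("low", "no"), ("high", "yes"), ("high", "yes"),
             ("low", "no"), ("high", "no"), ("mid", "yes"), ("mid", "no")]


def crosstab(pairs):
    """Return (row labels, column labels, nested dict of counts)."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table


rows, cols, table = crosstab(responses)
print("income".ljust(8) + "".join(c.rjust(5) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[r][c]).rjust(5) for c in cols))
```

Labelling both the rows and the columns keeps the table readable on its own, in line with the presentation conventions described earlier.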

19 Histograms, bivariate Example: both a cross-tab table AND a histogram of the same data –http://www.cc.gatech.edu/gvu/user_surveys/papers/1997-10/sld023.htm –http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 39 fig 3-3 –Similar, as a table http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf p. 40 table 3-1

20 Scatterplots Scatterplots can be useful to present two numerical variables (non-categorical) and show the possible relationship between them. SPSS Commands –Graphs / Scatter / Simple –To add regression line, go to output window, double-click on graph, use Chart / Options / Fit Line
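
For readers without SPSS, the line added to a scatterplot can be computed by hand. The sketch below assumes the fitted line is an ordinary least-squares regression line, which is the usual default but is not confirmed by the slide; the data (hours of experience against tasks completed) and the name `fit_line` are made up.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours of experience vs. tasks completed.
hours = [1, 2, 3, 4, 5, 6]
tasks = [2, 4, 5, 4, 7, 8]

slope, intercept = fit_line(hours, tasks)
print(f"tasks = {slope:.2f} * hours + {intercept:.2f}")
```

The independent variable (hours) goes on the x axis and the dependent variable (tasks) on the y axis, as in the graphing conventions above.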

21 Scatterplot example Scatterplots show: associations, outliers Interactive scatterplot of Florida 2000 votes: http://cuwu.editthispage.com/2000/11/08

22 Correlation coefficient Ranges from –1.0 to +1.0 Perfect positive correlation: 1.0 NO correlation: 0.0 Statistical tests used to ask whether correlation coefficient is significantly different from zero Interactive: http://noppa5.pc.helsinki.fi/koe/corr/cor7.html Multivariate: multiple factors considered simultaneously Histogram http://www.cc.gatech.edu/gvu/user_surveys/papers/1997-10/sld022.htm Table http://www.pewinternet.org/reports/chart.asp?img=88_changed.jpg
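
The range of the coefficient can be checked with a short calculation. This sketch assumes the slide means Pearson's product-moment correlation, the most common choice; the function name `pearson_r` and the small data sets are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Three hypothetical cases: rising together, moving oppositely, unrelated.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
r_negative = pearson_r([1, 2, 3], [6, 4, 2])
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])
print(r_positive, r_negative, r_none)
```

A coefficient near zero only rules out a straight-line relationship; the third data set is U-shaped, so a scatterplot is still worth drawing.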

23 Causality

24 Survey results - questionnaire http://www.vcn.bc.ca/parkptnr/pdfs/Appendix3_CompiledResults.pdf

25 Multivariate table

26 Various examples Mostly histograms http://www.cc.gatech.edu/gvu/user_surveys/papers/1997-10/sld001.htm Graphs http://www.cc.gatech.edu/gvu/user_surveys/papers/1997-10/sld006.htm http://www.cc.gatech.edu/gvu/user_surveys/survey-1998-10/graphs/general/q54.htm

