Measure Up! Data Analysis Tools to Optimize Library Management
Dr. Lesley Farmer, California State University Long Beach



Agenda
• Research data analytics to assess California school libraries and identify variables to improve their impact
• Data analysis statistics
• Choosing data analysis tools

Research Questions Based on 2007 and 2012 California School Libraries Data
• What significant trends exist in California school library programs between 2007 and 2012?
• What are the profiles of consistently high-effectiveness and consistently low-effectiveness school library programs?
• What are the predictors of high and low school library impact over time?

Needs
• Trend analysis of California school libraries
• Predictive models of impactful California school libraries, which might be generalizable
• Increased use of data analytics to improve libraries

Method
• Use the California State Department of Education annual school library survey report datasets ( and )
• Code survey variables, e.g., meets standard or not
• Compare school libraries that meet the state model school library standards baseline criteria with those that do not
• Use several statistical techniques: cluster analysis, decision trees, logistic regression

Sample California School Library Reports Distribution

64 Independent Variables

Dependent Variables
• Meet Standard or Not (binary)
• API (Academic Performance Index)
• Socio-economic API decile

Clustering
• Kth nearest-neighbor (knn) is a clustering method that uses distances between observations, computed from their variable values, to group observations together.
• Observations with smaller distances between them are assumed to be similar, so looking closer at the individual clusters can potentially reveal important characteristics.

Ward Method of Clustering
• Measures the distance between two clusters
• Observations with the least differences are clustered
• Joins “close” clusters so that the resulting within-cluster variance is minimized
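As a rough illustration of the Ward idea, the sketch below computes how much the total within-cluster sum of squares would grow if two clusters were merged. The Ward method joins the pair with the smallest such increase. The function names and the tiny data set (weekend hours, book budget in $1000s) are hypothetical and are not taken from the study.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


# Hypothetical libraries: (weekend hours open, book budget in $1000s).
low = [(0.0, 2.0), (0.0, 4.0)]
middle = [(1.0, 3.0)]
high = [(6.0, 20.0), (8.0, 24.0)]

cost_low_middle = ward_cost(low, middle)
cost_low_high = ward_cost(low, high)

# The cheaper merge is the one Ward's method would perform first.
print(cost_low_middle, cost_low_high)
```

In this made-up example the low and middle groups are far cheaper to merge than the low and high groups, so they would be joined first.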

Important Ward-based Variables
• Enhanced access: on weekends, summer
• Book budget

Centroid Method of Clustering
• Measure distance between the centroids (means) of each cluster
• Join the 2 nearest clusters
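The two steps above can be sketched as a small agglomerative loop: start with every observation in its own cluster, then repeatedly join the two clusters whose centroids are closest. This is a minimal pure-Python sketch with hypothetical names and made-up points, not the software used in the study.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def centroid_linkage(points, n_clusters):
    """Merge the two clusters with the nearest centroids until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                   centroid(clusters[ij[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Made-up observations forming two obvious groups.
observations = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
groups = centroid_linkage(observations, n_clusters=2)
print(groups)
```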

Centroid Cluster-based Variables
Positive:
• Access during breaks
• Internet access
• Online productivity tools
• Reference help
Negative:
• No access before OR after school
• No Internet access
• No online library catalog
• No “extra” funding

Decision Trees
• Flowchart of decisions and possible consequences
• Node = test, branch = outcome, leaf = decision
• Path from root to leaf is a classification rule
• Split data into training set and test set
• Select the attribute with the highest “information gain” to separate the data
• Prune the tree for optimal selection (aim for homogeneous classes)
• Useful for predictions
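To make “information gain” concrete, here is a minimal sketch that scores candidate split attributes by how much they reduce entropy in the dependent variable. The attribute names and the four rows are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical survey rows; "met_standard" plays the role of the dependent variable.
rows = [
    {"online_catalog": True, "weekend_access": True, "met_standard": True},
    {"online_catalog": True, "weekend_access": False, "met_standard": True},
    {"online_catalog": False, "weekend_access": True, "met_standard": False},
    {"online_catalog": False, "weekend_access": False, "met_standard": False},
]

gain_catalog = information_gain(rows, "online_catalog", "met_standard")
gain_weekend = information_gain(rows, "weekend_access", "met_standard")
best_attribute = max(["online_catalog", "weekend_access"],
                     key=lambda a: information_gain(rows, a, "met_standard"))
print(best_attribute, gain_catalog, gain_weekend)
```

In this toy data the online catalog separates the classes perfectly, so it would be chosen for the first split.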

CART (Classification & Regression Trees): Important Independent Variables
• Online library catalog
• Internet access
• Online DBs
• Video DBs
• Budget (and funding sources)
• Collection currency
• Reference help
• Dependent variable: met standards or not

C4.5 Decision Tree (more than binary splits): Important Independent Variables
• Budget (and funding sources)
• Collection currency
• Online library catalog
• Reference help
• # of books
• Dependent variable: met standards or not

Logistic Regression
• Probabilistic statistical classification model
• Measures the relationship between a categorical dependent variable and independent (continuous or categorical) variables
• Regression line is nonlinear
• Run with combination of main effects
• Aim for best fit
• Predicts outcome of categorical dependent variable
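A minimal sketch of the idea, assuming a single continuous predictor and a binary outcome: the model passes a linear score through the S-shaped logistic function to get a probability, and the coefficients are fitted here by plain gradient descent. The data, names, learning rate, and step count are all illustrative assumptions. Statistical packages typically fit the same model with more robust methods.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(w*x + b) by gradient descent on the mean log loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical data: book budget (in $1000s) and whether the library met the standard.
budget = [1, 2, 3, 4, 5, 6, 7, 8]
met_standard = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(budget, met_standard)
p_low = sigmoid(w * 1 + b)   # predicted probability at the lowest budget
p_high = sigmoid(w * 8 + b)  # predicted probability at the highest budget
print(round(p_low, 3), round(p_high, 3))
```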

Main Effects: Different ways to determine the best logistic regression model
• Backward Selection: start with all variables and remove insignificant ones
• Forward Selection: start with 1 significant variable and add variables until the model is complete
• Stepwise Selection: add or remove a variable depending on whether it makes the model better

ROC (Receiver Operating Characteristics)
• Use to compare models
• Distinguishes classifiers that are optimal under some class distributions from sub-optimal classifiers
• Plots the true-positive rate against the false-positive rate for a 2-class problem
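The sketch below shows one way to trace an ROC curve by hand: sort cases from the highest to the lowest predicted score, step through them, and record the false-positive and true-positive rates after each case. The area under the curve (AUC) then gives a single number for comparing models. The labels and scores are made up, and the sketch assumes no tied scores.

```python
def roc_points(labels, scores):
    """(false-positive rate, true-positive rate) pairs, highest score first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical model output: 1 = met standard, with the model's predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
area = auc(curve)
print(curve, area)
```

An area of 0.5 corresponds to guessing and 1.0 to perfect separation, so a model with a larger area is generally preferred.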

CART Best Model: Ultimate Important Predictor Variables
• Dependent variable: API
• Staffing
• Online library catalog
• Collection currency
• Internet access
• Online DBs
• Budget (and fund sources)
• Reference help

What data do you collect?
• Circulation figures
• Patron usage
• Facilities usage
• Computer usage
• Internet usage
• Reference consultations and fill
• Library guides/bibliographies use
• Instructional sessions
• Website hits (including tutorials)
• Database usage vs cost
• ILL processing and turnaround time
• Ordering, processing, cataloging, preservation, weeding workflow and time
• Ebook usage vs cost
• Library software usage vs cost
• Staff scheduling
• Equipment maintenance and repairs

What tools do you use to collect data?
• Surveys
• Web statistics
• Circulation statistics
• Interviews
• Observation
• LibQual / LibPAS
• Flowfinity
• Document collecting

What do you DO with that data?
• Descriptive statistics
• Analyze workflow for efficiency
• Reveal trends
• Benchmark efforts
• Control quality
• Do cost-benefit analysis
• Analyze student learning
• Optimize scheduling
• Optimize queuing

AASL Longitudinal Data
• Data: demographics, staff, resources, services
• Use: trends over time, correlations between staff and resources/services
• Demographic correlations with staffing, resources and services
• AASL membership correlations with staffing, resources and services

Copyright Median by State

$/Student by Region

# of Books/Student by School Level

Techniques
• Correlation analysis (for relationship between continuous variables)
• Multiple Regression (continuous response variable), Logistic Regression (categorical response variable)
• Decision Trees
• Principal Components, Factor Analysis
• Hypothesis testing (paired tests, two sample tests, ANOVA)
• Chi-Square tests of independence (for relationship between categorical variables)
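As one worked example from this list, the sketch below computes the chi-square statistic for a test of independence between two categorical variables. It compares each observed count with the count expected if the variables were unrelated. The 2x2 table (online catalog yes/no by met standard yes/no) is invented for illustration.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts: rows = has online catalog (yes, no),
# columns = met standard (yes, no).
observed_counts = [[30, 10],
                   [20, 40]]

chi2, dof = chi_square_statistic(observed_counts)
print(chi2, dof)
```

The statistic is then compared against a chi-square distribution with the given degrees of freedom to get a p-value. A statistics package such as SciPy (`scipy.stats.chi2_contingency`) can do both steps, though by default it applies a continuity correction to 2x2 tables, so its statistic will differ slightly from this hand calculation.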

Graphs
• Box Plots
• Stem and Leaf Plots
• Histograms/Bar Graphs
• Pareto Charts
• Pie Charts
• Time Series Plot
• Outlier assessment
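A box plot is drawn from the five-number summary, and a common rule of thumb for outlier assessment flags values more than 1.5 times the interquartile range beyond the quartiles. The sketch below uses Python's standard `statistics.quantiles`. The books-per-student figures are made up, and other software may compute quartiles slightly differently.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def outliers(values):
    """Values beyond 1.5 * IQR from the quartiles (a common box plot rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


# Hypothetical number of books per student at nine schools.
books_per_student = [2, 4, 4, 5, 6, 7, 8, 9, 30]

summary = five_number_summary(books_per_student)
flagged = outliers(books_per_student)
print(summary, flagged)
```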


Stem-and-Leaf Plot
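Since the slide's figure is not reproduced here, the following sketch shows how a simple stem-and-leaf plot can be built: the tens digit of each value becomes the stem and the ones digit becomes a leaf. The values are made up. The sketch assumes non-negative whole numbers and does not print empty stems.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines using tens as stems and ones as leaves."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]


# Hypothetical books-per-student figures.
plot_lines = stem_and_leaf([12, 15, 21, 24, 24, 30, 37, 8])
for line in plot_lines:
    print(line)
```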

KM ANALYSIS APPROACH | DATA ANALYTIC TOOLS
Cause identification | Fishbone diagram, correlation analysis, regression analysis, ANOVA, clustering, principal components
Cost-benefit analysis / ROI | Pugh matrix, Pearson correlation
Customer satisfaction | Regression analysis, Likert techniques, chi square
Decision | Decision tree, Pugh matrix
Error and tolerance analysis | Pareto analysis, control chart
Failure analysis | Pareto analysis, control chart, clustering
Job analysis | Demerit systems, flow chart
Process capacity |
Quality analysis | Pugh matrix, control chart
Quality control | Control chart, run chart
Quantity analysis | Histogram, run chart
Queuing | Poisson distribution
Scalability | Process capability
Time analysis | Run chart, Poisson distribution, activity network diagram
Work flow and process analysis | Fishbone diagram, activity network diagram, flow chart, run chart
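To illustrate one pairing from this list, queuing with the Poisson distribution, the sketch below estimates how often a service desk would see more arrivals than it can handle in a time slot. The arrival rate of 4 patrons per slot and the capacity of 6 are purely hypothetical, and the model assumes arrivals are independent.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k arrivals when the average is `rate` per slot."""
    return rate ** k * exp(-rate) / factorial(k)


def prob_more_than(k, rate):
    """Probability of more than k arrivals in one slot."""
    return 1.0 - sum(poisson_pmf(i, rate) for i in range(k + 1))


arrival_rate = 4  # hypothetical: average patrons per 10-minute slot
capacity = 6      # hypothetical: patrons the desk can serve per slot

p_none = poisson_pmf(0, arrival_rate)
p_overload = prob_more_than(capacity, arrival_rate)
print(round(p_none, 4), round(p_overload, 4))
```

Under these assumptions the desk would be overloaded in roughly one slot out of nine, which is the kind of figure that can inform staff scheduling.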

Next Steps
Let’s talk!