Published by Esther Little. Modified over 9 years ago.
1
Measure Up! Data Analysis Tools to Optimize Library Management
Dr. Lesley Farmer
California State University Long Beach
Lesley.Farmer@csulb.edu
2
Agenda
- Research data analytics to assess California school libraries and identify variables to improve their impact
- Data analysis statistics
- Choosing data analysis tools
3
Research Questions (based on 2007 and 2012 California school libraries data)
- What significant trends exist in California school library programs between 2007 and 2012?
- What is the profile of a consistently highly effective (and of a consistently low-effectiveness) school library program?
- What are the predictors of high and low school library impact over time?
4
Needs
- Trend analysis of California school libraries
- Predictive models of impactful California school libraries, which might be generalizable
- Increased use of data analytics to improve libraries
5
Method
- Use the California State Department of Education annual school library survey report datasets (2007-08 and 2011-12)
- Code survey variables, e.g., meets standard or not
- Compare school libraries that meet the baseline criteria of the state model school library standards with those that do not
- Use several statistical techniques: clustering analysis, decision trees, logistic regression
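The coding step in the method can be sketched as follows. This is only an illustration: the field names and thresholds below are hypothetical and are not the actual survey items or the state's baseline criteria.

```python
# Hypothetical survey fields and thresholds, for illustration only;
# they are NOT the actual California survey items or state criteria.
BASELINE_CRITERIA = {
    "has_online_catalog": lambda v: v is True,
    "has_internet_access": lambda v: v is True,
    "open_hours_per_week": lambda v: v >= 30,
}

def code_library(record):
    """Return per-criterion 0/1 codes plus an overall binary 'meets_standard'."""
    codes = {name: int(check(record[name])) for name, check in BASELINE_CRITERIA.items()}
    # A library meets the standard only if every baseline criterion is met.
    codes["meets_standard"] = int(all(codes.values()))
    return codes

# Two made-up survey records.
libraries = [
    {"has_online_catalog": True, "has_internet_access": True, "open_hours_per_week": 36},
    {"has_online_catalog": False, "has_internet_access": True, "open_hours_per_week": 20},
]
coded = [code_library(r) for r in libraries]
```

Coding every variable as 0/1 in this way gives the binary "meets standard or not" outcome that the later techniques can use.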
6
Sample California School Library Reports Distribution
7
64 Independent Variables
9
Dependent Variables
- Meet standard or not (binary)
- API (Academic Performance Index)
- Socio-economic API decile
10
Clustering
K-nearest-neighbor (kNN) is used here as a clustering approach that groups observations by the distances between them, computed from their variable values. Observations separated by smaller distances are assumed to be similar, so a closer look at the individual clusters can reveal important characteristics.
11
Ward Method of Clustering
- Measures the distance between two clusters
- Observations with the least differences are clustered
- Joins “close” clusters so that the resulting within-cluster variance is minimized
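A minimal sketch of the Ward idea, assuming Euclidean distance and made-up two-variable observations: at each step it merges the pair of clusters whose merger adds the least to the total within-cluster sum of squares. The function names are illustrative and do not come from the study.

```python
def centroid(cluster):
    """Mean of each coordinate across the points in a cluster."""
    n = len(cluster)
    return [sum(p[i] for p in cluster) / n for i in range(len(cluster[0]))]

def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_cluster(points, k):
    """Agglomerate single-point clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        # Pick the merge that keeps within-cluster variance smallest.
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up observations with two variables each.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
ward_result = ward_cluster(points, 2)
```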
12
Important Ward-based Variables
- Enhanced access: on weekends, summer
- Book budget
13
Centroid Method of Clustering
- Measure the distance between the centroids (means) of each cluster
- Join the 2 nearest clusters
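The two steps above can be sketched like this, assuming Euclidean distance between centroids and invented sample points. It records the centroid distance at each merge so the joining order is visible.

```python
import math

def centroid(cluster):
    """Mean of each coordinate across the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def centroid_linkage(points, k):
    """Merge the two clusters with the nearest centroids until k clusters remain."""
    clusters = [[p] for p in points]
    merges = []  # centroid distance at which each merge happened
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

# Invented observations: three near (1, 1.5) and two near (8.5, 8).
sample = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (1.0, 1.5)]
centroid_clusters, merge_distances = centroid_linkage(sample, 2)
```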
14
Centroid Cluster-based Variables
Positive:
- Access during breaks
- Internet access
- Online productivity tools
- Reference help
Negative:
- No access before OR after school
- No Internet access
- No online library catalog
- No “extra” funding
15
Decision Trees
- Flowchart of decisions and possible consequences
- Node = test, branch = outcome, leaf = decision
- The path from root to leaf is a classification rule
- Split the data into a training set and a test set
- Select the attribute with the highest “information gain” to separate the data
- Prune the tree for optimal selection (aim for homogeneous classes)
- Useful for predictions
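The attribute-selection step can be illustrated with a small entropy calculation. The rows and attribute names below are invented for the example and are not taken from the survey data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Invented training rows.
rows = [
    {"online_catalog": "yes", "weekend_access": "yes", "met_standard": "yes"},
    {"online_catalog": "yes", "weekend_access": "no", "met_standard": "yes"},
    {"online_catalog": "no", "weekend_access": "yes", "met_standard": "no"},
    {"online_catalog": "no", "weekend_access": "no", "met_standard": "no"},
]
gains = {a: information_gain(rows, a, "met_standard")
         for a in ("online_catalog", "weekend_access")}
# The attribute with the highest gain becomes the split at this node.
best_attribute = max(gains, key=gains.get)
```

In this toy data the catalog attribute separates the classes perfectly, so a tree builder would split on it first.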
16
Important Independent Variables: CART (Classification & Regression Trees)
Dependent variable: met standards or not
- Online library catalog
- Internet access
- Online DBs
- Video DBs
- Budget (and funding sources)
- Collection currency
- Reference help
17
Important Independent Variables: C4.5 decision tree (more than binary splits)
Dependent variable: met standards or not
- Budget (and funding sources)
- Collection currency
- Online library catalog
- Reference help
- # of books
18
Logistic Regression
- Probabilistic statistical classification model
- Measures the relationship between a categorical dependent variable and independent (continuous or categorical) variables
- The fitted regression curve is nonlinear (S-shaped)
- Run with a combination of main effects
- Aim for best fit
- Predicts the outcome of a categorical dependent variable
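A minimal sketch of a one-predictor logistic model, fitted by plain gradient ascent on the log-likelihood. The data, the learning rate and the number of iterations are assumptions chosen for illustration; a statistics package would normally do this fitting.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Invented data: x = book budget (thousands of dollars), y = met standard (1) or not (0).
budget = [1, 2, 3, 4, 5, 6, 7, 8]
met = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(budget, met)
p_low = sigmoid(b0 + b1 * 1)   # predicted probability at the smallest budget
p_high = sigmoid(b0 + b1 * 8)  # predicted probability at the largest budget
```

The predicted probability follows an S-shaped curve in the predictor, which is why the fit is nonlinear even though the model is linear on the log-odds scale.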
19
Main Effects: Different ways to determine the best logistic regression model
- Backward selection: start with all variables and remove insignificant ones
- Forward selection: start with one significant variable and add variables until the model is complete
- Stepwise selection: add or remove a variable depending on whether doing so improves the model
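Forward selection can be sketched generically as a greedy loop. Here `toy_score` is a hypothetical stand-in for a real model-fit criterion, with a per-variable penalty so that weak variables are not worth adding. In practice the score would come from refitting the logistic model each time.

```python
def forward_selection(candidates, score, min_gain=1e-9):
    """Greedily add the variable that most improves score; stop when none helps."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial = {v: score(selected + [v]) for v in remaining}
        v_best = max(trial, key=trial.get)
        if trial[v_best] - best <= min_gain:
            break  # no remaining variable improves the model
        selected.append(v_best)
        remaining.remove(v_best)
        best = trial[v_best]
    return selected, best

# Hypothetical contributions of each variable to model fit (made-up numbers).
CONTRIBUTION = {"budget": 5.0, "online_catalog": 3.0, "staffing": 1.5, "noise_variable": 0.2}
PENALTY = 1.0  # hypothetical cost charged for each variable included

def toy_score(variables):
    """Stand-in for a penalized fit criterion: higher is better."""
    return sum(CONTRIBUTION[v] for v in variables) - PENALTY * len(variables)

selected, best_score = forward_selection(CONTRIBUTION, toy_score)
```

Backward selection runs the same loop in reverse, starting from all variables and dropping the one whose removal improves the score most.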
20
ROC (Receiver Operating Characteristic)
- Use to compare models
- Plots the true-positive rate against the false-positive rate for a two-class problem
- Distinguishes classifiers that are optimal under some class distribution from sub-optimal classifiers
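As a rough illustration of how an ROC curve is built, the sketch below sweeps the decision threshold over a set of predicted scores and computes the area under the resulting curve with the trapezoid rule. The scores and labels are made up.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) points from a threshold sweep."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    for idx, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up model scores and true classes (1 = met standard).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

When comparing two models, the one whose curve lies closer to the top-left corner (larger area) separates the two classes better on this data.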
21
CART Best Model: Ultimate Important Predictor Variables
Dependent variable: API
- Staffing
- Online library catalog
- Collection currency
- Internet access
- Online DBs
- Budget (and fund sources)
- Reference help
22
What data do you collect?
- Circulation figures
- Patron usage
- Facilities usage
- Computer usage
- Internet usage
- Reference consultations and fill
- Library guides/bibliographies use
- Instructional sessions
- Website hits (including tutorials)
- Database usage vs cost
- ILL processing and turnaround time
- Ordering, processing, cataloging, preservation, weeding workflow and time
- Ebook usage vs cost
- Library software usage vs cost
- Staff scheduling
- Equipment maintenance and repairs
23
What tools do you use to collect data?
- Surveys
- Web statistics
- Circulation statistics
- Interviews
- Observation
- LibQual / LibPAS
- Flowfinity
- Document collecting
24
What do you DO with that data?
- Descriptive statistics
- Analyze workflow for efficiency
- Reveal trends
- Benchmark efforts
- Control quality
- Do cost-benefit analysis
- Analyze student learning
- Optimize scheduling
- Optimize queuing
25
AASL Longitudinal Data
Data: demographics, staff, resources, services
Use:
- Trends over time
- Correlations between staff and resources/services
- Demographic correlations with staffing, resources and services
- AASL membership correlations with staffing, resources and services
26
Copyright Median by State
27
$/Student by Region 2009-2012
28
# of Books/Student by School Level 2009-12
29
Techniques
- Correlation analysis (for relationships between continuous variables)
- Multiple regression (continuous response variable), logistic regression (categorical response variable)
- Decision trees
- Principal components, factor analysis
- Hypothesis testing (paired tests, two-sample tests, ANOVA)
- Chi-square tests of independence (for relationships between categorical variables)
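As one worked example from this list, the chi-square statistic for a test of independence can be computed directly from a contingency table. The counts below are invented.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Invented counts: rows = online catalog (yes, no); columns = met standard (yes, no).
table = [[30, 10],
         [10, 30]]
chi2, dof = chi_square_statistic(table)
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom (3.841 at the 0.05 level for 1 degree of freedom); a larger statistic suggests the two categorical variables are related.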
30
Graphs
- Box plots
- Stem-and-leaf plots
- Histograms / bar graphs
- Pareto charts
- Pie charts
- Time series plots
- Outlier assessment
33
Stem-and-Leaf Plot
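A stem-and-leaf plot is simple enough to build by hand. This minimal sketch assumes non-negative whole numbers, uses the tens digit as the stem and the ones digit as the leaf, and runs on made-up values.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot (stem = tens, leaf = ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Made-up values, e.g. books per student at eight schools.
books_per_student = [12, 15, 21, 21, 27, 34, 48, 49]
plot_lines = stem_and_leaf(books_per_student)
print("\n".join(plot_lines))
```

Unlike a histogram, the plot keeps every individual value readable while still showing the shape of the distribution.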
35
KM ANALYSIS APPROACH | DATA ANALYTIC TOOLS
Cause identification | Fishbone diagram, correlation analysis, regression analysis, ANOVA, clustering, principal components
Cost-benefit analysis / ROI | Pugh matrix, Pearson correlation
Customer satisfaction | Regression analysis, Likert techniques, chi square
Decision | Decision tree, Pugh matrix
Error and tolerance analysis | Pareto analysis, control chart
Failure analysis | Pareto analysis, control chart, clustering
Job analysis | Demerit systems, flow chart
Process capacity |
Quality analysis | Pugh matrix, control chart
Quality control | Control chart, run chart
Quantity analysis | Histogram, run chart
Queuing | Poisson distribution
Scalability | Process capability
Time analysis | Run chart, Poisson distribution, activity network diagram
Work flow and process analysis | Fishbone diagram, activity network diagram, flow chart, run chart
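To illustrate the queuing row, the sketch below uses the Poisson distribution to estimate how often a service desk would see more arrivals in an hour than it can handle. The arrival rate and the capacity are hypothetical numbers.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k arrivals when the average is `rate` per interval."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def prob_more_than(k, rate):
    """Probability of more than k arrivals in one interval."""
    return 1.0 - sum(poisson_pmf(i, rate) for i in range(k + 1))

# Hypothetical figures: 4 reference questions per hour on average,
# and a desk that can comfortably handle up to 6 per hour.
average_per_hour = 4.0
desk_capacity = 6
p_overload = prob_more_than(desk_capacity, average_per_hour)
```

Repeating the calculation for different capacities shows how much extra staffing would be needed to keep the overload probability below a chosen level.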
36
Next Steps
Let’s talk!
http://www.librarydataanalytics.com/