Creating a Library Learning Analytics Database
Michael Doran, Systems Librarian
University of Texas at Arlington
Library and Information Technology Association
LITA Forum - Michael Doran - Nov 19, 2016
To be covered…
- What is a library learning analytics database?
- Why is it needed?
- A look under the hood
- Security & privacy issues
- Library vs. campus systems
Learning analytics
Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. [1]
[1] From “1st International Conference on Learning Analytics and Knowledge 2011” via Wikipedia article on “Learning analytics”
Creating a library learning analytics database…
…what problem(s) does that solve?
Obligatory graphic of silos
“silos” photo by Doc Searls, CC BY 2.0
Problems
- Library use data resides in separate systems
- Library systems typically don’t contain the student demographic information (e.g. major, academic program, GPA, student classification, etc.) needed to do learning analytics [a GOOD thing]
- Library data sets may use different unique identifiers (e.g. institutional ID number vs. NetID), preventing linking them together
[Diagram: use data from various library systems, plus demographic data from a campus system, flows into one centralized database: the Library Learning Analytics Database (12 syllables)]
“LIBLAND” LIBrary Learning ANalytics Database
[Diagram: use data from various library systems, plus demographic data from a campus system, flows into the centralized LIBLAND database]
Examples of systems with library use data
Entrance/exit gate turnstiles
Users swipe their “Mav Express” ID card to both enter and exit.
[Photos: entry and exit turnstiles; renovation in 2014]
Interlibrary Loan Requests (e.g. ILLiad)
Group Study Room Reservations (e.g. OpenRoom)
Off-campus Access to E-resources (e.g. EZproxy)
Note: approximately 15,000 (out of 43,000) Spring 2016 students were online only.
ILS Catalog (e.g. Voyager)
Examples of systems with demographic data
Campus LDAP Directory (“CEDAR” at UT Arlington)
“provide a consolidated standards-based directory which can provide consistent and complete information on students, faculty, staff, courses, organizations, and other electronically-describable entities and relationships”
Central Enterprise Directory and Authentication Realm (CEDAR)
Blackboard Analytics
“The center [for Distance Education] aspires to [...] be a resource to faculty and program administrators driving the use of learning analytics, including student learning outcomes; [...]”
From UT Arlington’s Center for Distance Education
Your Campus?
Access to, or data dumps from:
- LDAP directory
- PeopleSoft
- Banner
- LMS (Learning Management System)
- …
Demographic data sources
Past, present, and future students: CEDAR/LDAP, 137,000 records; 16 attributes
Current semester’s students: Blackboard Analytics (Bb A), 43,000 records; 16 attributes
LDAP criteria: utaPersonAffiliation=student
Blackboard criteria: student registered for the term/semester in question
Total CEDAR records [filter: (|(utaID=*)(utaEmplID=*))] = 897,067
utaPersonAffiliation = primary affiliation and all secondary affiliations, such as: faculty, student, staff, alum, member, affiliate, employee, applicant
Attributes (from demographic data sources)
LDAP (CEDAR):
- UTA ID
- Student classification
- Academic program
- Major
- Grade points
- Hours complete
- GPA (calculated)
- Student status (i.e. enrolled now?)
- Ethnicity
- Gender
- Permanent address zip code
- Student address zip code
- Enrollment code
- Enrollment session
- Expected graduation date
- Library student employee?
Blackboard Analytics:
- UTA ID
- Enrollment term
- Academic program
- Gender
- Ethnicity
- Age
- Tuition residency
- Student type
- College/school
- Department
- Academic plan
- Is academic partner?
- Is online student?
- Instruction mode
- Academic load
- Academic standing
Attributes were chosen in consultation with the Director of Quantitative Assessment (who coordinated with the Library Management Team).
Attributes (from demographic data sources)
Privacy: note some attributes we are NOT retrieving
- ID number… but not name
- Age… but not date of birth
- Zip code… but not street address
3 other important data tables
Other Data Table #1
Knowing the affiliation of users who are not students helps fill in the gaps when linking library use data.
CEDAR/LDAP: 698,000 records; 1 attribute
- All records that have a UTA ID
- Only attribute is primary affiliation (“utaPrimaryAffiliation”): student, faculty, employee, staff, affiliate
For comparison, the demographic tables: CEDAR/LDAP, 137,000 records; 16 attributes. Blackboard Analytics (Bb A), 43,000 records; 16 attributes.
Other Data Table #2
Cross-reference table for different unique identifiers
All the UTA IDs (and associated NetIDs)
Problem: Library data sets may use different unique identifiers (e.g. institutional ID number vs. NetID), preventing linking them together.
- Demographic data only has UTA ID as an identifier
- Much of the use data (e.g. ILLiad, EZproxy, OpenRoom) only has the users’ NetID as an identifier
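How such a cross-reference table bridges the two identifiers can be sketched in a few lines of Python. This is only an illustration: the NetIDs, UTA IDs, field names, and the `link_events` helper are all made up and are not taken from LIBLAND.

```python
# Hypothetical cross-reference table: NetID -> UTA ID (made-up values).
NETID_TO_UTAID = {
    "abc1234": "1000000001",
    "xyz5678": "1000000002",
}

# Use data keyed by NetID, as in ILLiad, EZproxy, or OpenRoom exports.
use_events = [
    {"netid": "abc1234", "system": "ILLiad"},
    {"netid": "xyz5678", "system": "EZproxy"},
    {"netid": "nobody99", "system": "OpenRoom"},  # not in the cross-reference
]

# Demographic data keyed by UTA ID.
demographics = {
    "1000000001": {"major": "Biology"},
    "1000000002": {"major": "History"},
}


def link_events(events, crosswalk, demo):
    """Attach demographic attributes to each use event via NetID -> UTA ID."""
    linked = []
    for event in events:
        uta_id = crosswalk.get(event["netid"])
        attrs = demo.get(uta_id, {}) if uta_id else {}
        linked.append({**event, "uta_id": uta_id, **attrs})
    return linked


linked = link_events(use_events, NETID_TO_UTAID, demographics)
for row in linked:
    print(row)
```

Events whose NetID has no match keep a `uta_id` of `None`, which makes the unlinked rows easy to count.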
Other Data Table #3
Cross-reference table for cryptographic hash values
Making a (cryptographic) hash of it
A one-way hash function is an algorithm that takes a string (in this case, a UTA ID number) and returns a fixed-length alphanumeric string (the “hash value”).
[Code screenshot: foo.pl]
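The script shown on the slide (foo.pl) is not reproduced in this transcript. As a stand-in, here is a minimal Python sketch of a one-way hash, assuming SHA-256 (the algorithm named later in the talk) and a made-up ID; the `hash_id` name is illustrative only.

```python
import hashlib


def hash_id(uta_id: str) -> str:
    """Return the SHA-256 hash of an ID string as hexadecimal text."""
    return hashlib.sha256(uta_id.encode("utf-8")).hexdigest()


sample_id = "1000000001"  # made-up ID, for illustration only
digest = hash_id(sample_id)

print(digest)
print(len(digest))  # the hex digest is 64 characters for any input
```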
Making a (cryptographic) hash of it
- Slightly different strings get vastly different hash values
- The same string always gets the same hash value*
Making a (cryptographic) hash of it
*The same string always gets the same hash value
Which is problematic, since UTA IDs are known to be 10-digit numbers. It wouldn’t be difficult to generate hash values for all the 10-digit numbers in the ranges used for UTA IDs and have a 10-digit number/hash value table, essentially reversing the process.
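The reversal described above can be sketched in Python. The ID range here is invented and kept deliberately small so the example runs instantly; the real ranges used for UTA IDs are not given in the talk.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Unsalted SHA-256 of a string, as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Assumed, made-up ID range (10,000 values) standing in for the real ranges.
RANGE_START = 1000000000
RANGE_END = 1000010000

# Precompute a hash value -> ID table for every number in the range.
lookup = {sha256_hex(str(n)): str(n) for n in range(RANGE_START, RANGE_END)}

# An unsalted hash taken from a supposedly anonymized data set...
leaked_hash = sha256_hex("1000004242")

# ...is reversed with a single dictionary lookup.
recovered = lookup[leaked_hash]
print(recovered)
```

Scaling the same loop to every plausible 10-digit ID is only a matter of time and storage, which is the weakness the slide points out.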
Cryptographic salt
A cryptographic salt is random data that is used as an additional input to a one-way hash.
[Code screenshot: bar.pl]
Cryptographic salt
The same input string (UTA ID) gets a different hash value each time…
…because it’s being combined with a different random salt each time the SHA-256 algorithm is applied.
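A minimal Python sketch of salting, under stated assumptions: the 16-byte salt length, the way the salt is prepended to the ID, and the `new_hash_entry` record layout are illustrative choices, not details taken from the talk.

```python
import hashlib
import secrets


def salted_hash(uta_id: str, salt: bytes) -> str:
    """SHA-256 over a random salt followed by the ID."""
    return hashlib.sha256(salt + uta_id.encode("utf-8")).hexdigest()


def new_hash_entry(uta_id: str) -> dict:
    """Generate a fresh random salt and return an ID/salt/hash record."""
    salt = secrets.token_bytes(16)  # assumed salt length
    return {"uta_id": uta_id, "salt": salt.hex(), "hash": salted_hash(uta_id, salt)}


first = new_hash_entry("1000000001")   # made-up ID
second = new_hash_entry("1000000001")  # same ID, new salt

print(first["hash"])
print(second["hash"])
```

Because a salted hash cannot be recomputed from the ID alone, each ID's hash value has to be stored once and then looked up, which fits the cross-reference table for hash values mentioned earlier.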
This will allow us to do data anonymization
[To be continued…]
Quick Review of What’s in LIBLAND
[Diagram labels: Other Data, Use Data, Demographic Data]
[Diagram (stylized, simple graphic): use data from various library systems, plus demographic data from a campus system, flows into the centralized LIBLAND database]
Anonymize the data in an MS Access database.
Yikes!
Start small(er)
To get started on a “LIBLAND” project all you need are:
- One library use data source
- One demographic data source
[Diagram: library use data (System A database) and demographic data (LDAP directory) are each read by a script; the scripts write SQL load files, which are loaded into the LIBLAND database]
Requirements
Expertise in:
- Database design
- SQL
- Programming (a scripting language such as Perl, Python, or PHP)
Access to:
- (A separate, secure) database server
- Library systems containing use data
- Campus systems with demographic data
Scripts
Recommend a scripting language like Perl, PHP, or Python.
Script needs to:
- Connect to system
- Execute a query
- Parse data
- Output data (as SQL load file)
You will need a “connector” library/module for the system you are connecting to: e.g. for Perl, the DBI/DBD::Oracle modules for connecting to an Oracle database, or Net::LDAP for connecting to an LDAP directory.
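The four steps can be sketched in Python. This is an illustration only: an in-memory SQLite database stands in for the source system, and the table names, column names, and sample rows are invented. A real script would use the connector module for the actual system instead.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal (assumes MySQL-style escaping)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "''")
    return f"'{text}'"


def extract_requests(conn):
    """Execute the query, then parse each row (trim and lowercase the NetID)."""
    cursor = conn.execute(
        "SELECT username, request_date FROM requests ORDER BY request_date"
    )
    return [(username.strip().lower(), date) for username, date in cursor]


def to_insert_statements(rows, table="ill_requests"):
    """Output each parsed row as one SQL INSERT statement."""
    return [
        f"INSERT INTO {table} (netid, request_date) "
        f"VALUES ({sql_literal(netid)}, {sql_literal(date)});"
        for netid, date in rows
    ]


# 1. Connect to the (stand-in) source system and give it made-up rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE requests (username TEXT, request_date TEXT)")
source.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [(" ABC1234 ", "2016-09-01"), ("o'neil1", "2016-09-02")],
)

# 2-4. Execute the query, parse the rows, output SQL.
statements = to_insert_statements(extract_requests(source))
print("\n".join(statements))
```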
SQL load files
Scripts can/should output data as SQL “INSERT” statements.
For output consisting of many rows of data…
- Start SQL load file with: SET autocommit=0;
- End with: COMMIT;
- Start a new INSERT statement every 10,000 rows
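One way to read the advice above is a load file made of multi-row INSERT statements, each carrying at most 10,000 rows, wrapped in a single transaction. The following Python sketch assumes that reading; the table name, column names, and the `build_load_file` helper are hypothetical.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (assumes MySQL-style escaping)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "''")
    return f"'{text}'"


def build_load_file(table, columns, rows, batch_size=10_000):
    """Return load-file text: autocommit off, batched INSERTs, final COMMIT."""
    lines = ["SET autocommit=0;"]
    column_list = ", ".join(columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        # A new INSERT statement begins for every batch of rows.
        lines.append(f"INSERT INTO {table} ({column_list}) VALUES\n{values};")
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


# 25,000 made-up rows produce three INSERT statements.
rows = [(f"user{i}", "2016-09-01") for i in range(25_000)]
load_file = build_load_file("ill_requests", ["netid", "request_date"], rows)
insert_count = load_file.count("INSERT INTO")
print(insert_count)
```

Writing `load_file` to a file such as illiad.sql gives something the `mysql` client can read from standard input.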
SQL Load File
Command to load file: mysql -u libland -p libland < illiad.sql
Granularity of Use Data Retrieved
Privacy: we’re not pulling citation data…
Granularity of Use Data Retrieved
Privacy: we extract the “destination host” but not the full URL (w/ query string) that identifies the exact resource.
Note: By default, EZproxy logs do not retain a username (“session ID” is default); capturing that data requires a configuration change.
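Reducing a logged URL to its destination host can be sketched in Python with the standard library. The URLs and the `destination_host` name are made up; how the URL is pulled out of an actual EZproxy log line depends on the configured log format and is not shown.

```python
from urllib.parse import urlsplit


def destination_host(url: str) -> str:
    """Return only the host name of a URL, discarding path and query string."""
    return urlsplit(url).hostname or ""


logged_url = "https://www.example.com/article/view?id=12345&term=anatomy"
host = destination_host(logged_url)
print(host)  # www.example.com
```

Only `host` would be stored, so the path and query string that identify the exact resource never reach the database.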
Distributing LIBLAND Data
LIBLAND Tables
[Diagram labels: “Other” Data, Use Data, Demographic Data]
50
Tables... and Views Views are virtual tables that get created on-the-fly via an SQL select statement. Each view in LIBLAND contains the same data as in the table EXCEPT the UTA ID (or NetID) is replaced with the one-way hash value. The views are what get exported from the LIBLAND server and imported into an MS Access database for distribution to staff.
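The idea of a view that swaps the identifier for its hash value can be sketched with Python's built-in SQLite module. LIBLAND itself is loaded with the `mysql` client, so SQLite is only a stand-in here, and the table names, column names, and placeholder hash values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Use-data table and hash cross-reference table (made-up rows).
    CREATE TABLE ill_requests (uta_id TEXT, request_date TEXT);
    CREATE TABLE id_hash (uta_id TEXT PRIMARY KEY, hash_value TEXT);

    INSERT INTO ill_requests VALUES ('1000000001', '2016-09-01');
    INSERT INTO ill_requests VALUES ('1000000002', '2016-09-02');
    INSERT INTO id_hash VALUES ('1000000001', 'hash-aaa');
    INSERT INTO id_hash VALUES ('1000000002', 'hash-bbb');

    -- The view exposes the hash value in place of the UTA ID.
    CREATE VIEW ill_requests_v AS
        SELECT h.hash_value AS user_hash, r.request_date
        FROM ill_requests AS r
        JOIN id_hash AS h ON h.uta_id = r.uta_id;
""")

cursor = conn.execute("SELECT * FROM ill_requests_v ORDER BY request_date")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```

Exporting from the view rather than the table means the exported rows carry the hash column but no `uta_id` column.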
[Figure panels: “On secure server” and “Distributed to staff”]
Privacy: no identifier (only a SHA-256 cryptographic hash)
Why go to that trouble?
Data privacy is reason #1, #2, & #3
Bonus reason: If there is intent to publish or present the results of the analysis, you typically have to get institutional review board (IRB) approval. In advance.
However…
IRB Review Exemption (YMMV, always discuss with your IRB)
Institutional Review Board
University Analytics: "We are the Borg. Your data will be added to our own. Resistance is futile."
[Cartoon labeled “ANALYTICS”: (custom) Ed Hall cartoon reprinted by permission]
Questions?
Please feel free to contact:
Michael Doran
Systems Librarian
University of Texas at Arlington