Creating a Library Learning Analytics Database


1 Creating a Library Learning Analytics Database
Michael Doran, Systems Librarian, University of Texas at Arlington
Library and Information Technology Association (LITA)

2 To be covered…
What is a library learning analytics database?
Why is it needed?
A look under the hood
Security & privacy issues
Library vs. campus systems
LITA Forum - Michael Doran - Nov 19, 2016

3 Learning analytics
Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. [1]
[1] From “1st International Conference on Learning Analytics and Knowledge 2011” via Wikipedia article on “Learning analytics”

4 Creating a library learning analytics database…
…what problem(s) does that solve?

5 Obligatory graphic of silos
“silos” photo by Doc Searls, CC BY 2.0

6 Problems
Library use data resides in separate systems.
Library systems typically don’t contain the student demographic information (e.g. major, academic program, GPA, student classification) needed to do learning analytics [a GOOD thing].
Library data sets may use different unique identifiers (e.g. institutional ID number vs. NetID), preventing linking them together.

7 Library Learning Analytics Database
[Diagram: use data from various library systems and demographic data from a campus system feed a centralized database, the Library Learning Analytics Database.]

8 “LIBLAND”
LIBrary Learning ANalytics Database

9 LIBLAND
[Diagram: use data from various library systems and demographic data from a campus system feed the centralized LIBLAND database.]

10 Examples of systems with library use data

11 Entrance/exit gate turnstiles
Users swipe their “Mav Express” ID card to both enter and exit. Renovation in 2014.
[Photo labels: Exit, Entry]

12 Interlibrary Loan Requests (e.g. ILLiad)

13 Group Study Room Reservations (e.g. OpenRoom)

14 Off-campus Access to E-resources (e.g. EZproxy)
Note: approximately 15,000 (out of 43,000) Spring 2016 students were online only.

15 ILS Catalog (e.g. Voyager)

16 LIBLAND
[Diagram repeated: use data from various library systems and demographic data from a campus system feed the centralized LIBLAND database.]

17 Examples of systems with demographic data

18 Campus LDAP Directory (“CEDAR” at UT Arlington)
Central Enterprise Directory and Authentication Realm (CEDAR)
“provide a consolidated standards-based directory which can provide consistent and complete information on students, faculty, staff, courses, organizations, and other electronically-describable entities and relationships”

19 Blackboard Analytics
“The center [for Distance Education] aspires to [...] be a resource to faculty and program administrators driving the use of learning analytics, including student learning outcomes; [...]”
From UT Arlington’s Center for Distance Education

20 Your Campus?
Access to, or data dumps from:
  LDAP directory
  PeopleSoft
  Banner
  LMS (Learning Management System)

21 Demographic data sources
CEDAR/LDAP: 137,000 records; 16 attributes. Past, present, and future students. LDAP criteria: utaPersonAffiliation=student
Blackboard Analytics (Bb A): 43,000 records; 16 attributes. Current semester’s students. Blackboard criteria: student registered for the term/semester in question
Total CEDAR records [filter: (|(utaID=*)(utaEmplID=*))] = 897,067
utaPersonAffiliation = primary affiliation and all secondary affiliations, such as: faculty, student, staff, alum, member, affiliate, employee, applicant

22 Attributes (from demographic data sources)
LDAP (CEDAR): UTA ID; Student classification; Academic program; Major; Grade points; Hours complete; GPA (calculated); Student status (i.e. enrolled now?); Ethnicity; Gender; Permanent address zip code; Student address zip code; Enrollment code; Enrollment session; Expected graduation date; Library student employee?
Blackboard Analytics: UTA ID; Enrollment term; Academic program; Gender; Ethnicity; Age; Tuition residency; Student type; College/school; Department; Academic plan; Is academic partner?; Is online student?; Instruction mode; Academic load; Academic standing
Attributes were chosen in consultation with the Director of Quantitative Assessment (who coordinated with the Library Management Team).

23 Attributes (from demographic data sources)
(Same LDAP (CEDAR) and Blackboard Analytics attribute lists as on slide 22.)
Privacy: note some attributes we are NOT retrieving:
  ID number… but not name
  Age… but not date of birth
  Zip code… but not street address

24 Three other important data tables

25 Other Data Table #1
Knowing the affiliation of users who are not students helps fill in the gaps when linking library use data.

26 CEDAR/LDAP: 698,000 records; 1 attribute
All records that have a UTA ID.
Only attribute is primary affiliation (“utaPrimaryAffiliation”): student, faculty, employee, staff, affiliate
[Diagram also shows the demographic tables for comparison: CEDAR/LDAP 137,000 records, 16 attributes; Bb A 43,000 records, 16 attributes.]

27 Other Data Table #2
Cross-reference table for different unique identifiers

28 All the UTA IDs (and associated NetIDs)
Problem: Library data sets may use different unique identifiers (e.g. institutional ID number vs. NetID), preventing linking them together.
Demographic data only has UTA ID as an identifier.
Much of the use data (e.g. ILLiad, EZproxy, OpenRoom) only has the users’ NetID as an identifier.
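A minimal sketch of how a cross-reference table lets NetID-keyed use data be joined to UTA ID-keyed demographic data. It uses Python's built-in sqlite3 module as a stand-in for the LIBLAND database server; the table names, column names, and sample values are all hypothetical and are not taken from the actual LIBLAND schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real database server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one cross-reference, one demographic, one use-data table.
    CREATE TABLE id_xref     (uta_id TEXT PRIMARY KEY, net_id TEXT UNIQUE);
    CREATE TABLE demographic (uta_id TEXT PRIMARY KEY, major TEXT);
    CREATE TABLE illiad_use  (net_id TEXT, request_date TEXT);

    INSERT INTO id_xref     VALUES ('1000000001', 'abc1234'), ('1000000002', 'xyz5678');
    INSERT INTO demographic VALUES ('1000000001', 'History'), ('1000000002', 'Biology');
    INSERT INTO illiad_use  VALUES ('abc1234', '2016-02-01'),
                                   ('abc1234', '2016-03-15'),
                                   ('xyz5678', '2016-04-02');
""")

# Use data only knows the NetID; demographic data only knows the UTA ID.
# The cross-reference table bridges the two identifiers.
rows = conn.execute("""
    SELECT d.major, COUNT(*) AS requests
    FROM illiad_use  AS u
    JOIN id_xref     AS x ON x.net_id = u.net_id
    JOIN demographic AS d ON d.uta_id = x.uta_id
    GROUP BY d.major
    ORDER BY d.major
""").fetchall()

print(rows)  # [('Biology', 1), ('History', 2)]
```

Without the id_xref table there is no column the use data and the demographic data have in common, so the join would not be possible.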

29 Other Data Table #3
Cross-reference table for cryptographic hash values

30 Making a (cryptographic) hash of it
A one-way hash function is an algorithm that takes a string (in this case, a UTA ID number) and returns a fixed-length alphanumeric string (the “hash value”).
[Screenshot: foo.pl]

31 Making a (cryptographic) hash of it
Slightly different strings get vastly different hash values.
The same string always gets the same hash value*
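The Perl script shown on the slides (foo.pl) is not reproduced in this transcript. As a rough illustration of the same idea, here is a sketch in Python using SHA-256 (the algorithm named on a later slide) from the standard hashlib module. The function name and the sample ID numbers are made up for the example.

```python
import hashlib


def hash_id(uta_id: str) -> str:
    """Return the SHA-256 hash of an ID string as 64 hexadecimal characters."""
    return hashlib.sha256(uta_id.encode("utf-8")).hexdigest()


# Two made-up 10-digit IDs that differ only in the last digit.
a = hash_id("1000000001")
b = hash_id("1000000002")

# Fixed-length output, regardless of input.
print(len(a), len(b))

# The same string always gets the same hash value.
print(a == hash_id("1000000001"))

# Slightly different strings get very different hash values:
# count how many of the 64 character positions differ.
differing = sum(x != y for x, y in zip(a, b))
print(differing, "of", len(a), "positions differ")
```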

32 Making a (cryptographic) hash of it
*The same string always gets the same hash value.
This is problematic, since UTA IDs are known to be 10-digit numbers. It wouldn’t be difficult to generate hash values for all the 10-digit numbers in the ranges used for UTA IDs and build a 10-digit number/hash value table, essentially reversing the process.
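The reversal described above can be sketched in a few lines. This example assumes unsalted SHA-256 and a deliberately tiny, made-up ID range (100,000 values) so that it runs in well under a second; the real ID ranges are not given in the slides.

```python
import hashlib


def hash_id(uta_id: str) -> str:
    """Unsalted SHA-256 of an ID string, as hexadecimal."""
    return hashlib.sha256(uta_id.encode("utf-8")).hexdigest()


# Hypothetical ID range, kept small for the demonstration.
RANGE_START = 1_000_000_000
RANGE_END = 1_000_100_000  # exclusive

# Precompute hash value -> ID number for every ID in the range.
lookup = {hash_id(str(n)): str(n) for n in range(RANGE_START, RANGE_END)}

# An "anonymized" value taken from a distributed data set...
anonymized = hash_id("1000042195")

# ...is reversed with a single dictionary lookup.
recovered = lookup[anonymized]
print(recovered)  # 1000042195
```

Covering a full 10-digit range is the same loop with more iterations, which is why an unsalted hash of a short, structured identifier offers little protection.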

33 Cryptographic salt
A cryptographic salt is random data that is used as an additional input to a one-way hash.
[Screenshot: bar.pl]

34 Cryptographic salt
The same input string (UTA ID) gets a different hash value each time…
…because it’s being combined with a different random salt each time the SHA-256 algorithm is applied.
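A sketch of salted hashing in Python, again standing in for the Perl script (bar.pl) that is not reproduced here. The 16-byte salt length, the function names, and the in-memory dictionary are assumptions for illustration. Because a fresh salt makes every run produce a different value, the hash assigned to each ID has to be stored once and reused; that is one plausible reading of why a cross-reference table for hash values is needed, not a description of the actual LIBLAND implementation.

```python
import hashlib
import secrets


def salted_hash(uta_id: str) -> str:
    """Hash an ID together with a fresh random salt (new salt on every call)."""
    salt = secrets.token_bytes(16)  # assumed salt length
    return hashlib.sha256(salt + uta_id.encode("utf-8")).hexdigest()


# The same input gets a different hash value each time.
first = salted_hash("1000000001")
second = salted_hash("1000000001")
print(first != second)

# Hypothetical cross-reference: assign each ID one hash value and keep it,
# so the same person maps to the same anonymous value in every table.
hash_xref: dict[str, str] = {}


def hash_for(uta_id: str) -> str:
    """Return the stored hash for an ID, creating it on first use."""
    if uta_id not in hash_xref:
        hash_xref[uta_id] = salted_hash(uta_id)
    return hash_xref[uta_id]


print(hash_for("1000000001") == hash_for("1000000001"))
```

In this sketch the salt is discarded after use, so the stored cross-reference is the only link between an ID and its hash value.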

35 This will allow us to do data anonymization
[To be continued…]

36 Quick Review of What’s in LIBLAND
Other Data
Use Data
Demographic Data

37 LIBLAND
[Diagram repeated (stylized, simple graphic): use data from various library systems and demographic data from a campus system feed the centralized LIBLAND database.]

38 Anonymize the data in an MS Access database.

39 [No transcript text for this slide]

40 Yikes!

41 Start small(er)
To get started on a “LIBLAND” project all you need are:
  One library use data source
  One demographic data source
[Diagram: library use data (System A database) and demographic data (LDAP directory) are each read by a script that writes SQL load files, which are loaded into the LIBLAND database.]

42 Requirements
Expertise in:
  Database design
  SQL
  Programming (a scripting language such as Perl, Python, or PHP)
Access to:
  (A separate, secure) database server
  Library systems containing use data
  Campus systems with demographic data

43 Scripts
Recommend a scripting language like Perl, PHP, or Python.
Script needs to:
  Connect to system
  Execute a query
  Parse data
  Output data (as SQL load file)
You will need a “connector” library/module for the system you are connecting to: e.g. for Perl, the DBI/DBD::Oracle modules for connecting to an Oracle database, or Net::LDAP for connecting to an LDAP directory.
[Diagram: library use data (System A database) and demographic data (LDAP directory) are each read by a script that writes SQL load files, which are loaded into the LIBLAND database.]
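The four steps a script needs to perform can be sketched as follows. This is an illustration in Python rather than the presenter's own code. An in-memory SQLite database stands in for "System A" so that the example is self-contained; a real script would use the connector module for the actual source system. The table names, column names, sample rows, and the sql_quote helper are all hypothetical, and the quoting shown is simplified.

```python
import sqlite3


def sql_quote(value) -> str:
    """Simplified SQL string-literal quoting, for illustration only."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("\\", "\\\\").replace("'", "''") + "'"


# 1. Connect to the system (SQLite stands in for the real source system).
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE requests (username TEXT, created TEXT, status TEXT);
    INSERT INTO requests VALUES ('abc1234', '2016-02-01 09:15:00', 'finished');
    INSERT INTO requests VALUES ('xyz5678', '2016-02-03 14:40:00', 'cancelled');
""")

# 2. Execute a query.
cursor = source.execute(
    "SELECT username, created, status FROM requests ORDER BY created"
)

# 3. Parse the data (here: keep only the date part of the timestamp).
# 4. Output the data as SQL INSERT statements for the load file.
statements = []
for username, created, status in cursor:
    request_date = created.split(" ")[0]
    statements.append(
        "INSERT INTO illiad (net_id, request_date, status) VALUES "
        f"({sql_quote(username)}, {sql_quote(request_date)}, {sql_quote(status)});"
    )

print("\n".join(statements))
```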

44 SQL load files
Scripts can/should output data as SQL “INSERT” statements.
For output consisting of many rows of data…
  Start SQL load file with: SET autocommit=0;
  End with: COMMIT;
  Start a new INSERT statement every 10,000 rows
[Diagram: library use data (System A database) and demographic data (LDAP directory) are each read by a script that writes SQL load files, which are loaded into the LIBLAND database.]
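A sketch of a load-file writer that follows the three rules above: it opens with SET autocommit=0;, groups rows into multi-row INSERT statements, starts a new statement every 10,000 rows, and closes with COMMIT;. The function names, table name, column names, and sample rows are hypothetical, and the quoting helper is simplified for illustration.

```python
import io

BATCH_SIZE = 10_000  # rows per INSERT statement, as suggested on the slide


def sql_quote(value) -> str:
    """Simplified SQL string-literal quoting, for illustration only."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("\\", "\\\\").replace("'", "''") + "'"


def write_load_file(out, table, columns, rows, batch_size=BATCH_SIZE):
    """Write rows to `out` as batched multi-row INSERT statements."""
    out.write("SET autocommit=0;\n")
    batch = []

    def flush():
        # Emit one INSERT statement for the rows collected so far.
        if batch:
            out.write(f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n")
            out.write(",\n".join(batch))
            out.write(";\n")
            batch.clear()

    for row in rows:
        batch.append("(" + ", ".join(sql_quote(v) for v in row) + ")")
        if len(batch) >= batch_size:
            flush()
    flush()  # write any remaining rows
    out.write("COMMIT;\n")


# Demonstration with a tiny batch size so the batching is visible.
buffer = io.StringIO()
sample_rows = [
    ("abc1234", "2016-02-01"),
    ("xyz5678", "2016-02-03"),
    ("o'neil1", "2016-02-04"),
]
write_load_file(buffer, "illiad", ["net_id", "request_date"], sample_rows, batch_size=2)
text = buffer.getvalue()
print(text)
```

With three rows and a batch size of two, the output contains two INSERT statements between the SET autocommit=0; and COMMIT; lines. In a real script, `out` would be an open file such as illiad.sql.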

45 SQL Load File
Command to load file: mysql -u libland -p libland < illiad.sql

46 Granularity of Use Data Retrieved
Privacy: we’re not pulling citation data…

47 Granularity of Use Data Retrieved
Privacy: we extract the “destination host” but not the full URL (w/ query string) that identifies the exact resource.
Note: By default, EZproxy logs do not retain a username (“session ID” is default); capturing that data requires a configuration change.
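A small sketch of keeping only the destination host. It assumes the requested URL has already been pulled out of a log line (the EZproxy log format is configurable, so log parsing is not shown), and it uses Python's standard urllib.parse module. The function name and the sample URL are made up for the example.

```python
from urllib.parse import urlsplit


def destination_host(url: str) -> str:
    """Return only the host name of a URL, dropping path, port, and query string."""
    return urlsplit(url).hostname or ""


# Made-up URL whose path and query string would identify an exact resource.
url = "https://www.example.com:443/article/view?id=12345&query=some+search+terms"
host = destination_host(url)
print(host)  # www.example.com
```

Storing only the host records which vendor platform was used without recording which article or search was requested.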

48 Distributing LIBLAND Data

49 LIBLAND Tables
“Other” Data
Use Data
Demographic Data

50 Tables... and Views
Views are virtual tables that get created on-the-fly via an SQL SELECT statement.
Each view in LIBLAND contains the same data as in the table EXCEPT the UTA ID (or NetID) is replaced with the one-way hash value.
The views are what get exported from the LIBLAND server and imported into an MS Access database for distribution to staff.
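A sketch of such a view, using Python's built-in sqlite3 module as a stand-in for the LIBLAND server. The table, view, and column names are hypothetical, and the short strings in hash_value are placeholders for real 64-character SHA-256 values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables kept on the secure server.
    CREATE TABLE hash_xref   (uta_id TEXT PRIMARY KEY, hash_value TEXT UNIQUE);
    CREATE TABLE demographic (uta_id TEXT PRIMARY KEY, major TEXT, classification TEXT);

    INSERT INTO hash_xref   VALUES ('1000000001', 'hash-aaaa'), ('1000000002', 'hash-bbbb');
    INSERT INTO demographic VALUES ('1000000001', 'History', 'Senior'),
                                   ('1000000002', 'Biology', 'Freshman');

    -- The view has the same data as the table, except that the UTA ID
    -- is replaced by the hash value from the cross-reference table.
    CREATE VIEW demographic_view AS
        SELECT h.hash_value     AS hash_value,
               d.major          AS major,
               d.classification AS classification
        FROM demographic AS d
        JOIN hash_xref   AS h ON h.uta_id = d.uta_id;
""")

cursor = conn.execute("SELECT * FROM demographic_view ORDER BY hash_value")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)  # ['hash_value', 'major', 'classification']
print(rows)
```

Only the view is exported, so the distributed copy never contains the uta_id column or the cross-reference table itself.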

51 [Image labels: “On secure server”; “Distributed to staff”]

52 Privacy: No Identifier (only a SHA-256 cryptographic hash)

53 Why go to that trouble?
Data privacy is reason #1, #2, & #3.
Bonus reason: If there is intent to publish or present the results of the analysis, you typically have to get institutional review board (IRB) approval. In advance. However…

54 IRB Review Exemption (YMMV, always discuss with your IRB)
Institutional Review Board

55 University Analytics
"We are the Borg. Your data will be added to our own. Resistance is futile."
(Custom) Ed Hall cartoon reprinted by permission [cartoon label: ANALYTICS]

56 Questions?
Please feel free to contact: Michael Doran, Systems Librarian, University of Texas at Arlington

