Published by Irene Lindsey. Modified over 9 years ago.
1
Striking a Balance: Bibliomining and Privacy
Scott Nicholson
Assistant Professor, Syracuse University School of Information Studies
http://bibliomining.org
scott@scottnicholson.com

Please do not reuse these slides without prior permission from scott@scottnicholson.com.
Copyright 2004 Scott Nicholson, Syracuse University School of Information Studies
2
What is Bibliomining?
Bibliomining is the combination of bibliometrics and data mining, used on the data produced during the operation of libraries (physical and digital).
3
What is Bibliomining?
Application of advanced analysis tools to data produced by libraries. May include:
- Data mining
- Bibliometrics (patterns in scholarship)
- Online analytical processing (OLAP)
- Other statistical techniques
4
Goals of Bibliomining
Improved decision-making through better understanding of:
- Patron behavior
- Library staff behavior
- Behavior of outside organizations
Can provide justification for:
- Library management policies and decisions
- Acquisitions and ILL source selection
- Collection development decisions
- Use of library services (funding bodies)
5
Steps in Bibliomining
- Determine areas of focus
  - Prediction vs. description
- Determine data source needs
  - Internal and external
- Gather data
- Create data warehouse
- Select appropriate analysis tools
- Create and test models / create reports
- Analyze results
6
Creating the data warehouse
A data warehouse is a collection of cleaned and anonymized data in a relational database and a point for queries.
- Sits outside of the operational systems
- Connects disparate data sources into an easily accessible database
- Can be built one time or updated on a regular basis
7
Steps in the Warehousing Process
- Identify fields of interest
- Determine fields that contain personally identifiable information (PII)
- Determine combinations of fields that create PII (dept. + level + gender)
8
Methods for dealing with Personally Identifiable Information
- Use codes, IDs for matching and discard
9
Codes for PII
- One typical suggestion: code the PII fields, and then record the codes in the database
  - Appropriate for other parties
- Do not use a reversible encoding procedure to encode variables. This does not protect patrons' information from an investigation.
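The warning against reversible encoding can be illustrated with a minimal sketch. It assumes hypothetical circulation records and patron IDs (none of these names come from the slides): each patron receives a random code that has no mathematical relationship to the original ID, and the lookup table is discarded once the warehouse rows are written.

```python
import secrets

def assign_surrogate_codes(patron_ids):
    """Map each distinct patron ID to a random code unrelated to the ID."""
    mapping = {}
    for pid in patron_ids:
        if pid not in mapping:
            # Random value, not a transformation of the ID, so it cannot be decoded.
            mapping[pid] = secrets.token_hex(8)
    return mapping

# Hypothetical circulation records: (patron_id, item_id)
records = [("P1001", "book-17"), ("P2002", "book-42"), ("P1001", "book-99")]

mapping = assign_surrogate_codes(pid for pid, _ in records)
warehouse_rows = [(mapping[pid], item) for pid, item in records]

# Discard the lookup table so the codes cannot be traced back to patrons.
mapping.clear()
del mapping
```

The same patron keeps the same code within one load, so borrowing patterns remain analyzable, but nothing stored in the warehouse can be decoded back to an ID.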
10
Coding and not discarding
- Use a code when there is some aspect of the ID that is important
  - Example: IP addresses
- Think about the use of the field, and code appropriately
- Do not generate the code from the original; rather, use other methods for codes that capture key information
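One possible reading of the IP address example, sketched under stated assumptions: if the only aspect of the address that matters is where the user was, the warehouse can record a coarse location category and never store the address or a transformed copy of it. The network ranges and category names below are hypothetical placeholders (reserved documentation address blocks), not real library networks.

```python
import ipaddress

# Hypothetical ranges and labels; a real library would substitute its own.
NETWORK_CODES = [
    (ipaddress.ip_network("192.0.2.0/24"), "in-library"),
    (ipaddress.ip_network("198.51.100.0/24"), "campus"),
]

def code_ip(raw_ip):
    """Return a coarse location category for an address; the address is not kept."""
    addr = ipaddress.ip_address(raw_ip)
    for network, code in NETWORK_CODES:
        if addr in network:
            return code
    return "off-site"

# Only the category would be written to the warehouse.
categories = [code_ip(ip) for ip in ("192.0.2.15", "198.51.100.7", "203.0.113.9")]
```

Many addresses share each category, so the stored value keeps the key information (location of use) without pointing to a single machine.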
11
Methods for dealing with Personally Identifiable Information
- Use for matching and discard
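"Use for matching and discard" might look like the following sketch, which assumes two hypothetical source tables (a patron file and a circulation log) that share a patron ID. The ID is used only to attach demographic categories to each event and is then left out of the warehouse row.

```python
# Hypothetical source data; field names are illustrative assumptions.
patrons = {
    "P1001": {"dept": "History", "level": "Graduate"},
    "P2002": {"dept": "Physics", "level": "Faculty"},
}
circulation = [
    {"patron_id": "P1001", "item_id": "book-17"},
    {"patron_id": "P2002", "item_id": "book-42"},
    {"patron_id": "P9999", "item_id": "book-99"},  # no matching patron record
]

def build_warehouse_rows(circulation, patrons):
    """Join events to patron categories by ID, then omit the ID from the output."""
    rows = []
    for event in circulation:
        demo = patrons.get(event["patron_id"], {})
        rows.append({
            "item_id": event["item_id"],
            "dept": demo.get("dept", "unknown"),
            "level": demo.get("level", "unknown"),
        })
    return rows

warehouse_rows = build_warehouse_rows(circulation, patrons)
```

The resulting rows still support questions such as which departments borrow which items, while the identifier that made the match possible never reaches the warehouse.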
12
Dealing with categories
Make sure that combinations of categories don't identify an individual.
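A simple way to check for identifying combinations is to count how many records share each combination of category values and flag the small groups. The sketch below is illustrative only: the field names follow the dept. + level + gender example from the warehousing steps, and the `min_group_size` threshold is an assumption that each library would set by policy.

```python
from collections import Counter

def risky_combinations(rows, fields, min_group_size=5):
    """Return category combinations shared by fewer than min_group_size rows."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < min_group_size}

# Hypothetical warehouse rows
rows = [
    {"dept": "History", "level": "Graduate", "gender": "F"},
    {"dept": "History", "level": "Graduate", "gender": "F"},
    {"dept": "Physics", "level": "Faculty", "gender": "M"},
]

# With a threshold of 2, the single Physics/Faculty/M row is flagged.
risky = risky_combinations(rows, ["dept", "level", "gender"], min_group_size=2)
```

Flagged combinations could then be handled by dropping a field or merging categories (for example, combining small departments) before the data is loaded.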
13
Dealing with Textual data
- Digital reference transactions
  - Easy to deal with the metadata
  - Hard to deal with the text
- Manual cleaning of PII
- Natural Language Processing research
- Similar problem with deidentification of medical records
14
People to Involve
- Institutional Review Board (IRB)
- Legal counsel
  - Ensures you are following state laws for library data
- Library administration / Board
- Patrons
- If there are policies, follow them
- If there are not, create them
15
Benefits of creating the Data Warehouse
- Cleaned resource, ready for analysis
- Outside of the operational system
- Use for regular reports and research
- Forces the library to examine the life of data
  - Are backup tapes created?
  - How long are backups kept?
16
Striking a Balance
A well-designed data warehouse strikes the balance between protecting privacy and maintaining a data-based history.
17
For more information
- About bibliomining: http://bibliomining.com
- About an active data warehouse project: http://metrics.library.upenn.edu/prototype/datafarm/
- About this presentation: http://bibliomining.com/nicholson
  “The Bibliomining process: Data warehousing and data mining for library decision-making”