Knowledge Management Systems
  Knowledge Discovery in Databases
  Information Retrieval
  Formal methods to discover information & possibly knowledge
    Data collection
      Documents
      Usage
    Data analysis
      Relationships
      IR measures
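The slide names "IR measures" without listing them. Two standard ones are precision and recall; the sketch below computes both for a single query. The function name and the document IDs are illustrative assumptions, not taken from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the system returned
    relevant:  document IDs judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of the returned documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 relevant overall, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)  # precision is 0.5, recall is 2/3
```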
KDD Process
  Goal: extracting actionable knowledge from data
    Understandable patterns
    Rules
  Updated methods to extend beyond statistical analysis
    Volumes of data collection
    Increased computation power
    Real-time
    Continuous data
    Advances in visualization
  Fayyad, U., G. Piatetsky-Shapiro, et al. (1996). The KDD Process for Extracting Useful Knowledge from Volumes of Data. Communications of the ACM 39(11): 27-34.
KDD in Use
  Data Mining is only one step
    Preprocessing
    Data Transformation
    Pattern Detection
    Interpretation
    Use
  Most development work is in the preprocessing
  Most intellectual work should be in forming hypotheses
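The steps above can be sketched as a small pipeline. This is a minimal illustration, not the process from Fayyad et al. It assumes the data are "baskets" of items and swaps in a simple pair-counting step for pattern detection. All function names, the sample records, and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def preprocess(records):
    """Preprocessing: normalize item names, drop blanks and empty records."""
    cleaned = []
    for rec in records:
        items = {item.strip().lower() for item in rec if item and item.strip()}
        if items:
            cleaned.append(items)
    return cleaned


def transform(baskets):
    """Data transformation: represent each basket as its sorted item pairs."""
    return [list(combinations(sorted(b), 2)) for b in baskets]


def detect_patterns(pair_lists):
    """Pattern detection: count how often each pair co-occurs."""
    counts = Counter()
    for pairs in pair_lists:
        counts.update(pairs)
    return counts


def interpret(counts, n_baskets, min_support=0.5):
    """Interpretation: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def run_kdd(records, min_support=0.5):
    baskets = preprocess(records)
    counts = detect_patterns(transform(baskets))
    return interpret(counts, len(baskets), min_support)


# Hypothetical raw data with the usual mess: mixed case, blanks, missing values.
raw = [["Milk", "bread"], [" bread", "milk ", "eggs"], [], ["eggs", None], ["milk", "bread"]]
print(run_kdd(raw))  # {('bread', 'milk'): 0.75}
```

Even in this toy version most of the code handles cleaning and reshaping, which matches the slide's point that most development work goes into preprocessing.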
KDD Practices
  Classification
  Regression
  Clustering
  Summarization
  Dependency Modeling
  Link analysis
  Sequence analysis
IR & the Semantic Web
  Rich description of documents enables additional functionality
    DARPA Agent Markup Language
    Ontology Inference Layer
  Is this “semantic markup” derived from tacit or explicit knowledge?
    How can it be generated?
    How can it be used?
  Information Retrieval
    Question answering (simple & complex)
  Faith in XML
  Shah, U., Finin, T., Joshi, A., Cost, R., & Mayfield, J. (2002). Information Retrieval on the Semantic Web. 10th International Conference on Information and Knowledge Management. ACM Press.
Semantic IR
  How systems should work
  Events ontology
  Coordination among individuals
    Groups?
    Interdependencies?
  Processing for Hybrid IR?
  Trust in ML
  Trust in System
Navigating Social Cyberspaces
  Understanding Usenet use
    Postings
      Why
      How
    Information Distribution
    Cross postings
    Specific groups & cultures
    Free-riders vs. Contributors
  Usenet readers
  Smith, Marc. Tools for Navigating Large Social Cyberspaces
  Ackerman, M. S. AND Malone, T. W. 1990. Answer Garden: A tool for growing organizational memory. SIGOIS Bull. 11, 2&3 (Apr.), 31-39.
Social Cyberspace Dimensions
  Netscan – social accounting metrics
    Size of group
    Culture
    Social cues
  Messaging protocols
    Asynchronous
    Real time (IM)
  Discussion Engagement
    Frequency, Replies
    Date, Time
  Thread and Author Tracker
  Thread Visualization
    New Threads vs. Replying to Old
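Social accounting metrics of this kind can be illustrated with a small tally over posts. The sketch below is not Netscan's implementation, which the slides do not describe. The `Post` record, its fields, and the metric names are assumptions chosen to mirror the dimensions listed above (group size, replies, new threads vs. replies, cross-posting).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    thread: str
    is_reply: bool   # False means the post started a new thread
    groups: tuple    # newsgroups the post was sent to


def group_metrics(posts, group):
    """Tally simple per-group participation counts."""
    in_group = [p for p in posts if group in p.groups]
    per_author = Counter(p.author for p in in_group)
    return {
        "posts": len(in_group),
        "authors": len(per_author),
        "threads": len({p.thread for p in in_group}),
        "new_threads": sum(1 for p in in_group if not p.is_reply),
        "replies": sum(1 for p in in_group if p.is_reply),
        "crossposts": sum(1 for p in in_group if len(p.groups) > 1),
        "one_time_posters": sum(1 for n in per_author.values() if n == 1),
    }


# Hypothetical sample data.
posts = [
    Post("ann", "t1", False, ("comp.lang.python",)),
    Post("bob", "t1", True, ("comp.lang.python", "comp.programming")),
    Post("ann", "t1", True, ("comp.lang.python",)),
    Post("cy", "t2", False, ("comp.programming",)),
]
print(group_metrics(posts, "comp.lang.python"))
```

Counts like these only cover people who post. Readers who never post (the "free-riders" of the previous slide) leave no trace in message data, so measuring them would need a different source such as server read logs.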
Blogs & Social Dimensions
  Are blogs taking the place of newsgroups?
  RSS Readers
  Topic discovery methods
    Blog rolls
    Search engines
    Links
  Issues of Awareness
  Posting technologies vs. Usenet
Answer Garden
  A shared organizational memory system
  Storing, retrieving and viewing information
  What methods worked best?
  What about user participation?
  What’s an optimal size?
  Ackerman, M. S. AND Malone, T. W. 1990. Answer Garden: A tool for growing organizational memory. SIGOIS Bull. 11, 2&3 (Apr.), 31-39.
PeopleGarden
  Another view of participation
  How does the community work?
    Welcoming
    Volumes of discussion
    Groups found and formed
    Paired relationships
    Arguments and issue development
  Visualizing interaction
    Personal history
    Groups and Threads
  Xiong, Rebecca & Donath, Judith. (1999) PeopleGarden: Creating Data Portraits for Users. Proceedings of UIST. Asheville, NC.
Problems in Data Warehousing
  How about problems in understanding users?
  Technical issues are easier than social issues
    Privacy
    Accuracy
  Widom, J. (1995). Research Problems in Data Warehousing. Proceedings of the 4th International Conference on Information and Knowledge Management. Nov, 1995. ACM Press.