Download presentation
Presentation is loading. Please wait.
Published byIsabel Barr Modified over 11 years ago
1
Data Mining David Eichmann School of Library and Information Science The University of Iowa David Eichmann School of Library and Information Science The University of Iowa
2
Why? Given enough data represented through enough dimensions, we loose the ability to see the patterns
3
How? Decision Trees Nearest Neighbor Clustering Neural Networks Rule Induction K-Means Clustering Decision Trees Nearest Neighbor Clustering Neural Networks Rule Induction K-Means Clustering
4
What is it? The automated extraction of hidden predictive information from databases. Key points Automated Hidden Predictive The automated extraction of hidden predictive information from databases. Key points Automated Hidden Predictive
5
The Typical Process
6
Evaluation Criteria Receiver Operating Characteristic Curves
7
But Nobody Said We Had To Do MATH….
8
Forms of Data Structured Databases Forms Semi-Structured Tables on the Web Bibliographic citations Graphs & charts Unstructured Full text (e.g., journal articles, physician chart notes) Images Structured Databases Forms Semi-Structured Tables on the Web Bibliographic citations Graphs & charts Unstructured Full text (e.g., journal articles, physician chart notes) Images
9
Text Mining Corpus now is a collection of text artifacts Full text when youve got it (e.g. newswire) Metadata when you dont (e.g. MEDLINE) The trick then becomes extracting interesting relationships between interesting entities Who killed who Who works for who Who makes what Corpus now is a collection of text artifacts Full text when youve got it (e.g. newswire) Metadata when you dont (e.g. MEDLINE) The trick then becomes extracting interesting relationships between interesting entities Who killed who Who works for who Who makes what
10
The Classic Entities Persons Organizations Places (Geography) Events Persons Organizations Places (Geography) Events
11
A Newswire Example APW19981001.0262 [Israel(0.271), Jonathan Pollard (0.153), Benjamin Netanyahu(0.102), Bill Clinton(0.102), United States(0.055),...] Persons Bill Clinton (3) Jonathan Pollard (8) Moshe Fogel (2) Benjamin Netanyahu (2) Israeli Embassy (1) Organizations Cabinet (1) Places Israel (16) United States (5) Washington (2) APW19981001.0262 [Israel(0.271), Jonathan Pollard (0.153), Benjamin Netanyahu(0.102), Bill Clinton(0.102), United States(0.055),...] Persons Bill Clinton (3) Jonathan Pollard (8) Moshe Fogel (2) Benjamin Netanyahu (2) Israeli Embassy (1) Organizations Cabinet (1) Places Israel (16) United States (5) Washington (2)
12
In the Medical/Health Realm UMLS an excellent framework Organism Chemical Activity Disease UMLS an excellent framework Organism Chemical Activity Disease
13
A MEDLINE Example Document: 89316090 - Reconstructive surgery in Nicaragua Provided MeSH Keywords Human Nicaragua Z01.107.169.690 Surgery, Plastic/* G02.403.810.788 Phrases [Reconstructive, surgery] [Nicaragua] [letter] MeSH Terms Surgery (1) G02.403.810.762 Letter [Publication Type] (1) Other Phrases Reconstructive surgery (1) Document: 89316090 - Reconstructive surgery in Nicaragua Provided MeSH Keywords Human Nicaragua Z01.107.169.690 Surgery, Plastic/* G02.403.810.788 Phrases [Reconstructive, surgery] [Nicaragua] [letter] MeSH Terms Surgery (1) G02.403.810.762 Letter [Publication Type] (1) Other Phrases Reconstructive surgery (1)
14
Concept Extraction Example Roman forces under Julius Caesar invade Britain. (S (NP (NP Roman forces) (PP under (NP Julius Caesar))) (VP invade (NP Britain)).) Entity Attributes: Concepts: Roman forces under Julius Caesar invade Britain. (S (NP (NP Roman forces) (PP under (NP Julius Caesar))) (VP invade (NP Britain)).) Entity Attributes: Concepts:
15
And a Small Demo…
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.