Published by Bruce Andrews. Modified over 9 years ago.
1
2002 Spring Data Mining Term Project Proposal
Data Mining Experimental Study with Oracle & MS SQL
012ITI12 Song Mi-Kyoung
2
0. Contents
–Objectives of Experimental Study
–ORACLE 9i Data Mining Package
–Microsoft SQL 2000 Server
–Data Sets
–Summary
–Bibliography
3
1. Objectives of Experimental Study
–Understanding data mining algorithms
–Understanding Oracle 9i data mining features
–Understanding MS SQL 2000 data mining features
–Evaluating the data mining results for each dataset
4
2. ORACLE 9i Data Mining Package
Oracle Data Mining features
–classification
–regression trees
–neural networks
–k-nearest neighbors (memory-based reasoning)
–regression
–clustering algorithms
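To make the k-nearest neighbors (memory-based reasoning) entry above concrete, here is a minimal sketch in plain Python. It is not Oracle's implementation. The function name `knn_predict`, the toy points, and the choice of Euclidean distance with majority voting are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    # Sort the stored examples by distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Collect the labels of the k closest examples and take the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in 2-D.
toy_train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

prediction_near_a = knn_predict(toy_train, (1.1, 1.0))
prediction_near_b = knn_predict(toy_train, (5.1, 5.0), k=1)
```

The method is called memory-based because there is no training step: the stored examples are the model, and all the work happens at query time.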
5
3. Microsoft SQL 2000 Server
Microsoft Data Mining features
–Microsoft Decision Trees
–Microsoft Clustering
6
4. Data Sets - 1/4
Contraceptive Method Choice
–Origin: a subset of the 1987 National Indonesia Contraceptive Prevalence Survey
–Donated by Tjen-Sien Lim (limt@stat.wisc.edu)
–1473 instances, 3 classes, 10 attributes
–The samples are married women who were either not pregnant or did not know whether they were pregnant at the time of the interview. The problem is to predict a woman's current contraceptive method choice (no use, long-term methods, or short-term methods) from her demographic and socio-economic characteristics.
–Ftp Access
7
4. Data Sets - 2/4
Adult Database
–Donated by Ron Kohavi
–Task: predict whether income exceeds $50K/yr based on census data
–Documentation: On everything
–48842 instances, 14 attributes (6 continuous and 8 nominal)
–Missing attribute values
–Originally listed as the "Census Income" database; it was renamed because it is cited as the "Adult" database
–Ftp Access
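Because the Adult data mixes continuous and nominal attributes and has missing values, some preprocessing is likely needed before loading it into either product. The sketch below shows one way to parse such records in Python. It assumes the file is comma-separated with missing values written as `?`, and that the columns are named as in `COLUMNS`. Both are assumptions to check against the dataset's own documentation, and the two sample records are invented for illustration.

```python
import csv
import io

# Assumed column layout: 14 attributes plus the income label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]
# The 6 attributes treated as continuous; the other 8 are nominal.
CONTINUOUS = {"age", "fnlwgt", "education-num",
              "capital-gain", "capital-loss", "hours-per-week"}

def parse_adult(text):
    """Parse comma-separated records; '?' becomes None, continuous fields become int."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:  # skip blank lines
            continue
        row = {}
        for name, raw in zip(COLUMNS, fields):
            value = raw.strip()
            if value == "?":
                row[name] = None          # missing value (assumed marker)
            elif name in CONTINUOUS:
                row[name] = int(value)
            else:
                row[name] = value
        rows.append(row)
    return rows

# Invented sample records in the assumed layout (not taken from the real file).
SAMPLE = """\
41, Private, 123456, Bachelors, 13, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 45, United-States, >50K
28, ?, 98765, HS-grad, 9, Never-married, ?, Own-child, White, Female, 0, 0, 30, United-States, <=50K
"""

records = parse_adult(SAMPLE)
missing_per_row = [sum(v is None for v in r.values()) for r in records]
over_50k = [r["income"] == ">50K" for r in records]
```

Keeping missing values as `None` rather than dropping the rows leaves the choice of imputation or exclusion to the later experiment.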
8
4. Data Sets - 3/4
Spambase Database
–Donated by George Forman (gforman at nospam hpl.hp.com)
–Number of Instances: 4601 (1813 Spam = 39.4%)
–Number of Attributes: 58 (57 continuous, 1 nominal class label)
–The "spam" concept is diverse: advertisements for products/web sites, make money fast schemes, chain letters, pornography... Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'george' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general-purpose spam filter.
–Ftp Access
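The class counts above give a useful reference point for evaluating any classifier on this data. The short sketch below rechecks the stated spam share and derives the accuracy of a trivial classifier that always predicts the majority class. The variable names are illustrative only.

```python
# Counts as stated on the slide.
total = 4601
spam = 1813
non_spam = total - spam

spam_share = spam / total
# A classifier that always predicts the majority class ("non-spam")
# is right on every non-spam message and wrong on every spam message.
majority_baseline = max(spam, non_spam) / total

print(f"spam share: {spam_share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Any model built in the study should beat this baseline of roughly 60.6% to be considered informative.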
9
4. Data Sets - 4/4
Credit Screening Databases
–Japanese Credit Screening Database
  –Includes domain theory
  –Positive instances are people who were granted credit
  –The theory was generated by talking to Japanese domain experts
–Credit Card Application Approval Database
  –Good mix of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values
  –690 instances, 15 attributes, some with missing values
–Ftp Access
10
5. Bibliography
–www.oracle.com
–www.microsoft.com
–Datasets: http://www.ics.uci.edu/~mlearn/MLRepository.html