1
Fundamentals, Design, and Implementation, 9/e
Chapter 10: Managing Databases with Oracle 9i
SII 654, Fall 2005
2
Database Processing: Fundamentals, Design, and Implementation, 9/e by David M. Kroenke, Chapter 10/2, Copyright © 2004
Introduction
Oracle is the world's most popular DBMS.
It is a powerful and robust DBMS that runs on many different operating systems.
Oracle DBMS engine: Personal Oracle and Enterprise Oracle
Examples of Oracle products:
– SQL*Plus: a utility for processing SQL and creating components such as stored procedures and triggers
  PL/SQL is a programming language that adds programming constructs to the SQL language
– Oracle Developer (Forms & Reports Builder)
– Oracle Designer
3
Creating an Oracle Database
Installing Oracle
– Install the Oracle 9i Client to use a database that has already been created
– Install Oracle 9i Personal Edition to create your own databases
Three ways to create an Oracle database
– Via the Oracle Database Configuration Assistant
– Via the Oracle-supplied database creation procedures
– Via the SQL CREATE DATABASE command
4
SQL*Plus
Oracle SQL*Plus or the Oracle Enterprise Manager Console may be used to manage an Oracle database.
SQL*Plus is a command-line utility (with a simple line editor) available in all Oracle versions.
Except inside the quotation marks of strings, Oracle commands are case-insensitive.
The semicolon (;) terminates a SQL statement.
The forward slash (/) executes the SQL statement stored in the SQL*Plus buffer.
SQL*Plus can be used to
– Enter SQL statements
– Submit SQL files created with text editors (e.g., Notepad) to Oracle
5
Example: SQL*Plus Prompt
6
SQL*Plus Buffer
SQL*Plus keeps the current statement in a multi-line buffer without executing it.
LIST displays the contents of the buffer
– LIST line_number displays that line and makes it the current line
CHANGE/astring/bstring/ changes the contents of the current line
– astring = the string you want to change
– bstring = what you want to change it to
Example: change/Table_Name/*/
– 'Table_Name' is replaced with '*'
7
Example: SQL*Plus Buffer
8
Creating Tables
Some SQL-92 CREATE TABLE statements need to be modified for Oracle:
– Oracle does not support the ON UPDATE CASCADE referential action
– The INT data type is interpreted by Oracle as NUMBER(38)
– The VARCHAR data type is interpreted as VARCHAR2
– Money or currency values are defined in Oracle using the NUMBER (ANSI: NUMERIC) data type
Oracle sequences must be used for surrogate keys.
The DESCRIBE (or DESC) command displays the structure of a table.
9
Oracle Data Types
10
Oracle Sequences
A sequence is an object that generates a sequential series of unique numbers.
It is the best way to work with surrogate keys in Oracle.
Two sequence methods
– NextVal provides the next value in a sequence
– CurrVal provides the current value in a sequence
Using sequences does not guarantee valid surrogate key values, because it is possible to have missing, duplicate, or wrong sequence values in the table (for example, a NextVal consumed by a transaction that is rolled back is not reused).
11
Example: Sequences
Creating a sequence:
CREATE SEQUENCE CustID INCREMENT BY 1 START WITH 1000;
Entering data using the sequence:
INSERT INTO CUSTOMER (CustomerID, Name, AreaCode, PhoneNumber)
VALUES (CustID.NextVal, 'Mary Jones', '350', '555-1234');
Retrieving the row just created:
SELECT * FROM CUSTOMER WHERE CustomerID = CustID.CurrVal;
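To make the NextVal/CurrVal behaviour and the "missing values" caveat concrete, here is a minimal in-memory Python model. `SequenceSketch` is a hypothetical class, not an Oracle API; it models a single session only and assumes, as Oracle does, that a value handed out by NextVal is not returned to the sequence when the transaction that used it rolls back.

```python
class SequenceSketch:
    """Hypothetical, single-session model of an Oracle sequence (not an Oracle API)."""

    def __init__(self, start_with: int = 1, increment_by: int = 1) -> None:
        self._next = start_with
        self._increment = increment_by
        self._current = None  # CurrVal is undefined until NextVal has been called

    def nextval(self) -> int:
        """Return the next value and advance the sequence (a rollback does not undo this)."""
        value = self._next
        self._next += self._increment
        self._current = value
        return value

    def currval(self) -> int:
        """Return the value most recently produced by nextval() in this session."""
        if self._current is None:
            raise RuntimeError("CurrVal is not yet defined in this session")
        return self._current


if __name__ == "__main__":
    cust_id = SequenceSketch(start_with=1000, increment_by=1)
    customer = {}  # stands in for the CUSTOMER table: CustomerID -> Name

    customer[cust_id.nextval()] = "Mary Jones"
    print(customer[cust_id.currval()])  # the row just created

    cust_id.nextval()  # value taken by an INSERT that is then rolled back
    customer[cust_id.nextval()] = "Michael Bench"
    print(sorted(customer))  # [1000, 1002]: key 1001 is missing
```

The gap at 1001 is the kind of "missing" surrogate key value the previous slide warns about.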
12
DROP and ALTER Statements
DROP statements may be used to remove structures from the database
– DROP TABLE MYTABLE;
  Any data in the MYTABLE table will be lost
– DROP SEQUENCE MySequence;
The ALTER statement may be used to drop (or add) a column
– ALTER TABLE MYTABLE DROP COLUMN MyColumn;
– ALTER TABLE MYTABLE ADD C1 NUMBER(4);
13
TO_DATE Function
Oracle interprets date strings according to a date format (by default, the session's NLS_DATE_FORMAT).
The TO_DATE function may be used to specify the format explicitly
– TO_DATE('11/12/2002', 'MM/DD/YYYY')
  '11/12/2002' is the date value
  'MM/DD/YYYY' is the pattern used to interpret it (here, November 12, 2002)
The TO_DATE function can be used with INSERT and UPDATE statements to enter data
– INSERT INTO T1 VALUES (100, TO_DATE('01/05/02', 'DD/MM/YY'));
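The role of the pattern can be illustrated outside Oracle with Python's `datetime.strptime`. The helper below is hypothetical: it translates only the four Oracle format elements used on this slide (YYYY, YY, MM, DD) into strptime directives. Python's `%y` maps two-digit years by its own rule (69-99 to the 1900s, 00-68 to the 2000s), which differs from Oracle's YY rule; both give 2002 for these inputs.

```python
from datetime import date, datetime

# Hypothetical translation table: Oracle format element -> strptime directive.
# 'YYYY' comes before 'YY' so a four-digit year is not split into two 'YY' tokens.
ORACLE_TO_STRPTIME = [("YYYY", "%Y"), ("YY", "%y"), ("MM", "%m"), ("DD", "%d")]


def to_date_sketch(text: str, pattern: str) -> date:
    """Parse text with an Oracle-style pattern built only from YYYY, YY, MM, DD and separators."""
    fmt = pattern
    for element, directive in ORACLE_TO_STRPTIME:
        fmt = fmt.replace(element, directive)
    return datetime.strptime(text, fmt).date()


if __name__ == "__main__":
    print(to_date_sketch("11/12/2002", "MM/DD/YYYY"))  # 2002-11-12
    # The same string means different dates under different patterns:
    print(to_date_sketch("01/05/02", "DD/MM/YY"))  # 2002-05-01
    print(to_date_sketch("01/05/02", "MM/DD/YY"))  # 2002-01-05
```

The last two calls show why the slide's INSERT spells out 'DD/MM/YY': without an explicit pattern, '01/05/02' is ambiguous.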
14
Creating Indexes
Indexes are created to
– Enforce uniqueness on columns
– Facilitate sorting
– Enable fast retrieval by column values
Good candidates for indexes are columns that are frequently used in equality conditions in WHERE clauses or in joins.
Examples:
– CREATE INDEX CustNameIdx ON CUSTOMER(Name);
– CREATE UNIQUE INDEX WorkUniqueIndex ON WORK(Title, Copy, ArtistID);
15
Restrictions on Column Modifications
A column may be dropped at any time, and all of its data will be lost.
A column may be added at any time as long as it allows NULL values.
To add a NOT NULL column:
– Add the column as a NULL column
– Fill the new column in every row with data
– Change its definition to NOT NULL:
  ALTER TABLE T1 MODIFY C1 NOT NULL;
16
Creating Views
The SQL-92 CREATE VIEW command can be used to create views in SQL*Plus.
Oracle allows the ORDER BY clause in view definitions.
Oracle 9i is the first Oracle version to support the ANSI JOIN…ON syntax.
Example:
CREATE VIEW CustomerInterests AS
  SELECT C.Name AS Customer, A.Name AS Artist
  FROM CUSTOMER C
    JOIN CUSTOMER_ARTIST_INT I ON C.CustomerID = I.CustomerID
    JOIN ARTIST A ON I.ArtistID = A.ArtistID;
17
Enterprise Manager Console
The Oracle Enterprise Manager Console provides graphical facilities for managing an Oracle database.
The utility can be used to manage
– Database structures such as tables and views
– User accounts, passwords, roles, and privileges
The Manager Console includes a SQL Scratchpad for executing SQL statements.
18
Application Logic
Oracle database applications can be processed using
– A programming language that invokes Oracle DBMS commands
– Stored procedures
– The START command, to run database commands stored in .sql files
– Triggers
19
Stored Procedures
A stored procedure is a PL/SQL or Java program stored within the database.
Stored procedures are programs that can
– Have parameters
– Invoke other procedures and functions
– Return values
– Raise exceptions
A stored procedure must be compiled and stored in the database.
The EXECUTE (or EXEC) command is used to invoke a stored procedure
– EXEC Customer_Insert('Michael Bench', '203', '555-2014', 'US');
20
Example: Stored Procedure Insert (Figure 10-20)
IN signifies an input parameter
OUT signifies an output parameter
IN OUT signifies a parameter used for both input and output
Variables are declared after the keyword AS
21
Triggers
Oracle triggers are PL/SQL or Java procedures that are invoked when specified database activity occurs.
Triggers can be used to
– Enforce a business rule
– Set complex default values
– Update a view
– Perform a referential integrity action
– Handle exceptions
22
Triggers (cont.)
Trigger types
– A statement-level (command) trigger is fired once per SQL statement
– A row trigger is fired once for every row affected by a SQL statement
Three types of row triggers: BEFORE, AFTER, and INSTEAD OF
– BEFORE and AFTER triggers are placed on tables, while INSTEAD OF triggers are placed on views
Each trigger can be fired on INSERT, UPDATE, or DELETE commands.
23
Data Dictionary
Oracle maintains a data dictionary of metadata.
The metadata of the dictionary itself are stored in DICT (a synonym for the DICTIONARY view):
SELECT Table_Name, Comments FROM DICT WHERE Table_Name LIKE '%TABLES%';
USER_TABLES contains information about the tables owned by the current user:
DESC USER_TABLES;
24
Example Oracle Metadata
25
Concurrency Control
Oracle processes database changes by maintaining a System Change Number (SCN)
– The SCN is a database-wide value that Oracle increments when database changes are made
With the SCN, SQL statements always read a consistent set of values: those that were committed at or before the time the statement started.
Oracle reads only committed changes; it never reads dirty data.
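A toy Python model of the idea: every commit receives a new SCN, and a statement that started at SCN n sees, for each item, the latest value committed at or before n. This is a deliberate simplification (Oracle reconstructs older versions from undo data rather than keeping a version list per item), and `MultiVersionStoreSketch` is a hypothetical name. Uncommitted changes are simply never placed in the store, which is how the sketch models "never reads dirty data".

```python
class MultiVersionStoreSketch:
    """Hypothetical model of SCN-based read consistency (not Oracle's implementation)."""

    def __init__(self) -> None:
        self.scn = 0          # database-wide System Change Number
        self._versions = {}   # key -> list of (commit_scn, value), oldest first

    def commit(self, changes: dict) -> int:
        """Make a transaction's changes visible, stamped with a new SCN."""
        self.scn += 1
        for key, value in changes.items():
            self._versions.setdefault(key, []).append((self.scn, value))
        return self.scn

    def read(self, key, statement_scn: int):
        """Return the latest value committed at or before statement_scn (None if none)."""
        visible = [value for scn, value in self._versions.get(key, []) if scn <= statement_scn]
        return visible[-1] if visible else None


if __name__ == "__main__":
    db = MultiVersionStoreSketch()
    db.commit({"balance": 100})
    start = db.scn                     # a long-running SELECT starts here
    db.commit({"balance": 50})         # another transaction commits after the SELECT began
    print(db.read("balance", start))   # 100: the value as of the statement's start
    print(db.read("balance", db.scn))  # 50: a statement starting now sees the new commit
```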
26
Oracle Transaction Isolation
Oracle supports the following transaction isolation levels:
– Read Committed: Oracle's default isolation level; statements never read uncommitted data changes
– Serializable: dirty reads are not possible, repeated reads yield the same results, and phantoms are not possible
– Read Only: all statements read consistent data; no inserts, updates, or deletions are possible
– Explicit locks: not recommended
27
Oracle Security
Oracle security components:
– An ACCOUNT is a user account
– A PROFILE is a set of system resource maximums that is assigned to an account
– A PRIVILEGE is the right to perform a task
– A ROLE consists of a group of PRIVILEGEs and other ROLEs
28
Account System Privileges
Each ACCOUNT can be allocated many SYSTEM PRIVILEGEs and many ROLEs.
An ACCOUNT has all the PRIVILEGEs
– That have been assigned to it directly
– Of all of its ROLEs
– Of all the ROLEs it inherits through ROLE connections
A ROLE can have many SYSTEM PRIVILEGEs, and it may also have a relationship to other ROLEs.
ROLEs simplify the administration of the database
– A set of privileges can be assigned to or removed from a ROLE just once, instead of from every account individually
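The inheritance rule above amounts to a graph traversal: start from the account's directly assigned privileges, then collect the privileges of every role reachable through role-to-role connections. The function and data layout below are a hypothetical Python sketch. The privilege names are real Oracle system privileges, but the CLERK and MANAGER roles and the account ALICE are invented for illustration.

```python
def effective_privileges(account, direct_privs, account_roles, role_privs, role_roles):
    """Collect the privileges an account holds directly or through (nested) roles."""
    privileges = set(direct_privs.get(account, ()))
    to_visit = list(account_roles.get(account, ()))
    seen = set()
    while to_visit:
        role = to_visit.pop()
        if role in seen:
            continue  # a role reachable by several paths is processed once
        seen.add(role)
        privileges |= set(role_privs.get(role, ()))
        to_visit.extend(role_roles.get(role, ()))  # roles granted to this role
    return privileges


if __name__ == "__main__":
    direct = {"ALICE": {"CREATE VIEW"}}
    roles_of_account = {"ALICE": ["MANAGER"]}
    privs_of_role = {"CLERK": {"CREATE SESSION"}, "MANAGER": {"CREATE TABLE"}}
    roles_of_role = {"MANAGER": ["CLERK"]}  # MANAGER inherits everything CLERK has
    print(sorted(effective_privileges("ALICE", direct, roles_of_account,
                                      privs_of_role, roles_of_role)))
    # ['CREATE SESSION', 'CREATE TABLE', 'CREATE VIEW']
```

Removing CREATE SESSION from CLERK would change the result for every account holding CLERK or MANAGER at once, which is the administrative saving the slide describes.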
29
Account Authentication
Accounts can be authenticated by
– Password
– The host operating system
Password management can be specified via PROFILEs.
30
Oracle Recovery Facilities
Three file types for Oracle recovery:
– Datafiles contain user and system data
– ReDo log files contain logs of database changes
  OnLine ReDo files are maintained on disk and contain records of recent database changes
  Offline or Archive ReDo files are backups of the OnLine ReDo files
– Control files describe the name, contents, and locations of various files used by Oracle
31
Oracle Recovery Facilities (cont.)
Oracle can operate in either ARCHIVELOG or NOARCHIVELOG mode
– If running in ARCHIVELOG mode, Oracle keeps a log of all changes to the database
– When the OnLine ReDo files fill up, they are copied to the Archive ReDo files
The Oracle Recovery Manager (RMAN) is a utility program used to create backups and to perform recovery.
32
Types of Failure
Oracle recovery techniques depend on the type of failure
– An application failure occurs due to application logic errors
– An instance failure occurs when Oracle itself fails due to an operating system or computer hardware failure
  Oracle can recover from application and instance failures without using the archived log files
– A media failure occurs when Oracle is unable to write to a physical file because of a disk failure or corrupted files
  The database is restored from a backup
33
Oracle Backup Facilities
Two kinds of backups
A consistent backup: database activity must be stopped and all uncommitted changes must be removed from the datafiles
– Cannot be done if the database supports 24/7 operations
An inconsistent backup: the backup is made while Oracle is processing the database
– An inconsistent backup can be made consistent by processing an archive log file
34
Fundamentals, Design, and Implementation, 9/e
Chapter 10: Managing Databases with Oracle 9i
Instructor: Dragomir R. Radev, Winter 2005
35
Fundamentals, Design, and Implementation, 9/e
KDD and Data Mining
Instructor: Dragomir R. Radev, Winter 2005
36
The big problem
Billions of records
A small number of interesting patterns
"Data rich but information poor"
37
Data mining
Knowledge discovery
Knowledge extraction
Data/pattern analysis
38
Types of source data
Relational databases
Transactional databases
Web logs
Textual databases
39
Association rules
65% of all customers who buy beer and tomato sauce also buy pasta and chicken wings
Association rules: X ⇒ Y
40
Association analysis
IF 20 < age < 30 AND 20K < income < 30K
THEN buys("CD player")
SUPPORT = 2%, CONFIDENCE = 60%
41
Basic concepts
Minimum support threshold
Minimum confidence threshold
Itemsets
Occurrence frequency of an itemset
42
Association rule mining
Find all frequent itemsets
Generate strong association rules from the frequent itemsets
43
Support and confidence
$\mathrm{support}(X)$ = fraction of all transactions that contain every item in X
$\mathrm{confidence}(X \Rightarrow Y) = \mathrm{support}(X \cup Y) / \mathrm{support}(X)$
44
Example
TID   List of item IDs
T100  I1, I2, I5
T200  I2, I4
T300  I2, I3
T400  I1, I2, I4
T500  I1, I3
T600  I2, I3
T700  I1, I3
T800  I1, I2, I3, I5
T900  I1, I2, I3
45
Example (cont'd)
Frequent itemset l = {I1, I2, I5}
I1 AND I2 ⇒ I5    C = 2/4 = 50%
I1 AND I5 ⇒ I2
I2 AND I5 ⇒ I1
I1 ⇒ I2 AND I5
I2 ⇒ I1 AND I5
I5 ⇒ I1 AND I2
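The confidences of all six rules can be checked directly against the nine-transaction table. This Python sketch uses exact fractions; `support` and `confidence` are illustrative helper names. It reproduces the 50% shown for I1 AND I2 ⇒ I5 and computes the other five rules from the same data.

```python
from fractions import Fraction
from itertools import combinations

# The nine transactions from the example table (TID -> set of item IDs).
TRANSACTIONS = {
    "T100": {"I1", "I2", "I5"}, "T200": {"I2", "I4"}, "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"}, "T500": {"I1", "I3"}, "T600": {"I2", "I3"},
    "T700": {"I1", "I3"}, "T800": {"I1", "I2", "I3", "I5"}, "T900": {"I1", "I2", "I3"},
}


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions containing every item of itemset."""
    hits = sum(1 for items in transactions.values() if set(itemset) <= items)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions=TRANSACTIONS):
    """confidence(lhs => rhs) = support(lhs together with rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


if __name__ == "__main__":
    frequent = {"I1", "I2", "I5"}
    # Every non-empty proper subset of the frequent itemset can be a rule's left-hand side.
    for size in (2, 1):
        for lhs in combinations(sorted(frequent), size):
            rhs = frequent - set(lhs)
            c = confidence(lhs, rhs)
            print(" AND ".join(lhs), "=>", " AND ".join(sorted(rhs)), f"{float(c):.0%}")
```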
46
Example 2
TID   Date      Items
T100  10/15/99  {K, A, D, B}
T200  10/15/99  {D, A, C, E, B}
T300  10/19/99  {C, A, B, E}
T400  10/22/99  {B, A, D}
min_sup = 60%, min_conf = 80%
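The two steps of association rule mining can be sketched for Example 2 in Python. Step 1 uses the level-wise Apriori idea (candidates of size k are built from frequent itemsets of size k-1 and pruned if any subset is infrequent); that is one standard way to find frequent itemsets, not necessarily the method intended on the slide. Dates are omitted because they do not affect support. With min_sup = 60% of 4 transactions, an itemset must appear in at least 3 of them.

```python
from itertools import combinations

# Example 2 transactions (dates omitted; they do not affect support).
EXAMPLE2 = [{"K", "A", "D", "B"}, {"D", "A", "C", "E", "B"}, {"C", "A", "B", "E"}, {"B", "A", "D"}]


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): count} for every itemset with support >= min_sup."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def frequent_only(candidates):
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        return {c: k for c, k in counts.items() if k / n >= min_sup}

    level = frequent_only({frozenset([item]) for b in baskets for item in b})
    frequent, size = dict(level), 1
    while level:
        size += 1
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, size - 1))}
        level = frequent_only(candidates)
        frequent.update(level)
    return frequent


def strong_rules(frequent, min_conf):
    """Return a list of (lhs, rhs, confidence) for rules built from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / frequent[lhs]  # lhs is frequent because itemset is
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


if __name__ == "__main__":
    freq = apriori(EXAMPLE2, min_sup=0.60)
    for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    for lhs, rhs, conf in strong_rules(freq, min_conf=0.80):
        print(sorted(lhs), "=>", sorted(rhs), f"{conf:.0%}")
```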
47
Correlations
$\mathrm{corr}(A, B) = \frac{P(A \cap B)}{P(A)\,P(B)}$, where $P(A \cap B)$ is the probability that a transaction contains both A and B (the lift of the association rule A ⇒ B)
If corr(A, B) > 1: A and B are positively correlated
If corr(A, B) = 1: A and B are independent
If corr(A, B) < 1: A discourages B (negative correlation)
48
Contingency table
          Game    ¬Game   Sum
Video     4,000   3,500    7,500
¬Video    2,000     500    2,500
Sum       6,000   4,000   10,000
49
Example
P({game}) = 0.60
P({video}) = 0.75
P({game, video}) = 0.40
P({game, video}) / (P({game}) × P({video})) = 0.40 / (0.60 × 0.75) = 0.89
Since 0.89 < 1, buying a game and buying a video are negatively correlated.
50
Example 2
              Hotdogs  ¬Hotdogs  Sum
Hamburgers    2000      500      2500
¬Hamburgers   1000     1500      2500
Sum           3000     2000      5000
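The correlation (lift) formula can be applied to both contingency tables with a few lines of Python; the function names are illustrative. For Example 2 the counts come from the table: 2000 transactions contain both hotdogs and hamburgers, 3000 contain hotdogs, and 2500 contain hamburgers, out of 5000. That gives 0.40 / (0.60 × 0.50) ≈ 1.33, a positive correlation.

```python
def lift(both: int, a_total: int, b_total: int, n: int) -> float:
    """corr(A, B) = P(A and B) / (P(A) * P(B)), estimated from contingency-table counts."""
    return (both / n) / ((a_total / n) * (b_total / n))


def interpret(value: float, tol: float = 1e-9) -> str:
    """Classify a lift value relative to 1 (with a small tolerance for rounding)."""
    if value > 1 + tol:
        return "positively correlated"
    if value < 1 - tol:
        return "negatively correlated"
    return "independent"


if __name__ == "__main__":
    game_video = lift(both=4000, a_total=6000, b_total=7500, n=10000)
    hotdog_burger = lift(both=2000, a_total=3000, b_total=2500, n=5000)
    print(f"game/video: {game_video:.2f} ({interpret(game_video)})")
    print(f"hotdogs/hamburgers: {hotdog_burger:.2f} ({interpret(hotdog_burger)})")
```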
51
Classification using decision trees
Expected information needed to classify a sample:
$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2(p_i)$, with $p_i = s_i / s$
s = number of data samples
$s_i$ = number of samples in class i
m = number of classes
52
RID  Age     Income  Student  Credit     Buys?
1    <=30    High    No       Fair       No
2    <=30    High    No       Excellent  No
3    31..40  High    No       Fair       Yes
4    >40     Medium  No       Fair       Yes
5    >40     Low     Yes      Fair       Yes
6    >40     Low     Yes      Excellent  No
7    31..40  Low     Yes      Excellent  Yes
8    <=30    Medium  No       Fair       No
9    <=30    Low     Yes      Fair       Yes
10   >40     Medium  Yes      Fair       Yes
11   <=30    Medium  Yes      Excellent  Yes
12   31..40  Medium  No       Excellent  Yes
13   31..40  High    Yes      Fair       Yes
14   >40     Medium  No       Excellent  No
53
Decision tree induction
$I(s_1, s_2) = I(9, 5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$
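The expected-information formula translates directly into Python. `expected_info` is an illustrative name; as in the formula, logarithms are base 2, and an empty class contributes nothing (the usual convention that 0 log 0 = 0).

```python
from math import log2


def expected_info(*class_counts: int) -> float:
    """I(s_1, ..., s_m) = -sum(p_i * log2(p_i)) with p_i = s_i / s; empty classes add 0."""
    s = sum(class_counts)
    return sum(-(c / s) * log2(c / s) for c in class_counts if c > 0)


if __name__ == "__main__":
    print(f"{expected_info(9, 5):.3f}")  # 0.940, the whole training set
    print(f"{expected_info(2, 3):.3f}")  # 0.971, the age <= 30 subset
```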
54
Entropy and information gain
$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s}\, I(s_{1j}, \ldots, s_{mj})$, where attribute A has v distinct values and $s_{ij}$ is the number of class-i samples whose value of A is the j-th value
Entropy = expected information based on the partitioning into subsets by A
$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$
55
Entropy
Age <= 30: $s_{11} = 2$, $s_{21} = 3$, $I(s_{11}, s_{21}) = 0.971$
Age in 31..40: $s_{12} = 4$, $s_{22} = 0$, $I(s_{12}, s_{22}) = 0$
Age > 40: $s_{13} = 3$, $s_{23} = 2$, $I(s_{13}, s_{23}) = 0.971$
56
Entropy (cont'd)
$E(\text{age}) = \frac{5}{14} I(s_{11}, s_{21}) + \frac{4}{14} I(s_{12}, s_{22}) + \frac{5}{14} I(s_{13}, s_{23}) = 0.694$
$\mathrm{Gain}(\text{age}) = I(s_1, s_2) - E(\text{age}) = 0.246$
Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit) = 0.048
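The four gains can be recomputed from the 14-row training table with a short Python sketch (helper names are illustrative). Without intermediate rounding, Gain(age) comes out at about 0.247 and Gain(student) at about 0.152; the slide's 0.246 and 0.151 result from subtracting already-rounded values. Either way, age has the highest gain, so it becomes the root test.

```python
from collections import Counter, defaultdict
from math import log2

ATTRIBUTES = ("age", "income", "student", "credit")
# Training table from the slide: (age, income, student, credit, buys?)
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def info(labels):
    """Expected information I(...) of a list of class labels."""
    n = len(labels)
    return sum(-(k / n) * log2(k / n) for k in Counter(labels).values())


def gain(rows, attr_index):
    """Gain(A) = I(s_1, ..., s_m) - E(A) for the attribute in column attr_index."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attr_index]].append(row[-1])
    e_a = sum(len(labels) / len(rows) * info(labels) for labels in subsets.values())
    return info([row[-1] for row in rows]) - e_a


if __name__ == "__main__":
    gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
    for name, value in gains.items():
        print(f"Gain({name}) = {value:.3f}")
    print("root test:", max(gains, key=gains.get))
```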
57
Final decision tree
age?
– <= 30: student?
    no → no
    yes → yes
– 31..40: yes
– > 40: credit?
    excellent → no
    fair → yes
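The same tree can be grown automatically by applying the gain criterion recursively to each subset, which is the ID3-style procedure this example follows. The Python sketch below repeats the training table so that it runs on its own. It stops when a subset is pure and falls back to the majority class when no attributes remain; that fallback is an assumption the slides do not discuss.

```python
from collections import Counter, defaultdict
from math import log2

ATTRIBUTES = ("age", "income", "student", "credit")
ROWS = [  # (age, income, student, credit, buys?)
    ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]


def info(labels):
    n = len(labels)
    return sum(-(k / n) * log2(k / n) for k in Counter(labels).values())


def gain(rows, attr_index):
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attr_index]].append(row[-1])
    e_a = sum(len(ls) / len(rows) * info(ls) for ls in subsets.values())
    return info([row[-1] for row in rows]) - e_a


def id3(rows, attr_indices):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]                               # pure subset -> leaf
    if not attr_indices:
        return Counter(labels).most_common(1)[0][0]    # no tests left -> majority class
    best = max(attr_indices, key=lambda i: gain(rows, i))
    branches = defaultdict(list)
    for row in rows:
        branches[row[best]].append(row)
    remaining = [i for i in attr_indices if i != best]
    return {ATTRIBUTES[best]: {v: id3(subset, remaining) for v, subset in branches.items()}}


if __name__ == "__main__":
    from pprint import pprint
    pprint(id3(ROWS, list(range(len(ATTRIBUTES)))))
```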
58
Other techniques
Bayesian classifiers
X: age <= 30, income = medium, student = yes, credit = fair
P(yes) = 9/14 = 0.643
P(no) = 5/14 = 0.357
59
Example
P(age <= 30 | yes) = 2/9 = 0.222
P(age <= 30 | no) = 3/5 = 0.600
P(income = medium | yes) = 4/9 = 0.444
P(income = medium | no) = 2/5 = 0.400
P(student = yes | yes) = 6/9 = 0.667
P(student = yes | no) = 1/5 = 0.200
P(credit = fair | yes) = 6/9 = 0.667
P(credit = fair | no) = 2/5 = 0.400
60
Example (cont'd)
P(X | yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
P(X | yes) P(yes) = 0.044 × 0.643 = 0.028
P(X | no) P(no) = 0.019 × 0.357 = 0.007
Answer: since 0.028 > 0.007, the naive Bayesian classifier predicts buys = yes for X.
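The calculation can be reproduced by counting directly in the training table. The Python sketch below uses plain relative frequencies with no smoothing, as the slide does; a practical implementation would typically add a Laplace correction so that an attribute value never seen with a class does not force that class's score to zero. Function names are illustrative.

```python
from collections import Counter

ROWS = [  # (age, income, student, credit, buys?) from the training table
    ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]


def naive_bayes_scores(rows, x):
    """Return {class: P(X | class) * P(class)} from relative frequencies (no smoothing)."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(rows)                  # prior P(class)
        for i, value in enumerate(x):              # attributes assumed independent given class
            matches = sum(1 for row in rows if row[-1] == cls and row[i] == value)
            score *= matches / n_cls               # P(attribute_i = value | class)
        scores[cls] = score
    return scores


if __name__ == "__main__":
    x = ("<=30", "medium", "yes", "fair")
    scores = naive_bayes_scores(ROWS, x)
    for cls, score in scores.items():
        print(cls, round(score, 3))
    print("prediction:", max(scores, key=scores.get))
```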
61
Predictive models
Inputs (e.g., medical history, age)
Output (e.g., whether the patient will experience any side effects)
Some models are better than others
62
Principles of data mining
Training/test sets
Error analysis and overfitting
Cross-validation
Supervised vs. unsupervised methods
[Figure: error vs. input size, with curves for training and test data]
63
Representing data
Vector space
[Figure: data points plotted by salary and credit, labeled "pay off" or "default"]
64
Decision surfaces
[Figure: salary vs. credit, "pay off" and "default" regions]
65
Decision trees
[Figure: salary vs. credit, "pay off" and "default" regions separated by a decision tree]
66
Linear boundary
[Figure: salary vs. credit, "pay off" and "default" regions separated by a linear boundary]
67
kNN models
Assign each new element the majority class among its k nearest (closest) training examples
Demos:
– http://www-2.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html
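A minimal k-nearest-neighbour classifier in Python, using the numeric temperature and humidity columns of the weather data from the ARFF example later in this deck as training data. The choices k = 3, plain Euclidean distance, and no feature scaling are made for this sketch only; in practice features are usually normalised first so that no single attribute dominates the distance.

```python
from collections import Counter
from math import dist

# Numeric columns of the weather data: (temperature, humidity, play).
WEATHER = [
    (85, 85, "no"), (80, 90, "no"), (83, 86, "yes"), (70, 96, "yes"),
    (68, 80, "yes"), (65, 70, "no"), (64, 65, "yes"), (72, 95, "no"),
    (69, 70, "yes"), (75, 80, "yes"), (75, 70, "yes"), (72, 90, "yes"),
    (81, 75, "yes"), (71, 91, "no"),
]


def knn_predict(training, query, k=3):
    """Majority label among the k training rows nearest to query (Euclidean distance)."""
    nearest = sorted(training, key=lambda row: dist(row[:2], query))[:k]
    votes = Counter(row[2] for row in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(WEATHER, (72, 80)))  # neighbours (75,80), (68,80), (72,90): all "yes"
    print(knn_predict(WEATHER, (84, 88)))  # neighbours (83,86) yes, (85,85) no, (80,90) no
```

Unlike clustering, no groups are formed in advance: every prediction is made by looking up the labeled neighbours of the query point.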
68
Other methods
Decision trees
Neural networks
Support vector machines
Demos
– http://www.cs.technion.ac.il/~rani/LocBoost/
69
arff files
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
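Because ARFF is a simple line-oriented format, a reader for files like this one fits in a few lines. The Python sketch below is deliberately minimal and hypothetical: it handles only what this file uses (nominal {…} and real attributes, dense comma-separated @data rows, and % comment lines) and ignores quoting, string and date attributes, missing values ('?'), and sparse rows, all of which full ARFF supports.

```python
WEATHER_ARFF = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
"""


def parse_arff(text):
    """Return (relation, attributes, rows) for a simple, dense ARFF file.

    attributes is a list of (name, kind): kind is the list of allowed values for a
    nominal attribute, or the type word (e.g. "real") for a numeric one.
    """
    relation, attributes, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # Keep nominal values as strings; convert numeric ones to float.
            rows.append([v if isinstance(kind, list) else float(v)
                         for (_, kind), v in zip(attributes, values)])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


if __name__ == "__main__":
    relation, attributes, rows = parse_arff(WEATHER_ARFF)
    print(relation, [name for name, _ in attributes])
    print(len(rows), rows[0])
```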
70
Weka
http://www.cs.waikato.ac.nz/ml/weka
Methods:
– rules.ZeroR
– bayes.NaiveBayes
– trees.j48.J48
– lazy.IBk
– trees.DecisionStump
71
kMeans clustering
http://www.cc.gatech.edu/~dellaert/html/software.html
java weka.clusterers.SimpleKMeans -t data/weather.arff
72
More useful pointers
http://www.kdnuggets.com/
http://www.twocrows.com/booklet.htm
73
More types of data mining
Classification and prediction
Cluster analysis (clustering)
Outlier analysis
Evolution analysis
74
Clustering
Exclusive vs. overlapping clusters
Hierarchical vs. flat clusters
75
Methods
Single-linkage
– One common pair is sufficient (two clusters are as close as their closest pair of points)
– Disadvantage: long chains
Complete-linkage
– All pairs have to match (two clusters are as close as their farthest pair of points)
– Disadvantage: too conservative
Average-linkage
Centroid-based (online)
– Look at distances to centroids
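The four criteria differ only in how the distance between two clusters is derived from distances between points. A Python sketch with illustrative function names, assuming Euclidean point distance (any point distance could be substituted):

```python
from itertools import product
from math import dist


def single_link(a, b):
    """Distance of the closest cross-cluster pair: one near pair is enough."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Distance of the farthest cross-cluster pair: all pairs must be close."""
    return max(dist(p, q) for p, q in product(a, b))


def average_link(a, b):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_link(a, b):
    """Distance between the two cluster centroids."""
    def centroid(points):
        return [sum(coords) / len(points) for coords in zip(*points)]
    return dist(centroid(a), centroid(b))


if __name__ == "__main__":
    left, right = [(0, 0), (0, 1)], [(3, 0), (5, 0)]
    for f in (single_link, complete_link, average_link, centroid_link):
        print(f.__name__, round(f(left, right), 3))
```

Single linkage only ever looks at the closest pair, which is why a chain of nearby points can pull distant clusters together; complete linkage looks at the farthest pair and so resists such merges.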
76
k-means
Needed: a small number k of desired clusters
Hard vs. soft decisions
Example: Weka
77
k-means
initialize cluster centroids to arbitrary vectors
while further improvement is possible do
    for each document d do
        find the cluster c whose centroid is closest to d
        assign d to cluster c
    end for
    for each cluster c do
        recompute the centroid of cluster c based on its documents
    end for
end while
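The pseudocode maps almost line for line onto Python. In this sketch two details are assumptions: the "arbitrary" initial centroids are simply the first k points, and "further improvement is possible" is read as "some point changed cluster in the last pass". A cluster that ends up empty keeps its previous centroid. The six 2-D points are made up for illustration.

```python
from math import dist

# Hypothetical 2-D points forming two well-separated groups.
POINTS = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 7.5)]


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids) after hard-assignment k-means with Euclidean distance."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the closest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so no further improvement
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(POINTS, k=2)
    print(labels)     # [0, 0, 0, 1, 1, 1]
    print(centroids)
```

Starting from two points in the same group, the first pass misassigns (1.5, 2), and the second pass corrects it once the centroids have moved; different starting centroids can lead k-means to different final clusterings.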
78
Example
Cluster the following vectors into two groups:
– A =
– B =
– C =
– D =
– E =
– F =
79
Demos
http://vivisimo.com/
http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/AppletKM.html
http://cgm.cs.mcgill.ca/~godfried/student_projects/bonnef_k-means
http://www.cs.washington.edu/research/imagedatabase/demo/kmcluster
http://www.cc.gatech.edu/~dellaert/html/software.html
http://www-2.cs.cmu.edu/~awm/tutorials/kmeans11.pdf
http://www.ece.neu.edu/groups/rpl/projects/kmeans/
% cd /data2/tools/weka-3-3-4
% export CLASSPATH=/data2/tools/weka-3-3-4/weka.jar
% java weka.clusterers.SimpleKMeans -t data/weather.arff