Published by Evelyn Wright. Modified over 9 years ago.
1
Telecommunications Technology Association (TTA) / Data Technology Committee, 2001 Workshop
SQL Data Mining Standard (SQL Standard for Data Mining)
Oct. 2001
Hwan-Seung Yong
Ewha Womans Univ.
http://dblab.ewha.ac.kr/hsyong
2
Oct. 2001, H.S. Yong
Data Mining Standard by ISO/IEC
Information Technology – Database Languages – SQL Multimedia and Application Packages – Part 6: Data Mining
– FCD 13249-6, 2001-05-21
– Vote deadline: 2001-10-05
Other parts
– Framework
– Full-text
– Spatial
– Still Image
3
Data Mining
Association rule
– Given a set of purchase transactions (baskets), each containing a set of items.
– Find rules of the form: if a purchase transaction contains item X and item Y, then it also contains item Z in N% of all purchase transactions.
– Example application: store layout.
Clustering/Segmentation
– Given a set of rows with a set of fields, find sets of rows with common characteristics: the so-called clusters.
– Example application: customer mailings.
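Rules of this kind are usually scored with two measures, sketched below in Python. Support is the fraction of all transactions that contain X, Y and Z. Confidence is the fraction of transactions containing X and Y that also contain Z. The baskets and function names are hypothetical and are not part of the SQL/MM standard.

```python
def support(baskets: list[set[str]], itemset: set[str]) -> float:
    """Fraction of all baskets that contain every item in `itemset`."""
    hits = sum(1 for b in baskets if itemset <= b)
    return hits / len(baskets)


def confidence(baskets: list[set[str]], body: set[str], head: set[str]) -> float:
    """Fraction of baskets containing `body` that also contain `head`."""
    body_support = support(baskets, body)
    if body_support == 0:
        return 0.0
    return support(baskets, body | head) / body_support


# Hypothetical purchase transactions.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
rule_body, rule_head = {"bread", "butter"}, {"milk"}  # "if bread and butter then milk"
print(f"support={support(baskets, rule_body | rule_head):.2f}")       # support=0.50
print(f"confidence={confidence(baskets, rule_body, rule_head):.2f}")  # confidence=0.67
```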
4
Data Mining
Classification
– Given a set of rows with a set of fields plus a special field, the so-called class label, compute a classification model such that the class label can be predicted from the model and a row's field values without the class label.
– Example application: insurance risk prediction.
Regression
– Regression is very similar to classification except for the type of the predicted value: rather than predicting a class label, regression predicts a continuous value.
– Example application: customer ranking.
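The sketch below shows the difference using hypothetical insurance-style rows. The classifier returns a class label and the regression returns a number. One-nearest-neighbour classification and one-variable least-squares regression are stand-ins chosen for brevity. The standard does not prescribe either algorithm.

```python
# Hypothetical training rows: (age, annual_claims).
rows = [(25, 3), (40, 1), (55, 0), (30, 2)]
labels = ["high", "low", "low", "high"]    # class label (categorical target)
premiums = [900.0, 500.0, 300.0, 700.0]    # continuous target


def classify(x):
    """1-nearest-neighbour: return the label of the closest training row."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, r)) for r in rows]
    return labels[dists.index(min(dists))]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


claims = [r[1] for r in rows]
a, b = fit_line(claims, premiums)
print(classify((28, 3)))   # high  (a class label)
print(a + b * 4)           # 1100.0  (a continuous value)
```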
5
Computational Patterns
Training Phase
– Common to all data mining techniques, this is the phase in which the data mining model is computed.
Application Phase
– The phase during which a row is evaluated against a data mining model and one or more values are computed.
Test Phase
– The phase that reads a set of rows containing values for the target field, evaluates each row as in the application phase, and compares the predicted value to the actual value in the target field.
– Used only for data mining classification and regression.
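The three phases can be illustrated with a toy nearest-centroid classifier. This is a sketch under assumptions: the function names, field names and data are hypothetical. Nearest-centroid is used only because it is a compact classifier; the standard does not name it.

```python
from collections import defaultdict


def train(rows, target, fields):
    """Training phase: compute one mean vector (centroid) per class label."""
    sums = defaultdict(lambda: [0.0] * len(fields))
    counts = defaultdict(int)
    for row in rows:
        label = row[target]
        counts[label] += 1
        for i, f in enumerate(fields):
            sums[label][i] += row[f]
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def apply_model(model, row, fields):
    """Application phase: evaluate one row and return the predicted label."""
    x = [row[f] for f in fields]

    def dist(centroid):
        return sum((p - q) ** 2 for p, q in zip(x, centroid))

    return min(model, key=lambda label: dist(model[label]))


def evaluate_model(model, rows, target, fields):
    """Test phase: predict each row, compare with its actual target value, return accuracy."""
    hits = sum(1 for row in rows if apply_model(model, row, fields) == row[target])
    return hits / len(rows)


fields = ["age", "claims"]
training = [
    {"age": 25, "claims": 3, "risk": "high"},
    {"age": 30, "claims": 2, "risk": "high"},
    {"age": 40, "claims": 1, "risk": "low"},
    {"age": 55, "claims": 0, "risk": "low"},
]
model = train(training, "risk", fields)
holdout = [
    {"age": 28, "claims": 3, "risk": "high"},
    {"age": 50, "claims": 0, "risk": "low"},
]
print(evaluate_model(model, holdout, "risk", fields))  # 1.0
```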
6
Standard Language
The standard is defined through a set of user-defined types (UDTs)
– It is a specification only, not an implementation
Data Mining Model Types
– Storage and retrieval of data mining model values
Data Mining Settings Types
– Define a target field and parameters for algorithms
Data Mining Application Result Types
– The result of applying a mining model to a row
Data Mining Data Types
Status Code
7
Data Mining Model Types
Type: DM_RuleModel
– Holds the result of association rule mining
Methods
– DM_impRuleModel(CHARACTER LARGE OBJECT(DM_MaxContentLength)): imports a rule model expressed in PMML; returns a DM_RuleModel
– DM_expRuleModel(): exports the rule model as PMML
– DM_getNORules(): returns the number of rules
– DM_getRuleTask(): returns the data mining task value
– Data mining settings, etc.
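The following is a rough Python analogue of DM_RuleModel. It shows what import, export and rule counting amount to on a PMML document. The class and method names are hypothetical mirrors of the SQL methods. MAX_CONTENT_LENGTH stands in for DM_MaxContentLength. The parser assumes PMML without an XML namespace. The sample document is skeletal and not DTD-valid.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the implementation-defined DM_MaxContentLength.
MAX_CONTENT_LENGTH = 1_000_000


class RuleModel:
    """Illustrative Python analogue of DM_RuleModel (not the SQL/MM API)."""

    def __init__(self, root: ET.Element):
        self._root = root

    @classmethod
    def imp_rule_model(cls, pmml: str) -> "RuleModel":
        """Mirrors DM_impRuleModel: build a model value from PMML text."""
        if len(pmml) > MAX_CONTENT_LENGTH:
            raise ValueError("PMML text exceeds the maximum content length")
        root = ET.fromstring(pmml)
        # Element.iter() also visits the root itself, so this works whether
        # AssociationModel is the document element or nested inside <PMML>.
        if next(root.iter("AssociationModel"), None) is None:
            raise ValueError("no AssociationModel element found")
        return cls(root)

    def exp_rule_model(self) -> str:
        """Mirrors DM_expRuleModel: serialize the model back to PMML text."""
        return ET.tostring(self._root, encoding="unicode")

    def get_no_rules(self) -> int:
        """Mirrors DM_getNORules: count the AssocRule elements."""
        return sum(1 for _ in self._root.iter("AssocRule"))


doc = """<PMML version="2.0"><AssociationModel>
  <AssocRule support="0.5" confidence="1.0" antecedent="1" consequent="2"/>
  <AssocRule support="0.5" confidence="0.5" antecedent="2" consequent="1"/>
</AssociationModel></PMML>"""
model = RuleModel.imp_rule_model(doc)
print(model.get_no_rules())  # 2
```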
8
Data Mining Settings Types
Set mining parameters
Type: DM_RuleSettings
Methods
– setMinSupport(DOUBLE PRECISION)
– getMinSupport()
– DM_ruleUseDataSpec(DM_LogicalDataSpec): a logical data spec is an abstraction of the source table for the input data
– DM_ruleGetDataSpec()
– DM_ruleSetGroup(CHARACTER VARYING): identifies the grouping field for association mining, e.g., the purchase transaction
– DM_ruleGetGroup()
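The minimum-support parameter and the grouping field work together. Rows that share a grouping value form one transaction. Only items that occur in at least the minimum fraction of transactions survive. The class and function below are a hypothetical sketch, not the SQL/MM DM_RuleSettings interface.

```python
from collections import defaultdict


class RuleSettings:
    """Hypothetical Python analogue of DM_RuleSettings (names are illustrative)."""

    def __init__(self):
        self.min_support = None
        self.group_field = None

    def set_min_support(self, value: float) -> None:
        # Support is a fraction of transactions, so it must lie in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError("minimum support must be between 0 and 1")
        self.min_support = value

    def set_group(self, field: str) -> None:
        # Rows sharing a value in this field form one purchase transaction.
        self.group_field = field


def frequent_items(rows, settings, item_field):
    """Group rows into transactions, then keep items whose support >= min_support."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[settings.group_field]].add(row[item_field])
    counts = defaultdict(int)
    for items in baskets.values():
        for item in items:
            counts[item] += 1
    n = len(baskets)
    return {item: c / n for item, c in counts.items() if c / n >= settings.min_support}


rows = [
    {"txn": 1, "item": "Cracker"}, {"txn": 1, "item": "Coke"},
    {"txn": 2, "item": "Cracker"}, {"txn": 2, "item": "Water"},
    {"txn": 3, "item": "Water"},
]
s = RuleSettings()
s.set_min_support(0.6)
s.set_group("txn")
result = frequent_items(rows, s, "item")
print(sorted(result))  # ['Cracker', 'Water']
```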
9
Data Mining Application Result Types
The result of applying a mining model to a row
Type: DM_ClusResult
Methods
– DM_getClusterID(): returns the cluster identification number
– DM_getQuality(): returns the degree of fit to the predicted cluster
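The sketch below shows what applying a clustering model to one row might produce, using centroid-based clusters. The quality measure 1 / (1 + distance) is an assumption for illustration. The slide only says that DM_getQuality() reports the degree of fit, and the standard's actual definition may differ. All names are hypothetical.

```python
import math
from typing import NamedTuple


class ClusResult(NamedTuple):
    """Hypothetical analogue of DM_ClusResult."""
    cluster_id: int
    quality: float


def apply_clustering(centroids: dict[int, list[float]], row: list[float]) -> ClusResult:
    """Assign `row` to the nearest centroid and report an (assumed) quality score."""
    distances = {cid: math.dist(c, row) for cid, c in centroids.items()}
    best = min(distances, key=distances.get)
    # Assumed quality: 1.0 for a row on the centroid, approaching 0 as distance grows.
    return ClusResult(best, 1.0 / (1.0 + distances[best]))


centroids = {1: [20.0, 1.0], 2: [50.0, 4.0]}  # hypothetical trained model
print(apply_clustering(centroids, [23.0, 5.0]))
```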
10
Data Mining Data Types
Represent the input data needed for data mining
DM_LogicalDataSpec Type and Routines
– Abstraction of a set of fields for mining
Methods
– DM_addDataSpecFld(CHARACTER VARYING)
– DM_remDataSpecFld(CHARACTER VARYING)
– DM_getNOFields()
– DM_getFieldName(INTEGER)
– DM_setFieldType(CHARACTER VARYING, SMALLINT): two kinds of type, categorical and numeric
– DM_getFieldType(CHARACTER VARYING)
– DM_compatibleSpec(DM_LogicalDataSpec)
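Here is a hypothetical in-memory analogue of DM_LogicalDataSpec that mirrors the methods listed above. Three details are assumptions rather than facts taken from the standard: the type codes, the 1-based field positions, and the compatibility rule (every field here exists in the other spec with the same type).

```python
# Hypothetical SMALLINT codes for the two field types.
CATEGORICAL, NUMERIC = 0, 1


class LogicalDataSpec:
    """Illustrative analogue of DM_LogicalDataSpec (names are hypothetical)."""

    def __init__(self):
        self._fields = {}  # field name -> type code, or None while unset

    def add_field(self, name):                   # ~ DM_addDataSpecFld
        self._fields.setdefault(name, None)

    def remove_field(self, name):                # ~ DM_remDataSpecFld
        self._fields.pop(name, None)

    def number_of_fields(self):                  # ~ DM_getNOFields
        return len(self._fields)

    def field_name(self, position):              # ~ DM_getFieldName, 1-based (assumed)
        return list(self._fields)[position - 1]

    def set_field_type(self, name, type_code):   # ~ DM_setFieldType
        if name not in self._fields:
            raise KeyError(name)
        if type_code not in (CATEGORICAL, NUMERIC):
            raise ValueError("type must be CATEGORICAL or NUMERIC")
        self._fields[name] = type_code

    def get_field_type(self, name):              # ~ DM_getFieldType
        return self._fields[name]

    def compatible(self, other):                 # ~ DM_compatibleSpec, assumed rule
        return all(name in other._fields and other._fields[name] == t
                   for name, t in self._fields.items())


spec = LogicalDataSpec()
spec.add_field("age")
spec.add_field("region")
spec.set_field_type("age", NUMERIC)
spec.set_field_type("region", CATEGORICAL)
narrower = LogicalDataSpec()
narrower.add_field("age")
narrower.set_field_type("age", NUMERIC)
print(spec.number_of_fields(), spec.field_name(2))            # 2 region
print(narrower.compatible(spec), spec.compatible(narrower))   # True False
```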
11
Data Mining Data Types
DM_MiningData Type and Routines
– Abstraction of the input data for mining
Methods
– DM_defMiningData(CHARACTER VARYING): takes a source table and defines it as mining data
– DM_defFldAlias(CHARACTER VARYING, CHARACTER VARYING): defines an alias for a field name
– DM_genLogDataSpec(): generates a value of type DM_LogicalDataSpec
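The sketch below shows how mining data, field aliases and a generated logical data spec fit together. A list of dictionaries plays the role of the "source table". Inferring categorical versus numeric from the values is an assumption made here for brevity. The DM_setFieldType method on the previous slide suggests that types can also be set explicitly. Class and method names are hypothetical.

```python
class MiningData:
    """Hypothetical analogue of DM_MiningData over an in-memory table (list of dicts)."""

    def __init__(self, table):
        # ~ DM_defMiningData: wrap a source table as mining data.
        self._table = table
        self._aliases = {}

    def def_field_alias(self, column, alias):
        # ~ DM_defFldAlias: expose `column` under the name `alias`.
        self._aliases[column] = alias

    def gen_logical_data_spec(self):
        """~ DM_genLogDataSpec: return {field name: 'numeric' | 'categorical'}."""
        spec = {}
        for column in self._table[0]:
            values = [row[column] for row in self._table]
            # Assumed inference rule: numeric only if every value is an int or float.
            is_numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                             for v in values)
            spec[self._aliases.get(column, column)] = "numeric" if is_numeric else "categorical"
        return spec


customers = [
    {"cust_age": 34, "cust_region": "Seoul"},
    {"cust_age": 51, "cust_region": "Busan"},
]
md = MiningData(customers)
md.def_field_alias("cust_age", "age")
print(md.gen_logical_data_spec())  # {'age': 'numeric', 'cust_region': 'categorical'}
```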
12
PMML
Predictive Model Markup Language
– Makes it easy to define predictive models and share them between companies
– XML based
– Driven by the Data Mining Group (www.dmg.org), a consortium of major data mining vendors
– Current version: 2.0
– The 2nd workshop on PMML was held on Aug. 26, 2001, where PMML 2.0 was presented
13
DTD of Association Rules Model
<!ELEMENT AssociationModel (Extension*, AssocInputStats, AssocItem+, AssocItemset+, AssocRule+)>
<!ATTLIST AssociationModel
    modelName CDATA #IMPLIED
>
<!ATTLIST AssocInputStats
    numberOfTransactions %INT-NUMBER; #REQUIRED
    maxNumberOfItemsPerTA %INT-NUMBER; #IMPLIED
    avgNumberOfItemsPerTA %REAL-NUMBER; #IMPLIED
    minimumSupport %PROB-NUMBER; #REQUIRED
    minimumConfidence %PROB-NUMBER; #REQUIRED
    lengthLimit %INT-NUMBER; #IMPLIED
    numberOfItems %INT-NUMBER; #REQUIRED
    numberOfItemsets %INT-NUMBER; #REQUIRED
    numberOfRules %INT-NUMBER; #REQUIRED
>
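The helper below checks the #REQUIRED attributes of AssocInputStats, copied from the DTD above. Python's xml.etree.ElementTree does not validate against a DTD. This hypothetical helper therefore inspects only the attributes and the probability range of minimumSupport and minimumConfidence. It is not a full validator.

```python
import xml.etree.ElementTree as ET

# #REQUIRED attributes of AssocInputStats, taken from the DTD above.
REQUIRED = ["numberOfTransactions", "minimumSupport", "minimumConfidence",
            "numberOfItems", "numberOfItemsets", "numberOfRules"]
# Attributes declared as %PROB-NUMBER; must lie in [0, 1].
PROBABILITIES = ["minimumSupport", "minimumConfidence"]


def check_input_stats(xml_text):
    """Return a list of problems in the AssocInputStats element (empty if none)."""
    stats = ET.fromstring(xml_text).find("AssocInputStats")
    if stats is None:
        return ["missing AssocInputStats"]
    problems = [f"missing {name}" for name in REQUIRED if name not in stats.attrib]
    for name in PROBABILITIES:
        if name in stats.attrib and not 0.0 <= float(stats.attrib[name]) <= 1.0:
            problems.append(f"{name} is not a probability")
    return problems


doc = ('<AssociationModel><AssocInputStats numberOfTransactions="4" '
       'minimumSupport="0.5" minimumConfidence="1.5" numberOfItems="3" '
       'numberOfItemsets="7"/></AssociationModel>')
print(check_input_stats(doc))
# ['missing numberOfRules', 'minimumConfidence is not a probability']
```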
14
PMML Example: Association Rule 1/2
t1: Cracker, Coke, Water
t2: Cracker, Water
t3: Cracker, Water
t4: Cracker, Coke, Water
15
PMML Example: Association Rule 2/2
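The PMML output of this example was not preserved in the transcript. The sketch below is a hedged reconstruction of its shape only. It mines the four transactions t1 to t4 with hypothetical thresholds (minimum support 0.5, minimum confidence 0.8) and emits an AssociationModel that uses the DTD's element names. The attribute names on AssocItem, AssocItemset, AssocItemRef and AssocRule follow PMML 1.x as best understood and should be checked against the specification. The output is not the slide's original.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

transactions = [
    {"Cracker", "Coke", "Water"},   # t1
    {"Cracker", "Water"},           # t2
    {"Cracker", "Water"},           # t3
    {"Cracker", "Coke", "Water"},   # t4
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.8  # hypothetical thresholds


def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


items = sorted(set().union(*transactions))
# Brute-force enumeration is fine for 3 items; real miners prune (e.g. Apriori).
itemsets = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if support(frozenset(c)) >= MIN_SUPPORT]

rules = []  # (antecedent, consequent, support, confidence); single-item consequents only
for s in itemsets:
    if len(s) < 2:
        continue
    for head in sorted(s):
        body = s - {head}
        conf = support(s) / support(body)
        if conf >= MIN_CONFIDENCE:
            rules.append((body, frozenset([head]), support(s), conf))

model = ET.Element("AssociationModel", modelName="crackers")
ET.SubElement(model, "AssocInputStats",
              numberOfTransactions=str(len(transactions)),
              minimumSupport=str(MIN_SUPPORT), minimumConfidence=str(MIN_CONFIDENCE),
              numberOfItems=str(len(items)), numberOfItemsets=str(len(itemsets)),
              numberOfRules=str(len(rules)))
item_id = {item: str(i) for i, item in enumerate(items, 1)}
for item, iid in item_id.items():
    ET.SubElement(model, "AssocItem", id=iid, value=item)
# Every subset of a frequent itemset is frequent, so bodies and heads all get ids.
set_id = {s: str(i) for i, s in enumerate(itemsets, 1)}
for s, sid in set_id.items():
    e = ET.SubElement(model, "AssocItemset", id=sid, support=str(support(s)),
                      numberOfItems=str(len(s)))
    for item in sorted(s):
        ET.SubElement(e, "AssocItemRef", itemRef=item_id[item])
for body, head, sup, conf in rules:
    ET.SubElement(model, "AssocRule", support=str(sup), confidence=str(conf),
                  antecedent=set_id[body], consequent=set_id[head])
print(ET.tostring(model, encoding="unicode"))
```

With these thresholds all seven itemsets are frequent. Six single-consequent rules survive, all with confidence 1.0, for example Coke ⇒ Water.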
16
Final Remarks
Data mining is a hot and promising area, like DBMS
Standard activity
– The SQL data mining standard is ready
– The PMML standard for exchanging mining results is ready
– But there is no software yet
Further research and standards area
– SQL for multimedia data mining