Oracle vs SQL Server Dr. Alex Wang
Oracle Text
Oracle Text uses standard SQL to do almost everything.
Full-text retrieval technology for dealing with unstructured data.
Data sources can be database tables, flat files, or web sites.
Indexes, searches, and analyzes text and documents.
Searching: keyword searching, context queries, pattern matching, thematic queries, HTML/XML section searching.
Uses relevance ranking to improve search quality.
Supported formats: PDF, MS Office, HTML, XML.
Search Operators Used in Oracle
Context search
  Near - returns a score based on the proximity of two or more terms.
Pattern search
  Fuzzy - matches terms that are spelled similarly.
  Soundex - matches terms that sound alike.
  Stem - searches for all terms with the same root.
Thesaurus search
  Preferred Term - replaces the query term with the preferred term defined in a thesaurus.
  Related Term - expands to all related terms defined in a thesaurus.
  Synonym - expands to all terms defined as synonyms.
  Narrower Term - expands to all terms defined as narrower/lower-level terms.
  Broader Term - expands to all terms defined as broader/higher-level terms.
  Top Term -
Search Operators Used in SQL Server
CONTAINS can search for:
  A word near another word.
  The prefix of a word or phrase.
  A word inflectionally generated from another (for example, the word drive is the inflectional stem of drives, drove, driving, and driven).
  A word that is a synonym of another word using a thesaurus (for example, the word metal can have synonyms such as aluminum and steel).
Sound-alike search is available through the SOUNDEX function, which is a separate T-SQL function rather than a CONTAINS option.
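Both products offer a sound-alike search based on Soundex codes. As a rough illustration of the idea, here is a minimal Python sketch of the classic American Soundex encoding. It is an assumption-laden teaching example, not the code either database runs: the function name `soundex` is hypothetical, and the database implementations may treat edge cases differently.

```python
def soundex(word):
    """Return a 4-character Soundex code (classic American Soundex rules)."""
    # Map consonants to digit classes; vowels, H, W and Y have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # code of the previous coded position
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = codes.get(ch, "")
        if code and code != prev:
            result += code              # skip repeats of the same code
        prev = code                     # a vowel resets prev to ""
    return (result + "000")[:4]         # pad or truncate to 4 characters


# Words that sound alike receive the same code.
print(soundex("Robert"), soundex("Rupert"))
```

Two terms are treated as sounding alike when their codes are equal, which is why a query for one spelling can also return the other.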
Feature                       Oracle    Microsoft
Available in                  SE, EE    EE
Decision Tree                 Y         Y
Support Vector Machine        Y         N
Neural Network                N         Y
Naive Bayes                   Y         Y
Adaptive Bayes Network        Y         N
K-means                       Y         Y
Expectation Maximization      N         Y
Orthogonal Clustering         Y         N
Path cluster                  N         Y
Minimum Description Length    Y         N
Time Series                   Y         Y
Association Rules             Y         Y
Note: Minimum Description Length identifies the relative importance of an attribute in predicting a given outcome.
Oracle emphasizes PL/SQL statements
Simple Prediction Query
Question: Select all customers who have a high propensity to attrite (> 80% chance).
SQL Query:
  SELECT A.cust_name, A.contact_info
  FROM customers A
  WHERE PREDICTION_PROBABILITY(tree_model, 'attrite' USING A.*) > 0.8
An Example of Oracle Text Mining: Building a DT Model
  CREATE TABLE dt_settings (
    setting_name  VARCHAR2(30),
    setting_value VARCHAR2(30));

  BEGIN
    -- Populate the settings table: select the decision tree algorithm
    INSERT INTO dt_settings VALUES
      (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
    COMMIT;

    -- Build a classification model that predicts sales_type
    DBMS_DATA_MINING.CREATE_MODEL(
      model_name          => 'sales_type_model',
      mining_function     => dbms_data_mining.classification,
      data_table_name     => 'sales_dataset',
      case_id_column_name => 'sales_id',
      target_column_name  => 'sales_type',
      settings_table_name => 'dt_settings');
  END;
  /
An Example of SQL Server Text Mining
A Tutorial for Text Classification using SQL Server 2005 Beta2 Data Mining
Peter Pyungchul Kim, SQL Business Intelligence, Microsoft Corporation
mmunity/_tutorials/688.aspx
Data Source
5000 postings from 5 newsgroups.
We know which group each posting belongs to.
Flat text file.
Goal: create a model based on these data to classify each posting into its group.
Randomly choose 70% for training and 30% for testing.
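The random 70/30 split can be sketched in a few lines of Python. This is only an illustration of the idea, not the tutorial's actual procedure: the function name `split_postings`, the fixed seed, and the stand-in posting IDs are all assumptions made for the example.

```python
import random


def split_postings(postings, train_fraction=0.7, seed=42):
    """Shuffle the postings and split them into training and test sets."""
    shuffled = list(postings)               # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 5000 labeled postings: just their IDs.
train, test = split_postings(range(5000))
print(len(train), len(test))
```

Keeping the test set out of training is what makes the later accuracy comparison meaningful: the model is scored only on postings it has never seen.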
SQL Server
You can do it all by clicking through the SQL Server GUI tools:
1. SQL Server Management Studio - create the database and import the data.
2. Business Intelligence Development Studio - build a dictionary and term vectors.
3. Build and test the data mining models.
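To make step 2 more concrete, the following Python sketch shows conceptually what "build a dictionary, term vectors" means. It does not reproduce what the SQL Server tools do internally; the helper names (`tokenize`, `build_dictionary`, `term_vector`), the `max_terms` cutoff, and the sample sentences are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(documents, max_terms=1000):
    """Keep the terms that occur in the most documents, sorted by name."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))   # count each term once per doc
    return sorted(term for term, _ in doc_freq.most_common(max_terms))


def term_vector(text, dictionary):
    """Count how often each dictionary term appears in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in dictionary]


# Hypothetical two-posting training set.
docs = ["The goalie saved the puck", "The rocket reached orbit"]
dictionary = build_dictionary(docs)
print(dictionary)
print(term_vector("the puck hit the goalie", dictionary))
```

Each posting becomes a fixed-length row of counts, one column per dictionary term, which is the tabular form a classification model in step 3 can be trained on. Words outside the dictionary, such as "hit" here, are simply ignored.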
Compare Classification Results