INTRODUCTION TO DATA AND TEXT MINING
ANDREW PEASE, 8 MARCH 2013
Copyright © 2012, SAS Institute Inc. All rights reserved.
DATA MINING IS:
- Discovering patterns, trends and relationships represented in data
- Developing models to understand and describe characteristics and activity based on these patterns
- Using insights to help evaluate future options and make fact-based decisions
- Deploying scores and results for timely, appropriate action
INDUSTRY-SPECIFIC DATA MINING APPLICATIONS

Credit Scoring (Banking)
- What is predicted: Creditworthiness of new and existing customers
- Driven business decision: How to assess and control risk within existing (or new) consumer portfolios?

Market Basket Analysis (Retail)
- What is predicted: Which products are likely to be purchased together?
- Driven business decision: How to increase sales with cross-sell/up-sell, loyalty programs and promotions?

Asset Maintenance (Utilities, Manufacturing, Oil & Gas)
- What is predicted: The real drivers of asset or equipment failure
- Driven business decision: How to minimize operational disruptions and maintenance costs?

Health & Condition Management (Health Insurance)
- What is predicted: Patients at risk of a chronic illness, so that a treatment program can be offered
- Driven business decision: How can we reduce healthcare costs and satisfy patients?

Fraud Management (Government, Insurance, Banks)
- What is predicted: Unknown fraud cases and future risks
- Driven business decision: How to decrease fraud losses and lower false positives?

Drug Discovery (Life Sciences)
- What is predicted: Compounds that have desirable effects, and drug behavior during trials
- Driven business decision: How to bring drugs to the marketplace quickly and effectively?
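Market basket analysis is commonly framed as association rules scored by support (the share of baskets that contain both items) and confidence (the share of baskets containing item A that also contain item B). The Python sketch below counts item pairs by brute force on a small hypothetical set of baskets. It only illustrates those two measures; it is not the Apriori algorithm or the SAS implementation, and the `min_support` value of 0.3 is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be worth reporting
        rules.append((a, b, support, count / item_counts[a]))  # a -> b
        rules.append((b, a, support, count / item_counts[b]))  # b -> a
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


baskets = [  # hypothetical transactions
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
    ["milk", "butter", "bread"],
]
for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, butter and milk each imply bread with confidence 1.0, while the beer/chips pair is dropped because its support (0.2) falls below the threshold.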
DATA MINING METHODOLOGY: SEMMA
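SEMMA is SAS's acronym for the stages Sample, Explore, Modify, Model and Assess. The Python sketch below walks a hypothetical toy credit dataset through those five stages using only the standard library. The data generator, the "missed payments" feature, the median imputation and the one-split decision-stump model are all illustrative assumptions, not part of SAS Enterprise Miner or this presentation.

```python
import random
import statistics


def make_population(n, seed=0):
    """Hypothetical toy credit data: (missed_payments or None, defaulted) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        missed = rng.randint(0, 5)
        defaulted = 1 if missed >= 3 else 0
        if rng.random() < 0.10:  # 10% label noise so the model is not perfect
            defaulted = 1 - defaulted
        if rng.random() < 0.05:  # 5% of values go missing, to be imputed later
            missed = None
        rows.append((missed, defaulted))
    return rows


def semma(population, sample_size=1000, seed=1):
    rng = random.Random(seed)

    # Sample: draw a random subset, then split it 80/20 into training and holdout.
    sample = rng.sample(population, sample_size)
    cut = int(0.8 * sample_size)
    train, holdout = sample[:cut], sample[cut:]

    # Explore: average missed payments for non-defaulters (0) and defaulters (1).
    profile = {
        label: statistics.mean(m for m, d in train if d == label and m is not None)
        for label in (0, 1)
    }

    # Modify: impute missing values with the training-set median.
    fill = statistics.median(m for m, _ in train if m is not None)

    def clean(rows):
        return [(fill if m is None else m, d) for m, d in rows]

    train, holdout = clean(train), clean(holdout)

    # Model: a one-split decision stump, "predict default if missed >= t".
    def accuracy(rows, t):
        return sum((m >= t) == bool(d) for m, d in rows) / len(rows)

    threshold = max(range(7), key=lambda t: accuracy(train, t))

    # Assess: score the chosen rule on holdout rows it was not fitted on.
    return profile, fill, threshold, accuracy(holdout, threshold)


profile, fill, threshold, holdout_acc = semma(make_population(5000))
print(profile, fill, threshold, round(holdout_acc, 3))
```

Keeping the holdout split separate from the Modify and Model steps (the imputation median and the threshold are both learned from training rows only) is what makes the Assess figure an honest estimate.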
SAS TEXT ANALYTICS: UNCOVERING THE TECHNOLOGY
- Content Categorization
- Text Mining
- Sentiment Analysis
- Ontology Management
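Two of the capabilities listed, content categorization and sentiment analysis, can be illustrated in their simplest rule-based form. The Python sketch below uses hypothetical keyword lists and a tiny hypothetical sentiment lexicon. It is not how SAS Text Analytics works internally (real products typically add linguistic processing and statistical models); it only shows the idea of mapping free text to categories and a polarity.

```python
import re

# Hypothetical lexicons and category rules, for illustration only.
POSITIVE = {"fast", "helpful", "clean", "good"}
NEGATIVE = {"slow", "rude", "dirty", "late", "bad", "broken"}
CATEGORIES = {
    "transport": {"bus", "driver", "train"},
    "health": {"clinic", "hospital", "doctor"},
    "streets": {"street", "road", "lights"},
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(tokens):
    """Rule-based categorization: every category whose keywords appear."""
    found = sorted(name for name, words in CATEGORIES.items() if words & set(tokens))
    return found or ["uncategorized"]


def sentiment(tokens):
    """Lexicon scoring: +1 per positive word, -1 per negative word."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(text):
    tokens = tokenize(text)
    return categorize(tokens), sentiment(tokens)


for complaint in [
    "The bus was late again and the driver was rude",
    "Staff at the clinic were helpful and fast",
    "Street lights on our road are broken",
    "When does the library open?",
]:
    print(analyze(complaint), complaint)
```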
LILLEBAELT HOSPITAL (Denmark), HEALTHCARE

"If data is wrong, the basis for decision making is also faulty. Therefore, the Clinically Correct Time-True Registration system makes sense even beyond our department and hospital."
- Sten Larsen, Chief Surgeon

BUSINESS ISSUE
- Creation of a database to improve clinical work in research and diagnosis

RESULTS
- Reduced errors in patient records
- Reduced manual effort in patient record audits
1823, HONG KONG EFFICIENCY UNIT (PUBLIC SECTOR)

"By decoding the 'messages' through statistical and root-cause analyses of complaints data, the government can better understand the voice of the people, and help government departments improve service delivery, make informed decisions and develop smart strategies. This in turn helps boost public satisfaction with the government, and build a quality city."
- W. F. Yuk, Assistant Director, Efficiency Unit

BUSINESS ISSUE
- 1823 operates round-the-clock, including Sundays and public holidays.
- It answers 2.65 million calls and emails, including inquiries, suggestions and complaints.

RESULTS
- Developed a Complaint Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints
DATA/TEXT MINING RESEARCH CONSIDERATIONS
- Data mining for patent research/control
- Copyright research/control
- A metadata-driven approach avoids permanent data duplication
- Analysts need creative freedom in combining and transforming data
- User interfaces: programming vs. point-and-click
- Cost to implement is highly variable

Future Indications
- In-Memory
- Big Data
- Cloud Computing