Data Mining Status and Risks Dr. Gregory Newby UNC-Chapel Hill

Presentation transcript:

Overview
- What is data mining and related concepts?
- Fundamentals of the science and practice of data mining
- What data sources are available?
- Causality and correlation
- Risks of data mining
- Future moves

Data Mining
“An information extraction activity whose goal is to discover hidden facts contained in databases. …[D]ata mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.” (Via

Data Mining
- Is: Seeking new information from relations among data, possibly from different sources
- Is: An important area of academic, corporate and government research
- Is: Important from a security standpoint, because data mining might yield emergent information that would otherwise remain unknown

The Bigger Picture
- Information retrieval
- Data mining
- Data fusion

The Data Universe
- All data
- All topics
- All sources
- Numeric, textual
- Discrete, longitudinal
- Lots and lots of data!
- The data universe is growing constantly, and many new data sources are being created as a result of security concerns and technological progress

Challenges of the Data Universe
- Scale: too much data to deal with
- Format: many different formats which are difficult to merge or query
- Access: most data (over 90%?) are not Web-accessible
  - Databases
  - Proprietary or internal data
  - Formatting problems or issues

Solutions
- Figure out how to get data from one format to another. Standards such as XML and EDI help
- Develop cooperative relationships among data holders for data exchange. This is happening much more in government
- Develop tools to identify relationships among data. This is the focus of data mining
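As a minimal sketch of the first solution (getting data from one format to another), the Python below reads a small XML document and re-emits it as JSON using only the standard library. The record layout, field names, and sample values are hypothetical and invented for illustration; they do not come from the slides.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical source data: the <records>/<record> layout and all values
# are assumptions made for this example.
XML_SOURCE = """
<records>
  <record id="1"><name>Alice Example</name><city>Chapel Hill</city></record>
  <record id="2"><name>Bob Example</name><city>Fairbanks</city></record>
</records>
"""


def xml_records_to_dicts(xml_text):
    """Turn each <record> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.findall("record"):
        # Start with the id attribute, then copy every child element.
        row = {"id": record.get("id")}
        for field in record:
            row[field.tag] = (field.text or "").strip()
        rows.append(row)
    return rows


rows = xml_records_to_dicts(XML_SOURCE)

# Once the data is in a neutral in-memory form, writing another format is easy.
print(json.dumps(rows, indent=2))
```

Real conversions are usually harder than this, because the two sides rarely agree on field names, units, or identifiers.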

Data Mining != Web Searching
- On the Web, we’re doing high-precision information retrieval
  - We want the top-ranked documents to be relevant
  - We don’t want to see irrelevant documents
- The data universe for Web search engines is vast, making this a relatively straightforward problem (though a big engineering challenge!)

Data Mining != Web Searching
- Data mining is all about recall, not precision
  - Recall means we find all the relevant documents, regardless of how many irrelevant documents come back with them
- This is a tougher problem, since the set of responses to a given inquiry can be huge
- It’s also tougher because of data formats, data merging, access, etc.
- The data miner’s goal is to set a threshold above which relationships are “interesting”
- Data miners can also search for particular patterns, e.g. those related to an individual or group
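The precision/recall contrast above can be made concrete with a small sketch. It uses the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant). The document IDs and the two result lists are hypothetical, chosen only to show the trade-off.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical ground truth: four documents are actually relevant.
relevant = {"d1", "d2", "d3", "d4"}

# Web-search style: a short list where everything shown is relevant.
web_style = ["d1", "d2"]

# Data-mining style: cast a wide net so that nothing relevant is missed.
mining_style = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

web_p, web_r = precision_recall(web_style, relevant)
mining_p, mining_r = precision_recall(mining_style, relevant)

print(f"web search:  precision={web_p:.2f} recall={web_r:.2f}")
print(f"data mining: precision={mining_p:.2f} recall={mining_r:.2f}")
```

In this toy case the short list has perfect precision but misses half of the relevant documents, while the wide net finds all of them at the cost of returning as many irrelevant ones.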

Today
- Law enforcement, industry and government are making their data sources more open to each other (these data sources are not generally publicly available)
- Data integrity issues are a major concern
- Data mining is still tough. “False positive” relationships are easy to find
  - Correlation vs. causality
  - Seek and ye shall find
  - Lots of data yields lots of matches
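The point that lots of data yields lots of matches can be illustrated with a small simulation. This is a sketch under stated assumptions: every variable is independent random noise, so any pair whose correlation clears the threshold is a false positive by construction. The variable count, sample size, seed, and the 0.5 "interesting" threshold are arbitrary choices for this example, not values from the slides.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Assumed parameters, chosen only for illustration.
NUM_VARIABLES = 40      # unrelated data columns
NUM_OBSERVATIONS = 20   # rows per column
THRESHOLD = 0.5         # |r| above this counts as "interesting"

rng = random.Random(0)  # fixed seed so the run is repeatable
columns = [
    [rng.gauss(0, 1) for _ in range(NUM_OBSERVATIONS)]
    for _ in range(NUM_VARIABLES)
]

# Compare every column with every other column.
pairs = list(itertools.combinations(range(NUM_VARIABLES), 2))
flagged = []
for i, j in pairs:
    r = pearson(columns[i], columns[j])
    if abs(r) > THRESHOLD:
        flagged.append((i, j, r))

print(f"pairs tested: {len(pairs)}")
print(f"pairs above threshold (all spurious): {len(flagged)}")
```

Because the number of pairs grows roughly with the square of the number of variables, adding more data sources increases the number of chance matches quickly, even when no real relationship exists.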

Today’s Data Sources
- Credit and other financials
- Law enforcement records
- Travel history
- Health data
- Whatever you put on the Internet
If you are targeted:
- Wiretap data (’net, phone, etc.)
- Surveillance data
- HUMINT, etc., etc.

Tomorrow
- Decreased barriers among different data sources (this is a main impact of PATRIOT, but more is coming)
- Increased data collection (via PATRIOT plus technological trends)
- Better tools for data mining, and new technologies making data sharing and integration easier

Contact Info
- Greg Newby is moving from UNC to UAF
- New position: Research Faculty at the Arctic Region Supercomputing Center, University of Alaska, Fairbanks