Social Mining
Social Computing
Data Mining
Data mining is an important new information technology used to identify significant data in vast numbers of records. It is also part of a process called knowledge discovery in databases (KDD), which presents and processes data to obtain knowledge.
Goals and Usefulness of Data Mining
Goals: improve the quality of interaction between the system and its users; improve decision making.
Usefulness: an automatic analysis and discovery tool for extracting useful knowledge from huge amounts of valuable information.
Knowledge Discovery Process
Data Mining Tasks: Association Rules; Clustering; Classification; Forecast.
Data Mining Methods: Decision trees and rules; Non-linear regression and classification methods; Example-based methods; Probabilistic graphical dependency models; Relational learning models.
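To make the first task on this slide concrete, here is a minimal Python sketch of association rules: it computes support and confidence by counting, then lists single-item rules that pass two thresholds. The basket data and the threshold values (min_support, min_confidence) are hypothetical, and this brute-force pair scan is only an illustration, not a full algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support of both together / support of antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def pairwise_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Enumerate rules lhs -> rhs over single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in combinations(items, 2):
        for lhs, rhs in ((a, c), (c, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules

for lhs, rhs, s, conf in pairwise_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={conf:.2f}")
```

In this toy data every pair of items co-occurs in 3 of 5 baskets, so all six directed rules have support 0.60 and confidence 0.75.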
Statistical Inference vs. Data Mining
Formal statistical inference is “assumption-driven”: a hypothesis is first formed and then validated against the data.
Data mining is “discovery-driven”: patterns and hypotheses are automatically extracted from the data.
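The assumption-driven side of this contrast can be sketched as follows: the hypothesis ("customers who received a campaign spend more") is stated before looking at the data and is then checked with a one-sided permutation test. The permutation test is one choice of test swapped in for illustration, and the spending figures are hypothetical.

```python
import random
from statistics import fmean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test of the stated hypothesis mean(group_a) > mean(group_b).

    Returns the fraction of random relabelings whose mean difference is at
    least as large as the observed one, an estimate of the p-value.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical spending of customers with and without the campaign.
campaign = [52, 58, 61, 55, 60, 57]
control = [45, 48, 50, 47, 49, 46]
print(f"estimated p-value: {permutation_test(campaign, control):.4f}")
```

A discovery-driven approach would instead scan the data for whatever patterns it contains (as in the association-rule enumeration idea) without a hypothesis fixed in advance.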
Data Mining – Practical Usage
Direct Marketing; Fraud Control; Credit Analysis; Outlier Analysis.
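As one illustration of outlier analysis in a fraud-control setting, the sketch below flags transaction amounts outside Tukey's fences (1.5 times the interquartile range beyond the quartiles). The amounts and the multiplier k are hypothetical; real fraud systems combine many more signals.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical card transaction amounts for one customer.
amounts = [23.5, 41.0, 18.2, 35.9, 27.4, 30.1, 22.8, 950.0, 33.3, 25.6]
print(iqr_outliers(amounts))
```

Here only the 950.0 transaction falls outside the fences and would be passed on for review.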
Effective Implementation of Data Mining
A. Development of a Data Warehouse
A data warehouse (DW) functions in three layers: staging, integration and access. These layers exist in the DW to meet users' reporting needs.
Staging layer: stores raw data for use by developers (analysis and support).
Integration layer: integrates the data and provides a level of abstraction from users.
Access layer: gets data out to users.
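The three layers can be sketched with Python's built-in sqlite3 module as a stand-in for a real warehouse: a staging table holds raw text as loaded, an integration table holds cleaned and typed data, and an access-layer view serves a report. The table and view names (staging_sales, integrated_sales, sales_by_region) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging layer: raw records loaded as-is (inconsistent case, stray spaces, blanks).
conn.execute("CREATE TABLE staging_sales (raw_region TEXT, raw_amount TEXT)")
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?)",
    [(" north", "120.50"), ("North", "80"), ("south ", "200"), ("South", ""), ("NORTH ", "49.5")],
)

# Integration layer: normalize region names, type the amounts, drop unusable rows.
conn.execute("""
    CREATE TABLE integrated_sales AS
    SELECT LOWER(TRIM(raw_region)) AS region,
           CAST(raw_amount AS REAL) AS amount
    FROM staging_sales
    WHERE TRIM(raw_amount) <> ''
""")

# Access layer: a reporting view that hides the underlying tables from users.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total, COUNT(*) AS n_sales
    FROM integrated_sales
    GROUP BY region
""")

rows = conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()
print(rows)
```

Users query only the view, so the cleaning rules in the integration layer can change without affecting their reports.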
Contd.
B. Ease and Simplicity of Data Mining Tools
Tools should produce automated, real-time detection of patterns or anomalies.
[Figure: Data Warehouse, Knowledge Discovery in Databases, and Decision Support Systems]
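Real-time anomaly detection can be sketched with a detector that updates a running mean and variance (Welford's algorithm) as each value arrives, flagging values far from the mean without storing the stream. The threshold of 3 standard deviations, the warm-up length, and the sample stream are hypothetical choices.

```python
import math

class StreamingAnomalyDetector:
    """Flag values far from the running mean, updating statistics online."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold      # hypothetical: distance in standard deviations
        self.warmup = max(warmup, 2)    # need at least 2 values for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations from the mean

    def update(self, x):
        """Return True if x is anomalous relative to earlier values, then absorb x."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_anomaly = True
        # Welford's update: O(1) time and memory per value.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = StreamingAnomalyDetector()
stream = [10, 11, 9, 10, 12, 10, 11, 48, 10, 9]
flags = [detector.update(x) for x in stream]
print(flags)
```

In this sketch flagged values are still absorbed into the statistics; a production design might exclude them so one spike does not widen the band for later values.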
Contd.
C. Knowledge of Data Analysis
Database specialists and computer scientists can contribute the most in this area.
Three Chief Functions of Search Engines
1. Gather a set of web pages that form the universe from which users can retrieve information.
2. Represent the pages in this universe in a way that attempts to capture their content.
3. Allow searchers to issue queries, using information retrieval algorithms that attempt to find the most relevant pages in the universe.
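The three functions can be sketched as a toy engine: a fixed dictionary of pages stands in for gathering, an inverted index of term frequencies is the representation, and queries are ranked by summed TF-IDF scores. TF-IDF is one common retrieval scheme chosen for illustration; the URLs and page texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TinySearchEngine:
    def __init__(self, pages):
        # 1. Gather: the universe is a fixed dict of url -> text.
        self.pages = pages
        # 2. Represent: inverted index, term -> {url: term frequency}.
        self.index = defaultdict(dict)
        for url, text in pages.items():
            for term, tf in Counter(tokenize(text)).items():
                self.index[term][url] = tf

    def idf(self, term):
        """Inverse document frequency: rarer terms get higher weight."""
        df = len(self.index.get(term, {}))
        return math.log(len(self.pages) / df) if df else 0.0

    def search(self, query, k=3):
        # 3. Query: score pages by summed TF-IDF of query terms, best first.
        scores = Counter()
        for term in tokenize(query):
            weight = self.idf(term)
            for url, tf in self.index.get(term, {}).items():
                scores[url] += tf * weight
        return [url for url, score in scores.most_common(k) if score > 0]

pages = {
    "https://example.com/mining": "data mining finds patterns in large data sets",
    "https://example.com/search": "search engines rank web pages for a query",
    "https://example.com/warehouse": "a data warehouse stores integrated data for reporting",
}
engine = TinySearchEngine(pages)
print(engine.search("data mining patterns"))
```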
Data Mining and Web Search Engines
A customer service database stores two types of service information: unstructured customer service reports, and structured data on sales, employees, and customers.
Most search engines have advanced search capabilities that allow the user to specify additional search parameters to obtain more refined results.
The DBMS acts as the access point that brings search engines into a data warehouse environment.
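An "advanced search" over both kinds of service information can be sketched as keyword matching on the unstructured report text, narrowed by optional structured parameters. A plain Python list stands in for the DBMS here; the record fields, customer names, and filters are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ServiceRecord:
    customer: str   # structured field
    region: str     # structured field
    opened: date    # structured field: date the report was filed
    report: str     # unstructured free-text service report

def advanced_search(records, keywords, region=None, since=None):
    """Return records whose report contains all keywords (case-insensitive),
    optionally narrowed by structured parameters: region and earliest date."""
    wanted = [k.lower() for k in keywords]
    results = []
    for r in records:
        if region is not None and r.region != region:
            continue
        if since is not None and r.opened < since:
            continue
        text = r.report.lower()
        if all(k in text for k in wanted):
            results.append(r)
    return results

records = [
    ServiceRecord("Acme", "north", date(2010, 3, 1), "Printer jams after firmware update"),
    ServiceRecord("Birch", "south", date(2010, 5, 9), "Firmware update fixed the jam"),
    ServiceRecord("Cole", "north", date(2010, 7, 2), "Billing question, no hardware issue"),
]
print([r.customer for r in advanced_search(records, ["firmware"], region="north")])
```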
Differences between Web Search and Data Mining
Web searches usually start with some sort of query in a search engine, whereas data mining does its searching based on the data itself, the data mining tools, and a specified output format.
Role of Social Scientists
Social scientists contribute to: research; the development of rules for flagging anomalous behavior; identifying and understanding elements in the data sets; and developing guidelines and methods to ascertain which data mining techniques are most effective in a particular case.
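Rules for flagging anomalous behavior can be kept as named, reviewable data rather than buried in code, which makes it easier for domain experts to propose and revise them. In this sketch the rule names, thresholds, and event fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the event looks anomalous

# Hypothetical rules and thresholds of the kind a domain expert might propose.
RULES = [
    Rule("large_amount", lambda e: e["amount"] > 5_000),
    Rule("night_activity", lambda e: e["hour"] < 5),
    Rule("new_account_large_amount", lambda e: e["account_age_days"] < 30 and e["amount"] > 1_000),
]

def flag(event, rules=RULES):
    """Return the names of all rules the event triggers, in rule order."""
    return [r.name for r in rules if r.check(event)]

events = [
    {"id": 1, "amount": 120, "hour": 14, "account_age_days": 400},
    {"id": 2, "amount": 2_500, "hour": 3, "account_age_days": 10},
]
for e in events:
    print(e["id"], flag(e))
```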
Assael’s Consumer Information Acquisition and Processing Model
Conceptual Model of Information and Source Utilization
Model of Information Needs
Consumer-Oriented Information Search Model
Examples from The Economist
According to The Economist, there is a big market for such software: “By one estimate there are more than 100 programs for network analysis, also known as link analysis or predictive analysis. The raw data used may extend far beyond phone records to encompass information available from private and governmental entities, and internet sources such as Facebook. IBM, the supplier of the system used by Bharti Airtel, says its annual sales of such software, now growing at double-digit rates, will exceed $15 billion by 2015. In the past five years IBM has spent more than $11 billion buying makers of network-analysis software. Gartner, a market-research firm, ranks the technology at number two in its list of strategic business operations meriting significant investment this year.”
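One of the simplest link-analysis measures on phone records is degree centrality: how many distinct people each person is connected to, normalized by the number of other people. This sketch only illustrates the idea; it is not a description of the IBM system mentioned in the quote, and the call records are hypothetical.

```python
from collections import defaultdict

def degree_centrality(calls):
    """Normalized degree centrality from (caller, callee) records.

    Calls are treated as undirected links, repeated calls between the same pair
    count once, and self-calls are ignored. Score = distinct contacts / (people - 1).
    """
    contacts = defaultdict(set)
    for a, b in calls:
        if a == b:
            continue
        contacts[a].add(b)
        contacts[b].add(a)
    if not contacts:
        return {}
    n = len(contacts)  # at least 2 once any link exists
    return {person: len(c) / (n - 1) for person, c in contacts.items()}

calls = [("ann", "bob"), ("ann", "cai"), ("ann", "dev"), ("bob", "cai"), ("eve", "ann"), ("bob", "ann")]
scores = degree_centrality(calls)
for person, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(person, round(score, 2))
```

Here "ann" is linked to all four other people and scores 1.0, marking her as the hub of this small network.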
For Example
The article also touches on more sophisticated systems that integrate additional information, including V.S. Subrahmanian’s work on STOP, the SOMA Terror Organization Portal. (SOMA, short for Stochastic Opponent Modeling Agents, is a formal, logical-statistical reasoning framework that uses data about the past behavior of terror groups to learn rules about the probability of an organization, community, or person taking certain actions in different situations.) As the article puts it: “Called SOMA Terror Organization Portal, it analyses a wide range of information about politics, business and society in Lebanon to predict, with surprising accuracy, rocket attacks by the country’s Hizbullah militia on Israel. Attacks tend to increase, for example, as more money from Islamic charities flows into Lebanon. Attacks decrease during election years, particularly as more Hizbullah members run for office and campaign energetically. By the middle of 2010 SOMA was sucking up data from more than 200 sources, many of them newspaper websites. The number of sources will have more than doubled by the end of the year.”
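The idea of learning "rules about the probability of ... taking certain actions in different situations" can be illustrated, in a much simplified form, by counting how often an action occurred when each condition held and comparing that with the overall base rate. This naive frequency count is not SOMA's logical-statistical method; the condition labels and observations are hypothetical toy data.

```python
from collections import Counter

def learn_condition_rules(observations, min_count=2):
    """Estimate P(action | condition) for each condition by simple counting.

    observations: list of (set_of_conditions, action_taken) pairs.
    Returns {condition: (probability, base_rate, times_condition_seen)}
    for conditions seen at least min_count times.
    """
    if not observations:
        return {}
    base_rate = sum(acted for _, acted in observations) / len(observations)
    seen = Counter()
    acted_when_seen = Counter()
    for conditions, acted in observations:
        for c in conditions:
            seen[c] += 1
            if acted:
                acted_when_seen[c] += 1
    return {
        c: (acted_when_seen[c] / seen[c], base_rate, seen[c])
        for c in seen
        if seen[c] >= min_count
    }

# Hypothetical toy observations: which conditions held, and whether the action occurred.
observations = [
    ({"funding_up"}, True),
    ({"funding_up", "election_year"}, False),
    ({"funding_up"}, True),
    ({"election_year"}, False),
    (set(), False),
    ({"funding_up"}, True),
]
for condition, (p, base, n) in learn_condition_rules(observations).items():
    print(f"P(action | {condition}) = {p:.2f} vs base rate {base:.2f} (n={n})")
```

A condition whose estimated probability sits well above or below the base rate becomes a candidate rule; with so few observations, such estimates would need far more data before being trusted.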
References
www.emeraldinsight.com/0264-0473.htm
www.economist.com/node/16910031
Journal of Financial Crime, Vol. 12, No. 1
Mohd. Ali Khan, Murtaza Marvi, Musa Bin Hamid, Syed Mohsin Hussain
Thank You