Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang.

Slides:



Advertisements
Similar presentations
Social Media Marketing Research 社會媒體行銷研究 SMMR08 TMIXM1A Thu 7,8 (14:10-16:00) L511 Exploratory Factor Analysis Min-Yuh Day 戴敏育 Assistant Professor.
Advertisements

Analysis and Modeling of Social Networks Foudalis Ilias.
Web Search and Mining Course Overview 1 Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 0: Course Overview.
Web Mining Research: A Survey Authors: Raymond Kosala & Hendrik Blockeel Presenter: Ryan Patterson April 23rd 2014 CS332 Data Mining pg 01.
Data Mining 資料探勘 文字探勘與網頁探勘 (Text and Web Mining) Min-Yuh Day 戴敏育
WebMiningResearch ASurvey Web Mining Research: A Survey By Raymond Kosala & Hendrik Blockeel, Katholieke Universitat Leuven, July 2000 Presented 4/18/2002.
Web Mining Research: A Survey
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
Link Structure and Web Mining Shuying Wang
1 PageSim: A Link-based Similarity Measure for the World Wide Web Zhenjiang Lin, Irwin King, and Michael, R., Lyu Computer Science & Engineering, The Chinese.
商業智慧實務 Practices of Business Intelligence BI04 MI4 Wed, 9,10 (16:10-18:00) (B113) 資料倉儲 (Data Warehousing) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授.
Overview of Web Data Mining and Applications Part I
Case Study for Information Management 資訊管理個案
Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang.
Introduction to Data Mining Engineering Group in ACL.
Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.
Special Topics in Social Media Services 社會媒體服務專題 1 992SMS02 TMIXJ1A Sat. 6,7,8 (13:10-16:00) D502 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information.
Social Media Marketing Research 社會媒體行銷研究 SMMR06 TMIXM1A Thu 7,8 (14:10-16:00) L511 Measuring the Construct Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授.
Web Mining ( 網路探勘 ) WM12 TLMXM1A Wed 8,9 (15:10-17:00) U705 Web Usage Mining ( 網路使用挖掘 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information.
Intelligent Systems Lecture 23 Introduction to Intelligent Data Analysis (IDA). Example of system for Data Analyzing based on neural networks.
Social Media Management 社會媒體管理 SMM06 TMIXM1A Fri. 7,8 (14:10-16:00) L215 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management,
Case Study for Information Management 資訊管理個案 CSIM4B07 TLMXB4B (M1824) Tue 2, 3, 4 (9:10-12:00) B502 Telecommunications, the Internet, and Wireless.
Research paper: Web Mining Research: A survey SIGKDD Explorations, June Volume 2, Issue 1 Author: R. Kosala and H. Blockeel.
Page 1 WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan.
Special Topics in Social Media Services 社會媒體服務專題 1 992SMS07 TMIXJ1A Sat. 6,7,8 (13:10-16:00) D502 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information.
Social Media Marketing Analytics 社群網路行銷分析
Automated Social Hierarchy Detection through Network Analysis (SNAKDD07) Ryan Rowe, Germ´an Creamer, Shlomo Hershkop, Salvatore J Stolfo 1 Advisor:
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
商業智慧實務 Practices of Business Intelligence BI07 MI4 Wed, 9,10 (16:10-18:00) (B113) 文字探勘與網路探勘 (Text and Web Mining) Min-Yuh Day 戴敏育 Assistant Professor.
Social Media Marketing Management 社會媒體行銷管理 SMMM12 TLMXJ1A Tue 12,13,14 (19:20-22:10) D325 確認性因素分析 (Confirmatory Factor Analysis) Min-Yuh Day 戴敏育.
Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang.
南台科技大學 資訊工程系 A web page usage prediction scheme using sequence indexing and clustering techniques Adviser: Yu-Chiang Li Speaker: Gung-Shian Lin Date:2010/10/15.
Social Media Marketing Research 社會媒體行銷研究 SMMR02 TMIXM1A Thu 7,8 (14:10-16:00) U505 Social Media: Facebook, Youtube, Blog, Microblog Min-Yuh Day 戴敏育.
Social Media Management 社會媒體管理 SMM02 TMIXM1A Fri. 7,8 (14:10-16:00) L215 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management,
Social Media Marketing Research 社會媒體行銷研究 SMMR09 TMIXM1A Thu 7,8 (14:10-16:00) L511 Confirmatory Factor Analysis Min-Yuh Day 戴敏育 Assistant Professor.
Data Mining 資料探勘 DM04 MI4 Thu 9, 10 (16:10-18:00) B216 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management,
Web Mining ( 網路探勘 ) WM04 TLMXM1A Wed 8,9 (15:10-17:00) U705 Unsupervised Learning ( 非監督式學習 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of.
Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang.
Mining Social Networks for Personalized Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, II-Chul Moon [KDD ’09] 1 Advisor: Dr. Koh Jia-Ling Reporter:
Data Mining 資料探勘 DM06 MI4 Thu. 9,10 (16:10-18:00) B513 社會網路分析、意見分析 (Social Network Analysis, Opinion Mining) Min-Yuh Day 戴敏育 Assistant Professor.
CS315-Web Search & Data Mining. A Semester in 50 minutes or less The Web History Key technologies and developments Its future Information Retrieval (IR)
How to Analyse Social Network?
Case Study for Information Management 資訊管理個案 CSIM4B03 TLMXB4B (M1824) Tue 3,4 (10:10-12:00) L212 Thu 9 (16:10-17:00) B601 Global E-Business and Collaboration:
Data Mining 資料探勘 DM04 MI4 Wed, 6,7 (13:10-15:00) (B216) 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management,
Web Mining ( 網路探勘 ) WM07 TLMXM1A Wed 8,9 (15:10-17:00) U705 Social Network Analysis ( 社會網路分析 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of.
Web Mining ( 網路探勘 ) WM06 TLMXM1A Wed 8,9 (15:10-17:00) U705 Information Retrieval and Web Search ( 資訊檢索與網路搜尋 ) Min-Yuh Day 戴敏育 Assistant Professor.
商業智慧 Business Intelligence BI08 IM EMBA Fri 12,13,14 (19:20-22:10) D502 社會網路分析 (Social Network Analysis) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授.
Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang.
Social Media Marketing Research 社會媒體行銷研究 SMMR04 TMIXM1A Thu 7,8 (14:10-16:00) L511 Marketing Research Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授.
Dept. of Information Management, Tamkang University
Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang.
Mining information from social media
Case Study for Information Management 資訊管理個案
商業智慧實務 Practices of Business Intelligence BI09 MI4 Wed, 9,10 (16:10-18:00) (B130) 社會網路分析 (Social Network Analysis) Min-Yuh Day 戴敏育 Assistant Professor.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
How to Analyse Social Network? Social networks can be represented by complex networks.
Informatics tools in network science
商業智慧實務 Practices of Business Intelligence BI09 MI4 Wed, 9,10 (16:10-18:00) (B113) 社會網路分析 (Social Network Analysis) Min-Yuh Day 戴敏育 Assistant Professor.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Mining Advisor-Advisee Relationships from Research Publication.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
Big Data Mining 巨量資料探勘 DM05 MI4 (M2244) (3094) Tue, 3, 4 (10:10-12:00) (B216) 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授.
Case Study for Information Management 資訊管理個案
分群分析 (Cluster Analysis)
SOCIAL NETWORKS Amit Sharma INF -38FQ School of Information
CS7280: Special Topics in Data Mining Information/Social Networks
Data Mining 資料探勘 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育
Jiawei Han Department of Computer Science
SOCIAL NETWORKS Amit Sharma INF -38FQ School of Information
Case Study for Information Management 資訊管理個案
Presentation transcript:

Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang University 淡江大學 淡江大學 資訊管理學系 資訊管理學系 DW09 MI4 Tue. 6,7 (13:10-15:00) B427 Social Network Analysis and Link Mining

Syllabus 週次 日期 內容( Subject/Topics ) 1 100/09/06 Introduction to Data Warehousing 2 100/09/13 Data Warehousing, Data Mining, and Business Intelligence 3 100/09/20 Data Preprocessing: Integration and the ETL process 4 100/09/27 Data Warehouse and OLAP Technology 5 100/10/04 Data Warehouse and OLAP Technology 6 100/10/11 Data Cube Computation and Data Generation 7 100/10/18 Data Cube Computation and Data Generation 8 100/10/25 Project Proposal 9 100/11/01 期中考試週 2

Syllabus 週次 日期 內容( Subject/Topics ) /11/08 Association Analysis /11/15 Association Analysis /11/22 Classification and Prediction /11/29 Classification and Prediction /12/06 Cluster Analysis /12/13 Social Network Analysis and Link Mining /12/20 Text Mining and Web Mining /12/27 Project Presentation /01/03 期末考試週 3

Outline Social Network Analysis Link Mining 4

Social Network Analysis A social network is a social structure of people, related (directly or indirectly) to each other through a common relation or interest Social network analysis (SNA) is the study of social networks to understand their structure and behavior 5 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Social Network Analysis Using Social Network Analysis, you can get answers to questions like: – How highly connected is an entity within a network? – What is an entity's overall importance in a network? – How central is an entity within a network? – How does information flow within a network? 6 Source:

Social Network Analysis: Degree Centrality 7 Source: Alice has the highest degree centrality, which means that she is quite active in the network. However, she is not necessarily the most powerful person because she is only directly connected within one degree to people in her clique—she has to go through Rafael to get to other cliques.

Social Network Analysis: Degree Centrality Degree centrality is simply the number of direct relationships that an entity has. An entity with high degree centrality: – Is generally an active player in the network. – Is often a connector or hub in the network. – s not necessarily the most connected entity in the network (an entity may have a large number of relationships, the majority of which point to low-level entities). – May be in an advantaged position in the network. – May have alternative avenues to satisfy organizational needs, and consequently may be less dependent on other individuals. – Can often be identified as third parties or deal makers. 8 Source:

Social Network Analysis: Betweenness Centrality 9 Source: Rafael has the highest betweenness because he is between Alice and Aldo, who are between other entities. Alice and Aldo have a slightly lower betweenness because they are essentially only between their own cliques. Therefore, although Alice has a higher degree centrality, Rafael has more importance in the network in certain respects.

Social Network Analysis: Betweenness Centrality Betweenness centrality identifies an entity's position within a network in terms of its ability to make connections to other pairs or groups in a network. An entity with a high betweenness centrality generally: – Holds a favored or powerful position in the network. – Represents a single point of failure—take the single betweenness spanner out of a network and you sever ties between cliques. – Has a greater amount of influence over what happens in a network. 10 Source:

Social Network Analysis: Closeness Centrality 11 Source: Rafael has the highest closeness centrality because he can reach more entities through shorter paths. As such, Rafael's placement allows him to connect to entities in his own clique, and to entities that span cliques.

Social Network Analysis: Closeness Centrality Closeness centrality measures how quickly an entity can access more entities in a network. An entity with a high closeness centrality generally: – Has quick access to other entities in a network. – Has a short path to other entities. – Is close to other entities. – Has high visibility as to what is happening in the network. 12 Source:

Social Network Analysis: Eigenvalue 13 Source: Alice and Rafael are closer to other highly close entities in the network. Bob and Frederica are also highly close, but to a lesser value.

Social Network Analysis: Eigenvalue Eigenvalue measures how close an entity is to other highly close entities within a network. In other words, Eigenvalue identifies the most central entities in terms of the global or overall makeup of the network. A high Eigenvalue generally: – Indicates an actor that is more central to the main pattern of distances among all entities. – Is a reasonable measure of one aspect of centrality in terms of positional advantage. 14 Source:

Social Network Analysis: Hub and Authority 15 Source: Hubs are entities that point to a relatively large number of authorities. They are essentially the mutually reinforcing analogues to authorities. Authorities point to high hubs. Hubs point to high authorities. You cannot have one without the other.

Social Network Analysis: Hub and Authority Entities that many other entities point to are called Authorities. In Sentinel Visualizer, relationships are directional—they point from one entity to another. If an entity has a high number of relationships pointing to it, it has a high authority value, and generally: – Is a knowledge or organizational authority within a domain. – Acts as definitive source of information. 16 Source:

Social Network Analysis 17 Source:

Social Network Analysis 18 Source:

Social Network Analysis 19 Source:

Link Mining 20

Link Mining (Getoor & Diehl, 2005) Link Mining – Data Mining techniques that take into account the links between objects and entities while building predictive or descriptive models. Link based object ranking, Group Detection, Entity Resolution, Link Prediction Application: – Hyperlink Mining – Relational Learning – Inductive Logic Programming – Graph Mining 21 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Characteristics of Collaboration Networks (Newman, 2001; 2003; 3004) Degree distribution follows a power-law Average separation decreases in time. Clustering coefficient decays with time Relative size of the largest cluster increases Average degree increases Node selection is governed by preferential attachment 22 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Social Network Techniques Social network extraction/construction Link prediction Approximating large social networks Identifying prominent/trusted/expert actors in social networks Search in social networks Discovering communities in social network Knowledge discovery from social network 23 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Social Network Extraction Mining a social network from data sources Three sources of social network (Hope et al., 2006) – Content available on web pages E.g., user homepages, message threads – User interaction logs E.g., and messenger chat logs – Social interaction information provided by users E.g., social network service websites (Facebook) 24 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Social Network Extraction IR based extraction from web documents – Construct an “actor-by-term” matrix – The terms associated with an actor come from web pages/documents created by or associated with that actor – IR techniques (TF-IDF, LSI, cosine matching, intuitive heuristic measures) are used to quantify similarity between two actors’ term vectors – The similarity scores are the edge label in the network Thresholds on the similarity measure can be used in order to work with binary or categorical edge labels Include edges between an actor and its k-nearest neighbors Co-occurrence based extraction from web documents 25 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Link Prediction Link Prediction using supervised learning (Hasan et al., 2006) – Citation Network (BIOBASE, DBLP) – Use machine learning algorithms to predict future co- authorship Decision three, k-NN, multilayer perceptron, SVM, RBF network – Identify a group of features that are most helpful in prediction – Best Predictor Features Keywork Match count, Sum of neighbors, Sum of Papers, Shortest distance 26 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Identifying Prominent Actors in a Social Network Compute scores/ranking over the set (or a subset) of actors in the social network which indicate degree of importance / expertise / influence – E.g., Pagerank, HITS, centrality measures Various algorithms from the link analysis domain – PageRank and its many variants – HITS algorithm for determining authoritative sources Centrality measures exist in the social science domain for measuring importance of actors in a social network 27 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Identifying Prominent Actors in a Social Network Brandes, 2011 Prominence  high betweenness value Betweenness centrality requires computation of number of shortest paths passing through each node Compute shortest paths between all pairs of vertices 28 Source: (c) Jaideep Srivastava, Data Mining for Social Network Analysis

Summary Social Network Analysis Link Mining 29

References Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, 2006, Elsevier Efraim Turban, Ramesh Sharda, Dursun Delen, Decision Support and Business Intelligence Systems, Ninth Edition, 2011, Pearson. Michael W. Berry and Jacob Kogan, Text Mining: Applications and Theory, 2010, Wiley Guandong Xu, Yanchun Zhang, Lin Li, Web Mining and Social Networking: Techniques and Applications, 2011, Springer Matthew A. Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, 2011, O'Reilly Media Bing Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2009, Springer Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, 2008, Addison Wesley, Jaideep Srivastava, Nishith Pathak, Sandeep Mane, and Muhammad A. Ahmad, Data Mining for Social Network Analysis, Tutorial at IEEE ICDM 2006, Hong Kong, 2006 Sentinel Visualizer, Text Mining, 30