Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Social Computing and Big Data Analytics ( 社群運算大數據分析 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University.

Similar presentations


Presentation on theme: "1 Social Computing and Big Data Analytics ( 社群運算大數據分析 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University."— Presentation transcript:

1 1 Social Computing and Big Data Analytics ( 社群運算大數據分析 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang University 淡江大學 淡江大學 資訊管理學系 資訊管理學系 http://mail. tku.edu.tw/myday/ 2016-04-18 Tamkang University Time: 2016/04/18 (14:00-16:00) Place: China Airlines

2 戴敏育 博士 (Min-Yuh Day, Ph.D.) 淡江大學資管系專任助理教授 中央研究院資訊科學研究所訪問學人 國立台灣大學資訊管理博士 Publications Co-Chairs, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013- ) Program Co-Chair, IEEE International Workshop on Empirical Methods for Recognizing Inference in TExt (IEEE EM-RITE 2012- ) Workshop Chair, The IEEE International Conference on Information Reuse and Integration (IEEE IRI) 2

3 Outline 3 Social Computing and Big Data Analytics Analyzing the Social Web: Social Network Analysis

4 Business Insights with Social Analytics 4

5 Social Computing Social Network Analysis Link mining Community Detection Social Recommendation 5

6 Internet Evolution Internet of People (IoP): Social Media Internet of Things (IoT): Machine to Machine 6 Source: Marc Jadoul (2015), The IoT: The next step in internet evolution, March 11, 2015 http://www2.alcatel-lucent.com/techzine/iot-internet-of-things-next-step-evolution/ http://www2.alcatel-lucent.com/techzine/iot-internet-of-things-next-step-evolution/

7 Big Data Analytics and Data Mining 7

8 Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications 8 Source: http://www.amazon.com/gp/product/1466568704http://www.amazon.com/gp/product/1466568704

9 Architecture of Big Data Analytics 9 Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications Data Mining OLAP Reports Queries Hadoop MapReduce Pig Hive Jaql Zookeeper Hbase Cassandra Oozie Avro Mahout Others Middleware Extract Transform Load Data Warehouse Traditional Format CSV, Tables * Internal * External * Multiple formats * Multiple locations * Multiple applications Big Data Sources Big Data Transformation Big Data Platforms & Tools Big Data Analytics Applications Big Data Analytics Transformed Data Raw Data

10 Architecture of Big Data Analytics 10 Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications Data Mining OLAP Reports Queries Hadoop MapReduce Pig Hive Jaql Zookeeper Hbase Cassandra Oozie Avro Mahout Others Middleware Extract Transform Load Data Warehouse Traditional Format CSV, Tables * Internal * External * Multiple formats * Multiple locations * Multiple applications Big Data Sources Big Data Transformation Big Data Platforms & Tools Big Data Analytics Applications Big Data Analytics Transformed Data Raw Data Data Mining Big Data Analytics Applications

11 Social Big Data Mining (Hiroshi Ishikawa, 2015) 11 Source: http://www.amazon.com/Social-Data-Mining-Hiroshi-Ishikawa/dp/149871093Xhttp://www.amazon.com/Social-Data-Mining-Hiroshi-Ishikawa/dp/149871093X

12 Architecture for Social Big Data Mining (Hiroshi Ishikawa, 2015) 12 Hardware Software Social Data Physical Layer Logical Layer Integrated analysis Multivariate analysis Application specific task Data Mining Conceptual Layer Enabling TechnologiesAnalysts Model Construction Explanation by Model Construction and confirmation of individual hypothesis Description and execution of application-specific task Integrated analysis model Natural Language Processing Information Extraction Anomaly Detection Discovery of relationships among heterogeneous data Large-scale visualization Parallel distrusted processing Source: Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press

13 Business Intelligence (BI) Infrastructure 13 Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson.

14 Analyzing the Social Web: Social Network Analysis 14

15 15 Source: http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311 Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann

16 Social Network Analysis (SNA) Facebook TouchGraph 16

17 Social Network Analysis 17 Source: http://www.fmsasg.com/SocialNetworkAnalysis/http://www.fmsasg.com/SocialNetworkAnalysis/

18 Social Network Analysis A social network is a social structure of people, related (directly or indirectly) to each other through a common relation or interest Social network analysis (SNA) is the study of social networks to understand their structure and behavior 18 Source: (c) Jaideep Srivastava, srivasta@cs.umn.edu, Data Mining for Social Network Analysis

19 Centrality Prestige 19 Social Network Analysis (SNA)

20 Degree 20 Source: https://www.youtube.com/watch?v=89mxOdwPfxA A A B B D D E E C C

21 Degree 21 Source: https://www.youtube.com/watch?v=89mxOdwPfxA A A B B D D E E C C A: 2 B: 4 C: 2 D:1 E: 1

22 Density 22 Source: https://www.youtube.com/watch?v=89mxOdwPfxA A A B B D D E E C C

23 Density 23 Source: https://www.youtube.com/watch?v=89mxOdwPfxA A A B B D D E E C C Edges (Links): 5 Total Possible Edges: 10 Density: 5/10 = 0.5

24 Density 24 Nodes (n): 10 Edges (Links): 13 Total Possible Edges: (n * (n-1)) / 2 = (10 * 9) / 2 = 45 Density: 13/45 = 0.29 A A B B D D C C E E F F G G H H I I J J

25 Which Node is Most Important? 25 A A B B D D C C E E F F G G H H I I J J

26 Centrality Important or prominent actors are those that are linked or involved with other actors extensively. A person with extensive contacts (links) or communications with many other people in the organization is considered more important than a person with relatively fewer contacts. The links can also be called ties. A central actor is one involved in many ties. 26 Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data”

27 Social Network Analysis (SNA) Degree Centrality Betweenness Centrality Closeness Centrality 27

28 Degree Centrality 28

29 29 Social Network Analysis: Degree Centrality A A B B D D C C E E F F G G H H I I J J

30 30 Social Network Analysis: Degree Centrality A A B B D D C C E E F F G G H H I I J J NodeScore Standardized Score A2 2/10 = 0.2 B2 C5 5/10 = 0.5 D3 3/10 = 0.3 E3 F2 2/10 = 0.2 G4 4/10 = 0.4 H3 3/10 = 0.3 I1 1/10 = 0.1 J1

31 Betweenness Centrality 31

32 Betweenness centrality: Connectivity Number of shortest paths going through the actor 32

33 Betweenness Centrality 33 Where g jk = the number of shortest paths connecting jk g jk (i) = the number that actor i is on. Normalized Betweenness Centrality Number of pairs of vertices excluding the vertex itself Source: https://www.youtube.com/watch?v=RXohUeNCJiU

34 Betweenness Centrality 34 A A B B D D E E C C A: B  C: 0/1 = 0 B  D: 0/1 = 0 B  E: 0/1 = 0 C  D: 0/1 = 0 C  E: 0/1 = 0 D  E: 0/1 = 0 Total: 0 A: Betweenness Centrality = 0

35 Betweenness Centrality 35 A A B B D D E E C C B: A  C: 0/1 = 0 A  D: 1/1 = 1 A  E: 1/1 = 1 C  D: 1/1 = 1 C  E: 1/1 = 1 D  E: 1/1 = 1 Total: 5 B: Betweenness Centrality = 5

36 Betweenness Centrality 36 A A B B D D E E C C C: A  B: 0/1 = 0 A  D: 0/1 = 0 A  E: 0/1 = 0 B  D: 0/1 = 0 B  E: 0/1 = 0 D  E: 0/1 = 0 Total: 0 C: Betweenness Centrality = 0

37 Betweenness Centrality 37 A A B B D D E E C C A: 0 B: 5 C: 0 D: 0 E: 0

38 38 A A G G D D B B J J C C H H I I E E F F A A D D B B J J C C H H E E F F Which Node is Most Important?

39 39 A A G G D D B B J J C C H H I I E E F F A A G G D D J J H H I I E E F F Which Node is Most Important?

40 40 A A D D B B C C E E Betweenness Centrality

41 41 A A B B D D E E C C A: B  C: 0/1 = 0 B  D: 0/1 = 0 B  E: 0/1 = 0 C  D: 0/1 = 0 C  E: 0/1 = 0 D  E: 0/1 = 0 Total: 0 A: Betweenness Centrality = 0

42 Closeness Centrality 42

43 43 Social Network Analysis: Closeness Centrality A A B B D D C C E E F F G G H H I I J J C  A: 1 C  B: 1 C  D: 1 C  E: 1 C  F: 2 C  G: 1 C  H: 2 C  I: 3 C  J: 3 Total=15 C: Closeness Centrality = 15/9 = 1.67

44 44 Social Network Analysis: Closeness Centrality A A B B D D C C E E F F G G H H I I J J G  A: 2 G  B: 2 G  C: 1 G  D: 2 G  E: 1 G  F: 1 G  H: 1 G  I: 2 G  J: 2 Total=14 G: Closeness Centrality = 14/9 = 1.56

45 45 Social Network Analysis: Closeness Centrality A A B B D D C C E E F F G G H H I I J J H  A: 3 H  B: 3 H  C: 2 H  D: 2 H  E: 2 H  F: 2 H  G: 1 H  I: 1 H  J: 1 Total=17 H: Closeness Centrality = 17/9 = 1.89

46 46 Social Network Analysis: Closeness Centrality A A B B D D C C E E F F G G H H I I J J H: Closeness Centrality = 17/9 = 1.89 C: Closeness Centrality = 15/9 = 1.67 G: Closeness Centrality = 14/9 = 1.56 1 1 2 2 3 3

47 Social Network Analysis (SNA) Tools NetworkX igraph Gephi UCINet Pajek 47

48 Gephi The Open Graph Viz Platform 48 Source: https://gephi.org/

49 Application of SNA Social Network Analysis of Research Collaboration in Information Reuse and Integration 49 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

50 Example of SNA Data Source 50 Source: http://www.informatik.uni-trier.de/~ley/db/conf/iri/iri2010.htmlhttp://www.informatik.uni-trier.de/~ley/db/conf/iri/iri2010.html

51 Research Question RQ1: What are the scientific collaboration patterns in the IRI research community? RQ2: Who are the prominent researchers in the IRI community? 51 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

52 Methodology Developed a simple web focused crawler program to download literature information about all IRI papers published between 2003 and 2010 from IEEE Xplore and DBLP. – 767 paper – 1599 distinct author Developed a program to convert the list of coauthors into the format of a network file which can be readable by social network analysis software. UCINet and Pajek were used in this study for the social network analysis. 52 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

53 Top10 prolific authors (IRI 2003-2010) 1.Stuart Harvey Rubin 2.Taghi M. Khoshgoftaar 3.Shu-Ching Chen 4.Mei-Ling Shyu 5.Mohamed E. Fayad 6.Reda Alhajj 7.Du Zhang 8.Wen-Lian Hsu 9.Jason Van Hulse 10.Min-Yuh Day 53 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

54 Data Analysis and Discussion Closeness Centrality – Collaborated widely Betweenness Centrality – Collaborated diversely Degree Centrality – Collaborated frequently Visualization of Social Network Analysis – Insight into the structural characteristics of research collaboration networks 54 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

55 Top 20 authors with the highest closeness scores RankIDClosenessAuthor 130.024675Shu-Ching Chen 210.022830Stuart Harvey Rubin 340.022207Mei-Ling Shyu 460.020013Reda Alhajj 5610.019700Na Zhao 62600.018936Min Chen 71510.018230Gordon K. Lee 8190.017962Chengcui Zhang 910430.017962Isai Michel Lombera 1010270.017962Michael Armella 114430.017448James B. Law 121570.017082Keqi Zhang 132530.016731Shahid Hamid 1410380.016618Walter Z. Tang 159590.016285Chengjun Zhan 169570.016285Lin Luo 179560.016285Guo Chen 189550.016285Xin Huang 199430.016285Sneh Gulati 209600.016071Sheng-Tun Li 55 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

56 Top 20 authors with the highest betweeness scores RankIDBetweennessAuthor 110.000752Stuart Harvey Rubin 230.000741Shu-Ching Chen 320.000406Taghi M. Khoshgoftaar 4660.000385Xingquan Zhu 540.000376Mei-Ling Shyu 660.000296Reda Alhajj 7650.000256Xindong Wu 8190.000194Chengcui Zhang 9390.000185Wei Dai 10150.000107Narayan C. Debnath 11310.000094Qianhui Althea Liang 121510.000094Gordon K. Lee 1370.000085Du Zhang 14300.000072Baowen Xu 15410.000067Hongji Yang 162700.000060Zhiwei Xu 1750.000043Mohamed E. Fayad 181100.000042Abhijit S. Pandya 191060.000042Sam Hsu 2080.000042Wen-Lian Hsu 56 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

57 Top 20 authors with the highest degree scores RankIDDegreeAuthor 130.035044Shu-Ching Chen 210.034418Stuart Harvey Rubin 320.030663Taghi M. Khoshgoftaar 460.028786Reda Alhajj 580.028786Wen-Lian Hsu 6100.024406Min-Yuh Day 740.022528Mei-Ling Shyu 8170.021277Richard Tzong-Han Tsai 9140.017522Eduardo Santana de Almeida 10160.017522Roumen Kountchev 11400.016896Hong-Jie Dai 12150.015645Narayan C. Debnath 1390.015019Jason Van Hulse 14250.013767Roumiana Kountcheva 15280.013141Silvio Romero de Lemos Meira 16240.013141Vladimir Todorov 17230.013141Mariofanna G. Milanova 1850.013141Mohamed E. Fayad 19 0.012516Chengcui Zhang 20180.011890Waleed W. Smari 57 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

58 Visualization of IRI (IEEE IRI 2003-2010) co-authorship network (global view) 58 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

59 59 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

60 Visualization of Social Network Analysis 60 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration"

61 61 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" Visualization of Social Network Analysis

62 62 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" Visualization of Social Network Analysis

63 Summary 63 Social Computing and Big Data Analytics Analyzing the Social Web: Social Network Analysis

64 References Jiawei Han and Micheline Kamber (2011), Data Mining: Concepts and Techniques, Third Edition, Elsevier Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press 64

65 65 Social Computing and Big Data Analytics ( 社群運算大數據分析 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang University 淡江大學 淡江大學 資訊管理學系 資訊管理學系 http://mail. tku.edu.tw/myday/ 2016-04-18 Tamkang University Time: 2016/04/18 (14:00-16:00) Place: China Airlines Q & A


Download ppt "1 Social Computing and Big Data Analytics ( 社群運算大數據分析 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University."

Similar presentations


Ads by Google