Social Network Mining for Digital Library Application Dr. Thanachart Ritbumroong King Mongkut’s University of Technology Thonburi Assist. Prof. Dr. Satidchoke.


1 Social Network Mining for Digital Library Application Dr. Thanachart Ritbumroong King Mongkut’s University of Technology Thonburi Assist. Prof. Dr. Satidchoke Phosaard Suranaree University of Technology

2 Agenda  Social Media Mining Concepts  Data Extraction and Preparation  Social Network Analysis  Social Media Mining for Recommender System

3 SOCIAL MEDIA CONCEPTS

4 Social Media Mining  The process of representing, analyzing, and extracting actionable patterns from social media data  The study of how individuals (also known as social atoms) interact and how communities (i.e., social molecules) form  Social media is “the group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content” (Kaplan and Haenlein, 2010)

5 Applications  Facebook: People you may know  Amazon: Other customers suggested these items  Netflix: movie suggestions for you  Targeted marketing  Online advertising

6 Major Components of a Network  Vertices – Nodes, agents, entities, or items – Representing people, or social structures (workgroups, teams, organizations, institutions, states, or countries)  Edges – Links, ties, connections, or relationships – Connecting two vertices together – Representing proximity, collaborations, partnerships, transactions, etc. [Figure: a directed graph and an undirected graph, each on vertices v1, v2, v3]

7 Network Data  Differs from attribute data  Two ways of presenting network data – as a matrix – as an edge list

Matrix:
         Nicole  Tim  Mike
Nicole   0       1    1
Tim      0       0    0
Mike     1       0    0

Edge list:
Vertex 1  Vertex 2
Nicole    Tim
Nicole    Mike

[Figure: graph of Nicole, Tim, and Mike]
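The two representations carry the same information, and one can be derived from the other. A minimal Python sketch, assuming each row of the matrix is a source vertex and each column a target (the slide does not state the direction convention); `matrix_to_edge_list` is an illustrative name, not a NodeXL function:

```python
# Adjacency matrix from the slide. Reading rows as sources and columns
# as targets is an assumption; the slide does not say which way it runs.
names = ["Nicole", "Tim", "Mike"]
matrix = [
    [0, 1, 1],  # Nicole
    [0, 0, 0],  # Tim
    [1, 0, 0],  # Mike
]

def matrix_to_edge_list(names, matrix):
    """Return one (source, target) pair for every non-zero cell."""
    return [
        (names[i], names[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]

edges = matrix_to_edge_list(names, matrix)
print(edges)
```

Read this way, the matrix yields three directed pairs, including Mike to Nicole, whereas only two edge-list rows survive in this transcript.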

8 Types of Networks  Full, Partial, and Egocentric Networks – Full: contain all entities – Partial: subset, topic centric – Egocentric: include only individuals who are connected to a specified ego (person)  Unimodal, Multimodal, and Affiliation Networks – Unimodal: one type of vertex – Multimodal: many types of vertex (persons, posts, pictures, etc.) – Affiliation: bimodal network  Multiplex Networks – multiple types of connection (following, reply to, mention, etc.)

9 Network Analysis Metrics  Aggregate Network Metrics: describing entire networks – Density: the level of interconnectedness of the vertices, computed as the number of relationships observed in a network divided by the total number of possible relationships – Centralization: the extent to which the network is centered on one or a few important nodes

10 Density  density = e / e_max, where e_max is the total number of possible relationships  directed graph: e_max = n*(n-1)  undirected graph: e_max = n*(n-1)/2  Directed graph example (3 vertices, 3 edges): density = 3/6 = 0.5  Undirected graph example (3 vertices, 2 edges): density = 2/3 = 0.67 [Figure: a directed graph and an undirected graph, each on vertices v1, v2, v3]
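The density arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name and signature are illustrative, and it assumes at least two vertices and no self-loops or duplicate edges:

```python
def density(num_edges, num_vertices, directed):
    """Observed edges divided by the maximum possible number of edges."""
    n = num_vertices
    # A directed graph has n*(n-1) possible edges, an undirected one half that.
    e_max = n * (n - 1) if directed else n * (n - 1) // 2
    return num_edges / e_max

print(round(density(3, 3, directed=True), 2))   # 0.5
print(round(density(2, 3, directed=False), 2))  # 0.67
```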

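11 Centralization  Freeman’s general formula for centralization sums, over all vertices, the difference between the maximum centrality value in the network and each vertex’s own value, then divides by the largest such sum possible [Figure: formula image not preserved]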

12 Degree Centralization  Example graphs (figures not preserved): C_D = 0.167 and C_D = 1.0
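The graphs behind these two values are not preserved in the transcript. The sketch below uses the standard degree form of Freeman's centralization, with (n - 1)(n - 2) as the normalizing maximum for an undirected graph. The two example graphs are assumptions, chosen only because a 5-vertex star and a 5-vertex path happen to reproduce 1.0 and 0.167:

```python
def degree_centralization(n, edges):
    """Freeman degree centralization of an undirected graph on vertices 0..n-1."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree)
    # Sum of each vertex's shortfall from the maximum degree, divided by
    # the largest value that sum can take, (n - 1)(n - 2), reached by a star.
    return sum(max_degree - d for d in degree) / ((n - 1) * (n - 2))

# Assumed example graphs, not taken from the slide.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(round(degree_centralization(5, star), 3))  # 1.0
print(round(degree_centralization(5, path), 3))  # 0.167
```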

13 Network Analysis Metrics  Vertex-Specific Network Metrics: describing a specific vertex – Degree Centrality: a simple count of the total number of connections linked to a vertex; for directed networks, split into in-degree (edges pointing inward) and out-degree (edges pointing outward) – Betweenness Centrality: the number of pairs of other vertices whose shortest paths pass through the vertex

14 Normalized Degree Centrality  divide by the max. possible, i.e. (N-1)
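As a small illustration of the normalization, the sketch below divides each vertex's degree by N - 1. The edge list and the function name are hypothetical, and vertices that appear in no edge are not counted toward N here:

```python
def normalized_degree(edges):
    """Map each vertex of an undirected edge list to degree / (N - 1)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    return {v: len(adj) / (n - 1) for v, adj in neighbours.items()}

# Hypothetical star: "hub" is tied to every other vertex.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
scores = normalized_degree(star)
print(scores)
```

The hub reaches the maximum of 1.0 because it is connected to all N - 1 other vertices, while each leaf scores 1/3.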

15 Betweenness Centrality  how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?  A lies between no two other vertices  B lies between A and 3 other vertices: C, D, and E  C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E) [Figure: graph on vertices A, B, C, D, E]
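The counts above can be reproduced by brute force. The slide's figure is not preserved, so the edge set below (A-B, B-C, C-D, C-E, D-E) is an assumption that is consistent with the stated counts. The sketch gives each vertex the share of shortest paths between every other pair that pass through it, and assumes an undirected, connected graph:

```python
from collections import deque
from itertools import combinations

def bfs(adj, source):
    """Return shortest-path lengths and shortest-path counts from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]
    return dist, count

def betweenness(adj):
    """Unnormalized betweenness for an undirected, connected graph."""
    info = {v: bfs(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, count_s = info[s]
        dist_t, count_t = info[t]
        for v in adj:
            if v in (s, t):
                continue
            # v is on a shortest s-t path when the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += count_s[v] * count_t[v] / count_s[t]
    return score

# Assumed edge set; the original figure did not survive extraction.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
}
score = betweenness(adj)
print(score)
```

On this assumed graph, A, D, and E score 0, B scores 3, and C scores 4, matching the slide.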

16 NODEXL: INTRODUCTION

17 What is NodeXL?  Open-source software for social network analysis  An extension to Microsoft Excel  Easy to use  Provides basic network analysis and visualization features  www.codeplex.com/NodeXL

18 NodeXL Template  Edges – Vertex 1 = source – Vertex 2 = destination – Attributes

19 NodeXL Edge List Vertices will be automatically generated

20 Showing the graph There are several automatic layouts that can be selected from the control in the graph pane or in the NodeXL ribbon.

21 Fruchterman-Reingold

22 Harel-Koren Fast Multiscale

23 Adding descriptive data  Color – CSS color names – RGB format (240, 12, 135)  Size – Between 1 and 100  Shape – 1 = Circle – 2 = Disk – 3 = Sphere – 4 = Square – 5 = Solid Square – 6 = Diamond – 7 = Solid Diamond – 8 = Triangle – 9 = Solid Triangle – 10 = Label – 11 = Image  Label

24 Color

25 Autofilling  Allows you to provide instructions on how NodeXL should fill in worksheet columns, such as those relating to size and shape.

26 Autofilling

27 Graph with details

28 Calculating Metrics  Network analysis metrics can be automatically calculated in NodeXL.  Once the calculation completes, NodeXL displays each vertex-specific metric in a set of Graph Metrics columns in the Vertices worksheet.

29 Graph Metrics  Graph type. Undirected or directed  Vertices. The number of total vertices  Unique edges. The number of unique edges found in the edges worksheet.  Edges with duplicates. The number of repeated vertex pairs on the edges worksheet.  Total edges. The number of total edges  Self-loops. The number of edges that connect a vertex with itself.

30 Graph Metrics (Cont')  Connected components. The number of connected components (i.e., clusters of vertices that are connected to each other but separate from other vertices in the graph).  Single vertex connected components. The number of isolated vertices that are not connected to any other vertices in the graph.  Maximum vertices in a connected component. The number of vertices in the connected component with the most vertices.
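To make the three component metrics concrete, here is a minimal sketch that finds connected components with a depth-first traversal and then derives the counts. The sample graph and the function name are hypothetical, and this is not NodeXL's own implementation:

```python
def connected_components(vertices, edges):
    """Group the vertices of an undirected graph into connected components."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components

vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]  # "f" is isolated
components = connected_components(vertices, edges)
sizes = [len(c) for c in components]
# connected components, single-vertex components, maximum vertices in one
print(len(components), sizes.count(1), max(sizes))  # 3 1 3
```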

31 Graph Metrics (Cont')  Maximum edges in a connected component. The number of edges in the connected component with the most edges.  Maximum geodesic distance (diameter). The geodesic distance is the length of the shortest path between two people; the diameter is the largest geodesic distance in the graph.  Average geodesic distance. The average of all geodesic distances. This value gives a sense of how “close” community members are to one another.  Graph density. A number between 0 and 1 indicating how interconnected the vertices are in the network.
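The two geodesic metrics can be sketched with a breadth-first search from every vertex. The example graph is hypothetical, and averaging over reachable pairs of distinct vertices is an assumption that may differ from NodeXL's exact convention. The graph must contain at least one edge:

```python
from collections import deque

def distances_from(adj, source):
    """Breadth-first search: shortest hop count from source to each vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def geodesic_summary(adj):
    """Return (diameter, average geodesic distance) over reachable pairs."""
    lengths = [
        d
        for source in adj
        for target, d in distances_from(adj, source).items()
        if target != source
    ]
    return max(lengths), sum(lengths) / len(lengths)

# Hypothetical chain of four people: a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
diameter, average = geodesic_summary(path)
print(diameter, round(average, 2))  # 3 1.67
```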

32 Vertex-Specific Metrics  Degree – The degree of a vertex (sometimes called degree centrality) is a count of the number of unique edges that are connected to it.  Betweenness Centrality – How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?  Closeness Centrality – How close each person is to the other people in the network – The inverse of the sum of the shortest distances between the vertex and all other vertices reachable from it
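The closeness definition above translates almost directly into code. A minimal sketch on a hypothetical three-vertex chain; returning 0.0 for an isolated vertex is an assumption made here to avoid dividing by zero:

```python
from collections import deque

def closeness(adj, v):
    """Inverse of the summed shortest distances from v to reachable vertices."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1 / total if total else 0.0

# Hypothetical chain: a - b - c
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = {v: round(closeness(chain, v), 3) for v in chain}
print(result)  # {'a': 0.333, 'b': 0.5, 'c': 0.333}
```

The middle vertex scores highest because its distances to the others sum to 2, against 3 for either end.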

33 Vertex-Specific Metrics  Eigenvector Centrality – takes into consideration not only how many connections a vertex has (i.e., its degree), but also the degree of the vertices that it is connected to – a measure of the importance of a node in a network – assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes  PageRank – the importance of each vertex within the graph, computed using a link analysis algorithm developed by Larry Page  Clustering Coefficient – the clustering coefficient of a vertex quantifies how close the vertex and its neighbors are to being a clique (complete graph)
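Of these metrics, the clustering coefficient is the simplest to compute by hand. The sketch below uses the common local definition for undirected graphs, the share of a vertex's neighbor pairs that are themselves connected. The graph is hypothetical, and scoring vertices with fewer than two neighbors as 0.0 is a convention assumed here:

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Share of pairs of v's neighbours that are themselves connected."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

# Hypothetical graph: triangle A-B-C plus a pendant vertex D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coefficients = {v: round(clustering_coefficient(adj, v), 3) for v in sorted(adj)}
print(coefficients)  # {'A': 0.333, 'B': 1.0, 'C': 1.0, 'D': 0.0}
```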

34 Import data from Social Media  Social Network Importer  Allows users to directly download and import different Facebook networks  http://socialnetimporter.codeplex.com/  Installation Guide – Close NodeXL – Download the zip file from http://socialnetimporter.codeplex.com/ – Unzip the file: you will find two items, FacebookAPI.DLL and SocialNetImporter.DLL – Copy these files to the NodeXL Plug-ins Directory specified in "Import Options..." (see "Using third-party graph data importers in NodeXL Excel Template 2014") – Restart NodeXL: you should see the Facebook Import option in the NodeXL>Data>Import menu.

35 Exercise: Facebook  Download data from your Facebook account

36 Edge List

37 Visualizing Social Network

38 Calculate Metrics

39 Vertex Specific Metrics

40 NODEXL: CLUSTERING

41 Clustering  NodeXL can automatically identify clusters based on the network structure.  An algorithm looks for groups of densely clustered vertices that are only loosely connected to vertices in another cluster.  The number of clusters is not predetermined; instead, the algorithm dynamically determines the number it thinks is best.

42 Clustering Results

43 Visualizing Clusters

44 NODEXL: MULTIMODAL NETWORK

45 Import data from FB Fanpage  Using Social Network Importer to download FB fanpage data  LibraryCMU

46 FB Fanpage Data

47 Visualizing Likes & Comments

48 PERSONALIZATION AND RECOMMENDER SYSTEMS

49 Personalization  Information and services can be modified to meet the unique and specific needs of an individual or a community  by changing presentation, content, and/or services based on a person’s task, background, history, device, information needs, location, etc. (user’s context)

50 Recommender Systems  A type of personalization that learns about a person’s needs and then proactively identifies and recommends information that matches those needs  Useful when they identify information a person was previously unaware of  Can be user-driven, which involves a user directly invoking and supporting the personalization process by providing explicit input.

51 Collaborative  Content-Based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.  Collaborative-Filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

52 Simple way to do it  Transforming Multimodal Affiliation Networks into Unimodal Networks – Bimodal affiliation networks can be transformed into two single-mode networks – person-to-affiliation network → person-to-person network – person-to-page network → person-to-person network – person-to-item network → person-to-person network

53 Person-to-affiliation network Example of person-to-affiliation network

54 Create person-to-affiliation matrix  Use a pivot table to create the matrix [Figure: pivot table with fields affiliation, user, and count of relationships]

55 Create an affiliation-to-affiliation matrix  Create the matrix by summing the products of the relationships between two affiliations [Figure: matrix with affiliations on both axes and the sum of products in each cell]
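The sum-of-products step is equivalent to multiplying the transposed person-to-affiliation matrix by itself. A minimal sketch with a made-up 3 x 3 matrix; the affiliation names, the data, and the function name are all hypothetical:

```python
# Hypothetical person-to-affiliation matrix: one row per user, one column
# per affiliation (for example a page), cells count relationships.
affiliations = ["page_a", "page_b", "page_c"]
person_to_affiliation = [
    [1, 1, 0],  # user 1
    [1, 0, 1],  # user 2
    [1, 1, 1],  # user 3
]

def affiliation_to_affiliation(matrix):
    """Sum, over all users, the product of each pair of affiliation columns."""
    n_cols = len(matrix[0])
    return [
        [sum(row[i] * row[j] for row in matrix) for j in range(n_cols)]
        for i in range(n_cols)
    ]

result = affiliation_to_affiliation(person_to_affiliation)
for name, row in zip(affiliations, result):
    print(name, row)
```

With 0/1 cells, each off-diagonal entry counts the users two affiliations share, and each diagonal entry counts the users of that affiliation.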

56 Similarity measures  Cosine-based similarity: also known as vector-based similarity, this formulation views two items and their ratings as vectors, and defines the similarity between them as the cosine of the angle between these vectors.
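The formula image for this slide is not preserved. The standard definition of cosine similarity, written here for two rating vectors a and b (symbols chosen for illustration), is:

```latex
\operatorname{sim}(\vec{a}, \vec{b})
  = \cos\theta
  = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^{2}} \; \sqrt{\sum_{i} b_i^{2}}}
```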

57 Example  Two rating vectors (names not preserved): (4.75, 4.5, 5, 4.25, 4) and (4, 3, 5, 2, 1)
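The result of this example is not preserved in the transcript. It can be recomputed with a short sketch of the standard cosine formula; the variable names `first` and `second` are placeholders, since the original vector names were lost:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

first = (4.75, 4.5, 5, 4.25, 4)
second = (4, 3, 5, 2, 1)
similarity = cosine_similarity(first, second)
print(round(similarity, 3))  # 0.935
```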

58 Bibliography  Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann.  Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media, Inc.  Xu, G., Zhang, Y., & Li, L. (2010). Web mining and social networking: techniques and applications (Vol. 6). Springer.  Zafarani, R., Abbasi, M. A., & Liu, H. (2014). Social Media Mining: An Introduction. Cambridge University Press.

