Social Network Mining for Digital Library Application Dr. Thanachart Ritbumroong King Mongkut’s University of Technology Thonburi Assist. Prof. Dr. Satidchoke.


Social Network Mining for Digital Library Application Dr. Thanachart Ritbumroong King Mongkut’s University of Technology Thonburi Assist. Prof. Dr. Satidchoke Phosaard Suranaree University of Technology

Agenda
• Social Media Mining Concepts
• Data Extraction and Preparation
• Social Network Analysis
• Social Media Mining for Recommender System

SOCIAL MEDIA CONCEPTS

Social Media Mining
• The process of representing, analyzing, and extracting actionable patterns from social media data
• The study of how individuals (also known as social atoms) interact and how communities (i.e., social molecules) form
• Social media is “the group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content” (Kaplan and Haenlein, 2010)

Applications
• Facebook: People you may know
• Amazon: Other customers suggested these items
• Netflix: Movie suggestions for you
• Targeted marketing
• Online advertising

Major Components of a Network
• Vertices
  – Nodes, agents, entities, or items
  – Representing people or social structures (workgroups, teams, organizations, institutions, states, or countries)
• Edges
  – Links, ties, connections, or relationships
  – Connecting two vertices together
  – Representing proximity, collaborations, partnerships, transactions, etc.
[Figure: a directed graph and an undirected graph, each on vertices v1, v2, v3]

Network Data
• Differs from attribute data
• Two ways of presenting network data
  – as a matrix
  – as an edge list

Matrix:
          Nicole  Tim  Mike
  Nicole  0       1    1
  Tim     0       0    0
  Mike    1       0    0

Edge list:
  Vertex 1  Vertex 2
  Nicole    Tim
  Nicole    Mike

[Figure: graph of Nicole, Tim, and Mike]
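The two representations carry the same information, so one can be derived from the other. The sketch below converts the matrix from this slide into an edge list. It assumes the matrix is directed, with rows as sources and columns as targets. The function name `matrix_to_edge_list` is illustrative only.

```python
# Adjacency matrix from the slide.
# Assumption: rows are source vertices, columns are target vertices.
names = ["Nicole", "Tim", "Mike"]
matrix = [
    [0, 1, 1],  # Nicole
    [0, 0, 0],  # Tim
    [1, 0, 0],  # Mike
]


def matrix_to_edge_list(names, matrix):
    """Return one (source, target) pair for every nonzero cell."""
    return [
        (names[i], names[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]


edges = matrix_to_edge_list(names, matrix)
for source, target in edges:
    print(source, "->", target)
```

Read this way, the matrix yields three directed edges, including Mike to Nicole.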

Types of Networks
• Full, Partial, and Egocentric Networks
  – Full: contains all entities
  – Partial: a subset, topic-centric
  – Egocentric: includes only individuals who are connected to a specified ego (person)
• Unimodal, Multimodal, and Affiliation Networks
  – Unimodal: one type of vertex
  – Multimodal: many types of vertex (persons, posts, pictures, etc.)
  – Affiliation: a bimodal network
• Multiplex Networks
  – Multiple types of connection (following, reply to, mention, etc.)

Network Analysis Metrics
• Aggregate network metrics: describe entire networks
  – Density: the level of interconnectedness of the vertices, computed as the number of relationships observed in a network divided by the total number of possible relationships
  – Centralization: the extent to which the network is centered on one or a few important nodes

Density
• The total number of possible relationships, $e_{max}$
  – directed graph: $e_{max} = n(n-1)$
  – undirected graph: $e_{max} = n(n-1)/2$
• density $= e / e_{max}$
[Figure: a directed graph and an undirected graph, each on vertices v1, v2, v3]
  – directed graph: density $= 3/6 = 0.5$
  – undirected graph: density $= 2/3 \approx 0.67$
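The density calculation can be checked with a few lines of Python. This is a minimal sketch of the formulas on this slide. The function name `density` and its parameters are illustrative.

```python
def density(num_vertices, num_edges, directed):
    """Observed edges divided by the maximum possible number of edges."""
    max_edges = num_vertices * (num_vertices - 1)
    if not directed:
        # Each unordered pair can hold only one undirected edge.
        max_edges //= 2
    return num_edges / max_edges


# The two three-vertex examples from the slide.
directed_density = density(3, 3, directed=True)     # 3 / 6
undirected_density = density(3, 2, directed=False)  # 2 / 3
print(directed_density, round(undirected_density, 2))
```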

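Centralization
• Freeman’s general formula for centralization, written here for degree:
$$C_D = \frac{\sum_{i=1}^{N} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$$
where $C_D(n^*)$ is the maximum value in the network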

Degree Centralization
[Figure: example networks labeled with their degree centralization $C_D$; one label reads $C_D = 1.0$]
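As a rough illustration, degree centralization can be computed by summing each vertex's shortfall from the maximum degree and dividing by (N-1)(N-2), the largest value that sum can take in an undirected graph. The sketch below assumes that standard form of Freeman's formula. The star and ring graphs are made-up examples, not the ones in the figure.

```python
def degree_centralization(graph):
    """Freeman degree centralization of an undirected graph.

    graph maps each vertex to the set of its neighbors; it needs at
    least three vertices so that the denominator is nonzero.
    """
    n = len(graph)
    degrees = [len(neighbors) for neighbors in graph.values()]
    max_degree = max(degrees)
    shortfall = sum(max_degree - d for d in degrees)
    return shortfall / ((n - 1) * (n - 2))


# Hypothetical examples: a star (one hub) and a ring (all vertices equal).
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
ring = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}

star_score = degree_centralization(star)
ring_score = degree_centralization(ring)
print(star_score, ring_score)
```

A star is the most centralized shape possible, so it scores 1.0. A ring, where every vertex has the same degree, scores 0.0.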

Network Analysis Metrics
• Vertex-specific network metrics: describe a specific vertex
  – Degree centrality: a simple count of the total number of connections linked to a vertex; for directed networks, split into in-degree (edges pointing inward) and out-degree (edges pointing outward)
  – Betweenness centrality: how often a vertex lies on the shortest paths between pairs of other vertices

Normalized Degree Centrality
• Divide the degree by the maximum possible value, i.e., (N-1)
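A short sketch of in-degree, out-degree, and their normalized forms follows. The edge list is an assumption: it reuses the names from the Network Data slide, read as directed edges.

```python
from collections import Counter

# Assumed directed edge list (source, target).
edges = [("Nicole", "Tim"), ("Nicole", "Mike"), ("Mike", "Nicole")]

vertices = sorted({v for edge in edges for v in edge})
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)

# Normalize by the maximum possible degree, N - 1.
n = len(vertices)
normalized_out = {v: out_degree[v] / (n - 1) for v in vertices}
normalized_in = {v: in_degree[v] / (n - 1) for v in vertices}

for v in vertices:
    print(v, in_degree[v], out_degree[v], normalized_in[v], normalized_out[v])
```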

Betweenness Centrality
• How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
• A lies between no two other vertices
• B lies between A and 3 other vertices: C, D, and E
• C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E)
[Figure: example graph on vertices A, B, C, D, E]
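The counts above can be reproduced with a small script. The sketch assumes the example graph is the chain A-B-C-D-E, which is consistent with the numbers on the slide but is not confirmed by the transcript. When a pair has several shortest paths, each vertex receives the fraction of those paths that pass through it.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return shortest-path lengths and shortest-path counts from source."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]
    return dist, paths


def betweenness(graph):
    """Unnormalized betweenness for an undirected graph."""
    info = {v: bfs_counts(graph, v) for v in graph}
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, paths_v = info[v]
            # v is on a shortest s-t path when the two legs add up exactly.
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += paths_s[v] * paths_v[t] / paths_s[t]
    return score


# Assumed example graph: the chain A-B-C-D-E.
chain = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}
scores = betweenness(chain)
print(scores)
```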

NODEXL: INTRODUCTION

What is NodeXL?
• Open-source software for social network analysis
• An extension to Microsoft Excel
• Easy to use
• Provides basic network analysis and visualization features

NodeXL Template
• Edges
  – Vertex 1 = source
  – Vertex 2 = destination
  – Attributes

NodeXL Edge List
Vertices will be automatically generated.

Showing the graph There are several automatic layouts that can be selected from the control in the graph pane or in the NodeXL ribbon.

Fruchterman-Reingold

Harel-Koren Fast Multiscale

Adding descriptive data
• Color
  – CSS color names
  – RGB format (240, 12, 135)
• Size
  – Between 1 and 100
• Shape
  – 1 = Circle
  – 2 = Disk
  – 3 = Sphere
  – 4 = Square
  – 5 = Solid Square
  – 6 = Diamond
  – 7 = Solid Diamond
  – 8 = Triangle
  – 9 = Solid Triangle
  – 10 = Label
  – 11 = Image
• Label

Color

Autofilling
• Allows you to provide instructions on how NodeXL should fill in worksheet columns, such as those relating to size and shape.

Autofilling

Graph with details

Calculating Metrics
• Network analysis metrics can be automatically calculated in NodeXL.
• Once completed, NodeXL displays each vertex-specific metric in a set of Graph Metrics columns in the Vertices worksheet.

Graph Metrics
• Graph type. Undirected or directed.
• Vertices. The total number of vertices.
• Unique edges. The number of unique edges found in the Edges worksheet.
• Edges with duplicates. The number of repeated vertex pairs in the Edges worksheet.
• Total edges. The total number of edges.
• Self-loops. The number of edges that connect a vertex with itself.

Graph Metrics (Cont')
• Connected components. The number of connected components (i.e., clusters of vertices that are connected to each other but separate from other vertices in the graph).
• Single-vertex connected components. The number of isolated vertices that are not connected to any other vertices in the graph.
• Maximum vertices in a connected component. The number of vertices in the connected component with the most vertices.

Graph Metrics (Cont')
• Maximum edges in a connected component. The number of edges in the connected component with the most edges.
• Maximum geodesic distance (diameter). The geodesic distance is the length of the shortest path between two people; the diameter is the largest such distance in the graph.
• Average geodesic distance. The average of all geodesic distances. This value gives a sense of how “close” community members are to one another.
• Graph density. A number between 0 and 1 indicating how interconnected the vertices are in the network.
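Several of these graph-level metrics can be sketched in plain Python with breadth-first search. This illustrates the definitions and is not NodeXL's implementation. In particular, the average geodesic distance here is taken over reachable pairs of distinct vertices, which may differ from NodeXL's convention. The example graph and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def graph_metrics(vertices, edges):
    """Graph-level metrics for an undirected graph with two or more vertices."""
    graph = {v: set() for v in vertices}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    # Connected components: each BFS from an unseen vertex finds one.
    seen, components = set(), []
    for v in vertices:
        if v not in seen:
            component = set(bfs_distances(graph, v))
            seen |= component
            components.append(component)

    # Geodesic distances between reachable pairs of distinct vertices.
    distances = [
        d
        for v in vertices
        for w, d in bfs_distances(graph, v).items()
        if w != v
    ]

    n = len(vertices)
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return {
        "connected_components": len(components),
        "single_vertex_components": sum(len(c) == 1 for c in components),
        "max_vertices_in_component": max(len(c) for c in components),
        "diameter": max(distances, default=0),
        "average_geodesic_distance":
            sum(distances) / len(distances) if distances else 0.0,
        "density": len(unique_edges) / (n * (n - 1) / 2),
    }


# Hypothetical graph: a chain A-B-C, a pair D-E, and an isolated vertex F.
metrics = graph_metrics(list("ABCDEF"), [("A", "B"), ("B", "C"), ("D", "E")])
for name, value in metrics.items():
    print(name, value)
```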

Vertex-Specific Metrics
• Degree
  – The degree of a vertex (sometimes called degree centrality) is a count of the number of unique edges that are connected to it.
• Betweenness Centrality
  – How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
• Closeness Centrality
  – How close each person is to the other people in the network
  – The inverse of the sum of the shortest distances between the vertex and all other vertices reachable from it
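Closeness as defined above (the inverse of the sum of shortest distances to all reachable vertices) can be sketched as follows. The chain graph is a made-up example, and returning 0.0 for an isolated vertex is a convention chosen for this sketch.

```python
from collections import deque


def closeness(graph, vertex):
    """Inverse of the summed shortest distances to all reachable vertices."""
    dist = {vertex: 0}
    queue = deque([vertex])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    # Assumed convention: an isolated vertex gets closeness 0.0.
    return 1.0 / total if total else 0.0


# Hypothetical example: the chain A-B-C-D-E.
chain = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}
closeness_scores = {v: closeness(chain, v) for v in chain}
print(closeness_scores)
```

The middle vertex C has the smallest total distance (1 + 1 + 2 + 2 = 6) and therefore the highest closeness.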

Vertex-Specific Metrics
• Eigenvector Centrality
  – Takes into consideration not only how many connections a vertex has (i.e., its degree), but also the degree of the vertices that it is connected to
  – A measure of the importance of a node in a network
  – Assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes
• PageRank
  – The importance of each vertex within the graph, computed using a link analysis algorithm developed by Larry Page
• Clustering Coefficient
  – The clustering coefficient of a vertex quantifies how close the vertex and its neighbors are to being a clique (complete graph).
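The clustering coefficient is the easiest of these to compute by hand: count how many pairs of a vertex's neighbors are themselves connected, then divide by the number of possible pairs. The sketch below uses a made-up four-vertex graph. Assigning 0.0 to vertices with fewer than two neighbors is an assumed convention.

```python
from itertools import combinations


def clustering_coefficient(graph, vertex):
    """Fraction of neighbor pairs of vertex that are directly connected."""
    neighbors = graph[vertex]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention when no neighbor pair exists
    linked_pairs = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return linked_pairs / (k * (k - 1) / 2)


# Hypothetical graph: A is connected to B, C, and D; B and C are also connected.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coefficients = {v: clustering_coefficient(graph, v) for v in graph}
print(coefficients)
```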

Import data from Social Media
Social Network Importer
• Allows users to directly download and import different Facebook networks
• Installation Guide
  – Close NodeXL
  – Download the zip file from [link not preserved]
  – Unzip the file: you will find two items, FacebookAPI.DLL and SocialNetImporter.DLL
  – Copy these files to the NodeXL Plug-ins Directory specified in "Import Options..." (Using third-party graph data importers in NodeXL Excel Template 2014)
  – Restart NodeXL: you should see the Facebook Import option in the NodeXL>Data>Import menu.

Exercise: Facebook
• Download data from your Facebook account

Edge List

Visualizing Social Network

Calculate Metrics

Vertex Specific Metrics

NODEXL: CLUSTERING

Clustering
• NodeXL can automatically identify clusters based on the network structure.
• An algorithm looks for groups of densely clustered vertices that are only loosely connected to vertices in another cluster.
• The number of clusters is not predetermined; instead, the algorithm dynamically determines the number it thinks is best.
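The slides do not say which algorithm NodeXL uses. One common way to score how well a grouping matches the "dense inside, sparse between" idea is modularity, which is swapped in here purely as an illustration and is not taken from the source. The sketch scores a made-up graph of two triangles joined by a single edge.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    For each community: (share of edges inside it) minus
    (share of edge endpoints in it) squared.
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical graph: two triangles (1-2-3 and 4-5-6) joined by edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_clusters = {1: "left", 2: "left", 3: "left",
                4: "right", 5: "right", 6: "right"}
one_cluster = {v: "all" for v in range(1, 7)}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(round(q_two, 3), round(q_one, 3))
```

Splitting the graph into its two triangles scores higher than lumping everything into one cluster, which is the kind of comparison a clustering algorithm makes when deciding how many clusters to report.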

Clustering Results

Visualizing Clusters

NODEXL: MULTIMODAL NETWORK

Import data from FB Fanpage
• Using Social Network Importer to download FB fanpage data
• LibraryCMU

FB Fanpage Data

Visualizing Likes & Comments

PERSONALIZATION AND RECOMMENDER SYSTEMS

Personalization
• Information and services can be modified to meet the unique and specific needs of an individual or a community
• This is done by changing presentation, content, and/or services based on a person’s task, background, history, device, information needs, location, etc. (the user’s context)

Recommender Systems
• A type of personalization that learns about a person’s needs and then proactively identifies and recommends information that matches those needs
• Useful when they identify information a person was previously unaware of
• Can be user-driven, in which case the user directly invokes and supports the personalization process by providing explicit input

Collaborative
• Content-based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.
• Collaborative-filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

Simple way to do it
• Transforming multimodal affiliation networks into unimodal networks
  – A bimodal affiliation network can be transformed into two single-mode networks
  – person-to-affiliation network → person-to-person network
  – person-to-page network → person-to-person network
  – person-to-item network → person-to-person network

Person-to-affiliation network
[Figure: example of a person-to-affiliation network]

Create person-to-affiliation matrix
• Use a pivot table to create the matrix
[Figure: pivot table with fields labeled affiliation, user, and count of relationships]

Create an affiliation-to-affiliation matrix
• Create the matrix by summing the products of the relationships between two affiliations
[Figure: matrix with labels affiliation and sum of product]
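The two spreadsheet steps can be sketched in Python: first count relationships per (person, affiliation) pair, as a pivot table would, then sum the products of two affiliation columns over all people. The edge list, names, and function names below are made up for illustration.

```python
from collections import Counter

# Hypothetical person-to-page edge list.
edges = [
    ("Ann", "PageA"), ("Ann", "PageB"),
    ("Bob", "PageA"), ("Bob", "PageB"), ("Bob", "PageC"),
    ("Cat", "PageC"),
]


def person_affiliation_matrix(edges):
    """Pivot the edge list: rows are people, columns are affiliations."""
    counts = Counter(edges)
    people = sorted({person for person, _ in edges})
    affiliations = sorted({affiliation for _, affiliation in edges})
    matrix = [[counts[(p, a)] for a in affiliations] for p in people]
    return people, affiliations, matrix


def affiliation_to_affiliation(matrix):
    """Cell (i, j) is the sum over people of column i times column j."""
    size = len(matrix[0])
    return [
        [sum(row[i] * row[j] for row in matrix) for j in range(size)]
        for i in range(size)
    ]


people, affiliations, matrix = person_affiliation_matrix(edges)
co_occurrence = affiliation_to_affiliation(matrix)
for name, row in zip(affiliations, co_occurrence):
    print(name, row)
```

With 0/1 entries, each off-diagonal cell counts the people two affiliations share, and each diagonal cell counts the people linked to that affiliation.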

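Similarity measures
• Cosine-based similarity: also known as vector-based similarity, this formulation views two items and their ratings as vectors, and defines the similarity between them as the cosine of the angle between these vectors:
$$sim(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\| \, \|\vec{j}\|}$$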

Example
Two rating vectors: (4.75, 4.5, 5, 4.25, 4) and (4, 3, 5, 2, 1)
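The cosine similarity of the two example vectors can be worked out directly. This is a minimal sketch; the names `first` and `second` are placeholders, since the vector labels did not survive in the transcript.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Rating vectors from the slide (placeholder names).
first = (4.75, 4.5, 5, 4.25, 4)
second = (4, 3, 5, 2, 1)

similarity = cosine_similarity(first, second)
print(round(similarity, 3))
```

The dot product is 70 and the vector lengths are about 10.09 and 7.42, giving a similarity of roughly 0.935.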

Bibliography
• Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann.
• Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media.
• Xu, G., Zhang, Y., & Li, L. (2010). Web mining and social networking: Techniques and applications (Vol. 6). Springer.
• Zafarani, R., Abbasi, M. A., & Liu, H. (2014). Social Media Mining: An Introduction. Cambridge University Press.