Social Network Analysis Aerospace Data Mining Center


1 Social Network Analysis, Aerospace Data Mining Center
Overview of Social Network Analysis
Donna Nystrom
Aerospace Data Mining Center
(310)

2 Outline
Social Network Analysis (SNA)
  Why is it relevant?
  What is it?
  How does it work?
Example applications
  Washington Regional Pawn Data Sharing System
  Florida Department of Corrections

3 Why is SNA Relevant?
Adversaries across the spectrum of conflict and crime are using network-centric operations
Cheap and easy to move information
Movement of information is hard to detect
Conflict is now a game of hide and seek
Analysts must find adversaries and their activities
Collect far more data than analysts can process
Vast majority of the data is irrelevant
Key to finding relevant data is relationships between entities
Automated filtering and analysis is needed
SNA can be used to filter and analyze data in the context of relationships between entities

4 What is SNA?
The mapping, measuring and analysis of relationships and flows between people, groups, organizations, computers or other entities
Nodes in the network represent entities while the edges show relationships or flows between the nodes
Not a fixed methodology, but a collection of strategies for analyzing network data
Applicable to a wide variety of domains involving complex systems
SNA networks are represented visually and mathematically as graphs
  Nodes are entities
  Links represent relationships or flows
Apply SNA
  Group nodes into cliques
  Place central nodes in the interior
  Place peripheral nodes at the edges

5 How does SNA work?
Typically not interested in
  Fully connected networks
  Regular networks
  Random networks
[Figure: three example networks labeled Fully Connected, Regular, Random]
Source: Stocker, R., Green, D. G., and Newth, D. (2001) Consensus and Cohesion in Simulated Social Networks, Journal of Artificial Societies and Social Simulation, Vol 4, No 4.
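
For readers who want to see the three reference topologies concretely, here is a minimal sketch that builds each one as a set of links. The function names, and the choice of a ring lattice as the "regular" network, are assumptions made for illustration; the slide does not say how its example networks were generated.

```python
import random
from itertools import combinations


def fully_connected(n):
    """Fully connected network: every pair of nodes is linked."""
    return set(combinations(range(n), 2))


def ring_lattice(n, k):
    """Regular network: each node links to its k nearest neighbours on each
    side of a ring, so every node has degree 2 * k (requires n > 2 * k)."""
    edges = set()
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))
    return edges


def random_network(n, p, seed=0):
    """Random network: each possible link is present independently with
    probability p. The seed only makes the sketch repeatable."""
    rng = random.Random(seed)
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}


def degrees(n, edges):
    """Number of links attached to each node."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


for name, edges in [
    ("fully connected", fully_connected(10)),
    ("regular", ring_lattice(10, 2)),
    ("random", random_network(10, 0.3)),
]:
    print(name, "links:", len(edges), "degrees:", degrees(10, edges))
```

In the first two networks every node has the same degree, and in the random one degrees vary only through chance, which is one way to read why none of them is the case of interest.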

6 How does SNA work?
Scale free networks
  Connections are not randomly distributed
  Connections are not randomly added
  Distribution of node connections follows the Power Law
  Small world theory
  Scale free networks occur in many domains involving complex systems
[Figure: plot with axes Number of Links and Number of Nodes]
Scale free ramifications
  Properties are the same regardless of network size
  Some well-connected nodes act as hubs, and it doesn't take many connections to get from any node to any other node.

Small world theory originated with Stanley Milgram in the 1960s. An example of his experiments: a source person in Nebraska was given a letter to deliver to a target person in Massachusetts. The source was given basic information about the target, e.g. name, address, and occupation. The source was instructed to send the letter to a person he knew on a first-name basis in an effort to get the letter to the target. The recipient of the letter was given the same instructions. Over many trials, the average number of intermediate steps to get from the source to the target was found to be between 5 and 6. Hence the popular "six degrees of separation" principle.

Connectedness of a random graph decays steadily as random nodes fail, eventually breaking into unconnected subgroups that cannot communicate. Connectedness of a scale free graph may show little degradation with random node failure. The hubs, which are statistically unlikely to fail under random circumstances, keep the network connected. If a random failure does occur in a hub, then the network may fail catastrophically. In the case of targeted attacks, especially if those attacks are aimed at the hubs, the scale free network fails catastrophically.

Source: Stocker, R., Green, D. G., and Newth, D. (2001) Consensus and Cohesion in Simulated Social Networks, Journal of Artificial Societies and Social Simulation, Vol 4, No 4.
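
The contrast between random failure and a targeted attack on hubs can be tried out on a small synthetic network. The sketch below is only an illustration under stated assumptions: it grows the network by preferential attachment (new nodes prefer already well-connected nodes), which is one common way to obtain hub-dominated networks but is not named in the slides, and the sizes, seeds, and function names are arbitrary choices.

```python
import random
from collections import deque


def preferential_attachment(n, m, seed=0):
    """Grow a network: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    stubs = []  # a node appears once per link end, so choice() is degree-weighted
    # Start from a small fully connected core of m + 1 nodes.
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            adj[a].add(b)
            adj[b].add(a)
            stubs += [a, b]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj


def largest_component(adj, removed):
    """Size of the largest connected group once the removed nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


adj = preferential_attachment(n=500, m=2, seed=1)
k = 25  # number of nodes taken out in each scenario (5% of the network)
random_failures = random.Random(2).sample(sorted(adj), k)
hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

print("largest hub degree:", max(len(nb) for nb in adj.values()))
print("still connected after random failures:", largest_component(adj, random_failures))
print("still connected after hub attack:", largest_component(adj, hubs))
```

Comparing the last two printed numbers shows how much more of the network stays connected when failures are random than when the best-connected nodes are removed on purpose.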

7 How does SNA work?
Roles
  Leaders
  Gatekeepers
  Outliers
Structure
  Subgroups
  Bridges
  Holes
Processes
  Flow
  Evolution
  Resilience
[Figure: network forms labeled Scattered Clusters, Hub, Multi-Hub, Core/periphery]
Gatekeepers have many connections and are central
Source: Krebs, V. and Holley, J. Building Sustainable Communities Through Network Building.

8 How does SNA work?
Node Metrics
  Degree
  Betweenness
  Closeness
[Figure: kite network with nodes Jane, Boy, Tarzan, Cheetah, Jungle Jim, George, Ursula, Buli, Tamba, Bonzo. Kite Network developed by David Krackhardt]

At first glance, it may appear that Cheetah is the most powerful member of this network. He has the most direct connections and so has high degree, but all of the connections are within his own clique. In terms of importance in a network, how many connections one has is not as important as who those connections lead to and how they connect the otherwise unconnected.

Jungle Jim has few direct ties, so he has low degree, but he has high betweenness. In many ways, he has one of the most powerful positions in this network. He connects two cliques that would otherwise be disconnected. He is a broker. Unfortunately, he is also a single point of failure.

Tarzan and Tamba have fewer ties than Cheetah, but through their direct and indirect ties they can access the entire network in fewer steps than anyone else and so have high closeness. They have the best view of what is happening in the network.

Tarzan, Tamba and Jungle Jim have high network metrics and are known as boundary spanners. They span the boundary between two cliques. They have visibility and access into both cliques and therefore hold powerful positions in the network.

George and Ursula are peripheral players in this network, so they have low network metrics. This does not necessarily mean that they are not important. They may be boundary spanners for other cliques or networks that lie outside this network's boundary.
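
The three node metrics can be computed by hand on a ten-node kite network. The sketch below is an illustration, not the toolkit's implementation. The link list is the standard Krackhardt kite topology; the transcript does not list the links, so the positions of Jane, Boy, Buli and Bonzo are assumptions, and the other six names are placed to match the description in the notes. Betweenness uses Brandes' algorithm, and closeness is taken as (n - 1) divided by the total hop distance to all other nodes, which is one common definition.

```python
from collections import deque

# Assumed link list (Krackhardt kite topology); see the caveat above.
EDGES = [
    ("Jane", "Boy"), ("Jane", "Buli"), ("Jane", "Cheetah"), ("Jane", "Tarzan"),
    ("Boy", "Cheetah"), ("Boy", "Bonzo"), ("Boy", "Tamba"),
    ("Buli", "Cheetah"), ("Buli", "Tarzan"),
    ("Cheetah", "Bonzo"), ("Cheetah", "Tarzan"), ("Cheetah", "Tamba"),
    ("Bonzo", "Tamba"), ("Tarzan", "Tamba"),
    ("Tarzan", "Jungle Jim"), ("Tamba", "Jungle Jim"),
    ("Jungle Jim", "George"), ("George", "Ursula"),
]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs(adj, source):
    """Hop distances, shortest-path counts and visit order from source."""
    dist = {source: 0}
    sigma = {source: 1}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma, order


def degree(adj):
    """Number of direct ties per node."""
    return {v: len(nb) for v, nb in adj.items()}


def closeness(adj):
    """(n - 1) / total hop distance to all other nodes (connected graph assumed)."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs(adj, v)[0].values()) for v in adj}


def betweenness(adj):
    """Brandes' algorithm: for each node, the (fractional) number of shortest
    paths between pairs of other nodes that pass through it."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist, sigma, order = bfs(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in adj[w]:
                if dist[v] == dist[w] - 1:  # v precedes w on a shortest path
                    delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each undirected pair counted twice


adj = build_adjacency(EDGES)
deg, clo, btw = degree(adj), closeness(adj), betweenness(adj)
for name in sorted(adj, key=deg.get, reverse=True):
    print(f"{name:11s} degree={deg[name]} closeness={clo[name]:.3f} betweenness={btw[name]:.2f}")
```

Under this assumed layout the ranking matches the notes: Cheetah leads on degree, Jungle Jim on betweenness, and Tarzan and Tamba on closeness.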

9 How does SNA work?
Node Metrics
  Degree
  Betweenness
  Closeness
Network Metrics
  Degree distribution
  Betweenness distribution
  Average path length
  Clustering coefficient
[Figure: kite network with nodes Jane, Boy, Tarzan, Cheetah, Jungle Jim, George, Ursula, Buli, Tamba, Bonzo. Kite Network developed by David Krackhardt]

Degree distribution: compute the distribution of degree over all nodes. This is usually compared to the Power Law curve and characterized by the exponent of that equation.

Betweenness distribution: compute the distribution of betweenness over all nodes. Also follows the Power Law.

Average path length: compute the shortest path between each pair of nodes and average. This is the "degree of separation" and is a characteristic property of small worlds. Average path length is typically between 2 and 8 for small world networks.

Clustering coefficient: given a triad of nodes with at least two connections present, what is the probability that the third connection also exists? That is, what is the probability that a friend of my friend is also my friend? Typically compared to the clustering coefficient for a random graph with the same number of nodes. Higher clustering coefficients indicate higher "cliquishness" in the network.
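
The network-level metrics can be sketched on a ten-node kite network as well. This is a minimal illustration under the assumption that the links follow the standard Krackhardt kite topology; the transcript does not list them, so the positions of Jane, Boy, Buli and Bonzo are guesses. The clustering coefficient follows the triad definition given in the notes (sometimes called transitivity), and link density is included because it is the clustering one would expect from a random graph with the same numbers of nodes and links.

```python
from collections import Counter, deque
from itertools import combinations

# Assumed link list (Krackhardt kite topology); node placement is partly a guess.
EDGES = [
    ("Jane", "Boy"), ("Jane", "Buli"), ("Jane", "Cheetah"), ("Jane", "Tarzan"),
    ("Boy", "Cheetah"), ("Boy", "Bonzo"), ("Boy", "Tamba"),
    ("Buli", "Cheetah"), ("Buli", "Tarzan"),
    ("Cheetah", "Bonzo"), ("Cheetah", "Tarzan"), ("Cheetah", "Tamba"),
    ("Bonzo", "Tamba"), ("Tarzan", "Tamba"),
    ("Tarzan", "Jungle Jim"), ("Tamba", "Jungle Jim"),
    ("Jungle Jim", "George"), ("George", "Ursula"),
]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(nb) for nb in adj.values())


def average_path_length(adj):
    """Mean shortest-path hop count over all pairs of distinct nodes
    (connected graph assumed)."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    n = len(adj)
    return total / (n * (n - 1))


def clustering_coefficient(adj):
    """Of all triads with at least two links (a centre node and two of its
    neighbours), the fraction in which the third link is also present."""
    triads = 0
    closed = 0
    for centre, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triads += 1
            if b in adj[a]:
                closed += 1
    return closed / triads


def density(adj):
    """Fraction of possible links that exist; the clustering expected from a
    random graph with the same numbers of nodes and links."""
    n = len(adj)
    links = sum(len(nb) for nb in adj.values()) / 2
    return links / (n * (n - 1) / 2)


adj = build_adjacency(EDGES)
print("degree distribution:", sorted(degree_distribution(adj).items()))
print("average path length:", round(average_path_length(adj), 3))
print("clustering coefficient:", round(clustering_coefficient(adj), 3))
print("random-graph baseline (density):", round(density(adj), 3))
```

A clustering coefficient above the density baseline is the kind of comparison the notes describe as indicating "cliquishness".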

10 Aerospace Data Mining Toolkit
[Diagram labels: Input Data, Cleansing & Conditioning, Link Discovery, Analysis, Results, Pattern Analysis, Predictive, Social Network, Pattern-based query, Entity-based, All links, Warehouse, Iterative query]

11 Temporal Network

12 Temporal Network
Shown to the POC.
The correlation shows similar behavior, but it is not a strong indicator of cooperative behavior. The behavior is indicative of shoplifting to support a drug habit.
Correlated gaps in activity are produced by dropouts in the database.
The POC suggested narrowing the correlation to transactions within two hours at the same pawn shop, but the DC data isn't time stamped.
The POC would like to home in on burglary rings rather than low-level boosters.
Time of day and day of week displays: shoplifters tend to pawn all day long. Burglaries are often committed overnight, so pawning takes place primarily in the morning hours.
Need to get more out of data that has only a date. Try more sophisticated correlation using pattern matching.
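
One simple way to turn date-only pawn records into links between people is to count how often two people appear at the same shop on the same day. The sketch below is a hypothetical illustration of that idea, not the method used in the project: the record layout, the identifiers, and the sample transactions are all invented for the example.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical records: (person id, pawn shop id, transaction date). Not real data.
TRANSACTIONS = [
    ("P1", "ShopA", date(2005, 3, 1)),
    ("P2", "ShopA", date(2005, 3, 1)),
    ("P1", "ShopA", date(2005, 3, 8)),
    ("P2", "ShopA", date(2005, 3, 8)),
    ("P3", "ShopB", date(2005, 3, 8)),
    ("P1", "ShopB", date(2005, 3, 9)),
    ("P3", "ShopB", date(2005, 3, 9)),
]


def co_occurrence_links(transactions):
    """For each pair of people, count the (shop, date) combinations at which
    both of them pawned something."""
    visitors = {}
    for person, shop, day in transactions:
        visitors.setdefault((shop, day), set()).add(person)
    links = Counter()
    for people in visitors.values():
        for pair in combinations(sorted(people), 2):
            links[pair] += 1
    return links


links = co_occurrence_links(TRANSACTIONS)
for (a, b), count in links.most_common():
    print(a, b, "shared shop-days:", count)
```

With time-stamped data, the same grouping key could be narrowed from a whole day to a window such as two hours; the resulting weighted links are the kind of input a social network analysis could then filter.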

13 Summary
Analysts are overwhelmed with massive amounts of data, and automated filtering and analysis is needed.
SNA can be used to filter and analyze data in the context of relationships between entities.
Aerospace projects are applying data mining and SNA in novel ways to address the needs of our public safety partners.
Because of the versatility of SNA, applications to new problems and domains are expected in the future.

