Discovery, Analysis and Monitoring of Hidden Social Networks and their Evolution Malik Magdon-Ismail Rensselaer Polytechnic Institute.

Presentation transcript:

Discovery, Analysis and Monitoring of Hidden Social Networks and their Evolution Malik Magdon-Ismail Rensselaer Polytechnic Institute

2 Our Group
- M. Goldberg
- M. Magdon-Ismail
- B. Szymanski
- A. Wallace
Students:
- Mykola Hayvanovich
- Apirak Hoonlor
- Stephen Kelley
- Konstantin Mertsalov

3 Motivation
Communications supporting IED planning have patterns and are correlated. Analysis of these patterns can reveal the groups as well as their internal structure.

4 Communications
Time: January 12, 2005, 09:35
From:
To:
Subject: Hello
Message: Where have you been?

[16:06:31] Republicans were the worst pacifists before ww1 and ww2
[16:06:43] France Fries
[16:06:50] As a generality, of course their were Republican Hawks.
[16:07:13] Sweet, good pun but bad story!
[16:07:18] yup
[16:07:23] anyways, he's perpetually tormented by presidential actions
[16:07:25] it aint good for no one
[16:07:47] I think they knew it was commiing
[16:07:51] Rossevelt met monthly in New York with mostly trusted Republicans to talk about how to get america into the war.
[16:08:10] and he spent 2 year with Churchill meeting him sometimes secretly in the ocean to discuss the same topic.
[16:08:22] Exchanging a lot of letters.
[16:08:25] telegrams
[16:08:28] There really is nothing like a shorn scrotum. It's breathtaking, I suggest you try it.
[16:08:55] Well they didnt literally meet in the ocean, they were on ships.

5 Streaming Example
Time   From     To         Message
10:00  Alice    Charlie    Golf tomorrow? Tell everyone.
10:05  Charlie  Felix      Alice mentioned golf tomorrow.
10:06  Alice    Bob        Hey, golf tomorrow. Spread the word.
10:12  Alice    Bob        Tee off: 8am at Pinehurst.
10:13  Felix    Grace      Hey guys, golf tomorrow.
10:13  Felix    Harry      Hey guys, golf tomorrow.
10:15  Alice    Charlie    Pinehurst Tee time: 8am.
10:20  Bob      Elizabeth  We’re playing golf tomorrow.
10:20  Bob      Dave       We’re playing golf tomorrow.
10:22  Charlie  Felix      Tee time 8am at Pinehurst
10:25  Bob      Elizabeth  We tee off 8am at Pinehurst.
10:25  Bob      Dave       We tee off 8am at Pinehurst.
10:31  Felix    Grace      Tee time 8am, Pinehurst.
10:31  Felix    Harry      Tee time 8am, Pinehurst.


7 Streaming Example
Time   From     To
10:00  Alice    Charlie
10:05  Charlie  Felix
10:06  Alice    Bob
10:12  Alice    Bob
10:13  Felix    Grace
10:13  Felix    Harry
10:15  Alice    Charlie
10:20  Bob      Elizabeth
10:20  Bob      Dave
10:22  Charlie  Felix
10:25  Bob      Elizabeth
10:25  Bob      Dave
10:31  Felix    Grace
10:31  Felix    Harry
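
The stream above keeps only time, sender and receiver, which is enough to build a communication graph. The following is a minimal Python sketch of one way to do that, not the SIGHTS implementation: it counts messages per directed sender/receiver pair inside a time window. The names `STREAM` and `build_graph` and the choice of message counts as edge weights are assumptions made for illustration.

```python
from collections import Counter

# (time, sender, receiver) triples copied from the streaming example.
STREAM = [
    ("10:00", "Alice", "Charlie"),
    ("10:05", "Charlie", "Felix"),
    ("10:06", "Alice", "Bob"),
    ("10:12", "Alice", "Bob"),
    ("10:13", "Felix", "Grace"),
    ("10:13", "Felix", "Harry"),
    ("10:15", "Alice", "Charlie"),
    ("10:20", "Bob", "Elizabeth"),
    ("10:20", "Bob", "Dave"),
    ("10:22", "Charlie", "Felix"),
    ("10:25", "Bob", "Elizabeth"),
    ("10:25", "Bob", "Dave"),
    ("10:31", "Felix", "Grace"),
    ("10:31", "Felix", "Harry"),
]


def build_graph(stream, start="00:00", end="23:59"):
    """Count messages per directed (sender, receiver) pair within [start, end]."""
    edges = Counter()
    for time, sender, receiver in stream:
        # Zero-padded HH:MM strings compare correctly as plain strings.
        if start <= time <= end:
            edges[(sender, receiver)] += 1
    return edges


graph = build_graph(STREAM)
for (sender, receiver), count in sorted(graph.items()):
    print(f"{sender} -> {receiver}: {count} message(s)")
```

Restricting `start` and `end` gives the graph for a single time window, which is the kind of input a streaming or persistence analysis would compare across windows.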

8 Overview: SIGHTS & RDM
[Figure, RDM: example sequences "buy,trade...buy", "2trade...2trade", "3,sell...3,hell"; Pattern id = 2, Pattern = “buy,”; Pattern id = 3, Pattern = “2trade”; Level 0, Level 1, Level 2]
[Figure, SIGHTS: higher ranked leaders, group leader, subgroup leaders, members]

9 Communications
- Email, Telephone, Newsgroup, Weblog, Chatrooms, …
Time: January 12, 2005, 09:35
From:
To:
Subject: Hello
Message: Where have you been lately?

10 Communication Graph January 12, 2005, 09:35

11 Communication Graph What are the social groups/coalitions?

12 Social Groups are Clusters
- Clusters may overlap.

13 Social Groups are Clusters
- Clusters may overlap.
- A cluster is a locally defined object.

14 Social Groups are Clusters
- Clusters may overlap.
- A cluster is a locally defined object.
- Group members are more introverted than extroverted. (Figure labels: YES, NO)

15 Social Groups are Clusters
- Clusters may overlap.
- A cluster is a locally defined object.
- Group members are more introverted than extroverted.
- Social groups (clusters) persist.
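
The slides do not give a formula for "more introverted than extroverted". One plausible reading is that a group's members communicate more with each other than with outsiders. The Python sketch below scores a candidate cluster by the fraction of its incident edges that stay inside it. The function name `introversion`, the edge list and the ratio itself are assumptions for illustration, not the density measure used by SIGHTS.

```python
# Undirected edges taken from the streaming example (one per communicating pair).
EDGES = [
    ("Alice", "Charlie"),
    ("Charlie", "Felix"),
    ("Alice", "Bob"),
    ("Felix", "Grace"),
    ("Felix", "Harry"),
    ("Bob", "Elizabeth"),
    ("Bob", "Dave"),
]


def introversion(edges, cluster):
    """Fraction of edges touching the cluster that have both endpoints inside it."""
    cluster = set(cluster)
    internal = sum(1 for u, v in edges if u in cluster and v in cluster)
    # An edge is external when exactly one endpoint lies in the cluster.
    external = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    total = internal + external
    return internal / total if total else 0.0


print(introversion(EDGES, {"Felix", "Grace", "Harry"}))
print(introversion(EDGES, {"Alice", "Bob", "Charlie"}))
```

Because the score depends only on edges touching the cluster, it is a local quantity, and nothing in it prevents two high-scoring clusters from sharing members.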

16 SIGHTS
Statistical Identification of Groups Hidden in Time and Space: a system for statistical analysis of social coalitions in communication networks.

Data Sources
- Blogs
- Emails (Enron)
- Chatroom
- Synthetic data

Coalition Discovery
- Overlapping clustering
- Streaming groups
- Persistent groups

Coalition Analysis
- Leaders
- Opposing groups
- Topic matching

Visualizations
- Size-Density plots
- Static coalitions
- Dynamic coalitions

[Screenshot callouts: groups matching analyst topic in red; Size vs. Density plot; visualization options; choose time window; group members; different analyses on dataset; leader index]

17 Examples
Citeseer
Two clusters: Electric circuit design; Optimization of Neural Networks.
Intersection: “Sensitivity analysis in degenerate quadratic programming”

ENRON

Ali Baba Data Set (DoD)
GROUND TRUTH
- Group A: Dog, Vulture, Camel, Yassir Hussein, Bird, (6 others)
- Group B: Ahmet, Saleh Sarwuk, Shaid, Pavlammed Pavlah, Osan Domenik
SIGHTS
- Group A: Dog, Vulture, Camel, Gopher
- Group B: Ahmet, Saleh Sarwuk, Shaid, Dajik

18 Recursive Data Mining (RDM)
Build a classifier to identify the relationship between the sender and receiver of a message.
EXAMPLE: “Do you have time to meet some time this week?” “Lets meet 2pm today, ok?”
Which is advisor, which is student?

19 Hierarchical Pattern Construction (recursive definition)
Captures patterns; patterns of patterns; patterns of patterns of patterns… (can even capture long-range patterns)
[Figure: Pattern Definition; Larger patterns. Example sequences: "buy,trade...buy", "2trade...2trade", "3,sell...3,hell". Pattern id = 2, Pattern = “buy,”; Pattern id = 3, Pattern = “2trade”; Pattern id = 4, Pattern = “3,_ell”. Level 0, Level 1, Level 2]
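
The recursive idea on this slide can be illustrated with a deliberately simplified Python sketch. It is not the RDM algorithm itself: it assumes patterns are exact adjacent token pairs, picks the most frequent pair at each level, replaces every occurrence with a new symbol, and repeats on the rewritten sequence. The names `build_hierarchy`, `max_levels` and `min_count` and the symbols `P1`, `P2` are hypothetical.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most frequent adjacent pair and its count (ties: first seen)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None, 0
    return pairs.most_common(1)[0]


def replace_pair(tokens, pair, symbol):
    """Rewrite the sequence, substituting `symbol` for each occurrence of `pair`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def build_hierarchy(tokens, max_levels=3, min_count=2):
    """Repeatedly promote the dominant pair to a new symbol, one level at a time."""
    patterns = {}
    for level in range(1, max_levels + 1):
        pair, count = most_frequent_pair(tokens)
        if pair is None or count < min_count:
            break  # nothing frequent enough is left to promote
        symbol = f"P{level}"
        patterns[symbol] = pair
        tokens = replace_pair(tokens, pair, symbol)
    return tokens, patterns


tokens = "buy , trade sell buy , trade hold buy , trade".split()
compressed, patterns = build_hierarchy(tokens)
print(compressed)
print(patterns)
```

In this toy run `P2` is defined in terms of `P1`, which is the "pattern of patterns" effect: a higher-level symbol spans a longer stretch of the original sequence than any single pair could.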

20 A Classifier – Joining the Pieces
- Ensemble of classifiers
- Classifier for each level in the hierarchical approach
- Features gathered from the training messages
- Global features include average length and number of sentences
- Approximate matching allows treatment of noise
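
Approximate matching can be pictured with the pattern “3,_ell” from the pattern-construction slide. The Python sketch below assumes that `_` stands for exactly one arbitrary symbol, so a single noisy character does not break the match. That reading of the wildcard, and the function names `matches` and `find_approx`, are assumptions rather than details confirmed by the slides.

```python
def matches(pattern, text, wildcard="_"):
    """True if text equals pattern, letting the wildcard match any one symbol."""
    return len(pattern) == len(text) and all(
        p == wildcard or p == c for p, c in zip(pattern, text)
    )


def find_approx(pattern, sequence, wildcard="_"):
    """Return every start index where the pattern matches a window of the sequence."""
    n = len(pattern)
    return [
        i
        for i in range(len(sequence) - n + 1)
        if matches(pattern, sequence[i:i + n], wildcard)
    ]


print(find_approx("3,_ell", "3,sell...3,hell"))
```

Both "3,sell" and "3,hell" are reported as occurrences of the same pattern, so a classifier built on pattern counts would treat them as one feature.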

21 Results on Enron
Binary classification: for a given message m, is m sent by a person with role r? r ∈ {CEO, Manager, Trader, Vice-President}
Multi-class classification: for a given message m, which role r is the most likely for the sender? r ∈ {CEO, Manager, Trader, Vice-President}
The bars show the classification error. RDM_SVM universally outperforms the other classifiers.

22 Summing Up
- SIGHTS:
  - Structural; non-semantic; language independent
  - Finds groups, their dynamics and structure; visual analytic capabilities
- RDM:
  - Uses statistical semantics; language independent
  - Identifies roles within the group

23 Thank You