Extract: Mining Social Features from WLAN Traces A Gender-Based Case Study Udayan Kumar and Ahmed Helmy Computer and Information Sciences and Engineering,


Extract: Mining Social Features from WLAN Traces: A Gender-Based Case Study
Udayan Kumar and Ahmed Helmy
Computer and Information Sciences and Engineering, University of Florida, Gainesville

Introduction
Mobile services are becoming more ubiquitous and human-centric. – Devices (e.g., mobile phones) act as sensors, and human society as the sensor network. Challenge: to realistically model social behavior for the simulation, evaluation and design of future mobile networks. Approach: capture and analyze extensive network traces. – How much information can be extracted from the traces? As a first step, we present systematic methods that use WLAN usage traces to classify users into social groups based on features such as gender and study major.

Outline
1. Traces Used
2. Classification Methods
   I. Location-Based Classification
      A. Individual Behavior Based Filter
      B. Group Behavior Based Filter
      C. Hybrid Filter
      D. Validation of the Location-Based Method
   II. Name-Based Classification
      A. Validation of the Location-Based Method via Name-Based Classification
3. User Behavior Analysis
   I. Spatial Distribution
   II. Temporal Distribution
   III. Device Preferences
4. Applications
5. Conclusion and Future Work

1. Traces Analyzed
We consider WLAN association traces from two university campuses, U1 and U2 (names omitted for privacy). The traces provide the following information: MAC address, association start time, duration, and location (AP name). Traces from U2 also provide usernames.

University | Time period | Users | Access points
U1 | Feb 2006 to Feb 2007 | ~20K | 150
U2 | Nov 2007 to Apr 2008 | ~30K | 700
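To make the trace fields concrete, here is a minimal sketch of loading association records. The CSV layout (`mac,start,duration,ap`), the time units and the AP names are assumptions for illustration only; the actual U1/U2 trace formats are not given in the slides.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One WLAN association session (hypothetical schema)."""
    mac: str       # anonymized MAC address of the device
    start: int     # association start time (assumed: Unix seconds)
    duration: int  # session duration (assumed: seconds)
    ap: str        # access point name, which identifies the building


def parse_sessions(text: str) -> list[Session]:
    """Parse CSV rows of the assumed form mac,start,duration,ap."""
    sessions = []
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        mac, start, duration, ap = (field.strip() for field in row)
        sessions.append(Session(mac, int(start), int(duration), ap))
    return sessions


# Hypothetical sample; real trace files will differ.
sample = """# mac,start,duration,ap
aa:01,1139000000,1800,FratA-AP1
bb:02,1139003600,600,SorB-AP2
"""
for s in parse_sessions(sample):
    print(s.mac, s.ap, s.duration)
```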

1. Trace Analysis
A trace does not provide personal information about the users, such as gender or study major. How can we use the traces to classify users into groups based on gender or study major? How much information can we extract from these published data sets?

2-I. Location-Based Classification (LBC)*
US university campuses have fraternities and sororities: fraternities house males and sororities house females. (Other campuses may have separate male and female housing, and urban areas may have gender-biased places.) A user associating with a fraternity AP is therefore likely male (and vice versa for females). But what about visitors? We need filtering!
* Results shown are based on traces from U1.

2-I. Filtering
We use two filters: individual-behavior-based and group-behavior-based. A. The Individual Behavior Filter (IBF) considers the fraction of a user's associations that are with APs in a given building, relative to the user's total associations. B. The Group Behavior Filter (GBF) considers a user's associations with APs in a building relative to those of all other users associating with the same APs.

Individual Behavior Based Filtering (IBF)
We use two metrics, one based on session counts and one based on session durations, where:
C_f(u) is the number of sessions of user u in fraternities,
C_s(u) is the number of sessions of user u in sororities,
D_f(u) is the duration of user u's sessions in fraternities,
D_s(u) is the duration of user u's sessions in sororities.
Consider all users visiting fraternities and sororities. A sharp drop in the sorted curve indicates the division between two groups; users with PCD/PCM > 0.8 are considered regular users.
[Figure: Users visiting fraternities and/or sororities in decreasing order of their male probability (U1, Feb 2006), with regular users marked]
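The slide's formulas for the two metrics did not survive extraction. The sketch below assumes the count-based male probability is C_f(u) / (C_f(u) + C_s(u)) and the duration-based one is D_f(u) / (D_f(u) + D_s(u)), and that a regular user must exceed the 0.8 threshold on both metrics. These definitions, the `(user, ap, duration)` tuple format and the `is_frat`/`is_sor` predicates are hypothetical, not the authors' exact method.

```python
from collections import defaultdict


def male_probabilities(sessions, is_frat, is_sor):
    """Per-user (count-based, duration-based) male probability.

    Assumed definitions: P_count = C_f / (C_f + C_s), P_dur = D_f / (D_f + D_s).
    sessions: iterable of (user, ap, duration) tuples; other APs are ignored.
    """
    c_f, c_s = defaultdict(int), defaultdict(int)
    d_f, d_s = defaultdict(float), defaultdict(float)
    for user, ap, dur in sessions:
        if is_frat(ap):
            c_f[user] += 1
            d_f[user] += dur
        elif is_sor(ap):
            c_s[user] += 1
            d_s[user] += dur
    probs = {}
    for user in set(c_f) | set(c_s):  # users seen in a fraternity or sorority
        p_count = c_f[user] / (c_f[user] + c_s[user])
        total_d = d_f[user] + d_s[user]
        p_dur = d_f[user] / total_d if total_d > 0 else p_count
        probs[user] = (p_count, p_dur)
    return probs


def ibf_classify(probs, threshold=0.8):
    """Split users into regular fraternity (male) and sorority (female) users."""
    males = {u for u, (pc, pd) in probs.items()
             if pc > threshold and pd > threshold}
    females = {u for u, (pc, pd) in probs.items()
               if 1 - pc > threshold and 1 - pd > threshold}
    return males, females


sessions = [("u1", "FratA", 3600), ("u1", "FratA", 1800),
            ("u2", "SorB", 3000), ("u2", "SorB", 2000),
            ("u3", "FratA", 500), ("u3", "SorB", 500)]  # u3 looks like a visitor
probs = male_probabilities(sessions, lambda ap: ap.startswith("Frat"),
                           lambda ap: ap.startswith("Sor"))
print(ibf_classify(probs))
```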

I-B. Group Behavior Based Filtering (GBF)
To filter using group behavior, we use clustering, specifically the PAM* (Partitioning Around Medoids) algorithm. PAM also provides methods for measuring clustering quality.
* L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, March

I-B. Group Behavior Based Filtering
We use three metrics for clustering: the number of distinct days of login, the number of sessions, and the duration of sessions. Using this clustering technique, we are able to distinguish between groups of users.
[Figure: Clustering results for U1 sororities (Feb 2006)]
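A compact, textbook-style PAM (greedy BUILD, then best-improvement SWAP) applied to the three per-user metrics is sketched below. Euclidean distance and z-score standardization are assumptions made here so that session durations do not dominate the distance; the slides do not state which dissimilarity or scaling was used, and the sample users are invented.

```python
import math


def standardize(rows):
    """Z-score each column so no metric dominates the distance (assumption)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def pam(points, k, max_iter=100):
    """Partitioning Around Medoids: returns (medoid indices, cluster labels)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    def cost(meds):
        # Total distance of every point to its nearest medoid.
        return sum(min(d[i][m] for m in meds) for i in range(n))

    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        best = min((j for j in range(n) if j not in medoids),
                   key=lambda j: cost(medoids + [j]))
        medoids.append(best)

    # SWAP: apply the best medoid/non-medoid exchange while it lowers the cost.
    current = cost(medoids)
    for _ in range(max_iter):
        best_swap, best_cost = None, current
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                c = cost(trial)
                if c < best_cost:
                    best_swap, best_cost = trial, c
        if best_swap is None:
            break  # no improving swap: local optimum reached
        medoids, current = best_swap, best_cost

    labels = [min(range(k), key=lambda m: d[i][medoids[m]]) for i in range(n)]
    return medoids, labels


# Hypothetical users: (distinct login days, sessions, total seconds).
users = [(25, 120, 300000), (28, 140, 320000), (27, 130, 310000),
         (2, 3, 4000), (1, 2, 2500), (3, 4, 5000)]
medoids, labels = pam(standardize(users), k=2)
print(labels)
```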

I-C. Results and Hybrid Filter (HF)
[Table: Total users, males, females and common users identified by IBF, GBF and HF at U1 for Feb 2006, Oct 2006 and Feb 2007]
Hybrid Filter: based on the intersection of the IBF and GBF results. This gives us greater confidence in the results.
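Because the hybrid filter is defined as the intersection of the IBF and GBF results, it reduces to a set operation. The sketch assumes each filter's output is a dictionary mapping user to `"M"` or `"F"` (a hypothetical format) and keeps a user only when both filters agree on the label.

```python
def hybrid_filter(ibf, gbf):
    """Keep only users that IBF and GBF both select with the same label.

    ibf, gbf: dicts mapping user -> "M" or "F" (hypothetical format).
    """
    return {u: g for u, g in ibf.items() if gbf.get(u) == g}


ibf = {"a": "M", "b": "F", "c": "M"}
gbf = {"a": "M", "b": "M", "d": "F"}
print(hybrid_filter(ibf, gbf))
```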

I-D. Validation of LBC
To increase confidence in LBC, validation is needed. However, validation against ground truth is difficult: MAC addresses are anonymized, surveys may be incorrect, and universities do not provide such data due to privacy concerns. Instead, we devised three statistical techniques to validate LBC (the third one is presented after Name-Based Classification).

I-D-a. Temporal Consistency
Classification should remain consistent over a time period (adjacent months in the same semester). If our filtering increases the consistency (similarity) between months, it is likely classifying correctly.
Similarity between months, for users visiting sororities:
Month a | Month b | Before filtering | IBF | GBF | HF
Feb 2006 | Mar-Apr | … | 87.7% | 92.7% | 92.4%
Oct 2006 | Nov | … | 80.9% | 87.6% | 88.3%
Feb 2007 | Mar-Apr | … | 81.9% | 92.3% | 90.4%
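The slides do not define the similarity measure. As an assumption, the sketch below uses the fraction of users classified in month a who are classified again in month b. It illustrates why filtering out one-off visitors can raise this value; the user sets are invented.

```python
def temporal_consistency(set_a, set_b):
    """Fraction of users classified in month a who are classified again in month b.

    This overlap measure is an assumption; the original definition is not given.
    """
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)


# Hypothetical sorority users before filtering: visitors c, d, e do not return.
print(temporal_consistency({"a", "b", "c", "d", "e"}, {"a", "b", "f", "g"}))
# After filtering, only regular users remain in month a.
print(temporal_consistency({"a", "b"}, {"a", "b", "f"}))
```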

I-D-b. IBF vs. GBF
Both filtering techniques should capture the same set of users, so we compare their results. The comparison shows that more than 75% of the users are the same.
[Table: Number of male and female users identified by IBF, by GBF, and by both (IBF ∩ GBF) for Feb 2006, Oct 2006 and Feb 2007]
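One way to express "more than 75% of the users are the same" is the share of each filter's users that the other filter also selects. Measuring agreement relative to each set separately is an assumption; the slides do not say which denominator was used.

```python
def agreement(ibf_users, gbf_users):
    """Share of each filter's users that the other filter also selects."""
    common = ibf_users & gbf_users
    share_ibf = len(common) / len(ibf_users) if ibf_users else 0.0
    share_gbf = len(common) / len(gbf_users) if gbf_users else 0.0
    return share_ibf, share_gbf


# Hypothetical female users selected by each filter in one month.
print(agreement({"a", "b", "c", "d"}, {"a", "b", "c", "e"}))
```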

2-II. Name-Based Classification (NBC)
In this technique, we augment the traces with external data. Traces from U2 provide usernames. We combine the usernames with the publicly available phone-book directory maintained by U2 to obtain the users' names (users have an opt-out option). Next, we match these names against the most common male and female names published by the US Social Security Administration to determine each user's gender.
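The name-matching step can be sketched as a directory lookup followed by a first-name check. The `directory` dictionary stands in for the phone book, the name sets stand in for the SSA lists, and treating names that appear on both lists as unclassified is an assumption; all data below is hypothetical.

```python
def classify_by_name(username, directory, male_names, female_names):
    """Infer gender from a username via a directory lookup (hypothetical data).

    directory: dict username -> full name, standing in for the phone book.
    male_names / female_names: sets of lower-case common first names.
    Returns "M", "F", or None when the user is unlisted or the name is ambiguous.
    """
    full_name = directory.get(username)
    if not full_name:
        return None  # user opted out or is not listed
    first = full_name.split()[0].lower()
    in_m, in_f = first in male_names, first in female_names
    if in_m and not in_f:
        return "M"
    if in_f and not in_m:
        return "F"
    return None  # unisex or unknown first name


directory = {"jdoe": "John Doe", "asmith": "Alice Smith", "tlee": "Taylor Lee"}
male_names, female_names = {"john", "taylor"}, {"alice", "taylor"}
for user in ["jdoe", "asmith", "tlee", "nobody"]:
    print(user, classify_by_name(user, directory, male_names, female_names))
```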

2-II. Name-Based Classification (NBC)
[Table: Total users, males (NBC) and females (NBC) at U2 for Nov 2007 and Apr 2008]
Compared to NBC, LBC requires less information (usernames are not needed). NBC provides a way to validate LBC. The use of NBC is limited, since usernames are available in very few of the currently available traces. Once the correctness of LBC is established, LBC can become the primary method for classification.

II-A. Cross-Validation of LBC
We compared LBC with NBC using traces from U2. The advantage is that NBC has a low classification error rate. The classification error is calculated using:
F_L: users classified as female by LBC
M_L: users classified as male by LBC
F_N: users classified as female by NBC
M_N: users classified as male by NBC
[Table: Female classification (F_L, F_L ∩ M_N, E_f) and male classification (M_L, M_L ∩ F_N, E_m) for Nov and Apr]
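The error formula on the slide was lost. The table's column headers suggest E_f = |F_L ∩ M_N| / |F_L| and E_m = |M_L ∩ F_N| / |M_L|, that is, the share of LBC's labels that NBC contradicts. The sketch assumes exactly that, and the user sets are invented.

```python
def lbc_error_rates(f_l, m_l, f_n, m_n):
    """Error rates of LBC measured against NBC (assumed formulas).

    E_f = |F_L ∩ M_N| / |F_L|  (LBC says female, NBC says male)
    E_m = |M_L ∩ F_N| / |M_L|  (LBC says male, NBC says female)
    """
    e_f = len(f_l & m_n) / len(f_l) if f_l else 0.0
    e_m = len(m_l & f_n) / len(m_l) if m_l else 0.0
    return e_f, e_m


f_l, m_l = {"a", "b", "c", "d"}, {"e", "f"}  # hypothetical LBC output
f_n, m_n = {"a", "b", "c"}, {"d", "e"}       # hypothetical NBC output
print(lbc_error_rates(f_l, m_l, f_n, m_n))
```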

3. User Behavior Analysis
The traces allow us to follow each user across the entire trace, so we can track classified users throughout the campus and study their network usage behavior! We consider: – User spatial distribution – Temporal analysis – Device preferences

3-I. Spatial Distribution
[Figure panels: U1, U2]
Both universities show more females in Social Sciences and Sports buildings, and more males in Economics and Engineering buildings. Trends at Music buildings are inconsistent.
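A per-building gender breakdown like the one above can be computed by counting distinct classified users per building. The `(user, building)` visit pairs, the label dictionary and counting each user once per building are assumptions for illustration; the data is invented.

```python
from collections import defaultdict


def female_share_by_building(visits, gender):
    """Fraction of classified users seen in each building who are female.

    visits: iterable of (user, building) pairs; gender: dict user -> "M"/"F".
    Unclassified users are ignored; each user counts once per building.
    """
    seen = defaultdict(set)
    for user, building in visits:
        if user in gender:
            seen[building].add(user)
    return {b: sum(gender[u] == "F" for u in users) / len(users)
            for b, users in seen.items()}


visits = [("a", "Eng"), ("a", "Eng"), ("b", "Eng"), ("c", "Eng"),
          ("c", "Soc"), ("d", "Soc"), ("x", "Soc")]
gender = {"a": "M", "b": "M", "c": "F", "d": "F"}
print(female_share_by_building(visits, gender))
```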

3-II. Temporal Analysis (Session Durations)
[Figure panels: U1, U2]
On average, males have longer session durations. Overall, session durations are getting shorter over time (are users becoming more mobile?).
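The comparison of session durations by gender amounts to grouping sessions by the inferred label and averaging. The `(user, seconds)` pairs and the label dictionary are hypothetical inputs, and unclassified users are simply skipped.

```python
from statistics import mean


def mean_duration_by_gender(sessions, gender):
    """Average session duration per gender; sessions are (user, seconds) pairs."""
    by_gender = {"M": [], "F": []}
    for user, seconds in sessions:
        g = gender.get(user)
        if g in by_gender:
            by_gender[g].append(seconds)
    return {g: mean(v) for g, v in by_gender.items() if v}


sessions = [("a", 3600), ("a", 1800), ("b", 1200), ("c", 999)]  # c is unclassified
print(mean_duration_by_gender(sessions, {"a": "M", "b": "F"}))
```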

3-III. Device Preference
Using the MAC address, one can find the manufacturer of a device. Our analysis at U1 shows (with statistical significance) that females prefer Apple computers over PCs. However, no such preference is shown at U2 for the general population. We also see a decreasing trend in the number of users with external adapters from vendors such as Enterasys, Linksys and D-Link: most users are getting built-in Wi-Fi.
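The manufacturer lookup relies on the first three octets of a MAC address (the OUI). The sketch below assumes that anonymization preserved the OUI and uses a placeholder OUI-to-vendor table; real assignments come from the IEEE registry. It computes vendor shares per gender but does not show a significance test.

```python
def oui(mac):
    """Return the OUI (first three octets) of a MAC address, upper-cased."""
    octets = mac.replace("-", ":").upper().split(":")
    return ":".join(octets[:3])


def vendor_share(macs_by_gender, oui_to_vendor, vendor):
    """Fraction of each gender's devices whose OUI maps to `vendor`."""
    shares = {}
    for g, macs in macs_by_gender.items():
        hits = sum(oui_to_vendor.get(oui(m)) == vendor for m in macs)
        shares[g] = hits / len(macs) if macs else 0.0
    return shares


# Placeholder OUIs, not real IEEE assignments.
table = {"AA:AA:AA": "Apple", "BB:BB:BB": "Intel"}
macs = {"F": ["aa:aa:aa:01:02:03", "aa-aa-aa-04-05-06", "bb:bb:bb:00:00:01"],
        "M": ["bb:bb:bb:11:22:33", "aa:aa:aa:99:88:77"]}
print(vendor_share(macs, table, "Apple"))
```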

3-III. Device Preference (U1)
Females prefer Apple computers over Intel-based ones!

4. Applications
Mobility models – incorporate the effects of building context, 'behavioral' aspects, load (session durations) and density, among others, on correlated collective/group behavior. Protocol design – the effects of group behavior can be incorporated into protocol design for mobile networks. Privacy – the gender of users could be inferred/extracted from anonymized traces!

5. Conclusion & Future Work
We introduced new methods to classify users into social groups based on features such as gender and study major. We applied our methods to traces collected from two different university campuses. The methods are able to capture major differences in group behavior (mobility, vendor preference). Issues of privacy and anonymity arise when dealing with wireless network traces [UH09]. This study opens the door to other mobile social networking studies and to profile-based service designs based on sensing human society.

Thank you! Ahmed Helmy URL:

Appendix

Sororities and Fraternities NumberU1U2 Sorority713 Fraternity125 27

PAM
PAM attempts to minimize dissimilarity within clusters. It provides a technique called silhouette widths, along with a silhouette plot, to measure the quality of the clusters. The average silhouette width can be used to estimate clustering quality: above 0.70 indicates strong clustering, between 0.50 and 0.70 a reasonable structure, and below 0.50 a weak structure. All clusters we found had a quality above 0.65.
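The average silhouette width can be computed directly from its Kaufman and Rousseeuw definition, s(i) = (b − a) / max(a, b). Here a is the mean distance from point i to the other members of its cluster, and b is the smallest mean distance from i to the members of any other cluster. The sketch assumes Euclidean distance, at least two clusters, and s(i) = 0 for singleton clusters. It maps the result onto the thresholds quoted above. The points are invented.

```python
import math


def average_silhouette(points, labels):
    """Mean silhouette width with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    widths = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            widths.append(0.0)  # singleton cluster: width defined as 0
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(sum(math.dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


def structure(avg_width):
    """Interpret the average width using the thresholds quoted above."""
    if avg_width > 0.70:
        return "strong"
    if avg_width >= 0.50:
        return "reasonable"
    return "weak"


avg = average_silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
print(round(avg, 3), structure(avg))
```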

3-III. Device Preference (U2)
No gender bias is noticed at U2.