Published by Alice May Rogers. Modified over 9 years ago.
1
Extract: Mining Social Features from WLAN Traces
A Gender-Based Case Study
Udayan Kumar and Ahmed Helmy
Computer and Information Sciences and Engineering, University of Florida, Gainesville
{ukumar, helmy}@cise.ufl.edu
http://www.cise.ufl.edu/~helmy
2
Introduction
Mobile services are becoming more ubiquitous and human-centric
  – Devices (e.g., mobile phones) act as sensors, and human society as the sensor network
Challenge: realistically model social behavior for the simulation, evaluation and design of future mobile networks
Approach: capture and analyze extensive network traces
  – How much information can be extracted from the traces?
As a first step, we present systematic methods that use WLAN usage traces to classify users into social clusters based on features such as gender and study major
3
Outline
1. Traces Used
2. Classification Methods
   I. Location Based Classification
      A. Individual Behavior Based Filter
      B. Group Behavior Based Filter
      C. Hybrid Filter
      D. Validation of Location Based Method
   II. Name Based Classification
      A. Validation of Location Based Method via Name Based Classification
3. User Behavior Analysis
   I. Spatial Distribution
   II. Temporal Distribution
   III. Device Preferences
4. Applications
5. Conclusion and Future Work
4
1. Traces Analyzed
We consider WLAN association traces from two university campuses, U1 and U2 (names omitted for privacy)
The traces provide the following information
  – MAC address, association start time, duration, location/AP name
Traces from U2 also provide usernames

University   Time Period            Users   Access Points
U1           Feb 2006 to Feb 2007   ~20K    150
U2           Nov 2007 to Apr 2008   ~30K    700
5
1. Trace Analysis
A trace does not provide personal information about the users, such as gender or major.
How can we use the information it does provide to classify users into groups based on gender or study major?
How much information can we get from these published data sets?
6
2-I. Location Based Classification (LBC)*
US university campuses have fraternities and sororities. Fraternities house males and sororities house females. (Other campuses may have separate male and female housing, and urban areas may have places that are gender biased.)
A user's association with a fraternity AP suggests that the user is male (and likewise, association with a sorority AP suggests female).
But what about visitors? We need filtering!
* Results shown use traces from U1
7
2-I. Filtering
We use two filters: individual behavior based and group behavior based.
A. Individual Behavior Filter (IBF) considers the fraction of time a user associates with APs in a building with respect to the user's total associations.
B. Group Behavior Filter (GBF) considers a user's associations with APs in a building with respect to all the other users associating with the same APs.
8
Individual Behavior based Filtering (IBF)
We use two metrics, based on:
  – Counts
  – Duration
Consider all users visiting fraternities and sororities.
A sharp drop indicates the division between the two groups. (PCD/PCM > 0.8 considered regular users)
C_f(u): count of sessions in fraternities by user u
C_s(u): count of sessions in sororities by user u
D_f(u): duration of sessions in fraternities by user u
D_s(u): duration of sessions in sororities by user u
[Figure: Users visiting fraternities and/or sororities, in decreasing order of their male probability (U1, Feb 2006); the "Regular Users" region is marked]
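The slide's formula did not survive extraction, so the following is only a minimal sketch of an individual behavior filter. It assumes the male probability is the share of a user's fraternity/sorority sessions that fall in fraternities, C_f(u) / (C_f(u) + C_s(u)), and that the 0.8 threshold applies symmetrically to both genders. The function names and the "visitor" label are hypothetical.

```python
from typing import Optional


def male_probability(c_f: int, c_s: int) -> Optional[float]:
    """Assumed metric: fraction of fraternity/sorority sessions spent in fraternities."""
    total = c_f + c_s
    return c_f / total if total else None


def classify_ibf(c_f: int, c_s: int, threshold: float = 0.8) -> str:
    """Label a user as a regular male/female user, or as a visitor (assumed rule)."""
    p = male_probability(c_f, c_s)
    if p is None:
        return "unknown"      # never seen at a fraternity or sorority AP
    if p > threshold:
        return "male"         # regular fraternity user
    if (1.0 - p) > threshold:
        return "female"       # regular sorority user
    return "visitor"          # mixed behavior, filtered out


# Hypothetical session counts (C_f, C_s) for three users.
for counts in [(9, 1), (1, 9), (5, 5)]:
    print(counts, classify_ibf(*counts))
```

The same function could be applied to durations by passing D_f(u) and D_s(u) instead of the counts.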
9
I-B. Group Behavior based Filtering (GBF)
To filter by group behavior, we use clustering techniques.
We use the PAM* (Partitioning Around Medoids) algorithm.
PAM provides methods for measuring clustering quality.
* L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, March 1990.
10
I-B. Group Behavior based Filtering
We use 3 metrics for clustering: distinct days of login, number of sessions, and duration of sessions
Using this clustering technique, we are able to distinguish between groups of users
[Figure: Clustering results for University U1 sororities (Feb 2006)]
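As an illustration of clustering users on these three metrics, here is a minimal sketch. It is not the full PAM swap procedure: it uses the simpler alternating k-medoids scheme (assign each point to its nearest medoid, then re-pick each cluster's medoid). The feature values are hypothetical and assumed to be pre-scaled to [0, 1], and the deterministic start from the first k points is an arbitrary choice.

```python
from math import dist


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
            for p in points]


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids (a simplification, not the PAM swap search)."""
    medoids = list(range(k))  # assumed start: the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:               # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total distance to the cluster's members.
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:        # converged
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Hypothetical users: (distinct login days, number of sessions, session duration),
# each scaled to [0, 1]. Heavy users alternate with light users.
users = [(0.90, 0.80, 0.90), (0.10, 0.10, 0.05), (0.80, 0.90, 0.85),
         (0.05, 0.10, 0.10), (0.95, 0.85, 0.80), (0.10, 0.05, 0.10)]
medoids, labels = k_medoids(users, k=2)
print(medoids, labels)
```

In this toy data the heavy users and the light users end up in separate clusters, which mirrors the residents-versus-visitors split the filter is after.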
11
I-C. Results and Hybrid Filter (HF)

              U1 IBF                        U1 GBF                        U1 HF
              Feb 2006  Oct 2006  Feb 2007  Feb 2006  Oct 2006  Feb 2007  Feb 2006  Oct 2006  Feb 2007
Total Users   16416     22405     20302     16416     22405     20302     16416     22405     20302
Males         506       553       545       451       437       417       416       418       399
Females       513       570       509       441       456       410       435       453       406
Common        0         0         0         22        37        29        0         0         0

Hybrid Filter: based on the intersection of the results from IBF and GBF. This gives us more confidence in the results.
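The intersection step can be sketched with plain sets. The function name and user IDs below are hypothetical. Dropping users who land in both gender sets (the "Common" row) is an assumption, chosen because the table reports zero common users after hybrid filtering.

```python
def hybrid_filter(ibf_male, ibf_female, gbf_male, gbf_female):
    """Keep only users that both filters assign to the same gender."""
    male = ibf_male & gbf_male
    female = ibf_female & gbf_female
    common = male & female          # users labeled both male and female
    # Assumption: ambiguous users are excluded from both result sets.
    return male - common, female - common, common


# Hypothetical anonymized user IDs.
male, female, common = hybrid_filter(
    ibf_male={"a", "b", "c"}, ibf_female={"x", "y"},
    gbf_male={"b", "c", "d"}, gbf_female={"y", "z"})
print(sorted(male), sorted(female), sorted(common))
```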
12
I-D. Validation of LBC
To increase confidence in LBC, validation is needed.
However, validation against ground truth is difficult. (MAC addresses are anonymized, surveys may be inaccurate, and universities don't provide data due to privacy issues.)
Instead, we devised 3 statistical techniques to validate LBC. (The third is presented after Name Based Classification.)
13
I-D-a. Temporal Consistency
Classification should remain consistent over a time period (adjacent months in the same semester).
If our filtering increases the consistency or similarity, it is likely classifying correctly.

Month a    Month b        Before filtering   IBF     GBF     HF
Feb 2006   Mar-Apr 2006   72.3%              87.7%   92.7%   92.4%
Oct 2006   Nov 2006       66.8%              80.9%   87.6%   88.3%
Feb 2007   Mar-Apr 2007   70.3%              81.9%   92.3%   90.4%

For users visiting sororities
14
I-D-b. IBF vs GBF
Both filtering techniques should capture the same set of users, so we compare their results.
The comparison shows that more than 75% of the users are the same.

Month      Gender   IBF   GBF   IBF ∩ GBF
Feb 2006   Male     506   451   416
Feb 2006   Female   513   441   435
Oct 2006   Male     553   437   418
Oct 2006   Female   570   451   454
Feb 2007   Male     545   417   399
Feb 2007   Female   529   410   406
15
2-II. Name-Based Classification (NBC)
In this technique, we augment the traces with external data.
Traces from U2 provide usernames.
We combine the usernames with the publicly available phone book directory maintained by U2 to obtain the names of the users. (Users can opt out of the directory.)
Next, we match the names obtained above against the most common male and female names published by the US Social Security Administration to determine the gender of each user.
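The matching step might look roughly like the sketch below. The tiny name sets are illustrative stand-ins for the published lists, the directory is a hypothetical username-to-name mapping, and the rule of leaving ambiguous or missing names unclassified is an assumption rather than the authors' stated procedure.

```python
# Illustrative stand-ins for the common-name lists (not the real data files).
MALE_NAMES = {"james", "john", "robert", "jordan"}
FEMALE_NAMES = {"mary", "patricia", "linda", "jordan"}


def classify_by_name(full_name: str) -> str:
    """Classify by first name; names on both lists or neither stay unknown."""
    tokens = full_name.strip().lower().split()
    first = tokens[0] if tokens else ""
    is_male, is_female = first in MALE_NAMES, first in FEMALE_NAMES
    if is_male and not is_female:
        return "male"
    if is_female and not is_male:
        return "female"
    return "unknown"


def classify_users(usernames, directory):
    """Look up each username; users who opted out are missing from the directory."""
    return {u: classify_by_name(directory.get(u, "")) for u in usernames}


# Hypothetical directory entries.
directory = {"jsmith": "John Smith", "mjones": "Mary Jones", "jlee": "Jordan Lee"}
print(classify_users(["jsmith", "mjones", "jlee", "optout"], directory))
```

Leaving ambiguous names unclassified trades coverage for a lower error rate, which fits the role NBC plays here as a check on LBC.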
16
2-II. Name-Based Classification (NBC)

                Nov 2007   Apr 2008
Total Users     27068      29982
Males (NBC)     5245       5807
Females (NBC)   5955       6817

Compared to NBC, LBC requires less information (usernames are not needed).
NBC provides a way to validate LBC.
The use of NBC is limited because usernames are available in very few of the currently available traces.
Once we check the correctness of LBC, it can become the primary method for classification.
17
II-A. Cross Validation of LBC
We compared LBC with NBC using traces from U2.
The advantage is that NBC has a low classification error rate.
The classification error is calculated as
  E_f = |F_L ∩ M_N| / |F_L|,   E_m = |M_L ∩ F_N| / |M_L|
F_L: users classified as female by LBC; M_N: users classified as male by NBC
M_L: users classified as male by LBC; F_N: users classified as female by NBC

           Female Classification        Male Classification
Month      F_L    F_L ∩ M_N   E_f       M_L   M_L ∩ F_N   E_m
Nov 2007   1280   74          0.058     334   25          0.074
Apr 2008   1690   123         0.072     349   29          0.083
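The error formula itself was lost from the slide. One reading that agrees with the table's numbers is E_f = |F_L ∩ M_N| / |F_L| (for example, 74 / 1280 ≈ 0.058), and symmetrically for E_m. The sketch below computes that ratio under this assumption; the function name and user IDs are hypothetical.

```python
def classification_error(lbc_set, nbc_opposite_set):
    """Fraction of LBC-labeled users that NBC assigns to the opposite gender."""
    if not lbc_set:
        raise ValueError("LBC set is empty; error rate is undefined")
    return len(lbc_set & nbc_opposite_set) / len(lbc_set)


# Hypothetical IDs: four users labeled female by LBC, one of whom NBC labels male.
f_l = {"u1", "u2", "u3", "u4"}
m_n = {"u4", "u9"}
print(classification_error(f_l, m_n))

# The same ratio from the Nov 2007 female counts in the table.
print(round(74 / 1280, 3))
```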
18
3. User Behavior Analysis
Traces allow us to track a user throughout the traces. So we can follow classified users across the campus and study their network usage behavior!
We consider
  – User spatial distribution
  – Temporal analysis
  – Device preferences
19
3-I. Spatial Distribution
[Figure panels: U1, U2]
Both universities show more females in Social Sciences and Sports buildings
Both universities show more males in Economics and Engineering buildings
Inconsistent trends are observed at Music buildings
20
3-II. Temporal Analysis (session durations)
[Figure panels: U1, U2]
On average, males have longer session durations
Over time, session durations are getting shorter (are users becoming more mobile?)
21
3-III. Device Preference
Using the MAC address, one can find the manufacturer of a device.
Our analysis at U1 shows (with statistical significance) that females prefer Apple computers over PCs.
However, no such preference is shown at U2 for the general population.
We also see that external adapter vendors such as Enterasys, Linksys and D-Link show a decreasing trend in number of users. Most users are getting devices with built-in Wi-Fi.
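The manufacturer lookup relies on the first three octets of a MAC address, the OUI that the IEEE assigns to each vendor. A minimal sketch follows, assuming the traces keep that prefix intact. The lookup table uses placeholder prefixes and vendor names, not real registry entries, and the function names are hypothetical.

```python
import string

# Placeholder table; a real analysis would load the IEEE OUI registry instead.
HYPOTHETICAL_OUI_TABLE = {"AA:BB:CC": "VendorA", "11:22:33": "VendorB"}


def oui_prefix(mac: str) -> str:
    """Normalize a MAC address and return its first three octets."""
    digits = [ch for ch in mac.upper() if ch not in ":-."]
    if len(digits) != 12 or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ":".join("".join(digits[i:i + 2]) for i in (0, 2, 4))


def vendor_of(mac: str, table=HYPOTHETICAL_OUI_TABLE) -> str:
    """Map a MAC address to a vendor name, or 'unknown' if the prefix is not listed."""
    return table.get(oui_prefix(mac), "unknown")


print(vendor_of("aa-bb-cc-01-02-03"))
print(vendor_of("00:00:00:00:00:00"))
```

Counting vendors per gender group would then be a matter of tallying `vendor_of` results over the classified users.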
22
3-III. Device Preference
[Figure panel: U1]
Females prefer Apple computers over Intel-based ones!
23
4. Applications
Mobility Models
  – Incorporate the effects of building context, 'behavioral' aspects, load (session duration) and density, among others, on correlated collective/group behavior
Protocol Design
  – Effects of group behavior can be incorporated in protocol design for mobile networks
Privacy
  – The gender of the users could be inferred/extracted from anonymized traces!
24
5. Conclusion & Future Work
We introduce new methods to classify users into social groups based on features such as gender and study major, among others.
We applied our methods to traces collected from two different university campuses.
The methods are able to distinguish major differences in group behaviors (mobility, vendor preference).
Issues of privacy and anonymity arise when dealing with wireless network traces [UH09].
This study opens doors for other mobile social networking studies and profile-based service designs based on sensing the human society.
25
Thank you!
Ahmed Helmy
helmy@ufl.edu
URL: www.cise.ufl.edu/~helmy
26
Appendix
27
Sororities and Fraternities

Number       U1   U2
Sorority     7    13
Fraternity   12   5
28
PAM
PAM attempts to minimize dissimilarity within a cluster.
It provides a technique called silhouette widths, along with a plot, to measure the quality of the clusters.
The average width can be used to estimate the quality of the clustering: above 0.70 for strong clustering, between 0.50 and 0.70 for a reasonable structure, and below 0.50 for a weak structure.
All clusters we found were above 0.65 cluster quality.
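For reference, the silhouette width of a point i is s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below computes it in plain Python on hypothetical one-dimensional data. The quality labels follow the thresholds on this slide, and treating singleton clusters as width 0 is a common convention assumed here.

```python
from math import dist


def silhouette_widths(points, labels):
    """Per-point silhouette widths s(i) = (b - a) / max(a, b)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")
    widths = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:                       # singleton cluster: width 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i])
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def quality(avg_width):
    """Interpretation thresholds as given on the slide."""
    if avg_width > 0.70:
        return "strong"
    return "reasonable" if avg_width >= 0.50 else "weak"


# Hypothetical, well-separated one-dimensional data.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
widths = silhouette_widths(points, labels)
average = sum(widths) / len(widths)
print(round(average, 3), quality(average))
```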
29
3-III. Device Preference
[Figure panel: U2]
No gender bias is noticed at U2