Location Clustering
Peter Kamm, Marcel Flores


1 Location Clustering
Peter Kamm, Marcel Flores

2 The Data Set
- Sessions
  - Contains a collection of connections over the course of a week
  - User ID, start time, stop time, Tower ID
  - 25 million lines!

3 ...a little more
- A tower location mapping
  - Tower ID, Longitude, Latitude, Zip Code
  - Allows us to map to a real world location
- Data set is not complete
  - There are many towers we do not have a location for
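A minimal sketch of how the two files might be combined, given that the tower mapping is incomplete. The record layouts follow the fields listed on the slides, but the type names, the integer timestamps, and the helper `split_by_known_location` are assumptions made for illustration, not the authors' code.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Session(NamedTuple):
    user_id: int
    start: int      # assumed: seconds since some epoch
    stop: int       # assumed: seconds since the same epoch
    tower_id: int


class TowerLocation(NamedTuple):
    longitude: float
    latitude: float
    zip_code: str


def split_by_known_location(
    sessions: Iterable[Session],
    tower_map: Dict[int, TowerLocation],
) -> Tuple[List[Session], List[Session]]:
    """Separate sessions whose tower has a known location from those without."""
    located: List[Session] = []
    unlocated: List[Session] = []
    for session in sessions:
        if session.tower_id in tower_map:
            located.append(session)
        else:
            unlocated.append(session)
    return located, unlocated
```

Keeping the unlocated sessions, rather than dropping them, lets the rest of the analysis still rank towers that cannot be placed on a map.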

4 Applications
- Load balancing on the cell-phone networks themselves
- Social networking
  - Integrate online social networks with the real world
  - Accounts for mobility and usage patterns

5 Analysis
- See which locations are active at what times
  - Where do people congregate?
  - How strongly do they congregate?
- Does the location affect their usage?
  - Connection duration
- How does this map out into the physical world?

6 Day and Night Hotspots
- Now uses a proper quantitative metric
  - Looks at the ratio of day to night sessions (or night to day, depending on which is larger)
  - Rejects locations with <100 day or night sessions
  - Gives us a number >1 to rank the strength of a location
- Daytime is defined as 4am to 4pm
- Day has more “very strong” hotspots
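The metric above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a session is classified by its start time only, 4am is treated as inclusive and 4pm as exclusive, and a tie counts as a day tower. The function and constant names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

DAY_START_HOUR = 4    # 4am, inclusive (assumed boundary handling)
DAY_END_HOUR = 16     # 4pm, exclusive (assumed boundary handling)
MIN_SESSIONS = 100    # locations below this in either period are rejected


def is_daytime(start: datetime) -> bool:
    """Classify a session as daytime using only its start time."""
    return DAY_START_HOUR <= start.hour < DAY_END_HOUR


def count_hits(sessions: Iterable[Tuple[int, datetime]]) -> Tuple[Counter, Counter]:
    """Count day and night sessions per tower from (tower_id, start) pairs."""
    day, night = Counter(), Counter()
    for tower_id, start in sessions:
        (day if is_daytime(start) else night)[tower_id] += 1
    return day, night


def hotspot_strength(day_hits: int, night_hits: int,
                     min_sessions: int = MIN_SESSIONS) -> Optional[Tuple[str, float]]:
    """Return ("day" or "night", ratio >= 1), or None if the tower is rejected."""
    if day_hits < min_sessions or night_hits < min_sessions:
        return None
    if day_hits >= night_hits:
        return ("day", day_hits / night_hits)
    return ("night", night_hits / day_hits)
```

For example, `hotspot_strength(10384, 211)` gives a day ratio of about 49.213, matching the top day hotspot on the next slide.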

7 Day and Night Ranks

Top Day Hotspots
Tower ID   Day Hits   Night Hits   Ratio
157        10384      211          49.213
66         13873      339          40.923
1246       3492       146          23.918
136        10255      1386         7.399

Top Night Hotspots
Tower ID   Day Hits   Night Hits   Ratio
6445       167        937          5.611
2414       397        2018         5.083
10316      122        593          4.861
8910       116        563          4.853

8 Strength Distribution
Day - 4,479 total
Night - 10,812 total

9 Day and Night Plots

10 Day/Night Durations

11 Day Avg Durations

12 Durations
- Day/night hotspots tend to exhibit similar patterns of usage
- Longest connections during morning/evening commute
- Urban towers get longer connections in mornings, residential neighborhoods get longer connections in evenings
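Duration patterns like these could be obtained by averaging connection length per hour of day. The sketch below assumes each session is a `(start, stop)` pair of `datetime` values and that a session is binned by the hour in which it starts; both are assumptions, as is the function name.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def average_duration_by_hour(
    sessions: Iterable[Tuple[datetime, datetime]],
) -> Dict[int, float]:
    """Mean connection duration in seconds, keyed by the hour the session started."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for start, stop in sessions:
        totals[start.hour] += (stop - start).total_seconds()
        counts[start.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}
```

Running this separately on the sessions of a day tower and a night tower would give the two curves to compare.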

13 Physical Locations
- Have to be done by hand, so a smaller sample
- Incomplete: we do not have locations for all towers
- For the highest ranked locations
  - Sadly, the top 4 shown previously are not in the location data set!
  - In fact, none of the high-ratio day or night spots appear (until down to a ratio of <2)!

14 Some Locations…
- Tower 79 - Night Tower, 1.255 ratio
  - Located in Englewood
  - Residential
  - South Chicago
  - Not a very strong ratio

15 Tracing a User
- Turns out, the data set was (maybe) rich enough to provide information on a per-user level!
- Followed the first 5000 users in the data set, ranked them based on activity
- Considered the busiest (by hand)
- Compared to day/night ratio of each location
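The ranking step can be sketched as below. "Activity" is assumed here to mean the number of sessions, which the slides do not state, and sessions are reduced to `(user_id, tower_id)` pairs; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def rank_users(sessions: Sequence[Tuple[int, int]],
               top_n: Optional[int] = None) -> List[Tuple[int, int]]:
    """Users ordered by session count, busiest first, as (user_id, count) pairs."""
    activity = Counter(user_id for user_id, _tower_id in sessions)
    return activity.most_common(top_n)


def busiest_towers(sessions: Sequence[Tuple[int, int]], user_id: int,
                   top_n: int = 2) -> List[Tuple[int, int]]:
    """The towers one user connected to most often, as (tower_id, count) pairs."""
    towers = Counter(t for u, t in sessions if u == user_id)
    return towers.most_common(top_n)
```

Each tower returned by `busiest_towers` can then be looked up against its day/night ratio.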

16 Tracing a User: Results
- User 1:
  - Busiest at tower 24 (20,729)
    - Night tower with a 2.339 ratio
    - But the user accounts for over 99% of the tower traffic!
  - 2nd busiest at tower 1197 (3,660)
    - Night tower with a 1.528 ratio
    - Again accounts for 99% of traffic!

17 Tracing a User: Results
- User 5:
  - Busiest at tower 258 (7,449)
    - Night tower with a 1.711 ratio (75% of traffic!)
    - No location data
  - 2nd busiest at tower 309 (5,773)
    - Night tower with a 1.765 ratio (only 60%…)
    - Residential, Longview, Washington

18 Tracing a User: Results
- Had to go to user 113 to get a more reasonable user
  - Busiest at tower 100 (1,602)
    - Night tower at 1.207 ratio
    - Not an unreasonable amount of traffic
    - Solon, Iowa
  - Second busiest at 5045 (602)
    - Night tower at 2.004
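The "share of tower traffic" figures quoted in these results could be computed along the following lines. Traffic is assumed to be measured in sessions (the slides do not say whether it is sessions or connection time), and the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def user_share_of_tower(sessions: Sequence[Tuple[int, int]],
                        user_id: int, tower_id: int) -> Optional[float]:
    """Fraction of a tower's sessions that belong to one user, or None if the tower is unseen."""
    tower_total = sum(1 for _u, t in sessions if t == tower_id)
    if tower_total == 0:
        return None
    user_total = sum(1 for u, t in sessions if u == user_id and t == tower_id)
    return user_total / tower_total
```

A share close to 1.0 means the tower's day/night ratio mostly reflects that single user's habits rather than a place where many people congregate.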

19 Single-user Durations

20 Hotspots
- Looking at certain user traces, it seemed that certain users used the busiest towers the most
- So are the busiest towers really seeing a lot of users, or a few very busy users?
- Analyzed the number of unique users that a tower sees in a day

21 Unique User Data
- Count how many users a specific tower sees over the duration
- Allows us to give an alternate ranking of the tower traffic
- Easily ignore points where a single user accounts for the majority of a tower's traffic
- Actual data is forthcoming…
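A minimal sketch of the unique-user count and the single-user filter described above. Sessions are again assumed to be `(user_id, tower_id)` pairs counted over the whole data set, and "majority" is read as more than half of a tower's sessions; the names and the default threshold are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def unique_users_per_tower(sessions: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Number of distinct users seen at each tower."""
    users: Dict[int, Set[int]] = defaultdict(set)
    for user_id, tower_id in sessions:
        users[tower_id].add(user_id)
    return {tower_id: len(seen) for tower_id, seen in users.items()}


def dominated_towers(sessions: Iterable[Tuple[int, int]],
                     threshold: float = 0.5) -> Set[int]:
    """Towers where one user accounts for more than `threshold` of all sessions."""
    per_tower: Dict[int, Counter] = defaultdict(Counter)
    for user_id, tower_id in sessions:
        per_tower[tower_id][user_id] += 1
    dominated: Set[int] = set()
    for tower_id, counts in per_tower.items():
        top_count = counts.most_common(1)[0][1]
        if top_count / sum(counts.values()) > threshold:
            dominated.add(tower_id)
    return dominated
```

Ranking towers by the first function, after dropping those returned by the second, gives the alternate ranking the slide describes.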

