Published by Cory Campbell. Modified over 9 years ago.
1
Bipin Shetty Santosh Kalyankrishnan
2
Project Thesis In this project, we analyze data gathered from the Orkut social network to determine whether friendship links show bias and, if so, of what kind. We examine how Orkut users form friend links to find out whether they prefer friends of the same caste or language, and to what extent. We also calculate this bias for individual cities on the same criteria. We were given a large amount of Orkut data (e.g., names and friend links), which we mine for information and metrics. From these metrics, we aim to draw conclusions about the degree of bias that exists.
3
Milestones completed We gathered a large amount of data on caste names, languages, and the last names associated with them. We identified 616 frequently occurring last names along with the caste, religion, and language associated with each. We stored this information in tagged XML format. Using MySQL scripts, we compared the last names in our data against this listing to identify the caste, language, and parentCaste of each individual. We will then insert the results into a table that records each user's profile, caste name, language, and location.
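The last-name lookup described on this slide was done with MySQL scripts; the same idea can be sketched in Python for illustration. Everything specific here is an assumption: the XML layout (`<surname name=... caste=... parentCaste=... language=.../>`), the placeholder names and labels, and the helpers `load_surname_table` and `tag_profile` are hypothetical and do not come from the project data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout with placeholder values; the real schema is not shown in the slides.
SURNAME_XML = """
<surnames>
  <surname name="SurnameA" caste="CasteA" parentCaste="ParentX" language="LangP"/>
  <surname name="SurnameB" caste="CasteB" parentCaste="ParentX" language="LangQ"/>
</surnames>
"""

def load_surname_table(xml_text):
    """Map lower-cased last name -> dict with caste, parentCaste, language."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for node in root.iter("surname"):
        attrs = dict(node.attrib)
        name = attrs.pop("name").strip().lower()
        table[name] = attrs
    return table

def tag_profile(full_name, table):
    """Return the attributes for a profile's last name, or None if unmatched."""
    parts = full_name.split()
    if len(parts) < 2:
        return None  # a single-word name carries no usable last name
    return table.get(parts[-1].lower())

table = load_surname_table(SURNAME_XML)
profiles = {"u1": "Asha SurnameA", "u2": "Ravi", "u3": "Meena Unknown"}
tagged = {uid: tag_profile(name, table) for uid, name in profiles.items()}

# Fraction of profiles whose last name could be mapped to the listing.
coverage = sum(v is not None for v in tagged.values()) / len(tagged)
print(tagged, coverage)
```

Profiles with no last name, or with a last name missing from the listing, come back as `None`, which is how the coverage figure in a run like this would fall below 100%.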
4
Milestones completed We identified the links within a caste, language, or parentCaste (intra-caste, intra-language, intra-parentCaste) and the links that cross them (inter-caste, inter-language, inter-parentCaste). Calculation of modularity: we used the formula $Q = \sum_i \left( e_{ii} - a_i^2 \right)$, where $e_{ii}$ is the fraction of edges with both ends in community $i$ and $a_i$ is the fraction of edge ends attached to community $i$. Modularity is thus the fraction of intra-community edges minus the expected value of the same quantity in a network with the same community divisions but with edges placed without regard to communities. Modularity ranges from -1/2 to 1, with 0 representing no more community structure than would be expected in a random graph, and significantly positive values representing strong community structure.
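As a rough illustration of the modularity calculation on this slide, read as Q = sum over i of (e_ii - a_i^2), the following Python sketch computes it from an undirected friend list and a group label per user. The function name `modularity`, the input shapes, and the choice to skip links with an unlabelled endpoint are assumptions made for the example, not the project's actual scripts.

```python
from collections import defaultdict

def modularity(edges, group_of):
    """Q = sum_i (e_ii - a_i**2) for an undirected edge list.

    edges: iterable of (u, v) pairs, each friendship listed once.
    group_of: dict mapping user -> group label (caste, language, ...).
    Links with an unlabelled endpoint are skipped (an assumption).
    """
    intra = defaultdict(int)  # links with both ends in group i
    ends = defaultdict(int)   # link ends attached to group i
    m = 0                     # number of links actually counted
    for u, v in edges:
        if u not in group_of or v not in group_of:
            continue
        gu, gv = group_of[u], group_of[v]
        m += 1
        ends[gu] += 1
        ends[gv] += 1
        if gu == gv:
            intra[gu] += 1
    if m == 0:
        return 0.0
    # e_ii = intra[g] / m, a_i = ends[g] / (2 * m)
    return sum(intra[g] / m - (ends[g] / (2 * m)) ** 2 for g in list(ends))

# Toy graph: two triangles joined by a single inter-group link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
group_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, group_of)
print(round(q, 4))
```

In the toy graph, six of the seven links are intra-group and each group holds half of the link ends, so Q = 2 * (3/7 - 1/4) = 5/14, a clearly positive value.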
5
Accomplishment We were able to identify the caste, language, and parentCaste of about 25% of the profiles. We calculated the bias on caste, language, and parentCaste using the modularity measure above.

              All    Bombay   Hyderabad
Sub-Caste     0.24   0.14     0.26
Parent Caste  0.55   0.36     0.55
Language      0.43   0.32     0.43
6
Interpretation and Conclusion We find a strong bias towards parent caste when users make friends on the Orkut social network. This is attributed to the fact that only two major castes account for most occurrences. We conclude that language is a significant criterion for making friends on Orkut. We also find a strong bias with respect to sub-caste. Our findings also point to stronger caste and language bias in less cosmopolitan cities like Hyderabad than in metropolitan, multilingual cities like Bombay.
7
Next Milestone Calculate the bias on the three parameters for a few more cities to understand its distribution. Alan has also suggested running an algorithm to find strong community structure in our data. We would then calculate the bias within that community structure.
8
Tradeoffs and bottlenecks Many Orkut user names were not crawled, so we cannot properly identify caste for those profiles. Some Orkut users have no last name, and many last names do not map to a caste.
9
Any Questions?