Bipin Shetty, Santosh Kalyankrishnan

Project Thesis
In this project, we analyze information gathered from social networks to understand the nature of the bias, if any. We examine how Orkut users form friend links to determine whether there is a preference with respect to caste and language, and to what extent. We also calculate the bias for individual cities on the same criteria. We were provided with a large amount of Orkut data (e.g., names and friend links), which we mine for various information and metrics. Based on these metrics, we hope to draw conclusions about the degree of bias that exists.

Milestones completed
• We gathered data to identify caste names, languages, and the last names associated with them.
• We identified 616 frequently occurring last names, along with the caste, religion, and language associated with each. We stored this information in XML format, organized by tags.
• We processed the last names in our data, compared them against our last-name listing, and identified the caste, language, and parentCaste of each individual using MySQL scripts. We will then insert these data into a table that records user profile, caste name, language, and location.
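The last-name lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides mention XML and MySQL scripts but do not show the schema, so the tag and attribute names (`lastname`, `value`, `caste`, `parentCaste`, `language`), the placeholder entries, and the helper names `load_lookup` and `classify` are all hypothetical, and Python stands in for the MySQL scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout with placeholder entries; the real tag names
# and values used in the project are not shown in the slides.
SAMPLE_XML = """
<lastnames>
  <lastname value="ExampleA" caste="SubCasteA" parentCaste="ParentX" language="LangOne"/>
  <lastname value="ExampleB" caste="SubCasteB" parentCaste="ParentY" language="LangTwo"/>
</lastnames>
"""

def load_lookup(xml_text):
    """Build a dict from lower-cased last name to its attributes."""
    root = ET.fromstring(xml_text.strip())
    lookup = {}
    for node in root.iter("lastname"):
        key = node.get("value").strip().lower()
        lookup[key] = {
            "caste": node.get("caste"),
            "parentCaste": node.get("parentCaste"),
            "language": node.get("language"),
        }
    return lookup

def classify(full_name, lookup):
    """Return the attributes for a profile name, or None if unmapped.

    Assumes the last whitespace-separated token is the last name and
    treats single-token names as having no last name.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return None
    return lookup.get(parts[-1].lower())
```

Profiles for which `classify` returns `None` correspond to users with no last name or with a last name that does not map to a caste.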

Milestones completed
• We identified the links within a caste (intra-caste), language (intra-language), and parent caste (intra-parentCaste), and the links that cross them (inter-caste, inter-language, inter-parentCaste).
Calculation of modularity: we used the formula Q = \sum_i (e_{ii} - a_i^2), where e_{ii} is the fraction of edges with both ends in community i and a_i is the fraction of edge ends attached to nodes in community i. Modularity is thus the fraction of intra-community edges minus the expected value of the same quantity in a network with the same community divisions but with edges placed without regard to communities. Modularity ranges from -1/2 to 1, with 0 representing no more community structure than would be expected in a random graph, and significantly positive values representing the presence of strong community structure.
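A minimal sketch of the modularity calculation, assuming an undirected friend graph given as a list of edges and a mapping from user to group (caste, language, or parent caste). The function name `modularity` and the choice to skip edges with an unlabeled endpoint are assumptions made for illustration; the slides do not say how unidentified profiles were handled.

```python
from collections import defaultdict

def modularity(edges, group_of):
    """Compute Q = sum_i (e_ii - a_i^2) for an undirected graph.

    edges:    iterable of (u, v) pairs, one per friend link
    group_of: dict mapping a node to its group label
    Edges with an unlabeled endpoint are skipped (an assumption).
    """
    intra = defaultdict(int)  # edges with both ends in group i
    ends = defaultdict(int)   # edge ends attached to group i
    m = 0                     # number of edges counted
    for u, v in edges:
        if u not in group_of or v not in group_of:
            continue
        gu, gv = group_of[u], group_of[v]
        m += 1
        ends[gu] += 1
        ends[gv] += 1
        if gu == gv:
            intra[gu] += 1
    if m == 0:
        return 0.0
    # e_ii = intra[g] / m, a_i = ends[g] / (2 * m)
    return sum(intra[g] / m - (ends[g] / (2 * m)) ** 2 for g in ends)

# Two triangles joined by a single edge, one group per triangle.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, groups)  # 2 * (3/7 - (1/2)^2) = 5/14
```

In the toy graph, six of the seven edges are intra-group, so Q is clearly positive; placing every node in a single group gives Q = 0.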

Accomplishment
• We identified the caste, language, and parentCaste of about 25% of profiles.
• We calculated the bias on caste, language, and parentCaste using the modularity formula above.
Table: bias by criterion (Sub-Caste, Parent Caste, Language) for All, Bombay, and Hyderabad.

Interpretation and Conclusion
We find a strong bias towards parent caste in making friends in the Orkut social network. This is attributed to the fact that only two major castes account for most occurrences. We can conclude that language is a significant criterion for making friends on Orkut. We also find a strong bias in making friends with respect to sub-caste. Our findings also point to a stronger bias in caste and language in non-cosmopolitan cities like Hyderabad than in metropolitan, multilingual cities like Bombay.

Next Milestone
Calculate the bias on the three parameters for a few more cities to understand the distribution. Alan has also suggested running an algorithm to find strong community structure in our data. We would then calculate the bias within that community structure.

Tradeoffs and bottlenecks
• Many Orkut user names were not crawled, so we will not be able to properly identify the caste of those users.
• Some Orkut users have no last name, and many last names do not map to a caste.

Any Questions?