Analyzing and Securing Social Media: Location Mining in Social Networks
Dr. Bhavani Thuraisingham
September 25, 2015

Outline
- Location Mining
- Patented Algorithms
  - Tweethood
  - Tweecalization
  - Tweeque

Importance of Location Mining
- Advances in location-acquisition and mobile communication technologies enable people to use location data with their existing online social networks.
- Knowing locations allows users to expand their current social networks, discover new places to eat, and so on.
- Like time, location is one of the most important components of user context, and further analysis can reveal more about an individual's interests, behaviors, and relationships with others.
- Three uses: privacy and security, trustworthiness, and marketing.

Privacy and Security
- Location privacy is the ability of individuals to move in public space with the expectation that, under normal circumstances, their location will not be systematically and secretly recorded for later use.
- Many people besides friends and family are interested in the information users post on social networks.
  - These include identity thieves, stalkers, debt collectors, con artists, and corporations wanting to know more about consumers.
- Once collected, this sensitive information can be left vulnerable to access by the government and third parties. Unfortunately, existing laws give more weight to the financial interests of corporations than to the privacy of consumers.

Trustworthiness
- Trustworthiness is another reason why location discovery is so important.
- It is well known that social media played a major role in accelerating the wave of demonstrations and protests across the Arab world known as the "Arab Spring".
- The Department of State has effectively used social networking sites to gauge sentiment within societies.
- Maintaining a social media presence in deployed locations also allows commanders to understand potential threats and emerging trends within those regions.
- The online community can be a good indicator of prevailing moods and emerging issues.
- Many vocal opposition groups will likely use social media to air grievances publicly.
- In these and similar cases, it becomes very important for organizations such as the US State Department to verify the actual location of the users posting these messages.

Marketing
- Social media has a strong impact on marketing and on gathering consumer feedback. First, it enables marketers to communicate with peers and customers (both current and future).
- It gives the company or product significantly more visibility and helps spread its message in a relaxed, conversational way.
- The second major contribution of social media to business is gathering feedback from users.
- Social media provides the kind of quick feedback that inbound marketers need to stay agile.
- Large corporations, from Wal-Mart to Starbucks, are leveraging social networks such as Twitter beyond typical posts and updates to get feedback on the quality of their products and services, especially recently launched ones.

Tweethood
- Tweethood is an algorithm for agglomerative clustering on fuzzy k-closest friends with variable depth. Graph-related approaches rely on the user's social graph to decide the user's location. In this chapter, we describe three such methods that show the evolution of the algorithm currently used in Tweethood.
- Each node in the graph represents a user, and each edge represents a friendship. The root represents the user U whose location is to be determined, and F_1, F_2, ..., F_n represent the n friends of the user. Each friend can have his or her own network; for example, F_2 has a network of m friends F_21, F_22, ..., F_2m.
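
A minimal Python sketch of the ego network described above, assuming the graph is stored as a plain adjacency dictionary. The dictionary layout and the `friends_by_depth` helper are illustrative assumptions, not part of Tweethood itself. Grouping users by hop distance from the root U is the basic operation that the variable-depth methods below rely on.

```python
from collections import deque


def friends_by_depth(graph, root, max_depth):
    """Group users reachable from `root` by hop distance, up to `max_depth` hops.

    `graph` maps each user to a list of friends (hypothetical adjacency format).
    """
    levels = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(user, []):
            if friend not in seen:
                seen.add(friend)
                levels.setdefault(depth + 1, []).append(friend)
                queue.append((friend, depth + 1))
    return levels


# Root U with friends F1..F3; F2 has its own friends F21..F23.
graph = {
    "U": ["F1", "F2", "F3"],
    "F2": ["U", "F21", "F22", "F23"],
}
print(friends_by_depth(graph, "U", 2))
# {1: ['F1', 'F2', 'F3'], 2: ['F21', 'F22', 'F23']}
```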

Naïve Approach
- A naïve approach to the location identification problem is to take a simple majority vote over the locations of the user's friends (followers and following) and assign the result as the user's label.
- Since most friends will not state a location explicitly, we can go further and explore each friend's own social network (friends of friends).
- For example, if the location of friend F_2 is not known, instead of labeling it as null, we can go one step further and use F_2's friends to choose its label. Note that in this approach each node in the graph has just one label (a single location).
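
The majority vote with recursive fallback to friends of friends can be sketched as follows. This is an illustration under stated assumptions (a dictionary graph, a `locations` dictionary holding only explicitly stated locations, and a hypothetical `depth` cap on recursion), not the authors' implementation.

```python
from collections import Counter


def naive_location(user, graph, locations, depth=2, visited=None):
    """Simple majority over friends' locations; unlabeled friends are resolved
    recursively from their own friends, up to `depth` levels (hypothetical cap)."""
    if visited is None:
        visited = {user}
    votes = Counter()
    for friend in graph.get(user, []):
        if friend in visited:
            continue  # avoid walking back up the graph
        label = locations.get(friend)
        if label is None and depth > 1:
            label = naive_location(friend, graph, locations, depth - 1, visited | {friend})
        if label is not None:
            votes[label] += 1  # each node contributes exactly one label
    if not votes:
        return None
    return votes.most_common(1)[0][0]


graph = {"U": ["F1", "F2", "F3"], "F2": ["U", "F21", "F22", "F23"]}
locations = {"F1": "Dallas", "F3": "Austin",
             "F21": "Dallas", "F22": "Dallas", "F23": "Houston"}
print(naive_location("U", graph, locations))  # Dallas
```

Here F_2 has no stated location, so it is labeled Dallas from its own friends, which breaks the Dallas/Austin tie among U's direct friends.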

K-Closest Friends with Variable Depth
- Because a high majority of Twitter users have public profiles, a user has little control over who follows him or her. Considering spammers, marketing agencies, and similar accounts when deciding on the user's location can therefore lead to inaccurate results. It is also necessary to distinguish the influence of each friend on the final location. We therefore modify the approach to consider only the k closest friends of the user.
- Closeness between two people is subjective and can be measured in several ways, including the number of common friends or the semantic relatedness between the activities (verbs) in the messages each user posts. Based on our experiments, we adopted the number of common friends as the best choice because of its low time complexity and better accuracy.
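
A minimal sketch of selecting the k closest friends by number of common friends and voting only among them. The graph format, the helper names, and the example accounts are hypothetical, and for brevity the sketch covers a single depth level rather than variable depth.

```python
from collections import Counter


def common_friends(graph, a, b):
    """Closeness = number of friends shared by users a and b."""
    return len(set(graph.get(a, [])) & set(graph.get(b, [])))


def k_closest_friends(graph, user, k):
    """Rank the user's friends by common-friend count and keep the top k."""
    friends = graph.get(user, [])
    ranked = sorted(friends, key=lambda f: common_friends(graph, user, f), reverse=True)
    return ranked[:k]


def k_closest_location(graph, user, locations, k):
    """Majority vote restricted to the k closest friends with a known location."""
    votes = Counter(locations[f] for f in k_closest_friends(graph, user, k) if f in locations)
    return votes.most_common(1)[0][0] if votes else None


# A, B, C know each other; S1..S3 are unconnected followers (e.g. spam accounts).
graph = {
    "U": ["A", "B", "C", "S1", "S2", "S3"],
    "A": ["U", "B", "C"], "B": ["U", "A", "C"], "C": ["U", "A", "B"],
    "S1": ["U"], "S2": ["U"], "S3": ["U"],
}
locations = {"A": "Dallas", "B": "Dallas", "C": "Plano",
             "S1": "Lagos", "S2": "Lagos", "S3": "Lagos"}
print(k_closest_location(graph, "U", locations, k=3))  # Dallas
```

With k=6 (all friends) the unconnected followers win the vote with Lagos; restricting to the three closest friends removes their influence.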

Fuzzy k-Closest Friends
- The idea behind fuzzy k-closest friends with variable depth is that each node of the social graph is assigned multiple locations, each associated with a certain probability. These labels are propagated throughout the social network; no locations are discarded. At each depth level of the graph, the results are aggregated and boosted, as in the previous approaches, so as to maintain a single vector of locations with their probabilities.
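
One way to sketch the fuzzy variant: every node carries a probability vector over locations, and unlabeled nodes take the normalized sum of their friends' vectors, so no candidate location is dropped. The aggregation rule (sum, then normalize) is an assumption standing in for the unspecified "boosting" step, and for brevity the sketch uses all friends rather than only the k closest.

```python
from collections import Counter


def normalize(dist):
    """Rescale a {location: weight} mapping so the weights sum to 1."""
    total = sum(dist.values())
    return {loc: w / total for loc, w in dist.items()} if total else {}


def fuzzy_location(user, graph, known, depth, visited=None):
    """Return {location: probability} for `user`, aggregating friends' vectors
    up to `depth` levels (assumed aggregation: sum then normalize)."""
    if user in known:
        return {known[user]: 1.0}
    if depth == 0:
        return {}
    visited = (visited or set()) | {user}
    total = Counter()
    for friend in graph.get(user, []):
        if friend in visited:
            continue
        total.update(fuzzy_location(friend, graph, known, depth - 1, visited))
    return normalize(total)


graph = {"U": ["F1", "F2", "F3"], "F2": ["U", "F21", "F22", "F23"]}
known = {"F1": "Dallas", "F3": "Austin",
         "F21": "Dallas", "F22": "Dallas", "F23": "Houston"}
result = fuzzy_location("U", graph, known, depth=2)
print({loc: round(p, 3) for loc, p in result.items()})
# {'Dallas': 0.556, 'Houston': 0.111, 'Austin': 0.333}
```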

Tweecalization
- Graph-related approaches rely on the user's social graph to decide the user's location. As observed earlier, location data on social networks is scarce and available for only a small portion of users.
- This creates the need for a method that uses both labeled and unlabeled data for training. Here, the location serves as the class label.
- Our problem is therefore a classic application of semi-supervised learning. In this chapter, we propose a semi-supervised learning method based on label propagation.

Label Propagation
- The label propagation algorithm is based on transductive learning.
- In this setting, the dataset is divided into two sets.
- One is the training set, consisting of the labeled data.
- Based on this labeled data, we try to predict the class of the second set, the test or validation set, which consists of unlabeled data.
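
The transductive setting can be illustrated with the textbook iterative label propagation scheme: labeled nodes are clamped to their known location, and each unlabeled node repeatedly takes the average of its neighbors' label distributions. This is a generic sketch, not Tweecalization itself (which also weights friends by trustworthiness); the graph format and iteration count are assumptions, and every neighbor must appear as a key in `graph`.

```python
def label_propagation(graph, labels, iterations=50):
    """Clamp labeled nodes; unlabeled nodes take the mean of neighbors' distributions.

    Returns the predicted label for each unlabeled node that received any signal.
    """
    classes = sorted(set(labels.values()))
    dist = {
        node: {c: (1.0 if labels.get(node) == c else 0.0) for c in classes}
        for node in graph
    }
    for _ in range(iterations):
        new = {}
        for node, neighbours in graph.items():
            if node in labels or not neighbours:
                new[node] = dist[node]  # training data stays fixed
                continue
            new[node] = {c: sum(dist[n][c] for n in neighbours) / len(neighbours)
                         for c in classes}
        dist = new
    return {node: max(d, key=d.get) for node, d in dist.items()
            if node not in labels and any(d.values())}


# a is labeled Dallas; b and c are labeled Austin; u1 and u2 are unlabeled.
graph = {"a": ["u1"], "u1": ["a", "u2"], "u2": ["u1", "b", "c"],
         "b": ["u2"], "c": ["u2"]}
labels = {"a": "Dallas", "b": "Austin", "c": "Austin"}
print(label_propagation(graph, labels))  # {'u1': 'Dallas', 'u2': 'Austin'}
```

At convergence u1 holds Dallas with probability 0.6 and u2 holds Austin with probability 0.8, so both unlabeled users receive a label even though neither stated a location.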

Trustworthiness and Similarity Measure
- The single most important design decision is how we define similarity (or distance) between two data points, in this case, users.
- We introduce the notion of trustworthiness for two reasons: first, to differentiate between friends when propagating labels to the central user, and second, to implicitly account for the social phenomenon of migration, thus providing a simple yet intelligent way to define similarity between users.
- Trustworthiness (TW) is defined as the fraction of a user's friends who have the same label as the user. For example, if John Smith lists his location as Dallas, Texas, and 15 of his 20 friends are from Dallas, John's trustworthiness is 15/20 = 0.75.
- Note that users who have lived their whole lives in a single city will have a large percentage of friends from that city and hence a high trustworthiness value. Someone who has lived in several places, on the other hand, will have a social graph of people from all over, so such a user should have little say when labels are propagated to users with unknown locations. For users without a location, TW is zero.
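
The trustworthiness definition translates directly into code. A minimal sketch, assuming friends without a stated location still count in the denominator (consistent with the 15/20 example); the function name and data layout are illustrative.

```python
def trustworthiness(user, graph, locations):
    """TW = fraction of the user's friends whose location matches the user's own.

    Users with no stated location (or no friends) get TW = 0.
    """
    own = locations.get(user)
    friends = graph.get(user, [])
    if own is None or not friends:
        return 0.0
    same = sum(1 for f in friends if locations.get(f) == own)
    return same / len(friends)


# John lists Dallas; 15 of his 20 friends are in Dallas, 5 in Houston.
friends = [f"f{i}" for i in range(20)]
graph = {"john": friends}
locations = {"john": "Dallas"}
locations.update({f: "Dallas" for f in friends[:15]})
locations.update({f: "Houston" for f in friends[15:]})
print(trustworthiness("john", graph, locations))  # 0.75
```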

Trustworthiness and Similarity Measure
- Friendship similarity between two people is subjective and can be measured in several ways, including the number of common friends or the semantic relatedness between the activities (verbs) in the messages each user posts.
- Based on our experiments, we adopted the number of common friends as the best choice because of its low time complexity and better accuracy.
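
As an illustration of how the two measures could be used together when propagating labels to a central user, the sketch below lets each labeled friend vote with weight TW(friend) times the number of friends shared with the user. That product is a hypothetical weighting chosen for this example; the slides do not state how the measures are combined. TW values are passed in precomputed.

```python
from collections import defaultdict


def common_friends(graph, a, b):
    """Friendship similarity = number of friends shared by a and b."""
    return len(set(graph.get(a, ())) & set(graph.get(b, ())))


def weighted_vote(user, graph, locations, tw):
    """Hypothetical weighting: vote weight = TW(friend) * common_friends(user, friend)."""
    scores = defaultdict(float)
    for friend in graph.get(user, ()):
        if friend in locations:
            scores[locations[friend]] += tw.get(friend, 0.0) * common_friends(graph, user, friend)
    if not scores or max(scores.values()) <= 0:
        return None
    return max(scores, key=scores.get)


graph = {"U": ["A", "B", "C", "M"],
         "A": ["U", "B", "C"], "B": ["U", "A", "C"], "C": ["U", "A", "B"],
         "M": ["U"]}
locations = {"A": "Dallas", "B": "Dallas", "C": "Austin", "M": "Austin"}
tw = {"A": 0.8, "B": 0.7, "C": 0.9, "M": 0.9}
print(weighted_vote("U", graph, locations, tw))  # Dallas
```

A plain majority here is a 2-2 tie; M shares no friends with U, so under this weighting its vote counts for nothing.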

Tweeque
- People migrate from city to city, state to state, and country to country all the time.
- Such migration can therefore affect our algorithms: how does one extract a person's location when he or his friends may be continually migrating?
- To address this, we have proposed a set of algorithms that we call Tweeque.
- Tweeque takes the migration effect into account; in particular, it identifies social cliques for location mining.
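
A rough sketch of clique-based location mining: enumerate the maximal cliques among a user's friends (basic Bron-Kerbosch algorithm) and give each clique the majority location of its labeled members. A user who has migrated may then show several candidate locations, for example a hometown clique and a current-city clique. The `clique_locations` helper, the `min_size` parameter, and the majority rule per clique are assumptions for illustration; choosing among candidates (for example by recency) is not shown.

```python
from collections import Counter


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            neighbours = set(graph[v])
            expand(r | {v}, p & neighbours, x & neighbours)
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def clique_locations(user, graph, locations, min_size=2):
    """Return (location, clique size) candidates, one per clique of the user's friends."""
    friends = set(graph[user])
    # Friend-to-friend graph with the user removed.
    ego = {f: [n for n in graph.get(f, []) if n in friends] for f in friends}
    candidates = []
    for clique in maximal_cliques(ego):
        votes = Counter(locations[m] for m in clique if m in locations)
        if len(clique) >= min_size and votes:
            candidates.append((votes.most_common(1)[0][0], len(clique)))
    return sorted(candidates, key=lambda c: (-c[1], c[0]))


# A, B, C form a hometown clique; D, E form a clique in a new city.
graph = {"U": ["A", "B", "C", "D", "E"],
         "A": ["U", "B", "C"], "B": ["U", "A", "C"], "C": ["U", "A", "B"],
         "D": ["U", "E"], "E": ["U", "D"]}
locations = {"A": "Houston", "B": "Houston", "C": "Houston",
             "D": "Dallas", "E": "Dallas"}
print(clique_locations("U", graph, locations))  # [('Houston', 3), ('Dallas', 2)]
```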

Agglomerative Clustering
- Labeling algorithms treat the concepts purely as labels, with no mutual relatedness. Since the concepts here are actual geographic cities, we agglomerate closely located cities and suburbs to improve the confidence, and thus the accuracy, of the system.
- We use the concept of a Location Confidence Threshold (LCT). The idea behind LCT is to ensure that when the algorithm reports possible locations, it does so with some minimum level of confidence. LCT depends on the user: it increases with the user's number of friends, because more friends imply more labeled data.
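
A sketch of both ideas under explicit assumptions: cities are merged greedily into the most probable nearby city within a hypothetical 50 km radius (great-circle distance via the haversine formula), and the LCT uses a made-up formula that simply grows with the number of friends, as the slide requires. Neither the radius nor the LCT formula comes from the original work, and the coordinates are approximate.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def agglomerate(dist, coords, radius_km=50.0):
    """Greedily merge each city into the first, more probable city within radius_km."""
    clusters = []  # list of [seed_city, total_probability]
    for city, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        for cluster in clusters:
            if haversine_km(coords[city], coords[cluster[0]]) <= radius_km:
                cluster[1] += p
                break
        else:
            clusters.append([city, p])
    return {seed: p for seed, p in clusters}


def location_confidence_threshold(num_friends):
    """Hypothetical LCT: increases with the number of friends, capped at 0.5."""
    return min(0.5, 0.1 * math.log10(10 + num_friends))


def report(dist, coords, num_friends, radius_km=50.0):
    """Agglomerate nearby cities, then keep only clusters at or above the LCT."""
    lct = location_confidence_threshold(num_friends)
    return {c: p for c, p in agglomerate(dist, coords, radius_km).items() if p >= lct}


coords = {"Dallas": (32.78, -96.80), "Plano": (33.02, -96.70),
          "Richardson": (32.95, -96.73), "Austin": (30.27, -97.74),
          "Houston": (29.76, -95.37)}
dist = {"Dallas": 0.3, "Plano": 0.15, "Richardson": 0.1,
        "Austin": 0.25, "Houston": 0.2}
print(report(dist, coords, num_friends=200))
```

Plano and Richardson fold into Dallas, lifting it to about 0.55; with 200 friends the threshold is about 0.23, so Houston (0.2) is not reported.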

Directions
- Different algorithms for location mining
- Other demographics: age, gender, etc.
- Develop systems with real-world applications