
1 SOCIAL NETWORKS ANALYSIS SEMINAR INTRODUCTORY LECTURE #2 Danny Hendler and Yehonatan Cohen Advanced Topics in on-line Social Networks Analysis

2 Seminar schedule
   5/3/14: Introductory lecture #1
   10/3/14: Papers list published; students send their 3 preferences
   12/3/14: Introductory lecture #2
   14/3/14: All students' preferences must be received
   19/3/14: No seminar (Purim!)
   26/3/14: Student talks start (11 weeks of student talks, through the end of the semester)

3 Talk outline
   Node centrality: degree, closeness, betweenness
   Machine learning

4 Node centrality: Name the most central/significant node. [Figure: an example graph with nodes 1-13]

5 Node centrality: Name the most central/significant node. [Figure: a second example graph with nodes 1-13]

6 What makes a node central?
   A high number of connections
   Its removal disconnects the graph
   A high number of paths pass through it
   Proximity to all other nodes
   Its neighbors are themselves central
   ...

7 Node centrality: Applications
   Detection of the most popular actor in a network
   Spamming / advertising
   Network vulnerability
   Health care / epidemics
   Clustering similar structural positions
   Recommendation systems
   ...

8 Node centrality: Degree

9 Node centrality: Degree. Name the most central/significant node. [Figure: an example graph with nodes 1-9]

10 Degrees of selected nodes in the example graph (nodes 1-13):
   Node   Degree
   4      4
   6      3
   7      3
   8      3
   9      3
   10     3
   11     2
   12     2
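The exact edges of the slide's graph are not recoverable from this transcript, so the sketch below builds a small hypothetical graph (with node 4 given degree 4, echoing the table) and computes degrees with networkx:

```python
# Degree centrality on a small hypothetical graph (the slide's exact
# edge list is not recoverable, so these edges are illustrative).
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 4), (2, 4), (3, 4), (4, 6), (6, 7), (7, 8), (8, 9)])

degrees = dict(G.degree())            # raw degree per node
centrality = nx.degree_centrality(G)  # degree normalized by (n - 1)

for node in sorted(G.nodes()):
    print(node, degrees[node], round(centrality[node], 2))
```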

11 Node centrality: Closeness (Reach)

12 Reach scores alongside degree for the example graph (nodes 1-13):
   Node   Degree   Reach
   4      4        5.84
   6      3        5.93
   7      3        6.12
   8      3        5.75
   9      3        5.25
   10     3        5.18
   11     2
   12     2
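networkx's closeness_centrality computes a standard proximity score, (n - 1) divided by the sum of shortest-path distances to all other nodes. Whether the slide's "Reach" column uses exactly this formula is not stated, so treat this as one concrete variant on the same hypothetical graph:

```python
# Closeness centrality on the same hypothetical graph as above.
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 4), (2, 4), (3, 4), (4, 6), (6, 7), (7, 8), (8, 9)])

# networkx computes (n - 1) / sum of shortest-path distances from the
# node to every other reachable node; higher means more central.
for node, score in sorted(nx.closeness_centrality(G).items()):
    print(node, round(score, 3))
```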

13 Node centrality: Betweenness

14 Node centrality: Betweenness
   Node   Reach   Betweenness
   4      5.84    60
   6      5.93    78
   7      6.12    72
   8      5.75    43
   9      5.25    15
   10     5.18    41
   11
   12
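Betweenness counts the shortest paths between other node pairs that pass through a node. With normalized=False, networkx returns raw path counts, comparable in spirit to the integer scores in the table (the graph below is still the hypothetical one):

```python
# Betweenness centrality: how many shortest paths between other node
# pairs pass through each node (same hypothetical graph as above).
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 4), (2, 4), (3, 4), (4, 6), (6, 7), (7, 8), (8, 9)])

# normalized=False keeps raw shortest-path counts rather than fractions.
for node, score in sorted(nx.betweenness_centrality(G, normalized=False).items()):
    print(node, score)
```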

15 Talk outline
   Node centrality
   Machine learning: the learning process, classification, evaluation

16 Machine Learning
   Herbert Alexander Simon (Turing Award 1975; Nobel Prize in Economics 1978): "Learning is any process by which a system improves performance from experience."
   "Machine Learning is concerned with computer programs that automatically improve their performance through experience."

17 Machine Learning
   Learning = improving with experience at some task:
   improve over task T,
   with respect to performance measure P,
   based on experience E.

18 Example: Spam Filtering
   T: identify spam emails
   P: % of spam emails that were filtered out; % of ham (non-spam) emails that were incorrectly filtered out
   E: a database of emails labelled by users, i.e., feedback on emails such as "Move to Spam" / "Move to Inbox"

19 Machine Learning Applications?

20 Machine Learning: The learning process. [Diagram: two phases, model learning and model testing]

21 Machine Learning: The learning process
   [Diagram: an email server feeds examples into model learning and model testing]
   Features extracted from each email: content of the email, number of recipients, size of message, number of attachments, number of "re"s in the subject line, ...

22 Machine Learning: The learning process
   From e-mails to feature vectors:
   Textual content features: the email is tokenized, and each token is a feature
   Meta-features: number of recipients, size of message
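A minimal sketch of this step, with an assumed regex tokenizer and two illustrative meta-features (none of these names come from the slides):

```python
# From an email to a feature vector: binary token-presence features
# plus meta-features. Tokenizer and feature names are illustrative.
import re

def extract_features(email_text: str, num_recipients: int) -> dict:
    tokens = set(re.findall(r"[a-z']+", email_text.lower()))
    features = {f"token={t}": 1 for t in tokens}      # binary content features
    features["meta:num_recipients"] = num_recipients  # meta-feature
    features["meta:size_bytes"] = len(email_text.encode("utf-8"))
    return features

print(extract_features("Win a FREE lottery prize now!!!", num_recipients=120))
```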

23 Instances as binary feature vectors over the vocabulary; "Email Type" is the target attribute:
   Viagra   Lottery   ...   Free   | Email Type
   0        1         ...   1      | Ham
   0        0         ...   1      | Spam
   0        0         ...   0      | Ham
   0        0         ...   1      | Spam

24 Machine Learning: The learning process
   Instances with input attributes and the target attribute "Email Type" (Customer Type is ordinal, Country is nominal, Email Length and Number of New Recipients are numeric):
   Email Type   Customer Type   Country (IP)   Email Length (K)   Number of New Recipients
   Ham          Gold            Germany        2                  0
   Ham          Silver          Germany        4                  1
   Spam         Bronze          Nigeria        2                  5
   Spam         Bronze          Russia         4                  2
   Ham          Bronze          Germany        4                  3
   Ham          Silver          USA            1                  0
   Spam         Silver          USA            2                  4
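These three attribute kinds are usually encoded differently before they reach a learner. The sketch below uses one common scheme, integer codes for the ordinal attribute and one-hot indicators for the nominal one; the encoding is an assumption, not something the slide specifies.

```python
# Encoding the slide's attribute types for a learner (assumed scheme):
# ordinal -> ordered integer codes, nominal -> one-hot, numeric -> as-is.
CUSTOMER_ORDER = {"Bronze": 0, "Silver": 1, "Gold": 2}  # ordinal
COUNTRIES = ["Germany", "Nigeria", "Russia", "USA"]     # nominal

def encode(customer_type, country, email_length_k, new_recipients):
    row = [CUSTOMER_ORDER[customer_type]]                  # ordinal code
    row += [1 if country == c else 0 for c in COUNTRIES]  # one-hot country
    row += [email_length_k, new_recipients]               # numeric pass-through
    return row

print(encode("Gold", "Germany", 2, 0))  # -> [2, 1, 0, 0, 0, 2, 0]
```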

25 Machine Learning: Model learning. [Diagram: training data feeds a Learner, which outputs a Classifier]

26 Machine Learning: Model testing. [Diagram: a database provides a training set to the Learner]

27 Machine Learning: Decision trees
   Training data: instances with two categorical attributes (Refund and MarSt, marital status), one continuous attribute (TaxInc, taxable income), and a class label.
   Slides 27-38 grow the tree by choosing one splitting attribute at a time: first Refund, then MarSt, then TaxInc. The finished model:
   Refund?
     Yes: NO
     No: MarSt?
       Married: NO
       Single, Divorced: TaxInc?
         < 80K: NO
         > 80K: YES
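Read as code, the finished tree is a nest of if/else tests. Below is a direct Python transcription; the function name, argument names, and the boolean encoding of Refund are mine, but the splits and leaf labels follow the tree above.

```python
# The decision tree from the slides, transcribed as nested tests.
# Attribute names follow the slides (Refund, MarSt, TaxInc); the
# function signature itself is illustrative.
def classify(refund: bool, marital_status: str, taxable_income_k: float) -> str:
    if refund:                          # Refund = Yes
        return "NO"
    if marital_status == "Married":     # MarSt = Married
        return "NO"
    # MarSt = Single or Divorced: split on TaxInc at 80K.
    return "YES" if taxable_income_k > 80 else "NO"

print(classify(refund=False, marital_status="Single", taxable_income_k=95))   # YES
print(classify(refund=True, marital_status="Divorced", taxable_income_k=95))  # NO
```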

39 Machine Learning: Classification
   Binary classification
   (Instances, class labels): (x1, y1), (x2, y2), ..., (xn, yn)
   Each yi is {1, -1}-valued
   A classifier provides a class prediction Ŷ for an instance
   Outcomes for a prediction:
                          True class 1         True class -1
   Predicted class 1      True positive (TP)   False positive (FP)
   Predicted class -1     False negative (FN)  True negative (TN)

40 Machine Learning: Classification
   P(Ŷ = Y): accuracy
   P(Ŷ = 1 | Y = 1): true positive rate
   P(Ŷ = 1 | Y = -1): false positive rate
   P(Y = 1 | Ŷ = 1): precision
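These four probabilities are simple ratios of confusion-matrix counts. A small self-contained sketch (the example counts are invented):

```python
# Classification metrics as ratios of confusion-matrix counts.
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "accuracy":            (tp + tn) / (tp + fp + fn + tn),  # P(Y_hat = Y)
        "true_positive_rate":  tp / (tp + fn),                   # P(Y_hat = 1 | Y = 1)
        "false_positive_rate": fp / (fp + tn),                   # P(Y_hat = 1 | Y = -1)
        "precision":           tp / (tp + fp),                   # P(Y = 1 | Y_hat = 1)
    }

# Hypothetical counts for illustration:
print(metrics(tp=40, fp=10, fn=5, tn=45))
```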

41 Machine Learning: Classification
   Consider a diagnostic test for a disease. The test has two possible outcomes:
   'positive', suggesting presence of the disease
   'negative', suggesting its absence
   Each individual tests either positive or negative for the disease.

42-47 Machine Learning: Classification
   [Figure: two overlapping distributions of test results, one for individuals with the disease and one for individuals without it. A threshold on the test result calls patients below it "negative" and above it "positive". Individuals with the disease called positive are true positives; individuals without the disease called positive are false positives; individuals without the disease called negative are true negatives; individuals with the disease called negative are false negatives.]

48 Machine Learning: Cross-Validation
   What if we don't have enough data to set aside a test dataset?
   Cross-validation: each data point is used both as training and as test data.
   Basic idea: fit the model on 90% of the data and test it on the other 10%; then repeat on a different 90/10 split, cycling through all 10 cases. Ten "folds" is a common rule of thumb.

49 Machine Learning: Cross-Validation
   Divide the data into 10 equal pieces P1...P10.
   Fit 10 models, each on 90% of the data.
   Each data point is treated as an out-of-sample point by exactly one of the models.
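The procedure on this slide fits in a few lines of plain Python. In the sketch below the learner is a stub (majority-label prediction) and the dataset is a toy; both are stand-ins, not the seminar's setup.

```python
# 10-fold cross-validation: split the data into 10 pieces, and for each
# piece, train on the other 9 (90%) and test on the held-out one (10%).
import random

def majority_label_score(train, test):
    # Stand-in "learner": predict the most common training label.
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == majority) / len(test)

def cross_validate(data, k=10, seed=0):
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal pieces
    scores = []
    for i in range(k):
        held_out = set(folds[i])
        train = [data[j] for j in indices if j not in held_out]
        test = [data[j] for j in folds[i]]
        scores.append(majority_label_score(train, test))
    return sum(scores) / k

# Toy dataset: two labels with a roughly 2:1 imbalance.
data = [(x, "ham" if x % 3 else "spam") for x in range(120)]
print(round(cross_validate(data), 2))
```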

