
1 SOCIAL NETWORKS ANALYSIS SEMINAR INTRODUCTORY LECTURE #2 Danny Hendler and Yehonatan Cohen Advanced Topics in on-line Social Networks Analysis

2 Seminar schedule
   5/3/14: Introductory lecture #1
   10/3/14: Papers list published; students send their 3 preferences
   12/3/14: Introductory lecture #2
   14/3/14: All students' preferences must be received
   19/3/14: No seminar (Purim!)
   26/3/14: Student talks start (11 weeks of student talks, through the end of the semester)

3 Talk outline
   Node centrality: degree, closeness, betweenness
   Machine learning

4 Node centrality: Name the most central/significant node. [Figure: an example graph with nodes 1-13]

5 Node centrality: Name the most central/significant node. [Figure: a second example graph with nodes 1-13]

6 What makes a node central?
   A high number of connections
   Its removal disconnects the graph
   A high number of paths pass through it
   Proximity to all other nodes
   Its neighbors are themselves central
   ...

7 Node centrality: Applications
   Detection of the most popular actor in a network
   Spamming / advertising
   Network vulnerability
   Health care / epidemics
   Clustering similar structural positions
   Recommendation systems
   ...

8 Node centrality: Degree

9 Node centrality: Degree. Name the most central/significant node. [Figure: an example graph with nodes 1-9]

10 Degrees of selected nodes in the example graph (nodes 1-13):
   Node   Degree
   4      4
   6      3
   7      3
   8      3
   9      3
   10     3
   11     2
   12     2
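The exact edges of the slide's graph are not recoverable from this transcript, so the sketch below builds a small hypothetical graph (with node 4 given degree 4, echoing the table) and computes degrees with networkx:

```python
# Degree centrality on a small hypothetical graph (the slide's exact
# edge list is not recoverable, so these edges are illustrative).
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 4), (2, 4), (3, 4), (4, 6), (6, 7), (7, 8), (8, 9)])

degrees = dict(G.degree())            # raw degree per node
centrality = nx.degree_centrality(G)  # degree normalized by (n - 1)

for node in sorted(G.nodes()):
    print(node, degrees[node], round(centrality[node], 2))
```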

11 Node centrality: Closeness (Reach)

12 Reach scores alongside degree for the example graph (nodes 1-13):
   Node   Degree   Reach
   4      4        5.84
   6      3        5.93
   7      3        6.12
   8      3        5.75
   9      3        5.25
   10     3        5.18
   11     2
   12     2
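networkx's closeness_centrality computes a standard proximity score, (n - 1) divided by the sum of shortest-path distances to all other nodes. Whether the slide's "Reach" column uses exactly this formula is not stated, so treat this as one concrete variant on the same hypothetical graph:

```python
# Closeness centrality on the same hypothetical graph as above.
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 4), (2, 4), (3, 4), (4, 6), (6, 7), (7, 8), (8, 9)])

# networkx computes (n - 1) / sum of shortest-path distances from the
# node to every other reachable node; higher means more central.
for node, score in sorted(nx.closeness_centrality(G).items()):
    print(node, round(score, 3))
```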

13 Node centrality: Betweenness

14 Node centrality: Betweenness
   Node   Reach   Betweenness
   4      5.84    60
   6      5.93    78
   7      6.12    72
   8      5.75    43
   9      5.25    15
   10     5.18    41
   11
   12
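Betweenness counts the shortest paths between other node pairs that pass through a node. With normalized=False, networkx returns raw path counts, comparable in spirit to the integer scores in the table (the graph below is still the hypothetical one):

```python
# Betweenness centrality: how many shortest paths between other node
# pairs pass through each node (same hypothetical graph as above).
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 4), (2, 4), (3, 4), (4, 6), (6, 7), (7, 8), (8, 9)])

# normalized=False keeps raw shortest-path counts rather than fractions.
for node, score in sorted(nx.betweenness_centrality(G, normalized=False).items()):
    print(node, score)
```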

15 Talk outline
   Node centrality
   Machine learning: the learning process, classification, evaluation

16 Machine Learning
   Herbert Alexander Simon (Turing Award 1975; Nobel Prize in Economics 1978): "Learning is any process by which a system improves performance from experience."
   "Machine Learning is concerned with computer programs that automatically improve their performance through experience."

17 Machine Learning
   Learning = improving with experience at some task:
   improve over task T,
   with respect to performance measure P,
   based on experience E.

18 Example: Spam Filtering
   T: identify spam emails
   P: % of spam emails that were filtered out; % of ham (non-spam) emails that were incorrectly filtered out
   E: a database of emails labelled by users, i.e., feedback on emails such as "Move to Spam" / "Move to Inbox"

19 Machine Learning Applications?

20 Machine Learning: The learning process. [Diagram: two phases, model learning and model testing]

21 Machine Learning: The learning process
   [Diagram: an email server feeds examples into model learning and model testing]
   Features extracted from each email: content of the email, number of recipients, size of message, number of attachments, number of "re"s in the subject line, ...

22 Machine Learning: The learning process
   From e-mails to feature vectors:
   Textual content features: the email is tokenized, and each token is a feature
   Meta-features: number of recipients, size of message
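A minimal sketch of this step, with an assumed regex tokenizer and two illustrative meta-features (none of these names come from the slides):

```python
# From an email to a feature vector: binary token-presence features
# plus meta-features. Tokenizer and feature names are illustrative.
import re

def extract_features(email_text: str, num_recipients: int) -> dict:
    tokens = set(re.findall(r"[a-z']+", email_text.lower()))
    features = {f"token={t}": 1 for t in tokens}      # binary content features
    features["meta:num_recipients"] = num_recipients  # meta-feature
    features["meta:size_bytes"] = len(email_text.encode("utf-8"))
    return features

print(extract_features("Win a FREE lottery prize now!!!", num_recipients=120))
```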

23 Instances as binary feature vectors over the vocabulary; "Email Type" is the target attribute:
   Viagra   Lottery   ...   Free   | Email Type
   0        1         ...   1      | Ham
   0        0         ...   1      | Spam
   0        0         ...   0      | Ham
   0        0         ...   1      | Spam

24 Machine Learning: The learning process
   Instances with input attributes and the target attribute "Email Type" (Customer Type is ordinal, Country is nominal, Email Length and Number of New Recipients are numeric):
   Email Type   Customer Type   Country (IP)   Email Length (K)   Number of New Recipients
   Ham          Gold            Germany        2                  0
   Ham          Silver          Germany        4                  1
   Spam         Bronze          Nigeria        2                  5
   Spam         Bronze          Russia         4                  2
   Ham          Bronze          Germany        4                  3
   Ham          Silver          USA            1                  0
   Spam         Silver          USA            2                  4
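These three attribute kinds are usually encoded differently before they reach a learner. The sketch below uses one common scheme, integer codes for the ordinal attribute and one-hot indicators for the nominal one; the encoding is an assumption, not something the slide specifies.

```python
# Encoding the slide's attribute types for a learner (assumed scheme):
# ordinal -> ordered integer codes, nominal -> one-hot, numeric -> as-is.
CUSTOMER_ORDER = {"Bronze": 0, "Silver": 1, "Gold": 2}  # ordinal
COUNTRIES = ["Germany", "Nigeria", "Russia", "USA"]     # nominal

def encode(customer_type, country, email_length_k, new_recipients):
    row = [CUSTOMER_ORDER[customer_type]]                  # ordinal code
    row += [1 if country == c else 0 for c in COUNTRIES]  # one-hot country
    row += [email_length_k, new_recipients]               # numeric pass-through
    return row

print(encode("Gold", "Germany", 2, 0))  # -> [2, 1, 0, 0, 0, 2, 0]
```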

25 Machine Learning: Model learning. [Diagram: training data feeds a Learner, which outputs a Classifier]

26 Machine Learning: Model testing. [Diagram: a database provides a training set to the Learner]

27 Machine Learning: Decision trees
   Training data: instances with two categorical attributes (Refund and MarSt, marital status), one continuous attribute (TaxInc, taxable income), and a class label.
   Slides 27-38 grow the tree by choosing one splitting attribute at a time: first Refund, then MarSt, then TaxInc. The finished model:
   Refund?
     Yes: NO
     No: MarSt?
       Married: NO
       Single, Divorced: TaxInc?
         < 80K: NO
         > 80K: YES
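Read as code, the finished tree is a nest of if/else tests. Below is a direct Python transcription; the function name, argument names, and the boolean encoding of Refund are mine, but the splits and leaf labels follow the tree above.

```python
# The decision tree from the slides, transcribed as nested tests.
# Attribute names follow the slides (Refund, MarSt, TaxInc); the
# function signature itself is illustrative.
def classify(refund: bool, marital_status: str, taxable_income_k: float) -> str:
    if refund:                          # Refund = Yes
        return "NO"
    if marital_status == "Married":     # MarSt = Married
        return "NO"
    # MarSt = Single or Divorced: split on TaxInc at 80K.
    return "YES" if taxable_income_k > 80 else "NO"

print(classify(refund=False, marital_status="Single", taxable_income_k=95))   # YES
print(classify(refund=True, marital_status="Divorced", taxable_income_k=95))  # NO
```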

39 Machine Learning: Classification
   Binary classification
   (Instances, class labels): (x1, y1), (x2, y2), ..., (xn, yn)
   Each yi is {1, -1}-valued
   A classifier provides a class prediction Ŷ for an instance
   Outcomes for a prediction:
                          True class 1         True class -1
   Predicted class 1      True positive (TP)   False positive (FP)
   Predicted class -1     False negative (FN)  True negative (TN)

40 Machine Learning: Classification
   P(Ŷ = Y): accuracy
   P(Ŷ = 1 | Y = 1): true positive rate
   P(Ŷ = 1 | Y = -1): false positive rate
   P(Y = 1 | Ŷ = 1): precision
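These four probabilities are simple ratios of confusion-matrix counts. A small self-contained sketch (the example counts are invented):

```python
# Classification metrics as ratios of confusion-matrix counts.
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "accuracy":            (tp + tn) / (tp + fp + fn + tn),  # P(Y_hat = Y)
        "true_positive_rate":  tp / (tp + fn),                   # P(Y_hat = 1 | Y = 1)
        "false_positive_rate": fp / (fp + tn),                   # P(Y_hat = 1 | Y = -1)
        "precision":           tp / (tp + fp),                   # P(Y = 1 | Y_hat = 1)
    }

# Hypothetical counts for illustration:
print(metrics(tp=40, fp=10, fn=5, tn=45))
```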

41 Machine Learning: Classification
   Consider a diagnostic test for a disease. The test has two possible outcomes:
   'positive', suggesting presence of the disease
   'negative', suggesting its absence
   Each individual tests either positive or negative for the disease.

42-47 Machine Learning: Classification
   [Figure: two overlapping distributions of test results, one for individuals with the disease and one for individuals without it. A threshold on the test result calls patients below it "negative" and above it "positive". Individuals with the disease called positive are true positives; individuals without the disease called positive are false positives; individuals without the disease called negative are true negatives; individuals with the disease called negative are false negatives.]

48 Machine Learning: Cross-Validation
   What if we don't have enough data to set aside a test dataset?
   Cross-validation: each data point is used both as training and as test data.
   Basic idea: fit the model on 90% of the data and test it on the other 10%; then repeat on a different 90/10 split, cycling through all 10 cases. Ten "folds" is a common rule of thumb.

49 Machine Learning: Cross-Validation
   Divide the data into 10 equal pieces P1...P10.
   Fit 10 models, each on 90% of the data.
   Each data point is treated as an out-of-sample point by exactly one of the models.
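The procedure on this slide fits in a few lines of plain Python. In the sketch below the learner is a stub (majority-label prediction) and the dataset is a toy; both are stand-ins, not the seminar's setup.

```python
# 10-fold cross-validation: split the data into 10 pieces, and for each
# piece, train on the other 9 (90%) and test on the held-out one (10%).
import random

def majority_label_score(train, test):
    # Stand-in "learner": predict the most common training label.
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == majority) / len(test)

def cross_validate(data, k=10, seed=0):
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal pieces
    scores = []
    for i in range(k):
        held_out = set(folds[i])
        train = [data[j] for j in indices if j not in held_out]
        test = [data[j] for j in folds[i]]
        scores.append(majority_label_score(train, test))
    return sum(scores) / k

# Toy dataset: two labels with a roughly 2:1 imbalance.
data = [(x, "ham" if x % 3 else "spam") for x in range(120)]
print(round(cross_validate(data), 2))
```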

