Social Networks Analysis Seminar: Introductory Lecture #2
Danny Hendler and Yehonatan Cohen
Advanced Topics in On-line Social Networks Analysis
Seminar schedule
- 5/3/14: Introductory lecture #1
- 10/3/14: Papers list published; students send their 3 preferences
- 12/3/14: Introductory lecture #2
- 14/3/14: All students' preferences must be received
- 19/3/14: No seminar (Purim!)
- 26/3/14: Student talks start (11 weeks of student talks, running until the semester ends)
Talk outline
- Nodes centrality: degree, closeness, betweenness
- Machine learning
Nodes centrality
Name the most central/significant node. [Figure: an example graph on nodes 1-13]
Nodes centrality
Name the most central/significant node. [Figure: a second example graph on nodes 1-13]
What makes a node central?
- A high number of connections
- Removing it disconnects the graph
- A high number of paths pass through it
- Proximity to all other nodes
- Its neighbors are themselves central
- …
Nodes centrality: Applications
- Detecting the most popular actor in a network
- Spamming / advertising
- Network vulnerability
- Health care / epidemics
- Clustering similar structural positions
- Recommendation systems
- …
Nodes centrality: Degree
Nodes centrality: Degree
Name the most central/significant node. [Figure: an example graph on nodes 1-9]
[Figure: the example graph on nodes 1-13, annotated with each node's degree]

Node   Degree
4      4
6      3
7      3
8      3
9      3
10     3
11     2
12     2
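Degree centrality is just the neighbor count per node. A minimal sketch in Python, using a hypothetical edge list since the slide's graph figure is not reproduced here:

from collections import defaultdict

# Hypothetical edge list for a small undirected graph (NOT the slide's exact graph).
edges = [(4, 6), (4, 7), (4, 8), (4, 9), (6, 7), (8, 10), (9, 10), (10, 11), (11, 12)]

adjacency = defaultdict(set)
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

# Degree centrality: the number of edges incident to each node.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
print(sorted(degree.items(), key=lambda kv: -kv[1]))  # most central node first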
Nodes centrality: Closeness (Reach)
[Figure: the example graph on nodes 1-13, annotated with degree and reach per node]

Node   Degree   Reach
4      4        5.84
6      3        5.93
7      3        6.12
8      3        5.75
9      3        5.25
10     3        5.18
11     2
12     2
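Closeness-style scores need shortest-path distances, which breadth-first search provides for unweighted graphs. The slide does not spell out its exact "reach" formula, so the sum-of-inverse-distances variant below is an assumption:

from collections import deque

def bfs_distances(adjacency, source):
    # Hop distances from source to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def reach(adjacency, node):
    # Assumed variant: sum of inverse distances to all other nodes,
    # so nearby nodes contribute more than distant ones.
    dist = bfs_distances(adjacency, node)
    return sum(1.0 / d for other, d in dist.items() if other != node)

print(round(reach(adjacency, 4), 2))  # reuses the adjacency dict from the degree sketch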
Nodes centrality: Betweenness
Nodes centrality: Betweenness
[Figure: the example graph on nodes 1-13, annotated with reach and betweenness per node]

Node   Reach   Betweenness
4      5.84    60
6      5.93    78
7      6.12    72
8      5.75    43
9      5.25    15
10     5.18    41
11
12
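Betweenness counts shortest paths through a node, which Brandes' algorithm computes efficiently; in practice a library call suffices. A sketch with networkx, again on the hypothetical edge list:

import networkx as nx

G = nx.Graph(edges)  # the hypothetical edge list from the degree sketch
# Betweenness centrality: how many all-pairs shortest paths pass through each node;
# normalized=False returns raw path counts rather than fractions.
betweenness = nx.betweenness_centrality(G, normalized=False)
print(sorted(betweenness.items(), key=lambda kv: -kv[1]))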
Talk outline
- Nodes centrality
- Machine learning
  - The learning process
  - Classification
  - Evaluation
Machine Learning
Herbert Alexander Simon: "Learning is any process by which a system improves performance from experience."
"Machine Learning is concerned with computer programs that automatically improve their performance through experience."
Herbert Simon: Turing Award 1975, Nobel Prize in Economics 1978.
Machine Learning
Learning = improving with experience at some task:
- improve over task T,
- with respect to performance measure P,
- based on experience E.
Example: Spam Filtering
- T: identify spam emails
- P: % of spam emails that were filtered out; % of ham (non-spam) emails that were incorrectly filtered out
- E: a database of emails labelled by users, i.e., feedback on emails such as "Move to Spam" and "Move to Inbox"
Machine Learning Applications?
Machine Learning: The learning process
[Diagram: model learning, followed by model testing]
Machine Learning: The learning process
Features extracted at the email server:
- Content of the email
- Number of recipients
- Size of message
- Number of attachments
- Number of "re:"s in the subject line
- …
Machine Learning: The learning process
From e-mails to feature vectors:
- Text-based content features: the email is tokenized, and each token is a feature
- Meta-features: number of recipients, size of message
Machine Learning: The learning process
[Table: instances (rows labelled Ham or Spam) represented as binary features over a vocabulary (Viagra, Lottery, ..., Free); Email Type is the target attribute]
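A minimal sketch of the mapping from a raw email to such a binary vocabulary vector plus meta-features; the vocabulary and helper name are illustrative, not from the slides:

VOCABULARY = ["viagra", "lottery", "free"]  # illustrative tokens

def to_feature_vector(email_text, recipients):
    tokens = set(email_text.lower().split())  # tokenize: each token is a candidate feature
    features = {word: int(word in tokens) for word in VOCABULARY}  # binary content features
    features["num_recipients"] = len(recipients)    # meta-feature
    features["size_kb"] = len(email_text) / 1024.0  # meta-feature
    return features

print(to_feature_vector("Win a FREE lottery ticket now", ["a@example.com", "b@example.com"]))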
Machine Learning: The learning process

Email Type   Customer Type   Country (IP)   Email Length (K)   Number of new Recipients
Ham          Gold            Germany        2                  0
Ham          Silver          Germany        4                  1
Spam         Bronze          Nigeria        2                  5
Spam         Bronze          Russia         4                  2
Ham          Bronze          Germany        4                  3
Ham          Silver          USA            1                  0
Spam         Silver          USA            2                  4

Rows are instances. The input attributes are numeric (Email Length, Number of new Recipients), nominal (Country), and ordinal (Customer Type); Email Type is the target attribute.
Machine Learning: Model learning
[Diagram: the learner is trained on labelled instances and outputs a classifier]
Machine Learning: Model testing
[Diagram: a training set drawn from the database is fed to the learner; the held-out part of the database is used for testing]
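A sketch of the underlying split; the 70/30 proportion is a common convention, not something the slide fixes:

import random

def train_test_split(instances, test_fraction=0.3, seed=0):
    # Hold out a random fraction of the database for testing; the learner
    # never sees the held-out instances during model learning.
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]  # (training set, test set)

train_set, test_set = train_test_split(list(range(100)))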
Machine Learning: Decision trees
Training data: instances with two categorical attributes (Refund, MarSt), one continuous attribute (TaxInc), and a class label. The slides grow the tree one splitting attribute at a time; the resulting model is:

- Refund = Yes: NO
- Refund = No: test MarSt
  - MarSt = Married: NO
  - MarSt = Single or Divorced: test TaxInc
    - TaxInc < 80K: NO
    - TaxInc > 80K: YES
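The finished tree reads directly as nested conditionals. A sketch of the resulting classifier (attribute and class names as on the slides; the slides do not specify an income of exactly 80K, so that boundary handling is an assumption):

def classify(refund, marital_status, taxable_income_k):
    # Decision tree from the slides: Refund, then MarSt, then TaxInc.
    if refund == "Yes":
        return "NO"
    if marital_status == "Married":
        return "NO"
    # Single or Divorced: split on taxable income at 80K.
    # (Assumption: exactly 80K falls in the "NO" branch.)
    return "YES" if taxable_income_k > 80 else "NO"

print(classify("No", "Single", 95))     # -> YES
print(classify("Yes", "Divorced", 95))  # -> NO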
Machine Learning: Classification
Binary classification (instances, class labels): (x1, y1), (x2, y2), ..., (xn, yn), where each yi is {1, -1}-valued.
A classifier provides a class prediction Ŷ for an instance. Outcomes for a prediction:

                   True class Y = 1      True class Y = -1
Predicted Ŷ = 1    True positive (TP)    False positive (FP)
Predicted Ŷ = -1   False negative (FN)   True negative (TN)
Machine Learning: Classification
- P(Ŷ = Y): accuracy
- P(Ŷ = 1 | Y = 1): true positive rate
- P(Ŷ = 1 | Y = -1): false positive rate
- P(Y = 1 | Ŷ = 1): precision
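These four probabilities are estimated from the confusion-matrix counts. A sketch over {1, -1}-valued labels as defined above:

def confusion_counts(y_true, y_pred):
    # Tally the four outcomes of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts([1, 1, -1, -1, 1], [1, -1, -1, 1, 1])
accuracy = (tp + tn) / (tp + fp + fn + tn)  # P(Y-hat = Y)
tpr = tp / (tp + fn)                        # P(Y-hat = 1 | Y = 1)
fpr = fp / (fp + tn)                        # P(Y-hat = 1 | Y = -1)
precision = tp / (tp + fp)                  # P(Y = 1 | Y-hat = 1)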
Machine Learning: Classification
Consider a diagnostic test for a disease. The test has two possible outcomes:
- "positive", suggesting presence of the disease
- "negative"
Each individual tests either positive or negative for the disease.
Machine Learning: Classification
[Figure sequence: two overlapping distributions of test results, one for individuals with the disease and one for individuals without it. A threshold on the test result calls patients to its left "negative" and to its right "positive". The four regions are: true positives (diseased patients called positive), false positives (healthy patients called positive), true negatives (healthy patients called negative), and false negatives (diseased patients called negative).]
Machine Learning: Cross-Validation
What if we don't have enough data to set aside a test dataset? Cross-validation: each data point is used both as training and as test data. Basic idea: fit the model on 90% of the data and test it on the remaining 10%; then repeat on a different 90/10 split, cycling through all 10 cases. Ten "folds" is a common rule of thumb.
Machine Learning: Cross-Validation
Divide the data into 10 equal pieces P1, ..., P10. Fit 10 models, each on 90% of the data. Each data point is treated as an out-of-sample point by exactly one of the models, as in the sketch below.
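A minimal sketch of the 10-fold procedure; train_and_score stands for any routine that fits a model on the training indices and returns its score on the test indices (a hypothetical callback, not a library function):

import random

def cross_validate(n_points, train_and_score, k=10, seed=0):
    # Shuffle once, then split the indices into k roughly equal folds.
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        test_idx = set(fold)
        train_idx = [i for i in indices if i not in test_idx]
        # Each point is out-of-sample for exactly one of the k models.
        scores.append(train_and_score(train_idx, sorted(test_idx)))
    return sum(scores) / k  # average held-out score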