Published by Rosemary Dixon. Modified over 9 years ago.
1
CS 1944: Sophomore Seminar
Big Data and Machine Learning
B. Aditya Prakash, Assistant Professor
Nov 3, 2015
2
About me
Assistant Professor, CS
– Member, Discovery Analytics Center
Previously
– Ph.D. in Computer Science, Carnegie Mellon University
– B.Tech in Computer Science and Engineering, Indian Institute of Technology (IIT) – Bombay
– Internships at Sprint, Yahoo, Microsoft Research
Prakash 2015
3
4
Data contains value and knowledge
5
Data and Business
Source: A. Machanavajjhala
6
Data and Science
7
Data and Government
Source: A. Machanavajjhala
8
Data and Culture
Source: A. Machanavajjhala
9
10
Good news: Demand for Data Mining
11
How to extract value from data?
Manipulate Data
– CS, Domain expertise
Analyze Data
– Math, CS, Stat…
Communicate your results
– CS, Domain expertise
12
Communication is important!
13
What is Data Mining?
Given lots of data
Discover patterns and models that are:
– Valid: hold on new data with some certainty
– Useful: should be possible to act on the item
– Unexpected: non-obvious to the system
– Understandable: humans should be able to interpret the pattern
14
Data Mining Tasks
Descriptive methods
– Find human-interpretable patterns that describe the data
– Example: Clustering
Predictive methods
– Use some variables to predict unknown or future values of other variables
– Example: Recommender systems
15
Big data
Contributing fields: ML & Stats., Comp. Systems, Theory & Algo., Biology, Econ., Social Science, Physics
16
Data at CS, VT
Knowledge, Information and Data
http://www.cs.vt.edu/undergraduate/tracks/kid
People: Fox, Harrison, Huang, Lu (in NVA), Ramakrishnan (in NVA), Rozovskaya, Prakash
17
Courses
Background in some areas:
– CS3414 (Numerical Methods); also prob/stat 4000 level
– 4244 Internet Software Development
– 4604 Database Management Systems
– 4624 Capstone (Multimedia, Information Access)
– 4634 Design of Information (Capstone)
– 4804 AI
– 4984 Computational Linguistics (Capstone)
18
Discovery Analytics Center
19
MY RESEARCH
20
Networks are everywhere!
Human Disease Network [Barabasi 2007]
Gene Regulatory Network [Decourty 2008]
Facebook Network [2010]
The Internet [2005]
21
What else do they have in common?
22
High School Dating Network
Bearman et al., Am. Jnl. of Sociology, 2004. Image: Mark Newman
Blue: Male, Pink: Female
Interesting observations?
23
The Internet
Skewed Degrees
Robustness
24
Karate Club Network
25
Dynamical Processes over networks are also everywhere!
26
Why do we care?
Social collaboration
Information Diffusion
Viral Marketing
Epidemiology and Public Health
Cyber Security
Human mobility
Games and Virtual Worlds
Ecology
...
27
Why do we care? (1: Epidemiology)
Dynamical Processes over networks
Diseases over contact networks
CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]
28
Why do we care? (1: Epidemiology)
Dynamical Processes over networks
Each circle is a hospital
~3000 hospitals
More than 30,000 patients transferred
[US-MEDICARE NETWORK 2005]
Problem: Given k units of disinfectant, whom to immunize?
29
Why do we care? (1: Epidemiology)
CURRENT PRACTICE vs. OUR METHOD: ~6x fewer!
[US-MEDICARE NETWORK 2005]
Hospital-acquired infections took 99K+ lives and cost $5B+ (both per year)
30
Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue [WSJ 2010]
~100m active users
> 50m users
31
Why do we care? (2: Online Diffusion)
Dynamical Processes over networks
Social Media Marketing
Celebrity: "Buy Versace™!"
Followers
32
Social Biological Contagion
Automatically learn models
33
Why do we care? (3: To change the world?)
Dynamical Processes over networks
Social networks and Collaborative Action
34
High Impact – Multiple Settings
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?
epidemic outbreaks
products/viruses
transmit s/w patches
35
Dynamical Processes = (a lot of) Networks + (some) Time-Series
36
Research Theme
DATA: Large real-world networks & processes
ANALYSIS: Understanding
POLICY/ACTION: Managing
37
Research Theme – Public Health
DATA: Modeling # patient transfers
ANALYSIS: Will an epidemic happen?
POLICY/ACTION: How to control outbreaks?
38
Research Theme – Social Media
DATA: Modeling Tweets spreading
ANALYSIS: # cascades in future?
POLICY/ACTION: How to market better?
39
A Question
How many of you think your friends have more friends than you?
A recent Facebook study
– Examined all of FB’s users: 721 million people with 69 billion friendships, about 10 percent of the world’s population!
– Found that a user’s friend count was less than the average friend count of his or her friends 93 percent of the time.
– Users had an average of 190 friends, while their friends averaged 635 friends of their own.
40
Possible Reasons?
You are a loner?
Your friends are extroverts?
There are more extroverts than introverts in the world?
41
Example
Source: S. Strogatz, NYT 2012
Average number of friends?
42
Example
Source: S. Strogatz, NYT 2012
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
43
Example
Source: S. Strogatz, NYT 2012
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
Average number of friends of friends?
44
Example
Source: S. Strogatz, NYT 2012
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2) / 8 = ((1×1) + (3×3) + (2×2) + (2×2)) / 8
45
Example
Source: S. Strogatz, NYT 2012
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2) / 8 = ((1×1) + (3×3) + (2×2) + (2×2)) / 8 = 2.25!
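The arithmetic above can be checked with a short sketch. The node labels A to D and the edge list are assumptions for illustration: the slide only gives the friend counts 1, 3, 2, 2, and this edge list is the one simple 4-person graph with those counts. The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed labels and edges; they reproduce the friend counts 1, 3, 2, 2.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]


def build_adjacency(edge_list):
    """Return a dict mapping each person to the set of their friends."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def mean_friends(adj):
    """Average friend count over all people."""
    return Fraction(sum(len(friends) for friends in adj.values()), len(adj))


def mean_friends_of_friends(adj):
    """Average friend count over every (person, friend) pair."""
    counts = [len(adj[f]) for friends in adj.values() for f in friends]
    return Fraction(sum(counts), len(counts))


adj = build_adjacency(edges)
print(float(mean_friends(adj)))             # 2.0
print(float(mean_friends_of_friends(adj)))  # 2.25
```

Exact fractions are used so that the comparison with 2 and 2.25 does not depend on floating-point rounding.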
46
Actually it is (almost) always true!
Proof?
50
Actually it is (almost) always true!
Proof?
Essentially, it is true if there is any spread in # of friends (non-zero variance)!
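A standard degree-counting argument supports this claim. The notation is introduced here for illustration and is not taken from the slides: d_i is the number of friends of person i, n is the number of people, and μ and σ² are the mean and variance of the d_i (assuming μ > 0). The average is taken over every (person, friend) pair, as in the example: person i appears in the friend lists of d_i people and contributes d_i each time.

```latex
\[
\mu = \frac{1}{n}\sum_{i=1}^{n} d_i ,
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} d_i^2 - \mu^2
\]
\[
\text{avg. friends of friends}
  = \frac{\sum_{i=1}^{n} d_i \cdot d_i}{\sum_{i=1}^{n} d_i}
  = \frac{n\,(\mu^2 + \sigma^2)}{n\,\mu}
  = \mu + \frac{\sigma^2}{\mu}
  \;\ge\; \mu
\]
```

Equality holds only when σ² = 0, that is, when everyone has the same number of friends. For the example, μ = 2 and σ² = 18/4 − 4 = 0.5, giving 2 + 0.5/2 = 2.25.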
51
Implications
Immunization
– acquaintance immunization: immunize friend-of-friend
Early warning of outbreaks
– Again, monitor friends of friends
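One common reading of acquaintance immunization is: pick a person at random, then immunize a random friend of that person, which tends to reach well-connected people without knowing the whole network. The sketch below follows that reading under stated assumptions; the function name, the dict-of-sets graph format, and the star-shaped example network are all hypothetical and not taken from the slides.

```python
import random


def acquaintance_sample(adj, k, rng):
    """Pick up to k distinct people by taking a random friend of a random person.

    adj maps each person to the set of their friends (assumed format).
    """
    candidates = [person for person in sorted(adj) if adj[person]]
    # Everyone who is a friend of at least one candidate can be picked.
    reachable = set().union(*(adj[person] for person in candidates))
    target = min(k, len(reachable))
    chosen = set()
    while len(chosen) < target:
        person = rng.choice(candidates)
        friend = rng.choice(sorted(adj[person]))
        chosen.add(friend)
    return chosen


# Hypothetical star network: one hub connected to five leaves.
star = {"hub": {"a", "b", "c", "d", "e"}}
for leaf in "abcde":
    star[leaf] = {"hub"}

print(acquaintance_sample(star, 1, random.Random(0)))
```

In the star network, five of the six possible first picks are leaves, and a leaf's only friend is the hub, so the hub is selected far more often than a uniformly random choice would select it.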
52
Thanks! Questions?
B. Aditya Prakash
3160 F Torgersen Hall
badityap@cs.vt.edu
See my homepage for more details and papers: http://www.cs.vt.edu/~badityap