CS 1944: Sophomore Seminar Big Data and Machine Learning B. Aditya Prakash Assistant Professor Nov 3, 2015.


1 CS 1944: Sophomore Seminar Big Data and Machine Learning B. Aditya Prakash Assistant Professor Nov 3, 2015

2 About me
• Assistant Professor, CS
  – Member, Discovery Analytics Center
• Previously
  – Ph.D. in Computer Science, Carnegie Mellon University
  – B.Tech in Computer Science and Engineering, Indian Institute of Technology (IIT) Bombay
  – Internships at Sprint, Yahoo, Microsoft Research
Prakash 2015


4 Data contains value and knowledge

5 Data and Business (Source: A. Machhanavajjhala)

6 Data and Science

7 Data and Government (Source: A. Machhanavajjhala)

8 Data and Culture (Source: A. Machhanavajjhala)


10 Good news: Demand for Data Mining

11 How to extract value from data?
• Manipulate Data
  – CS, Domain expertise
• Analyze Data
  – Math, CS, Stat…
• Communicate your results
  – CS, Domain Expertise

12 Communication is important!

13 What is Data Mining?
• Given lots of data
• Discover patterns and models that are:
  – Valid: hold on new data with some certainty
  – Useful: should be possible to act on the item
  – Unexpected: non-obvious to the system
  – Understandable: humans should be able to interpret the pattern

14 Data Mining Tasks
• Descriptive methods
  – Find human-interpretable patterns that describe the data
  – Example: Clustering
• Predictive methods
  – Use some variables to predict unknown or future values of other variables
  – Example: Recommender systems

15 Big data (diagram labels: ML & Stats.; Comp. Systems; Theory & Algo.; Biology; Econ.; Social Science; Physics)

16 Data at CS, VT
• Knowledge, Information and Data
• http://www.cs.vt.edu/undergraduate/tracks/kid
• People: Fox, Harrison, Huang, Lu (in NVA), Ramakrishnan (in NVA), Rozovskaya, Prakash

17 Courses
• Background in some areas:
  – CS3414 (Numerical Methods); also prob/stat
• 4000 level
  – 4244 Internet Software Development
  – 4604 Database Management Systems
  – 4624 Capstone (Multimedia, Information Access)
  – 4634 Design of Information (Capstone)
  – 4804 AI
  – 4984 Computational Linguistics (Capstone)

18 Discovery Analytics Center

19 MY RESEARCH

20 Networks are everywhere!
Human Disease Network [Barabasi 2007]; Gene Regulatory Network [Decourty 2008]; Facebook Network [2010]; The Internet [2005]

21 What else do they have in common?

22 High School Dating Network
Bearman et al., Am. Jnl. of Sociology, 2004. Image: Mark Newman
Blue: Male; Pink: Female
Interesting observations?

23 The Internet
Skewed Degrees; Robustness

24 Karate Club Network

25 Dynamical Processes over networks are also everywhere!

26 Why do we care?
• Social collaboration
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
• ...

27 Why do we care? (1: Epidemiology)
• Dynamical Processes over networks
Diseases over contact networks
[AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts

28 Why do we care? (1: Epidemiology)
• Dynamical Processes over networks
[US-MEDICARE NETWORK 2005] Each circle is a hospital; ~3000 hospitals; more than 30,000 patients transferred
Problem: Given k units of disinfectant, whom to immunize?

29 Why do we care? (1: Epidemiology)
[US-MEDICARE NETWORK 2005] CURRENT PRACTICE vs. OUR METHOD: ~6x fewer!
Hospital-acquired infections took 99K+ lives, cost $5B+ (all per year)

30 Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue [WSJ 2010]; ~100m active users; > 50m users

31 Why do we care? (2: Online Diffusion)
• Dynamical Processes over networks
Social Media Marketing (figure labels: Celebrity; "Buy Versace™!"; Followers)

32 Social ⇔ Biological Contagion: automatically learn models

33 Why do we care? (3: To change the world?)
• Dynamical Processes over networks
Social networks and Collaborative Action

34 High Impact – Multiple Settings
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?
epidemic outbreaks; products/viruses; transmit s/w patches

35 Dynamical Processes = (a lot of) Networks + (some) Time-Series

36 Research Theme
DATA: Large real-world networks & processes
ANALYSIS: Understanding
POLICY/ACTION: Managing

37 Research Theme – Public Health
DATA: Modeling # patient transfers
ANALYSIS: Will an epidemic happen?
POLICY/ACTION: How to control outbreaks?

38 Research Theme – Social Media
DATA: Modeling Tweets spreading
ANALYSIS: # cascades in future?
POLICY/ACTION: How to market better?

39 A Question
• How many of you think your friends have more friends than you?
• A recent Facebook study
  – Examined all of FB’s users: 721 million people with 69 billion friendships, about 10 percent of the world’s population!
  – Found that a user’s friend count was less than the average friend count of his or her friends 93 percent of the time.
  – Users had an average of 190 friends, while their friends averaged 635 friends of their own.

40 Possible Reasons?
• You are a loner?
• Your friends are extroverts?
• There are more extroverts than introverts in the world?

41 Example (Source: S. Strogatz, NYT 2012)
Average number of friends?

42 Example (Source: S. Strogatz, NYT 2012)
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2

43 Example (Source: S. Strogatz, NYT 2012)
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
Average number of friends of friends

44 Example (Source: S. Strogatz, NYT 2012)
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2) / 8 = ((1×1) + (3×3) + (2×2) + (2×2)) / 8

45 Example (Source: S. Strogatz, NYT 2012)
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2) / 8 = ((1×1) + (3×3) + (2×2) + (2×2)) / 8 = 2.25!
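
The arithmetic in this example can be reproduced with a short script. This is a minimal sketch, not taken from the slides: the node labels A to D and the edge list are assumptions, chosen as the only simple four-node graph whose friend counts are 1, 3, 2, 2.

```python
from fractions import Fraction

# Hypothetical labels and edges (assumed, not from the slides): the only simple
# 4-node graph whose degrees are 1, 3, 2, 2.
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def mean_friends(adj):
    """Average number of friends per person."""
    return Fraction(sum(len(nbrs) for nbrs in adj.values()), len(adj))


def mean_friends_of_friends(adj):
    """Average friend count over every (person, friend) pair, as in the example."""
    friend_counts = [len(adj[f]) for nbrs in adj.values() for f in nbrs]
    return Fraction(sum(friend_counts), len(friend_counts))


adj = build_adjacency(EDGES)
print(float(mean_friends(adj)))             # 2.0
print(float(mean_friends_of_friends(adj)))  # 2.25
```

Exact fractions are used so that the comparison with 2.25 does not depend on floating-point rounding.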

46 Actually it is (almost) always true!
• Proof?


50 Actually it is (almost) always true!
• Proof?
Essentially, it is true if there is any spread in # of friends (non-zero variance)!
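
One standard argument for this claim is sketched here under assumed notation, and is not necessarily the slide's own proof. Write $d_i$ for person $i$'s friend count among $n$ people, with mean $\mu > 0$ and variance $\sigma^2$ (all symbols are assumptions introduced for this sketch). Person $i$ appears in $d_i$ friend lists and contributes $d_i$ each time, so the average over all (person, friend) pairs is:

```latex
\[
\frac{\sum_i d_i \cdot d_i}{\sum_i d_i}
  = \frac{\tfrac{1}{n}\sum_i d_i^2}{\tfrac{1}{n}\sum_i d_i}
  = \frac{\sigma^2 + \mu^2}{\mu}
  = \mu + \frac{\sigma^2}{\mu}
  \;\ge\; \mu ,
\]
```

with equality only when $\sigma^2 = 0$. For friend counts 1, 3, 2, 2 this gives $\mu = 2$, $\sigma^2 = 0.5$, and $2 + 0.5/2 = 2.25$.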

51 Implications
• Immunization
  – Acquaintance immunization: immunize friend-of-friend
• Early warning of outbreaks
  – Again, monitor friends of friends
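
The acquaintance immunization idea can be sketched in a few lines: pick a random person, then immunize one of that person's friends chosen at random. The function names, the star-shaped test network, and the seed below are hypothetical choices for illustration and do not come from the slides.

```python
import random


def acquaintance_immunize(adj, k, rng):
    """Pick k distinct nodes, each a random neighbour of a randomly chosen node.

    adj maps each node to a collection of its neighbours (undirected graph).
    """
    reachable = {v for nbrs in adj.values() for v in nbrs}
    if k > len(reachable):
        raise ValueError("k exceeds the number of nodes that have any friend")
    nodes = sorted(adj)  # fixed order so a seeded rng gives repeatable picks
    chosen = set()
    while len(chosen) < k:
        person = rng.choice(nodes)
        friends = sorted(adj[person])
        if friends:  # skip isolated people
            chosen.add(rng.choice(friends))
    return chosen


def mean_degree(adj, nodes):
    """Average friend count over the given nodes."""
    return sum(len(adj[v]) for v in nodes) / len(nodes)


# Hypothetical star network: one hub (node 0) with five friends.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
picked = acquaintance_immunize(star, k=2, rng=random.Random(0))
print(picked, mean_degree(star, picked), mean_degree(star, star))
```

When friend counts are uneven, a random friend tends to be better connected than a random person, which is the same effect the friends-of-friends average describes. The same sampling step could be used to choose whom to monitor for early warning.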

52 Thanks! Questions?
B. Aditya Prakash
3160 F Torgersen Hall
badityap@cs.vt.edu
See my homepage for more details and papers: http://www.cs.vt.edu/~badityap

