CMU SCS Big (graph) data analytics Christos Faloutsos CMU.

Presentation transcript:

CMU SCS Big (graph) data analytics Christos Faloutsos CMU

CMU SCS, IC '14, C. Faloutsos. CONGRATULATIONS!

Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly detection; Conclusions

Q+A
Are you recruiting? How many? 1 or 2.
How many do you have? 6 (+5 postdocs).
How frequently do you meet them? 1/week.
What is your advising style? Results.
How do you feel about summer internships? Yes/Maybe (FB, MSR, IBM, ++).

Motivation. Data mining: ~ find patterns (rules, outliers). What do real graphs look like? Anomalies? Time series / monitoring (PA, NY, …).

Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]; ~1B users; $10-$100B revenue.

NELL & concepts (=groups). Predicates (subject, verb, object) in a knowledge base, e.g., “Barack Obama is the president of U.S.”, “Eric Clapton plays guitar”. NELL (Never Ending Language Learner) data: dimensions labeled (26M) and (48M) in the figure; nonzeros = 144M. Tom Mitchell (CMU/CS-MLD), Vagelis Papalexakis (CMU-CS).

Answer: tensor factorization. Recall: (SVD) matrix factorization finds blocks. [N users x M products] ~ [‘meat-eaters’ x ‘steaks’] + [‘vegetarians’ x ‘plants’] + [‘kids’ x ‘cookies’].
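
The "finds blocks" claim can be illustrated with a minimal sketch. It assumes a made-up 6 x 5 users-by-products matrix with three disjoint blocks, and it uses plain power iteration with deflation in place of a full SVD routine. The names `RATINGS`, `top_singular_triplet` and `find_blocks` are hypothetical and do not come from the slides.

```python
import math

# Toy ratings matrix (rows: users, columns: products); all values are made up.
# Users 0-2 buy products 0-1, users 3-4 buy products 2-3, user 5 buys product 4.
RATINGS = [
    [3, 3, 0, 0, 0],
    [3, 3, 0, 0, 0],
    [3, 3, 0, 0, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 0, 1],
]


def top_singular_triplet(a, iters=100):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


def find_blocks(a, rank, threshold=0.1):
    """Extract `rank` components one at a time, subtracting each from the residual."""
    residual = [[float(x) for x in row] for row in a]
    blocks = []
    for _ in range(rank):
        sigma, u, v = top_singular_triplet(residual)
        users = [i for i, x in enumerate(u) if abs(x) > threshold]
        products = [j for j, x in enumerate(v) if abs(x) > threshold]
        blocks.append((sigma, users, products))
        residual = [
            [residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))
        ]
    return blocks


blocks = find_blocks(RATINGS, rank=3)
for sigma, users, products in blocks:
    print(f"strength {sigma:.3f}: users {users} x products {products}")
```

Each extracted component is supported on one user group and one product group, which is the sense in which the factorization recovers blocks. Real data has noise and overlapping groups, so a library SVD and a more careful reading of the factors would be needed there.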

PARAFAC decomposition. Answer: tensor factorization. [subject x object x verb] = [politicians] + [artists] + [athletes].

PARAFAC decomposition. Answer: tensor factorization. Results for who-calls-whom-when (4M x 15 days): [caller x callee x time] = sum of three components (??).
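
To make the idea of a PARAFAC component concrete, here is a small sketch on a made-up (subject, object, verb) tensor. It is a simplification: it extracts rank-1 components greedily with alternating updates and deflation, whereas standard PARAFAC fits all components jointly and large-scale versions work on sparse data. The entity names, `TRIPLES`, `rank1_component` and `decompose` are all hypothetical.

```python
import math

# Toy vocabulary and triples (subject, verb, object); all entries are made up.
SUBJECTS = ["politician A", "politician B", "artist A", "artist B", "artist C"]
OBJECTS = ["country X", "country Y", "guitar", "piano"]
VERBS = ["visits", "plays"]
TRIPLES = [(s, "visits", o) for s in SUBJECTS[:2] for o in OBJECTS[:2]]
TRIPLES += [(s, "plays", o) for s in SUBJECTS[2:] for o in OBJECTS[2:]]


def build_tensor(triples):
    """Dense subject x object x verb tensor with 1.0 for each observed triple."""
    x = [[[0.0] * len(VERBS) for _ in OBJECTS] for _ in SUBJECTS]
    for s, verb, o in triples:
        x[SUBJECTS.index(s)][OBJECTS.index(o)][VERBS.index(verb)] = 1.0
    return x


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def rank1_component(x, iters=50):
    """Alternating updates for the best single component: weight * a (x) b (x) c."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    a, b, c = normalize([1.0] * I), normalize([1.0] * J), normalize([1.0] * K)
    for _ in range(iters):
        a = normalize([sum(x[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) for i in range(I)])
        b = normalize([sum(x[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) for j in range(J)])
        c = normalize([sum(x[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) for k in range(K)])
    weight = sum(x[i][j][k] * a[i] * b[j] * c[k] for i in range(I) for j in range(J) for k in range(K))
    return weight, a, b, c


def decompose(x, rank, threshold=0.1):
    """Greedy extraction: find one component, subtract it, repeat."""
    components = []
    for _ in range(rank):
        weight, a, b, c = rank1_component(x)
        components.append((
            weight,
            [SUBJECTS[i] for i, v in enumerate(a) if abs(v) > threshold],
            [OBJECTS[j] for j, v in enumerate(b) if abs(v) > threshold],
            [VERBS[k] for k, v in enumerate(c) if abs(v) > threshold],
        ))
        x = [[[x[i][j][k] - weight * a[i] * b[j] * c[k] for k in range(len(c))]
              for j in range(len(b))] for i in range(len(a))]
    return components


components = decompose(build_tensor(TRIPLES), rank=2)
for weight, subjects, objects, verbs in components:
    print(f"weight {weight:.3f}: {subjects} | {objects} | {verbs}")
```

Each component groups a set of subjects with the objects and verbs they co-occur with, which mirrors the "politicians / artists / athletes" reading of the factors on the slide.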

Concept Discovery in Knowledge Base. NP1: Internet, file, data. NP2: Protocol, software, suite.

Neuro-semantics. Brain scan data*: 9 persons, 60 nouns (airplane, dog, …). Questions: 218 questions (‘is it alive?’, ‘can you eat it?’, …). Modes in the figure: persons, nouns, questions, voxels. Patterns? *Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science,

Neuro-semantics. Small items -> Premotor cortex. Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014.

Scalability. Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003]. Yahoo: 5Pb of data [Fayyad, KDD’07]. Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’. Problem: machine failures, on a daily basis. How to parallelize data mining tasks, then? A: map/reduce, with hadoop as its open-source clone.
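
As a rough illustration of the map/reduce programming model mentioned here, the sketch below computes node degrees from an edge list in a single Python process. It only mimics the map, shuffle and reduce phases; it is not the Hadoop API, and `EDGES`, `map_edge`, `reduce_degree` and `run_map_reduce` are hypothetical names.

```python
from collections import defaultdict

# Toy undirected edge list; the node names are made up.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def map_edge(edge):
    """Map phase: each edge contributes one unit of degree to both endpoints."""
    src, dst = edge
    yield src, 1
    yield dst, 1


def reduce_degree(node, counts):
    """Reduce phase: sum the contributions collected for one node."""
    return node, sum(counts)


def run_map_reduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": group values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


degrees = run_map_reduce(EDGES, map_edge, reduce_degree)
print(degrees)
```

Because the mapper looks at one edge at a time and the reducer at one key at a time, a framework can spread both phases over many machines and rerun only the failed pieces, which is what makes the model attractive when machines fail daily.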

Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly/fraud detection; Conclusions

App-store fraud. Opinion Fraud Detection in Online Reviews using Network Effects, Leman Akoglu, Rishi Chandy, CF, ICWSM’13. (NSF grant, with Alex Beutel)

Problem. Given: a user-product review network and the review sign (+/-). Classify objects into type-specific classes: users: ‘honest’ / ‘fraudster’; products: ‘good’ / ‘bad’; reviews: ‘genuine’ / ‘fake’. No side data! (e.g., timestamp, review text)

Formulation: BP (belief propagation). [Figure: user and product nodes joined by + / - review edges, with labels such as honest, good, bad; shown before and after.]
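
A minimal sketch of belief propagation on a signed user-product review graph follows. It assumes uniform priors (no side data), at most one review per user-product pair, and compatibility values controlled by `EPS` that were picked for illustration and are not claimed to be the paper's settings. `REVIEWS`, `POTENTIAL` and `run_bp` are hypothetical names.

```python
EPS = 0.1  # assumed illustration value

# POTENTIAL[sign][user_state][product_state]
# user states: 0 = honest, 1 = fraudster; product states: 0 = good, 1 = bad
POTENTIAL = {
    "+": [[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]],
    "-": [[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]],
}

# Toy review list (user, product, sign); all entries are made up.
REVIEWS = [("u1", "p1", "+"), ("u2", "p1", "+"), ("u3", "p1", "+"), ("u4", "p1", "-")]


def normalize(p):
    total = sum(p)
    return [x / total for x in p]


def run_bp(reviews, iters=10):
    """Synchronous loopy BP with two states per node and uniform priors."""
    neighbors = {}
    for user, product, sign in reviews:
        neighbors.setdefault(("user", user), []).append((("product", product), sign))
        neighbors.setdefault(("product", product), []).append((("user", user), sign))
    msgs = {(src, dst): [0.5, 0.5] for src in neighbors for dst, _ in neighbors[src]}
    for _ in range(iters):
        new_msgs = {}
        for src in neighbors:
            for dst, sign in neighbors[src]:
                # Combine what src hears from everyone except dst.
                incoming = [1.0, 1.0]
                for other, _ in neighbors[src]:
                    if other != dst:
                        incoming = [incoming[s] * msgs[(other, src)][s] for s in (0, 1)]
                out = []
                for dst_state in (0, 1):
                    total = 0.0
                    for src_state in (0, 1):
                        if src[0] == "user":
                            psi = POTENTIAL[sign][src_state][dst_state]
                        else:
                            psi = POTENTIAL[sign][dst_state][src_state]
                        total += psi * incoming[src_state]
                    out.append(total)
                new_msgs[(src, dst)] = normalize(out)
        msgs = new_msgs
    beliefs = {}
    for node in neighbors:
        belief = [1.0, 1.0]
        for other, _ in neighbors[node]:
            belief = [belief[s] * msgs[(other, node)][s] for s in (0, 1)]
        beliefs[node] = normalize(belief)
    return beliefs


beliefs = run_bp(REVIEWS)
for (kind, name), belief in sorted(beliefs.items()):
    print(kind, name, [round(b, 3) for b in belief])
```

In this toy graph, three users rate the product positively and one rates it negatively, so the product leans ‘good’ and the lone dissenter leans ‘fraudster’. On graphs with cycles the same updates are only approximate and may need more iterations or damping.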

Top scorers. [Figure: users vs. products; + positive (4-5) rating, o negative (1-2) rating.]

‘Fraud-bot’ member reviews: Same developer! Duplicated text! Same-day activity!

Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly/fraud detection; Time series, monitoring / forecasting; Conclusions

‘Tycho’: epidemics analysis. Prof. Yasuko Matsubara; with U. Pitt (epidemiology dept.). 50 states x 46 diseases. Flu? Measles? August? No periodicity? Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, August 2014.

Open research questions. Patterns/anomalies for time-evolving graphs (call graph, 3M people x 6 mo). Spot fraudsters in social networks (e.g., Twitter ‘$10 -> 1000 followers’). How is the human brain wired?

Contact info: GHC 8019 Ph#: x ic/ FYI: Course: , Tu-Th 3:00-4:20 and, again WELCOME!