Massive Data Analysis Lab (MassDAL) S. Muthukrishnan CS Dept.



MassDAL
Agenda: Gather, manage and process massive data logs: Web, IP/wireless traffic data, location trajectories of objects, and sensor readings of the physical world.
Key Challenges:
– Scale: Beyond the traditional “human” scale. E.g., IP data at a single router interface for an hour exceeds total yearly worldwide credit card transactions!
– Data Collection: Probes/sensors with associated data quality and communication problems.
Need breakthroughs in Mathematics, Algorithms, Systems and Engineering to meet these challenges.
Potential: Major impact in Homeland Security, Telecom, Transportation and Society-at-large.

State of MassDAL
Mathematics and Computer Science.
– Algorithmic tools for embedding vectors, strings, trees and other objects for “compact” representation.
– Algorithmic tools for analyzing data summaries for heavy hitters, deviants, clustering, decision trees, etc.
– Invited talks at ACM, SIAM, and European conferences in Algorithms, Databases, Statistics, and Data Mining on novel models and algorithms.
– Over a dozen research papers in the last 2 years on experience with massive data analysis.
– Supported by NSF grants. Partners: MIT, DIMACS.
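As an illustration of what a "data summary for heavy hitters" can look like, here is a minimal sketch of the classic Misra-Gries frequent-items summary. The slides do not say which algorithms the lab uses, so this is a well-known textbook technique standing in as an example; the function name `misra_gries` and the sample stream are hypothetical.

```python
def misra_gries(stream, k):
    """Summarize a stream with at most k - 1 counters (Misra-Gries).

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to remain in the returned dict. Stored counts are lower
    bounds on the true counts, off by at most n / k.
    NOTE: illustrative textbook algorithm, not taken from the slides.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # Room for a new counter.
            counters[item] = 1
        else:
            # Summary is full: decrement every counter, drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream: "a" makes up half of the 12 items.
stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a", "e", "a", "b"]
summary = misra_gries(stream, k=3)
print(summary)
```

The summary uses memory proportional to `k` rather than to the number of distinct items, which is the point of working from a compact summary instead of the raw log.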

State of MassDAL
Science.
– Developing wearable sensors for tracking the location of objects as well as “interactions” between objects. Measuring behavioral data.
– Current partner: Telcordia. Their initial investment: $300k/3 months (est.). Potential partner in the works: Los Alamos National Lab.
– Potential: Analysis of social networks for Epidemiology, Homeland Security, and the health industry.

State of MassDAL
Engineering.
– Consulting in analysis of wireless network logs. AT&T Wireless, 3rd largest in US, 20 million customers. Terabytes/month. Fully operational, telco-grade!
– Incorporated novel algorithms in operational IP network data analysis tools. Partner: Gigascope.
– Developed principled approach to data cleaning and data quality monitoring for operational IP network. Partner: PACMAN.
– Developed new burst-detection algorithms for text streams. Partner: DIMACS, Monitoring message streams.
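To make "burst detection" concrete, the sketch below flags time windows whose message count jumps well above the recent baseline. It is a deliberately simple moving-baseline detector, not the lab's algorithm (which the slides do not describe). The function name `detect_bursts`, the parameters `history` and `threshold`, and the sample counts are all assumptions for illustration.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, threshold=3.0):
    """Return indices of windows whose count is a burst.

    counts    -- number of messages seen in each fixed-length time window
    history   -- how many preceding windows form the baseline (assumed value)
    threshold -- how many standard deviations above the baseline mean
                 a window must be to count as a burst (assumed value)
    NOTE: simple stand-in technique, not taken from the slides.
    """
    bursts = []
    for i in range(history, len(counts)):
        baseline = counts[i - history:i]
        mu = mean(baseline)
        # Floor the deviation at 1.0 so a flat baseline does not make
        # every small fluctuation look like a burst.
        sigma = max(pstdev(baseline), 1.0)
        if counts[i] > mu + threshold * sigma:
            bursts.append(i)
    return bursts


# Hypothetical per-window message counts with one spike at index 6.
counts = [4, 5, 6, 5, 4, 5, 30, 5, 4]
print(detect_bursts(counts))
```

Only the last `history` counts are needed at any time, so the same idea carries over to a stream where past windows are discarded once they leave the baseline.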

Future See

Future of MassDAL
Research: Need breakthrough research in mathematics, systems, databases, algorithms, and sensor networking.
Expand data domains.
– Potential partners: Google, NJ auto insurance fraud data, USPTO patent data, AWS location trajectories, etc.
Build state-of-the-art facility at Rutgers.
– Secure, 24x7 data hosting and analysis infrastructure capable of gathering and processing petabytes of data/month across domains, data sources, etc. Unique in the world!
Potential.
– Every wireless, telecom, and internet service provider is looking to farm out this crucial piece of their operations. Estimated market for these services: 100s of millions of US dollars per year. Crucial for NJ State. Interest from multiple VCs now.