Data Science Research in Big Data Era

Introduction to Research Seminar, 2018
Peixiang Zhao
Department of Computer Science, Florida State University
zhao@cs.fsu.edu

Synopsis
  Introduction to Data Sciences
  How to prepare yourself for (data) research
  My research portfolio
  Conclusions

Who am I?
  Peixiang Zhao
  Associate Professor at CS @ FSU
  Homepage: http://www.cs.fsu.edu/~zhao/
  Office: 262 Love Building, FSU
  Ph.D.: University of Illinois at Urbana-Champaign, Aug. 2012
  Research Interest: databases, data mining, data-intensive computation and analytics, and graph/information network analysis

Who am I?
  Courses I am offering
    COP 4710: Introductory database systems
      What databases are and how to use them
      A programming project on Web-based DB programming
    COP 5725: Advanced database systems
      Database internals and advanced topics, such as MapReduce, data mining and Web search
      A research/implementation project
  I am hiring highly-motivated Ph.D. students!

Introduction
  What are data sciences?
    The sub-area of computer science dealing with the acquisition, management, querying and mining of data drawn from real-world applications
    They include, but are not limited to:
      Database systems
      Data mining
      Information retrieval
      Web technologies
      Network science
      Big data

Data Sciences
  Data:
    Model: fully structured or relational, semi-structured, unstructured, schema-less, graphical, ...
    Format: textual, numeric, categorical, sequential, graph-structured, audio/video, time-series, streaming data
    Scale: from megabytes to zettabytes
    Quality, resolution, privacy, usability, ...
  Common Tasks:
    Data acquisition, storage, maintenance and integration
    Knowledge discovery, mining and machine learning
    Indexing, querying and ranking
    ...
  Information networks have formed a critical component of modern information infrastructure

Data Sciences
  Skillsets and Requirements
    Motivation and passion to work on state-of-the-art problems
    Strong mathematical reasoning and algorithm design abilities
    Good programming skills
  Your Bright Future
    DBAs at Goldman Sachs or D. E. Shaw
    Data scientists at Google, Facebook, Twitter or Foursquare
    Data engineers at Oracle, IBM or Microsoft
    Researchers at MSR or IBM Research
    Professors showing up in SIGMOD, KDD or SIGIR

How to prepare yourself for (data) research
  What is research?
    Discover new knowledge
    Seek answers to non-trivial questions
  Research Process
    Identification of the topic (e.g., Web search)
    Hypothesis formulation (e.g., algorithm X is better than Y, the state of the art)
    Experiment design: measures, data, etc. (e.g., retrieval accuracy on a sample of web data)
    Test hypothesis (e.g., compare X and Y on the data)
    Draw conclusions and repeat the cycle of hypothesis formulation and testing if necessary (e.g., Y is better only for some queries, now what?)
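The "test hypothesis" step (compare X and Y on the data) can be sketched in code. The example below is only an illustration and is not from the slides: it assumes each algorithm produces one accuracy score per query, and it uses a paired sign-flip permutation test, which is one common choice among several. The function name `paired_permutation_test` and all score values are hypothetical.

```python
import random
from statistics import mean


def paired_permutation_test(scores_x, scores_y, trials=10_000, seed=0):
    """Two-sided paired sign-flip permutation test on per-query scores.

    Returns (mean difference X - Y, estimated p-value).
    """
    if len(scores_x) != len(scores_y):
        raise ValueError("need one score per query for both algorithms")

    # Per-query differences: positive means X did better on that query.
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    observed = mean(diffs)

    rng = random.Random(seed)  # fixed seed so the run is repeatable
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis (no difference between X and Y),
        # the sign of each per-query difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1

    # Add-one smoothing keeps the estimate from being exactly zero.
    p_value = (extreme + 1) / (trials + 1)
    return observed, p_value


# Made-up per-query accuracies, for illustration only.
x_scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]
y_scores = [0.58, 0.65, 0.57, 0.72, 0.60, 0.70, 0.52, 0.66]

mean_diff, p = paired_permutation_test(x_scores, y_scores)
print(f"mean difference (X - Y): {mean_diff:.4f}, p-value: {p:.4f}")
```

A small p-value suggests the observed difference is unlikely under the null hypothesis. Looking at the per-query differences directly is still worthwhile, since one algorithm may win on only some queries, which is exactly the situation that sends you back to hypothesis formulation.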

What is Good Research?
  Solid work:
    A clear hypothesis (research question) with a conclusive result (either positive or negative)
    Clearly adds to our knowledge base (what can we learn from this work?)
    Implication: a solid, focused contribution is often better than a non-conclusive broad exploration
  High impact = high-importance-of-problem * high-quality-of-solution
    Open up an important problem
    Close a problem with the best solution
    Major milestones in between

Challenge-Impact Analysis
  [Figure: chart of research problems. Axis labels: Level of Challenges; Impact/Usefulness; Unknown; Known]
  High impact, high risk (hard): good long-term research problems
  High impact, low risk (easy): good short-term research problems
  Difficult basic research problems, but questionable impact
  Low impact, low risk: bad research problems (may not be publishable)
  Good applications: not interesting for research
  "Entry point" problems

How to Do Research in Data Sciences?
  Curiosity: allows you to ask questions
  Critical thinking: allows you to challenge assumptions
    Make sense of what you have read/heard
  Learning: takes you to the frontier of knowledge
    Start with textbooks and courses
    Read papers in top-notch conferences/journals
    Implement your prototype ideas
  Persistence: so that you don't give up
  Respect data and truth: ensures your research is solid
    Don't throw away negative results
  Communication: publish and present your work

Tuning the Problem
  [Figure: chart with axes Level of Challenges and Impact/Usefulness, each labelled Unknown and Known]
  Make an easy problem harder
  Increase impact (more general)
  Make a hard problem easier

Where to Publish?
  Databases
    Conferences: SIGMOD, VLDB, ICDE
    Journals: ACM TODS, VLDB J., IEEE TKDE
  Data Mining
    Conferences: KDD, ICDM, SDM
    Journals: ACM TKDD
  Information Retrieval
    Conferences: SIGIR, CIKM
    Journals: ACM TOIS
  Web & Applications
    Conferences: WWW, WSDM

My Research Theme
  Modelling, managing, querying, and mining big graph-structured, networked data
  Information networks have formed a critical component of modern information infrastructure
  [Figure labels: social network, brain graph, IoT, WWW, collaboration network, protein network]

Key Challenges
  Real-world graphs and networks are
    BIG
      Web graph: 8.94 billion pages
      Facebook: 901 million active users and 125 billion friendship relations
    Heterogeneous
      Complicated interplay of topologies and multi-dimensional contents
    Dynamic
      Facebook U.S. grew 149% in 2009
    Dirty
      Structure and content are noisy, inconsistent, and distorted
    Volatile and vulnerable

Research Thrusts
  Managing and querying big networked data
    Scalable indexing solutions for exact/approximate graph query processing in graph databases and information networks
    Summarizing big graphs
    Querying dynamic graph streams
  Representative Applications
    Business intelligence
    Biology and bioinformatics
    Network evolution

Research Thrusts
  Mining social/information networks
    Graph classification, prediction, outlier detection
    Graph partitioning, clustering, and community detection
    Credibility/accountability analysis in social networks
  Representative Applications
    Social targeting and viral marketing
    Recommendation
    User studies
    Veracity analysis

Other Research Topics
  Location-based mining and ranking
    Mobile local search, ranking, and recommendation
  Text mining
    Classification, clustering, graphical models
  Mining structural patterns
    Association analysis on structured patterns
  Industry-strength systems
    Hadoop-ML with IBM Research
    Trinity with Microsoft Research

Future Research Agenda
  Foundations and models of information networks
    Model, manage and access multi-genre heterogeneous information networks
    Querying and mining volatile, noisy and uncertain information networks
    Cyber-physical information networks
  Efficient and scalable computation in information networks
    A unified declarative language for graph and network data
    A distributed graph computational framework for large-scale information networks
  Knowledge discovery in large information networks

Conclusions
  We are in an information network era!
    Internet, social networks, collaboration and recommender networks, public health-care networks, technological/biological networks, ...
  Data are pervasive, big, and of great value
  Research in data sciences is interesting and highly rewarding
  Follow your heart and don't give up!

Good Luck! Q & A