Introduction to Data Science

Slides:



Advertisements
Similar presentations
CRISP-DM (required for cw, useful for any project…)
Advertisements

Note: Lists provided by the Conference Board of Canada
INTRODUCTION TO MODELING
DAMA-NCR Tuesday, November 13, 2001 Laura Squier Technical Consultant What is Data Mining?
/faculteit technologie management Introduction to Data Mining a.j.m.m. (ton) weijters (slides are partially based on an introduction of Gregory Piatetsky-Shapiro)
Data Mining.
SLIDE 1IS 257 – Fall 2008 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
OPERATIONS and LOGISTICS MANAGEMENT
Process, Key Success Factors, Illustrations
Continuation From Chapter From Chapter 1
More on Data Mining KDnuggets Datanami ACM SIGKDD
Unit 2: Engineering Design Process
The CRISP-DM Process Model
Chapter 6 : Software Metrics
1 Research Groups : KEEL: A Software Tool to Assess Evolutionary Algorithms for Data Mining Problems SCI 2 SMetrology and Models Intelligent.
“Faster aircraft, bolder video games, better medicines— technology moves forward every day. And tech-savvy workers make those advances happen. Without.
Data Mining Process A manifestation of best practices A systematic way to conduct DM projects Different groups has different versions Most common standard.
Demystifying the Data Scientist Dan McClary, Ph.D. Big Data Product Management Oracle Note: The speaker notes for this slide include detailed instructions.
STANDARDS :CODY AND GARY:CODYGARY NILL TO ARTHURARTHUR Home Animatronics Architectural Renovation Biotechnology Design Career Preparation Chapter Team.
1 Software Engineering: A Practitioner’s Approach, 6/e Chapter 15a: Product Metrics for Software Software Engineering: A Practitioner’s Approach, 6/e Chapter.
Impact of the New ASA Undergraduate Curriculum Guidelines on the Hiring of Future Undergraduates Robert Vierkant Mayo Clinic, Rochester, MN.
Ch 8. What Makes a Great Analytic Professional? Taming The Big Data Tidal Wave 7 June 2012 SNU IDB Lab. Jee-bum Park.
CIS 4251 / CIS 5930 SOFTWARE DEVELOPMENT Fall 1999 Sept. 1, 1999 Marge Holtsinger.
1 Seattle University Master’s of Science in Business Analytics Key skills, learning outcomes, and a sample of jobs to apply for, or aim to qualify for,
Data Science Interview Questions 1.What do you mean by word Data Science? Data Science is the extraction of knowledge from large.
Wake Technical Community College “Wake Tech” Largest community college in NC 70,000+ students a year attending.
What Business Analytics Can Do For You!
Computer Information Technology
Foundations of Technology The Engineering Design Process
SNS COLLEGE OF TECHNOLOGY
Chapter 1: Introduction to Systems Analysis and Design
Statistical education in times of Big Data
IB Assessments CRITERION!!!.
Systems Implementation,
Risk Management for Technology Projects
Analytical Business Consultants
System Requirements Specification
School of Information Management Nanjing University China
Introduction to Software Engineering
THE ENTERPRISE ANALYTICAL JOURNEY
Data Science and its role in Big Data analytics
Department of Computer Science The University of Texas at Dallas
Engineering Overview Introduction to Engineering Design
CLINICAL INFORMATION SYSTEM
MBML_Efficient Testing Methodology for Machine Learning
A Unifying View on Instance Selection
Data Warehousing and Data Mining
OMIS 665, Big Data Analytics
 Deep Analytical Talent  Data Savvy Professionals  Technology and Data Enablers.
Assessment and Accreditation
Engineering Overview Introduction to Engineering Design
Richelle Serrano Clindata Insight September 4-8, 2018
Introduction To software engineering
INNOvation in TRAINING BUSINESS ANALYSTS HAO HElEN Zhang UniVERSITY of ARIZONA
Introduction to Systems Analysis and Design Stefano Moshi Memorial University College System Analysis & Design BIT
Foundations of Technology The Engineering Design Process
Chapter 1: Introduction to Systems Analysis and Design
Fields of Engineering Principles of EngineeringTM
Foundations of Technology The Engineering Design Process
Statistical education in times of Big Data
Information Systems in Organizations 1.1 Introduction to MIS
Advanced Design Applications The Engineering Design Process
Engineering Overview.
Engineering Overview.
Business concentration, minor and certificate programs
Engineering Overview.
CRISP Process Stephen Wyrick.
Chapter 1: Introduction to Systems Analysis and Design
Bridge the Gap Between Statistician and Data Analysis Professionals
Presentation transcript:

Introduction to Data Science Dr. Kalpakis, Fall 2017

What is Data Science? Data scientists, "The Sexiest Job of the 21st Century" (Davenport and Patil, Harvard Business Review, 2012) Much of the data science explosion is coming from the tech-world What does Data Science mean? Is it the science of Big Data? What is Big Data anyway? Who does Data Science and where? What existed before Data Science came along? Is it simply a rebranding of statistics and machine learning? “Anything that has to call itself a science isn’t.” Hype increases noise-to-signal ratio in perceiving reality and makes it harder to focus on the gems Why and how to hire a data scientist? http://goo.gl/F4K4hE

Why now? massive amounts of data about many aspects of our lives, both online and offline activities, real- time as well as past-time Datafication=“taking all aspects of life and turning them into data” “Once we datafy things, we can transform their purpose and turn the information into new forms of value.” abundance of inexpensive computing power, communication capacity proliferation of small footprint low-power sensors (IoT) feedback loop between our behavior, environment, and data products

Data Science take I “Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics. But data science is not merely hacking—because when hackers finish debugging their Bash one-liners and Pig scripts, few of them care about non-Euclidean distance metrics. And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a tab-delimited file into R if their job depended on it. Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools and materials, coupled with a theoretical understanding of what’s possible.” Mike Driscoll (CEO of Metamarket) Drew Conway’s Venn diagram of data science Many posers “It’s not enough to just know how to run a black box algorithm. You actually need to know how and why it works, so that when it doesn’t work, you can adjust. “ Cathy O’Neil

Data Science team individual data scientist profiles are merged to make a Data science team team profile should align with the profile of the data problems to tackle

Data science: skills and actors Clustering and visualization of data science subfields based on a survey of data science practitioners (Analyzing the Analyzers by Harlan Harris, Sean Murphy, and Marck Vaisman, 2012) Data Businesspeople are the product and profit-focused data scientists. They’re leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA. Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies. Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often in production environments. They often have computer science degrees, and often work with so-called “big data”. Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have PhDs, and their creative applications of mathematical tools yields valuable insights and products.

Types of Data Scientists Machine Learning Scientist Statistician Software Programming Analyst Data Engineer Actuarial Scientist Business Analytic Practitioner Quality Analyst Spatial Data Scientist Mathematician Digital Analytic Consultant

What do data scientists do? “define what data science is by what data scientists get paid to do” (O’Neil and Schutt) In academia, a data scientist is trained in some discipline, works with large amounts of data, grapples with computational problems posed by the structure, size, messiness, and the complexity and nature of the data, and solves real-world problems. In industry, a data scientist knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human. spends a lots of effort in collecting, cleaning, and munging data utilizing statistics and software engineering skills. performs exploratory data analysis, finds patterns, builds models, and algorithms. communicates the findings in clear language and with data visualizations so that even if her/his colleagues unfamiliar with the data can understand the implications

Data Science take II “Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.” (Wikipedia) The 4th paradigm of science (theoretical, empirical, computational, and data-driven) (Jim Gray)

Data Science Process CRISP-DM (Cross Industry Standard Process for Data Mining) Data science process flowchart (O’Neil and Schutt)

CRISP-DM Phases, tasks, outputs Business Understanding Data Preparation Modeling Evaluation Deployment Determine Business Objectives Background Business Objectives Business Success Criteria Collect Initial Data Initial Data Collection Report Data Set Data Set Description Select Modeling Technique Modeling Technique Modeling Assumptions Evaluate Results Assessment of Data Mining Results w.r.t. Business Success Criteria Approved Models Plan Deployment Deployment Plan Situation Assessment Inventory of Resources Requirements,Assumptions, and Constraints Risks and Contingencies Terminology Costs and Benefits Describe Data Data Description Report Select Data Rationale for Inclusion / Exclusion Generate Test Design Test Design Review Process Review of Process Plan Monitoring and Maintenance Monitoring & Maintenance Plan Determine Data Mining Goal Data Mining Goals Data Mining Success Criteria Explore Data Data Exploration Report Clean Data Data Cleaning Report Build Model Parameter Settings Models Model Description Determine Next Steps List of Possible Actions Decision Produce Final Report Final Report Final Presentation Produce Project Plan Project Plan Initial Asessment of Tools and Techniques Verify Data Quality Data Quality Report Construct Data Derived Attributes Generated Records Assess Model Model Assessment Revised Parameter Settings Review Project Experience Documentation Integrate Data Merged Data Format Data Reformatted Data