EXPLORING PROCESS OF DOING DATA SCIENCE VIA AN ETHNOGRAPHIC STUDY OF A MEDIA ADVERTISING COMPANY J.SALTZ, I.SHAMSHURIN 2015 IEEE INTERNATIONAL CONFERENCE.

Presentation transcript:

EXPLORING PROCESS OF DOING DATA SCIENCE VIA AN ETHNOGRAPHIC STUDY OF A MEDIA ADVERTISING COMPANY
J. SALTZ, I. SHAMSHURIN
2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 29 OCTOBER, SANTA CLARA, CA, USA

OUTLINE
- Introduction
- Related Work
- Data Collection
- Findings
- Observed Issues
- Possible Improvements
SCHOOL OF INFORMATION STUDIES | SYRACUSE UNIVERSITY

INTRODUCTION
Data science teams lack an explicit team-based process methodology:
- What steps should be done first?
- How long should each phase of a project take?
- Which people, with what skills, should be involved in the project?

RELATED WORK
- Data is increasingly being viewed as a strategic resource for the organization [Wade, 2004]
- Big Data can enable new and improved business models that have not been feasible in the past [Tiefenbacher, 2015]
- Lack of focus on the process teams should use to actually do a data science project [Saltz, 2015]
- Teams doing data analysis and data science work in an ad hoc fashion, using trial and error to identify the right tools [Bhardwaj, 2015]
- Data science as a step-by-step process:
  o Acquisition, information extraction and cleaning, data integration, modeling, analysis, interpretation and deployment [Jagadish, 2014]
  o Preparation, Analysis, Reflection and Dissemination [Guo, 2013]
- One way to understand what an appropriate data science process methodology might be is to document case studies of how teams are actually doing data science, especially within a corporate context [Saltz, 2015]

BACKGROUND AND STAKEHOLDER ANALYSIS
- One of the researchers was embedded within the data science team
- Setting: a global media advertising software company headquartered in New York City
- The company had a total of 100 people distributed globally

RESEARCH QUESTIONS
RQ1. What methodology does the data science team currently follow?
RQ2. What are some possible ways to improve the current methodology, i.e., to make the projects more efficient in time and cost?

DATA COLLECTION
- Phase I: information was collected before one of the researchers was embedded within the data science team
- Phase II: during a 9-week period, one of the researchers participated as part of the data science team; in addition to collecting data and observing how the team functioned, the researcher helped the team with various tasks
- Phase III: interview with the VP of Data Science

DATA SCIENCE TEAM
- 2 Data Scientists, including VP of Data Science
- 3 Data Operations people
- 3 Software Developers
- 1 Data Engineer
The team was divided across multiple locations

FINDINGS: TYPES OF PROJECTS
- Routine Projects
  o on a regular basis
  o more external
  o data transformation and pre-processing
  o performed by data group
  o deadlines
- Exploratory Projects
  o research oriented
  o no standard methodology is used
  o performed by VP of data science and embedded researcher
  o duration of these projects can vary from a week to a year
  o no official deadlines

FINDINGS: ROLES
- Data Science
  o Explores the data and generates insight from it, including tasks such as data mining and data visualization. This team included the data scientist who was the embedded observer.
- Data Operations
  o Gets data from data providers, then transforms and prepares it for analysis (i.e., for use by the data science team)
- Software Development
  o Develops software tools to help the data science team perform data analysis
- Data Engineering
  o Supports and improves the existing system and participates in some data science projects

FINDINGS: HIGH LEVEL PROCESS DESCRIPTION

PROCESS FLOW DESCRIPTION: PREPARATION

PROCESS FLOW DESCRIPTION: ANALYSIS

PROCESS FLOW DESCRIPTION: DISSEMINATION

OBSERVED ISSUES / CHALLENGES
- No specific deadlines for the whole data science project or for any individual phase of the project
- Project organization and planning
- When the data science team needs a task completed, it sends a request to the developers, who typically do not respond until the following day
- Developers are involved in several projects at the same time

POSSIBLE PROCESS IMPROVEMENTS
- Documenting the current process
- Better structuring developer interactions
- Imposing deadlines
- Process automation
- Better preparation

FEEDBACK ON SUGGESTIONS

Suggestion                                  Make sense?  Can implement?  Short or long term?
Documenting the current process             4            4               short
Better structuring developer interactions   3            2               long
Imposing deadlines                          4            3               long
Process automation                          4            3               long
Better preparation                          3            3               short

EFFECTIVE PRACTICES OBSERVED
- Pre-processing
- Engaging senior management: frequent dialog with senior management
- Using a defined SDLC with the software team

CONCLUSION
- The data science team was not thinking about the process of doing the projects
- The suggestions received positive feedback
- Studying additional organizations might help examine whether the suggestions and feedback from this study depend on the company's current size, organizational structure or domain, and whether any patterns hold across organizations doing data science projects

THANK YOU