Published by Willis Davis. Modified over 9 years ago.
1
§IV 3:30 - 4:30PM: Data - The various sources, qualities, and metrics: What do you need to know before you can take it to the model?
Copyright 2014 Institute for Marketing Productivity - All rights reserved
2
- Photos for face recognition algorithms
- Recorded telephone calls for NSA snooping
- Anything you or anyone else posts on Facebook
3
Formal analyses are only as good as the data you create, collect, maintain, and pass forward. It is not “Garbage in, Nugget of Gold Out.” It is, as they say, “Garbage in, Garbage Out.”
4
Various sources of corruption and inconsistent reporting standards drastically jeopardize the accuracy of any analysis. They:
- Lead to suboptimal managerial decisions.
- Delay actions.
- Are expensive to address and remedy.
5
“Clean data” best practices require flat files and clean fields. An example of a field is name or income level. For example, a record with name "Bob Smith" and income level "75,000".
6
Flat Files:
Flat files are usually txt or csv files. They contain multiple fields per row, where each field is separated by a delimiter. There is no other structure to the file. A delimiter is any character that does not appear in the data; it is often a comma or a tab.
CSV stands for comma-separated values, and its delimiter is a comma. CSV files are easy to open in Excel. Here is an example of a csv file opened in Notepad.
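The delimiter rule above can be sketched in Python with the standard `csv` module. This is a minimal illustration, not part of the original deck: the sample rows and the `read_flat_file` helper are hypothetical. The sample uses a tab delimiter because the income value "75,000" itself contains a comma, so a comma would not be "a character not in the data".

```python
import csv
import io

# Hypothetical flat-file contents (assumption for illustration).
# In practice the text would come from open("some_file.txt", newline="").
RAW = "name\tincome\nBob Smith\t75,000\nAnn Lee\t82,000\n"

def read_flat_file(text, delimiter=","):
    """Return the rows of a delimited flat file as dicts keyed by field name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

# Tab is chosen as the delimiter because commas occur inside the income values.
rows = read_flat_file(RAW, delimiter="\t")
print(rows[0]["name"], rows[0]["income"])
```

Reading the same text with a comma delimiter would split "75,000" into two pieces, which is why the delimiter has to be chosen with the data in mind.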
7
Three types of data:
Numeric:
- These are numbers.
- For example, GRP, Sales, Impressions, Spend, etc.
- It is important to note the magnitude of the values.
- Really big: 5.97219 × 10^8. Really small: 1.660538921 × 10^-4.
Alphanumeric:
- These are any symbols, usually a mix of letters and numbers.
- For example, a cross-sectional identifier or, in the case above, scientific notation.
Dates (Time Periods) are a special type of Alphanumeric:
- These indicate a specific date or time stamp.
- For example, which day, month, quarter, etc.
- These need to be in the appropriate format for the applicable software.
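As a rough sketch of how the three field types might be handled when importing text values, the Python below converts numeric and date fields and leaves alphanumeric fields as strings. The helper names, the removal of thousands separators, and the ISO date format are assumptions made for illustration; the right date format depends on the software and the source file.

```python
from datetime import date, datetime

def parse_numeric(value):
    """Parse a numeric field from text.

    float() accepts both plain notation ("75000") and e-notation
    ("5.97219e8"). Removing thousands separators is an assumption
    about how the source file was written.
    """
    return float(value.replace(",", ""))

def parse_date(value, fmt="%Y-%m-%d"):
    """Parse a date field; the default ISO format is only an assumption."""
    return datetime.strptime(value, fmt).date()

really_big = parse_numeric("5.97219e8")         # 5.97219 × 10^8
really_small = parse_numeric("1.660538921e-4")  # 1.660538921 × 10^-4
income = parse_numeric("75,000")
period = parse_date("2014-03-31")
store_id = "NY-0042"  # alphanumeric identifier (hypothetical): keep as text
```

Parsing fails loudly (with `ValueError`) when a value does not match its declared type, which is usually preferable to silently storing a corrupted value.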
8
Clean Fields:
Clean fields are those whose values are authentic to the collection and recorded according to field type. That is, the value (the input in the field) is not corrupted by processing, human error, or recording error.
Common types of corruption:
- Modifying the data by hand.
- Reading in the wrong type of file for the software. This often happens when opening files formatted for one type of software in another type of software, e.g. opening a Stata file in Matlab.
- Incomplete downloads.
- Specifying an incorrect delimiter or multiple delimiters.
- Not knowing the field type and incorrectly recording the value.
If the fields are not clean, the result is delayed actions and costly fixes.
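Two of the corruption types listed above, incomplete downloads and an incorrect delimiter, often show up as rows with the wrong number of fields. The Python sketch below is one hypothetical way to screen for that; the function names and sample text are illustrative assumptions, and passing these checks does not prove a file is clean.

```python
import csv
import io

def field_count_problems(text, delimiter=","):
    """Return (line_number, n_fields) for rows whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty file: nothing to compare
    expected = len(header)
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]

def header_has_single_field(text, delimiter=","):
    """A header that parses into one field is a hint that the delimiter may be wrong."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    return len(header) == 1

# Hypothetical file cut off mid-row, as after an incomplete download.
TRUNCATED = "name,income\nBob Smith,75000\nAnn Lee"
print(field_count_problems(TRUNCATED))
print(header_has_single_field(TRUNCATED, delimiter="\t"))
```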
9
Data code book:
Whenever you create, collect, maintain, and pass forward data, you want to have a data code book and a field code book.
A good data code book will answer the following:
- Who collected the data, why, when, how, etc.?
- What is the data population?
- What is the structure of the data: cross-sectional, time series, panel, balanced, or otherwise?
- Number of observations, time period over which observations were collected, etc.
- Technical information on the data storage: what type of file, file formatting, etc.
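The structure question above (panel, balanced, or otherwise) can be checked rather than taken on faith. A panel is balanced when every cross-sectional unit is observed exactly once in every time period. The sketch below assumes observations arrive as (unit, period) pairs; the function name and sample data are hypothetical.

```python
from collections import defaultdict

def is_balanced_panel(observations):
    """Return True if every unit appears exactly once in every period.

    observations: iterable of (unit_id, period) pairs.
    """
    periods_by_unit = defaultdict(list)
    for unit, period in observations:
        periods_by_unit[unit].append(period)
    all_periods = {p for periods in periods_by_unit.values() for p in periods}
    return all(
        len(periods) == len(set(periods)) and set(periods) == all_periods
        for periods in periods_by_unit.values()
    )

# Hypothetical panels: two stores observed over two quarters.
balanced = [("A", "2014Q1"), ("A", "2014Q2"), ("B", "2014Q1"), ("B", "2014Q2")]
unbalanced = [("A", "2014Q1"), ("A", "2014Q2"), ("B", "2014Q1")]
print(is_balanced_panel(balanced), is_balanced_panel(unbalanced))
```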
10
Field code book:
A good field code book will tell you:
- Field type (numeric, alphanumeric, date, etc.).
- Typical values and range of field.
- Field description and naming convention.
- Code to distinguish missing values from non-response.
Non-traditional data can likewise be described in code books. For example, you can build code books for a collection of photos.
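A field code book can also be kept in machine-readable form so that incoming values are checked against it. The Python below is a minimal sketch under stated assumptions: the `income` entry, its range, and the codes "-99" (missing) and "-98" (non-response) are invented for illustration and are not from the original slides.

```python
# Hypothetical field code book entry; every value here is an illustrative assumption.
FIELD_CODE_BOOK = {
    "income": {
        "type": "numeric",
        "description": "Annual household income in US dollars",
        "min": 0,
        "max": 10_000_000,
        "missing": "-99",       # value was never collected
        "non_response": "-98",  # respondent declined to answer
    },
}

def check_value(field, raw):
    """Classify a raw text value for a numeric field against the code book."""
    spec = FIELD_CODE_BOOK[field]
    if raw == spec["missing"]:
        return "missing"
    if raw == spec["non_response"]:
        return "non_response"
    try:
        value = float(raw)
    except ValueError:
        return "invalid"
    return "ok" if spec["min"] <= value <= spec["max"] else "out_of_range"

print(check_value("income", "75000"), check_value("income", "-98"))
```

Keeping the missing and non-response codes separate in the code book is what lets the check report them differently instead of lumping both into "no value".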
11
1. Flat files
2. Clean fields
3. Data code book and field code book
When you buy data, assemble it in house, or send it out, you need to hold yourself and your vendors to these standards!
12
If not, the resulting data problems:
- Lead to suboptimal managerial decisions.
- Delay actions.
- Are expensive to address and remedy.
13
Not Workable Files: Wide - Data
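As a sketch of how a wide file can be made workable, the Python below reshapes it into one row per unit and period. It assumes "wide" here means one column per time period, which the slide's image presumably showed; the function name, field names, and sample rows are hypothetical.

```python
def wide_to_long(rows, id_field, period_name="period", value_name="value"):
    """Turn one-column-per-period rows into one row per (id, period)."""
    long_rows = []
    for row in rows:
        for field, value in row.items():
            if field == id_field:
                continue  # the identifier is repeated on every output row instead
            long_rows.append(
                {id_field: row[id_field], period_name: field, value_name: value}
            )
    return long_rows

# Hypothetical wide data: sales by quarter, one column per quarter.
wide = [
    {"store": "A", "2014Q1": "100", "2014Q2": "120"},
    {"store": "B", "2014Q1": "90", "2014Q2": "95"},
]
long_rows = wide_to_long(wide, id_field="store", value_name="sales")
print(long_rows[0])
```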
14
Not Workable Files: Just a Pivot Table or Chart
15
Incomplete Files: What was sent: What is needed:
16
Don’t Modify Raw Data by Hand:
- For large datasets, this is often not an option.
- You risk corrupting the data.
- You have no record of what you did.
Data modification needs to be formally scripted in your importing and parsing software.
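A scripted modification might look like the Python sketch below: the raw text is never edited, the cleaned rows are a separate result, and every change is logged. The cleaning rule (stripping stray whitespace), the `clean_rows` name, and the sample data are assumptions for illustration; the sketch also assumes every row has the same number of fields as the header.

```python
import csv
import io

def clean_rows(raw_text):
    """Return (cleaned_rows, log) without altering the raw input.

    The only rule applied is stripping surrounding whitespace, and each
    change is recorded with the line it came from.
    """
    cleaned, log = [], []
    reader = csv.DictReader(io.StringIO(raw_text))
    for row in reader:
        new_row = {}
        for field, value in row.items():
            stripped = value.strip()
            if stripped != value:
                log.append(f"line {reader.line_num}: stripped whitespace in '{field}'")
            new_row[field] = stripped
        cleaned.append(new_row)
    return cleaned, log

# Hypothetical raw file with stray spaces around a name.
RAW_FILE = "name,income\n Bob Smith ,75000\nAnn Lee,82000\n"
cleaned, log = clean_rows(RAW_FILE)
print(cleaned[0]["name"])
print(log)
```

Because the rule lives in a script, it can be rerun on a fresh download of the raw file and reviewed later, which addresses the "no record of what you did" problem.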
17
Now onto Excel for Live Examples