Published by Oliver Richardson. Modified over 8 years ago.
1
Lecture 10: Aggregating & Pivoting Data in SQL and Excel
David E. Rosenberg
CEE 6930 – Hydroinformatics, Fall 2012
This work was funded by National Science Foundation Grant EPS 1135482.
2
Learning Objectives
1. Differentiate ways to present tabular data
2. Order rows and columns using SQL
3. Cross tabulate data using SQL
4. Normalize data using SQL
5. Dynamically pivot data in Excel
3
Presenting tabular data
- Use formats people can read and understand
- Make the presentation easy and reproducible
- Facilitate dynamic interaction with data

Which of the first 15 variables were measured at sites 1, 2, and 3 in the Little Bear River in 2011 and 2012?

    SELECT DISTINCT SiteID, DatePart(YYYY, LocalDateTime) AS Year, VariableID
    FROM DataValues
    ORDER BY SiteID, Year, VariableID
4
Which of the first 15 variables were measured at sites 1, 2, and 3 in the Little Bear River in 2011 and 2012?
Option 1: Normalized
Option 2: Cross Tabulated
5
Grouping and cross tabulating
(Suero et al., 2012)
6
Steps to build a tabular display

Step                                                        SQL part
1. Find table/query/view that contains the desired data     FROM
2. Identify fields that define rows                         SELECT and ORDER BY
3. Identify fields that define additional columns           SELECT
4. Identify field whose values define cross tab columns     PIVOT
5. Choose field values to use                               SELECT, PIVOT

Desired data: Field1, Field2, Field3, Field4
Output: Field1, Field2, Field3, Val1, Val2, Val3
7
Order Rows and Columns in SQL
- Controlled by the SELECT and ORDER BY clauses
- Order by site, then by year

    SELECT DISTINCT SiteID, DatePart(YYYY, LocalDateTime) AS Year, VariableID
    FROM DataValues
    ORDER BY SiteID, Year, VariableID
8
Example #1. How do you instead order by year, then by site?
Order by Year, SiteID

    SELECT DISTINCT DatePart(YYYY, LocalDateTime) AS Year, SiteID, VariableID
    FROM DataValues
    ORDER BY Year, SiteID, VariableID
9
Cross Tabulating in SQL
Use a field's values as column headers
- Pivot the data from rows to columns
Steps to implement
1. Determine the field and field values to pivot
2. Choose an aggregate function whose result fills the cross-tabbed columns (here SUM)
3. Write the cross tab query with the field values

Source data:
Field 1  Field 2  Field 3
1        A        1
1        B        2
2        A        3
2        C        4

Cross tab:
Field 1  A  B  C
1        1  2
2        3     4
10
Cross Tabulating in SQL

    SELECT Field1, …, [Val1], [Val2], …, [Valn]
    FROM (DataSource) up
    PIVOT (AggregateFunction(DataField)
           FOR PivotField IN ([Val1], [Val2], …, [Valn])) AS pvt
    ORDER BY Field1, …
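The template above can be tried on the four sample rows from the preceding "Cross Tabulating in SQL" slide. This is a minimal sketch, assuming SQL Server (T-SQL); the table variable name @Source is made up for illustration, and SUM is the aggregate the slide names.

```sql
-- Sample rows copied from the slide; @Source is a hypothetical name.
DECLARE @Source TABLE (Field1 INT, Field2 CHAR(1), Field3 INT);

INSERT INTO @Source (Field1, Field2, Field3) VALUES
    (1, 'A', 1),
    (1, 'B', 2),
    (2, 'A', 3),
    (2, 'C', 4);

-- Field2 values become column headers; Field3 is summed into each cell.
-- Every column of the inner query that PIVOT does not name (here Field1)
-- defines the rows of the result.
SELECT Field1, [A], [B], [C]
FROM (SELECT Field1, Field2, Field3 FROM @Source) AS up
PIVOT (SUM(Field3) FOR Field2 IN ([A], [B], [C])) AS pvt
ORDER BY Field1;

-- Result:
-- Field1  A  B     C
-- 1       1  2     NULL
-- 2       3  NULL  4
```

The blank cells of the slide's cross tab show up as NULL because SUM has no rows to add for those combinations.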
11
Example #2. Cross tab by variable
Which of the first 15 variables were measured at sites 1, 2, and 3 in the Little Bear River in 2011 and 2012? Define rows by sites and years and columns by variables.

Step                                        LBR data model element
1. Data source                              DataValues
2. Fields to define rows                    SiteID, Year
3. Fields to define additional columns      NA
4. Fields whose values define columns       VariableID
5. Column values                            1, 2, 3, 4, …, 15

Output layout:
SiteID  Year  1  2  3  …  15
1       2011
1       2012
2       2011
2       2012
12
Example #2. Cross tab by variable
Which of the first 15 variables were measured at sites 1, 2, and 3 in the Little Bear River in 2011 and 2012? Define rows by sites and years and columns by variables.

    SELECT SiteID, Year, [1], [2], [3], [4], [5], [6], [7], [8], [9],
           [10], [11], [12], [13], [14], [15]
    FROM (
        SELECT DISTINCT SiteID, DatePart(YYYY, LocalDateTime) AS Year, VariableID
        FROM DataValues
        WHERE (VariableID <= 15) AND (SiteID <= 3)
              AND (DatePart(YYYY, LocalDateTime) >= 2011)) up
    PIVOT (Count(VariableID)
           FOR VariableID IN ([1], [2], [3], [4], [5], [6], [7], [8], [9],
                              [10], [11], [12], [13], [14], [15])) AS pvt
    ORDER BY SiteID, Year
13
Example #3. Cross tab by site
Which Little Bear River sites were monitored in 2010, 2011 and 2012? Use years to define the rows and sites to define the columns.

Step                                        LBR data model element
1. Data source                              DataValues
2. Fields to define rows                    Year
3. Fields to define additional columns      NA
4. Fields whose values define columns       SiteID
5. Column values                            1, 2, 3, 4, …, 16

Output layout:
Year  1  2  3  …  15  16
2010
2011
2012
14
Example #3. Cross tab by site
Which Little Bear River sites were monitored in 2010, 2011 and 2012? Use years to define the rows and sites to define the columns.

    SELECT Year, [1], [2], [3], [4], [5], [6], [7], [8], [9],
           [10], [11], [12], [13], [14], [15], [16]
    FROM (
        SELECT SiteID, DatePart(YYYY, LocalDateTime) AS Year, ValueID
        FROM DataValues
        WHERE (DatePart(YYYY, LocalDateTime) >= 2010)) up
    PIVOT (Count(ValueID)
           FOR SiteID IN ([1], [2], [3], [4], [5], [6], [7], [8], [9],
                          [10], [11], [12], [13], [14], [15], [16])) AS pvt
    ORDER BY Year
15
Example #4. Cross tab by year
Use example #3 but use sites to define the rows and years to define the columns.

Step                                        LBR data model element
1. Data source
2. Fields to define rows
3. Fields to define additional columns
4. Fields whose values define columns
5. Column values
16
Example #4. Cross tab by year
Use example #3 but use sites to define the rows and years to define the columns.

Step                                        LBR data model element
1. Data source                              DataValues
2. Fields to define rows                    SiteID
3. Fields to define additional columns      NA
4. Fields whose values define columns       Year
5. Column values                            2010, 2011, 2012

Output layout:
SiteID  2010  2011  2012
1
2
3
4
…
16
17
Example #4. Cross tab by year
Use example #3 but use sites to define the rows and years to define the columns.

    SELECT SiteID, [2010], [2011], [2012]
    FROM (
        SELECT SiteID, DatePart(YYYY, LocalDateTime) AS Year, ValueID
        FROM DataValues
        WHERE (DatePart(YYYY, LocalDateTime) >= 2010)) up
    PIVOT (Count(ValueID) FOR Year IN ([2010], [2011], [2012])) AS pvt
    ORDER BY SiteID
18
Normalizing Data in SQL
Turn cross tabulated columns into field values
- Undo cross tabulations
- Combine data records
Steps to implement
1. Define a new field to track the cross tabulated column labels (Field 2)
2. Union instances of the cross-tabulated dataset
   - One instance for each cross-tabulated column (A, B, and C)
   - Use the column label as the value for the new field (Field 2)
   - Assign column values to a second new field (Field 3)

Cross tab:
Field 1  A  B  C
1        1  2  0
2        3  0  4

Normalized:
Field 1  Field 2  Field 3
1        A        1
2        A        3
1        B        2
2        B        0
1        C        0
2        C        4
19
The SQL Union Command
Append records from additional datasets

Set 1:
Field 1  Field 2  Field 3
1        A        E
2        B        F

Set 2:
Field 1  Field 2  Field 3
4        D        E
5        C        B

Set 3 (Set 1 + Set 2):
Field 1  Field 2  Field 3
1        A        E
2        B        F
4        D        E
5        C        B

    SELECT Field1, Field2, Field3 FROM Set1
    UNION
    (SELECT Field1, Field2, Field3 FROM Set2)
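One detail worth knowing when appending sets: UNION drops duplicate rows from the combined result, while UNION ALL keeps every row. The sketch below assumes SQL Server (T-SQL) and reuses the Set 1 and Set 2 rows from the slide; the table variable names and the extra duplicate row in @Set2 are added only for illustration.

```sql
-- Hypothetical table variables holding the slide's Set 1 and Set 2.
DECLARE @Set1 TABLE (Field1 INT, Field2 CHAR(1), Field3 CHAR(1));
DECLARE @Set2 TABLE (Field1 INT, Field2 CHAR(1), Field3 CHAR(1));

INSERT INTO @Set1 (Field1, Field2, Field3) VALUES
    (1, 'A', 'E'),
    (2, 'B', 'F');

INSERT INTO @Set2 (Field1, Field2, Field3) VALUES
    (4, 'D', 'E'),
    (5, 'C', 'B'),
    (1, 'A', 'E');  -- not on the slide: repeats a Set 1 row on purpose

-- UNION removes the repeated (1, A, E) row: 4 rows come back.
SELECT Field1, Field2, Field3 FROM @Set1
UNION
SELECT Field1, Field2, Field3 FROM @Set2;

-- UNION ALL appends everything as-is: 5 rows come back.
SELECT Field1, Field2, Field3 FROM @Set1
UNION ALL
SELECT Field1, Field2, Field3 FROM @Set2;
```

Both forms require the SELECT lists to have the same number of columns with compatible types; the column names of the result come from the first SELECT.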
20
Normalize with Union

Cross tabbed source:
Field 1  Cat1  Cat2
1        A     E
2        B     F

1st cross tab column:
    SELECT Field1, 'Cat1' AS NewField, Cat1 AS Val FROM Source

Field 1  NewField  Val
1        Cat1      A
2        Cat1      B

2nd cross tab column:
    SELECT Field1, 'Cat2' AS NewField, Cat2 AS Val FROM Source

Field 1  NewField  Val
1        Cat2      E
2        Cat2      F

Normalized table:
Field 1  NewField  Val
1        Cat1      A
2        Cat1      B
1        Cat2      E
2        Cat2      F
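The same pattern can be run end to end on the small cross tab from the "Normalizing Data in SQL" slide (columns A, B, and C). This is a sketch assuming SQL Server (T-SQL); the table variable name @CrossTab is hypothetical.

```sql
-- Cross tab rows copied from the slide; @CrossTab is a made-up name.
DECLARE @CrossTab TABLE (Field1 INT, A INT, B INT, C INT);

INSERT INTO @CrossTab (Field1, A, B, C) VALUES
    (1, 1, 2, 0),
    (2, 3, 0, 4);

-- One SELECT per cross-tabbed column: the column label becomes the value
-- of the new Field2, and the cell value moves into the new Field3.
SELECT Field1, 'A' AS Field2, A AS Field3 FROM @CrossTab
UNION
SELECT Field1, 'B' AS Field2, B AS Field3 FROM @CrossTab
UNION
SELECT Field1, 'C' AS Field2, C AS Field3 FROM @CrossTab
ORDER BY Field2, Field1;

-- Result:
-- Field1  Field2  Field3
-- 1       A       1
-- 2       A       3
-- 1       B       2
-- 2       B       0
-- 1       C       0
-- 2       C       4
```

The single ORDER BY at the end sorts the combined result, not just the last SELECT.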
21
Example #5. Normalize the cross tabulation by year results (example #4)

    SELECT SiteID, 2010 AS MyYear, [2010] AS NumObs FROM (SQLExample3) up2
    UNION
    (SELECT SiteID, 2011 AS MyYear, [2011] AS NumObs FROM (SQLExample3) up2)
    UNION
    (SELECT SiteID, 2012 AS MyYear, [2012] AS NumObs FROM (SQLExample3) up2)
    ORDER BY MyYear, SiteID
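One way to supply the cross tab that each branch reads from is a common table expression, so the pivot query is written once and referenced three times. The sketch below assumes SQL Server (T-SQL); the miniature @DataValues rows and the CTE name CrossTab are invented for illustration and are not the Little Bear River data.

```sql
-- Made-up miniature of DataValues with only the columns the query needs.
DECLARE @DataValues TABLE (ValueID INT, SiteID INT, LocalDateTime DATETIME);

INSERT INTO @DataValues (ValueID, SiteID, LocalDateTime) VALUES
    (1, 1, '2010-05-01T00:00:00'),
    (2, 1, '2010-05-01T00:30:00'),
    (3, 1, '2012-05-01T00:00:00'),
    (4, 2, '2011-06-01T00:00:00');

-- CrossTab plays the role of the cross tab by year query (sites as rows,
-- years as columns); the three SELECTs below then undo the pivot.
WITH CrossTab AS (
    SELECT SiteID, [2010], [2011], [2012]
    FROM (
        SELECT SiteID, DATEPART(YYYY, LocalDateTime) AS [Year], ValueID
        FROM @DataValues
        WHERE DATEPART(YYYY, LocalDateTime) >= 2010
    ) AS up
    PIVOT (COUNT(ValueID) FOR [Year] IN ([2010], [2011], [2012])) AS pvt
)
SELECT SiteID, 2010 AS MyYear, [2010] AS NumObs FROM CrossTab
UNION
SELECT SiteID, 2011 AS MyYear, [2011] AS NumObs FROM CrossTab
UNION
SELECT SiteID, 2012 AS MyYear, [2012] AS NumObs FROM CrossTab
ORDER BY MyYear, SiteID;
```

With these sample rows, site 1 has two observations in 2010 and one in 2012, and site 2 has one in 2011. Because the aggregate is COUNT, site-year combinations with no records come back as 0 rather than NULL, so the normalized result contains a row for every site and year.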
22
Use Pivot Tables to dynamically pivot data in Excel
23
The steps
1. Organize and select your data in Excel
   - First row must have field headers!
2. Select Insert tab => Pivot Table
3. Select PivotTable and location to place table
4. Drag and drop fields into filters, rows, columns, and value areas
5. Select values for filters
6. Drag fields to new areas
(See also numerous tutorials on the web)
24
The Source Data
- Counts of records in the Little Bear River ODM data values table grouped by site, variable, data type, quality control level, year, and month
- Joined to the definitions of the site, variable, and quality control level IDs

    SELECT SiteCode, VariableName, DataType, QualityControlLevelCode,
           Definition, Year, Month, RecordCount
    FROM (SELECT SiteID, VariableID, QualityControlLevelID,
                 DatePart(YYYY, LocalDateTime) AS Year,
                 DatePart(MM, LocalDateTime) AS Month,
                 Count(DataValue) AS RecordCount
          FROM DataValues
          WHERE DataValues.SiteID <= 3
          GROUP BY SiteID, VariableID, QualityControlLevelID,
                   DatePart(YYYY, LocalDateTime),
                   DatePart(MM, LocalDateTime)) AS InnerTab
    INNER JOIN Sites
        ON (InnerTab.SiteID = Sites.SiteID)
    INNER JOIN Variables
        ON (InnerTab.VariableID = Variables.VariableID)
    INNER JOIN QualityControlLevels
        ON (InnerTab.QualityControlLevelID = QualityControlLevels.QualityControlLevelID)
    ORDER BY SiteCode, VariableName, DataType, QualityControlLevelCode, Year, Month
25
Pivot views
Example 6. Which quality control types have the most records at each site?
Example 7. At what site and in which months and years were the most raw data temperature measurements taken?
Example 8. What variable measurements are derived products? How many derived product observations are available each year? Which derived product variable has the most records in 2009?
26
Pivot Charts
- Also display data as an interactive bar chart
- Select PivotChart
- Drag and drop fields to control filters, Y axis, X axis, and legend traces
27
Wrap Up
1. Multiple ways to present tabular data
2. Use SQL to
   - Order rows and columns
   - Cross tabulate data
   - Normalize data
3. Excel Pivot Tables and Charts provide a dynamic way to tabulate data
28
References
1. Francisco Suero, David E. Rosenberg, Peter Mayer (2012). "Estimating and Verifying United States Households' Potential to Conserve Water." ASCE-Journal of Water Resources Planning and Management. 138(3), pp. 209-306. doi: 10.1061/(ASCE)WR.1943-5452.0000182.
2. Excel Pivot Table tutorial. http://chandoo.org/wp/2009/08/19/excel-pivot-tables-tutorial/