Uexplore/Dexter Tutorial

Uexplore/Dexter Tutorial
Part 2 More Advanced Topics And Exercises Rev , jgb By John Blodgett, OSEDA, U of Missouri Columbia; under contract with the Missouri Census Data Center.

Generate a Report Showing …
Population change (estimated) for Missouri cities (places) from 2000 to 2004. Show change and pct change sorted by change in population, descending. Only show cities that had growth of at least 100 people and 5% over the period. Use variable labels (rather than names) as column headings in the report.

Navigate to popests Filetype
Pop Estimates is a major category with (currently) 4 filetypes. By far the most important of these is popests. Choose the Current version of popests (as opposed to the older estimates in popests2).

Finding the Relevant Dataset
The Census Bureau does estimates at various levels, including nation, state, county and “subcounty”. The latter includes cities (“places”) and other sub-county governmental units. Find the relevant dataset by scanning the Datasets.html page in the popests dir. By now you should know that whenever you encounter a Datasets.html file in a data directory you should take advantage of it. It makes finding what you are looking for a lot easier.

The Winner is dataset mosc04
We got here by clicking in the Details column of the mosc04 row of the Datasets.html table.

Determine SumLev Value for Filter
Turns out that this set has some non subcounty data as well (i.e. state & county summaries). We want complete places – level 162. Level 157 would give us place-within-county.

Choose Output Format(s)
We will want to switch to PDF output at the end but while creating a complex query it is best to use some other report format that does not take so long to generate.

Create the Filter First row selects complete-place summaries.
2nd row says only places with growth of 100 or more persons. 3rd row says the change must be at least 5%.

Choose Columns I would normally keep geocode but the report was tight for horizontal space and most users will ignore it so we left it off.

Section IV Important This Time
Section IV may not be essential but it is easy to use and comes in real handy when labeling of reports is required.

Use An Option in New Sec V
Section V is brand new (summer, 2005) and as of the date this slide was created we still had not created the help page for it. Macho users only. Mostly of use to MCDC personnel.

HTML Output We had to change the % character in the title to the Pct because Dexter strips out special characters.

Summary Log for this Query
Note the link to a “saved query file”. This is where we store all the specs resulting from your page clicks.

The Dexter Saved Query File
Do not concern yourself with the details here. Just know that this is way to codify your query. We hope at some point to allow for public queries, but for now such entities can only be re-used by authorized MCDC personnel.

Dexter Query Files Are simple text files used to encapsulate a query (i.e. save all the specs so that the query can be rerun.) Written to a temporary file which goes away within 48 hrs. For now, most users cannot replay a saved query. But authorized MCDC personnel can use these to create public queries. Veteran Dexter users may recall our earlier attempts at capturing queries using very long URLs with all the parm specs as passed to Dexter. There were technical problems with that approach which forced us to seek better alternatives, and this led us to develop these query files.

Invoking a Saved Query Can be done via a URL with a parm spec as we see in the current example. The name of a stored query can also be entered on the Dexter query form near the bottom of the page. For techies who care: Dexter looks for the file named &query.txt in the Queries subdirectory of the &path data directory.

Saved Queries Are relatively new to Dexter and not yet fully implemented. Have good potential for creating “virtual” data products. You save the query file that generates the report (and/or csv file) rather than the files. We are experimenting with adding “run-time parms” and generating query front-ends to allow customizing the query. For example, think of turning the growing cities query into one where you could specify the state rather than having it always be Missouri. We have removed several slides here that referenced a saved query that we no longer support. We are still working on a better way to utilize this tool.

Saved Queries and xsamples
We are experimenting with a new kind of documentation for using Dexter. Sample dexter queries are documented and stored in shtml files in an xsamples dir. These sample pages include links to let you view and/or invoke the saved query file. See these at As of this writing ( ) we do not have an index page for the xsample directory. But we obviously will need to create one. We are still experimenting with format and content for these pages and welcome user feedback. One of the things we would like to be able to do at some point is to be able to get user-contributed xsamples to be shared with others and included in this directory. We shall also need to provide linkage to these pages from the Queries subdirectories of the felevant filetype directories.

V. Advanced Options A new section on the Dexter input form, targeted at more sophisticated user who wants more control over output. As with all Dexter sections, click on section header to see online documentation. Easily ignored. Pretend it’s not there if you want. Many of the new features are things we want to do when we build public queries.

Advanced Features Example
We will use several advanced features here including data aggregation. The dataset we shall access is the latest (thru 2004) county level estimates with components of change since 2000 for the entire country. We have added cbsa (metropolitan and micropolitan area) codes to this dataset. The latest such estimates as of 2006 are thru 2005 and are stored in datasets with names just like the ones we use in the examples except substitute “com05” wherever you see “com04”.

Use Datasets.html Page in popests
We choose uscom04 because it has the latest estimates (2004), for the universe wanted (US) and at the level we need (county). When you read this it is likely that there will be a more recent dataset such as uscom05 or uscom06 with the latest estimates. We tend to do everything the same with these data each year so selecting the latest should be OK.

Query Specifications We have a dataset that has county-level data but also has CBSA (core based statistical area) codes identifying the metro area. We want to aggregate (sum up) the pop data to get cbsa-level summaries. To further complicate matters, we have a variable, cbsatype, that tells us whether it is a Metro or Micro (-politan) area. We not only want summaries for each cbsa within state, but we also want a summary for all the cbsa’s of a type (metro/micro) within each state. The states of interest are Illinois, Kansas and Mo. We want HTML output in a custom style.

Define Filter County level summaries only.
Code of indicates not in a CBSA; exclude. Choose states using the postal abbreviations.

Select Columns Not interested in any variables that ID the county (eg. County, areaname). Only want the variables by which we intend to aggregate and the variables which are to be summed.

Titles and Footnotes

The Really Hard Part Don’t feel bad if this makes no sense to you. It may or may not be easier once we actually have a help page.

Aggregation Specs Aggby: stab cbsatype cbsa indicates that you want Dexter to combine all rows that have the same value for these 3 category variables, summing all the numeric variables from these rows. Agglvl: 2 indicates that you want summaries for the rightmost 2 aggby variables. A summary will be generated for all stab/cbsatype combinations, regardless of the value of cbsa. A value of 3 would have indicate we also want a summary at the state level. A value greater than 3 would be an error. The default level is 1.

More Aggregation Specs
Grand Totals? – No means you do not want Dexter to add a summary row at the end of the file with totals for all rows in the entire dataset. Means or Percents – you specify here cols. that cannot be just summed up. They have to be specially processed using something called a weighted average.

Aggregation Specs 3 Weights for Means/Pcts – this is a list of columns to correspond with the list specified just above for Means or Percents. We are saying here that we want the program to weight the value of pctchang using the value of pop00c. Each pctchang value is multiplied by pop00c (“weighted”) prior to aggregation. During the agg step the weighted values are summed. In a post-agg step the sum of the weighted values is divided by the sum of the weights.

Just Know That … Whenever you are aggregating and you have a column/variable that is a percentage, you need to specify it in your Means or Percents list, and the corresponding col/var to use in the “Weights for ..” list is a variable containing the value of which it is the percentage. (We call this the “universe variable”). E.g. if the variable is PctAsian (Asians as a pct of total persons) then the weight variable is TotPop – the total persons.

If you misspell a Variable Name
When entering variable names in any of the boxes in Section V be extra careful to spell the name exactly. Also be certain that you select the variable in Section III – you cannot aggregated by State if you have not selected State as 1 or the variables to keep. You will not get a specific error message about this but instead it will just say that ther were no observations selected. You can sometimes get help in locating such errors by typing the code “131” in the box near the bottom of the form label “Internal use only (_debug opt)”. This results in a stream of technical gobbledygook but in that stream you can look for SAS-generated error messages generated in red. This will usually tell you where the query failed. E.g. The message might say something such as “The variable pctchange in the DROP, KEEP, or RENAME list has never been referenced.” This is your clue that you typed in a variable name incorrectly (the name of the variable is “pctchang”, without the “e” at the end.)

Variables to Drop Not important (99% of the time).
The program generates 2 extra variables, named _lvl_ and _nag_ , that occasionally may be useful. _lvl_ indicates the summary level (in our example it would have value 1 or 2). _nag_ keeps a count of how many rows/observations were used to form the output summary row/observation.

Advanced Report Formatting
We check the option to use variable labels as column headers in the report. Not very advanced, but it did not fit elsewhere. By variables for report allows specifying one of more variables that are listed on a separate “by line” instead of as a column. ID variables for report (not specified here) are variables listed at the far left of each row to identify the observation (instead of having “Obs”, the observation #, used.)

Style to Use for html/pdf Output
You get to pick from a menu of 14 or so. Those followed by ** are recommended. The default (sasweb) is minimal blue & white. In this example we chose brick, one of our favorites. Names are not very mnemonic; you just have to try them to see what they look like.

This is an example of what our output looks like using the brick style.
What is the data on Obs 12?

Exercise 1 Access 2000 census long-form (sample) data for census tracts in state of Nebraska. Create a csv file where each record corresponds to a census tract and the variables/columns tell us what metro area and county the tract is in and reports the total population and the number and percent of persons who were poor.

Exercise 2 Access filetype stf903x2 (under 1990 census data).
Create a file (sas, dbf, or Excel – whichever you prefer) that has the number of hispanics and pct hispanics for all census tracts in Greene and Christian counties (MO). Use 2000 tract geography. Do similar query using filetype sf32000x to get comparable data for 2000. (Not part of exercise, but we hope you will have tools to merge these 2 results).

Exercises 3 Access the beareis filetype.
Pull data for all counties in the new Jefferson City metro area (CBSA). The id variables for your output should be county, LineCd and LineCdMeaning. The numeric variables here are a time series; select data for 1993 and 2003. In the Advanced Report Formatting section specify county as a by variable and linecd as an ID variable. Select electronics as the style.

Exercise 4 In the beareis filetype we have datasets that report on total transfer payments. Locate the dataset with this data for Mo. Print a report showing total transfer payments for the state of Missouri for each of the most recently-available 10 years. Extra credit option: specify that you want all the trf variables displayed using a dollar12. format.

Exercise 5 Navigate to the filetype georef under the Geography/GIS major category. Access the mocogeos dataset. (Missouri county geocodes). Do a plain text report showing all counties in Mo, their FIPS codes & names along with the DED-Region, RPC and dot (MoDOT) region in which they are contained. Extra credit opt: the format code $rpcname. Can be used to display names for rpc codes. Specify that you want to see rpc names instead of codes in your report.

Exercise 6 Find the 25 wealthiest counties in the United States per the 2000 census, using Median Household Income as the measure of wealth. Print them out in descending order (highest income first) – just the top 25. Hint: run Dexter twice, the 2nd time applying a filter based on medhhinc. If you’ve been paying attention you should be able to find the appropriate dataset on your own.

Exercise 7 Filetype sf12000x contains the standard extract of data from Summary File 1, 2000 census. As a bonus, we added the 1990 pop for the corresponding geography to this otherwise all-2k dataset. Access the block level dataset for Mo and generate a tab-delimited file showing the 1990 & 2000 pops along with change & % change for all blocks in Adair county.

Exercise 8 In filetype sf32000x, access dataset moschlcos.
The MCDC has done many custom geographic aggregations of 2k census data -- including data for school districts and school districts/counties. In filetype sf32000x, access dataset moschlcos. Print a report showing total pop, rural pop and % rural pop for all the districts within St. Charles county.

Want More Exercises? Take the MCDC Trivia challenge at Has ten rather challenging exercises, all involving data in the popests filetype (current population estimates). Answers included (sort of).

As usual, questions and comments are encouraged. E-mail preferred:
Thank You As usual, questions and comments are encouraged. preferred:

Uexplore/Dexter Tutorial

Similar presentations

Presentation on theme: "Uexplore/Dexter Tutorial"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Uexplore/Dexter Tutorial

Similar presentations

Presentation on theme: "Uexplore/Dexter Tutorial"— Presentation transcript:

Similar presentations

About project

Feedback