May 21, 2009 Stata for micro targeting using C++ and ODBC Masahiko Aida.

Slides:



Advertisements
Similar presentations
Copyright © SoftTree Technologies, Inc. DB Tuning Expert.
Advertisements

Database System Concepts and Architecture
Random Forest Predrag Radenković 3237/10
Running a Successful Field Campaign Determine Your Strategic Task Establish Your Vote Goal Build the Coalition Write a Plan Secure a Voter File Execute.
10 REASONS Why it makes a good option for your DB IN-MEMORY DATABASES Presenter #10: Robert Vitolo.
Sharing Enterprise Data Data administration Data administration Data downloading Data downloading Data warehousing Data warehousing.
Ann Arbor ASA ‘Up and Running’ Series: SPSS Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with.
Chapter 7 Sampling Distributions
PHP (2) – Functions, Arrays, Databases, and sessions.
15 Chapter 15 Web Database Development Database Systems: Design, Implementation, and Management, Fifth Edition, Rob and Coronel.
INTERNET DATABASE Chapter 9. u Basics of Internet, Web, HTTP, HTML, URLs. u Advantages and disadvantages of Web as a database platform. u Approaches for.
INTERNET DATABASE. Internet and E-commerce Internet – a worldwide collection of interconnected computer network Internet – a worldwide collection of interconnected.
Evaluation of MineSet 3.0 By Rajesh Rathinasabapathi S Peer Mohamed Raja Guided By Dr. Li Yang.
JDBC. In This Class We Will Cover: What SQL is What ODBC is What JDBC is JDBC basics Introduction to advanced JDBC topics.
Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Copyright © 2006 by The McGraw-Hill Companies,
Objective In this session we will discuss about : What is ADO. NET ?
RESEARCH HUB AT THE UNIVERSITY LIBRARIES PENN STATE UNIVERSITY TOUR OF STATISTICAL PACKAGES.
Version 4 for Windows NEX T. Welcome to SphinxSurvey Version 4,4, the integrated solution for all your survey needs... Question list Questionnaire Design.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
Cache Memory By Sean Hunter.
Label production Solution with Label Gallery programs Label Gallery is used for general label design and print GalleryData is used to create small database.
Object Oriented Databases by Adam Stevenson. Object Databases Became commercially popular in mid 1990’s Became commercially popular in mid 1990’s You.
Comparison of Classification Methods for Customer Attrition Analysis Xiaohua Hu, Ph.D. Drexel University Philadelphia, PA, 19104
6/1/2001 Supplementing Aleph Reports Using The Crystal Reports Web Component Server Presented by Bob Gerrity Head.
Getting connected.  Java application calls the JDBC library.  JDBC loads a driver which talks to the database.  We can change database engines without.
SharePoint 2010 Business Intelligence Module 6: Analysis Services.
SSIS Over DTS Sagayaraj Putti (139460). 5 September What is DTS?  Data Transformation Services (DTS)  DTS is a set of objects and utilities that.
SQL Server Integration Services (SSIS) Presented by Tarek Ghazali IT Technical Specialist Microsoft SQL Server (MVP) Microsoft Certified Technology Specialist.
ROOT: A Data Mining Tool from CERN Arun Tripathi and Ravi Kumar 2008 CAS Ratemaking Seminar on Ratemaking 17 March 2008 Cambridge, Massachusetts.
Week 7 Lecture Web Database Development Samuel Conn, Asst. Professor
SednaSpace A software development platform for all delivers SOA and BPM.
 Overview of SPSS  Interface  Getting Started  Managing Data  Descriptive Statistics  Basic Analysis  Additional Resources.
9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.
MySQL. Dept. of Computing Science, University of Aberdeen2 In this lecture you will learn The main subsystems in MySQL architecture The different storage.
©2010 John Wiley and Sons Chapter 12 Research Methods in Human-Computer Interaction Chapter 12- Automated Data Collection.
Using SAS® Information Map Studio
Developing C/C++ applications with the Eclipse CDT David Gallardo.
The application of selective editing to the ONS Monthly Business Survey Emma Hooper Office for National Statistics
Replay Compilation: Improving Debuggability of a Just-in Time Complier Presenter: Jun Tao.
1 Chapter Overview Performing Configuration Tasks Setting Up Additional Features Performing Maintenance Tasks.
Lecture DSCI 4520/5240 DATA MINING MYRAW Nonprofit donor data MYRAW Overview Determine who is likely to donate to a non-profit organization campaign.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 7-1 Chapter 7 Sampling Distributions Basic Business Statistics.
The course. Description Computer systems programming using the C language – And possibly a little C++ Translation of C into assembly language Introduction.
Introduction to Compilers. Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software.
6/1/2001 Supplementing Aleph Reports Using The Crystal Reports Web Component Server Presented by Bob Gerrity Head.
Allyson Camp October 6, Micro-targeting Definition: – “Micro-targeting is sending a message to a highly specific portion of an audience based on.
Konstantina Christakopoulou Liang Zeng Group G21
Memory Hierarchy: Terminology Hit: data appears in some block in the upper level (example: Block X)  Hit Rate : the fraction of memory access found in.
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
Learning and Acting with Bayes Nets Chapter 20.. Page 2 === A Network and a Training Data.
What is a compiler? –A program that reads a program written in one language (source language) and translates it into an equivalent program in another language.
Chapter 1 Database Access from Client Applications.
Campaigns and Data Analytics Why it is important to know YOU.
Field Organizing September 26 th, 2009 Paid for by Democracy for America, and not authorized by any candidate or candidate’s.
5.8 Finalise data files 5.6 Calculate weights Price index for legal services Quality Management / Metadata Management Specify Needs Design Build CollectProcessAnalyse.
จัดทำโดย นายชนากานต์ สันติคุณาภรณ์ นายธฤษพงศ์ ศิริบูรณ์ นางสาวศุภาภรณ์ ถ่านคำ.
Capacity Planning in a Virtual Environment Chris Chesley, Sr. Systems Engineer
Conducting Marketing Research Chapter 29. Sec – The Marketing Research Process The steps in conducting marketing research The difference between.
OCR A Level F453: The function and purpose of translators Translators a. describe the need for, and use of, translators to convert source code.
Integrating and Extending Workflow 8 AA301 Carl Sykes Ed Heaney.
Chapter 29 Conducting Market Research. Objectives  Explain the steps in designing and conducting market research  Compare primary and secondary data.
Stat 101Dr SaMeH1 Statistics (Stat 101) Associate Professor of Environmental Eng. Civil Engineering Department Engineering College Almajma’ah University.
Topic 2: Hardware and Software
Bridging the Data Science and SQL Divide for Practitioners
CS-3013 Operating Systems C-term 2008
Database Performance Tuning and Query Optimization
Tutorial 8 Objectives Continue presenting methods to import data into Access, export data from Access, link applications with data stored in Access, and.
Content of Presentation
Targeting Wait Statistics with Extended Events
Chapter 11 Database Performance Tuning and Query Optimization
Presentation transcript:

May 21, 2009 Stata for micro targeting using C++ and ODBC Masahiko Aida

May 21, 2009 Page 2 The Aim of the Research  Outline of presentation — What is micro-targeting? — Work flows  Utilizing best part of three computing platform — Stata for its flexibility. — SQL server for handling large data. — C++ for fast complex calculation.

May 21, 2009 Page 3 What is targeting?  Political campaigns targeted voters to convey candidate’s views or to encourage voting (GOTV).  Targeting using macro level data — Buy ad spots by media market (defined by county). — Send canvassers to particular precincts.  Nature of macro level targeting —Pros: Candidate-level breakdown of votes are only reported by precinct level. —Cons: Not very efficient.

May 21, 2009 Page 4 1.Create a model with training data 2.Extrapolate and estimate predicted values for all registered voters. How does Micro-targeting Work?  Attitudes  Behaviors  Voter file data (age, gender, party registration & vote histories)  Census tract/block summary data  Consumer data Predicted values from the model

May 21, 2009 Page 5 Statistical Models for Micro-Targeting  Required properties for targeting models — Large number of covariates. — Incorporating complex interaction terms. — Robustness. — Need to avid over fitting.  Working solutions for above requirement (2009 version) — Decision tree models. — Model averaging (bagging or boosting). — Cross validation. — Use (political) common sense.

May 21, 2009 Page 6 Workflow of Micro-Targeting Analysis & Scoring (2) Recode, clean & Analyze Voter files are stored in SQL server ODBC connection Dynamic link Library in C++ Load plug-in (3) Modeling & validation Data mining software (1) Survey Data From call center Export model as C++code (4) Score propensity  Direct mail consultant  Telphone Contact  Canvassing  Survey Sampling Export recoding syntax as C++ code

May 21, 2009 Page 7 How to Make Plug-ins stplugin.c Works as a bridge Between your C++ Code and Stata Wrapper.c Load/export data Within memory Of Stata to functions Written in C++ scoring.c Scoring function Written in C++ DLL Compile (ex. gcc) Now Stata has new function!! recoding.c Translate generate, replace syntax of Stata into C++

May 21, 2009 Page 8 Example capture program drop Scoring program define Scoring, rclass odbc load, dialog(complete) dsn(“xxx") clear sqlshow exec("SELECT VAR1, VAR2 … FROM Table_`1’ WHERE Rand = `2’) do recoding.do gen Predicted =. plugin call Scoring VAR1 VAR2 Predicted save w:\Scores\Score_`1’_`2’, replace end clear Scoring MI 4  Example of scoring 4 th random subset (2 nd argument) of Michigan voter file (1 st argument).

May 21, 2009 Page 9 Is Plug-ins Worth Investing? Expected time to score AR voter file Expected time to score CA voter file Singe Decision Tree (100 nodes) 241 sec39 min Boosted Tree 30 trees (100 nodes) 316 sec51 min Boosted Tree 300 trees (100 nodes) 914 sec147 min Bagged Tree 300 trees (248 nodes) 62 min596 min  It is not feasible to write scoring syntax for boosted/bagged model using SQL command or Stata do file.  Time needed for coding in C++ becomes insignificant for large applications.  21 predictors  N= 67,177

May 21, 2009 Page 10 Putting It in Perspective  Loading Data via ODBC: 49% of total process —Under windows 2003 server platform, ODBC seems fast enough. —DB should be indexed by the query variable. —SQL server can act faster if more memory/CPUs are dedicated.  Scoring Data (Bagging): 48% of total process —This of course depends on complexity of the model.  Recoding: 2% of total process —Recoding data that are already in Stata’s memory is fast.  Saving/exporting data: 1% of total time — Saving Stata file in fast disk array (RAID 10) requires negligible time.

May 21, 2009 Page 11 Summary  Stata’s flexibility and various functions make it ideal platform for data analysis and data management.  However, relational database can handle large data with much ease.  Specialized software (ex. data mining) can run certain analysis more efficiently (decision tree & boosting), and many of them can spit C++ codes.  Stata can speak both ODBC and C++.