Risk based selection for On the Spot Control at Agricultural and Rural Development Agency Miklós Lelkes Central Physical Control Department Agricultural.

Presentation transcript:

Risk-based selection for On-the-Spot Control at the Agricultural and Rural Development Agency
Miklós Lelkes, Central Physical Control Department
Agricultural and Rural Development Agency (ARDA), Budapest, Hungary
Control Methods Workshop, April 2010, Ispra, Italy

Selection for physical control
- Start point: 100% of claims
- Selection of the control sample (cost effectiveness), e.g. min. 5% control rate
  - Random selection: 20-25% (representative overview)
  - Risk analysis: 75-80% (financial risk to the EU)
  - Direct selection
- Aim: to have effective control methods
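
The split described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `(claim_id, risk_score)` layout and the convention that a higher score means a riskier claim are hypothetical, and direct selection is left out.

```python
import random

def select_control_sample(claims, control_rate=0.05, random_share=0.25, seed=0):
    """Pick a control sample: a random part plus the highest-risk remainder.

    claims: list of (claim_id, risk_score) pairs; higher score = riskier
    (an assumption of this sketch, not necessarily the ARDA convention).
    """
    rng = random.Random(seed)
    sample_size = max(1, round(len(claims) * control_rate))
    n_random = round(sample_size * random_share)

    # Random part: drawn from the whole population for a representative overview.
    random_part = rng.sample(claims, n_random)
    chosen = {claim_id for claim_id, _ in random_part}

    # Risk part: the highest-scoring claims not already drawn at random.
    remaining = [c for c in claims if c[0] not in chosen]
    remaining.sort(key=lambda c: c[1], reverse=True)
    risk_part = remaining[: sample_size - n_random]
    return random_part, risk_part

# Synthetic population of 1000 claims with made-up risk scores.
data_rng = random.Random(42)
claims = [(f"claim-{i}", data_rng.random()) for i in range(1000)]
random_part, risk_part = select_control_sample(claims)
print(len(random_part), "random +", len(risk_part), "risk-based")
```

Drawing the random part first keeps it representative of the whole population; the risk part then fills the rest of the quota from the top of the ranking.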

Defining the risk factors and weighting
- First year(s): based on expert appraisal
- Evaluation of the results of the control -> update of the selection method
  - Changes in the category limits
  - Changes in the scoring
  - Changes in the weighting

Definition of a risk factor (example)
- Both very small and very large cases carry a higher risk
[Figure: risk score as a function of average parcel size]
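
A banded risk factor of this kind might look as follows in Python. The category limits and scores are invented for illustration (the slide does not give them), and the sketch assumes that a higher score means higher risk.

```python
from bisect import bisect_right

# Hypothetical category limits (hectares) and risk scores; not the ARDA values.
LIMITS = [0.5, 2.0, 10.0, 50.0]
SCORES = [5, 3, 1, 3, 5]  # one score per band; higher = riskier in this sketch

def parcel_size_risk(avg_parcel_size_ha):
    """Return the risk score of the band the average parcel size falls in."""
    return SCORES[bisect_right(LIMITS, avg_parcel_size_ha)]

for size in (0.2, 5.0, 100.0):
    print(size, "ha ->", parcel_size_risk(size))
```

Updating the selection method after a control year then amounts to editing `LIMITS` (category limits) or `SCORES` (scoring).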

Evaluation of the effectiveness of the risk analysis (2007)
- ~ claims
- For > 80 measures
- 4th year of Hungary/ARDA in the EU
  -> Huge amount of information in the IACS
  -> Need for a special solution for deriving information from the DB
  -> Data mining software/technique

The goal of data mining?
- Not to find the perfect model for a certain problem,
- but to find the optimal model for a certain problem, one that:
  - Is robust
  - Generalizes well
  - Is easy to understand
  - Provides insight into the drivers of the problem
  - Is easy to implement

Risk Analysis #1
- Create an abstract mathematical model that behaves coherently with regard to risky farmers.
- Generalize risk patterns for automatic detection
  - Identify typical patterns
  - Score applications for probability of non-compliance
[Figure: Model 1 and Model 2, training / validation]
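
Why candidate models are compared on validation data as well as training data can be shown with a small, self-contained Python sketch. Everything here is hypothetical: the data are synthetic, and the two "models" are toy scoring rules, not the models used at ARDA.

```python
import random

def split_train_validation(cases, validation_share=0.3, seed=0):
    """Shuffle a copy of the cases and split it into training and validation parts."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * validation_share)
    return shuffled[n_valid:], shuffled[:n_valid]

def hit_rate_in_top(cases, score, top_share=0.1):
    """Share of non-compliant cases among the top-scored part of `cases`."""
    ranked = sorted(cases, key=score, reverse=True)
    top = ranked[: max(1, int(len(ranked) * top_share))]
    return sum(c["non_compliant"] for c in top) / len(top)

# Synthetic cases: non-compliance becomes more likely as area_change grows.
rng = random.Random(1)
cases = []
for i in range(500):
    area_change = rng.random()
    cases.append({
        "id": i,
        "area_change": area_change,
        "non_compliant": int(rng.random() < 0.1 + 0.5 * area_change),
    })

training, validation = split_train_validation(cases)

# Model 1: a simple rule that can generalize (rank by area change).
def model_1(case):
    return case["area_change"]

# Model 2: memorizes the training outcomes, so it knows nothing about new cases.
memory = {c["id"]: c["non_compliant"] for c in training}

def model_2(case):
    return memory.get(case["id"], 0)

for name, model in (("Model 1", model_1), ("Model 2", model_2)):
    print(name,
          "training:", round(hit_rate_in_top(training, model), 2),
          "validation:", round(hit_rate_in_top(validation, model), 2))
```

The memorizing model looks perfect on the training part, but only the validation figures indicate how a model would behave on applications it has not seen.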

Risk Analysis #2
- Modeling techniques: predictive modeling
  - Decision trees
  - Neural networks
  - Regression
  - Scorecards
- Prerequisites: confirmed historical cases as input data (non-compliance flag)
- Result: risk score
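
As one concrete instance of the regression technique listed above, the following Python sketch fits a logistic regression to flagged historical cases and returns a risk score for a new application. It is a minimal plain-Python illustration, not the SAS workflow from the slides; the single feature and all data values are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Estimated probability of non-compliance for one application."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict(weights, x) - y
            gradient[0] += error
            for j, value in enumerate(x):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

# Hypothetical history: one feature scaled to 0..1 and the non-compliance flag.
rows = [[0.1], [0.2], [0.3], [0.6], [0.8], [0.9]]
flags = [0, 0, 1, 0, 1, 1]

weights = fit_logistic(rows, flags)
low_risk = predict(weights, [0.1])
high_risk = predict(weights, [0.9])
print("risk score at 0.1:", round(low_risk, 2))
print("risk score at 0.9:", round(high_risk, 2))
```

The prerequisite named on the slide is visible in the code: without the confirmed `flags` from earlier controls there is nothing to fit against.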

Data mining / estimation models
- Target variable: errors (bad / good)
[Figure: % of anomaly as a function of sampling rate, with the population ordered by decreasing rate of anomaly; reference level: average rate of anomalies]
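
A curve like the one in this figure can be computed as sketched below. The sketch assumes each case carries a model score and a known bad/good outcome, and it orders the population by decreasing score; the ten cases are invented.

```python
def anomaly_rate_curve(scored_cases, sampling_rates):
    """Percentage of anomalies in the top-scored part, for each sampling rate.

    scored_cases: list of (risk_score, is_anomaly) pairs, is_anomaly being 0 or 1.
    """
    ranked = sorted(scored_cases, key=lambda c: c[0], reverse=True)
    curve = {}
    for rate in sampling_rates:
        n = max(1, round(len(ranked) * rate))
        curve[rate] = 100.0 * sum(flag for _, flag in ranked[:n]) / n
    return curve

# Ten hypothetical cases, already listed by decreasing score.
scored_cases = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
                (0.4, 0), (0.3, 0), (0.2, 1), (0.1, 0), (0.0, 0)]
curve = anomaly_rate_curve(scored_cases, [0.2, 0.5, 1.0])
print(curve)  # {0.2: 100.0, 0.5: 60.0, 1.0: 40.0}
```

At a sampling rate of 100% the curve reaches the average rate of anomalies, which is the reference level a useful model should stay above at low sampling rates.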

Data mining with SAS in Hungary
- Pilot in 2007
  - Selection of the OTSC sample of SAPS by data mining
- Operational from 2008
  - All area- and animal-based subsidies
- 2nd year of operational work in 2009
  - Extend use to all measures

Interesting first results (2007)
- Fewer categories in the factors
- Some criteria were not relevant (but in 2007 they were used because of the regulation)

Factor      Category        Score
Area size   >0.3 and <
            >1.0 and <
            >10.0
Age         >18 and <40     1
            >40 and <60     8
            >60             3

High risk = low score

Operational work from 2008
The Integrated Risk Analysis System includes:
- Interface to
  - IACS System (direct access)
  - Hungarian ovine and caprine I&R System
  - Hungarian bovine I&R System
  - Hungarian porcine I&R System
  - Farmers Registry
- Risk datamart
- Analytical models (approx. 15 analytical models)
  - Area based measures (e.g. SAPS)
  - Agricultural Environment Protection Program based measures
  - Rural Development (investment projects) based measures

Architecture of Integrated Risk Analysis System
[Diagram; labels: place of operational work (system for transactions): IACS; external source; extraction, data quality, integration, transformation; data warehouse; datamart; utilization: data mining, statistical analysis, risk management, information for management, web reporting; uniform handling of metadata]

Models
- Decision tree
  - Easy to understand (if not too complicated)
  - Big groups
- Regression
- Neural network
  - Normally the best prediction result
  - The result cannot be interpreted ("black box")
- Scorecard

Scorecard
- Easy to work with, can be calculated "by hand"
- Easy to interpret
- Good for non-linear variables (e.g. age)
- Not a problem if the variable has a strange distribution
- Can sometimes result in big groups
- Not a group of factors, but a uniform model!
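
The "by hand" property is easy to see in code: a scorecard is a lookup of points per band, summed over the factors. The bands and points below are illustrative only, and the sketch follows the deck's convention that a low total score means high risk.

```python
# Hypothetical scorecard: (upper limit, points) per band; the last band is open-ended.
SCORECARD = {
    "age": [(40, 1), (60, 8), (float("inf"), 3)],
    "area_ha": [(1.0, 2), (10.0, 6), (float("inf"), 4)],
}

def scorecard_points(application):
    """Sum the points of the band each factor value falls in (low total = high risk)."""
    total = 0
    for factor, bands in SCORECARD.items():
        value = application[factor]
        for upper, points in bands:
            if value <= upper:
                total += points
                break
    return total

print(scorecard_points({"age": 35, "area_ha": 0.5}))   # 1 + 2 = 3
print(scorecard_points({"age": 50, "area_ha": 5.0}))   # 8 + 6 = 14
```

Because each band has its own points, a non-linear variable such as age needs no transformation: the middle band can simply score differently from both ends.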

Data Mining Model Training and Scoring
[Diagram; labels: target variable, prediction analysis, score rules (score code), scoring, score]
- Target variable: converting a complex control result into a binary format (black or white)
- Should be defined by the PA (Paying Agency)
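
One way to turn a control result into a binary target is a simple tolerance rule, sketched here in Python. The rule, the function name and the 3% default are assumptions for illustration; as the slide says, the actual definition is for the Paying Agency to decide.

```python
def non_compliance_flag(declared_ha, determined_ha, tolerance_pct=3.0):
    """Return 1 ("bad") if the over-declared area exceeds the tolerance, else 0 ("good").

    tolerance_pct is a hypothetical threshold chosen for this sketch.
    """
    if declared_ha <= 0:
        raise ValueError("declared area must be positive")
    over_declared_pct = 100.0 * (declared_ha - determined_ha) / declared_ha
    return int(over_declared_pct > tolerance_pct)

print(non_compliance_flag(10.0, 9.9))   # small difference -> 0
print(non_compliance_flag(10.0, 9.0))   # 10% over-declared -> 1
```

Whatever rule is chosen, it has to be fixed before training, because every model in the deck learns against this flag.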

Operational work from 2008
The Integrated Risk Analysis System includes (cont.):
- Scoring lists
- Evaluation statistics (random and risk based)
- OLAP reports
- Ad-hoc reporting and analysis interface
Tools: SAS® Enterprise Guide, SAS® Enterprise Miner 5

Results
- Analysis of results (e.g. interpretation of scorecards)
- Model documentation
  - Automatic report in Enterprise Miner
  - Auditable documentation of the data mining process
- Documentation of all selection procedures
- Review of model quality
- Statistics for the EU (OLAP cubes)

Risk factors found relevant by data mining
- Total amount of application ---
- Total area
- change in the area
- Number of physical blocks
- Number of physical blocks in
- Number of physical blocks in 2006
- Number of parcels
- number of parcels
- change in number of parcels from previous year
- change in number of parcels from 2005 to
- change in number of parcels from 2004 to
- Number of different land use type ---
- average parcel size ---
- underclaimed reference parcel
- underclaimed reference parcel in the previous year ---

Risk factors found relevant by data mining (cont.)
- percentage of risky parcels in the application (close to minimum parcel size) ---
- joint cultivation ---
- grassland in the application --- grassland in the application
- percentage of orchard and vineyard --- percentage of orchard and vineyard
- compactness of the farm ---
- change of application complexity ---
- result of earlier controls ---
- TOP-UP crops in the application ---
- Gender of applicant ---
- Age of applicant
- Other area based applications ---

Results in terms of efficiency
- Cumulative lift: times higher compared to random sample
  - Fine tuning of criteria and weighting (0.2-3 lift increase)
  - Selection of variables (0.1-1 lift increase)
  - Proposal of new variables to include ( lift increase)
  - Global optimization of cross-validation effects ( lift increase)
- Lift = ratio of anomalies in the risk sample over the random sample
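
The lift definition on this slide translates directly into a small Python function. The two samples below are invented to keep the arithmetic easy to follow; they are not ARDA results.

```python
def lift(risk_sample_flags, random_sample_flags):
    """Ratio of the anomaly rate in the risk-based sample to that in the random sample."""
    risk_rate = sum(risk_sample_flags) / len(risk_sample_flags)
    random_rate = sum(random_sample_flags) / len(random_sample_flags)
    if random_rate == 0:
        raise ZeroDivisionError("no anomalies in the random sample; lift is undefined")
    return risk_rate / random_rate

# Hypothetical control results: 1 = anomaly found, 0 = no anomaly.
risk_sample = [1] * 10 + [0] * 10     # 50% anomalies
random_sample = [1] * 5 + [0] * 15    # 25% anomalies
print(lift(risk_sample, random_sample))  # 2.0
```

A lift of 1.0 would mean the risk analysis finds anomalies no more often than random selection does, so the random sample doubles as the yardstick for the model.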

Some objectives found by OTSC
- The quality of applications could be better -> level of irregularities in the random sample still high? (although it is better from year to year)
  - Higher control rate? -> Better risk assessment (in terms of % of anomaly)!
- The rate of irregularities in the classical field inspection sample is significantly higher than in the RS (remote sensing) sample (research needed)
  - Difference in technique
  - Difference in selection / population

Sliding window
- Blocks -> farmers -> risk calculation per window, other info
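
The sliding-window idea can be sketched as follows, assuming the risk of the farmers in each block has already been aggregated onto a regular grid of cells. The grid, the cell values and the function name are hypothetical; the slide does not describe how the real calculation is set up.

```python
def best_window(risk_grid, height, width):
    """Slide a height x width window over a grid of per-cell risk and return
    (total_risk, top_row, left_col) for the window with the highest total."""
    rows, cols = len(risk_grid), len(risk_grid[0])
    best = None
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            total = sum(
                risk_grid[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width)
            )
            if best is None or total > best[0]:
                best = (total, top, left)
    return best

# Hypothetical 4 x 4 grid of aggregated risk per cell.
risk_grid = [
    [0, 1, 0, 0],
    [0, 2, 3, 0],
    [0, 1, 4, 0],
    [0, 0, 0, 0],
]
print(best_window(risk_grid, 2, 2))  # (10, 1, 1)
```

Calling the function with different `height` and `width` values gives a simple way to compare window shapes on the same grid.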

Various shapes
- 30 x 30, 30 x 42, 10 x 30, 30 x 10, 10 x 10…

Efficiency
- Net over gross ratio constraint may be opposite to risk?

Conclusions for effective risk analysis
- A data mining solution is worth using
- Yes, we need RS, but the level of anomalies in the RS sample is a question
- Better not to use all farms in the RS zone
  - 20-25% use of VHR?
  - Site shape?
  - Quota for MS?
- Cost of a data mining solution?
  - Expensive
  - But cheaper than a flat-rate correction!

Thank you for your attention!
Agricultural and Rural Development Agency, Central Physical Control Department
H-1095 Soroksári út
Tel.: Fax: