Large Data Bases: Advantages, Problems and Puzzles: Some naive observations from an economist. Alan Kirman, GREQAM Marseille. Jerusalem, September 2008.

Some Basic Points Economic data bases may be large in two ways. Firstly, they may simply contain a very large number of observations, the best examples being tick-by-tick data. Secondly, as with some panel data, each observation may have many dimensions.

The Advantages and Problems From a statistical point of view, at least, high frequency data might seem to be unambiguously advantageous. However, the very nature of the data has to be examined carefully, and certain stylised facts emerge which are not present at lower frequencies. In the case of multidimensional data, the « curse of dimensionality » may arise.

FX: A classic example of high frequency data Usually Reuters indicative quotes are used for the analysis. What do they consist of? Banks enter bids and asks for a particular currency pair, such as the euro-dollar, and put a time stamp to indicate the exact time of posting. These quotes are « indicative » and the banks are not legally obliged to honour them. For the euro-dollar there are between 10 and 20 thousand updates per day.

Brief Reminder of the Characteristics of this sort of data Returns are given by r_t = ln P_t - ln P_{t-1}. We know that there is no autocorrelation between successive returns, but that |r_t| and r_t^2 are positively autocorrelated, except at very small time intervals, and have slow decay. Volatility exhibits spikes, referred to as volatility clustering.
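The computation behind these stylised facts can be illustrated with a short, self-contained sketch. The price path below is a synthetic placeholder rather than FX quotes, so it only demonstrates the calculation; on real tick data the |r| and r^2 autocorrelations come out positive while that of r itself is close to zero.

```python
# A minimal sketch: log returns and their autocorrelations on a synthetic price path.
import numpy as np

rng = np.random.default_rng(0)
# Toy mid-quote series (placeholder for real tick data)
prices = 1.35 * np.exp(np.cumsum(0.0001 * rng.standard_normal(10_000)))

r = np.diff(np.log(prices))            # log returns r_t = ln P_t - ln P_{t-1}

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

print("corr(r_t, r_{t-1})     :", autocorr(r))          # ~0, also on real data
print("corr(|r_t|, |r_{t-1}|) :", autocorr(np.abs(r)))   # positive on real tick data
print("corr(r_t^2, r_{t-1}^2) :", autocorr(r ** 2))      # positive on real tick data
```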

A Problem The idea of using such data, as Brousseau (2007) points out, is to track the « true value » of the exchange rate through a period. But not all the data are of the same « quality ». Although the quotes hold, at least briefly, between major banks, they may not do so for other customers, and they may also depend on the amounts involved. There may be mistakes, quotes placed as « advertising », and quotes with spreads so large that they encompass the spread between the best bid and ask and thus convey no information.

Cleaning the Data Brousseau and other authors propose various filtering methods, from simple to sophisticated. If the jump between two successive mid-points exceeds a certain threshold, for example, the observation is eliminated (a primitive first run). However, how can one judge whether the filtering is successful? One idea is to test against quotes which are binding, such as those on EBS. But this is not a guarantee.
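A minimal sketch of the primitive first-run filter described above, assuming the quotes are held as arrays of bids and asks; the threshold value is purely illustrative.

```python
# Drop any quote whose mid-point jumps from the last retained mid-point by more
# than a fixed threshold (a crude first-pass filter, as described in the slide).
import numpy as np

def filter_quotes(bid, ask, threshold=0.001):
    """Return a boolean mask keeping quotes whose mid-point jump is below threshold."""
    mid = (np.asarray(bid) + np.asarray(ask)) / 2.0
    keep = np.ones(len(mid), dtype=bool)
    last = mid[0]
    for i in range(1, len(mid)):
        if abs(mid[i] - last) > threshold:
            keep[i] = False          # discard suspect quote, keep previous reference point
        else:
            last = mid[i]            # accept quote and move the reference forward
    return keep

# Example usage (with numpy arrays of bids and asks):
# keep = filter_quotes(bid, ask, threshold=0.0005)
# clean_bid, clean_ask = bid[keep], ask[keep]
```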

Some stylised facts

Microstructure Noise In principle, the higher the sampling frequency, the more precise the estimates of integrated volatility become. However, the presence of so-called market microstructure features at very high sampling frequencies may create important complications. Financial transactions - and hence price changes and non-zero returns - arrive discretely rather than continuously over time; there is negative serial correlation of returns to successive transactions (including the so-called bid-ask bounce); and there is the price impact of trades. For a discussion see Hasbrouck (2006), O’Hara (1998), and Campbell et al. (1997, Ch. 3).

Microstructure Noise Why should we treat this as « noise » rather than integrate it into our models? One argument is that it overemphasises volatility; in other words, sampling too frequently gives a spuriously high value. On the other hand, Hansen and Lunde (2006) assert that, empirically, market microstructure noise is negatively correlated with returns and hence biases the estimated volatility downward. However, this empirical stylized fact, based on their analysis of high-frequency stock returns, does not seem to carry over to the FX market.

Microstructure Noise « For example, if an organized stock exchange has designated market makers and specialists, and if these participants are slow in adjusting prices in response to shocks (possibly because the exchange's rules explicitly prohibit them from adjusting prices by larger amounts all at once), it may be the case that realized volatility could drop if it is computed at those sampling frequencies for which this behavior is thought to be relevant. In any case, it is widely recognized that market microstructure issues can contaminate estimates of integrated volatility in important ways, especially if the data are sampled at ultra-high frequencies, as is becoming more and more common. » Chaboud et al. (2007)

What do we claim to explain? Let’s look rapidly at a standard model and see how we determine the prices. What we claim for this model is that it is the switching between chartist and fundamentalist behaviour that leads to 1. Fat tails 2. Long memory 3. Volatility clustering. What does high frequency data have to do with this?

Stopping the process from exploding Bound the probability that an individual can become a chartist. If we do not do this, the process may simply explode. We do not, however, put arbitrary limits on the prices that can be attained.
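As an illustration only, and not the authors' specification, the sketch below implements a minimal Kirman-style recruitment scheme in which agents switch between chartist and fundamentalist behaviour and the chartist fraction is kept bounded, so that the price process cannot explode. All parameter values are toy choices.

```python
# Illustration only: a minimal chartist/fundamentalist switching model with a
# bounded chartist fraction, so the implied price dynamics remain non-explosive.
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 5000                 # number of agents, number of periods
eps = 0.005                      # probability of switching type spontaneously (toy value)
k = N // 2                       # current number of chartists
fundamental = 100.0
p = np.full(T, fundamental)      # price series
ret = np.zeros(T)

for t in range(2, T):
    # Recruitment: a randomly chosen agent either flips type on its own (prob. eps)
    # or meets another agent at random and adopts that agent's type.
    i_chartist = rng.random() < k / N
    if rng.random() < eps:
        k += -1 if i_chartist else 1
    else:
        j_chartist = rng.random() < k / N
        if j_chartist and not i_chartist:
            k += 1
        elif i_chartist and not j_chartist:
            k -= 1
    k = min(max(k, 1), N - 1)    # keep the chartist fraction bounded away from 0 and 1
    w = k / N                    # weight of chartists in aggregate demand
    trend = p[t - 1] - p[t - 2]                # chartists extrapolate the recent trend
    reversion = fundamental - p[t - 1]         # fundamentalists expect reversion
    ret[t] = w * trend + (1 - w) * 0.1 * reversion + 0.5 * rng.standard_normal()
    p[t] = p[t - 1] + ret[t]
```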

Nice Story! But…

The Real Problem We have a market clearing equilibrium, but this is not the way these markets function. They function on the basis of an order book, and that is what we should model. Each price in very high frequency data corresponds to an individual transaction. The mechanics of the order book will influence the structure of the time series. How often do our agents revise their prices? They infer information from the actions of others, revealed by the transactions.

How to solve this? This is the subject of a project with Ulrich Horst. We will model an arrival process for orders, and the distribution from which these orders are drawn will be determined by the movements of prices. In this way we model directly what is too often referred to as « microstructure noise » and remove one of the problems with using high frequency data.
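Purely as a toy, since the specification of the project with Ulrich Horst is not given in the slides, the sketch below draws order arrivals from a Poisson stream and lets the side of each order depend on the most recent price movement, so that the « microstructure » (here, negative serial correlation at the tick level) is generated by the order flow itself rather than added as exogenous noise.

```python
# Toy illustration only: price-dependent order flow generating tick-level dynamics.
import numpy as np

rng = np.random.default_rng(2)
tick = 0.0001                     # one pip, a hypothetical tick size
prices = [1.3500]                 # mid-price path, arbitrary starting value
t, horizon = 0.0, 10_000.0

while t < horizon:
    t += rng.exponential(scale=1.0)                       # Poisson order arrivals
    last_move = prices[-1] - prices[-2] if len(prices) > 1 else 0.0
    prob_buy = 1.0 / (1.0 + np.exp(last_move / tick))     # buys less likely right after an uptick
    side = 1 if rng.random() < prob_buy else -1
    prices.append(prices[-1] + side * tick)               # each order moves the price one tick

returns = np.diff(np.log(prices))    # tick-by-tick returns show negative serial correlation
```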

A Challenge « In deep and liquid markets, market microstructure noise should pose less of a concern for volatility estimation. It should be possible to sample returns on such assets more frequently than returns on individual stocks, before estimates of integrated volatility encounter significant bias caused by the market microstructure features. It is possible to sample the FX data as often as once every 15 to 20 seconds without the standard estimator of integrated volatility showing discernible effects stemming from market microstructure noise. This interval is shorter than the sampling intervals of several minutes, usually five or more minutes, often recommended in the empirical literature. This shorter sampling interval and associated larger sample size affords a considerable gain in estimation precision. In very deep and liquid markets, microstructure-induced frictions may be much less of an issue for volatility estimation than was previously thought. » Chaboud et al. (2007)
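The effect described in the quotation can be checked with a simple volatility-signature exercise. The sketch below uses simulated prices with an arbitrary noise level, so the numbers are purely illustrative; the point is that realized volatility computed from very finely sampled noisy prices is inflated relative to coarser sampling.

```python
# Realized volatility (sum of squared returns) at several sampling intervals,
# computed on one simulated "day" of noisy log prices.
import numpy as np

rng = np.random.default_rng(3)
n_sec = 24 * 60 * 60                                   # one day of 1-second observations
true_vol = 0.10 / np.sqrt(252)                         # daily vol implied by 10% annual volatility
efficient = np.cumsum(true_vol / np.sqrt(n_sec) * rng.standard_normal(n_sec))
observed = efficient + 0.00005 * rng.standard_normal(n_sec)   # i.i.d. noise, arbitrary size

for interval in (1, 15, 60, 300):                      # sampling interval in seconds
    sampled = observed[::interval]
    rv = np.sum(np.diff(sampled) ** 2)                 # realized variance
    print(f"{interval:4d}s sampling: realized vol = {np.sqrt(rv):.4%}")
```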

Our job is to explain why this is so!

The Curse of Dimensionality

Why does this matter? We collect more and more data on individuals and, in particular, on consumers and the unemployed. If we have D observations on N individuals, the relationship between D and N is important if we wish to estimate some functional relation between the variables. There is now a whole battery of approaches for reducing the dimensionality of the problem, and these represent a major challenge for econometrics.
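One standard member of that battery is principal components analysis. The sketch below reduces a hypothetical N x D data matrix to a handful of factors via the singular value decomposition; the data are placeholders.

```python
# Dimension reduction by principal components (via the SVD) on a placeholder data matrix.
import numpy as np

rng = np.random.default_rng(4)
N, D, k = 500, 200, 10                     # N individuals, D variables each, keep 10 factors
X = rng.standard_normal((N, D))            # placeholder data matrix

Xc = X - X.mean(axis=0)                    # centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T                     # N x k matrix of component scores
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"first {k} components explain {explained:.1%} of the variance")
```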

A blessing? Mathematicians assert that such high dimensionality leads to a « concentration of measure ». Someone here can no doubt explain how this might help economists!