Data snooping occurs when a data set is used more than once for inference, prediction or model selection, so that an apparently satisfactory result may be produced by chance alone. For example, when comparing several trading strategies on a single historical data set, how can an investor detect strategies whose above-market performance is attributable to the strategy’s genuine superiority rather than to randomness?
Large-scale testing methods formalize and attempt to solve this problem. The scope of this topic is a concise theoretical summary of classical and modern multiple testing approaches, a short comparative Monte Carlo study, and an empirical application that compares different trading strategies or prediction models. Requirements: programming skills, a deep understanding of (basic) statistical testing, basic knowledge of resampling methods, and, ideally, experience with financial data.
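To illustrate the problem, the following is a minimal Monte Carlo sketch, not the method of any of the papers listed below. It simulates strategies with no true edge and compares two tests of the best-performing one: a naive test that ignores the search across strategies, and a bootstrap test based on the maximum t-statistic. The function names, the Gaussian returns, the independence across strategies, and the i.i.d. row bootstrap are all simplifying assumptions made for illustration; dependent financial data would call for a block-type bootstrap.

```python
import numpy as np


def max_t_stat(returns):
    """Largest t-statistic across strategies (columns) for H0: mean <= 0."""
    n = returns.shape[0]
    means = returns.mean(axis=0)
    std_errs = returns.std(axis=0, ddof=1) / np.sqrt(n)
    return float(np.max(means / std_errs))


def snooping_demo(n_obs=250, n_strategies=50, n_sims=200, n_boot=200, seed=0):
    """Return (naive, adjusted) false-rejection rates at the nominal 5% level.

    Hypothetical setup: every strategy has zero true mean excess return,
    so each rejection is a false discovery.
    """
    rng = np.random.default_rng(seed)
    naive_crit = 1.645  # one-sided 5% critical value of the standard normal
    naive_rejections = 0
    adjusted_rejections = 0
    for _ in range(n_sims):
        # Simulated excess returns: pure noise, no strategy is superior.
        x = rng.standard_normal((n_obs, n_strategies))
        t_max = max_t_stat(x)

        # Naive test: treat the best strategy as if it were the only one tried.
        if t_max > naive_crit:
            naive_rejections += 1

        # Max-statistic bootstrap: recenter to impose the null, resample rows
        # (assumes i.i.d. observations), and recompute the maximum statistic.
        centered = x - x.mean(axis=0)
        boot_max = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n_obs, size=n_obs)
            boot_max[b] = max_t_stat(centered[idx])
        p_value = float(np.mean(boot_max >= t_max))
        if p_value < 0.05:
            adjusted_rejections += 1

    return naive_rejections / n_sims, adjusted_rejections / n_sims


if __name__ == "__main__":
    naive, adjusted = snooping_demo()
    print(f"naive false-rejection rate:    {naive:.2f}")
    print(f"adjusted false-rejection rate: {adjusted:.2f}")
```

With 50 independent strategies, the naive test should declare the best strategy significant in the large majority of simulations, since the probability that at least one of 50 independent 5%-level tests rejects is 1 − 0.95^50 ≈ 0.92. The max-statistic test should stay much closer to the nominal 5% level.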
- WHITE, H. L. (2000): “A Reality Check for Data Snooping,” Econometrica, 68, 1097–1126.
- ROMANO, J. P. and WOLF, M. (2005): “Stepwise Multiple Testing as Formalized Data Snooping,” Econometrica, 73, 1237–1282.
- ROMANO, J. P., SHAIKH, A. M. and WOLF, M. (2008): “Formalized Data Snooping Based on Generalized Error Rates,” Econometric Theory, 24, 404–447.
Contact: Shi Chen