Extrapolation errors in linear model trees
Source
ACM Transactions on Knowledge Discovery from Data (TKDD)
Volume 1, Issue 2 (August 2007)
Article No. 6
Year of Publication: 2007
ISSN: 1556-4681
Authors
Wei-Yin Loh, University of Wisconsin, Madison, WI
Chien-Wei Chen, University of Wisconsin, Madison, WI
Wei Zheng, University of Wisconsin, Madison, WI
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1267066.1267067

ABSTRACT

Prediction errors from a linear model tend to be larger when extrapolation is involved, particularly when the model is wrong. This article considers the problem of extrapolation and interpolation errors when a linear model tree is used for prediction. It proposes several ways to curtail the size of the errors, and uses a large collection of real datasets to demonstrate that the solutions are effective in reducing the average mean squared prediction error. The article also provides a proof that, if a linear model is correct, the proposed solutions have no undesirable effects as the training sample size tends to infinity.
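
The abstract's opening claim can be illustrated with a small numerical sketch. The example below is not taken from the article: the true function `true_f`, the training grid, and the helper names are all hypothetical, and the remedy shown (truncating predictions to the range of the training responses) is only one simple possibility, not necessarily one of the solutions the article proposes. It fits a straight line to data from a saturating curve, so the linear model is wrong, and compares the mean squared error inside and outside the training range.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, x):
    """Plain linear prediction."""
    intercept, slope = model
    return intercept + slope * x

def truncated_predict(model, x, lo, hi):
    """Linear prediction clipped to [lo, hi] (an assumed remedy, see lead-in)."""
    return min(max(predict(model, x), lo), hi)

def true_f(x):
    """Hypothetical truth: a saturating curve, so a straight line is misspecified."""
    return 1.0 - math.exp(-3.0 * x)

def mse(xs, predictor):
    """Mean squared error of a predictor against the hypothetical truth."""
    return sum((predictor(x) - true_f(x)) ** 2 for x in xs) / len(xs)

# Training data on [0, 1], noise-free to keep the sketch deterministic.
train_x = [i / 10 for i in range(11)]
train_y = [true_f(x) for x in train_x]
model = fit_line(train_x, train_y)
lo, hi = min(train_y), max(train_y)

inside_x = [0.05 + i / 10 for i in range(10)]   # interpolation points
outside_x = [1.5 + i / 10 for i in range(10)]   # extrapolation points

mse_inside = mse(inside_x, lambda x: predict(model, x))
mse_outside = mse(outside_x, lambda x: predict(model, x))
mse_outside_truncated = mse(
    outside_x, lambda x: truncated_predict(model, x, lo, hi)
)

print("MSE inside training range: ", mse_inside)
print("MSE outside training range:", mse_outside)
print("MSE outside, truncated:    ", mse_outside_truncated)
```

Truncation helps in this sketch because the assumed truth levels off near the largest training response. If the true function kept growing beyond the training range, clipping to the training responses could make extrapolation errors worse, which is why the choice of remedy matters.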

