Mining for the most certain predictions from dyadic data
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 249-258
Year of Publication: 2009
ISBN: 978-1-60558-495-9
ABSTRACT
In several applications involving regression or classification, it is important not only to make predictions but also to assess how accurate or reliable individual predictions are. This is particularly important when, owing to finite resources or domain requirements, one wants to base decisions only on the most reliable predictions rather than on the entire set. This paper introduces novel and effective ways of ranking predictions by their accuracy for problems involving large-scale, heterogeneous data with a dyadic structure, i.e., where the independent variables can be naturally decomposed into three groups associated with two sets of elements and their combination. These approaches are based on modeling the data by a collection of localized models learnt while simultaneously partitioning (co-clustering) the data. For regression, this leads to the concept of "certainty lift". We also develop a robust predictive modeling technique that identifies and models only the most coherent regions of the data, giving high predictive accuracy on the selected subset of response values. Extensive experimentation on real-life datasets highlights the utility of the proposed approaches.
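The idea of ranking predictions by the reliability of the local model that produced them can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the row and column cluster assignments are assumed to be given rather than learnt, each co-cluster uses a constant (mean) model in place of the localized models the abstract describes, and the function names `fit_block_models` and `most_certain_predictions` are hypothetical.

```python
from collections import defaultdict


def fit_block_models(observations, row_cluster, col_cluster):
    """Fit one constant (mean) model per co-cluster and record its training MSE.

    observations: list of (row_id, col_id, response) triples.
    row_cluster, col_cluster: dicts mapping ids to cluster labels
    (assumed given here; the paper learns them jointly with the models).
    """
    values = defaultdict(list)
    for r, c, y in observations:
        values[(row_cluster[r], col_cluster[c])].append(y)

    models = {}
    for block, ys in values.items():
        mean = sum(ys) / len(ys)
        mse = sum((y - mean) ** 2 for y in ys) / len(ys)
        models[block] = (mean, mse)
    return models


def most_certain_predictions(pairs, models, row_cluster, col_cluster, fraction):
    """Predict each (row, col) pair and keep the fraction with the lowest block MSE."""
    scored = []
    for r, c in pairs:
        block = (row_cluster[r], col_cluster[c])
        if block in models:  # skip pairs whose co-cluster has no training data
            mean, mse = models[block]
            scored.append((mse, r, c, mean))

    scored.sort(key=lambda item: item[0])  # most coherent blocks first
    keep = max(1, int(len(scored) * fraction))
    return [(r, c, pred) for _, r, c, pred in scored[:keep]]


# Toy dyadic data: four rows and two columns, each already assigned to a cluster.
row_cluster = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
col_cluster = {"a": 0, "b": 1}
observations = [
    ("u1", "a", 4.0), ("u2", "a", 4.0),  # block (0, 0): perfectly coherent
    ("u1", "b", 1.0), ("u2", "b", 5.0),  # block (0, 1): noisy
    ("u3", "a", 2.0), ("u4", "a", 3.0),  # block (1, 0): fairly coherent
    ("u3", "b", 0.0), ("u4", "b", 2.0),  # block (1, 1): moderately noisy
]

models = fit_block_models(observations, row_cluster, col_cluster)
pairs = [("u1", "b"), ("u3", "a"), ("u2", "a"), ("u4", "b")]
top = most_certain_predictions(pairs, models, row_cluster, col_cluster, 0.5)
print(top)
```

In this toy setup, the predictions retained are those falling in the blocks whose training responses agree most closely, which mirrors the abstract's goal of acting only on the most reliable subset of predictions.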