ABSTRACT
Acquiring data is often expensive in real-world data mining applications. Most previous data mining and machine learning research, however, assumes that a fixed set of training examples is given. In this paper, we propose an online cost-sensitive framework that allows a learner to acquire examples dynamically as it learns, and to decide how many examples it needs in order to minimize the total cost. We also propose a new strategy for Partial Example Acquisition (PAS), in which the learner can acquire examples with only a subset of attribute values to reduce the data acquisition cost. Experiments on UCI datasets show that the new PAS strategy effectively reduces the total cost of data acquisition.
INDEX TERMS
Primary Classification: I. Computing Methodologies; I.2 ARTIFICIAL INTELLIGENCE; I.2.6 Learning
Subjects: Induction
General Terms: Algorithms, Economics, Measurement, Performance
Keywords: active cost-sensitive learning, active learning, cost-sensitive learning, data acquisition, data mining, induction, interactive and online data mining, machine learning