Learning relational probability trees
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Washington, D.C.
Poster session: Research track
Pages: 625-630
Year of Publication: 2003
ISBN: 1-58113-737-0
Authors
Jennifer Neville, University of Massachusetts, Amherst, MA
David Jensen, University of Massachusetts, Amherst, MA
Lisa Friedland, University of Massachusetts, Amherst, MA
Michael Hay, University of Massachusetts, Amherst, MA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956830

ABSTRACT

Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogeneous and independently distributed. Relational probability trees (RPTs) extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g., AVERAGE, MODE, COUNT) to dynamically propositionalize relational data and create binary splits within the RPT. Previous work has identified a number of statistical biases due to characteristics of relational data such as autocorrelation and degree disparity. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, RPTs built using randomization tests are significantly smaller than other models and achieve equivalent, or better, performance.
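As a rough illustration of the idea described in the abstract, the following sketch shows how aggregation functions can turn a variable-size set of related objects into a single binary split, and how a randomization test can estimate whether that split's score could have arisen by chance. Everything in it is an assumption made for illustration: the toy movie/actor data, the function names, the chi-square score, and the plain label-permutation test. It is not the authors' algorithm, and in particular it does not reproduce the novel randomization test the abstract refers to, which is designed to adjust for autocorrelation and degree disparity.

```python
import random
from collections import Counter
from statistics import mean

# Hypothetical toy data: each target instance (a movie) has a class label
# and a variable number of related objects (actors) with (age, gender).
movies = {
    "m1": {"label": 1, "actors": [(30, "F"), (45, "M"), (50, "M")]},
    "m2": {"label": 0, "actors": [(22, "F")]},
    "m3": {"label": 1, "actors": [(41, "M"), (39, "F"), (60, "M"), (35, "M")]},
    "m4": {"label": 0, "actors": [(25, "F"), (28, "F")]},
}

# Aggregation functions that reduce a bag of values to one value.
def count(values):
    return len(values)

def average(values):
    return mean(values)

def mode(values):
    return Counter(values).most_common(1)[0][0]

def binary_feature(aggregate, attribute_index, test):
    """Build a feature: aggregate one attribute of the related objects,
    then apply a boolean test to obtain a binary split."""
    def feature(instance):
        values = [related[attribute_index] for related in instance["actors"]]
        return test(aggregate(values))
    return feature

def chi_square(split_values, labels):
    """Chi-square statistic for the 2x2 table of split outcome vs. label."""
    pairs = list(zip(split_values, labels))
    a = sum(1 for s, y in pairs if s and y == 1)
    b = sum(1 for s, y in pairs if s and y == 0)
    c = sum(1 for s, y in pairs if not s and y == 1)
    d = sum(1 for s, y in pairs if not s and y == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the split or the labels do not vary
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom

def permutation_p_value(split_values, labels, n_permutations=999, seed=0):
    """Generic label-permutation test (an assumption, not the paper's test):
    how often does a shuffled labeling score at least as high?"""
    rng = random.Random(seed)
    observed = chi_square(split_values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if chi_square(split_values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

instances = list(movies.values())
labels = [inst["label"] for inst in instances]

# Candidate split: AVERAGE(actor.age) >= 40
avg_age_ge_40 = binary_feature(average, 0, lambda v: v >= 40)
split = [avg_age_ge_40(inst) for inst in instances]

score = chi_square(split, labels)
p_value = permutation_p_value(split, labels)
print(score, p_value)
```

Other candidate splits can be built the same way, for example `binary_feature(count, 0, lambda v: v >= 3)` or `binary_feature(mode, 1, lambda v: v == "M")`. A tree learner in this style would score many such candidates at each node and keep a split only if its score remains significant under the randomization test.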

