Machine learning for information architecture in a large governmental website
Source: International Conference on Digital Libraries
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Tucson, AZ, USA
SESSION: Automated techniques for managing collections
Pages: 151 - 159  
Year of Publication: 2004
ISBN:1-58113-832-6
Authors
Miles Efron  University of North Carolina, Chapel Hill, NC
Jonathan Elsas  University of North Carolina, Chapel Hill, NC
Gary Marchionini  University of North Carolina, Chapel Hill, NC
Junliang Zhang  University of North Carolina, Chapel Hill, NC
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/996350.996386

ABSTRACT

This paper describes ongoing research into the application of machine learning techniques for improving access to governmental information in complex digital libraries. Under the auspices of the GovStat Project, our goal is to identify a small number of semantically valid concepts that adequately span the intellectual domain of a collection. The goal of this discovery is twofold. First, we desire a practical aid for information architects. Second, automatically derived document-concept relationships are a necessary precondition for real-world deployment of many dynamic interfaces. The current study compares concept learning strategies based on three document representations: keywords, titles, and full text. In statistical and user-based studies, human-created keywords provide significant improvements in concept learning over both title-only and full-text representations.



