Building a web thesaurus from web link structure
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
Session: Web
Pages: 48-55
Year of Publication: 2003
ISBN: 1-58113-646-3
Authors
Zheng Chen (Microsoft Research Asia, Beijing, China)
Shengping Liu (Peking University, Beijing, China)
Liu Wenyin (City University of Hong Kong, Kowloon, Hong Kong)
Geguang Pu (Peking University, Beijing, China)
Wei-Ying Ma (Microsoft Research Asia, Beijing, China)
Sponsor: ACM (Association for Computing Machinery)
Publisher: ACM, New York, NY, USA
Bibliometrics: Downloads (6 weeks): 13; Downloads (12 months): 84; Citation count: 9
DOI: http://doi.acm.org/10.1145/860435.860447

ABSTRACT

Thesauri have been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web using link structure information. The proposed approach can identify new terms and reflect the latest relationships between terms as the Web evolves. First, a set of high-quality, representative websites for a specific domain is selected. After navigational links are filtered out, link analysis is applied to each website to obtain its content structure. Finally, the thesaurus is constructed by merging the content structures of the selected websites. Experimental results on automatic query expansion using the constructed thesaurus show a 20% improvement in search precision over the baseline.
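The pipeline outlined in the abstract (filter navigational links, derive each site's content structure, merge the structures, expand queries) can be illustrated with a toy Python sketch. Every concrete choice here is an assumption for illustration, not the paper's method: the hypothetical `Site` representation (page URL mapped to a title and outgoing links), the rule that a link is navigational when its target is linked from more than `max_fraction` of a site's pages, the use of a breadth-first tree from the home page as the "content structure", the `min_support` merge threshold, and the narrower-term policy in `expand_query`.

```python
from collections import Counter, deque

# Hypothetical site model (an assumption, not the paper's data format):
# page URL -> (page title, used as the page's term; list of outgoing link URLs).
Site = dict[str, tuple[str, list[str]]]


def filter_navigational_links(site: Site, max_fraction: float = 0.5) -> dict[str, list[str]]:
    """Drop self-links, off-site links, and links to pages linked from too many pages.

    Assumed heuristic: a target linked from more than max_fraction of the site's
    pages (menus, "home" buttons) is treated as navigational. The paper's actual
    filtering rules may differ.
    """
    # Count how many distinct pages link to each target.
    inbound = Counter(t for _, links in site.values() for t in set(links))
    limit = max_fraction * len(site)
    return {
        page: [t for t in links if t in site and t != page and inbound[t] <= limit]
        for page, (_, links) in site.items()
    }


def content_structure(site: Site, home: str, max_fraction: float = 0.5) -> set[tuple[str, str]]:
    """Return (broader term, narrower term) pairs from a breadth-first tree rooted at home.

    Assumption: the first page that reaches a child in BFS order is its parent.
    """
    links = filter_navigational_links(site, max_fraction)
    seen, queue, pairs = {home}, deque([home]), set()
    while queue:
        page = queue.popleft()
        for child in links[page]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
                pairs.add((site[page][0], site[child][0]))
    return pairs


def merge_structures(structures: list[set[tuple[str, str]]], min_support: int = 2) -> dict[str, set[str]]:
    """Keep a broader->narrower relation only if at least min_support sites contain it."""
    support = Counter(pair for s in structures for pair in s)
    thesaurus: dict[str, set[str]] = {}
    for (broader, narrower), count in support.items():
        if count >= min_support:
            thesaurus.setdefault(broader, set()).add(narrower)
    return thesaurus


def expand_query(query: list[str], thesaurus: dict[str, set[str]]) -> list[str]:
    """Append the narrower terms of each query term (one possible expansion policy)."""
    expanded = list(query)
    for term in query:
        for extra in sorted(thesaurus.get(term, ())):
            if extra not in expanded:
                expanded.append(extra)
    return expanded
```

Requiring agreement from at least two sites (`min_support=2`) is one simple way for the merge step to suppress relations that appear on only a single site; the full text describes how the authors actually filter links and merge content structures.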


