Building a web thesaurus from web link structure
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
Pages: 48-55
Year of Publication: 2003
ISBN: 1-58113-646-3
Authors
Zheng Chen, Microsoft Research Asia, Beijing, China
Shengping Liu, Peking University, Beijing, China
Liu Wenyin, City Univ. of Hong Kong, Kowloon, Hong Kong
Geguang Pu, Peking University, Beijing, China
Wei-Ying Ma, Microsoft Research Asia, Beijing, China
ABSTRACT
Thesauri are widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web using link structure information. The proposed approach can identify new terms and reflect the latest relationships between terms as the Web evolves. First, a set of high-quality, representative websites for a specific domain is selected. After navigational links are filtered out, link analysis is applied to each website to obtain its content structure. Finally, the thesaurus is constructed by merging the content structures of the selected websites. Experimental results on automatic query expansion based on the constructed thesaurus show a 20% improvement in search precision over the baseline.
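The last two steps of the pipeline (merging per-site content structures, then using the result for query expansion) can be illustrated with a minimal sketch. The abstract does not specify the merging rule, so the code below substitutes a simple assumption: a parent-to-child term pair is kept if it appears in the content structure of at least `min_sites` websites, and its weight is the fraction of sites that contain it. The function names, the `(parent, child)` edge representation, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def merge_content_structures(site_structures, min_sites=2):
    """Merge per-site (parent_term, child_term) edges into one thesaurus.

    Assumption (not from the paper): an edge is kept when at least
    `min_sites` sites contain it; its weight is the share of sites
    that support it.
    """
    support = defaultdict(set)
    for site_id, edges in enumerate(site_structures):
        for parent, child in edges:
            # Normalize case so "Camera" and "camera" count as one term.
            support[(parent.lower(), child.lower())].add(site_id)

    thesaurus = defaultdict(dict)
    for (parent, child), sites in support.items():
        if len(sites) >= min_sites:
            thesaurus[parent][child] = len(sites) / len(site_structures)
    return dict(thesaurus)


def expand_query(query_terms, thesaurus, top_k=2):
    """Append up to `top_k` highest-weighted child terms for each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        related = thesaurus.get(term.lower(), {})
        # Sort by descending weight, then alphabetically for stable output.
        ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
        for child, _weight in ranked[:top_k]:
            if child not in expanded:
                expanded.append(child)
    return expanded


# Hypothetical content structures for three sites in one domain.
site_a = [("camera", "lens"), ("camera", "tripod")]
site_b = [("Camera", "Lens"), ("camera", "flash")]
site_c = [("camera", "lens"), ("camera", "flash")]

thesaurus = merge_content_structures([site_a, site_b, site_c], min_sites=2)
expanded = expand_query(["camera"], thesaurus)
```

In this toy data, "tripod" appears under "camera" on only one site and is therefore dropped, while "lens" and "flash" survive and are used as expansion terms. A support threshold is only one possible way to suppress site-specific noise; the paper's own merging and link-analysis procedures may differ.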