SemTag and Seeker: bootstrapping the semantic web via automated semantic annotation
Source International World Wide Web Conference
Proceedings of the 12th international conference on World Wide Web
Budapest, Hungary
SESSION: Establishing the semantic web 1
Pages: 178-186
Year of Publication: 2003
ISBN:1-58113-680-3
Authors
Stephen Dill  IBM Almaden Research Center, San Jose, CA
Nadav Eiron  IBM Almaden Research Center, San Jose, CA
David Gibson  IBM Almaden Research Center, San Jose, CA
Daniel Gruhl  IBM Almaden Research Center, San Jose, CA
R. Guha  IBM Almaden Research Center, San Jose, CA
Anant Jhingran  IBM Almaden Research Center, San Jose, CA
Tapas Kanungo  IBM Almaden Research Center, San Jose, CA
Sridhar Rajagopalan  IBM Almaden Research Center, San Jose, CA
Andrew Tomkins  IBM Almaden Research Center, San Jose, CA
John A. Tomlin  IBM Almaden Research Center, San Jose, CA
Jason Y. Zien  IBM Almaden Research Center, San Jose, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775152.775178

ABSTRACT

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date.

We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.


