Semantic deep web: automatic attribute extraction from the deep web data sources
Source
Symposium on Applied Computing
Proceedings of the 2007 ACM symposium on Applied computing
Seoul, Korea
SESSION: Web technologies
Pages: 1667-1672
Year of Publication: 2007
ISBN: 1-59593-480-4

Authors
Yoo Jung An, New Jersey Institute of Technology, Newark, NJ
James Geller, New Jersey Institute of Technology, Newark, NJ
Yi-Ta Wu, University of Michigan, Ann Arbor, MI
Soon Ae Chun, CUNY, College of Staten Island, Staten Island, NY

Bibliometrics
Downloads (6 Weeks): 22, Downloads (12 Months): 219, Citation Count: 1

ABSTRACT
"Deep Web" refers to the rich information and data hidden in backend databases and similar sources that search engines and Web crawlers cannot access; it is mostly reachable only through manual query interfaces. This paper introduces the Semantic Deep Web, which uses an ontology to determine the relevance of query interface attributes for accessing the Deep Web. In addition, we present a novel approach to automatically extracting attributes from query interfaces in order to address current limitations in accessing Deep Web data sources. Our Automatic Attribute Extraction method identifies (1) attributes used by query Web page designers, called Programmer Viewpoint Attributes, and (2) attributes presented to users as labels, called User Viewpoint Attributes. An ontology enriches the candidate query attributes by providing synonyms and by supporting the attributes used by both designers and users. Our experimental results in several e-commerce domains show that the attributes obtained by our algorithm compare favorably with manually determined attributes for Deep Web queries.
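To make the two viewpoints concrete, here is a minimal sketch, not the authors' algorithm. It assumes that Programmer Viewpoint Attributes can be approximated by the `name` attributes of `input`, `select`, and `textarea` controls, and User Viewpoint Attributes by the text of `<label>` elements. It substitutes a tiny hand-written synonym dictionary (`ONTOLOGY`, hypothetical) for the paper's ontology. The helper names (`FormAttributeParser`, `canonicalize`, `extract_attributes`, `precision_recall`) and the sample form are likewise illustrative, and the precision/recall comparison against a manual attribute set is an assumed evaluation, not one taken from the paper.

```python
import re
from html.parser import HTMLParser

# Hypothetical mini-ontology (assumption): concept -> synonyms.
# It stands in for the paper's ontology, which is not reproduced here.
ONTOLOGY = {
    "author": {"writer", "creator"},
    "title": {"book title", "name"},
    "price": {"cost", "price range"},
}

class FormAttributeParser(HTMLParser):
    """Collects form-control names (approximating Programmer Viewpoint
    Attributes) and <label> text (approximating User Viewpoint Attributes)."""

    CONTROL_TAGS = {"input", "select", "textarea"}
    SKIP_TYPES = {"hidden", "submit", "button", "reset", "image"}

    def __init__(self):
        super().__init__()
        self.programmer_attrs = []
        self.user_attrs = []
        self._in_label = False
        self._label_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.CONTROL_TAGS:
            name = attrs.get("name")
            # Hidden fields and buttons carry no user-queryable attribute.
            if name and (attrs.get("type") or "text").lower() not in self.SKIP_TYPES:
                self.programmer_attrs.append(name.lower())
        elif tag == "label":
            self._in_label = True
            self._label_text = []

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            # Normalize whitespace, drop a trailing colon, lowercase.
            text = " ".join("".join(self._label_text).split())
            text = text.rstrip(":").strip().lower()
            if text:
                self.user_attrs.append(text)
            self._in_label = False

    def handle_data(self, data):
        if self._in_label:
            self._label_text.append(data)

def canonicalize(term):
    """Map a term to an ontology concept by exact or synonym match,
    trying the whole term first and then its alphanumeric tokens."""
    for cand in [term] + re.split(r"[^a-z0-9]+", term):
        for concept, synonyms in ONTOLOGY.items():
            if cand == concept or cand in synonyms:
                return concept
    return None

def extract_attributes(html):
    """Return (programmer attrs, user attrs, ontology concepts found)."""
    parser = FormAttributeParser()
    parser.feed(html)
    parser.close()
    terms = parser.programmer_attrs + parser.user_attrs
    concepts = {canonicalize(t) for t in terms}
    concepts.discard(None)
    return parser.programmer_attrs, parser.user_attrs, concepts

def precision_recall(extracted, manual):
    """Compare extracted concepts against a manually determined set."""
    hits = len(extracted & manual)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall

# Hypothetical book-search query interface used for illustration.
SAMPLE_FORM = """
<form action="/search">
  <label for="a">Author:</label> <input type="text" name="search_author" id="a">
  <label for="t">Book Title</label> <input type="text" name="ttl" id="t">
  <select name="cost_range"><option>any</option></select>
  <input type="hidden" name="session_id" value="x">
  <input type="submit" value="Go">
</form>
"""

if __name__ == "__main__":
    prog, user, concepts = extract_attributes(SAMPLE_FORM)
    print(prog, user, sorted(concepts))
    print(precision_recall(concepts, {"author", "title", "isbn"}))
```

On the sample form, the terse control name `ttl` maps to no ontology concept, while its label "Book Title" does; this is one plausible reading of why the abstract combines the designer's and the user's viewpoints rather than relying on either alone.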
CITED BY
Manuel Álvarez, Alberto Pan, Juan Raposo, Fernando Bellas, Fidel Cacheda. Extracting lists of data records from semi-structured web pages. Data & Knowledge Engineering, v.64 n.2, p.491-509, February 2008.