Towards domain-independent information extraction from web tables
Source
International World Wide Web Conference
Proceedings of the 16th international conference on World Wide Web
Banff, Alberta, Canada
Session: Data mining
Pages: 71-80
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Wolfgang Gatterbauer, Vienna University of Technology, Vienna, Austria
Paul Bohunsky, Vienna University of Technology, Vienna, Austria
Marcus Herzog, Vienna University of Technology, Vienna, Austria
Bernhard Krüpl, Vienna University of Technology, Vienna, Austria
Bernhard Pollak, Vienna University of Technology, Vienna, Austria
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1242572.1242583

ABSTRACT

Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of



