Documentum ECI self-repairing wrappers: performance analysis
Source: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data
Chicago, IL, USA
SESSION: Semantic heterogeneity
Pages: 708-717
Year of Publication: 2006
ISBN: 1-59593-434-0
Authors
Boris Chidlovskii, Xerox Research Centre, France
Bruno Roustant, EMC Documentum, France
Marc Brette, EMC Documentum, France
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1142473.1142555

ABSTRACT

Documentum Enterprise Content Integration (ECI) Services is content integration middleware that provides one-query access to intranet and Internet content resources. The ECI Adapter technology offers any application an interface for extracting data and metadata from unstructured Web pages. It provides a unique framework for wrapper production, automatic recovery, and maintenance, developed at Xerox Research Centre Europe and based on state-of-the-art algorithms from machine learning and grammatical inference. In this presentation we analyze the performance of ECI adapters deployed in current commercial installations. We draw on reports of daily tests for all commercially deployed ECI adapters, collected from June 2003 to September 2005. Using these daily reports, we analyze different aspects of the wrapper technology.



