ABSTRACT
Documentum Enterprise Content Integration (ECI) services is content integration middleware that provides one-query access to intranet and Internet content resources. The ECI Adapter technology offers any application an interface for extracting data and metadata from unstructured Web pages. It provides a unique framework for wrapper production, automatic recovery, and maintenance, developed at Xerox Research Centre Europe and based on state-of-the-art algorithms from machine learning and grammatical inference. In this presentation we analyze the performance of ECI adapters deployed in current commercial installations. We draw on daily test reports for all commercially deployed ECI adapters, collected from June 2003 to September 2005, to analyze different aspects of the wrapper technology.