ABSTRACT
Document analysis research typically focuses on document image understanding or on classic problems in text classification, clustering, summarization and discovery. While these are important aspects of document management, in practice a document's lifecycle is often determined by the context of the business process to which it belongs. Document analysis techniques must therefore recognize and leverage the contextual information provided by a supporting schema and business process. This paper presents an intelligent document management framework with relevant document analysis, metadata extraction, and business process association algorithms and methodology. The architecture supporting this framework integrates a runtime environment with an authoring environment by combining relational data modeling tools with document classification techniques. The runtime environment accepts incoming documents, classifies them, extracts metadata and executes customized business logic. The authoring environment supports the association of a class of documents with a relational document schema, identification of attribute values that must be extracted automatically, generation of relevant business logic, and deployment of authoring artifacts into the runtime architecture. We demonstrate the use of this framework with representative real-world document transformative applications.
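As a rough illustration of the runtime flow described in the abstract (accept a document, classify it, extract metadata, execute business logic), the following minimal Python sketch may help. It is not the paper's implementation: the `DocumentClass` structure, the keyword-count classifier, the regular-expression extractors and the two example classes are all hypothetical stand-ins chosen for brevity, and the authoring environment is reduced to hand-written class definitions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DocumentClass:
    """Hypothetical authoring artifact: a document class plus its schema and logic."""
    name: str
    keywords: List[str]            # cues for the toy classifier (assumption)
    extractors: Dict[str, str]     # attribute name -> regex with one capture group
    business_logic: Callable[[Dict[str, str]], str]


def classify(text: str, classes: List[DocumentClass]) -> Optional[DocumentClass]:
    """Pick the class whose keywords occur most often; None if nothing matches."""
    lowered = text.lower()

    def score(doc_class: DocumentClass) -> int:
        return sum(lowered.count(keyword) for keyword in doc_class.keywords)

    best = max(classes, key=score)
    return best if score(best) > 0 else None


def extract_metadata(text: str, doc_class: DocumentClass) -> Dict[str, str]:
    """Fill the schema attributes that the class marks for automatic extraction."""
    metadata = {}
    for attribute, pattern in doc_class.extractors.items():
        match = re.search(pattern, text)
        if match:
            metadata[attribute] = match.group(1)
    return metadata


def process(text: str, classes: List[DocumentClass]) -> str:
    """Runtime pipeline: classify, extract metadata, run the class's business logic."""
    doc_class = classify(text, classes)
    if doc_class is None:
        return "unclassified"
    return doc_class.business_logic(extract_metadata(text, doc_class))


# Two illustrative classes, standing in for deployed authoring artifacts.
CLASSES = [
    DocumentClass(
        name="invoice",
        keywords=["invoice", "amount due"],
        extractors={
            "invoice_id": r"Invoice\s+#(\d+)",
            "total": r"Amount due:\s*\$([\d.]+)",
        },
        business_logic=lambda m: (
            f"route invoice {m.get('invoice_id')} for payment of {m.get('total')}"
        ),
    ),
    DocumentClass(
        name="claim",
        keywords=["claim", "policy"],
        extractors={"policy": r"Policy\s+(\w+)"},
        business_logic=lambda m: f"open claim for policy {m.get('policy')}",
    ),
]

if __name__ == "__main__":
    print(process("Invoice #1042\nAmount due: $250.00", CLASSES))
```

In this sketch the link between a document class, its extractable attributes and its business logic lives in one record, which loosely mirrors the association the authoring environment is said to produce; a real system would presumably replace the keyword classifier with a trained model and the dictionary of patterns with a relational schema.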