BigBatch: a document processing platform for clusters and grids
Symposium on Applied Computing
Proceedings of the 2008 ACM symposium on Applied computing
Fortaleza, Ceara, Brazil
SESSION: Document engineering
Pages 434-441
Year of Publication: 2008
ISBN:978-1-59593-753-7
Authors
Giorgia Mattos, Universidade Federal de Pernambuco, Recife, PE, Brazil
Rafael Dueire Lins, Universidade Federal de Pernambuco, Recife, PE, Brazil
Andrei de Araújo Formiga, Universidade Federal de Pernambuco, Recife, PE, Brazil
Fernando Mário Junqueira Martins, Universidade do Minho, Braga, Portugal
ABSTRACT
BigBatch is an image processing environment designed to process batches of thousands of monochromatic documents. One of its pioneering features is the ability to run in distributed environments such as clusters and grids. This paper presents the BigBatch tool and the results of a comparative analysis of cluster and grid configurations. The results show almost no difference in total execution time, indicating that performance is not a primary criterion for choosing between a cluster and a grid. However, other, qualitative aspects may influence this choice. This paper also considers these aspects and gives a general picture of how to use BigBatch successfully to process document images across many computers.