BigBatch: a document processing platform for clusters and grids
Source: Symposium on Applied Computing
Proceedings of the 2008 ACM Symposium on Applied Computing
Fortaleza, Ceara, Brazil
SESSION: Document engineering
Pages 434-441  
Year of Publication: 2008
ISBN: 978-1-59593-753-7
Authors
Giorgia Mattos  Universidade Federal de Pernambuco, Recife, PE, Brazil
Rafael Dueire Lins  Universidade Federal de Pernambuco, Recife, PE, Brazil
Andrei de Araújo Formiga  Universidade Federal de Pernambuco, Recife, PE, Brazil
Fernando Mário Junqueira Martins  Universidade do Minho, Braga, Portugal
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1363686.1363792

ABSTRACT

BigBatch is an image processing environment designed to process batches of thousands of monochromatic documents. One of its flexible and pioneering features is the ability to run in distributed environments such as clusters and grids. This paper presents the BigBatch tool and the results of a comparative analysis between cluster and grid configurations. The results show almost no difference in total execution time, indicating that performance is not a primary criterion for choosing between a cluster and a grid. However, other, qualitative aspects may affect this choice. This paper also considers these aspects and gives a general picture of how to use BigBatch successfully to process document images on many computers.

