Automated cleansing for spend analytics
Source: Conference on Information and Knowledge Management
Proceedings of the 14th ACM international conference on Information and knowledge management
Bremen, Germany
SESSION: Industry track session
Pages: 437 - 445  
Year of Publication: 2005
ISBN:1-59593-140-6
Authors
Moninder Singh  IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Jayant R. Kalagnanam  IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Sudhir Verma  IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Amit J. Shah  IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Swaroop K. Chalasani  IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1099554.1099682

ABSTRACT

Developing an aggregate view of procurement spend across an enterprise from transactional data is becoming an increasingly important strategic activity. Not only does it provide a complete and accurate picture of what the enterprise is buying and from whom, it also allows the enterprise to consolidate suppliers and negotiate better prices. The importance and complexity of the data cleansing this requires are further magnified by the increasing popularity of Business Transformation Outsourcing (BTO), wherein enterprises turn over non-core activities, such as indirect procurement, to third parties, who then need to develop an integrated view of spend across multiple enterprises in order to optimize procurement and generate maximum savings. However, creating such an integrated view requires building a homogeneous data repository from disparate (heterogeneous) data sources across various geographic and functional organizations throughout the enterprise(s). Such repositories get transactional data from various sources such as invoices, purchase orders, and account ledgers. As a result, the transactions are not cross-indexed, refer to the same suppliers by different names, and represent information about the same commodities in different ways. Before an aggregated spend view can be developed, this data needs to be cleansed, primarily to normalize the supplier names and correctly map each transaction to the appropriate commodity code. Commodity mapping, in particular, is made more difficult by the fact that it has to be done on the basis of unstructured text descriptions found in the various data sources. We describe an on-demand system to automatically perform this cleansing activity using techniques from information retrieval and machine learning.
Built on standard integration and application infrastructure software, this system provides enterprises with a fast, reliable, accurate and on-demand way of cleansing transactional data and generating an integrated view of spend. This system is currently in the process of being deployed by IBM for use in its BTO practice.
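
The two cleansing tasks named in the abstract, supplier name normalization and commodity mapping from free-text descriptions, can be illustrated with a minimal Python sketch. This is not the system described in the paper: it assumes a simple token-set (Jaccard) similarity as a stand-in for the information retrieval and machine learning techniques the authors mention. The suffix list, the threshold, the function names, and the catalog codes such as "C-200" are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "company"}


def tokens(text):
    """Lowercase the text, split on non-alphanumerics, drop legal-form suffixes."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return frozenset(w for w in words if w not in LEGAL_SUFFIXES)


def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def normalize_suppliers(names, threshold=0.5):
    """Greedily map each name to the first earlier name that is similar enough.

    The threshold is an assumed value, not one taken from the paper.
    """
    canonical = []  # (canonical name, its token set), in order of first appearance
    mapping = {}
    for name in names:
        name_tokens = tokens(name)
        for canon_name, canon_tokens in canonical:
            if jaccard(name_tokens, canon_tokens) >= threshold:
                mapping[name] = canon_name
                break
        else:
            # No sufficiently similar canonical name yet: this name becomes one.
            canonical.append((name, name_tokens))
            mapping[name] = name
    return mapping


def map_commodity(description, catalog):
    """Return the catalog code whose description best overlaps the transaction text."""
    desc_tokens = tokens(description)
    return max(catalog, key=lambda code: jaccard(desc_tokens, tokens(catalog[code])))


if __name__ == "__main__":
    suppliers = ["Acme Supply Inc.", "ACME SUPPLY CO", "Globex Corporation"]
    print(normalize_suppliers(suppliers))

    # Hypothetical two-entry commodity catalog; real taxonomies are far larger.
    catalog = {
        "C-100": "office paper and printing supplies",
        "C-200": "laptop and desktop computers",
    }
    print(map_commodity("HP laptop computers 15in", catalog))
```

Pure token overlap ignores misspellings, abbreviations, and synonyms. A practical system would likely combine it with edit-distance measures and a trained text classifier.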


