Automated cleansing for spend analytics
Source
Conference on Information and Knowledge Management
Proceedings of the 14th ACM international conference on Information and knowledge management
Bremen, Germany
SESSION: Industry track session
Pages: 437-445
Year of Publication: 2005
ISBN: 1-59593-140-6
Authors
Moninder Singh, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Jayant R. Kalagnanam, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Sudhir Verma, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Amit J. Shah, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Swaroop K. Chalasani, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
ABSTRACT
The development of an aggregate view of the procurement spend across an enterprise using transactional data is becoming an increasingly important strategic activity. Not only does it provide a complete and accurate picture of what the enterprise is buying and from whom, it also allows the enterprise to consolidate suppliers and negotiate better prices. The importance, as well as the complexity, of this exercise is further magnified by the increasing popularity of Business Transformation Outsourcing (BTO), in which enterprises turn over non-core activities, such as indirect procurement, to third parties, who now need to develop an integrated view of spend across multiple enterprises in order to optimize procurement and generate maximum savings. However, the creation of such an integrated view of procurement spend requires the creation of a homogeneous data repository from disparate (heterogeneous) data sources across various geographic and functional organizations throughout the enterprise(s). Such repositories get transactional data from various sources such as invoices, purchase orders, and account ledgers. As a result, the transactions are not cross-indexed, refer to the same suppliers by different names, and use different ways of representing information about the same commodities. Before an aggregated spend view can be developed, this data needs to be cleansed, primarily to normalize the supplier names and correctly map each transaction to the appropriate commodity code. Commodity mapping, in particular, is made more difficult by the fact that it has to be done on the basis of unstructured text descriptions found in the various data sources. We describe an on-demand system to automatically perform this cleansing activity using techniques from information retrieval and machine learning.
Built on standard integration and application infrastructure software, this system provides enterprises with a fast, reliable, accurate and on-demand way of cleansing transactional data and generating an integrated view of spend. This system is currently in the process of being deployed by IBM for use in its BTO practice.
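The abstract does not say which algorithms the system uses, so the following is only an illustrative sketch of the supplier-name normalization step it describes, not the authors' method. It assumes a token-set Jaccard similarity, a small hypothetical list of legal-form suffixes (`LEGAL_SUFFIXES`), and an arbitrary threshold of 0.5; the function names are likewise invented for illustration.

```python
import re

# Hypothetical suffix list; a production list would be far longer.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "co", "company", "ltd", "limited", "llc"}


def tokens(name):
    """Lowercase the name, split on non-alphanumerics, drop legal suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return frozenset(w for w in words if w not in LEGAL_SUFFIXES)


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def normalize_suppliers(names, threshold=0.5):
    """Greedily map each name to the first earlier name it resembles.

    The threshold is an assumed value chosen for this example only.
    """
    canonical = []  # (canonical name, its token set), in order of first sight
    mapping = {}
    for name in names:
        name_tokens = tokens(name)
        for canon_name, canon_tokens in canonical:
            if jaccard(name_tokens, canon_tokens) >= threshold:
                mapping[name] = canon_name
                break
        else:
            # No sufficiently similar canonical name: start a new group.
            canonical.append((name, name_tokens))
            mapping[name] = name
    return mapping


if __name__ == "__main__":
    raw = ["Acme Supplies Inc.", "ACME SUPPLIES, LLC",
           "Acme Tools Ltd", "Globex Corp."]
    for original, canon in normalize_suppliers(raw).items():
        print(f"{original!r} -> {canon!r}")
```

Under these assumptions, "ACME SUPPLIES, LLC" is grouped with "Acme Supplies Inc." while "Acme Tools Ltd" stays separate. Token-level matching of this kind does not handle abbreviations or misspellings (for example, "I.B.M." versus "IBM"), which is one reason a real system would likely combine several similarity measures.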