Data Cleaning in Microsoft SQL Server 2005
Source: International Conference on Management of Data
Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data
Baltimore, Maryland
SESSION: Demonstrations: Group 2
Pages: 918-920
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Surajit Chaudhuri  Microsoft Research, Redmond, WA
Kris Ganjam  Microsoft Research, Redmond, WA
Venky Ganti  Microsoft Research, Redmond, WA
Rahul Kapoor  Microsoft Research, Redmond, WA
Vivek Narasayya  Microsoft Research, Redmond, WA
Theo Vassilakis  Microsoft Research, Redmond, WA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1066157.1066287

ABSTRACT

When data from various sources is collected and combined into a data warehouse, ensuring high data quality and consistency becomes a significant, often expensive, challenge. Common data quality problems include inconsistent data conventions among sources, such as different abbreviations or synonyms; data entry errors, such as spelling mistakes; and missing, incomplete, outdated, or otherwise incorrect attribute values. These defects generally manifest themselves as foreign-key mismatches and approximately duplicate records, both of which make further data mining and decision support analyses either impossible or suspect. We demonstrate two new data cleansing operators, Fuzzy Lookup and Fuzzy Grouping, which address these problems in a scalable and domain-independent manner. These operators are implemented within Microsoft SQL Server 2005 Integration Services. Our demo will explain their functionality and highlight multiple real-world scenarios in which they can be used to achieve high data quality.
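
As a rough illustration of the two kinds of operation named in the abstract, the sketch below matches a dirty value against a reference list (a lookup) and clusters approximately duplicate records (a grouping). It is not the algorithm used in SQL Server Integration Services: the similarity measure (Python's difflib.SequenceMatcher), the 0.8 threshold, the greedy grouping strategy, and the function names are all assumptions chosen for brevity.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]. A stand-in measure, not the SSIS one."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(value, reference, threshold=0.8):
    """Return (best_match, score); best_match is None if no entry reaches the threshold."""
    if not reference:
        return (None, 0.0)
    best = max(reference, key=lambda r: similarity(value, r))
    score = similarity(value, best)
    return (best if score >= threshold else None, score)


def fuzzy_group(records, threshold=0.8):
    """Greedy clustering: a record joins the first group whose first member is similar enough."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            # No existing group matched, so this record starts a new one.
            groups.append([rec])
    return groups


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    print(fuzzy_lookup("Microsft Corp", ["Microsoft Corp", "Oracle Corp"]))
    print(fuzzy_group(["Jon Smith", "John Smith", "Acme Inc", "ACME Inc."]))
```

In this sketch, a lookup that returns None corresponds to a foreign-key mismatch that could not be repaired, and each multi-record group corresponds to a set of approximately duplicate records. The pairwise comparison here is quadratic in the worst case, so it does not reflect the scalability the abstract claims for the actual operators.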


