Data cleaning in Microsoft SQL Server 2005
Source
International Conference on Management of Data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Baltimore, Maryland
SESSION: Demonstrations: Group 2
Pages: 918-920
Year of Publication: 2005
ISBN: 1-59593-060-4
Authors
Surajit Chaudhuri, Microsoft Research, Redmond, WA
Kris Ganjam, Microsoft Research, Redmond, WA
Venky Ganti, Microsoft Research, Redmond, WA
Rahul Kapoor, Microsoft Research, Redmond, WA
Vivek Narasayya, Microsoft Research, Redmond, WA
Theo Vassilakis, Microsoft Research, Redmond, WA
ABSTRACT
When collecting and combining data from various sources into a data warehouse, ensuring high data quality and consistency becomes a significant, often expensive, challenge. Common data quality problems include inconsistent conventions across sources, such as different abbreviations or synonyms; data entry errors, such as spelling mistakes; and missing, incomplete, outdated, or otherwise incorrect attribute values. These data defects generally manifest themselves as foreign-key mismatches and approximately duplicate records, both of which make further data mining and decision support analyses either impossible or suspect. We demonstrate two new data cleansing operators, Fuzzy Lookup and Fuzzy Grouping, which address these problems in a scalable and domain-independent manner. These operators are implemented within Microsoft SQL Server 2005 Integration Services. Our demo will explain their functionality and highlight multiple real-world scenarios in which they can be used to achieve high data quality.
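To make the two problems concrete, the following is a minimal Python sketch of what a fuzzy lookup (resolving a foreign-key mismatch against a reference table) and a fuzzy grouping (clustering approximately duplicate records) might look like. It is an illustration only: the function names, the 0.8 threshold, the greedy grouping strategy, and the use of the standard library's difflib ratio as the similarity measure are all assumptions made here, and none of it reflects the similarity function or indexing actually used by the operators in SQL Server 2005 Integration Services.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1].

    Uses difflib's ratio as an illustrative stand-in; this is an assumption,
    not the measure used by the SQL Server operators.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(value, reference, threshold=0.8):
    """Return (best_match, score), or (None, score) if the best score is below threshold."""
    best = max(reference, key=lambda r: similarity(value, r))
    score = similarity(value, best)
    return (best, score) if score >= threshold else (None, score)


def fuzzy_group(records, threshold=0.8):
    """Greedy grouping: a record joins the first group whose first member is similar enough."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([rec])
    return groups


if __name__ == "__main__":
    reference_table = ["Microsoft Corp", "Oracle Corp", "IBM Corp"]
    print(fuzzy_lookup("Microsft Corp", reference_table))
    print(fuzzy_group(["Microsoft Corp", "Microsft Corp", "Oracle Corp", "Oracle Corp."]))
```

This sketch compares every input against every reference entry, so it does not scale; the scalability claimed in the abstract would require an indexing scheme that is not shown here.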