Framework for mining web content outliers
Source
Symposium on Applied Computing
Proceedings of the 2004 ACM symposium on Applied computing
Nicosia, Cyprus
SESSION: Data mining (DM)
Pages: 590-594
Year of Publication: 2004
ISBN: 1-58113-812-1
Authors
Malik Agyemang, University of Calgary, Calgary, Alberta, Canada
Ken Barker, University of Calgary, Calgary, Alberta, Canada
Reda Alhajj, University of Calgary, Calgary, Alberta, Canada
ABSTRACT
Outliers are data objects with different characteristics compared to other data objects. Exploring the diverse and dynamic web data for outliers is more interesting than finding outliers in numeric data sets. Interestingly, the existing web mining algorithms have concentrated on finding patterns that are frequent while discarding the less frequent ones that are likely to contain the outlying data. This paper refers to outliers present on the web as web outliers to distinguish them from traditional outliers. Web outliers are data objects that show significantly different characteristics from other web data. Although the presence of web outliers appears obvious, there is neither a formal definition of web outliers nor any algorithm for mining them. Moreover, traditional outlier mining algorithms designed solely for numeric data sets are inappropriate for mining web outliers. This paper establishes the presence of web outliers and discusses some practical applications of web outlier mining. Finally, we present a taxonomy for web outliers and propose a general framework for mining web content outliers.