Mining very long sequences in large databases with PLWAPLong
Source
ACM International Conference Proceeding Series
Proceedings of the 2009 International Database Engineering & Applications Symposium
Cetraro - Calabria, Italy
SESSION: Short papers
Pages 234-241
Year of Publication: 2009
ISBN: 978-1-60558-402-7
Authors
C. I. Ezeife, University of Windsor, Windsor, Ontario
Kashif Saeed, University of Windsor, Windsor, Ontario
Dan Zhang, University of Windsor, Windsor, Ontario
Sponsors
BytePress
Concordia University
ACM
Universita della Calabria, Rende (CS), Italy
ICAR-CNR, Rende (CS), Italy
ACM International Conference Proceeding Series
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1620432.1620457

ABSTRACT

The Position Coded Pre-order Linked Web Access Pattern (PLWAP) mining algorithm is an efficient web sequential pattern mining algorithm that stores the frequent sequences of the entire sequential database in a compressed tree with position-coded nodes. However, for very long sequences exceeding thirty-two nodes (the number of bits an integer position code can hold), PLWAP's performance begins to degrade: it uses linked lists to store conjunctions of long position codes, and traversing these lists slows the algorithm during both tree construction and mining. PLWAP also uses every node in the frequent 1-item event queue to test whether that event belongs in the suffix tree root set during mining.
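
To illustrate why a single binary position code outgrows a 32-bit integer on long sequences, here is a minimal Python sketch. The abstract does not spell out the coding rule, so the rule used here is an assumption drawn from the general description of PLWAP: the k-th child of a node (k = 0 for the leftmost) receives the parent's code followed by a 1 and then k zeros, and a node is an ancestor of another when its code with a 1 appended is a prefix of the other's code. The names `binary_codes`, `binary_is_ancestor`, and `chain` are hypothetical.

```python
from typing import Iterator, List, Tuple

Node = Tuple[str, List["Node"]]  # (event label, list of children)


def binary_codes(node: Node, code: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (label, position code) pairs in pre-order.

    Assumed rule (not stated in the abstract): the k-th child gets the
    parent's code followed by '1' and then k zeros; the root's code is empty.
    """
    label, children = node
    yield label, code
    for k, child in enumerate(children):
        yield from binary_codes(child, code + "1" + "0" * k)


def binary_is_ancestor(code_a: str, code_b: str) -> bool:
    """Assumed test: a is an ancestor of b if a's code plus '1' prefixes b's code."""
    return code_b.startswith(code_a + "1")


def chain(length: int) -> Node:
    """Build a tree holding one sequence e1, e2, ..., e<length> under a root."""
    node: Node = (f"e{length}", [])
    for i in range(length - 1, 0, -1):
        node = (f"e{i}", [node])
    return ("root", [node])


codes = dict(binary_codes(chain(40)))
# The deepest node of a 40-event sequence needs a 40-bit code,
# which no longer fits in one 32-bit integer.
print(len(codes["e40"]))
print(binary_is_ancestor(codes["e1"], codes["e40"]))
```

Under this assumed rule the code length grows by at least one bit per tree level, so any sequence longer than thirty-two events forces the code to spill beyond a single integer.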

This paper proposes (1) a different position code numbering scheme in which each node is assigned two numeric codes (startPosition, endPosition) instead of one, and (2) using pre-knowledge of the "Last Descendant" of each tree branch to lower the cost of creating the suffix tree root sets during mining. Experiments show that the proposed scheme, PLWAPLong, outperforms PLWAP on long sequences and large databases as well as on regular databases.
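
The two-code idea can be sketched with ordinary pre-order interval numbering. This is an assumption about how (startPosition, endPosition) might be assigned, not the paper's definition: startPosition is taken to be the node's pre-order rank and endPosition the rank of its last descendant, so both stay small integers however deep the tree is. The function names are hypothetical, and labels are assumed unique only to keep the example short.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, List["Node"]]  # (event label, list of children)
Interval = Tuple[int, int]       # (startPosition, endPosition)


def interval_codes(root: Node) -> Dict[str, Interval]:
    """Assign (start, end) to every node; end is the start of its last descendant."""
    codes: Dict[str, Interval] = {}
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        label, children = node
        start = counter
        counter += 1
        for child in children:
            visit(child)
        codes[label] = (start, counter - 1)

    visit(root)
    return codes


def interval_is_ancestor(a: Interval, b: Interval) -> bool:
    """a is an ancestor of b when b's start lies inside a's interval."""
    return a[0] < b[0] <= a[1]


def suffix_roots(queue: List[Interval]) -> List[Interval]:
    """Hypothetical root-set filter for one event's nodes, given in pre-order.

    A node is kept only if it starts after the last descendant of the
    previously kept node, so nested occurrences are skipped with one comparison.
    """
    roots: List[Interval] = []
    last_end = -1
    for start, end in queue:
        if start > last_end:
            roots.append((start, end))
            last_end = end
    return roots


tree: Node = ("root", [("a", [("b", []), ("c", [])]), ("d", [])])
codes = interval_codes(tree)
print(codes["a"], interval_is_ancestor(codes["a"], codes["c"]))
print(suffix_roots([(1, 5), (3, 3), (6, 8), (7, 7)]))
```

In this sketch the ancestor test costs two integer comparisons regardless of sequence length, and knowing each node's last descendant lets the root-set filter skip nested occurrences without walking the subtree. The recursive traversal is for brevity only; very deep trees would need an iterative version in Python.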

