ACM Home Page
Please provide us with feedback. Feedback
ANF: a fast and scalable tool for data mining in massive graphs
Full text PdfPdf (1.07 MB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Edmonton, Alberta, Canada
SESSION: Graphs and trees table of contents
Pages: 81 - 90  
Year of Publication: 2002
ISBN:1-58113-567-X
Authors
Christopher R. Palmer  Carnegie Mellon University, Pittsburgh, PA
Phillip B. Gibbons  Intel Research Pittsburgh, Pittsburgh, PA
Christos Faloutsos  Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
: AAAI
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19,   Downloads (12 Months): 90,   Citation Count: 37
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/775047.775059
What is a DOI?

ABSTRACT

Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
 
5
 
6
CORA search engine. http://www.cora.whizbang.com.
7
8
 
9
 
10
IMDB. http://www.imdb.com.
 
11
 
12
13
 
14
15
 
16
C. R. Palmer, G. Siganos, M. Faloutsos, C. Faloutsos, and P. Gibbons. The connectivity and fault-tolerance of the Internet topology. In Workshop on Network-Related Data Management (NRDM-2001), 2001.
 
17
C. R. Palmer and J. G. Steffan. Generating network toplogies that obey power laws. In IEEE Globecom 2000, 2000.
 
18
 
19
 
20
S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos. A simple conceptual model for the Internet topology. In IEEE Globecom 2001, 2001.

CITED BY  37

Collaborative Colleagues:
Christopher R. Palmer: colleagues
Phillip B. Gibbons: colleagues
Christos Faloutsos: colleagues