|
ABSTRACT
We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a "star" query would be, "find a CEO who has strong interactions with a Manager, a Lawyer,and an Accountant, or another structure as close to that as possible". Similarly, a "loop" query could help spot a money laundering ring. Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. This is the first main feature of our method. It can find exact-, as well as near-matches, and it will present them to the user in our proposed "goodness" order. For example, our method tolerates indirect paths between, say, the "CEO" and the "Accountant" of the above sample query, when direct paths don't exist. Its second feature is scalability. In general, if the query has nq nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n n q), which is prohibitive. Our G-Ray ("Graph X-Ray") method finds high-quality subgraphs in time linear on the size of the data graph. Experimental results on the DLBP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average fora 4-node query on the DBLP graph.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
B. Aleman-Meza, C. Halaschek-Wiener, S. Sahoo, A. Sheth, and I. Arpinar. Lecture Notes in Computer Science, volume 3495, chapter Template Based Semantic Similarity for Security Applications, pages 621-622. Springer, 2005.
|
| |
2
|
|
| |
3
|
H. Blau, N. Immerman, and D. Jensen. A visual language for querying and updating graphs. Technical Report 2002-037, Department of Computer Science, University of Massacheusetts, Amherst, 2002.
|
| |
4
|
|
 |
5
|
|
| |
6
|
D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research (JAIR), 1:231--255, 1994.
|
 |
7
|
|
| |
8
|
D. Fogaras and B. Racz. Towards scaling fully personalized pagerank. In Proc. WAW, pages 105--117, 2004.
|
| |
9
|
B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. In AAAI FS '06: Papers from the 2006 AAAI Fall Symposium on Capturing and Using Patterns for Evidence Detection, pages 45--53, 2006.
|
| |
10
|
|
 |
11
|
R. Jin , C. Wang , D. Polshakov , S. Parthasarathy , G. Agrawal, Discovering frequent topological structures from graph datasets, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
[doi> 10.1145/1081870.1081944]
|
| |
12
|
S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing pagerank. In Stanford University Technical Report, 2003.
|
 |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
Jia-Yu Pan , Hyung-Jeong Yang , Christos Faloutsos , Pinar Duygulu, Automatic multimedia cross-modal correlation discovery, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014135]
|
 |
17
|
|
| |
18
|
L. Shapiro and R. Haralick. Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3:504--519, 1981.
|
 |
19
|
|
| |
20
|
|
| |
21
|
W.-H. Tsai and K.-S. Fu. Error-correcting isomorphisms of attributed relational graphs for pattern analysis. IEEE Transactions on Systems, Man and Cybernetics, 9:757--768, 1979.
|
| |
22
|
M. Wolverton, P. Berry, I. W. Harrison, J. D. Lowrance, D. Morley, A. C. Rodriguez, E. H. Ruspini, and J. Thoméré. LAW: A workbench for approximate pattern matching in relational data. In IAAI '03: Proceedings of the Fifteenth Conference on Innovative Applications of Artificial Intelligence, pages 143--150, 2003.
|
| |
23
|
|
 |
24
|
|
CITED BY 2
|
|
Hanghang Tong , Yasushi Sakurai , Tina Eliassi-Rad , Christos Faloutsos, Fast mining of complex time-stamped events, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|