Pattern discovery from graph structured data

CREATED: 200801220758 Speaker: Hiroshi Motoda (AFRL)

** Background

heuristic: frequenting occurring events are worth paying attention to
regularity
concept: something which makes inference more efficient
inference load = graph size + complexity of concept - #concepts
repeatedly find frequent subgraphs which minimize inference load, get hierarichal representation
graphs ** chemical compounds ** social networks ** web ** abstract concept ** important substructures
approaches ** Apriori based (AGM) ** Graph based induction (GBI) ** Apriori based graph mining
how to identify isomorphic candidates
idea: define a canonical form of a graph as the matrix with minimum code
canonical form(X) = canonical form(Y) => X and Y are isomorphic
join using adj matrix of k freq subgraphs with common k-1 subgraph to get k+1 candidate graph, need to check that all (k+1)Ck subgraphs are freq
Application ** analysis of mutagenicity of amino acid compounds ** HIV compounds (min support for active compounds, max support for inactive compounds) ** consumer behavior as directed graph *** Extensions
taxonomy of labels ** extract least general subgraph of same support ** application: predictive toxicology evaluation data
3D information ** application: dopamine compounds ** Graph based induction
pairwise chunking (contract freq pair into a single node)
problem: overlapping patterns cannot be found
extensions ** use beam search ** pseudo chunking (graph size remains the same, still remember structure in chunk) ** find acyclic subgraph/path/tree ** inpattern/outpattern (domain knowledge) ** Pruning
frequency is monotonic
X^2/information gain is not monotonic but convex, can help to compute upper bound
find simple patterns first, best path -> best acyclic graph -> best subgraph
best = max information gain ** Classification
use subgraphs to construct decision tree for classification ** Challenges
efficient graph mining
produce human understandable results

Melvin's digital garden

Pattern discovery from graph structured data

Links to this note