### Melvin's digital garden

CREATED: 200701010947 Author: Dongwon Lee Website: http://pike.psu.edu

• entity with group of elements as information ** author->citations ** images->sub images *** Methods
• Jaccard similarity = $(g_1 \cap g_2)/(g_1 \cup g_2)$
• bipartite matching *** Idea
• similar ~ many matching elements, few matching byt strong similarity
• define $BM(g_1, g_2) = (\sum_{(r_1,r_2) \in M} sim(r_1,r_2)) / (m_1 + m_2 - |M|)$ ** M = maximum bipartite matching ** $sim(r1,r2) \ge \rho$ ** similar if $BM \ge \theta$ *** Improvement
• computing M is expensive, $O(n^3)$
• use greedy matching, relax condition that each element can only be be in one matching, just match each element to closest match
• compute upper bound using $g_1 \cup g_2$
• compute lower bound using $g_1 \cap g_2$
• use upper bound and lower bound to quickly decide if BM is similar
• idea of filtering, compute some bounds quickly to avoid expensive computation

** Adaptive SNM - sorted neighbourhood method *** Method