Melvin's digital garden

Hidden common cause relations in relational learning

CREATED: 200807160610 speaker: Ricardo Silva

** Modeling link information

  • Using link information in machine learning
  • link information, from relational dbs, from web links, derived associations from raw data
  • classical classifier, $\theta$ - deterministic function, $\epsilon$ - unobserved features
  • including hidden common cause, $\zeta$ - features with common cause H
  • inducing associations by conditioning, earthquake -> alarm <- burglary
  • Directed mixed graph notation, edges derived from given relations, information propagated through known nodes
  • nonparametric Probit regression ** $P(Y_i = 1 | X_i) = P(Y*(X_i) > 0)$ ** $Y*(X_i) = \theta(X_i) + \epsilon_i, \epsilon_i = N(0,1)$ and prior of $\theta$ is also a Gaussian
  • Dependent Probit model, $\epsilon_i = \epsilon_i* + \zeta_i$ where $\epsilon_i*$ are independent Gaussian and $\zeta_i$ are dependent Gaussian.
  • Define $g(x_i) = \theta(x_i) + \zeta$ to obtain a factorized model ** How to define the covariance matrix, $\sum_\zeta$, of $\zeta$?
  • Approach 1 - Using cliques ** $Cov(\zeta_i, \zeta_j) = #cliques(i,j) / \sqrt(#cliques(i) \times #cliques(j))$ ** Triangulate and extract cliques ** Triangulation may lead to “blow up” effect
  • Approach 2 - hidden parent for each pair of nodes ** $Cov(\zeta_i, \zeta_j) = 1 / \sqrt(#neigh(i) \times #neigh(j)$ ** “pulverization” effect ** Other methods
  • Markov random fields, propagates information through unmeasured points

Links to this note