Hidden common cause relations in relational learning
CREATED: 200807160610 speaker: Ricardo Silva
** Modeling link information
- Using link information in machine learning
- sources of link information: relational DBs, web links, associations derived from raw data
- classical classifier: $\theta$ is a deterministic function, $\epsilon$ captures unobserved features
- including a hidden common cause: $\zeta$ captures features that share a common cause $H$
- inducing associations by conditioning on a common effect (“explaining away”): earthquake -> alarm <- burglary; worked example below
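A quick worked example of explaining away (a minimal Python sketch; the CPT numbers are invented for illustration, not from the talk). Burglary and earthquake are marginally independent, but conditioning on the alarm makes them dependent:

```python
# earthquake -> alarm <- burglary, with made-up probabilities
p_b, p_e = 0.01, 0.02                   # P(burglary = 1), P(earthquake = 1)
p_a = {(0, 0): 0.001, (0, 1): 0.3,      # P(alarm = 1 | burglary, earthquake)
       (1, 0): 0.9,   (1, 1): 0.95}

# joint P(burglary, earthquake, alarm = 1) over the four parent configurations
joint = {(b, e): (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e) * p_a[b, e]
         for b in (0, 1) for e in (0, 1)}

p_b_given_a = (joint[1, 0] + joint[1, 1]) / sum(joint.values())
p_b_given_ae = joint[1, 1] / (joint[0, 1] + joint[1, 1])
print(p_b_given_a)   # ~0.57: an alarm makes a burglary plausible
print(p_b_given_ae)  # ~0.03: also observing the earthquake explains it away
```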
- Directed mixed graph notation: edges are derived from the given relations; information is propagated through known nodes
- nonparametric Probit regression (sketch below) ** $P(Y_i = 1 | X_i) = P(Y^*(X_i) > 0)$ ** $Y^*(X_i) = \theta(X_i) + \epsilon_i$, $\epsilon_i \sim N(0,1)$, and the prior on $\theta$ is a Gaussian process
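Since $\epsilon_i \sim N(0,1)$, a draw of $\theta$ gives $P(Y_i = 1 | X_i, \theta) = \Phi(\theta(X_i))$. A minimal sketch of this generative view in Python (the RBF kernel, the data, and all constants are my own illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel; an assumed choice of GP prior covariance."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                      # 5 inputs, 2 features each
K = rbf_kernel(X) + 1e-8 * np.eye(5)             # prior covariance of theta(X)
theta = rng.multivariate_normal(np.zeros(5), K)  # one prior draw of theta
p_y1 = norm.cdf(theta)   # P(Y_i = 1 | X_i, theta) = P(theta(X_i) + eps_i > 0)
```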
- Dependent Probit model: $\epsilon_i = \epsilon_i^* + \zeta_i$, where the $\epsilon_i^*$ are independent Gaussians and the $\zeta_i$ are dependent Gaussians
- Define $g(x_i) = \theta(x_i) + \zeta_i$ to obtain a factorized model (sketch below) ** How to define the covariance matrix $\Sigma_\zeta$ of $\zeta$?
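Continuing the sketch above (reusing `K`, `rng`, and `norm`), the factorization in code form: $g$ has covariance $K + \Sigma_\zeta$, and given $g$ the labels decouple, with $P(Y_i = 1 | g_i) = \Phi(g_i)$ when the residual $\epsilon_i^*$ is taken as $N(0,1)$. The $\Sigma_\zeta$ here is a hard-coded placeholder; the two approaches below build it from the links:

```python
Sigma_zeta = np.eye(5)                      # Var(zeta_i) = 1, placeholder
Sigma_zeta[0, 1] = Sigma_zeta[1, 0] = 0.6   # pretend items 0 and 1 are linked
g = rng.multivariate_normal(np.zeros(5), K + Sigma_zeta)  # g = theta + zeta
p_y1 = norm.cdf(g)    # factorized likelihood: P(Y_i = 1 | g_i) = Phi(g_i)
```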
- Approach 1 - using cliques (sketch below) ** $\mathrm{Cov}(\zeta_i, \zeta_j) = \#\text{cliques}(i,j) / \sqrt{\#\text{cliques}(i) \times \#\text{cliques}(j)}$ ** Triangulate the graph and extract its cliques ** Triangulation may lead to a “blow up” effect
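A sketch of Approach 1 using networkx, assuming the input graph is already triangulated (so the maximal cliques found below stand in for the extracted cliques; the triangulation step itself is skipped here):

```python
import numpy as np
import networkx as nx

def clique_covariance(G):
    """Cov(zeta_i, zeta_j) = #cliques(i,j) / sqrt(#cliques(i) * #cliques(j))."""
    nodes = sorted(G.nodes())
    idx = {v: k for k, v in enumerate(nodes)}
    counts = np.zeros((len(nodes), len(nodes)))
    for clique in nx.find_cliques(G):        # maximal cliques of G
        for i in clique:
            for j in clique:
                counts[idx[i], idx[j]] += 1  # diagonal accumulates #cliques(i)
    d = np.sqrt(np.diag(counts))
    return counts / np.outer(d, d)           # unit diagonal by construction

# chordal toy graph: triangle {0,1,2} plus the edge 2-3
Sigma_zeta = clique_covariance(nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)]))
```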
- Approach 2 - a hidden parent for each pair of linked nodes (sketch below) ** $\mathrm{Cov}(\zeta_i, \zeta_j) = 1 / \sqrt{\#\text{neigh}(i) \times \#\text{neigh}(j)}$ ** “pulverization” effect ** Other methods
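A matching sketch of Approach 2 (same imports and toy graph as above): each $\zeta_i$ keeps unit variance split evenly across its links, so covariances involving high-degree nodes shrink toward zero, which is the “pulverization” effect:

```python
def pairwise_parent_covariance(G):
    """Cov(zeta_i, zeta_j) = 1 / sqrt(#neigh(i) * #neigh(j)) for linked i, j."""
    nodes = sorted(G.nodes())
    idx = {v: k for k, v in enumerate(nodes)}
    cov = np.eye(len(nodes))                 # Var(zeta_i) = 1
    for i, j in G.edges():
        c = 1.0 / np.sqrt(G.degree(i) * G.degree(j))
        cov[idx[i], idx[j]] = cov[idx[j], idx[i]] = c
    return cov

Sigma_zeta = pairwise_parent_covariance(nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)]))
```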
- by contrast, Markov random fields propagate information through unmeasured points