Melvin's digital garden

Analyzing stacks of images using copula concept

CREATED: 200906120348 Speaker: Andrej Borissenko, PICB ** MELK data

  • using protein specific antibodies to provide information on spatial distribution of proteins
  • for each protein, there is a flourescent image ** Problems
  • cell detection
  • classification
  • topological motifs (prescence/abscence of protein) ** Copulas
  • function which combines two marginal distribution into a joint distribution
  • H(x,y) = C(F(x), G(y))
  • lower bound: C(u,v) = max(u+v-1, 0)
  • upper bound: C(u,v) = min(u,v)
  • independence: C(u,v) = uv
  • can be estimated from given data easily by counting
  • comparing given data against independence allows one to visualize how two proteins are dependent ** Conventional dependence measures
  • Pearson’s correlation (depends on copula and marginals)
  • Kendall’s tau (only depends on copula)
  • Spearman’s rho (only depends on copula)
  • Pearson’s correlation sensitive to outliers
  • single value does not give enough information about dependence between two RV ** Topological motifs (how to select threshold for prescence/abscence of protein)
  • idea is to maximize multi-information, which is essentially the KL divergence of independence from estimated distribution
  • can be shown that in the limit, as the number of thesholds increases, this converged to mutual information of two random variables, I(X,Y)
  • MI only depends on copula
  • $MI_{coarse} \le MI_{fine} \le I(X,Y)$
  • so this approach is to find the best estimate of I(X,Y) ** Clustering in one dimention
  • equivalent to fitting mixture models to histograms
  • using mixture of gamma distribution
  • actual model is linear combination of gamma
  • extension to higher dimensions using copula

Links to this note