Melvin's digital garden

Statistical Analysis of high-dimensional molecular data

CREATED: 200906250534
Speaker: William R. Atchley
Web: http://www4.ncsu.edu/~lgmcferr/ICMSBWorkshop/Home.html

** Multivariate statistical procedures

  • Cluster analysis
  • PCA, ICA
  • Factor analysis
  • Latent variable analysis
  • main objective is to reduce dimensionality in a meaningful way

** Cluster Analysis
  • ubiquitous in microarray studies
  • not much supporting statistical theory

** PCA
  • dimension reduction: each successive component captures as much of the remaining variance as possible
  • ordination
  • with noisy data, principal components may reflect stochastic effects and environmental noise

** Latent variable model
  • relates a set of observed (manifest) variables to a set of unobserved (latent) variables
  • latent variables often reflect the phenomena of interest
  • separates variability into a shared component (communality) and a unique component
  • latent variables can be estimated by factor analysis
  • exploratory factor analysis: detect natural groupings of variables (factors) based on shared variability
  • e.g. latent structure of amino acid (AA) physicochemical variation (Atchley2005, PNAS)

** Utility of Factor Scores
  • each amino acid can be represented by its factor scores on the latent variables, which together summarize almost 500 physicochemical attributes
  • a new way of aligning protein sequences based on factor scores
  • factor scores as data in discriminant analysis (Atchley2006)
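The latent-variable and factor-score ideas above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the procedure used in Atchley2005: it simulates a hypothetical data set (12 observed variables driven by 2 latent factors plus unique noise), estimates loadings by iterated principal-axis factoring, and computes regression (Thurstone) factor scores. The function names principal_axis_factoring and regression_factor_scores are illustrative only. The contrast with PCA is in the diagonal of the correlation matrix: PCA eigendecomposes it with 1s on the diagonal (total variance), whereas principal-axis factoring replaces the diagonal with communality estimates, so only the shared variance is factored and the rest is treated as unique.

```python
import numpy as np


def principal_axis_factoring(X, n_factors, n_iter=200, tol=1e-6):
    """Iterated principal-axis factor analysis on the correlation matrix of X.

    Illustrative helper (hypothetical name). Returns (loadings, communalities).
    Unlike PCA, the diagonal of R is replaced by communality estimates, so only
    shared variance is factored; the remainder is treated as unique variance.
    """
    R = np.corrcoef(X, rowvar=False)
    # Initial communalities: squared multiple correlation of each variable
    # with all the others, 1 - 1/diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(R_reduced)  # ascending order
        order = np.argsort(eigvals)[::-1][:n_factors]  # largest n_factors
        vals = np.clip(eigvals[order], 0.0, None)
        loadings = eigvecs[:, order] * np.sqrt(vals)
        new_h2 = np.sum(loadings**2, axis=1)  # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


def regression_factor_scores(X, loadings):
    """Regression (Thurstone) factor scores Z R^-1 L for standardized X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    return Z @ np.linalg.solve(R, loadings)


# Hypothetical simulated data: variables 0-5 load 0.8 on latent factor 1,
# variables 6-11 load 0.8 on latent factor 2; unique noise has variance 0.36.
rng = np.random.default_rng(0)
n, p, n_factors = 500, 12, 2
true_loadings = np.zeros((p, n_factors))
true_loadings[:6, 0] = 0.8
true_loadings[6:, 1] = 0.8
F = rng.standard_normal((n, n_factors))      # latent variables
E = rng.normal(scale=0.6, size=(n, p))       # unique (non-shared) variability
X = F @ true_loadings.T + E                  # observed (manifest) variables

loadings, communalities = principal_axis_factoring(X, n_factors)
scores = regression_factor_scores(X, loadings)  # one row per observation
# True communality of every variable in this simulation is 0.8**2 = 0.64.
print(np.round(communalities, 2))
```

The scores matrix (one row per observation, one column per latent variable) is the kind of compact representation the note describes feeding into sequence alignment or discriminant analysis. Factors are identified only up to rotation, so in practice a rotation such as varimax is usually applied before the loadings are interpreted.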