Statistical Analysis of high-dimensional molecular data
CREATED: 200906250534
Speaker: William R. Atchley
Web: http://www4.ncsu.edu/~lgmcferr/ICMSBWorkshop/Home.html
** Multivariate statistical procedures
- Cluster analysis
- PCA, ICA
- Factor analysis
- Latent variable analysis
- main objective is to reduce dimensionality in a meaningful way
** Cluster Analysis
- ubiquitous in microarray studies
- not much supporting statistical theory
** PCA
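A rough sketch of the PCA idea covered in this section: find the direction that captures the most variance, then project (ordinate) the samples onto it. Everything here is an assumption made for illustration: the synthetic three-variable data, the helper names, and the use of power iteration in place of a full eigendecomposition. It does not address the noise problem noted for PCA.

```python
import random

def center(rows):
    """Subtract each column mean so the data are centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (p x p) of already-centered rows."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(cov, iterations=500):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    p = len(cov)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the variance along v (v has unit length).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

# Synthetic data (hypothetical): one hidden signal drives variables 0 and 1,
# variable 2 is pure noise.
rng = random.Random(0)
data = []
for _ in range(200):
    s = rng.gauss(0, 1)
    data.append([s + 0.1 * rng.gauss(0, 1),
                 2 * s + 0.1 * rng.gauss(0, 1),
                 0.1 * rng.gauss(0, 1)])

centered = center(data)
cov = covariance(centered)
variance_pc1, pc1 = leading_component(cov)

# Fraction of total variance explained by the first principal component.
explained = variance_pc1 / sum(cov[i][i] for i in range(len(cov)))

# Ordination: coordinate of each sample along the first component.
scores = [sum(x * c for x, c in zip(row, pc1)) for row in centered]
```

Because the data were built from a single signal, `pc1` should point roughly along (1, 2, 0) and `explained` should be close to 1; with real expression data the split between signal and noise is not known in advance.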
- dimension reduction by maximizing the variance explained by each successive component
- ordination
- with noisy data, principal components may reflect stochastic effects and environmental noise
** Latent variable model
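In standard textbook notation (an assumption here; the talk's own notation is not recorded in these notes), the orthogonal factor model behind this section can be written as follows, with x the p observed variables, f the k < p latent variables, \Lambda the matrix of loadings \lambda_{ij}, and \varepsilon the part unique to each observed variable:

\[
x = \mu + \Lambda f + \varepsilon, \qquad
\operatorname{Cov}(f) = I, \quad
\operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}, \quad
\operatorname{Cov}(f, \varepsilon) = 0
\]

\[
\operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi, \qquad
\operatorname{Var}(x_i) = \underbrace{\sum_{j=1}^{k} \lambda_{ij}^{2}}_{\text{shared (communality)}} + \underbrace{\psi_i}_{\text{unique}}
\]

The second line is the split into shared and unique variability: the variance of each observed variable is the sum of a part explained by the common factors and a part specific to that variable.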
- relates a set of observed or manifest variables to a set of unobserved or latent variables
- latent variables often reflect the phenomenon of interest
- separates variability that is shared (communality) from variability that is unique
- latent variables are estimated by factor analysis
- exploratory factor analysis: detect natural groupings of variables (factors) based on shared variability
- e.g. Latent structure of AA physiochemical variation (Atchley2005, PNAS)
** Utility of Factor Scores
- each amino acid can be represented by a factor score on each latent variable; together the scores summarize almost 500 physiochemical attributes
- new way of aligning protein sequences based on factor scores
- factor scores as data in discriminant analysis (Atchley2006)
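A minimal sketch of how factor scores turn a protein sequence into numeric data, as the points above describe. The score table below is a PLACEHOLDER filled with seeded random numbers, not the published values; `N_FACTORS`, `encode`, and `aligned_distance` are hypothetical names chosen for illustration. The distance function is not the alignment method from the talk; it only shows the kind of numeric input that an alignment or discriminant analysis could work with.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_FACTORS = 5  # illustrative choice, not taken from these notes

# PLACEHOLDER table: random numbers standing in for real factor scores.
# Replace with published scores before drawing any biological conclusion.
_rng = random.Random(0)
FACTOR_SCORES = {aa: [_rng.gauss(0, 1) for _ in range(N_FACTORS)]
                 for aa in AMINO_ACIDS}

def encode(sequence):
    """Replace each residue by its vector of factor scores."""
    return [FACTOR_SCORES[aa] for aa in sequence]

def residue_distance(a, b):
    """Euclidean distance between two residues in factor-score space."""
    return sum((x - y) ** 2
               for x, y in zip(FACTOR_SCORES[a], FACTOR_SCORES[b])) ** 0.5

def aligned_distance(seq1, seq2):
    """Mean per-position distance between two gap-free, pre-aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(residue_distance(a, b) for a, b in zip(seq1, seq2)) / len(seq1)

encoded = encode("ACDW")                       # 4 residues x N_FACTORS scores
same = aligned_distance("ACDW", "ACDW")        # identical sequences
different = aligned_distance("ACDW", "ACDY")   # differ at the last position
```

Each row of `encoded` is one residue and each column one latent variable, so a set of aligned sequences becomes an ordinary numeric matrix that standard multivariate methods can take as input.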