Melvin's digital garden

Computational methods for prediction of alternative promoters and TFBS

CREATED: 200811100603
Speaker: Junwen Wang

** Structure of a gene

  • Eukaryotic gene: > 1000bp
  • Core promoter region: ~100bp
  • TFBS: 6-20bp

** Prediction models
  • Content sensor: Markov model
    • captures context information but ignores position specificity
    • suited to long sequences
  • Signal sensor: position weight matrix (PWM)
    • captures position-specific information but ignores context
    • suited to short sequences

** Generalized Markov Model for long sequences
  • instead of using a single nucleotide as the unit, allow di- or trinucleotides
  • allow gaps
  • evaluate on promoter/exon/nucleosome sequences
  • in the gapped model, the best performance is at a gap distance of ~11bp

** Position Specific Propensity Model for medium length sequences
  • propensity of a k-mer using variable length k-mer
  • the 5’ promoter is more likely to be in a CpG island
  • whole genome alternative promoter predictions

** Modeling of short sequences (TFBS)
  • PWM assumes position independence
  • a PSPM or variable-length Markov model (MM) of binding sites is superior to the PWM model
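
To make the position-independence point concrete, here is a minimal Python sketch. It is not the speaker's PSPM or variable-length model: it uses a plain position-specific first-order Markov model as a stand-in, assumes a uniform background of 0.25 per base, and all function names and the toy sites are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background frequency for every base


def build_pwm(sites, pseudocount=1.0):
    """Per-position log-odds scores from aligned, equal-length binding sites."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / BACKGROUND)
            for base in ALPHABET
        })
    return pwm


def score_pwm(pwm, seq):
    """Sum of independent per-position terms: no position sees its neighbours."""
    return sum(column[base] for column, base in zip(pwm, seq))


def build_first_order(sites, pseudocount=1.0):
    """Position-specific first-order model: P(x_0) and P(x_i | x_{i-1})."""
    n_bases = len(ALPHABET)
    start_counts = Counter(site[0] for site in sites)
    start = {
        base: (start_counts[base] + pseudocount) / (len(sites) + pseudocount * n_bases)
        for base in ALPHABET
    }
    transitions = []
    for pos in range(1, len(sites[0])):
        pair_counts = Counter((site[pos - 1], site[pos]) for site in sites)
        prev_counts = Counter(site[pos - 1] for site in sites)
        transitions.append({
            (prev, cur): (pair_counts[(prev, cur)] + pseudocount)
            / (prev_counts[prev] + pseudocount * n_bases)
            for prev in ALPHABET
            for cur in ALPHABET
        })
    return start, transitions


def score_first_order(model, seq):
    """Log-odds score in which each position is conditioned on the previous base."""
    start, transitions = model
    score = math.log2(start[seq[0]] / BACKGROUND)
    for pos in range(1, len(seq)):
        prob = transitions[pos - 1][(seq[pos - 1], seq[pos])]
        score += math.log2(prob / BACKGROUND)
    return score


# Toy sites (made up for illustration) whose two positions are fully correlated.
toy_sites = ["AT"] * 10 + ["CG"] * 10
toy_pwm = build_pwm(toy_sites)
toy_mm = build_first_order(toy_sites)
for candidate in ("AT", "AG"):
    print(candidate, score_pwm(toy_pwm, candidate), score_first_order(toy_mm, candidate))
```

With these toy sites the PWM gives "AT" and "AG" the same score, because each column is scored on its own, while the first-order model scores the observed pair "AT" above the unobserved "AG".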
