Melvin's digital garden

Motif finding

CREATED: 200701021353

** Motivation

  • orthologous promoter regions

** Challenges

  • transcription start site vs translation start site

** Methods

  • pattern driven - search 4^l patterns
  • sample driven
  • extended sample driven approach
      • check the k-neighbours of the l-mers in the sample
      • number of patterns for each l-mer = (l choose k) 3^k
  • pattern branching approach
      • best neighbours
  • PWM driven
      • partition the space of motifs
  • multiple alignment
  • cluster motifs + TRANSFAC
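
The neighbour count used by the extended sample driven approach can be checked with a small sketch. It assumes a DNA alphabet of A, C, G, T and reads "k neighbour" as strings at Hamming distance exactly k from the l-mer. The function name and the example l-mer are made up for illustration.

```python
from itertools import combinations, product
from math import comb

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def neighbours_at_distance(lmer, k):
    """Yield every string at Hamming distance exactly k from lmer."""
    for positions in combinations(range(len(lmer)), k):
        # At each chosen position there are 3 letters that differ from the original.
        choices = [[c for c in ALPHABET if c != lmer[p]] for p in positions]
        for subs in product(*choices):
            s = list(lmer)
            for p, c in zip(positions, subs):
                s[p] = c
            yield "".join(s)


lmer = "ACGTAC"  # hypothetical l-mer, l = 6
k = 2
found = set(neighbours_at_distance(lmer, k))

# Enumerated count next to the closed form (l choose k) * 3^k
print(len(found), comb(len(lmer), k) * 3 ** k)
```

The two printed numbers agree because each choice of k positions and each choice of substitute letters gives a distinct string. To cover all neighbours within distance k, sum the same expression over distances 0 to k.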

** Ranking Motifs

  • comparing motifs with different lengths using E-value = size of search space x p-value (Keich)
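
A minimal sketch of the E-value formula above. The p-values are invented for illustration, and taking the search space to be the 4^l candidate patterns is an assumption of this sketch, not necessarily the definition Keich uses.

```python
def e_value(search_space_size, p_value):
    """E-value = size of search space x p-value."""
    return search_space_size * p_value


# Hypothetical motifs of different lengths (numbers are not real results).
short_motif = e_value(search_space_size=4 ** 6, p_value=1e-5)
long_motif = e_value(search_space_size=4 ** 10, p_value=1e-7)

# The longer motif has the smaller p-value, but its search space is far
# larger, so after the correction the shorter motif ranks ahead.
print(short_motif < long_motif)
```

The point of the correction is that a raw p-value favours longer motifs simply because more candidates were tried. Scaling by the search space size puts lengths on a common footing.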

** Models

  • (l,d) motif model is not realistic, as Hamming distance does not take into account the fact that certain positions are conserved
  • PWM is more accurate
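
The difference between the two models can be seen in a small sketch. The binding sites below are hypothetical, the PWM is built as log-odds against a uniform background with a pseudocount of 1 (both are assumptions of this example), and the function names are made up.

```python
from math import log2

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM against a uniform background (0.25 per base)."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for column in zip(*sites):
        pwm.append({
            c: log2(((column.count(c) + pseudocount) / total) / 0.25)
            for c in ALPHABET
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum of the per-position log-odds scores for seq."""
    return sum(col[c] for col, c in zip(pwm, seq))


# Hypothetical sites: positions 0-2 are conserved, position 3 is variable.
sites = ["TATA", "TATC", "TATG", "TATT"]
consensus = "TATA"
pwm = build_pwm(sites)

variable_mismatch = "TATC"   # differs from the consensus at the variable position
conserved_mismatch = "GATA"  # differs from the consensus at a conserved position

# Both candidates are at Hamming distance 1 from the consensus ...
print(hamming(consensus, variable_mismatch), hamming(consensus, conserved_mismatch))
# ... but the PWM penalises only the mismatch at the conserved position.
print(pwm_score(pwm, variable_mismatch) > pwm_score(pwm, conserved_mismatch))
```

An (l,d) model treats the two candidates as equally good, while the PWM separates them because it keeps a separate distribution for each position.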
