Scan statistics

CREATED: 200701021146

** Definitions

| M_u    | The maximum score in any window of length u.                          |
| lambda | The underlying rate at which events occur under normal circumstances. |
| n      | The length of the interval under consideration.                       |
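
A minimal sketch of how M_u could be computed when the "score" of a window is simply the number of events it contains. The function name `scan_statistic` and the half-open window convention [x, x + u) are assumptions made for illustration; they are not taken from the cited papers.

```python
def scan_statistic(positions, u):
    """Return M_u: the largest number of events in any window [x, x + u).

    Assumption: the score of a window is the plain event count.
    """
    pts = sorted(positions)
    best = 0
    left = 0  # index of the first event still inside the current window
    for right, p in enumerate(pts):
        # Drop events that are u or more behind the current event p.
        while p - pts[left] >= u:
            left += 1
        best = max(best, right - left + 1)
    return best
```

The maximum is always attained by a window that starts at an event, so one pass over the sorted positions with two indices is enough.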

** Cluster of DAM sites in E. coli DNA

  • Chance and statistical significance in protein and DNA sequence analysis, S Karlin and V Brendel, Science, Vol 257, Issue 5066, 39-49
  • M_245 = 8, n = 4.7mil, lambda = 1.1/250
  • Approximate P-values by Naus (1982)
    • P(M_245 >= 8) = 0.87
    • P(M_245 >= 10) = 0.03
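
The Naus (1982) approximation itself is not reproduced in these notes. As a rough cross-check, P(M_u >= k) can instead be estimated by Monte Carlo simulation. The sketch below swaps in that technique and assumes events follow a homogeneous Poisson process with rate lambda on the continuous interval [0, n]; the function names, the trial count and the seed are all illustrative choices.

```python
import random

def max_window_count(sorted_events, u):
    """Largest number of events in any window [x, x + u); input must be sorted."""
    best = 0
    left = 0
    for right, t in enumerate(sorted_events):
        while t - sorted_events[left] >= u:
            left += 1
        best = max(best, right - left + 1)
    return best

def simulate_scan_pvalue(n, lam, u, k, trials=200, seed=0):
    """Monte Carlo estimate of P(M_u >= k) under a Poisson process (assumed model)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Build one realisation: exponential gaps with rate lam until we pass n.
        events = []
        t = rng.expovariate(lam)
        while t < n:
            events.append(t)
            t += rng.expovariate(lam)
        if max_window_count(events, u) >= k:
            hits += 1
    return hits / trials
```

Calling `simulate_scan_pvalue(4.7e6, 1.1 / 250, 245, 8)` gives an estimate that can be compared against the Naus values above. Expect Monte Carlo error, and some difference because DNA positions are discrete while this model is continuous.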

** Palindromes in DNA

*** Probability of occurrence

  • Masse et al. (1992) & Leung et al. (1994).
  • Palindromic sequences cluster around the origin of replication.
  • HCMV seq, M_1000 = 10, n = 229354, lambda = 0.001
  • Consider palindromes of length at least 10; ignore the actual length and just count the number of palindromes: p-value = 0.00195

*** Weighted scan - Chew, Choi, Leung (2005)

  • longer palindromes given more weight
  • pattern of length k, given score of k/10
  • Applications of weighted scan
    • Rajewsky et al. (2002) & Lifanov et al. (2003): clusters of transcription factor binding sites.
    • Position weight matrices to score words for similarity to a given motif.
    • Siepel et al. (2005): segments of high evolutionary conservation.
  • Chan and Zhang (2006), formula for computing p-value for weighted scan
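
A minimal sketch of the weighted scan described above, where a palindrome of length k contributes a score of k/10 and M_u is the largest total score in any window of length u. The input format (a list of `(position, length)` pairs) and the name `weighted_scan` are assumptions for illustration. This computes only the statistic, not the Chan and Zhang (2006) p-value.

```python
def weighted_scan(palindromes, u):
    """Return the largest total score in any window [x, x + u).

    palindromes: iterable of (position, length) pairs (assumed format).
    Each palindrome of length k scores k / 10, as in the notes.
    """
    items = sorted(palindromes)
    best = 0.0
    total = 0.0  # running score of the palindromes inside the window
    left = 0
    for pos, k in items:
        total += k / 10
        # Remove palindromes that are u or more behind the current position.
        while pos - items[left][0] >= u:
            total -= items[left][1] / 10
            left += 1
        best = max(best, total)
    return best
```

Setting every length to 10 recovers the unweighted count, which is the "just count the number of palindromes" case.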

** Template matching - Dave and Margoliash (2000), Mooney (2000)

  • pattern of spikes modelled as points
  • W = (w_1, …, w_d) - finch listening to song
    • w_i = times at which spikes were generated by the ith neuron
  • Y = (y_1, …, y_d) - finch sleeping
  • if W matches Y -> replay song when sleeping, learning occurs during sleep
  • e.g. w_1 = {0.01, 0.05, 0.09, 0.12}, y_1 = {0.32, 0.75, 1.03, 1.15, 1.25}
    • scoring based on distance of each peak to the nearest peak in the target spectrum
  • Chi (2004), approximation of log[P(M_T >= c)]
  • Chan and Loh (2005), more precise approximation of P(M_T >= c)
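
A rough sketch of nearest-spike scoring for a single neuron, with the template slid across the target to give a maximum score M_T. The notes do not give the scoring function from the cited papers, so the linear kernel `max(0, 1 - d / tol)`, the tolerance `tol`, the grid of shifts and all function names here are hypothetical stand-ins.

```python
from bisect import bisect_left

def nearest_distance(x, sorted_targets):
    """Distance from x to the closest value in a sorted, non-empty list."""
    i = bisect_left(sorted_targets, x)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_targets[max(i - 1, 0): i + 1]
    return min(abs(x - c) for c in candidates)

def match_score(w, y, shift, tol=0.05):
    """Score template spikes w, shifted by `shift`, against target spikes y.

    Hypothetical kernel: a spike scores 1 for an exact hit, falling
    linearly to 0 once the nearest target spike is tol or more away.
    """
    targets = sorted(y)
    if not targets:
        return 0.0
    return sum(max(0.0, 1.0 - nearest_distance(s + shift, targets) / tol)
               for s in w)

def scan_match(w, y, T, step=0.01, tol=0.05):
    """Maximum match score over shifts 0, step, 2*step, ..., up to T."""
    steps = int(round(T / step))
    return max(match_score(w, y, i * step, tol) for i in range(steps + 1))
```

For the example above, `scan_match(w_1, y_1, T)` could be evaluated with the listed spike times. Extending to d neurons would mean summing `match_score` over the pairs (w_i, y_i) at a common shift.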

