Scan statistics

** Cluster of DAM sites in E. coli DNA

Chance and statistical significance in protein and DNA sequence analysis, S Karlin and V Brendel, Science, Vol 257, Issue 5066, 39-49
M_245 = 8, n = 4.7 million, lambda = 1.1/250
Approximate P-values by Naus (1982)
** P(M_245 >= 8) = 0.87
** P(M_245 >= 10) = 0.03
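The scan statistic M_w used above is the largest number of sites falling in any window of width w. A minimal sketch of how it could be computed and how a p-value could be estimated by simulation follows. It is not Naus's approximation: it swaps in plain Monte Carlo under the assumption that each of the n positions is a site independently with probability lambda. The function names, the window convention (w consecutive positions) and the default number of trials are all illustrative choices.

```python
import random


def scan_statistic(positions, w):
    """Largest number of points in any window of w consecutive positions."""
    pts = sorted(positions)
    best = 0
    left = 0
    for right, p in enumerate(pts):
        # Drop points from the left until pts[left]..p fits in w positions.
        while p - pts[left] >= w:
            left += 1
        best = max(best, right - left + 1)
    return best


def monte_carlo_pvalue(n, lam, w, observed, trials=200, seed=0):
    """Estimate P(M_w >= observed) under an assumed independent-site model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: each position is a site with probability lam.
        sites = [i for i in range(n) if rng.random() < lam]
        if scan_statistic(sites, w) >= observed:
            hits += 1
    return hits / trials
```

For a genome-sized n this pure-Python simulation would be slow, which is one reason closed-form approximations are attractive; the sketch is meant only to make the definition of M_w concrete.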

** Palindromes in DNA
*** Probability of occurrence

Masse et al. (1992) & Leung et al. (1994).
Palindromic sequences cluster around origins of replication.
HCMV sequence, M_1000 = 10, n = 229354, lambda = 0.001
Palindromes of length at least 10 are counted without regard to their length, p-value = 0.00195
*** Weighted scan - Chew, Choi, Leung (2005)
longer palindromes given more weight
pattern of length k, given score of k/10
Applications of weighted scan
** Rajewsky et al. (2002) & Lifanov et al. (2003), clusters of transcription factor binding sites.
** Position weight matrices to score words for similarity to a given motif.
** Siepel et al. (2005), segments of high evolutionary conservation
Chan and Zhang (2006), formula for computing p-value for weighted scan
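The weighting rule noted above (a pattern of length k scores k/10) can be sketched as a sliding-window sum. This is only an illustration of the scoring idea, not the procedure of Chew, Choi and Leung or the p-value formula of Chan and Zhang; the function name, the (position, length) input format and the window convention (w consecutive positions, measured from each palindrome's start) are assumptions.

```python
def weighted_scan(palindromes, w):
    """Maximum total score k/10 over any window of w consecutive positions.

    palindromes: iterable of (position, length) pairs (hypothetical format).
    """
    items = sorted(palindromes)
    best_len = 0   # best total length seen so far (kept as an integer)
    total_len = 0  # total length of palindromes in the current window
    left = 0
    for pos, k in items:
        total_len += k
        # Drop palindromes that start w or more positions before pos.
        while pos - items[left][0] >= w:
            total_len -= items[left][1]
            left += 1
        best_len = max(best_len, total_len)
    # Score of a palindrome of length k is k/10, so divide once at the end.
    return best_len / 10
```

Setting every length to 10 gives each palindrome a score of 1, which reduces this to the unweighted count of palindromes in a window.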

** Template matching - Dave and Margoliash (2000), Mooney (2000)

pattern of spikes modelled as points
W = (w_1, …, w_d) - finch listening to song
** w_i = times at which spikes were generated by the ith neuron
Y = (y_1, …, y_d) - finch sleeping
if W matches Y -> replay song when sleeping, learning occurs during sleep
eg w_1 = {0.01, 0.05, 0.09, 0.12}, y_1 = {0.32, 0.75, 1.03, 1.15, 1.25}
** scoring based on distance of peak to nearest peak in target spectrum
Chi (2004), approximation of log[P(M_T >= c)]
Chan and Loh (2005), more precise approximation of P(M_T >= c)
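The nearest-peak idea behind the scoring can be made concrete with the example spike times w_1 and y_1 given above. The sketch below only computes, for each template spike shifted by some lag, the distance to the closest spike in the target train. How those distances are turned into a score, and how M_T is formed from scores over lags, is not specified in these notes, so nothing here should be read as the scoring used in the cited papers. The function name and the `shift` parameter are hypothetical, and the target train is assumed to be non-empty.

```python
from bisect import bisect_left


def nearest_distances(template, target, shift=0.0):
    """Distance from each shifted template spike to its nearest target spike.

    Assumes target is non-empty.
    """
    tgt = sorted(target)
    out = []
    for s in template:
        x = s + shift
        i = bisect_left(tgt, x)
        # The nearest spike is either just before or just after x.
        candidates = tgt[max(i - 1, 0):i + 1]
        out.append(min(abs(x - c) for c in candidates))
    return out


w_1 = [0.01, 0.05, 0.09, 0.12]
y_1 = [0.32, 0.75, 1.03, 1.15, 1.25]
distances = nearest_distances(w_1, y_1, shift=0.31)
```

With a lag of 0.31 the first template spike lands on the first target spike, so the distances are roughly 0, 0.04, 0.08 and 0.11 (up to floating-point rounding).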