Melvin's digital garden

De Novo Peptide Sequencing

CREATED: 200612290712

** Motivation

identify peptides

discover proteomic patterns to distinguish between diseased and healthy tissue

** Challenge

fragmentation leads to different ion types

incomplete ladder

noise

unknown charge

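some amino acid masses are very close or identical (leucine and isoleucine have the same mass; lysine and glutamine differ by about 0.036 Da)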
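
A minimal sketch of the near-identical mass problem, using the standard monoisotopic residue masses rounded to five decimals. The helper name `ambiguous_pairs` and the tolerance values are assumptions chosen for illustration, not part of any particular sequencing tool.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def ambiguous_pairs(tolerance):
    """Return residue pairs whose masses differ by at most `tolerance` Da."""
    residues = sorted(RESIDUE_MASS)
    return [
        (a, b)
        for i, a in enumerate(residues)
        for b in residues[i + 1:]
        if abs(RESIDUE_MASS[a] - RESIDUE_MASS[b]) <= tolerance
    ]


# Hypothetical instrument tolerance of 0.05 Da.
print(ambiguous_pairs(0.05))  # [('I', 'L'), ('K', 'Q')]
```

With a tighter tolerance of 0.01 Da only the I/L pair remains, since those two residues are isomers and cannot be separated by mass alone.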

** Spectrum graph method

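normalize spectrum, generate twin peaks (a peak may come from a prefix ion or a suffix ion, so each peak is also entered at its complementary mass)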

discretize

generate a graph: nodes are peaks; two nodes are connected by a directed edge (from lower to higher mass) if their mass difference equals the mass of an amino acid

a path from source to destination represents a peptide sequence

score each path

find path with maximum score

simple score = length of path, becomes LongestPathProblem
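
The steps above can be sketched roughly as follows, assuming the peaks have already been discretized to integer prefix masses (ion offsets removed) and using nominal integer residue masses. The function name `longest_path_sequence` and the toy spectrum are hypothetical. Because edges always go from lower to higher mass, the graph is acyclic and the longest path can be found by dynamic programming over the nodes in mass order.

```python
# Nominal (integer) residue masses; I shares 113 with L and K shares 128 with Q.
NOMINAL = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
           113: "L", 114: "N", 115: "D", 128: "Q", 129: "E", 131: "M",
           137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def longest_path_sequence(peaks, total_mass):
    """Longest source-to-destination path in the spectrum graph.

    `peaks` are discretized prefix masses; 0 is the source and
    `total_mass` is the destination. Returns the residue string,
    or None if the destination cannot be reached.
    """
    nodes = sorted(set(peaks) | {0, total_mass})
    # best[v] = (number of edges, sequence) of the longest path from 0 to v
    best = {0: (0, "")}
    for j, v in enumerate(nodes):
        for u in nodes[:j]:
            residue = NOMINAL.get(v - u)
            if residue is not None and u in best:
                candidate = (best[u][0] + 1, best[u][1] + residue)
                if v not in best or candidate[0] > best[v][0]:
                    best[v] = candidate
    return best[total_mass][1] if total_mass in best else None


# Toy spectrum for the peptide GASP (prefix masses 57, 128, 215) plus one noise peak.
print(longest_path_sequence([57, 128, 150, 215], 312))  # GASP
```

In the toy spectrum the node at 128 is reachable both as G+A and as a single Q, and the length score prefers the longer interpretation. Dropping the 128 peak makes the destination unreachable, which is the incomplete-ladder problem listed under Challenge.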

Dancik’s scoring

each ion type j has a probability p_j of occurring

peak can be due to noise which occurs with probability p_noise

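if certain pairs of nodes are forbidden from appearing on the same path (e.g. two nodes generated from the same peak), finding the best path becomes NP-hard in general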
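
A sketch of a probabilistic node score in the spirit of Dancik’s scoring above, assuming the score is a log-likelihood ratio: an expected peak of ion type j that is present contributes log(p_j / p_noise), and one that is absent contributes log((1 - p_j) / (1 - p_noise)). The function name `log_odds_score` and the probability values are made up for illustration; real values would be estimated from training spectra.

```python
import math


def log_odds_score(observed, ion_probs, p_noise):
    """Log-likelihood ratio that a node is a real cleavage site rather than noise.

    observed:  dict mapping ion type -> True if its expected peak is present
    ion_probs: dict mapping ion type j -> p_j
    p_noise:   probability that a peak at a given position is noise
    """
    score = 0.0
    for ion, p_j in ion_probs.items():
        if observed.get(ion, False):
            # Peak present: real ion (p_j) versus noise (p_noise).
            score += math.log(p_j / p_noise)
        else:
            # Peak absent: ion missing (1 - p_j) versus no noise (1 - p_noise).
            score += math.log((1 - p_j) / (1 - p_noise))
    return score


# Hypothetical probabilities for two ion types.
ion_probs = {"b": 0.7, "y": 0.8}
supported = log_odds_score({"b": True, "y": True}, ion_probs, p_noise=0.05)
unsupported = log_odds_score({}, ion_probs, p_noise=0.05)
print(supported > 0 > unsupported)  # True
```

Summing such node scores along a path replaces the simple path-length score, so a node supported by several ion types counts for more than one supported by a single, possibly noisy, peak.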

** Limitations of current algorithms

multiple charge

internal fragmentation
