Sputnik

Source code, linux binary and test library as compressed tar file (151K).
Windows executable (45K).
Source code as pre-formatted html (17K).

Sputnik is a C program that searches DNA sequence files in FASTA format for microsatellite repeats. The sequence file is named on the command line, and each hit is written to stdout along with its position in the sequence, its length, and a score determined by the length of the repeat and the number of errors.

Sputnik uses a recursive algorithm to search for repeated patterns of 2 to 5 nucleotides. Insertions, mismatches and deletions are tolerated but lower the overall score. It does not search against a "library" of known microsatellites. Instead it reads through the entire sequence, assumes the existence of a repeat at every position, compares subsequent nucleotides and applies a simple scoring rule. If the resulting score rises above a preset threshold, the region is written out along with its position and score. If the score falls below a cutoff threshold, the search is abandoned and begun again at the next nucleotide.

Each nucleotide that matches the value predicted (by assuming a repeat) adds to the score. Each "error" subtracts from the score. When an error is encountered, each of the three possible kinds of error (mismatch, insertion and deletion) is assumed in turn and a recursive call to the comparison routine is made for each. A call whose resulting score stays above the cutoff threshold returns that score, and the best of the three is pursued.
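The search described above can be sketched in C as follows. This is a simplified illustration, not the Sputnik source: the names (struct hit, extend, find_repeat_at) and all parameter values are assumptions chosen for the example, each nucleotide is predicted from the first copy of the repeat unit, and the smallest unit length that reaches the reporting threshold is the one returned.

```c
#include <stddef.h>

/* Illustrative parameters: assumed values, not taken from the Sputnik source. */
#define MATCH_POINTS   1    /* added for each nucleotide that matches the prediction */
#define ERROR_POINTS  (-6)  /* added for each mismatch, insertion or deletion */
#define REPORT_SCORE   8    /* a region is reported once its score reaches this */
#define FAIL_SCORE    (-1)  /* a search is abandoned once the score drops to this */
#define MAX_DEPTH      5    /* maximum number of nested error recoveries */
#define MIN_UNIT       2    /* shortest repeat unit searched for */
#define MAX_UNIT       5    /* longest repeat unit searched for */

struct hit {
    size_t start;  /* 0-based index of the first nucleotide of the region */
    size_t end;    /* one past the last nucleotide of the best-scoring extent */
    int unit;      /* repeat unit length, MIN_UNIT..MAX_UNIT */
    int score;     /* best score seen so far */
};

/* Extend a candidate repeat from pos. The nucleotide expected at pos is the
 * one at offset phase within the unit that begins at hit->start. */
static void extend(const char *seq, size_t len, struct hit *hit,
                   size_t pos, int phase, int score, int depth)
{
    /* Consume matching nucleotides, remembering the best score reached. */
    while (pos < len && seq[pos] == seq[hit->start + phase]) {
        score += MATCH_POINTS;
        pos++;
        phase = (phase + 1) % hit->unit;
        if (score > hit->score) {
            hit->score = score;
            hit->end = pos;
        }
    }
    if (pos >= len || depth >= MAX_DEPTH)
        return;

    /* An error was found: charge for it and give up if the score is too low. */
    score += ERROR_POINTS;
    if (score <= FAIL_SCORE)
        return;

    /* Try all three explanations; the best result accumulates in *hit. */
    int next = (phase + 1) % hit->unit;
    extend(seq, len, hit, pos + 1, next, score, depth + 1);  /* mismatch */
    extend(seq, len, hit, pos + 1, phase, score, depth + 1); /* insertion */
    extend(seq, len, hit, pos, next, score, depth + 1);      /* deletion */
}

/* Assume a repeat starts at start and try each unit length in turn.
 * Returns 1 and fills *out if some unit length reaches REPORT_SCORE. */
int find_repeat_at(const char *seq, size_t len, size_t start, struct hit *out)
{
    for (int unit = MIN_UNIT; unit <= MAX_UNIT; unit++) {
        if (start + (size_t)unit > len)
            break;
        struct hit h = { start, start + (size_t)unit, unit, 0 };
        extend(seq, len, &h, start + (size_t)unit, 0, 0, 0);
        if (h.score >= REPORT_SCORE) {
            *out = h;
            return 1;
        }
    }
    return 0;
}
```

A caller would invoke find_repeat_at at successive positions in the sequence and, after a hit, resume after the reported region. The first copy of the unit is not scored in this sketch, so a perfect dinucleotide repeat of 27 nucleotides scores 25; a single mismatch costs the penalty plus the match it replaces.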

Here is a sample of the output from Sputnik run against a library constructed from a GenBank search for "HUMAN REPEAT" sequences:


> sputnik rep.lib
>hshprma LOCUS HSHPRMA       249 bp    DNA  PRI 01-MAY-1993
dinucleotide 128 : 171 -- length 44 score 35
GAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAAAGAGAGAGA
dinucleotide 184 : 210 -- length 27 score 25
GTGTGTGTGTGTGTGTGTGTGTGTGTG
>hum315mfd LOCUS HUM315MFD     251 bp ds-DNA  PRI 04-AUG-1993
trinucleotide 210 : 246 -- length 37 score 16
TTATTATTATTATTTTATTTTATTTTATTATTATTAT
...

To change the score or threshold parameters, or the maximum recursion depth, Sputnik must be recompiled. In practice scores diverge quickly, and adjusting these settings has little effect on anything other than execution time. The program might benefit from a nicer interface and from output that is easier to parse.
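Until the output format changes, hit lines like those in the sample above can be picked out with sscanf. The following is a minimal sketch that assumes every hit line has exactly the layout shown in the sample; the struct and function names are hypothetical and not part of Sputnik.

```c
#include <stdio.h>

/* One parsed hit line, e.g. "dinucleotide 128 : 171 -- length 44 score 35". */
struct sputnik_hit {
    char kind[32];  /* repeat type as printed, e.g. "dinucleotide" */
    long start;     /* first position of the repeat, as printed */
    long end;       /* last position of the repeat, as printed */
    int length;     /* length of the repeat in nucleotides */
    int score;      /* score assigned to the repeat */
};

/* Returns 1 and fills *hit if line is a hit line, 0 otherwise.
 * Header lines (starting with '>') and sequence lines do not match,
 * because the second field of the format must be a number. */
int parse_hit_line(const char *line, struct sputnik_hit *hit)
{
    return sscanf(line, "%31s %ld : %ld -- length %d score %d",
                  hit->kind, &hit->start, &hit->end,
                  &hit->length, &hit->score) == 5;
}
```

In the sample the printed positions are inclusive: 171 - 128 + 1 = 44, which equals the reported length. A caller reading the output line by line would keep the most recent '>' header line to know which sequence each hit belongs to.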

Sputnik was developed by Chris Abajian at the University of Washington Department of Molecular Biotechnology in September 1994. It is not currently "supported", but it is a small program and easily modified.

If you have any questions or need help, email me at

Chris Abajian <chrisa@espressosoftware.com>