Automated binning of microsatellite alleles: Problems and solutions
Amos W., Hoffman JI., Frodsham A., Zhang L., Best S., Hill AVS.
As genotyping methods move ever closer to full automation, care must be taken to ensure that there is no equivalent rise in allele-calling error rates. One clear source of error lies with how raw allele lengths are converted into allele classes, a process referred to as binning. Standard automated approaches usually assume collinearity between expected and measured fragment length. Unfortunately, such collinearity is often only approximate, with the consequence that alleles do not conform to a perfect 2-, 3- or 4-base-pair periodicity. To account for these problems, we introduce a method that allows repeat units to be fractionally shorter or longer than their theoretical value. Tested on a large human data set, our algorithm performs well over a wide range of dinucleotide repeat loci. The size of the problem caused by sticking to whole numbers of bases is indicated by the fact that the effective repeat length was within 5% of the assumed length only 68.3% of the time. © 2006 The Authors.
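The idea of letting the repeat unit be fractionally shorter or longer than its nominal value can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract. It is a simple grid search under stated assumptions: the function name `bin_alleles`, the parameters `max_dev` and `steps`, and the squared-residual cost are all hypothetical choices made for illustration. For each candidate effective repeat length, the sketch estimates the best phase offset with a circular mean and scores how closely the measured lengths fall on a regular grid.

```python
import math


def bin_alleles(lengths, nominal_repeat=2.0, max_dev=0.15, steps=301):
    """Fit an effective repeat length and offset, then assign bin indices.

    Illustrative assumption: the effective repeat length lies within
    +/- max_dev (as a fraction) of the nominal repeat length.
    Returns (effective_repeat, offset, bins).
    """
    best = None
    for i in range(steps):
        # Candidate effective repeat length on an evenly spaced grid.
        frac = 1.0 - max_dev + 2.0 * max_dev * i / (steps - 1)
        repeat = nominal_repeat * frac

        # Circular mean of the phases gives the offset that centres the
        # fractional parts of (length - offset) / repeat around zero.
        angles = [2.0 * math.pi * x / repeat for x in lengths]
        s = sum(math.sin(a) for a in angles)
        c = sum(math.cos(a) for a in angles)
        offset = math.atan2(s, c) / (2.0 * math.pi) * repeat

        # Cost: squared distance of each allele from its nearest bin centre,
        # measured in repeat units so that candidates are comparable.
        cost = 0.0
        for x in lengths:
            pos = (x - offset) / repeat
            cost += (pos - round(pos)) ** 2

        if best is None or cost < best[0]:
            best = (cost, repeat, offset)

    _, repeat, offset = best
    bins = [round((x - offset) / repeat) for x in lengths]
    return repeat, offset, bins


if __name__ == "__main__":
    # Made-up fragment lengths for a dinucleotide locus (not real data).
    measured = [100.33, 102.18, 104.11, 105.96, 109.82, 115.50, 119.29]
    repeat, offset, bins = bin_alleles(measured)
    print(f"effective repeat length: {repeat:.3f} bp")
    print("bin indices:", bins)
```

Only differences between bin indices are meaningful here, since the absolute index depends on the fitted offset. A grid search was chosen for transparency; any one-dimensional optimiser could take its place.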