An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets.
Zeggini E., Rayner W., Morris AP., Hattersley AT., Walker M., Hitman GA., Deloukas P., Cardon LR., McCarthy MI.
A substantial investment has been made in the generation of large public resources designed to enable the identification of tag SNP sets, but data establishing the adequacy of the sample sizes used are limited. Using large-scale empirical and simulated data sets, we found that the sample sizes used in the HapMap project are sufficient to capture common variation, but that performance declines substantially for variants with minor allele frequencies of <5%.