Impact of marker density and reference population size on accuracy of imputation in simulated data

Document Type: Research Paper

Author

Assistant Professor, Department of Animal Science, Faculty of Agriculture, Ilam University, Ilam, Iran

Abstract

In this study, the effects of reference population size and the number of missing single nucleotide polymorphisms (SNPs) on imputation accuracy were assessed. The QMSim software was used to create a reference database of 1000 simulated animals. Two datasets were created from this reference database. The first (A) included the original genotypes, containing the missing SNPs (52,000 SNP markers). The second (B) included the same genotypes without the missing data (37,000 SNP markers). In both datasets, reference populations of 100, 250, 500 and 750 animals were simulated. SNPs were deleted at random in both datasets at proportions of 15%, 30%, 55%, 70% and 95%. Accuracy was measured as the correlation between the original SNP genotypes before deletion and their values after imputation. The results showed that imputation accuracy was influenced by both the size of the reference population and the density of the deleted SNP markers. In both datasets, increasing the reference population size from 100 to 750 animals increased the average imputation accuracy. With the largest reference population (750 animals), accuracy ranged from 0.89 to 0.98 in dataset A and from 0.90 to 0.99 in dataset B. Overall, the results showed that when the reference population is sufficiently large, imputation accuracy changes little even when a large number of SNPs are missing.
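The accuracy measure described above (randomly deleting a proportion of SNPs, imputing them, and correlating the original genotypes with the imputed values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: genotypes are assumed to be coded 0/1/2, SNPs are masked uniformly at random, accuracy is pooled over all masked genotypes of all target animals, and `impute_nearest_reference` is a hypothetical stand-in imputer that copies genotypes from the best-matching reference animal. It is neither QMSim nor the imputation software used in the study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: one input has zero variance")
    return sxy / sqrt(sxx * syy)


def mask_snps(n_snps, proportion, rng):
    """Pick a random set of SNP indices to delete (e.g. proportion=0.15 for 15%)."""
    k = round(n_snps * proportion)
    return sorted(rng.sample(range(n_snps), k))


def impute_nearest_reference(target, reference, masked):
    """Hypothetical stand-in imputer (assumption, not the study's method):
    find the reference animal closest to the target on the unmasked SNPs
    and copy its genotypes into the masked positions."""
    masked_set = set(masked)
    observed = [j for j in range(len(target)) if j not in masked_set]
    best = min(reference, key=lambda r: sum(abs(r[j] - target[j]) for j in observed))
    imputed = list(target)
    for j in masked:
        imputed[j] = best[j]
    return imputed


def run_scenario(reference, targets, proportion, seed=1):
    """Mask `proportion` of SNPs in every target, impute them, and return the
    correlation between true and imputed genotypes pooled over all targets."""
    rng = random.Random(seed)
    n_snps = len(targets[0])
    true_vals, imp_vals = [], []
    for target in targets:
        masked = mask_snps(n_snps, proportion, rng)
        imputed = impute_nearest_reference(target, reference, masked)
        true_vals += [target[j] for j in masked]
        imp_vals += [imputed[j] for j in masked]
    return pearson(true_vals, imp_vals)


if __name__ == "__main__":
    rng = random.Random(42)
    n_snps = 500
    # Toy reference population of 100 animals with independent random genotypes.
    reference = [[rng.randint(0, 2) for _ in range(n_snps)] for _ in range(100)]
    # Toy targets: copies of reference animals with 5% of genotypes perturbed.
    targets = []
    for animal in reference[:10]:
        t = list(animal)
        for j in rng.sample(range(n_snps), n_snps // 20):
            t[j] = rng.randint(0, 2)
        targets.append(t)
    for p in (0.15, 0.30, 0.55, 0.70, 0.95):
        print(f"deleted {p:.0%}: accuracy = {run_scenario(reference, targets, p):.3f}")
```

Varying the number of animals passed as `reference` in this toy setup mimics the reference-size factor, although independent random genotypes lack the linkage and relationship structure that real imputation tools exploit.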
