Article type: Research article

Impact of marker density and reference population size on accuracy of imputation in simulated data

Author

  • Y. Mohammadi
Assistant Professor, Department of Animal Science, Faculty of Agriculture, Ilam University, Ilam, Iran

Abstract

In this study, the effect of reference population size and of the number of missing single nucleotide polymorphisms (SNPs) on imputation accuracy was assessed. The QMSim software was used to create a reference database of 1,000 simulated animals. Two datasets were created from this reference database:

- The first (A) contained the original genotypes, including the missing SNPs (52,000 SNP markers).
- The second (B) contained the same genotypes after removal of the SNPs with missing data (37,000 SNP markers).

In both datasets, reference populations of 100, 250, 500 and 750 animals were simulated. SNPs were deleted at random in both datasets in proportions of 15%, 30%, 55%, 70% and 95%. Accuracy was calculated as the correlation between the original SNP values before deletion and their values after imputation. The results showed that imputation accuracy was influenced by both the size of the reference population and the density of the deleted SNP markers. Increasing the reference population size from 100 to 750 animals increased the average imputation accuracy in both datasets. The highest accuracy was obtained with the 750-animal reference population, ranging from 0.89 to 0.98 in dataset A and from 0.90 to 0.99 in dataset B. Overall, the results showed that, provided the reference population is sufficiently large, imputation accuracy changes little even when a large number of SNPs are missing.

Keywords

  • Genomic evaluation
  • Missing data
  • Animal
  • Prediction accuracy
  • Imputation