Evaluation of the performance of support vector machines with different kernels in genomic analysis at different levels of dominance variance

Article type: Research article

Authors

Department of Animal Science, Faculty of Animal Science and Fisheries, Sari Agricultural Sciences and Natural Resources University

Abstract

The aim of the present study was to examine and compare the genomic prediction accuracy of the support vector machine (SVM) method with different kernel functions, namely linear (SVM-lin), radial (SVM-rad), polynomial (SVM-pol), and sigmoid (SVM-sig), and of the GBLUP method, under purely additive and additive-dominance deviation gene action models at different levels of dominance variance. To this end, a genome of six chromosomes with a length of 600 centimorgans was simulated. On each chromosome, 1000 evenly spaced single nucleotide polymorphism (SNP) markers and 100 randomly placed quantitative trait loci (QTL) were considered. Phenotypic variance and heritability were set to 1 and 0.4, respectively. Dominance deviation variance was set to 0.10, 0.15, 0.20, 0.25, 0.30, and 0.35. Prediction accuracy was defined as the Pearson correlation coefficient between the true genetic value (TGV) or true breeding value (TBV) and the genomic estimated genetic value (GEGV) or genomic estimated breeding value (GEBV). The conventional GBLUP method showed higher GEBV and GEGV prediction accuracy in all dominance variance scenarios. Among the SVM approaches, based on GEGV prediction accuracy in both the purely additive and the additive-dominance deviation models, SVM-rad and SVM-sig ranked first and second, respectively. Based on GEBV prediction accuracy, this superiority declined sharply as dominance variance increased, such that at dominance variance above 0.30, SVM-lin and SVM-sig showed GEBV prediction accuracy slightly higher than and equal to that of SVM-rad, respectively. Overall, when regressing phenotypes on markers with the nonparametric SVM method, the radial kernel function is recommended.


Article title [English]

Evaluating the performance of support vector machine with different kernels in genomic analysis at different levels of dominance variance

Authors [English]

  • H. Sahebalam
  • M. Gholizadeh
  • H. Hafezian
Department of Animal Science, Faculty of Animal Science and Fisheries, Sari Agricultural Sciences and Natural Resources University, Sari, Iran
Abstract [English]

Introduction: Predicting quantitative traits is a fundamental aspect of plant and animal breeding. Genomic selection is a precise and efficient approach that estimates genetic merit using high-density single nucleotide polymorphisms (SNPs). Most genomic selection procedures focus primarily on additive effects when calculating the genomic estimated breeding value (GEBV) of selection candidates. Incorporating non-additive effects, however, offers several advantages: (i) it enhances the accuracy of GEBV predictions and subsequent selection responses, (ii) it facilitates optimized mate allocation among selection candidates, and (iii) it enables improved utilization of non-additive genetic variation through tailored crossbreeding or purebred breeding strategies. One of the main challenges in genomic evaluation is choosing a statistical method that estimates marker effects accurately. The most common parametric methods for genomic evaluation are the genomic best linear unbiased predictor (GBLUP) and Bayesian methods, which predict the genetic values of individuals from the covariance structure between individuals and from the regression of phenotypes on markers, respectively. In recent years, nonparametric machine learning methods have also been widely used for genomic evaluation in animal and plant breeding programs. This study aimed to evaluate and compare the accuracy of genomic predictions using GBLUP and support vector machine (SVM) methods. The SVM models employed four kernel functions: linear (SVM-lin), Gaussian radial (SVM-rad), polynomial (SVM-pol), and sigmoid (SVM-sig). Both purely additive and additive + dominance deviation gene action models were considered under varying levels of dominance variance.
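To make the four kernel names concrete, the following minimal sketch writes out the standard textbook forms of the linear, polynomial, Gaussian radial, and sigmoid kernels as used by common SVM libraries. The genotype vectors and the hyperparameter values (`gamma`, `coef0`, `degree`) are illustrative assumptions; the abstract does not report the settings used in the study.

```python
import math


def dot(x, z):
    """Inner product of two equal-length genotype vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    # K(x, z) = x'z
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    # K(x, z) = (gamma * x'z + coef0) ** degree
    return (gamma * dot(x, z) + coef0) ** degree


def radial_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    # K(x, z) = tanh(gamma * x'z + coef0)
    return math.tanh(gamma * dot(x, z) + coef0)


# Two hypothetical animals genotyped at five SNPs (allele counts 0/1/2).
animal_a = [0, 1, 2, 1, 0]
animal_b = [1, 1, 2, 0, 0]
gamma = 1.0 / len(animal_a)  # assumed default: one over the number of markers

print("linear    ", linear_kernel(animal_a, animal_b))
print("polynomial", polynomial_kernel(animal_a, animal_b, gamma=gamma))
print("radial    ", radial_kernel(animal_a, animal_b, gamma=gamma))
print("sigmoid   ", sigmoid_kernel(animal_a, animal_b, gamma=gamma))
```

Each kernel returns a similarity between two genotype vectors; the SVM then builds its prediction from these pairwise similarities rather than from explicit marker effects.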
Materials and methods: A genome comprising six chromosomes with a total length of 600 cM was simulated. Each chromosome contained 1,000 evenly spaced SNPs and 100 randomly distributed quantitative trait loci (QTLs). Phenotypic variance and narrow-sense heritability were set to 1 and 0.4, respectively. Dominance variance was evaluated at 0.10, 0.15, 0.20, 0.25, 0.30, and 0.35. Prediction accuracy was calculated as the Pearson correlation coefficient between the true genetic value (TGV) and its genomic estimate (GEGV), or between the true breeding value (TBV) and its genomic estimate (GEBV). In addition, the practical significance of differences in prediction accuracy among the statistical methods was assessed using Cohen's d effect size across 100 replicates.
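The two evaluation measures named above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the example values are made up, and the pooled-standard-deviation form of Cohen's d is an assumption, since the abstract does not state which variant was used.

```python
import math
from statistics import mean, stdev


def pearson_r(true_values, predicted_values):
    """Prediction accuracy: Pearson correlation between true and predicted values."""
    mx, my = mean(true_values), mean(predicted_values)
    sxy = sum((a - mx) * (b - my) for a, b in zip(true_values, predicted_values))
    sxx = sum((a - mx) ** 2 for a in true_values)
    syy = sum((b - my) ** 2 for b in predicted_values)
    return sxy / math.sqrt(sxx * syy)


def cohens_d(acc_a, acc_b):
    """Cohen's d between two sets of per-replicate accuracies (pooled SD, assumed form)."""
    na, nb = len(acc_a), len(acc_b)
    pooled_var = ((na - 1) * stdev(acc_a) ** 2 + (nb - 1) * stdev(acc_b) ** 2) / (na + nb - 2)
    return (mean(acc_a) - mean(acc_b)) / math.sqrt(pooled_var)


# Made-up example: true breeding values and genomic estimates for five animals.
tbv = [0.8, -0.3, 1.2, 0.1, -0.9]
gebv = [0.6, -0.1, 0.9, 0.3, -0.7]
print("accuracy:", round(pearson_r(tbv, gebv), 3))

# Made-up per-replicate accuracies for two methods (the study used 100 replicates).
method_a = [0.71, 0.69, 0.73, 0.70]
method_b = [0.68, 0.70, 0.69, 0.67]
print("Cohen's d:", round(cohens_d(method_a, method_b), 3))
```

In this setup, each replicate yields one accuracy per method, and Cohen's d expresses the difference between the mean accuracies of two methods in units of their pooled standard deviation.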
Results and discussion: The conventional GBLUP method consistently exhibited higher prediction accuracy for both GEBV and GEGV across all scenarios. Among the SVM approaches, the SVM-rad and SVM-sig kernels showed superior performance in predicting GEGV under both purely additive and additive + dominance deviation models. However, for GEBV prediction, their advantage declined with increasing dominance variance. When dominance variance exceeded 0.30, SVM-lin and SVM-sig demonstrated prediction accuracy comparable to or slightly better than SVM-rad. The differences in prediction accuracy between GBLUP and SVM-rad were minimal (d=0.218) in the purely additive model but reached their peak (d=0.492 and d=0.404) in the additive + dominance deviation model at the highest dominance variance. This disparity occurred because, as dominance variance increased, the accuracy of GBLUP changed slightly more than that of SVM-rad. Furthermore, with higher dominance variance, the difference in GEBV prediction accuracy between the other SVM methods and both GBLUP and SVM-rad decreased substantially. For example, in the purely additive model, Cohen's d was 2.608 and 2.336, respectively, while in the additive + dominance deviation model, d dropped to 0.309 and 0.189, respectively. Using the additive + dominance deviation model significantly improved GEBV prediction accuracy, particularly when dominance variance contributed substantially to phenotypic variance. This improvement is due to dominance deviation, which arises from interactions between alleles at a locus. While additive effects are represented as breeding values, which partially incorporate dominance effects, genomic evaluations that explicitly consider dominance effects can further enhance GEBV accuracy.
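The statement that breeding values partially incorporate dominance effects can be illustrated with the classical single-locus decomposition from quantitative genetics. The sketch below assumes one biallelic locus in Hardy-Weinberg equilibrium with genotypic values +a, d, and -a; the function name and the numeric inputs are hypothetical and are not taken from the study's simulation.

```python
def decompose_locus(p, a, d):
    """Split genotypic values at one biallelic locus into breeding values
    and dominance deviations, assuming Hardy-Weinberg equilibrium.

    p: frequency of allele A1; a: additive value; d: dominance value.
    """
    q = 1.0 - p
    # Average effect of allele substitution; note that it depends on d.
    alpha = a + d * (q - p)

    freqs = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}
    genotypic = {"A1A1": a, "A1A2": d, "A2A2": -a}
    breeding = {"A1A1": 2 * q * alpha, "A1A2": (q - p) * alpha, "A2A2": -2 * p * alpha}
    dominance = {"A1A1": -2 * q * q * d, "A1A2": 2 * p * q * d, "A2A2": -2 * p * p * d}

    mean_g = sum(freqs[g] * genotypic[g] for g in freqs)
    var_a = sum(freqs[g] * breeding[g] ** 2 for g in freqs)   # equals 2pq * alpha^2
    var_d = sum(freqs[g] * dominance[g] ** 2 for g in freqs)  # equals (2pqd)^2
    return {
        "alpha": alpha,
        "mean": mean_g,
        "genotypic": genotypic,
        "breeding": breeding,
        "dominance": dominance,
        "var_a": var_a,
        "var_d": var_d,
    }


# Hypothetical locus: allele frequency 0.3, additive value 1.0, dominance value 0.5.
result = decompose_locus(p=0.3, a=1.0, d=0.5)
for genotype in ("A1A1", "A1A2", "A2A2"):
    print(genotype, result["breeding"][genotype], result["dominance"][genotype])
print("additive variance:", result["var_a"], "dominance variance:", result["var_d"])
```

Because the substitution effect `alpha` contains the term `d * (q - p)`, part of the dominance effect is carried by the breeding value whenever allele frequencies differ from 0.5, while the remainder appears as the dominance deviation.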
Conclusions: The choice of kernel function in SVM models plays a pivotal role in the accuracy of GEBV and GEGV predictions. Overall, when the nonparametric SVM method is used to regress phenotypes on markers, the Gaussian radial kernel function is recommended. However, as dominance variance increased, the performance of the SVM-lin and SVM-sig methods improved significantly and their performance gap with GBLUP and SVM-rad narrowed. This indicates the potential of SVM for investigating non-additive effects, especially when dominance explains a larger share of phenotypic variance.

Keywords [English]

  • Genomic breeding value
  • Kernel function
  • Genomic analysis
  • Prediction accuracy
  • Support vector machine