Application of the random forest algorithm for estimating marker effects and identifying candidate genes for reproductive traits in Iranian Holstein dairy cattle

Article type: Research article

Authors

1 University of Tabriz

2 Animal Science Research Institute of Iran, Agricultural Research, Education and Extension Organization, Karaj

Abstract

Machine learning is a powerful approach for genomic studies. The aim of the present study was to use a machine learning method (random forest) for a genome-wide scan of reproductive traits, including age at first calving (AFC), days open (DO), calving interval (CI) and daughter pregnancy rate (DPR), in Iranian Holstein cattle. The required data were obtained from the National Animal Breeding Center and Promotion of Animal Products of Iran. The genotypic data comprised single nucleotide polymorphism (SNP) markers for 2419 bulls. The data file of records registered in the years 1360 to 1398 (Iranian calendar) included 2,774,183 animals. Because the bulls had been genotyped at different densities, the number of markers differed among them. To harmonize the marker sets, FImpute software was used for genotype imputation. Using the random forest algorithm, which is a supervised algorithm of the regression type, a total of 21 markers with high importance were identified for the various reproductive traits. Important candidate genes for these traits were then identified through gene ontology analysis. The genes MPZL1 and CD247, identified on chromosome 3 in relation to AFC, and the genes RPS6KC1 and FAM170A, in relation to DPR, are important for improving the reproductive performance of dairy cattle and can be used for that purpose. The markers and genes identified in this study can provide new information about the genetic architecture of reproductive traits for their genomic improvement and can be used in designing chips for the evaluation of reproductive traits.


Article title [English]

Application of Random Forest Algorithm for estimation of marker effects and the identification of candidate genes of reproductive traits in Iranian Holstein dairy cattle

Authors [English]

  • Sadegh Alijani 1
  • Jeyran Jabbari Tourchi 1
  • Abbas Rafat 1
  • Mokhtar Ali Abbasi 2
1 University of Tabriz
2 National Animal Breeding Center and Promotion of Animal Products, Karaj, Iran
Abstract [English]

Introduction: Genome-wide association studies (GWAS) are a powerful approach for identifying genomic regions that are related to fertility traits and explain a significant part of their genetic variance, and for identifying the relevant causal mutations. Evaluating the association between each genotyped marker and the trait is an essential strategy in GWAS, which examine the effects of all markers while considering their possible interactions, environmental factors, and even mutual effects between markers (Jiang et al., 2019). Recently, machine learning methods have been introduced to genomics, and their basis differs from that of the common methods of genomic evaluation. Machine learning estimates the genomic breeding values of candidate animals from training data (genotypic and phenotypic information of the reference population). One of its key advantages is the ability to analyze large data sets. Machine learning is a branch of artificial intelligence whose goal is to build machines that can extract knowledge (learn) from their environment (Bureau et al., 2005). A variety of machine learning methods (random forest, boosting and deep learning) are used to model genetic variance and environmental factors, study gene networks, perform GWAS, study epistatic effects, and carry out genomic evaluation (Yang et al., 2010). Random forest is a machine learning method that has been used successfully in various fields of science. The aim of this research was to identify markers and genes related to reproductive traits (calving interval, days open, daughter pregnancy rate and age at first calving) in Iranian Holstein dairy cattle. These traits have already been investigated with the ssGBLUP method using a smaller sample size (Mohammadi et al., 2022).
In the present research, however, a larger number of genotyped animals was used, and the random forest algorithm was applied to identify SNPs and candidate genes related to these traits.

Materials and methods
Phenotypic data: The records used in this research were provided by the National Animal Breeding Center and Promotion of Animal Products of Iran and include the traits age at first calving (AFC), days open (DO), calving interval (CI) and daughter pregnancy rate (DPR) recorded on the daughters of the genotyped bulls. Pedigree information on 2,774,183 animals was used.
Genotypic data: Genotypic marker information for 2419 Holstein bulls was used. Quality control of the genomic data was based on the animal call rate (ACR, the proportion of SNPs genotyped per animal), the SNP call rate (CR, the proportion of animals genotyped per SNP), Hardy-Weinberg equilibrium (HWE) and minor allele frequency (MAF). During filtering, markers with a minor allele frequency below 5% were removed first; samples with a genotyping rate below 90% were then identified and removed, followed by markers with a genotyping rate below 95% across samples. Finally, SNPs that deviated from HWE (p-value < 10^-6) were excluded from the analysis as a measure of genotyping error. Quality control was performed with PLINK 1.9 software (Purcell et al., 2007). Ranfog software (Breiman, 2001) was then used in a Linux environment to run the random forest analysis.
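The filtering and analysis steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the PLINK or Ranfog implementation: genotypes are assumed to be coded 0/1/2 with None for a missing call, the HWE check uses a 1-df chi-square approximation rather than PLINK's exact test, and the importance score is the summed reduction in squared error from each split, which may differ from the measure Ranfog reports. The function names, the number of trees, the tree depth and the mtry default are hypothetical; only the thresholds (5%, 90%, 95%, 10^-6) come from the text.

```python
import math
import random

MISSING = None  # assumed code for a missing genotype call

def hwe_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) HWE p-value; an approximation, not PLINK's exact test."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    if min(expected) == 0:
        return 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, 1 df

def quality_control(geno, maf_min=0.05, animal_cr=0.90, snp_cr=0.95, hwe_min=1e-6):
    """geno: list of animals, each a list of 0/1/2/None. Returns kept (animals, snps)."""
    animals = list(range(len(geno)))
    snps = list(range(len(geno[0])))

    def calls(j):  # non-missing genotypes of SNP j among the animals still kept
        return [geno[i][j] for i in animals if geno[i][j] is not MISSING]

    def maf(j):
        c = calls(j)
        if not c:
            return 0.0
        f = sum(c) / (2 * len(c))
        return min(f, 1 - f)

    # Same order as in the text: MAF, animal call rate, SNP call rate, HWE.
    snps = [j for j in snps if maf(j) >= maf_min]
    animals = [i for i in animals
               if sum(geno[i][j] is not MISSING for j in snps) >= animal_cr * len(snps)]
    snps = [j for j in snps if len(calls(j)) >= snp_cr * len(animals)]
    snps = [j for j in snps
            if hwe_p(*(calls(j).count(g) for g in (0, 1, 2))) >= hwe_min]
    return animals, snps

def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow(X, y, rows, mtry, depth, importance, rng):
    """Grow one regression-tree node; credit the chosen SNP with its SSE reduction."""
    if depth == 0 or len(rows) < 10:
        return
    parent = sse([y[i] for i in rows])
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # random subset of SNPs per node
        for cut in (0.5, 1.5):                    # the two possible cuts for 0/1/2
            left = [i for i in rows if X[i][j] < cut]
            right = [i for i in rows if X[i][j] >= cut]
            if not left or not right:
                continue
            gain = parent - sse([y[i] for i in left]) - sse([y[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, j, left, right)
    if best is None:
        return
    gain, j, left, right = best
    importance[j] += gain
    grow(X, y, left, mtry, depth - 1, importance, rng)
    grow(X, y, right, mtry, depth - 1, importance, rng)

def rf_importance(X, y, n_trees=50, mtry=None, max_depth=3, seed=1):
    """Average SSE-reduction importance per SNP over bootstrapped regression trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, p // 3)
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of animals
        grow(X, y, boot, mtry, max_depth, importance, rng)
    return [v / n_trees for v in importance]
```

rf_importance expects complete genotypes (for example after imputation), so in practice it would be run on the animals and SNPs returned by quality_control, with the SNPs then ranked by their importance score.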

Results and discussion
Random forest-machine learning: Using the random forest algorithm, a total of 21 SNPs with high importance were identified. Through gene ontology analysis, important candidate genes for fertility traits were then identified; 62 genes were located inside or within 250 kb of these SNPs. The most significant SNP was observed for the AFC trait. The most important SNP for each reproductive trait was ARS-BFGL-NGS-22647 on BTA3 for AFC, ARS-BFGL-NGS-114194 on BTA11 for CI, BTA-74076-no-rs on BTA5 for DO and ARS-BFGL-NGS-32553 on BTA26 for DPR. In a machine learning study of fertility traits in Nellore cattle, the MPZL1 and CD247 genes on chromosome 3 were identified and related to age at first calving (Alves et al. 2022). Many cell biology pathways affect the performance of reproductive traits; Liao et al. (2015) reported a relationship between the CD247 gene and biological pathways including cell development and function. Kordowitzki et al. (2021) showed that the IFFO2 gene plays an important role in the molecular structure of cells, as well as in the mechanism of blastocyst and embryo formation and in the duration of pregnancy in cattle. In a study of flagellum structure and sperm maturation in mice, Xuan et al. (2022) reported a role for the ALDH4A1 gene in the sperm maturation process. Santana et al. (2015) reported a relationship between the RPS6KC1 gene and pregnancy rate and the number of antral follicles in Nellore heifers.
The KAT2B gene is a transcription activator that plays an essential role in regulating histone acetylation and an important role in improving carcass quality and in the development and metabolism of muscle and fat in native Chinese beef cattle; it also plays a key role in regulating biological processes and is related to cell growth, metabolism and immune system function (Lin et al. 2022).
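The step of finding genes inside or within 250 kb of an important SNP can be illustrated with a short Python sketch. It assumes that SNP positions and gene coordinates are already available as plain Python structures; the function name, the input layout and any coordinates passed to it are hypothetical, and only the 250-kb window comes from the text.

```python
WINDOW_BP = 250_000  # window size stated in the text

def genes_near_snps(snps, genes, window=WINDOW_BP):
    """Map each SNP to the genes that contain it or lie within `window` bp of it.

    snps:  {snp_name: (chromosome, position)}
    genes: list of (symbol, chromosome, start, end) with start <= end
    """
    hits = {}
    for name, (chrom, pos) in snps.items():
        hits[name] = sorted(
            symbol
            for symbol, g_chrom, start, end in genes
            # A gene qualifies if the SNP falls inside the gene extended
            # by `window` bp on both sides, on the same chromosome.
            if g_chrom == chrom and start - window <= pos <= end + window
        )
    return hits
```

Called with a SNP at position 1,000,000 on chromosome 3, the function would return genes on chromosome 3 that overlap the interval from 750,000 to 1,250,000. Real use would take the coordinates from a genome annotation file for the assembly on which the SNP chip is mapped.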

Conclusion: In line with its objectives, this research reports new information about markers and candidate genes related to reproductive traits in Iranian Holstein dairy cattle. The markers and candidate genes identified here can be used in genomic selection to improve the reproductive traits of Holstein dairy cattle.

Keywords [English]

  • Genotype
  • Machine learning
  • Marker
  • Random Forest