All-Russian Research Institute of Animal Physiology, Biochemistry and Nutrition, a branch of the Federal State Budgetary Scientific Institution "Federal Science Center for Animal Husbandry - VIZh named after Academician L.K. Ernst"
This article reviews modern methods of computer-based data analysis applied to a wide range of problems in modern productive animal husbandry. Some of these problems, discussed in an earlier publication by the authors, include detecting clinical mastitis in automatic milking, predicting the outcome of treatment for respiratory diseases, assessing individual productive value and fertility, supporting culling decisions for cows, and predicting various production traits from genomic evaluations. Optimizing herd management on the basis of such estimates is a promising way to improve the efficiency of modern animal husbandry. The article describes the main methods for analyzing empirical data. It briefly covers methods for constructing regression relationships: the least squares method, models that account for random effects, and Bayesian and ridge regression. Partial regression, kernel regression, and multivariate adaptive nonparametric spline regression are also described. The classification methods considered are classical discriminant analysis, logistic regression, naive Bayes, k-nearest neighbors, the support vector machine, artificial neural networks, decision trees, and random forests. The appendix lists the main metrics used to characterize the quality of solutions to classification and regression problems: precision, recall, the F1 score, sensitivity, specificity, the ROC curve, and the area under the ROC curve. The material will be useful to specialists in a broad range of fields who are interested in applying modern methods for the analysis and interpretation of experimental data.
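As an illustration of the classification metrics named above, the following is a minimal Python sketch that computes precision, recall (sensitivity), specificity, and the F1 score from binary labels using their standard definitions. The function name `classification_metrics` and the example labels are hypothetical and are not taken from the article.

```python
def classification_metrics(y_true, y_pred):
    """Compute standard binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # same as sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical labels: 1 = animal with the condition, 0 = healthy animal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75. The ROC curve and the area under it require predicted scores rather than hard labels and are not covered by this sketch.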
1. Abdollahi-Arpanahi R., Morota G., Peñagaricano F. Predicting bull fertility using genomic data and biological information. J. Dairy Sci. 2017, 100(1): 9656-9666.
2. Adamczyk K., Zaborski D., Grzesiak W., Makulska J., Jagusiak W. Recognition of culling reasons in Polish dairy cows using data mining methods. Computers and electronics in agriculture. 2016, 127: 26-37.
3. Aguilar I., Misztal I., Tsuruta S., Wiggans G.R., Lawlor T.J. Multiple trait genomic evaluation of conception rate in Holsteins. J. Dairy Sci. 2011, 94(5): 2621-2624.
4. Alonso J., Villa A., Bahamonde A. Improved estimation of bovine weight trajectories using Support Vector Machine Classification. Computers and electronics in agriculture. 2015, 110: 36-41.
5. Amrine D.E., White B.J., Larson R.L. Comparison of classification algorithms to predict outcomes of feedlot cattle identified and treated for bovine respiratory disease. Computers and electronics in agriculture. 2014, 105: 9-19.
6. Ankinakatte S., Norberg E., Løvendahl P., Edwards D., Højsgaard S. Predicting mastitis in dairy cows using neural networks and generalized additive models: A comparison. Computers and electronics in agriculture. 2013, 99: 1-6.
7. Borchers M.R., Chang Y.M., Proudfoot K.L., Wadsworth B.A., Stone A.E., Bewley J.M. Machine-learning-based calving prediction from activity, lying, and ruminating behaviors in dairy cattle. J. Dairy Sci. 2017, 100(7): 5664-5674.
8. Brügemann K., Gernand E., von Borstel U.U., König S. Genetic analyses of protein yield in dairy cows applying random regression models with time-dependent and temperature x humidity-dependent covariate. J. Dairy Sci. 2011, 94(8): 4129-4139.
9. Cao K.L., Rossouw D., Robert-Granié C. A sparse PLS for variable selection when integrating omics data. Statistical Applications in Genetics and Molecular Biology. 2008, 7(1): 35.
10. Cherepanov G.G., Kharitonov E.L., Makar Z.N., Mikhal’skii A.I., Novosel’tseva Zh.A. [An analysis of possible approaches to overcome the antagonism between the level of productivity and the viability of the breeding stock by using intensive technologies]. Problemy biologii productivnykh zhivotnykh - Problems of Productive Animal Biology. 2017, 1: 5-27. (In Russian).
11. Chun H., Keleş S. Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics. 2009, 182: 79-90.
12. Colombani C., Legarra A., Fritz S., Guillaume F., Croiseau P., Ducrocq V., Robert-Granié C. Application of Bayesian least absolute shrinkage and selection operator (LASSO) and BayesCπ methods for genomic selection in French Holstein and Montbéliarde breeds. J. Dairy Sci. 2013, 96(1): 575-591.
13. Craninx M., Fievez V., Vlaeminck B., De Baets B. Artificial neural network models of the rumen fermentation pattern in dairy cattle. Computers and electronics in agriculture. 2008, 60(2): 226-238.
14. De Sousa R.V., da Silva Rodrigues A.V., de Abreu M.G., Tabile R.A., Martello L.S. Predictive model based on artificial neural network for assessing beef cattle thermal stress using weather and physiological variables. Computers and electronics in agriculture. 2018, 144: 37-43.
15. Dhakal K., Tiezzi F., Clay J.S., Maltecca C. Inferring causal relationships between reproductive and metabolic health disorders and production traits in first-lactation US Holsteins using recursive models. J. Dairy Sci. 2015, 98(4): 2713-2726.
16. Dutta R., Smith D., Rawnsley R., Bishop-Hurley G., Hills J., Timms G., Heanry D. Dynamic cattle behavioural classification using supervised ensemble classifiers. Computers and electronics in agriculture. 2015, 111: 18-28.
17. Ferragina A., de los Campos G., Vazquez A.I., Cecchinato A., Bittante G. Bayesian regression models outperform partial least squares methods for predicting milk components and technological properties using infrared spectral data. J. Dairy Sci. 2015, 98(11): 8133-8151.
18. Fisher R.A. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936, 7: 179-188.
19. Flores H., Meneses C., Villalobos J.R., Sanchez O. Improvement of feedlot operations through statistical learning and business analytics tools. Computers and electronics in agriculture. 2017, 143: 273-285.
20. Friedman J.H. Multivariate adaptive regression splines. The Annals of Statistics. 1991, 19(1): 1-141.
21. Gianola D., van Kaam J.B. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics. 2008, 178(4): 2289-2303.
22. Gianola D. Priors in whole-genome regression: The Bayesian alphabet returns. Genetics. 2013, 194: 573-596.
23. González L.A., Bishop-Hurley G.J., Handcock R.N., Crossman C. Behavioral classification of data from collars containing motion sensors in grazing cattle. Computers and electronics in agriculture. 2015, 110: 91-102.
24. Grzesiak W., Błaszczyk P., Lacroix R. Methods of predicting milk yield in dairy cows – Predictive capabilities of Wood's lactation curve and artificial neural networks (ANNs). Computers and electronics in agriculture. 2006, 54(2): 69-83.
25. Grzesiak W., Zaborski D., Sablik P., Żukiewicz A., Dybus A., Szatkowska I. Detection of cows with insemination problems using selected classification models. Computers and electronics in agriculture. 2010, 74(2): 265-273.
26. Habier D., Fernando R.L., Kizilkaya K., Garrick D.J. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011, 12: 186.
27. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer Series in Statistics. 2016, 764 pp.
28. Hempstalk K., McParland S., Berry D.P. Machine learning algorithms for the prediction of conception success to a given insemination in lactating dairy cows. J. Dairy Sci. 2015, 98(8): 5262-5273.
29. Jiménez-Montero J.A., González-Recio O., Alenda R. Comparison of methods for the implementation of genome-assisted evaluation of Spanish dairy cattle. J. Dairy Sci. 2013, 96(1): 625-634.
30. Kamphuis C., Mollenhorst H., Feelders A., Pietersma D., Hogeveen H. Decision-tree induction to detect clinical mastitis with automatic milking. Computers and electronics in agriculture. 2010, 70(1): 60-68.
31. Kuznetsov V.M. [BLUP genetic assessment of dairy cattle]. Zootekhniya - Zootechnics. 1995, 11: 8-15. (In Russian).
32. Kuznetsov V.M. [In silico study of expanded reproduction in a closed dairy cattle breeding]. Problemy biologii productivnykh zhivotnykh - Problems of Productive Animal Biology. 2018, 3: 54-86.
33. Li B., Zhang N., Wang Y.-G., George A.W., Reverter A., Li Y. Genomic prediction of breeding values using a subset of SNPs identified by three machine learning methods. Frontiers in Genetics. 2018, 9: 237. doi: 10.3389/fgene.2018.00237.
34. Li W., Ji Z., Wang L., Sun C., Yang X. Automatic individual identification of Holstein dairy cows using tailhead images. Computers and electronics in agriculture. 2017, 142(B): 622-631.
35. Lourenco D.A.L., Misztal I., Tsuruta S., Aguilar I., Ezra E., Ron M., Shirak A., Weller J.I. Methods for genomic evaluation of a relatively small genotyped dairy population and effect of genotyped cow information in multiparity analyses. J. Dairy Sci. 2014, 97(3): 1742-1752.
36. Ma P., Lund M.S., Nielsen U.S., Aamand G.P., Su G. Single-step genomic model improved reliability and reduced the bias of genomic predictions in Danish Jersey. J. Dairy Sci. 2015, 98(12): 9026-9034.
37. Manafiazar G.G., McFadden T.T., Goonewardene L.L., Okine E.E., Basarab J.J., Li P.P., Wang Z.Z. Prediction of residual feed intake for first-lactation dairy cows using orthogonal polynomial random regression. J. Dairy Sci. 2013, 96(12): 7991-8001.
38. McQueen R.J., Garner S.R., Nevill-Manning C.G., Witten I.H. Applying machine learning to agricultural data. Computers and electronics in agriculture. 1995, 12(4): 275-293.
39. Merkov A.B. Raspoznavanie obrazov. Vvedenie v metody mashinnogo obucheniya [Pattern recognition. Introduction to statistical learning methods]. Moscow: Editorial URSS, 2011, 250 p. (In Russian).
40. Merkov A.B. Raspoznavanie obrazov. Obuchenie i postroenie stokhasticheskikh modelei [Pattern recognition. Training and building stochastic models]. Moscow: LENAND, 2014, 240 p. (In Russian).
41. Meuwissen T.H.E., Hayes B.J., Goddard M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157(4): 1819-1829.
42. Mikhailenko I.M. [Life cycle management of lactating cows on the basis of probabilistic-statistical and dynamic models]. Sel’skokhozyaistvennaya biologiya - Agricultural Biology. 2015, 50(4): 467-475. (In Russian).
43. Mikhalskii A.I., Novoseltseva Zh.A. [Application of machine learning methods in solving problems of productive animal husbandry]. Problemy biologii productivnykh zhivotnykh - Problems of Productive Animal Biology. 2018, 4: 98-109. (In Russian).
44. Mitchell R.S., Sherlock R.A., Smith L.A. An investigation into the use of machine learning for determining oestrus in cows. Computers and electronics in agriculture. 1996, 15(3): 195-213.
45. Nadimi E.S., Jørgensen R.N., Blanes-Vidal V., Christensen S. Monitoring and classifying animal behavior using ZigBee-based mobile ad hoc wireless sensor networks and artificial neural networks. Computers and electronics in agriculture. 2012, 82: 44-54.
46. Pérez-Elizalde S., Cuevas J., Pérez-Rodríguez P., Crossa J. Selection of the Bandwidth Parameter in a Bayesian Kernel Regression Model for Genomic-Enabled Prediction. Journal of Agricultural, Biological, and Environmental Statistics. 2015, 20(4): 512-532.
47. Pietersma D., Lacroix R., Lefebvre R.D., Mwade K. Induction and evaluation of decision trees for lactation curve analysis. Computers and electronics in agriculture. 2003, 38(1): 19-32.
48. Pintus M.A., Gaspa G., Nicolazzi E.L., Vicario D., Rossoni A., Ajmone-Marsan P., Nardone A., Dimauro C., Macciotta N.P.P. Prediction of genomic breeding values for dairy traits in Italian Brown and Simmental bulls using a principal component approach. J. Dairy Sci. 2012, 95(6): 3390-3400.
49. Pinzón-Sánchez C., Cabrera V.E., Ruegg P.L. Decision tree analysis of treatment strategies for mild and moderate cases of clinical mastitis occurring in early lactation. J. Dairy Sci. 2011, 94(4): 1873-1892.
50. Pryce J.E., Arias J., Bowman P.J., Davis S.R., Macdonald K.A., Waghorn G.C., Wales W.J., Williams Y.J., Spelman R.J., Hayes B.J. Accuracy of genomic predictions of residual feed intake and 250-day body weight in growing heifers using 625,000 single nucleotide polymorphism markers. J. Dairy Sci. 2012, 95(4): 2108-2119.
51. Rodriguez J.J., Kuncheva L.I., Alonso C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28: 1619-1630.
52. Salehi F., Lacroix R., Wade K.M. Development of neuro-fuzzifiers for qualitative analyses of milk yield. Computers and electronics in agriculture. 2000, 28(3): 171-186.
53. Sanzogni L., Kerr D. Milk production estimates using feed forward artificial neural networks. Computers and electronics in agriculture. 2001, 32(1): 21-30.
54. Sasaki O., Aihara M., Nishiura A., Takeda H., Satoh M. Genetic analysis of the cumulative pseudo-survival rate during lactation of Holstein cattle in Japan by using random regression models. J. Dairy Sci. 2015, 98(8): 5781-5795.
55. Savegnago R.P., Rosa G.J.M., Valente B.D., Herrera L.G.G., Carneiro R.L.R., Sesana R.C., Faro L.E., Munari D.P. Estimates of genetic parameters and eigenvector indices for milk production of Holstein cows. J. Dairy Sci. 2013, 96(11): 7284-7293.
56. Shahinfar S., Page D., Guenther J., Cabrera V., Fricke P., Weigel K. Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms. J. Dairy Sci. 2014, 97(2): 731-742.
57. Shalev-Shwartz S., Ben-David S. Understanding machine learning: from theory to algorithms. Cambridge University Press, 2014, 449 p.
58. Steeneveld W., van der Gaag L.C., Barkema H.W., Hogeveen H. Simplify the interpretation of alert lists for clinical mastitis in automatic milking systems. Computers and electronics in agriculture. 2010, 71(1): 50-56.
59. van Pelt M.L., Meuwissen T.H.E., de Jong G., Veerkamp R.F. Genetic analysis of longevity in Dutch dairy cattle using random regression. J. Dairy Sci. 2015, 98(6): 4117-4130.
60. Vapnik V.N. Vosstanovlenie zavisimostei na osnove empiricheskikh dannykh [Dependencies reconstruction based on empirical data]. Moscow: Nauka, 1979, 449 p. (In Russian).
61. Vapnik V.N. The nature of statistical learning theory. Springer, 2000, 311 p.
62. Viugin V.V. Matematicheskie osnovy teorii mashinnogo obucheniya i prognozirovaniya [Mathematical foundations of machine learning and prediction theory]. Moscow: MCNMO, 2013, 390 p. (In Russian).
63. Yao C., Spurlock D.M., Armentano L.E., Page Jr C.D., VandeHaar M.J., Bickhart D.M., Weigel K.A. Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle. J. Dairy Sci. 2013, 96(10): 6716-6729.
64. Zhang F., Murphy M.D., Shalloo L., Ruelle E., Upton J. An automatic model configuration and optimization system for milk production forecasting. Computers and electronics in agriculture. 2016, 128: 100-111.