Giant panda pose estimation method based on high resolution net

doi:10.16829/j.slxb.150639

Abstract

Long-term behavioral monitoring of captive giant pandas (Ailuropoda melanoleuca) helps animal managers track each animal's physiological cycle and health status in a timely manner, and helps breeding facilities respond quickly with appropriate husbandry measures that improve breeding management. At present, however, neither animal managers nor scientists can monitor giant pandas 24 hours a day and obtain the corresponding behavioral information promptly. Accurate animal pose estimation is central to animal behavior research and underlies many downstream applications, so knowing the pose of giant pandas can advance behavioral research and improve conservation and management. To improve the accuracy of giant panda pose estimation in complex environments, this paper proposes a pose estimation method that uses the high-resolution network (High resolution net, HRNet), specifically HRNet-32, as its base architecture. To address the large differences in scale among the body parts of giant pandas, an atrous spatial pyramid pooling (ASPP) module is introduced into HRNet-32. The module applies dilated convolutions with different dilation rates in a pyramid-like arrangement, capturing multi-scale information while enlarging the receptive field of the features. In addition, giant panda pose estimation is treated as a homogeneous multi-task learning problem: the joint points of the giant panda are grouped, and a part-based multi-branch structure is introduced to learn representations specific to each part group. Several comparison experiments show that the proposed model has high detection accuracy, reaching 81.51% on PCK@0.05. The method can provide technical support for the behavioral analysis and health assessment of giant pandas.

Key words: Giant panda, Pose estimation, Image analysis, Deep learning

CLC number: Q958.1

Yu QI, Han SU, Rong HOU, Peng LIU, Peng CHEN, Hangxing ZANG, Zhihe ZHANG. Giant panda pose estimation method based on high resolution net[J]. ACTA THERIOLOGICA SINICA, 2022, 42(4): 451-460.

Figures and tables (10)

Fig.1 Frames extracted from giant panda videos. a - c: usable data samples; d - f: unusable data samples

Fig.2 Annotated joint points (keypoints) of the giant panda. 1: right ear; 2: left ear; 3: nose; 4: neck; 5: back; 6: hip; 7: right shoulder; 8: right elbow; 9: right front paw; 10: left shoulder; 11: left elbow; 12: left front paw; 13: right hip; 14: right knee; 15: right hind paw; 16: left hip; 17: left knee; 18: left hind paw; 19: the giant panda target box
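The keypoint numbering in the Fig.2 caption can be written down as a small lookup structure. The sketch below is illustrative only: the names `PANDA_KEYPOINTS`, `keypoint_index`, and `left_right_pairs` are hypothetical, and the left/right pairing (the kind of table a horizontal-flip augmentation would need) is an assumption, since this page does not say whether the authors used flipping.

```python
# Keypoint names in the order given by the Fig.2 caption.
# The caption is 1-based; list positions here are 0-based.
# All identifiers are illustrative, not taken from the paper's code.
PANDA_KEYPOINTS = [
    "right_ear", "left_ear", "nose", "neck", "back", "hip",
    "right_shoulder", "right_elbow", "right_front_paw",
    "left_shoulder", "left_elbow", "left_front_paw",
    "right_hip", "right_knee", "right_hind_paw",
    "left_hip", "left_knee", "left_hind_paw",
]


def keypoint_index(name):
    """Return the 0-based index of a keypoint name."""
    return PANDA_KEYPOINTS.index(name)


def left_right_pairs():
    """Pair each right_* keypoint with its left_* counterpart.

    Assumption: such pairs would be swapped if an image were mirrored.
    """
    pairs = []
    for i, name in enumerate(PANDA_KEYPOINTS):
        if name.startswith("right_"):
            mirror = "left_" + name[len("right_"):]
            pairs.append((i, PANDA_KEYPOINTS.index(mirror)))
    return pairs
```

Item 19 in the caption is the target box rather than a joint point, so it is left out of the list.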

Fig.3 The proposed giant panda pose estimation framework. The first stage learns a shared feature representation; the second stage uses a multi-branch structure to learn part-specific high-level feature representations

Fig.4 HRNet network structure (Sun et al., 2019)

Fig.5 Diagram of ASPP module (Chen et al., 2017)
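The idea behind the ASPP module in Fig.5, parallel dilated convolutions with different dilation rates whose outputs are combined, can be illustrated in one dimension with plain Python. This is a simplified sketch, not the authors' implementation: the function names and the rates `(1, 2, 4)` are assumptions chosen for illustration, and a full ASPP module operates on 2-D feature maps and typically includes further branches and a fusing convolution that are omitted here.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def dilated_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation).

    Assumes an odd-length kernel so that the output aligns with the input.
    """
    half = (len(kernel) // 2) * rate
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        out.append(sum(w * padded[i + j * rate] for j, w in enumerate(kernel)))
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run one dilated branch per rate and stack the results per position.

    Each output element holds one value per branch, mimicking channel-wise
    concatenation of the parallel branches.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [list(values) for values in zip(*branches)]
```

With a 3-tap kernel, a dilation rate of 6 covers a span of 13 samples without adding any weights, which is how larger rates widen the receptive field while smaller rates keep fine detail.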

Fig.6 Grouping of giant panda joint points. Dotted frames divide the joint points into 5 groups, and joint points in the same group share the same color

Table 1 Comparison of giant panda pose estimation results across models (PCK@0.05 accuracy on the panda dataset, %)

Method	Ear	Nose	Trunk	Legs	Mean
8-Stack HG	93.99	97.50	68.22	71.84	74.12
Simple Baseline	97.39	98.18	74.39	78.10	80.00
HRNet-32	96.97	98.75	75.63	78.72	80.31
This study	98.38	98.18	75.84	79.84	81.51
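Table 1 reports PCK@0.05, the percentage of correct keypoints at a threshold of 0.05. A minimal sketch of how such a score is commonly computed follows. The normalizing length is an assumption: this page does not state what the authors normalize by, so the hypothetical `bbox_size` argument stands in for whatever reference length is used (for example, the longer side of the target box). The function name and the `visible` mask are likewise illustrative.

```python
import math


def pck(pred, gt, bbox_size, threshold=0.05, visible=None):
    """Fraction of keypoints predicted within threshold * bbox_size of ground truth.

    pred, gt   : sequences of (x, y) coordinates, one pair per keypoint
    bbox_size  : reference length used for normalization (assumed here)
    visible    : optional sequence of booleans; unlabeled keypoints are skipped
    """
    limit = threshold * bbox_size
    hits = 0
    total = 0
    for i, ((px, py), (gx, gy)) in enumerate(zip(pred, gt)):
        if visible is not None and not visible[i]:
            continue
        total += 1
        if math.hypot(px - gx, py - gy) <= limit:
            hits += 1
    return hits / total if total else 0.0
```

For a reference length of 100 pixels, a prediction counts as correct only if it falls within 5 pixels of the annotation, which makes PCK@0.05 a fairly strict criterion.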

Fig.7 Example predictions of giant panda pose estimation. The first three columns show the poses predicted by the comparison models, the fourth column shows the predictions of the model proposed in this paper, and the last column shows the ground truth

Table 2 Results of the ablation experiment for giant panda pose estimation (PCK@0.05 accuracy on the panda dataset, %)

Method	Ear	Nose	Trunk	Legs	Mean
HRNet-32	96.97	98.75	75.63	78.72	80.31
HRNet-32 + Multi-Branches	97.34	98.86	74.39	79.58	80.75
HRNet-32 + ASPP + Multi-Branches	98.38	98.18	75.84	79.84	81.51
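The ablation in Table 2 is easier to read as differences from the HRNet-32 baseline. The short sketch below does that arithmetic on the values copied from the table; the dictionary layout and the function name `gains_over_baseline` are illustrative choices, not part of the paper.

```python
# PCK@0.05 values (%) copied from Table 2.
COLUMNS = ("ear", "nose", "trunk", "legs", "mean")
ABLATION = {
    "HRNet-32": (96.97, 98.75, 75.63, 78.72, 80.31),
    "HRNet-32 + Multi-Branches": (97.34, 98.86, 74.39, 79.58, 80.75),
    "HRNet-32 + ASPP + Multi-Branches": (98.38, 98.18, 75.84, 79.84, 81.51),
}


def gains_over_baseline(results, baseline="HRNet-32"):
    """Per-column difference (percentage points) of each variant from the baseline."""
    base = results[baseline]
    gains = {}
    for name, scores in results.items():
        if name == baseline:
            continue
        gains[name] = {
            col: round(score - ref, 2)
            for col, score, ref in zip(COLUMNS, scores, base)
        }
    return gains
```

Read this way, the multi-branch structure alone raises the mean by 0.44 points while the trunk score drops by 1.24 points, and the full model raises the mean by 1.20 points with a 0.57-point drop on the nose.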

Fig.8 Giant panda pose estimation by the model in this study. a: model prediction with a good shooting angle and little occlusion; b: model prediction in a dark environment with severe self-occlusion

References

	Cao J K, Tang H Y, Fang H S, Shen X Y, Tai Y W, Lu C W. 2019. Cross-domain adaptation for animal pose estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). DOI: 10.1109/iccv.2019.00959 .
	Cao Z, Simon T, Wei S E, Sheikh Y. 2017. Realtime multi-person 2D pose estimation using part affinity fields. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/cvpr.2017.143 .
	Caruana R. 1997. Multitask learning. Machine Learning, 28 (1): 41-75.
	Chen L C, Papandreou G, Schroff F, Adam H. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv: .
	Chen X L, Zhu Y, Li Y D, Zhang G Q. 2016. The influence of human activities on the behavior of captive giant panda in summer at Mount Emei. Sichuan Journal of Zoology, 35 (5): 680-685. (in Chinese)
	Chen Y K, Song Y N, He J J, Xu R H, Huang X B. 2019. Animal pose estimation and state assessment based on deep learning. Electronics World, (5): 47-48. (in Chinese)
	Chen Y L, Wang Z C, Peng Y X, Zhang Z Q, Yu G, Sun J. 2018. Cascaded pyramid network for multi-person pose estimation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. DOI: 10.1109/cvpr.2018.00742 .
	Fang H S, Xie S Q, Tai Y W, Lu C W. 2017. RMPE: Regional multi-person pose estimation. 2017 IEEE International Conference on Computer Vision (ICCV). DOI: 10.1109/iccv.2017.256 .
	Feng L Q, Zhao Y Q, Sun Y C, Zhao W X, Tang J X. 2021. Action recognition using a spatial-temporal network for wild felines. Animals, 11 (2): 485.
	He K M, Zhang X Y, Ren S Q, Sun J. 2016. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/cvpr.2016.90 .
	He Q, Zhao Q J, Liu N, Chen P, Zhang Z H, Hou R. 2019. Distinguishing individual red pandas from their faces. Chinese Conference on Pattern Recognition and Computer Vision (PRCV). DOI: 10.1007/978-3-030-31723-2_61 .
	Hu P Y, Ramanan D. 2016. Bottom-up and top-down reasoning with hierarchical rectified gaussians. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/cvpr.2016.604 .
	Koolhaas J M, Van Reenen C G. 2016. Animal behavior and well-being symposium: Interaction between coping style/personality, stress, and welfare: Relevance for domestic farm animals. Journal of Animal Science, 94 (6): 2284-2296.
	Li C W, Wang C D, Yan B Y, Chen S J, Wu H L, Huang X Y, Jin S Y, Huang S, Chen L, Li D S. 2012. The determination of blood physiological indices for the giant panda. Acta Theriologica Sinica, 32 (3): 266-270. (in Chinese)
	Li C, Lee G H. 2021. From synthetic to real: Unsupervised domain adaptation for animal pose estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
	Li X Y, Cai C, Zhang R F, Ju L, He J R. 2019. Deep cascaded convolutional models for cattle pose estimation. Computers and Electronics in Agriculture, 164: 104885.
	Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L. 2014. Microsoft COCO: Common objects in context. European Conference on Computer Vision.
	Liu J, Chen Y, Guo L R, Gu B, Liu H, Hou A Y, Liu X F, Sun L X, Liu D Z. 2006. Stereotypic behavior and fecal cortisol level in captive giant pandas in relation to environmental enrichment. Zoo Biology, 25 (6): 445-459.
	Liu N, Zhao Q J, Zhang N, Cheng X H, Zhu J N. 2019. Pose-guided complementary features learning for Amur tiger re-identification. Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops.
	Mu J T, Qiu W C, Hager G D, Yuille A L. 2020. Learning from synthetic animals. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/cvpr42600.2020.01240 .
	Newell A, Yang K Y, Deng J. 2016. Stacked hourglass networks for human pose estimation. European Conference on Computer Vision. DOI: 10.1007/978-3-319-46484-8_29 .
	Olivas E S, Guerrero J D M, Martinez-Sober M, Magdalena-Benedito J R, Serrano L, eds. 2019. Handbook of research on machine learning applications and trends: Algorithms, methods, and techniques. IGI Global.
	Pishchulin L, Andriluka M, Gehler P, Schiele B. 2013. Poselet conditioned pictorial structures. 2013 IEEE Conference on Computer Vision and Pattern Recognition. DOI: 10.1109/cvpr.2013.82 .
	Pishchulin L, Insafutdinov E, Tang S Y, Andres B, Andriluka M, Gehler P, Schiele B. 2016. DeepCut: Joint subset partition and labeling for multi person pose estimation. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/cvpr.2016.533 .
	Ruder S. 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv: .
	Rushen J, Butterworth A, Swanson J C. 2011. Animal behavior and well-being symposium: Farm animal welfare assurance: Science and application. Journal of Animal Science, 89 (4): 1219-1228.
	Schofield D, Nagrani A, Zisserman A, Hayashi M, Matsuzawa T, Biro D, Carvalho S. 2019. Chimpanzee face recognition from videos in the wild using deep learning. Science Advances, 5 (9): eaaw0736.
	Sun K, Xiao B, Liu D, Wang J D. 2019. Deep high-resolution representation learning for human pose estimation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/cvpr.2019.00584 .
	Tang W, Wu Y. 2019. Does learning specific features for related parts help human pose estimation? 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/cvpr.2019.00120 .
	Tian Y D, Zitnick C L, Narasimhan S G. 2012. Exploring the spatial hierarchy of mixture models for human pose estimation. European Conference on Computer Vision. DOI: 10.1007/978-3-642-33715-4_19 .
	Toshev A, Szegedy C. 2014. DeepPose: Human pose estimation via deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
	Wang H N, Su H, Chen P, Hou R, Zhang Z H, Xie W Y. 2019. Learning deep features for giant panda gender classification using face images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops.
	Xiao B, Wu H P, Wei Y C. 2018. Simple baselines for human pose estimation and tracking. Proceedings of the European Conference on Computer Vision (ECCV). DOI: 10.1007/978-3-030-01231-1_29 .
	Yang W, Li S, Ouyang W L, Li H S, Wang X G. 2017. Learning feature pyramids for human pose estimation. Proceedings of the IEEE International Conference on Computer Vision. DOI: 10.1109/iccv.2017.144 .
	Yang Y, Jia J B, Zhao Y J. 2019. Effects of human factors on panda behavior in captivity. Journal of Northeast Forestry University, 47 (12): 104-106. (in Chinese)
	Yu H, Xu Y F, Zhang J, Zhao W, Guan Z Y, Tao D C. 2021. AP-10K: A benchmark for animal pose estimation in the wild. arXiv preprint arXiv: .
	Yue L H, Li J X, Liu Q S. 2021. Body parts relevance learning via expectation–maximization for human pose estimation. Multimedia Systems, 27 (5): 927-939.
	Zhang B W, Li M, Zhang Z J, Goossens B, Zhu L F, Zhang S N, Hu J C, Bruford M W, Wei F W. 2007. Genetic viability and population history of the giant panda, putting an end to the 'Evolutionary Dead End'? Molecular Biology and Evolution, 24 (8): 1801-1810.
	Zhang F Y, Wang M L, Wang Z C. 2021. Construction of the animal skeletons keypoint detection model based on transformer and scale fusion. Transactions of the Chinese Society of Agricultural Engineering, 37 (23): 179-185. (in Chinese)
	Zhou F X, Jiang Z H, Liu Z H, Chen F, Chen L, Tong L, Yang Z L, Wang H K, Fei M R, Li L, Zhou H Y. 2021. Structured context enhancement network for mouse pose estimation. IEEE Transactions on Circuits and Systems for Video Technology. DOI: 10.1109/tcsvt.2021.3098497 .
	Zhou J L, Ma G Q, Li Q S, Liu C H, Liu D D, Ye Y L, Wang X B. 2012. Exploitation on spring stereotypic behavior of giant pandas in wildlife park of high‑altitude region. Sichuan Journal of Zoology, 31 (3): 342-347. (in Chinese)
	Zhou X, Hu H P, Li T, Deng L H, Li R G. 2013. Behavior and conservation of giant panda. Green Technology, (2): 33-36. (in Chinese)
	Li K N. 2012. Effects of stockperson behavior on farm animal welfare and production performance. 中国动物保健, 14 (1): 4-8. (in Chinese)
	Chen Y, Wang Y F, Wei R P, Deng L H. 2019. Research progress on stress in captive giant pandas. 畜牧兽医科技信息, (2): 15-17. (in Chinese)
