Representative Publications
                                    
                                    
*Corresponding author ▽Co-first author (For the full publication list, please refer to my homepage)
N. Lin▽, T. Ohkawa▽, Y. Huang*, M. Zhang, M. Li, M. Cai*, R. Furuta, and Y. Sato, "SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training," International Conference on Learning Representations (ICLR), 2025.
H. Huang, H. Yu, D. Liu, H. Chen*, and M. Cai*, "Egocentric Speaker Diarization with Vision-Guided Clustering and Adaptive Speech Re-detection," International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025.
M. Cai, J. Kezierbieke, X. Zhong, and H. Chen, "Uncertainty-Aware and Class-Balanced Domain Adaptation for Object Detection in Driving Scenes," IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2024. (SCI, CAS Zone 1, IF: 8.5)
X. Ren, J. Luo, X. Zhong, and M. Cai*, "Emotion-aware audio-driven face animation via contrastive feature disentanglement," INTERSPEECH (international speech conference), 2023.
G. Duan, Y. Fu, M. Cai, H. Chen, and J. Sun, "Dongting: a large-scale dataset for anomaly detection of the Linux kernel," The Journal of Systems & Software (JSS), 2023. (SCI, CAS Zone 2, IF: 3.5)
C. Xue, X. Zhong, M. Cai*, H. Chen, and W. Wang, "Audio-visual event localization by learning spatial and semantic co-attention," IEEE Transactions on Multimedia (TMM), vol. 25, pp. 418–429, 2023. (SCI, CAS Zone 1, IF: 7.3)
H. Yu▽, M. Cai▽, Y. Liu, and F. Lu, "First and third-person video co-analysis by learning spatial-temporal joint attention," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 6, pp. 6631–6646, 2023. (CCF A & SCI, CAS Zone 1, IF: 17.861)
M. Cai, F. Lu, and Y. Sato, "Generalizing hand segmentation in egocentric videos with uncertainty-guided model adaptation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. (CCF A, acceptance rate: 22%)
Y. Huang, M. Cai*, Z. Li, F. Lu, and Y. Sato, "Mutual context network for jointly estimating egocentric gaze and actions," IEEE Transactions on Image Processing (TIP), DOI:10.1109/TIP.2020.3007841, 2020. (CCF A & SCI, CAS Zone 1, IF: 6.79)
Y. Huang, M. Cai*, and Y. Sato, "An ego-vision system for discovering human joint attention," IEEE Transactions on Human-Machine Systems (THMS), DOI:10.1109/THMS.2020.2965429, 2020. (CCF B, IF: 3.332)
H. Yu▽, M. Cai▽, Y. Liu, and F. Lu, "What I see is what you see: joint attention learning for first and third person video co-analysis," ACM International Conference on Multimedia (MM), 2019. (CCF A, acceptance rate: 26.8%)
Y. Huang, M. Cai*, Z. Li, and Y. Sato, "Predicting gaze in egocentric videos by learning task-dependent attention transition," European Conference on Computer Vision (ECCV), 2018. (CCF B, acceptance rate: 2.4% [oral])
M. Cai, F. Lu, and Y. Gao, "Desktop action recognition from first-person point-of-view," IEEE Transactions on Cybernetics (TCYB), DOI:10.1109/TCYB.2018.2806381, 2018. (SCI, CAS Zone 1, IF: 8.803)
M. Cai, K. Kitani, and Y. Sato, "An ego-vision system for hand grasp analysis," IEEE Transactions on Human-Machine Systems (THMS), vol. 47, no. 4, pp. 524–535, 2017. (CCF B, IF: 2.563)
M. Cai, K. Kitani, and Y. Sato, "Understanding hand-object manipulation with grasp types and object attributes," Robotics: Science and Systems (RSS), 2016. (top-tier robotics conference, acceptance rate: 20%)
M. Cai, K. Kitani, and Y. Sato, "A scalable approach for understanding the visual structures of hand grasps," IEEE International Conference on Robotics and Automation (ICRA), 2015. (CCF B)
                                     
                                    
Research Funding
                                    
                                    
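Government-funded projects:
2024–2027, "First-Person Interaction Behavior Perception and Understanding for Embodied Intelligence," National Natural Science Foundation of China, General Program (PI)
2022–2024, "Analysis and Understanding of First-Person Video," Hunan Provincial Science Fund for Excellent Young Scholars (PI)
2020–2022, "Key Technologies for Automatic Analysis and Understanding of Hand Manipulation Activities Based on First-Person Video," National Natural Science Foundation of China, Young Scientists Fund (PI)
2020–2022, "Scalable Methods for Analyzing and Understanding Videos of Daily Hand-Object Interaction," Hunan Provincial Youth Science Fund (PI)
2020–2021, "Eye Gaze Modeling and Applications Based on First-Person Video," State Key Laboratory Open Project (PI)
2018–2022, "Research and Applications of First-Person Video Saliency Based on an Attention Transition Mechanism," Research Start-up Funds for Central Universities (PI)
Industry-funded projects:
Apr. 2020–Dec. 2020, "Modeling and Applications of User Eye Gaze Based on Mobile Cameras," Baidu Open-Topic Research Project (PI)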
                                     
                                    
Granted Invention Patents
                                    
                                    
Courses Taught
                                    
                                    
Student Supervision
                                    
                                    
Graduate students:
Ke Zhang 张轲 (PhD, 2021~)
Fei Chen 陈飞 (PhD, 2024~)
Feiyi Huang 黄非毅 (PhD, 2024~) [co-supervised]
He Huang 黄河 (Master, 2022~)
Jinwen Liu 刘婧雯 (Master, 2022~)
Shiming Chen 陈诗铭 (Master, 2022~)
Qi Jin 金琦 (Master, 2023~)
Yaru Zhao 赵娅汝 (Master, 2024~)
Fukun Chen 陈付坤 (Master, 2024~)
Yijie Huang 黄艺杰 (Master, 2024~)
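[Students with strong programming skills and an interest in computer vision research are welcome to apply] (The group currently has several openings for Master's students; please get in touch by email.)
1. The lab currently pursues both academic research and applied research on digitalization and intelligent systems for the economic sector
2. Students receive research stipends as far as available funding allows
3. Students who publish a paper at a conference ranked CCF B or above are funded to attend the conference; students who publish a paper in a journal ranked SCI Zone 2 or above receive a material reward
4. Outstanding students who meet item 3 may be recommended for PhD study in closely collaborating research groups at the University of Tokyo or Beihang University
Alumni:
Class of 2024: 贾那热斯·克孜尔别克 (Industrial and Commercial Bank of China, Beijing), 温平平 (Meituan, Beijing)
Class of 2023: 任新 (ZTE, Wuhan), 戴艺晨 (pursuing a PhD)
Class of 2022: 余冶 (Huawei, Hangzhou), 罗敏怡 (State Grid, Hengyang), 薛冰洁 (China Unicom, Zhengzhou), 薛程 [co-supervised] (Alibaba, Hangzhou)
Undergraduate thesis:
Students interested in computer vision are welcome to sign up. When contacting me by email, please attach your CV and describe your destination or plans after graduation.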
                                     
                                    
Academic Service