Haiyang Xu
Alibaba Group, DIDI AI LABS, SEU
Verified email at seu.edu.cn
Title
Cited by
Year
mPLUG-Owl: Modularization empowers large language models with multimodality
Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...
arXiv preprint arXiv:2304.14178, 2023
Cited by: 345 | Year: 2023
Learning alignment for multimodal emotion recognition from speech
H Xu, H Zhang, K Han, Y Wang, Y Peng, X Li
InterSpeech 2019, 2019
Cited by: 155 | Year: 2019
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
H Xu, M Yan, C Li, B Bi, S Huang, W Xiao, F Huang
ACL 2021, Oral, 2021
Cited by: 100 | Year: 2021
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
C Li, H Xu, J Tian, W Wang, M Yan, ...
EMNLP2022, 2022
Cited by: 85* | Year: 2022
Neural Topic Modeling with Bidirectional Adversarial Training
R Wang, X Hu, D Zhou, Y He, Y Xiong, C Ye, H Xu
ACL 2020, 2020
Cited by: 78 | Year: 2020
mPLUG-2: A modularized multi-modal foundation model across text, image and video
H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li
ICML2023 3, 2023
Cited by: 62 | Year: 2023
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Q Ye, H Xu, J Ye, M Yan, H Liu, Q Qian, J Zhang, F Huang, J Zhou
CVPR2024, 2023
Cited by: 53 | Year: 2023
HiTeA: Hierarchical temporal-aware video-language pre-training
Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang
ICCV2023, 2022
Cited by: 41 | Year: 2022
Evaluation and analysis of hallucination in large vision-language models
J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ...
arXiv preprint arXiv:2308.15126, 2023
Cited by: 34 | Year: 2023
mPLUG-DocOwl: Modularized multimodal large language model for document understanding
J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...
arXiv preprint arXiv:2307.02499, 2023
Cited by: 34 | Year: 2023
An unsupervised Bayesian modelling approach for storyline detection on news articles
D Zhou, H Xu, Y He
EMNLP 2015, 1943-1948, 2015
Cited by: 29 | Year: 2015
mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections
C Li, H Xu, J Tian, W Wang, M Yan
arXiv preprint arXiv:2205.12005 1 (2), 2022
Cited by: 27 | Year: 2022
Unsupervised Storyline Extraction from News Articles
D Zhou, H Xu, XY Dai, Y He
IJCAI 2016, 3014-3021, 2016
Cited by: 25 | Year: 2016
SemVLP: Vision-language pre-training by aligning semantics at multiple levels
C Li, M Yan, H Xu, F Luo, W Wang
arXiv preprint arXiv:2103.07829 3, 2021
Cited by: 23 | Year: 2021
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...
EMNLP2023, 2023
Cited by: 19 | Year: 2023
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Y Shi, X Yang, H Xu, C Yuan, B Li, W Hu, ZJ Zha
CVPR2022, 2021
Cited by: 17 | Year: 2021
mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections
C Li, H Xu, J Tian, W Wang, M Yan
arXiv preprint arXiv:2205.12005, 2022
Cited by: 14 | Year: 2022
mPLUG-2: A modularized multi-modal foundation model across text, image and video
H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li
International Conference on Machine Learning, ICML, 23-29, 2023
Cited by: 12 | Year: 2023
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation
J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, H Xu, M Yan, J Zhang, ...
arXiv preprint arXiv:2311.07397, 2023
Cited by: 11 | Year: 2023
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
C Li, H Chen, M Yan, W Shen, H Xu, Z Wu, Z Zhang, W Zhou, Y Chen, ...
EMNLP2023, 2023
Cited by: 10 | Year: 2023