Image2Point: 3D point-cloud understanding with 2D image pretrained models C Xu, S Yang, T Galanti, B Wu, X Yue, B Zhai, W Zhan, P Vajda, K Keutzer, ... European Conference on Computer Vision, 638-656, 2022 | 81 | 2022 |
Multitask vision-language prompt tuning S Shen, S Yang, T Zhang, B Zhai, JE Gonzalez, K Keutzer, T Darrell Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024 | 41 | 2024 |
Squeezewave: Extremely lightweight vocoders for on-device speech synthesis B Zhai, T Gao, F Xue, D Rothchild, B Wu, JE Gonzalez, K Keutzer arXiv preprint arXiv:2001.05685, 2020 | 36 | 2020 |
Exploring the reasoning abilities of multimodal large language models (MLLMs): A comprehensive survey on emerging trends in multimodal reasoning Y Wang, W Chen, X Han, X Lin, H Zhao, Y Liu, B Zhai, J Yuan, Q You, ... arXiv preprint arXiv:2401.06805, 2024 | 31* | 2024 |
HallE-Switch: Controlling Object Hallucination in Large Vision Language Models B Zhai, S Yang, C Xu, S Shen, K Keutzer, M Li arXiv preprint arXiv:2310.01779, 2023 | 28* | 2023 |
You only group once: Efficient point-cloud processing with token representation and relation inference module C Xu, B Zhai, B Wu, T Li, W Zhan, P Vajda, K Keutzer, M Tomizuka 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2021 | 27 | 2021 |
Integer-only zero-shot quantization for efficient speech recognition S Kim, A Gholami, Z Yao, N Lee, P Wang, A Nrusimha, B Zhai, T Gao, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 23 | 2022 |
Law of Vision Representation in MLLMs S Yang, B Zhai, Q You, J Yuan, H Yang, C Xu arXiv preprint arXiv:2408.16357, 2024 | 1 | 2024 |
COCO is "ALL" You Need for Visual Instruction Fine-tuning X Han, Y Wang, B Zhai, Q You, H Yang arXiv preprint arXiv:2401.08968, 2024 | 1 | 2024 |
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models X Han, Q You, Y Liu, W Chen, H Zheng, K Mrini, X Lin, Y Wang, B Zhai, ... arXiv preprint arXiv:2311.11567, 2023 | 1 | 2023 |
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao, ... Findings of the Association for Computational Linguistics ACL 2024, 485-492, 2024 | | 2024 |
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding H Liu, Q You, X Han, Y Wang, B Zhai, Y Liu, Y Tao, H Huang, R He, ... arXiv preprint arXiv:2403.01487, 2024 | | 2024 |