Bohan Zhai
GenAI at Snowflake, UC Berkeley
Verified email at berkeley.edu
Title
Cited by
Year
Image2Point: 3D point-cloud understanding with 2D image pretrained models
C Xu, S Yang, T Galanti, B Wu, X Yue, B Zhai, W Zhan, P Vajda, K Keutzer, ...
European Conference on Computer Vision, 638-656, 2022
Cited by: 81; Year: 2022
Multitask vision-language prompt tuning
S Shen, S Yang, T Zhang, B Zhai, JE Gonzalez, K Keutzer, T Darrell
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024
Cited by: 41; Year: 2024
SqueezeWave: Extremely lightweight vocoders for on-device speech synthesis
B Zhai, T Gao, F Xue, D Rothchild, B Wu, JE Gonzalez, K Keutzer
arXiv preprint arXiv:2001.05685, 2020
Cited by: 36; Year: 2020
Exploring the reasoning abilities of multimodal large language models (MLLMs): A comprehensive survey on emerging trends in multimodal reasoning
Y Wang, W Chen, X Han, X Lin, H Zhao, Y Liu, B Zhai, J Yuan, Q You, ...
arXiv preprint arXiv:2401.06805, 2024
Cited by: 31*; Year: 2024
HallE-Switch: Controlling Object Hallucination in Large Vision Language Models
B Zhai, S Yang, C Xu, S Shen, K Keutzer, M Li
arXiv preprint arXiv:2310.01779, 2023
Cited by: 28*; Year: 2023
You only group once: Efficient point-cloud processing with token representation and relation inference module
C Xu, B Zhai, B Wu, T Li, W Zhan, P Vajda, K Keutzer, M Tomizuka
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2021
Cited by: 27; Year: 2021
Integer-only zero-shot quantization for efficient speech recognition
S Kim, A Gholami, Z Yao, N Lee, P Wang, A Nrusimha, B Zhai, T Gao, ...
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by: 23; Year: 2022
Law of Vision Representation in MLLMs
S Yang, B Zhai, Q You, J Yuan, H Yang, C Xu
arXiv preprint arXiv:2408.16357, 2024
Cited by: 1; Year: 2024
COCO is "ALL" You Need for Visual Instruction Fine-tuning
X Han, Y Wang, B Zhai, Q You, H Yang
arXiv preprint arXiv:2401.08968, 2024
Cited by: 1; Year: 2024
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
X Han, Q You, Y Liu, W Chen, H Zheng, K Mrini, X Lin, Y Wang, B Zhai, ...
arXiv preprint arXiv:2311.11567, 2023
Cited by: 1; Year: 2023
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model
H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao, ...
Findings of the Association for Computational Linguistics ACL 2024, 485-492, 2024
Year: 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
H Liu, Q You, X Han, Y Wang, B Zhai, Y Liu, Y Tao, H Huang, R He, ...
arXiv preprint arXiv:2403.01487, 2024
Year: 2024