Title | Authors | Venue | Cited by | Year |
Efficient average reward reinforcement learning using constant shifting values | S Yang, Y Gao, B An, H Wang, X Chen | Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016 | 18 | 2016 |
A contextual bandit approach to personalized online recommendation via sparse interactions | C Zhang, H Wang, S Yang, Y Gao | Pacific-Asia Conference on Knowledge Discovery and Data Mining, 394-406, 2019 | 6 | 2019 |
Incremental nonnegative matrix factorization based on matrix sketching and k-means clustering | C Zhang, H Wang, S Yang, Y Gao | International Conference on Intelligent Data Engineering and Automated …, 2016 | 4 | 2016 |
Contextual bandits with hidden features to online recommendation via sparse interactions | S Yang, H Wang, C Zhang, Y Gao | IEEE Intelligent Systems 35 (5), 62-72, 2020 | 3 | 2020 |
An optimal algorithm for the stochastic bandits with knowing near-optimal mean reward | S Yang, H Wang, Y Gao, X Chen | Proceedings of the 17th International Conference on Autonomous Agents and …, 2018 | 2 | 2018 |
An optimal algorithm for the stochastic bandits while knowing the near-optimal mean reward | S Yang, Y Gao | IEEE Transactions on Neural Networks and Learning Systems, 2020 | 1 | 2020 |