Nadav Merlis

Cited by

	All	Since 2019
Citations	468	463
h-index	7	7
i10-index	6	6

Citations per year
Year	2018	2019	2020	2021	2022	2023	2024
Citations	4	23	78	102	90	103	67

Public access (based on funding mandates)

3 articles available
0 articles not available

Co-authors

Shie Mannor - Professor of Electrical Engineering @ Technion & Researcher @ Nvidia - Verified email at technion.ac.il
Yonathan Efroni - Meta, New York - Verified email at fb.com
Daniel J. Mankowitz - Google Deepmind - Verified email at google.com
Matan Haroush - Technion - Verified email at campus.technion.ac.il
Tom Zahavy - Staff Research Scientist, Google DeepMind - Verified email at deepmind.com
Chen Tessler - Research Scientist, NVIDIA Research - Verified email at nvidia.com
Guy Tennenholtz - Research Scientist, Google Research - Verified email at google.com
Vianney Perchet - Crest, ENSAE & Criteo AI Lab - Verified email at normalesup.org
Lior Shani - Google Research - Verified email at google.com
Hugo Richard - CRITEO AI Labs - Verified email at criteo.com
Mathieu Molina - Inria - CREST ENSAE - Verified email at inria.fr
Dorian Baudry - ENSAE, IP Paris - Verified email at ensae.fr
Flore Sentenac - PhD Student, CREST - Verified email at ensae.fr

Nadav Merlis

Postdoctoral Fellow @ CREST, ENSAE Paris

Verified email at ensae.fr

Research interests: Reinforcement Learning, Multi-Armed Bandits


Title	Authors	Venue	Cited by	Year
Learn what not to learn: Action elimination with deep reinforcement learning	T Zahavy, M Haroush, N Merlis, DJ Mankowitz, S Mannor	arXiv preprint arXiv:1809.02121, 2018	242	2018
Tight regret bounds for model-based reinforcement learning with greedy policies	Y Efroni, N Merlis, M Ghavamzadeh, S Mannor	Advances in Neural Information Processing Systems 32, 2019	78	2019
Reinforcement learning with trajectory feedback	Y Efroni, N Merlis, S Mannor	Proceedings of the AAAI conference on artificial intelligence 35 (8), 7288-7295, 2021	40	2021
Ensemble bootstrapping for q-learning	O Peer, C Tessler, N Merlis, R Meir	International Conference on Machine Learning, 8454-8463, 2021	35	2021
Batch-size independent regret bounds for the combinatorial multi-armed bandit problem	N Merlis, S Mannor	Conference on Learning Theory, 2465-2489, 2019	25	2019
Tight lower bounds for combinatorial multi-armed bandits	N Merlis, S Mannor	Conference on Learning Theory, 2830-2857, 2020	19	2020
Confidence-budget matching for sequential budgeted learning	Y Efroni, N Merlis, A Saha, S Mannor	International Conference on Machine Learning, 2937-2947, 2021	9	2021
Lenient regret for multi-armed bandits	N Merlis, S Mannor	Proceedings of the AAAI Conference on Artificial Intelligence 35 (10), 8950-8957, 2021	7	2021
Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning	P Khanna, G Tennenholtz, N Merlis, S Mannor, C Tessler	arXiv preprint arXiv:1910.01062, 2019	4*	2019
Reinforcement learning with history dependent dynamic contexts	G Tennenholtz, N Merlis, L Shani, M Mladenov, C Boutilier	International Conference on Machine Learning, 34011-34053, 2023	3	2023
Reinforcement learning with a terminator	G Tennenholtz, N Merlis, L Shani, S Mannor, U Shalit, G Chechik, ...	Advances in Neural Information Processing Systems 35, 35696-35709, 2022	3	2022
On preemption and learning in stochastic scheduling	N Merlis, H Richard, F Sentenac, C Odic, M Molina, V Perchet	International Conference on Machine Learning, 24478-24516, 2023	2	2023
The Value of Reward Lookahead in Reinforcement Learning	N Merlis, D Baudry, V Perchet	arXiv preprint arXiv:2403.11637, 2024	1	2024
Improved Algorithms for Contextual Dynamic Pricing	M Tullii, S Gaucher, N Merlis, V Perchet	arXiv preprint arXiv:2406.11316, 2024		2024
Reinforcement Learning with Lookahead Information	N Merlis	arXiv preprint arXiv:2406.02258, 2024		2024
On Bits and Bandits: Quantifying the Regret-Information Trade-off	I Shufaro, N Merlis, N Weinberger, S Mannor	arXiv preprint arXiv:2405.16581, 2024		2024
Multi-armed bandits with guaranteed revenue per arm	D Baudry, N Merlis, MB Molina, H Richard, V Perchet	International Conference on Artificial Intelligence and Statistics, 379-387, 2024		2024
Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics	G Tennenholtz, M Mladenov, N Merlis, RL Axtell, C Boutilier	arXiv preprint arXiv:2305.18333, 2023		2023
Query-Reward Tradeoffs in Multi-Armed Bandits	N Merlis, Y Efroni, S Mannor	arXiv preprint arXiv:2110.05724, 2021		2021

Articles 1–19
