Satinder Singh 0001
Person information
- affiliation: DeepMind, London, UK
- affiliation: University of Michigan, Department of Electrical Engineering and Computer Science, Ann Arbor, MI, USA
- affiliation: Syntek Capital
- affiliation: AT&T Labs, Florham Park, NJ, USA
- affiliation: University of Colorado Boulder, Department of Computer Science, CO, USA
- affiliation: Massachusetts Institute of Technology (MIT), Brain and Cognitive Science Department, Cambridge, MA, USA
Other persons with the same name
- Satinder Singh — disambiguation page
2020 – today
- 2024
- [j32]Chandan Singh, Sukhjeet Kaur Ranade, Satinder Pal Singh:
Attention learning models using local Zernike moments-based normalized images and convolutional neural networks for skin lesion classification. Biomed. Signal Process. Control. 96: 106512 (2024) - [c203]Jake Bruce, Michael D. Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal M. P. Behbahani, Stephanie C. Y. Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott E. Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel:
Genie: Generative Interactive Environments. ICML 2024 - [i78]Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal M. P. Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott E. Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel:
Genie: Generative Interactive Environments. CoRR abs/2402.15391 (2024) - 2023
- [j31]Qi Zhang, Edmund H. Durfee, Satinder Singh:
Risk-aware analysis for interpretations of probabilistic achievement and maintenance commitments. Artif. Intell. 317: 103864 (2023) - [j30]Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy:
POMRL: No-Regret Learning-to-Plan with Increasing Horizons. Trans. Mach. Learn. Res. 2023 (2023) - [c202]Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag:
Discovering Evolution Strategies via Meta-Black-Box Optimization. GECCO Companion 2023: 29-30 - [c201]Wilka Carvalho, Angelos Filos, Richard L. Lewis, Honglak Lee, Satinder Singh:
Composing Task Knowledge With Modular Successor Feature Approximators. ICLR 2023 - [c200]Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag:
Discovering Evolution Strategies via Meta-Black-Box Optimization. ICLR 2023 - [c199]Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Stenberg Hansen, Angelos Filos, Ethan A. Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih:
In-context Reinforcement Learning with Algorithm Distillation. ICLR 2023 - [c198]Tom Zahavy, Yannick Schroecker, Feryal M. P. Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh:
Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality. ICLR 2023 - [c197]Jakob Bauer, Kate Baumli, Feryal M. P. Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Satinder Singh, Jakub Sygnowski, Karl Tuyls, Sarah York, Alexander Zacherl, Lei M. Zhang:
Human-Timescale Adaptation in an Open-Ended Task Space. ICML 2023: 1887-1935 - [c196]Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy:
ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs. ICML 2023: 25303-25336 - [c195]Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob N. Foerster, Satinder Singh, Feryal M. P. Behbahani:
Structured State Space Models for In-Context Reinforcement Learning. NeurIPS 2023 - [c194]David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado Philip van Hasselt, Satinder Singh:
A Definition of Continual Reinforcement Learning. NeurIPS 2023 - [c193]Ethan A. Brooks, Logan Walls, Richard L. Lewis, Satinder Singh:
Large Language Models can Implement Policy Iteration. NeurIPS 2023 - [c192]Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew K. Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo Jimenez Rezende, Daniel Zoran:
Combining Behaviors with the Successor Features Keyboard. NeurIPS 2023 - [c191]Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado Philip van Hasselt, András György, Satinder Singh:
Optimistic Meta-Gradients. NeurIPS 2023 - [i77]Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh:
Optimistic Meta-Gradients. CoRR abs/2301.03236 (2023) - [i76]Wilka Carvalho, Angelos Filos, Richard L. Lewis, Honglak Lee, Satinder Singh:
Composing Task Knowledge with Modular Successor Feature Approximators. CoRR abs/2301.12305 (2023) - [i75]Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy:
ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs. CoRR abs/2302.01275 (2023) - [i74]Bernardo Ávila Pires, Feryal M. P. Behbahani, Hubert Soyer, Kyriacos Nikiforou, Thomas Keck, Satinder Singh:
Hierarchical Reinforcement Learning in Complex 3D Environments. CoRR abs/2302.14451 (2023) - [i73]Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob N. Foerster, Satinder Singh, Feryal M. P. Behbahani:
Structured State Space Models for In-Context Reinforcement Learning. CoRR abs/2303.03982 (2023) - [i72]David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh:
On the Convergence of Bounded Agents. CoRR abs/2307.11044 (2023) - [i71]David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh:
A Definition of Continual Reinforcement Learning. CoRR abs/2307.11046 (2023) - [i70]Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh:
Diversifying AI: Towards Creative Chess with AlphaZero. CoRR abs/2308.09175 (2023) - [i69]Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran:
Combining Behaviors with the Successor Features Keyboard. CoRR abs/2310.15940 (2023) - 2022
- [c190]Zeyu Zheng, Risto Vuorio, Richard L. Lewis, Satinder Singh:
Adaptive Pairwise Weights for Temporal Credit Assignment. AAAI 2022: 9225-9232 - [c189]Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh:
Meta-Gradients in Non-Stationary Environments. CoLLAs 2022: 886-901 - [c188]Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh:
Bootstrapped Meta-Learning. ICLR 2022 - [c187]David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh:
On the Expressivity of Markov Reward (Extended Abstract). IJCAI 2022: 5254-5258 - [c186]Dilip Arumugam, Satinder Singh:
Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction. NeurIPS 2022 - [c185]Christopher Grimm, André Barreto, Satinder Singh:
Approximate Value Equivalence. NeurIPS 2022 - [c184]Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh:
Palm up: Playing in the Latent Manifold for Unsupervised Pretraining. NeurIPS 2022 - [d1]Julien Pérolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Rémi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls:
Figure Data for the paper "Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning". Zenodo, 2022 - [i68]Vivek Veeriah, Zeyu Zheng, Richard L. Lewis, Satinder Singh:
GrASP: Gradient-Based Affordance Selection for Planning. CoRR abs/2202.04772 (2022) - [i67]Tom Zahavy, Yannick Schroecker, Feryal M. P. Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh:
Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality. CoRR abs/2205.13521 (2022) - [i66]Julien Pérolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas W. Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Rémi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls:
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning. CoRR abs/2206.15378 (2022) - [i65]Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh:
Meta-Gradients in Non-Stationary Environments. CoRR abs/2209.06159 (2022) - [i64]Ethan A. Brooks, Logan Walls, Richard L. Lewis, Satinder Singh:
In-Context Policy Iteration. CoRR abs/2210.03821 (2022) - [i63]Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh:
Palm up: Playing in the Latent Manifold for Unsupervised Pretraining. CoRR abs/2210.10913 (2022) - [i62]Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan A. Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih:
In-context Reinforcement Learning with Algorithm Distillation. CoRR abs/2210.14215 (2022) - [i61]Dilip Arumugam, Satinder Singh:
Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction. CoRR abs/2210.16872 (2022) - [i60]Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag:
Discovering Evolution Strategies via Meta-Black-Box Optimization. CoRR abs/2211.11260 (2022) - [i59]Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy:
POMRL: No-Regret Learning-to-Plan with Increasing Horizons. CoRR abs/2212.14530 (2022) - 2021
- [j29]David Silver, Satinder Singh, Doina Precup, Richard S. Sutton:
Reward is enough. Artif. Intell. 299: 103535 (2021) - [c183]Qi Zhang, Edmund H. Durfee, Satinder Singh:
Efficient Querying for Cooperative Probabilistic Commitments. AAAI 2021: 11378-11386 - [c182]Tom Zahavy, André Barreto, Daniel J. Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh:
Discovering a set of policies for the worst case reward. ICLR 2021 - [c181]Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh:
Reinforcement Learning of Implicit and Explicit Control Flow Instructions. ICML 2021: 1082-1091 - [c180]Wilka Carvalho, Anthony Liang, Kimin Lee, Sungryull Sohn, Honglak Lee, Richard L. Lewis, Satinder Singh:
Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment. IJCAI 2021: 2219-2226 - [c179]Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh:
Proper Value Equivalence. NeurIPS 2021: 7773-7786 - [c178]David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh:
On the Expressivity of Markov Reward. NeurIPS 2021: 7799-7812 - [c177]Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard L. Lewis, Satinder Singh:
Learning State Representations from Random Deep Action-conditional Predictions. NeurIPS 2021: 23679-23691 - [c176]Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh:
Reward is enough for convex MDPs. NeurIPS 2021: 25746-25759 - [c175]Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh:
Discovery of Options via Meta-Learned Subgoals. NeurIPS 2021: 29861-29873 - [i58]Tom Zahavy, André Barreto, Daniel J. Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh:
Discovering a set of policies for the worst case reward. CoRR abs/2102.04323 (2021) - [i57]Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard L. Lewis, Satinder Singh:
Learning State Representations from Random Deep Action-conditional Predictions. CoRR abs/2102.04897 (2021) - [i56]Zeyu Zheng, Risto Vuorio, Richard L. Lewis, Satinder Singh:
Pairwise Weights for Temporal Credit Assignment. CoRR abs/2102.04999 (2021) - [i55]Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh:
Discovery of Options via Meta-Learned Subgoals. CoRR abs/2102.06741 (2021) - [i54]Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh:
Reinforcement Learning of Implicit and Explicit Control Flow in Instructions. CoRR abs/2102.13195 (2021) - [i53]Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh:
Reward is enough for convex MDPs. CoRR abs/2106.00661 (2021) - [i52]Tom Zahavy, Brendan O'Donoghue, André Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh:
Discovering Diverse Nearly Optimal Policies with Successor Features. CoRR abs/2106.00669 (2021) - [i51]Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh:
Proper Value Equivalence. CoRR abs/2106.10316 (2021) - [i50]Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh:
Bootstrapped Meta-Learning. CoRR abs/2109.04504 (2021) - [i49]Janarthanan Rajendran, Jonathan K. Kummerfeld, Satinder Singh:
Learning to Learn End-to-End Goal-Oriented Dialog From Related Dialog Tasks. CoRR abs/2110.15724 (2021) - [i48]David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh:
On the Expressivity of Markov Reward. CoRR abs/2111.00876 (2021) - 2020
- [j28]Qi Zhang, Edmund H. Durfee, Satinder Singh:
Semantics and algorithms for trustworthy commitment achievement under model uncertainty. Auton. Agents Multi Agent Syst. 34(1): 19 (2020) - [c174]Shun Zhang, Edmund H. Durfee, Satinder Singh:
Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes. AAAI 2020: 2552-2559 - [c173]Janarthanan Rajendran, Richard L. Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh:
How Should an Agent Practice? AAAI 2020: 5454-5461 - [c172]Qi Zhang, Edmund H. Durfee, Satinder Singh:
Modeling Probabilistic Commitments for Maintenance Is Inherently Harder than for Achievement. AAAI 2020: 10326-10333 - [c171]Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh:
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles. AISTATS 2020: 2010-2020 - [c170]Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt:
Behaviour Suite for Reinforcement Learning. ICLR 2020 - [c169]Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh:
What Can Learned Intrinsic Rewards Capture? ICML 2020: 11436-11446 - [c168]Thomas W. Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Satinder Singh, Thore Graepel, Yoram Bachrach:
Learning to Play No-Press Diplomacy with Best Response Policy Iteration. NeurIPS 2020 - [c167]Christopher Grimm, André Barreto, Satinder Singh, David Silver:
The Value Equivalence Principle for Model-Based Reinforcement Learning. NeurIPS 2020 - [c166]Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver:
Discovering Reinforcement Learning Algorithms. NeurIPS 2020 - [c165]Zheng Wen, Doina Precup, Morteza Ibrahimi, André Barreto, Benjamin Van Roy, Satinder Singh:
On Efficiency in Hierarchical Reinforcement Learning. NeurIPS 2020 - [c164]Zhongwen Xu, Hado Philip van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver:
Meta-Gradient Reinforcement Learning with an Objective Discovered Online. NeurIPS 2020 - [c163]Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh:
A Self-Tuning Actor-Critic Algorithm. NeurIPS 2020 - [i47]Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh:
Self-Tuning Deep Reinforcement Learning. CoRR abs/2002.12928 (2020) - [i46]Thomas W. Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Satinder Singh, Thore Graepel, Yoram Bachrach:
Learning to Play No-Press Diplomacy with Best Response Policy Iteration. CoRR abs/2006.04635 (2020) - [i45]Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver:
Meta-Gradient Reinforcement Learning with an Objective Discovered Online. CoRR abs/2007.08433 (2020) - [i44]Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver:
Discovering Reinforcement Learning Algorithms. CoRR abs/2007.08794 (2020) - [i43]Wilka Carvalho, Anthony Liang, Kimin Lee, Sungryull Sohn, Honglak Lee, Richard L. Lewis, Satinder Singh:
Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in First-person Simulated 3D Environments. CoRR abs/2010.15195 (2020) - [i42]Christopher Grimm, André Barreto, Satinder Singh, David Silver:
The Value Equivalence Principle for Model-Based Reinforcement Learning. CoRR abs/2011.03506 (2020) - [i41]Qi Zhang, Edmund H. Durfee, Satinder Singh:
Efficient Querying for Cooperative Probabilistic Commitments. CoRR abs/2012.07195 (2020)
2010 – 2019
- 2019
- [c162]Qi Zhang, Richard L. Lewis, Satinder Singh, Edmund H. Durfee:
Learning to Communicate and Solve Visual Blocks-World Tasks. AAAI 2019: 5781-5788 - [c161]John Holler, Risto Vuorio, Zhiwei (Tony) Qin, Xiaocheng Tang, Yan Jiao, Tiancheng Jin, Satinder Singh, Chenxi Wang, Jieping Ye:
Deep Reinforcement Learning for Multi-driver Vehicle Dispatching and Repositioning Problem. ICDM 2019: 1090-1095 - [c160]Qi Zhang, Edmund H. Durfee, Satinder Singh:
Computational Strategies for the Trustworthy Pursuit and the Safe Modeling of Probabilistic Maintenance Commitments. AISafety@IJCAI 2019 - [c159]Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Joelle Pineau, Satinder Singh, Aaron C. Courville:
No-Press Diplomacy: Modeling Multi-Agent Gameplay. NeurIPS 2019: 4476-4487 - [c158]Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh:
Discovery of Useful Questions as Auxiliary Tasks. NeurIPS 2019: 9306-9317 - [c157]Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, Rémi Munos:
Hindsight Credit Assignment. NeurIPS 2019: 12467-12476 - [c156]Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh, Lazaros Polymenakos:
NE-Table: A Neural key-value table for Named Entities. RANLP 2019: 980-993 - [p1]Benjamin W. Priest, George Cybenko, Satinder Singh, Massimiliano Albanese, Peng Liu:
Online and Scalable Adaptive Cyber Defense. Adversarial and Uncertain Reasoning for Adaptive Cyber Defense 2019: 232-261 - [i40]Christopher Grimm, Satinder Singh:
Learning Independently-Obtainable Reward Functions. CoRR abs/1901.08649 (2019) - [i39]Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt:
Behaviour Suite for Reinforcement Learning. CoRR abs/1908.03568 (2019) - [i38]Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron C. Courville:
No Press Diplomacy: Modeling Multi-Agent Gameplay. CoRR abs/1909.02128 (2019) - [i37]Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard L. Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh:
Discovery of Useful Questions as Auxiliary Tasks. CoRR abs/1909.04607 (2019) - [i36]Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh:
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles. CoRR abs/1910.10597 (2019) - [i35]Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly L. Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick:
Object-oriented state editing for HRL. CoRR abs/1910.14361 (2019) - [i34]Christopher Grimm, Irina Higgins, André Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh:
Disentangled Cumulants Help Successor Representations Transfer to New Tasks. CoRR abs/1911.10866 (2019) - [i33]John Holler, Risto Vuorio, Zhiwei (Tony) Qin, Xiaocheng Tang, Yan Jiao, Tiancheng Jin, Satinder Singh, Chenxi Wang, Jieping Ye:
Deep Reinforcement Learning for Multi-Driver Vehicle Dispatching and Repositioning Problem. CoRR abs/1911.11260 (2019) - [i32]Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Rémi Munos:
Hindsight Credit Assignment. CoRR abs/1912.02503 (2019) - [i31]Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh:
What Can Learned Intrinsic Rewards Capture? CoRR abs/1912.05500 (2019) - [i30]Janarthanan Rajendran, Richard L. Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh:
How Should an Agent Practice? CoRR abs/1912.07045 (2019) - 2018
- [j27]Thanh Hong Nguyen, Mason Wright, Michael P. Wellman, Satinder Singh:
Multistage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis. Secur. Commun. Networks 2018: 2864873:1-2864873:28 (2018) - [c155]Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari:
Markov Decision Processes with Continuous Side Information. ALT 2018: 597-618 - [c154]Qi Zhang, Edmund H. Durfee, Satinder Singh:
Challenges in the Trustworthy Pursuit of Maintenance Commitments Under Uncertainty. TRUST@AAMAS 2018: 75-86 - [c153]Shun Zhang, Edmund H. Durfee, Satinder Singh:
On Querying for Safe Optimality in Factored Markov Decision Processes. AAMAS 2018: 2168-2170 - [c152]Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, Lazaros Polymenakos:
Learning End-to-End Goal-Oriented Dialog with Multiple Answers. EMNLP 2018: 3834-3843 - [c151]Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee:
Self-Imitation Learning. ICML 2018: 3875-3884 - [c150]Shun Zhang, Edmund H. Durfee, Satinder Singh:
Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes. IJCAI 2018: 4867-4873 - [c149]Nan Jiang, Alex Kulesza, Satinder Singh:
Completing State Representations using Spectral Learning. NeurIPS 2018: 4333-4342 - [c148]Zeyu Zheng, Junhyuk Oh, Satinder Singh:
On Learning Intrinsic Rewards for Policy Gradient Methods. NeurIPS 2018: 4649-4659 - [i29]Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, Jenna Wiens:
The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA. CoRR abs/1803.02940 (2018) - [i28]Zeyu Zheng, Junhyuk Oh, Satinder Singh:
On Learning Intrinsic Rewards for Policy Gradient Methods. CoRR abs/1804.06459 (2018) - [i27]Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh:
Named Entities troubling your Neural Methods? Build NE-Table: A neural approach for handling Named Entities. CoRR abs/1804.09540 (2018) - [i26]Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee:
Self-Imitation Learning. CoRR abs/1806.05635 (2018) - [i25]Vivek Veeriah, Junhyuk Oh, Satinder Singh:
Many-Goals Reinforcement Learning. CoRR abs/1806.09605 (2018) - [i24]Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, Lazaros Polymenakos:
Learning End-to-End Goal-Oriented Dialog with Multiple Answers. CoRR abs/1808.09996 (2018) - [i23]Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee:
Generative Adversarial Self-Imitation Learning. CoRR abs/1812.00950 (2018) - 2017
- [c147]Thanh Hong Nguyen, Michael P. Wellman, Satinder Singh:
A Stackelberg Game Model for Botnet Traffic Exfiltration. AAAI Workshops 2017 - [c146]Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence C. An:
Understanding and Predicting Empathic Behavior in Counseling Therapy. ACL (1) 2017: 1426-1435 - [c145]Shun Zhang, Edmund H. Durfee, Satinder Singh:
Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes. ICAPS 2017: 339-347 - [c144]Qi Zhang, Satinder Singh, Edmund H. Durfee:
Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making. ICAPS 2017: 348-357 - [c143]Thanh Hong Nguyen, Mason Wright, Michael P. Wellman, Satinder Singh:
Multi-Stage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis. MTD@CCS 2017: 87-97 - [c142]Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence C. An, Kathy J. Goggin, Delwyn Catley:
Predicting Counselor Behaviors in Motivational Interviewing Encounters. EACL (1) 2017: 1128-1137 - [c141]Thanh Hong Nguyen, Michael P. Wellman, Satinder Singh:
A Stackelberg Game Model for Botnet Data Exfiltration. GameSec 2017: 151-170 - [c140]Xiaoxiao Guo, Tim Klinger, Clemens Rosenbaum, Joseph P. Bigus, Murray Campbell, Ban Kawas, Kartik Talamadupula, Gerry Tesauro, Satinder Singh:
Learning to Query, Reason, and Answer Questions On Ambiguous Texts. ICLR (Poster) 2017 - [c139]Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli:
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. ICML 2017: 2661-2670 - [c138]Kareem Amin, Nan Jiang, Satinder Singh:
Repeated Inverse Reinforcement Learning. NIPS 2017: 1815-1824 - [c137]Junhyuk Oh, Satinder Singh, Honglak Lee:
Value Prediction Network. NIPS 2017: 6118-6128 - [e1]Satinder Singh, Shaul Markovitch:
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. AAAI Press 2017 [contents] - [i22]Qi Zhang, Satinder Singh, Edmund H. Durfee:
Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making. CoRR abs/1703.04587 (2017) - [i21]Kareem Amin, Nan Jiang, Satinder Singh:
Repeated Inverse Reinforcement Learning. CoRR abs/1705.05427 (2017) - [i20]Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli:
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. CoRR abs/1706.05064 (2017) - [i19]Junhyuk Oh, Satinder Singh, Honglak Lee:
Value Prediction Network. CoRR abs/1707.03497 (2017) - [i18]Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari:
Markov Decision Processes with Continuous Side Information. CoRR abs/1711.05726 (2017) - 2016
- [j26]Alexander Van Esbroeck, Landon Smith, Zeeshan Syed, Satinder Singh, Zahi N. Karam:
Multi-task seizure detection: addressing intra-patient variation in seizure morphologies. Mach. Learn. 102(3): 309-321 (2016) - [c136]Nan Jiang, Alex Kulesza, Satinder Singh:
Improving Predictive State Representations via Gradient Descent. AAAI 2016: 1709-1715 - [c135]Edmund H. Durfee, Satinder Singh:
On the Trustworthy Fulfillment of Commitments. AAMAS Workshops (Selected Papers) 2016: 1-13 - [c134]Edmund H. Durfee, Satinder Singh:
On the Trustworthy Fulfillment of Commitments. TRUST@AAMAS 2016: 54-62 - [c133]Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee:
Control of Memory, Active Perception, and Action in Minecraft. ICML 2016: 2790-2799 - [c132]Xiaoxiao Guo, Satinder Singh, Richard L. Lewis, Honglak Lee:
Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games. IJCAI 2016: 1519-1525 - [c131]Nan Jiang, Satinder Singh, Ambuj Tewari:
On Structural Properties of MDPs that Bound Loss Due to Shallow Planning. IJCAI 2016: 1640-1647 - [c130]Qi Zhang, Edmund H. Durfee, Satinder Singh, Anna Chen, Stefan J. Witwicki:
Commitment Semantics for Sequential Decision Making under Reward Uncertainty. IJCAI 2016: 3315-3323 - [c129]Nan Jiang, Alex Kulesza, Satinder Singh, Richard L. Lewis:
The Dependence of Effective Planning Horizon on Model Accuracy. IJCAI 2016: 4180-4189 - [c128]Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence C. An:
Building a Motivational Interviewing Dataset. CLPsych@HLT-NAACL 2016: 42-51 - [c127]Kareem Amin, Michael P. Wellman, Satinder Singh:
Gradient Methods for Stackelberg Games. UAI 2016 - [i17]Kareem Amin, Satinder Singh:
Towards Resolving Unidentifiability in Inverse Reinforcement Learning. CoRR abs/1601.06569 (2016) - [i16]Xiaoxiao Guo, Satinder Singh, Richard L. Lewis, Honglak Lee:
Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games. CoRR abs/1604.07095 (2016) - [i15]Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee:
Control of Memory, Active Perception, and Action in Minecraft. CoRR abs/1605.09128 (2016) - 2015
- [c126]Alex Kulesza, Nan Jiang, Satinder Singh:
Spectral Learning of Predictive State Representations with Insufficient Statistics. AAAI 2015: 2715-2721 - [c125]Edmund H. Durfee, Satinder Singh:
Commitment Semantics for Sequential Decision Making Under Reward Uncertainty. AAAI Fall Symposia 2015: 13-20 - [c124]Alex Kulesza, Nan Jiang, Satinder Singh:
Low-Rank Spectral Learning with Weighted Loss Functions. AISTATS 2015 - [c123]Nan Jiang, Alex Kulesza, Satinder Singh, Richard L. Lewis:
The Dependence of Effective Planning Horizon on Model Accuracy. AAMAS 2015: 1181-1189 - [c122]Nan Jiang, Alex Kulesza, Satinder Singh:
Abstraction Selection in Model-based Reinforcement Learning. ICML 2015: 179-188 - [c121]Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, Satinder Singh:
Action-Conditional Video Prediction using Deep Networks in Atari Games. NIPS 2015: 2863-2871 - [i14]Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, Satinder Singh:
Action-Conditional Video Prediction using Deep Networks in Atari Games. CoRR abs/1507.08750 (2015) - 2014
- [j25]Bingyao Liu, Satinder Singh, Richard L. Lewis, Shiyin Qin:
Optimal Rewards for Cooperative Agents. IEEE Trans. Auton. Ment. Dev. 6(4): 286-297 (2014) - [j24]Andrew Howes, Richard L. Lewis, Satinder Singh:
Utility Maximization and Bounds on Human Information Processing. Top. Cogn. Sci. 6(2): 198-203 (2014) - [j23]Richard L. Lewis, Andrew Howes, Satinder Singh:
Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization. Top. Cogn. Sci. 6(2): 279-311 (2014) - [c120]Chih-Chun Chia, James Blum, Zahi N. Karam, Satinder Singh, Zeeshan Syed:
Predicting Postoperative Atrial Fibrillation from Independent ECG Components. AAAI 2014: 1178-1184 - [c119]Alexander Van Esbroeck, Satinder Singh, Ilan Rubinfeld, Zeeshan Syed:
Evaluating Trauma Patients: Addressing Missing Covariates with Joint Optimization. AAAI 2014: 1307-1313 - [c118]Michael Shvartsman, Richard L. Lewis, Satinder Singh:
Computationally Rational Saccadic Control: An Explanation of Spillover Effects Based on Sampling from Noisy Perception and Memory. CMCL@ACL 2014: 1-9 - [c117]Yevgeniy Vorobeychik, Bo An, Milind Tambe, Satinder Singh:
Computing Solutions in Infinite-Horizon Discounted Adversarial Patrolling Games. ICAPS 2014 - [c116]Robert Cohn, Satinder Singh, Edmund H. Durfee:
Characterizing EVOI-Sufficient k-Response Query Sets in Decision Problems. AISTATS 2014: 131-139 - [c115]Alex Kulesza, N. Raj Rao, Satinder Singh:
Low-Rank Spectral Learning. AISTATS 2014: 522-530 - [c114]Nan Jiang, Satinder Singh, Richard L. Lewis:
Improving UCT planning via approximate homomorphisms. AAMAS 2014: 1289-1296 - [c113]Zahi N. Karam, Emily Mower Provost, Satinder Singh, Jennifer Montgomery, Christopher Archer, Gloria Harrington, Melvin G. McInnis:
Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech. ICASSP 2014: 4858-4862 - [c112]Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang:
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. NIPS 2014: 3338-3346 - [i13]Erik Talvitie, Satinder Singh:
Learning to Make Predictions In Partially Observable Environments Without a Generative Model. CoRR abs/1401.3870 (2014) - 2013
- [j22]Richard L. Lewis, Michael Shvartsman, Satinder Singh:
The Adaptive Nature of Eye Movements in Linguistic Tasks: How Payoff and Architecture Shape Speed-Accuracy Trade-Offs. Top. Cogn. Sci. 5(3): 581-610 (2013) - [c111]Michael Feary, Dorrit Billman, Xiuli Chen, Andrew Howes, Richard L. Lewis, Lance Sherry, Satinder Singh:
Linking Context to Evaluation in the Design of Safety Critical Interfaces. HCI (1) 2013: 193-202 - [c110]Xiaoxiao Guo, Satinder Singh, Richard L. Lewis:
Reward Mapping for Transfer in Long-Lived Agents. NIPS 2013: 2130-2138 - [i12]Michael J. Kearns, Michael L. Littman, Satinder Singh:
Graphical Models for Game Theory. CoRR abs/1301.2281 (2013) - [i11]Michael J. Kearns, Yishay Mansour, Satinder Singh:
Fast Planning in Stochastic Games. CoRR abs/1301.3867 (2013) - [i10]Satinder Singh, Michael J. Kearns, Yishay Mansour:
Nash Convergence of Gradient Dynamics in Iterated General-Sum Games. CoRR abs/1301.3892 (2013) - [i9]Yishay Mansour, Satinder Singh:
On the Complexity of Policy Iteration. CoRR abs/1301.6718 (2013) - [i8]David A. McAllester, Satinder Singh:
Approximate Planning for Factored POMDPs using Belief State Simplification. CoRR abs/1301.6719 (2013) - 2012
- [j21]Noa Agmon, Vikas Agrawal, David W. Aha, Yiannis Aloimonos, Donagh Buckley, Prashant Doshi, Christopher W. Geib, Floriana Grasso, Nancy L. Green, Benjamin Johnston, Burt Kaliski, Christopher Kiekintveld, Edith Law, Henry Lieberman, Ole J. Mengshoel, Ted Metzler, Joseph Modayil, Douglas W. Oard, Nilufer Onder, Barry O'Sullivan, Katerina Pastra, Doina Precup, Sowmya Ramachandran, Chris Reed, Sanem Sariel Talay, Ted Selker, Lokendra Shastri, Stephen F. Smith, Satinder Singh, Siddharth Srivastava, Gita Sukthankar, David C. Uthus, Mary-Anne Williams:
Reports of the AAAI 2011 Conference Workshops. AI Mag. 33(1): 57-70 (2012) - [c109]Bo An, David Kempe, Christopher Kiekintveld, Eric Shieh, Satinder Singh, Milind Tambe, Yevgeniy Vorobeychik:
Security Games with Limited Surveillance. AAAI 2012: 1241-1248 - [c108]Yevgeniy Vorobeychik, Satinder Singh:
Computing Stackelberg Equilibria in Discounted Stochastic Games. AAAI 2012: 1478-1484 - [c107]Bo An, David Kempe, Christopher Kiekintveld, Eric Anyung Shieh, Satinder Singh, Milind Tambe, Yevgeniy Vorobeychik:
Security Games with Limited Surveillance: An Initial Report. AAAI Spring Symposium: Game Theory for Security, Sustainability, and Health 2012 - [c106]Jeshua Bratman, Satinder Singh, Jonathan Sorg, Richard L. Lewis:
Strong mitigation: nesting search for good policies within search for good reward. AAMAS 2012: 407-414 - [c105]Quang Duong, Michael P. Wellman, Satinder Singh, Michael J. Kearns:
Learning and predicting dynamic networked behavior with graphical multiagent models. AAMAS 2012: 441-448 - [c104]Stefan J. Witwicki, Inn-Tung Chen, Edmund H. Durfee, Satinder Singh:
Planning and evaluating multiagent influences under reward uncertainty. AAMAS 2012: 1277-1278 - [c103]Bingyao Liu, Satinder Singh, Richard L. Lewis, Shiyin Qin:
Optimal rewards in multiagent teams. ICDL-EPIROB 2012: 1-8 - [c102]Tuomas Sandholm, Satinder Singh:
Lossy stochastic game abstraction with bounds. EC 2012: 880-897 - [i7]Jonathan Sorg, Satinder Singh, Richard L. Lewis:
Variance-Based Rewards for Approximate Bayesian Reinforcement Learning. CoRR abs/1203.3518 (2012) - [i6]Quang Duong, Michael P. Wellman, Satinder Singh:
Knowledge Combination in Graphical Multiagent Model. CoRR abs/1206.3248 (2012) - [i5]Ruggiero Cavallo, David C. Parkes, Satinder Singh:
Optimal Coordinated Planning Amongst Self-Interested Agents with Private State. CoRR abs/1206.6820 (2012) - [i4]Matthew R. Rudary, Satinder Singh, David Wingate:
Predictive Linear-Gaussian Models of Stochastic Dynamical Systems. CoRR abs/1207.1416 (2012) - [i3]Satinder Singh, Michael R. James, Matthew R. Rudary:
Predictive State Representations: A New Theory for Modeling Dynamical Systems. CoRR abs/1207.4167 (2012) - 2011
- [b1]Satinder Pal Singh:
IP Geolocation in Metropolitan Areas. University of Maryland, College Park, MD, USA, 2011 - [j20]Erik Talvitie, Satinder Singh:
Learning to Make Predictions In Partially Observable Environments Without a Generative Model. J. Artif. Intell. Res. 42: 353-392 (2011) - [c101]Jonathan Sorg, Satinder Singh, Richard L. Lewis:
Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents. AAAI 2011: 465-470 - [c100]Robert Cohn, Edmund H. Durfee, Satinder Singh:
Comparing Action-Query Strategies in Semi-Autonomous Agents. AAAI 2011: 1102-1107 - [c99]Robert Cohn, Edmund H. Durfee, Satinder Singh:
Comparing action-query strategies in semi-autonomous agents. AAMAS 2011: 1287-1288 - [c98]Satinder Pal Singh, Randolph Baden, Choon Lee, Bobby Bhattacharjee, Richard J. La, Mark A. Shayman:
IP geolocation in metropolitan areas. SIGMETRICS 2011: 155-156 - [c97]Quang Duong, Michael P. Wellman, Satinder Singh:
Modeling Information Diffusion in Networks with Unobserved Links. SocialCom/PASSAT 2011: 362-369 - [i2]Michael J. Kearns, Diane J. Litman, Satinder Singh, Marilyn A. Walker:
Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System. CoRR abs/1106.0676 (2011) - [i1]Michael J. Kearns, Michael L. Littman, Satinder Singh, Peter Stone:
ATTac-2000: An Adaptive Autonomous Bidding Agent. CoRR abs/1106.0678 (2011) - 2010
- [j19]David C. Parkes, Ruggiero Cavallo, Florin Constantin, Satinder Singh:
Dynamic Incentive Mechanisms. AI Mag. 31(4): 79-94 (2010) - [j18]Satinder Singh, Richard L. Lewis, Andrew G. Barto, Jonathan Sorg:
Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective. IEEE Trans. Auton. Ment. Dev. 2(2): 70-82 (2010) - [c96]Jonathan Sorg, Satinder Singh:
Linear options. AAMAS 2010: 31-38 - [c95]Quang Duong, Michael P. Wellman, Satinder Singh, Yevgeniy Vorobeychik:
History-dependent graphical multiagent models. AAMAS 2010: 1215-1222 - [c94]Robert Cohn, Michael Maxim, Edmund H. Durfee, Satinder Singh:
Selecting Operator Queries Using Expected Myopic Gain. IAT 2010: 40-47 - [c93]Jonathan Sorg, Satinder Singh, Richard L. Lewis:
Internal Rewards Mitigate Agent Boundedness. ICML 2010: 1007-1014 - [c92]Jonathan Sorg, Satinder Singh, Richard L. Lewis:
Reward Design via Online Gradient Ascent. NIPS 2010: 2190-2198 - [c91]Jonathan Sorg, Satinder Singh, Richard L. Lewis:
Variance-Based Rewards for Approximate Bayesian Reinforcement Learning. UAI 2010: 564-571
2000 – 2009
- 2009
- [c90]Michael R. James, Satinder Singh:
SarsaLandmark: an algorithm for learning in POMDPs with landmarks. AAMAS (1) 2009: 585-591 - [c89]Jonathan Sorg, Satinder Singh:
Transfer via soft homomorphisms. AAMAS (2) 2009: 741-748 - [c88]Quang Duong, Yevgeniy Vorobeychik, Satinder Singh, Michael P. Wellman:
Learning Graphical Game Models. IJCAI 2009: 116-121 - [c87]Erik Talvitie, Satinder Singh:
Maintaining Predictions over Time without a Model. IJCAI 2009: 1249-1254 - 2008
- [c86]Britton Wolfe, Michael R. James, Satinder Singh:
Approximate predictive state representations. AAMAS (1) 2008: 363-370 - [c85]David Wingate, Satinder Singh:
Efficiently learning linear-linear exponential family predictive representations of state. ICML 2008: 1176-1183 - [c84]Matthew R. Rudary, Satinder Singh:
Predictive Linear-Gaussian Models of Dynamical Systems with Vector-Valued Actions and Observations. ISAIM 2008 - [c83]Erik Talvitie, Britton Wolfe, Satinder Singh:
Building Incomplete but Accurate Models. ISAIM 2008 - [c82]Erik Talvitie, Satinder Singh:
Simple Local Models for Complex Dynamical Systems. NIPS 2008: 1617-1624 - [c81]Quang Duong, Michael P. Wellman, Satinder Singh:
Knowledge Combination in Graphical Multiagent Models. UAI 2008: 153-160 - 2007
- [j17]Yevgeniy Vorobeychik, Michael P. Wellman, Satinder Singh:
Learning payoff functions in infinite games. Mach. Learn. 67(1-2): 145-168 (2007) - [c80]Vishal Soni, Satinder Singh:
Abstraction in Predictive State Representations. AAAI 2007: 639-644 - [c79]Yunyao Li, Ishan Chaudhuri, Huahai Yang, Satinder Singh, H. V. Jagadish:
Enabling Domain-Awareness for a Generic Natural Language Interface. AAAI 2007: 833-838 - [c78]Vishal Soni, Satinder Singh, Michael P. Wellman:
Constraint satisfaction algorithms for graphical games. AAMAS 2007: 67 - [c77]David Wingate, Satinder Singh:
On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. AAMAS 2007: 187 - [c76]Erik Talvitie, Satinder Singh:
An Experts Algorithm for Transfer Learning. IJCAI 2007: 1065-1070 - [c75]David Wingate, Vishal Soni, Britton Wolfe, Satinder Singh:
Relational Knowledge with Predictive State Representations. IJCAI 2007: 2035-2040 - [c74]David Wingate, Satinder Singh:
Exponential Family Predictive Representations of State. NIPS 2007: 1617-1624 - [c73]Yunyao Li, Ishan Chaudhuri, Huahai Yang, Satinder Singh, H. V. Jagadish:
DaNaLIX: a domain-adaptive natural language interface for querying XML. SIGMOD Conference 2007: 1165-1168 - 2006
- [j16]Charles Lee Isbell Jr., Michael J. Kearns, Satinder Singh, Christian R. Shelton, Peter Stone, David P. Kormann:
Cobot in LambdaMOO: An Adaptive Social Statistics Agent. Auton. Agents Multi Agent Syst. 13(3): 327-354 (2006) - [c72]Vishal Soni, Satinder Singh:
Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains. AAAI 2006: 494-499 - [c71]David Wingate, Satinder Singh:
Mixtures of Predictive Linear Gaussian Models for Nonlinear, Stochastic Dynamical Systems. AAAI 2006: 524-529 - [c70]Matthew R. Rudary, Satinder Singh:
Predictive linear-Gaussian models of controlled stochastic dynamical systems. ICML 2006: 777-784 - [c69]David Wingate, Satinder Singh:
Kernel Predictive Linear Gaussian models for nonlinear stochastic dynamical systems. ICML 2006: 1017-1024 - [c68]Britton Wolfe, Satinder Singh:
Predictive state representations with options. ICML 2006: 1025-1032 - [c67]Ruggiero Cavallo, David C. Parkes, Satinder Singh:
Optimal Coordinated Planning Amongst Self-Interested Agents with Private State. UAI 2006 - 2005
- [j15]Nicholas L. Cassimatis, Sean Luke, Simon D. Levy, Ross W. Gayler, Pentti Kanerva, Chris Eliasmith, Timothy W. Bickmore, Alan C. Schultz, Randall Davis, James A. Landay, Robert C. Miller, Eric Saund, Thomas F. Stahovich, Michael L. Littman, Satinder Singh, Shlomo Argamon, Shlomo Dubnov:
Reports on the 2004 AAAI Fall Symposia. AI Mag. 26(1): 98-102 (2005) - [j14]Michael P. Wellman, Joshua Estelle, Satinder Singh, Yevgeniy Vorobeychik, Christopher Kiekintveld, Vishal Soni:
Strategic Interactions in a Supply Chain Game. Comput. Intell. 21(1): 1-26 (2005) - [c66]Michael R. James, Satinder Singh:
Planning in Models that Combine Memory with Predictive Representations of State. AAAI 2005: 987-992 - [c65]Britton Wolfe, Michael R. James, Satinder Singh:
Learning predictive state representations in dynamical systems without reset. ICML 2005: 980-987 - [c64]Michael R. James, Britton Wolfe, Satinder Singh:
Combining Memory and Landmarks with Predictive State Representations. IJCAI 2005: 734-739 - [c63]Yevgeniy Vorobeychik, Michael P. Wellman, Satinder Singh:
Learning Payoff Functions in Infinite Games. IJCAI 2005: 977-982 - [c62]Doina Precup, Richard S. Sutton, Cosmin Paduraru, Anna Koop, Satinder Singh:
Off-policy Learning with Options and Recognizers. NIPS 2005: 1097-1104 - [c61]Matthew R. Rudary, Satinder Singh, David Wingate:
Predictive Linear-Gaussian Models of Stochastic Dynamical Systems. UAI 2005: 501-508 - 2004
- [j13]Christopher Kiekintveld, Michael P. Wellman, Satinder Singh, Vishal Soni:
Value-driven procurement in the TAC supply chain game. SIGecom Exch. 4(3): 9-18 (2004) - [c60]Yevgeniy Vorobeychik, Michael P. Wellman, Satinder Singh:
Learning Payoff Functions in Infinite Games. AAAI Technical Report (2) 2004: 60-65 - [c59]Christopher Kiekintveld, Michael P. Wellman, Satinder Singh, Joshua Estelle, Yevgeniy Vorobeychik, Vishal Soni, Matthew R. Rudary:
Distributed Feedback Control for Decision Making on Supply Chains. ICAPS 2004: 384-392 - [c58]Joshua Estelle, Yevgeniy Vorobeychik, Michael P. Wellman, Satinder Singh, Christopher Kiekintveld, Vishal Soni:
Strategic Interactions in the TAC 2003 Supply Chain Tournament. Computers and Games 2004: 316-331 - [c57]Michael R. James, Satinder Singh:
Learning and discovery of predictive state representations in dynamical systems with reset. ICML 2004 - [c56]Matthew R. Rudary, Satinder Singh, Martha E. Pollack:
Adaptive cognitive orthotics: combining reinforcement learning and constraint-based temporal reasoning. ICML 2004 - [c55]Michael R. James, Satinder Singh, Michael L. Littman:
Planning with predictive state representations. ICMLA 2004: 304-311 - [c54]David C. Parkes, Satinder Singh, Dimah Yanovsky:
Approximately Efficient Online Mechanism Design. NIPS 2004: 1049-1056 - [c53]Satinder Singh, Andrew G. Barto, Nuttapong Chentanez:
Intrinsically Motivated Reinforcement Learning. NIPS 2004: 1281-1288 - [c52]Satinder Singh, Vishal Soni, Michael P. Wellman:
Computing approximate bayes-nash equilibria in tree-games of incomplete information. EC 2004: 81-90 - [c51]Satinder Singh, Michael R. James, Matthew R. Rudary:
Predictive State Representations: A New Theory for Modeling Dynamical Systems. UAI 2004: 512-518 - 2003
- [c50]Satinder Singh, Michael L. Littman, Nicholas K. Jong, David Pardoe, Peter Stone:
Learning Predictive State Representations. ICML 2003: 712-719 - [c49]David C. Parkes, Satinder Singh:
An MDP-Based Approach to Online Mechanism Design. NIPS 2003: 791-798 - [c48]Matthew R. Rudary, Satinder Singh:
A Nonlinear Predictive State Representation. NIPS 2003: 855-862 - 2002
- [j12]Satinder Singh, Diane J. Litman, Michael J. Kearns, Marilyn A. Walker:
Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System. J. Artif. Intell. Res. 16: 105-133 (2002) - [j11]Satinder Singh:
Introduction. Mach. Learn. 49(2-3): 107-109 (2002) - [j10]Michael J. Kearns, Satinder Singh:
Near-Optimal Reinforcement Learning in Polynomial Time. Mach. Learn. 49(2-3): 209-232 (2002) - [c47]Michael J. Kearns, Charles Lee Isbell Jr., Satinder Singh, Diane J. Litman, Jessica Howe:
CobotDS: A Spoken Dialogue System for Chat. AAAI/IAAI 2002: 425-430 - 2001
- [j9]Peter Stone, Michael L. Littman, Satinder Singh, Michael J. Kearns:
ATTac-2000: An Adaptive Autonomous Bidding Agent. J. Artif. Intell. Res. 15: 189-206 (2001) - [c46]Peter Stone, Michael L. Littman, Satinder Singh, Michael J. Kearns:
ATTac-2000: an adaptive autonomous bidding agent. Agents 2001: 238-245 - [c45]Charles Lee Isbell Jr., Christian R. Shelton, Michael J. Kearns, Satinder Singh, Peter Stone:
A social reinforcement learning agent. Agents 2001: 377-384 - [c44]Michael L. Littman, Michael J. Kearns, Satinder Singh:
An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games. NIPS 2001: 817-823 - [c43]Charles Lee Isbell Jr., Christian R. Shelton, Michael J. Kearns, Satinder Singh, Peter Stone:
Cobot: A Social Reinforcement Learning Agent. NIPS 2001: 1393-1400 - [c42]Michael L. Littman, Richard S. Sutton, Satinder Singh:
Predictive Representations of State. NIPS 2001: 1555-1561 - [c41]Michael J. Kearns, Michael L. Littman, Satinder Singh:
Graphical Models for Game Theory. UAI 2001: 253-260 - [c40]János A. Csirik, Michael L. Littman, Satinder Singh, Peter Stone:
FAucS : An FCC Spectrum Auction Simulator for Autonomous Bidding Agents. WELCOM 2001: 139-151 - 2000
- [j8]Satinder Singh, Tommi S. Jaakkola, Michael L. Littman, Csaba Szepesvári:
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms. Mach. Learn. 38(3): 287-308 (2000) - [c39]Charles Lee Isbell Jr., Michael J. Kearns, David P. Kormann, Satinder Singh, Peter Stone:
Cobot in LambdaMOO: A Social Statistics Agent. AAAI/IAAI 2000: 36-41 - [c38]Satinder Singh, Michael J. Kearns, Diane J. Litman, Marilyn A. Walker:
Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System. AAAI/IAAI 2000: 645-651 - [c37]Diane J. Litman, Michael S. Kearns, Satinder Singh, Marilyn A. Walker:
Automatic Optimization of Dialogue Management. COLING 2000: 502-508 - [c36]Michael J. Kearns, Satinder Singh:
Bias-Variance Error Bounds for Temporal Difference Updates. COLT 2000: 142-147 - [c35]Kary L. Myers, Michael J. Kearns, Satinder Singh, Marilyn A. Walker:
A Boosting Approach to Topic Spotting on Subdialogues. ICML 2000: 655-662 - [c34]Doina Precup, Richard S. Sutton, Satinder Singh:
Eligibility Traces for Off-Policy Policy Evaluation. ICML 2000: 759-766 - [c33]Peter Stone, Richard S. Sutton, Satinder Singh:
Reinforcement Learning for 3 vs. 2 Keepaway. RoboCup 2000: 249-258 - [c32]Michael J. Kearns, Yishay Mansour, Satinder Singh:
Fast Planning in Stochastic Games. UAI 2000: 309-316 - [c31]Satinder Singh, Michael J. Kearns, Yishay Mansour:
Nash Convergence of Gradient Dynamics in General-Sum Games. UAI 2000: 541-548
1990 – 1999
- 1999
- [j7]Richard S. Sutton, Doina Precup, Satinder Singh:
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 112(1-2): 181-211 (1999) - [c30]Satinder Singh, Michael J. Kearns, Diane J. Litman, Marilyn A. Walker:
Reinforcement Learning for Spoken Dialogue Systems. NIPS 1999: 956-962 - [c29]Richard S. Sutton, David A. McAllester, Satinder Singh, Yishay Mansour:
Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999: 1057-1063 - [c28]Yishay Mansour, Satinder Singh:
On the Complexity of Policy Iteration. UAI 1999: 401-408 - [c27]David A. McAllester, Satinder Singh:
Approximate Planning for Factored POMDPs using Belief State Simplification. UAI 1999: 409-416 - 1998
- [j6]Satinder Singh, Peter Dayan:
Analytical Mean Squared Error Curves for Temporal Difference Learning. Mach. Learn. 32(1): 5-40 (1998) - [c26]Doina Precup, Richard S. Sutton, Satinder Singh:
Theoretical Results on Reinforcement Learning with Temporally Abstract Options. ECML 1998: 382-393 - [c25]Michael J. Kearns, Satinder Singh:
Near-Optimal Reinforcement Learning in Polynominal Time. ICML 1998: 260-268 - [c24]John Loch, Satinder Singh:
Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes. ICML 1998: 323-331 - [c23]Richard S. Sutton, Doina Precup, Satinder Singh:
Intra-Option Learning about Temporally Abstract Actions. ICML 1998: 556-564 - [c22]Timothy X. Brown, Hui Tong, Satinder Singh:
Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning. NIPS 1998: 982-988 - [c21]Michael J. Kearns, Satinder Singh:
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms. NIPS 1998: 996-1002 - [c20]Richard S. Sutton, Satinder Singh, Doina Precup, Balaraman Ravindran:
Improved Switching among Temporally Abstract Actions. NIPS 1998: 1066-1072 - [c19]John K. Williams, Satinder Singh:
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes. NIPS 1998: 1073-1080 - 1997
- [c18]Satinder Singh, David Cohn:
How to Dynamically Merge Markov Decision Processes. NIPS 1997: 1057-1063 - 1996
- [j5]Satinder P. Singh, Richard S. Sutton:
Reinforcement Learning with Replacing Eligibility Traces. Mach. Learn. 22(1-3): 123-158 (1996) - [c17]Lawrence K. Saul, Satinder P. Singh:
Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards. COLT 1996: 147-156 - [c16]David A. Cohn, Satinder Singh:
Predicting Lifetimes in Dynamically Allocated Memory. NIPS 1996: 939-945 - [c15]Satinder Singh, Dimitri P. Bertsekas:
Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems. NIPS 1996: 974-980 - [c14]Satinder Singh, Peter Dayan:
Analytical Mean Squared Error Curves in Temporal Difference Learning. NIPS 1996: 1054-1060 - 1995
- [j4]Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh:
Learning to Act Using Real-Time Dynamic Programming. Artif. Intell. 72(1-2): 81-138 (1995) - [c13]Lawrence K. Saul, Satinder P. Singh:
Markov Decision Processes in Large State Spaces. COLT 1995: 281-288 - [c12]Peter Dayan, Satinder Singh:
Improving Policies without Measuring Merits. NIPS 1995: 1059-1065 - 1994
- [j3]Satinder P. Singh, Richard C. Yee:
An Upper Bound on the Loss from Approximate Optimal-Value Functions. Mach. Learn. 16(3): 227-233 (1994) - [j2]Tommi S. Jaakkola, Michael I. Jordan, Satinder P. Singh:
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Neural Comput. 6(6): 1185-1201 (1994) - [c11]Satinder P. Singh:
Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes. AAAI 1994: 700-705 - [c10]Satinder P. Singh, Tommi S. Jaakkola, Michael I. Jordan:
Learning Without State-Estimation in Partially Observable Markovian Decision Processes. ICML 1994: 284-292 - [c9]Tommi S. Jaakkola, Satinder Singh, Michael I. Jordan:
Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems. NIPS 1994: 345-352 - [c8]Satinder Singh, Tommi S. Jaakkola, Michael I. Jordan:
Reinforcement Learning with Soft State Aggregation. NIPS 1994: 361-368 - 1993
- [c7]Satinder Singh, Andrew G. Barto, Roderic A. Grupen, Christopher I. Connolly:
Robust Reinforcement Learning in Motion Planning. NIPS 1993: 655-662 - [c6]Tommi S. Jaakkola, Michael I. Jordan, Satinder Singh:
Convergence of Stochastic Iterative Dynamic Programming Algorithms. NIPS 1993: 703-710 - 1992
- [j1]Satinder Pal Singh:
Transfer of Learning by Composing Solutions of Elemental Sequential Tasks. Mach. Learn. 8: 323-339 (1992) - [c5]Satinder P. Singh:
Reinforcement Learning with a Hierarchy of Abstract Models. AAAI 1992: 202-207 - [c4]Satinder P. Singh:
Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models. ML 1992: 406-415 - 1991
- [c3]Satinder P. Singh:
Transfer of Learning Across Compositions of Sequentail Tasks. ML 1991: 348-352 - [c2]Satinder Singh:
The Efficient Learning of Multiple Task Sequences. NIPS 1991: 251-258 - [c1]N. E. Berthier, Satinder P. Singh, Andrew G. Barto, James C. Houk:
A Cortico-Cerebellar Model that Learns to Generate Distributed Motor Commands to Control a Kinematic Arm. NIPS 1991: 611-618
last updated on 2024-10-30 21:32 CET by the dblp team
all metadata released as open data under CC0 1.0 license