Learning to perceive and act by trial and error

  • Published: July 1991
  • Volume 7, pages 45–83 (1991)
  • Steven D. Whitehead
  • Dana H. Ballard

Abstract

This article considers adaptive control architectures that integrate active sensory-motor systems with decision systems based on reinforcement learning. One unavoidable consequence of active perception is that the agent's internal representation often confounds external world states. We call this phenomenon perceptual aliasing and show that it destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy. We then describe a new decision system that overcomes these difficulties for a restricted class of decision problems. The system incorporates a perceptual subcycle within the overall decision cycle and uses a modified learning algorithm to suppress the effects of perceptual aliasing. The result is a control architecture that learns not only how to solve a task but also where to focus its visual attention in order to collect necessary sensory information.
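
To make perceptual aliasing concrete, the following minimal sketch (not the authors' system; the toy environment, rewards, and all names are invented for illustration) shows what happens to a tabular one-step learner when two distinct world states collapse into a single internal observation:

```python
import random

# Hypothetical toy world: two external states, A and B, that the agent's
# sensors cannot distinguish -- both are perceived as observation "X".
# The same action earns +1 in A but -1 in B.
REWARD = {"A": 1.0, "B": -1.0}

def observe(world_state):
    # Perceptual aliasing: distinct world states map to one internal state.
    return "X"

alpha = 0.1                 # learning rate
q = {("X", "go"): 0.0}      # tabular action-value estimate

for _ in range(10000):
    world_state = random.choice(["A", "B"])   # the agent cannot tell which
    obs = observe(world_state)
    reward = REWARD[world_state]
    # One-step update; no successor term because this toy episode ends
    # after a single action.
    q[(obs, "go")] += alpha * (reward - q[(obs, "go")])

# The estimate settles near the average reward (about 0.0), which is the
# right value for neither A nor B; with bootstrapped multi-step targets,
# such errors propagate to other states and destabilize the policy.
print("Q(X, go) ~", round(q[("X", "go")], 2))
```

Roughly speaking, the system described in the article counters this by interposing a perceptual subcycle that redirects attention until the internal state disambiguates the world, together with a modified learning rule that suppresses the contribution of aliased states.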



Author information

Authors and Affiliations

  1. Department of Computer Science, University of Rochester, Rochester, NY 14627

    Steven D. Whitehead

  2. Department of Computer Science, University of Rochester, Rochester, NY 14627

    Dana H. Ballard


About this article

Cite this article

Whitehead, S.D., Ballard, D.H. Learning to perceive and act by trial and error. Mach Learn 7, 45–83 (1991). https://doi.org/10.1007/BF00058926



Keywords

  • Reinforcement learning
  • deictic representations
  • sensory-motor integration
  • hidden state
  • non-Markov decision problems