Answer:
This refers to reinforcement learning, a machine learning approach in which an agent learns from its environment's feedback (rewards or penalties) to choose the actions that maximize its cumulative reward.
Step-by-step explanation:
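As a rough illustration of learning from reward feedback, here is a minimal Python sketch of an epsilon-greedy agent on a simple "multi-armed bandit" problem. The bandit setting, the function name `run_bandit`, and the parameters `reward_probs`, `steps`, `epsilon`, and `seed` are all assumptions made for this example, not part of the original question. The agent tries actions, receives a reward of 1 or 0 from the environment, and gradually shifts toward the action with the higher estimated reward.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical multi-armed bandit.

    reward_probs[a] is the (hidden) chance that action a pays a reward of 1.
    Returns the agent's learned value estimates and how often it tried each action.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    values = [0.0] * n_actions  # estimated average reward per action
    counts = [0] * n_actions    # number of times each action was chosen

    for _ in range(steps):
        # Explore a random action with probability epsilon,
        # otherwise exploit the action with the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # Feedback from the environment: reward 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Update the running average of rewards for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


values, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", values)
print("times each action was chosen:", counts)
print("action the agent prefers:", best_action)
```

The key idea is the loop: choose an action, observe the reward, update the estimate. Over many steps the estimates approach the true average rewards, so the agent ends up choosing the better action most of the time.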
hope this helps