Final answer:
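In a multi-armed bandit model, once you switch from exploration to exploitation, you narrow the options down to the one arm that looks best from the data gathered so far: either the arm with the highest average observed reward or the arm with the highest estimated probability of paying out. Both choices aim to maximize expected return and reduce the risk of committing to a suboptimal option.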
Step-by-step explanation:
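In a multi-armed bandit model, each option (arm) has an unknown reward distribution, so any choice made during exploitation rests on estimates built up while exploring. How the options are narrowed down depends on the strategy used. One common strategy is to select the arm with the highest reward, meaning the arm whose average observed reward has been highest so far. Another approach is to select the arm with the highest probability, meaning the arm with the highest estimated likelihood of yielding a positive result (for example, its fraction of successful pulls). When every reward is simply a success or a failure, the two criteria pick the same arm; they can disagree when payout sizes vary, because an arm that pays out rarely but in large amounts can have a higher average reward than an arm that pays out often but in small amounts. Both strategies aim to maximize expected return and reduce the risk of settling on a suboptimal option, and both are only as reliable as the estimates behind them: an arm that was pulled only a few times may look better or worse than it really is.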
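
The two selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the function names, the per-arm tallies, and the sample numbers are all made up for the example, and arms that were never pulled are simply skipped.

```python
def best_arm(values, pulls):
    """Return the index of the arm with the highest per-pull average.

    values[i] is the running total for arm i (total reward, or number
    of successes); pulls[i] is how many times arm i was tried.
    Arms with zero pulls have no estimate and are never selected.
    """
    averages = [
        v / n if n > 0 else float("-inf")
        for v, n in zip(values, pulls)
    ]
    # max() returns the first index on ties.
    return max(range(len(averages)), key=averages.__getitem__)


def exploit_by_mean_reward(total_reward, pulls):
    """Pick the arm with the highest average observed reward."""
    return best_arm(total_reward, pulls)


def exploit_by_success_rate(successes, pulls):
    """Pick the arm with the highest estimated probability of a payout."""
    return best_arm(successes, pulls)


# Hypothetical tallies collected during exploration (illustrative only).
pulls = [10, 40, 5]
total_reward = [12.0, 30.0, 9.0]   # averages: 1.2, 0.75, 1.8
successes = [4, 30, 2]             # rates:    0.4, 0.75, 0.4

print(exploit_by_mean_reward(total_reward, pulls))   # 2
print(exploit_by_success_rate(successes, pulls))     # 1
```

With these made-up numbers the two rules disagree: arm 2 pays out less often but more per pull, so it wins on average reward, while arm 1 wins on success rate. Arm 2 was also pulled only 5 times, so its estimate is the least reliable of the three.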