Consider a one-player version of the game twenty-one as a Markov decision process. The objective is to draw cards one at a time from an infinite deck of playing cards and acquire a card sum as large as possible without going over 21. For now we will have ten integer states {12, …, 21} representing the card sum (sums smaller than 12 are trivially played). At each turn we can take one of two actions from state s. Stopping yields a reward equal to s and immediately ends the game. Hitting yields zero reward, and we will either transition to a state s′ with s < s′ ≤ 21, each with probability 1/13, or immediately end the game ("bust") with probability (s − 8)/13.
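The transition model above can be sketched directly. This is a minimal illustration, assuming the stated simplification (each reachable sum s′ has probability 1/13, and the bust probability is (s − 8)/13); the function name `hit_transitions` is ours, not part of the problem:

```python
def hit_transitions(s):
    """Return {next_state: probability} for hitting in state s.

    States are card sums 12..21; the key "bust" means the game ends
    immediately with zero reward.
    """
    probs = {s2: 1 / 13 for s2 in range(s + 1, 22)}  # each higher sum: 1/13
    probs["bust"] = (s - 8) / 13                     # remaining mass busts
    return probs

# Sanity check: the probabilities form a distribution in every state.
for s in range(12, 22):
    assert abs(sum(hit_transitions(s).values()) - 1.0) < 1e-12
```

Note that the distribution only sums to 1 because the bust probability is (s − 8)/13, which is one quick way to see why that expression, rather than s − 8/13, must be intended.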

1. Write down the Bellman optimality equations for the value V(s) of each state. While you could write 10 separate equations, they are similar enough that one compact expression for V(s), using summation notation, covers all of them.
2. We perform value iteration starting with V0(s) = 0 for all states. What are the values V1(s) in the next iteration? Have we found the optimal values V*?
3. Now consider taking a policy iteration approach. Suppose that we have an initial policy π1, which is to hit in every state. What are the associated values Vπ1(s)? You should be able to reason this out without having to solve a linear system or write an iterative program.
4. Perform the next step of policy iteration to find π2. Have we found the optimal policy π*?
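The value-iteration parts of the question can be checked numerically. Below is a hedged sketch (not a model solution): it applies the Bellman optimality update under the transition model stated in the problem, with the stop action contributing its immediate reward s and the hit action contributing the expected value of the reachable states (bust contributes nothing):

```python
def value_iteration(n_iters=50):
    """Bellman optimality updates for the simplified twenty-one MDP.

    States are the card sums 12..21. Each update takes the max over:
      - stop: immediate reward s, game ends;
      - hit:  zero reward, then each s' in s+1..21 with probability 1/13
              (busting contributes 0).
    """
    V = {s: 0.0 for s in range(12, 22)}  # V_0(s) = 0 for all states
    for _ in range(n_iters):
        V = {
            s: max(
                s,                                            # stop
                sum(V[s2] / 13 for s2 in range(s + 1, 22)),   # hit
            )
            for s in V
        }
    return V
```

Running this sketch is a useful cross-check for parts 1 and 2: compare V after one update (V1) with V after many updates, and verify your hand-derived fixed point against it before committing to an answer.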

1 Answer


Final answer:

The expected value of a game is the average gain or loss per play over many repetitions, weighting each possible outcome's payout by its probability. It is crucial for understanding the long-term profitability of the game for the player: a negative expected value means the player loses money on average over time.

Step-by-step explanation:

Expected Value of a Card and Coin Game

To calculate the expected value for the card and coin game, one must determine the probability and payout of each possible outcome. The game consists of drawing a card and tossing a coin. Drawing a face card wins money (the amount depends on the coin flip), while drawing any other card loses money.

  • If the card is a face card (12 of the 52 cards) and the coin lands on heads (probability 0.5), the win is $6.
  • If the card is a face card and the coin lands on tails (probability 0.5), the win is $2.
  • If the card is not a face card (40 of the 52 cards), the player loses $2, regardless of the coin flip.

Using these outcomes and their probabilities, one can calculate the expected value as follows:

Expected Value = (12/52) × (0.5 × $6 + 0.5 × $2) + (40/52) × (−$2) = $48/52 − $80/52 = −$8/13 ≈ −$0.62
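The arithmetic above can be reproduced exactly with rational numbers, which avoids any floating-point rounding in the intermediate fractions (a sketch of the same calculation, nothing more):

```python
from fractions import Fraction

p_face = Fraction(12, 52)   # probability of drawing a face card
p_other = Fraction(40, 52)  # probability of any other card

# Face card: coin decides $6 (heads) or $2 (tails); other card: lose $2.
ev = p_face * (Fraction(1, 2) * 6 + Fraction(1, 2) * 2) + p_other * (-2)

print(ev, float(ev))  # -8/13, approximately -0.615
```

So the game costs the player about 62 cents per round on average.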

The expected value represents the average gain or loss per game if the game is played many times. This value indicates whether the game is fair or biased towards the house or the player.

As for the long-term profit of playing a game with fixed outcomes and probabilities, such as the guess-a-suit card game or the number matching game, one calculates the expected profit by summing each possible outcome's payoff weighted by its probability. If the expected profit is negative, the player will, on average, lose money over time; a positive expected value indicates a potential profit.
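The "average loss over many plays" claim can be illustrated with a quick Monte Carlo simulation of the same card-and-coin game (a sketch; the function name `play_once` is ours, and the sample average will only be close to, not equal to, the analytic −8/13):

```python
import random

def play_once(rng):
    """Simulate one round of the hypothetical card-and-coin game."""
    if rng.random() < 12 / 52:                   # face card drawn
        return 6 if rng.random() < 0.5 else 2    # heads -> $6, tails -> $2
    return -2                                    # any other card loses $2

rng = random.Random(0)       # fixed seed for reproducibility
n = 200_000
avg = sum(play_once(rng) for _ in range(n)) / n
# avg should be close to the analytic expected value, -8/13 (about -0.615)
```

By the law of large numbers, the sample average converges to the expected value as the number of plays grows, which is exactly why expected value governs long-run profitability.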

The calculation of expected values in various games of chance demonstrates how probabilities and payouts affect the long-term outcomes for participants. Decisions regarding whether to engage in these games should be based on a sound understanding of expected value calculations.

Answered by user Parn