Final answer:
The inventory control problem is formulated as a Markov Decision Process by identifying the states as the current inventory levels, the actions as the order quantities, the transition probabilities as determined by the demand distribution, the rewards as revenue minus ordering and holding costs, and the decision epochs as the beginning of each month.
Step-by-step explanation:
To formulate the inventory control problem as a Markov Decision Process (MDP), we need to identify its five key elements: states, actions, transition probabilities, rewards, and decision epochs (see the code sketch after this list). In this scenario:
- States: The state is the inventory level on hand at the beginning of the month, an integer from 0 to M units, where M is the warehouse capacity.
- Actions: The action is the number of units to order at the start of the month; in state s, the feasible order quantities are 0 to M - s, so that the inventory after ordering never exceeds M.
- Transitions: The transition probabilities are determined by the known demand distribution, p_j = P{D_t = j} for all j; if demand exceeding the available stock is lost, the next state is s' = max(s + a - j, 0), where s + a is the post-order inventory.
- Rewards: The immediate reward is the revenue from sales minus the ordering cost O(u) and the holding cost h(u), and the terminal value of any inventory u left at the end of the horizon is g(u).
- Decision Epochs: The decision epochs are at the beginning of each month, when the ordering decision is made.
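To make these elements concrete, here is a minimal Python sketch of the state space, feasible actions, transition probabilities, and expected one-step reward. The symbols M, p_j, O(u), and h(u) follow the problem's notation, but the specific numbers, the per-unit selling price, and the lost-sales assumption are illustrative choices rather than part of the original problem.

```python
import numpy as np

# Hypothetical parameters: M, p_j, O(u), h(u) follow the problem's notation,
# but the numbers, the selling price, and the lost-sales rule are assumptions.
M = 5                                    # warehouse capacity (maximum inventory)
p = np.array([0.1, 0.3, 0.3, 0.2, 0.1])  # p[j] = P{D_t = j}, demand distribution
price = 8.0                              # revenue per unit sold (assumed)

def O(u):
    """Ordering cost for u units (assumed fixed-plus-linear)."""
    return 0.0 if u == 0 else 2.0 + 1.0 * u

def h(u):
    """Holding cost for u units carried during the month (assumed linear)."""
    return 0.5 * u

states = range(M + 1)                    # s = inventory at the start of a month

def actions(s):
    """Feasible order quantities: cannot exceed remaining capacity M - s."""
    return range(M - s + 1)

def transition_and_reward(s, a):
    """Return ({s': probability}, expected one-step reward) for state s and order a."""
    probs = {}
    exp_reward = -O(a) - h(s + a)        # pay ordering and holding costs
    for j, pj in enumerate(p):           # sum over possible demands j
        sold = min(s + a, j)             # cannot sell more than is on hand (lost sales)
        s_next = s + a - sold            # inventory carried into the next month
        probs[s_next] = probs.get(s_next, 0.0) + pj
        exp_reward += pj * price * sold
    return probs, exp_reward
```

For example, transition_and_reward(2, 1) returns the distribution of next month's inventory and the expected reward when 2 units are on hand and 1 unit is ordered.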
The objective in this MDP is to maximize the expected total reward over the planning horizon by choosing the order quantity at each decision epoch as a function of the current state.
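A minimal backward-induction sketch for this finite-horizon objective is given below. It is self-contained, reuses the same hypothetical numbers as the sketch above, and adds an assumed terminal value g(u) for leftover stock; only the notation (M, p_j, O, h, g) comes from the problem statement.

```python
import numpy as np

# Hypothetical data; only the notation (M, p_j, O, h, g) comes from the problem.
M, N = 5, 12                             # warehouse capacity and horizon in months
p = np.array([0.1, 0.3, 0.3, 0.2, 0.1])  # p[j] = P{D_t = j}
price = 8.0                              # revenue per unit sold (assumed)
O = lambda u: 0.0 if u == 0 else 2.0 + 1.0 * u   # ordering cost (assumed)
h = lambda u: 0.5 * u                             # holding cost (assumed)
g = lambda u: 1.0 * u                             # terminal value of leftover stock (assumed)

V = np.array([g(s) for s in range(M + 1)], dtype=float)  # V_N(s) = g(s)
policy = np.zeros((N, M + 1), dtype=int)

for t in range(N - 1, -1, -1):           # backward induction over decision epochs
    V_new = np.empty(M + 1)
    for s in range(M + 1):               # current inventory level
        best_val, best_a = -np.inf, 0
        for a in range(M - s + 1):       # feasible orders keep stock within capacity
            val = -O(a) - h(s + a)
            for j, pj in enumerate(p):   # expectation over demand
                sold = min(s + a, j)
                val += pj * (price * sold + V[s + a - sold])
            if val > best_val:
                best_val, best_a = val, a
        V_new[s], policy[t, s] = best_val, best_a
    V = V_new

print("Optimal first-month order for each starting inventory level:", policy[0])
```

Running the script prints one recommended order quantity per possible starting inventory level for the first month; policy[t] gives the corresponding rule for month t.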