Consider an inventory control problem. We make the following assumptions:

(a) The decision to order is made at the beginning of each month.
(b) Delivery occurs instantaneously.
(c) Demand arrives throughout the month.
(d) No backlogging is allowed.
(e) No partial filling is allowed. That is, if the quantity demanded is more than the current inventory in the warehouse, the order cannot be filled.
(f) The warehouse has a capacity of M units.
(g) The demand has a known time-homogeneous probability distribution $p_j = P\{D_t = j\}$, $j = 0, 1, 2, \dots$
(h) The ordering cost function is O(u), the holding cost is h(u), and the revenue is f(j). The terminal inventory value is g(u).
Formulate the model as an MDP (identify the five elements of the MDP).

1 Answer

Final answer:

The inventory control problem is formulated as a Markov Decision Process by identifying states as current inventory levels, actions as order quantities, transitions as probabilities based on demand distribution, rewards as revenue minus costs, and decision epochs as the beginning of each month.

Step-by-step explanation:

To formulate the inventory control problem as a Markov Decision Process (MDP), we need to identify its five key elements: states, actions, transitions, rewards, and decision epochs. In this scenario:

  • States: The state can be defined as the current inventory level, ranging from 0 to M units at the beginning of each month.
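  • Actions: The action is the number of units to order at the start of the month. Because the warehouse holds at most M units, with s units on hand the order quantity a can only range from 0 to M − s.
  • Transitions: The transition probabilities follow from the known demand distribution $p_j = P\{D_t = j\}$. After ordering, the stock is s + a. Since backlogging and partial filling are not allowed, a demand of j ≤ s + a is filled and leaves s + a − j units (with probability p_j), while a demand larger than s + a is not filled at all and leaves the stock at s + a.
  • Rewards: The immediate reward is the revenue f(j) from the demand that is filled, minus the ordering cost O(u) and the holding cost h(u). At the end of the horizon, the leftover inventory earns the terminal value g(u).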
  • Decision Epochs: The decision epochs are at the beginning of each month, when the ordering decision is made.
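
As a rough illustration of the five elements listed above, here is a minimal Python sketch that tabulates the transition probabilities and expected one-month rewards. It rests on assumptions that are not in the question: the demand distribution is truncated to a finite list `p`, the holding cost is charged on the stock after delivery (s + a), and the function name `build_inventory_mdp` and the example cost functions are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

def build_inventory_mdp(
    M: int,
    p: List[float],               # p[j] = P{D_t = j}; finite support is an assumption
    O: Callable[[int], float],    # ordering cost
    h: Callable[[int], float],    # holding cost (assumed charged on s + a)
    f: Callable[[int], float],    # revenue when a demand of j units is filled
) -> Tuple[Dict[Tuple[int, int], List[float]], Dict[Tuple[int, int], float]]:
    """Return P[(s, a)][s_next] and the expected reward R[(s, a)]."""
    P: Dict[Tuple[int, int], List[float]] = {}
    R: Dict[Tuple[int, int], float] = {}
    for s in range(M + 1):                # states: inventory 0..M
        for a in range(M - s + 1):        # actions: capacity forces s + a <= M
            y = s + a                     # stock after instantaneous delivery
            row = [0.0] * (M + 1)
            expected_revenue = 0.0
            for j, pj in enumerate(p):
                if j <= y:                # demand can be filled in full
                    row[y - j] += pj
                    expected_revenue += pj * f(j)
                else:                     # no partial filling: nothing is sold
                    row[y] += pj
            P[(s, a)] = row
            R[(s, a)] = expected_revenue - O(a) - h(y)
    return P, R

# Hypothetical numbers, chosen only to exercise the sketch.
P, R = build_inventory_mdp(
    M=2,
    p=[0.25, 0.5, 0.25],
    O=lambda u: 2.0 * u,
    h=lambda u: 1.0 * u,
    f=lambda j: 8.0 * j,
)
print(P[(0, 1)], R[(0, 1)])
```

Each row of `P` sums to 1, and the keys of `P` enumerate exactly the feasible state-action pairs.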

The objective in this MDP is to maximize the expected total reward over a finite planning horizon, including the terminal value g(u) of the final inventory, by choosing the optimal order quantity at each decision epoch given the current state.
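
One way to write this objective down is through the finite-horizon optimality equations. The symbols N (number of decision epochs plus the terminal period) and $u_t(s)$ (best expected total reward from month t onward with s units on hand) are introduced here and are not in the question. Charging the holding cost on s + a is also an assumption.

$$
u_N(s) = g(s), \qquad s = 0, 1, \dots, M,
$$

$$
u_t(s) = \max_{a \in \{0, \dots, M - s\}} \Big\{ r(s, a) + \sum_{j=0}^{s+a} p_j \, u_{t+1}(s + a - j) + \Big( \sum_{j > s+a} p_j \Big) u_{t+1}(s + a) \Big\}, \qquad t = N-1, \dots, 1,
$$

$$
r(s, a) = \sum_{j=0}^{s+a} p_j \, f(j) - O(a) - h(s + a).
$$

Solving these backward from t = N gives an optimal order quantity for every inventory level in every month.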

by User Trastle (8.6k points)