Each quarter the marketing manager of a retail store divides customers into two classes based on their purchase behavior in the previous quarter. Denote the classes as L for low and H for high. The manager wishes to determine to which classes of customers he should send quarterly catalogs. The cost of sending a catalog is $15 per customer and the expected purchase depends on the customer’s class and the manager’s action. If a customer is in class L and receives a catalog, then the expected purchase in the current quarter is $20, and if a class L customer does not receive a catalog his expected purchase is $10. If a customer is in class H and receives a catalog, then his expected purchase is $50, and if a class H customer does not receive a catalog his expected purchase is $25. The decision whether or not to send a catalog to a customer also affects the customer's classification in the subsequent quarter. If a customer is class L at the start of the present quarter, then the probability he is in class L at the subsequent quarter is 0.3 if he receives a catalog and 0.5 if he does not. If a customer is class H in the current period, then the probability that he remains in class H in the subsequent period is 0.8 if he receives a catalog and 0.4 if he does not. Assume a discount rate of 0.9 and an objective of maximizing expected total discounted reward.
a) Formulate this as an infinite-horizon discounted Markov decision problem.
b) For ???? = 0.1, find a near-optimal policy using value iteration.
c) Find an optimal policy using policy iteration starting with the stationary policy which has greatest one-step reward.
d) Formulate the problem as a linear program, giving its primal and dual. Solve both and interpret the solutions.