Maintenance optimization for a Markovian deteriorating system with population heterogeneity

ABSTRACT We develop a partially observable Markov decision process model to incorporate population heterogeneity when scheduling replacements for a deteriorating system. The single-component system deteriorates over a finite set of condition states according to a Markov chain. The population of spare components that is available for replacements is composed of multiple component types that cannot be distinguished by their exterior appearance but deteriorate according to different transition probability matrices. This situation may arise, for example, because of variations in the production process of components. We provide a set of conditions for which we characterize the structure of the optimal policy that minimizes the total expected discounted operating and replacement cost over an infinite horizon. In a numerical experiment, we benchmark the optimal policy against a heuristic policy that neglects population heterogeneity.


Introduction
Capital goods, such as lithography machines in semiconductor fabrication plants, baggage handling systems at airports, and medical equipment in hospitals, are essential for the primary processes of their users. Deterioration of a capital good can lower the efficiency of its operations or diminish the quality of the goods or services it delivers, resulting in increased operating costs. Therefore, it can be advantageous to replace deteriorated components of a capital good with non-deteriorated spare components. Often, a Condition-Based Maintenance (CBM) policy is adopted, and information on the deterioration level of components is collected (e.g., via remote monitoring or manual inspections) to support replacement decisions.
A common assumption in CBM models is that after each replacement the newly installed component is subject to the same stochastic deterioration process; i.e., the components form a homogeneous population. However, there exist several causes by which the population of components can be heterogeneous. For example, variability in the production process of components might affect their deterioration behavior. If the components are then not distinguishable by appearance, the installed components will be randomly selected from a heterogeneous population (cf. Cha and Finkelstein, 2012; Xiang et al., 2014). Faults in the installation of components could be another cause of heterogeneity. In degradation modeling, therefore, population heterogeneity (also often referred to as unit-to-unit variability) is increasingly taken into account. Typically, this is done by including random parameters in the degradation model that differ over the components in a population; e.g., random drift and diffusion parameters in a Brownian motion (Peng and Tseng, 2009; Wang, 2010; Bian and Gebraeel, 2012) or random scale parameters in a gamma process (Lawless and Crowder, 2004; Tsai et al., 2012). Heterogeneity in populations of produced components has also motivated the development of burn-in testing procedures, which aim to reduce the number of weak components released for field operation; see Ye et al. (2012) and Xiang et al. (2013) for recent work on degradation-based burn-in. Despite this attention to heterogeneity in degradation modeling, little work has been done on CBM models for components that form a heterogeneous population. In this article, we formulate and analyze a model for scheduling replacements for a single-component, Markovian deteriorating system under population heterogeneity.
In the literature on CBM models, many works are devoted to maintenance optimization for single-component systems that deteriorate according to a Markov chain over a finite set of deterioration levels (condition states). One of the earliest works is due to Derman (1963), who studies the problem of scheduling replacements to minimize the long-run average cost when a corrective replacement of the installed component after it has failed (i.e., reached the highest deterioration level) is more costly than a preventive replacement. Derman formulates the problem using a Markov Decision Process (MDP) model and provides sufficient conditions on the transition probability matrix for the optimal policy to have a control-limit structure. Numerous variations on and extensions to this classic problem have been explored in later research. For example, Kawai et al. (2002) include operating cost in the model. Bobos and Protonotarios (1978) propose a model formulation that allows for multiple maintenance activities (e.g., replacement and multiple types of repair), and Kurt and Kharoufeh (2010) introduce a limit on the number of repairs. Semi-Markovian deterioration (Kao, 1973) and age-dependent deterioration (Benyamini and Yechiali, 1999) have been considered as generalizations of the deterioration behavior. A comprehensive review of these variations and extensions is given in Çekyay and Özekici (2011). In this stream of literature, the population of components is homogeneous, in that each installed component deteriorates according to the same transition probability matrix. The majority of the works focus, like Derman (1963), on identifying intuitively meaningful conditions for the optimal policy to possess a certain structure, most commonly a control-limit structure. The knowledge that the optimal policy has a particular structure offers managerial insight and may ease implementation.
The key difference between the CBM model we present here and the aforementioned models is that we consider population heterogeneity by assuming that the population of spare components consists of multiple component types that cannot be distinguished by their exterior appearance but deteriorate according to different transition probability matrices. The type of an installed component is not known; the only information available for making replacement decisions is the history of observed deterioration levels. We use a Partially Observable Markov Decision Process (POMDP) model to formulate the problem of scheduling replacements to minimize the total expected discounted operating and replacement cost over an infinite horizon, and we also derive a result on the structure of the optimal policy. We note that our model can also be applied to scenarios where heterogeneity among the installed components arises at the time of installation; in that case, the different component types should be reinterpreted as different categories of installation faults.
Efforts to incorporate population heterogeneity in CBM models so far have concentrated on models based on continuous-state deterioration processes. Crowder and Lawless (2007) and Zhang et al. (2014) study a fairly simple maintenance scheme to cope with population heterogeneity, in which only one inspection can be performed to observe a component's deterioration level before scheduling a preventive replacement. They apply their proposed policy to gamma process and Brownian motion degradation models with random parameters. Xiang et al. (2014) develop a joint burn-in and CBM policy for heterogeneous populations; however, in their policy, maintenance decisions for components that are placed into service after surviving burn-in are not adapted to information on component-specific parameters that may be inferred from observations of the deterioration level. More closely related to our work is research by Elwany et al. (2011) and Chen et al. (2015), who study the problem of adaptively scheduling replacements for components that deteriorate according to a geometric Brownian motion and an inverse Gaussian process with random parameters, respectively. In both works, a conjugate prior distribution is assumed for the random parameters such that the posterior distribution, after having observed a sequence of deterioration levels of a component, is completely determined by (the logarithm of) the latest deterioration level and the component's age. This assumption permits a reduction of the state space of the decision process but also implies a restriction on the composition of the population. By contrast, we make no assumptions on the distribution of component types in the population, and we allow the entire history of observed deterioration levels to contain relevant information about the type of an installed component.
Our main contributions are summarized as follows. We extend the literature on a classical CBM problem by considering population heterogeneity when scheduling replacements for a single-component, Markovian deteriorating system. We formulate the resulting sequential decision process as a POMDP and obtain a result on the optimal policy structure. In developing our structural results, we introduce a new stochastic order and establish its relationship to existing stochastic orders. We complement the analytical results by conducting a numerical experiment to identify factors that make it especially important to account for population heterogeneity in replacement decisions. To perform this experiment, we adapt Hansen's policy iteration algorithm (Hansen, 1998) such that it can be used as a solution technique for our POMDP model, in which one state variable is completely observable.
The remainder of this article is organized into the following sections. In Section 2, we formulate the POMDP model for our problem of scheduling replacements under population heterogeneity. We derive structural results in Section 3 and present the numerical experiment, including the solution technique used, in Section 4. Finally, in Section 5, we conclude and provide directions for future research.

Model formulation
We consider a single-component system that operates over an infinite horizon, where time is divided into periods of unit length. The deterioration level of the system evolves as a discrete-time Markov chain on the finite set D = {0, 1, . . . , N}. Deterioration level 0 corresponds to the best condition, and deterioration level N indicates that the installed component has failed. We assume that the installed component stems from a heterogeneous population that includes components of multiple types, represented by the set T = {1, 2, . . . , M}. The type of the installed component determines the transition probability matrix of the Markov chain that describes the deterioration process on D: each component type t ∈ T is associated with a unique transition probability matrix P^t with elements p^t_{ij}, i, j ∈ D. At any discrete time epoch, the installed component may either be kept operating or be replaced with a spare component in deterioration level 0 from the heterogeneous population. The spare components are assumed to be indistinguishable with respect to their type; therefore, upon replacement, the type of the newly installed component is random. The probability of it being t ∈ T equals the proportion ρ_t of components of type t in the population.
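To make the setup concrete, the following sketch simulates the deterioration of one installed component drawn from a heterogeneous population. All numbers (the matrices, the mix ρ, and N = 3) are illustrative values, not taken from the article:

```python
import random

# Hypothetical instance: levels 0..3 (3 = failed), two component types.
# Type 1 deteriorates more slowly than type 2; both are upper triangular.
P = {1: [[0.90, 0.08, 0.00, 0.02],
         [0.00, 0.90, 0.08, 0.02],
         [0.00, 0.00, 0.90, 0.10],
         [0.00, 0.00, 0.00, 1.00]],
     2: [[0.70, 0.25, 0.00, 0.05],
         [0.00, 0.70, 0.25, 0.05],
         [0.00, 0.00, 0.70, 0.30],
         [0.00, 0.00, 0.00, 1.00]]}
rho = {1: 0.6, 2: 0.4}  # population mix

def draw_component():
    """Sample the (unobservable) type of a newly installed component."""
    return random.choices(list(rho), weights=list(rho.values()))[0]

def step(t, i):
    """Sample the next deterioration level for a type-t component in level i."""
    return random.choices(range(len(P[t][i])), weights=P[t][i])[0]

random.seed(1)
t, i, path = draw_component(), 0, [0]
while i < 3:          # run until failure (level N = 3)
    i = step(t, i)
    path.append(i)
```

A decision maker observes only `path`, never `t`, which is exactly the partial observability the model formalizes below.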
Two kinds of costs are incurred: operating costs and replacement costs. The per period operating cost for a system in deterioration level i ∈ D is given by L_i ≥ 0, independent of the type of the installed component. This cost depends on the deterioration level since the deterioration level may have an impact on the quality loss in the goods or services that the system produces or influence the system's energy consumption. The cost of replacing the installed component in deterioration level i ∈ D is given by C_i ≥ 0. Because the salvage value of a component may vary based on the amount of deterioration, this cost again depends on the deterioration level. After a replacement, which we assume is instantaneous, the system is immediately put into operation with the newly installed component. This means that the total cost in a period when the installed component is replaced in deterioration level i ∈ D is C_i + L_0. Our aim is to find the replacement policy that minimizes the total expected discounted costs of system operation and replacement over an infinite horizon, where costs are discounted by a factor λ ∈ [0, 1).
We model this replacement problem using a POMDP model. The system state is given by (t, i), where t ∈ T is the type of the installed component and i ∈ D is the deterioration level; accordingly, S = T × D is the set of system states. We assume that, at every time epoch, a perfect observation is made of the deterioration level but the type of the installed component cannot be directly observed. Partial information on the type of the installed component can be inferred from the history of deterioration levels observed since the last replacement, because the component type determines the transition probability matrix on D. As such, the system state is said to be partially observable (as opposed to completely observable). A policy can prescribe actions based on the observed deterioration level and the partial information with respect to the type of the installed component. The set of available actions is A = {CO, RE}, where CO denotes "continue operating" and RE denotes "replace the installed component." A general result for POMDPs (see, e.g., Monahan (1982) and references therein) is that, at every time epoch, the information available for decision making can be summarized by a probability distribution over the set of system states, called an information state, which represents a belief about the system state. Here, we can simplify the definition of information states based on the fact that one state variable, namely, the deterioration level, is completely observable, a setting sometimes referred to as mixed observability (Araya-López et al., 2010; Ong et al., 2010). This allows us to define information states as a combination of a (univariate) probability distribution over the set of component types and a single deterioration level. Thus, we define the information state space as Ω = Π × D, where Π = {π ∈ R^M : Σ_{t=1}^M π_t = 1, π_t ≥ 0 for all t ∈ T} is the set of probability mass functions on T. In information state (π, i) ∈ Ω, π is the belief about the type of the installed component and i is the observed deterioration level.
As the information state is a sufficient statistic of the history, a POMDP can be expressed as an MDP on the information state space. Suppose (π, i) ∈ Ω is the current information state. If action CO is taken, it incurs an immediate cost L_i and induces a probability σ(j; π, i) = Σ_{t=1}^M π_t p^t_{ij} that the system is in deterioration level j at the next time epoch, for all j ∈ D. The set of all deterioration levels that can be reached from (π, i) is O(π, i) = {j ∈ D : σ(j; π, i) > 0}. Having observed a deterioration level j ∈ O(π, i), the belief π is updated to ψ(π, i, j) using Bayes' rule. The updated probability that the type of the installed component is t is obtained as

ψ_t(π, i, j) = π_t p^t_{ij} / σ(j; π, i)

for all t ∈ T. This results in a new information state (ψ(π, i, j), j). Otherwise, if action RE is taken, it incurs an immediate cost C_i and causes the process to be continued from information state (π^new, 0), where π^new_t = ρ_t for all t ∈ T. The optimal value function V : Ω → R, where V(π, i) denotes the minimum total expected discounted cost for initial information state (π, i) ∈ Ω, satisfies the optimality equations

V(π, i) = min{ L_i + λ Σ_{j∈O(π,i)} σ(j; π, i) V(ψ(π, i, j), j),  C_i + L_0 + λ Σ_{j∈O(π^new,0)} σ(j; π^new, 0) V(ψ(π^new, 0, j), j) }

for all (π, i) ∈ Ω (Monahan, 1982). The first (second) term in the minimization represents the total expected discounted cost achieved by taking action CO (RE) in (π, i) and following the optimal policy thereafter.
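The transition probabilities σ(j; π, i) and the Bayes' rule update can be sketched in a few lines. The two component types below are hypothetical; the code only illustrates the updating step described above:

```python
def sigma(j, pi, i, P):
    """Probability that the next level is j given belief pi over types and
    current level i: sigma(j; pi, i) = sum_t pi_t * p^t_ij."""
    return sum(pi[t] * P[t][i][j] for t in pi)

def bayes_update(pi, i, j, P):
    """Posterior belief over component types after observing a transition
    from level i to level j under action CO (Bayes' rule)."""
    denom = sigma(j, pi, i, P)
    return {t: pi[t] * P[t][i][j] / denom for t in pi}

# Two hypothetical types on levels {0, 1, 2}; type 1 is the stronger one.
P = {1: [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],
     2: [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]}
pi = {1: 0.5, 2: 0.5}

# Staying in level 0 is evidence of a strong component, so the
# posterior mass on type 1 should rise above the prior 0.5.
post = bayes_update(pi, 0, 0, P)
```

Here `post[1]` equals 0.45/0.7 ≈ 0.643: observing slow deterioration shifts belief toward the stronger type, which is the mechanism the structural results in Section 3 exploit.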

Structural results
In this section, we characterize the structure of the optimal policy under a set of sufficient conditions on the cost parameters and the transition probability matrices. The derivation is based on certain monotonicity properties of the optimal value function. In Section 3.1, we first provide preliminary results that are needed to establish these monotonicity properties. We then present our main structural results in Section 3.2. The proofs are given in the Appendix.

Preliminary results
To formulate any monotonicity results, a partial order has to be defined on the information states. Whereas the deterioration levels in D are naturally ordered, it is not clear which stochastic order should be imposed on the probability distributions in . Intuitively, this stochastic order must reflect how strong a component is believed to be. This requires that the strength of component types can be compared, such that one component type is either stronger or weaker than another. Therefore, as a condition for our structural results, we will need that the component types can be totally ordered by their transition probability matrices. Our task is to determine suitable orders on the transition probability matrices and the information states.
To begin, we provide definitions and properties of stochastic orders that will be helpful in our analysis. The first two orders are commonly used for comparing probability distributions (equivalently, the respective random variables), and their properties have been extensively studied in the literature (see, e.g., Shaked and Shanthikumar (2007)). Note that the definitions we provide assume discrete probability distributions on some totally ordered finite support X . For more general definitions, we refer to Shaked and Shanthikumar (2007).

Definition 1. Let g and h be probability mass functions. Then g is smaller than h in the usual stochastic order, denoted by g ≤_st h, if Σ_{x∈X: x≥y} g_x ≤ Σ_{x∈X: x≥y} h_x for all y ∈ X.
Definition 2. Let g and h be probability mass functions. Then g is smaller than h in the likelihood ratio order, denoted by g ≤_lr h, if g_y h_x ≤ g_x h_y for all x, y ∈ X such that x ≤ y.

The same concepts can be used to define a partial order on transition probability matrices. For this, note that each row of a transition probability matrix is a probability mass function for the next state given a current state; hence, stochastic orders can be employed in a row-wise comparison of such matrices. The following definitions assume transition probability matrices on some totally ordered finite state space X. We use p(x) and q(x) to denote the rows of transition probability matrices P and Q that correspond to state x ∈ X.
Definition 3. Let P and Q be transition probability matrices. Then P is smaller than Q in the usual stochastic order, denoted by P ≤_st Q, if p(x) ≤_st q(x) for all x ∈ X.

Definition 4. Let P and Q be transition probability matrices. Then P is smaller than Q in the likelihood ratio order, denoted by P ≤_lr Q, if p(x) ≤_lr q(x) for all x ∈ X.

The following proposition describes the relationship between these orders, which is that the likelihood ratio order implies the usual stochastic order (for part (i), see Theorem 1.C.1 in Shaked and Shanthikumar (2007); part (ii) is a direct consequence of part (i)).

Proposition 1. (i) Let g and h be two probability mass functions. If g ≤_lr h, then g ≤_st h. (ii) Let P and Q be two transition probability matrices. If P ≤_lr Q, then P ≤_st Q.
In addition, we propose a third stochastic order. As will become clear later, having this stochastic order along with the usual stochastic order and the likelihood ratio order enables us to state our structural results under more general conditions in order to widen their applicability.
Definition 5. Let g and h be probability mass functions. Then g is smaller than h in the likelihood ratio order on the left and the usual stochastic order on the very right, denoted by g ≤_lrst h, if g_y h_x ≤ g_x h_y for all x, y ∈ X such that x ≤ y < u and g_u ≤ h_u, where u = max X.

Definition 6. Let P and Q be transition probability matrices. Then P is smaller than Q in the likelihood ratio order on the left and the usual stochastic order on the very right, denoted by P ≤_lrst Q, if p(x) ≤_lrst q(x) for all x ∈ X.

To gain intuition for the new stochastic order, we examine how it relates to the previous two stochastic orders, thereby complementing the relationship given in Proposition 1. The following lemma shows that the lr order implies the lrst order and that the lrst order implies the st order.

Lemma 1. (i) Let g and h be two probability mass functions. If g ≤_lr h, then g ≤_lrst h. (ii) Let g and h be two probability mass functions. If g ≤_lrst h, then g ≤_st h. (iii) Let P and Q be two transition probability matrices. If P ≤_lr Q, then P ≤_lrst Q. (iv) Let P and Q be two transition probability matrices. If P ≤_lrst Q, then P ≤_st Q.
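The three order relations can also be checked numerically by implementing the definitions directly. In the sketch below, the pmfs are illustrative and chosen so that the first pair is lrst-ordered (and hence st-ordered, by Lemma 1) but not lr-ordered:

```python
from itertools import product

def leq_st(g, h):
    """g <=_st h: every tail sum of g is at most the matching tail sum of h."""
    return all(sum(g[y:]) <= sum(h[y:]) + 1e-12 for y in range(len(g)))

def leq_lr(g, h):
    """g <=_lr h: g_y * h_x <= g_x * h_y for all x <= y."""
    n = len(g)
    return all(g[y] * h[x] <= g[x] * h[y] + 1e-12
               for x, y in product(range(n), repeat=2) if x <= y)

def leq_lrst(g, h):
    """g <=_lrst h: the lr inequality restricted to x <= y < u,
    plus g_u <= h_u at the top point u = n - 1."""
    n = len(g)
    left = all(g[y] * h[x] <= g[x] * h[y] + 1e-12
               for x, y in product(range(n - 1), repeat=2) if x <= y)
    return left and g[n - 1] <= h[n - 1] + 1e-12

# Illustrative pmfs on {0, 1, 2}, where 2 plays the role of the failed state.
g1, h1 = [0.8, 0.1, 0.1], [0.5, 0.3, 0.2]   # lrst (and st) but not lr
g2, h2 = [0.7, 0.2, 0.1], [0.4, 0.3, 0.3]   # lr, hence lrst and st (Lemma 1)
```

The pair (g1, h1) fails the lr comparison only because of the last support point, which is exactly the case the lrst order is designed to relax.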
Any of the three orders on the transition probability matrices could be applied in our replacement problem to compare the strength of component types. If P^s and P^t, s, t ∈ T, can be ordered as P^s ≤_lr P^t, P^s ≤_lrst P^t, or P^s ≤_st P^t, component type s may be regarded as stronger than component type t: all three orders imply that, for any current deterioration level, the (random) next deterioration level is smaller for component type s than for component type t in a certain stochastic sense. The following example highlights the different comparisons that can be made with these orders for an important class of transition probability matrices.
Example 1. Many engineering systems are subject to both gradual deterioration and random shocks that cause sudden failures (Lam and Yeh, 1994). In a continuous-time setting, researchers have modeled such deterioration processes as a continuous-time Markov chain in which, from any deterioration level, transitions can be made to the next-higher deterioration level or to the failed state (Ohnishi et al., 1986; Lam and Yeh, 1994; Chiang and Yuan, 2001). In discrete time, if the deterioration level is monitored at sufficiently small time intervals and gradual deterioration is a process with stationary increments, such deterioration behavior can be captured by a transition probability matrix of the form

P = [ 1−α−β   α       0   ⋯   0       β
      0       1−α−β   α   ⋯   0       β
      ⋮                   ⋱           ⋮
      0       0       0   ⋯   1−α−β   α+β
      0       0       0   ⋯   0       1   ],    (1)

parameterized by (α, β). Parameters α and β represent the probability of experiencing a one-level increase in deterioration and the probability of experiencing a sudden failure, respectively. Now consider a population with four component types, T = {1, 2, 3, 4}, whose associated transition probability matrices P^1, P^2, P^3, and P^4 on D are of the form (1), and compare each of them with a further matrix P of the same form. For suitable choices of the parameters:
• P^1 ≤_lrst P and P^1 ≤_st P, but P^1 is incomparable to P under the lr order;
• P^2 ≤_lr P, P^2 ≤_lrst P, and P^2 ≤_st P;
• P^3 ≤_st P, but P^3 is incomparable to P under the other orders;
• P^4 is incomparable to P.
Note that the likelihood ratio order only supports the conclusion that component type 2, not component type 1, is stronger than the alternative component type, even though component type 1 seems stronger than component type 2, as it is less susceptible to gradual deterioration. The reason is that, under the likelihood ratio order, a decrease in α needs to be accompanied by a relatively larger decrease in β to result in an improvement in strength.
Since T is finite and totally ordered with the natural order, any of the three stochastic orders could be applied to compare beliefs about the type of the installed component. However, the stochastic orders have a meaningful interpretation only if the natural order on T corresponds to a ranking of component types according to strength: with component type 1 being the strongest and component type M being the weakest, a smaller belief then implies that the component is believed to be stronger. This is why, to obtain monotonicity in the belief about the type of the installed component, we will need to impose a condition such as P^1 ≤_lr P^2 ≤_lr ··· ≤_lr P^M, P^1 ≤_lrst P^2 ≤_lrst ··· ≤_lrst P^M, or P^1 ≤_st P^2 ≤_st ··· ≤_st P^M. From Lemma 1, we know that the condition based on the st order is the most general, and the condition based on the lrst order is more general than the one based on the lr order. Revisiting Example 1, we see that P^1 ≤_lr P^2 ≤_lr ··· ≤_lr P^M actually excludes important settings compared with P^1 ≤_lrst P^2 ≤_lrst ··· ≤_lrst P^M.
Example 1 (continued). Compare the four component types in T against each other. Given that β_s = β_t for s, t ∈ T, it can be shown that P^s ≤_lr P^t if and only if α_s = α_t and that P^s ≤_lrst P^t if and only if α_s ≤ α_t. Hence, no two component types in T can be compared under the lr order, while there is a total ordering P^1 ≤_lrst P^2 ≤_lrst P^3 ≤_lrst P^4 under the lrst order.
In general, when the transition probability matrices are of the form (1) and the component type has no influence on the occurrence of fatal shocks, the above reasoning implies that structural results that require P^1 ≤_lr P^2 ≤_lr ··· ≤_lr P^M do not apply but, possibly after relabeling the component types, structural results that require P^1 ≤_lrst P^2 ≤_lrst ··· ≤_lrst P^M do.
As the following lemma shows, the usual stochastic order is sufficient to ensure that, for any current deterioration level, the (random) next deterioration level after taking action CO is stochastically smaller if the installed component is believed to be stronger.

Lemma 2. Let P^1 ≤_st P^2 ≤_st ··· ≤_st P^M. If π, π′ ∈ Π are such that π ≤_st π′, then σ(·; π, i) ≤_st σ(·; π′, i) for all i ∈ D.
Under more restrictive conditions on the transition probability matrices and the beliefs about the type of the installed component, we also have a result on how, under action CO, the current belief and the realization of the next deterioration level relate to the updated belief. It then holds, for any current deterioration level, that if the installed component is believed to be stronger and the next deterioration level is lower, the installed component is believed to be stronger after Bayesian updating. This is established in the following lemma.

Lemma 3. Let P^1 ≤_lrst P^2 ≤_lrst ··· ≤_lrst P^M. If π, π′ ∈ Π are such that π ≤_lr π′, i ∈ D, and j ∈ O(π, i) and k ∈ O(π′, i) are such that j ≤ k < N, then ψ(π, i, j) ≤_lr ψ(π′, i, k).
Lemma 3 does not consider the case where the next deterioration level is N. In that case, the component fails and one would expect it to be replaced regardless of its component type, so the updated belief about the type of the installed component is of no importance.
The previous results established a monotone relationship between the belief about the type of the installed component and the next information state. At this point, we also need to consider the relationship between the deterioration level and the next information state. We first define the following property.
Definition 7. Let t ∈ T. Transition probability matrix P^t is truncated Toeplitz if there exists a sequence {p^t_l}_{l∈D} such that p^t_{ij} = p^t_{j−i} for all i, j ∈ D with i ≤ j < N, p^t_{ij} = 0 for all i, j ∈ D with j < i, and p^t_{iN} = 1 − Σ_{l=0}^{N−1−i} p^t_l for all i ∈ D.

We refer to the transition probability matrices that satisfy the condition in Definition 7 as truncated Toeplitz, as they may be thought of as a finite section of a larger Toeplitz (i.e., constant-diagonal) matrix where the last column is augmented such that all row sums equal 1 (Böttcher and Silbermann, 1999). The class of truncated Toeplitz matrices is rich. It encompasses the transition probability matrices of the form (1) as a special case, in which p^t_0 = 1 − α − β, p^t_1 = α, and p^t_l = 0 for all l ∈ D such that 2 ≤ l ≤ N − 1, but it also allows for multi-level increases in deterioration. Note that upper triangularity is a necessary condition for a transition probability matrix to be truncated Toeplitz.
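A possible construction of a truncated Toeplitz matrix from its increment probabilities, including the shock-model special case of form (1), might look as follows. The parameter values are illustrative, and the failed level N is treated as absorbing, consistent with Definition 7:

```python
def truncated_toeplitz(p, N):
    """Build an (N+1)x(N+1) truncated Toeplitz transition matrix from
    increment probabilities p = [p_0, ..., p_{N-1}], where p_l is the
    probability of a deterioration increment of l when the next level stays
    below N. The last column absorbs the remaining mass so every row sums
    to 1, and level N is absorbing."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N):
        for j in range(i, N):
            P[i][j] = p[j - i]          # constant along diagonals
        P[i][N] = 1.0 - sum(P[i][:N])   # augmented last column
    P[N][N] = 1.0
    return P

# Shock-model special case of form (1): increment 0 w.p. 1 - alpha - beta,
# increment 1 w.p. alpha, sudden failure (jump to N) w.p. beta.
alpha, beta, N = 0.2, 0.05, 4
P = truncated_toeplitz([1 - alpha - beta, alpha] + [0.0] * (N - 2), N)
```

Row N − 1 of the result has α + β in its last column, since a one-level increase from N − 1 and a sudden failure both end in level N, matching form (1).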
Under the condition that P t is truncated Toeplitz for all t ∈ T , when action CO is taken, the probability of a given increment in deterioration does not depend on the current deterioration level, as long as the next deterioration level is smaller than N. Two consequences are, for any belief about the type of the installed component, that the distribution of the next deterioration level shifts to the right (in a linear fashion) as the current deterioration level increases and that, if the component does not fail, the updated belief is completely determined by the deterioration increment. This result is captured in the following lemma.
Lemma 4. Let P^t be truncated Toeplitz for all t ∈ T. If π ∈ Π and i, k ∈ D are such that i ≤ k, then: (i) σ(j; π, i) = σ(j + k − i; π, k) for all j ∈ D such that j + k − i < N; (ii) ψ(π, i, j) = ψ(π, k, j + k − i) for all j ∈ O(π, i) such that j + k − i < N.

We are now ready to state the final result of this section. We define a partial order on the information state space and establish conditions under which the expectation of a function of the (random) next information state after taking action CO is monotone in the current information state. This result is key to the inductive proof of the monotonicity properties of the optimal value function.

Definition 8. Let (π, i), (π′, k) ∈ Ω. Then (π, i) ⪯ (π′, k) if π ≤_lr π′ and i ≤ k.
Lemma 5. Let P^t be truncated Toeplitz for all t ∈ T and satisfy P^1 ≤_lrst P^2 ≤_lrst ··· ≤_lrst P^M. Let F : Ω → R be nondecreasing on (Ω, ⪯) with F(π, N) being constant in π ∈ Π. Define the function G : Ω → R by

G(π, i) = Σ_{j∈O(π,i)} σ(j; π, i) F(ψ(π, i, j), j)

for all (π, i) ∈ Ω. Then, G is nondecreasing on (Ω, ⪯), with G(π, N) being constant in π ∈ Π.

Main results
The results we present in this section assume that the following list of conditions is satisfied:

(C1) L_i is nondecreasing in i ∈ D;
(C2) C_i is nondecreasing in i ∈ D;
(C3) L_i − C_i is nondecreasing in i ∈ D;
(C4) C_N + L_0 ≤ L_N;
(C5) P^t is truncated Toeplitz for all t ∈ T;
(C6) P^1 ≤_lrst P^2 ≤_lrst ··· ≤_lrst P^M.

We briefly discuss these conditions to ascertain that they are reasonable and allow for realistic problem instances. Conditions (C1) to (C3) have also been used to derive structural results for the analogous problem with population homogeneity (Kawai et al., 2002, Section 8.1). Conditions (C1) and (C2) require a positive relationship between the deterioration level and the operating and replacement costs to capture the adverse effects of deterioration. Condition (C3) implies that an increase in the deterioration level leads to a more significant increase in operating cost than in replacement cost, which is natural for systems that are essential for the operations of their user. The last condition on the cost parameters, (C4), ensures that a failed component is replaced. This is done by requiring that, in deterioration level N, replacement is even myopically preferred over continued operation, which would imply a period of downtime. Although this requirement may be relaxed, it is generally satisfied in the domain of capital goods, where the cost of downtime is very large. Conditions (C5) and (C6) have been discussed in Section 3.1.
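Reading the cost conditions off the discussion above, a small helper can screen a candidate instance. The cost vectors are illustrative, and the precise inequalities are our interpretation of conditions (C1) to (C4) as described in the surrounding text:

```python
def check_cost_conditions(L, C):
    """Check the cost conditions as we read them from the discussion:
    operating and replacement costs nondecreasing in the level (C1, C2),
    operating cost rising at least as fast as replacement cost (C3), and
    replacing a failed component myopically preferred, C_N + L_0 <= L_N (C4)."""
    N = len(L) - 1
    c1 = all(L[i] <= L[i + 1] for i in range(N))
    c2 = all(C[i] <= C[i + 1] for i in range(N))
    c3 = all(L[i] - C[i] <= L[i + 1] - C[i + 1] for i in range(N))
    c4 = C[N] + L[0] <= L[N]
    return c1 and c2 and c3 and c4

# Illustrative instance: level 3 is the failed state with a large downtime cost.
L = [1, 3, 7, 50]
C = [10, 11, 12, 13]
ok = check_cost_conditions(L, C)
```

With a flat operating-cost profile such as L = [1, 2, 3, 4] the same replacement costs fail the myopic replacement condition, since C_N + L_0 = 14 > 4 = L_N.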
The first result demonstrates that the optimal value function is monotone with respect to the partial order we defined in Definition 8: the minimum total expected discounted cost is higher when the system is in a higher deterioration level and the installed component is believed to be weaker. In addition, this result asserts that if the installed component has failed, its component type is irrelevant to the minimum total expected discounted cost.

Theorem 1. The optimal value function V is nondecreasing on (Ω, ⪯), with V(π, N) being constant in π ∈ Π.
The next result concerns the optimal policy structure. With Theorem 1 in hand, we can show that the optimal policy has a threshold-type structure.
Theorem 2. If RE is the optimal action in information state (π, i) ∈ Ω and (π′, k) ∈ Ω is such that (π, i) ⪯ (π′, k), then RE is also the optimal action in information state (π′, k). Furthermore, RE is the optimal action in information state (π, N) for all π ∈ Π.
The intuition behind Theorem 2 is that it becomes more advantageous to replace a component when it is more deteriorated and believed to be weaker. For a component that has failed, replacement is always optimal.

Numerical study
The optimal policy for the POMDP model adapts replacement decisions to the probabilistic information that the history of observed deterioration levels provides about the component type. Alternatively, a heuristic policy that ignores this information may be easier to implement and has the advantage of avoiding the computational complexity of solving a POMDP. The purpose of this section is to identify factors that suggest a large decrease in total expected discounted cost can be realized by taking population heterogeneity into account. This knowledge can be used, for a given problem instance, to assess a priori whether it is worthwhile to construct a POMDP model and search for the optimal policy. To this end, we compare in a numerical experiment the optimal policy with a heuristic policy that neglects population heterogeneity under different parameter settings.
The heuristic policy is specified in Section 4.1. The solution technique that we use for the POMDP model is described in Section 4.2, where it is also explained how the heuristic policy is evaluated. We illustrate the comparison between the optimal policy and the heuristic policy by means of an example in Section 4.3. We then perform the numerical experiment. Section 4.4 gives the parameter settings used, and Section 4.5 reports the outcomes.

Heuristic policy
The construction of the heuristic policy does not require formulating the POMDP model to incorporate population heterogeneity in the replacement problem. We approximate the deterioration process by making the simplifying assumption that all components deteriorate according to transition probability matrix Σ_{t=1}^M ρ_t P^t, which reduces the problem to a standard replacement problem with population homogeneity that we can formulate as an MDP. Using standard policy iteration, we compute the optimal policy for the MDP model, which prescribes for each deterioration level an action to take. We then obtain the heuristic policy for the POMDP model by applying these actions irrespective of the information on the type of the installed component. Note that constructing the heuristic policy is computationally inexpensive.
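A sketch of this heuristic construction is given below, using value iteration rather than the article's policy iteration for brevity; the instance data are illustrative:

```python
def heuristic_policy(P, rho, L, C, lam, tol=1e-10):
    """Heuristic that ignores heterogeneity: average the type-specific
    matrices with the population weights rho, then solve the resulting
    completely observable MDP over deterioration levels. Returns one action
    per level, applied regardless of any belief about the component type."""
    N = len(L) - 1
    # averaged transition matrix: sum_t rho_t * P^t
    Pbar = [[sum(rho[t] * P[t][i][j] for t in rho) for j in range(N + 1)]
            for i in range(N + 1)]
    V = [0.0] * (N + 1)
    while True:
        co = [L[i] + lam * sum(Pbar[i][j] * V[j] for j in range(N + 1))
              for i in range(N + 1)]
        re = [C[i] + co[0] for i in range(N + 1)]  # replace, then run from 0
        Vn = [min(c, r) for c, r in zip(co, re)]
        if max(abs(a - b) for a, b in zip(V, Vn)) < tol:
            return ['RE' if re[i] < co[i] else 'CO' for i in range(N + 1)]
        V = Vn

# Illustrative two-type instance on levels {0, 1, 2, 3} (3 = failed).
P = {1: [[0.90, 0.08, 0.00, 0.02], [0.00, 0.90, 0.08, 0.02],
         [0.00, 0.00, 0.90, 0.10], [0.00, 0.00, 0.00, 1.00]],
     2: [[0.70, 0.25, 0.00, 0.05], [0.00, 0.70, 0.25, 0.05],
         [0.00, 0.00, 0.70, 0.30], [0.00, 0.00, 0.00, 1.00]]}
rho = {1: 0.6, 2: 0.4}
pol = heuristic_policy(P, rho, [1, 3, 7, 50], [10, 11, 12, 13], 0.9)
```

The replacement branch `C[i] + co[0]` encodes the assumption from Section 2 that a replacement is instantaneous, so the period cost of replacing in level i is C_i + L_0 followed by a transition from level 0.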

Solution technique
We wish to compare the total expected discounted cost of the heuristic policy with the minimum total expected discounted cost obtained using the optimal policy. However, in general, the problem of determining the optimal policy for infinite-horizon POMDPs is undecidable (Madani et al., 2003). Therefore, we instead compute an ε-optimal policy; i.e., a policy that yields a total expected discounted cost that is at most ε > 0 higher than the minimum total expected discounted cost. By setting ε to a small value, we obtain tight bounds on the minimum total expected discounted cost, which can then be used to make the desired comparison.
One of the best-known methods for computing ε-optimal policies for POMDPs is Hansen's policy iteration algorithm (Hansen, 1998). It is not directly applicable to our problem, however, because the information states contain a completely observable state variable. Here, we adapt Hansen's policy iteration algorithm to handle such information states so that it can be applied in our numerical experiment. The key concept, which we will see is also useful for evaluating the heuristic policy, is that of a finite-state controller.

Finite-state controllers
Despite the fact that information states encapsulate all information necessary for optimal decision making, it is not practical to perform policy iteration for POMDPs by searching over policies that prescribe actions based on the information state. The problem is that no general method is known for exactly evaluating policies defined as a mapping from the information state space to the action space. Hansen's policy iteration algorithm circumvents this difficulty by searching in an alternative policy space, consisting of policies that are expressed as a Finite-State Controller (FSC). An FSC uses a finite number of control states, onto which all histories of actions and observations are mapped, as a basis for decision making (as opposed to the uncountably infinite number of information states). This makes policy evaluation simply a matter of solving a system of linear equations. Importantly, although the optimal policy might not have a representation as an FSC, the existence of an ε-optimal FSC is guaranteed (see Theorem 2 in Hansen (1998)).
Given that the deterioration level is completely observable, we formally define an FSC $\kappa$ for our POMDP model as a triple $((\Gamma_i)_{i \in D}, \delta, \zeta)$, where $\Gamma_i$ is a non-empty, finite set of control states, for all $i \in D$; $\delta : \bigcup_{i \in D} \Gamma_i \to A$ is a mapping prescribing that action $\delta(\gamma) \in A$ is taken in control state $\gamma \in \Gamma_i$, $i \in D$; and $\zeta$ is a mapping specifying that, given a current control state $\gamma \in \Gamma_i$, $i \in D$, after observing $j \in O_{\delta(\gamma)}(i)$ as the next deterioration level, the successor control state is $\zeta(\gamma, j) \in \Gamma_j$. The transition mechanism between the control states ensures that whenever the system is in deterioration level $i \in D$, the FSC is in a control state from $\Gamma_i$.
Graphically, FSCs can be represented using a state diagram. In this representation, nodes represent control states and are labeled by the actions prescribed by δ. Arcs represent transitions between control states and are labeled by the next deterioration level for which these transitions are made, as specified by ζ . We use this method to depict FSCs for the example in Section 4.3 (see Figs. 1 and 3).
To evaluate an FSC $\kappa$, we compute the total expected discounted cost $\upsilon^{\kappa}_t(\gamma)$ of starting in control state $\gamma$ if the (partially observable) initial system state is $(t, i)$, for all $\gamma \in \Gamma_i$, $(t, i) \in S$. This is done by solving the system of linear equations (2) for all $\gamma \in \Gamma_i$, $(t, i) \in S$. The total expected discounted cost of starting in control state $\gamma \in \Gamma_i$ given an initial information state $(\pi, i)$ is then obtained as $\sum_{t=1}^{M} \pi_t \upsilon^{\kappa}_t(\gamma)$. Using the rule that an FSC is started in the control state that minimizes this quantity, the total expected discounted cost of FSC $\kappa$ for initial information state $(\pi, i)$ is $V^{\kappa}(\pi, i) = \min_{\gamma \in \Gamma_i} \sum_{t=1}^{M} \pi_t \upsilon^{\kappa}_t(\gamma)$. One example of a policy that can be expressed as an FSC is the heuristic policy. It has a representation as an FSC with one control state per deterioration level; therefore, we can obtain the total expected discounted cost of the heuristic policy for initial information state $(\pi, i)$, denoted by $V^{\mathrm{heu}}(\pi, i)$, for all information states $(\pi, i)$, by solving a system of $M(N + 1)$ linear equations.
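A minimal sketch of this evaluation for the heuristic FSC (one control state per deterioration level) is given below. The parameters are hypothetical, and the replacement dynamics (a replacement restarts the system at level 0 with a freshly drawn type from the population distribution) are an assumption made for illustration; here we approximate the solution of the $M(N+1)$ linear equations by fixed-point iteration, which converges because the evaluation operator is a contraction with modulus $\lambda$:

```python
# Evaluating the heuristic FSC: solve for upsilon_t(gamma) per component type t
# and control state (one per level) by iterating the evaluation operator.
# All parameters and the replacement dynamics are illustrative assumptions.

LAMBDA = 0.99
P = [  # one transition matrix per (hypothetical) component type
    [[0.85, 0.10, 0.05, 0.00], [0.00, 0.85, 0.10, 0.05],
     [0.00, 0.00, 0.85, 0.15], [0.00, 0.00, 0.00, 1.00]],
    [[0.60, 0.25, 0.15, 0.00], [0.00, 0.60, 0.25, 0.15],
     [0.00, 0.00, 0.60, 0.40], [0.00, 0.00, 0.00, 1.00]],
]
RHO = [0.5, 0.5]
L = [0.0, 0.0, 0.0, 500.0]
C = [100.0, 100.0, 100.0, 200.0]
M, N = len(P), len(L) - 1
ACTION = ["CO", "CO", "CO", "RE"]  # heuristic: replace only upon failure

def backup(v):
    """One application of the FSC evaluation operator (the role of system (2))."""
    new = [[0.0] * (N + 1) for _ in range(M)]
    restart = sum(RHO[s] * v[s][0] for s in range(M))  # fresh component, random type
    for t in range(M):
        for i in range(N + 1):
            if ACTION[i] == "CO":
                new[t][i] = L[i] + LAMBDA * sum(P[t][i][j] * v[t][j]
                                                for j in range(N + 1))
            else:  # RE (assumed): pay C_i, restart at level 0 next period
                new[t][i] = C[i] + LAMBDA * restart
    return new

v = [[0.0] * (N + 1) for _ in range(M)]
for _ in range(5000):  # geometric convergence to the linear system's solution
    v = backup(v)

def V_heu(pi, i):
    """Heuristic cost for belief pi over types, installed component at level i."""
    return sum(pi[t] * v[t][i] for t in range(M))

print(V_heu(RHO, 0))  # cost starting from a newly installed component
```

The weighting $\sum_t \pi_t \upsilon_t(\gamma)$ in `V_heu` mirrors how a per-type value vector is combined with an information state in the text.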

Algorithm
We first introduce some notation that is useful for the presentation of our adapted version of Hansen's policy iteration algorithm. For any FSC $\kappa$, let $\upsilon^{\kappa}(\gamma)$ denote the $M \times 1$ vector with elements $\upsilon^{\kappa}_t(\gamma)$, $t \in T$. Let $e$ denote the $M \times 1$ vector of ones.
The outline of the algorithm is as follows.

Step 1 (Initialization). Set ε > 0. Construct the heuristic policy; let $\kappa$ be the corresponding FSC.

Step 2 (Policy evaluation). Compute $\upsilon^{\kappa}_t(\gamma)$ for all $\gamma \in \Gamma_i$, $(t, i) \in S$, by solving the system of linear equations (2).
Step 3 (Dynamic programming update). For all $i \in D$, generate the set of vectors $\Upsilon_i$, where each vector $\upsilon \in \Upsilon_i$ is associated with a pair $(a, (\gamma_j)_{j \in O_a(i)})$: $a \in A$ is an action to be taken and $\gamma_j \in \Gamma_j$ is a control state to be selected upon observing deterioration level $j$, for all $j \in O_a(i)$. Prune all vectors $\upsilon \in \Upsilon_i$ for which there exists no $\pi$ such that $\sum_{t=1}^{M} \pi_t \upsilon_t < \sum_{t=1}^{M} \pi_t \hat{\upsilon}_t$ for all $\hat{\upsilon} \in \Upsilon_i \setminus \{\upsilon\}$.
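The full pruning test requires checking, for each vector, whether some belief $\pi$ makes it strictly best, which in general is a small linear program per vector. The sketch below implements only the standard cheap first pass, pointwise dominance: if another vector is no worse for every component type, it is no worse for every belief. This is an illustrative pre-filter, not the article's exact pruning routine:

```python
# First-pass prune of a vector set from Step 3 (cost minimization: lower is
# better). A vector is removed when some other vector is componentwise <= it and
# not identical; the surviving set still requires an LP-based check for exact
# pruning. Candidate vectors below are made-up two-type examples.

def pointwise_prune(vectors):
    """Remove vectors dominated componentwise by another vector in the set."""
    kept = []
    for u in vectors:
        dominated = any(
            all(w[t] <= u[t] for t in range(len(u))) and w != u
            for w in vectors if w is not u
        )
        if not dominated:
            kept.append(u)
    return kept

candidates = [(10.0, 30.0), (12.0, 28.0), (11.0, 31.0), (15.0, 29.0)]
print(pointwise_prune(candidates))
```

Here (11.0, 31.0) is dominated by (10.0, 30.0) and (15.0, 29.0) by (12.0, 28.0); the two survivors each win for some belief (type-1-heavy and type-2-heavy, respectively), so an exact LP check would also keep both.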
Step 4 (Policy improvement). Construct the improved FSC $\hat{\kappa} = ((\hat{\Gamma}_i)_{i \in D}, \hat{\delta}, \hat{\zeta})$ as follows. For all $\upsilon \in \Upsilon_i$, $i \in D$, with associated pair $(a, (\gamma_j)_{j \in O_a(i)})$:
(i) If there exists a $\gamma \in \Gamma_i$ such that $\delta(\gamma) = a$ and $\zeta(\gamma, j) = \gamma_j$ for all $j \in O_{\delta(\gamma)}(i)$, then include a copy of $\gamma$ in $\hat{\kappa}$.
(ii) Else, if there exists a $\gamma \in \Gamma_i$ such that $\upsilon^{\kappa}_t(\gamma) \ge \upsilon_t$ for all $t \in T$, then replace $\gamma$ by a new control state $\hat{\gamma}$ with $\hat{\delta}(\hat{\gamma}) = a$ and $\hat{\zeta}(\hat{\gamma}, j) = \gamma_j$ for all $j \in O_a(i)$. If there are multiple such control states in $\Gamma_i$, merge them into one single control state.
(iii) Else, add a new control state $\hat{\gamma}$ to $\hat{\kappa}$ with $\hat{\delta}(\hat{\gamma}) = a$ and $\hat{\zeta}(\hat{\gamma}, j) = \gamma_j$ for all $j \in O_a(i)$.
In addition, include a copy in $\hat{\kappa}$ of all $\gamma \in \Gamma_i$, $i \in D$, not addressed in (i) or (ii) above for which there exists a vector $\upsilon \in \Upsilon_k$, $k \in D$, that is associated with a pair $(a, (\gamma_j)_{j \in O_a(k)})$ such that $\gamma_i = \gamma$.
Step 5 (Convergence test). Set $m$ equal to the maximum improvement achieved by the dynamic programming update. If $m < \varepsilon(1-\lambda)/\lambda$, exit with FSC $\hat{\kappa}$. Else, set $\kappa$ to $\hat{\kappa}$ and go to Step 2. It can be shown that the algorithm terminates with an ε-optimal FSC $\hat{\kappa}$; it holds that $\underline{V}(\pi, i) = V^{\hat{\kappa}}(\pi, i) - \frac{\lambda}{1-\lambda} m$ and $\overline{V}(\pi, i) = V^{\hat{\kappa}}(\pi, i)$ are lower and upper bounds on the minimum total expected discounted cost. From $\hat{\kappa}$, it is also possible to extract an ε-optimal policy defined as a mapping from the information state space to the action space by taking action $\hat{\delta}(\arg\min_{\gamma \in \hat{\Gamma}_i} \sum_{t=1}^{M} \pi_t \upsilon^{\hat{\kappa}}_t(\gamma))$ in information state $(\pi, i)$, for all information states $(\pi, i)$. This policy is equivalent to FSC $\hat{\kappa}$ if $\hat{\kappa}$ represents the optimal policy, which is certain if $m = 0$, but in general the two policies are different.
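The Step 5 stopping test and the resulting cost bounds can be sketched numerically. The residual value `m` below is made up for illustration; in the algorithm it comes from the dynamic programming update:

```python
# Stopping test and cost bounds of Step 5. With discount lambda, a residual m
# yields a bound gap of lambda/(1-lambda)*m; iteration stops once that gap is
# below epsilon, i.e., once m < epsilon*(1-lambda)/lambda. The numbers below
# (the residual, the controller value) are illustrative assumptions.

LAMBDA, EPS = 0.99, 0.05

def stop_test(m):
    """Exit condition of Step 5."""
    return m < EPS * (1.0 - LAMBDA) / LAMBDA

def cost_bounds(V_controller, m):
    """Lower and upper bounds on the minimum total expected discounted cost."""
    return V_controller - LAMBDA / (1.0 - LAMBDA) * m, V_controller

lo, hi = cost_bounds(2327.46, 0.0003)
print(stop_test(0.0003), hi - lo)
```

Note how aggressive the test is for λ close to 1: with λ = 0.99 and ε = 0.05, the residual must fall below roughly 0.0005 before the algorithm exits.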
The algorithm presented here deviates from the original algorithm in Hansen (1998) in that multiple steps contain loops over the deterioration levels to accommodate the modified definition of FSCs. Another difference is that we initialize the algorithm with a well-founded initial FSC: the heuristic policy is computationally inexpensive to construct and evaluate and is likely to be closer to the optimal policy than an arbitrary policy.

Example
Consider a population with three component types, $T = \{1, 2, 3\}$, each accounting for an equal fraction of the population, $\rho_1 = \rho_2 = \rho_3 = 1/3$. There are four deterioration levels, $D = \{0, 1, 2, 3\}$, and the cost parameters are $L_0 = L_1 = L_2 = 0$, $L_3 = 500$, and $C_0 = C_1 = C_2 = 100$, $C_3 = 200$. The transition probability matrices $P_1$, $P_2$, and $P_3$ are such that conditions (C1) to (C6) are satisfied. In particular, component type 1 is the strongest and component type 3 is the weakest based on the lrst order. The discount factor is $\lambda = 0.99$. We first construct and evaluate the heuristic policy. Figures 1 and 2 depict the heuristic policy as an FSC and as a mapping from the information state space to the action space, respectively. It can be seen that the installed component is replaced if and only if the deterioration level is 3. For initial information state $(\pi^{\mathrm{new}}, 0)$, which corresponds to the scenario that the system begins operating with a newly installed component, the total expected discounted cost of the heuristic policy is $V^{\mathrm{heu}}(\pi^{\mathrm{new}}, 0) = 2496.40$.
We then compute an ε-optimal FSC using the adapted version of Hansen's policy iteration algorithm, taking ε = 0.05 and starting from Step 2, and derive an ε-optimal mapping from the information state space to the action space. Figure 3 depicts the ε-optimal FSC. Apart from replacing the installed component in deterioration level 3, the FSC also replaces the installed component if it is in deterioration level 2 within six periods after it has been taken into operation, or if a transition is made directly from deterioration level 0 to deterioration level 2. Although not apparent from Fig. 3 (because the relevant control states cannot be reached given initial information state $(\pi^{\mathrm{new}}, 0)$), the installed component may even be replaced in deterioration level 0 or 1. The ε-optimal mapping from the information state space to the action space, depicted in Fig. 4, exhibits similar characteristics: in deterioration levels 0, 1, and 2, the installed component is replaced if it is likely to be of a weak component type. For initial information state $(\pi^{\mathrm{new}}, 0)$, the bounds on the minimum total expected discounted cost are $\underline{V}(\pi^{\mathrm{new}}, 0) = 2327.43$ and $\overline{V}(\pi^{\mathrm{new}}, 0) = 2327.46$, indicating that a 7.3% cost savings can be achieved by taking population heterogeneity into account.

Parameter settings
In our numerical experiment, we study settings with T = {1, 2} where component type 1 is stronger than component type 2. To generate a set of problem instances, we vary the population composition, the number of deterioration levels, the transition probability matrices, and the cost parameters.
For the fraction $\rho_1 = 1 - \rho_2$, which determines the composition of the population, we consider values of 0.5 and 0.8. We consider values of 3, 5, and 10 for the number of deterioration levels, $N + 1$. The transition probability matrices are assumed to be of the form (1). We set $(\alpha_1, \beta_1)$, the parameters of $P_1$, at (0.15, 0.03) and consider values of (0.4, 0.2) and (0.7, 0.1) for $(\alpha_2, \beta_2)$, the parameters of $P_2$.

Figure 3. An ε-optimal FSC for the example in Section 4.3. The node with the thickest border corresponds to the control state in which the FSC is started for initial information state $(\pi^{\mathrm{new}}, 0)$. Only control states that are reachable from this starting control state are depicted (the actual FSC is larger and contains additional control states). To reduce clutter, the transitions from the control states in which action RE is taken are omitted; they are the same as those for the starting control state.

A cost structure is assumed such that the replacement cost is $C_i = C$ for $i < N$ and $C_N = aC$, with $C > 0$ and $a > 1$, and the operating cost is parameterized by $b \ge 0$. Thus, the cost of preventively replacing a component does not depend on the deterioration level and is lower than the cost of a corrective replacement. The operating cost is linearly increasing for deterioration levels in $\{0, \ldots, N - 1\}$ and sufficiently high in deterioration level $N$ to ensure that the optimal policy replaces the installed component if it has failed. Here, $C$ is the cost of a preventive replacement, $a$ is the factor by which the corrective replacement cost is a multiple of the preventive replacement cost, and $b$ is the ratio between the operating and replacement cost at deterioration level $N - 1$. The parameters $a$, $b$, and $C$ completely determine the cost structure, with $C$ merely acting as a scale factor. We set $C$ at 100, consider values of 2, 5, 10, and 20 for $a$, and consider values of 0, 0.1, and 0.5 for $b$. We assume a discount factor $\lambda = 0.99$. By taking all combinations of the parameter values that we consider, we create a test bed of in total 144 instances.
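One way to instantiate this cost structure is sketched below. The exact cost equations appeared as displayed formulas in the original; the versions here are reconstructions from the prose description (preventive cost $C$ below level $N$, corrective cost $aC$ at level $N$, operating cost rising linearly to $bC$ at level $N-1$, and a failure operating cost chosen large enough to force replacement) and should be read as assumptions, not the article's definitive parameterization:

```python
# Generate replacement and operating cost vectors from (C, a, b) as described in
# the text. The linear shape starting at 0 and the failure-cost multiplier are
# assumptions made for this sketch.

def cost_parameters(N, C=100.0, a=5.0, b=0.1, fail_multiplier=10.0):
    repl = [C] * N + [a * C]                        # C_i = C (i < N), C_N = a*C
    oper = [b * C * i / (N - 1) for i in range(N)]  # linear up to L_{N-1} = b*C
    oper.append(fail_multiplier * a * C)            # L_N "sufficiently high" (assumed)
    return repl, oper

repl, oper = cost_parameters(N=4, C=100.0, a=5.0, b=0.1)
print(repl, oper)
```

With these choices, the ratio between the operating and replacement cost at level $N-1$ equals $b$, and $C$ indeed acts purely as a scale factor on both vectors.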

Results
We evaluate the decrease in total expected discounted cost achieved by applying the optimal policy instead of the heuristic policy as a percentage savings, $S$. For all instances, we compute $V^{\mathrm{heu}}(\pi^{\mathrm{new}}, 0)$ and, with ε = 0.05 in the adapted version of Hansen's policy iteration algorithm, $\underline{V}(\pi^{\mathrm{new}}, 0)$ and $\overline{V}(\pi^{\mathrm{new}}, 0)$, from which we then calculate the percentage savings $S$. The average value of $S$ over all instances is found to be 3.66%. Because we are most interested in parameter settings for which the optimal policy achieves a large decrease in total expected discounted cost, we report in Table 1 the 20 instances with the highest values of $S$. Several observations can be made. Most notably, $\rho_1 = 0.5$ in all 20 instances with the highest values of $S$. To explain why the savings are larger when $\rho_1 = 0.5$, note that the uncertainty in the random type of a newly installed component is then higher than when $\rho_1 = 0.8$ (more formally, the entropy is higher). Therefore, the observed deterioration levels contain more information about the type of the component, and incorporating that information is more advantageous.
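The savings calculation can be reproduced from the example in Section 4.3. The precise savings formula was given as an equation in the original; the version below (heuristic cost minus the upper bound on the optimal cost, expressed relative to that bound) is a reconstruction that matches the 7.3% reported for the example:

```python
# Percentage savings from the Section 4.3 example. The formula is a reconstruction
# consistent with the reported 7.3%; the two input values are taken from the text.

V_heu = 2496.40      # heuristic policy, initial information state (pi_new, 0)
V_upper = 2327.46    # upper bound on the minimum total expected discounted cost

S = 100.0 * (V_heu - V_upper) / V_upper
print(round(S, 1))
```

Because $\overline{V}$ is an upper bound on the optimal cost, this computed savings figure is conservative: the true savings of the optimal policy can only be at least this large.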
Another observation is that higher values of a generally result in higher values of S. For an explanation, note that the optimal policy can be more successful in avoiding failures than the heuristic policy, which becomes more important when the cost C N = aC of corrective replacement is higher. The heuristic policy can only avoid failures due to gradual deterioration by replacing the installed component in deterioration level N − 1; the optimal policy can also reduce the occurrence of sudden failures by replacing the installed component if it is likely to be weak. Also, we observe that high values of S occur more often in settings with N + 1 = 10. This is because a higher number of deterioration levels has the effect that component lifetimes are longer, which the optimal policy can exploit to gather more information about the type of the installed component in order to improve replacement decisions. It should be mentioned, however, that there also exist settings with N + 1 = 3 for which S is high, as the instance ranked 16th in Table 1 shows. For a large savings when there are three deterioration levels, the value of a has to be such that in deterioration level 1 the decision whether to replace or to continue operating (and allow failure due to gradual deterioration) depends on the belief about the type of the installed component. Table 1 lists both instances with (α 2 , β 2 ) = (0.4, 0.2) and instances with (α 2 , β 2 ) = (0.7, 0.1). In settings with (α 2 , β 2 ) = (0.4, 0.2), in particular the difference between β 1 and β 2 is large, and the optimal policy's ability to adapt decisions to information about the type of the installed component is more valuable. In settings with (α 2 , β 2 ) = (0.7, 0.1), in particular the difference between α 1 and α 2 is large, and it is more probable that strong indications of the type of the installed component will  be obtained. Neither effect seems to dominate the other. Also, no clear relation is observed between b and S. 
The effect of an increase in b is ambiguous: it increases the operating costs that the optimal policy can save over the heuristic policy by replacing an installed component early if it is likely to be weak, but it also increases the operating costs that are common to both policies. In summary, the numerical experiment demonstrates that taking population heterogeneity into account can lead to a significant cost reduction. The largest savings are achieved when the uncertainty in the type of a newly installed component is high. Furthermore, the savings are generally larger when the number of deterioration levels is higher and corrective replacement is more costly.

Conclusions
We have presented a POMDP model for the problem of scheduling replacements for a single-component, Markovian deteriorating system under population heterogeneity. Under intuitively meaningful conditions on the cost parameters and the transition probability matrices associated with the different component types, we have established monotonicity properties of the optimal value function and derived the structure of the optimal policy. The new stochastic order that we introduced to develop these structural results, denoted by lrst , may have applications in other domains as well. We have also performed a numerical experiment to benchmark the optimal policy against a heuristic policy that neglects population heterogeneity. The results indicate that it is particularly important to account for population heterogeneity when there is a high uncertainty in the type of a newly installed component, the corrective replacement cost is high, and the number of deterioration levels is large.
In our model, we assumed that costless, perfect observations on the deterioration level are available at every time epoch. There are also works that study maintenance optimization problems with costly or imperfect inspections (e.g., Maillart, 2006;Kim and Makis, 2013). In future research, it will be interesting to study the implications of population heterogeneity in these contexts. Then, the partial observability of both the deterioration level and the type of the installed component may require information states to be defined as bivariate probability distributions. We suspect this will make it difficult to obtain structural results on the optimal policy. Another direction for future research is to relax the assumption that the type of the installed component is independent of the remaining spare components. This assumption is not valid, for example, if the components originate from one production batch that shares the same unknown component type. Future research may also consider the effect of strategic or tactical decisions that influence the composition of the population-e.g., ordering new spare components-on the optimal replacement policy.

Notes on contributors
Chiel van Oosterom is an assistant professor in the Econometric Institute at Erasmus University Rotterdam, while being in the final phase of his Ph.D. studies conducted at the School of Industrial Engineering, Eindhoven University of Technology. His research interests are in the area of sequential decision making under uncertainty, with a particular interest in stochastic decision processes in which the (un)availability and quality of information plays an important role. He is a member of INFORMS.
Hao Peng is an assistant professor in the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. She received a Ph.D. degree in Industrial Engineering from the University of Houston, Houston, Texas, in 2010. She received her bachelor's degree in Industrial Engineering from Tsinghua University, Beijing, China (2006). Her research interests are maintenance optimization and quality and reliability engineering. Her research is supported by the President Fund of the Academy of Mathematics and Systems Science and the one-hundred plan (class C) of the Chinese Academy of Sciences. She is a member of INFORMS and IISE.
Geert-Jan van Houtum is a professor of maintenance and reliability at the School of Industrial Engineering, Eindhoven University of Technology. He is the scientific director of the BETA Research School for Operations Management and Logistics. His research is focused on spare parts management, production-inventory systems, maintenance and availability management of capital goods, and the effect of design decisions on the total cost of ownership of capital goods.
The proof of Lemma 2 relies on the following characterization of the usual stochastic order (Shaked and Shanthikumar, 2007, Section 1.A.1).

Proposition A1. Let $g$ and $h$ be two probability mass functions on $X$. Then $g \preceq_{\mathrm{st}} h$ if and only if $\sum_{x \in X} g_x F(x) \le \sum_{x \in X} h_x F(x)$ for every nondecreasing function $F : X \to \mathbb{R}$.
Proof of Lemma 2. Let $\pi, \hat{\pi}$ be information states such that $\pi \preceq_{\mathrm{st}} \hat{\pi}$, and let $i \in D$. For all $l \in D$, because $P_1 \preceq_{\mathrm{st}} P_2 \preceq_{\mathrm{st}} \cdots \preceq_{\mathrm{st}} P_M$ ascertains that $\sum_{j=l}^{N} p^t_{ij}$ is nondecreasing in $t$, Proposition A1 can be applied to show the result.

Proof of Lemma 3. Let $\pi, \hat{\pi}$ be such that $\pi \preceq_{\mathrm{lr}} \hat{\pi}$, let $i \in D$, and let $j \in O(\pi, i)$, $l \in O(\hat{\pi}, i)$ be such that $j \le l < N$ (if such deterioration levels exist). Then, for all $s, t \in T$ such that $s \le t$,
$$\psi_t(\pi, i, j)\, \psi_s(\hat{\pi}, i, l) = \frac{\pi_t p^t_{ij}}{\sigma(j; \pi, i)} \cdot \frac{\hat{\pi}_s p^s_{il}}{\sigma(l; \hat{\pi}, i)} \le \frac{\pi_s p^s_{ij}}{\sigma(j; \pi, i)} \cdot \frac{\hat{\pi}_t p^t_{il}}{\sigma(l; \hat{\pi}, i)} = \psi_s(\pi, i, j)\, \psi_t(\hat{\pi}, i, l),$$
where the inequality follows from the definitions of the lr order and the lrst order.
Proof of Lemma 4. The result directly follows from the definition of the truncated Toeplitz property.
It can be concluded that RE is the optimal action in information state (π, k) as well.
Hence, RE is the optimal action in information state (π, N) for all π ∈ .