Optimal Production Decisions in Biopharmaceutical Fill-and- Finish Operations

Fill-and-finish is among the most commonly outsourced operations in biopharmaceutical manufacturing and involves several challenges. For example, fill-operations have a random production yield, as biopharmaceutical drugs might lose their quality or stability during these operations. In addition, biopharmaceuticals are fragile molecules that need specialized equipment with limited capacity, and the associated production quantities are often strictly regulated. The non-stationary nature of the biopharmaceutical demand and limitations in forecasts add another layer of challenge in production planning. Furthermore, most companies tend to ‘freeze’ their production decisions for a limited period of time, in which they do not react to changes in the manufacturing system. Using such freeze periods helps to improve stability in planning but comes at a price of reduced flexibility. To address these challenges, we develop a finite-horizon, discounted-cost Markov decision model, and optimize the production decisions in biopharmaceutical fill-and-finish operations. We characterize the structural properties of optimal cost and policies, and propose a new, zone-based decision-making approach for these operations. More specifically, we show that the state space can be partitioned into decision zones that provide guidelines for optimal production policies. We illustrate the use of the model with an industry case study.


Introduction
Biopharmaceuticals are complex molecules that are extracted from or produced by living systems, such as virus or bacteria. Typically, these drugs are proteins, antibodies or vaccines, and are referred to as next generation drugs produced by biomanufacturing technologies. Production and handling of biopharmaceuticals is often more challenging than chemically synthesized (traditional) drugs because of their inherent biological complexity. For example, biopharmaceuticals consist of large, fragile, and pressure sensitive molecules that need specialized equipment and handling to preserve their biological activity. Figure 1 provides a high-level overview of production steps in biomanufacturing. First, biopharmaceuticals are manufactured through a series of fermentation and purification operations. This often results in active pharmaceutical ingredients or antigens, depending on the specific application context. During manufacturing, most of the large scale industry applications use big vessels such as thousand liter bioreactors. Therefore, the resulting product is called bulk material, and is stored in large quantities for further processing. Next, the bulk material is filled into smaller vials or other forms of packaging during fill-operations. This step typically consists of filling the bulk material into vials and capping them. However, most biopharmaceuticals contain live active ingredients (e.g., antigens), and hence vials often need to be freeze dried. Freeze-drying removes all the liquid from the filled product, such that only a so called 'cookie' remains in the vial. Freeze-drying helps to maintain product stability and quality, especially during storage and shipment. The resulting product is called Finished Product Unpacked (FPU). Then, the process continues with finish-operations, where FPUs are labelled and packaged along with a leaflet of medical information. The resulting product is called Final Product Packed (FPP), as shown in Figure 1. The scope of this paper is fill-and-finish operations, which are among the most commonly outsourced operations in biomanufacturing. As such, it is estimated that up to 74% of biomanufacturers outsource their fill-and-finish operations (Langer 2016). This is mainly because biopharmaceutical fill-and-finish operations require specialized handling and expertise to preserve drug quality. In addition, managing fill-and-finish operations can be often challenging in practice due to several factors: (1) Yield uncertainty in fill-operations. Biopharmaceutical fill-operations involve yield uncertainty.
In particular, the freeze-drying process involves complex biological and chemical dynamics that might interfere with the inherent stability of these molecules. For example, the biological activity might not be preserved during freeze-drying, dirt or liquid could remain in the vial after freezedrying, vials might break during the process, etc. In contrast, finish-operations correspond to labeling and packaging of vials, in which the state-of-the-art technologies often provide reliable production yields.
(2) Capacity limitations. As part of regulations, fill-and-finish operations need to comply with predetermined restrictions on the number of vials that can be processed at a time.
(3) High holding and shortage costs. Storing and handling of biopharmaceuticals require expensive resources in terms of labor, space and environmental conditions (e.g., cold storage). Furthermore, failure in meeting the demand can have critical implications in terms of the cost of disappointing clients and its impact on future orders. (4) Random and non-stationary demand. The demand is often random and non-stationary in the biopharmaceutical industry, i.e., the probability distribution of the demand changes over time as a consequence of the complex dynamics of the biopharmaceutical market. Subsequently, this leads to limitations in forecasts. For example, most biopharmaceutical companies often consider a planning horizon of one or two years, and forecasting longer periods is often challenging due to the dynamic and non-stationary nature of the market.
A possible approach to address these challenges is to adjust the production plans frequently. However, this might cause instability in planning. To assure stability, most biomanufacturers choose to freeze their production decisions for a limited period of time (e.g., two months). This implies that the production decisions are made upfront, and do not change in response to the system dynamics during the freeze period. In practice, using freeze periods provides significant visibility in terms of planning and manufacturing but comes at a price of reduced flexibility. For example, the biomanufacturer cannot respond to yield and demand uncertainty during the freeze period. Therefore, using freeze periods leads to a complex trade-off between reduced flexibility and improved planning stability, and hence adds another layer of challenge in practice.
To address these challenges, we develop a finite-horizon, discounted-cost Markov decision model, and answer the following research questions: How can biomanufacturers manage challenges related to yield uncertainty and freeze periods in biopharmaceutical fill-and-finish operations? What are the optimal production and inventory decisions for fill-and-finish operations? Can we generate guidelines that are easy to implement in practice and yet deliver optimal policies? How can we model and analyze the impact of freeze periods on the expected total cost in this setting? What is the additional cost of freezing production schedules, and how does this cost change with the length of freeze periods? By answering these questions using an optimization framework, we believe that biomanufacturers can significantly reduce costs in fill-and-finish operations.
This work provides several contributions to theory and practice: We analyze the structural characteristics of optimal policies, and propose a zone-based decision-making approach for biopharmaceutical fill-and-finish operations. More specifically, we show that the state space can be partitioned into distinct decision zones that provide rule-of-thumbs for optimal production decisions. These decision zones do not require any restrictive conditions on system parameters, and provide managerial insights that are easy to implement in practice. To the best of our knowledge, this is one of the first studies that builds a stochastic optimization model to address challenges related to yield uncertainty, demand non-stationarity and freeze periods in biopharmaceutical fill-and-finish operations. In addition, the model provides a rigorous basis for communicating mid-term planning challenges within a company, and yields insights related to the cost of using freeze periods in production planning. We also provide an industry case study to illustrate the use of the optimization model. incorporating the manufacturing challenges into decision-making. Therefore, these is a strong need for a rigorous decision-making framework in biopharmaceutical fill-and-finish operations.
The remainder of the paper is organized as follows. Relevant literature is reviewed in Section 2.
The model is presented in Section 3, and structural results are characterized in Section 4. Freeze periods are modelled in Section 5. An industry case study is presented in Section 6, and concluding remarks are provided in Section 7.

Relevant Literature
We categorize the relevant literature into two groups: (1) inventory optimization under random yield, and (2) modelling and analysis of biopharmaceutical fill-and-finish operations.
The literature on random yield has a rich background. We focus on papers that are directly related to our research, and refer the reader to Yano and Lee (1995) for a comprehensive review on random yield models in inventory management. Relevant studies on random yield can be divided into two subcategories: single-period and multi-period models. Karlin (1958) was the first to model a single period inventory model with random yield and stochastic demand. Among many others, this was followed by Noori and Keller (1986) and Ehrhardt and Taube (1987) who derived an analytical expression for the optimal order quantity when demand was uniformly or exponentially distributed.
These results were generalized in Henig and Gerchak (1990) by showing that the optimal order quantity was decreasing in the amount of inventory and was greater or equal to the difference between the critical order point and the current inventory amount. These results have also been extended to single-period, multi-station settings with yield uncertainty (Hwang and Singh (1998), Kogan and Lou (2003), Papachristos and Katsaros (2008)). To improve computational challenges, several heuristics have been developed to optimize inventory decisions (Bollopragada and Morton (1999), Inderfurth (2004), Rekik et al. (2007), Käki et al. (2015)).
Existing literature on multi-period models with random yield often focuses on single station systems.
For example, Mazzola et al. (1987) studied a multi-period problem where demand is deterministic and yield rate follows a discrete distribution. The authors used dynamic programming to optimally solve the problem when the yield rate is assumed to follow a binomial distribution. In Henig and Gerchak (1990), a multi-period setting is also modeled as a dynamic programming problem under stochastic proportional yields. The authors showed that it is optimal to order when the inventory amout is below a certain critical order point. From this finding, Yano and Lee (1995) derived that it can be beneficial to produce further ahead in the random yield case compared to problems without random yield. Note that aforementioned studies focus on single-station systems. However, our problem context involves a multi-period, multi-station setting with random yield. Relatively less research has been carried out for such systems since their analysis is acknowledged as a complex problem (Schmitt and Snyder (2012)). According to Hwang and Singh (1998), such settings are unlikely to produce structurally simple policies. This is also underlined in Tang (1990) where the author approximated the production quantities with a simple rule. A robust optimization model has been developed by Zanjani et al. (2010a) to manage a real world multi-period, multi-station problem facing yield uncertainty. The authors investigated service levels, and compared the results Our work is closely related to studies on multi-period, multi-echelon inventory systems. When there is no capacity restrictions, it is well known that the optimal policy is an echelon order-up-to policy (Clark and Scarf 1960). However, when capacity constraints are present, then the structure of the optimal policy is not known in general. Parker and Kapuscinski (2004) and Janakiraman and Muckstadt (2009) derive the optimal policy structure for such systems but under the assumption that the capacity limits at all stages are identical. Huh and Janakiraman (2010) relax this assumption and show the monotonicity of the optimal policy. However, none of the studies above consider uncertainty in inventory supply. We contribute to this stream of research by analyzing a capacitated system with random production yield. More specifically, our structural analysis of the optimal policy in the fill-and-finish operations extends the work of Huh and Janakiraman (2010) when the most upstream station (namely, the fill station in our setting) also has a capacity re-striction and random production yield. In the context of multi-period production planning with random yield, Han et al. (2012) develop a stochastic dynamic program to optimize the finite-horizon expected profit in a two-stage make-to-stock system. Different from Han et al. (2012), we focus on a system with production capacity and provide a structural analysis of the optimal policy. optimizing viscosity and filter capacity during fill-finish. A case study on quality risk management during fill-finish is described at Miller (2009). Simulation models have also been used to assess capacity in biomanufacturing (Robinson et al. (2017)). However, existing simulation studies are not equipped to answer the research questions defined in Section 1, especially to those related to determining optimal policies.

Model Formulation
We develop a discrete-time, finite-horizon, discounted-cost Markov decision model to optimize production decisions in biopharmaceutical fill-and-finish operations. First, we describe the notation used to build the model.

Decision epochs:
We consider a discrete-time problem where the planning horizon is equally divided into T decision epochs, such that T = {t : 0, 1, ..., T − 1} represents the set of all decision epochs. Time between two decision epochs is referred as a period; i.e., the tth decision epoch represents the beginning of period t + 1. In our problem setting in practice, each period typically corresponds to one month, and the planning horizon is often one to two years.
States: Let i = 1, 2 represent the production station for fill-and finish-operations, respectively.
The state space is defined as S ≡ S 1 × S 2 where S 1 ∈ R + and S 2 ∈ R. The state s i,t ∈ S i represents the inventory level at production station i ∈ {1, 2} and decision epoch t ∈ T . We let s 1,t ≥ 0 as a system requirement to generate feasible production plans. Note that s 2,t can be negative to indicate the amount of backlog in the finish-operations at decision epoch t. Raw materials for fill-operations is assumed to be ample. This aligns with current practice where raw materials are available in large vessels at bulk quantities.

Actions:
The action space is defined as U ≡ U 1 × U 2 . The action u i,t ∈ U i represents the number of vials to be produced in production station i ∈ {1, 2} at decision epoch t ∈ T . Due to regulatory requirements, the biomanufacturer needs to comply with a predetermined batch size b i at production station i. In addition, each production station has a limited capacity during a period. Therefore, the inclusion of limited capacity and fixed batch size truncates the action space Note that the amount of work-in-progress inventory in the fill-operations constrains the production quantity at the finish-operations, i.e., s 1,t ≥ u 2,t at t ∈ T .
State Transitions: To determine the state transition probabilities, we first focus on the random and often non-stationary demand for biopharmaceutical drugs. Let the non-negative random variable D t represent the demand in period t with probability density function g t (·) and realization d t , t ∈ {1, . . . , T }. Note that g t (·) is a function of t due to demand non-stationarity. Next, we focus on modeling the yield uncertainty in fill-operations. The amount obtained at fill is a random fraction R of the planned production quantity u 1,t at t ∈ T . The random fraction R has probability distribution f (·) with realization r on the interval [0, 1]. Therefore, the future state (s 1,t+1 , s 2,t+1 ) at t + 1 ∈ T is determined by the current state (s 1,t , s 2,t ), action (u 1,t , u 2,t ), realization of the yield fraction (r), and demand realization d t at t ∈ T , i.e., (1) It follows from Equation (1) that the demand d t is observed at finish-operations (i = 2), and workin-progress inventory is pulled from fill-operations (i = 1) based on the production amount u 2,t at the finish-operations. Equation (1) indicates that the production amount u i,t is available at the beginning of decision epoch t + 1. In addition, we note that Equation (1) assumes a proportional random yield model, which was validated using two years of production data (i.e., the data provided a normalized ratio of the total amount of yield obtained from the freeze-drying process to the total amount of input that goes into the freeze-drying process). In compliance with standard industry practice, we focus on the modeling and analysis of yield uncertainty in fill-operations. Nevertheless, our model formulation is flexible to capture yield uncertainty in finish-operations, and would result in similar insights.
Costs: Costs depend on the state (s 1,t , s 2,t ) at decision epoch t ∈ T . When inventory is held, cost h i is incurred per unit of inventory at production station i. A penalty cost of shortage, p 2 , has to be paid for each unit short in finish-operations. This leads to incurring the cost The Value Function: At the end of the planning horizon t = T , the terminal value is and the value function V t (s 1,t , s 2,t ) for all t ∈ {0, 1, ..., T − 1} is where the expected value E V t+1 S 1,t+1 , S 2,t+1 )|s 1,t , s 2,t , u 1,t , u 2,t is given by The parameter γ indicates the discount factor such that 0 ≤ γ ≤ 1. In Equations (3) and (4), the objective is to minimize the total discounted cost by optimizing the production decisions (u 1,t , u 2,t ) ∈ U. Then, V t (s 1,t , s 2,t ) expresses the minimum total discounted cost over the periods t, t+1, . . . , T given that the state at time t is (s 1,t , s 2,t ). Note that the expectation in Equation (5) is taken with respect to the probability distribution of the uncertain yield f (·) and the time-dependent demand g t (·). We let u * i,t (s 1,t , s 2,t ) denote an optimal production amount at production station i when the system state is (s 1,t , s 2,t ) at time t ∈ T , and π * t denote an optimal policy (i.e., function that maps a system state into an optimal action) at time t until the end of planning horizon T .
Hereinafter, we call the model presented in this section the base model.
The base model is developed as a finite-horizon model for multiple reasons. First, in practice, biopharmaceutical companies often optimize their operations by considering foreseeable future, and set their business strategies for a fixed planning horizon accordingly. For example, at MSD AH, production planners consider a planning window of typically one year but no longer than two years. This is mainly because of the uncertainties in the market (e.g., more effective drugs can enter the market, and product lines as well as production processes evolve over time). Second, products that we consider at MSD AH have nonstationary demand for which forecasts are available only for the next 12 to 24 months. At MSD AH, sales organizations generate such forecasts to serve as direct input to our finite-horizon model (in Section 5, we present an approach that relies on solving the long-run business problem by using our base model through a rolling-horizon mechanism over time). Furthermore, at MSD AH, the material resource planning (MRP) activities are conducted every month based on a finite planning horizon of one or two years. The decisions considered in this paper are mid-term planning decisions that are part of such MRP activities. Finally, in principle, a finite-horizon model can approximate an infinite-horizon model given that the number of decision epochs is sufficiently large.
We start our analysis by summarizing and developing some preliminary technical results in Section 4.1. Then, we analyze the structural properties of optimal policies for the fill-operations and finish-operations in Sections 4.2 and 4.3, respectively. All proofs are available in the Appendix.

Preliminaries
Our analysis relies on the notion of L -convexity (see, e.g., Simchi-Levi et al. 2014). To be able to define L -convexity, we first introduce the concept of submodularity. Let V ⊆ R n be a polyhedron that forms a lattice (i.e., for any are component-wise minimum and component-wise maximum operators, respectively). A function

it follows that f is submodular if and only
if ∂ 2 f (x)/∂x i ∂x j ≤ 0 for any distinct indexes i and j and for any x ∈ V.
Following Zipkin (2008), we say that the function f : e is the n-dimensional all-ones vector, and ξ ≤ 0. L -convexity of a function implies that the function has the convexity, submodularity, and diagonal-dominance properties (Gong and Chao 2013), which are desirable properties for the analysis of optimal policy structures.
Next, we extend Lemma 1 in Zipkin (2008) and introduce a property of L -convexity that will be useful for our analysis under random yield in fill-operations.
Lemma 1. Suppose that f : V − → R is an L -convex function and r ≥ 0. In this case, the function Similar to Zipkin (2008) and Huh and Janakiraman (2010), the core of our analysis includes a transformation of the state variables of the Markov decision process model. To be specific, our objective is to obtain L -convex value functions after transformation and exploit the properties of L -convexity in our analysis, whereas the value functions (i.e., Equations (3) and (4)) are not L -convex in the original state variables. Intuitively, the role of state transformation is to assure that the state variables have complementary relationships, and hence, the L -convexity of the value functions is preserved under minimization (see Topkis (1998) for details on complementarity and Li and Yu (2014) for its role in the structural analysis of dynamic inventory problems).

Fill Operations
For notational convenience, we drop the subscript t from the state and action variables when possible in the remainder of the analysis. Let w 1 = s 1 and w 2 = s 1 + s 2 . Notice that the transformed state variables (w 1 , w 2 ) ∈ R × R, and the corresponding state space is a lattice. Let u j ≥ 0 denote the production decision of production station j ∈ {1, 2}; i.e., u 1 and u 2 are the number of vials produced in the fill-operations and finish-operations, respectively. Letw 1 = w 1 − u 2 and ξ = −u 1 .
For t ∈ T , the value functions for the transformed problem can be written as with the terminal value function given by J T (w 1 , w 2 ) = c(w 1 , w 2 − w 1 ).
In Theorem 1(i), we show that the value function in each period is an L -convex function after the transformation of the state variables. This will enable us to show in Theorem 1(ii) that the optimal policy in the fill-operations is a monotone function of the inventory levels in the fill and finish operations.
(ii) The optimal number of vials produced in the fill-operations u * 1,t (s 1 , s 2 ) is nonincreasing in s 1 and s 2 for t ∈ T .
Consequently, it follows that optimal decisions for fill-operations can be divided into Zone I, II, III, where Zone I includes all states (s 1 , s 2 ) ∈ S in which u * 1,t (s 1 , s 2 ) = m 1 b 1 , Zone II in which 0 < u * 1,t (s 1 , s 2 ) < m 1 b 1 , and Zone III in which u * 1,t (s 1 , s 2 ) = 0. For practitioners, these decision zones provide important managerial insights on optimal policies, and are easy to implement in practice. For example, if the state of the system is an element of Zone I at time t, then optimal policy at fill-operations requires to produce at maximum capacity. In contrast, when the system state belongs to Zone III, then it is optimal to not produce anything at fill-operations. In case the system state is within Zone II, then optimal policy for fill-operations has a control-limit type structure, and these control-limits are nonincreasing in the inventory levels in each production station. A visual representation of the decision zones is provided in Figure 2. For simplicity, zones in Figure 2 are separated with straight lines, however, we note that these lines can be curvy (in the form of a more general switching curves) and the zones may have a variety of forms in theory.

Finish Operations
In this section, the objective is to study the monotonicity of optimal production decisions in the finish-operations. Recall that the intuition behind the state transformation in Section 4.2 was to make sure that the state in the next decision epoch has a special form, where a term proportional to ξ (i.e., the decision of fill station) is subtracted from each state component; see the objective function in Equation (7). In this way, L -convexity is preserved after the minimization. However, the same state transformation does not lead to the preservation of L -convexity when the finish decision is considered. Therefore, in this section, we adopt an alternative transformation of the state variables based on Huh and Janakiraman (2010). Different from Section 4.2, we now let w 1 = −s 1 and w 2 = s 2 . Notice that (w 1 , w 2 ) ∈ R − × R, and the corresponding state space is a lattice. Furthermore, we now letw 1 = w 1 − u 1 and ξ = −u 2 . For t ∈ T , the value functions for the transformed problem can be written as with the terminal value function given by J T (w 1 , w 2 ) = c(−w 1 , w 2 ). For a general distribution for the yield random variable R, the value function J t (w 1 , w 2 ) in Equation (8) is not necessarily L -convex in w 1 and w 2 . This is because the L -convexity of the function J t+1 (w 1 , w 2 ) in w 1 and w 2 does not guarantee the L -convexity of the function w 2 and ξ for r ∈ (0, 1), where r and d t denote the realizations of the random variables R and D t , respectively. Therefore, the objective function in Equation (9) is not necessarily L -convex inw 1 , w 1 , w 2 and ξ. Thus, different from Section 4.2, the L -convexity of the function f t+1 (w 1 , w 2 ) and subsequently the L -convexity of the value function J t (w 1 , w 2 ) in Equation (8) do not necessarily hold for a general distribution for the yield random variable R. In Theorem 2(i), we show that the value function J t (w 1 , w 2 ) is L -convex for the following special types of production yield in the filloperations: (1) Deterministic yield: R = r with probability 1.
(2) All-or-nothing yield: R = 1 with probability r and R = 0 with probability 1 − r. All-or-nothing type of yield is especially relevant in biomanufacturing due to random shocks that can affect the entire production (see, e.g., Tomlin (2009), Martagan et al. (2016)). For example, during the freeze-drying process, physico-chemical process parameters (i.e., temperature, moisture, pH, etc.) might deviate from their pre-specified control ranges because of bacterial contamination, human errors or equipment failure. In such cases, the whole batch needs to be scrapped to comply with regulatory requirements.
Theorem 2. Suppose that the production yield in fill-operations is deterministic or has all-ornothing type of yield uncertainty. (i) The value functions J t (w 1 , w 2 ) is L -convex in w 1 and w 2 for t = 1, 2, . . . , T . (ii) The optimal number of vials produced in the finish-operations u * 2,t (s 1 , s 2 ) is nondecreasing in s 1 and nonincreasing in s 2 for t = 1, 2, . . . , T − 1.
The monotonicity of optimal policies in finish-operations, as described in Theorem 2, leads to specific decision zones for the finish station (when the production yield in fill-operations is deterministic or has all-or-nothing type of uncertainty). This is summarized in Corollary 2.
It is important to note that we prove the monotonicity of the optimal policy in fill-operations (Theorem 1) under any general probability distribution for the yield in the fill station. On the other hand, we are able to analytically prove the monotonicity of the optimal policy in finishoperations only when the yield in the fill station is deterministic or has all-or-nothing type of uncertainty. Nevertheless, in our case study with industry data in Section 6, we observe that the monotonicity of optimal policies in the finish station continues to hold (under uniformly distributed yield random variable and a finite action space). In our additional numerical experiments with rightskewed, left-skewed and symmetric density functions for the yield random variable, we continue to observe that the optimal production quantity at the finish station still shows monotonicity. In such cases, the decisions for finish-operations can be divided into four zones, where Zone I includes all states (s 1 , s 2 ) ∈ S in which u * 2,t (s 1 , s 2 ) = m 2 b 2 (i.e., produce at maximum finish capacity), Zone III in which u * 2,t (s 1 , s 2 ) = 0 (i.e., do nothing), Zone IV in which u * 2,t (s 1 , s 2 ) = s 1 (i.e., produce at maximum fill inventory capacity), and Zone II as the remaining states. Figure 2 (b) illustrates a possible layout of the decision zones in finish-operations.
The zone-based decision-making presented in our work provides a new, rigorous approach for decision-making in biopharmaceutical fill-and-finish operations. From practical point of view, these decision zones are easy to adopt and implement in current practice. The case study presented in Section 6 uses industry data from MSD AH to illustrate how these decision zones can be generated and used in practice.

Planning with Freeze Periods
We expand the base model presented in Section 3 to incorporate freeze periods into production plans. First, we describe how MSD AH works with freeze periods in practice, and then introduce a formal description of the model to incorporate freeze periods into decision-making.

Current Practice
There are several different ways of freezing the production plans in practice. In this paper, we focus on the current practice at MSD AH, where production planners conduct a planning activity at the beginning of every period in order to generate a production plan for the next T periods. The length T of a planning horizon is often one or two years owing to the highly dynamic and non-stationary nature of the biopharmaceutical market. The length of a period is typically one month, because the periodic planning activities are part of monthly aggregate planning activities (i.e., in compliance with material resource planning activities).  Figure 3: Freeze periods over a rolling horizon Every time a planning activity is conducted, it generates a production plan for the upcoming T periods. Once a production plan is generated, the plan for the first L periods is communicated to the manufacturing team, and cannot be changed anymore. Thus, this is referred as a freeze period. In Figure 3, we illustrate this mechanism for a planning horizon of T = 12 months and a freeze period of length L = 2 months. In this example, the first planning activity is conducted at the beginning of January 2019, and generates a monthly production plan until the end of December 2019. In this production plan, the first two months (January and February) are frozen, as represented with dashed boxes in Figure 3. The next planning activity is conducted at the beginning of February 2019 and generates a monthly production plan until the end of January 2020. However, this production plan assumes that the production decisions of February 2019 are fixed (because it was part of the freeze period at the previous planning activity). That is, the frozen decisions are carried forward to the next planning activities. At each planning activity, the company works with the most recent demand forecasts for the next T periods to determine the production plans that minimize the total expected cost in a planning window of T periods. In this sense, the company adopts a rollinghorizon approach, in which the base model is repeatedly solved every period with updated demand forecasts. These forecasts are generated by a separate department in the company, referred as Sales Organizations, specialized on market analysis.
In practice, MSD AH typically uses a freeze period of length L = 2 months, as illustrated in Figure 3. The motivation of the company for using freeze periods is to ensure that, at all times, they have full visibility on their production activities in the next L periods. An alternative policy could be to generate a new production plan for the next T periods at every L periods. While such a policy might potentially perform better, it is not preferred in practice because it does not always provide full visibility on the production activities of the next L periods.

Modeling of Freeze Periods
The objective of this section is to formally describe how we model the dynamics of the freeze periods explained in Section 5.1. We let ω ∈ {1, 2, . . .} denote the index of the planning activity, where each planning activity ω generates a production plan for the upcoming T periods. Let L denote the length of the freeze period (also note that L ≤ T ). Each planning activity uses the latest available demand forecasts (for the next T periods). We model this by assuming that a collection of the probability distribution functionsĝ ω {ĝ t,ω (·) : t = ω, ω + 1, ω + T − 1} is available at the time of the planning activity ω, whereĝ t,ω (·) denotes an estimate of the demand density function g t (·) available at planning activity w. We consider that demand forecasts are provided exogenously as an input to the model and assume that there is no demand learning mechanism in the model. The pseudo-code in Algorithm 1 outlines how the base model is used in practice in a rolling horizon basis in the presence freeze periods.
Let π ω denote the T -period production plan generated by the planning activity ω (π ω is defined as an arbitrary policy for ω = 1). Algorithm 1 takes π ω−1 , the production plan generated in planning activity ω − 1, as input for the planning activity ω. We let π * = (π 0 , . . . , π T −1 ) denote the optimal policy obtained by solving the base model in Section 3 via backward induction for all t ∈ {0, ..., T − 1}; i.e., π t = (u * 1,t (s 1 , s 2 ), u * 2,t (s 1 , s 2 )) for all (s 1 , s 2 ) ∈ S. In Step 3 of Algorithm 1, notice that the collection of the demand distributions (for the next T periods) available at planning activity ω is used as input in the base model. We note that both π ω and π * consist of T elements, where each element is a pair of vectors representing a production policy (i.e., a function that maps a system state into a production decision) for the fill and finish operations. Let π ω (i) denote the ith element of π ω , and π * (i) denote the ith element of π * . For the first planning activity (i.e., ω = 1), clearly, the solution from the base model is the output (step 6). Otherwise, we note that the output of the ωth planning activity is a T -period production plan, where the first L − 1 elements are the 2nd to Lth elements of the previous production plan π ω−1 and the remaining T − L + 1 elements are the ones coming from the solution of the base model under new demand distributions (step 8).

Problem Setting
Fill-and-finish operations have a fixed batch size of 0.25 million vials (mln VL). Fill-operations have a maximum production capacity of 6 mln VL per month, and hence this leads to the action space U 1 = {0, 0.25, 0.5, . . . , 6} mln VL. Finish-operations have a maximum production capacity of 8 mln VL per month with the action space U 2 = {0, 0.25, 0.5, . . . , 8} mln VL. As an operational policy, the company does not store more than 15 mln VL, which sets an upper bound on the state space.
To protect confidentiality, we use representative values for model inputs. Holding cost is h 1 = $0.5 per vial in fill-operations, and h 2 = $0.8 per vial in finish-operations. Penalty cost of shortage is p 2 = $6.2 per vial, and includes the cost of internal planning and scheduling efforts to meet backorders, cost of disappointing clients and its impact on future orders, etc. Statistical analysis of monthly sales data over the last two years and forecast information for the upcoming two years indicate that monthly demand can be modelled with a normal distribution (see the electronic companion for the parameters of demand distribution used in the numerical analysis).
Analysis of two years of production data showed that the random yield fraction was uniformly distributed for fill-operations in the interval [0.70, 0.90]. Therefore, we obtain the mean E[R] = 0.80 and coefficient of variation CV [R] = 0.07. In compliance with industry standards, data analysis showed that no significant yield losses were encountered in finish-operations.
In alignment with the current production planning strategy of MSD AH, we consider a planning horizon of T = 24 months, where each decision epoch corresponds to the beginning of a month.
In practice, MSD AH's planning horizon is often one or two years. This is mainly because the biopharmaceutical market is highly dynamic and non-stationary, and hence it is often challenging to generate reliable policies for a period longer than one or two years (see Section 3 and 5.1 for further details). To check the robustness of optimal actions to different lengths of planning horizon, we conducted additional numerical experiments. We concluded that the effect of the length of the planning horizon on optimal production decisions at the beginning of the planning horizon decreases as T approaches 24 months. Nevertheless, the rolling horizon approach presented in Section 5.2 (and numerically evaluated in Section 6.3) enables to plan for the foreseeable future in practice.
All numerical experiments use a discount factor of γ = 0.99. This problem setting is referred as base case in our numerical analysis. We solve the problem with backward induction, and use a discretization scheme of the state space based on the minimum allowed batch size. We also tested the sensitivity to finer discretization schemes, and concluded that optimal cost and policies were robust to finer discretization levels.

Insights on Yield Uncertainty
The main aim of this section is to help operations managers understand the impact of yield uncertainty on optimal costs and decision zones. For this purpose, we focus on the following characteristics of the yield random variable R: (i) average yield, and (ii) variability in yield. In compliance with industry data, we use uniform distribution for the fraction R, and define the following sce-  with probability 0.2). These scenarios are defined based on input from practitioners to represent a variety of cases from the industry. The scenarios no yield loss, lower yield loss and higher yield loss enable us to study the impact of average yield, while the scenarios no variability, higher variability, and all-or-nothing yield enable us to study the impact of yield variability on operational performance.

Financial Implications of Yield Uncertainty in Fill-Operations
Our objective is to assess the financial implications of yield uncertainty. For this purpose, Table 1 and Table 2 report the optimal value function V 0 (s 1,0 , s 2,0 ) under different scenarios defined above.
For brevity, we report the optimal value function at representative states (s 1,0 , s 2,0 ) in the first planning activity (i.e., at the 0-th decision epoch). For completeness, the optimal value function associated with other states are reported in the electronic companion. The zone to which a selected state belongs to may be different in each scenario. For brevity, Table 1 and Table 2 show the specific decision zones associated with selected states under the base case setting (at t = 0). For each state, the column labelled "%∆" denotes the percentage difference in the value function of a given scenario compared to that of the base case (i.e., a positive value represents a cost reduction with respect to the base case, whereas a negative value denotes an increase in costs). First, we compare the performance of the base case against the scenario with no yield losses in Table 1. This comparison helps quantify the maximum cost reduction that could be achieved in practice by eliminating the production loss in fill-operations (i.e., it provides a benchmark for the maximum room for improvement). Comparing the optimal value functions of these two scenarios in Table 1, we observe that 38% to 55% improvement in cost could be achieved in practice (i.e., the room for improvement ranges between 20% and 60% including all other states that are not presented in Table 1). Nevertheless, analysis of the scenario with a lower yield loss continues to show a large room for improvement (i.e., 25% to 45% improvement) in practice, by increasing the average yield E[R] from 0.80 to 0.90. In addition, we see that the percentage differences between the value functions of the base case and the case with higher yield losses are higher than those of the base case and the case with lower yield losses. For practitioners, this can be interpreted as a diminishing improvement in expected costs, as the average yield increases (while the coefficient of variation remains the same). For practitioners, understanding this behavior might have important implications, as completely eliminating the yield losses could be challenging (or infeasible) because of the complex biological dynamics and limitations in the underlying freeze-drying technologies.
To gain further insights, we analyze the impact of yield variability in fill-operations. First, we compare the performance of the base case against the scenario with no variability in Table 2.
Note that both of these scenario assume the same average yield, E[R] = 0.80. Therefore, this comparison allows us to assess the financial implications of having no yield variability in practice.
In Table 2, we observe that the percentage difference between the optimal value functions of these two scenarios ranges between 20% and 37% (a range of 10% to 40% improvement is observed based on all states on the state space). For operations managers, this indicates that controlling the yield variability can lead to a large room for improvement in current practice. This observation is further substantiated by comparing the base case against the scenario with higher variability.
Note that both of these scenarios have the same mean but the coefficient of variation is twice larger in the higher variability case. In this setting, yield variability continues to exert a high impact on costs (i.e., the expected cost increases by 11% to 23% as the coefficient of variation doubles in Table 2. The room for improvement remains the same for other states that are not presented in Table 2). Nevertheless, the case with all-or-nothing exerts the highest increase in costs, as it involves the highest variability among scenarios considered in this analysis. These results indicate that controlling the yield variability can be a critical aspect for biopharmaceutical process improvement projects.
In addition, we compare the scenario with no yield losses (Table 1) against the one with no variability ( Table 2). Note that both of these scenarios assume zero variability but vary in terms of their yield losses (i.e., R = 1 for no yield losses case and R = 0.8 for no variability case). This comparison indicates that 11% to 34% improvement in the optimal value function can be achieved (including states that are not covered in the tables) by eliminating the yield losses. This indicates that controlling the average yield losses can be critical for practitioners.
From a practical perspective, controlling the mean and variability of yield uncertainty could be challenging because of the complex biological and chemical dynamics of the underlying freeze-drying processes. However, these processes can be partly controlled and improved using the guidelines for Quality-by-Design and Process Analytical Technology (Rathore and Mhatre 2011). These guidelines help to standardize and optimize some of the controllable input parameters of bio-processes, and are especially known to help control batch-to-batch variability. In Tables 1 and 2

Analysis of Decision Zones
A visual representation of the decision zones obtained under the base case is provided in Figure 4.
To gain further insights, our main objective in this section is to understand how the decision zones expand or shrink as a response to yield uncertainty in fill-operations. For this purpose, we define the measure fraction of state space (Θ) in the truncated state space of numerical experiments (i.e., s 1,0 ∈ [0, 15] mln VL and s 2,0 ∈ [−8, 8] mln VL). This measure denotes the ratio of the number of states belonging to a specific decision-zone to the total number of states in the state space.
Therefore, the measure Θ helps to quantify the relative size of a decision zone. Table 3 summarizes the results.
In Table 3, we observe that Zone I expands and Zone III shrink as the yield uncertainty becomes higher in the system (i.e., as the average yield losses or process variability increase). This indicates that the fill-operation tends to buffer against uncertainty by increasing its production amounts. In contrast, finish zones exert a different response to yield uncertainty. For example, Zone I of finishoperations tends to shrink while Zone III expands as the average yield losses or yield variability increases. This implies that finish-operation responds to the yield uncertainty of fill-operations by reducing its production amounts. The underlying intuition can be associated with several factors.
For example, the cost of holding inventory in the finish station is more expensive than the cost of holding inventory in the fill station. In addition, note that the system already buffers against yield uncertainty of fill-operations by increasing the production amounts of fill-operations (i.e., Zone I of fill expands in Table 3 as the yield uncertainly increases).

Insights on Freeze Periods
To understand the impact of freeze period on costs and policies, the base case is solved with a freeze period of length L ∈ {2, 3, 4, 5}. Note that freeze periods with L ≥ 6 are not considered in the numerical analysis, as they are not practical in our specific production setting. Currently, MSD AH uses a freeze period of L = 2. We consider planning activities for one year, i.e., ω ∈ {1, ..., 12}.
However, we also tested the sensitivity of optimal costs to longer planning activities and concluded that our insights are similar for a longer horizon of planning activities. At each planning activity, the biomanufacturer uses the most recent forecasts and generates a production plan for a planning horizon of two years, T = 24 (i.e., in alignment with Section 6.1 and 6.2). To assess the impact of freeze periods at MSD AH, we used historical sales data and their corresponding forecast files (see the electronic companion for further details on the demand distribution). For example, the forecast file of January 2016 is used in the first planning activity ω = 1, while the forecast file of December 2016 is used in the last planning activity ω = 12. Our main objective is to quantify the financial impact of freezing the production decisions, and help practitioners to have a better understanding on the trade-off between planning stability and flexibility.
To quantify the impact of freeze periods, we use the percentage change in the (T -period) expected cost in a particular planning activity as a key performance metric. More specifically, for a given freeze period of length L, Table 4 shows the percentage increase in the (24-period) expected cost compared to the base case with no freeze periods at state (s 1 , s 2 ) in the first decision epoch of the planning activity ω = 6 (we consider this particular planning activity as an example that reflects a situation halfway in the series of monthly re-planning activities considered in the case study). First, we focus on the current practice, and assess the financial implications of using a freeze period of length L = 2. Table 4 shows that the percentage increase in the optimal cost (compared to the base case) can be up to 11.3% under a freeze period of length L = 2. Further analysis shows that this increase can be up to 20% considering all other states that are not covered in Table 4. In addition, we see that the percentage increases under L = 3 are at least 1.5 times higher compared to those under L = 2. Therefore, financial implications of the freeze period become more pronounced as the freeze period increases from two to three months. Table 4 indicates that the percentage increase in costs gets higher in L for a fixed system state, however, no specific trends were observed regarding the behavior of the percentage increase as a function of the system state under a fixed L. On the other hand, at L = 2, we see that the highest percentage increase is often attained in a system state where the net inventory is around 5 to 7 million vials -which often corresponds to states within Zone II. This behavior indicates that the demand information plays a more critical role in adjusting the production amounts in Zone II.
When the production schedule is frozen in advance, decisions are made based on older demand information and can not be adjusted. Table 4 help operations managers understand the financial implications of the flexibility loss due to freeze periods. In practice, biomanufacturers often deal with a complex trade-off between planning flexibility and planning stability, and our analysis indicates that the current strategy of adopting a two months of freeze period can lead up to 20% increase in the expected cost of filland-finish operations considered in our specific case study. For practitioners, these results provide a rigorous and quantitative approach for assessing the impact of freezing production schedules, and enables an easier communication of planning strategies within the company.

Conclusions
Fill-and-finish operations include activities related to formulation, freeze-drying, filling, and sealing an active ingredient into its final form. State-of-the-art technologies can deliver consistent production yields in finish-operations. However, fill-operations remain to be a critical challenge in practice, as it relies on complex and unpredictable biological processes, such as freeze-dry. This leads to yield uncertainty in fill-operations. In addition, fill-and-finish operations are labor-and cost-intensive, and require specialized handling with strict limitations on batch sizes and capacity. As a response to these challenges, most biopharmaceutical companies tend to freeze their production decisions for a predefined period of time. This allows companies to improve their visibility and stability in manufacturing processes. However, using such freeze periods adds another level of challenge as it reduces planning flexibility.
To address these problems, we develop a finite-horizon, discounted-cost Markov decision model that optimizes the production decisions at biopharmaceutical fill-and-finish operations. The model is then extended to optimize decision-making under freeze periods. We analyze the structural properties of the optimal costs and policies, and show that the state space can be partitioned into decision zones. More specifically, we show that optimal production decisions can be classified into three decision zones for fill-operations, and four decision zones for finish-operations. These decision zones provide guidelines on optimal policies, and are easy to implement in practice.
We present an industry case study, and focus on evaluating the impact of yield uncertainty on optimal costs and policies. In our specific case study, numerical experiments indicate that fully eliminating the yield uncertainty in fill-operations can provide 20% to 60% improvement (depending on the current inventory levels) in the expected cost of fill-and-finish operations. In a typical large scale biopharmaceutical setting, such a room for improvement could often lead to a six figure reduction in costs. We note that companies can adopt new technologies to reduce yield uncertainty, but fully eliminating the yield uncertainty might not necessarily be feasible because of technological limitations. Nevertheless, our results provide a quantified business case to encourage further development in this field. We also conclude that the financial implications of controlling the yield variability can be as critical as controlling the mean yield losses themselves. In addition, numerical experiments reveal insights related to the cost of freezing the production decisions. In our specific case study, we observe that the current practice of adopting a two-month freeze period could lead up to 20% increase in the expected costs of fill-and-finish operations.
This research has been conducted in close collaboration with MSD AH in the Netherlands. However, the true impact of this work extends beyond their operations, as it addresses common industry challenges and trade-offs. Insights obtained from this research have also been shared with a larger biotech community through working group sessions. Industry feedback indicates that most biomanufacturers rely on expert opinion and MRP calculations that have limited ability in incorporating the manufacturing challenges (e.g., yield and demand uncertainty) into decision-making. Therefore, we believe that the optimization framework presented in this paper will help to improve decision-making in practice.
The application of operations research (OR) methodologies to the biomanufacturing industry is still in its infancy. However, biomanufacturers are realizing that they need to undergo a data-driven, OR-based transformation to fully realize the benefits of bioscience research. Future work could consider production and scheduling decisions for multiple products and/or multiple biomanufacturing activities (i.e., fermentation, chromatography, filtration, and fill-finish). In addition, the model can be extended to incorporate inventory spoilage into decision-making. We note that freeze-dried products (such as those considered in this paper) have long self lives. However, several biophar-

Appendix: Proofs
Proof of Lemma 1. Notice that the L -convexity of the function g(v, ξ) is equivalent to the Lconvexity of the function ψ(v, ξr) in v and ξ. It follows from the definition of L -convexity that the latter is the case if is submodular in v = (v 1 , . . . , v n ) and ξ. Let h(v 1 , . . . , v n , ξ) denote the function f (v 1 −ξ, . . . , v n −ξ).
Since submodularity is invariant by separable strictly increasing reparameterizations, it is known that the function J t+1 (w 1 − ξr − y, w 2 − ξr − d − y) is also submodular inw 1 , w 2 , ξ and y. Thus, it follows from Proposition 3 of Chen (2017) that the function J t+1 (w 1 − ξr, w 2 − ξr − d) continues to be L -convex inw 1 , w 2 , and ξ for a constant d. Because the expectation operator preserves L -convexity, it follows that the objective function in the optimization problem s.t. − m 2 b 2 ≤w 1 − w 1 ≤ 0 0 ≤w 1 is also L -convex in w 1 ,w 1 , w 2 and ξ. Consequently, it follows from Lemma 2(b) of Huh and Janakiraman (2010) that the functionJ t+1 (w 1 , w 2 , ξ) is L -convex in w 1 , w 2 and ξ. Furthermore, consider the optimization problem It again follows from Lemma 2(b) of Huh and Janakiraman (2010) that the function f t+1 (w 1 , w 2 ) is L -convex in w 1 and w 2 . Consequently, the value function J t (w 1 , w 2 ) in (6) is the sum of L -convex functions, and therefore, it is L -convex, completing the induction.
(ii) Let ξ * t (w 1 , w 2 ) denote the largest minimizer of the objective function in (12) for some t.
Since the objective function in (12) is submodular (due to its L -convexity) and the feasible region forms a lattice, Topkis' monotonicity theorem (Theorem 2.8.2, Topkis 1998) implies that ξ * t (w 1 , w 2 ) is nondecreasing in w 1 for a fixed w 2 , and it is nondecreasing in w 2 for a fixed w 1 . Let u * 1,t (s 1 , s 2 ) = −ξ * t (w 1 , w 2 ), where the relations between w 1 , w 2 , s 1 and s 2 are already defined by the state transformation. Since ξ * is L -convex inw 1 , w 1 , w 2 , and ξ for a constant d. Because the expectation operator preserves L -convexity, it follows that the objective function in the optimization problem J t+1 (w 1 , w 2 , ξ) = min w 1 E [q(w 1 , w 1 , w 2 , ξ; D t )] s.t. w 1 −w 1 ≤ m 1 b 1 is also L -convex in w 1 ,w 1 , w 2 and ξ. Consequently, it follows from Lemma 2(b) of Huh and Janakiraman (2010) that the functionJ t+1 (w 1 , w 2 , ξ) is L -convex in w 1 , w 2 and ξ. Furthermore, consider the optimization problem f t+1 (w 1 , w 2 ) = min ξJ t+1 (w 1 , w 2 , ξ) It again follows from Lemma 2(b) of Huh and Janakiraman (2010) that the function f t+1 (w 1 , w 2 ) is L -convex in w 1 and w 2 . Consequently, the value function J t (w 1 , w 2 ) in (8) is the sum of L -convex functions, and therefore, it is L -convex, completing the induction.
(ii) Let ξ * t (w 1 , w 2 ) denote the largest minimizer of the objective function in (14) for some t.
Since the objective function in (14) is submodular (due to its L -convexity) and the feasible region forms a lattice, Topkis' monotonicity theorem (Theorem 2.8.2, Topkis 1998) implies that ξ * t (w 1 , w 2 ) is nondecreasing in w 1 for a fixed w 2 , and it is nondecreasing in w 2 for a fixed w 1 . Let u * 2,t (s 1 , s 2 ) = −ξ * t (w 1 , w 2 ), where the relations between w 1 , w 2 , s 1 and s 2 are already defined by the state transformation. Since ξ * t is nondecreasing in w 1 , and w 1 is equal to −s 1 , u * 2,t (s 1 , s 2 ) is nondecreasing in s 1 . Furthermore, since ξ * t is nondecreasing in w 2 , and w 2 is equal to s 2 , u * 2,t (s 1 , s 2 ) is nonincreasing in s 2 . Thus, the result follows.
Deterministic yield. The result immediately follows from the analysis of the full-yield case (i.e., P(R = 1) = 1 under all-or-nothing yield uncertainty). In deterministic case, the yield is known upfront, suppose its value is r. The optimal finish policy in the full-yield case, which is solved under the finish capacity m 2 b 2 r, can be transformed (by multiplying with 1/r) to obtain the optimal finish Online Supplement for "Optimal Production Decisions in Biopharmaceutical Fill-and-Finish Operations" Tugce Martagan, Alp Akcay, Maarten Koek, and Ivo Adan

Input Data
To protect confidentiality, we use representative values for the demand distribution. All numerical experiments assume that monthly demand is normally distributed with a mean as shown in Table 5 and a variance of 380.000 vials. Columns in Table 5 represent different planning activities, ω ∈ {1, 2, 3, . . . , 12}. The base case scenario corresponds to ω = 1.