1 Introduction
Differential games provide a suitable framework for modelling strategic interaction between different agents (known as players), each of whom seeks to minimize or, equivalently, maximize an individual criterion (Engwerda, 2005; Başar and Olsder, 1999). In such a multi-player scenario, no player is allowed to maximize his profits or objectives at the expense of the remaining players; the solution of the game is therefore given in the form of an “equilibrium of forces”.
Among the different types of solutions, the so-called Nash equilibrium is the most extensively used in the game theory literature. In this solution none of the players can improve his criterion by unilaterally deviating from his Nash strategy; therefore, no player has an incentive to change his decision. When the full state information is available to all the players for realizing their decision strategies at each point in time, the solution is called a feedback Nash equilibrium (Engwerda, 2005; Başar and Olsder, 1999; Friedman, 1971). In order to find such feedback strategies, tools from optimal control are applied; specifically, an N-player form of the Hamilton–Jacobi–Bellman (HJB) equation has to be solved for each of the players. In the non-cooperative Nash equilibrium framework, each player deals with a single-criterion optimization problem (the standard optimal control problem), with the actions of the remaining players taking fixed equilibrium values.
Although robustness is a central feature in control theory, there are not many studies of dynamic games affected by some sort of uncertainty or disturbance. Some recent developments on this topic can be mentioned. Jiménez-Lizárraga and Poznyak (2007) presented a notion of open-loop Nash equilibrium (OLNE) where the parameters of the game lie within a finite set and the solution is given in terms of the worst-case scenario; that is, the result of applying a certain control input (in terms of the cost function value) is associated with the worst, or least favourable, value of the unknown parameter. The article of Jank and Kun (2002) also presents an OLNE and derives conditions for the existence and uniqueness of a worst-case Nash equilibrium (WCNE); in this case, however, the uncertainty belongs to a Hilbert function space and enters additively into the time derivative of the state variables. A similar problem is considered in a quite recent work (Engwerda, 2017), where the author shows that the WCNE can be derived by finding an OLNE of an associated differential game with $2N$ initial state constraints, and derives necessary and sufficient conditions for the solution of the finite-time problem. The work of Jungers et al. (2008) deals with a game with polytopic uncertainties, reformulating the problem as a nonconvex coupling between semi-definite programs to find the Nash-type controls. Other related approaches include using the Nash strategy to design robust controls for linear systems (Chen and Zhou, 2001). Another way to deal with uncertainties is to view them as an exogenous input (a fictitious player) (Chen et al., 1997). In the work of van den Broek et al. (2003) the definition of equilibria is extended to a soft-constrained formulation, whose basis is given by Jank and Kun (2002), where the fictitious player is introduced in the criteria via a weighting matrix.
In this work, inspired by the works of Jank and Kun (2002) and Engwerda (2005, 2017), we analyse a deterministic N-player non-zero-sum differential game, considering a finite as well as an infinite time horizon in the performance index, and an ${L^{2}}$ perturbation regarded as a fictitious player trying to maximize the cost of each i-th player.
Assuming the players have access to the full state information, we are interested in finding robust feedback Nash strategies that guarantee a robust equilibrium when the players consider the worst case of the perturbation from their own point of view. To that end, a set of robust HJB equations is introduced; each of these equations computes not only the minimum over the i-th player's control, but also the maximum, or worst-case, uncertainty from his point of view, resulting in a min-max form of the known HJB equations for an N-player game. To the best of the authors' knowledge, such a robust HJB equation has not been used before to find a robust feedback Nash equilibrium in linear quadratic deterministic games, which stand as an important case to study. To summarize, the contributions of this work are as follows:
1. Presentation of the general conditions for a robust worst-case feedback Nash equilibrium by means of a robust form of the HJB equation for N-player non-zero-sum games.
2. Based on this formulation, the solution of the finite-time-horizon linear affine quadratic uncertain game.
3. The solution of the infinite-horizon case for the linear affine dynamics.
4. An illustration of the results through the coordination problem of a two-echelon supply chain with uncertain seasonal fluctuations in demand. Such a case has not been treated before.
The paper is organized as follows. In Section 2 we formally state the general problem of a differential game and the conditions for the robust Nash equilibrium to exist. In Section 3 we define the dynamics of the problem analysed and the type of cost functional to be minimized for a finite-time-horizon problem; we also state a theorem based on dynamic programming to find the robust controls for each player. In Section 4 we analyse the case of an infinite time horizon. Finally, Section 5 presents a numerical example. The purpose of this last section is to show how to apply the formulas obtained in Sections 3–4 and then to compare our results against a finite-time differential game that does not consider the perturbation in the solution of the problem (the commonly treated setting), even though the system itself is affected by some sort of perturbation.
2 Problem Statement
In this section we exploit the principle of dynamic programming in order to find the robust feedback Nash equilibrium strategies for each player of a non-zero-sum uncertain differential game. We begin by presenting the general sufficient conditions for such a robust equilibrium to exist. Towards that end, consider the following N-person uncertain differential game with initial pair $(s,y)\in [0,T]\times {\mathbb{R}^{n\times 1}}$, described by the following initial value problem
where $x(t)\in {\mathbb{R}^{n\times 1}}$ is the state column vector of the game; ${u_{i}}(t)\in {\mathbb{R}^{{l_{i}}\times 1}}$ is the control strategy at time t of player i, which may run over a given control region ${U_{i}}\subset {\mathbb{R}^{{l_{i}}\times 1}}$, where the index i denotes the player, $i\in \{1,\dots ,N\}$; ${u_{\hat{\imath }}}$ is the vector of strategies of the rest of the players, $\hat{\imath }$ being the counter-coalition of players counteracting the player with index i; and $w(t)\in {\mathbb{R}^{q\times 1}}$ is a finite unknown disturbance in the sense that ${\textstyle\int _{0}^{T}}\| w(t){\| ^{2}}\mathrm{d}t<+\infty $, that is, w is square integrable or, stated another way, $w\in {L^{2}}[0,T]$. Each player's individual performance criterion, which contains an integral term as well as a terminal-state term, is given in the standard Bolza form.
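As a concrete illustration of the disturbance class, the following snippet checks numerically that a sample signal has finite energy on $[0,T]$. The signal itself is a placeholder for illustration, not a disturbance used in the paper:

```python
import numpy as np

# Numerical check that a sample disturbance w belongs to L^2[0, T]:
# approximate the energy integral  int_0^T ||w(t)||^2 dt  on a time grid.
# The signal below is a placeholder, not one used in the paper.
T = 25.0
t = np.linspace(0.0, T, 2001)
dt = t[1] - t[0]
w = np.vstack([0.5 * np.sin(2 * np.pi * t / 4.0),   # component 1 of w(t)
               0.2 * np.exp(-t)]).T                 # component 2 of w(t)

energy = float(np.sum(np.sum(w ** 2, axis=1)) * dt)  # Riemann-sum approximation
assert np.isfinite(energy)  # finite energy: w is in L^2[0, T]
```

Any signal failing this finite-energy condition falls outside the admissible disturbance class assumed throughout the paper.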
Throughout the article we shall use the following notation:
• ${^{A}}B$ is the set of functions from the set A to the set B.
• ${A^{\mathrm{t}}}$ is the transpose of the matrix A.
• ${I_{N,i}}:=\{k\in \mathbb{N}:1\leqslant k\leqslant N$ and $k\ne i\}$.
• ${\mathcal{U}_{\mathrm{adm}}^{i}}[{s_{0}},{s_{1}}]:=\{{u_{i}}{\in ^{[{s_{0}},{s_{1}}]}}{U_{i}}:{u_{i}}$ is measurable}.
• ${\mathcal{U}_{\mathrm{adm}}^{i}}:={\mathcal{U}_{\mathrm{adm}}^{i}}[0,T]$ is the set of all admissible control strategies.
• ${\mathcal{U}_{\mathrm{adm}}^{\hat{\imath }}}:={\mathcal{U}_{\mathrm{adm}}^{1}}\times \cdots \times {\mathcal{U}_{\mathrm{adm}}^{i-1}}\times {\mathcal{U}_{\mathrm{adm}}^{i+1}}\times \cdots \times {\mathcal{U}_{\mathrm{adm}}^{N}}$.
• ${\mathcal{U}_{\mathrm{adm}}}:={\mathcal{U}_{\mathrm{adm}}^{1}}\times \cdots \times {\mathcal{U}_{\mathrm{adm}}^{N}}$.
• If $u\in {\mathcal{U}_{\mathrm{adm}}}$, for $t\in [0,T]$, $u(t):=({u_{1}}(t),{u_{2}}(t),\dots ,{u_{N}}(t))$.
• ${\mathrm{D}_{i}}f$ denotes the partial derivative of f with respect to the i-th component.
• ${\mathbb{1}_{A}}$ denotes the indicator function of a set A.
Hypothesis 1.
The control region ${U_{i}}$ is a subset of ${\mathbb{R}^{{l_{i}}\times 1}}$. The maps f, ${g_{i}}$ and ${h_{i}}$ are such that for all $({u_{i}},{u_{\hat{\imath }}},w)\in {\mathcal{U}_{\mathrm{adm}}^{i}}\times {\mathcal{U}_{\mathrm{adm}}^{\hat{\imath }}}\times {L^{2}}[0,T]$, equation (
1)
admits an a.e. unique solution and the function ${J_{i}}$ given in (
2)
is well defined; in general we assume the conditions given by Yong and Zhou (1999, p. 159).
Remark 1.
We assume that the integrand ${g_{i}}$ given in equation (2) is positive definite; hence the cost function ${J_{i}}$ cannot take negative values.
2.1 Robust Feedback Nash Equilibrium
Next, we introduce the worst case uncertainty from the point of view of the
i-th player according to the complete set of controls
${u_{j}}$, with
$j\in \{1,\dots ,N\}$ (Jank and Kun,
2002; Engwerda,
2017):
In this paper we extend the robust Nash equilibrium notion, previously introduced by Jank and Kun (2002) for an open-loop information structure, to a full-state-feedback information structure for an N-player game.
Definition 1.
The control strategies ${u_{1}^{\mathrm{rn}}},{u_{2}^{\mathrm{rn}}},\dots ,{u_{N}^{\mathrm{rn}}}$, with ${({u_{i}^{\mathrm{rn}}})_{i=1}^{N}}\in {\mathcal{U}_{\mathrm{adm}}}$, are said to form a robust feedback Nash equilibrium if, for any vector of admissible strategies
and assuming the existence of the corresponding maximizing uncertainty function ${w_{i,{u_{i}},{u_{\hat{\imath }}^{\mathrm{rn}}}}^{\ast }}\in {L^{2}}[0,T]$ from the point of view of the i-th player, the following set of inequalities holds:
Under these conditions, we also say that $({u_{1}^{\mathrm{rn}}},{u_{2}^{\mathrm{rn}}},\dots ,{u_{N}^{\mathrm{rn}}})$ is a vector of robust feedback Nash strategies for the whole set of players.
Hypothesis 2.
There is a unique vector of robust feedback Nash strategies for the whole set of players.
Now, in order to find the robust feedback Nash equilibrium control strategies for the problem given by (2) subject to (1), we consider the following definition.
Definition 2.
Consider the N-tuple of strategies $({u_{1}},{u_{2}},\dots ,{u_{N}})$ and define the robust value function for the i-th player as:
for any particular initial pair
$(s,y)\in [0,T)\times {\mathbb{R}^{n\times 1}}$. The function
${V_{i}}$ is also called the
robust Bellman function.
Remark 2.
Notice that the minimization over ${u_{i}}$ considers that the rest of the players remain fixed in their robust strategies (4) and that each ${w_{i,{u_{i}},{u_{\hat{\imath }}}}^{\ast }}$ satisfies (3).
2.2 Robust Dynamic Programming Equation
Let us explore the Bellman principle of optimality (Poznyak, 2008) for the robust value function ${V_{i}}$ associated with the min-max problem posed for the i-th player, considering the rest of the participants, as well as the signal function w, fixed.
For
${u_{i}}\in {\mathcal{U}_{\mathrm{adm}}^{i}}$, let us take
${v_{i}}={\mathbb{1}_{[s,\hat{s})}}{u_{i}}+{\mathbb{1}_{[\hat{s},T)}}{u_{i}^{\mathrm{rn}}}$ and note that
${v_{i}}\in {\mathcal{U}_{\mathrm{adm}}^{i}}$. Using the Bellman principle of optimality for the functional
${J_{i}}(s,y,{v_{i}},{u_{\hat{\imath }}^{\mathrm{rn}}},\cdot )$, where
${J_{i}}$ is given in equation (
2), and using also equation (
5) given in Definition
2 we have:
where the control strategies ${u_{\hat{\imath }}^{\mathrm{rn}}}$ are the robust Nash controls defined in (4) and $x(\hat{s})$ is such that x fulfills (1) when ${u_{j}}={u_{j}^{\mathrm{rn}}}$ for $j\ne i$ and $w={w_{i,{u_{i}},{u_{\hat{i}}^{\mathrm{rn}}}}^{\ast }}$, as described in Definition 1. Hence, taking the minimum on the right-hand side of (6) over ${u_{i}}$, the inequality yields
On the other hand, for any $\delta >0$ there is a control ${u_{i,\delta }}\in {\mathcal{U}_{\mathrm{adm}}^{i}}$ with the property:
where
${x_{\delta }}$ is the solution of (
1) under the application of the control
${u_{i,\delta }}$ keeping the rest of the players fixed. Indeed, if there is a
$\delta >0$ such that for any
${u_{i}}\in {\mathcal{U}_{\mathrm{adm}}^{i}}$ we have
then, taking
${u_{i}}={u_{i}^{\mathrm{rn}}}$ and using the Bellman principle of optimality, we would obtain
arriving at a contradiction. So, from inequality (8) we get
Now, since the value of δ in inequality (9) is positive but arbitrary, we have
(Fattorini,
1999; Poznyak,
2008).
From inequalities (7) and (10) we arrive at the following theorem, which is a robust form of the dynamic programming equation for the problem under consideration.
Theorem 1.
Let the basic assumptions of Section 2 hold. Then for any initial pair $(s,y)\in [0,T)\times {\mathbb{R}^{n\times 1}}$, the following relationship holds:
for all $\hat{s}\in [s,T]$.
Applying the principle of optimality to equation (11) leads immediately to the following result:
Theorem 2.
Consider the uncertain affine N-player differential game given by (1)–(2), where T is finite and the full state information is known. In this case the vector of control strategies $({u_{i}^{\mathrm{rn}}},{u_{\hat{\imath }}^{\mathrm{rn}}})$ provides a robust feedback equilibrium if there exists a continuously differentiable function ${V_{i}}:[0,T]\times {\mathbb{R}^{n\times 1}}\to \mathbb{R}$ satisfying the following partial differential equation:
where $\hat{{u_{i}}}(t)=({u_{1}^{\mathrm{rn}}}(t),\dots ,{u_{i-1}^{\mathrm{rn}}}(t),{u_{i}}(t),{u_{i+1}^{\mathrm{rn}}}(t),\dots ,{u_{N}^{\mathrm{rn}}}(t))$, and the corresponding min-max cost for each player is
Remark 3.
The partial differential equation (12) of Theorem 2 is called the robust Hamilton–Jacobi–Bellman (RHJB) equation. In previous important works dealing with the design of robust ${H_{\infty }}$ controllers using a dynamic game approach (Başar and Bernhard, 2008; Aliyu, 2011), the min-max version of the value function was already found; it was also shown that when all the players are fixed in their robust Nash controls, the game becomes a zero-sum game played between the i-th player and the uncertainty. Equation (12) is an extension to the N-player non-zero-sum game; however, to the best of our knowledge, this case has not been addressed before.
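For orientation, the min-max structure described above can be written schematically in the following generic form, using the notation of Section 2. This is an illustrative sketch only; the precise statement is the RHJB equation (12):

```latex
% Schematic min-max HJB equation for player i (illustrative form):
% the i-th player minimizes over u_i while the uncertainty maximizes over w,
% with the remaining players fixed at their robust Nash strategies.
\[
\begin{aligned}
-\mathrm{D}_{1}V_{i}(t,x) &= \min_{u_{i}\in U_{i}}\;\max_{w\in \mathbb{R}^{q\times 1}}
  \big\{\, g_{i}\big(t,x,\hat{u_{i}}(t),w\big)
  + \mathrm{D}_{2}V_{i}(t,x)^{\mathrm{t}}\, f\big(t,x,\hat{u_{i}}(t),w\big) \,\big\},\\
V_{i}(T,x) &= h_{i}(x),
\end{aligned}
\]
```

where, as in Theorem 2, $\hat{u_{i}}(t)$ denotes the robust Nash tuple with its i-th entry replaced by the free control ${u_{i}}(t)$, and f, ${g_{i}}$, ${h_{i}}$ are the maps of Hypothesis 1.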
3 Finite Time Horizon N Players Linear Affine Quadratic Differential Game
Once the general conditions for the existence of a robust feedback Nash equilibrium in an uncertain differential game are established, we turn to the special case of linear affine quadratic differential games (LAQDG). In this section we consider the case where the time horizon is finite, that is, $T<+\infty $. The game is played by N participants trying to minimize a certain loss inflicted by a disturbance, and the cost functional of the game is constrained by the corresponding differential equation. Therefore, in this section we assume that:
and that the cost functions of the players are given by the following quadratic functionals:
where j denotes the player index; $A(t)\in {\mathbb{R}^{n\times n}}$ and ${B_{j}}(t)\in {\mathbb{R}^{n\times {l_{j}}}}$, for $j\in \{1,\dots ,N\}$, are the known system and control matrices; $x(t)$ is the state vector of the game and ${u_{j}}$ is the control strategy of the j-th player; $c(t)\in {\mathbb{R}^{n\times 1}}$ is an exogenous and known signal. In this case w is the same as in (1), that is, a finite disturbance entering the system through the matrix $E(t)\in {\mathbb{R}^{n\times q}}$. The performance index of each i-th player is given again in standard Bolza form; the strategy of player i is ${u_{i}}$, while ${u_{\hat{\imath }}}$ collects the strategies of the rest of the players. The term $w{(t)^{\mathrm{t}}}{W_{i}}(t)w(t)$ penalizes the unknown uncertainty, which is trying to maximize the cost ${J_{i}}$ from the point of view of the i-th player. The cost matrices are assumed to satisfy ${Q_{i}}(t)={Q_{i}}{(t)^{\mathrm{t}}}\geqslant \mathbf{0}$, ${Q_{i\mathrm{f}}}={Q_{i\mathrm{f}}^{\mathrm{t}}}\geqslant \mathbf{0}$, ${W_{i}}(t)={W_{i}}{(t)^{\mathrm{t}}}>\mathbf{0}$, ${R_{i,i}}(t)={R_{i,i}}{(t)^{\mathrm{t}}}>\mathbf{0}$ and ${R_{i,j}}(t)={R_{i,j}}{(t)^{\mathrm{t}}}\geqslant \mathbf{0}$, where the matrix inequalities denote positive definiteness and positive semi-definiteness, respectively. Assume also that the players have access to the full state information pattern, that is, they measure $x(t)$ for all $t\in [0,T]$. All the involved square matrices are assumed to be non-singular.
For the linear affine dynamics given in (
14), equation (
12) can be rewritten as follows:
with terminal condition ${V_{i}}(T,x(T))=x{(T)^{\mathrm{t}}}{Q_{i\mathrm{f}}}x(T)$. Under this condition, and if the assumptions mentioned above are satisfied, the robust feedback Nash equilibrium can be obtained directly as
and the worst-case uncertainty from the point of view of the i-th player is obtained as
Remark 4.
Notice that the value of ${w_{i,{u_{i}},{u_{\hat{\imath }}}}^{\ast }}$ given in (18) does not depend on $({u_{i}},{u_{\hat{\imath }}})$. So, in this particular case, we shall denote this value simply by ${w_{i}^{\ast }}$.
Theorem 3.
The robust feedback Nash strategies for the uncertain LQ affine game (14)–(15) have the following linear form:
and the worst case uncertainty from the point of view of the i-th player is:
where the set of N coupled Riccati-type equations for the ${P_{i}}$ satisfies the following boundary value problem:
where ${S_{i}}:={B_{i}}{R_{i,i}^{-1}}{B_{i}^{\mathrm{t}}}$, ${S_{j,i}}:={B_{j}}{R_{j,j}^{-1}}{R_{i,j}}{R_{j,j}^{-1}}{B_{j}^{\mathrm{t}}}$, ${M_{i}}:=E{W_{i}^{-1}}{E^{\mathrm{t}}}$, for $i,j\in \{1,\dots ,N\}$;
and the ${m_{i}}$ are the “shifting vectors” governed by the following coupled linear differential equations:
and the value of the robust Nash cost is:
where ${J_{i}^{\ast }}$ is the optimum value of (
2)
.
The proof of this theorem is presented in Appendix
A.
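To illustrate how a boundary value problem of this Riccati type can be integrated numerically, the following Python sketch solves a single-player analogue of the robust Riccati equation backward in time. All matrices and the horizon below are placeholder values, not taken from the paper, and the full N-player system (21) additionally contains the cross-coupling terms between the ${P_{i}}$:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Single-player analogue of a robust (min-max) Riccati equation,
# integrated backward in time from the terminal condition P(T) = Qf:
#   -dP/dt = A' P + P A + Q - P (S - M) P,
# with S = B R^{-1} B' (control term) and M = E W^{-1} E' (worst-case term).
# All numerical values below are placeholders for illustration only.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.0], [0.3]])
Q = np.eye(2)
Qf = 0.1 * np.eye(2)
R = np.array([[1.0]])
W = np.array([[5.0]])
T = 5.0

S = B @ np.linalg.inv(R) @ B.T
M = E @ np.linalg.inv(W) @ E.T

def riccati_rhs(t, p):
    P = p.reshape(2, 2)
    dP = -(A.T @ P + P @ A + Q - P @ (S - M) @ P)
    return dP.ravel()

# solve_ivp accepts a decreasing time span, so integrate from T down to 0.
sol = solve_ivp(riccati_rhs, [T, 0.0], Qf.ravel(), rtol=1e-8, atol=1e-10)
P0 = sol.y[:, -1].reshape(2, 2)

# Linear part of the feedback at t = 0 (cf. u = -R^{-1} B'(P x + m)).
K0 = np.linalg.inv(R) @ B.T @ P0
```

Note the sign structure: the worst-case term M enters with the opposite sign of the control term S, which is the min-max signature; a sufficiently large weight W keeps $S-M$ positive semi-definite and the backward integration well behaved.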
4 Infinite Time Horizon Case
In this section we consider the same linear affine quadratic game when the time horizon is infinite. As in the case analysed in the previous section, the players try to minimize a certain loss inflicted by a disturbance, and the cost functional of the game is constrained by a differential equation containing an affine term. In this type of game the cost functional is given by:
and the constraint has the following form:
The involved matrices are constant, with corresponding dimensions, and the matrices in (24) satisfy restrictions equivalent to those of the finite-time counterpart. Following Engwerda (2005), we assume that $c\in {L_{\mathrm{exp},\hspace{0.1667em}\mathrm{loc}}^{2}}$, that is, c is locally square integrable and converges to zero exponentially. In this case, the system of algebraic Riccati equations takes the form:
where
$\tilde{A}:=A-{\textstyle\sum _{j=1}^{N}}{S_{j}}\hspace{0.1667em}{P_{j}}$.
To find the solution to this problem, the completion-of-squares method developed in Poznyak (2008) is applied, and the following theorem is stated.
Theorem 4.
For the differential game problem given by equations (24)–(25), if the algebraic Riccati equations (26) possess symmetric stabilizing solutions ${P_{i}}$, then the infinite-time-horizon robust Nash equilibrium strategies are given by
and the worst case will be given by
where each ${m_{i}}$ fulfills the equation
and
Moreover, the optimal value ${J_{i}^{\ast }}$ is given by
and the closed-loop state equation has the form
The proof of this theorem is found in Appendix
A.
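A common way to compute the stabilizing solutions required by Theorem 4 is a Newton (Kleinman-type) iteration on the algebraic Riccati equation. The sketch below treats the single-player analogue $A^{\mathrm{t}}P+PA+Q-P(S-M)P=0$, with $S=B{R^{-1}}{B^{\mathrm{t}}}$ and $M=E{W^{-1}}{E^{\mathrm{t}}}$; the numerical values are placeholders, and the full coupled system (26) would iterate over all the ${P_{i}}$ simultaneously:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Kleinman-type Newton iteration for the algebraic robust Riccati equation
#   A' P + P A + Q - P (S - M) P = 0,  S = B R^{-1} B',  M = E W^{-1} E'.
# Placeholder data (not taken from the paper):
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.0], [0.3]])
Q = np.eye(2)
R = np.array([[1.0]])
W = np.array([[5.0]])

S = B @ np.linalg.inv(R) @ B.T
M = E @ np.linalg.inv(W) @ E.T
D = S - M  # effective quadratic term after the worst-case maximization

# Initialize with the standard (disturbance-free) ARE solution, which
# stabilizes A - S P and, for a small worst-case term M, also A - D P.
P = solve_continuous_are(A, B, Q, R)
for _ in range(50):
    Ak = A - D @ P
    # Newton step: solve the Lyapunov equation Ak' X + X Ak = -(Q + P D P).
    P_next = solve_continuous_lyapunov(Ak.T, -(Q + P @ D @ P))
    if np.linalg.norm(P_next - P) < 1e-12:
        P = P_next
        break
    P = P_next

residual = A.T @ P + P @ A + Q - P @ D @ P
```

Starting from a stabilizing iterate, the Newton iteration converges quadratically, and the stability of the resulting closed-loop matrix $A-DP$ is precisely the stabilizing-solution property required by the theorem.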
5 Numerical Example: A Differential Game Model for a Vertical Marketing System with Demand Fluctuation and Seasonal Prices
Consider a noncooperative game in a two-echelon supply chain established between two chain agents (Dockner et al., 2000; Jørgensen, 1986): a single supplier (called the manufacturer) and a single distributor (called the retailer). The manufacturer is in charge of selling a product type to the single retailer over a period of time T at the price ${p_{1}}(t)$. The retailer is in charge of distributing and marketing that product at a price ${p_{2}}(t)={p_{1}}(t)+{r_{2}}(t)$, where ${r_{2}}(t)$ represents the profit margin gained by the retailer at time t per unit sold. In this case, let us set ${r_{2}}=0.2{p_{1}}$.
The dynamics of the game arise from both players searching for a Nash equilibrium in their coordination contract while facing some source of uncertainty. For this particular case, assume that the retailer deals with a demand that evolves exogenously over time, with the quantity sold per time unit, d, depending not only on the price ${p_{1}}$, but also on the time t elapsed, $d=d(p,t)$. The exogenous change in demand presented here is due to seasonal fluctuations. In such an environment, the profit equations of the players are ${J_{1}}$ and ${J_{2}}$, with the following quadratic structure:
subject to the following dynamics
where ${J_{1}}$ indicates the operating cost faced by the manufacturer, given by the holding cost and the production cost, plus a small penalization of the inventories at the final time of the horizon. On the other hand, ${J_{2}}$ indicates the operating cost incurred by the retailer, given by the holding cost, the production cost (including the price paid to the manufacturer for the products), the perturbation signal w, seen as a malicious fictitious player, and a small penalization of the inventories at the final time of the horizon. This game involves the dynamic changes of the inventory of each player, $({x_{1}},{x_{2}})$, with the production rates $({u_{1}},{u_{2}})$ as decision variables. Moreover, the retailer's dynamics face an uncertain demand represented by two terms: the deterministic demand d plus the uncertain factor ew.
where

Fig. 1
Price vs Perturbed demand.

Fig. 2
Riccati differential equation player 1 (manufacturer).

Fig. 3
Riccati differential equation player 2 (retailer).

Fig. 4
Comparison between manufacturer produced units (${u_{1}}$), units demanded by the retailer (${u_{2}}$), and units left in the manufacturer stock ${x_{1}}$.
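The inventory balances underlying the example can be sketched with a short forward simulation. The code below is an illustrative approximation only: it assumes the standard two-echelon balance $\dot{x}_{1}={u_{1}}-{u_{2}}$ and $\dot{x}_{2}={u_{2}}-d-ew$ suggested by the description of $({x_{1}},{x_{2}})$ and $({u_{1}},{u_{2}})$ above, and it uses simple placeholder ordering policies rather than the robust Nash strategies of Theorem 3:

```python
import numpy as np

# Illustrative forward simulation of two-echelon inventory balances.
# Assumed structure (a sketch, not the paper's exact matrices):
#   x1' = u1 - u2           (manufacturer stock: produced minus shipped)
#   x2' = u2 - d(t) - e*w   (retailer stock: received minus perturbed demand)
# The ordering policies below are simple placeholder rules, NOT the robust
# Nash strategies of Theorem 3; they only illustrate the state equations.
dt, T = 0.1, 25.0
t = np.arange(0.0, T, dt)
d = 10.0 + 3.0 * np.sin(2 * np.pi * t / 12.0)  # seasonal demand (assumed shape)
w = 0.5 * np.sin(2 * np.pi * t / 4.0)          # sample square-integrable disturbance
e = 1.0
x1_target = x2_target = 5.0

x1, x2 = 5.0, 5.0
x1_hist, x2_hist = [], []
for k in range(len(t)):
    u2 = d[k] + 0.5 * (x2_target - x2)  # retailer: order demand plus stock correction
    u1 = u2 + 0.5 * (x1_target - x1)    # manufacturer: track orders plus correction
    x1 += dt * (u1 - u2)                # Euler step of the assumed balances
    x2 += dt * (u2 - d[k] - e * w[k])
    x1_hist.append(x1)
    x2_hist.append(x2)
```

Even these naive stock-correction rules keep both inventories near their targets; the robust Nash strategies of Theorem 3 replace them with the optimal linear feedback obtained from the Riccati solutions.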
According to the game equations (1) and (2), $N=2$. We used Matlab to solve (21) numerically backward in time, thus obtaining the corresponding robust Nash equilibrium strategies for each player. The results of this numerical solution are shown in Figs. 2 and 3. Fig. 1 depicts the perturbed demand and the manufacturer's price. Fig. 4 shows the behaviour of the decision variables of each player and of the manufacturer's state equation (${u_{1}}$, the manufacturer's production rate; ${u_{2}}$, the retailer's purchasing rate; and the manufacturer's inventory). Through this figure we can compare the different outputs; for instance, we observe that the products left in the manufacturer's stock are basically close to zero. In fact, this figure shows the advantage of better coordination between the different chain agents in order to reduce the bullwhip effect. Since the manufacturer and the retailer share information about customer demand, the units produced by the manufacturer and the units purchased by the retailer exhibit similar behaviour.
Also, since there are no restrictions on the states of a given stage in the chain, we can see that, at times, these variables take negative values. For example, between $t=8$ and $t=16$, the units left in stock take a negative value; this only means that the manufacturer has backlogged units. However, we can appreciate that the amount of backlogged units is minimal. Also, towards the closing of the season, between $t=20$ and $t=25$, it is better for the manufacturer to have only backlogged units. Once a Nash equilibrium is reached, any deviation from the output policies would result in a loss for the manufacturer or the retailer.

Fig. 5
Comparison between retailer bought units (${u_{2}}$), units demanded by the final consumer (D), and units left in the retailer stock (${x_{2}}$).
On the other hand, Fig. 5 shows the behaviour of the retailer's dynamics over the time horizon. We can appreciate that the strategy followed by the retailer differs from the manufacturer's in that the retailer uses inventory to face demand uncertainties. The retailer considers the worst case of any perturbation on demand, but stock units are kept to a minimum. The decisions at the end of the planning horizon are perturbed by the finite-time-horizon condition; for that reason, the planning horizon was extended to two years in order to avoid such perturbations in the first year.