1 Introduction
MATLAB is a popular problem solving environment, widely used for general scientific computation in education, engineering and research. MATLAB is nowadays a standard tools in many areas. Thanks to its collection of direct (e.g. $\mathit{LU}$, ${\mathit{LDL}^{\top }}$, Cholesky) and iterative (e.g. conjugate gradient, GMRES, bi-conjugate gradient) solvers, it is tempting to use MATLAB for the numerical approximation of partial differential equations (PDEs). For the finite element method (FEM), the main obstacle for using MATLAB is the assembly of the global matrix and vector. Since MATLAB built-in solvers are optimized, the assembly operations can take up to 99% of whole CPU time, as shown in Koko (2007), using an implementation with standard loop over triangles (Alberty et al., 1999, 2002; Kwon and Bang, 2000) directly derived from compiled languages (FORTRAN and C/C++). Unfortunately, some vectorized FEM codes are less flexible and require a huge amount of memory due to the allocation of auxiliary arrays and the corresponding matrix operations (Koko, 2007; Rahman and Valdman, 2013, 2015). Recently, Koko (2016) proposed a MATLAB implementation close to the standard form by using cell-arrays to store the gradient of the basis functions, for the Poisson equation and linear elasticity in 2D and 3D.
In this paper, we propose a fast MATLAB implementation of the $P1$-Bubble/$P1$ finite element (Mini element, Arnold et al., 1984; Ern and Guermond, 2002; Glowinski, 2003) for the generalized Stokes problem in 2D and 3D. The mini element for spatial discretization of the Stokes problem is easy to use in engineering practice since it allows for the use of equal-order interpolation (the same mesh for velocity and pressure). Equal-order interpolation is very useful in large-scale multi-physics codes. Indeed, a code dealing with several independent variables (e.g. chemical species, velocity components, etc.) requires the transfer of information between its different components at each time-step. Fast implementation means that our code operates on array and does not use for-loops over elements (triangles of tetrahedrons) for the assembling operations. Instead, we use cell-arrays to store element matrices as in Koko (2016). We also propose a solution strategy for the final linear system. Indeed, we propose an efficient (preconditioned) Uzawa conjugate gradient method derived from the one used with $P2/P1$ (or $P1$-iso-$P2/P1$) finite element pair (Cahouet and Chabard, 1988; Glowinski, 2003). The proposed conjugate gradient method operates on the dual (pressure) space and, at each iteration, d independent linear systems are solved ($d=2,3$). Our implementation needs only MATLAB basic distribution functions and can be easily modified and refined.
The paper is organized as follows. The model problem is described in Section 2, followed by a finite element discretization in Section 3. The element matrices in 2D/3D are described in Section 4. In Section 5, we propose our Uzawa conjugate gradient method for solving the Stokes system. MATLAB implementation details are given in Section 6. Numerical experiments are carried out in Section 7. Readers can download and edit the codes from http://www.isima.fr/~jkoko/Codes/KSTOK.tar.gz.
2 The Model Problem
Let Ω be a bounded domain in ${\mathbb{R}^{d}}$ ($d=2,3$) with a Lipschitz-continuous boundary Γ. Consider in Ω the Stokes problem
where $\boldsymbol{u}=({u_{1}},\dots ,{u_{d}})\in {\mathbb{R}^{d}}$ is the velocity vector, p, the pressure, and $\boldsymbol{f}=({f_{1}},\dots ,{f_{d}})\in {\mathbb{R}^{d}}$, the field of external forces. In equation (2.1), $\alpha \geqslant 0$ is an arbitrary constant. If $\alpha =0$, then equations (2.1)–(2.3) turn to be the classic Stokes problem. If $\alpha >0$, then equations (2.1)–(2.3) turn to be a generalized Stokes problem encountered in time discretization of Navier-Stokes equations (see e.g. Glowinski, 2003). The constant $\nu >0$ is the kinematic viscosity.
We need the functional spaces $\boldsymbol{V}={H_{0}^{1}}{(\Omega )^{d}}$,
\[ {\boldsymbol{V}^{D}}=\big\{\boldsymbol{v}\in {H^{1}}{(\Omega )^{d}}\hspace{0.1667em}:\hspace{0.1667em}\boldsymbol{v}={\boldsymbol{u}^{D}}\hspace{2.5pt}\text{on}\hspace{2.5pt}\Gamma \big\},\hspace{1em}P=\bigg\{q\in {L^{2}}(\Omega )\hspace{0.1667em}:\hspace{0.1667em}{\int _{\Omega }}q\hspace{0.1667em}\mathrm{d}x=0\bigg\},\]
and bilinear forms
\[\begin{array}{l}\displaystyle {a_{i}}({u_{i}},{v_{i}})=\alpha {({u_{i}},{v_{i}})_{\Omega }}+\nu {(\nabla {u_{i}},\nabla {v_{i}})_{\Omega }},\hspace{1em}i=1,\dots ,d,\\ {} \displaystyle \boldsymbol{a}(\boldsymbol{u},\boldsymbol{v})={\sum \limits_{i=1}^{d}}{a_{i}}({u_{i}},{v_{i}}),\end{array}\]
where ${(\cdot \hspace{0.1667em},\cdot )_{\Omega }}$ stands for the standard ${L^{2}}(\Omega )$ scalar product. The variational formulation of the Stokes problem (2.1)–(2.3) is as follows:Find $(\boldsymbol{u},p)\in {\boldsymbol{V}^{D}}\times P$ such that:
3 Finite Element Discretization with $P1$-Bubble-$P1$
For the finite element discretization of (2.4)–(2.5), we have to chose a finite element pair for the velocity field and the pressure. This choice cannot be arbitrary but must satisfy the inf-sup condition (Babuska, 1971; Brezzi, 1974).
3.1 Mini Element
In this paper we study the discretization of the Stokes problem (2.1)–(2.3) by the finite element pair $P1$-bubble/$P1$ (the so-called mini-element), introduced by Arnold et al. (1984). This element leads to a relatively low number of degrees of freedom with a good approximate solution.
Let ${\mathcal{T}_{h}}$ be a triangulation of Ω and T a triangle of ${\mathcal{T}_{h}}$. We define the space associated with the bubble by
\[ {\mathbb{B}_{h}}=\big\{{\boldsymbol{v}_{h}}\in {\mathcal{C}^{0}}(\bar{\Omega });\hspace{2.5pt}\forall T\in {\mathcal{T}_{h}},\hspace{2.5pt}{v_{h|T}}=\boldsymbol{x}{b^{(T)}}\big\}.\]
We also defined the discrete function spaces
\[\begin{array}{l}\displaystyle {V_{ih}}=\big\{{\boldsymbol{v}_{h}}\in {\mathcal{C}^{0}}(\bar{\Omega });\hspace{0.2778em}{\boldsymbol{v}_{h|T}}\in {P^{1}},\hspace{2.5pt}\forall T\in {\mathcal{T}_{h}};\hspace{2.5pt}{\boldsymbol{v}_{h|\Gamma }}=0\big\},\hspace{1em}i=1,\dots ,d,\\ {} \displaystyle {P_{h}}=\bigg\{{q_{h}}\in {\mathcal{C}^{0}}(\bar{\Omega });\hspace{0.2778em}{q_{h|T}}\in {P^{1}},\hspace{2.5pt}\forall T\in {\mathcal{T}_{h}};\hspace{0.1667em}:\hspace{0.1667em}{\int _{\Omega }}{q_{h}}\hspace{0.1667em}\mathrm{d}x=0\bigg\},\end{array}\]
and we set ${X_{ih}}={V_{ih}}\oplus {\mathbb{B}_{h}}$ and ${\boldsymbol{X}_{h}}={X_{1h}}\times \cdots \times {X_{dh}}$. With the above preparations, the discrete variational problem reads as follows.Find $({\boldsymbol{u}_{h}},{p_{h}})\in {\boldsymbol{X}_{h}}\times {P_{h}}$ such that
For a given triangle T, the velocity field ${\boldsymbol{u}_{h}}$ and the pressure ${p_{h}}$ are approximated by linear combinations of the basis functions in the form
\[ {\boldsymbol{u}_{h}}(\boldsymbol{x})={\sum \limits_{i=1}^{d+1}}{\phi _{i}}(\boldsymbol{x}){\mathbf{u}_{i}}+{\mathbf{u}_{b}}{\phi _{b}}(\boldsymbol{x}),\hspace{2em}{p_{h}}(\boldsymbol{x})={\sum \limits_{i=1}^{d+1}}{\phi _{i}}(\boldsymbol{x}){\mathrm{p}_{i}},\]
where ${\mathbf{u}_{i}}$ and ${\mathrm{p}_{i}}$ are nodal values of ${\boldsymbol{u}_{h}}$ and ${p_{h}}$ while ${\mathbf{u}_{b}}$ is the bubble value. The basis functions are defined by
\[ {\phi _{1}}(\boldsymbol{x})=1-x-y,\hspace{2em}{\phi _{2}}(\boldsymbol{x})=x,\hspace{2em}{\phi _{3}}(\boldsymbol{x})=y,\hspace{2em}{\phi _{b}}(\boldsymbol{x})=27{\prod \limits_{i=1}^{3}}{\phi _{i}}(\boldsymbol{x})\]
in 2D, and
\[\begin{array}{l}\displaystyle {\phi _{1}}(\boldsymbol{x})=1-x-y-z,\hspace{2em}{\phi _{2}}(\boldsymbol{x})=x,\hspace{2em}{\phi _{3}}(\boldsymbol{x})=y,\\ {} \displaystyle {\phi _{4}}(\boldsymbol{x})=z,\hspace{2em}{\phi _{b}}(\boldsymbol{x})=256{\prod \limits_{i=1}^{4}}{\phi _{i}}(\boldsymbol{x})\end{array}\]
in 3D.To construct the coefficient matrices, a number of integrals involving powers of the basis functions will be computed. Integrals over a triangle T can be evaluated directly by the following formulas
where $|T|$ stands for the triangle area (in 2D), or the tetrahedron volume (in 3D). A useful property for the basis functions is
(3.3)
\[ {\int _{T}}{\phi _{1}^{{\alpha _{1}}}}{\phi _{2}^{{\alpha _{2}}}}{\phi _{3}^{{\alpha _{3}}}}\mathrm{d}\boldsymbol{x}=2|T|\frac{{\alpha _{1}}!\hspace{0.1667em}{\alpha _{2}}!\hspace{0.1667em}{\alpha _{3}}!}{({\alpha _{1}}+{\alpha _{2}}+{\alpha _{3}}+2)!},\](3.4)
\[ {\int _{T}}{\phi _{1}^{{\alpha _{1}}}}{\phi _{2}^{{\alpha _{2}}}}{\phi _{3}^{{\alpha _{3}}}}{\phi _{4}^{{\alpha _{4}}}}\mathrm{d}\boldsymbol{x}=6|T|\frac{{\alpha _{1}}!\hspace{0.1667em}{\alpha _{2}}!\hspace{0.1667em}{\alpha _{3}}!\hspace{0.1667em}{\alpha _{4}}\hspace{-0.1667em}}{({\alpha _{1}}+{\alpha _{2}}+{\alpha _{3}}+{\alpha _{4}}+3)!},\]3.2 Algebraic Formulation
We use the following notations for the discrete velocity/pressure nodal values
Systems (3.1)–(3.2) lead to the following algebraic form, using notations (3.6)
in 2D, or
in 3D. In (3.7) and (3.8) $\bar{A}=\bar{M}+\bar{R}$, with $\bar{M}$ as the mass matrix and $\bar{R}$ as the stiffness matrix. ${\bar{B}_{i}}$ is the divergence submatrix associated with the i-th partial derivative, i.e.
To create the algebraic system (3.7) or (3.8), the discrete system (3.1)–(3.2) is evaluated over each triangle T to obtain the element matrices and vectors
(3.6)
\[ {\bar{u}_{i}}=\left[\substack{{u_{i}}\\ {} {u_{ib}}}\right],\hspace{2em}{\bar{f}_{i}}=\left[\substack{{f_{i}}\\ {} {f_{ib}}}\right],\hspace{1em}i=1,\dots ,d.\](3.7)
\[ \left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c}\bar{A}\hspace{1em}& 0\hspace{1em}& -{\bar{B}_{1}^{\top }}\\ {} 0\hspace{1em}& \bar{A}\hspace{1em}& -{\bar{B}_{2}^{\top }}\\ {} -{\bar{B}_{1}}\hspace{1em}& -{\bar{B}_{2}}\hspace{1em}& 0\end{array}\right]\left[\substack{{\bar{u}_{1}}\\ {} {\bar{u}_{2}}\\ {} \mathrm{p}}\right]=\left[\substack{{\bar{f}_{1}}\\ {} {\bar{f}_{2}}\\ {} 0}\right]\](3.8)
\[ \left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c}\bar{A}\hspace{1em}& 0\hspace{1em}& 0\hspace{1em}& -{\bar{B}_{1}^{\top }}\\ {} 0\hspace{1em}& \bar{A}\hspace{1em}& 0\hspace{1em}& -{\bar{B}_{2}^{\top }}\\ {} 0\hspace{1em}& 0\hspace{1em}& \bar{A}\hspace{1em}& -{\bar{B}_{3}^{\top }}\\ {} -{\bar{B}_{1}}\hspace{1em}& -{\bar{B}_{2}}\hspace{1em}& -{\bar{B}_{3}}\hspace{1em}& 0\end{array}\right]\left[\substack{{\bar{u}_{1}}\\ {} {\bar{u}_{2}}\\ {} {\bar{u}_{3}}\\ {} p}\right]=\left[\substack{{\bar{f}_{1}}\\ {} {\bar{f}_{2}}\\ {} {\bar{f}_{3}}\\ {} 0}\right]\]
\[\begin{array}{l}\displaystyle {\bar{M}_{ij}^{(T)}}={\int _{T}}\alpha {\phi _{i}}{\phi _{j}}\mathrm{d}x,\hspace{2em}{\bar{R}_{ij}^{(T)}}={\int _{T}}\nu \nabla {\phi _{i}}\cdot \nabla {\phi _{j}}\mathrm{d}x,\\ {} \displaystyle {\bar{B}_{ij}^{(T)}}={\int _{T}}{\partial _{1}}{\phi _{i}}{\phi _{j}}\mathrm{d}x+{\int _{T}}{\partial _{2}}{\phi _{i}}{\phi _{j}}\mathrm{d}x,\hspace{2em}{\bar{f}_{i}^{(T)}}={\int _{T}}f{\phi _{i}}\mathrm{d}x.\end{array}\]
Then assembling operations consist of direct-summing the element matrices over the triangulation ${\mathcal{T}_{h}}$ to obtain the global matrices $\bar{M}=({\bar{M}_{ij}})$, $\bar{R}=({\bar{R}_{ij}})$, $\bar{B}=({\bar{B}_{ij}})$ and $\bar{f}=({\bar{f}_{i}})$
\[\begin{array}{l}\displaystyle {\bar{M}_{ij}}=\sum \limits_{T\in {\mathcal{T}_{h}}}{\bar{M}_{ij}^{(T)}},\hspace{2em}{\bar{R}_{ij}}=\sum \limits_{T\in {\mathcal{T}_{h}}}{\bar{R}_{ij}^{(T)}},\\ {} \displaystyle {\bar{B}_{ij}}=\sum \limits_{T\in {\mathcal{T}_{h}}}{\bar{B}_{ij}^{(T)}},\hspace{2em}{\bar{f}_{i}}=\sum \limits_{T\in {\mathcal{T}_{h}}}{\bar{f}_{i}^{(T)}}.\end{array}\]
In the next sections we detail the element matrices $\bar{M}$, $\bar{R}$, ${\bar{B}_{i}}$ and the right-hand side $\bar{f}$.4 Element Matrices
For $P1$ finite element, we need only the element area and the gradient of the basis functions for the element matrices and vectors. To derive vectorized MATLAB codes, we need analytical expressions for all element matrices and vectors. This section is devoted to this task. For the sake of the presentation we drop the $(T)$-superscript introduced in the previous section to distinguish element matrices from global ones.
4.1 Two-Dimensional Case
For a triangle T, let ${\{({x_{i}},{y_{i}})\}_{i=1,2,3}}$ be the vertices and ${\{\phi \}_{i=1,2,3}}$ the corresponding basis functions. The gradient of ${\phi _{i}}$ is given by
where $|T|$ is the area of T given by
(4.1)
\[ \left[\substack{\nabla {\phi _{1}^{t}}\\ {} \nabla {\phi _{2}^{t}}\\ {} \nabla {\phi _{3}^{t}}}\right]={\left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c}1\hspace{1em}& 1\hspace{1em}& 1\\ {} {x_{1}}\hspace{1em}& {x_{2}}\hspace{1em}& {x_{3}}\\ {} {y_{1}}\hspace{1em}& {y_{2}}\hspace{1em}& {y_{3}}\end{array}\right]^{-1}}\left[\begin{array}{c@{\hskip4.0pt}c}0\hspace{1em}& 0\\ {} 1\hspace{1em}& 0\\ {} 0\hspace{1em}& 1\end{array}\right]=\frac{1}{2|T|}\left[\begin{array}{c@{\hskip4.0pt}c}{y_{2}}-{y_{3}}\hspace{1em}& {x_{3}}-{x_{2}}\\ {} {y_{3}}-{y_{1}}\hspace{1em}& {x_{1}}-{x_{3}}\\ {} {y_{1}}-{y_{2}}\hspace{1em}& {x_{2}}-{x_{1}}\end{array}\right],\]A nonbubble entry of the element stiffness matrix $\bar{R}$ is given by
For the bubble entries ${\bar{R}_{bj}}$, for $j=1,2,3$. A straightforward calculation yields to (using (3.5))
For the diagonal entry corresponding to the bubble (i.e. $i=j=b$) we have
using (3.5).
(4.2)
\[ {\bar{R}_{ij}}={\int _{T}}\nu \nabla {\phi _{i}}\cdot \nabla {\phi _{j}}\mathrm{d}x=|T|\nu \nabla {\phi _{i}}\cdot \nabla {\phi _{j}},\hspace{1em}i,j=1,2,3.\](4.3)
\[\begin{aligned}{}{\bar{R}_{bb}}=& \nu {\int _{T}}{27^{2}}\nabla ({\phi _{1}}{\phi _{2}}{\phi _{3}})\cdot \nabla ({\phi _{1}}{\phi _{2}}{\phi _{3}})\mathrm{d}x\\ {} =& \frac{81}{10}\nu |T|\left(|\nabla {\phi _{1}}{|^{2}}+|\nabla {\phi _{2}}{|^{2}}+|\nabla {\phi _{3}}{|^{2}}+\nabla {\phi _{1}}\nabla {\phi _{2}}+\nabla {\phi _{1}}\nabla {\phi _{3}}+\nabla {\phi _{2}}\nabla {\phi _{3}}\right)\\ {} =& \frac{81}{10}\nu |T|\left(|\nabla {\phi _{1}}{|^{2}}+|\nabla {\phi _{2}}{|^{2}}+\nabla {\phi _{1}}\cdot \nabla {\phi _{2}}\right)=:{\omega _{R}},\end{aligned}\]With the above results, the element stiffness matrix is therefore
As for the stiffness matrix, we set $M={({\bar{M}_{ij}})_{i,j=1,\dots ,3}}$ as the nonbubble part of the mass matrix. A direct calculation yields
The bubble part of the mass matrix is given by
The element mass matrix is therefore
where
(4.4)
\[ {M_{ij}}=\left\{\begin{array}{l@{\hskip4.0pt}l}\frac{\alpha }{6}|T|\hspace{1em}& \text{if}\hspace{2.5pt}i=j,\\ {} \frac{\alpha }{12}|T|\hspace{1em}& \text{elsewhere}.\end{array}\right.\](4.5)
\[\begin{aligned}{}{\bar{M}_{bj}}=& \frac{3\alpha }{20}|T|,\hspace{1em}j=1,2,3,\\ {} {\bar{M}_{bb}}=& \frac{81\alpha }{280}|T|=:{\omega _{M}}.\end{aligned}\]Finally, the element stiffness/mass matrix $\bar{A}$ is
\[ \bar{A}=\left(\begin{array}{c@{\hskip4.0pt}c}A\hspace{1em}& z\\ {} {z^{\top }}\hspace{1em}& \omega \end{array}\right),\]
where we have set $A=R+M$ and $\omega ={\omega _{R}}+{\omega _{M}}$.A direct integration yields the element divergence matrix $-\bar{B}=[-{\bar{B}_{1}}\hspace{2.5pt}-{\bar{B}_{2}}]$, where ${\bar{B}_{i}}=[{B_{i}}\hspace{2.5pt}-{B_{ib}}]$ with
and
(4.6)
\[ {B_{i}}=\frac{|T|}{3}\left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c}{\partial _{i}}{\phi _{1}}\hspace{1em}& {\partial _{i}}{\phi _{2}}\hspace{1em}& {\partial _{i}}{\phi _{3}}\\ {} {\partial _{i}}{\phi _{1}}\hspace{1em}& {\partial _{i}}{\phi _{2}}\hspace{1em}& {\partial _{i}}{\phi _{3}}\\ {} {\partial _{i}}{\phi _{1}}\hspace{1em}& {\partial _{i}}{\phi _{2}}\hspace{1em}& {\partial _{i}}{\phi _{3}}\end{array}\right],\hspace{1em}i=1,2,\]The contribution of the right-hand side component ${f_{i}}$, in nonbubble terms, is given by
where ${f_{iT}}$ is a mean value of ${f_{i}}$ on T. The bubble terms of the right-hand side are
With the above element matrices and vectors the $11\times 11$ element system corresponding to (3.7) is
We can now eliminate the bubble unknowns ${u_{1b}}$ and ${u_{2b}}$ since they correspond to diagonal blocks (the ω blocks). From (4.10)${_{3}}$ and (4.10)${_{4}}$, we deduce that
Substituting (4.11) into (4.10)${_{1}}$, (4.10)${_{2}}$ and (4.10)${_{5}}$ we obtain a linear system in ${({u_{1}}\hspace{2.5pt}{u_{2}}\hspace{2.5pt}p)^{\top }}$ whose matrix is
and the right-hand side
\[ \left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c}A\hspace{1em}& z\hspace{1em}& 0\hspace{1em}& 0\hspace{1em}& -{B_{1}^{\top }}\\ {} {z^{\top }}\hspace{1em}& \omega \hspace{1em}& 0\hspace{1em}& 0\hspace{1em}& {B_{1b}^{\top }}\\ {} 0\hspace{1em}& 0\hspace{1em}& A\hspace{1em}& z\hspace{1em}& -{B_{2}^{\top }}\\ {} 0\hspace{1em}& 0\hspace{1em}& {z^{\top }}\hspace{1em}& \omega \hspace{1em}& {B_{2b}^{\top }}\\ {} -{B_{1}}\hspace{1em}& {B_{1b}}\hspace{1em}& -{B_{2}}\hspace{1em}& {B_{2b}}\hspace{1em}& 0\end{array}\right]\left[\substack{{u_{1}}\\ {} {u_{1b}}\\ {} {u_{2}}\\ {} {u_{2b}}\\ {} p}\right]=\left[\substack{{f_{1}}\\ {} {f_{1b}}\\ {} {f_{2}}\\ {} {f_{2b}}\\ {} 0}\right].\]
To reveal diagonal blocks, the system above can be rearranged as follows
(4.10)
\[ \left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c}A\hspace{1em}& 0\hspace{1em}& z\hspace{1em}& 0\hspace{1em}& -{B_{1}^{\top }}\\ {} 0\hspace{1em}& A\hspace{1em}& 0\hspace{1em}& z\hspace{1em}& -{B_{2}^{\top }}\\ {} {z^{\top }}\hspace{1em}& 0\hspace{1em}& \omega \hspace{1em}& 0\hspace{1em}& {B_{1b}^{\top }}\\ {} 0\hspace{1em}& {z^{\top }}\hspace{1em}& 0\hspace{1em}& \omega \hspace{1em}& {B_{2b}^{\top }}\\ {} -{B_{1}}\hspace{1em}& -{B_{2}}\hspace{1em}& {B_{1b}}\hspace{1em}& {B_{2b}}\hspace{1em}& 0\end{array}\right]\left[\substack{{u_{1}}\\ {} {u_{2}}\\ {} {u_{1b}}\\ {} {u_{2b}}\\ {} p}\right]=\left[\substack{{f_{1}}\\ {} {f_{2}}\\ {} {f_{1b}}\\ {} {f_{2b}}\\ {} 0}\right].\](4.12)
\[ \left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c}A-{\omega ^{-1}}z{z^{\top }}\hspace{1em}& 0\hspace{1em}& -{B_{1}^{\top }}-{\omega ^{-1}}z{B_{1b}^{\top }}\\ {} 0\hspace{1em}& A-{\omega ^{-1}}z{z^{\top }}\hspace{1em}& -{B_{2}^{\top }}-{\omega ^{-1}}z{B_{2b}^{\top }}\\ {} -{B_{1}}-{\omega ^{-1}}{B_{1b}}{z^{\top }}\hspace{1em}& -{B_{2}}-{\omega ^{-1}}{B_{2b}}{z^{\top }}\hspace{1em}& -{\omega ^{-1}}({B_{1b}}{B_{1b}^{\top }}+{B_{2b}}{B_{2b}^{\top }})\end{array}\right]\]4.2 Three-Dimensional Case
Let ${\{{\boldsymbol{x}_{i}}=({x_{i}},{y_{i}},{z_{i}})\}_{i=1,\dots ,4}}$ be the vertices of a tetrahedron T and ${\{{\phi _{i}}\}_{i=1,\dots ,4}}$ the corresponding basis functions. The gradient ${\nabla _{\boldsymbol{x}}}{\phi _{i}}$ over T are given by
The volume of tetrahedron T is given, from (4.14), by $|T|=\mathrm{det}(J)/6$.
\[ \left[\substack{\nabla {\phi _{1}^{\top }}\\ {} \nabla {\phi _{2}^{\top }}\\ {} \nabla {\phi _{3}^{\top }}\\ {} \nabla {\phi _{4}^{\top }}}\right]={\left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c}1\hspace{1em}& 1\hspace{1em}& 1\hspace{1em}& 1\\ {} {x_{1}}\hspace{1em}& {x_{2}}\hspace{1em}& {x_{3}}\hspace{1em}& {x_{4}}\\ {} {y_{1}}\hspace{1em}& {y_{2}}\hspace{1em}& {y_{3}}\hspace{1em}& {y_{4}}\\ {} {z_{1}}\hspace{1em}& {z_{2}}\hspace{1em}& {z_{3}}\hspace{1em}& {z_{4}}\end{array}\right]^{-1}}\left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c}0\hspace{1em}& 0\hspace{1em}& 0\\ {} 1\hspace{1em}& 0\hspace{1em}& 0\\ {} 0\hspace{1em}& 1\hspace{1em}& 0\\ {} 0\hspace{1em}& 0\hspace{1em}& 1\end{array}\right].\]
An alternative formula for computing $\nabla {\phi _{i}}$ is
\[ {\nabla _{\boldsymbol{x}}}{\phi _{i}}={J^{-1}}{\nabla _{\boldsymbol{\xi }}}{\phi _{i}}(\boldsymbol{\xi }),\]
where J is the Jacobean matrix of the mapping $\boldsymbol{\xi }\mapsto \boldsymbol{x}(\boldsymbol{\xi })$, that is,
(4.14)
\[ J=\left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c}{x_{2}}-{x_{1}}\hspace{1em}& {y_{2}}-{y_{1}}\hspace{1em}& {z_{2}}-{z_{1}}\\ {} {x_{3}}-{x_{1}}\hspace{1em}& {y_{3}}-{y_{1}}\hspace{1em}& {z_{3}}-{z_{1}}\\ {} {x_{4}}-{x_{1}}\hspace{1em}& {y_{4}}-{y_{1}}\hspace{1em}& {z_{4}}-{z_{1}}\end{array}\right].\]As for 2D case, the nonbubble entries of the element stiffness matrix are given by
while ${\bar{R}_{bj}}=0$, for all $j=1,\dots ,4$. For the diagonal entry, using (3.4)–(3.5), we obtain
(4.15)
\[ {\bar{R}_{bj}}=\frac{9}{4}|T|{\sum \limits_{i=1}^{3}}\nabla {\phi _{i}}=0,\hspace{1em}j=1,2,3,4,\](4.16)
\[\begin{aligned}{}{\bar{R}_{bb}}=& \nu {\int _{T}}{256^{2}}\nabla ({\phi _{1}}{\phi _{2}}{\phi _{3}}{\phi _{4}})\cdot \nabla ({\phi _{1}}{\phi _{2}}{\phi _{3}}{\phi _{4}})\mathrm{d}x\\ {} =& \frac{8192}{845}\nu |T|\left[{\sum \limits_{\ell =1}^{3}}|\nabla {\phi _{\ell }}{|^{2}}+\nabla {\phi _{1}}\cdot \nabla {\phi _{2}}+\nabla {\phi _{1}}\cdot \nabla {\phi _{3}}+\nabla {\phi _{2}}\cdot \nabla {\phi _{3}}\right]\\ {} =:& {\omega _{R}}.\end{aligned}\]For the 3D mass matrix, a straightforward calculation with a linear tetrahedron shows that (for nonbubble entries)
where ${\alpha _{T}}$ is a mean value of α on T. The bubble part of the mass matrix is given by
The element mass matrix is therefore
where
(4.17)
\[ {\bar{M}_{ij}}=\left\{\begin{array}{l@{\hskip4.0pt}l}{\alpha _{T}}\frac{|T|}{10}\hspace{1em}& \text{if}\hspace{2.5pt}i=j,\\ {} {\alpha _{T}}\frac{|T|}{20}\hspace{1em}& \text{if}\hspace{2.5pt}i\ne j,\end{array}\right.\](4.18)
\[\begin{aligned}{}{\bar{M}_{bj}}=& \frac{8\alpha }{105}|T|,\hspace{1em}j=1,2,3,\\ {} {\bar{M}_{bb}}=& \frac{8192\alpha }{51975}|T|=:{\omega _{M}}.\end{aligned}\]If we set $-\bar{B}=[-{\bar{B}_{1}}\hspace{2.5pt}-{\bar{B}_{2}}\hspace{2.5pt}-{\bar{B}_{3}}]$ and ${\bar{B}_{i}}=[{B_{i}}\hspace{2.5pt}-{B_{ib}}]$, a straightforward calculation yields to
and
(4.19)
\[ {B_{i}}=\frac{|T|}{4}\left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c}{\partial _{i}}{\phi _{1}}\hspace{1em}& {\partial _{i}}{\phi _{2}}\hspace{1em}& {\partial _{i}}{\phi _{3}}\hspace{1em}& {\partial _{i}}{\phi _{4}}\\ {} {\partial _{i}}{\phi _{1}}\hspace{1em}& {\partial _{i}}{\phi _{2}}\hspace{1em}& {\partial _{i}}{\phi _{3}}\hspace{1em}& {\partial _{i}}{\phi _{4}}\\ {} {\partial _{i}}{\phi _{1}}\hspace{1em}& {\partial _{i}}{\phi _{2}}\hspace{1em}& {\partial _{i}}{\phi _{3}}\hspace{1em}& {\partial _{i}}{\phi _{4}}\\ {} {\partial _{i}}{\phi _{1}}\hspace{1em}& {\partial _{i}}{\phi _{2}}\hspace{1em}& {\partial _{i}}{\phi _{3}}\hspace{1em}& {\partial _{i}}{\phi _{4}}\end{array}\right],\hspace{1em}i=1,\dots ,4,\]The contribution of the right-hand side component ${f_{i}}$, in nonbubble terms, is given by
where ${f_{iT}}$ is a mean value of ${f_{i}}$ on T. The bubble terms of the right-hand side are
As in 2D, we rearrange the Stokes system and we get
Substituting (4.23) into the rest of the system, we obtain a linear system whose matrix is
and the right-hand side
(4.23)
\[ {u_{ib}}=\big({f_{ib}}-{B_{ib}^{\top }}p-{z^{\top }}{u_{i}}\big)\big/\omega ,\hspace{1em}i=1,\dots ,4.\](4.24)
\[ \left[\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c}A-{\omega ^{-1}}z{z^{\top }}\hspace{1em}& 0\hspace{1em}& 0\hspace{1em}& -{B_{1}^{\top }}-{\omega ^{-1}}z{B_{1b}^{\top }}\\ {} 0\hspace{1em}& A-{\omega ^{-1}}z{z^{\top }}\hspace{1em}& 0\hspace{1em}& -{B_{2}^{\top }}-{\omega ^{-1}}z{B_{2b}^{\top }}\\ {} 0\hspace{1em}& 0\hspace{1em}& A-{\omega ^{-1}}z{z^{\top }}\hspace{1em}& -{B_{3}^{\top }}-{\omega ^{-1}}z{B_{3b}^{\top }}\\ {} -{B_{1}}-{\omega ^{-1}}{B_{1b}}{z^{\top }}\hspace{1em}& -{B_{2}}-{\omega ^{-1}}{B_{2b}}{z^{\top }}\hspace{1em}& -{B_{3}}-{\omega ^{-1}}{B_{3b}}{z^{\top }}\hspace{1em}& -{\omega ^{-1}}{\textstyle\textstyle\sum _{i=1}^{3}}{B_{ib}}{B_{ib}^{\top }}\end{array}\right]\]5 Uzawa Conjugate Gradient Algorithm
We propose in this section a preconditioned conjugate gradient method for solving the Stokes system after the assembly of the element systems (4.12)–(4.13) and (4.24)–(4.25). For a finite element pair of the form ${P_{k+1}}/{P_{k}}$ (e.g. $P2/P1$ or $P1$-iso-$P2$/$P1$), the preconditioner advocated by Cahouet and Chabard (1988) (see also Fortin and Glowinski, 1983; Glowinski, 2003; Glowinski and Le Tallec, 1989) is efficient and widely used. In the author’s knowledge, there is no equivalent preconditioner for the pair $P1$-Bubble/$P1$ (or $P1/P1$ with stabilization). The algorithm presented in his section is therefore an original contribution.
The Stokes system can be rewritten in a compact form
where A is (symmetric) positive definite block diagonal matrix, C is positive semi-definite matrix.
Let us introduce the generalized Lagrangian function
Due to the properties of A and C, the saddle-point for (5.2) exists. Then it follows that (5.1) is the saddle-point equation for (5.2), that is, (5.1) characterizes the solution of the saddle-point problem
(5.2)
\[ \mathcal{L}(\mathbf{u},\mathbf{p})=\frac{1}{2}{\mathbf{u}^{\top }}\mathbf{A}\mathbf{u}-{\mathbf{f}^{\top }}\mathbf{u}-{\mathrm{p}^{\top }}\mathbf{B}\mathbf{u}-\frac{1}{2}{\mathrm{p}^{\top }}\mathrm{C}\mathrm{p}-{\mathrm{f}_{p}^{\top }}\mathrm{p}.\]5.1 Uzawa Conjuage Gradient Algoritm
To derive a dual (Uzawa) algorithm for (5.1), we assume that $\mathbf{u}=\mathbf{u}(\mathrm{p})$ is the solution of Poisson equation
that is, in the decomposed useful form
If we introduce the dual functional
\[ {\mathrm{A}_{i}}{\mathrm{u}_{i}}={\mathrm{f}_{i}}+{\mathrm{B}_{i}^{\top }}\mathrm{p},\hspace{1em}i=1,\dots ,d.\]
Then multiplying (5.4) by u and substituting the result in (5.2), we obtain
(5.5)
\[ \mathcal{L}(\mathbf{u}(\mathrm{p}),\mathrm{p})=-\frac{1}{2}{\mathbf{u}^{\top }}\mathbf{A}\mathbf{u}-\frac{1}{2}{\mathrm{p}^{\top }}\mathrm{C}\mathrm{p}-{\mathrm{f}_{p}^{\top }}\mathrm{p}.\]
\[ {J^{\ast }}(\mathrm{p})=-\underset{\mathbf{u}}{\min }\mathcal{L}(\mathbf{u}(\mathrm{p}),\mathrm{p})\]
with $\mathbf{u}(\mathrm{p})$ solution of (5.4), the saddle-point problem (5.3) becomes.Find p such that
From (5.5), ${J^{\ast }}$ is quadratic and coercive. From (5.4), we deduce that the mapping $\mathrm{p}\mapsto \mathbf{u}(\mathrm{p})$ is linear and $\mathbf{u}(\mathrm{p}+t\mathrm{d})=\mathbf{u}(\mathrm{p})+t\mathbf{w}$ where w is the solution of the sensitivity problem
or
It follows that the derivative of ${J^{\ast }}$ is
With a search direction d, we compute an optimal stepsize ρ by solving
that is,
(5.8)
\[ \mathrm{r}:=\nabla {J^{\ast }}(\mathrm{p})=\mathbf{B}\mathbf{u}+\mathrm{C}\mathrm{p}+{\mathrm{f}_{p}}.\]
\[ \rho =-{\mathrm{d}^{\top }}(\mathbf{B}\mathbf{w}+\mathrm{C}\mathrm{d})/({\mathrm{r}^{\top }}\mathrm{d}),\]
where w is the solution of the sensitivity equation (5.7).Since ${J^{\ast }}$ is a quadratic function, the best search direction is a conjugate gradient direction. As a consequence, the best algorithm for (5.6) is a conjugate gradient algorithm. At each iteration k, the Fletcher–Reeves conjugate gradient direction is given by
A dual (Uzawa) conjugate gradient algorithm for solving the saddle-point problem (5.1) is outlined in Algorithm 1. Theoretically, Algorithm 1 converges in at most ${n_{B}}=\mathrm{rank}(\mathbf{B})$ iterations. Obviously, for large scale problems, ${n_{B}}$ is very large and it is preferable that convergence should be obtained in a number of iterations considerably less than ${n_{B}}$.
Algorithm 1
Uzawa conjugate gradient algorithm for (5.1).
5.2 Preconditioned Uzawa Conjugate Gradient Algorithm
A practical implementation of a conjugate gradient method for solving (5.1) requires a preconditioner, that is, a suitable scalar product for computing $\nabla {J^{\ast }}(\mathrm{p})$ instead of the standard one used in (5.8). The convergence properties of the conjugate gradient method for the generalized Stokes problem are deteriorated for large values of the ratio $\alpha /\nu $, see e.g. (Glowinski, 2003; Glowinski and Le Tallec, 1989).
To derive a preconditioner for (5.1) following the idea of Cahouet and Chabard (1988), Glowinski and Guidoboni (2009), we need to simplify the continuous problem. We first notice that the equivalence between $P1$-Bubble/$P1$ element and the stabilized formulation has been proved (Pierre, 1987, 1989; 1995; Matsumoto, 2005; Baiocchi et al., 1993). Then, if we neglect the bubble contribution in the stiffness and divergence matrices, (5.1) can be expressed in the strong form as
where ${\nu _{h}}$ is an element dependent constant.
As in Glowinski and Guidoboni (2009) we define the linear operator from ${L^{2}}(\Omega )$ into ${L^{2}}(\Omega )$ (neglecting the constant term in (5.8))
where ${\boldsymbol{u}_{q}}$ is the solution of
Note that (5.12) is the (strong) sensitivity system. The idea behind preconditioning is to find a linear operator $\mathcal{B}$ such that $\mathcal{B}\mathcal{A}q=q$. Applying the divergence operator in (5.12), we obtain
Using (5.11) in (5.13), we obtain
Then, in practice, at each step of the preconditioned conjugate gradient algorithm, the gradient of ${J^{\ast }}$ is computed as an approximate solution of the linear system
where K and M are the stiffness and the mass matrices, respectively. Let us introduce the mesh Reynolds number
where h is the mesh size. Taking into account the CPU time and the storage requirement for computing the last term of the matrix involved in (5.15), we consider the following preconditioning system instead
where
In (5.16) and (5.17), K and M are $P1$ stiffness and mass matrix, respectively, and C, the bubble matrix from Functions 6.1–6.3.
(5.11)
\[ \phi :=\mathcal{A}q=\nabla \cdot {\boldsymbol{u}_{q}}-\nabla \cdot ({\nu _{h}}\nabla q),\]
\[ \alpha \nabla \cdot {\boldsymbol{u}_{q}}-\nu \nabla \cdot \Delta {\boldsymbol{u}_{q}}=-\Delta q\]
or
(5.13)
\[ \alpha \nabla \cdot {\boldsymbol{u}_{q}}-\nu \Delta (\nabla \cdot {\boldsymbol{u}_{q}})=-\Delta q.\](5.14)
\[ -\Delta q-\alpha \nabla \cdot ({\nu _{h}}\nabla q)+\nu \Delta \big(\nabla \cdot ({\nu _{h}}\nabla q)\big)=\alpha \phi -\nu \Delta \phi .\](5.17)
\[ \mathrm{H}=\left\{\begin{array}{l@{\hskip4.0pt}l}\nu K+\alpha \mathrm{diag}(M)\hspace{1em}& \text{if}\hspace{2.5pt}R{e_{h}}\leqslant 1,\\ {} \nu K+\alpha \mathbb{I}\hspace{1em}& \text{if}\hspace{2.5pt}R{e_{h}}>1.\end{array}\right.\]Algorithm 2
Preconditioned Uzawa conjugate gradient algorithm for (5.1).
From the theory of preconditioned conjugate gradient methods for the Stokes problem (see e.g. Glowinski, 2003), if Dirichlet conditions are imposed for the velocity, then for g in (5.14) we must impose $\partial \mathrm{g}/\partial n=0$ (homogeneous Neumann boundary conditions). On the other hand, where a stress condition is prescribed for the fluid, we must impose $\mathrm{g}=0$ in (5.14).
Remark 1.
The preconditioning system (5.15), that is,
induces (over the discrete pressure space) the norm $|\mathrm{g}{|_{P}}={\mathrm{g}^{\top }}\mathrm{r}$.
6 MATLAB Implementation
We know in detail the assembly of the Stokes systems (4.12)–(4.13) and (4.24)–(4.25). For the computational efficiency, the MATLAB codes must be vectorized (i.e. without long for-loops). We then use arrays, cell-arrays, and MATLAB vectorized operators and functions
.*, ./, .^, sum, sparse
6.1 Mesh Representation and KPDE Package
We assume that the triangulation of Ω consists of np nodes and nt elements (triangles or tetrahedrons). We adopt the mesh representation by arrays used in Koko (2007; 2016), Persson and Strang (2004). The nodes coordinates are stored in an array p(1:np,1:2) (in 2D) or p(1:np,1:3) (in 3D). The element nodes are stores in an array t(1:nt,1:3) (in 2D) or t(1:nt,1:4) (in 3D). Dirichlet boundary conditions are provided by a list of nodes and the corresponding prescribed values.
As shown in Koko (2016), using cell-array in FEM allows to have implementation close to the standard form used in classical languages (C/C++, FORTRAN) FEM codes, while being efficient. The idea is to compute and store, for all triangles, the element matrix entry ${A_{ij}^{(T)}}$ in the cell-array At{i,j}. Then we use MATLAB function sparse to assemble A with finite small for loops
for i=1:nd for j=1:nd A=A+sparse(t(:,i),t(:,j),At{i,j},np,np); end end
where nd=3 (in 2D) or nd=4 (in 3D). This approach is used in KPDE package (Koko, 2016) for assembling matrices and vectors from Poisson and linear elasticity equations in 2D and 3D.
We need two key functions from KPDE package, kpde2dgphi and kpde3dgphi, which compute the gradient of the basis functions and the elements volume as follows
[ar,g]=kpde2dgphi(p,t); %2D [vol,g]=kpde3dgphi(p,t); %3D
g is 3-by-1 or 4-by-1 cell-array such that g{i}(:,k) is ${\partial _{k}}{\phi _{i}}(x)$ for all elements.
6.2 2D Case
Using the cell array g and the array ar, the nonbubble entry ${\bar{R}_{ij}}$, (4.2), of the element stiffness matrix is then given (for all triangles) by
Rij=nu*ar.*sum(g{i}.*g{j},2).
The bubble diagonal entry ${\omega _{R}}$, (4.3), is computed as follows for all triangles
omega_r=(81/10)*nu.*ar.*(sum(g1.^2,2)+sum(g2.^2,2) +sum(g1.*g2,2)).
The nonbubble entry ${M_{ij}}$, (4.4), of the element mass matrix is,
Mii=alpha.*ar/6 ; % diagonal Mij=alpha.*ar/12; % off-diagonal.
The bubble entries (4.5) of the element mass matrix are given by
omega_m=(81/280)*alpha.*ar; % diagonal Mbj=(3/20)*alpha.*ar}; % off-diagonal.
The entries $(i,j)$ of the element divergence matrix (4.6) are vectorized as follows
B1ij=ar.*g{i}(:,1)/3; B2ij=ar.*g{i}(:,2)/3;
while the bubble entries (4.7) are
B1ib=(9/10)*ar.*g{i}(:,1); B2ib=(9/10)*ar.*g{i}(:,2).
The vectorized element contribution of the right-hand side (4.8) is
f1=(1/3)*ar.*f1t; f2=(1/3)*ar.*f2t;
while the bubble entries (4.9) are
f1b=(9/20)*ar.*f1t; f2b=(9/20)*ar.*f2t.
Function 6.1
Assembly of the 2D Stokes matrix (4.12).
Function 6.2
Assembly of the Stokes right-hand side (4.13).
MATLAB Functions 6.1–6.2 assemble the Stokes system. To make the assembling functions self-contained, all calculations are integrated except elements area and the gradient of basis functions which can be computed outside and passed as argument. Note that in 2D, the time for computing the triangles area and the gradient of the basis functions is not significant. Functions 6.1–6.2 can be called without the last two arguments without a significant computational overcost.
Instead of matrix (4.12) and vector (4.13), Functions 6.1–6.2 can return submatrices and subvectors of (4.12)–(4.13) if called with more than one output arguments. The statement
[A,B,C]=kstok2dp1bmat(p,t,nu,alpha)
returns the velocity stiffness matrix A, the divergence matrix B=[B1 B2] and the pressure (stiffness) matrix C. Similarly,
[b,bp]=kstok2dp1brhs(p,t,nu,alpha)
returns the velocity right-hand side b=[b1; b2] and the pressure right-hand side bp. These submatrices are used in the preconditioned Uzawa conjugate gradient method presented in Section 5.
6.3 3D Case
Using the cell-array g and the element volume vol, computed by the KPDE function kpde3dgphi, the element stiffness matrices are computed simultaneously on all elements by
Rij=nu*vol.*sum(g{i}.*g{j},2);
while bubble diagonal entry (4.16) is given by
omega_R=(8192/945)*nu.*vol.*(sum(g{1}.^2,2)+sum(g{2}.^2,2) +sum(g{3}.^2,2)...+sum(g{1}.*g{2},2) +sum(g{1}.*g{3},2)+sum(g{2}.*g{3},2)).
The nonbubble entries of element mass matrix (4.17) are computed by
Mii=alpha.*vol/10; % diagonal Mij=alpha.*vol/20; % off diagonal.
The bubble part of the mass matrix (4.18) is given by
omega_m=(8192/51975)*alpha.*vol; % diagonal Mbj=(8/105)*alpha.*vol; % off-diagonal.
The entries $(i,j)$ of the element divergence matrices (4.19) are computed simultaneously as follows
B1ij=vol.*g{i}(:,1)/4; B2ij=vol.*g{i}(:,2)/4; B3ij=vol.*g{i}(:,3)/4;
while the bubble entries are
B1ib=(32/105)*vol.*g{i}(:,1); B2ib=(32/105)*vol.*g{i}(:,2); B3ib=(32/105)*vol.*g{i}(:,3).
f1=vol.*f1t/4; f2=vol.*f2t/4; f3=vol.*f3t/4; f1b=(32/105)*vol*f1t; f2b=(32/105)*vol*f2t; f3b=(32/105)*vol*f3t.
Function 6.3
Assembly of the 3D Stokes matrix (4.24).
Function 6.4
Assembly of the 3D Stokes right-hand side (4.25).
MATLAB Functions 6.3–6.4 assemble the three-dimensional Stokes system. For computational efficiency, the elements volume vol and the gradient of the basis functions g can be computed once and for all, and passed to Functions 6.3–6.4. For more flexibility, vol and g can also be computed inside Functions 6.3–6.4 if they do not appear in the list of input arguments. As in 2D, Functions 6.3–6.4 can return submatrices and subvectors used in the preconditioned Uzawa conjugate gradient algorithm, if called with more than one output argument.
6.4 Preconditioned Uzawa Conjugate Gradient Algorithm
In our MATLAB implementation the same function (i.e. kstokcg) is used for 2D and 3D problems. For this, we use cell-arrays to store the component system informations: Cholesky factors, permutation vectors, right-hand sides. For instance, for the 3D Stokes problem we form
R={R1 R2 R3}; % Cholesky factors s={s1 s2 s3}; % Permutations vectors b={b(1:np) b(np+1:2*np) b3(2*np+1:3*np)}; %Right-hand sides.
Then in the conjugate gradient function, the velocity systems are solved using the for-loop
for i=1:nd w{i}(s{i})=R{i}’\(R{i}\b{i}(s{i})); end
where nd=3. An alternative implementation is to form the block diagonal matrix R=blkdiag(R1, R2, R3), the permutation vector s=[s1 s2 s3] such that e.g. in Step k.1, we solve
w(s)=R’\(R\b(s)).
But for large scale 3D problems, computing R’\(R\b(s)) requires a large amount of memory and can fail.
7 Numerical Experiments
We now propose some numerical experiments to demonstrate the performances of our implementation. The computations have been carried out on a Dell Precision T3610 work station equipped with Intel Xeon 3.0GHz processor with 32GB RAM. The MATLAB version is 9 (R2016a).
7.1 Scalability
We first study the scalability of our MATLAB codes: We consider the discretization of a unit cube ${(0,\hspace{2.5pt}1)^{d}}$ ($d=2,3$) with a uniform mesh of size h, with nt triangles (or tetrahedrons) and np nodes. This initial mesh is successively uniformly refined to produce meshes of size $h/2$, $h/4$, $h/8$, etc. After each refinement the number of triangles is multiplied by 4 (2D) and the number of tetrahedraons by 8 (3D). Since the assembly process is essentially based on the number of elements, we expect that the time to assemble the matrices increases by approximately the same factor, i.e. 5–6 in 2D and 8–10 in 3D as observed in Koko (2016) for Poisson equation and linear elasticity. Tables 1–2 show the assembly CPU times (in Seconds) for the Stokes system in 2D and 3D, respectively. We can notice an almost linear optimal time-scaling for our implementation.
Table 1
CPU times (in seconds) for assembling the Stokes matrix system of size N in 2D.
h | 1/32 | 1/64 | 1/128 | 1/256 | 1/512 | 1/1024 |
N | 3*1089 | 3*4225 | 3*16641 | 3*66049 | 3*263169 | 3*1050625 |
A | 0.018 | 0.071 | 0.325 | 1.516 | 7.242 | 36.390 |
b | 0.002 | 0.011 | 0.050 | 0.251 | 1.214 | 5.995 |
Table 2
CPU times (in seconds) for assembling the Stokes matrix system of size N in 3D.
h | 1/4 | 1/8 | 1/16 | 1/32 | 1/64 | 1/128 |
N | 4*125 | 4*729 | 4*4913 | 4*35937 | 4*274625 | 4*2146689 |
A | 0.075 | 0.057 | 0.397 | 3.438 | 34.538 | 348.249 |
b | 0.027 | 0.018 | 0.051 | 0.465 | 4.544 | 46.394 |
Table 3
Comparative performances of MATLAB direct solvers and Algorithm 2 for the 2D Stokes system, $\nu =1/50$ and $\alpha /\nu ={10^{3}}$.
Mesh size h | 1/32 | 1/64 | 1/128 | 1/256 | 1/512 |
Gaussian elim. CPU | 0.02 | 0.07 | 0.40 | 2.11 | 10.88 |
${\mathrm{LDL}^{\top }}$ CPU | 0.02 | 0.17 | 1.22 | 7.13 | 46.11 |
Algorithm 2 CPU | 0.02 | 0.08 | 0.35 | 1.94 | 12.01 |
7.2 Factorization Versus Uzawa Conjugate Gradient
We now consider a Stokes flow in a driven cavity $\Omega ={(0,\hspace{2.5pt}1)^{d}}$ with $\boldsymbol{f}=0$ in (2.1) and
Ω is discretized by uniform meshes of size $1/16$, $1/32$, $1/64$, $1/128$, $1/256$ and $1/512$ in 2D, and $1/4$, $1/8$, $1/16$, $1/32$ and $1/64$ in 3D. Since there are powerful linear (direct) solvers in MATLAB, we first compare our conjugate gradient algorithm with
Table 3 shows the comparative performances for the 2D Stokes problem with $\nu =1/50$ and $\alpha /\nu ={10^{3}}$. We can notice that the proposed Uzawa conjugate gradient (Algorithm 2) and the MATLAB built-in Gaussian elimination are almost equivalent even though, in Algorithm 2, the component systems are uncoupled and can be solved in parallel. The CPU times for ${\mathit{LDL}^{\top }}$ include CPU time for the factorization and columns and rows permutation to reduce fill-in that represents up to 90% of the whole CPU time.
2D:
${\Gamma _{1}}=(0,\hspace{2.5pt}1)\times \{1\}$, $\boldsymbol{u}={(1,\hspace{2.5pt}0)^{\top }}$ on ${\Gamma _{1}}$, and $\boldsymbol{u}=0$ on $\partial \Omega \setminus {\Gamma _{1}}$;
3D:
${\Gamma _{1}}=(0,\hspace{2.5pt}1)\times (0,\hspace{2.5pt}1)\times \{1\}$, $\boldsymbol{u}={(1,\hspace{2.5pt}0,\hspace{2.5pt}0)^{\top }}$ on ${\Gamma _{1}}$, and $\boldsymbol{u}=0$ on $\partial \Omega \setminus {\Gamma _{1}}$.
If the Stokes problem is used in an iterative process (e.g. time stepping or linearization), then Algorithm 2 or ${\mathit{LDL}^{\top }}$ factorization are preferable. Indeed, if a ${\mathit{LDL}^{\top }}$ is carried out (once and for all) in the initialization step, then the solution of linear system reduces to forward/backward substitutions in the rest of the iterative process. The computational cost of Algorithm 2 can be reduced by using, as initial solution at the current step, the solution of the previous step.
For the 3D Stokes problem the proposed Algorithm 2 outperforms the MATLAB direct solvers, Table 4. For the largest problem ($4\ast {65^{3}}=1\hspace{0.1667em}098\hspace{0.1667em}500$ unknowns) the direct solvers fail because of lack of memory due to fill-in during factorization. Table 5 shows the performances of the MATLAB gmres iterative solver using incomplete $LU$ factorization as preconditioner (MATLAB function ilu with ${10^{-3}}$ as drop tolerance). The value of restart parameter is 10. GMRES algorithm outperforms Algorithm 2 up to $h=1/32$. But for the largest problem, Algorithm 2 is more than four times faster. It is clear that for large scale 3D problems, the proposed Uzawa algorithm is preferable. Table 6 shows the good convergence properties of the proposed Uzawa conjugate gradient algorithm when $\alpha /\nu \gg 1$.
Table 4
Comparative CPU times (in sec.) of MATLAB direct solvers and Algorithm 2 for the 3D Stokes system, $\nu =1/50$ and $\alpha /\nu ={10^{3}}$, (OoM = Out of Memory).
Mesh size h | 1/4 | 1/8 | 1/16 | 1/32 | 1/64 |
Gaussian elim. | 0.01 | 0.05 | 1.73 | 145.44 | OoM |
${\mathrm{LDL}^{\top }}$ | 0.02 | 0.08 | 6.29 | 128.54 | OoM |
Algorithm 2 | 0.04 | 0.89 | 1.70 | 78.08 | 1894.59 |
7.3 2D Visualization
In two-dimensional incompressible fluid problems, it is usual to display the stream-lines. If the domain Ω is bounded and simply connected, in order to compute the stream-function ψ, we have to solve the Poisson–Neumann problem
where $\omega ={\partial _{1}}{u_{2}}-{\partial _{2}}{u_{1}}$ is the vorticity and τ the counter-clockwise oriented unit tangent vector at Γ. Problem (7.1)–(7.2) has a unique solution in ${H^{1}}(\Omega )/\mathbb{R}$. The variational formulation of (7.1)–(7.2) is
Find $\psi \in {H^{1}}(\Omega )$
This leads to the following algebraic system using $P1$ finite element
where R is the 2D Laplacian matrix. We impose $\psi =0$ at an arbitrary node to ensure the uniqueness.
(7.3)
\[ {(\nabla \psi ,\nabla \varphi )_{\Omega }}={({u_{1}},{\partial _{2}}\varphi )_{\Omega }}-{({u_{2}},{\partial _{1}}\varphi )_{\Omega }},\hspace{1em}\forall \varphi \in {H^{1}}(\Omega ).\]We now consider a test problem derived from a benchmark problem described in Schäfer and Turek (1996). A mesh sample is shown in Fig. 1. The inflow and outflow conditions (on left/right boundaries) are
\[\begin{array}{l}\displaystyle {u_{1}}=\frac{0.3}{{0.41^{2}}}\times 4y(0.41-y),\hspace{2.5pt}{u_{2}}=0\hspace{1em}\text{on}\hspace{2.5pt}{\Gamma _{in}}=\{0\}\times (0,0.41),\\ {} \displaystyle {u_{1}}=\frac{0.3}{{0.41^{2}}}\times 4y(0.41-y),\hspace{2.5pt}{u_{2}}=0\hspace{1em}\text{on}\hspace{2.5pt}{\Gamma _{out}}=\{2.2\}\times (0,0.41).\end{array}\]
On the other parts of the boundary of Ω, homogeneous boundary conditions are prescribed (i.e. $\boldsymbol{u}=0$). The parameter α in (2.1) is set to 0. The centre of the internal cylinder is $(0.25,0.2)$ and the diameter is 0.1. The kinematic viscosity is $\nu ={10^{-3}}$. This gives a Reynolds number of $Re=30$ based on the diameter of the cylinder and the maximum of the inflow velocity. The domain is discretized by a non uniform mesh consisting of 1730 nodes and 3280 triangles, Fig. 1. The velocity field obtained with MATLAB commandquiver(p(:,1),p(:,2),u1,u2)
is shown in Fig. 2. Isobar lines of Fig. 3 are obtained using the contour plotting function kpde2dcont from KPDE package (Koko, 2016). The streamlines of Fig. 4 are obtained by plotting the solution of (7.3) with kpde2dcont.
Unfortunately, for 3D flows, there is no simple tool for graphics output. quiver3 allows for visualization of 3D velocity fields but the result is often unsatisfactory. Plotting 3D functions (or their contours) is a non trivial problem. There is no simple subproblem like (7.3) for streamlines in 3D.
8 Conclusion
We have proposed a fast MATLAB package for the numerical approximation of the generalized Stokes problem with the mini-element. Numerical experiments show that the proposed assembling functions have an optimal linear time-scaling. The proposed Uzawa conjugate gradient algorithm outperforms the MATLAB built-in solvers for 3D problems.