## 1 Introduction

**Φ** is an $M\times N$ measurement matrix, $\boldsymbol{y}\in {\mathbb{R}^{M}}$ is a set of *M* measurements (where *M* can be much smaller than the original dimensionality of the signal *N*), and $\boldsymbol{\epsilon }$ is measurement noise. Efficient signal recovery is possible even when the number of acquired measurements is far below the Shannon-Nyquist limit.
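The measurement process described above can be sketched in a few lines of NumPy. This is only an illustration under the standard linear model $\boldsymbol{y}=\boldsymbol{\Phi }\boldsymbol{x}+\boldsymbol{\epsilon }$; the sizes, the random stand-in signal and the noise level are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 256          # signal dimensionality (assumed for illustration)
M = 64           # number of measurements, M << N
r = M / N        # CS measurement rate

x = rng.standard_normal(N)                       # stand-in signal
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # i.i.d. Gaussian measurement matrix
eps = 0.01 * rng.standard_normal(M)              # measurement noise (assumed level)

y = Phi @ x + eps                                # y = Phi x + eps

# Phi has rank at most M < N, so the system is underdetermined:
# infinitely many signals are consistent with the same y.
rank = np.linalg.matrix_rank(Phi)
```

Because the rank of **Φ** cannot exceed *M*, the null space of **Φ** has dimension of at least $N-M$, which is why a signal prior is needed to single out one solution.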

*et al.*, 2000; Bugeau *et al.*, 2010), super-resolution (Yang *et al.*, 2010; Dong *et al.*, 2016), and denoising (Elad and Aharon, 2006). In order to reconstruct the signal $\boldsymbol{x}$ from a set of measurements $\boldsymbol{y}$, one has to solve the underdetermined (i.e. $M<N$) system of linear equations in Eq. (1). In the CS literature, the ratio $r=M/N$ is called the CS measurement rate. To recover the signal $\boldsymbol{x}$ from its low-dimensional measurements, it is necessary to use a signal prior that enables the identification of the true solution among an infinite set of feasible solutions. This is usually done by adding a regularization term to an existing loss function. Usually, the ${l_{0}}$ norm, or its convex relaxation, the ${l_{1}}$ norm, is used as the regularizer under the assumption that the observed signal is sparse in a certain transform domain **Ψ**, i.e. $\boldsymbol{x}={\boldsymbol{\Psi }^{-1}}\boldsymbol{s}$, where $\boldsymbol{s}$ denotes the sparse representation of the signal $\boldsymbol{x}$. Other signal priors can be used as regularizers as well. An unconstrained optimization problem for sparse signal recovery using ${l_{1}}$ regularization can be written as:

##### (3)

\[ \underset{\boldsymbol{s}}{\min }\big\| \boldsymbol{y}-\boldsymbol{\Phi }{\boldsymbol{\Psi }^{-1}}\boldsymbol{s}\big\| _{2}^{2}+\lambda \| \boldsymbol{s}\| _{1}.\]

*et al.*, 1993; Needell and Tropp, 2009; Beck and Teboulle, 2009; Becker *et al.*, 2011; Wright *et al.*, 2009). This presents a serious drawback when it comes to real-world applications of CS.
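To make the cost of iterative recovery concrete, the following is a minimal sketch of the iterative soft-thresholding algorithm (ISTA) applied to the objective in Eq. (3). It is not the solver used by any of the cited works; the matrix `A` stands for the product $\boldsymbol{\Phi }{\boldsymbol{\Psi }^{-1}}$, and the problem sizes, sparsity level and value of `lam` are assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, y, s, lam):
    """Value of ||y - A s||_2^2 + lam * ||s||_1, as in Eq. (3)."""
    return np.sum((y - A @ s) ** 2) + lam * np.sum(np.abs(s))

def ista(A, y, lam, n_iter=300):
    """Minimize the objective above by iterative soft-thresholding."""
    # Lipschitz constant of the gradient of the data term ||y - A s||_2^2.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ s - y)           # gradient of the data term
        s = soft_threshold(s - grad / L, lam / L)
    return s

# Toy problem (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
N, M, K = 128, 48, 5
A = rng.standard_normal((M, N)) / np.sqrt(M)     # stands for Phi @ inv(Psi)
s_true = np.zeros(N)
s_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = A @ s_true
lam = 0.01
s_hat = ista(A, y, lam)
```

Every iteration requires two matrix-vector products with `A`, and many iterations are typically needed, which illustrates why reconstruction time is a concern for this family of methods.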

*et al.*, 2015; Mousavi and Baraniuk, 2017; Mousavi *et al.*, 2017; Kulkarni *et al.*, 2016; Hantao *et al.*, 2019; Lohit *et al.*, 2018). Novel CS reconstruction algorithms based on deep neural networks have recently been proposed, and they represent a non-iterative, fast and efficient alternative to the traditional CS reconstruction algorithms.

## 2 Related Work

*et al.* (2015) and it represents pioneering work in the area of CS reconstruction using the learning-based approach. The main drawback of the SDA approach is that the network consists of fully-connected layers, which means that all units in two consecutive layers are connected to each other. Thus, as the signal size increases, so does the computational complexity of the neural network. The authors present an extension of their previous work in Mousavi and Baraniuk (2017) and Mousavi *et al.* (2017). The DeepInverse network proposed in Mousavi and Baraniuk (2017) solves the image dimensionality problem by using the adjoint operator ${\boldsymbol{\Phi }^{T}}$ to initialize the weights of the fully connected reconstruction layer. In Mousavi *et al.* (2017), a non-linear measurement operator is trained to learn a transformation from the original signal space to an undersampled measurement space. A novel class of convolutional neural network (CNN) architectures inspired by the work of Dong *et al.* (2016) was proposed in Kulkarni *et al.* (2016). The proposed CNN takes image block CS measurements as inputs and outputs a block reconstruction obtained from low-dimensional measurements. Improved ReconNet was proposed in Lohit *et al.* (2018), where the authors use adversarial loss to further improve the CS reconstruction results. Moreover, the authors add a linear fully connected layer to the existing ReconNet architecture and learn the optimal measurement and reconstruction matrix in a single network. Based on their initial work in Xie *et al.* (2017) and Du *et al.* (2019), the authors propose to train the neural network using perceptual loss in Du *et al.* (2018). Perceptual loss (Johnson *et al.*, 2016) is defined in the latent space of a secondary network and helps to preserve higher-level information when compared to the commonly used per-pixel Euclidean loss. In Hantao *et al.* (2019), the authors propose a novel *Deep Residual Reconstruction Network* (${\text{DR}^{2}}\text{-Net}$) to restore the image from its blockwise CS measurements with an additional residual layer that enhances the preliminary image reconstruction.

*et al.*, 2015). A disadvantage of using the fully convolutional architecture is that it is not directly applicable to certain imaging modalities where the measurements correspond to the whole signal, and one cannot perform measurements in a blockwise manner. In contrast to Mousavi *et al.* (2017), where the authors propose to learn a non-linear measurement operator in their *DeepCodec* network, we use a linear encoding part while the non-linearities are introduced only into the residual learning network. The motivation for this is to ensure that the learned measurement operator is implementable in real-world CS measurement systems, which are mostly linear. The residual network improves the initial image reconstruction and removes possible reconstruction artifacts.

## 3 Proposed Architecture for the CS Model

### 3.1 Convolutional Autoencoder

##### Fig. 1

##### (4)

\[ {y_{m}}=\langle {\boldsymbol{\phi }_{m}},\boldsymbol{x}\rangle ={\sum \limits_{i=1}^{N}}{\phi _{m,i}}{x_{i}}.\]

**Φ** is created by arranging the measurement vectors ${\boldsymbol{\phi }_{m}^{T}}$ as rows. Signal dimensionality (i.e. image dimensions) determines the number of columns in the measurement matrix. Consequently, when image dimensions are large, a block-based CS approach is suitable since it operates on local image patches (Du *et al.*, 2012). The block-based CS results in a lower computational complexity and requires less memory to store the measurement matrix.

##### (5)

\[ \begin{aligned}{}& Y=X\hspace{2.5pt}\underset{D}{\ast \ast }\hspace{2.5pt}\{{\phi _{m}}\},\\ {} & {Y_{m}}[i,j]={\sum \limits_{k}}{\sum \limits_{l}}X[Di+k,Dj+l]{\phi _{m}}[k,l].\end{aligned}\]

*D* equals the size of the block *B* and the double asterisk ($\underset{D}{\ast \ast }$) denotes a 2D convolutional operator decimated with the same factor. A two-dimensional measurement filter ${\boldsymbol{\phi }_{m}}$ is created column-wise from the measurement vector ${\boldsymbol{\phi }_{m}}$ as shown in Fig. 2. In Eq. (5), *Y* denotes all the measurements obtained using decimated convolution over the whole input image *X* with the collection of measurement filters $\{{\phi _{m}}\}$. A visualization of the measurement process modelled using 2D convolution is shown in Fig. 3.
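The equivalence between the decimated 2D convolution in Eq. (5) and ordinary block-wise matrix measurements can be checked numerically. The sketch below assumes non-overlapping blocks (decimation factor equal to the block size), image dimensions divisible by *B*, and that the second image index in Eq. (5) is offset by *l*; the function name `blockwise_measure` and the toy sizes are hypothetical.

```python
import numpy as np

def blockwise_measure(X, Phi, B):
    """Apply every row of Phi (M x B^2) to every non-overlapping BxB block of X.

    Returns Y with Y[i, j, m] = sum_{k,l} X[B*i + k, B*j + l] * filter_m[k, l].
    """
    H, W = X.shape
    # Each measurement vector becomes a BxB filter, filled column-wise.
    filters = np.stack([phi.reshape(B, B, order="F") for phi in Phi])  # (M, B, B)
    # blocks[i, j, k, l] == X[B*i + k, B*j + l]
    blocks = X.reshape(H // B, B, W // B, B).transpose(0, 2, 1, 3)
    return np.einsum("ijkl,mkl->ijm", blocks, filters)

rng = np.random.default_rng(0)
B, M = 8, 4                                   # toy sizes, not the paper's
X = rng.standard_normal((64, 64))
Phi = rng.standard_normal((M, B * B))
Y = blockwise_measure(X, Phi, B)              # shape (64/B, 64/B, M)

# The same measurement for one block, computed as a matrix-vector product
# on the column-wise vectorized block.
i, j = 2, 3
block = X[B * i:B * (i + 1), B * j:B * (j + 1)]
y_direct = Phi @ block.flatten(order="F")
```

For any block position, `Y[i, j]` matches `y_direct`, so a strided convolution layer with *M* filters of size $B\times B$ implements the block-based measurement matrix exactly.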

##### Fig. 2

**Φ** has $M-1$ rows that are optimized. The collection of measurement filters $\{{\phi _{m}}\}$ has a depth size of *M* (i.e. $M-1$ trainable filters and one fixed filter).

##### Fig. 3

*x* of size $B\times B$ from the whole image *X* of size $N\times N$ is convolved with a collection of measurement filters $\{{\phi _{m}}\}$ of size $B\times B\times M$. This results in a measurement tensor $\boldsymbol{y}$ of size $1\times 1\times M$. A set of measurement tensors is denoted by *Y* and has a size of $\frac{N}{B}\times \frac{N}{B}\times M$.

### 3.2 Predefined vs. Adaptive Measurement Matrix

**Φ** can be used in the measurement process to obtain measurements $\boldsymbol{y}$ from the input images. In traditional CS, a measurement matrix with independent and identically distributed (i.i.d.) Gaussian measurement vectors is often used. In that case, the encoding layer of the autoencoder is initialized using the weights defined by the vectors from the measurement matrix **Φ** and is kept fixed during the training process. Signal dimensionality reduction using a predefined (e.g. random Gaussian, Hadamard, DCT) measurement matrix **Φ** is sub-optimal because it does not exploit the underlying structure of the observed signal.

**Φ** from the training dataset. In the experimental section, we show the effect of the measurement matrix choice on the reconstruction results.

### 3.3 Network Training Using Normalized Measurements

**Φ**, so that it corresponds to a row vector containing all ones:

##### (6)

\[ {y_{1}}=\frac{1}{{B^{2}}}{\sum \limits_{i=1}^{{B^{2}}}}{\phi _{1,i}}{x_{i}}=\frac{1}{{B^{2}}}{\sum \limits_{i=1}^{{B^{2}}}}{x_{i}}.\]

##### (7)

\[ {\hat{y}_{m}}=\frac{1}{{\textstyle\sum _{i=1}^{{B^{2}}}}{\phi _{m,i}}}{y_{m}}-{y_{1}},\hspace{1em}m\in [2,M].\]

*w* becomes either all-positive, or all-negative (depending on the gradient of the whole expression *f*) during the back-propagation step. In turn, this could introduce undesirable zig-zagging dynamics in the gradient updates of the weights (Karpathy, 2017). As shown in Fig. 4, zig-zagging is also manifested in the loss function. The training loss curves for the unnormalized measurements (red dashed line) and the normalized measurements (blue solid line) are shown together in *log* scale. Notice that the loss function for the proposed network trained on mean-centred data converges significantly faster than that of the network trained on non-centred measurements.
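The normalization in Eqs. (6) and (7) can be sketched as follows. The function name `normalized_measurements` is hypothetical, and the example assumes that the first row of **Φ** is fixed to all ones and that the remaining rows have non-zero sums (here enforced by drawing positive entries).

```python
import numpy as np

def normalized_measurements(x, Phi):
    """Mean-centred measurements of one flattened block.

    x   : flattened block of length B^2
    Phi : (M, B^2) measurement matrix whose first row is all ones
    Returns the block mean y1 (Eq. 6) and the centred values for m = 2..M (Eq. 7).
    """
    y = Phi @ x
    y1 = y[0] / x.size                    # Eq. (6): mean intensity of the block
    row_sums = Phi[1:].sum(axis=1)        # assumed to be non-zero
    y_hat = y[1:] / row_sums - y1         # Eq. (7)
    return y1, y_hat

rng = np.random.default_rng(0)
B, M = 4, 5                               # toy sizes, not the paper's
Phi = rng.uniform(0.1, 1.0, size=(M, B * B))
Phi[0] = 1.0                              # fixed, non-trainable first row

x_const = np.full(B * B, 3.0)             # a perfectly flat block
y1, y_hat = normalized_measurements(x_const, Phi)
```

For a flat block, each $y_{m}$ divided by its row sum equals the block mean, so all centred measurements vanish; the network then only has to explain deviations from the local mean.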

### 3.4 Efficient Method for Network Initialization

*et al.* (2018) and Du *et al.* (2019), the authors optimize the linear encoder in order to infer the optimal measurement matrix for each measurement rate *r*. In Baldi and Hornik (1989), it has been shown that the linear autoencoder with the mean squared error (MSE) loss converges to a unique minimum corresponding to the projection onto the subspace generated by the first principal component vectors of the covariance matrix obtained using principal component analysis (PCA). Thus, it is sub-optimal to retrain the model for each measurement rate *r*.

##### (8)

\[ C(\boldsymbol{x})=E\big[\boldsymbol{x}{\boldsymbol{x}^{T}}\big]-E[\boldsymbol{x}]E\big[{\boldsymbol{x}^{T}}\big],\]

*E* denotes the expectation operator. In the case when images are the signals of interest, PCA is performed by calculating an unbiased estimate of the covariance matrix $C(\boldsymbol{x})$ for the vectorized images, where $\boldsymbol{x}$ is a flattened image vector, and $\bar{\boldsymbol{x}}$ is its mean value:

##### (9)

\[ C(\boldsymbol{x})=\frac{1}{N-1}{\sum \limits_{n=1}^{N}}({\boldsymbol{x}_{n}}-\bar{\boldsymbol{x}}){({\boldsymbol{x}_{n}}-\bar{\boldsymbol{x}})^{T}}.\]

**Σ** contains positive eigenvalues *λ* sorted in descending order. The eigenvalues explain the variance in the direction of the corresponding eigenvector in the orthonormal matrix $\boldsymbol{U}$. Under the assumption that the variance reflects the informational content, a subset of *M* eigenvectors with the largest eigenvalues (i.e. principal components) optimally describes the observed signal in terms of the mean squared error:

##### (10)

\[ C(\boldsymbol{x})=\boldsymbol{U}\boldsymbol{\Sigma }{\boldsymbol{U}^{T}}\approx {\boldsymbol{U}_{1:M}}{\boldsymbol{\Sigma }_{1:M}}{({\boldsymbol{U}_{1:M}})^{T}}.\]

**Φ** is equal to the reduced eigenvector matrix ${\boldsymbol{U}_{1:M}^{T}}$ as in our proposal, we can write: The original image $\boldsymbol{x}$ can be reconstructed using the pseudo-inverse of the measurement matrix:

##### (13)

\[\begin{aligned}{}\boldsymbol{x}& ={\boldsymbol{\Phi }^{+}}\boldsymbol{y}\\ {} & ={\boldsymbol{\Phi }^{T}}{\big(\boldsymbol{\Phi }{\boldsymbol{\Phi }^{T}}\big)^{-1}}\boldsymbol{y}\\ {} & ={\boldsymbol{U}_{1:M}}{\big[{({\boldsymbol{U}_{1:M}})^{T}}{\boldsymbol{U}_{1:M}}\big]^{-1}}\boldsymbol{y}\\ {} & ={\boldsymbol{U}_{1:M}}\boldsymbol{y}.\end{aligned}\]

**Φ** for a different sub-rate *r*, the PCA approach outputs the whole eigenvector matrix $\boldsymbol{U}$. Thus, for any measurement rate, the initial measurement matrix **Φ** can be formed by selecting the *M* eigenvectors with the largest eigenvalues, which are then used to initialize the model. The learning-based approach is significantly slower since it is extremely hard to learn the optimal measurement operator and the network might not fully converge. In contrast, in the case of a linear autoencoder, we obtain the exact solution for the optimal measurement and reconstruction operators in a fraction of the time needed to train the neural network. Using the PCA initialization for the autoencoder might be beneficial even when the loss function in the training procedure is not pixel-wise Euclidean and when additional regularization is introduced in the training procedure.
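The PCA-based initialization can be illustrated with NumPy. This is a sketch rather than the authors' code: the helper names `pca_basis` and `measurement_matrix` are hypothetical, random data stands in for real image blocks, and the block length of 16 is chosen only to keep the example small.

```python
import numpy as np

def pca_basis(blocks):
    """Eigen-decomposition of the sample covariance of flattened blocks.

    blocks : (num_samples, B^2) array, one flattened block per row
    Returns eigenvalues in descending order and the matching eigenvectors
    as columns of U.
    """
    C = np.cov(blocks, rowvar=False)        # unbiased estimate, as in Eq. (9)
    eigvals, U = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], U[:, order]

def measurement_matrix(U, M):
    """Phi = U_{1:M}^T: the M leading eigenvectors arranged as rows."""
    return U[:, :M].T

rng = np.random.default_rng(0)
blocks = rng.standard_normal((500, 16))     # stand-in for real image blocks
eigvals, U = pca_basis(blocks)              # computed once

# The same decomposition serves every measurement rate: only M changes.
Phi_4 = measurement_matrix(U, 4)
Phi_8 = measurement_matrix(U, 8)

# Because the rows of Phi are orthonormal, the pseudo-inverse is the transpose.
x = blocks[0]
x_rec = Phi_8.T @ (Phi_8 @ x)               # reconstruction as in Eq. (13)
```

Since the eigenvectors are orthonormal, no matrix inversion is needed at reconstruction time, and moving to a different measurement rate amounts to slicing a different number of columns from `U`.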

### 3.5 Residual Network

*et al.*, 2017), linearity is an important property of measurement systems and we want our CS model to be realizable in real physical measurement setups like Takhar *et al.* (2006) and Ralašić *et al.* (2018).

##### Fig. 5

*Barbara*, 2) *Parrot*, 3) *Peppers*. Notice that the residual network improves the preliminary reconstructions in terms of blocking artifacts, high-frequency content restoration and edge preservation.

*et al.*, 2015) that induces non-linearity, reduces potential reconstruction and blocking artifacts, and eliminates the need for an off-the-shelf denoiser such as BM3D (Dabov *et al.*, 2009) used in the competing methods. Figure 5 shows several examples of the estimated residual. Residual learning compensates for some of the high-frequency loss and improves the initial image reconstruction.

### 3.6 Choice of the Loss Function

*et al.* (2018) uses the adversarial loss function in addition to the Euclidean loss to obtain better and sharper reconstructions. Furthermore, Du *et al.* (2018) uses perceptual loss in order to achieve better reconstruction results. The authors train their model using the Euclidean loss in the latent space of the ${\textit{VGG}_{19}}$ neural network (Simonyan and Zisserman, 2014).

*et al.* (2018), where the authors optimize the whole network using the Euclidean loss in the latent space. As a consequence, their method results in semantically informative reconstructions, but with low per-pixel accuracy. By using a combination of Euclidean and perceptual loss, we obtain semantically informative reconstructions with high per-pixel accuracy, resulting in higher PSNR compared to Du *et al.* (2018).

##### (14)

\[ {\mathcal{L}_{1}}\big(\big\{\boldsymbol{\Phi },{\boldsymbol{\Phi }^{+}}\big\}\big)=\big\| x-f\big\{x,\big\{\boldsymbol{\Phi },{\boldsymbol{\Phi }^{+}}\big\}\big\}\big\| _{2}^{2},\]

**Φ** denotes the weights of the measurement operator, ${\boldsymbol{\Phi }^{+}}$ are the weights of the reconstruction operator, *x* is the original image and $f\{x,\{\boldsymbol{\Phi },{\boldsymbol{\Phi }^{+}}\}\}$ is the image reconstruction obtained by the autoencoder.

*et al.* (2018). In contrast with Du *et al.* (2018), we use a linear combination of Euclidean losses defined on the features of the second and third max-pooling layers of the ${\textit{VGG}_{19}}$ network instead of the Euclidean loss on an individual feature map. The motivation for this is to simultaneously reconstruct both the low-level information contained in the bottom layers and the high-level semantic features contained in the top layers of the ${\textit{VGG}_{19}}$ network.

##### (15)

\[ {\mathcal{L}_{2}}\big(\{\boldsymbol{W}\}\big)=\frac{1}{2}{\sum \limits_{j=2}^{3}}\big\| {\phi _{j}}(x)-{\phi _{j}}\big(f\{x,\boldsymbol{W}\}\big)\big\| _{2}^{2}.\]

*j*-th layer of the ${\textit{VGG}_{19}}$ with input *x*. Furthermore, $\boldsymbol{W}$ denotes filter weights in the residual network and $f\{x,\boldsymbol{W}\}$ is the final image reconstruction.
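The structure of the two losses in Eqs. (14) and (15) can be sketched without loading a pretrained network. In the example below, average pooling at two scales is used purely as a stand-in for the ${\textit{VGG}_{19}}$ feature maps ${\phi _{2}}$ and ${\phi _{3}}$; the pooling sizes, the helper `avg_pool` and the toy images are assumptions, not part of the described method.

```python
import numpy as np

def euclidean_loss(x, x_rec):
    """Per-pixel squared Euclidean loss, as in Eq. (14)."""
    return np.sum((x - x_rec) ** 2)

def perceptual_loss(x, x_rec, feature_fns):
    """Half the sum of squared feature-space distances, as in Eq. (15)."""
    return 0.5 * sum(np.sum((phi(x) - phi(x_rec)) ** 2) for phi in feature_fns)

def avg_pool(img, k):
    """Non-overlapping k x k average pooling (stand-in feature extractor)."""
    H, W = img.shape
    return img.reshape(H // k, k, W // k, k).mean(axis=(1, 3))

# Two stand-in "layers"; a real implementation would use VGG19 activations.
feature_fns = [lambda im: avg_pool(im, 4), lambda im: avg_pool(im, 8)]

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(16, 16))     # toy "original" image
x_rec = x + 1.0                              # toy "reconstruction"

l1 = euclidean_loss(x, x_rec)
l2 = perceptual_loss(x, x_rec, feature_fns)
```

The feature-space loss sums over more than one layer, so the coarse and fine stand-in features both contribute; with real ${\textit{VGG}_{19}}$ activations the same structure applies, only the feature extractors change.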

## 4 Experiments

### 4.1 Network Training

*tensorflow* (Abadi *et al.*, 2015) deep learning framework for training and testing purposes. The training dataset is formed using uncalibrated JPEG images from the publicly available *Barcelona Calibrated Images Database* (Párraga *et al.*, 2010). Our training dataset is created by extracting 1676 image patches of size $256\times 256$, taken from different parts of the original high-resolution ($2268\times 1512$) images. This corresponds to 107264 unique image blocks for training.

*Monarch, Fingerprint, Flintstones, House, Parrot, Barbara, Boats, Cameraman, Foreman, Lena, Peppers* – see TestDataset), which were used in the evaluation of the competing methods, are used for testing with four different measurement sub-rates $r=\frac{M}{N}$, where $r\in \{0.25,0.1,0.04,0.01\}$. In our experiments, a block size of $32\times 32$ is used.
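The dataset and measurement sizes quoted above follow from simple arithmetic, reproduced here as a check. The block count uses only figures stated in the text; the number of measurements per block for each sub-rate assumes rounding to the nearest integer, which the text does not specify.

```python
patches = 1676          # training patches of size 256 x 256
patch_size = 256
B = 32                  # block size

blocks_per_patch = (patch_size // B) ** 2      # 8 x 8 = 64 blocks per patch
total_blocks = patches * blocks_per_patch      # 107264 training blocks

rates = [0.25, 0.10, 0.04, 0.01]
# Rounding to the nearest integer is an assumption made for this example.
measurements = {r: round(r * B * B) for r in rates}
```

With $B=32$, each block holds 1024 pixels, so the lowest sub-rate $r=0.01$ keeps roughly ten measurements per block.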

### 4.2 Measurement Matrix

##### Table 1

| PSNR [dB] | $r=0.25$ | $r=0.10$ | $r=0.04$ | $r=0.01$ |
|---|---|---|---|---|
| PCA | 31.45 | 27.11 | 23.95 | 20.56 |
| Linear autoencoder | 31.39 | 27.06 | 23.92 | 20.55 |

*Parrot* test image are shown. The reconstructions are presented for measurement rates $r=\{0.01,0.25\}$. Notice that the adaptive measurement matrix preserves more information compared to the random Gaussian matrix.

##### Fig. 7

*Parrot*” test image (1) and for two measurement ratios $r=0.01$ (2, 3) and $r=0.25$ (4, 5). Reconstructions labelled with (2) and (4) are obtained using the random Gaussian measurement matrix, while (3) and (5) are obtained using the adaptive measurement matrix.

### 4.3 Comparison to Other Methods

*et al.*, 2018), Adp-Rec (Xie *et al.*, 2017), FCMN (Du *et al.*, 2019) and two variants of PCS (Du *et al.*, 2018), namely ${\textit{PCS}_{\textit{conv22}}}$ and ${\textit{PCS}_{\textit{conv34}}}$. In Table 2, mean PSNR reconstruction results (on the same test dataset) for the proposed method and for the competing methods are shown. ImpReconNet (Euc) denotes a variant of the ReconNet model that uses the Euclidean loss function for network training, while ImpReconNet (Euc+Adv) denotes a variant that uses a combination of Euclidean and adversarial loss. The PSNR values of the competing methods are shown as reported in the original papers or reproduced using the available algorithms and models. In Fig. 8, “*Fingerprint*” test image reconstructions are shown compared to the ground-truth.
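Since all comparisons in this section are reported in PSNR, a minimal sketch of the metric is given below. It uses the usual definition $10{\log _{10}}({\textit{peak}^{2}}/\textit{MSE})$; the peak value of 255 assumes 8-bit images, which the text does not state explicitly.

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)        # per-pixel mean squared error
    if mse == 0:
        return float("inf")                # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy check: an error of exactly one grey level at every pixel.
a = np.full((8, 8), 100.0)
value = psnr(a, a + 1.0)
```

The metric depends only on the per-pixel mean squared error, so two reconstructions with the same PSNR can differ considerably in how well they preserve structure.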

##### Table 2

*r*. Although FCMN achieves better results in terms of PSNR, it is clearly visible from Fig. 8 that it does not preserve structural information. This is because PSNR measures image quality on a per-pixel basis, which is not a relevant measure for the preservation of high-level image features.

| Mean PSNR [dB] for different methods | $r=0.25$ | $r=0.10$ | $r=0.04$ | $r=0.01$ |
|---|---|---|---|---|
| ImpReconNet (Euc) (Lohit et al., 2018) | 26.59 | 25.51 | 23.14 | 19.44 |
| ImpReconNet (Euc + Adv) (Lohit et al., 2018) | 30.53 | 26.47 | 22.98 | 19.06 |
| Adp-Rec (Xie et al., 2017) | 30.80 | 27.53 | – | 20.33 |
| FCMN (Du et al., 2019) | 32.67 | 28.30 | 23.87 | 21.27 |
| ${\textit{PCS}_{\textit{conv22}}}$ (Du et al., 2018) | – | – | 19.38 | 18.30 |
| ${\textit{PCS}_{\textit{conv34}}}$ (Du et al., 2018) | – | – | 16.72 | 16.80 |
| Proposed method | 32.00 | 26.36 | 23.67 | 20.51 |

##### Fig. 8

*Fingerprint*” test image and for measurement rate $r=0.04$: (1) original, (2) ImpReconNet (Euc + Adv), $\textit{PSNR}=16.97$ dB, (3) FCMN, $\textit{PSNR}=19.05$ dB, (4) ${\textit{PCS}_{\textit{conv}22}}$, $\textit{PSNR}=14.83$ dB, (5) ${\textit{PCS}_{\textit{conv}34}}$, $\textit{PSNR}=14.35$ dB, (6) proposed method, $\textit{PSNR}=20.31$ dB. Our method results in better structure preservation compared to the ImpReconNet and FCMN methods, while achieving a PSNR around 5 dB higher than the *PCS* methods.

*Monarch*” reconstructions in Fig. 9, where a comparison between the competing perceptual CS methods and the proposed method at the extremely low measurement rate $r=0.01$ is presented. Notice the high level of noise in the PCS reconstructions compared to the reconstruction obtained using the proposed method.

##### Fig. 9

*Monarch*” test image and for measurement rate $r=0.01$: (1) original, (2) ${\textit{PCS}_{\textit{conv}22}}$, $\textit{PSNR}=16.28$ dB, (3) ${\textit{PCS}_{\textit{conv}34}}$, $\textit{PSNR}=14.87$ dB, (4) proposed method, $\textit{PSNR}=18.04$ dB. Although the PCS method successfully reconstructs higher semantic information, it suffers from a significant amount of noise. In contrast, our method reconstructs the same amount of information with less noise and fewer visual artifacts.