Sparse Reconstruction with Compressed Sensing: Difference between revisions

Revision as of 21:49, 19 December 2021

Author: Ngoc Ly (SysEn 5800 Fall 2021)

Compressed Sensing (CS)


In compressed sensing, compressibility is closely tied to sparsity: when we talk about compression, we are really referring to the sparsity of the signal. We introduce compressed sensing and then focus on reconstruction.

Introduction

sub module goal

The goal of compressed sensing is to solve an underdetermined linear system in which the number of variables is much greater than the number of observations, so that infinitely many signal coefficient vectors $$ x $$ are consistent with the same set of compressive measurements $$ y $$ . As a result, additional information (such as sparsity) is necessary to recover $$ x $$ from $$ y $$ . The objective is to reconstruct a vector $$ x $$ from a given set of measurements $$ y $$ and a sensing matrix A. Instead of taking a large number of high-resolution measurements and discarding the majority of them, we take far fewer random measurements and reconstruct the original $$ x $$ with high probability from its sparse representation.

sub module

Begin with a linear equation $$ y = A x + e $$ , where $A \in \mathbb{R}^{M \times N}$ is a sensing matrix whose design determines whether the recovered solution is exact or approximate; $x \in \mathbb{R}^{N}$ is a $$ k $$ -sparse signal vector, which means $$ x $$ has at most $$ k $$ nonzero entries, indexed by $[ N ] = \{ 1, \dots , N \}$ ; $y \in \mathbb{R}^{M}$ is the compressed measurement vector, indexed by $[ M ] = \{ 1, \dots , M \}$ ; $e \in \mathbb{R}^{M}$ is a noise vector, assumed (when present) to be bounded by $\| e \|_2 \leq \eta$ ; and $M \ll N$ .

Equivalently, writing the sensing matrix as $\Phi$ , compressed sensing begins with the underdetermined linear system $y = \Phi x + e$ , where $\Phi \in \mathbb{R}^{M \times N}$ is a random matrix with $M \ll N$ . The goal is to reconstruct $x \in \mathbb{R}^N$ given $$ y $$ and $\Phi$ .

Let $[ N ] = \{ 1, \dots , N \}$ be an index set; $$ [N] $$ enumerates the columns of $\Phi$ and the entries of $$ x $$ . Since $M \ll N$ , the system $y = \Phi x$ is underdetermined and has infinitely many solutions. Choosing the solution of minimum $\ell_2$ norm generally does not give a sparse solution, whereas minimizing the $\ell_1$ norm tends to return a sparse one.
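A minimal sketch of this contrast, assuming NumPy and SciPy are available; the sizes, random seed, and support below are illustrative choices, not from the original. The minimum-$\ell_2$ solution comes from the pseudo-inverse, while the $\ell_1$ problem is posed as a linear program by splitting $x = u - v$ with $u, v \geq 0$ .

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
M, N, k = 40, 120, 3  # illustrative sizes with M < N

# Random Gaussian sensing matrix and a k-sparse signal
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[[5, 60, 110]] = [1.0, -2.0, 1.5]
y = Phi @ x_true

# Minimum l2-norm solution: x = Phi^T (Phi Phi^T)^{-1} y
x_l2 = np.linalg.pinv(Phi) @ y

# Minimum l1-norm solution as a linear program:
# write x = u - v with u, v >= 0, minimize sum(u) + sum(v) s.t. Phi (u - v) = y
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_l1 = res.x[:N] - res.x[N:]

def n_nonzero(v, tol=1e-6):
    """Count entries whose magnitude exceeds a small tolerance."""
    return int(np.sum(np.abs(v) > tol))

print("l2 nonzeros:", n_nonzero(x_l2))
print("l1 nonzeros:", n_nonzero(x_l1))
```

With these (assumed) sizes the $\ell_2$ solution typically has all entries nonzero, while the $\ell_1$ solution matches the 3-sparse signal up to solver tolerance.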

Notation

$x \in \mathbb{R}^N$ : the signal; in practice it is often not exactly sparse but approximately sparse

$\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$ : the sensing matrix, e.g., a random Gaussian or Bernoulli matrix

$y \in \mathbb{R}^M$ : the vector of observed measurements

$e \in \mathbb{R}^M$ : the noise vector, bounded by $\| e \|_2 \leq \eta$

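For $p \geq 1$ , the $\ell_p$ norm of $x \in \mathbb{R}^N$ is $\| x \|_p = \left( \sum_{i=1}^{N} | x_i |^p \right)^{1/p}$ , with $\| x \|_\infty = \max_i | x_i |$ . The $\ell_0$ "norm" $\| x \|_0 = | supp(x) |$ counts the nonzero entries of $$ x $$ and is not a true norm.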
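A small illustration of these definitions, assuming NumPy; `lp_norm` is a hypothetical helper written directly from the formula above.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0, 0.0])

l0 = np.count_nonzero(x)              # |supp(x)|, not a true norm
l1 = np.linalg.norm(x, ord=1)         # sum of |x_i|
l2 = np.linalg.norm(x, ord=2)         # square root of the sum of x_i^2
linf = np.linalg.norm(x, ord=np.inf)  # max |x_i|

def lp_norm(v, p):
    """General l_p norm for p >= 1, computed from the definition."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(l0, l1, l2, linf)  # 2 7.0 5.0 4.0
```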

$x = \Psi \alpha$ , where $\Psi$ is the sparsifying matrix (basis) and $\alpha$ is the vector of coefficients, which is sparse even when $$ x $$ itself is not
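As an illustration only, assuming SciPy's orthonormal DCT as the sparsifying basis $\Psi$ (one common choice, not specified in the original): a signal with only a few nonzero DCT coefficients $\alpha$ is dense in the standard basis, yet $\alpha$ is recovered exactly by $\Psi^T x$ .

```python
import numpy as np
from scipy.fft import dct, idct

N, k = 64, 3
# Columns of Psi are orthonormal DCT basis vectors (assumed choice of Psi)
Psi = idct(np.eye(N), axis=0, norm="ortho")

alpha = np.zeros(N)
alpha[[2, 7, 19]] = [1.0, 0.5, -0.8]  # k-sparse coefficient vector
x = Psi @ alpha                        # x = Psi alpha is dense in the standard basis

print(np.count_nonzero(np.abs(x) > 1e-10))  # many more than k nonzeros
alpha_back = dct(x, norm="ortho")            # equals Psi^T x, recovers alpha
```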

sub module sparsity

A vector $$ x $$ is said to be $$ k $$ -sparse in $\mathbb{R}^N$ if it has at most $$ k $$ nonzero coefficients. The support of $$ x $$ is $supp(x) = \{i \in [N] : x_i \neq 0 \}$ , and $$ x $$ is a $$ k $$ -sparse signal when $|supp(x)| \leq k$ . The set of $$ k $$ -sparse vectors is denoted by $\Sigma_k = \{x \in \mathbb{R}^N : \|x\|_0 \leq k \}$ . There are $\binom{N}{k}$ possible supports of size exactly $$ k $$ . If the support of a random $$ k $$ -sparse $$ x $$ is drawn uniformly at random, its entropy is $\log \binom{N}{k}$ , which is approximately $k \log \frac{N}{k}$ ; that is, roughly $k \log \frac{N}{k}$ bits are required to encode which entries of an element of $\Sigma_k$ are nonzero ~cite(Measurements vs Bits).
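The definitions above, and the bounds $k \log_2 \frac{N}{k} \leq \log_2 \binom{N}{k} \leq k \log_2 \frac{eN}{k}$ behind the approximation, can be checked numerically. A sketch assuming NumPy; `support` and `is_k_sparse` are hypothetical helper names, and indices are 0-based.

```python
import math
import numpy as np

def support(x, tol=0.0):
    """supp(x) = {i : |x_i| > tol}, using 0-based indices."""
    return set(np.flatnonzero(np.abs(x) > tol).tolist())

def is_k_sparse(x, k):
    """True if x belongs to Sigma_k, i.e. has at most k nonzero entries."""
    return len(support(x)) <= k

x = np.array([0.0, 2.0, 0.0, -1.0, 0.0])
print(support(x), is_k_sparse(x, 2))  # {1, 3} True

# Number of possible supports of size k, and bits needed to index one of them
N, k = 1000, 10
n_supports = math.comb(N, k)
bits_exact = math.log2(n_supports)
bits_approx = k * math.log2(N / k)
print(bits_exact, bits_approx)
```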

The idea is to search for the sparsest $x \in \Sigma_k$ given the measurement vector $y \in \mathbb{R}^M$ and a sensing matrix $A \in \mathbb{R}^{M \times N}$ with $M \ll N$ . If the number of linear measurements is at least twice the sparsity of $$ x $$ , i.e., $M \geq 2k$ , and every set of $$ 2k $$ columns of $$ A $$ is linearly independent (which holds with probability one for a Gaussian $$ A $$ ), then there exists at most one signal $x \in \Sigma_k$ that satisfies the constraint $$ y = A x $$ , so the correct signal is recovered for any $x \in \Sigma_k$ [coluccia2015 book7]. Hence, the reconstruction problem can be formulated as an $$ l_0 $$ "norm" program.

$\hat{x} = \underset{x \in \Sigma_k}{arg min} \|x\|_0 \quad s.t. \quad y = A x$

This optimization problem minimizes the number of nonzero entries of $$ x $$ subject to the constraint $$ y = Ax $$ , that is, it finds the sparsest element in the affine space $\{ x \in \mathbb{R}^N : A x = y\}$ [2019 book33]. It is a combinatorial optimization problem and is NP-hard in general, since it amounts to searching over all $\binom{N}{k}$ possible supports of size $$ k $$ in $$ [N] $$ . Furthermore, if noise is present, the recovery is unstable [Baraniuk "compressed sensing"].

In other words, only the smallest support of $$ x $$ is of interest, i.e., we seek $$ \min |supp(x)| $$ .
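To make the combinatorial cost concrete, here is a brute-force sketch of the $$ l_0 $$ program, assuming NumPy; `l0_bruteforce` is a hypothetical name, and it is only usable for tiny $$ N $$ because it tries every support of size up to $$ k $$ . The example uses $M \geq 2k$ with a Gaussian matrix, so the sparsest exact fit is unique.

```python
import itertools
import numpy as np

def l0_bruteforce(Phi, y, k, tol=1e-9):
    """Try every support of size 1..k and return the first exact fit y = Phi x.

    Supports are tried in order of increasing size, so the result is a
    sparsest solution. The number of supports grows like C(N, k).
    """
    M, N = Phi.shape
    if np.linalg.norm(y) <= tol:
        return np.zeros(N)
    for s in range(1, k + 1):
        for S in itertools.combinations(range(N), s):
            cols = list(S)
            z, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
            # Accept the support if the least-squares fit is exact (relative tol)
            if np.linalg.norm(Phi[:, cols] @ z - y) <= tol * np.linalg.norm(y):
                x = np.zeros(N)
                x[cols] = z
                return x
    return None

rng = np.random.default_rng(1)
M, N, k = 6, 12, 2  # illustrative sizes with M >= 2k
Phi = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[[3, 8]] = [1.5, -2.0]
y = Phi @ x_true
x_hat = l0_bruteforce(Phi, y, k)
```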

The goal is to search for the sparsest $x \in \Sigma_k$ given the measurement $$ y $$ and the sensing matrix $\Phi$ .

This is equivalent to finding $\min \|x\|_0$ over the set $\{ x \in \mathbb{R}^N : \Phi x = y\}$ ; in other words, we are interested in the smallest $$ |supp(x)| $$ . This search problem can be formulated as an $\ell_0$ program.

sub module

Let $\Phi \in \mathbb{R}^{M \times N}$ satisfy RIP, and let $$ [N] $$ be an index set. For a subset $S \subseteq [N]$ , the restriction of $x \in \mathbb{R}^N$ to $$ S $$ is denoted by $x_{|S}$ , and $\Phi_{|S}$ denotes the submatrix of $\Phi$ formed by the columns indexed by $i \in S$ . For a $$ k $$ -sparse $\mathbf{x}$ , we take $S = supp(\mathbf{x})$ , so that $\Phi \mathbf{x} = \Phi_{|S} \mathbf{x}_{|S}$ .
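In NumPy (assumed here), the restriction notation corresponds to plain indexing; a minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 5, 10
Phi = rng.standard_normal((M, N))
x = np.zeros(N)
x[[1, 4, 7]] = [0.3, -1.2, 2.0]

S = np.flatnonzero(x)  # S = supp(x), 0-based indices
x_S = x[S]             # restriction x_{|S}
Phi_S = Phi[:, S]      # submatrix of the columns of Phi indexed by S

# Only the columns in S contribute to the measurements
print(np.allclose(Phi_S @ x_S, Phi @ x))  # True
```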

In search of a unique solution, we have the following optimization problem, where $\| x \|_0 = |supp(x)|$ .

sub module zero norm program

$\mathbf{\hat{x}} = \underset{\mathbf{x} \in \Sigma_k}{arg min} \| \mathbf{x}\|_0 \quad s.t. \quad \mathbf{y} = \Phi \mathbf{x}$ , which is a combinatorial, NP-hard problem. Moreover, if noise is present, the recovery is not stable [Baraniuk "compressed sensing"].

Restricted Isometry Property (RIP)

A matrix $$ A $$ is said to satisfy the RIP of order $$ k $$ if there exists a $\delta_k \in [0, 1)$ such that the following inequality holds for all $x \in \Sigma_k$ . The restricted isometry constant (RIC) of $$ A $$ is the smallest such $\delta_k$ [2019 book38, coluccia2015 book7].

$(1 - \delta_k) \| x \|_2 ^2 \leq \| A x \|_2^2 \leq (1 + \delta_k) \| x \|_2 ^2$
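Because the RIC is a maximum over all $\binom{N}{k}$ supports, it can only be computed exactly for tiny matrices. A brute-force sketch, assuming NumPy; `ric` is a hypothetical helper, and the sizes and the normalization $A_{ij} \sim \mathcal{N}(0, 1/M)$ are illustrative.

```python
import itertools
import numpy as np

def ric(A, k):
    """Exact restricted isometry constant delta_k by enumerating all supports.

    For each support S with |S| = k, the extreme eigenvalues of A_S^T A_S give
    the tightest constants for vectors supported on S (smaller supports are
    covered by eigenvalue interlacing).
    """
    N = A.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(N), k):
        cols = list(S)
        eig = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])  # ascending order
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta

rng = np.random.default_rng(3)
M, N, k = 20, 30, 2
A = rng.standard_normal((M, N)) / np.sqrt(M)  # columns have unit norm on average
delta_k = ric(A, k)

# Check the RIP inequality on a random k-sparse vector
x = np.zeros(N)
x[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
nx, nAx = np.linalg.norm(x) ** 2, np.linalg.norm(A @ x) ** 2
print(delta_k, (1 - delta_k) * nx <= nAx <= (1 + delta_k) * nx)
```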

Under projection through the matrix $$ A $$ , the restricted isometry property ensures that distinct $$ k $$ -sparse vectors have distinct measurement vectors $$ y $$ . If $$ A $$ satisfies the RIP of order $$ 2k $$ , then $$ A $$ does not send two distinct $$ k $$ -sparse vectors $x \in \Sigma_k$ to the same measurement vector $$ y $$ , so $$ x $$ is the unique sparse solution under the RIP.

If the matrix $A \in \mathbb{R}^{M \times N}$ satisfies the RIP of order $$ 2k $$ with constant $\delta_{2k} \in [0,1)$ , then for any two distinct $$ k $$ -sparse vectors $$ x $$ and $$ x' $$ , their difference $$ x - x' $$ lies in $\Sigma_{2k}$ , so $\| A (x - x') \|_2^2 \geq (1 - \delta_{2k}) \| x - x' \|_2^2 > 0$ and hence $A x \neq A x'$ . In other words, if $$ A $$ is a $$ 2k $$ -order RIP matrix, no two $$ k $$ -sparse vectors are mapped to the same measurement vector $$ y $$ by $$ A $$ . When working with sparse vectors, the RIP ensures that the columns of $$ A $$ are nearly orthonormal. Furthermore, $$ A $$ is approximately norm-preserving: it approximately preserves distances between $$ k $$ -sparse signals, and the approximation tightens as $\delta_k$ approaches zero. [Candes, Romberg, Tao[4]] demonstrate that if $$ x $$ is $$ k $$ -sparse and $$ A $$ satisfies the RIP of order $$ 2k $$ with RIP constant $\delta_{2k} < \sqrt{2} - 1$ , then the $$ l_1 $$ program gives the unique sparse solution. That is, the convex $$ l_1 $$ optimization problem has the same solution as the $$ l_0 $$ program and can be solved as a linear program [2019 book38, coluccia2015 book7].

$\hat{x} = \underset{x \in \mathbb{R}^N}{arg min} \|x\|_1 \quad s.t. \quad y = A x$
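One standard way to see that the $\ell_1$ program is a linear program is to split $x = u - v$ with $u, v \geq 0$ . At an optimum, $u_i v_i = 0$ for every $i$ (otherwise both could be decreased by the same amount, lowering the objective without changing $A(u - v)$ ), so $\mathbf{1}^T (u + v) = \|x\|_1$ :

```latex
\min_{u, v \in \mathbb{R}^N} \ \mathbf{1}^T u + \mathbf{1}^T v
\quad \text{s.t.} \quad A (u - v) = y, \quad u \geq 0, \ v \geq 0,
\qquad \hat{x} = u^\star - v^\star
```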


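RIP defined in the $\Phi$ notation:

$\Phi$ satisfies the RIP of order $$ k $$ if there exists $\delta_k \in [0, 1)$ such that $(1 - \delta_k) \| x \|_2 ^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta_k) \| x \|_2 ^2$ for all $x \in \Sigma_k$ .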


sub module RIP matrices

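For a random sensing matrix $\Phi$ (e.g., Gaussian or Bernoulli) to satisfy the RIP of order $$ k $$ with high probability, on the order of $M = \mathcal{O}(k \log(N/k))$ measurements are required, and this number suffices to recover $$ x $$ with high probability.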
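As a rough worked example (the constant $C$ depends on the matrix ensemble and the desired success probability, and is left unspecified here):

```latex
M \geq C \, k \ln\frac{N}{k}, \qquad
N = 10^6, \ k = 100: \quad k \ln\frac{N}{k} = 100 \ln\left(10^4\right) \approx 921
```

so the number of measurements scales like $C \cdot 921$ rather than with $N = 10^6$ .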

sub module RIC

The restricted isometry constant (RIC) is the smallest such constant: $\delta_k = \min \{\delta \geq 0 : (1 - \delta) \| x \|_2 ^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta) \| x \|_2 ^2 \ \ \forall x \in \Sigma_k\}$

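If $M \geq 2k$ , i.e., the number of measurements is at least twice the sparsity, and every $$ 2k $$ columns of $\Phi$ are linearly independent, then there exists at most one $$ k $$ -sparse $$ x $$ such that $y = \Phi x$ .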

If $$ x $$ is $$ k $$ -sparse and $\Phi$ satisfies the RIP of order $$ 2k $$ with RIC $\delta_{2k} < \sqrt{2} - 1$ , then the $$ l_0 $$ program can be relaxed to the convex $\ell_1$ program without changing its solution.

If $\Phi$ satisfies RIP and $\mathbf{x}$ is sparse, the $\ell_0$ program gives a sparse and unique solution. It is equivalent to the following $\ell_1$ convex optimization problem, which can be solved by linear programming.

sub problem 1 norm program

From the results of Candès, Romberg, Tao, and Donoho:

$\mathbf{\hat{x}} = \underset{\mathbf{x} \in \mathbb{R}^N}{arg min} \| \mathbf{x}\|_1 \quad s.t. \quad \mathbf{y} = \Phi \mathbf{x}$

Theory

Two things need to be considered when recovering $$ x $$

(1) The design of the sensing matrix

(2) The recovery algorithm

Sensing Matrix

Checking whether $\Phi$ satisfies the RIP is combinatorially hard in general, so it is unreasonable to ask a computer to verify that a given matrix satisfies the RIP. To get around this problem, we rely on an understanding of which classes of matrices satisfy the RIP, and hence recover $$ x $$ , with high probability.

Random Sensing matrices: Gaussian, Bernoulli, Rademacher
Deterministic Sensing Matrices: binary, bipolar, ternary, Vandermonde
Structural Sensing Matrices: Toeplitz, Circulant, Hadamard
Optimized Sensing Matrices: (Parkale, Nalbalwar, Sensing Matrices in Compressed Sensing)

These are some examples. Different sensing matrices are better suited to different problems, but in general, alternatives to the Gaussian matrix are often preferred because they reduce the computational complexity.
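A minimal sketch of three of the ensembles listed above, assuming NumPy and SciPy (`scipy.linalg.circulant`); the $1/\sqrt{M}$ scaling, which gives columns of unit norm on average, and the random row selection for the circulant matrix are illustrative choices. A circulant matrix is generated by a single vector and can be applied with an FFT in $\mathcal{O}(N \log N)$ time, which is one reason structured matrices reduce computation.

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(4)
M, N = 32, 128

# Random Gaussian: i.i.d. N(0, 1/M) entries
Phi_gauss = rng.standard_normal((M, N)) / np.sqrt(M)

# Random Bernoulli / Rademacher: i.i.d. +-1/sqrt(M) entries
Phi_bern = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)

# Partial random circulant: M randomly chosen rows of the circulant matrix
# generated by one random +-1 vector (only N random numbers are needed)
c = rng.choice([-1.0, 1.0], size=N)
rows = np.sort(rng.choice(N, size=M, replace=False))
Phi_circ = circulant(c)[rows] / np.sqrt(M)
```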

Verification of the Sensing matrix

Definition Mutual Coherence

Let $\Phi \in \mathbb{R}^{M \times N}$ have columns $a_1, \dots, a_N$ . The mutual coherence $\mu_\Phi$ is defined by:

$\mu_{\Phi} = \max_{i \neq j} \frac{| \langle a_i, a_j \rangle |}{ \| a_i \|_2 \| a_j \|_2}$ ^[1]


For a matrix with unit-norm columns and any $x \in \Sigma_k$ :

$(1 - (k-1)\mu) \| x \|_2 ^2 \leq \| \Phi x \|_2^2 \leq (1 + (k-1)\mu) \| x \|_2 ^2$

Welch bound: for any $\Phi \in \mathbb{R}^{M \times N}$ with $N > M$ , $\mu_\Phi \geq \sqrt{\frac{N - M}{M(N-1)}}$ ^[1]. We want $\mu_{\Phi}$ to be small, because then the columns of $\Phi$ are nearly orthogonal, which is what the RIP requires. Also, $\mu_{\Phi}$ will be needed for the step size in the IHT algorithm that follows.
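A sketch of computing $\mu_\Phi$ and comparing it against the Welch bound, assuming NumPy; `mutual_coherence` and `welch_bound` are hypothetical helper names, and the sizes are illustrative.

```python
import numpy as np

def mutual_coherence(Phi):
    """Largest absolute normalized inner product between distinct columns."""
    G = Phi / np.linalg.norm(Phi, axis=0)  # normalize each column
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)             # exclude i == j
    return gram.max()

def welch_bound(M, N):
    """Lower bound on the coherence of any M x N matrix with N > M."""
    return np.sqrt((N - M) / (M * (N - 1)))

rng = np.random.default_rng(5)
M, N = 20, 50
Phi = rng.standard_normal((M, N))
mu = mutual_coherence(Phi)
print(mu, welch_bound(M, N))
```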

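This connects coherence to the RIP and the RIC: for unit-norm columns, the Gershgorin circle theorem places every eigenvalue of $\Phi_{|S}^T \Phi_{|S}$ with $|S| = k$ within $(k-1)\mu_\Phi$ of 1, so $\Phi$ satisfies the RIP of order $$ k $$ with $\delta_k \leq (k-1)\mu_\Phi$ whenever $(k-1)\mu_\Phi < 1$ .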
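A numerical sanity check (not a proof) of $\delta_k \leq (k-1)\mu_\Phi$ , assuming NumPy; the exact $\delta_k$ is obtained by enumerating every support, so the sizes are kept tiny and are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
M, N, k = 20, 40, 3
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)     # unit-norm columns

gram = Phi.T @ Phi
mu = np.max(np.abs(gram - np.eye(N)))  # mutual coherence

# Exact delta_k by enumerating all supports of size k (tiny sizes only)
delta_k = 0.0
for S in itertools.combinations(range(N), k):
    eig = np.linalg.eigvalsh(gram[np.ix_(S, S)])
    delta_k = max(delta_k, 1.0 - eig[0], eig[-1] - 1.0)

print(delta_k, (k - 1) * mu)  # delta_k should not exceed (k - 1) * mu
```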


Algorithms

The three main groups of algorithms are:^[2]

Optimization methods: include $\ell_1$ minimization, i.e., basis pursuit, and quadratically constrained $\ell_1$ minimization, i.e., basis pursuit denoising.

Greedy methods: include Orthogonal Matching Pursuit (OMP) and Compressive Sampling Matching Pursuit (CoSaMP).

Thresholding-based methods: such as Iterative Hard Thresholding (IHT), Iterative Soft Thresholding, Approximate IHT or AM-IHT, and many more.

More cutting-edge methods include dynamic programming.
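As an illustration of the greedy family, a minimal Orthogonal Matching Pursuit sketch, assuming NumPy; `omp` is a hypothetical name and the sizes are illustrative. Each iteration adds the column most correlated with the current residual and re-fits by least squares on the selected support.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit for a k-sparse approximation of y = Phi x."""
    N = Phi.shape[1]
    support = []
    residual = y.copy()
    z = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the selected columns; residual becomes orthogonal to them
        z, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ z
    x = np.zeros(N)
    x[support] = z
    return x

rng = np.random.default_rng(7)
M, N, k = 80, 128, 3
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[[10, 45, 80]] = [1.0, -1.0, 1.0]
y = Phi @ x_true
x_hat = omp(Phi, y, k)
```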

We cover one of them: IHT. Why IHT? Basis pursuit and matching-pursuit-type algorithms can be viewed as belonging to a more general class of iterative thresholding algorithms.^[3] So IHT is a natural place to start. Moreover, if $\Phi$ satisfies a suitable RIP condition, IHT converges quickly.

Algorithm IHT

The $\ell_1$ convex program mentioned in the introduction has a closely related unconstrained (penalized) form.

The $\ell_0$-penalized problem is $\underset{\mathbf{x}}{min} \| \mathbf{y} - \Phi \mathbf{x} \|_2^2 + \lambda \| \mathbf{x} \|_0$ (cite IT for sparse approximations), and its $\ell_1$ counterpart is $\hat{\mathbf{x}} = \underset{\mathbf{x}}{arg min} \frac{1}{M} \| \mathbf{y} - \Phi \mathbf{x}\|_2^2 + \lambda \| \mathbf{x}\|_1$ ^[1]. In statistics, $$ l_1 $$ regularization is called the LASSO, with $\lambda$ as the regularization parameter. It is the closest convex relaxation of the $$ l_0 $$ program, the first program mentioned in the introduction [The Benefit of Group Sparsity].
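A minimal sketch of Iterative Soft Thresholding (ISTA) for the LASSO, assuming NumPy. It minimizes $\frac{1}{2}\|\mathbf{y} - \Phi\mathbf{x}\|_2^2 + \lambda\|\mathbf{x}\|_1$ ; this scaling of the data term differs from the normalized one in the LASSO objective above, which only rescales $\lambda$ . The step size $1/L$ with $L = \|\Phi\|_2^2$ guarantees that the objective never increases. `ista`, `soft_threshold`, and the parameter values are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Iterative Soft Thresholding for 0.5*||y - Phi x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    history = []
    for _ in range(n_iter):
        grad = -Phi.T @ (y - Phi @ x)
        x = soft_threshold(x - grad / L, lam / L)
        history.append(0.5 * np.sum((y - Phi @ x) ** 2) + lam * np.sum(np.abs(x)))
    return x, history

rng = np.random.default_rng(8)
M, N = 50, 120
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[[4, 33, 90]] = [2.0, -1.5, 1.0]
y = Phi @ x_true + 0.01 * rng.standard_normal(M)
x_hat, history = ista(Phi, y, lam=0.05)
```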

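With $f(\mathbf{x}) = \frac{1}{2} \| \mathbf{y} - \Phi \mathbf{x} \|_2^2$ , the gradient at the current iterate is $\mathbf{z}^{(n)} = \nabla f(\mathbf{x}^{(n)}) = - \Phi^T( \mathbf{y} - \Phi \mathbf{x}^{(n)})$ . Then $\mathbf{x}^{(n+1)} = \mathcal{H}_k\left( \mathbf{x}^{(n)} - \tau \mathbf{z}^{(n)}\right)$ , where $\tau > 0$ is the step size.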

sub module

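Define the hard thresholding operator as $\mathcal{H}_k[\mathbf{x}] = \underset{\mathbf{z} \in \Sigma_k}{argmin} \| \mathbf{x} - \mathbf{z}\|_2$ . It selects the best $$ k $$ -term approximation of $\mathbf{x}$ , i.e., it keeps the $$ k $$ entries of largest magnitude and sets the rest to zero.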
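A direct implementation of $\mathcal{H}_k$ , assuming NumPy; `hard_threshold` is a hypothetical name.

```python
import numpy as np

def hard_threshold(x, k):
    """H_k[x]: keep the k largest-magnitude entries of x, zero the rest.

    This is the best k-term approximation argmin_{z in Sigma_k} ||x - z||_2.
    Ties are broken arbitrarily by argpartition.
    """
    z = np.zeros_like(x)
    k = min(k, x.size)
    if k <= 0:
        return z
    idx = np.argpartition(np.abs(x), -k)[-k:]
    z[idx] = x[idx]
    return z

print(hard_threshold(np.array([3.0, -5.0, 1.0, 0.5]), 2))  # [ 3. -5.  0.  0.]
```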

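The stopping criterion is $\| \mathbf{y} - \Phi \mathbf{x}^{(n)}\|_2 \leq \epsilon$ . If the RIC satisfies $\delta_{3k} < \frac{1}{\sqrt{32}}$ , IHT is guaranteed to recover an approximation of $\mathbf{x}$ whose error is bounded in terms of the noise level and how far $\mathbf{x}$ is from being $$ k $$ -sparse ^[4].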

Input: $\Phi$ , $\mathbf{y}$ with $\mathbf{y} = \Phi \mathbf{x} + \mathbf{e}$ , and the sparsity level $$ k $$
Output: $IHT(\mathbf{y}, \Phi, k)$
Set $\mathbf{x}^{(0)} = \mathbf{0}$ and $n = 0$
While stopping criterion false do

- $\mathbf{x}^{(n+1)} \leftarrow \mathcal{H}_k \left[ \mathbf{x}^{(n)} + \Phi^* (\mathbf{y} - \Phi \mathbf{x}^{(n)}) \right]$

- $n \leftarrow n + 1$
end while
return: $IHT(\mathbf{y}, \Phi, k) \leftarrow \mathbf{x}^{(n)}$

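$\Phi^*$ denotes the adjoint of $\Phi$ , i.e., its conjugate transpose; for a real matrix, $\Phi^* = \Phi^T$ . (This is not the adjugate, which is the transpose of the cofactor matrix.)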
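A runnable sketch of the loop above, assuming NumPy. One deliberate change: the pseudocode uses a unit step, which implicitly assumes $\|\Phi\|_2 < 1$ ; here the step is $\mu = 0.99 / \|\Phi\|_2^2$ , which is equivalent to rescaling $\Phi$ and keeps the residual from increasing. `iht`, `hard_threshold`, the sizes, and the tolerance are illustrative.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x (best k-term approximation)."""
    z = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    z[idx] = x[idx]
    return z

def iht(Phi, y, k, n_iter=500, eps=1e-10):
    """Iterative Hard Thresholding with fixed step mu = 0.99 / ||Phi||_2^2.

    Scaling the step this way is equivalent to rescaling Phi so that
    ||Phi||_2 < 1, which keeps ||y - Phi x||_2 nonincreasing.
    """
    mu = 0.99 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    residuals = [np.linalg.norm(y)]
    for _ in range(n_iter):
        # Gradient step on 0.5*||y - Phi x||^2, then projection onto Sigma_k
        x = hard_threshold(x + mu * Phi.T @ (y - Phi @ x), k)
        residuals.append(np.linalg.norm(y - Phi @ x))
        if residuals[-1] <= eps:  # stopping criterion ||y - Phi x||_2 <= eps
            break
    return x, residuals

rng = np.random.default_rng(9)
M, N, k = 120, 200, 3
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[[7, 64, 150]] = [1.0, -1.0, 1.0]
y = Phi @ x_true
x_hat, residuals = iht(Phi, y, k)
```

With these illustrative sizes ( $k = 3$ , $M = 120$ ) the iterates typically settle on the true support within a few hundred iterations.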

Numerical Example

Applications and Motivations

Low-Rank Matrices

The Netflix Prize is a well-known example of low-rank matrix recovery, also known as the matrix completion problem. The approach fills in the missing values in a user's ratings for movies that the user has not seen. If a matrix is created with all the users as rows and the movie titles as columns, these estimates are based on the ratings of other users with similar tastes. Because some users' interests are similar and therefore overlap, the degrees of freedom can be reduced significantly. This low-rank structure is frequently assumed in the problem domain of collaborative filtering ~cited citations.

Dictionary Learning

The goal in dictionary learning is to infer the original dictionary as accurately as possible. Instead of using a predefined dictionary, researchers have found that learning the dictionary by extracting "dynamic features" from training data often yields better representations. For example, biometric features can be taken from video clips of each subject in a dataset and used to populate the dictionary's columns. Random projections and sparse representations have been proposed for iris detection in noncontact, biometrics-based authentication systems using video samples ~cited citations.

Single-pixel cameras

Single-pixel cameras, or single-detector imaging systems, are used when detectors are prohibitively expensive or difficult to miniaturize. The camera uses a microarray made up of a large number of miniature mirrors that can be individually turned on and off. This microarray reflects the light from the scene, and a lens combines all of the reflected beams onto one sensor, the single detector of the camera, which captures the measurements used to form the image. The most important component of the single-pixel camera is the mechanism behind the random sampling, which results in low coherence between the measurements ~cited citations.

Conclusion

References

^[5] ^[6] ^[7] ^[8] ^[9] ^[10]

[1] Reference text not provided in the source (ref named :1).
[2] Reference text not provided in the source (ref named :0).
[3] Reference text not provided in the source (ref named :4).
[4] Reference text not provided in the source (ref named :2).
[5] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006, doi: 10.1109/tit.2006.871582.
[6] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, 2005, doi: 10.1109/TIT.2005.858979.
[7] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006, doi: 10.1109/tit.2006.871582.
[8] T. Blumensath and M. E. Davies, “Iterative Hard Thresholding for Compressed Sensing,” May 2008.
[9] S. Foucart and H. Rauhut, A mathematical introduction to compressive sensing. New York [u.a.]: Birkhäuser, 2013.
[10] R. G. Baraniuk, “Compressive Sensing [Lecture Notes],” IEEE Signal Processing Magazine, vol. 24, no. 4, 2007, doi: 10.1109/MSP.2007.4286571.
