Evan Chou Math: Information Theory of Compressed Sensing, Compressible Signals

Now we want to know about signals that are not $k$-sparse, but are compressible: if the coefficients of the signal $x$ are arranged in decreasing order of magnitude $x_{(n)}$, then they exhibit some power law decay:
\[ |x_{(n)}| \lesssim n^{-1/p} \]

We will consider the case $p=1$ to fix ideas, and call this space $l^{1,w}$ (weak $l^1$).

A first fact about compressible signals is that the best $k$-term approximation in the $l^2$ norm, i.e. the best approximation of $x \in l^{1,w}$ by an element of $\Sigma_k$, is obtained by simply keeping the top $k$ terms. For example, take $x \in l^{1,w}$ with $|x_{(n)}| \leq Cn^{-1}$ (the case $p=1$). Then if $x^{(k)}$ is formed from the top $k$ terms of $x$, we have
\[ \| x - x^{(k)} \|_2 \leq \Big( \sum_{n > k} C^2 n^{-2} \Big)^{1/2} \leq C k^{-1/2} \]
since $\sum_{n>k} n^{-2} \leq \int_k^\infty t^{-2}\,dt = 1/k$.
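A quick numerical sanity check of the $Ck^{-1/2}$ bound, as a minimal sketch: the helper name `k_term_error` and the truncation of the extremal sequence $x_n = C/n$ to 10000 terms are assumptions made here for illustration.

```python
import math

def k_term_error(x, k):
    """l2 error left after keeping the k largest-magnitude entries of x."""
    tail = sorted((abs(v) for v in x), reverse=True)[k:]
    return math.sqrt(sum(v * v for v in tail))

C = 1.0
# Extremal weak-l1 sequence |x_(n)| = C/n, truncated to finitely many terms.
x = [C / n for n in range(1, 10001)]

for k in (1, 10, 100, 1000):
    err = k_term_error(x, k)
    bound = C / math.sqrt(k)
    print(k, err, bound)  # err should never exceed bound
```

The truncation only makes the tail smaller, so the check is consistent with (not a proof of) the bound.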

[First, some musings: If we introduce the quantity $|x| := \sup_j j |x_{(j)}|$, the rearrangement destroys the triangle inequality: for instance $|(1,1/2,1/3)| = |(1/3,1/2,1)| = 1$, but their sum has $|(4/3,1,4/3)| = 3 > 2$. Without rearranging terms, it is a norm.]
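The failure of the triangle inequality can be checked exactly with rational arithmetic. This is a small sketch; the function name `weak_l1` is chosen here and is not from the post. The sum of the two vectors is $(4/3, 1, 4/3)$, whose value under the rearranged quantity is 3.

```python
from fractions import Fraction as F

def weak_l1(x):
    """sup_j j * |x_(j)|, where x_(j) is the j-th largest entry in magnitude."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return max(j * m for j, m in enumerate(mags, start=1))

a = (F(1), F(1, 2), F(1, 3))
b = (F(1, 3), F(1, 2), F(1))
s = tuple(p + q for p, q in zip(a, b))  # (4/3, 1, 4/3)

# The two summands each have value 1, but the sum has value 3 > 1 + 1.
print(weak_l1(a), weak_l1(b), weak_l1(s))
```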

Directions... The previous quantization result does not work well for compressible signals, mainly because it focuses on the best $k$-term approximation, and the scheme cannot affect the best $k$-term error even when more measurements are used. This is a deficiency with this particular encoding...

From an information-theoretic point of view, a first question we can ask is, what is the best way to encode/compress this class of signals? Given a fixed bit budget, what is the encoding that achieves the best distortion? Alternatively, given a fixed distortion target, what is the number of bits needed to represent the entire class of signals? This can be answered by studying minimal $\epsilon$-nets, Kolmogorov entropy...
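For reference, one standard way to make this precise (the notation $N_\epsilon$, $H_\epsilon$ and $B(x_i,\epsilon)$ for the closed $\epsilon$-ball is assumed here, not taken from the post): for a compact signal class $X$,
\[ N_\epsilon(X) := \min\Big\{ N : X \subset \bigcup_{i=1}^N B(x_i,\epsilon) \Big\}, \qquad H_\epsilon(X) := \log_2 N_\epsilon(X) \]
so that $\lceil H_\epsilon(X) \rceil$ bits suffice to encode every $x \in X$ to within distortion $\epsilon$ (send the index of a ball containing $x$), and no fixed-length code with fewer bits can do so.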

The encoding given by entropy considerations is non-constructive...

A further question is then, given a particular method of encoding, what are the theoretical limits? Sigma-like schemes on compressed sensing measurements...

A toy problem... 3 dimensions... $(x_1,x_2,x_3)$ where one is large, the second is medium size, and the third is small. Specifically, $|x_{(1)}| \leq 1$, $|x_{(2)}| \leq |x_{(1)}| / 2$, and $|x_{(3)}| \leq 2|x_{(2)}| / 3$. How to compress such a space?
Some quick musings: let's compare the volume of this space to the cube ($[-1,1]^3$, volume 8). Focusing on the region $x_1 > x_2 > x_3 > 0$, the intersection with the space in question is the convex hull of $(0,0,0)$, $(1,0,0)$, $(1,1/2,0)$ and $(1,1/2,1/3)$. The volume can be computed using the pyramid formula 1/3 * base * height: the base is the triangle with area 1/2*1*1/2 and the height is 1/3, so the volume is 1/3*(1/2*1/2*1/3) = 1/36. By symmetry the volume is the same in each of the 6*8 = 48 regions (6 orderings of the coordinates, 8 sign patterns), so the total volume of the space is 48/36 = 8/6. So we are dealing with a space that is 6 times smaller in volume.
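The value 8/6 can be cross-checked by Monte Carlo sampling of the cube. This is a rough sketch, with the function names `in_toy_set` and `estimate_volume`, the sample count, and the seed all chosen here for illustration.

```python
import random

def in_toy_set(x):
    """Toy set: sorted magnitudes a >= b >= c with a <= 1, b <= a/2, c <= 2b/3."""
    a, b, c = sorted((abs(v) for v in x), reverse=True)
    return a <= 1 and b <= a / 2 and c <= 2 * b / 3

def estimate_volume(samples=200_000, seed=0):
    """Estimate the volume as 8 * (fraction of uniform cube samples in the set)."""
    rng = random.Random(seed)
    hits = sum(
        in_toy_set([rng.uniform(-1, 1) for _ in range(3)])
        for _ in range(samples)
    )
    return 8 * hits / samples

print(estimate_volume())  # should land near 8/6 = 1.333...
```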

How about $n$ dimensions? We need the volume of the convex hull of $(0,\ldots,0)$, $(1,0,\ldots,0)$, $(1,1/2,0,\ldots,0)$, $\ldots$, $(1,1/2,1/3,\ldots,1/n)$. (If the recursive formula $1/n * V_{n-1} * h_n$ continues to hold for "pyramid-like" structures, then each piece has volume $1/(n!)^2$ and the total is $1/(n!)^2 * n! * 2^n = 2^n/n!$; compared to the full hypercube, this is $n!$ times smaller. Need to check....)
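One way to check this: the volume of a simplex with one vertex at the origin is $|\det|/n!$, where the determinant is taken of the matrix whose rows are the other vertices. Here that matrix is lower triangular with diagonal $1, 1/2, \ldots, 1/n$. The sketch below (function names are assumptions made here) evaluates this exactly with rationals.

```python
from fractions import Fraction
from math import factorial

def simplex_volume(n):
    """Volume of the hull of 0 and v_j = (1, 1/2, ..., 1/j, 0, ..., 0), j = 1..n."""
    # Rows v_1..v_n form a lower-triangular matrix, so the determinant is
    # the product of the diagonal entries 1, 1/2, ..., 1/n.
    det = Fraction(1)
    for j in range(1, n + 1):
        det *= Fraction(1, j)
    return det / factorial(n)  # simplex volume = |det| / n!

def total_volume(n):
    """Sum over the n! coordinate orderings and 2^n sign patterns."""
    return factorial(n) * 2**n * simplex_volume(n)

for n in range(1, 6):
    print(n, simplex_volume(n), total_volume(n), total_volume(n) / 2**n)
```

Under this formula each piece has volume $1/(n!)^2$, and the ratio to the hypercube volume $2^n$ is $1/n!$.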

This space is different than the space $(x_1,x_2,x_3)$ where with rearrangement, $|x_{(j)}| \leq 1/j$ for all $j$... but this doesn't quite capture what we want in this small dimensional example, since we want a dominant component and some sort of tail, and there are only 3 spots. Maybe for later.

Toy Problem Compression
Also to investigate, how can we compress the toy problem with linear measurements? First, let's look at something naive:
If we are given a target distortion $D$, let's take all three coordinates and just round to the nearest $2D/\sqrt{3}$ spaced point (introduces $2$-norm distortion of $D$). Then we need $\log(\sqrt{3}/D)$ bits per coordinate, so we need $3 \log(\sqrt{3}/D)$ bits total. In terms of number of quantization points, this is $(\sqrt{3}/D)^3$, which matches compression bounds for the full cube $[-1,1]^3$ (actually the lower bound has a constant $c=3$ instead of $\sqrt{3}$ that is independent of dimension). In the toy problem, we should be able to reduce the number of quantization points by a factor of 6, roughly speaking.

Let $D' = 2D/\sqrt{3}$ be the spacing of points per coordinate (to achieve distortion of $D$ in the 2-norm). One adaptive strategy is simply to allocate $\log(2/D')$ bits for the largest coordinate, $\log(1/D')$ bits for the middle coordinate, and $\log(2/(3D'))$ bits for the smallest (corresponding to ranges $[-1,1]$, $[-1/2,1/2]$ and $[-1/3,1/3]$), in addition to storing the order information (one of six possibilities, so we can use three bits, say). Note that this scheme uses $3 + 3 \log(\sqrt{3}/D) - \log 6$ bits, and if translated to the number of quantization points, it becomes $(\sqrt{3}/D)^3 * (8/6)$, which is actually worse. (It is worse only because of integrality issues, since we are wasteful in using 3 bits for 6 possibilities for the order; otherwise we would be even with the simpler method above of just using $[-1,1]$ for all three and not recording order.)
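The point counts of the two schemes can be compared directly. This is a small sketch that ignores rounding of the per-coordinate bit counts to integers; the function names and the `order_bits` parameter are introduced here for illustration.

```python
import math

def naive_points(D):
    """Uniform grid with spacing 2D/sqrt(3) on [-1,1]^3."""
    return (math.sqrt(3) / D) ** 3

def adaptive_points(D, order_bits=3.0):
    """Ranges of width 2, 1, 2/3 for the sorted coordinates, plus order bits."""
    spacing = 2 * D / math.sqrt(3)
    bits = order_bits + sum(
        math.log2(width / spacing) for width in (2.0, 1.0, 2.0 / 3.0)
    )
    return 2.0 ** bits

D = 0.01
print(adaptive_points(D) / naive_points(D))                # 3 order bits
print(adaptive_points(D, math.log2(6)) / naive_points(D))  # log2(6) order bits
```

With 3 order bits the ratio is 8/6; with the ideal $\log_2 6$ bits the two schemes use the same number of points.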

Here's something fun: pictorially, it's easy to see how to recover one-sparse vectors in 3 dimensions with two measurements:

[Figure: 2d drawing of the 3d coordinate axes (taken from here)]

Of course, just any drawing of a 3d axis shows plainly how this would work. The 2d drawing is a 2d projection (i.e. 2 linear measurements), and every 1-sparse vector (a point on the axes) corresponds to a unique point in this 2d projection.

For our toy model, it would look more like...

The dotted lines enclose one piece of the space and we can already see that in this 2d projection there are many potential points that correspond to the same point in the image. In particular, there does not seem to be a way to obtain a $D$-distortion code for small $D$ for these measurements. Writing $X$ for the toy signal set, the smallest $D$ for which we can code $x$ using 2 measurements $\Phi x$ with $\Phi = \begin{pmatrix} u^T \\ v^T \end{pmatrix}$ for two (linearly independent) unit vectors $u,v$ is given by $\sup_{y\in \Phi(X)} {\rm diam}(\Phi^{-1}(y) \cap X)/2$. Each fiber $\Phi^{-1}(y)$ is a line, so the best reconstruction point is the midpoint of the extreme points of $\Phi^{-1}(y) \cap X$.

I wonder what the best angle to "project" this picture is. For instance, if we just project to the x-y plane, the thickness would be $2$, and it would be a very lousy detector for signals concentrated along the z axis. How can I figure this out?

This does not include an additional quantization step needed to compress to a specified bit budget.

Evan Chou Math

Friday, August 26, 2011
