Evan Chou Math (last updated 2014-10-06): a general math blog for whatever comes up.
Since I haven't been using it for research, I've converted this to a general blog with very sparse updates (whenever I have something to say).

<h2> <span style="font-size: large;"> Liar's Guessing Game and the De Bruijn Sequence (2012-07-24)</span></h2>This post concerns the first part of IMO 2012 Problem 3 (a polymath problem), which is about a liar's guessing game.<br /><br /><a href="http://michaelnielsen.org/polymath1/index.php?title=Imo_2012">http://michaelnielsen.org/polymath1/index.php?title=Imo_2012</a><br /><br />A quick summary of the problem (already reduced):<br /><br />$k$ is a fixed positive integer. There are two players: player 1 chooses a number between 1 and $2^k+1$, and player 2 can ask yes/no questions of the sort, "Is it in the set {... } ?". Player 1 is allowed to lie at most $k$ times in a row (that is, in any string of $k+1$ consecutive answers, at least one must be truthful). 
The goal of player 2 is to narrow down the possibilities of player 1's number to a set of size $2^k$, using any number of questions.<br />Part 1 boils down to showing that player 2 can set up a sequence of questions so that after player 1 answers, player 2 can throw out one of the numbers (which, if it were player 1's number, would have been lied about $k+1$ times in a row).<br /><br />I wanted to point out a pretty connection to the <a href="http://en.wikipedia.org/wiki/De_Bruijn_sequence">De Bruijn sequence</a>.<br /><br /><h2> <span style="font-size: large;"> The Liar Graph</span></h2>First, let's introduce a structure which we can call the 'liar graph':<br /><div><br /></div><div>Each vertex of the graph corresponds to either the set of numbers asked about in a particular question, or the complement of that set.</div><div><br /></div>$k$=1 example (3 numbers):<br /><br />Suppose my set of questions were:<br /><div><br /><span style="font-family: 'Courier New', Courier, monospace;">q1: {1} {2,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;">q2: {1} {2,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;">q3: {2} {1,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;">q4: {2} {1,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;">q5: {3} {1,2}</span><br /><span style="font-family: 'Courier New', Courier, monospace;">q6: {3} {1,2}</span></div><div><br /></div><div>(Asked "is it x?" two times for each x.)<br /><br />There are 12 nodes in the graph.<br /><br /><div>If player 1 answers 'yes' on q1, then he is lying if he picked {2,3}.<br />If 'no' on q1, then he is lying if he picked {1}.<br /><br />We are trying to catch him lying twice. 
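In fact, these six questions already suffice to catch a double lie, and that can be checked by brute force before any graph machinery; a quick sketch (here `True` stands for a 'yes' answer, and a lie means the answer is inconsistent with the candidate number):

```python
from itertools import product

# The six questions above: "Is your number in S?" with S a singleton.
questions = [{1}, {1}, {2}, {2}, {3}, {3}]

def eliminated(answers):
    """Numbers that, had they been player 1's number, were lied about
    twice in a row (k = 1, so two consecutive lies give him away)."""
    out = []
    for c in (1, 2, 3):
        lies = [(c in S) != a for S, a in zip(questions, answers)]
        if any(l1 and l2 for l1, l2 in zip(lies, lies[1:])):
            out.append(c)
    return out

# Every one of the 2^6 answer sequences lets player 2 throw out a number.
assert all(eliminated(ans) for ans in product([False, True], repeat=6))
print("all 64 answer sequences trap player 1")
```

So whatever player 1 answers, some number gets lied about twice in a row and can be discarded.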
Draw an edge between two <br />nodes in adjacent rows if they share an element.<br /><br />Here it looks like<br /><br /><span style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span style="font-family: 'Courier New', Courier, monospace;">{1} {2,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> | |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{1} {2,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> X |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{2} {1,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> | |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{2} {1,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> X |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{3} {1,2}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> | |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{3} {1,2}</span><br /><br />(hope that formatted correctly. The X above should be seen as 2 edges that are crossing)<br /><br />So, now, player 1 wins if he can select 1 node from each row<br />so that the selected nodes form an independent set<br />(if he selects two nodes that are connected by an edge, it means<br />he lied twice in a row for some number, which we can then eliminate)<br /><br />Another way to view this setup is to ask, what are the valid choices for player 1 <br />for the current configuration? 
This we can build up as we determine the questions.<br /><br /><br />So from the first 2 questions,<br /><br /><span style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span style="font-family: 'Courier New', Courier, monospace;">{1} {2,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> | |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{1} {2,3}</span><br /><br /><br />we already know that 'Y,Y' and 'N,N' would have player 1 lying twice in a row.<br /><br /><br />So the only choices are 'Y,N' and 'N,Y'.<br /><br /><br />On the third question, the choices get restricted a lot more:<br /><br /><br /><span style="font-family: 'Courier New', Courier, monospace;">q2: {1} {2,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> X |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">q3: {2} {1,3}</span><br /><br /><br />If player 1 answered 'Y' on question 2 (so that {2,3} was lied about), then no matter what, player 1 will lie on the third question.<br /><br /><br />Thus at this point, the only safe choice for player 1 is 'Y,N,N'.<br /><br /><br />For q4, player 1 has to choose 'Y', and then at q5 player 1 is out of options and will lie twice in a row about some number.<br /><br /><br />-----</div><div>(As an aside, here is a quicker sequence of questions:<br /><br /><span style="font-family: 'Courier New', Courier, monospace;">{1} {2,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> X |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{2} {1,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> | |</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">{2} {1,3}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"> X |</span><br /><span style="font-family: 'Courier New', Courier, monospace;">{1} {2,3}</span><br 
/><br />4 questions are enough here.)</div><div><br /></div><div><br /></div><div>For $k>1$, we have the same layout, but instead of drawing edges, we draw <span style="background-color: white;">paths ('liar paths') of length $k$ that follow a single number. Then we need to set the questions up so that no matter what sequence </span><span style="background-color: white;">of answers player 1 chooses, it contains such a liar's path.</span></div><div><br /><h2> <span style="font-size: large;"> Restricting Possible Answers</span></h2>Note that within $k$ questions, there are $2^k$ possible paths (answer sequences) that player 1 can choose. So as player 2, along each of the possible paths, place one of the $2^k+1$ numbers in the</div><div>appropriate node. </div><div><br /></div><div>{ For $k=2$, it looks like</div><div><br /></div><div><span style="font-family: 'Courier New', Courier, monospace;">q1 {1,2} {3,4}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span style="font-family: 'Courier New', Courier, monospace;">q2 {1,3} {2,4}</span><br /><br />(1 = NN, 2 = NY, 3 = YN, 4 = YY) }</div><div><br /></div><div><br /></div><div>Then within the first 2 questions, no matter what player 1 chooses, he will lie about some number $k$ times. But he is allowed to lie $k$ times, so we need to catch him with further questions. Note that there is an</div><div>additional number we have not included ($2^k + 1$). We can then eliminate one of the possible paths for</div><div>player 1 by placing this extra number along the same path as one of the numbers (say 1), and then in the ($k+1$)-th question placing $1$ and $2^k+1$ on opposite sides of the question.</div><div><br /></div><div>This way, if player 1 chooses a sequence which highlights $1$ and $2^k+1$ within the first $k$ questions, </div><div>in the ($k+1$)-th question the player will be forced to lie. 
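The placement just described can be checked mechanically: whatever player 1 answers on the two questions of the $k=2$ example, exactly one of the four numbers is lied about both times (a small sketch; `True` stands for 'yes', and the first listed set of each question is its yes-set):

```python
from itertools import product

yes_sets = [{1, 2}, {1, 3}]        # q1, q2 of the k = 2 example above

for ans in product([False, True], repeat=2):
    doubly_lied = [c for c in (1, 2, 3, 4)
                   if all((c in S) != a for S, a in zip(yes_sets, ans))]
    assert len(doubly_lied) == 1   # exactly the number placed on that path
    print(ans, doubly_lied)        # NN -> [1], NY -> [2], YN -> [3], YY -> [4]
```

The output reproduces the labeling (1 = NN, 2 = NY, 3 = YN, 4 = YY) of the paths.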
We have then eliminated 1 of the $2^k$ possible</div><div>sequences in the first $k$ questions. </div><div><br /></div><div>{ For the $k=2$ example above, it looks like</div><div><br /></div><div><span style="font-family: 'Courier New', Courier, monospace;">q1 {1,2,5} {3,4}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span style="font-family: 'Courier New', Courier, monospace;">q2 {1,3,5} {2,4}</span><br /><span style="font-family: 'Courier New', Courier, monospace;"><br /></span><br /><span style="font-family: 'Courier New', Courier, monospace;">q3 {1, ?} {5, ?}</span><br /><br />Now player 1 cannot choose 'N,N' in the first 2 questions or else he's stuck. }</div><div><br /></div><div>Now that one of them is eliminated, the idea is to maintain the same number of possible paths and reduce them one <span style="background-color: white;">by one.</span></div><div><span style="background-color: white;"><br /></span></div><div>The observation that by the $k$-th question, he has picked 1 of the $2^k$ paths is crucial to this argument. </div><div><br /></div><div>Let's examine the $k=2$ example above again. Suppose player 1 has instead followed the path of 2 (by picking 'N,Y').</div><div>Note now that if we place 2 in the first set of question 3, player 1 would be forced to pick 'Y' so as not to lie a third time about 2. Thus, just by placing {1,..,4} arbitrarily within question 3, player 1 is <i>still restricted</i> to $2^k-1=3$ possible answer choices within the first 3 questions. Yet we still have the 5th number to place, which will be the key to removing more paths.</div><div><br /></div><div>We're not done yet, however. We have to be careful to place {1,..,4} in such a way that the same property holds in the next question. 
This is where the De Bruijn sequence comes in.</div><div><br /></div><div>k=2 sequence: 'NN', 'NY', 'YY', 'YN' (or 'N,N,Y,Y,N,N,...')</div><div><br />which has the property that all possible strings of size $k$ can be extracted by sliding a window of size $k$ along the sequence and reading the string; furthermore, we treat the sequence as periodic. </div><div><br /></div><div>For each number (1,2,3,4), start from a different part of the sequence and place them in the questions:</div><div><br /></div><div><div><span style="font-family: 'Courier New', Courier, monospace;">1:N(N)</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">2:N(Y)</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">3:Y(Y)</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">4:Y(N)</span></div></div><div><br /></div><div>After placing "1":</div><div><br /></div><div><span style="font-family: 'Courier New', Courier, monospace;">q1 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q2 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q3 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q4 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q5 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q6 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q7 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q8 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q9 {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">...</span></div><div><br /></div><div>After placing "2":</div><div><br /></div><div><div><span style="font-family: 'Courier New', Courier, monospace;">q1 {1,2}</span></div><div><span style="font-family: 'Courier New', Courier, 
monospace;">q2 {1} {2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q3 {1,2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q4 {2} {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q5 {1,2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q6 {1} {2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q7 {2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q8 {2} {1}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q9 {1,2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">...</span></div></div><div><br /></div><div>Completed:</div><div><div><span style="font-family: 'Courier New', Courier, monospace;">q1 {1,2} {3,4}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q2 {1,4} {2,3}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q3 {3,4} {1,2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q4 {2,3} {1,4}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q5 {1,2} {3,4}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q6 {1,4} {2,3}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q7 {3,4} {1,2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q8 {2,3} {1,4}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q9 {1,2} {3,4}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">...</span></div></div><div><br /></div><div>As mentioned before, to eliminate each of the 4 threads, we then place "5" to agree with a thread for 2 questions and then separate so that the thread is eliminated:</div><div><div><br /></div></div><div>Final questions<span style="background-color: 
white;">:</span></div><div><span style="background-color: white;"><br /></span></div><div><div><span style="font-family: 'Courier New', Courier, monospace;">q1 {1,2,5} {3,4}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q2 {1,4,5} {2,3}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q3 {3,4,5} {1,2}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q4 {2,3} {1,4,5}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q5 {1,2,5} {3,4}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q6 {1,4} {2,3,5}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q7 {3,4} {1,2,5}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q8 {2,3} {1,4,5}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q9 {1,2} {3,4,5}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q10 {1,4,5} {2,3}</span></div><div><span style="font-family: 'Courier New', Courier, monospace;">q11 {3,4} {1,2,5}</span></div><div><br /></div></div><div><br /></div><div>Sample run as player 1:</div><div>Suppose we pick 'Y,Y' for q1, q2, lying about "3" twice. </div><div>For q3, must say "Y", but now we lied about "2" twice.</div><div>For q4, must say "Y", and now we lied about "1" twice.</div><div>For q5, must say "Y", and now we lied about "4" twice.</div><div>For q6, must say "Y", and now we lied about "3" twice.</div><div>For q7, must say "Y", and now we lied about "2,5" twice.</div><div>For q8, trapped.</div><div><br /></div><div>The reason this works is that by using the De Bruijn sequence, we have narrowed the <span style="background-color: white;">possible answer choices to $2^k$ paths of arbitrary length. Note that in every consecutive block of $k$ questions, each of the $2^k$ possible answer strings is covered by the string of some number. 
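Both halves of this construction can be verified mechanically: generate the De Bruijn sequence (here via the standard Lyndon-word/FKM algorithm, my choice of generator), rebuild the question table from the shifted copies, and exhaustively confirm that the final 11 questions trap every answer sequence (`True` = 'yes'; the first listed set of each question is its yes-set):

```python
from itertools import product

def de_bruijn_binary(k):
    """Binary De Bruijn sequence B(2, k) via the Lyndon-word (FKM) construction."""
    seq, a = [], [0] * (k + 1)
    def db(t, p):
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return seq                      # k = 2 gives [0, 0, 1, 1] = 'N,N,Y,Y'

k = 2
s = de_bruijn_binary(k)

# Number j's "lying thread": its lying answers read off the sequence starting
# at offset j - 1 (0 = 'N', 1 = 'Y').  Question t's first (yes-)set holds the
# numbers whose thread says 'N' there.
def thread(j, t):
    return s[(t + j - 1) % len(s)]

table = [tuple(tuple(j for j in range(1, 2**k + 1) if thread(j, t) == b)
               for b in (0, 1)) for t in range(4)]
assert table == [((1, 2), (3, 4)), ((1, 4), (2, 3)),
                 ((3, 4), (1, 2)), ((2, 3), (1, 4))]   # the "Completed" table

# Yes-sets of the final 11 questions, with "5" woven in as in the post:
yes_sets = [{1, 2, 5}, {1, 4, 5}, {3, 4, 5}, {2, 3}, {1, 2, 5}, {1, 4},
            {3, 4}, {2, 3}, {1, 2}, {1, 4, 5}, {3, 4}]

def trapped(answers):
    """Did some number accumulate k + 1 = 3 consecutive lies?"""
    for c in range(1, 6):
        lies = [(c in S) != a for S, a in zip(yes_sets, answers)]
        if any(all(lies[i:i + 3]) for i in range(len(lies) - 2)):
            return True
    return False

assert all(trapped(ans) for ans in product([False, True], repeat=11))
print("all 2^11 = 2048 answer sequences are trapped")
```

In particular the all-'Y' run above is caught at q8, and the slowest escape route (starting 'Y,N') only survives until q11.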
It is then a matter of tracing the path </span><span style="background-color: white;">and placing the extra element appropriately.</span></div></div><div><br /></div><div>In general, we can construct a De Bruijn sequence for $k$-length bitstrings and do the same construction. The number of questions is around $(k+1) 2^k$ to detect the lying number (but shorter if we can catch an overlap while placing the extra number).</div><div><br /></div><div><br /></div><h2> <span style="font-size: large;"> Investigating Quantization Cells (2011-09-24)</span></h2>How to investigate the cells from the quantizer/encoder ${\rm round}_\mathbb{Z} \tilde{E}$, where $\tilde{E}$ is $m \times N$, on the set $X$ from before ($1 \geq |x_{(1)}| \geq 2|x_{(2)}| \geq \ldots \geq N|x_{(N)}|$)?<br /><br />In any case, we need to cover our set $X$ with the quantization cells.<br /><br />Are the cells concentrated near the axes?<br /><br />Is there any experiment I can run?<br />First study the signal on the boundary:<br /><br /><ol><li>Take a random permutation $P$ and signs $\xi$</li><li>Set $x(P(i)) = \xi(i)/i$ </li></ol><br /> First question: Can I determine what cell it is in, and what its size is? $q = {\rm round}_\mathbb{Z}\tilde{E} x$. What other signals in $X$ map to the same $q$?<br /><br />Next question: What properties must $\tilde{E}$ have for the cell to be small?<br /><br /><br /> How to determine the set of all signals that map to this $q$?<br /><br /> The region of measurements that rounds to $q$ we know: $\langle e_i, x \rangle = q_i \pm 0.5$, where $e_i$ are the rows of $\tilde{E}$. How about the inverse image of this region? 
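The two-step boundary-signal experiment above can be sketched numerically; here I take $\tilde{E}$ to be Gaussian as a stand-in (the post leaves it unspecified), with $m=4$, $N=8$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 8, 4

# Steps 1-2: random permutation P and signs xi, then x(P(i)) = xi(i)/i.
P = rng.permutation(N)
xi = rng.choice([-1.0, 1.0], size=N)
x = np.zeros(N)
x[P] = xi / np.arange(1, N + 1)

E = rng.standard_normal((m, N))   # stand-in for E~ (full rank almost surely)
q = np.round(E @ x)               # the cell that x lands in

# The cell in measurement space: every <e_i, x> lies within q_i +/- 0.5.
assert np.all(np.abs(E @ x - q) <= 0.5)
print("cell label:", q)
```

From here one can sample more signals of the same form and count how often two of them share a cell label.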
Some notes:<br /><br /><ul><li>It's a convex region, and the map is linear, so we only need to map the vertices ($2^m$ of them) and take the convex hull.</li><li>For any given measurement, the preimage under $\tilde{E}$ is an $(N-m)$-dimensional affine subspace (assuming $\tilde{E}$ is full rank), which we then need to intersect with $X$.</li></ul><br />This seems like quite the daunting task...<h2> <span style="font-size: large;"> More on Toy Problem (2011-09-06)</span></h2>Continuing from last time, we had this picture:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-dSxRN-ueoNY/TmfAiUHSxlI/AAAAAAAAAAU/d8RRDXHr9Zg/s1600/compressible3d.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="160" src="http://3.bp.blogspot.com/-dSxRN-ueoNY/TmfAiUHSxlI/AAAAAAAAAAU/d8RRDXHr9Zg/s320/compressible3d.png" width="320" /></a></div><br />(there should be one of these cross-shaped structures on every axis). An encoding scheme based on such a 2D projection would be to cover the 2D image with balls of a suitable radius (or potentially different radii, depending on the thickness in the orthogonal direction in 3D), and round to the nearest ball center to encode. Then to recover, convert the ball center back to the corresponding segment of 3D points and choose its center.<br /><br />If we use a fixed radius $r$, and the max thickness in the orthogonal direction is $T$, then the distortion at worst will be $\sqrt{r^2 + (T/2)^2}$.<br /><br />Idea: (not quite true, but almost true) Because of the special nature of the set (star-shaped), the thickness behaves in a similarly nice way: if the thickness at the 2d projection point $(x,y)$ is $T(x,y)$, then $T(rx,ry) = rT(x,y)$ (thickness is homogeneous). 
Thus, if we first cover the "bow-tie" face, we can figure out how to reduce the encoding radius of the balls as we approach the origin. Then we can study the properties of the rate-distortion curve with this encoding method....<br /><br />By symmetry considerations, a guess for a good projection direction is $(1,1,1)/\sqrt{3}$ (using a basis for the orthogonal complement to project onto).<br />Thus $(1,1/2,1/3) = (11,11,11)/18 + (7,-2,-5)/18$ is mapped to $(7,-2,-5)/18$. We can additionally place a coordinate axis on the grid by using an orthogonal basis for the projection plane, but we will not need that now.<br /><br />As an example computation, we can compute the thickness of the set of points mapped to the same projection (at least the ones in the same convex region).<br /><br />Consider the point $x=(1,1/2,1/3)$, and use the projection direction $(1,1,1)$. We can first focus on the region containing $x$ defined by the inequalities $1 \geq x_1 \geq 2x_2 \geq 3x_3$ and $x_i \geq 0$. How far along $(1,1,1)$ can we travel until we escape this region? We find the minimal $t>0$ for which one of the inequalities becomes an equality (looking at $x - t(1,1,1)$):<br /><br />$1 - t = 1, 1 - t = 2(1/2 - t), 2(1/2 - t) = 3(1/3 - t)$ (all $t=0$)<br /><br />and<br /><br />$1 - t = 0, 1/2 - t = 0, 1/3 - t = 0$<br /><br />The minimal $t$ from above is $t=1/3$, which brings us to the point $(2/3,1/6,0)$. Now we are in the region where $x_3 < 0$, so the bounding inequalities become $1 \geq x_1 \geq 2x_2 \geq -3x_3$ and $x_1,x_2 \geq 0, x_3 \leq 0$. Again we solve for when we hit the boundary:<br /><br />$2/3 - t = 1, 2/3-t = 2(1/6-t), 2(1/6-t)=-3(0-t)$<br /><br />and<br /><br />$2/3 - t = 0, 1/6 - t = 0, 0 - t = 0$<br /><br />The minimal positive $t$ from above is now $t = 1/15$, attained when $2|x_2| = 3|x_3|$, at the point $(3/5,1/10,-1/15)$, and now we are outside the compressible region. 
Thus the thickness of the set of points mapping to the same projection as $(1,1/2,1/3)$ is $(1/3 + 1/15)\sqrt{3} = 2\sqrt{3}/5$.

<h2> <span style="font-size: large;"> Information Theory of Compressed Sensing, Compressible Signals (2011-08-26)</span></h2><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Now we want to know about signals that are not $k$-sparse, but are </span><b><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">compressible: </span></b><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">if the coefficients of the signal $x$ are arranged in decreasing order of magnitude $x_{(n)}$, then they exhibit some power-law decay:</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">\[ |x_{(n)}| \lesssim n^{-1/p} \]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">We will consider the case $p=1$ to fix ideas, and call this space $l^{1,w}$ (weak $l^1$).</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">A first fact about compressible signals is that if we want to approximate $l^{1,w}$ with $\Sigma_k$, i.e. find the best $k$-term approximation in the $l^2$ norm say, then the answer is just to take the top $k$ terms. 
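A quick numerical sanity check of this best-$k$-term fact for $p=1$: for the extremal signal $|x_{(n)}| = C n^{-1}$, dropping all but the top $k$ terms leaves $l^2$ error at most $C k^{-1/2}$ (sketch):

```python
import numpy as np

N, C = 100_000, 2.0
x = C / np.arange(1, N + 1)        # extremal weak-l1 signal, already sorted

for k in (10, 100, 1000):
    err = np.linalg.norm(x[k:])    # || x - x^{(k)} ||_2 after keeping top k
    assert err <= C * k**-0.5      # the claimed C k^{-1/2} bound
    print(k, err, C * k**-0.5)
```

This is just the tail estimate $\sum_{n>k} n^{-2} \le 1/k$ in numerical form.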
For example, take $x \in l^{1,w}$ with $|x_{(n)}| \leq Cn^{-1/p}$ (with $p=1$). Then if $x^{(k)}$ is formed from the top $k$ terms of $x$, we have</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">\[ \| x - x^{(k)} \|_2 \leq C k^{-1/2} \]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">[First, some musings: If we introduce the quantity $|x| := \sup_j j |x_{(j)}|$, the rearrangement destroys the triangle inequality: for instance $|(1,1/2,1/3)| = |(1/3,1/2,1)| = 1$, but the sum $(4/3,1,4/3)$ has $|(4/3,1,4/3)| = 3 > 2$. Without rearranging terms, it is a norm.]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Directions... The previous quantization result does not work well for compressible signals, mainly because it focuses on the best $k$-term approximation, and the scheme cannot do better than the best $k$-term error even when more measurements are used. This is a deficiency of this particular encoding...</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">From an information-theoretic point of view, a first question we can ask is: what is the best way to encode/compress this class of signals? Given a fixed bit budget, what is the encoding that achieves the best distortion? 
Alternatively, given a fixed distortion target, what is the number of bits needed to represent the entire class of signals? This can be answered by studying minimal $\epsilon$-nets, Kolmogorov entropy...</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">The encoding given by entropy considerations is non-constructive...</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">A further question is then: given a particular method of encoding, what are the theoretical limits? Sigma-Delta-like schemes on compressed sensing measurements...</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">A toy problem... 3 dimensions... $(x_1,x_2,x_3)$ where one is large, the second is medium size, and the third is small. Specifically, $|x_{(1)}| \leq 1$, $|x_{(2)}| \leq |x_{(1)}| / 2$, and $|x_{(3)}| \leq 2|x_{(2)}| / 3$. How to compress such a space?</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Some quick musings: let's compare the volume of this space to the cube ($[-1,1]^3$, volume 8). Focusing on the region $x_1 > x_2 > x_3 > 0$, the intersection with the space in question is the convex hull of $(0,0,0)$, $(1,0,0)$, $(1,1/2,0)$ and $(1,1/2,1/3)$. The volume can be computed using the pyramid 1/3-base-times-height formula: 1/3*(1/2*1/2*1/3) = 1/36, and by symmetry the volume is the same for each of the 6*8 regions ($3!$ orderings times $2^3$ signs), so the total volume of the space is $48/36 = 8/6$. 
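The $8/6$ count is easy to corroborate with a Monte Carlo sketch, which also suggests what the pattern looks like one dimension up ($2^n/n!$):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
vols = {}
for n in (3, 4):
    pts = rng.uniform(-1, 1, size=(1_000_000, n))
    a = np.sort(np.abs(pts), axis=1)[:, ::-1]      # decreasing rearrangement
    ok = np.ones(len(pts), dtype=bool)
    for i in range(n - 1):                          # (i+2)|x_(i+2)| <= (i+1)|x_(i+1)|
        ok &= (i + 2) * a[:, i + 1] <= (i + 1) * a[:, i]
    vols[n] = ok.mean() * 2.0**n                    # fraction of the cube [-1,1]^n
    print(n, vols[n], 2.0**n / factorial(n))        # estimate vs 2^n/n!
```

For $n=3$ the estimate hovers around $4/3 = 8/6$, and for $n=4$ around $16/24$.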
So we are dealing with a space that is 6 times smaller in volume.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">How about $n$ dimensions? We need the volume of the convex hull of $(0,\ldots,0)$, $(1,0,\ldots,0)$, $(1,1/2,0,\ldots,0)$, $\ldots$, $(1,1/2,1/3,\ldots,1/n)$. (If the recursive formula $1/n * V_{n-1} * h_n$ continues to hold for "pyramid-like" structures, then one region has volume $1/(n!)^2$ and the total is $1/(n!)^2 * n! * 2^n$; compared to the full hypercube, this is $1/n!$ times smaller. Need to check....)</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">This space is different from the space of $(x_1,x_2,x_3)$ where, </span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">with rearrangement, $|x_{(j)}| \leq 1/j$ for all $j$... but this doesn't quite capture what we want in this small-dimensional example, since we want a dominant component and some sort of tail, and there are only 3 spots. Maybe for later.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><span class="Apple-style-span" style="font-size: large;">Toy Problem Compression</span></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Also to investigate: how can we compress the toy problem with linear measurements? 
First, let's look at something naive: </span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">If we are given a target distortion $D$, let's take all three coordinates and just round each to the nearest point of a grid with spacing $2D/\sqrt{3}$ (this introduces $2$-norm distortion of at most $D$). Then we need $\log(\sqrt{3}/D)$ bits per coordinate, so we need $3 \log(\sqrt{3}/D)$ bits total. In terms of the number of quantization points, this is $(\sqrt{3}/D)^3$, which matches compression bounds for the full cube $[-1,1]^3$ (actually the lower bound has a constant $c=3$ instead of $\sqrt{3}$ that is independent of dimension). In the toy problem, we should be able to reduce the number of quantization points by a factor of 6, roughly speaking.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Let $D' = 2D/\sqrt{3}$ be the spacing of points per coordinate (to achieve distortion $D$ in the 2-norm). One adaptive strategy is simply to allocate $\log(2/D')$ bits for the largest coordinate, $\log(1/D')$ bits for the middle coordinate, and $\log(2/(3D'))$ bits for the smallest (corresponding to ranges $[-1,1]$, $[-1/2,1/2]$ and $[-1/3,1/3]$), in addition to storing the order information (one of six possibilities, so we can use three bits, say). Note that this scheme uses $3 + 3\log(\sqrt{3}/D) - \log 6$ bits, and translated to the number of quantization points, it becomes $(\sqrt{3}/D)^3 * (8/6)$, which is actually worse (it is worse only because of integrality issues, since we are wasteful in using 3 bits for 6 possibilities of order, 
otherwise we would be even with the simpler method above of just using $[-1,1]$ for all three and not recording order).</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Here's something fun: pictorially, it's easy to see how to recover one-sparse vectors in 3 dimensions with two measurements:</span><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://csdt.rpi.edu/subcult/brdance/tutorial/images/CartesianCoordinates.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://csdt.rpi.edu/subcult/brdance/tutorial/images/CartesianCoordinates.gif" /></a></div><div class="separator" style="clear: both; text-align: center;">(taken from <a href="http://csdt.rpi.edu/subcult/brdance/tutorial/intro3d.html">here</a>)</div><div class="separator" style="clear: both; text-align: center;"><br /></div><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Of course, any drawing of a 3d axis shows plainly how this would work. The 2d drawing is a 2d projection (i.e. 
2 linear measurements), and every 1-sparse vector (a point on the axes) corresponds to a unique point in this 2d projection.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">For our toy model, it would look more like...</span><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-dSxRN-ueoNY/TmfAiUHSxlI/AAAAAAAAAAU/d8RRDXHr9Zg/s1600/compressible3d.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="160" src="http://3.bp.blogspot.com/-dSxRN-ueoNY/TmfAiUHSxlI/AAAAAAAAAAU/d8RRDXHr9Zg/s320/compressible3d.png" width="320" /></a></div><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">The dotted lines enclose one piece of the space, and we can already see that in this 2d projection there are many points of the space that correspond to the same point in the image. In particular, there does not seem to be a way to obtain a $D$-distortion code for small $D$ from these measurements. The smallest $D$ for which we can code $x$ using 2 measurements $\Phi x$ with $\Phi = \begin{pmatrix} u^T \\ v^T \end{pmatrix}$</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> for two unit vectors $u,v$ is given by $\sup_{y\in \Phi(X)} {\rm diam}(\Phi^{-1}y \cap X)/2$.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">I wonder what the best angle to "project" this picture is. 
For instance, if we just project to the x-y plane, the thickness would be $2$, and it would be a very lousy detector for signals concentrated along the z axis. How can I figure this out?</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">This does not include an additional quantization step needed to compress to a specified bit budget.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span>Evanhttp://www.blogger.com/profile/13507568616873564291noreply@blogger.com0tag:blogger.com,1999:blog-2247428742741290262.post-14345076915445537522011-08-20T15:08:00.000-07:002011-08-29T15:08:45.366-07:00Sigma Delta for Frame CoefficientsThe idea behind the compressed sensing scheme in the paper mentioned earlier<sup><a href="http://arxiv.org/abs/1002.0182">1</a></sup> is as follows. We know that under suitable conditions, we can recover a sparse vector $x\in\Sigma_k$ with few non-adaptive linear measurements $y = \Phi x$. For practical applications, we need to be able to store these measurements, and robustness results tell us that perturbing $y$ still allows approximate recovery of $x$. Specifically, we know that we can determine the approximate support of $x$ from perturbed measurements. Let $T = {\rm supp}(x)$.<br /><br />If we know the support of $x$, then taking the columns of $\Phi$ corresponding to $T$, which we denote $\Phi_T$, we can recover $x$ from $y$ via any left-inverse of $\Phi_T$ (there are many such left-inverses as $\Phi_T$ is $m \times k$ with $m>k$, i.e. overdetermined). In the language of frame theory, the columns of $\Phi_T$ will form a frame and the rows of the left-inverse form a dual frame.<br /><br />In what follows we will let $E = \Phi_T$.
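As a quick sanity check of this recovery step, here is a minimal numpy sketch (the dimensions, support $T$, and random seed below are made-up for illustration): once the support is known, any left-inverse of $E = \Phi_T$, such as the Moore-Penrose pseudo-inverse, recovers $x$ exactly from unquantized measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, k = 20, 100, 3                # made-up sizes for illustration

Phi = rng.standard_normal((m, N))   # random Gaussian measurement matrix
T = [5, 17, 42]                     # assumed known support of x
x = np.zeros(N)
x[T] = rng.standard_normal(k)

y = Phi @ x                         # m non-adaptive linear measurements
E = Phi[:, T]                       # m x k; its columns form a frame (m > k)
F = np.linalg.pinv(E)               # one left-inverse: the canonical dual frame

x_hat = np.zeros(N)
x_hat[T] = F @ y                    # since FE = I, this recovers x exactly
print(np.allclose(x_hat, x))        # True (up to floating point)
```

All left-inverses (dual frames) agree on exact measurements; they differ only in how much they amplify the quantization error $y - q$, which is what the optimization over $F$ later in the post addresses.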
The problem then boils down to the following: how can we quantize the measurements, $q = Q(y)$, so that the recovery error is as small as possible? Let us fix some alphabet $\mathcal{A} = \delta \mathbb{Z}$ say, and $Q: \mathbb{R}^m \to \mathcal{A}^m$. Essentially we want to solve<br />\[ \min_{Q, F: FE=I} \| x - FQ(y) \|_2 \]<br />Then taking the worst case $x \in \Sigma_k$ will tell us the rate-distortion tradeoff for compressing $\Sigma_k$ with this scheme.<br /><br />The paper cited above investigates the situation where $\Sigma\Delta$ quantization is used for the quantizer. That is, for an $r$-th order scheme, if $D$ is the finite difference matrix, the scheme solves a difference equation<br />\[ y - q = D^r u \]<br />for $q \in \mathcal{A}^m$ and $u$ bounded. In the setting of bandlimited signals, an $r$-th order scheme can obtain an error of the form $\lambda^{-r}$ where $\lambda$ is the oversampling ratio. The intuition here is that since we know the support, which has size $k$, we are effectively oversampling by a factor $\lambda := m/k$, and thus we hope to achieve similar results in this setting.<br /><br />This equation is solved greedily via recursion, choosing $q_n$ to minimize $|u_n|$, with the catch that for larger $r$ we need a finer alphabet (or more bits in quantization).<br /><br />Then, for a dual frame $F$ (satisfying $FE=I$), we have the error estimate<br />\[ \|x - \hat{x}\| = \| F(y - q) \| = \| FD^r u \| \leq \|FD^r\|_{op} \|u\|_2 \]<br />Since $u$ is bounded, $\|u\|_2 \leq \sqrt{m} \|u\|_\infty$, and we want to find the dual frame $F$ which minimizes this operator norm, i.e.
solve<br />\[ \min_{FE=I} \|FD^r\|_{op} \]<br /><br />The solution is $F = (D^{-r}E)^\dagger D^{-r}$, with operator norm $\sigma_{\min}(D^{-r}E)^{-1}$. If $E$ is taken to be a random Gaussian frame (which is the case in one setting of compressed sensing), we can obtain an error estimate of $\lambda^{-\alpha r} / \sqrt{m}$ with overwhelming probability. When we put everything together, this gives a recovery error of $\lambda^{-\alpha r}$. (Above, $\alpha$ is some constant in $(0,1)$ which affects the probability and rate of recovery.)<br /><br /><span class="Apple-style-span" style="font-size: large;">An Extension</span><br />One possible extension of the above is then to ask: what happens if we don't use $D^r$ as the noise-shaping operator? Given $D$ and $E$, we note that there is an optimal dual frame $F$ for recovery. Can we choose $D$ so that the corresponding $F$ leads to a better result? The catch is that we need to constrain $D$ to lead to a stable scheme, so that we can still solve $y - q = Du$ with $u$ bounded and constrained to the bit budget (alphabet constraint).<br /><br />Let us study $D$ of the form $D = I + H$ where $H$ is lower triangular with $0$'s on the diagonal.<br />(will add more later...)<br /><br />Evanhttp://www.blogger.com/profile/13507568616873564291noreply@blogger.com0tag:blogger.com,1999:blog-2247428742741290262.post-17031842827189132962011-08-12T13:26:00.000-07:002011-08-30T10:52:55.963-07:00Information Theory of Compressed Sensing, Sparse Vectors<span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">The papers leading up to this point are the papers in the compressed sensing literature about sparse recovery (recovering sparse vectors of unknown support with few linear measurements relative to the ambient dimension), and most recently, a paper about quantization of compressed sensing measurements</span><sup><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><a 
href="http://arxiv.org/abs/1002.0182">1</a></span></sup><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">. Here is the specific setup:</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">High dimensional space $\mathbb{R}^N$</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Class of $k$-sparse signals $\Sigma_k = \{ x\in \mathbb{R}^N: |\mbox{supp}(x)| \leq k \}$</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Measurement matrix $\Phi \in \mathbb{R}^{m\times N}$, $m \ll N$, satisfying some special property (such as a restricted isometry property).</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Compression: Given $x \in \Sigma_k$, compute $y = \Phi x$, store $y$.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Recovery: Solve $\min_z \|z\|_1$ s.t. $\Phi z = y$. </span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">The recent theoretical results allow successful recovery with $m \sim C_1 k \log(N/k)$ for some constant $C_1$. Furthermore, if the measurements are perturbed by some small vector $e$ with $\|e\|_2 \leq \epsilon$, then using a modified recovery algorithm: $\min_z \|z\|_1$ s.t.
$\| \Phi z - y \|_2 \leq \epsilon$, the recovered vector $x_\ast$ satisfies $\|x - x_\ast\|_2 \leq C_2 \epsilon / \sqrt{m}$.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">If we apply this result to quantization error, rounding each measurement $y_i$ to the nearest point in an evenly spaced grid $\delta \mathbb{Z}$, then we note $\|e\|_2 \leq \sqrt{m} \delta$ so that the recovered vector will be off by $\leq C_2 \delta$. Note there is no dependence on $m$. By rounding each measurement independently, we have not taken advantage of the fact that the measurements are correlated (a union of $k$-dimensional subspaces embedded into $m > k$ dimensions). The higher the number of measurements, the more correlated the measurements are. </span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">The main result of the quantization paper above is that if we use $\Sigma\Delta$ modulation as the quantization scheme, then we can achieve a better error bound which depends on $m$:</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><b>Theorem B. </b>Let $\Phi$ be an $m\times N$ matrix whose entries are i.i.d. according to $\mathcal{N}(0,1)$. Suppose $\alpha \in (0,1)$ and $\lambda := m/k \geq c(\log N)^{1/(1-\alpha)}$ where $c = c(r,\alpha)$. 
Then there are two constants $c'$ and $C$ that depend only on $r$ such that with probability at least $1-\exp(-c' m \lambda^{-\alpha})$ on the draw of $\Phi$, the following holds: For every $x\in \Sigma_k$ such that $\min_{j \in {\rm supp}(x)} |x_j| \geq C\delta$, the reconstruction satisfies</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">\[ \|x - \tilde{x}\|_2 \lesssim \delta \lambda^{-\alpha (r-1/2)} \]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Here $r$ is the order of $\Sigma\Delta$ used, where we note that for this particular result, for larger $r$ we need more bits in the quantization to keep the scheme stable. This result is also just for random Gaussian measurements. We will ignore the size condition $|x_j| \geq C \delta$ for now, but the paper mentions that this technical condition is also practical (entries of magnitude $\leq C \delta$ are comparable to the rounding uncertainty, so we may as well treat them as $0$).</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">For the rest, assume also that $|x_j| \leq 1$, so that we are studying the compression of $X = \Sigma_k \cap B_1(l^\infty(\mathbb{R}^N))$. To summarize the information-theoretic results of the paper, for an $r$-th order scheme, the number of bits needed in the quantizer is $\approx \log_2\left(\frac{5}{C\delta} \ 2^{r+1/2} \lambda^{(1-\alpha)/2} k\right)$, and the corresponding distortion is $D(r,\lambda) \lesssim \frac{\lambda^{-\alpha(r-1/2)} \delta}{2^{r+1/2}}$. Given a bit budget $B$, let $b(r,\lambda)$ be the number of bits of the quantizer. We need to minimize the distortion $D(r,\lambda)$ such that $k\lambda b(r,\lambda) \leq B$.
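The constrained minimization just described can at least be sketched by brute force. Here is a hypothetical toy computation, treating the displayed $\approx$ and $\lesssim$ as equalities and plugging in made-up values for $k$, $\delta$, $\alpha$, $C$, and $B$ (the post notes that the exact parameters and constants are not yet pinned down):

```python
import numpy as np

# Made-up constants for illustration only; the post notes the true
# constants in these formulas are unresolved.
k, delta, alpha, C = 10, 1e-2, 0.5, 1.0
B = 2000                                   # bit budget (made-up)

def bits(r, lam):
    # b(r, lam) from the post, treating the "approx" as equality
    return np.log2(5 / (C * delta) * 2**(r + 0.5) * lam**((1 - alpha) / 2) * k)

def distortion(r, lam):
    # D(r, lam) from the post, treating the "lesssim" as equality
    return lam**(-alpha * (r - 0.5)) * delta / 2**(r + 0.5)

# Minimize distortion over (r, lam) subject to k * lam * b(r, lam) <= B
best = min(
    ((r, lam) for r in range(1, 6) for lam in range(2, 200)
     if k * lam * bits(r, lam) <= B),
    key=lambda rl: distortion(*rl),
)
print(best, distortion(*best))
```

Sweeping the budget $B$ and recording the minimizing distortion traces out the rate-distortion curve $D(B)$ numerically, under these assumed constants.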
Then this will give us a rate-distortion curve $D(B)$ using compressed sensing and sigma delta quantization. (To be continued... Edit: There's something missing here and I can't seem to figure out what I need to make all the parameters and constants make sense...)</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">To compare to a sillier encoding scheme, we can simply</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> store ordered pairs $(x_i, i)$, where we use $b$ bits to store the coefficient and $\log_2 N$ bits to store the index. In total this would use $k(b + \log_2 N)$ bits. Setting this to the bit budget $B$, we have $b = B/k - \log_2 N$. Then to compute the distortion, again we assume $|x_i| \leq 1$, and if we use spacing $\delta$ for quantization, then we need $b = \log_2(2/\delta)$, i.e. $\delta = 2^{-(b-1)}$. 
Each entry would pick up an error of at most $\delta/2 = 2^{-b}$, and then the overall distortion in the $2$-norm would be $\sqrt{k}\delta/2 = \sqrt{k}\, 2^{-b} = \sqrt{k}\, N\, 2^{-B/k} =: D(B)$.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Or if we look at the inverse function,</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">\[ B(D) = k \left( \log_2(\sqrt{k} / D) + \log_2 N \right) \]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif; font-size: large;">Kolmogorov Entropy</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif; font-size: large;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">We can also compute the theoretical limits via Kolmogorov entropy, or the logarithm of the minimum number of balls (with respect to a fixed norm, $l^2$ say) of radius $D$ needed to cover the set $X$. In this setup, a cover corresponds to an encoding scheme, which is simply to map a given element to the nearest ball center. Then the number of bits of the scheme is the logarithm of the size of the cover.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">The minimum cover size is comparable (up to adjusting $D$ by a factor of $2$) to the maximum number of points mutually separated by $D$: a cover must necessarily cover each such point, and no ball of radius $D/2$ can cover two $D$-separated points. 
A maximal set of $D$-separated points can be used to form a cover by using the points as the centers of the $D$-balls, and it is necessarily a cover by maximality of the separated set.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Consider again $X=\Sigma_k \cap B_1(l^\infty(\mathbb{R}^N))$. This is a union of $k$-dimensional pieces, and effectively the ($k$-dimensional) volume of $X$ is $\binom{N}{k} 2^k$. Now consider a maximal $D$-separated set. If we consider (disjoint) $l^2(\mathbb{R}^N)$ balls of radius $D/2$ around each point in the set, and intersect with $X$, we have a subset of $X$ with effective volume bounded below by</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">${\rm \#\ points} \times {\rm vol} (B_{D/2}(l^2(\mathbb{R}^k)))$ so that</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">\[ {\rm \#\ points} \leq \frac{1}{c_k} (2/D)^k \binom{N}{k} 2^k \sim \frac{1}{c_k \sqrt{k}} (4\lambda / D)^k \]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Here $c_k = \pi^{k/2} / \Gamma(k/2+1) \sim (2\pi e/k)^{k/2} / \sqrt{\pi k}$, and now $\lambda := N/k$.</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">In other words, the optimal bitrate given distortion has the bound</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">\[ B(D) \lesssim k \log(c\sqrt{k} \lambda/D) \]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">(What's hidden: This bound is probably not that great because intersecting the ball with $X$ may have many $k$-dimensional pieces, especially 
around the origin where all $\binom{N}{k}$ coordinate subspaces meet)</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Another upper bound can be obtained by multiplying the maximal number of $D$-separated points in $B_1(l^\infty(\mathbb{R}^k))$ by $\binom{N}{k}$. This gives</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">\[ B(D) \lesssim k \log(c\lambda / D) \]</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">(a generic result for bounded sets in finite dimensions can be found in the paper by Kolmogorov and Tihomirov, "$\epsilon$-entropy and $\epsilon$-capacity of sets in function spaces")</span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br /></span><br /><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">Edit: The difference of $\sqrt{k}$ seems to be a difference in normalization... though the $\sqrt{k}$ inside the logarithm does not affect the asymptotics with respect to $1/D$ as $D$ decreases to $0$, so maybe something is being ignored in the latter estimate. </span>Evanhttp://www.blogger.com/profile/13507568616873564291noreply@blogger.com0
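As a toy illustration of these packing counts, one can greedily extract a maximal $D$-separated subset from a fine discretization of $X$ in the smallest interesting case $k=1$, $N=3$ (the grid resolution and $D$ below are made-up); $\log_2$ of its size then lower-bounds the entropy at scale roughly $D$:

```python
import numpy as np

def greedy_packing(points, D):
    """Greedily pick a maximal subset with pairwise l2 distance >= D."""
    chosen = []
    for p in points:
        if all(np.linalg.norm(p - q) >= D for q in chosen):
            chosen.append(p)
    return np.array(chosen)

N, D = 3, 0.25                        # made-up toy sizes (k = 1)
# Fine discretization of X = Sigma_1 ∩ B_1(l^inf): N segments [-1,1]e_i
grid = np.linspace(-1.0, 1.0, 801)
X = [t * e for e in np.eye(N) for t in grid]

P = greedy_packing(X, D)
# log2(#points) lower-bounds the entropy at scale ~ D; compare its growth
# in 1/D with the k*log(c*lambda/D) shape derived above
print(len(P), np.log2(len(P)))
```

Rerunning this while shrinking $D$ shows the count growing like $1/D$ per segment (i.e. $B(D) \approx k \log(c/D)$ for fixed $N$, $k$), consistent with the volume estimate.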