Tuesday, July 24, 2012

Liar's Guessing Game and the De Brujin Sequence

This post concerns the first part of IMO 2012 Problem 3  (a polymath problem), which is about a liar's guessing game.

http://michaelnielsen.org/polymath1/index.php?title=Imo_2012

A quick summary of the problem (already reduced):

$k$ is a fixed positive integer. Two players, player 1 chooses a number between 1 and $2^k+1$.  Player 2 can ask yes/no questions of the sort,  "Is it in the set {... } ?". Player 1 is allowed to lie at most k times in a row  (or, in a string of $k+1$ questions, one of the answers must be truthful). The goal of player 2 is to narrow down the possibilities of player 1's number to a set of size $2^k$ from any number of questions.
Part 1 boils down to showing that player 2 can set up a sequence of questions so that after player 1 answers, player 2 can throw out one of the numbers (which if it were player 1's number, would have been lied about $k+1$ times)

I wanted to point out a pretty connection to the De Brujin sequence.

The Liar Graph

First, let's introduce a structure which we can call the 'liar graph':

Each vertex of the graph corresponds to either the set of numbers asked about in a particular question, or the complement of the set.

$k$=1 example (3 numbers):

Suppose my set of questions were:

q1: {1} {2,3}
q2: {1} {2,3}
q3: {2} {1,3}
q4: {2} {1,3}
q5: {3} {1,2}
q6: {3} {1,2}

(Asked,  "is it x?" two times for each x)

There are 12 nodes in the graph.

If player A answers 'yes' on q1, then he is lying if he picked {2,3}
If 'no' on q1, then he is lying if he picked {1}

We are trying to catch him lying twice. Draw an edge between two
nodes in adjacent rows if they share an element.

Here it looks like



{1} {2,3}
  |   |
{1} {2,3}
   X  |
{2} {1,3}
  |   |
{2} {1,3}
   X  |
{3} {1,2}
  |   |
{3} {1,2}

(hope that formatted correctly. The X above should be seen as 2 edges that are crossing)

So, now, player 1 wins if he can select 1 node from each row
so that the selected nodes form an independent set
(if he selects two nodes that are connected by an edge, it means
he lied twice in a row for some number, which we can then eliminate)

Another way to view this setup is to ask, what are the valid choices for player 1
for the current configuration? This we can build up as we are determining what the
questions are.


So from the first 2 questions,



{1} {2,3}
  |   |
{1} {2,3}


we already know that 'Y,Y' and 'N,N' will cause player 1 to lie twice already.


So the only choices are 'Y,N' and 'N,Y'


On the third question, we can see the choices get restricted a lot more


q2: {1} {2,3}
       X  |
q3: {2} {1,3}


If player 1 answered 'Y' on question 2 (so that 2,3 is a lie) then no matter what player 1 will lie on the third question.


Thus at this point, the only safe choice for player 1 is 'Y,N,N'


for q4, player 1 has to choose 'Y', and then we see that q5 player 1 is now out of options and will lie for some number.


-----
(As an aside, here is a quicker sequence of questions:

{1} {2,3}
   X  |
{2} {1,3}
 |    |
{2} {1,3}
   X  |
{1} {2,3}

4 questions are enough here)


For $k>1$, we have the same layout, but instead of drawing edges, we should instead draw paths (liar path) of length $k$ which connects a single number. Then we need to set the questions up so that no matter what  sequence of answers Player 1 chooses, it contains such a liar's path.

Restricting Possible Answers

Note that within $k$ questions, there are $2^k$ possible paths (answer sequences) that Player 1 can choose. So as player 2, along each of the possible paths, place one of the $2^{k+1}$ numbers in the
appropriate node. 

{ For $k=2$, it looks like

q1 {1,2} {3,4}


q2 {1,3} {2,4}

(1 = NN, 2 = NY, 3 = YN, 4 = YY)  }


Then within the first 2 questions, no matter what player 1 chooses, he will lie above some number $k$ times. But he is allowed to lie $k$ times, so we need to catch him with further questions. Note that we have an
additional number we have not included $(2^k + 1)$.  We can then eliminate one of the possible paths for
player 1 in the next question by placing this extra number along the same path as one of the numbers (say 1), and then in the ($k+1$)-th question, placing $1$ and $2^k+1$ in opposite sides of the question.

This way, if player 1 chooses a sequence which highlights $1$ and $2^k+1$ within the first $k$ questions, 
in the ($k+1$)-th question the player will be forced to lie. We have then eliminated 1 of the $2^k$ possible
sequences in the first $k$ questions. 

{ For the $k=2$ example above, it looks like

q1 {1,2,5} {3,4}


q2 {1,3,5} {2,4}


q3 {1, ?} {5, ?}

Now player 1 cannot choose 'N,N' in the first 2 questions or else he's stuck.  }

Now that one of them is eliminated, the idea is to maintain the same number of possible paths and reduce them one by one.

The observation that by the $k$-th question, he has picked 1 of the $2^k$ paths is crucial to this argument. 

Let's examine the $k=2$ example above again. Suppose Player1 has instead followed 2  (by picking 'N,Y').
Note now that if we place 2 in the first set of question 3, Player 1 would be forced to pick 'Y' so as not to lie. Thus, we note that just by placing {1,..,4} arbitrarily within question 3, Player 1 is still restricted to $2^k-1=3$ possible answer choices within the 3 questions. Yet we still have the 5th number to place, which will be the key to removing more paths.

We're not done yet, however. We have to be careful to place {1,..,4} in such a way that the same property holds in the next question. This is where the De Brujin sequence comes in.

k=2 sequence:  'NN',  'NY',  'YY',  'YN'    (or 'N,N,Y,Y,N,N,...')

which has the property that all possible strings of size $k$ can be extracted by sliding a window of size $k$ along the sequence and reading the string, and furthermore, it is a periodic sequence. 

For each number (1,2,3,4), start from a different part of the sequence and place them in the questions:

1:N(N)
2:N(Y)
3:Y(Y)
4:Y(N)

After placing "1":

q1  {1}
q2  {1}
q3           {1}
q4           {1}
q5  {1}
q6  {1}
q7           {1}
q8           {1}
q9  {1}
...

After placing "2":

q1  {1,2}
q2  {1}    {2}
q3         {1,2}
q4  {2}    {1}
q5  {1,2}
q6  {1}    {2}
q7         {2}
q8  {2}    {1}
q9  {1,2}
...

Completed:
q1  {1,2} {3,4}
q2  {1,4} {2,3}
q3  {3,4} {1,2}
q4  {2,3} {1,4}
q5  {1,2} {3,4}
q6  {1,4} {2,3}
q7  {3,4} {1,2}
q8  {2,3} {1,4}
q9  {1,2} {3,4}
...

As mentioned before, to eliminate each of the 4 threads, we then place "5" to agree with a thread for 2 questions and then separate so that the thread is eliminated:

Final questions:

q1  {1,2,5} {3,4}
q2  {1,4,5} {2,3}
q3  {3,4,5} {1,2}
q4  {2,3}   {1,4,5}
q5  {1,2,5} {3,4}
q6  {1,4}   {2,3,5}
q7  {3,4}   {1,2,5}
q8  {2,3}   {1,4,5}
q9  {1,2}   {3,4,5}
q10 {1,4,5} {2,3}
q11 {3,4}   {1,2,5}


Sample run as person 1:
Suppose we pick  'Y,Y' , for q1q2,  lying about "3" twice. 
For q3, must say "Y", but now we lied about "2" twice.
For q4, must say "Y", and now we lied about "1" twice.
For q5, must say "Y", and now we lied about "4" twice.
For q6, must say "Y", and now we lied about "3" twice.
For q7, must say "Y", and now we lied about "2,5" twice.
For q8, trapped.

The reason this works is because by using the De Brujin sequence, we have narrowed the possible answer choices to $2^k$ paths of arbitrary length. Note that for every consecutive block of $k$ questions each of the $2^k$ paths are covered by a string of the same number.  It is then a matter of tracing the path and placing the extra element appropriately.

In general, we can construct a De Brujin sequence for $k$-length bitstrings and do the same construction. The number of questions is around $(k+1) 2^k$ to detect the lying number (but shorter if we can catch an overlap while placing the extra number).