Saturday, February 22, 2014

A Solitary Pawn



This is best printed out for easy reference, especially once one comes to the numbered equations.
Imagine an infinite chessboard, populated by a single pawn sitting on, say, a black square.  Let that solitary pawn move, after regular time intervals, in one of four directions -- forward, backward, left or right. Let the choice of direction be determined by the fall of a four-sided (tetrahedral) die, so that the probability of moving in each direction is 1/4.
(Since infinite chessboards cannot be displayed, a finite one is shown above.)

Although these are not the conventional moves for a chess pawn, let these be the rules of movement for our hapless one.
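These movement rules are easy to simulate. Here is a minimal sketch in Python (the function name and the labelling of the four directions are my own choices, not anything fixed by the text):

```python
import random

def random_walk(steps, start=(0, 0)):
    """Walk the pawn: at each time interval, move one square in one of
    the four orthogonal directions, chosen uniformly (a fair 4-sided die)."""
    moves = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # forward, backward, left, right
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = random.choice(moves)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

path = random_walk(10)
```

Each run of this produces one particular "experienced reality" -- one path out of the many possible ones.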

Let {i} be the (infinite) set of squares. Let {t}  be the discrete times at which the pawn completes a move and arrives at a new square. Then (i,t) are the discrete space-time co-ordinates of the pawn, with (ia, ta) representing the event a -- the pawn's arrival at square ia at time ta.

The set of squares {i} could be numbered, for example, by the x,y co-ordinates of the squares relative to an origin square, which could be the starting square. Both x and y would be (signed) integers in this case, with the set {i} being represented by pairs of these signed integers.

We might wish, instead, to choose a single index number, i, to represent each such pair (x,y) and so also each square. Whether that is feasible in our two-dimensionally infinite case, I will leave for you to figure out, as it is beyond this non-mathematician.

Let P(a) be the probability of event a, so that P(a) = 1 if a is certain (for example, once we have observed it). All the probabilities are fixed by the initial position of the pawn at the time we start our observations. But that is only because we found the pawn at that position at that time.

Of course, as we follow the pawn in an experiment/game, we will see it take a particular path, determined by the successive rolls of the die. That will be our particular, experienced reality. But there are other possibilities, each deserving the title of "reality" if realized. In other realities, all obeying the same rules of movement, the pawn would take alternate routes. We believe that if we could explore all these possibilities or alternate realities (which in this case happen to be infinite in number) we would then be able to experimentally determine the probabilities P(a).

The only way to approximate such a thought-experiment with a real one would be to repeat the game a very large number of times, starting always with the pawn in the same position and taking the starting time to be the same.  The probabilities so obtained should get closer and closer to the actual (theoretical) probabilities. But to get them exactly, we would need an infinite number of trials. This infinite repetition would be required even for a game with a finite number of moves.  In practice, we could (for a finite game) repeat the experiment a tiresome number of times and then say that we have come close enough.
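This repeated-trials idea can be sketched in Python as a brute-force estimate of the probabilities (the helper name and the choice of a fixed seed are mine, for reproducibility):

```python
import random
from collections import Counter

def estimate_distribution(steps, trials, seed=0):
    """Estimate the arrival probabilities by playing the game `trials`
    times from the origin and counting where the pawn lands after
    `steps` moves."""
    rng = random.Random(seed)
    moves = [(0, 1), (0, -1), (-1, 0), (1, 0)]
    counts = Counter()
    for _ in range(trials):
        x = y = 0
        for _ in range(steps):
            dx, dy = rng.choice(moves)
            x, y = x + dx, y + dy
        counts[(x, y)] += 1
    return {square: n / trials for square, n in counts.items()}

est = estimate_distribution(steps=2, trials=100_000)
```

After 2 moves, the exact probability of being back at the starting square is 4/16 = 1/4 (each of the 4 first moves, followed by its reverse); the estimate should hover near that, tiresomely close but never exact.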

But what can we do instead, as lazy theorists, averse to wasting our time mucking around with a pawn and a tetrahedral die, or giving up on our attempts to purchase or construct an infinite chessboard?  We can assume that the die is a fair one, and then apply the venerable laws of the Bernoullis et al to calculate the probabilities P(a) -- as far as we are able to.  We could do this, by hand, for, say, 4 successive moves -- or, by using a computer, for many more.
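The computer calculation needs no dice at all: one can propagate the probability mass exactly, one move at a time. A sketch (the helper name is mine):

```python
from collections import defaultdict

def exact_distribution(steps):
    """Compute the arrival probabilities exactly: spread each square's
    probability equally over its four neighbours, once per move."""
    dist = {(0, 0): 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for (x, y), p in dist.items():
            for dx, dy in [(0, 1), (0, -1), (-1, 0), (1, 0)]:
                nxt[(x + dx, y + dy)] += p / 4
        dist = dict(nxt)
    return dist

p4 = exact_distribution(4)
```

For 4 moves this gives, for instance, a probability of 36/256 of being back at the starting square -- 36 of the 256 equally likely four-move paths return to the origin.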

But if we are lazier still, we can refrain from even bothering with numerical calculations.  Instead, we could just explore the game using algebra, hopefully establish some features of how the game plays out, declare victory and retire.  Let us take this route, favored by the laziest.  Unfortunately, this also requires some knowledge, which may be acquired, along with some acumen, which we may or may not possess. But no matter.

Let P(b|a) be the probability of event b, given event a.  More explicitly, let P(b|a) be the probability of event b occurring, given that event a occurs.

I have adopted here a notation, P(b|a), for the relative probability of events, that is closer to that suggested by blank01 and Alpan (Rawal).

Let us write down some useful facts or results regarding this quantity, valid in the situation we are considering -- the lonely pawn moving on the squares of an infinite chessboard at regular gong-chimes, according to the throw of a fair tetrahedral die.

Recall that b and a are events, being the presence of a pawn at a certain square at a certain time, and recall also that we can represent these events by their space-time co-ordinates, so that a = (ia, ta) and b = (ib, tb).

1) 0 <= P(b|a) <= 1

This follows because P(b|a) is a probability, which is a ratio of cases (possibilities) resulting in the outcome to the total number of cases.  This ratio has values ranging from 0 (impossibility = in no case) to 1 (certainty = in all cases).

2) P(b|a) = 1  if and only if b=a
    If b=a, then  P(b|a) = 1 follows from the definition of P(b|a).
If b =/= a (by which we mean that b is not the same as a) and P(b|a) = 1, then this means that a move into square ib at time tb is a certainty, while movements into other squares at that time (tb) are impossible, with zero probabilities. (The sum of all probabilities at a given time must equal 1). Simply from symmetry considerations, this is absurd. (The movement-rules and the chessboard imply symmetries in both time and space).  So P(b|a) cannot be 1 in this case (b=/=a). So we have proved the "only if", albeit by waving my hands a bit.

2a) P(b|a) = 0 if ta=tb and ib=/=ia

This follows because the pawn cannot be at two different squares, ib and ia, at the same time, tb=ta.

We are neglecting the pawn's jump-time, during part of which some might argue the pawn is in both of two adjacent squares. We could take the jump to be instantaneous, or the jump-time to be so small compared to the time between chimes that we can exclude it from our considerations here.  Even if this is not the case, from the way we originally defined the times {t}, stating that these are the discrete times at which "the pawn completes a move and arrives at a square", the jump-time need not concern us.

3) P(a|b) = P(b|a)

Since the rules of the game are completely symmetrical with respect to both time and space, this follows. If, for example, b is at a later time than a (tb  > ta) then, along every path moving forward in time, linking the prior event a to the later event b, we could retrace our steps, at each step (jump between squares) using, this time, the probability that the pawn was at an adjacent square in the previous time interval. This will again be just 1/4.

So the probability calculations going forward and backward yield the same result, and (3) follows. 
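This symmetry can at least be spot-checked numerically. Reversing a path negates its net displacement, so for a fixed starting square the arrival probabilities should be symmetric under (x, y) -> (-x, -y). A sketch (the helper name is mine):

```python
from collections import defaultdict

def walk_probs(steps):
    """Exact arrival probabilities after `steps` fair moves from the origin."""
    dist = {(0, 0): 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for (x, y), p in dist.items():
            for dx, dy in [(0, 1), (0, -1), (-1, 0), (1, 0)]:
                nxt[(x + dx, y + dy)] += p / 4
        dist = dict(nxt)
    return dist

p = walk_probs(3)
# every reachable square has the same probability as its mirror image
for (x, y), prob in p.items():
    assert abs(prob - p[(-x, -y)]) < 1e-12
```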

I have waved my hands again, but, hopefully, not with any sleight of hand, intended or inadvertent.

4) The probability of event c, given event a and an (intermediate) event b, is, by the multiplicative (AND) law of probability: 

      P(c|b|a) = P(c|b) x P(b|a)

5) Next, consider all possible paths between events a and c, with {b} being the set of all possible intermediate events, intersected by these paths at time tb, where we have the constraint:

   5a)   ta < tb < tc    OR    ta > tb > tc

Since the events {b} are exclusive events (being point-like, with no overlap) and, by our previous statement, exhaust all the possibilities at time tb, we have, from the additive (OR) law of probability:

   5b)    P(c|a) = Sum over {b} of P(c|b|a)  =  Sum over {b} of ( P(c|b) x P(b|a) )

Since each term in this sum is positive, being a probability, it follows that:

   5c)     P(c|a) >= P(c|b|a) = P(c|b) x P(b|a)

where we have retained only one of the terms on the right, that corresponding to the particular event b. (The inequality follows because the sum of positive terms must be greater than or equal to any one of those terms.)
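Result (5b) is what probabilists call the Chapman-Kolmogorov equation, and both it and the inequality (5c) can be checked numerically for the pawn. A sketch, with helper names of my own:

```python
from collections import defaultdict

def walk_probs(steps, start=(0, 0)):
    """Exact arrival probabilities after `steps` fair moves from `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for (x, y), p in dist.items():
            for dx, dy in [(0, 1), (0, -1), (-1, 0), (1, 0)]:
                nxt[(x + dx, y + dy)] += p / 4
        dist = dict(nxt)
    return dist

# a = origin at t=0; intermediate events {b} at t=2; final events c at t=4
p_ca = walk_probs(4)                  # P(c|a), computed directly
p_ba = walk_probs(2)                  # P(b|a)
summed = defaultdict(float)
for b, pba in p_ba.items():
    for c, pcb in walk_probs(2, start=b).items():
        summed[c] += pcb * pba                  # P(c|b) x P(b|a)
        assert pcb * pba <= p_ca[c] + 1e-12     # the inequality (5c)

for c, prob in p_ca.items():
    assert abs(summed[c] - prob) < 1e-12        # the sum rule (5b)
```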

Let us now define a tentative distance-measure (a candidate "metric") u(b,a), between two events, b and a:

6)  u(b,a) = - C log ( P(b|a) )

where C is a positive constant and the logarithm of a number, x, is defined as the power, y, to which the base 10 must be raised to give that number:
7)     x = 10^y  <=> y = log (x)

with ^ being used for power (exponent) and  <=> standing for symmetrical implication (if and only if).

Note that since 10^0 = 1,  log (1) = 0.

Also, since 10^(-infinity) = 0,   log (0) = - infinity.

Those who know mathematics will not like our use of "infinity". But, here again, no matter!  We shall continue merrily, pooh-poohing such concerns.

For those who might have forgotten their high-school logarithms:

Let             X = 10^a       and      Y = 10^b

So that      log(X) = a      and      log(Y) = b

Then          X x Y  =  (10^a)  x  (10^b)  = 10^(a+b)

So that  we have the useful result:

7a)    log(X x Y)  =  a + b  = log(X) + log(Y)

So "taking logs" converts a product to a sum. 
When we were high-school students, we used logarithm tables to do difficult multiplications (and divisions, which reduce to subtractions of log values).  The slide rule was also based on this.
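A one-line check of (7a), using Python's base-10 logarithm in place of the tables:

```python
import math

X, Y = 123.0, 456.0
assert math.isclose(math.log10(X * Y), math.log10(X) + math.log10(Y))
assert math.isclose(math.log10(X / Y), math.log10(X) - math.log10(Y))
```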

We shall use this result ourselves, below. 
u(b,a) could be thought of as the "unlikelihood" of event b occurring, given that event a occurs.  Or we can think of it as a distance between (point-event) possibilities in a possibility-space. So we could also refer to it as a measure of "possibility distance".

But we are jumping too far ahead. Can we really think of this quantity as a "distance" in a meaningful way?  Is it a proper distance measure -- a true metric?

The general requirements for a metric d(a,b), in a space in which a, b are "points", are:

8.1)   d(b,a) >=0                                positivity
8.2)   d(b,a) = 0 if and only if b=a       distinct events separated by non-zero distance
8.3)   d(b,a) = d(a,b)                          symmetry
8.4)   d(c,a) <= d(c,b) + d(b,a)            triangle inequality 

(In a Euclidean space, 8.4 means that the shortest distance between two points is along the straight line between the two. Detours are longer.)

If these conditions, 8.1 -- 8.4, are satisfied, then the space of points {a,b,c...}, with the metric d(b,a) used to measure distances between points, forms a "metric space", with many familiar and useful properties. The two-dimensional surface of a blackboard, with distances between chalk points measured with a ruler, is a familiar example of such a space. If the points on the board are represented by x,y co-ordinates, then the distance can be written, using Pythagoras' Theorem, in terms of the differences between the x,y co-ordinates of two points. So, because Pythagoras (and all of Euclidean geometry) applies to the space of points on a blackboard, it is called a Euclidean metric space.

The 3D space in which the teacher who writes on the blackboard (or whiteboard, alas) lives and moves about, along with his/her students, is also a Euclidean metric space.

There are, however, many other metric spaces which are not Euclidean. They are still metric spaces because we can define metrics (distance measures) on them that obey the rules listed above.

Let us now attempt to see whether the unlikelihood function u(b,a) does indeed behave like a proper measure of separation --  a metric, d(b,a) -- should behave.

Or does it have behavior problems that can or cannot be dealt with? Teachers, take note!

We shall confine our attention to the case of the pawn on the infinite chessboard.  So one should be careful not to generalize the results we obtain below to other situations, without taking note of differences that may arise.

From the definition of the unlikelihood (6) and of the (relative) probability (1), we see that:

since P <= 1,

u = - C log(P) >= - C log(1) = 0

where the inversion of the inequality comes from the negative sign in the definition of u.  (Recall that C is positive, so - C is negative, and log is an increasing function.)
So we have established:

9.1)  u(b,a) >= 0     (positivity)

Combining our result    (2)   P(b|a) = 1 if and only if b=a
with our definition      (6)   u(b,a) = - C log( P(b|a) ),  we get:

9.2)  u(b,a) = 0 if and only if b=a     (The possibility-distance between distinct point-events is non-zero.)

Combining our result   (2a)   P(b|a) = 0 if ta=tb and ib=/=ia
with our definition      (6)   u(b,a) = - C log( P(b|a) ),  we get:

9.2a)   u(b,a) = + infinity  if ta=tb and ib=/=ia

This is a special case of  the more general result that two events that cannot both be true are separated, in the "possibility space", by an infinite distance, as measured by our proposed metric.  Although this is a digression from our attempt at establishing that our candidate is indeed qualified (under certain conditions, as for a handicapped worker) for that job, this result is worth noting.

Next, from our (hand-waving) result  (3)   P(a|b) = P(b|a)
    and from our definition                      (6)   u(b,a) = - C log( P(b|a) )
   we get:

   9.3)  u(a,b) = u(b,a)    (symmetry)

   Finally, from our result   (5c)  P(c|a) >= P(c|b) x P(b|a)

   (where, from (5a), tb lies between ta and tc)
   and from our definition   (6)   u(b,a) = - C log( P(b|a) )
   we get

   9.4)  u(c,a)  <= u(c,b) + u(b,a)     (provisional triangle inequality)

where again, as for (9.1), the inversion in the inequality comes from the negative sign in the definition of the possibility-distance.

The provision on the validity of 9.4 is that of (5a), namely:

9.4a)   ta < tb < tc    OR    ta > tb > tc  

and this is a handicap that our candidate and his/her employers have to deal with.
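Under that time-ordering provision, the triangle inequality can be checked numerically for the pawn, using the exact probabilities and the unlikelihood with, say, C = 1 and base-10 logs. A sketch (helper names mine):

```python
import math
from collections import defaultdict

def walk_probs(steps):
    """Exact arrival probabilities after `steps` fair moves from the origin."""
    dist = {(0, 0): 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for (x, y), p in dist.items():
            for dx, dy in [(0, 1), (0, -1), (-1, 0), (1, 0)]:
                nxt[(x + dx, y + dy)] += p / 4
        dist = dict(nxt)
    return dist

def u(p, C=1.0):
    """Unlikelihood u = -C log10(P); infinite when P = 0."""
    return math.inf if p == 0 else -C * math.log10(p)

# a = origin at t=0, b at t=2, c at t=4, so ta < tb < tc as (9.4a) requires
p_ba = walk_probs(2)
p_ca = walk_probs(4)
for b, pba in p_ba.items():
    for disp, pcb in walk_probs(2).items():   # P(c|b), by translation from b
        c = (b[0] + disp[0], b[1] + disp[1])
        assert u(p_ca[c]) <= u(pcb) + u(pba) + 1e-9
```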

Enough for now!  Apologies for the infliction. Print it out and read it at your leisure, if you feel so inclined. Do let me know where the reasoning is faulty or sloppy. I am sure you will find several instances of these.  This is a quick, and perhaps premature, send-out -- but I may not be able to think about these things again for a while -- or ever!

-- Arjun
Correspondence on this follows at the next post on the blog.
