## Monday, 20 October 2014

### Level n numbers tend to 1 in 2(3^n)/3 of the outputs produced by the 'pqr function'

An attempt at a geometrical proof  that the proportion of level n Collatz numbers in a very large region tends to 1 in 2(3^n)/3 of the outputs produced by the 'pqr function'

This is what I am calling the 'pqr function' (for want of a better name)

•  Any odd number p can be expressed as 2^n – q where 2^n is the first power of 2 higher than p.
•  Express q as a series of powers of 2 in ascending order. For instance, where p = 475 (and 2^n = 512), q = 37 = 1 +  4 + 32
• Restrict the inputs to this function to numbers q < 2^n/8
• Multiply each term in this series by a power of 3 with the power descending by one for each term (ending with 3^0).
• So in this case 1*9 + 4*3 + 32*1 = 53. We’ll call this number q(3).
• Subtract q(3) from 2^n and divide by the next highest power of 3. So in this case (512 – 53)/27 = 17. We will call this result r.
• Here are two more simple examples of this function.

15 = 16 – 1
q(3) = 1
(16 – 1)/3 = 5
So r = 5

1897 = (2048 – 1 – 2 – 4 – 16 – 128)
q(3) = 81 + 2*27 + 4*9 + 3*16 + 128 = 347
(2048 – 347)/243 = 7

The examples above result in positive integer solutions. However, most instances of this function result in fractional or negative solutions.

For instance:

49 = 64 – 1 – 2 – 4 – 8
q(3) = 27 + 9*2 + 4*3 + 8 = 65
(64 – 65)/81 = -1/81

27 = 32 – 1 – 4
q(3) = 3+4 = 7
(32 – 7)/9 = 25/9

Where this function produces a positive integer, it can be used to uniquely identify a Collatz chain from that integer to 1.

For instance take the first example above

475 = 512 – 1 – 4 – 32
q(3) = 9 + 3*4 + 32 = 53
(512 – 53)/27 = 17

What this tells us is that from 17 there are three tripling steps in the Collatz chain to 1.

If there are n steps from an integer to a power of 2, we will call that integer a Level n number.

Here is why this function doesn't produce duplicate outputs. Every integer that has a level n solution also has a solution for all higher levels. For instance 5 = (16-1)/3 = (64-3-16)/9 = (256-9-48-64)/27 etc. However after the first solution, the additional solutions all have q > 2^n/4 so these do not get output when we restrict inputs to q < 2^n/8

First I'm going to show how Level 1 and Level 2 integer outputs from the function tend to limits of 1 in 2 and 1 in 6, then I want to extrapolate from this to higher levels.

1. Level 1
Level 1 inputs to the pqr function are 2^n-1 where 2^n > 8. The outputs are (2^n-1)/3)
2^n numbers alternate between 1 and 2 mod 3
So 2^n - 1 alternates between 0 and 1 mod 3.
When 2^n-1 = 0 mod 3 the output (2^n-1)/3) is an integer.

The series of outputs is 5, 31/3, 21, 127/3, 85...
So the proportion of integer outputs is 1, 1/2, 2/3, 1/2, 3/5, 1/2, 4/7 ... , which oscillates between 1/2 and the series p/(2p-1) which tends towards 1 in 2.

Therefore the proportion of Level 1 integers output by level 1 inputs tends towards 1 in 2 .

2. Level 2

Level 2 inputs = 2^n - (1 + 2^m) where n > m+3. The outputs are (2^n - (3 + 2^m))/9

First, here is a 6x6 grid of integer outputs from a calculation of this sort (to start with we will ignore the restriction q < 2^n/8).

2^n increases along the horizontal axis. q increases down the vertical axis.

9 is a power of 3.

Every prime power (except powers of 2) has a primitive root; thus the multiplicative group of integers modulo 3n is cyclic. (I've quoted this - I hope it is the right terminology - in this case what I mean is that as we double from 1 mod 9 we will pass through each residue class mod 9 other than multiples of 3 before returning to 1).

The series of 3+2^m values (the q(3) column in the grid) is congruent to this cycle - with values of 5, 7, 2, 1, 8, 4 mod 9

When we subtract these values from a particular value of 2^n we get either a cycle through the 0 and 1 mod 3 numbers or a cycle through the 0 and 2 mod 3 numbers.

Therefore 1 and only 1 number in each column in a 6x6 grid  =  0 mod 9.

The horizontal series of 2^ n numbers is also congruent to the same cycle (in this case, 5, 1, 2, 4, 8, 7)

Therefore when we subtract any given (3+2^m) from the 2^n series we also get either a cycle through the 0 and 1 mod 3 numbers or a cycle through the 0 and 2 mod 3 numbers.

Therefore 1 and only 1 number in each row in a 6x6 grid =  0 mod 9.

Therefore a square 6x6 grid of this sort will always contain 6 in 36 (=1 in 6) integer outputs.

However, the grid above contains numbers that aren't permissible inputs, since our function is restricted to inputs where q < 2^n/8. So now we need to consider only this triangle (the numbers that aren't crossed out:

In this area we only have 4/21 (= 1 in 5.25) integer outputs. (The 'worst case scenario' would be 1/21, with one integer output in the top right corner and the rest in the 'restricted zone - while the 'best case scenario' would be 6 in 21 down the central diagonal).

However, when we continue to develop the pattern we get this:

The square has 6 in 36, each triangle has 4 in 21 - total proportion, 14 in 78 = 1 in 5.57

Then we get:

= 3 squares (6 in 36) and 3 triangles (4 in 21) = 30 in 171 = 1 in 5.7

And eventually we get a pattern more like this:

It's obvious from looking at this that the area in which the proportion of the area filled by squares rather than triangles is increasing as it is similar to the way that calculus uses rectangles under a curve to approximate area.And the squares have a ratio of 1 in 6 integer results, so the ratio for the whole triangle tends up towards 1 in 6.

Level 3.

At level 3 (and subsequent levels) we get a similar pattern in 3 (and higher) dimensions. We can visualise the initial level 3 pattern as a tetrahedron constructed from a series of triangles that fit together

The pattern is built up thus:

2^n = 64
q = 1+2+4, q(3) = 9+6+4

2^n =128
q = 1+2+4, q(3) = 9+6+4
q = 1+2+8, q(3) = 9+6+8
q = 1+4+8, q(3) = 9+12+8

2^n=256
q = 1+2+4, q(3) = 9+6+4
q = 1+2+8, q(3) = 9+6+8
q = 1+4+8, q(3) = 9+12+8
q = 1+2+16, q(3) = 9+6+16
q = 1+4+16, q(3) = 9+12+16
q = 1+8+16, q(3) = 9+24+16

2^n=512
q = 1+2+4, q(3) = 9+6+4
q = 1+2+8, q(3) = 9+6+8
q = 1+4+8, q(3) = 9+12+8
q = 1+2+16, q(3) = 9+6+16
q = 1+4+16, q(3) = 9+12+16
q = 1+8+16, q(3) = 9+24+16
q = 1+2+32, q(3) = 9+6+32
q = 1+4+32, q(3) = 9+12+32
q = 1+8+32, q(3) = 9+24+32
q = 1+16+32, q(3) = 9+48+32

When these are arrayed in a series of triangles we get:

9+6+4

9+6+4  9+6+8
9+12+8

9+6+4    9+6+8      9+6+16
9+12+8  9+12+16
9+24+16

9+6+4      9+6+8       9+6+16    9+6+32
9+12+8    9+12+16  9+12+32
9+24+16  9+24+32
9+48+32

Once again the vertical and horizontal rows and columns (as well as the series 2^n - q(3) for each value of q) must be cyclic, this time through the values of 27 (excluding multiples of 3). So 1 in 18 in each row and column must be an integer output.

And as we expand the pattern beyond the 18th power of 2 we get a series of cubes within which the 18x18x18 grid must contain exactly 18x18 integer solutions: just as in this badly drawn diagram.

As the overall shape grows larger, we can fill more of the volume with cubes and less with pyramids:

So again, the proportion of integer outputs will either tend up towards 1 in 18 or down towards 1 in 18 depending on whether the initial pyramid pattern has more or less than 1 in 18.

Level 4 and beyond:

We can visualise larger versions of the pattern by projecting a series of the level 3 pattern, then a series of series etc: Here is what the level 4 pattern will look like:

Again, there is an initial 'triangular' pattern in which we can have more or less than 1 in 54 integer solutions. And again, because doubling mod 81 is also cyclic, we can fill an increasing proportion of the space with 54x54x54x54 cuboids (with exactly 1 in 54x54x54 integer solutions) as the pattern develops.

And we can extrapolate a similar pattern for any level n. Level 5 could be depicted by a growing series of the diagram above. Level 6 could be depicted by a growing series of the Level 5 diagram. And so on. In each case the volume can be filled by an increasing proportion of n-dimensional cuboids.

Therefore the proportion of level n Collatz numbers in a large region tends towards 1 in 2(3^n)/3 of the outputs produced by the function.

Final thoughts

Clearly if we are trying to exactly 'count' all integer outputs, then an error term of even 1 in a trillion is too high. Though if the proportions do tend to 1 in 2, 1 in 6 etc, it might at least help support a statistical argument and I think there may be strategies for using this method that don't strictly rely on counting.

I can explain more about this latter train of thought, but I want to keep this post as simple as possible for now.