**An attempt at a geometrical proof**

**that the proportion of level n Collatz numbers in a large region tends to 1 in 2(3^n)/3 of the outputs produced by the function described in my previous post.****First I'm going to show how Level 1 and Level 2 integer outputs from the function tend to limits of 1 in 2 and 1 in 6, then I want to extrapolate from this to higher levels.**

**1. Level 1**

(To recap, the Level 1 inputs to the function are 2^n-1 where 2^n > 4. The outputs are (2^n-1)/3)

2^n numbers alternate between 1 and 2 mod 3

So 2^n - 1 alternates between 0 and 1 mod 3.

When 2^n-1 = 0 mod 3 the output (2^n-1)/3) is an integer.

The series of outputs is 7/3, 5, 31/3, 21, 127/3, 85...

So the proportion of integer outputs is 0, 1/2, 1/3, 1/2, 2/5, 1/2, and this continues to alternate between p/(2p+1) and 1/2, with the value of p growing so that p(2p+1) tends towards a limit of 0.5

**Therefore the proportion of Level 1 integers output by level 1 inputs tends towards 1 in 2 .**

**2. Level 2**

To recap, level 2 inputs = 2^n - (1 + 2^m) where n > m+2. The outputs are (2^n - (3 + 2^m))/9

First, here is a 6x6 grid of integer outputs from a calculation of this sort (to start with we can ignore the restriction q < 2^n/4).

2^n increases along the horizontal axis. q increases down the vertical axis.

9 is a power of 3.

Every prime power (except powers of 2) has a primitive root; thus the multiplicative group of integers modulo 3

^{n}is cyclic. (I've quoted this - I hope it is the right terminology - in this case what I mean is that as we double from 1 mod 9 we will pass through each residue class mod 9 other than multiples of 3 before returning to 1).

The series of 3+2^m values (the q(3) column in the grid) is congruent to this cycle - with values of 2, 4, 8, 7, 5, 1 mod 9

When we subtract these values from a particular value of 2^n we get either a cycle through the 0 and 1 mod 3 numbers or a cycle through the 0 and 2 mod 3 numbers.

**Therefore 1 and only 1 number in each column in a 6x6 grid = 0 mod 9.**

The horizontal series of 2^ n numbers is also congruent to the same cycle (in this case, 7, 5, 1, 2, 4, 8)

Therefore when we subtract any given (3+2^m) from the 2^n series we also get either a cycle through the 0 and 1 mod 3 numbers or a cycle through the 0 and 2 mod 3 numbers.

**Therefore 1 and only 1 number in each row in a 6x6 grid = 0 mod 9.**

**Therefore a square 6x6 grid of this sort we will always contain 1 in 6 integer outputs.**

**However, the grid above contains numbers that aren't permissible inputs, since our function is restricted to inputs where q < 2^n/4. So now we need to consider only this triangle:**

In this area we only have 3/21 (= 1 in 7) integer outputs. (The 'worst case scenario' would be 1/21, with one integer output in the top right corner and the rest in the 'restricted zone - while the 'best case scenario' would be 6 in 21 down the central diagonal).

However, when we continue to develop the pattern we get this:

The square has 6 in 36, each triangle has 3 in 21 - total proportion, 12 in 78 = 1 in 6.5.

Then we get:

= 3 squares (6 in 36) and 3 triangles (3 in 21) = 27 in 171 = 1 in 6.33

And eventually we get a pattern more like this:

It's obvious from looking at this that the area in which the proportion of the area filled by squares rather than triangles is increasing as it is similar to the way that calculus uses rectangles under a curve to approximate area. But just to run through the maths . . .

Bearing in mind that each triangle or square is 6 powers of 2^n wide and tall - in this image there are 17 triangles and (16+15+....2+1) squares = 136 squares (6 in 36) and 51 triangles (3 in 21) = 1 in 6.058

This image is 17 columns wide. The formula for N columns is [N triangles and N(N-1)/2 squares].

The formula for the proportion of integer outputs here is (6 x the number of squares + 3 x the number of triangles) divided by (36 x the number of squares + 21 times the number of triangles) =

[6N(N-1)/2 + 3N]/[36N(N-1)/2 + 21N] = 3N^2/(18N^2 + 3N) =

**N^2/(6N^2 + N)**

**N^2/(6N^2 + N)**tends up towards 1/6.

(For instance, at 1000 columns wide the proportion is 1 in 6.001)

I mentioned the best case scenario for the original triangle being 6 (typo corrected, this said 5) in 21 and the worst case being 1 in 21. In those cases the formulae would be:

(3N^2 + 3N)/(18N^2 + 3N) which tends down towards 1 in 6

(3N^2 - 2N)/(18N^2 + 3N) which tends up towards 1 in 6

**So regardless of the pattern in the initial triangle the proportion of integer outputs in this pattern would have to tend towards 1 in 6.**

**Level 3.**

At level 3 (and subsequent levels) we get a similar pattern in 3 (and higher) dimensions. We can visualise the initial level 3 pattern as a tetrahedron (or a cube, dissected by a plane through three of its corners), constructed from a series of triangles that fit together

The pattern is built up thus:

2^n = 32

**q = 1+2+4, q(3) = 9+6+4**

2^n =64

q = 1+2+4, q(3) = 9+6+4

**q = 1+2+8, q(3) = 9+6+8**

**q = 1+4+8, q(3) = 9+12+8**

2^n=128

q = 1+2+4, q(3) = 9+6+4

q = 1+2+8, q(3) = 9+6+8

q = 1+4+8, q(3) = 9+12+8

**q = 1+2+16, q(3) = 9+6+16**

**q = 1+4+16, q(3) = 9+12+16**

**q = 1+8+16, q(3) = 9+24+16**

2^n=256

q = 1+2+4, q(3) = 9+6+4

q = 1+2+8, q(3) = 9+6+8

q = 1+4+8, q(3) = 9+12+8

q = 1+2+16, q(3) = 9+6+16

q = 1+4+16, q(3) = 9+12+16

q = 1+8+16, q(3) = 9+24+16

**q = 1+2+32, q(3) = 9+6+32**

**q = 1+4+32, q(3) = 9+12+32**

**q = 1+8+32, q(3) = 9+24+32**

**q = 1+16+32, q(3) = 9+48+32**

When these are arrayed in a series of triangles we get:

9+6+4

9+6+4 9+6+8

9+12+8

9+6+4 9+6+8 9+6+16

9+12+8 9+12+16

9+24+16

9+6+4 9+6+8 9+6+16 9+6+32

9+12+8 9+12+16 9+12+32

9+24+16 9+24+32

9+48+32

Once again the vertical and horizontal rows and columns (as well as the series 2^n - q(3) for each value of q) must be cyclic, this time through the values of 27 (excluding multiples of 3). So 1 in 18 in each row and column must be an integer output.

And as we expand the pattern beyond the 18th power of 2 we get a series of cubes within which the 18x18x18 grid must contain exactly 18 integer solutions: just as in this badly drawn diagram.

As the overall shape grows larger, we can fill more of the volume with cubes and less with pyramids:

So again, the proportion of integer outputs will either tend up towards 1 in 18 or down towards 1 in 18 depending on whether the initial pyramid pattern has more or less than 1 in 18.

**Level 4 and beyond:**

We can visualise larger versions of the pattern by projecting a series of the level 3 pattern, then a series of series etc: Here is what the level 4 pattern will look like:

Again, there is an initial 'triangular' pattern in which we can have more or less than 1 in 54 integer solutions. And again, because doubling mod 81 is also cyclic, we can fill an increasing proportion of the space with 54x54x54x54 cuboids (with exactly 1 in 54 integer solutions) as the pattern develops.

And we can extrapolate a similar pattern for any level n. Level 5 could be depicted by a growing series of the diagram above. Level 6 could be depicted by a growing series of the Level 5 diagram. And so on. In each case the volume can be filled by an increasing proportion of n-dimensional cuboids.

**Therefore the proportion of level n Collatz numbers in a large region tends towards 1 in 2(3^n)/3 of the outputs produced by the function.**

**Final thoughts**

I think this demonstrates why this method has an error smaller than a Poisson distribution. However this obviously wouldn't solve all the problems with the method, such as:

1. What happens when q(3) is larger than the 2^n it is being subtracted from? From exploring this, I think that the outputs tend down towards -1 without ever reaching it, which would mean this isn't actually a problem, and that the first integer output from a level must be > 0. I haven't tried to write out a proof for this yet, but it should be easy to do.

2. Clearly if we are trying to exactly 'count' all integer outputs, then an error term of even 1 in a trillion is too high. Though if the proportions do tend to 1 in 2, 1 in 6 etc, it does at least help support a statistical argument that the Collatz conjecture is 'probably true'.

With regards to a proof, the thing that interested me about this method was that it can perhaps be used thus: we estimate how many levels we would need to test to be certain we can 'fill' a particular region of integers (

**and we allow sufficient room for error in this estimation**). Then we show that we can always double this estimate each time we double the region. That's what I mean by a 'pigeonhole' process - not to prove that we must output each individual integer, but to show that if we assume we don't and that we could search every level without filling some of the regions, that leads to a contradiction.
I'd like to talk more about that later, but for now I hope the attempted proof above gets us through the question of whether the estimates produced will tend towards a limit?