Work on Idea #152

Squares Problem Python Script and Early Results 

In the Squares Problem, you have a table with cells that can either assume the value "1" or "0", randomly, given a certain probability p. The goal is to study the distribution of the contiguous areas of ones, an area that have neighbors on the directions up, down, right or left. Cells are not considered contiguous if they are neighbors in the diagonal.

FIGURE 1. Example of a Table.

I tackled the Squares Problem a couple of times, on the computer, not having much luck at first. Eventually, I devised a brute force python script that does indeed calculate the area distribution of a randomly generated table. The script is available for download the bottom of this page.

[After inquiring ChatGPT and walking him into the problem, I found out that what I did was essentially an un-optimized version of a depth-first algorithm].

First, the script generates the table, using random numbers. Then, for each cell with a one, it tries to move to the four directions, and keeps doing so, until reaching the boundaries of the contiguous areas. Whenever it does a successful movement, it tries to move again, in a cascade algorithm.

However it does so at an enormous computational effort, and it is highly inefficient for large tables. Despite its limitations, it was able to show some curious aspect of the problem.

Let's analyze two results of the python script:

The image on the left shows the tables, with the ones and zeros. The Excel table on the right tabulates the calculated area distribution, counting how many areas have a single one, two ones and so forth. There are no areas with more than 11 cells on both runs. As it can be seen, although the area distribution is not exactly the same, it is roughly the same. The proportion of white/black cells for a given probability, in the entire table, remains the same, approximately equal to p.  

This semi-predictability, that even random results may show similar area distributions, was one of the first reasons why I decided to investigate this problem more closely. 

Since the table is generated randomly, it is of little use an individual run. It makes more sense to run a few times and average the results of the area distribution.

FIGURE 2. Averaged area distribution for 100 tables of side 50, with p = 0.5. 

As it can be seen in the plot above, the most common areas is an isolated one, with a single cell (counted as 85.3 separate areas). With area 2, two cells, the count of areas drop to 22.35. With 3 cells, 13.18. Four cells, 9.27. Five cells, 7.04. It is easy to see that this resembles an hyperbolic function, sharply decreasing. 

We can also change the probability p and run it a few times, to see how the results change:


FIGURE 3. Area distribution for several values of p.

The graph above shows a few results, but it only shows the initial interval of area values for clarity reasons. Isolated areas are much more common than areas with say, an area above 10 cell units. This is valid for a wide variety of probability values. 

For really small values of p, however, there are a lot of ones on the board, so they start forming huge clusters. This forms a tale on the curve above, highly irregular, made up of these big clusters. 

One could say that, as you decrease p, this curve swings from a curve that drops sharply at low area values to another curve that raises sharply on high area values. They do not reach the same vertical height however, as a single area with many cells can account for a large portion of the table. The position of the tale also varies with p, as expected.

FIGURE 4. Area distribution tale for low values of the probability p. 

It seems very likely that these results can be fitted into statistical distributions, however my knowledge in this matter is rather limited.

DOWNLOAD LINKS:

area_distribuition_v1_5.py

The file above contains the script that calculates area distribution for a table. It outputs several files, one for each run, with tables containing calculated area distribution for the random table. it is known to be much faster for larger values of p, since there are less ones on those tables.

merge_v1.py

The file above is a simple script that average several area distribution files from different runs.

You should run first the area_distribution_v_1_5.py file and then the merge_v1.py script.

BANNER IMAGE CREDITS: ESA/Hubble & NASA, A. Filippenko, R. Jansen 

Want to know more about this image? Follow this external link.