Uniformly Distributed Random Numbers and Random Bitstreams


To many people, the problem that is discussed in this article is self-evident or at least irrelevant. However, I do have a teaser that got me thinking about this problem.

The Task

Generate a series of randomly distributed positive integers that have a predetermined maximum value and can be used with the TXOR algorithm in the role of a key.

First Thought

Create a bitstream by remixing (in whatever way: interlacing randomly, XOR-ing) some zip files, video files, and random generator output, and then cut the bitstream into pieces that all have the length, in bits, of the maximum number that is used for the key in the TXOR algorithm.
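
As a very rough illustration of the cutting step (a hypothetical sketch written for this article, not the script used in the experiment below; the names follow the same prefix convention but are otherwise made up):

    # Cut a "0"/"1" String into fixed-length tokens and interpret
    # each token as a binary number. Illustrative sketch only.
    def cut_tokens(s_bitstream, i_bits_per_token)
      ar_tokens = []
      i = 0
      while i + i_bits_per_token <= s_bitstream.length
        ar_tokens << s_bitstream[i, i_bits_per_token].to_i(2)
        i += i_bits_per_token
      end
      ar_tokens
    end

    p cut_tokens("1100101001110001", 4)  # => [12, 10, 7, 1]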

The Teaser

I generated the following bitstream by literally throwing a flat object (a pencil eraser):

1000110001111010

I started to wonder whether I have any practical chance of generating a uniformly distributed set of 16-bit (or 32-bit, or 64-bit, or 1000-bit) numbers by concatenating uniformly distributed random bits. The catch is that in such a number series every number greater than or equal to 0 and smaller than 2^maxnumberofbits must have the same probability of being generated, yet according to real-life coin-throwing experiments it seems intuitively almost impossible for a truly random bitstream to contain 30 zeros or 30 ones in a row. So I suspected that bitstream tokens that have 90% of their digits as zeros in a row or ones in a row might be generated with a different probability than bitstream tokens that consist of about 50% zeros and 50% ones in any order.
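
For a sense of scale (a back-of-the-envelope figure of my own, not from the experiment): the probability of a fair coin producing 30 zeros in a row is

    0.5**30 == 1/(2**30) == 1/1073741824

i.e. roughly one chance in a billion per 30-throw attempt, which is why nobody ever sees such a run by hand.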

If that suspicion holds, then a series of uniformly distributed random numbers should generally not be generated from a series of uniformly distributed random bits, because "uniform distribution" means that every number has the same chance of being present, of being generated.

As with any statement, one of the easiest ways to show that it does not hold is to find at least one example where it fails. On the other hand, my current task of generating uniformly distributed number series for the keys of the TXOR algorithm only demands that the tokenization of the random bitstream gives a series that is "uniform enough". The "uniform enough" part can be found out by a simplistic experiment.

The Experiment

The 2013_09_13_bitstream_experiment_v2.rb took about 2 minutes to run on a machine that has a 3 GHz CPU.
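
I will not paste the script here, but its gist can be sketched roughly like this (a simplified reconstruction written for this article, using Ruby's built-in rand as the bit source; the actual 2013_09_13_bitstream_experiment_v2.rb may differ in details):

    # Rough sketch of the experiment idea (not the original script):
    # count the generated zero-bits and one-bits, concatenate the bits
    # into 16-bit numbers and collect the numbers into a histogram.

    i_number_of_numbers = 3_145_728
    i_max_n_of_bits_in_a_single_number = 16
    i_histogram_columns_number_parameter = 3

    i_n_of_columns = 2**i_histogram_columns_number_parameter
    i_column_width = 2**i_max_n_of_bits_in_a_single_number / i_n_of_columns

    i_0 = 0
    i_1 = 0
    ar_hist = Array.new(i_n_of_columns, 0)

    i_number_of_numbers.times do
      i_x = 0
      i_max_n_of_bits_in_a_single_number.times do
        i_bit = rand(2)          # one uniformly distributed random bit
        if i_bit == 0
          i_0 += 1
        else
          i_1 += 1
        end
        i_x = (i_x << 1) | i_bit # concatenate the bit to the number
      end
      ar_hist[i_x / i_column_width] += 1
    end

    puts "i_0==#{i_0}   i_1==#{i_1}"
    puts ar_hist.join(' | ')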

The Results of the Experiment


------------------------------------------------------------
Experiment stage 0:

Test results: i_0==1001266   i_1==998734
              i_0/i_1 ~ 1.00254


------------------------------------------------------------
Experiment stage 1:

                     i_number_of_numbers==3145728
      i_max_n_of_bits_in_a_single_number==16
    i_histogram_columns_number_parameter==3


Histogram:

    | 393791 | 393340 | 393490 | 392559 | 393125 | 392820 | 393931 | 392665 | 

------------------------------------------------------------
Experiment stage 2:

                     i_number_of_numbers==3145728
      i_max_n_of_bits_in_a_single_number==16
    i_histogram_columns_number_parameter==16

  ar_hist[0] == 55
  ar_hist[16384] == 46
  ar_hist[32768] == 40
  ar_hist[49152] == 41
  ar_hist[65535] == 48
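
A quick sanity check of those figures (my own arithmetic, not part of the script output):

    stage 1:  3145728/(2**3)  == 393216  expected numbers per histogram column
    stage 2:  3145728/(2**16) == 48      expected occurrences per 16-bit value

The measured stage 1 columns all stay within roughly 0.2% of 393216, and the sampled stage 2 counts (40 to 55) scatter around 48, which is about what a uniform distribution would give at these sample sizes.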


According to the experiment, it is possible to generate a "uniformly enough" distributed random number series, with a maximum of N bits per number, by cutting tokens of N bits from a bitstream that has been assembled by concatenating uniformly distributed random bits.
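
As a side note of my own (not something the experiment itself covers): if a vetted bit source is preferred over home-made remixing of zip and video files, Ruby's standard SecureRandom module can play the role of the bitstream, and the cutting step stays the same:

    require 'securerandom'

    # Produce one N-bit key token by cutting N bits from a randomly
    # generated bitstream. Illustrative sketch; in effect the same as
    # SecureRandom.random_number(2**i_n_of_bits).
    def generate_key_token(i_n_of_bits)
      i_n_of_bytes = (i_n_of_bits + 7) / 8
      s_bitstream = SecureRandom.random_bytes(i_n_of_bytes).unpack('B*').first
      s_bitstream[0, i_n_of_bits].to_i(2)
    end

    p generate_key_token(16)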

Honestly, if I compare that conclusion with the bitstream that I generated with the flat, coin-like object, then I find the result counter-intuitive.

After thinking about it a little bit (more), it makes sense, in a way, because

generation_probability(101)

is

generation_probability(1, first_digit)*
generation_probability(0, second_digit)*
generation_probability(1, third_digit)

and the contemplation would not change if instead of the 101 there were 000 or 111; the same product can also serve as the base of an induction over longer tokens.
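
In concrete numbers (my own arithmetic, following the same notation):

    generation_probability(101) == 0.5 * 0.5 * 0.5 == 0.125
    generation_probability(000) == 0.5 * 0.5 * 0.5 == 0.125
    generation_probability(111) == 0.5 * 0.5 * 0.5 == 0.125

so the "suspicious looking" all-zeros and all-ones tokens are exactly as likely as any other specific token of the same length.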

Maybe one way to put it is that the greater the number of bits in the maximum number, the smaller the share of those "all zeros in a row" and "all ones in a row" tokens that is needed. It seems to be an interesting scalability property that might provide a parallelism-based solution to some other problems.
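
To put rough figures on that scaling (again my own arithmetic): among all 2**N tokens of N bits, exactly 2 are "all zeros" or "all ones", so their combined share is

    2/(2**16) == 2/65536      ~ 0.003%      of all 16-bit tokens
    2/(2**30) == 2/1073741824 ~ 0.0000002%  of all 30-bit tokens

which is why such runs practically never show up in hand-made coin-throwing experiments, even though each individual token is as likely as any other.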
