The context of the current blog post is that the Intel and classical FPGA vendors offer CPU-s that are at the same die with an FPGA. FPGA based acceleration has moved from hardware development domain into the software development domain. As one of the most classic theoretical computer science algorithm to study is sorting, an FPGA based sorting acceleration seems relevant. If hardware can sort an array of numbers that has a fixed size of 256, then at most the last 8 recursion steps of the quicksort algorithm might be skipped.

*The image can be seen in greater detail by clicking on it.*

The array length at the illustration is 4=k.

The variable x_3 takes 3=(4-1) comparator blocks.

The variable x_2 takes 2=(4-2) comparator blocks.

The variable x_1 takes 1=(4-3) comparator blocks.

The variable x_0 takes 0=(4-4) comparator blocks.

An array of length k=4 takes (k-1)+(k-2)+(k-3)+(k-k)=sum_comp comparator blocks.

The classical proof for finding a sum of 1+2+3+4+...+n=classical_sum is

classical_sum = (1 + 2 + 3 + 4 + ... + n) classical_sum_double = classical_sum * 2 = = classical_sum + classical_sum classical_sum_double = = 1 + 2 + 3 + 4 + ... + n + + n + (n-1) + (n-2) + (n-3) + ... + 1 = = n*(n+1) classical_sum = classical_sum_double/2 = = n*(n+1)/2

The form of the sum_comp is very similar to the classical_sum:

sum_comp = (k-1) + (k-2) + ... + (k-k) = = (k-1) + (k-2) + ... + 0 = = (k-1) + (k-2) + ... + 1 + (0+0+0+...+0) z=(k-1) sum_comp = (k-1) + (k-2) + ... + 1 = = z + (z-1) + ... + 1

According to the formula of the classical_sum

sum_comp = z + (z-1) + ... + 1 = = z*(z+1)/2 = = (k-1)*((k-1)+1)/2 = = (k-1)*(k-1+1)/2 = = (k-1)*(k)/2 = = k*(k-1)/2

Therefore, an array that has the length of 256 has (256*(256-1)/2)=32640 comparator blocks.

An array with the length of 128 has (128*(128-1)/2)=8128 comparator blocks.

An array with the length of 64 has (64*(64-1)/2)=2016 comparator blocks.

An array with the length of 32 has (32*(32-1)/2)=496 comparator blocks.

An array with the length of 16 has (16*(16-1)/2)=120 comparator blocks.

An array with the length of 8 has (8*(8-1)/2)=28 comparator blocks.

32 is 2 at the power of 5.

I got the idea from a sorting page*(archival copy)* of the Prof. Dr. Hans Werner Lang*(archival copy)*. The Konstantin Tretyakov has also a slightly similar, but better, blog post *(archival copy)*.