How long does it take to fill up an array prior to sorting?...

Kevin Simonson · Nov 21, 2021

Most sorting algorithms I\'ve noticed seem to have an interface somewhat like this:

void someAlgorithm ( elemType[] elements);

So to implement this algorithm an application needs to fill the {elements} array, call the {someAlgorithm()} algorithm, and then read out the (sorted) elements of {elements}. For an {elements} object that contains n {elemType}s, how long does it take to fill that array? Is the most efficient way to implement a loop like this:

elemType[] elems = new elemType[ n];
int ix;
for (ix = 0; ix < n; ix++)
{ elems[ ix] = produceElem();
}
someAlgorithm( elems);

and then is the most efficient way to read the results of the array something like this:

for (ix = 0; ix < n; ix++)
{ consumeElem( elems[ ix]);
}

This would amount to, for each element, incrementing {ix}, comparing it to {n}, and then actually sending the produced {elemType} to the right position in the array (for filling the array), and then, for each element, incrementing {ix}, comparing it to {n}, and then reading the {elemType} in the corresponding position in the array (for emptying the array). So n+n+n = 3n instructions for each stage, and each instruction requires at least one clock cycle to fetch the instruction, at least one clock cycle to interpret the instruction, and at least one clock cycle to execute the instruction. So that\'s at least 9n clock cycles to fill the array and at least 9n clock cycles to empty it after calling {someAlgorithm()}.

I\'ve heard that some scientists have found ways to quickly move large blocks of data from one place on disk to another. Does that speed this process up, make it less than the 9n clock cycles I\'ve postulated above?

I\'ve got my eye on sorting of large amounts of data that\'s actually done in the industry. Do organizations that on a regular basis sort large amounts of data typically take 9n clock cycles to fill up the array, or is there a quicker way to do it that\'s in general use?

David Brown · Nov 21, 2021

On 21/06/2021 23:07, Kevin Simonson wrote:

David Brown: \"Can I assume, due to your \'.edu\' address, that you are
a student?\"

Close. I\'ve been admitted into the Senior Citizen Audit Program at
Utah Valley University. I turn 62 in September, so I will start
classes this Fall. I could just ask my questions in class, but my
curiousity is killing me, and it would be nice if I could get it
resolved as soon as possible.

You are never too old to be a student! What sort of courses are you
doing that raise questions like these?

My background is actually in Computer Science; I got a BS degree in
1987 and an MS degree in 1994, both at the University of Washington.
At this late date my interests are turning to Electrical Engineering;
hence my attempt to take classes at UVU.

If you have a background in computer science, then surely you know
already that counting cycles like that is completely meaningless in most
situations?

\"And you are posting to a group that is almost totally and utterly
unrelated to the question.\"

Can you tell me which group I should be posting this to?

Have you any experience with Usenet? You are using google\'s interface -
that\'s okay for searching archives, but an extremely poor interface for
posting or reading regularly. Amongst other things, it fails to follow
or encourage Usenet standards for posting, quoting and attributing. And
these days it suffers more from google\'s idiotic \"throw out the baby
with the bathwater\" attitude to censorship - they let people use their
own interface to post spam, conspiracy theories, abuse, etc., and then
they block the entire group because it has been spreading such posts.
(I\'m not anti-google at all - merely anti-stupid.)

If you want to use Usenet, get an account at a proper newserver
(news.eternal-september.org is easy and free), and a proper newsreader
(Thunderbird is fine if you don\'t have any other preferences).

comp.arch.fpga is about programmable logic. Yes, some people here might
make cpu designs, but it is not about software algorithms.

comp.lang.c++ is about C++ (I\'m guessing that\'s what you are using here,
but the snippets are not long enough to be sure). But that\'s more about
the language than algorithms.

comp.programming is a general programming group. It is dominated by a
couple of unpleasant characters who make huge numbers of nonsense posts
- so you need to filter them out. The others in the group don\'t start
threads very often, but there are plenty who are ready to contribute to
interesting threads started by others. It is worth a try.

But it might well be that there are other places better suited to
answering your questions these days. I like the Usenet format, and
think it is far more efficient than web-based forums. But the fact is
that there are orders of magnitude more people using things like Stack
Exchange than there are on Usenet.

\"Filling or handling an array of data scales as O in the size of
the data. Sorting, for the most part, scales as O(n log n).\"

That\'s true for single threaded sorters. It turns out that massively
parallel quicksort can get it down to O( (log n)^2). But quicksort
still requires the array to be totally filled before the algorithm
starts, doesn\'t it? So it seems the amount of time it takes to enter
the elements into the array becomes relevant.

There are a wide variety of sorting algorithms, with different types of
parallelism and different trade-offs, limitations and strengths. And
this is combined with an even wider variety of misconceptions,
misunderstandings, and a failure to distinguish theoretical calculations
from practical reality.

A parallel sorting algorithm run on N processors cannot achieve greater
speedup than a factor of N. Theoretical speeds of O(logÂ² n) and the
like are upper bounds when there is no limit to the number of
processors. /Sometimes/ you have a sorting task where you can use
massive arrays of simple elements with the aim of minimal latency or
maximum throughput when sorting large numbers of small arrays - but that
is going to be very rare.

In reality, sorting comes down to the speed of the data I/O - the
memory, and the caches. Algorithms that are cache-friendly beat those
that are not, even if they look a little slower in the theoretical
calculations. And they are /always/ slower than O

, even when you
have data suitable for a bucket sort, at least when you have sizes big
enough for the efficiency to be important.

David Brown · Nov 21, 2021

On 21/06/2021 23:07, Kevin Simonson wrote:

David Brown: \"Can I assume, due to your \'.edu\' address, that you are
a student?\"

Close. I\'ve been admitted into the Senior Citizen Audit Program at
Utah Valley University. I turn 62 in September, so I will start
classes this Fall. I could just ask my questions in class, but my
curiousity is killing me, and it would be nice if I could get it
resolved as soon as possible.

You are never too old to be a student! What sort of courses are you
doing that raise questions like these?

My background is actually in Computer Science; I got a BS degree in
1987 and an MS degree in 1994, both at the University of Washington.
At this late date my interests are turning to Electrical Engineering;
hence my attempt to take classes at UVU.

If you have a background in computer science, then surely you know
already that counting cycles like that is completely meaningless in most
situations?

\"And you are posting to a group that is almost totally and utterly
unrelated to the question.\"

Can you tell me which group I should be posting this to?

Have you any experience with Usenet? You are using google\'s interface -
that\'s okay for searching archives, but an extremely poor interface for
posting or reading regularly. Amongst other things, it fails to follow
or encourage Usenet standards for posting, quoting and attributing. And
these days it suffers more from google\'s idiotic \"throw out the baby
with the bathwater\" attitude to censorship - they let people use their
own interface to post spam, conspiracy theories, abuse, etc., and then
they block the entire group because it has been spreading such posts.
(I\'m not anti-google at all - merely anti-stupid.)

If you want to use Usenet, get an account at a proper newserver
(news.eternal-september.org is easy and free), and a proper newsreader
(Thunderbird is fine if you don\'t have any other preferences).

comp.arch.fpga is about programmable logic. Yes, some people here might
make cpu designs, but it is not about software algorithms.

comp.lang.c++ is about C++ (I\'m guessing that\'s what you are using here,
but the snippets are not long enough to be sure). But that\'s more about
the language than algorithms.

comp.programming is a general programming group. It is dominated by a
couple of unpleasant characters who make huge numbers of nonsense posts
- so you need to filter them out. The others in the group don\'t start
threads very often, but there are plenty who are ready to contribute to
interesting threads started by others. It is worth a try.

But it might well be that there are other places better suited to
answering your questions these days. I like the Usenet format, and
think it is far more efficient than web-based forums. But the fact is
that there are orders of magnitude more people using things like Stack
Exchange than there are on Usenet.

\"Filling or handling an array of data scales as O in the size of
the data. Sorting, for the most part, scales as O(n log n).\"

That\'s true for single threaded sorters. It turns out that massively
parallel quicksort can get it down to O( (log n)^2). But quicksort
still requires the array to be totally filled before the algorithm
starts, doesn\'t it? So it seems the amount of time it takes to enter
the elements into the array becomes relevant.

There are a wide variety of sorting algorithms, with different types of
parallelism and different trade-offs, limitations and strengths. And
this is combined with an even wider variety of misconceptions,
misunderstandings, and a failure to distinguish theoretical calculations
from practical reality.

A parallel sorting algorithm run on N processors cannot achieve greater
speedup than a factor of N. Theoretical speeds of O(logÂ² n) and the
like are upper bounds when there is no limit to the number of
processors. /Sometimes/ you have a sorting task where you can use
massive arrays of simple elements with the aim of minimal latency or
maximum throughput when sorting large numbers of small arrays - but that
is going to be very rare.

In reality, sorting comes down to the speed of the data I/O - the
memory, and the caches. Algorithms that are cache-friendly beat those
that are not, even if they look a little slower in the theoretical
calculations. And they are /always/ slower than O

, even when you
have data suitable for a bucket sort, at least when you have sizes big
enough for the efficiency to be important.

David Brown · Nov 21, 2021

On 21/06/2021 23:07, Kevin Simonson wrote:

David Brown: \"Can I assume, due to your \'.edu\' address, that you are
a student?\"

Close. I\'ve been admitted into the Senior Citizen Audit Program at
Utah Valley University. I turn 62 in September, so I will start
classes this Fall. I could just ask my questions in class, but my
curiousity is killing me, and it would be nice if I could get it
resolved as soon as possible.

You are never too old to be a student! What sort of courses are you
doing that raise questions like these?

My background is actually in Computer Science; I got a BS degree in
1987 and an MS degree in 1994, both at the University of Washington.
At this late date my interests are turning to Electrical Engineering;
hence my attempt to take classes at UVU.

If you have a background in computer science, then surely you know
already that counting cycles like that is completely meaningless in most
situations?

\"And you are posting to a group that is almost totally and utterly
unrelated to the question.\"

Can you tell me which group I should be posting this to?

Have you any experience with Usenet? You are using google\'s interface -
that\'s okay for searching archives, but an extremely poor interface for
posting or reading regularly. Amongst other things, it fails to follow
or encourage Usenet standards for posting, quoting and attributing. And
these days it suffers more from google\'s idiotic \"throw out the baby
with the bathwater\" attitude to censorship - they let people use their
own interface to post spam, conspiracy theories, abuse, etc., and then
they block the entire group because it has been spreading such posts.
(I\'m not anti-google at all - merely anti-stupid.)

If you want to use Usenet, get an account at a proper newserver
(news.eternal-september.org is easy and free), and a proper newsreader
(Thunderbird is fine if you don\'t have any other preferences).

comp.arch.fpga is about programmable logic. Yes, some people here might
make cpu designs, but it is not about software algorithms.

comp.lang.c++ is about C++ (I\'m guessing that\'s what you are using here,
but the snippets are not long enough to be sure). But that\'s more about
the language than algorithms.

comp.programming is a general programming group. It is dominated by a
couple of unpleasant characters who make huge numbers of nonsense posts
- so you need to filter them out. The others in the group don\'t start
threads very often, but there are plenty who are ready to contribute to
interesting threads started by others. It is worth a try.

But it might well be that there are other places better suited to
answering your questions these days. I like the Usenet format, and
think it is far more efficient than web-based forums. But the fact is
that there are orders of magnitude more people using things like Stack
Exchange than there are on Usenet.

\"Filling or handling an array of data scales as O in the size of
the data. Sorting, for the most part, scales as O(n log n).\"

That\'s true for single threaded sorters. It turns out that massively
parallel quicksort can get it down to O( (log n)^2). But quicksort
still requires the array to be totally filled before the algorithm
starts, doesn\'t it? So it seems the amount of time it takes to enter
the elements into the array becomes relevant.

There are a wide variety of sorting algorithms, with different types of
parallelism and different trade-offs, limitations and strengths. And
this is combined with an even wider variety of misconceptions,
misunderstandings, and a failure to distinguish theoretical calculations
from practical reality.

A parallel sorting algorithm run on N processors cannot achieve greater
speedup than a factor of N. Theoretical speeds of O(logÂ² n) and the
like are upper bounds when there is no limit to the number of
processors. /Sometimes/ you have a sorting task where you can use
massive arrays of simple elements with the aim of minimal latency or
maximum throughput when sorting large numbers of small arrays - but that
is going to be very rare.

In reality, sorting comes down to the speed of the data I/O - the
memory, and the caches. Algorithms that are cache-friendly beat those
that are not, even if they look a little slower in the theoretical
calculations. And they are /always/ slower than O

, even when you
have data suitable for a bucket sort, at least when you have sizes big
enough for the efficiency to be important.

gnuarm.del...@gmail.com · Nov 21, 2021