Is this Intel i7 machine good for LTSpice?

On Tue, 11 Nov 2014 20:21:22 -0500, rickman <gnuarm@gmail.com> Gave us:

On 11/8/2014 2:14 AM, miso wrote:

The Xeon product line is all about stability. No overclocking. They use ECC,
which some say is slower. [I don't know.] If you are seriously going to do a
ram disk (dumb idea), you would want the ECC. For software RAID, you should
have ECC. I give Dell credit for at least using a Supermicro mobo, since
some of the Asus mobos don't use ECC correctly.

The bad news is RAM prices are up for some reason.

ECC *has* to be slower. It involves calculating check bits from the
word being stored and saving them. Then on the read all the bits are
calculated to see if there is an error and to correct it. That takes
some time on both the write and the read. It may not be a lot, but it
takes time.

It is hardware-calculated as the array is filled. It takes no additional
time, and no code is involved at the OS level. The access speed for
THAT RAM is exactly that, all overhead already considered. What you
can and/or do run it at is what it runs at.

There is no, "this non-ECC such and such MHz RAM is faster than this
same speed rated ECC RAM because it has ECC delays".

Nope. Yer makin' shit up... again.
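
For readers keeping score: the "check bits" being argued about are plain parity sums. Here is a minimal sketch in C of a Hamming(12,8) single-error corrector over one byte - illustrative only, with invented function names (real DIMMs protect 64-bit words with 8 check bits), but the arithmetic is the same XOR parity that hardware evaluates as a parallel gate network:

#include <stdio.h>
#include <stdint.h>

/* Illustrative Hamming(12,8): positions 1..12 hold the codeword;
   positions 1, 2, 4 and 8 are check bits, the other eight are data.
   Check bit p is the parity of every position whose index has bit p
   set - in hardware, one XOR tree per check bit. */

static int group_parity(uint16_t code, int p)
{
    int parity = 0;
    for (int pos = 1; pos <= 12; pos++)
        if ((pos & p) && (code & (1u << pos)))
            parity ^= 1;
    return parity;
}

static uint16_t hamming_encode(uint8_t data)
{
    uint16_t code = 0;
    int d = 0;
    for (int pos = 1; pos <= 12; pos++) {
        if (pos == 1 || pos == 2 || pos == 4 || pos == 8)
            continue;                     /* reserved for check bits */
        if (data & (1u << d))
            code |= 1u << pos;
        d++;
    }
    for (int p = 1; p <= 8; p <<= 1)      /* compute the four check bits */
        if (group_parity(code, p))
            code |= 1u << p;
    return code;
}

static uint8_t hamming_decode(uint16_t code)
{
    int syndrome = 0;                     /* recompute parities on read */
    for (int p = 1; p <= 8; p <<= 1)
        if (group_parity(code, p))
            syndrome |= p;
    if (syndrome)                         /* nonzero syndrome = position */
        code ^= 1u << syndrome;           /* of the flipped bit: fix it  */
    uint8_t data = 0;
    int d = 0;
    for (int pos = 1; pos <= 12; pos++) {
        if (pos == 1 || pos == 2 || pos == 4 || pos == 8)
            continue;
        if (code & (1u << pos))
            data |= 1u << d;
        d++;
    }
    return data;
}

int main(void)
{
    uint16_t word = hamming_encode(0xA7);
    word ^= 1u << 5;                      /* inject a single-bit error */
    printf("recovered 0x%02X\n", (unsigned)hamming_decode(word)); /* 0xA7 */
    return 0;
}

The one thing both sides can agree on: nothing here needs a sequential algorithm. It is a fixed XOR network a few gate levels deep, and whether that delay is visible depends on where the controller hides it.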
 
On 2014-11-12, DecadentLinuxUserNumeroUno <DLU1@DecadentLinuxUser.org> wrote:
On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.

You mean the 386 days. The 486 had it integrated in.

No, only some of them did.

--
umop apisdn
 
On 12/11/2014 10:06, Jasen Betts wrote:
On 2014-11-12, DecadentLinuxUserNumeroUno <DLU1@DecadentLinuxUser.org> wrote:
On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.

That isn't true though. Most software that was aiming for any kind of
serious performance and needed floating point had two versions: one
linked with an emulator and another linked with inline code.

You mean the 386 days. The 486 had it integrated in.

No, only some of them did.

Disabled (possibly NBG) in the SX models.

Full-spec i486s mostly had the FPU on chip as standard.

--
Regards,
Martin Brown
 
Den onsdag den 12. november 2014 18.01.49 UTC+1 skrev DecadentLinuxUserNumeroUno:
On 12 Nov 2014 10:06:12 GMT, Jasen Betts <jasen@xnet.co.nz> Gave us:

On 2014-11-12, DecadentLinuxUserNumeroUno <DLU1@DecadentLinuxUser.org> wrote:
On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.

You mean the 386 days. The 486 had it integrated in.

No, only some of them did.

No. They ALL did. They were ALL made as "DXs" and those that failed
production testing ended up becoming a "new line" (the SX) if that was
the only thing not working. It was a way to recover die losses, which
were high at the time.

AFAIU they started that way, but eventually they made a new die without the FPU


-Lasse
 
On 12 Nov 2014 10:06:12 GMT, Jasen Betts <jasen@xnet.co.nz> Gave us:

On 2014-11-12, DecadentLinuxUserNumeroUno <DLU1@DecadentLinuxUser.org> wrote:
On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.

You mean the 386 days. The 486 had it integrated in.

No, only some of them did.

No. They ALL did. They were ALL made as "DXs" and those that failed
production testing ended up becoming a "new line" (the SX) if that was
the only thing not working. It was a way to recover die losses, which
were high at the time.
 
In article <8al56a92sunoifnj2eq6rvqblsoqg8sdgh@4ax.com>, DLU1
@DecadentLinuxUser.org says...
On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.


You mean the 386 days. The 486 had it integrated in.

Software authors were most definitely writing to use it.
Titles out at the time lacked support for it, but that did not last long at all.

only the DX versions... you seem to have forgotten the SXes


Jamie
 
On Thursday, 13 November 2014 14:46:00 UTC+11, rickman wrote:
On 11/11/2014 10:46 PM, Bill Sloman wrote:
On Wednesday, 12 November 2014 12:22:00 UTC+11, rickman wrote:
On 11/8/2014 2:14 AM, miso wrote:

<snip>

ECC *has* to be slower. It involves calculating check bits from the
word being stored and saving them. Then on the read all the bits are
calculated to see if there is an error and to correct it. That takes
some time on both the write and the read. It may not be a lot, but it
takes time.

It doesn't have to be significantly slower. The processes of creating the check bits, and of using them to calculate a corrected output can in principle be handled by look-up tables - which get a bit big - and in practice are handled by logic networks which are almost as fast.

Lol, I find it amusing that you think a lookup table is faster than
logic. A lookup table is a bunch of logic for doing the operation with
a fixed pattern of bits. Doing the same operation in discrete logic is
almost certainly faster and almost certainly much smaller.

Discrete logic is almost as old-fashioned as hydraulic logic.

In practice, either solution is going to be realised in programmable logic, and the look-up table is the version that uses the most gates to get the lowest propagation delay, and "logic" is the approach that trades off fewer gates against longer propagation paths that make more choices.

For special cases, well-realised logic can be as fast as a look-up table - faster if you can exploit the reduced number of gates to use bigger, faster gates to realise the relevant logic paths.

I don't know how you define "significantly slower". So I can't argue
that point.

It's a weasel phrase, introduced to set up exactly that response.

I did some searching and found a review of several benchmarks on memory
performance that found a small difference. The Wikipedia page says
there is a speed penalty because of the additional logic, but that most
modern ECC has the controller in the CPU chip which at best hides the
delay. So depending on how you define "significantly slower", you can
say you are right.


The process of getting stuff out of memory is slower, because memory cells have to be tiny, so the electric charge involved is equally tiny.

The last time I looked - which was a very long time ago - the costs were in extra components, extra board area, extra pins, extra bus tracks and extra bus drivers. Extra propagation delay didn't really come into it.

I guess the last time I checked was longer ago.

I was looking at setting up a gigaword or so of random access memory to hold the lithographic data for a single layer of an integrated circuit.

It was to be accessed by a variable-aperture electron beam microfabricator, writing flashes of electrons onto the electron-beam-sensitive resist on a moving silicon wafer. The machine never got built, but we spent almost four million UK pounds on the project around 1985 and 1986. Cambridge Instruments had agreed to commercialise what Thomson-CSF had sold us as a prototype machine, which turned out to be a proof-of-principle machine, which turned out to have to be redesigned in every detail. When we were sure that it was going to cost us almost as much to finish the project (which we could have afforded), and that it was going to tie up every engineer and programmer in the place for the next eighteen months (which we couldn't afford), we spent upwards of three million pounds buying our way out of our promises to finish the project.

The data buffer design study did get published

http://www.sciencedirect.com/science/article/pii/0167931787900293

J.P.Melot was a collaborator from the University of Bristol, with CERN experience, and totally brilliant, and Mike Penberth was my boss, who knew about stuff like hydraulic logic. He wasn't a particularly creative engineer, but he was utterly brilliant at working out why things weren't working, or not working right.

I worked on an array
processor with ECC memory controller in a separate chip. The ECC
happened in its own clock cycle and so did not affect the clock
frequency, but that added latency to the memory access. However this
was a micro programmed machine so the algorithm anticipated all the
various delays to get the right data in the right place at the right
time. The ECC delay was incorporated into the operation of the machine.

That could be longer ago, but I doubt it. There were micro-programmed machines around in the mid-1980's but the people who used them tended to be very specialised number crunchers.

--
Bill Sloman, Sydney
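
The table-versus-logic trade in the exchange above, reduced to a toy in C: the same one-bit parity function written as an XOR fold (the "logic network") and as a precomputed table (the "look-up table"). The names are invented for illustration; in an FPGA both forms end up in LUT primitives anyway, which is part of the point:

#include <stdio.h>
#include <stdint.h>

/* Parity as a "logic network": three XOR folds, the software analogue
   of an XOR tree three gate levels deep. */
static int parity_logic(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1;
}

/* The same function as a look-up table: one array access, no
   computation, at the cost of 256 bytes of storage. */
static uint8_t parity_table[256];

static void parity_table_init(void)
{
    for (int i = 0; i < 256; i++)
        parity_table[i] = (uint8_t)parity_logic((uint8_t)i);
}

static int parity_lut(uint8_t b)
{
    return parity_table[b];
}

int main(void)
{
    parity_table_init();
    printf("%d %d\n", parity_logic(0xB6), parity_lut(0xB6));  /* 1 1 */
    return 0;
}

The table only wins while it stays small: widen the input and it grows exponentially (a table over a 72-bit codeword is out of the question), while the XOR tree's depth grows only logarithmically, which is why real ECC encoders are XOR networks.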
 
On Wed, 12 Nov 2014 17:49:12 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

In article <8al56a92sunoifnj2eq6rvqblsoqg8sdgh@4ax.com>, DLU1
@DecadentLinuxUser.org says...

On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.


You mean the 386 days. The 486 had it integrated in.

Software authors were most definitely writing to use it.
Titles out at the time lacked support for it, but that did not last long at all.

only the DX versions... you seem to have forgotten the SXes


Jamie

The SX started out as FAILED DXs.

Very late in life, they actually fabbed them deliberately.

Die-tested parts with failed FPUs but otherwise fully working x86 units
are what started it, and there was no reason for dedicated artwork,
because the failures were now being used, and THAT was a direct profit
increase, and loss reduction.

YOU failed to read.
 
In article <tiq76a5219chvrg4qfem8tavpasj08t06a@4ax.com>, DLU1
@DecadentLinuxUser.org says...
On Wed, 12 Nov 2014 17:49:12 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

In article <8al56a92sunoifnj2eq6rvqblsoqg8sdgh@4ax.com>, DLU1
@DecadentLinuxUser.org says...

On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.


You mean the 386 days. The 486 had it integrated in.

Software authors were most definitely writing to use it.
Titles out at the time lacked, but that did not last long at all.

only the DX versions... you seem to have forgotten the SXes


Jamie

The SX started out as FAILED DXs.

Very late in life, they actually fabbed them deliberately.

Die-tested parts with failed FPUs but otherwise fully working x86 units
are what started it, and there was no reason for dedicated artwork,
because the failures were now being used, and THAT was a direct profit
increase, and loss reduction.

YOU failed to read.

No, you failed at trying to twist it around in your favor.

Not only that, you are simply now commenting from what others have
already said, before you.

It does not matter if they were failed or intentional. The fact remains
that a large amount of software did not force the use of an FPU; some of
it didn't even attempt to detour in software if there was one present.

Years ago I wrote a sat tracking program that optionally would switch
to the FPU if one was present. There was a speed-up, but it wasn't what
I called worth a fistful of money to get a CPU or add-on FPU for it.

Jamie
 
On Wed, 12 Nov 2014 19:29:41 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> Gave us:

>No, you failed at trying to twist it around in your favor.

No. The first 486 fabs ALL had it, and the off-the-die failure rates
were so high that they were talking about what sort of loss that
represented. Then the engineering folks noted that the main body of
failures was just the FPU core, and that the units could be released as
a non-FPU device, because the rest all worked.

A huge saving was realized. They eventually DID make an actual SX fab
line, but that was NOT the initial plan.

The initial plan was the selling boost that an integrated FPU core
meant to everyone, even the real-estate-frugal MOBO makers. They were
never going to look back.

I failed at nothing. Go READ it. It is the same as what I say here.
So what did I twist?
 
On Thu, 13 Nov 2014 09:19:27 +0200, upsidedown@downunder.com Gave us:

On Wed, 12 Nov 2014 20:10:07 -0800, josephkk
<joseph_barrett@sbcglobal.net> wrote:

On Wed, 12 Nov 2014 08:30:43 +0200, upsidedown@downunder.com wrote:


The actual data bits can be stored as they arrive; calculating the
check bits takes some time, but they can be written into their memory
cells slightly later. With multiple write cycles in succession, storing
the check bits from the previous write can overlap with writing the
actual data bits of the next write.

Doing a partial memory word update, e.g. writing only a single byte
into a 64 bit (8 byte) memory word, is costly, since first you have to
read the 7 unmodified bytes, combine the new byte, calculate the ECC
for 8 bytes and write 8 bytes+ECC, or at least the new byte+full ECC.
With cache between the processor and main memory, memory writes should
use the full memory words, so this should not be an issue today.

Then on the read all the bits are
calculated to see if there is an error and to correct it.

These days the read returns correct results in a huge majority of
cases, so why not just send out the speculative data and after ECC
check declare it valid by a separate signal from memory to CPU.
However, if the ECC check fails, the memory needs to calculate the
correct data and indicate that the data word is now valid. Then the
memory must calculate the new correct data+ECC and store it into that
memory cell to deal with soft errors (i.e. flush the memory cell). Of
course, if there is a hard error, this does not help, since the
correction must be repeated on each read access to that cell, slowing
it up considerably.

You need to study up on Amdahl's law. It relates the frequency of any
event in the instruction and data sequences to the amount of speed impact
it has.

?-)


One should also remember that magnetic core as well as dynamic RAM
perform a destructive readout, so you have to perform a writeback
after each read cycle. For core, you only have to do that for the
actual read location (at the X and Y wire crossing), for dynamic RAM,
you have to write back the whole column (hundreds or thousands of
bits). For this reason, real access time (not using CAS multiplexing)
is much shorter than the full cycle time.

Putting the ECC logic into the writeback loop doesn't slow down the
_cycle_ time, as long as the ECC writeback is phase shifted from the
main data write back.

Of course, this does require that the ECC logic is on the same memory
chip, using ECC memory bits and logic on separate chips doesn't work.

For high radiation environment (if it makes sense to use DRAMs at
all), I would put the ECC into the writeback loop so that the memory
is flushed (ECC corrected) at every refresh as well as every read
access to a column. This will quickly catch single-bit errors, which
are correctable, before they grow into a multibit non-correctable error.

It sounds like running ECC on each column string might be better
than byte, word, or actual string correction would be. And it would
achieve what you said about catching single-bit errors before they
become monsters in the datagrams.
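
A sketch in C of the read-modify-write sequence upsidedown describes for partial-word writes. The ecc8() stand-in here is just a parity byte (detection only, invented for illustration), not a real SECDED code; the point is the traffic pattern, not the code itself:

#include <stdint.h>
#include <stdio.h>

struct ecc_word {            /* one 72-bit memory word: 64 data + 8 check */
    uint64_t data;
    uint8_t  check;
};

/* Stand-in check-byte generator: XOR of the eight data bytes. A real
   controller would use a SECDED matrix here. */
static uint8_t ecc8(uint64_t w)
{
    uint8_t c = 0;
    for (int i = 0; i < 8; i++)
        c ^= (uint8_t)(w >> (8 * i));
    return c;
}

/* Writing one byte into an ECC-protected word forces a full
   read-modify-write: the seven unmodified bytes are needed to
   recompute the check bits. */
static void write_byte(struct ecc_word *mem, unsigned idx, uint8_t value)
{
    uint64_t w = mem->data;                        /* 1. read all 8 bytes  */
    w &= ~((uint64_t)0xFF << (8 * idx));           /* 2. merge the new one */
    w |= (uint64_t)value << (8 * idx);
    mem->data  = w;                                /* 3. write data back   */
    mem->check = ecc8(w);                          /* 4. plus fresh check  */
}

int main(void)
{
    struct ecc_word m = { 0x1122334455667788ull, 0 };
    m.check = ecc8(m.data);
    write_byte(&m, 3, 0xAB);                       /* one-byte update */
    printf("%016llx %02x\n",
           (unsigned long long)m.data, (unsigned)m.check);
    return 0;
}

With a write-back cache in front of memory, evictions are whole cache lines, so steps 1 and 2 happen in the cache and the DRAM only ever sees full-width writes - upsidedown's closing point.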
 
On 11/12/2014 11:17 PM, Bill Sloman wrote:
On Thursday, 13 November 2014 14:46:00 UTC+11, rickman wrote:
On 11/11/2014 10:46 PM, Bill Sloman wrote:
On Wednesday, 12 November 2014 12:22:00 UTC+11, rickman wrote:
On 11/8/2014 2:14 AM, miso wrote:

snip

ECC *has* to be slower. It involves calculating check bits from the
word being stored and saving them. Then on the read all the bits are
calculated to see if there is an error and to correct it. That takes
some time on both the write and the read. It may not be a lot, but it
takes time.

It doesn't have to be significantly slower. The processes of creating the check bits, and of using them to calculate a corrected output can in principle be handled by look-up tables - which get a bit big - and in practice are handled by logic networks which are almost as fast.

Lol, I find it amusing that you think a lookup table is faster than
logic. A lookup table is a bunch of logic for doing the operation with
a fixed pattern of bits. Doing the same operation in discrete logic is
almost certainly faster and almost certainly much smaller.

Discrete logic is almost as old-fashioned as hydraulic logic.

In practice, either solution is going to be realised in programmable logic, and the look-up table is the version that uses the most gates to get the lowest propagation delay, and "logic" is the approach that trades off fewer gates against longer propagation paths that make more choices.

You don't seem to understand: in logic, more does not equal faster.
I can assure you that more logic is slower than less.

I'm not sure why you are bringing programmable logic into this. That is
a red herring.


I worked on an array
processor with ECC memory controller in a separate chip. The ECC
happened in its own clock cycle and so did not affect the clock
frequency, but that added latency to the memory access. However this
was a micro programmed machine so the algorithm anticipated all the
various delays to get the right data in the right place at the right
time. The ECC delay was incorporated into the operation of the machine.

That could be longer ago, but I doubt it. There were micro-programmed machines around in the mid-1980's but the people who used them tended to be very specialised number crunchers.

It was in the late 80's. Star Technologies was a spin-off from Floating
Point Systems. FPS decided the market wanted 64 bit floating point and
Star Tech was about speed at 32 bits. They provided a machine (two rack
cabinets) that did 100 MFLOPS... the second fastest floating point in
the world next to the Cray. This was before DSPs were terribly useful.
But it didn't last long. They pumped out a design that did 50 MFLOPS
in a single 9U rack which was incorporated into GE CAT scanners. They
nursed that design for continuing support for a long time. Ultimately
they folded without ever producing another viable design. The day of
the array processor was over.

--

Rick
 
On Wed, 12 Nov 2014 20:10:07 -0800, josephkk
<joseph_barrett@sbcglobal.net> Gave us:

On Wed, 12 Nov 2014 08:30:43 +0200, upsidedown@downunder.com wrote:


The actual data bits can be stored as they arrive; calculating the
check bits takes some time, but they can be written into their memory
cells slightly later. With multiple write cycles in succession, storing
the check bits from the previous write can overlap with writing the
actual data bits of the next write.

Doing a partial memory word update, e.g. writing only a single byte
into a 64 bit (8 byte) memory word, is costly, since first you have to
read the 7 unmodified bytes, combine the new byte, calculate the ECC
for 8 bytes and write 8 bytes+ECC, or at least the new byte+full ECC.
With cache between the processor and main memory, memory writes should
use the full memory words, so this should not be an issue today.

Then on the read all the bits are
calculated to see if there is an error and to correct it.

These days the read returns correct results in a huge majority of
cases, so why not just send out the speculative data and after ECC
check declare it valid by a separate signal from memory to CPU.
However, if the ECC check fails, the memory needs to calculate the
correct data and indicate that the data word is now valid. Then the
memory must calculate the new correct data+ECC and store it into that
memory cell to deal with soft errors (i.e. flush the memory cell). Of
course, if there is a hard error, this does not help, since the
correction must be repeated on each read access to that cell, slowing
it up considerably.

You need to study up on Amdahl's law. It relates the frequency of any
event in the instruction and data sequences to the amount of speed impact
it has.

?-)

The memory is tagged with the speed it runs at.

Two sets of two sticks.

One pair with and one without ECC.

The MOBO requires an ECC setting for the RAM, so the check bits are
generated FOR the stick as part of the chipset's hand-off methods for
the memory bus.

The timing tag declarations on both sticks are identical.

Both then run at identical speeds, and all this imaginary overhead you
all are quacking about is already taken into account and managed OUTSIDE
the speed at which the RAM sticks are operated and touted as able to
run.

The math is REAL simple. The MOBO pings each at the same rate. They
run at the same rate.

ZERO difference in a running machine because BOTH are accessed at
identical speeds and will benchmark that way too. Down at the nitty
gritty level, there is more taking place, but it does so WITHIN the
timing constraints of the declared access rate for the array.

Slower because more is being done? Maybe... down there at the nitty
gritty level. Nothing we see though.

OC them till they start failing, and you might see the ECC fail more
often trying to keep up. That would be the test right there. Spool up
the clock and watch the errors and error corrections start wading in.
Then talk about specific causation.
 
On Wed, 12 Nov 2014 20:10:07 -0800, josephkk
<joseph_barrett@sbcglobal.net> wrote:

On Wed, 12 Nov 2014 08:30:43 +0200, upsidedown@downunder.com wrote:


The actual data bits can be stored as they arrive; calculating the
check bits takes some time, but they can be written into their memory
cells slightly later. With multiple write cycles in succession, storing
the check bits from the previous write can overlap with writing the
actual data bits of the next write.

Doing a partial memory word update, e.g. writing only a single byte
into a 64 bit (8 byte) memory word, is costly, since first you have to
read the 7 unmodified bytes, combine the new byte, calculate the ECC
for 8 bytes and write 8 bytes+ECC, or at least the new byte+full ECC.
With cache between the processor and main memory, memory writes should
use the full memory words, so this should not be an issue today.

Then on the read all the bits are
calculated to see if there is an error and to correct it.

These days the read returns correct results in a huge majority of
cases, so why not just send out the speculative data and after ECC
check declare it valid by a separate signal from memory to CPU.
However, if the ECC check fails, the memory needs to calculate the
correct data and indicate that the data word is now valid. Then the
memory must calculate the new correct data+ECC and store it into that
memory cell to deal with soft errors (i.e. flush the memory cell). Of
course, if there is a hard error, this does not help, since the
correction must be repeated on each read access to that cell, slowing
it up considerably.

You need to study up on Amdahl's law. It relates the frequency of any
event in the instruction and data sequences to the amount of speed impact
it has.

?-)

One should also remember that magnetic core as well as dynamic RAM
perform a destructive readout, so you have to perform a writeback
after each read cycle. For core, you only have to do that for the
actual read location (at the X and Y wire crossing), for dynamic RAM,
you have to write back the whole column (hundreds or thousands of
bits). For this reason, real access time (not using CAS multiplexing)
is much shorter than the full cycle time.

Putting the ECC logic into the writeback loop doesn't slow down the
_cycle_ time, as long as the ECC writeback is phase shifted from the
main data write back.

Of course, this does require that the ECC logic is on the same memory
chip, using ECC memory bits and logic on separate chips doesn't work.

For high radiation environment (if it makes sense to use DRAMs at
all), I would put the ECC into the writeback loop so that the memory
is flushed (ECC corrected) at every refresh as well as every read
access to a column. This will quickly catch single-bit errors, which
are correctable, before they grow into a multibit non-correctable error.
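
What upsidedown calls flushing the cell at every refresh is usually called scrubbing. A toy model in C of one scrub pass, using triple redundancy with majority vote as a stand-in corrector (far cruder than SECDED, and invented purely for illustration); the loop structure is the point: every pass writes corrected data back, so single upsets never get the chance to pair up into uncorrectable double errors:

#include <stdint.h>
#include <stdio.h>

#define WORDS 4

static uint8_t mem[WORDS][3];           /* three copies of each byte */

/* One scrub pass: read, correct by bitwise majority vote, write the
   corrected value back to all copies. */
static void scrub(void)
{
    for (int i = 0; i < WORDS; i++) {
        uint8_t a = mem[i][0], b = mem[i][1], c = mem[i][2];
        uint8_t voted = (a & b) | (a & c) | (b & c);
        mem[i][0] = mem[i][1] = mem[i][2] = voted;
    }
}

int main(void)
{
    for (int i = 0; i < WORDS; i++)
        mem[i][0] = mem[i][1] = mem[i][2] = (uint8_t)(0x10 * i);
    mem[2][1] ^= 0x08;                  /* a soft error in one copy */
    scrub();                            /* repaired before a second hit */
    printf("word 2 = 0x%02X\n", (unsigned)mem[2][0]);   /* 0x20 again */
    return 0;
}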
 
On Wed, 12 Nov 2014 08:30:43 +0200, upsidedown@downunder.com wrote:

The actual data bits can be stored as they arrive; calculating the
check bits takes some time, but they can be written into their memory
cells slightly later. With multiple write cycles in succession, storing
the check bits from the previous write can overlap with writing the
actual data bits of the next write.

Doing a partial memory word update, e.g. writing only a single byte
into a 64 bit (8 byte) memory word, is costly, since first you have to
read the 7 unmodified bytes, combine the new byte, calculate the ECC
for 8 bytes and write 8 bytes+ECC, or at least the new byte+full ECC.
With cache between the processor and main memory, memory writes should
use the full memory words, so this should not be an issue today.

Then on the read all the bits are
calculated to see if there is an error and to correct it.

These days the read returns correct results in a huge majority of
cases, so why not just send out the speculative data and after ECC
check declare it valid by a separate signal from memory to CPU.
However, if the ECC check fails, the memory needs to calculate the
correct data and indicate that the data word is now valid. Then the
memory must calculate the new correct data+ECC and store it into that
memory cell to deal with soft errors (i.e. flush the memory cell). Of
course, if there is a hard error, this does not help, since the
correction must be repeated on each read access to that cell, slowing
it up considerably.

You need to study up on Amdahl's law. It relates the frequency of any
event in the instruction and data sequences to the amount of speed impact
it has.

?-)
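
josephkk's Amdahl's-law point, made concrete with invented numbers: if a fraction f of accesses takes s times as long, the average slowdown is (1 - f) + f*s, which for rare correction events is indistinguishable from 1:

#include <stdio.h>

int main(void)
{
    double f = 1e-6;   /* assumed fraction of reads needing correction */
    double s = 3.0;    /* assumed cost multiple of the correction path */
    double slowdown = (1.0 - f) + f * s;
    printf("average slowdown: %.8f x\n", slowdown);   /* 1.00000200 x */
    return 0;
}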
 
On Wed, 12 Nov 2014 19:29:41 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> wrote:

It does not matter if they were failed or intentional. The fact remains
that a large amount of software did not force the use of an FPU; some of
it didn't even attempt to detour in software if there was one present.

Years ago I wrote a sat tracking program that optionally would switch
to the FPU if one was present. There was a speed-up, but it wasn't what
I called worth a fistful of money to get a CPU or add-on FPU for it.

Jamie

The first program that I used that had a noticeable improvement with the
FPU was SPICE. There it made a huge difference. Similar applications had
the same kind of results.

?-)
 
On 11/11/2014 10:46 PM, Bill Sloman wrote:
On Wednesday, 12 November 2014 12:22:00 UTC+11, rickman wrote:
On 11/8/2014 2:14 AM, miso wrote:

The Xeon product line is all about stability. No overclocking. They use ECC,
which some say is slower. [I don't know.] If you are seriously going to do a
ram disk (dumb idea), you would want the ECC. For software RAID, you should
have ECC. I give Dell credit for at least using a Supermicro mobo, since
some of the Asus mobos don't use ECC correctly.

The bad news is RAM prices are up for some reason.

ECC *has* to be slower. It involves calculating check bits from the
word being stored and saving them. Then on the read all the bits are
calculated to see if there is an error and to correct it. That takes
some time on both the write and the read. It may not be a lot, but it
takes time.

It doesn't have to be significantly slower. The processes of creating the check bits, and of using them to calculate a corrected output can in principle be handled by look-up tables - which get a bit big - and in practice are handled by logic networks which are almost as fast.

Lol, I find it amusing that you think a lookup table is faster than
logic. A lookup table is a bunch of logic for doing the operation with
a fixed pattern of bits. Doing the same operation in discrete logic is
almost certainly faster and almost certainly much smaller.

I don't know how you define "significantly slower". So I can't argue
that point.

I did some searching and found a review of several benchmarks on memory
performance that found a small difference. The Wikipedia page says
there is a speed penalty because of the additional logic, but that most
modern ECC has the controller in the CPU chip which at best hides the
delay. So depending on how you define "significantly slower", you can
say you are right.


The process of getting stuff out of memory is slower, because memory cells have to be tiny, so the electric charge involved is equally tiny.

The last time I looked - which was a very long time ago - the costs were in extra components, extra board area, extra pins, extra bus tracks and extra bus drivers. Extra propagation delay didn't really come into it.

I guess the last time I checked was longer ago. I worked on an array
processor with ECC memory controller in a separate chip. The ECC
happened in its own clock cycle and so did not affect the clock
frequency, but that added latency to the memory access. However this
was a micro programmed machine so the algorithm anticipated all the
various delays to get the right data in the right place at the right
time. The ECC delay was incorporated into the operation of the machine.

--

Rick
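
rickman's latency-versus-clock-rate distinction, modelled as a toy two-stage pipeline in C (the stage names and the trivial pass-through "check" are invented): the ECC stage delays each word by one cycle, but a word still comes out every cycle, so the clock rate and the throughput are untouched:

#include <stdint.h>
#include <stdio.h>

#define N 8

int main(void)
{
    uint8_t ram[N] = {10, 11, 12, 13, 14, 15, 16, 17};
    uint8_t stage1 = 0;     /* word fetched last cycle, awaiting check */
    int     valid1 = 0;

    for (int cycle = 0; cycle <= N; cycle++) {
        /* stage 2: ECC check/correct on the previous cycle's fetch */
        if (valid1)
            printf("cycle %d: delivered %u (fetched cycle %d)\n",
                   cycle, (unsigned)stage1, cycle - 1);
        /* stage 1: fetch one word per cycle - full throughput */
        valid1 = (cycle < N);
        if (valid1)
            stage1 = ram[cycle];
    }
    return 0;
}

Run it and every cycle from 1 to N delivers a word, each exactly one cycle after its fetch: added latency, unchanged throughput, which is what a microcoded machine could schedule around.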
 
On Tue, 11 Nov 2014 21:41:53 -0500, "Maynard A. Philbrook Jr."
<jamie_ka1lpa@charter.net> wrote:

In article <m3ubfh$fkq$1@dont-email.me>, gnuarm@gmail.com says...
If I understood that article correctly, they are discussing the
relative performance of the SIMD (vector processing) instructions
between vendors.

Unless your application (e.g. LTSpice) has been compiled with a good
compiler to use those SIMD extension instructions, that article might
not be of much use.

At one time several years ago AMD indeed had better non-vectored FP
performance compared to Intel, but this situation changes with each
generation of chips.

The same is true for floating point performance. I recall when the 486
came out with built-in floating point acceleration, many benchmarks
didn't show an appreciable speed-up on apps like spreadsheets because
most of the program time was spent doing things other than the floating
point calculations. I believe most programs fit this category as well,
with the programs where enhanced floating point performance makes a
real difference being in the small minority.

What you believe is false, mostly that is.

The 386/486 DX parts had internal core clock speed doublers. Later even
higher multipliers. The 386SX had a 16-bit data bus width.

Wow. The way I remember things going is that the 486 added the MMU (and
caches) first. Bringing the FPU onboard came later, maybe with the
Pentiums.

You should really do your homework before posting here.

Back in the 486 days, the majority of software that was being used
didn't support the FPU because it didn't exist on many PCs.

Then you had the option in the math libs to have it detect the presence
of an FPU, and the lib would then call the FPU instructions. A lot of
DOS-based programs also did this trick.

This detouring didn't help with the speed performance, but it did show
a faster-operating program whenever there was floating point math
involved.

It was only later, when the Pentiums came along, that new software
and updates demanded that you have FPU support, because there
was no longer support for software FP, with the exception of some
custom forms of floating point that can only be supported in software.

Of course the first Pentiums had the problem of a bug in the FPU due
to some missing silicon. I wrote a command-line app just to test
for that bug and found many PCs that had those CPUs in them.

I remember the FDIV bug rather well.

I am sure you knew all of this, correct?

Yeah sure.

Jamie
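
The detect-once-and-detour trick Jamie describes, reduced to a skeleton in C. The probe here is a stub (and the mul_ routines are trivial stand-ins); the usual period technique was to execute FNINIT then FNSTSW and treat a zero status word as "coprocessor present", something an absent x87 wouldn't produce:

#include <stdio.h>

/* Stand-in for the hand-coded software floating-point routine. */
static double mul_soft(double a, double b) { return a * b; }

/* Stand-in for the version compiled to use x87 instructions. */
static double mul_fpu(double a, double b) { return a * b; }

/* Stubbed probe: 486-era code ran FNINIT/FNSTSW here and checked
   for a zero status word. Assumed present in this sketch. */
static int fpu_present(void) { return 1; }

static double (*fp_mul)(double, double);

int main(void)
{
    fp_mul = fpu_present() ? mul_fpu : mul_soft;   /* decide once */
    printf("%g\n", fp_mul(3.0, 7.0));              /* hot path: 21 */
    return 0;
}

The indirect call is the "detour" cost Jamie mentions: cheap per call, while the software path behind it could be an order of magnitude or more slower than real x87 code, which is why SPICE-class number crunchers felt the FPU and spreadsheets mostly didn't.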
 
On Thursday, 13 November 2014 16:43:49 UTC+11, rickman wrote:
On 11/12/2014 11:17 PM, Bill Sloman wrote:
On Thursday, 13 November 2014 14:46:00 UTC+11, rickman wrote:
On 11/11/2014 10:46 PM, Bill Sloman wrote:
On Wednesday, 12 November 2014 12:22:00 UTC+11, rickman wrote:
On 11/8/2014 2:14 AM, miso wrote:

snip

ECC *has* to be slower. It involves calculating check bits from the
word being stored and saving them. Then on the read all the bits are
calculated to see if there is an error and to correct it. That takes
some time on both the write and the read. It may not be a lot, but it
takes time.

It doesn't have to be significantly slower. The processes of creating the check bits, and of using them to calculate a corrected output can in principle be handled by look-up tables - which get a bit big - and in practice are handled by logic networks which are almost as fast.

Lol, I find it amusing that you think a lookup table is faster than
logic. A lookup table is a bunch of logic for doing the operation with
a fixed pattern of bits. Doing the same operation in discrete logic is
almost certainly faster and almost certainly much smaller.

Discrete logic is almost as old-fashioned as hydraulic logic.

In practice, either solution is going to be realised in programmable logic, and the look-up table is the version that uses the most gates to get the lowest propagation delay, and "logic" is the approach that trades off fewer gates against longer propagation paths that make more choices.

You don't seem to understand: in logic, more does not equal faster.
I can assure you that more logic is slower than less.

If I build my extra logic with ECLinPS and you build yours with 74LS, this won't be true.

I'm not sure why you are bringing programmable logic into this. That is
a red herring.

If you can buy purpose-built ECC chips for your particular choice of word length, it's certainly going to be a red herring. If you have real-world requirements that don't correspond to an application that buys more than 100,000 chips per year, you are going to realise most of your system in a programmable logic device.

I worked on an array
processor with ECC memory controller in a separate chip. The ECC
happened in it's own clock cycle and so did not affect the clock
frequency, but that added latency to the memory access. However this
was a micro programmed machine so the algorithm anticipated all the
various delays to get the right data in the right place at the right
time. The ECC delay was incorporated into the operation of the machine.

That could be longer ago, but I doubt it. There were micro-programmed machines around in the mid-1980's but the people who used them tended to be very specialised number crunchers.

It was in the late 80's. Star Technologies was a spin-off from Floating
Point Systems. FPS decided the market wanted 64 bit floating point and
Star Tech was about speed at 32 bits. They provided a machine (two rack
cabinets) that did 100 MFLOPS... the second fastest floating point in
the world next to the Cray. This was before DSPs were terribly useful.
But it didn't last long. They pumped out a design that did 50 MFLOPS
in a single 9U rack which was incorporated into GE CAT scanners. They
nursed that design for continuing support for a long time. Ultimately
they folded without ever producing another viable design. The day of
the array processor was over.

When I applied for my job at EMI central research in 1975, one of the job interviews was with the guys who were building the number-crunching logic for the EMI body-scanner. I knew enough to ask them whether they were going to use AMD's TTL bit-slice components, or Motorola's ECL bit-slices.

At the time they hadn't made up their minds, but by the time I'd got the job (and got the security clearance that let me actually start work) they'd gone for the AMD parts. They weren't as fast, but they integrated bigger chunks of functionality. By the 1980's integrated circuits could integrate a lot more transistors, and bit-slices weren't all that interesting.

--
Bill Sloman, Sydney
 
On Thursday, 13 November 2014 18:26:18 UTC+11, DecadentLinuxUserNumeroUno wrote:
On Thu, 13 Nov 2014 09:19:27 +0200, upsidedown@downunder.com Gave us:

On Wed, 12 Nov 2014 20:10:07 -0800, josephkk
<joseph_barrett@sbcglobal.net> wrote:

On Wed, 12 Nov 2014 08:30:43 +0200, upsidedown@downunder.com wrote:


The actual data bits can be stored as they arrive; calculating the
check bits takes some time, but they can be written into their memory
cells slightly later. With multiple write cycles in succession, storing
the check bits from the previous write can overlap with writing the
actual data bits of the next write.

Doing a partial memory word update, e.g. writing only a single byte
into a 64 bit (8 byte) memory word, is costly, since first you have to
read the 7 unmodified bytes, combine the new byte, calculate the ECC
for 8 bytes and write 8 bytes+ECC, or at least the new byte+full ECC.
With cache between the processor and main memory, memory writes should
use the full memory words, so this should not be an issue today.

Then on the read all the bits are
calculated to see if there is an error and to correct it.

These days the read returns correct results in a huge majority of
cases, so why not just send out the speculative data and after ECC
check declare it valid by a separate signal from memory to CPU.
However, if the ECC check fails, the memory needs to calculate the
correct data and indicate that the data word is now valid. Then the
memory must calculate the new correct data+ECC and store it into that
memory cell to deal with soft errors (i.e. flush the memory cell). Of
course, if there is a hard error, this does not help, since the
correction must be repeated on each read access to that cell, slowing
it up considerably.

You need to study up on Amdahl's law. It relates the frequency of any
event in the instruction and data sequences to the amount of speed impact
it has.

?-)


One should also remember that magnetic core as well as dynamic RAM
perform a destructive readout, so you have to perform a writeback
after each read cycle. For core, you only have to do that for the
actual read location (at the X and Y wire crossing), for dynamic RAM,
you have to write back the whole column (hundreds or thousands of
bits). For this reason, real access time (not using CAS multiplexing)
is much shorter than the full cycle time.

Putting the ECC logic into the writeback loop doesn't slow down the
_cycle_ time, as long as the ECC writeback is phase shifted from the
main data write back.

Of course, this does require that the ECC logic is on the same memory
chip, using ECC memory bits and logic on separate chips doesn't work.

For high radiation environment (if it makes sense to use DRAMs at
all), I would put the ECC into the writeback loop so that the memory
is flushed (ECC corrected) at every refresh as well as every read
access to a column. This will quickly catch single-bit errors, which
are correctable, before they grow into a multibit non-correctable error.

It sounds like running ECC on each column string might be better
than byte, word, or actual string correction would be. And it would
achieve what you said about catching single-bit errors before they
become monsters in the datagrams.

ECC correction makes more sense for longer words. 64-bit words were a sweet spot, because they could be error detected and corrected with an eight-bit check word.

Packet-switched networks detected and corrected on whole packets, with even longer check words.
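
Why 64 bits is the sweet spot: single-error correction over m data bits needs the smallest r with 2^r >= m + r + 1, and double-error detection adds one more parity bit. A quick check in C (the widths are chosen for illustration):

#include <stdio.h>

int main(void)
{
    for (int m = 8; m <= 256; m *= 2) {
        int r = 1;
        while ((1 << r) < m + r + 1)   /* Hamming bound for SEC */
            r++;
        printf("%3d data bits: %d SEC, %d SECDED (%4.1f%% overhead)\n",
               m, r, r + 1, 100.0 * (r + 1) / m);
    }
    return 0;
}

At 64 data bits the SECDED check word is exactly one byte, an eighth of the data width, which is why standard ECC DIMMs are 72 bits wide.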

--
Bill Sloman, Sydney
 
