The MathWorks is offering more than 1600 dB of attenuation...

Simon S Aysdie
https://www.mathworks.com/help/rf/ug/richards-kuroda-workflow-for-rf-filter-circuit.html

Those plots go to -1800 dB. What an incredible toolbox.
 
On Friday, January 7, 2022 at 10:29:41 AM UTC+11, Simon S Aysdie wrote:
https://www.mathworks.com/help/rf/ug/richards-kuroda-workflow-for-rf-filter-circuit.html

Those plots go to -1800 dB. What an incredible toolbox.

Somebody hasn't heard of rounding errors.

--
Bill Sloman, Sydney
 
On 06/01/2022 23:29, Simon S Aysdie wrote:
https://www.mathworks.com/help/rf/ug/richards-kuroda-workflow-for-rf-filter-circuit.html

Those plots go to -1800 dB. What an incredible toolbox.

That is perfectly possible depending on luck. The smallest numbers that
can arise in floating point are O(10^-308), which is -6160 dB
(actually denormals can go even smaller, to 10^-323, with lost precision).

They are not at all meaningful beyond about 10^-17 or so, allowing for
the typical 53-bit mantissa and 64-bit intermediate results.

Realistically any plot going beyond -320 dB is into rounding error noise.

You can occasionally get -inf if the computation produces an exact zero.

I defend against it by adding 1e-20, which is different enough from the
nearest real non-denormalised answer, 2.2e-16, to be obvious, and doesn't
corrupt the output dataset in ways that disrupt further processing.

That happens sometimes in my high precision calculations for easier
problems with a near analytic solution. It is a bit annoying since it
causes discontinuities in otherwise smooth residual error curves.
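A minimal sketch of that guard, assuming amplitude data converted with
20*log10 (the function name and the -400 dB floor it implies are
illustrative, not from the post):

#include <math.h>

/* Linear amplitude to dB with a small additive floor: an exact zero
   gives a finite, recognisably bogus -400 dB instead of -inf, and
   1e-20 is far enough below the ~2.2e-16 double epsilon that genuine
   results are not disturbed. */
double amp_to_db(double x)
{
    return 20.0 * log10(fabs(x) + 1e-20);
}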


--
Regards,
Martin Brown
 
On Friday, January 7, 2022 at 7:52:41 PM UTC+11, Martin Brown wrote:
On 06/01/2022 23:29, Simon S Aysdie wrote:
https://www.mathworks.com/help/rf/ug/richards-kuroda-workflow-for-rf-filter-circuit.html

Those plots go to -1800 dB. What an incredible toolbox.

That is perfectly possible depending on luck. The smallest numbers that
can arise in floating point are O(10^-308), which is -6160 dB
(actually denormals can go even smaller, to 10^-323, with lost precision).

But if your calculation depends on subtracting a couple of nearly equal
large numbers, rounding errors can stack up quite fast.

If you organise the process very carefully, you may be able to avoid this
particular problem, but toolboxes don't lend themselves to that.
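For concreteness, a minimal cancellation demo (the values are chosen for
illustration, not taken from the toolbox):

#include <stdio.h>

/* Both operands are accurate to ~16 digits, but the subtraction keeps
   only the digits in which they disagree: one rounding of the input
   already puts the difference off by ~11%. */
int main(void)
{
    double a = 1.0 + 1e-15;   /* rounds to 1 + 5*2^-52 */
    double b = 1.0;
    printf("a - b = %.17g\n", a - b);   /* 1.1102230246251565e-15, not 1e-15 */
    return 0;
}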

<snip>

--
Bill Sloman, Sydney
 
On 2022-01-07 09:52, Martin Brown wrote:
On 06/01/2022 23:29, Simon S Aysdie wrote:
https://www.mathworks.com/help/rf/ug/richards-kuroda-workflow-for-rf-filter-circuit.html

Those plots go to -1800 dB. What an incredible toolbox.

That is perfectly possible depending on luck. The smallest numbers
that can arise in floating point are O(10^-308), which is -6160 dB
(actually denormals can go even smaller, to 10^-323, with lost precision).

They are not at all meaningful beyond about 10^-17 or so [...]

It's a joke! Excepting some special circumstances, you
don't usually care what happens below -80 dB or so.

Jeroen Belleman
 
Martin Brown wrote:
On 06/01/2022 23:29, Simon S Aysdie wrote:
https://www.mathworks.com/help/rf/ug/richards-kuroda-workflow-for-rf-filter-circuit.html


Those plots go to -1800 dB. What an incredible toolbox.

That is perfectly possible depending on luck. The smallest numbers that
can arise in floating point are O(10^-308), which is -6160 dB
(actually denormals can go even smaller, to 10^-323, with lost precision).

<snip>

Denormals are a huge pain. Nice enough in theory, of course--why throw
away information you could keep?

The problem is that it's a waste of good silicon to make such marginal
creatures fast, so they aren't.

In early versions of my clusterized FDTD simulator, the run time was
usually dominated by rounding error causing the simulation domain to
fill up with denormals until the actual simulated fields got to all the
corners.

I couldn't fix it by adding a DC offset, because that would go away
completely in two half steps, so I filled all the field arrays with very
low-level random noise. Sped some simulations up by 100x.
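A sketch of that noise-seeding dodge (the amplitude and the RNG are
illustrative; pick a level far below the signal but far above the
denormal threshold):

#include <stdlib.h>
#include <stddef.h>

/* Pre-fill a field array with tiny uniform noise so rounding never
   drags cells down into the (slow) denormal range. */
void seed_noise(double *f, size_t n, double amplitude)
{
    for (size_t i = 0; i < n; i++)
        f[i] = amplitude * (2.0 * rand() / (double)RAND_MAX - 1.0);
}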

At the time I was using the Intel C++ compiler, which didn't have an
option for flush-to-zero on underflow.
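Today's x86 toolchains do expose this: the FTZ and DAZ bits in the SSE
control register can be set at run time. A sketch (not an option Phil
had back then):

#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

/* Flush underflowing results to zero (FTZ) and treat denormal inputs
   as zero (DAZ) for SSE arithmetic in this thread of execution. */
void enable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}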

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com
 
On Fri, 7 Jan 2022 08:52:34 +0000, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

<snip>

I defend against it by adding 1e-20, which is different enough from the
nearest real non-denormalised answer, 2.2e-16, to be obvious, and doesn't
corrupt the output dataset in ways that disrupt further processing.

That happens sometimes in my high precision calculations for easier
problems with a near analytic solution. It is a bit annoying since it
causes discontinuities in otherwise smooth residual error curves.

I do much the same.

In the analog parts (remember them?) of the implementation, it's
uncommon to achieve more than maybe 100 dB of isolation, and 80 dB is
more common.

Joe Gwinn
 
On Friday, January 7, 2022 at 2:46:44 AM UTC-8, Jeroen Belleman wrote:
On 2022-01-07 09:52, Martin Brown wrote:
<snip>

It's a joke! Excepting some special circumstances, you
don't usually care what happens below -80 dB or so.

10 of 10. That's what I thought. I don't see the value of these -1800 dB plots, and I'm not generally suggesting throwing away significant digits in computations. (After all, the polynomials in filter synthesis work can be notoriously ill-conditioned, to put a fuzz-ball term on it.)

In realized physical terms, even -60 dB can (sometimes) be challenging for RF/microwave/mm-wave work.

I'd clip the plot before someone laughed at me.
 
On 07/01/2022 15:43, Phil Hobbs wrote:
Martin Brown wrote:
<snip>


Denormals are a huge pain.  Nice enough in theory, of course--why throw
away information you could keep?

The problem is that it's a waste of good silicon to make such marginal
creatures fast, so they aren't.

Tell me about it. One of my early contributions to that game was
noticing that a particular astrophysical plasma simulation was spending
all its runtime in interrupts handling denorm underflows. A couple of
orders of magnitude speed improvement was very welcome. It needed
rescaling to safer territory: x ~ h^2/c^3 is just asking for trouble in
single precision (it was a fluid dynamics code).

The thing I am working on at the moment involves powers of tan(x)^(2^N)
in the range -pi to pi. It gets quite hairy for even modest N and falls
over completely for N>5. I have a cunning fix that makes it work for any
N < 8, but by then it is almost all rounding error anyway.
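(Martin's cunning fix isn't shown; one standard dodge for the overflow
half of the problem is to work in the log domain, for example:)

#include <math.h>

/* tan(x)^(2^N) via exp(2^N * log|tan x|), avoiding intermediate
   overflow from repeated multiplication.  For N >= 1 the exponent is
   even, so dropping the sign of tan x is harmless; +inf can still
   appear when the true result exceeds double range. */
double tan_pow2n(double x, int N)
{
    double lt = log(fabs(tan(x)));
    return exp(ldexp(lt, N));   /* ldexp(lt, N) == lt * 2^N */
}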

In early versions of my clusterized FDTD simulator, the run time was
usually dominated by rounding error causing the simulation domain to
fill up with denormals until the actual simulated fields got to all the
corners.

They are sometimes better than having it hit zero (though not always).

I couldn't fix it by adding a DC offset, because that would go away
completely in two half steps, so I filled all the field arrays with very
low-level random noise. Sped some simulations up by 100x.

A bit of judicious random noise can work wonders on breaking degeneracy.

At the time I was using the Intel C++ compiler, which didn't have an
option for flush-to-zero on underflow.

I've been very impressed with the latest MSC 2019 code generator on some
of my stuff. It somehow groks tediously complex higher order difference
correctors in a way that no other compiler can match. A lucky
combination of out-of-order and speculative execution makes some things
run much faster with SEE, inlining and full optimisation all permitted.

2nd order Newton-Raphson and 3rd order Halley are almost the same
execution time now, and the 4th order one just 10% slower. That's quite a
bonus when the function f(x) and its derivatives being evaluated are
S-L--O---W.
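For reference, one Halley step for f(x) = 0, from the standard formula
x - 2ff'/(2f'^2 - ff'') (a sketch; the f/f'/f'' triple is an assumed
interface, not Martin's code):

/* Cubically convergent Halley update from f, f', f'' at x. */
double halley_step(double x, double f, double fp, double fpp)
{
    return x - (2.0 * f * fp) / (2.0 * fp * fp - f * fpp);
}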

In one instance a single apparently harmless line to protect against a
rounding error giving an impossible answer had an effective execution
time of -100 cycles, because it prevented a pipeline stall. I could
hardly believe it, so I double-checked the same code with and without:

if (E<M) E=E+pi;

The really weird thing is that the branch is almost never taken, except
when M is extremely close to pi, but doing the comparison somehow gives
the CPU enough recovery time to run so much faster.

I'm slowly collecting a library of short code fragments that give
different optimising compilers and certain Intel CPUs trouble.

--
Regards,
Martin Brown
 
On Friday, January 7, 2022 at 5:33:34 AM UTC-5, bill....@ieee.org wrote:
On Friday, January 7, 2022 at 7:52:41 PM UTC+11, Martin Brown wrote:
On 06/01/2022 23:29, Simon S Aysdie wrote:
https://www.mathworks.com/help/rf/ug/richards-kuroda-workflow-for-rf-filter-circuit.html

Those plots go to -1800 dB. What an incredible toolbox.

That is perfectly possible depending on luck. The smallest numbers that
can arise in floating point are O(10^-308), which is -6160 dB
(actually denormals can go even smaller, to 10^-323, with lost precision).

But if your calculation depends on subtracting a couple of nearly equal
large numbers, rounding errors can stack up quite fast.

If you organise the process very carefully, you may be able to avoid this
particular problem, but toolboxes don't lend themselves to that.

So the reported number is too high? That may well be the calculated attenuation of a filter with a transmission null at that frequency and no resistive component, or some similar calculation where a slight rounding error will knock the point just off the null.

Reminds me of FFT results that a couple of techs were looking at that were counterintuitive. I don't recall the details, but they were looking at different smoothing functions. They were being confused by the details in the curve being plotted, because in one case it looked worse when it should have been better; they weren't paying careful attention to the numbers on the axis, just the "look" of the curve! Because various minima were much lower, the scale factor changed, making the shape of the curve flatter.

The devil is in the details. You have to know what to expect and understand how the data is being presented.

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209
 
On Sat, 8 Jan 2022 11:19:43 +0000, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

On 07/01/2022 15:43, Phil Hobbs wrote:
Martin Brown wrote:
<snip>

I've been very impressed with the latest MSC 2019 code generator on some
of my stuff. It somehow groks tediously complex higher order difference
correctors in a way that no other compiler can match. A lucky
combination of out-of-order and speculative execution makes some things
run much faster with SEE, inlining and full optimisation all permitted.

What are "MSC 2019 code generator" and "SEE"?


<snip>

I'm slowly collecting a library of short code fragments that give
different optimising compilers and certain Intel CPUs trouble.

I'm assuming that you are coding in C here.

I have one similar surprise to report, but in MATLAB:

The behavior of a megawatt power system for a shipboard radar was
modeled in Simulink (integrated with MATLAB). This was circa 2000.
The simulations ran very slowly, but none of us thought much about it,
for lack of a comparison.

One day, I was working with the mathematician who was running the
simulation, idly watching the usual stream of sim-in-progress
messages roll by as we talked, and saw a message that I did not
recognize or understand. It turned out those messages were relatively
common, but never really noticed in the blather from the sim.

Now curious, I dug into that message. -Saga omitted- It turns out
that the simulation was coded (by us, the users) in such a way that the
solver was forced to solve an implicit equation at each solution time
step in a large system of coupled ODEs. So, instead of one or two big
matrix operations per step, it was one or two hundred operations per
step. Ouch! But why?

The implicit forms were a byproduct of describing the power system in a
block-diagram-and-line language: the model was programmed by placing
standard blocks and connecting them with standard lines on the computer
screen. But what made sense and looked simple on the screen was anything
but under the covers.

Redesigning and recoding the simulation yielded a 100x speedup.

Joe Gwinn
 
On 08/01/2022 17:25, Joe Gwinn wrote:
On Sat, 8 Jan 2022 11:19:43 +0000, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

I've been very impressed with the latest MSC 2019 code generator on some
of my stuff. It somehow groks tediously complex higher order difference
correctors in a way that no other compiler can match. A lucky
combination of out-of-order and speculative execution makes some things
run much faster with SEE, inlining and full optimisation all permitted.

What are "MSC 2019 code generator" and "SEE"?

The MS C/C++ compiler under Visual Studio, and "SEE" (sic) is a typo for SSE
(the extended floating-point registers on modern Intel CPUs).
/SSE2 works best for me, but YMMV.

Even so, the compiler sometimes generates hybrid code, with the x87 still
being used for some parts but not all of the computations.

MSC 2019 appears to have been replaced by 2022, so yet another compiler
to check my code against to see if there are any more improvements.

https://visualstudio.microsoft.com/downloads/

One of the tricky quirks I know about is that sincos can either speed
things up or slow them down. It depends on whether the result gets used
before it has been computed (pipeline stalls are hellishly expensive).
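For reference, the usual call shape (sincos is a GNU extension in
glibc; elsewhere fall back to separate sin and cos):

#define _GNU_SOURCE
#include <math.h>

/* Rotate (x, y) by theta, computing sine and cosine in one call.
   Whether this wins depends on how soon s and c are consumed. */
void rotate(double theta, double *x, double *y)
{
    double s, c;
    sincos(theta, &s, &c);
    double xr = c * *x - s * *y;
    double yr = s * *x + c * *y;
    *x = xr;
    *y = yr;
}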

I'm assuming that you are coding in C here.

<snip>

Redesigning and recoding the simulation yielded a 100x speedup.

The classic one when I was at university was that inevitably a new
graduate student would grind the Starlink VAX to a standstill by
transposing what was then a big image, 512x512, with nested loops:

x[i,j] = x[j,i]

generating roughly a quarter of a million page faults in the process.

There were libraries with algorithms we had sweated blood over to do
this efficiently with the minimum possible number of page faults.
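The trick those libraries used still applies: walk the image in tiles
so each page is touched in bursts rather than once per element. A
sketch of the idea (tile size illustrative, not the Starlink routine):

#define TILE 64

/* In-place transpose of an n x n row-major array, in TILE x TILE
   blocks; each pair (i,j)/(j,i) is swapped exactly once. */
void transpose(double *a, int n)
{
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = ii; jj < n; jj += TILE)
            for (int i = ii; i < ii + TILE && i < n; i++) {
                int j0 = (ii == jj) ? i + 1 : jj;
                for (int j = j0; j < jj + TILE && j < n; j++) {
                    double t = a[i * n + j];
                    a[i * n + j] = a[j * n + i];
                    a[j * n + i] = t;
                }
            }
}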

--
Regards,
Martin Brown
 
Martin Brown wrote:
On 07/01/2022 15:43, Phil Hobbs wrote:
Martin Brown wrote:
<snip>


Denormals are a huge pain. Nice enough in theory, of course--why
throw away information you could keep?

The problem is that it's a waste of good silicon to make such
marginal creatures fast, so they aren't.

Tell me about it. One of my early contributions to that game was
noticing that a particular astrophysical plasma simulation was
spending all its runtime in interrupts handling denorm underflows. A
couple of orders of magnitude speed improvement was very welcome. It
needed rescaling to safer territory: x ~ h^2/c^3 is just asking for
trouble in single precision (it was a fluid dynamics code).

The thing I am working on at the moment involves powers of
tan(x)^(2^N) in the range -pi to pi. It gets quite hairy for even
modest N and falls over completely for N>5. I have a cunning fix that
makes it work for any N < 8, but by then it is almost all rounding
error anyway.

Hairy? With a mere 32nd order pole at each end? Surely not!

In early versions of my clusterized FDTD simulator, the run time
was usually dominated by rounding error causing the simulation
domain to fill up with denormals until the actual simulated fields
got to all the corners.

They are sometimes better than having it hit zero (though not
always).

I couldn't fix it by adding a DC offset, because that would go away
completely in two half steps, so I filled all the field arrays
with very low-level random noise. Sped some simulations up by
100x.

A bit of judicious random noise can work wonders on breaking
degeneracy.

At the time I was using the Intel C++ compiler, which didn't have
an option for flush-to-zero on underflow.

I've been very impressed with the latest MSC 2019 code generator on
some of my stuff. It somehow groks tediously complex higher order
difference correctors in a way that no other compiler can match. A
lucky combination of out-of-order and speculative execution makes
some things run much faster with SEE, inlining and full optimisation
all permitted.

Yeah, the reason I was using Intel is that it vectorized complicated
stuff the other compilers (MSVC++ and gcc) wouldn't touch.

Getting the curl equations to vectorize is the key to making FDTD fast
on modern hardware.

2nd order Newton-Raphson and 3rd order Halley are almost the same
execution time now, and the 4th order one just 10% slower. That's
quite a bonus when the function f(x) and its derivatives being
evaluated are S-L--O---W.

Yup. High order methods are the ticket for smooth expensive functions.

Doing good metals with FDTD requires using an auxiliary differential
equation for the electric polarization because the Yee (normal FDTD)
updating equations are unstable when n < k, or equivalently when
Re(epsilon) goes negative.

In the IR, copper, silver, and gold have epsilons that are essentially
large negative real numbers. At DC such a material is impossible,
because it would spontaneously turn to lava.

In one instance a single apparently harmless line to protect against
a rounding error giving an impossible answer had an effective
execution time of -100 cycles because it prevented a pipeline stall.
I could hardly believe it so I double checked the same code with and
without.

if (E<M) E=E+pi;

The really weird thing is that the branch statement is almost never
taken except when M is extremely close to pi, but doing the
comparison somehow gives the CPU enough recovery time to run so much
faster.

I suspect it gives the compiler permission to do some more daring
optimizations by preventing that error.

I'm slowly collecting a library of short code fragments that give
different optimising compilers and certain Intel CPUs trouble.

I'd be interested to see it!

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com
 
Joe Gwinn wrote:
<snip>

The implicit forms were a byproduct of describing the power system in a
block-diagram-and-line language: the model was programmed by placing
standard blocks and connecting them with standard lines on the computer
screen. But what made sense and looked simple on the screen was anything
but under the covers.

Redesigning and recoding the simulation yielded a 100x speedup.

Joe Gwinn

As one of my old colleagues used to say, "Ah, LabVIEW--spaghetti code
that even _looks_ like spaghetti."

Cheers

Phil Hobbs


--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com
 
On 1/8/2022 10:55 AM, Martin Brown wrote:
The classic one when I was at university was that inevitably a new graduate
student would grind the Starlink VAX to a standstill by transposing what was
then a big image of 512x512 with nested loops.

x[i,j] = x[j,i]

You don't even need to be transposing a large object to see this
problem. Just initializing an N-dimensional array is fraught
with pitfalls for a "programmer" (who likely has only knowledge
of the language and not the hardware, OS, etc.).

for (i = 0; i < MAX; i++)
  for (j = 0; j < MAX; j++)
    foo[i][j] = 0;          /* row-major: inner index walks contiguous memory */

vs.

for (i = 0; i < MAX; i++)
  for (j = 0; j < MAX; j++)
    foo[j][i] = 0;          /* strides a whole row per store: cache/TLB hostile */

[In any of the equivalent forms]

will stump most "programmers". Much the same as:

double x, y;
...
if (x == y) ...

This is why "programmers" are such a drain on the industry's reputation.

[And, apparently, more and more diploma mills try to reduce things
to this level of simplification so their graduates can join the
workforce with some (dubious) credentials]

Generating roughly a quarter of a million page faults in the process.

There were libraries with algorithms we had sweated blood over to do this
efficiently with the minimum possible number of page faults.

A more insidious issue is exploiting caches effectively, especially when
you are writing for untargeted hardware (sizes of data vary, alignment
constraints, I-cache vs. D-cache, etc.).

I spend a good deal of time carefully considering the layout of
data elements in my RTOS with an eye to allowing the hardware to
benefit from all of that extra silicon instead of making
"unfortunate" design choices that render it useless.

[Again, someone who knows a "language" is largely clueless as
to how it actually maps into instruction/data fetches on any
particular target. OTOH, languages are supposed to free the
casual developer from those details, so why fault them for
doing so? Folks who are expert in the field will have the extra
know-how to rise above those limitations -- in theory! :> ]

I cringe whenever I look at the script behind a web page.
Obviously, some *thing* generated it, cuz a human wouldn't
be that clueless/generic.
 
On Sat, 8 Jan 2022 13:03:37 -0500, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

Martin Brown wrote:

<snip>

I'm slowly collecting a library of short code fragments that give
different optimising compilers and certain Intel CPUs trouble.

I'd be interested to see it!

As would I.

Joe Gwinn
 
On 1/8/22 4:28 PM, Don Y wrote:
On 1/8/2022 10:55 AM, Martin Brown wrote:
The classic one when I was at university was that inevitably a new
graduate student would grind the Starlink VAX to a standstill by
transposing what was then a big image of 512x512 with nested loops.

x[i,j] = x[j,i]

You don't even need to be transposing a large object to see this
problem.  Just initializing an N-dimensional array is fraught
with pitfalls for a "programmer" (who likely has only knowledge
of the language and not the hardware, OS, etc.).

for (i = 0; i < MAX; i++)
  for (j = 0; j < MAX; j++)
    foo[i][j] = 0;

vs.

for (i = 0; i < MAX; i++)
  for (j = 0; j < MAX; j++)
    foo[j][i] = 0;

[In any of the equivalent forms]

And C and FORTRAN store multi-dimensional arrays in opposite order, so
translating one to the other can cause cache problems.

will stump most "programmers".  Much the same as:

double x, y;
...
if (x == y) ...

I once had a colleague come to me when he had this sort of "failure". He
pointed out that the numbers in the computations that led up to this were
the same on both paths leading to the compare. This was on a Power
processor, so I went over the generated code with him. On one path there
was a multiply-add instruction; on the other, the operations were far
enough apart that they were done separately. The difference between one
and two rounding errors was enough to cause the compare to fail.
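The effect is easy to reproduce: a*b+c rounded once (fused) and rounded
twice differ in the last bits. A sketch (compile with FP contraction
off, e.g. -ffp-contract=off, so the first expression isn't itself
fused):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 1.0 + pow(2.0, -30);   /* exactly representable */
    double b = 1.0 - pow(2.0, -30);   /* exactly representable */
    double c = -1.0;
    /* a*b = 1 - 2^-60 exactly; rounded to double it becomes 1.0, so
       mul-then-add gives 0 while the fused form keeps -2^-60. */
    printf("mul+add: %.17g\n", a * b + c);    /* 0 */
    printf("fma:     %.17g\n", fma(a, b, c)); /* about -8.67e-19 */
    return 0;
}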

This is why "programmers" are such a drain on the industry's reputation.

[And, apparently, more and more diploma mills try to reduce things
to this level of simplification so their graduates can join the
workforce with some (dubious) credentials]
 
On 1/8/2022 6:10 PM, Dennis wrote:
On 1/8/22 4:28 PM, Don Y wrote:
On 1/8/2022 10:55 AM, Martin Brown wrote:
<snip>

And C and FORTRAN store multi-dimensional arrays in opposite order, so
translating one to the other can cause cache problems.

It boils down to being clueless as to what is happening IN THE
HARDWARE -- which also depends on what the intervening software
(OS) interjects. There are no "real" virtual machines, despite
what the "language manuals" lead you to believe (to make
presenting the language more palatable).

will stump most "programmers". Much the same as:

double x, y;
...
if (x == y) ...

I once had a colleague come to me when he had this sort of "failure".
<snip>
The difference between one and two rounding errors was enough to cause
the compare to fail.

Or, if extra bits of precision aren't stored/restored from the FPU
during an inopportune context switch, etc.

Trying to explain cancellation to someone who was taught that
the two roots of a quadratic were readily obtainable from the
"quadratic formula" will always devolve into a concrete
example -- as folks can't visualize the finite nature of
the hardware.
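The concrete example usually ends up being the stable form of the
quadratic formula: compute the well-conditioned root first, then get
the other from the product of the roots (a sketch assuming real roots,
a != 0 and c != 0):

#include <math.h>

/* Roots of a*x^2 + b*x + c, avoiding the cancellation in
   -b + sqrt(b^2 - 4ac) when b^2 >> 4ac. */
void quad_roots(double a, double b, double c, double *r1, double *r2)
{
    double q = -0.5 * (b + copysign(sqrt(b * b - 4.0 * a * c), b));
    *r1 = q / a;
    *r2 = c / q;   /* since r1 * r2 = c / a */
}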

Likewise, trying to explain why digital (anti-)resonators differ
from their analog counterparts, etc., "in practice".

Or, the subtle differences between pattern-matching algorithms
(implicit in certain languages) based on their relative "greediness".
(Had it ever occurred to them that there *can* be different
interpretations?)

"Programmers" just write code, often without understanding
why what they've written isn't what they *intended*.

This is why "programmers" are such a drain on the industry's reputation.

[And, apparently, more and more diploma mills try to reduce things
to this level of simplification so their graduates can join the
workforce with some (dubious) credentials]
 
On Saturday, January 8, 2022 at 7:51:05 PM UTC-5, Joe Gwinn wrote:
On Sat, 8 Jan 2022 13:03:37 -0500, Phil Hobbs
<pcdhSpamM...@electrooptical.net> wrote:

Martin Brown wrote:

I'm slowly collecting a library of short code fragments that give
different optimising compilers and certain Intel CPUs trouble.


I'd be interested to see it!
As would I.

There are tons of people who find these sorts of issues and document them, often down to the particular combination of processor and compiler. What works well on one combination can be pathological on another because of changes at the micro-architecture level or in the compiler optimizations. It's a tough job trying to optimize for many combinations of machine and software.

You might ask in comp.lang.forth. I know plenty of people there spend a lot of time tracking such combinations because they are writing optimizing compilers.

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209
 
On 2022-01-09 02:10, Dennis wrote:
On 1/8/22 4:28 PM, Don Y wrote:
On 1/8/2022 10:55 AM, Martin Brown wrote:
<snip>

I once had a colleague come to me when he had this sort of "failure".
[...]

Any programmer worth his salt should know that comparing floating
point values for equality doesn't work.
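What does work is a tolerance test; a sketch (the tolerances are
application-dependent and purely illustrative):

#include <math.h>
#include <float.h>

/* Relative comparison with an absolute floor for values near zero. */
int nearly_equal(double x, double y)
{
    double diff  = fabs(x - y);
    double scale = fmax(fabs(x), fabs(y));
    return diff <= 16.0 * DBL_EPSILON * scale || diff <= 1e-300;
}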

Jeroen Belleman
 
