# high accuracy math library...

#### David Brown

##### Guest
On 16/01/2022 18:21, Lasse Langwadt Christensen wrote:
On Sunday, 16 January 2022 at 11:45:44 UTC+1, David Brown wrote:
On 16/01/2022 10:26, Martin Brown wrote:
On 15/01/2022 17:58, jla...@highlandsniptechnology.com wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:

In the old days, only VAX/VMS had hardware support for 128-bit floats

Probably a very wasteful decision that cost them dear. The requirement
for anything above a 64 bit FP word length is very esoteric.

And thus most VAX processors emulated it in software - only a few had
hardware support. (It is not unlikely that software emulation was
faster than hardware for some tasks - hardware floating point used to be
very slow for anything other than add, subtract and multiply.)

that leaves division which was and still is "slow" compared to +/-/*

Yes, exactly.

(Somewhere in the development of the m68k processor family - I forget
exactly where, but I /think/ it was the 68030 - the cpu designers
realised that they could do a division in software faster than using the
hardware division block they had. Removing the hardware division
instruction saved significant die space.)

#### David Brown

##### Guest
On 16/01/2022 18:16, Lasse Langwadt Christensen wrote:
On Sunday, 16 January 2022 at 17:23:19 UTC+1, David Brown wrote:
On 16/01/2022 16:50, jla...@highlandsniptechnology.com wrote:
On Sun, 16 Jan 2022 09:26:42 +0000, Martin Brown

On 15/01/2022 17:58, jla...@highlandsniptechnology.com wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:

In the old days, only VAX/VMS had hardware support for 128-bit floats

Probably a very wasteful decision that cost them dear. The requirement
for anything above a 64 bit FP word length is very esoteric.

The most popular back in that era for high speed floating point was the
Cyber 7600 (60 bit word) which powered Manchester University's Jodrell
Bank processing and BMEWS early warning system amongst other things.

(not IEEE format though). In the cited GCC list, which of these are
directly supported in hardware, versus software emulation?

32, 64 are native x87 and full SSE floating point support
80 x87 only but GCC does it fairly well

Power Basic has a native 80-bit float type and a 64-bit integer.

I'm so impressed. NOT

Of course not. The word BASIC triggers too much emotion, facts not
required.

I suspect he simply means it is not a hard or exciting feature if you
are making a language designed purely to run on a single target
processor family and OS. It is not impressive that Power BASIC has
support for 80-bit floats. It /would/ be impressive if it supported
128-bit floats, because that would require a lot of development effort.

BASIC is okay for small and simple programs. It is not uncommon to need
something quick and easy - you want a language and tool that has
minimum developer time overhead, is interpreted (to minimise the
edit/run cycle time),

isn't Power BASIC compiled?

I don't know - I don't use it myself. But BASIC compilers for small
single-module programs tend to be fast enough that you can pretend they
are interpreted - you just "run" the program.

#### Martin Brown

##### Guest
On 16/01/2022 20:51, Phil Hobbs wrote:
Martin Brown wrote:

MS C used to have it back in the old v6 days but they rationalised
things to only have 64 bit FP support in C/C++ a very long time ago.

Most decent compilers *do* offer 80 bit reals. It is a pity that
Mickeysoft don't because their code optimiser is streets ahead of both
Intel and GCC's at handling out of order execution parallelism.

Last time I compared them directly was 2006ish, using almost all
single-precision C++ code. Back then, Intel was streets ahead of
Microsoft for vectorization and loop unrolling, and gcc was a distant
distant third.

I think it will depend a lot on the application. Vectorising real*4 I
expect the Intel compiler might well still have the edge. I'm mostly
interested in real*8 and real*10 function computations.

The elderly compiler that impressed me the most was Apple's Clang for
the M1 - that CPU really motors and at very low power cf Intel. PITA the
differences from classic standard C but once over that it was worth it.
It was already fast enough on its safer default settings that I didn't
notice /Ofast wasn't set!

My various Pade approximations didn't max out its ability to hide
operations inside the latency time of the divide. They all took the same
time irrespective of the polynomials up to 7,6 (as high as I go). On
Intel CPUs they get slower once the polynomial order goes above 4.

Intel C and GCC compilers still support 80 bit floating point.

On the code I have been testing recently Intel generates code that
effectively *forces* a pipeline stall more often than not. MSC somehow
manages the opposite. Pipeline stalls cost around 90 cycles which is
not insignificant in a routine that should take 300 cycles.

What sorts of code are you comparing?

Solving cubics, polynomials, a few transcendental functions and higher
order correctors in the series that begins Newton-Raphson, Halley, D4 ...

Mostly they are snippets that seldom exceed 20 lines. They form a set of
functional Lego bricks that solve a particular problem.

On modern optimising compilers NR and Halley take essentially the same
elapsed time for quadratic or better cubic convergence and D4 ~15%
slower for quartic convergence and D5 ~25% slower. After that they slow
down. In one special case Halley is *faster* than NR!

Traditional way of doing it D5 would be 2x slower so it is quite a game
changer in terms of which corrector you can use (or conversely how crude
an initial guess you need to get full machine precision).
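
As a rough illustration of the Newton-Raphson versus Halley comparison, here is a minimal C sketch using the textbook iterations for a cube root. The problem choice and the function names `cbrt_newton_step` and `cbrt_halley_step` are assumptions made for the example, not the snippets discussed above. Both steps contain exactly one divide, which is one plausible reason they can cost about the same when divide latency dominates.

```c
/* One Newton-Raphson step for y = cbrt(a): quadratic convergence. */
double cbrt_newton_step(double y, double a)
{
    return y - (y * y * y - a) / (3.0 * y * y);
}

/* One Halley step for the same problem: cubic convergence,
 * still with a single division. */
double cbrt_halley_step(double y, double a)
{
    double y3 = y * y * y;
    return y * (y3 + 2.0 * a) / (2.0 * y3 + a);
}
```

Starting from the same crude guess, repeated Halley steps should reach full double precision in fewer iterations than repeated Newton steps.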

Putting two divides close together with the second one dependent on
the result of the other is one way to do it. MSC tries much harder to
(at least it does when you enable every possible speed optimisation)

Sometimes it generates loop unrolled code that is completely wrong too

Yikes.

Indeed. It was to be fair quite a pathological piece of code (but it
still shouldn't happen).

The other thing I have had problems with is modern global optimisers
spotting benchmark loops of a simple form and coding the algebraic
answer! (their strength reduction tricks have become quite cunning)

x = 0;
dx = 1e-6;
for (i = 0; i<100000; i++) x += dx;

They also have to return a result that might be printed out later so
that there is a potential side effect.
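
One common way to stop an optimiser from collapsing a timing loop like this is sketched below. The function name `accumulate_steps` and the use of a `volatile` local are assumptions for illustration; compilers differ, so the generated code is worth checking before trusting any timing.

```c
#include <stddef.h>

/* Sums dx_in into an accumulator n times.
 * The step is read through a volatile object, so the compiler has to
 * perform every addition instead of replacing the loop with n * dx. */
double accumulate_steps(size_t n, double dx_in)
{
    volatile double dx = dx_in;   /* forces a real load on each iteration */
    double x = 0.0;
    for (size_t i = 0; i < n; i++)
        x += dx;
    return x;                     /* returned so the result is observable */
}
```

Calling `accumulate_steps(100000, 1e-6)` and printing the result after the timed region gives the loop the side effect mentioned above.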

--
Regards,
Martin Brown

#### whit3rd

##### Guest
On Sunday, January 16, 2022 at 9:21:05 AM UTC-8, lang...@fonz.dk wrote:
On Sunday, 16 January 2022 at 11:45:44 UTC+1, David Brown wrote:
On 16/01/2022 10:26, Martin Brown wrote:
On 15/01/2022 17:58, jla...@highlandsniptechnology.com wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:

In the old days, only VAX/VMS had hardware support for 128-bit floats

Probably a very wasteful decision that cost them dear. The requirement
for anything above a 64 bit FP word length is very esoteric.

And thus most VAX processors emulated it in software - only a few had
hardware support. (It is not unlikely that software emulation was
faster than hardware for some tasks - hardware floating point used to be
very slow for anything other than add, subtract and multiply.)

that leaves division which was and still is "slow" compared to +/-/*

Yeah, but... division is done by multiple multiplications, and if it takes five
stages for 64-bit, it only takes six stages for 128-bit; each two multiplies doubles
the precision of the result. So, the division penalty is proportionally less for
extensions to high precision.

The high-order speedup to multiply is with FFT techniques. Has anyone built those into
hardware for long-word computing?
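
For anyone who wants to see the precision doubling, here is a minimal C sketch of division by repeated multiplication (Newton-Raphson on the reciprocal). The function name `nr_reciprocal` and the linear seed are assumptions for the example, and real hardware dividers are organised differently. Each step costs two multiplies and roughly squares the relative error.

```c
#include <assert.h>
#include <math.h>

/* Approximates 1/d for d > 0 using only multiplies and subtracts.
 * Each step x <- x * (2 - m * x) costs two multiplies and roughly
 * squares the relative error, i.e. doubles the number of correct bits. */
double nr_reciprocal(double d, int steps)
{
    assert(d > 0.0);
    int e;
    double m = frexp(d, &e);          /* d = m * 2^e with m in [0.5, 1) */
    /* Linear seed for 1/m on [0.5, 1): relative error at most 1/17. */
    double x = 48.0 / 17.0 - (32.0 / 17.0) * m;
    for (int i = 0; i < steps; i++)
        x = x * (2.0 - m * x);
    return ldexp(x, -e);              /* 1/d = (1/m) * 2^-e */
}
```

With this seed the error bound goes 1/17, 1/17^2, 1/17^4, and so on, so four steps are enough for a 53-bit mantissa and each further step would cover twice as many bits again.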

#### David Brown

##### Guest
On 17/01/2022 11:06, whit3rd wrote:
On Sunday, January 16, 2022 at 9:21:05 AM UTC-8, lang...@fonz.dk wrote:
On Sunday, 16 January 2022 at 11:45:44 UTC+1, David Brown wrote:
On 16/01/2022 10:26, Martin Brown wrote:
On 15/01/2022 17:58, jla...@highlandsniptechnology.com wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:

In the old days, only VAX/VMS had hardware support for 128-bit floats

Probably a very wasteful decision that cost them dear. The requirement
for anything above a 64 bit FP word length is very esoteric.

And thus most VAX processors emulated it in software - only a few had
hardware support. (It is not unlikely that software emulation was
faster than hardware for some tasks - hardware floating point used to be
very slow for anything other than add, subtract and multiply.)

that leaves division which was and still is "slow" compared to +/-/*

Yeah, but... division is done by multiple multiplications, and if it takes five
stages for 64-bit, it only takes six stages for 128-bit; each two multiplies doubles
the precision of the result. So, the division penalty is proportionally less for
extensions to high precision.

The high-order speedup to multiply is with FFT techniques. Has anyone built those into
hardware for long-word computing?

FFT multiplication algorithms are inefficient for less than about 10,000
digits - that's a little big for a hardware solution!

Of course, it is possible to have instructions and hardware blocks that
accelerate parts of the process. Many DSP processors have special
instructions to improve the speed of FFTs (though that is generally for
filtering rather than multiplication).

#### Martin Brown

##### Guest
On 16/01/2022 15:50, jlarkin@highlandsniptechnology.com wrote:
On Sun, 16 Jan 2022 09:26:42 +0000, Martin Brown

On 15/01/2022 17:58, jlarkin@highlandsniptechnology.com wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

32, 64 are native x87 and full SSE floating point support
80 x87 only but GCC does it fairly well

Power Basic has a native 80-bit float type and a 64-bit integer.

I'm so impressed. NOT

Of course not. The word BASIC triggers too much emotion, facts not
required.

I don't have any problem with Basic. I sometimes use it as the VBA dialect
inside Excel to automate jobs I can't otherwise be bothered doing.

Ripping content authored by someone else as output by MS Office "save
for web" into a compact shape fit to go on a web page for instance!

We had a couple of cases where we wanted to do a signal processing
routine that processed an array of adc samples, on x86. My official
programmer guys did it in gcc and I did it in Power Basic. Mine used
subscripts in the most obvious loop and they used pointers. Mine ran
4x as fast. After a day of mucking with code and compiler switches,
many combinations, they got within about 40%.

There is something wrong with how they are doing it then! One
dimensional arrays should be roughly equivalent performance in any
compiled language. C can do 1-D array indexing the same way as Basic.

C becomes messy when there are 2D or higher dimensional arrays since
that is implemented as a pointer to a pointer which does not cache well.
If they were using that array construct then they would be at a
disadvantage but it should not be by a factor of 4. Nothing like.

GCC is a lukewarm optimiser too - its main claim to fame is being free.
They should evaluate the latest MSC 2022 compiler if speed is important.

In almost all serious large scale computing in C, people flatten their huge
arrays to 1 dimension using clumsy macros to do the indexing (or use
another language like Fortran which supports N dimensional arrays well).

Python looks a lot like Basic to me. Some of the goofier features were
added so that it couldn't be directly accused of being Basic syntax,
which would have been toxic.

PB has wonderful string functions. It has TCP OPEN and such, and can
send/receive emails if you really want to. The cool stuff is native,
not libraries; make an EXE file in half a second and you're done.

Whatever turns you on...

--
Regards,
Martin Brown

#### Phil Hobbs

##### Guest
jlarkin@highlandsniptechnology.com wrote:
On Sun, 16 Jan 2022 09:26:42 +0000, Martin Brown

On 15/01/2022 17:58, jlarkin@highlandsniptechnology.com wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:

In the old days, only VAX/VMS had hardware support for 128-bit floats

Probably a very wasteful decision that cost them dear. The requirement
for anything above a 64 bit FP word length is very esoteric.

The most popular back in that era for high speed floating point was the
Cyber 7600 (60 bit word) which powered Manchester University's Jodrell
Bank processing and BMEWS early warning system amongst other things.

(not IEEE format though). In the cited GCC list, which of these are
directly supported in hardware, versus software emulation?

32, 64 are native x87 and full SSE floating point support
80 x87 only but GCC does it fairly well

Power Basic has a native 80-bit float type and a 64-bit integer.

I'm so impressed. NOT

Of course not. The word BASIC triggers too much emotion, facts not
required.

We had a couple of cases where we wanted to do a signal processing
routine that processed an array of adc samples, on x86. My official
programmer guys did it in gcc and I did it in Power Basic. Mine used
subscripts in the most obvious loop and they used pointers. Mine ran
4x as fast. After a day of mucking with code and compiler switches,
many combinations, they got within about 40%.

Python looks a lot like Basic to me. Some of the goofier features were
added so that it couldn't be directly accused of being Basic syntax,
which would have been toxic.

PB has wonderful string functions. It has TCP OPEN and such, and can
send/receive emails if you really want to. The cool stuff is native,
not libraries; make an EXE file in half a second and you're done.

Back in the very long ago, when I was a grad student, I used to really
like HP's Rocky Mountain Basic. I had a 9816 with some huge 20 MB hard
drive and a bunch of hardware I/O to run my laser microscope.

I managed to cadge an Infotek math coprocessor board plus their
RMB-compatible compiler, which really helped the speed.

The great thing about RMB was that it made instrument control a
breeze--it was written by the outfit that made the instruments, which
helps.

Haven't used BASIC since about 1987, even though I have a few
instruments that run RMB and can be used as controllers for other boxes
(notably the HP 35665A).

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com

#### David Brown

##### Guest
On 17/01/2022 14:26, Martin Brown wrote:
On 16/01/2022 15:50, jlarkin@highlandsniptechnology.com wrote:
On Sun, 16 Jan 2022 09:26:42 +0000, Martin Brown

On 15/01/2022 17:58, jlarkin@highlandsniptechnology.com wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

32, 64 are native x87 and full SSE floating point support
80 x87 only but GCC does it fairly well

Power Basic has a native 80-bit float type and a 64-bit integer.

I'm so impressed. NOT

Of course not. The word BASIC triggers too much emotion, facts not
required.

I don't have any problem with Basic. I sometimes use it as the VBA dialect
inside Excel to automate jobs I can't otherwise be bothered doing.

Ripping content authored by someone else as output by MS Office "save
for web" into a compact shape fit to go on a web page for instance!

We had a couple of cases where we wanted to do a signal processing
routine that processed an array of adc samples, on x86. My official
programmer guys did it in gcc and I did it in Power Basic. Mine used
subscripts in the most obvious loop and they used pointers. Mine ran
4x as fast. After a day of mucking with code and compiler switches,
many combinations, they got within about 40%.

There is something wrong with how they are doing it then! One
dimensional arrays should be roughly equivalent performance in any
compiled language. C can do 1-D array indexing the same way as Basic.

Indeed.

When someone uses pointers for C arrays, it's an indication that they
are being smart-arse rather than smart - they are trying to
micro-optimise the source code instead of writing it clearly and letting
the compiler do the job. Without other information or seeing the code,
it is of course impossible to tell - but there is certainly no simple
explanation why the same array code could not be written in the same way
in C and get at least as good performance.

C becomes messy when there are 2D or higher dimensional arrays since
that is implemented as a pointer to a pointer which does not cache well.

No, two dimensional arrays in C are /not/ pointer-to-pointer accesses.

If you have:

int xs[10][100];

Then accessing "xs[a][b]" is just like accessing element "100 * a + b"
of a one-dimensional array. There are no extra levels of indirection,
and xs is /not/ an array of pointers.
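
A small C sketch that checks this layout claim directly. The helper name `offset_in_ints` and the `ROWS`/`COLS` constants are introduced only for the example. It measures where `xs[a][b]` sits relative to the start of the array, without any row-pointer table in sight.

```c
#include <stddef.h>

enum { ROWS = 10, COLS = 100 };

/* Returns how many ints lie between the start of the array and xs[a][b].
 * The distance is measured in bytes through char pointers and then
 * converted to a count of ints. */
size_t offset_in_ints(int xs[ROWS][COLS], size_t a, size_t b)
{
    const char *base = (const char *)xs;
    const char *elem = (const char *)&xs[a][b];
    return (size_t)(elem - base) / sizeof(int);
}
```

For an `int xs[10][100]`, the offset of `xs[3][7]` comes out as 307, and `sizeof xs` is exactly 1000 ints with no extra storage for pointers.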

If they were using that array construct then they would be at a
disadvantage but it should not be by a factor of 4. Nothing like.

There is no disadvantage. It would be absurd if C were designed in such
a way that it needed extra pointers to slow it down and waste memory. (Poor quality
compilers might generate poor quality object code from multi-dimensional
array access, but that's a matter of the compiler optimisation. It used
to make sense to use manual pointer arithmetic instead of array
expressions, in the old days when compilers were weak.)

GCC is a lukewarm optimiser too - its main claim to fame is being free.
They should evaluate the latest MSC 2022 compiler if speed is important.

gcc is an excellent optimiser. MSVC is not bad too (for C++ - it's a
shitty C compiler), and the same goes for clang and Intel icc. Each
will do better on some examples, worse on others. And each requires
extra effort, careful flag selection, and compiler-specific features if
you want to squeeze the last few drops of performance out of the
binaries. (This can make a big difference if you can vectorise the code
with SIMD instructions.)

In almost all serious large scale computing in C, people flatten their huge
arrays to 1 dimension using clumsy macros to do the indexing (or use
another language like Fortran which supports N dimensional arrays well).

Nonsense. That was the case 20 years ago, but not now.

(They might do horrible things to their code to make them work well with
SIMD, as automatic vectorisation is still more of an art than a science.)

Python looks a lot like Basic to me. Some of the goofier features were
added so that it couldn't be directly accused of being Basic syntax,
which would have been toxic.

PB has wonderful string functions. It has TCP OPEN and such, and can
send/receive emails if you really want to. The cool stuff is native,
not libraries; make an EXE file in half a second and you're done.

The cool stuff in PB is libraries, not native - but it might be
libraries that are always included by the tool so that you don't need
any kind of "import" statement.

Certainly it is insanity to use C when you want networking, emails, and
the like. If you are doing something big enough that you want the speed
and efficiency of C, use C++, Go, Rust, D, or anything else that will do
a better job of letting you write such code easily and safely. If not,
use Python as it makes such code vastly simpler and faster to write, and
has more libraries for that kind of thing than any other language.

Basic was probably a solid choice for such things a couple of decades
ago. And of course if you are used to it, and it is still good enough,
then that's fine - good enough is good enough.

#### Jan Panteltje

##### Guest
On a sunny day (Mon, 17 Jan 2022 18:07:14 +0100) it happened David Brown
<david.brown@hesbynett.no> wrote in <ss47o2$ohn$1@dont-email.me>:

Certainly it is insanity to use C when you want networking, emails, and
the like.

That is not correct, read libcinfo, it should be on your Linux system, else it is here:
http://panteltje.com/pub/libc.info.txt
I use networking in C all the time
wrote several servers, irc client, email client, this newsreader, so much, no problem.
If it [networking] is TOO difficult for you, then call 'netcat' from C (or your script or from anything else).
In Linux that is,
But really, writing a server in C is only a few lines.

If you are doing something big enough that you want the speed
and efficiency of C, use C++

C++ is a crime against humanity.

Go, Rust, D, or anything else that will do
a better job of letting you write such code easily and safely. If not,
use Python as it makes such code vastly simpler and faster to write, and
has more libraries for that kind of thing than any other language.

Basic was probably a solid choice for such things a couple of decades
ago. And of course if you are used to it, and it is still good enough,
then that's fine - good enough is good enough.

The Sinclair ZX80 / ZX81 BASIC was very very good.
asm is even more fun.

#### Phil Hobbs

##### Guest
Jan Panteltje wrote:
On a sunny day (Mon, 17 Jan 2022 18:07:14 +0100) it happened David Brown
<david.brown@hesbynett.no> wrote in <ss47o2$ohn$1@dont-email.me>:

Certainly it is insanity to use C when you want networking, emails, and
the like.

That is not correct, read libcinfo, it should be on your Linux system, else it is here:
http://panteltje.com/pub/libc.info.txt
I use networking in C all the time
wrote several servers, irc client, email client, this newsreader, so much, no problem.
If it [networking] is TOO difficult for you, then call 'netcat' from C (or your script or from anything else).
In Linux that is,
But really, writing a server in C is only a few lines.

If you are doing something big enough that you want the speed
and efficiency of C, use C++

C++ is a crime against humanity.

If you don't like the poison gas, lay off the pickled herring.

Cheers

Phil Hobbs

#### server

##### Guest
On Mon, 17 Jan 2022 17:58:42 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

On a sunny day (Mon, 17 Jan 2022 18:07:14 +0100) it happened David Brown
<david.brown@hesbynett.no> wrote in <ss47o2$ohn$1@dont-email.me>:

Certainly it is insanity to use C when you want networking, emails, and
the like.

That is not correct, read libcinfo, it should be on your Linux system, else it is here:
http://panteltje.com/pub/libc.info.txt
I use networking in C all the time
wrote several servers, irc client, email client, this newsreader, so much, no problem.
If it [networking] is TOO difficult for you, then call 'netcat' from C (or your script or from anything else).
In Linux that is,
But really, writing a server in C is only a few lines.

If you are doing something big enough that you want the speed
and efficiency of C, use C++

C++ is a crime against humanity.

Go, Rust, D, or anything else that will do
a better job of letting you write such code easily and safely. If not,
use Python as it makes such code vastly simpler and faster to write, and
has more libraries for that kind of thing than any other language.

Basic was probably a solid choice for such things a couple of decades
ago. And of course if you are used to it, and it is still good enough,
then that's fine - good enough is good enough.

The Sinclair ZX80 / ZX81 BASIC was very very good.
asm is even more fun.

PowerBasic is wonderful. A serious compiler with a great UI and all
sorts of intrinsic goodies. It also allows inline asm with variable
names common to Basic and ASM. That is handy now and then.

--

I yam what I yam - Popeye

#### Joe Gwinn

##### Guest
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:
On Fri, 14 Jan 2022 18:12:43 +0000, Martin Brown

On 14/01/2022 16:50, Hul Tytus wrote:
There was once a math library in c, if memory serves, with the
basic functions, ie +, -, * and / and some others also. The resolution
was adjustable so changing a reference variable (or was that a #define?)
from 32 to 256 would change the size of the variables to 256 bits.
Anyone remember the name or location of that library?

I don't recall that particular one but GCC can be fairly easily
persuaded to go up to 128 bit reals which are usually good enough for
all but the most insane of floating point calculations.

I think your choices there are limited to 32, 64, 80, 128

https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html

It includes the most common transcendental functions as well.

Quad floating precision runs slowly so do as much as you can at a lower
precision and then refine the answer using that as a seed value.

I used to like having 80 bit reals available in the good old prehistoric
days of MSC v6. Today it requires some effort to use them with MSC

In the old days, only VAX/VMS had hardware support for 128-bit floats
(not IEEE format though). In the cited GCC list, which of these are
directly supported in hardware, versus software emulation?

32, 64 are native x87 and full SSE floating point support
80 x87 only but GCC does it fairly well
128 emulated and slower

Always work in the hardware supported ones to obtain an approximate
answer unless and until you need that extra precision.

Yes.

Preferably frame it so you refine an approximate starting guess.

If possible. Sometimes I use a coarse-fine two-step algorithm.

Most current machines directly support multi precision integer
arithmetic for power-of-2 lengths, but it is done in multiple
coordinated machine-code operations, so it's partly in software.

32, 64 and 128 integer support are sometimes native at least for some
platforms. +, - and * all execute in one nominal CPU cycle* too!
(at least for 32, 64 bit - I have never bothered with 128 bit int)

Nor have I, although for such things as time, integers are preferred
because the precision does not vary with magnitude.

Depending on the hardware, a pair of 64-bit integers can be scaled
such that one integer is the integer part and the other integer is the
fractional part, the decimal point being between the two integers.
Depending on hardware and compiler, this may be faster than floating
point.
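
A minimal C sketch of that two-integer layout, with the binary point sitting between the words. The type name `fix64_64` and the function `fix_add` are hypothetical, and only unsigned addition is shown.

```c
#include <stdint.h>

/* Unsigned 64.64 fixed-point value: `whole` is the integer part and
 * `frac` is the fractional part in units of 2^-64. */
typedef struct {
    uint64_t whole;
    uint64_t frac;
} fix64_64;

/* Adds two values; the carry out of the fractional word is propagated
 * into the integer word, exactly as in multi-word integer addition. */
fix64_64 fix_add(fix64_64 a, fix64_64 b)
{
    fix64_64 r;
    r.frac  = a.frac + b.frac;                        /* wraps modulo 2^64 */
    r.whole = a.whole + b.whole + (r.frac < a.frac);  /* carry if it wrapped */
    return r;
}
```

Adding 1.5 and 2.5 in this format, with both fractional words set to 2^63, gives a whole part of 4 and a fractional part of 0. The resolution stays at 2^-64 regardless of magnitude, which is the property that matters for time values.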

* sometimes they can appear to take less than one cycle due to out of
order execution and the opportunities to do work whilst divides are in
progress. Divides are always best avoided or if that is impossible their
number minimised. Divide is between 10-20x slower than all the other
primitive operations and two divides close together can be *much*
slower. Pipeline stalls typically cost around 90 cycles per hit.

divide remains a PITA and worth eliminating where possible.

It's usually possible to reformulate to multiply by the reciprocal.

I have an assembler implementation for a special case division that can
be faster than the hardware divide for the situation it aims to solve.

Basically 1/(1-x) = 1 + x + x^2 + x^3 + x^4 + ...
= (1 + x)*(1 + x^2)*(1 + x^4)*(1 + x^8)*...

And for smallish x it converges faster than hardware FP divide.

Yes, that example comes up in practice a good bit.
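
The product form is easy to try in plain C rather than assembler. This is only a sketch of the identity, not the assembler routine mentioned above, and the name `recip_one_minus` is made up for the example. Four factors reproduce the series through x^15, so the relative truncation error is x^16.

```c
/* Approximates 1/(1 - x) for small |x| without a divide, using
 * (1 + x)(1 + x^2)(1 + x^4)(1 + x^8) = 1 + x + x^2 + ... + x^15.
 * The truncation error relative to the true value is x^16. */
double recip_one_minus(double x)
{
    double x2 = x * x;
    double x4 = x2 * x2;
    double x8 = x4 * x4;
    return (1.0 + x) * (1.0 + x2) * (1.0 + x4) * (1.0 + x8);
}
```

For |x| below about 0.1 the truncation error is already at or under double-precision rounding. Larger |x| would need a fifth factor, (1 + x^16), or a range reduction first.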

Of course, when the word size goes up, the various approximations
polynomials must improve, which generally means to use higher-order
polynomials, so the slowdown isn't all due to slower computational
hardware.

There aren't all that many that need it.

out to be good enough for simulating planetary systems with chaos.

In ray-tracing problems, the big hitters are sine and cosine, and some
tangent. I have not needed to determine if the current approximations
are good enough, but I'm suspicious, given that a slight displacement
of the intersection point on a curved optical surface will deviate the
ray ever so slightly, most likely changing the ray segment length ever
so slightly, ...

Most planetary dynamics can be done with 80 bit reals with a bit to spare.

The only real application of 128-bit floats that I am aware of was the
design of interferometers such as LIGO, where one is tracking very
small fractions of an optical wavelength over path lengths in the
kilometers, with at least two spare decimal digits to absorb numerical
noise from the ray-trace computations.

That might be a genuine application.

I'm pretty sure that it was an actual application. Probably been
replaced by now.

The only times I have played with them have been to investigate the
weird constants that play a part in some chaotic equations. I was
curious to see how much of the behaviour was due to finite mantissa
length and how much was inherent in the mathematics. Doubling the length
of the mantissa goes a long way to solving that particular problem.
(but it is rather slow)

Yes, that would be the classic test, needed only once in a while.

Joe Gwinn

#### Phil Hobbs

##### Guest
Joe Gwinn wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:
On Fri, 14 Jan 2022 18:12:43 +0000, Martin Brown

On 14/01/2022 16:50, Hul Tytus wrote:
There was once a math library in c, if memory serves, with the
basic functions, ie +, -, * and / and some others also. The resolution
was adjustable so changing a reference variable (or was that a #define?)
from 32 to 256 would change the size of the variables to 256 bits.
Anyone remember the name or location of that library?

I don't recall that particular one but GCC can be fairly easily
persuaded to go up to 128 bit reals which are usually good enough for
all but the most insane of floating point calculations.

I think your choices there are limited to 32, 64, 80, 128

https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html

It includes the most common transcendental functions as well.

Quad floating precision runs slowly so do as much as you can at a lower
precision and then refine the answer using that as a seed value.

I used to like having 80 bit reals available in the good old prehistoric
days of MSC v6. Today it requires some effort to use them with MSC

In the old days, only VAX/VMS had hardware support for 128-bit floats
(not IEEE format though). In the cited GCC list, which of these are
directly supported in hardware, versus software emulation?

32, 64 are native x87 and full SSE floating point support
80 x87 only but GCC does it fairly well
128 emulated and slower

Always work in the hardware supported ones to obtain an approximate
answer unless and until you need that extra precision.

Yes.

Preferably frame it so you refine an approximate starting guess.

If possible. Sometimes I use a coarse-fine two-step algorithm.

Most current machines directly support multi precision integer
arithmetic for power-of-2 lengths, but it is done in multiple
coordinated machine-code operations, so it's partly in software.

32, 64 and 128 integer support are sometimes native at least for some
platforms. +, - and * all execute in one nominal CPU cycle* too!
(at least for 32, 64 bit - I have never bothered with 128 bit int)

Nor have I, although for such things as time, integers are preferred
because the precision does not vary with magnitude.

Depending on the hardware, a pair of 64-bit integers can be scaled
such that one integer is the integer part and the other integer is the
fractional part, the decimal point being between the two integers.
Depending on hardware and compiler, this may be faster than floating
point.

* sometimes they can appear to take less than one cycle due to out of
order execution and the opportunities to do work whilst divides are in
progress. Divides are always best avoided or if that is impossible their
number minimised. Divide is between 10-20x slower than all the other
primitive operations and two divides close together can be *much*
slower. Pipeline stalls typically cost around 90 cycles per hit.

divide remains a PITA and worth eliminating where possible.

It's usually possible to reformulate to multiply by the reciprocal.

I have an assembler implementation for a special case division that can
be faster than the hardware divide for the situation it aims to solve.

Basically 1/(1-x) = 1 + x + x^2 + x^3 + x^4 + ...
                  = (1 + x)*(1 + x^2)*(1 + x^4)*(1 + x^8)*...

And for smallish x it converges faster than hardware FP divide.
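
As a rough C sketch (not the assembler routine mentioned above, which is not shown in the thread): four factors of the product reproduce the series 1 + x + ... + x^15 exactly, so the relative truncation error is x^16. The function name is made up.

```c
/* Approximate 1/(1-x) for small |x| with four factors of the product
   (1+x)(1+x^2)(1+x^4)(1+x^8) = 1 + x + x^2 + ... + x^15.
   Each new factor costs one squaring, one add and one multiply. */
double recip_one_minus(double x) {
    double p = 1.0 + x;
    double s = x * x;  /* x^2 */
    p *= 1.0 + s;
    s *= s;            /* x^4 */
    p *= 1.0 + s;
    s *= s;            /* x^8 */
    p *= 1.0 + s;
    return p;          /* relative truncation error is x^16 */
}
```

For |x| below about 0.1 the truncation error is already at the level of double-precision rounding, and no divide is issued.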

Yes, that example comes up in practice a good bit.

Of course, when the word size goes up, the various approximation
polynomials must improve, which generally means using higher-order
polynomials, so the slowdown isn't all due to slower computational
hardware.

There aren't all that many that need it.

out to be good enough for simulating planetary systems with chaos.

In ray-tracing problems, the big hitters are sine and cosine, and some
tangent. I have not needed to determine if the current approximations
are good enough, but I'm suspicious, given that a slight displacement
of the intersection point on a curved optical surface will deviate the
ray ever so slightly, most likely changing the ray segment length ever
so slightly, ...

Right. Of course manufacturing errors in a real lens will limit the
accuracy of a given ray trace faster than the numerical limitations.

This is a slower version of the classical pool table problem. Go ahead
and rack up a nice set of perfectly elastic balls on a lossless table
with perfectly elastic cushions in a vacuum at zero kelvins. (*) Then
pick up your perfectly elastic cue, put some very high friction chalk on
it, and break.

Your eye-hand coordination is perfect, of course, but nevertheless the
cue ball's position and momentum are slightly uncertain due to the
Heisenberg inequality, $\delta P \delta S \ge \hbar/2$. This is a
very small number, so your break appears perfect. Two balls go straight
into pockets, and the rest keep rattling round the table.

Any uncertainty in the momentum of a given ball causes an aiming
uncertainty that builds up linearly with distance. That makes the point
of collision with the next ball slightly uncertain, causing an angular
error, which builds up linearly with distance till the next
collision.... The result is an exponential error amplifier.

In the absence of loss, the Heisenberg uncertainty of the motion of the
cue ball gets multiplied exponentially with time, until after 30 seconds
or so it becomes larger than the ball's diameter--in other words, past
that point it's impossible even in principle to predict from the initial
conditions which balls will hit each other.

(At that point you start simulating the pool table as though it were a
globular star cluster instead of a planetary system.)

Cheers

Phil Hobbs

(*) Yes, like the spherical cow emitting milk isotropically at a
constant rate...

Most planetary dynamics can be done with 80 bit reals with a bit to spare.

The only real application of 128-bit floats that I am aware of was the
design of interferometers such as LIGO, where one is tracking very
small fractions of an optical wavelength over path lengths in the
kilometers, with at least two spare decimal digits to absorb numerical
noise from the ray-trace computations.

That might be a genuine application.

I'm pretty sure that it was an actual application. Probably been
replaced by now.

The only times I have played with them have been to investigate the
weird constants that play a part in some chaotic equations. I was
curious to see how much of the behaviour was due to finite mantissa
length and how much was inherent in the mathematics. Doubling the length
of the mantissa goes a long way to solving that particular problem.
(but it is rather slow)

Yes, that would be the classic test, needed only once in a while.

Joe Gwinn

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com

J

#### Jan Panteltje

##### Guest
On a sunny day (Mon, 17 Jan 2022 11:58:00 -0800) it happened
jlarkin@highlandsniptechnology.com wrote in
<7aibug14ktp8a81263veavq4phq9tq8tpg@4ax.com>:

On Mon, 17 Jan 2022 17:58:42 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

On a sunny day (Mon, 17 Jan 2022 18:07:14 +0100) it happened David Brown
<david.brown@hesbynett.no> wrote in <ss47o2$ohn$1@dont-email.me>:

Certainly it is insanity to use C when you want networking, emails, and
the like.

That is not correct, read libcinfo, it should be on your Linux system, else it is here:
http://panteltje.com/pub/libc.info.txt
I use networking in C all the time
wrote several servers, irc client, email client, this newsreader, so much, no problem.
If it [networking] is TOO difficult for you, then call 'netcat' from C (or your script or from anything else).
In Linux that is,
But really, writing a server in C is only a few lines.

If you are doing something big enough that you want the speed
and efficiency of C, use C++

C++ is a crime against humanity.

, Go, Rust, D, or anything else that will do
a better job of letting you write such code easily and safely. If not,
use Python as it makes such code vastly simpler and faster to write, and
has more libraries for that kind of thing than any other language.

Basic was probably a solid choice for such things a couple of decades
ago. And of course if you are used to it, and it is still good enough,
then that's fine - good enough is good enough.

The Sinclair ZX80 / ZX81 BASIC was very very good.
asm is even more fun.

PowerBasic is wonderful. A serious compiler with a great UI and all
sorts of intrinsic goodies. It also allows inline asm with variable
names common to Basic and ASM. That is handy now and then.

I once wrote
it is sort of a way to allow inline asm in MCS BASIC for the 8052 micro I think it was...
also from the eighties.
https://ininet.org/chapter-1-introduction-to-mcs-basic-52.html

J

#### Jan Panteltje

##### Guest
On a sunny day (Mon, 17 Jan 2022 13:43:31 -0500) it happened Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote in
<2e88109f-2296-25b5-31fd-0df2425004b8@electrooptical.net>:

Jan Panteltje wrote:
On a sunny day (Mon, 17 Jan 2022 18:07:14 +0100) it happened David Brown
<david.brown@hesbynett.no> wrote in <ss47o2$ohn$1@dont-email.me>:

Certainly it is insanity to use C when you want networking, emails, and
the like.

That is not correct, read libcinfo, it should be on your Linux system, else it is here:
http://panteltje.com/pub/libc.info.txt
I use networking in C all the time
wrote several servers, irc client, email client, this newsreader, so much, no problem.
If it [networking] is TOO difficult for you, then call 'netcat' from C (or your script or from anything else).
In Linux that is,
But really, writing a server in C is only a few lines.

If you are doing something big enough that you want the speed
and efficiency of C, use C++

C++ is a crime against humanity.

If you don't like the poison gas, lay off the pickled herring.

Cheers

Phil Hobbs

Good protection clothes are essential,
http://panteltje.com/pub/asbestos_aliens_1.gif
guys here yesterday replacing asbestos roof.
Same for Cpushplush dispose of it with care.
I like herring, the smell does not bother me,
when I was very young my father could not get me past a herring stand (many in Amsterdam)
without me having one.

These days those may be contaminated with anything from plutonium to plastics though.

J

#### Joe Gwinn

##### Guest
On Mon, 17 Jan 2022 19:23:58 -0500, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

Joe Gwinn wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

On 14/01/2022 21:01, Joe Gwinn wrote:
On Fri, 14 Jan 2022 18:12:43 +0000, Martin Brown

On 14/01/2022 16:50, Hul Tytus wrote:
There was once a math library in C, if memory serves, with the
basic functions, i.e. +, -, * and /, and some others also. The resolution
was adjustable, so changing a reference variable (or was that a #define?)
from 32 to 256 would change the size of the variables to 256 bits.
Anyone remember the name or location of that library?

I don't recall that particular one but GCC can be fairly easily
persuaded to go up to 128 bit reals which are usually good enough for
all but the most insane of floating point calculations.

I think your choices there are limited to 32, 64, 80, 128

https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html

It includes the most common transcendental functions as well.

Quad floating precision runs slowly so do as much as you can at a lower
precision and then refine the answer using that as a seed value.

I used to like having 80 bit reals available in the good old prehistoric
days of MSC v6. Today it requires some effort to use them with MSC

In the old days, only VAX/VMS had hardware support for 128-bit floats
(not IEEE format though). In the cited GCC list, which of these are
directly supported in hardware, versus software emulation?

32, 64: native on x87, with full SSE floating point support
80: x87 only, but GCC does it fairly well
128: emulated and slower

Always work in the hardware supported ones to obtain an approximate
answer unless and until you need that extra precision.

Yes.

Preferably frame it so you refine an approximate starting guess.

If possible. Sometimes I use a coarse-fine two-step algorithm.

Most current machines directly support multi precision integer
arithmetic for power-of-2 lengths, but it is done in multiple
coordinated machine-code operations, so it's partly in software.

32, 64 and 128 integer support are sometimes native at least for some
platforms. +, - and * all execute in one nominal CPU cycle* too!
(at least for 32, 64 bit - I have never bothered with 128 bit int)

Nor have I, although for such things as time, integers are preferred
because the precision does not vary with magnitude.

Depending on the hardware, a pair of 64-bit integers can be scaled
such that one integer is the integer part and the other integer is the
fractional part, the decimal point being between the two integers.
Depending on hardware and compiler, this may be faster than floating
point.

* sometimes they can appear to take less than one cycle due to out of
order execution and the opportunities to do work whilst divides are in
progress. Divides are always best avoided or if that is impossible their
number minimised. Divide is between 10-20x slower than all the other
primitive operations and two divides close together can be *much*
slower. Pipeline stalls typically cost around 90 cycles per hit.

divide remains a PITA and worth eliminating where possible.

It's usually possible to reformulate to multiply by the reciprocal.

I have an assembler implementation for a special case division that can
be faster than the hardware divide for the situation it aims to solve.

Basically 1/(1-x) = 1 + x + x^2 + x^3 + x^4 + ...
                  = (1 + x)*(1 + x^2)*(1 + x^4)*(1 + x^8)*...

And for smallish x it converges faster than hardware FP divide.

Yes, that example comes up in practice a good bit.

Of course, when the word size goes up, the various approximation
polynomials must improve, which generally means using higher-order
polynomials, so the slowdown isn't all due to slower computational
hardware.

There aren't all that many that need it.

out to be good enough for simulating planetary systems with chaos.

In ray-tracing problems, the big hitters are sine and cosine, and some
tangent. I have not needed to determine if the current approximations
are good enough, but I'm suspicious, given that a slight displacement
of the intersection point on a curved optical surface will deviate the
ray ever so slightly, most likely changing the ray segment length ever
so slightly, ...

Right. Of course manufacturing errors in a real lens will limit the
accuracy of a given ray trace faster than the numerical limitations.

Yes, although I'd venture that the LIGO conspiracy will have rather
better optics than is common.

This is a slower version of the classical pool table problem. Go ahead
and rack up a nice set of perfectly elastic balls on a lossless table
with perfectly elastic cushions in a vacuum at zero kelvins. (*) Then
pick up your perfectly elastic cue, put some very high friction chalk on
it, and break.

Hmm. Slower?

I'm not sure that it would be a good idea to witness billiard balls
traveling at the speed of light colliding with anything solid. A few
light years standoff might be safe.

Well, 50 light years would be safer, as we'd all be dead before the

Your eye-hand coordination is perfect, of course, but nevertheless the
cue ball's position and momentum are slightly uncertain due to the
Heisenberg inequality, $\delta P \delta S \ge \hbar/2$. This is a
very small number, so your break appears perfect. Two balls go straight
into pockets, and the rest keep rattling round the table.

Yes, that is the bounding case for sure. My hand isn't quite that
steady, probably for lack of sufficient practice.

Any uncertainty in the momentum of a given ball causes an aiming
uncertainty that builds up linearly with distance. That makes the point
of collision with the next ball slightly uncertain, causing an angular
error, which builds up linearly with distance till the next
collision.... The result is an exponential error amplifier.

In the absence of loss, the Heisenberg uncertainty of the motion of the
cue ball gets multiplied exponentially with time, until after 30 seconds
or so it becomes larger than the ball's diameter--in other words, past
that point it's impossible even in principle to predict from the initial
conditions which balls will hit each other.

(At that point you start simulating the pool table as though it were a
globular star cluster instead of a planetary system.)

Yes.

Cheers

Phil Hobbs

(*) Yes, like the spherical cow emitting milk isotropically at a
constant rate...

Well, I've known babies like that.

Joe Gwinn

M

#### Martin Brown

##### Guest
On 17/01/2022 23:35, Joe Gwinn wrote:
On Sat, 15 Jan 2022 17:50:14 +0000, Martin Brown

There aren't all that many that need it.

out to be good enough for simulating planetary systems with chaos.

In ray-tracing problems, the big hitters are sine and cosine, and some
tangent. I have not needed to determine if the current approximations
are good enough, but I'm suspicious, given that a slight displacement
of the intersection point on a curved optical surface will deviate the
ray ever so slightly, most likely changing the ray segment length ever
so slightly, ...

Funnily enough the same is true of astrodynamics.

It is a bad idea to use cosine directly since:

cos(x) = 1 - 2*sin^2(x/2).

Quite often physics problems have terms in "cos(x)-1" or nearly so.

The terms in "sin(x)-x" are much more problematic and for |x| < 0.25
pretty much have to be computed by a Padé approximation (or the
much older method of summing a fairly well convergent polynomial series).
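
A sketch of the older series method (not the Padé form), assuming |x| < 0.25 so that the Taylor terms through x^13 are enough for double precision. The function name is invented; the constants are just the ratios of successive factorials.

```c
/* sin(x) - x for small |x| (roughly |x| < 0.25), summed as the Taylor
   series -x^3/3! + x^5/5! - ... + x^13/13! in nested form.
   The leading "x" term never appears, so nothing cancels. */
double sin_minus_x(double x) {
    double x2 = x * x;
    double t = 1.0 - x2 * (1.0 / 156.0);  /* 12*13 */
    t = 1.0 - x2 * (1.0 / 110.0) * t;     /* 10*11 */
    t = 1.0 - x2 * (1.0 / 72.0) * t;      /*  8*9  */
    t = 1.0 - x2 * (1.0 / 42.0) * t;      /*  6*7  */
    t = 1.0 - x2 * (1.0 / 20.0) * t;      /*  4*5  */
    return -x * x2 * (1.0 / 6.0) * t;
}
```

Computing `sin(x) - x` directly at x = 1e-3 would lose about seven significant digits to cancellation; the nested series does not.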

One of the other tricks I have been working on is moving everything into
expressions in tan(x/2) which eliminates some independent ripple errors
on the lsb of the sin and cos expansions. Only really worthwhile on
platforms that don't compute sincos as a matched pair.

Most planetary dynamics can be done with 80 bit reals with a bit to spare.

The only real application of 128-bit floats that I am aware of was the
design of interferometers such as LIGO, where one is tracking very
small fractions of an optical wavelength over path lengths in the
kilometers, with at least two spare decimal digits to absorb numerical
noise from the ray-trace computations.

That might be a genuine application.

I'm pretty sure that it was an actual application. Probably been
replaced by now.

I expect the codebase survives somewhere in an archive.
Big scientific kit often gets revamped at some point.
The sheer mechanical ingenuity of the mirror supports in that thing is
amazing at holding it still. Remarkable the number of black hole mergers
that they and the other gravitational wave detectors have seen.

The only times I have played with them have been to investigate the
weird constants that play a part in some chaotic equations. I was
curious to see how much of the behaviour was due to finite mantissa
length and how much was inherent in the mathematics. Doubling the length
of the mantissa goes a long way to solving that particular problem.
(but it is rather slow)
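
The same kind of experiment can be sketched with float versus double instead of the 80-bit and 128-bit types, using the logistic map as a stand-in chaotic equation. Both of those choices, and the function name, are assumptions made to keep the example portable.

```c
/* Iterate the logistic map x -> r*x*(1-x) side by side in float and
   double, and return the first step at which the two trajectories
   differ by more than tol. Returns -1 if they never do within
   max_steps. */
int first_divergence(double x0, double r, double tol, int max_steps) {
    float  xf = (float)x0;
    double xd = x0;
    for (int n = 1; n <= max_steps; ++n) {
        xf = (float)r * xf * (1.0f - xf);
        xd = r * xd * (1.0 - xd);
        double diff = (double)xf - xd;
        if (diff < 0.0) {
            diff = -diff;
        }
        if (diff > tol) {
            return n;
        }
    }
    return -1;
}
```

With r = 4 the rounding error roughly doubles each step, so the float and double runs part company after a few dozen iterations. Anything that changes when the mantissa is lengthened is an artifact of the arithmetic; what stays the same is in the mathematics.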

Yes, that would be the classic test, needed only once in a while.

It is what got me into using the GCC compiler for certain tests.
Normally I stick with the MS VC/C++ environment. But their lack of 80-bit
real support is annoying. I might get around to writing a C++ stub to
implement the most useful parts one day. But for now GCC and Salford's
Fortran will do everything I need for extended precision.

--
Regards,
Martin Brown