Conical inductors--still $10!...

On Tue, 1 Dec 2020 13:46:29 -0500, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/1/20 12:48 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 20:59:50 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 8:26 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 17:26:03 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 12:07 PM, albert wrote:
In article
c8c0cf25-aa21-8b1d-b6f6-518624c35183@electrooptical.net>,
Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> wrote:
On 2020-08-02 20:56, Les Cargill wrote:
Phil Hobbs wrote:
On 2020-08-02 08:46, Martin Brown wrote:
On 23/07/2020 19:10, Phil Hobbs wrote:
On 2020-07-23 12:43, Tom Gardner wrote:
On 23/07/20 16:30, pcdhobbs@gmail.com wrote:
Isn't our ancient and settled idea of what a
computer is, and what an OS and languages
are, overdue for the next revolution?

In his other famous essay, \"No Silver Bullet\",
Brooks points out that the factors-of-10
productivity improvements of the early days
were gained by getting rid of extrinsic
complexity--crude tools, limited hardware, and
so forth.

Now the issues are mostly intrinsic to an
artifact built of thought. So apart from more
and more Python libraries, I doubt that there
are a lot more orders of magnitude available.

It is ironic that a lot of the potentially avoidable
human errors are typically fence post errors. Binary
fence post errors being about the most severe since
you end up with the opposite of what you intended.

Not in a single processor (except perhaps the
Mill).

But with multiple processors there can be
significant improvement - provided we are
prepared to think in different ways, and the
tools support it.

Examples: mapreduce, or xC on xCORE processors.

The average practitioner today really struggles on
massively parallel hardware. If you have ever done
any serious programming on such kit you quickly
realised that the process which ensures all the
other processes are kept busy doing useful things is
by far the most important.

I wrote a clusterized optimizing EM simulator that I
still use--I have a simulation gig just starting up
now, in fact. I learned a lot of ugly things about the
Linux thread scheduler in the process, such as that the
pthreads documents are full of lies about scheduling
and that you can't have a real-time thread in a user
mode program and vice versa. This is an entirely
arbitrary thing--there's no such restriction in Windows
or OS/2. Dunno about BSD--I should try that out.


In Linux, realtime threads are in \"the realtime context\".
It\'s a bit of a cadge. I\'ve never really seen a good
explanation of that that means.

Does anybody here know if you can mix RT and user
threads in a single process in BSD?

Sorry; never used BSD.

Realtime threads are simply of a different group of
priorities. You can install kernel loadable modules ( aka
device drivers ) to provide a timebase that will make
them eligible. SFAIK, you can't guarantee them to run.
You may be able to get close if you remove unnecessary
services.

I don't think this does what you want.

In Linux if one thread is real time, all the threads in the
process have to be as well. Any compute-bound thread in a
realtime process will bring the UI to its knees.

I'd be perfectly happy with being able to _reduce_ thread
priority in a user process, but noooooo. They all have to
have the same priority, despite what the pthreads docs say.
So in Linux there is no way to express the idea that some
threads in a process are more important than others. That
destroys the otherwise-excellent scaling of my simulation
code.

Wouldn't the system call setpriority() work, after a thread
has been started?

As I said way back in the summer, you can't change the priority
or scheduling of an individual user thread, and you can't mix
user and real-time threads in one process.

Thus you can have the default scheduler, which doesn't scale at
all well for this sort of job, or you can run the whole process
as real-time and make the box's user interface completely
unresponsive any time the real-time job is running.

In Windows and OS/2, you can set thread priorities any way you
like. Linux is notionally a multiuser system, so it's no
surprise there are limits to how much non-root users can
increase priority. I'd be very happy being able to _reduce_
the priority of my compute threads so that the comms threads
would preempt them, but nooooooo.
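
For comparison, a minimal sketch of what that looks like with the Win32
API; compute_loop() and the priority choice are purely illustrative, not
anything from the actual simulator:

/* Hedged sketch: a compute worker that lowers its own priority so
 * comms/UI threads preempt it, as described above. */
#include <windows.h>
#include <process.h>
#include <stdint.h>

static unsigned __stdcall compute_loop(void *arg)
{
    (void)arg;
    /* Drop this worker below normal; higher-priority threads win. */
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
    /* ... number crunching ... */
    return 0;
}

int start_worker(void)
{
    uintptr_t h = _beginthreadex(NULL, 0, compute_loop, NULL, 0, NULL);
    return h ? 0 : -1;
}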

I don't think that this is true for Red Hat Enterprise Linux.
I don't recall if the realtime stuff is added or built in.
People also use MRG with this.

More generally, not all Linux distributions care about such
things.

Joe Gwinn

It's in the kernel, so they mostly don't get to care or not care.
I've recently tried again with CentOS, which is RHEL without
the support--no joy.

Let me add that while it does take root permission to turn realtime
on, it does not follow that the application must also run as root.

Running a big application as root is a real bad idea, for both
robustness and security reasons.

They're my boxes, and when doing cluster sims they usually run a special
OS for the purpose: Rocks 7. It has one head-node and N compute nodes
that get network-booted off the head, so I get a clean install every
time they boot up. So no worries there.

I hadn't heard of Rocks 7. I'll look into it.


There's no point in running as root, though, because you can't change
the priority of user threads even then--only realtime ones.

So, why not use realtime ones? This is done all the time. The
application code works the same.


The Linux thread scheduler is a really-o truly-o Charlie Foxtrot.

The legacy UNIX \"nice\" scheduler, retained for backward compatibility,
it unsuited for realtime for sure. And for what you are doing it
would seem.


What I've seen done is that during startup, the application startup
process or thread is given permission to run sudo, which it uses to
set scheduling policies (FIFO) and numerical priority (real urgent)
for the main processes and threads before normal operation
commences.
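
A minimal sketch of that kind of startup step, assuming the process has
been granted CAP_SYS_NICE or a suitable RLIMIT_RTPRIO; the priority
value 50 is arbitrary:

/* Hedged sketch: put the calling thread into SCHED_FIFO at a chosen
 * priority.  Without the right privileges this fails with EPERM. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int go_realtime_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (err)
        fprintf(stderr, "SCHED_FIFO request failed: %d\n", err);
    return err;
}

Called early, before the workers are spawned, the policy is inherited by
threads created afterwards with default attributes.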

The other thing that is normally set up during startup is the
establishment of shared memory windows between processes. For many
applications, passing data blocks around by pointer passing is the
only practical approach.
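
For the within-box case, such a window is commonly a POSIX shared-memory
object mapped by each process; a minimal sketch (the name "/simwindow"
and the size are arbitrary examples):

/* Hedged sketch: create and map a named shared-memory window that
 * several processes on the same box can open.  Link with -lrt on
 * older glibc. */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

static void *map_window(size_t bytes)
{
    int fd = shm_open("/simwindow", O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                       /* mapping stays valid after close */
    return (p == MAP_FAILED) ? NULL : p;
}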

Passing pointers only works on unis or shared-memory SMP boxes. My
pre-cluster versions did it that way, but by about 2006 I needed 20+
cores to do the job in a reasonable time.

The present-day large-scale solution is to use shared memory via
Infiniband. Memory is transferred to and from IB to local memory
using DMA (Direct Memory Access) hardware.

This is done with a fleet of identical enterprise-class PCs, often
from HP or Dell or the like.
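
One common way to express that across boxes is MPI one-sided
communication, which typically rides on IB verbs/RDMA underneath; a
minimal sketch (buffer size arbitrary, assumes MPI_Init() has already
run and at least two ranks exist):

/* Hedged sketch: an MPI-3 one-sided "window"; rank 0 writes a block of
 * doubles directly into rank 1's exposed memory. */
#include <mpi.h>

void rma_demo(void)
{
    int rank;
    double *base;
    MPI_Win win;
    const int n = 1 << 20;                    /* arbitrary: 1M doubles */

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Win_allocate((MPI_Aint)n * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

    MPI_Win_fence(0, win);
    if (rank == 0)
        MPI_Put(base, n, MPI_DOUBLE, 1 /* target rank */, 0, n,
                MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                    /* transfer complete here */

    MPI_Win_free(&win);
}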


I was building nanoantennas with travelling wave plasmonic waveguides,
coupled to SOI optical waveguides. (They eventually worked really well,
but it was a long slog, mostly by myself.)

The plasmonic section was intended to avoid the ~30 THz RC bandwidth of
my tunnel junction detectors--the light wave went transverse to the
photocurrent, so the light didn't have to drive the capacitance all at
once. That saved me over 30 dB of rolloff right there.

Wow! That's worth a bunch of trouble for sure.


The issue was that metals that exhibit plasmons (copper, silver, and
gold) exhibit free-electron behaviour in the infrared, i.e. their
epsilons are very nearly pure, negative real numbers. That makes the
real part of their refractive indices very very small, so you have to
take very small time steps or the simulation becomes unstable due to
superluminal propagation.
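
The stability limit at work there is the usual Yee/FDTD (Courant) bound,
roughly

\Delta t \le \frac{1}{v_{\max}\sqrt{1/\Delta x^{2} + 1/\Delta y^{2} + 1/\Delta z^{2}}},

where v_max is the fastest phase velocity anywhere on the grid; with
Re(n) << 1 the phase velocity c/Re(n) is well above c, so the time step
has to shrink accordingly.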

Are there any metals that don't exhibit plasmons?


Their imaginary epsilons are very large, so you need very small voxels
to represent the fields well. Small voxels and short time steps make
for loooong run times, especially when it's wrapped in an optimization loop.
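
In round numbers the cost of a 3-D FDTD run scales as

N_{\mathrm{ops}} \sim \left(\frac{L}{\Delta x}\right)^{3}\,\frac{T}{\Delta t} \propto \Delta x^{-4} \quad (\text{since } \Delta t \propto \Delta x),

so halving the voxel size costs roughly 16x per run, before the
optimizer multiplies that by however many iterations it needs.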

If I recall/understand, the imaginary epsilons are the loss factors,
so this is quite lossy.

This sounds like a problem that would benefit from parallelism, and
shared memory. And lots of physical memory.

Basically divide the propagation media into independent squares or
boxes that overlap somewhat, with inter square/box traffic needed only
for the overlap regions. So, there will be an optimum square/box area
or volume.
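
That overlap scheme is the standard halo (ghost-cell) exchange; a
minimal 1-D sketch in MPI, where f[] holds n interior cells plus one
ghost cell at each end and left/right are neighbour ranks (MPI_PROC_NULL
at the outer boundaries):

/* Hedged sketch: exchange one-cell-wide overlap ("ghost") regions with
 * the left and right neighbours after each update step. */
#include <mpi.h>

void halo_exchange(double *f, int n, int left, int right)
{
    /* send last interior cell right, receive left ghost from the left */
    MPI_Sendrecv(&f[n],   1, MPI_DOUBLE, right, 0,
                 &f[0],   1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* send first interior cell left, receive right ghost from the right */
    MPI_Sendrecv(&f[1],   1, MPI_DOUBLE, left,  1,
                 &f[n+1], 1, MPI_DOUBLE, right, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}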

Joe Gwinn
 
On 12/6/20 5:58 PM, Joe Gwinn wrote:
On Tue, 1 Dec 2020 13:46:29 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/1/20 12:48 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 20:59:50 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 8:26 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 17:26:03 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 12:07 PM, albert wrote:
In article
c8c0cf25-aa21-8b1d-b6f6-518624c35183@electrooptical.net>,
Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> wrote:
On 2020-08-02 20:56, Les Cargill wrote:
Phil Hobbs wrote:
On 2020-08-02 08:46, Martin Brown wrote:
On 23/07/2020 19:10, Phil Hobbs wrote:
On 2020-07-23 12:43, Tom Gardner wrote:
On 23/07/20 16:30, pcdhobbs@gmail.com wrote:
Isn\'t our ancient and settled idea of what a
computer is, and what an OS and languages
are, overdue for the next revolution?

In his other famous essay, \"No Silver Bullet\",
Brooks points out that the factors-of-10
productivity improvements of the early days
were gained by getting rid of extrinsic
complexity--crude tools, limited hardware, and
so forth.

Now the issues are mostly intrinsic to an
artifact built of thought. So apart from more
and more Python libraries, I doubt that there
are a lot more orders of magnitude available.

It is ironic that a lot of the potentially avoidable
human errors are typically fence post errors. Binary
fence post errors being about the most severe since
you end up with the opposite of what you intended.

Not in a single processor (except perhaps the
Mill).

But with multiple processors there can be
significant improvement - provided we are
prepared to think in different ways, and the
tools support it.

Examples: mapreduce, or xC on xCORE processors.

The average practitioner today really struggles on
massively parallel hardware. If you have ever done
any serious programming on such kit you quickly
realised that the process which ensures all the
other processes are kept busy doing useful things is
by far the most important.

I wrote a clusterized optimizing EM simulator that I
still use--I have a simulation gig just starting up
now, in fact. I learned a lot of ugly things about the
Linux thread scheduler in the process, such as that the
pthreads documents are full of lies about scheduling
and that you can\'t have a real-time thread in a user
mode program and vice versa. This is an entirely
arbitrary thing--there\'s no such restriction in Windows
or OS/2. Dunno about BSD--I should try that out.


In Linux, realtime threads are in \"the realtime context\".
It\'s a bit of a cadge. I\'ve never really seen a good
explanation of that that means.

Does anybody here know if you can mix RT and user
threads in a single process in BSD?

Sorry; never used BSD.

Realtime threads are simply of a different group of
priorities. You can install kernel loadable modules ( aka
device drivers ) to provide a timebase that will make
them eligible. SFAIK, you can\'t guarantee them to run.
You may be able to get close if you remove unnecessary
services.

I don\'t think this does what you want.

In Linux if one thread is real time, all the threads in the
process have to be as well. Any compute-bound thread in a
realtime process will bring the UI to its knees.

I\'d be perfectly happy with being able to _reduce_ thread
priority in a user process, but noooooo. They all have to
have the same priority, despite what the pthreads docs say.
So in Linux there is no way to express the idea that some
threads in a process are more important than others. That
destroys the otherwise-excellent scaling of my simulation
code.

Wouldn\'t the system call setpriority() work, after a thread
has bee started?

As I said way back in the summer, you can\'t change the priority
or scheduling of an individual user thread, and you can\'t mix
user and real-time threads in one process.

Thus you can have the default scheduler, which doesn\'t scale at
all well for this sort of job, or you can run the whole process
as real-time and make the box\'s user interface completely
unresponsive any time the real-time job is running.

In Windows and OS/2, you can set thread priorities any way you
like. Linux is notionally a multiuser system, so it\'s no
surprise there are limits to how much non-root users can
increase priority. I\'d be very happy being able to _reduce_
the priority of my compute threads so that the comms threads
would preempt them, but nooooooo.

I don\'t think that this is true for Red Hat Linux Enterprise
Edition. I don\'t recall if the realtime stuff is added, or built
in. People also use MRG with this.

More generally, not all Linux distributions care about such
things.

Joe Gwinn

It\'s in the kernel, so they mostly don\'t get to care or not care.
I\'ve recently tried again with CentOS, which is RHEL EE without
the support--no joy.

Let me add that while it does take root permission to turn realtime
on, it does not follow that the application must also run as root.

Running a big application as root is a real bad idea, for both
robustness and security reasons.

They\'re my boxes, and when doing cluster sims they usually run a special
OS for the purpose: Rocks 7. It has one head-node and N compute nodes
that get network-booted off the head, so I get a clean install every
time they boot up. So no worries there.

I hadn\'t heard of Rocks 7. I\'ll look into it.


There\'s no point in running as root, though, because you can\'t change
the priority of user threads even then--only realtime ones.

So, why not use realtime ones? This is done all the time. The
application code works the same.

*sigh* One more time, with feelin'.

Because putting compute-bound processes in the RT class detonates the UI
and eventually brings the other system services such as disk and network
to their knees.

The Linux thread scheduler is a really-o truly-o Charlie Foxtrot.

The legacy UNIX \"nice\" scheduler, retained for backward compatibility,
it unsuited for realtime for sure. And for what you are doing it
would seem.

You can call it names all you want, but that doesn't change anything.
There's only the one available, and it stinks on ice.

What I\'ve seen done is that during startup, the application startup
process or thread is given permission to run sudo, which it uses to
set scheduling policies (FIFO) and numerical priority (real urgent)
for the main processes and threads before normal operation
commences.

The other thing that is normally set up during startup is the
establishment of shared memory windows between processes. For may
applications, passing data blocks around by pointer passing is the
only practical approach.

Passing pointers only works on unis or shared-memory SMP boxes. My
pre-cluster versions did it that way, but by about 2006 I needed 20+
cores to do the job in a reasonable time.

The present-day large-scale solution is to use shared memory via
Infiniband. Memory is transferred to and from IB to local memory

If it has to cross a network it isn't shared memory. Shared memory is a
uniprocessor or SMP thing.

using DMA (Direct Memory Access) hardware.

This is done with a fleet of identical enterprise-class PCs, often
from HP or Dell or the like.

Riiiggghhhht. Lots of processors talk to their local memory over
Infiniband because of its low latency. Not.

I was building nanoantennas with travelling wave plasmonic waveguides,
coupled to SOI optical waveguides. (They eventually worked really well,
but it was a long slog, mostly by myself.)

The plasmonic section was intended to avoid the ~30 THz RC bandwidth of
my tunnel junction detectors--the light wave went transverse to the
photocurrent, so the light didn\'t have to drive the capacitance all at
once. That saved me over 30 dB of rolloff right there.

Wow! That\'s worth a bunch of trouble for sure.


The issue was that metals that exhibit plasmons (copper, silver, and
gold) exhibit free-electron behaviour in the infrared, i.e. their
epsilons are very nearly pure, negative real numbers. That makes the
real part of their refractive indices very very small, so you have to
take very small time steps or the simulation becomes unstable due to
superluminal propagation.

Are there any metals that don\'t exhibit plasmons?

Sure, a good 100 of them. In fact all except copper, silver, and gold
AFAIK. The undergraduate \"normal conductor\" model of metals cannot
exhibit plasmons because it predicts that the real and imaginary parts
of the refractive index have equal magnitude.
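
For reference, in that textbook good-conductor limit
(\sigma/\omega\varepsilon_{0} \gg 1) the complex index is approximately

n + ik \approx (1 + i)\sqrt{\frac{\sigma}{2\,\omega\varepsilon_{0}}},

i.e. n \approx k, whereas the Drude-like behaviour of copper, silver,
and gold in the IR gives k \gg n.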

Their imaginary epsilons are very large, so you need very small voxels
to represent the fields well. Small voxels and short time steps make
for loooong run times, especially when it\'s wrapped in an optimization loop.

If I recall/understand, the imaginary epsilons are the loss factors,
so this is quite lossy.

If the imaginary part of epsilon is less than or comparable with the
real part, that's true. However, at 1.5 um, silver has a refractive
index of 0.36+j10, corresponding to an epsilon of -100 + j3.5, which is
in an entirely different regime from normal conduction. A material with
a negative real epsilon at DC would turn to lava very rapidly.
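
(The conversion used there is just

\varepsilon = (n + ik)^{2} = n^{2} - k^{2} + 2ink,

so a large k with a tiny n gives a big negative real part and a
comparatively small imaginary part.)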

Free-electron metals can propagate surface plasmons over many microns'
distance.

This sounds like a problem that would benefit from parallelism, and
shared memory. And lots of physical memory.

Right. And many many threads with sensible priority scheduling that
doesn't assume that the Kernel Gods know better than you how your
program should run on your own boxes--as in OS/2 or Windows, but not Linux.

Basically divide the propagation media into independent squares or
boxes that overlap somewhat, with inter square/box traffic needed only
for the overlap regions. So, there will be an optimum square/box area
or volume.

I have no idea what that means. If you're interested in how the program
actually works, the manual is at
<https://electrooptical.net/static/media/uploads/Projects/Poems/poemsmanual.pdf>

It's written mostly for our own benefit, so it isn't too polished, but
you can easily get the gist.

Cheers

Phil Hobbs


--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com
 
 
On Sun, 6 Dec 2020 18:26:51 -0500, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/6/20 5:58 PM, Joe Gwinn wrote:
On Tue, 1 Dec 2020 13:46:29 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/1/20 12:48 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 20:59:50 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 8:26 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 17:26:03 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 12:07 PM, albert wrote:
In article
c8c0cf25-aa21-8b1d-b6f6-518624c35183@electrooptical.net>,
Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> wrote:
On 2020-08-02 20:56, Les Cargill wrote:
Phil Hobbs wrote:
On 2020-08-02 08:46, Martin Brown wrote:
On 23/07/2020 19:10, Phil Hobbs wrote:
On 2020-07-23 12:43, Tom Gardner wrote:
On 23/07/20 16:30, pcdhobbs@gmail.com wrote:
Isn\'t our ancient and settled idea of what a
computer is, and what an OS and languages
are, overdue for the next revolution?

In his other famous essay, \"No Silver Bullet\",
Brooks points out that the factors-of-10
productivity improvements of the early days
were gained by getting rid of extrinsic
complexity--crude tools, limited hardware, and
so forth.

Now the issues are mostly intrinsic to an
artifact built of thought. So apart from more
and more Python libraries, I doubt that there
are a lot more orders of magnitude available.

It is ironic that a lot of the potentially avoidable
human errors are typically fence post errors. Binary
fence post errors being about the most severe since
you end up with the opposite of what you intended.

Not in a single processor (except perhaps the
Mill).

But with multiple processors there can be
significant improvement - provided we are
prepared to think in different ways, and the
tools support it.

Examples: mapreduce, or xC on xCORE processors.

The average practitioner today really struggles on
massively parallel hardware. If you have ever done
any serious programming on such kit you quickly
realised that the process which ensures all the
other processes are kept busy doing useful things is
by far the most important.

I wrote a clusterized optimizing EM simulator that I
still use--I have a simulation gig just starting up
now, in fact. I learned a lot of ugly things about the
Linux thread scheduler in the process, such as that the
pthreads documents are full of lies about scheduling
and that you can\'t have a real-time thread in a user
mode program and vice versa. This is an entirely
arbitrary thing--there\'s no such restriction in Windows
or OS/2. Dunno about BSD--I should try that out.


In Linux, realtime threads are in \"the realtime context\".
It\'s a bit of a cadge. I\'ve never really seen a good
explanation of that that means.

Does anybody here know if you can mix RT and user
threads in a single process in BSD?

Sorry; never used BSD.

Realtime threads are simply of a different group of
priorities. You can install kernel loadable modules ( aka
device drivers ) to provide a timebase that will make
them eligible. SFAIK, you can\'t guarantee them to run.
You may be able to get close if you remove unnecessary
services.

I don\'t think this does what you want.

In Linux if one thread is real time, all the threads in the
process have to be as well. Any compute-bound thread in a
realtime process will bring the UI to its knees.

I\'d be perfectly happy with being able to _reduce_ thread
priority in a user process, but noooooo. They all have to
have the same priority, despite what the pthreads docs say.
So in Linux there is no way to express the idea that some
threads in a process are more important than others. That
destroys the otherwise-excellent scaling of my simulation
code.

Wouldn\'t the system call setpriority() work, after a thread
has bee started?

As I said way back in the summer, you can\'t change the priority
or scheduling of an individual user thread, and you can\'t mix
user and real-time threads in one process.

Thus you can have the default scheduler, which doesn\'t scale at
all well for this sort of job, or you can run the whole process
as real-time and make the box\'s user interface completely
unresponsive any time the real-time job is running.

In Windows and OS/2, you can set thread priorities any way you
like. Linux is notionally a multiuser system, so it\'s no
surprise there are limits to how much non-root users can
increase priority. I\'d be very happy being able to _reduce_
the priority of my compute threads so that the comms threads
would preempt them, but nooooooo.

I don\'t think that this is true for Red Hat Linux Enterprise
Edition. I don\'t recall if the realtime stuff is added, or built
in. People also use MRG with this.

More generally, not all Linux distributions care about such
things.

Joe Gwinn

It\'s in the kernel, so they mostly don\'t get to care or not care.
I\'ve recently tried again with CentOS, which is RHEL EE without
the support--no joy.

Let me add that while it does take root permission to turn realtime
on, it does not follow that the application must also run as root.

Running a big application as root is a real bad idea, for both
robustness and security reasons.

They\'re my boxes, and when doing cluster sims they usually run a special
OS for the purpose: Rocks 7. It has one head-node and N compute nodes
that get network-booted off the head, so I get a clean install every
time they boot up. So no worries there.

I hadn\'t heard of Rocks 7. I\'ll look into it.


There\'s no point in running as root, though, because you can\'t change
the priority of user threads even then--only realtime ones.

So, why not use realtime ones? This is done all the time. The
application code works the same.

*sigh* One more time, with feelin\'.

Because putting compute-bound processes in the RT class detonates the UI
and eventually brings the other system services such as disk and network
to their knees.

That is exactly how many realtime systems work - it is assumed that
those realtime processes and threads will be designed and used to not
eat the computer without the Chief Engineer's permission, a matter
settled at design review time.

But \"realtime\" is not a simple binary property - it has degrees and
details.

In Linux, the squeeze-out behaviour is controlled by the scheduling
policy, and not directly by the numerical priority (which encodes
relative urgency).

<https://man7.org/linux/man-pages/man2/sched_setscheduler.2.html>

It sounds like you want to use SCHED_RR, and not SCHED_FIFO.

FIFO is typical in big radars, and RR in small intense systems. In
the old days, it was hard frames driven by hardware time interrupts.

SCHED_RR ensures fairness between RT threads, and so no squeeze-out
within that family. But one must still meter how much of that is
used, or you will squeeze the GUI et al out for sure. But see
"Limiting the CPU usage of real-time and deadline processes" in the
following.

<https://man7.org/linux/man-pages/man7/sched.7.html>
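
A minimal sketch of what that would look like for a compute worker
(priority 10 is arbitrary, worker() is a stand-in for the actual compute
loop, and the kernel's default RT throttling is assumed left in place so
non-RT tasks keep a slice):

/* Hedged sketch: create a worker thread under SCHED_RR as suggested
 * above.  Needs CAP_SYS_NICE or an rtprio rlimit. */
#include <pthread.h>
#include <sched.h>

extern void *worker(void *);        /* hypothetical compute loop */

int spawn_rr_worker(pthread_t *tid)
{
    pthread_attr_t a;
    struct sched_param sp = { .sched_priority = 10 };

    pthread_attr_init(&a);
    pthread_attr_setinheritsched(&a, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&a, SCHED_RR);
    pthread_attr_setschedparam(&a, &sp);

    int err = pthread_create(tid, &a, worker, NULL);
    pthread_attr_destroy(&a);
    return err;
}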



The Linux thread scheduler is a really-o truly-o Charlie Foxtrot.

The legacy UNIX \"nice\" scheduler, retained for backward compatibility,
it unsuited for realtime for sure. And for what you are doing it
would seem.

You can call it names all you want, but that doesn\'t change anything.
There\'s only the one available, and it stinks on ice.

Hmm. Is there any operating system that's really suitable?

More generally, I went through the development of the POSIX standard,
and the issues of scheduling policies and numerical priority were well
chewed, attempting to accommodate just about everything. And the big
radars I work on all use RHEL these days, having ejected all
proprietary UNIX flavors for business reasons.


What I\'ve seen done is that during startup, the application startup
process or thread is given permission to run sudo, which it uses to
set scheduling policies (FIFO) and numerical priority (real urgent)
for the main processes and threads before normal operation
commences.

The other thing that is normally set up during startup is the
establishment of shared memory windows between processes. For may
applications, passing data blocks around by pointer passing is the
only practical approach.

Passing pointers only works on unis or shared-memory SMP boxes. My
pre-cluster versions did it that way, but by about 2006 I needed 20+
cores to do the job in a reasonable time.

The present-day large-scale solution is to use shared memory via
Infiniband. Memory is transferred to and from IB to local memory

If it has to cross a network it isn\'t shared memory. Shared memory is a
uniprocessor or SMP thing.

True, but we do it anyway, if the hardware is fast enough.


using DMA (Direct Memory Access) hardware.

This is done with a fleet of identical enterprise-class PCs, often
from HP or Dell or the like.

Riiiggghhhht. Lots of processors talk to their local memory over
Infiniband because of its low latency. Not.

Well, sort-of. IB is slower than within-box transfer, which is why
one uses IB only for lateral communications between solvers running in
parallel on the boxes, which have immense amounts of local memory.

These systems are limited by memory system bandwidth, and not so much
CPU speed. Usually, the same patch of memory will not support more
than one DMA operating at the same time.


I was building nanoantennas with travelling wave plasmonic waveguides,
coupled to SOI optical waveguides. (They eventually worked really well,
but it was a long slog, mostly by myself.)

The plasmonic section was intended to avoid the ~30 THz RC bandwidth of
my tunnel junction detectors--the light wave went transverse to the
photocurrent, so the light didn\'t have to drive the capacitance all at
once. That saved me over 30 dB of rolloff right there.

Wow! That\'s worth a bunch of trouble for sure.


The issue was that metals that exhibit plasmons (copper, silver, and
gold) exhibit free-electron behaviour in the infrared, i.e. their
epsilons are very nearly pure, negative real numbers. That makes the
real part of their refractive indices very very small, so you have to
take very small time steps or the simulation becomes unstable due to
superluminal propagation.

Are there any metals that don\'t exhibit plasmons?

Sure, a good 100 of them. In fact all except copper, silver, and gold
AFAIK. The undergraduate \"normal conductor\" model of metals cannot
exhibit plasmons because it predicts that the real and imaginary parts
of the refractive index have equal magnitude.

All the low-loss metals have those plasmons. Aww.


Their imaginary epsilons are very large, so you need very small voxels
to represent the fields well. Small voxels and short time steps make
for loooong run times, especially when it\'s wrapped in an optimization loop.

If I recall/understand, the imaginary epsilons are the loss factors,
so this is quite lossy.

If the imaginary parts of epsilon is less than or comparable with the
real part, that\'s true. However, at 1.5 um, silver has a refractive
index of 0.36+j10, corresponding to an epsilon of -100 + j3.5, which is
in an entirely different regime from normal conduction. A material with
a negative real epsilon at DC would turn to lava very rapidly.

Free-electron metals can propagate surface plasmons over many microns\'
distance.

OK. I'm out of my expertise here.


This sounds like a problem that would benefit from parallelism, and
shared memory. And lots of physical memory.

Right. And many many threads with sensible priority scheduling that
doesn\'t assume that the Kernel Gods know better than you how your
program should run on your own boxes--as in OS/2 or Windows, but not Linux.

Hmm. What OS/2 and/or Windows scheduling policy are you happy with?


Basically divide the propagation media into independent squares or
boxes that overlap somewhat, with inter square/box traffic needed only
for the overlap regions. So, there will be an optimum square/box area
or volume.

I have no idea what that means. If you\'re interested in how the program
actually works, the manual is at
https://electrooptical.net/static/media/uploads/Projects/Poems/poemsmanual.pdf

It\'s written mostly for our own benefit, so it isn\'t too polished, but
you can easily get the gist.

I scanned the manual. I think that POEMS' Domains and Subdomains are
my squares (2D) and boxes (3D). I knew you had to have something of
the sort, or we would not be having this discussion.

Joe Gwinn
 
On 12/9/20 6:54 PM, Joe Gwinn wrote:
On Sun, 6 Dec 2020 18:26:51 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/6/20 5:58 PM, Joe Gwinn wrote:
On Tue, 1 Dec 2020 13:46:29 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/1/20 12:48 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 20:59:50 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 8:26 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 17:26:03 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 11/30/20 12:07 PM, albert wrote:
In article
c8c0cf25-aa21-8b1d-b6f6-518624c35183@electrooptical.net>,
Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> wrote:
On 2020-08-02 20:56, Les Cargill wrote:
Phil Hobbs wrote:
On 2020-08-02 08:46, Martin Brown wrote:
On 23/07/2020 19:10, Phil Hobbs wrote:
On 2020-07-23 12:43, Tom Gardner wrote:
On 23/07/20 16:30, pcdhobbs@gmail.com wrote:
Isn\'t our ancient and settled idea of what a
computer is, and what an OS and languages
are, overdue for the next revolution?

In his other famous essay, \"No Silver Bullet\",
Brooks points out that the factors-of-10
productivity improvements of the early days
were gained by getting rid of extrinsic
complexity--crude tools, limited hardware, and
so forth.

Now the issues are mostly intrinsic to an
artifact built of thought. So apart from more
and more Python libraries, I doubt that there
are a lot more orders of magnitude available.

It is ironic that a lot of the potentially avoidable
human errors are typically fence post errors. Binary
fence post errors being about the most severe since
you end up with the opposite of what you intended.

Not in a single processor (except perhaps the
Mill).

But with multiple processors there can be
significant improvement - provided we are
prepared to think in different ways, and the
tools support it.

Examples: mapreduce, or xC on xCORE processors.

The average practitioner today really struggles on
massively parallel hardware. If you have ever done
any serious programming on such kit you quickly
realised that the process which ensures all the
other processes are kept busy doing useful things is
by far the most important.

I wrote a clusterized optimizing EM simulator that I
still use--I have a simulation gig just starting up
now, in fact. I learned a lot of ugly things about the
Linux thread scheduler in the process, such as that the
pthreads documents are full of lies about scheduling
and that you can\'t have a real-time thread in a user
mode program and vice versa. This is an entirely
arbitrary thing--there\'s no such restriction in Windows
or OS/2. Dunno about BSD--I should try that out.


In Linux, realtime threads are in \"the realtime context\".
It\'s a bit of a cadge. I\'ve never really seen a good
explanation of that that means.

Does anybody here know if you can mix RT and user
threads in a single process in BSD?

Sorry; never used BSD.

Realtime threads are simply of a different group of
priorities. You can install kernel loadable modules ( aka
device drivers ) to provide a timebase that will make
them eligible. SFAIK, you can\'t guarantee them to run.
You may be able to get close if you remove unnecessary
services.

I don\'t think this does what you want.

In Linux if one thread is real time, all the threads in the
process have to be as well. Any compute-bound thread in a
realtime process will bring the UI to its knees.

I\'d be perfectly happy with being able to _reduce_ thread
priority in a user process, but noooooo. They all have to
have the same priority, despite what the pthreads docs say.
So in Linux there is no way to express the idea that some
threads in a process are more important than others. That
destroys the otherwise-excellent scaling of my simulation
code.

Wouldn\'t the system call setpriority() work, after a thread
has bee started?

As I said way back in the summer, you can\'t change the priority
or scheduling of an individual user thread, and you can\'t mix
user and real-time threads in one process.

Thus you can have the default scheduler, which doesn\'t scale at
all well for this sort of job, or you can run the whole process
as real-time and make the box\'s user interface completely
unresponsive any time the real-time job is running.

In Windows and OS/2, you can set thread priorities any way you
like. Linux is notionally a multiuser system, so it\'s no
surprise there are limits to how much non-root users can
increase priority. I\'d be very happy being able to _reduce_
the priority of my compute threads so that the comms threads
would preempt them, but nooooooo.

I don\'t think that this is true for Red Hat Linux Enterprise
Edition. I don\'t recall if the realtime stuff is added, or built
in. People also use MRG with this.

More generally, not all Linux distributions care about such
things.

Joe Gwinn

It\'s in the kernel, so they mostly don\'t get to care or not care.
I\'ve recently tried again with CentOS, which is RHEL EE without
the support--no joy.

Let me add that while it does take root permission to turn realtime
on, it does not follow that the application must also run as root.

Running a big application as root is a real bad idea, for both
robustness and security reasons.

They\'re my boxes, and when doing cluster sims they usually run a special
OS for the purpose: Rocks 7. It has one head-node and N compute nodes
that get network-booted off the head, so I get a clean install every
time they boot up. So no worries there.

I hadn\'t heard of Rocks 7. I\'ll look into it.


There\'s no point in running as root, though, because you can\'t change
the priority of user threads even then--only realtime ones.

So, why not use realtime ones? This is done all the time. The
application code works the same.

*sigh* One more time, with feelin\'.

Because putting compute-bound processes in the RT class detonates the UI
and eventually brings the other system services such as disk and network
to their knees.

That is exactly how many realtime systems work - it is assumed that
those realtime processes and threads will be designed and used to not
eat the computer without the Chief Engineer\'s permission, a matter
settled at design review time.

But \"realtime\" is not a simple binary property - it has degrees and
details.

In Linux, the squeeze-out behaviour is controlled by the scheduling
policy, and not directly by the numerical priority (which encoders
relative urgency).

<https://man7.org/linux/man-pages/man2/sched_setscheduler.2.html>

It sounds like you want to use SCHED_RR, and not SCHED_FIFO.

FIFO is typical in big radars, and RR in small intense systems. In
the old days, it was hard frames driven by hardware time interrupts.

SCHED_RR ensures fairness between RT threads, and so no squeeze-out
within that family. But one must still meter how much of that is
used, of you will squeeze the GUI et al out for sure. But see
\"Limiting the CPU usage of real-time and deadline processes\" in the
following.

<https://man7.org/linux/man-pages/man7/sched.7.html>



The Linux thread scheduler is a really-o truly-o Charlie Foxtrot.

The legacy UNIX \"nice\" scheduler, retained for backward compatibility,
it unsuited for realtime for sure. And for what you are doing it
would seem.

You can call it names all you want, but that doesn\'t change anything.
There\'s only the one available, and it stinks on ice.

Hmm. Is there any operating system that\'s really suitable?

More generally, I went through the development of the POSIX standard,
and the issues of scheduling policies and numerical priority were well
chewed, attempting to accomodate just about everything. And the big
radars I work on all use RHEL these days, having ejected all
proprietary UNIX flavors for business reasons.


What I\'ve seen done is that during startup, the application startup
process or thread is given permission to run sudo, which it uses to
set scheduling policies (FIFO) and numerical priority (real urgent)
for the main processes and threads before normal operation
commences.

The other thing that is normally set up during startup is the
establishment of shared memory windows between processes. For may
applications, passing data blocks around by pointer passing is the
only practical approach.

Passing pointers only works on unis or shared-memory SMP boxes. My
pre-cluster versions did it that way, but by about 2006 I needed 20+
cores to do the job in a reasonable time.

The present-day large-scale solution is to use shared memory via
Infiniband. Memory is transferred to and from IB to local memory

If it has to cross a network it isn\'t shared memory. Shared memory is a
uniprocessor or SMP thing.

True, but we do it anyway, if the hardware is fast enough.


using DMA (Direct Memory Access) hardware.

This is done with a fleet of identical enterprise-class PCs, often
from HP or Dell or the like.

Riiiggghhhht. Lots of processors talk to their local memory over
Infiniband because of its low latency. Not.

Well, sort-of. IB is slower than within-box transfer, which is why
one uses IB only for lateral communications between solvers running in
parallel on the boxes, which have immense amounts of local memory.

These systems are limited by memory system bandwidth, and not so much
CPU speed. Usually, the same patch of memory will not support more
than one DMA operating at the same time.


I was building nanoantennas with travelling wave plasmonic waveguides,
coupled to SOI optical waveguides. (They eventually worked really well,
but it was a long slog, mostly by myself.)

The plasmonic section was intended to avoid the ~30 THz RC bandwidth of
my tunnel junction detectors--the light wave went transverse to the
photocurrent, so the light didn\'t have to drive the capacitance all at
once. That saved me over 30 dB of rolloff right there.

Wow! That\'s worth a bunch of trouble for sure.


The issue was that metals that exhibit plasmons (copper, silver, and
gold) exhibit free-electron behaviour in the infrared, i.e. their
epsilons are very nearly pure, negative real numbers. That makes the
real part of their refractive indices very very small, so you have to
take very small time steps or the simulation becomes unstable due to
superluminal propagation.

Are there any metals that don\'t exhibit plasmons?

Sure, a good 100 of them. In fact all except copper, silver, and gold
AFAIK. The undergraduate \"normal conductor\" model of metals cannot
exhibit plasmons because it predicts that the real and imaginary parts
of the refractive index have equal magnitude.

All the low-loss metals have those plasmons. Aww.


Their imaginary epsilons are very large, so you need very small voxels
to represent the fields well. Small voxels and short time steps make
for loooong run times, especially when it\'s wrapped in an optimization loop.

If I recall/understand, the imaginary epsilons are the loss factors,
so this is quite lossy.

If the imaginary parts of epsilon is less than or comparable with the
real part, that\'s true. However, at 1.5 um, silver has a refractive
index of 0.36+j10, corresponding to an epsilon of -100 + j3.5, which is
in an entirely different regime from normal conduction. A material with
a negative real epsilon at DC would turn to lava very rapidly.

Free-electron metals can propagate surface plasmons over many microns\'
distance.

OK. I\'m out of my expertise here.


This sounds like a problem that would benefit from parallelism, and
shared memory. And lots of physical memory.

Right. And many many threads with sensible priority scheduling that
doesn\'t assume that the Kernel Gods know better than you how your
program should run on your own boxes--as in OS/2 or Windows, but not Linux.

Hmm. What OS/2 and/or Windows scheduling policy are you happy with?


Basically divide the propagation media into independent squares or
boxes that overlap somewhat, with inter square/box traffic needed only
for the overlap regions. So, there will be an optimum square/box area
or volume.

I have no idea what that means. If you\'re interested in how the program
actually works, the manual is at
https://electrooptical.net/static/media/uploads/Projects/Poems/poemsmanual.pdf

It\'s written mostly for our own benefit, so it isn\'t too polished, but
you can easily get the gist.

I scanned the manual. I think that POEMS\' Domains and Subdomains are
my squares (2D) and boxes (3D). I knew you had to have something of
the sort, or we would not be having this discussion.

Joe Gwinn

You've obviously never tried it. If you ever do, the issues will become
clear very rapidly.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com
 
On Wed, 9 Dec 2020 19:58:54 -0500, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/9/20 6:54 PM, Joe Gwinn wrote:
On Sun, 6 Dec 2020 18:26:51 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/6/20 5:58 PM, Joe Gwinn wrote:
On Tue, 1 Dec 2020 13:46:29 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

On 12/1/20 12:48 PM, Joe Gwinn wrote:
On Mon, 30 Nov 2020 20:59:50 -0500, Phil Hobbs
pcdhSpamMeSenseless@electrooptical.net> wrote:

In Linux if one thread is real time, all the threads in the
process have to be as well. Any compute-bound thread in a
realtime process will bring the UI to its knees.

I'd be perfectly happy with being able to _reduce_ thread
priority in a user process, but noooooo. They all have to
have the same priority, despite what the pthreads docs say.
So in Linux there is no way to express the idea that some
threads in a process are more important than others. That
destroys the otherwise-excellent scaling of my simulation
code.

Wouldn't the system call setpriority() work, after a thread
has been started?

As I said way back in the summer, you can't change the priority
or scheduling of an individual user thread, and you can't mix
user and real-time threads in one process.

Thus you can have the default scheduler, which doesn't scale at
all well for this sort of job, or you can run the whole process
as real-time and make the box's user interface completely
unresponsive any time the real-time job is running.

In Windows and OS/2, you can set thread priorities any way you
like. Linux is notionally a multiuser system, so it's no
surprise there are limits to how much non-root users can
increase priority. I'd be very happy being able to _reduce_
the priority of my compute threads so that the comms threads
would preempt them, but nooooooo.
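
A minimal sketch of the constraint being described (assuming glibc/NPTL on
Linux; the helper name below is invented for illustration, not from the
original code):

/* Ordinary threads run under SCHED_OTHER, whose only permitted static
 * priority is 0, so an attempt to mark one compute thread as less urgent
 * than a comms thread has nothing legal to set. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void try_to_deprioritize(pthread_t t)   /* hypothetical helper */
{
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = -5;                    /* anything but 0... */
    int err = pthread_setschedparam(t, SCHED_OTHER, &sp);
    if (err)
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
    /* On Linux this is rejected (SCHED_OTHER accepts only priority 0),
     * which matches the complaint above: the POSIX thread-priority
     * interface gives an ordinary process no per-thread knob to turn down. */
}

int main(void)
{
    try_to_deprioritize(pthread_self());
    return 0;
}

(Build with "cc -pthread"; the point is the error path, not the output.)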

I don't think that this is true for Red Hat Enterprise Linux
(RHEL). I don't recall if the realtime stuff is added, or built
in. People also use MRG with this.

More generally, not all Linux distributions care about such
things.

Joe Gwinn

It's in the kernel, so they mostly don't get to care or not care.
I've recently tried again with CentOS, which is RHEL without
the support--no joy.

Let me add that while it does take root permission to turn realtime
on, it does not follow that the application must also run as root.

Running a big application as root is a real bad idea, for both
robustness and security reasons.

They're my boxes, and when doing cluster sims they usually run a special
OS for the purpose: Rocks 7. It has one head-node and N compute nodes
that get network-booted off the head, so I get a clean install every
time they boot up. So no worries there.

I hadn't heard of Rocks 7. I'll look into it.


There's no point in running as root, though, because you can't change
the priority of user threads even then--only realtime ones.

So, why not use realtime ones? This is done all the time. The
application code works the same.

*sigh* One more time, with feelin'.

Because putting compute-bound processes in the RT class detonates the UI
and eventually brings the other system services such as disk and network
to their knees.

That is exactly how many realtime systems work - it is assumed that
those realtime processes and threads will be designed and used to not
eat the computer without the Chief Engineer's permission, a matter
settled at design review time.

But "realtime" is not a simple binary property - it has degrees and
details.

In Linux, the squeeze-out behaviour is controlled by the scheduling
policy, and not directly by the numerical priority (which encodes
relative urgency).

.<https://man7.org/linux/man-pages/man2/sched_setscheduler.2.html>

It sounds like you want to use SCHED_RR, and not SCHED_FIFO.

FIFO is typical in big radars, and RR in small intense systems. In
the old days, it was hard frames driven by hardware timer interrupts.

SCHED_RR ensures fairness between RT threads, and so no squeeze-out
within that family. But one must still meter how much of that is
used, or you will squeeze the GUI et al out for sure. But see
"Limiting the CPU usage of real-time and deadline processes" in the
following.

.<https://man7.org/linux/man-pages/man7/sched.7.html>
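
A hedged sketch of that suggestion (illustrative only, not the poster's
code): promote just the latency-sensitive thread to SCHED_RR and leave the
number-crunchers alone. It needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO,
and the kernel's RT throttling (sched_rt_runtime_us) still reserves some CPU
for everything else.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Put a thread in the round-robin realtime class at a modest priority. */
static int make_thread_rr(pthread_t t, int prio)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = prio;                  /* 1..99 for SCHED_RR */
    int err = pthread_setschedparam(t, SCHED_RR, &sp);
    if (err)
        fprintf(stderr, "pthread_setschedparam(SCHED_RR,%d): %s\n",
                prio, strerror(err));
    return err;
}

int main(void)
{
    /* In a real solver this would be the comms/housekeeping thread,
     * not the compute threads. */
    return make_thread_rr(pthread_self(), 10);
}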



The Linux thread scheduler is a really-o truly-o Charlie Foxtrot.

The legacy UNIX \"nice\" scheduler, retained for backward compatibility,
it unsuited for realtime for sure. And for what you are doing it
would seem.

You can call it names all you want, but that doesn't change anything.
There's only the one available, and it stinks on ice.

Hmm. Is there any operating system that's really suitable?

More generally, I went through the development of the POSIX standard,
and the issues of scheduling policies and numerical priority were well
chewed, attempting to accommodate just about everything. And the big
radars I work on all use RHEL these days, having ejected all
proprietary UNIX flavors for business reasons.


What I've seen done is that during startup, the application startup
process or thread is given permission to run sudo, which it uses to
set scheduling policies (FIFO) and numerical priority (real urgent)
for the main processes and threads before normal operation
commences.

The other thing that is normally set up during startup is the
establishment of shared memory windows between processes. For many
applications, passing data blocks around by pointer is the
only practical approach.
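
A minimal sketch of that kind of startup-time shared-memory window (POSIX
shm; the segment name and size are made up for the example):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define WINDOW_NAME "/solver_window"          /* hypothetical name */
#define WINDOW_SIZE (64UL * 1024 * 1024)

int main(void)
{
    int fd = shm_open(WINDOW_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, WINDOW_SIZE) < 0) { perror("ftruncate"); return 1; }

    void *win = mmap(NULL, WINDOW_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (win == MAP_FAILED) { perror("mmap"); return 1; }

    /* Producers drop data blocks into the window and hand consumers an
     * offset; only the offset crosses the process boundary. */
    strcpy((char *)win, "block 0");
    printf("window mapped at %p\n", win);

    munmap(win, WINDOW_SIZE);
    close(fd);
    shm_unlink(WINDOW_NAME);
    return 0;
}

(Older glibc needs -lrt for shm_open.)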

Passing pointers only works on unis or shared-memory SMP boxes. My
pre-cluster versions did it that way, but by about 2006 I needed 20+
cores to do the job in a reasonable time.

The present-day large-scale solution is to use shared memory via
InfiniBand. Data is transferred between IB and local memory

If it has to cross a network it isn't shared memory. Shared memory is a
uniprocessor or SMP thing.

True, but we do it anyway, if the hardware is fast enough.


using DMA (Direct Memory Access) hardware.

This is done with a fleet of identical enterprise-class PCs, often
from HP or Dell or the like.

Riiiggghhhht. Lots of processors talk to their local memory over
Infiniband because of its low latency. Not.

Well, sort-of. IB is slower than within-box transfer, which is why
one uses IB only for lateral communications between solvers running in
parallel on the boxes, which have immense amounts of local memory.

These systems are limited by memory system bandwidth, and not so much
by CPU speed. Usually, the same patch of memory will not support more
than one DMA operation at the same time.
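
For what it's worth, a rough sketch of the memory-registration step behind
that (assuming libibverbs; all error handling, connection setup, and the
queue-pair plumbing are omitted, so this is only the flavor of it):

#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no IB devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    size_t len = 16UL * 1024 * 1024;
    void *buf = malloc(len);

    /* Registering pins the buffer and gives the HCA's DMA engine the keys
     * it needs to move the region without the CPU copying anything. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
           len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}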


I was building nanoantennas with travelling wave plasmonic waveguides,
coupled to SOI optical waveguides. (They eventually worked really well,
but it was a long slog, mostly by myself.)

The plasmonic section was intended to avoid the ~30 THz RC bandwidth of
my tunnel junction detectors--the light wave went transverse to the
photocurrent, so the light didn't have to drive the capacitance all at
once. That saved me over 30 dB of rolloff right there.

Wow! That's worth a bunch of trouble for sure.


The issue was that metals that exhibit plasmons (copper, silver, and
gold) exhibit free-electron behaviour in the infrared, i.e. their
epsilons are very nearly pure, negative real numbers. That makes the
real part of their refractive indices very very small, so you have to
take very small time steps or the simulation becomes unstable due to
superluminal propagation.
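
Back-of-envelope version of that constraint (generic FDTD Courant-limit
reasoning with made-up numbers, not the program's actual criterion):

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double c0    = 2.998e8;   /* m/s */
    const double dx    = 5e-9;      /* illustrative 5 nm voxel */
    const double n_min = 0.36;      /* smallest real index in the grid */

    /* 3-D Courant limit on a cubic grid: dt <= n_min * dx / (c0 * sqrt(3)).
     * A real index of ~0.36 instead of ~1 costs roughly a 3x shorter step
     * on top of whatever the small voxels already cost. */
    double dt = n_min * dx / (c0 * sqrt(3.0));
    printf("max stable dt ~ %.3g s\n", dt);    /* about 3.5e-18 s here */
    return 0;
}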

Are there any metals that don't exhibit plasmons?

Sure, a good 100 of them. In fact all except copper, silver, and gold
AFAIK. The undergraduate "normal conductor" model of metals cannot
exhibit plasmons because it predicts that the real and imaginary parts
of the refractive index have equal magnitude.

All the low-loss metals have those plasmons. Aww.


Their imaginary epsilons are very large, so you need very small voxels
to represent the fields well. Small voxels and short time steps make
for loooong run times, especially when it's wrapped in an optimization loop.

If I recall/understand, the imaginary epsilons are the loss factors,
so this is quite lossy.

If the imaginary part of epsilon is less than or comparable with the
real part, that's true. However, at 1.5 um, silver has a refractive
index of 0.36+j10, corresponding to an epsilon of -100 + j3.5, which is
in an entirely different regime from normal conduction. A material with
a negative real epsilon at DC would turn to lava very rapidly.

Free-electron metals can propagate surface plasmons over many microns'
distance.
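
A quick worked example of the n,k -> epsilon conversion behind numbers like
those (the values here are generic free-electron-ish illustrations, not a
fit to silver data):

#include <complex.h>
#include <stdio.h>

int main(void)
{
    double n = 0.2, k = 10.0;        /* tiny real index, large imaginary one */
    double complex m   = n + I * k;  /* complex refractive index */
    double complex eps = m * m;      /* epsilon = (n + ik)^2 */
    printf("eps = %.1f %+.1fi\n", creal(eps), cimag(eps));
    /* prints roughly -100.0 +4.0i: a big negative real part with a
     * comparatively small imaginary part, i.e. the free-electron regime. */
    return 0;
}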

OK. I'm out of my expertise here.


This sounds like a problem that would benefit from parallelism, and
shared memory. And lots of physical memory.

Right. And many many threads with sensible priority scheduling that
doesn't assume that the Kernel Gods know better than you how your
program should run on your own boxes--as in OS/2 or Windows, but not Linux.

Hmm. What OS/2 and/or Windows scheduling policy are you happy with?


Basically divide the propagation media into independent squares or
boxes that overlap somewhat, with inter square/box traffic needed only
for the overlap regions. So, there will be an optimum square/box area
or volume.

I have no idea what that means. If you're interested in how the program
actually works, the manual is at
https://electrooptical.net/static/media/uploads/Projects/Poems/poemsmanual.pdf

It's written mostly for our own benefit, so it isn't too polished, but
you can easily get the gist.

I scanned the manual. I think that POEMS' Domains and Subdomains are
my squares (2D) and boxes (3D). I knew you had to have something of
the sort, or we would not be having this discussion.
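
For concreteness, a minimal 1-D sketch of that overlap ("halo") exchange
idea, assuming MPI; the layout and names are invented for the example and
are not taken from POEMS:

#include <mpi.h>
#include <stdlib.h>

#define NLOC 1024                   /* interior cells per rank */
#define HALO 1                      /* overlap width */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* [left ghost | interior | right ghost] */
    double *f = calloc(NLOC + 2 * HALO, sizeof *f);
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* Once per time step: send my edge cells, receive the neighbours'
     * edge cells into my ghost layers. */
    MPI_Sendrecv(&f[HALO],        HALO, MPI_DOUBLE, left,  0,
                 &f[NLOC + HALO], HALO, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&f[NLOC],        HALO, MPI_DOUBLE, right, 1,
                 &f[0],           HALO, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ...update the interior cells f[HALO .. NLOC+HALO-1] here... */

    free(f);
    MPI_Finalize();
    return 0;
}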

Joe Gwinn


You've obviously never tried it. If you ever do, the issues will become
clear very rapidly.

No, I haven't, and probably never will.

But I do know people who have, and may have an idea or two. And will
know if your use case truly cannot be served by the current standard,
showing a gap in the standard.

And I know something about schedulers in operating systems (and in big
radars for that matter), and observe that people do manage to do
realtime and non-realtime in the same box, quite often actually.

One way is to have the GUI et al running non-realtime, and the
time-critical software running under various realtime policies and
priorities, with the time-critical stuff limited so it cannot
completely squeeze the GUI et al out.

The full-strength version of that is to have the entire non-realtime
world running in a virtual machine under the RT system.
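
One concrete form of that metering on Linux is the RT-throttling knob
mentioned earlier; a small sketch (needs root, and the numbers are only an
example):

#include <stdio.h>

int main(void)
{
    /* By default the RT classes may use ~950 ms of every 1000 ms
     * (sched_rt_runtime_us / sched_rt_period_us); trimming it a bit
     * further guarantees the GUI et al a slice of every second. */
    FILE *fp = fopen("/proc/sys/kernel/sched_rt_runtime_us", "w");
    if (!fp) { perror("sched_rt_runtime_us"); return 1; }
    fprintf(fp, "900000\n");
    return fclose(fp) ? 1 : 0;
}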

So, I'm trying to figure out exactly what's going on in your
application. But I'll stop if you wish.

Joe Gwinn
 
