Clock Edge notation

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
But par is quite low-level. What would be the semantics of:

declare
   Thing : X;
begin
   par
      Foo (Thing);
   and
      Bar (Thing);
   and
      Baz (Thing);
   end par;
end;
Well, that depends on the definitions of Foo, Bar, and Baz, of course :).

--
Cheers,                        The Rhythm is around me,
                               The Rhythm has control.
Ray Blaak                      The Rhythm is inside me,
rAYblaaK@STRIPCAPStelus.net    The Rhythm has my soul.
 
On Thu, 2007-03-01 at 13:58 +0000, Colin Paul Gloster wrote:
In news:1172754472.5352.11.camel@localhost.localdomain timestamped
Thu, 01 Mar 2007 14:07:52 +0100, Georg Bauhaus <bauhaus@futureapps.de> wrote:

In fact, Ada compilers do detect this:
Ada_Is_As_Unsafe_As_VHDL_And_GNAT_Will_Not_Even_Warn.ada:7: warning:
‘Value’ is used uninitialized in this function

You'd have to study the compiler docs, though. The warning is from the
backend and needs some switches to be turned on.

[..]"


I admit that I was completely unaware of this. I tried all of these
switches of GNAT's ...
Actually these are backend switches (although they are documented in the
front-end gnat_ug; look for "uninitialized"). You need optimization
on and -Wuninitialized. Same report for C input:

int C_Is_As_Unsafe_As_VHDL_And_GCC_Will_Not_Even_Warn()
{
    int Value, Y;

    if (0) {
        Value = 0;
    }
    Y = Value;
}

C_Is_As_Unsafe_As_VHDL_And_GCC_Will_Not_Even_Warn.c:8: warning: ‘Value’ is used uninitialized in this function
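For reference, the Ada source that produced the analogous GNAT message is not shown in the thread; a plausible reconstruction (the body is invented for illustration, and the same caveat applies: the backend warning only appears with optimization and -Wuninitialized enabled) might look like this:

function Ada_Is_As_Unsafe_As_VHDL_And_GNAT_Will_Not_Even_Warn return Integer is
   Value, Y : Integer;
begin
   if False then
      Value := 0;    -- never executed, so Value stays uninitialized
   end if;
   Y := Value;       -- the backend can warn here: 'Value' used uninitialized
   return Y;
end Ada_Is_As_Unsafe_As_VHDL_And_GNAT_Will_Not_Even_Warn;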
 
On Thu, 01 Mar 2007 14:57:01 +0100, Dmitry A. Kazakov wrote:

On Thu, 01 Mar 2007 11:22:32 GMT, Dr. Adrian Wrigley wrote:

If you don't have multiple processors, lightweight threading is
less attractive than if you do? Inmos/Occam/Transputer was founded
on the basis that lightweight threading was highly relevant to multiple
processors.

Ada has no means of saying "Do these bits concurrently, if you like,
because I don't care what the order of execution is". And a compiler
can't work it out from the source. If your CPU has loads of threads,
compiling code with "PAR" style language concurrency is rather useful
and easy.

But par is quite low-level. What would be the semantics of:

declare
   Thing : X;
begin
   par
      Foo (Thing);
   and
      Bar (Thing);
   and
      Baz (Thing);
   end par;
end;
Do Foo, Bar and Baz in any order or concurrently, all accessing Thing.

Roughly equivalent to doing the same operations in three separate
tasks. Thing could be a protected object, if concurrent writes
are prohibited. Seems simple enough!

I'm looking for something like Cilk, but even the concurrent loop
(JPR's for I in all 1 .. n loop?) would be a help.
--
Adrian
 
Georg Bauhaus <bauhaus@futureapps.de> writes:
C_Is_As_Unsafe_As_VHDL_And_GCC_Will_Not_Even_Warn.c:8: warning:
‘Value’ is used uninitialized in this function
To be fair, of the examples posted, only in C is the behavior
actually undefined.

In VHDL and, I suppose Ada as well(?), the variable *does* have an
initial value. It's not entirely obvious but at least you will
*always* get consistent results.

Regards,
-- Marcus
note that "property" can also be used as syntaxtic sugar to reference
a property, breaking the clean design of verilog; [...]

-- Michael McNamara
(http://www.veripool.com/verilog-mode_news.html)
 
Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:

"> Stephen A. Leake wrote in news:news:u649rx29a.fsf@stephe-leake.org on
news:comp.lang.ada for the benefit of Mike Silva:

"[..]

[..] FPGA development relies heavily on simulation,
which does not require real hardware."


A warning to Mike Silva: supposed simulators for languages chiefly
used for FPGAs often behave differently to how source code will
actually behave when synthesized (implemented) on a field programmable
gate array. This is true of cheap and expensive tools promoted as
simulators.


What do you mean by this? The VHDL I simulate behaves the same as the
FPGA, unless I do something bad like doing asynchronous design, or
miss a timing constraint."

Or if you use the enumeration encoding attribute and it is not supported
by both the simulation tool and the synthesis tool;
Well, attributes are all a bit tool specific, I'm not sure this is
important. The sim and the synth will *behave* the same, but the
numbers used to represent states might not be what you expect. Or am
I misunderstanding what you are getting at?

or if you use a
simulation tool which honors the semantics of mode BUFFER but use a
synthesis tool which simply and incorrectly replaces BUFFER with OUT
even if you are supposedly using an earlier version of VHDL in which
BUFFER and OUT are not very similar;
Never used buffers, so I dunno about that!

a global name might not be
reevaluated if the global value changes even if a function depends on
it (hey, I am not saying it is a good idea, but Synopsys does mention
it as an example of something which can cause a simulation
mismatch);
Again, can't say. But don't use Synopsys as a good example of a
high-end tool in the FPGA world :) Can you even do FPGAs with
Synopsys any more? It hasn't been bundled by Xilinx for a long while.

'Z' not being treated as high impedance by a synthesis tool;
It will be if the thing you are targeting has tristates. Either as
IOBs or internally.

default
values being ignored for synthesis;
Works in my tools.

TRANSPORT and AFTER being ignored
for synthesis (that TRANSPORT is not amenable to being implemented by
technologies typically targeted by HDLs is not a valid excuse: in
such cases, a tool should report that it is unsupported instead of
ignoring it);
Yes, it should, I agree.

sensitivity lists being ignored for synthesis;
That depends on the tool.

or other
discrepancies.
Well, that covers a lot ;-)

" These are both design problems, not
simulation or language problems."

It is true that asynchronous designs or missed timing constraints may
be design problems, but I would prefer that if the synthesized
behavior is bad, that the simulation (which is supposed to indicate
what will happen in reality) would not run well, or that the
simulation would be bad in the same way that the synthesized behavior
is.
But the simulation you use for developing code is necessarily an
idealised version of life. If you want to you can run an annotated
simulation full of delays from the back end tools with the same
testbench. You may or may not find anything, as these sims are very
slow and your testbench must trigger just the right set of timing
problems.

I've never found a timing problem through running this level of simulation.

This may be too much to expect for timing constraints, but I -- perhaps
naively -- do not see why an asynchronous design should be so dismissable.
How hard could it be to replace tools' warnings that a referenced signal
needs to be added to a sensitivity list with a rule in the language standard
which makes the omission from the sensitivity list illegal?
Because it might be handy for simulating something? I dunno to be honest.

Personally I'd prefer the option to make the synthesizer flag an error
for that sort of thing.

[I am not an async expert but...] You can do async design in VHDL and
with synthesis, but proving correctness by simulation does not work
out as I understand it.

"<snip
" multi-dimensional
arrays) gave a feeling of clarity and integrity absent from software
development languages."


Apparently for many years, VHDL subset tools (the only kind of VHDL
tools which exist, so even if the language was excellent one would
need to restrict one's code to what was supported) used to not support
multi-dimensional arrays.


What's this about "only VHDL subset" tools existing? Modelsim supports
all of VHDL..."

You may rightly deem that claim of mine to be unwarranted, but outside
of testbenches, I do not see what use the language is if it is not
transferable to actual hardware.
What?! "Outside of testbenches, I do not see what use..." *Inside* of
testbenches is where I spend most of my coding time! The whole point
of having a rich language is to make running simulations easy.

The fact that we have a synthesisable subset is not a bad thing, just
how real life gets in the way. That subset has grown over the years;
who knows, in ten years' time maybe the synth will be able to figure
out that the way I'm using my ACCESS type variable infers some RAM and
address logic, and then that'll be another synthesisable construct.

I wish VHDL had *more* non synthesisable features
(like dynamic array sizing for example). I'd like to write my
testbenches in Python :)

" It is true that synthesis tools only support a subset of
VHDL, but a lot of that is down to the fact that turning (say) an
access type into hardware is a bit tricky."

That is true, and even if not tricky, it may be pointless. However, I
think that people have become accustomed to this as an excuse even
when it is not valid, e.g. from Synopsys's documentation:
"[..]
Array ranges and indexes other than integers are unsupported.
[..]"
Again, I wouldn't use Synopsys as an example of "the best" support for
the language... Synplify doesn't state that sort of limitation in
its list of unsupported features.

Martin J. Thompson wrote:

"Multi dimensional arrays have worked (even in synthesis) for years in
my experience.

[..]"

Not always, and not with all tools. E.g. last month, someone
mentioned in
news:548d3iF1vbcf6U1@mid.individual.net
: "Using 2D-Arrays as I/O signals _may_ be a problem for some synthesis
tools. [..]"
Well, that's a bit weak ("*may* be a problem") - what tools do they
currently not work in?

I admit my next example is historical, but Table 7.1-1 Supported and
Unsupported Synthesis Constructs of Ben Cohen's second (1998) edition of
"VHDL Answers to Frequently Asked Questions" contains:
"[..]
[..] multidimensional arrays are not allowed
[..]"

Cheers,
C. P. G.
Yes, in the past it has been a problem. But in the past, inferring a
counter was a problem! Been sorted for a while now :)

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
 
On Fri, 2007-03-02 at 14:22 +0100, Marcus Harnisch wrote:
To be fair, of the examples posted, only in C is the behavior actually undefined.

In VHDL and, I suppose Ada as well(?), the variable *does* have an
initial value.
In Ada, whether or not a variable (without an initialization
expression) has a known initial value can depend on the type.
There are types whose objects can't be created without
initialization, e.g. indefinite types or limited types with
default-initialized components. (There is some news here:
Ada 2005 adds the possibility to initialize objects of limited
types using an initialization expression. (I.e. types that
otherwise forbid assignment.))

Simple scalar types have no defined default initial value.
OTOH, when the storage bits of a variable are mapped to
hardware it might not be the best idea to have the program
flip bits as a consequence of any default initialization.
An Ada program will then specify "no initialization at all."

The issue of no default values for scalars is well known and
Ada compilers must provide additional means of testing, and
detection where possible:
A configuration pragma can be used to make the compiler
assign predictable initial values. These should
be outside the range of a variable's type (more in LRM H.1),
and must be documented.
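Concretely, the LRM H.1 configuration pragma referred to here is Normalize_Scalars (GNAT additionally offers Initialize_Scalars). A small sketch, with the subtype and values invented for illustration; whether an out-of-range "initial" value can actually be chosen depends on the subtype and the compiler:

pragma Normalize_Scalars;
--  Configuration pragma: scalars without an initialization expression get
--  predictable initial values, chosen outside the subtype's range where possible.

procedure Show_Detection is
   subtype Percent is Integer range 0 .. 100;
   Value : Percent;                  -- no initialization expression
begin
   if not Value'Valid then           -- 'Valid can detect an out-of-range initial value
      raise Program_Error;
   end if;
end Show_Detection;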

It's not entirely obvious but at least you will
*always* get consistent results.
 
On Fri, 02 Mar 2007 11:36:22 GMT, Dr. Adrian Wrigley wrote:

On Thu, 01 Mar 2007 14:57:01 +0100, Dmitry A. Kazakov wrote:

On Thu, 01 Mar 2007 11:22:32 GMT, Dr. Adrian Wrigley wrote:

If you don't have multiple processors, lightweight threading is
less attractive than if you do? Inmos/Occam/Transputer was founded
on the basis that lightweight threading was highly relevant to multiple
processors.

Ada has no means of saying "Do these bits concurrently, if you like,
because I don't care what the order of execution is". And a compiler
can't work it out from the source. If your CPU has loads of threads,
compiling code with "PAR" style language concurrency is rather useful
and easy.

But par is quite low-level. What would be the semantics of:

declare
   Thing : X;
begin
   par
      Foo (Thing);
   and
      Bar (Thing);
   and
      Baz (Thing);
   end par;
end;

Do Foo, Bar and Baz in any order or concurrently, all accessing Thing.
That's the question. If they just execute in an arbitrary order, mutually
exclusively, then the above is a kind of select with anonymous
accepts invoking Foo, Bar and Baz. The semantics is clean.

Roughly equivalent to doing the same operations in three separate
tasks. Thing could be a protected object, if concurrent writes
are prohibited. Seems simple enough!
This is a very different variant:

declare
   Thing : X;
begin
   declare -- par
      task Alt_1; task Alt_2; task Alt_3;
      task body Alt_1 is
      begin
         Foo (Thing);
      end Alt_1;
      task body Alt_2 is
      begin
         Bar (Thing);
      end Alt_2;
      task body Alt_3 is
      begin
         Baz (Thing);
      end Alt_3;
   begin
      null;
   end; -- par
end;

If par is a sugar for this, then Thing might easily get corrupted. The
problem with such par is that the rules of nesting and visibility for the
statements, which are otherwise safe, become very dangerous in the case of
par.

Another problem is that Thing cannot be a protected object. Clearly Foo,
Bar and Baz resynchronize themselves on Thing after updating its parts. But
the compiler cannot know this. It also does not know that the updates do
not influence each other. It does not know that the state of Thing is
invalid until resynchronization. So it will serialize alternatives on write
access to Thing. (I cannot imagine a use case where Foo, Bar and Baz would
be pure. There seems to always be a shared outcome which would block them.)
Further Thing should be locked for the outer world while Foo, Bar, Baz are
running. So the standard functionality of protected objects looks totally
wrong here.

I'm looking for something like Cilk, but even the concurrent loop
(JPR's for I in all 1 .. n loop?) would be a help.
Maybe, just a guess, the functional decomposition rather than statements
could be more appropriate here. The alternatives would access their
arguments by copy-in and resynchronize by copy-out.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
 
On Fri, 02 Mar 2007 17:32:26 +0100, Dmitry A. Kazakov wrote:

On Fri, 02 Mar 2007 11:36:22 GMT, Dr. Adrian Wrigley wrote:

On Thu, 01 Mar 2007 14:57:01 +0100, Dmitry A. Kazakov wrote:

On Thu, 01 Mar 2007 11:22:32 GMT, Dr. Adrian Wrigley wrote:

If you don't have multiple processors, lightweight threading is
less attractive than if you do? Inmos/Occam/Transputer was founded
on the basis that lightweight threading was highly relevant to multiple
processors.

Ada has no means of saying "Do these bits concurrently, if you like,
because I don't care what the order of execution is". And a compiler
can't work it out from the source. If your CPU has loads of threads,
compiling code with "PAR" style language concurrency is rather useful
and easy.

But par is quite low-level. What would be the semantics of:

declare
   Thing : X;
begin
   par
      Foo (Thing);
   and
      Bar (Thing);
   and
      Baz (Thing);
   end par;
end;

Do Foo, Bar and Baz in any order or concurrently, all accessing Thing.

That's the question. If they just execute in an arbitrary order, mutually
exclusively, then the above is a kind of select with anonymous
accepts invoking Foo, Bar and Baz. The semantics is clean.

Roughly equivalent to doing the same operations in three separate
tasks. Thing could be a protected object, if concurrent writes
are prohibited. Seems simple enough!

This is a very different variant:

declare
   Thing : X;
begin
   declare -- par
      task Alt_1; task Alt_2; task Alt_3;
      task body Alt_1 is
      begin
         Foo (Thing);
      end Alt_1;
      task body Alt_2 is
      begin
         Bar (Thing);
      end Alt_2;
      task body Alt_3 is
      begin
         Baz (Thing);
      end Alt_3;
   begin
      null;
   end; -- par
end;

If par is a sugar for this, then Thing might easily get corrupted. The
problem with such par is that the rules of nesting and visibility for the
statements, which are otherwise safe, become very dangerous in the case of
par.
This is what I was thinking.

Syntax might be even simpler:
declare
   Thing : X;
begin par
   Foo (Thing);
   Bar (Thing);
   Baz (Thing);
end par;

Thing won't get corrupted if the programmer knows what they're doing!
In the case of pure functions, there is "obviously" no problem:

declare
   Thing : X := InitThing;
begin par
   A1 := Foo (Thing);
   A2 := Bar (Thing);
   A3 := Baz (Thing);
end par;
return A1 + A2 + A3;

In the case of procedures, there are numerous reasonable uses.
Perhaps the three procedures read Thing, and output three separate files.
Or maybe they write different parts of Thing. Maybe they validate
different properties of Thing, and raise an exception if a fault is found.
Perhaps they update statistics stored in a protected object, not shown.

The most obvious case is if the procedures are called on different
objects. Next most likely is if they are pure functions.
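For the pure-function case, the effect can already be approximated in today's Ada with a block of anonymous tasks. A minimal, self-contained sketch (Foo, Bar, Baz, the type X and the values are all stand-ins, not taken from any real code in this thread):

with Ada.Text_IO;
procedure Par_Sketch is
   type X is new Integer;                     -- stand-in for the X of the example

   function Foo (T : X) return Integer is
   begin
      return Integer (T) + 1;
   end Foo;

   function Bar (T : X) return Integer is
   begin
      return Integer (T) * 2;
   end Bar;

   function Baz (T : X) return Integer is
   begin
      return Integer (T) - 3;
   end Baz;

   Thing      : constant X := 10;
   A1, A2, A3 : Integer;
begin
   declare  -- plays the role of "begin par ... end par"
      task T1;
      task T2;
      task T3;
      task body T1 is begin A1 := Foo (Thing); end T1;
      task body T2 is begin A2 := Bar (Thing); end T2;
      task body T3 is begin A3 := Baz (Thing); end T3;
   begin
      null;  -- the block does not complete until T1, T2 and T3 terminate
   end;
   Ada.Text_IO.Put_Line ("A1+A2+A3 =" & Integer'Image (A1 + A2 + A3));
end Par_Sketch;

Each Ai is written by exactly one task and only read after the block, so there is no race; the cost, as discussed below, is that every alternative really is a full task.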

Another problem is that Thing cannot be a protected object. Clearly Foo,
Bar and Baz resynchronize themselves on Thing after updating its parts. But
the compiler cannot know this. It also does not know that the updates do
not influence each other. It does not know that the state of Thing is
invalid until resynchronization. So it will serialize alternatives on write
access to Thing. (I cannot imagine a use case where Foo, Bar and Baz would
be pure. There seems to always be a shared outcome which would block them.)
Further Thing should be locked for the outer world while Foo, Bar, Baz are
running. So the standard functionality of protected objects looks totally
wrong here.
Could Thing be composed of protected objects? That way updates
would be serialised but wouldn't necessarily block the other procedures.

Maybe the procedures are very slow, but only touch Thing at the end?
Couldn't they run concurrently, and be serialised in an arbitrary order
at the end?

Nothing in this problem is different from the issues of doing it with
separate tasks. So why is this any more problematic?

The semantics I want permit serial execution in any order. And permit
operation even with a very large number of parallel statements in
effect. Imagine a recursive call with each level having many parallel
statements. Creating a task for each directly would probably break.
Something like an FFT, for example. FFT the upper and lower halves
of Thing in parallel. Combine serially.

Exception semantics would probably differ. Any statement excepting
would stop all other par statements(?)

The compiler should be able to generate code which generates a
reasonable number of threads, depending on the hardware being used.

I'm looking for something like Cilk, but even the concurrent loop
(JPR's for I in all 1 .. n loop?) would be a help.

Maybe, just a guess, the functional decomposition rather than statements
could be more appropriate here. The alternatives would access their
arguments by copy-in and resynchronize by copy-out.
Maybe you're right. But I can't see how to glue this in with
Ada (or VHDL) semantics.
--
Adrian
 
"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
If par is a sugar for this, then Thing might easily get corrupted. The
problem with such par is that the rules of nesting and visibility for the
statements, which are otherwise safe, become very dangerous in the case of
par.

Another problem is that Thing cannot be a protected object.
I am somewhat rusty on my Ada tasking knowledge, but why can't Thing be a
protected object?

It seems to me that is precisely the kind of synchronization control mechanism
you want to be able to have here.
--
Cheers,                        The Rhythm is around me,
                               The Rhythm has control.
Ray Blaak                      The Rhythm is inside me,
rAYblaaK@STRIPCAPStelus.net    The Rhythm has my soul.
 
Ray Blaak wrote:
I am somewhat rusty on my Ada tasking knowledge, but why can't Thing be a
protected object?
I don't think this is true. Thing can be a protected object and passed
to some procedures. No problem here I would say and probably the right
approach.

It seems to me that is precisely the kind of synchronization control mechanism
you want to be able to have here.
Agreed.

Pascal.

--

--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595
 
On Sat, 03 Mar 2007 00:00:52 GMT, Dr. Adrian Wrigley wrote:

On Fri, 02 Mar 2007 17:32:26 +0100, Dmitry A. Kazakov wrote:

On Fri, 02 Mar 2007 11:36:22 GMT, Dr. Adrian Wrigley wrote:

On Thu, 01 Mar 2007 14:57:01 +0100, Dmitry A. Kazakov wrote:

On Thu, 01 Mar 2007 11:22:32 GMT, Dr. Adrian Wrigley wrote:

Roughly equivalent to doing the same operations in three separate
tasks. Thing could be a protected object, if concurrent writes
are prohibited. Seems simple enough!

This is a very different variant:

declare
   Thing : X;
begin
   declare -- par
      task Alt_1; task Alt_2; task Alt_3;
      task body Alt_1 is
      begin
         Foo (Thing);
      end Alt_1;
      task body Alt_2 is
      begin
         Bar (Thing);
      end Alt_2;
      task body Alt_3 is
      begin
         Baz (Thing);
      end Alt_3;
   begin
      null;
   end; -- par
end;

If par is a sugar for this, then Thing might easily get corrupted. The
problem with such par is that the rules of nesting and visibility for the
statements, which are otherwise safe, become very dangerous in the case of
par.

This is what I was thinking.

Syntax might be even simpler:
declare
   Thing : X;
begin par
   Foo (Thing);
   Bar (Thing);
   Baz (Thing);
end par;

Thing won't get corrupted if the programmer knows what they're doing!
Surely, but it becomes a pitfall for those who don't. The construct is
inherently unsafe, because it makes no sense without some mutable Thing
or equivalent. This mutable thing is accessed unsafely, or else concurrency
gets killed.

In the case of pure functions, there is "obviously" no problem:

declare
   Thing : X := InitThing;
begin par
   A1 := Foo (Thing);
   A2 := Bar (Thing);
   A3 := Baz (Thing);
end par;
return A1 + A2 + A3;

In the case of procedures, there are numerous reasonable uses.
Perhaps the three procedures read Thing, and output three separate files.
Or maybe they write different parts of Thing. Maybe they validate
different properties of Thing, and raise an exception if a fault is found.
Perhaps they update statistics stored in a protected object, not shown.

The most obvious case is if the procedures are called on different
objects. Next most likely is if they are pure functions.
The problem is that there always exists the final "A1+A2+A3", whose
semantics is in question. The alternatives resynchronize on "A1+A2+A3" and
I see no obvious way to express this. A PAR statement would not really help
to decompose it.

(What you have done is replace the mutable Thing with the mutable set {A1,A2,A3}.
Rename {A1,A2,A3} to Thing and the problem is still there.)

Another problem is that Thing cannot be a protected object. Clearly Foo,
Bar and Baz resynchronize themselves on Thing after updating its parts. But
the compiler cannot know this. It also does not know that the updates do
not influence each other. It does not know that the state of Thing is
invalid until resynchronization. So it will serialize alternatives on write
access to Thing. (I cannot imagine a use case where Foo, Bar and Baz would
be pure. There seems to always be a shared outcome which would block them.)
Further Thing should be locked for the outer world while Foo, Bar, Baz are
running. So the standard functionality of protected objects looks totally
wrong here.

Could Thing be composed of protected objects? That way updates
would be serialised but wouldn't necessarily block the other procedures.
That could be a "hierarchical" mutex. But mutexes are themselves very
low-level. The unsafety would still be there; it would just show itself as
deadlocks rather than as corrupted data.

Maybe the procedures are very slow, but only touch Thing at the end?
Couldn't they run concurrently, and be serialised in an arbitrary order
at the end?
That is the key issue, IMO. The ability to chop large chunks, when the
procedures run independently most of the time, into independent and
serialized parts is what the decomposition is all about...

Nothing in this problem is different from the issues of doing it with
separate tasks. So why is this any more problematic?
Because tasks additionally have safe synchronization and data exchange
mechanisms, while PAR should rely on inherently unsafe memory sharing.

The semantics I want permit serial execution in any order. And permit
operation even with a very large number of parallel statements in
effect. Imagine a recursive call with each level having many parallel
statements. Creating a task for each directly would probably break.
Something like an FFT, for example. FFT the upper and lower halves
of Thing in parallel. Combine serially.
Yes, and the run-time could assign the worker tasks from some pool,
fully transparently to the program. That would be very cool.

Exception sematics would probably differ. Any statement excepting
would stop all other par statements(?)
But not by abort; rather it should wait for the next synchronization point
and propagate an exception out of there, so that the alternatives might
clean up the temporary objects they create. (The synchronization points
could be explicit, for example when an alternative calls an entry
or procedure of a shared thing.)

The compiler should be able to generate code which generates a
reasonable number of threads, depending on the hardware being used.
Yes

I'm looking for something like Cilk, but even the concurrent loop
(JPR's for I in all 1 .. n loop?) would be a help.

Maybe, just a guess, the functional decomposition rather than statements
could be more appropriate here. The alternatives would access their
arguments by copy-in and resynchronize by copy-out.

Maybe you're right. But I can't see how to glue this in with
Ada (or VHDL) semantics.
That is the most difficult part! :)-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
 
On Sat, 03 Mar 2007 01:58:35 GMT, Ray Blaak wrote:

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
If par is a sugar for this, then Thing might easily get corrupted. The
problem with such par is that the rules of nesting and visibility for the
statements, which are otherwise safe, become very dangerous in the case of
par.

Another problem is that Thing cannot be a protected object.

I am somewhat rusty on my Ada tasking knowledge, but why can't Thing be a
protected object?
I tried to explain it in my previous post.

When Thing is a protected object, the procedures and entries of it
called from the concurrent alternatives are all mutually exclusive. This is
not the semantics expected from PAR. Probably it would be better to rewrite
it as:

declare
   Thing : X;
begin
   par -- Though this appears concurrent, it is not
      Thing.Foo;
   and
      Thing.Bar;
   and
      Thing.Baz;
   end par;
end;

It seems to me that is precisely the kind of synchronization control mechanism
you want to be able to have here.
No. The implied semantics of PAR is such that Thing should be accessed from
alternatives without interlocking because one *suggests* that the updates
are mutually independent. When Thing is visible from outside it should be
blocked by PAR for everyone else. This is not the behaviour of a protected
object. It is rather a "hierarchical" mutex.
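To make the point concrete, here is a minimal sketch of Thing as a protected object (a declarative-part fragment; the component names and bodies are invented). Each of the three calls seizes the object's lock, so even when issued from three different tasks they execute strictly one at a time:

protected type X is
   procedure Foo;
   procedure Bar;
   procedure Baz;
private
   Part_A, Part_B, Part_C : Integer := 0;
end X;

protected body X is
   procedure Foo is
   begin
      Part_A := Part_A + 1;
   end Foo;
   procedure Bar is
   begin
      Part_B := Part_B + 1;
   end Bar;
   procedure Baz is
   begin
      Part_C := Part_C + 1;
   end Baz;
end X;

Thing : X;
--  Thing.Foo, Thing.Bar and Thing.Baz are mutually exclusive:
--  safe, but with none of the overlap PAR is supposed to express.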

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
 
On Sat, 3 Mar 2007 12:00:08 +0100, "Dmitry A. Kazakov"
<mailbox@dmitry-kazakov.de> wrote:

Because tasks additionally have safe synchronization and data exchange
mechanisms, while PAR should rely on inherently unsafe memory sharing.
The PAR that I'm familiar with (from CSP/occam) most certainly does
*not* have "inherently unsafe memory sharing". There seems to be
an absurd amount of wheel-reinvention going on in this thread.

The semantics I want permit serial execution in any order. And permit
operation even with a very large number of parallel statements in
effect. Imagine a recursive call with each level having many parallel
statements. Creating a task for each directly would probably break.
Something like an FFT, for example. FFT the upper and lower halves
of Thing in parallel. Combine serially.

Yes, and the run-time could assign the worker tasks from some pool,
fully transparently to the program. That would be very cool.
And easy to do, and done many times before.

The compiler should be able to generate code which generates a
reasonable number of threads, depending on the hardware being used.

Yes
For heaven's sake... You have a statically-determinable number of
processors. It's your (or your compiler's) choice whether each of
those processors runs a single thread, or somehow runs multiple
threads. If each processor is entitled to run multiple threads, then
there's no reason why the number and structure of cooperating
threads should not be dynamically variable. If you choose to run
one thread on each processor, your thread structure is similarly
static. Hardware people have been obliged to think about this
kind of thing for decades. Software people seem to have a
pretty good grip on it too, if the textbooks and papers I've read
are anything to go by. Why is it suddenly such a big deal?

Maybe you're right. But I can't see how to glue this in with
Ada (or VHDL) semantics.
In VHDL, a process represents a single statically-constructed
thread. It talks to its peers in an inherently safe way through
signals. With this mechanism, together with dynamic memory
allocation, you can easily fake-up whatever threading regime
takes your fancy. You probably wouldn't bother because
there are more convenient tools to do such things in software
land, but it can be done. In hardware you can do exactly the
same thing, but one (or more) of your processes must then
take responsibility for emulating the dynamic memory allocation,
carving up some real static physical memory according to
whatever strategy you choose to implement.

That is the most difficult part! :)-))
Maybe. But then again, maybe organising the structure of
the actual application is the most difficult part, and this
vapid rambling about things that are already well-understood
is actually rather straightforward.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
Jonathan Bromley wrote:

On Sat, 3 Mar 2007 12:00:08 +0100, "Dmitry A. Kazakov"
<mailbox@dmitry-kazakov.de> wrote:

The compiler should be able to generate code which generates a
reasonable number of threads, depending on the hardware being used.

Yes

For heaven's sake... You have a statically-determinable number of
processors. It's your (or your compiler's) choice whether each of
those processors runs a single thread, or somehow runs multiple
threads. If each processor is entitled to run multiple threads, then
there's no reason why the number and structure of cooperating
threads should not be dynamically variable. If you choose to run
one thread on each processor, your thread structure is similarly
static. Hardware people have been obliged to think about this
kind of thing for decades. Software people seem to have a
pretty good grip on it too, if the textbooks and papers I've read
are anything to go by. Why is it suddenly such a big deal?

Not disagreeing with most of what you're saying, but I do feel the need to
point out the existence of systems with hotpluggable CPUs. Sun and IBM have
both sold systems for some years where CPUs can be added and removed at
runtime; software is expected to just cope with this.

Also, in the software domain there is a cost to switching between different
threads. Thus, in software, the aim is to limit the number of runnable
threads to the number of active CPUs. If there are more threads runnable
than CPUs available, some CPU time is wasted switching between threads,
which is normally undesirable behaviour.
--
Simon Farnsworth
 
On Sat, 03 Mar 2007 11:27:50 +0000, Jonathan Bromley wrote:

On Sat, 3 Mar 2007 12:00:08 +0100, "Dmitry A. Kazakov"
<mailbox@dmitry-kazakov.de> wrote:

Because tasks additionally have safe synchronization and data exchange
mechanisms, while PAR should rely on inherently unsafe memory sharing.

The PAR that I'm familiar with (from CSP/occam) most certainly does
*not* have "inherently unsafe memory sharing". There seems to be
an absurd amount of wheel-reinvention going on in this thread.
I think reinvention is necessary. Whatever "par" semantics was
in Occam is not available in Ada (or C, C++, Perl or whatever).
It was considered useful then - bring it back!

The semantics I want permit serial execution in any order. And permit
operation even with a very large number of parallel statements in
effect. Imagine a recursive call with each level having many parallel
statements. Creating a task for each directly would probably break.
Something like an FFT, for example. FFT the upper and lower halves
of Thing in parallel. Combine serially.

Yes, and the run-time could assign the worker tasks from some pool,
fully transparently to the program. That would be very cool.

And easy to do, and done many times before.
How do you do this in Ada? Or VHDL? It's been done many times
before, yes, but not delivered in any currently usable form for
the general programmer :( It's not in any mainstream language I know.
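It can at least be approximated in plain Ada, though not expressed as a language feature: a fixed set of worker tasks pulling jobs from a protected queue. A rough sketch follows (the package and all names are invented; no overflow check, no result passing, no termination, LIFO order):

package Simple_Pool is
   type Job is access procedure;          -- a parameterless unit of work
   procedure Submit (J : Job);
end Simple_Pool;

package body Simple_Pool is

   Max_Pending : constant := 100;

   protected Queue is
      procedure Put (J : in Job);
      entry Get (J : out Job);
   private
      Pending : array (1 .. Max_Pending) of Job;
      Count   : Natural := 0;
   end Queue;

   protected body Queue is
      procedure Put (J : in Job) is
      begin
         Count := Count + 1;              -- sketch only: no overflow check
         Pending (Count) := J;
      end Put;

      entry Get (J : out Job) when Count > 0 is
      begin
         J := Pending (Count);
         Count := Count - 1;
      end Get;
   end Queue;

   task type Worker;

   task body Worker is
      J : Job;
   begin
      loop
         Queue.Get (J);                   -- block until work is available
         J.all;                           -- run the job
      end loop;
   end Worker;

   Workers : array (1 .. 4) of Worker;    -- "a reasonable number" is simply fixed here

   procedure Submit (J : Job) is
   begin
      Queue.Put (J);
   end Submit;

end Simple_Pool;

Submitting work is then just Simple_Pool.Submit (Some_Procedure'Access) for any library-level parameterless procedure; the number of threads stays fixed at four however many parallel pieces of work are outstanding, which is roughly what the PAR discussion wants the compiler and run-time to do automatically.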

The compiler should be able to generate code which generates a
reasonable number of threads, depending on the hardware being used.

Yes

For heaven's sake... You have a statically-determinable number of
processors. It's your (or your compiler's) choice whether each of
those processors runs a single thread, or somehow runs multiple
threads. If each processor is entitled to run multiple threads, then
there's no reason why the number and structure of cooperating
threads should not be dynamically variable. If you choose to run
one thread on each processor, your thread structure is similarly
Of course. But how do I make this choice with the OSs and languages
of today? "nice" doesn't seem to be able to control
this when code is written in Ada or VHDL. Nor is it defined
anywhere in the source code.

static. Hardware people have been obliged to think about this
kind of thing for decades. Software people seem to have a
pretty good grip on it too, if the textbooks and papers I've read
are anything to go by. Why is it suddenly such a big deal?
It's been a big deal for a long time as far as I'm concerned.
It's not a matter of "invention" mostly, but one of availability
and standards. There is no means in Ada to say "run this in
a separate task, if appropriate". Only a few, academic
and experimental tools offer the flexibility. Papers /= practice.

Maybe you're right. But I can't see how to glue this in with
Ada (or VHDL) semantics.

In VHDL, a process represents a single statically-constructed
thread. It talks to its peers in an inherently safe way through
signals. With this mechanism, together with dynamic memory
allocation, you can easily fake-up whatever threading regime
takes your fancy. You probably wouldn't bother because
there are more convenient tools to do such things in software
land, but it can be done.
I'm not sure what you're talking about here. Do you mean like any/all of
Split-C, Cilk, C*, ZPL, HPF, F, data-parallel C, MPI-1, MPI-2, OpenMP,
ViVA, MOSIX, PVM, SVM, Paderborn BSP, Oxford BSP toolset and IBM's TSpaces?

Specifying and using fine-grain parallelism requires language,
compiler and hardware support, I think.

Consider:
begin par
   x := sin(theta);
   y := cos(theta);
end par;

you probably *do* want to create a new thread, if thread creation
and destruction is much faster than the function calls. You don't
know this at compile-time, because this depends on the library in use,
and the actual parameters. Maybe X, Y are of dynamically allocated
length (multi-precision).

You can't justify designing hardware with very short thread
creation/destruction times, unless the software can be written
to take advantage. But none of the mainstream languages
allow fine grain reordering and concurrency to be specified.
That's the Catch-22 that Inmos/Occam solved. Technically.

The need is emerging again, now that more threads on a chip are
easier to get than a higher sequential instruction rate.

In hardware you can do exactly the
same thing, but one (or more) of your processes must then
take responsibility for emulating the dynamic memory allocation,
carving up some real static physical memory according to
whatever strategy you choose to implement.

That is the most difficult part! :)-))

Maybe. But then again, maybe organising the structure of
the actual application is the most difficult part, and this
This is sometimes true.

vapid rambling about things that are already well-understood
is actually rather straightforward.
Somewhere our models don't mesh. What is "straightforward" to
you is "impossible" for me. What syntax do I use, and which
compiler, OS and processor do I need to specify and exploit
fine-grain concurrency?

In 1987, the answers were "par", Occam, Transputer. Twenty
years later, Ada (or VHDL, C++, C#), Linux (or Windows), Niagara
(or Tukwila, XinC, ClearSpeed, Cell) do not offer us anything
remotely similar. In fact, in twenty years, things have
got worse :(
--
Adrian
 
On Sat, 03 Mar 2007 12:12:10 +0000, Simon Farnsworth wrote:

Jonathan Bromley wrote:

On Sat, 3 Mar 2007 12:00:08 +0100, "Dmitry A. Kazakov"
<mailbox@dmitry-kazakov.de> wrote:

The compiler should be able to generate code which generates a
reasonable number of threads, depending on the hardware being used.

Yes

For heaven's sake... You have a statically-determinable number of
processors. It's your (or your compiler's) choice whether each of
those processors runs a single thread, or somehow runs multiple
threads. If each processor is entitled to run multiple threads, then
there's no reason why the number and structure of cooperating
threads should not be dynamically variable. If you choose to run
one thread on each processor, your thread structure is similarly
static. Hardware people have been obliged to think about this
kind of thing for decades. Software people seem to have a
pretty good grip on it too, if the textbooks and papers I've read
are anything to go by. Why is it suddenly such a big deal?

Not disagreeing with most of what you're saying, but I do feel the need to
point out the existence of systems with hotpluggable CPUs. Sun and IBM have
both sold systems for some years where CPUs can be added and removed at
runtime; software is expected to just cope with this.

Also in the software domain; there is a cost to switching between different
threads. Thus, in software, the aim is to limit the number of runnable
threads to the number of active CPUs. If there are more threads runnable
than CPUs available, some CPU time is wasted switching between threads,
which is normally undesirable behaviour.
This is part of the problem. Parallelism has to be *inhibited* by
explicit serialisation, to limit the number of threads created.

So a construct like (in Ada):

for I in Truth'Range loop
   Z := Z xor Truth (I);
end loop;

deliberately forces a serial execution order, even though we
know that the order does not matter at all in this case.

There is no effective construct to permit but not require concurrency.

The software compiler can't sensibly parallelise this because:

 - the semantics of xor may be unknown (overloaded), and unsuitable;
 - the execution time of each iteration is much smaller than thread
   start/stop time;
 - too many parallel threads would be created.

So we're left with source code which implies a non-existent serialisation
constraint.

If the "for I in all..." construct were in the language, we'd be
able to say "I don't care about the order", and permitting concurrency,
even if the result weren't identical (eg when using floating point)
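Without such a construct, the parallel version has to be spelled out by hand, for instance by splitting the range between two explicit tasks and combining the partial results serially, along the lines of the FFT upper/lower-half idea above. A sketch (Truth and Z stand in for the objects in the loop; the stand-in data and the even split are simplifications):

declare
   Truth : constant array (1 .. 1000) of Boolean := (others => True);  -- stand-in data
   Z     : Boolean := False;
   Lower : Boolean := False;
   Upper : Boolean := False;
   Mid   : constant Integer := Truth'First + Truth'Length / 2 - 1;
begin
   declare
      task Low_Half;
      task High_Half;
      task body Low_Half is
      begin
         for I in Truth'First .. Mid loop
            Lower := Lower xor Truth (I);
         end loop;
      end Low_Half;
      task body High_Half is
      begin
         for I in Mid + 1 .. Truth'Last loop
            Upper := Upper xor Truth (I);
         end loop;
      end High_Half;
   begin
      null;  -- both halves must terminate before the block exits
   end;
   Z := Lower xor Upper;  -- combine serially; for xor the order is irrelevant
end;

All the scaffolding around the two loop bodies is exactly the bookkeeping a "for I in all" would make unnecessary.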

Numerous algorithms in simulation are "embarrassingly parallel",
but this fact is completely and deliberately obscured from compilers.
Compilers can't normally generate fine-scale threaded code because
the applications don't specify it, the languages don't support it,
and the processors don't need it. But the technical opportunity
is real. It won't happen until the deadlock between compilers, software
and processors is broken.
--
Adrian
 
On Sat, 03 Mar 2007 13:40:16 GMT, "Dr. Adrian Wrigley"
<amtw@linuxchip.demon.co.uk.uk.uk> wrote:

What syntax do I use, and which
compiler, OS and processor do I need to specify and exploit
fine-grain concurrency?

In 1987, the answers were "par", Occam, Transputer. Twenty
years later, Ada (or VHDL, C++, C#), Linux (or Windows), Niagara
(or Tukwila, XinC, ClearSpeed, Cell) do not offer us anything
remotely similar. In fact, in twenty years, things have
got worse :(
Absolutely right. And whose fault is that? Not the academics,
who have understood this for decades. Not the hardware people
like me, who of necessity must understand and exploit massive
fine-grained parallelism (albeit with a static structure). No,
it's the programmer weenies with their silly nonsense about
threads being inefficient.

Glad to have got that off my chest :) But it's pretty frustrating
to be told that parallel programming's time has come, when
I spent a decade and a half trying to persuade people that it
was worth even thinking about and being told that it was
irrelevant.

For the numerical-algorithms people, I suspect the problem of
inferring opportunities for parallelism is nearer to being solved
than some might imagine. There are tools around that
can convert DSP-type algorithms (such as the FFT that's
already been mentioned) into hardware that's inherently
parallel; there are behavioural synthesis tools that allow
you to explore the various possible parallel vs. serial
possibilities for scheduling a computation on heterogeneous
hardware. It's surely a small step from that to distributing
such a computation across multiple threads or CPUs. All
that's needed is the will.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
On Sat, 03 Mar 2007 15:26:35 +0000, Jonathan Bromley wrote:

On Sat, 03 Mar 2007 13:40:16 GMT, "Dr. Adrian Wrigley"
<amtw@linuxchip.demon.co.uk.uk.uk> wrote:

What syntax do I use, and which
compiler, OS and processor do I need to specify and exploit
fine-grain concurrency?

In 1987, the answers were "par", Occam, Transputer. Twenty
years later, Ada (or VHDL, C++, C#), Linux (or Windows), Niagara
(or Tukwila, XinC, ClearSpeed, Cell) do not offer us anything
remotely similar. In fact, in twenty years, things have
got worse :(

Absolutely right. And whose fault is that? Not the academics,
who have understood this for decades. Not the hardware people
like me, who of necessity must understand and exploit massive
fine-grained parallelism (albeit with a static structure). No,
it's the programmer weenies with their silly nonsense about
threads being inefficient.
By the way... I am a satisfied customer of yours (from 1994).

If there is any blame to share, I place it upon the language
designers who don't include the basics of concurrency (and
I include Ada, which has no parallel loops, statements or function
calls. Nor decent pure functions).

I do hardware, processor and software design. But I'm not
keen on trying to fix-up programming languages, compilers
and processors so they mesh better. (Unless someone pays me!)

Glad to have got that off my chest :) But it's pretty frustrating
to be told that parallel programming's time has come, when
(I'm not saying this - so don't be frustrated! What I'm saying
is that multithreading has become "buzzword compliant" again,
so maybe there's an opportunity to exploit it to address longstanding
technical deficiencies and rebrand Ada and/or VHDL)

I spent a decade and a half trying to persuade people that it
was worth even thinking about and being told that it was
irrelevant.
Parallel programming's time hasn't quite arrived :(
But it's only 3-5 years away! Still. (like flying cars,
fusion power and flat screens, which never seem to get
nearer. {Oh. tick off flat screens!})

For the numerical-algorithms people, I suspect the problem of
inferring opportunities for parallelism is nearer to being solved
than some might imagine. There are tools around that
can convert DSP-type algorithms (such as the FFT that's
already been mentioned) into hardware that's inherently
Again, this is ages old now. But it can't convert
C-type programs reliably and efficiently.

parallel; there are behavioural synthesis tools that allow
you to explore the various possible parallel vs. serial
possibilities for scheduling a computation on heterogeneous
hardware. It's surely a small step from that to distributing
such a computation across multiple threads or CPUs. All
that's needed is the will.
A small step. Like from Apollo 11.

Once the language/software/compiler/processor deadlock is broken,
things will move rapidly. Give it another 15 years, and we might
be half way there.

Glad to see that we're not so far apart as I thought!
--
Adrian
 
Dr. Adrian Wrigley wrote:
Numerous algorithms in simulation are "embarrassingly parallel",
but this fact is completely and deliberately obscured from compilers.
Not a big problem. If the algorithms are "embarrassingly parallel" then
the jobs are fully independent. In this case it is quite simple:
create as many tasks as you have processors. No big deal. Each task
will compute a specific job. Ada has no problem with "embarrassingly
parallel" jobs.

What I have not yet understood is why people are trying to solve, in
all cases, the parallelism at the lowest level. Trying to parallelize an
algorithm in an "embarrassingly parallel" context is losing precious
time. Many real-case simulations have billions of those algorithms to
compute on multiple data; just create a set of tasks to compute
multiple of those algorithms in parallel. Easier and as effective.

In other words, what I'm saying is that in some cases ("embarrassingly
parallel" computation is one of them) it is easier to do n computations
in n tasks than n x (1 parallel computation in n tasks), and the overall
performance is better.

Pascal.

--

--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595
 
On Sat, 03 Mar 2007 18:28:18 +0100, Pascal Obry wrote:

Dr. Adrian Wrigley wrote:
Numerous algorithms in simulation are "embarrassingly parallel",
but this fact is completely and deliberately obscured from compilers.

Not a big problem. If the algorithms are "embarrassingly parallel" then
the jobs are fully independent. In this case it is quite simple:
create as many tasks as you have processors. No big deal. Each task
will compute a specific job. Ada has no problem with "embarrassingly
parallel" jobs.

What I have not yet understood is why people are trying to solve, in
all cases, the parallelism at the lowest level. Trying to parallelize an
algorithm in an "embarrassingly parallel" context is losing precious
time. Many real-case simulations have billions of those algorithms to
compute on multiple data; just create a set of tasks to compute
multiple of those algorithms in parallel. Easier and as effective.

In other words, what I'm saying is that in some cases ("embarrassingly
parallel" computation is one of them) it is easier to do n computations
in n tasks than n x (1 parallel computation in n tasks), and the overall
performance is better.
The idea (of PAR etc.) is IMO quite the opposite. It is about treating
parallelism as a compiler optimization problem rather than as a part of
the domain. In the simplest possible form it can be illustrated with
Ada's "or" and "or else". While the former is potentially parallel, it has
zero overhead compared to the sequential "or else" (I don't
count the time required to evaluate the operands). If we compare that with
the overhead of creating tasks, we see a huge difference both in terms
of CPU cycles and mental effort.
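A tiny illustration of the difference (Heavy_Test_A, Heavy_Test_B and N are hypothetical, and nothing obliges a compiler to actually evaluate the unordered form in parallel):

if Heavy_Test_A (N) or Heavy_Test_B (N) then
   --  "or": both operands are evaluated, in an unspecified order,
   --  so a compiler could in principle evaluate them concurrently
   null;
end if;

if Heavy_Test_A (N) or else Heavy_Test_B (N) then
   --  "or else": Heavy_Test_B runs only if Heavy_Test_A returned False,
   --  and strictly after it
   null;
end if;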

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
 
