Clock Edge notation

Dmitry A. Kazakov wrote:

The idea (of PAR etc.) is IMO quite the opposite. It is about treating
parallelism as a compiler optimization problem rather than as a part of
the domain. In the simplest possible form it can be illustrated with the
example of Ada's "or" and "or else." While the former is potentially
parallel, it has zero overhead compared to the sequential "or else" (I don't
count the time required to evaluate the operands). If we compare it with
the overhead of creating tasks, we will see a huge difference, both in terms
of CPU cycles and mental effort.
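A minimal sketch of the "or" versus "or else" contrast referred to above; the predicate functions, and the delay standing in for an expensive operand, are hypothetical:

with Ada.Text_IO; use Ada.Text_IO;

procedure Or_Demo is
   --  Hypothetical predicates; the delay stands in for an expensive operand.
   function Cheap (X : Integer) return Boolean is
   begin
      return X > 0;
   end Cheap;

   function Costly (X : Integer) return Boolean is
   begin
      delay 0.01;
      return X mod 2 = 0;
   end Costly;

   X : constant Integer := 7;
begin
   --  "or": both operands are evaluated, in no prescribed order, so a
   --  compiler is in principle free to evaluate them in parallel.
   if Cheap (X) or Costly (X) then
      Put_Line ("""or""      => True");
   end if;

   --  "or else": strictly sequential; Costly is never called once Cheap
   --  has already yielded True.
   if Cheap (X) or else Costly (X) then
      Put_Line ("""or else"" => True");
   end if;
end Or_Demo;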
I don't buy this :) You don't have to create tasks for every
computation. You put in place a producer/consumer model. A task prepares
the data and puts it into a list (a protected object), and you have a set
of tasks to consume those jobs. This works in many cases and requires
creating the tasks only once (not as bad as OpenMP, which creates threads
for each parallel computation).
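A minimal sketch of that worker-pool idea in Ada; the Job type, the buffer size, the pool size and the sentinel used to stop the workers are all hypothetical:

procedure Worker_Pool is

   type Job is new Integer;
   Stop : constant Job := Job'First;   --  sentinel telling a worker to quit

   --  The "list" mentioned above: a protected FIFO shared by all tasks.
   protected Queue is
      entry Put (J : in Job);
      entry Get (J : out Job);
   private
      Buffer : array (1 .. 32) of Job;
      First  : Positive := 1;
      Count  : Natural  := 0;
   end Queue;

   protected body Queue is
      entry Put (J : in Job) when Count < Buffer'Length is
      begin
         Buffer (((First - 1 + Count) mod Buffer'Length) + 1) := J;
         Count := Count + 1;
      end Put;

      entry Get (J : out Job) when Count > 0 is
      begin
         J := Buffer (First);
         First := (First mod Buffer'Length) + 1;
         Count := Count - 1;
      end Get;
   end Queue;

   --  The consumers are created once and reused for every job.
   task type Consumer;

   task body Consumer is
      J : Job;
   begin
      loop
         Queue.Get (J);
         exit when J = Stop;
         --  ... do the actual computation on J here ...
      end loop;
   end Consumer;

   Workers : array (1 .. 4) of Consumer;   --  e.g. one task per processor

begin
   for I in 1 .. 20 loop
      Queue.Put (Job (I));                 --  the producer just posts jobs
   end loop;
   for W in Workers'Range loop
      Queue.Put (Stop);                    --  one sentinel per worker
   end loop;
end Worker_Pool;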

Pascal.

--

--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595
 
On Sat, 03 Mar 2007 19:31:24 +0100, Pascal Obry wrote:

Dmitry A. Kazakov wrote:

The idea (of PAR etc.) is IMO quite the opposite. It is about treating
parallelism as a compiler optimization problem rather than as a part of
the domain. In the simplest possible form it can be illustrated with the
example of Ada's "or" and "or else." While the former is potentially
parallel, it has zero overhead compared to the sequential "or else" (I don't
count the time required to evaluate the operands). If we compare it with
the overhead of creating tasks, we will see a huge difference, both in terms
of CPU cycles and mental effort.

I don't buy this :)
Well, maybe I don't buy it either... :)-)) Nevertheless, it is a very
challenging and intriguing idea.

You don't have to create tasks for every computation.
(On some futuristic hardware tasks could become cheaper than memory and
arithmetic computations.)

You put in place a producer/consumer model. A task prepares
the data and puts it into a list (a protected object), and you have a set
of tasks to consume those jobs. This works in many cases and requires
creating the tasks only once (not as bad as OpenMP, which creates threads
for each parallel computation).
Ah, but a publisher/subscriber framework is itself a solution to some
problem, and that problem is not a domain problem. If you had a distributed middleware
you would not care about publishers and subscribers. You would simply
assign/read a variable controlled by the middleware. Interlocking,
marshalling and so on would happen transparently.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
 
"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
On Sat, 03 Mar 2007 01:58:35 GMT, Ray Blaak wrote:
I am somewhat rusty on my Ada tasking knowledge, but why can't Thing be a
protected object?

I tried to explain it in my previous post.

When Thing is a protected object, then its procedures and entries,
called from the concurrent alternatives, are all mutually exclusive. This is
not the semantics expected from PAR. Probably it would be better to rewrite
it as:
PAR only says that all of its statements run in parallel, nothing more, nothing
less (e.g. equivalent to the task bodies you had around each statement before).

Those statements can themselves access synchronization and blocking controls
that affect their execution patterns.

No. The implied semantics of PAR is such that Thing should be accessed from
alternatives without interlocking because one *suggests* that the updates
are mutually independent.
The updates are independent only if their behaviour truly is independent. If
they access a shared synchronization control then by definition they are
mutually dependent.

It is not PAR that dictates this, but rather the statements themselves.

PAR would only be convenience shorthand for writing task bodies around each
statement.
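A minimal sketch of that shorthand written out by hand, assuming two independent statements over independent variables (the names are hypothetical):

procedure Par_Demo is
   X : Integer := 1;
   Y : Integer := 2;
begin
   --  A hypothetical "PAR  X := X + 1;  Y := Y * 2;  END PAR" spelled out
   --  the long way: one task body wrapped around each statement.
   declare
      task A;
      task B;

      task body A is
      begin
         X := X + 1;   --  first parallel statement
      end A;

      task body B is
      begin
         Y := Y * 2;   --  second parallel statement
      end B;
   begin
      null;   --  the block does not exit until both A and B have terminated
   end;
   --  here both statements have completed
end Par_Demo;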

When Thing is visible from outside it should be
blocked by PAR for everyone else. This is not the behaviour of a protected
object. It is rather a "hierarchical" mutex.
The behaviour of a protected object is defined by its entries and how it is
used.

--
Cheers, The Rhythm is around me,
The Rhythm has control.
Ray Blaak The Rhythm is inside me,
rAYblaaK@STRIPCAPStelus.net The Rhythm has my soul.
 
On Sat, 03 Mar 2007 18:28:18 +0100, Pascal Obry wrote:

Dr. Adrian Wrigley a écrit :
Numerous algorithms in simulation are "embarrassingly parallel",
but this fact is completely and deliberately obscured from compilers.

Not a big problem. If the algorithms are "embarrassingly parallel" then
the jobs are fully independent. In this case that is quite simple,
They aren't independent in terms of cache use! They may also have
common subexpressions, which independent treatment re-evaluates.

create as many tasks as you have processors. No big deal. Each task
will compute a specific job. Ada has no problem with "embarrassingly
parallel" jobs.
A problem is that it breaks the memory bandwidth budget. This
approach is tricky with large numbers of processors. And even more
challenging with hardware synthesis.

What I have not yet understood is why people are trying to solve, in
all cases, the parallelism at the lowest level. Trying to parallelize an
algorithm in an "embarrassingly parallel" context is losing precious
time.
You need to parallelise at the lowest level to take advantage of
hardware synthesis. For normal threads a somewhat higher level
is desirable. For multiple systems on a network, a high level
is needed.

What I want in a language is the ability to specify when things
must be evaluated sequentially, and when it doesn't matter
(even if the result of changing the order may differ).

Many real-case simulations have billions of those algorithms to
compute on multiple data; just create a set of tasks to compute several
of those algorithms in parallel. Easier and as effective.
Reasonable for compilers and processors as they are designed now.
Even so, it can be challenging to take advantage of shared
calculations and to respect memory capacity and bandwidth limitations.

But useless for hardware synthesis. Or automated partitioning
software. Or generating system diagrams from code.

Manual partitioning into tasks and sequential code segments is
something which is not part of the problem domain, but part
of the solution domain. It implies a multiplicity of sequentially
executing process threads.

Using concurrent statements in the source code is not the same thing
as "trying to parallelise an algorithm". It doesn't lose any
precious execution time. It simply informs the reader and the
compiler that the order of certain actions isn't considered relevant.
The compiler can take some parts of the source and convert them to
a netlist for an ASIC or FPGA. Other parts could be broken
down into threads. Or maybe parts could be passed to separate
computer systems on a network. Much of it could be ignored.
It is the compiler which tries to parallelise the execution.
Unlike tasks, where the programmer does try to parallelise.

Whose job is it to parallelise operations? Traditionally,
programmers try to specify exactly what sequence of operations is
to take place. And then the compiler does its best to shuffle
things around (limited). And the CPU tries to overlap data
fetch, calculation, address calculation by watching the
instruction sequence for concurrency opportunities.
Why do the work to force sequential operation if the
compiler and hardware are desperately trying to infer
concurrency?

In other words, what I'm saying is that in some cases ("embarrassingly
parallel" computation is one of them) it is easier to do n computations
in n tasks than n x (1 parallel computation in n tasks), and the overall
performance is better.
This is definitely the case. And it helps explain why parallelisation
is not a job for the programmer or the hardware designer, but for
the synthesis tool, OS, processor, compiler or run-time. Forcing
the programmer or hardware designer to hard-code a specific parallelism type
(threads), and a particular partitioning, while denying the expressiveness
of a concurrent language will result in inferior flexibility and
inability to map the problem onto certain types of solution.

If all the parallelism your hardware has is a few threads then all you
need to code for is tasks. If you want to be able to target FPGAs,
million-thread CPUs, ASICs and loosely coupled processor networks,
the Ada task model alone serves very poorly.

Perhaps mapping execution of a program onto threads or other
concurrent structures is like mapping execution onto memory.
It *is* possible to manage a processor with a small, fast memory,
mapped at a fixed address range. You use special calls to move
data to and from your main store, based on your own analysis of
how the memory access patterns will operate. But this approach
has given way to automated caches with dynamic mapping of
memory cells to addresses. And virtual memory. Trying to
manage tasks "manually", based on your hunches about task
coherence and work load, will surely give way to automatic
thread inference, creation and management based on the interaction
of thread management hardware and OS support. Building in
hunches about tasking to achieve parallelism can only be
a short-term solution.
--
Adrian
 
Dr. Adrian Wrigley wrote:

If all the parallelism your hardware has is a few threads then all you
need to code for is tasks. If you want to be able to target FPGAs,
million-thread CPUs, ASICs and loosely coupled processor networks,
the Ada task model alone serves very poorly.
Granted. I was talking about traditional hardware where OpenMP is used,
and I do not find this solution convincing in that context. It is true
that for massively parallel hardware things are different. But AFAIK
massively parallel machines (like IBM Blue Gene) all come with
different flavors of parallelism; I don't know if it is possible to have
a model to fit them all... I'm no expert on those anyway.

Pascal.

--

--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595
 
Marcus Harnisch <marcus.harnisch@gmx.net> posted on Fri, 02 Mar 2007
14:22:00 +0100:

"To be fair, of the examples posted, only in C the behavior is
actually undefined."


In Ada it could be useful to read the value of a variable which has
not been assigned a value in the source code, e.g. if the VOLATILE
pragma is used and the variable's memory location is directly
connected to the output of a temperature sensor. This is
similar to claiming that assigning a floating point number to a
variable of type INTEGER can be desired. It can be desired, but I
should be required to explicitly express my intent such as in
some_integer : integer := integer(floating_point_number);
instead of
some_integer : integer := floating_point_number;
which quite rightly is illegal.
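A minimal sketch of that sensor case in Ada; the package name and the address are hypothetical:

with System.Storage_Elements;

package Sensor is
   use System.Storage_Elements;

   --  Never assigned anywhere in the source code: the hardware drives it.
   Temperature : Integer;
   pragma Volatile (Temperature);
   for Temperature'Address use To_Address (16#FFFF_0000#);  --  hypothetical address
end Sensor;

Reading Sensor.Temperature is then both legal and meaningful, even though no assignment to it appears in the program text.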


"In VHDL [..] the variable *does* have an
initial value. It's not entirely obvious but at least you will
*always* get consistent results."


I was unaware of this. Could you please tell me more about this? I
have so far received no such response to
"Reading an undefined value",
HTTPS://Bugzilla.Mentor.com/show_bug.cgi?id=120
timestamped 2007-02-15 04:19.

Thanks in advance,
Colin Paul Gloster
 
On 5 Mar 2007 12:20:55 GMT, Colin Paul Gloster
<Colin_Paul_Gloster@ACM.org> wrote:

the variable *does* have an
initial value. It's not entirely obvious but at least you will
*always* get consistent results."


I was unaware of this. Could you please tell me more about this? I
have so far received no such response to
"Reading an undefined value",
HTTPS://Bugzilla.Mentor.com/show_bug.cgi?id=120
With respect, this is something that is trivially discovered
from reading the VHDL LRM or any half-decent text book;
there is no mystery about it. Any scalar in VHDL is
initialised to the left-hand value of its subtype's range;
aggregates have each of their components so initialised.
You can, of course, add an initialiser to a declaration to
override this behaviour. Every simulator I've ever used
correctly implements this language feature.

The problem arises not in the language, nor in simulation
(i.e. execution of a program written in VHDL on a
suitable platform), but in synthesis. The majority of
hardware platforms do not offer reliable power-up
initialisation of internal state. Consequently it is appropriate
to code explicitly some reset behaviour. For exactly this
reason, the hardware-oriented data types in VHDL (std_logic,
etc) have a specific undefined value as the leftmost value
of their value-set, so that initialisation oversights are
more likely to be detected.

Unfortunately for a purist such as you, there are many
occasions in hardware design where it is entirely
appropriate to read an uninitialised object. For
example, a pipeline or shift register probably does
not need the hardware overhead of reset; it will
automatically flush itself out over the first few clock
cycles - but only if you allow it to run, which of course
entails reading uninitialised (or default-initialised) values.
Consequently it is appropriate for synthesis tools to do
pretty much what they generally do: don't worry about
initialisations. For preference, they should issue warnings
about attempts to do explicit initialisation, since these cannot
be synthesised to hardware on most platforms. However,
even then it may be appropriate to let this pass, since the
explicit initialisation may be useful in order to limit the
excessive pessimism that usually occurs when a simulator
is confronted with a lot of 'U' or 'X' values. This issue is
one of those things that hardware designers are required
to be aware of, and failure to attend to it is usually a good
identifying mark of a beginner, or a dyed-in-the-wool
programmer assuming that hardware design is easy.

Please don't assume that hardware design is naive, ignorant
or incompetent simply because it doesn't look exactly
like good software design.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
Hello again,

It seems prudent to highlight that my advice regarding discrepancies
between VHDL simulations and synthesized VHDL was intended for someone
(named Mike Silva) who I suspected would not have been aware of how
common this is, who said in
news:1172344264.555988.157790@z35g2000cwz.googlegroups.com
(which is a different subthread on comp.lang.ada and so does not
appear in References fields on comp.lang.vhdl):

"[..]

Well, I did pick up a VHDL book a while back. Maybe it's a sign. :)
But first I want to get Ada running on a SBC."

Sorry if this caused confusion. Having said that, I would prefer
simulations to reflect reality (but I appreciate that less accuracy
for higher speed can be acceptable if you know what you are
sacrificing and what you are doing).


In news:u3b4nwsfx.fsf@trw.com timestamped Fri, 02 Mar 2007 15:55:46
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:
"Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:
[..]
What do you mean by this? The VHDL I simulate behaves the same as the
FPGA, unless I do something bad like doing asynchronous design, or
miss a timing constraint."

Or if you use the enumeration encoding attribute and it is not supported
by both the simulation tool and the synthesis tool;
Well, attributes are all a bit tool specific, I'm not sure this is
important. The sim and the synth will *behave* the same, but the
numbers used to represent states might not be what you expect. Or am
I misunderstanding what you are getting at?"



You seem to have understood it, but not to be aware of what I have read
on the matter, which may be just scaremongering; I have never actually
checked whether a simulation tool and a synthesis tool differ on this.

Many attributes are tool-specific, but not all. The attribute
ENUM_ENCODING is located somewhere in between: it was introduced in
IEEE Std 1076.6-1999, "IEEE Standard for VHDL Register
Transfer Level (RTL) Synthesis", and is still present in IEEE Std 1076.6-2004,
HTTP://IEEEXplore.IEEE.org/search/srchabstract.jsp?arnumber=1342563&isnumber=29580&punumber=9308&k2dockey=1342563@ieeestds&query=%28%281076.6-1999%29%3Cin%3Emetadata%29&pos=0
which contains in 7.1.8 Enumeration encoding attribute:
"[..]

NOTE-Use of this attribute may lead to simulation mismatches, e.g.,
with use of relational operators.

[..]"


E.g. from page 13 of Synopsys's pvhdl_2.pdf :
"[..]

You can override the automatic enumeration encodings and specify
your own enumeration encodings with the ENUM_ENCODING
attribute. This interpretation is specific to Presto VHDL, and
overriding might result in a simulation/synthesis mismatch. [..]

[..]"

You can see how "a simulation/synthesis mismatch" would result from
page 15 of pvhdl_7.pdf :
"[..]

Example 7-3 Using the ENUM_ENCODING Attribute
attribute ENUM_ENCODING: STRING;
-- Attribute definition
type COLOR is (RED, GREEN, YELLOW, BLUE, VIOLET);
attribute ENUM_ENCODING of
COLOR: type is "010 000 011 100 001";
-- Attribute declaration
The enumeration values are encoded as follows:
RED = "010"
GREEN = "000"
YELLOW = "011"
BLUE = "100"
VIOLET = "001"
The result is GREEN < VIOLET < RED < YELLOW < BLUE.
Note:
The interpretation of the ENUM_ENCODING attribute is specific to
Presto VHDL. Other VHDL tools, such as simulators, use the
standard encoding (ordering).

[..]"



In news:u3b4nwsfx.fsf@trw.com timestamped Fri, 02 Mar 2007 15:55:46
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:
"[..]

Never used buffers, so I dunno about that!"


Apparently almost nobody uses them. I never used them, and I do not use
them.



"[..]

'Z' not being treated as high impedance by a synthesis tool;
It will be if the thing you are targetting has tristates. Either as
IOBs or internally."


Maybe Synplify does. Synopsys's pvhdl_5.pdf warns instead:
"Inferring Three-State Logic
Presto VHDL infers a three-state buffer when you assign the value of
Z to a signal or variable. The Z value represents the high-impedance
state. Presto VHDL infers one three-state buffer per process. You
can assign high-impedance values to single-bit or bused signals (or
variables). [..]

[..]

You cannot use the z value in an expression, except for
concatenation and comparison with z, such as in
if (IN_VAL = 'Z') then y<=0 endif;
This is an example of permissible use of the z value in an
expression, but it always evaluates to false. So it is also a
simulation/synthesis mismatch.

[..]

Be careful when using expressions that compare with the z value.
Design Compiler always evaluates these expressions to false, and
the pre-synthesis and post-synthesis simulation results might differ.
For this reason, Presto VHDL issues a warning when it synthesizes
such comparisons."



In news:u3b4nwsfx.fsf@trw.com timestamped Fri, 02 Mar 2007 15:55:46
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:
"> default
values being ignored for synthesis;
Works in my tools."


In case someone tries to coerce you into using Synopsys: from
pvhdl_c.pdf :

"[..]

subprogram
Default values for parameters are unsupported. [..]

[..]"



In news:u3b4nwsfx.fsf@trw.com timestamped Fri, 02 Mar 2007 15:55:46
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:

"[..]

sensitivity lists being ignored for synthesis;
That depends on the tool.

or other
discrepancies.
Well, that covers a lot ;-)"


Needless to say ... :)



"[..]

This may be too much to expect for timing constraints, but I -- perhaps
naively -- do not see why an asynchronous design should be so dismissable.
How hard could it be to replace tools' warnings that a referenced signal
needs to be added to a sensitivity list with a rule in the language standard
which makes the omission from the sensitivity list illegal?
Because it might be handy for simulating something? I dunno to be
honest.

[..]"


I doubt it.




"[I am not an async expert but...] You can do async design in VHDL and
with synthesis, but proving correctness by simulation does not work
out as I understand it."


I do not have a clue.



"[..]
You may rightly deem that claim of mine to be unwarranted, but outside
of testbenches, I do not see what use the language is if it is not
transferrable to actual hardware.
What?! "Outside of testbenches, I do not see what use..." *Inside* of
testbenches is where I spend most of my coding time! The whole point
of having a rich language is to make running simulations easy.

The fact that we have a synthesisable subset is not a bad thing, just
how real life gets in the way. [..]"


It is possible to write testbenches for VHDL in a language other than
VHDL. I do not argue whether testbenches in VHDL or another language
are better. It is possible to synthesize code in a language other than
a dedicated hardware description language, and which code turns out to be
synthesizable or unsynthesizable is a side effect of the practicalities
of reality. I am not trying to convince you on this
point; we simply think about it differently.



"I wish VHDL had *more* non synthesisable features
(like dynamic array sizing for example)."


I am aware of an initiative to add a feature, which may or may not be
synthesizable, to VHDL to aid verification, but I do not believe I had
heard a desire for VHDL to have "*more* non synthesisable features"
before news:u3b4nwsfx.fsf@trw.com .




" I'd like to write my
testbenches in Python :)"

So why don't you?
HTTP://MyHDL.JanDecaluwe.com/doku.php



"[..]

Martin J. Thompson wrote:

"Multi dimensional arrays have worked (even in synthesis) for years in
my experience.

[..]"

Not always, and not with all tools. E.g. last month, someone
mentioned in
news:548d3iF1vbcf6U1@mid.individual.net
: "Using 2D-Arrays as I/O signals _may_ be a problem for some synthesis
tools. [..]"
Well, that's a bit weak ("*may* be a problem") - what tools do they
currently not work in?"


Ask the author of news:548d3iF1vbcf6U1@mid.individual.net (Ralf
Hildebrandt). I am not personally aware of any.



Martin J. Thompson wrote:

"> I admit my next example is historical, but Table 7.1-1 Supported and
Unsupported Synthesis Constructs of Ben Cohen's second (1998) edition of
"VHDL Answers to Frequently Asked Questions" contains:
"[..]
[..] multidimensional arrays are not allowed
[..]"

Cheers,
C. P. G.
Yes, in the past it has been a problem. [..]"


By coincidence I was checking something else in the book Luca Fanucci,
"Digital Sistems Design Using VHDL", SEU, 2002 last week and it was
mentioned therein that multidimensional arrays are not
synthesizable. I do not know whether or not they actually were
supported for synthesis at that time.

Regards,
C.P.G.
 
In news:bf5ou2pq0uef82n56us167p2u8v8lb0g6n@4ax.com timestamped Mon, 05
Mar 2007 13:18:09 +0000, Jonathan Bromley
<jonathan.bromley@MYCOMPANY.com> posted:
"On 5 Mar 2007 12:20:55 GMT, Colin Paul Gloster
<Colin_Paul_Gloster@ACM.org> wrote:

the variable *does* have an
initial value. It's not entirely obvious but at least you will
*always* get consistent results."


I was unaware of this. Could you please tell me more about this? I
have so far received no such response to
"Reading an undefined value",
HTTPS://Bugzilla.Mentor.com/show_bug.cgi?id=120
With respect, this is something that is trivially discovered
from reading the VHDL LRM or any half-decent text book;
there is no mystery about it. Any scalar in VHDL is
initialised to the left-hand value of its subtype's range;
[..]

[..]"


Thank you. I clearly missed this. As a result of your post I have found:

"[..]

4.3.1.2 Signal declarations

[..]

In the absence of an explicit default expression, an implicit default
value is assumed for a signal of a scalar subtype or
for each scalar subelement of a composite signal, each of which is
itself a signal of a scalar subtype. The implicit
default value for a signal of a scalar subtype T is defined to be that
given by T'LEFT.

[..]

4.3.1.3 Variable declarations

[..]

If an initial value expression appears in the declaration of a
variable, then the initial value of the variable is determined
by that expression each time the variable declaration is
elaborated. In the absence of an initial value expression, a
default initial value applies. The default initial value for a
variable of a scalar subtype T is defined to be the value given
by T'LEFT. [..]

[..]"



Jonathan Bromley wrote:
"Unfortunately for a purist such as you, there are many
occasions in hardware design where it is entirely
appropriate to read an uninitialised object. For
example, a pipeline or shift register probably does
not need the hardware overhead of reset; it will
automatically flush itself out over the first few clock
cycles - but only if you allow it to run, which of course
entails reading uninitialised (or default-initialised) values.
Consequently it is appropriate for synthesis tools to do
pretty much what they generally do: don't worry about
initialisations. [..]"

I had not thought of those. I did mention a situation in
news:esh1v7$j93$1@newsserver.cilea.it
in which reading an uninitialized item is acceptable.


" For preference, they should issue warnings
about attempts to do explicit initialisation, since these cannot
be synthesised to hardware on most platforms. However,
even then it may be appropriate to let this past, since the
explicit initialisation may be useful in order to limit the
excessive pessimism that usually occurs when a simulator
is confronted with a lot of 'U' or 'X' values. This issue is
one of those things that hardware designers are required
to be aware of, and failure to attend to it is usually a good
identifying mark of a beginner, or a dyed-in-the-wool
programmer assuming that hardware design is easy."


I am certainly unaware of many important things related to
electronics.


"Please don't assume that hardware design is naive, ignorant
or incompetent simply because it doesn't look exactly
like good software design."


I do not. I am unhappy that electronic engineers are very eager to try
to transfer things which are unsuitable for software to hardware for
which they are also unsuitable, e.g. C++ and UML.
 
"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> posted on Fri, 2 Mar
2007 17:32:26 +0100 :
"[..]

I'm looking for something like Cilk, but even the concurrent loop
(JPR's for I in all 1 .. n loop?) would be a help.
Maybe, just a guess, the functional decomposition rather than statements
could be more appropriate here. The alternatives would access their
arguments by copy-in and resynchronize by copy-out."


From William J. Dally in 1999 on
HTTP://CVA.Stanford.edu/people/dally/ARVLSI99.ppt#299,37,Parallel%20Software:%20Design%20Strategy
:"[..]
- many for loops (over data,not time) can be forall
[..]"
Without reading that presentation thoroughly now, I remark that Dally
seemed to be supportive of Wrigley's fine-grained parallelism.
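A minimal sketch of that copy-in/copy-out decomposition applied to such a data loop, with the iteration range split over two worker tasks; the vector type, the chunk boundaries and the loop body are hypothetical:

procedure Parallel_Map is
   type Vector is array (Positive range <>) of Float;

   N : constant := 1_000;
   Input  : Vector (1 .. N) := (others => 1.0);
   Output : Vector (1 .. N);

   --  Each "alternative" copies its arguments in when it starts and
   --  resynchronises by copying its results out when it finishes.
   task type Chunk_Worker (First, Last : Positive);

   task body Chunk_Worker is
      Local : Vector := Input (First .. Last);   --  copy-in
   begin
      for I in Local'Range loop
         Local (I) := Local (I) * 2.0;           --  the "loop body"
      end loop;
      Output (First .. Last) := Local;           --  copy-out, disjoint slices
   end Chunk_Worker;

   W1 : Chunk_Worker (First => 1,         Last => N / 2);
   W2 : Chunk_Worker (First => N / 2 + 1, Last => N);
begin
   null;   --  the procedure does not return until both workers have terminated
end Parallel_Map;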
 
In news:pan.2007.03.03.17.00.07.159450@linuxchip.demon.co.uk.uk.uk
timestamped Sat, 03 Mar 2007 16:59:52 GMT, "Dr. Adrian Wrigley"
<amtw@linuxchip.demon.co.uk.uk.uk> posted:
"[..]
On Sat, 03 Mar 2007 15:26:35 +0000, Jonathan Bromley wrote:

[..]

For the numerical-algorithms people, I suspect the problem of
inferring opportunities for parallelism is nearer to being solved
than some might imagine. There are tools around that
can convert DSP-type algorithms (such as the FFT that's
already been mentioned) into hardware that's inherently
Again, this is ages old now. But it can't convert
C-type programs reliably and efficiently.

parallel; there are behavioural synthesis tools that allow
you to explore the various possible parallel vs. serial
possibilities for scheduling a computation on heterogeneous
hardware. It's surely a small step from that to distributing
such a computation across multiple threads or CPUs. All
that's needed is the will.
[..]"


I am not aware of tools which automatically generate such parallel
implementations, though they may exist. For many algorithms a precise
implementation would be required, but for many numerical applications
in which absolute adherence is not required, are such tools so
impressive that they will replace Jacobi's method with the
Gauss-Seidel method (or something even better) without guidance?
 
On Mon, 05 Mar 2007 15:23:54 +0000, Colin Paul Gloster wrote:

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> posted on Fri, 2 Mar
2007 17:32:26 +0100 :
"[..]

I'm looking for something like Cilk, but even the concurrent loop
(JPR's for I in all 1 .. n loop?) would be a help.

Maybe, just a guess, the functional decomposition rather than statements
could be more appropriate here. The alternatives would access their
arguments by copy-in and resynchronize by copy-out."

From William J. Dally in 1999 on
HTTP://CVA.Stanford.edu/people/dally/ARVLSI99.ppt#299,37,Parallel%20Software:%20Design%20Strategy
:"[..]
- many for loops (over data,not time) can be forall
[..]"
Without reading that presentation thoroughly now, I remark that Dally
seemed to be supportive of Wrigley's fine-grained parallelism.
I hadn't seen that presentation, but a number of other key points
are made by Dally:
-------------------------------------
# Writing parallel software is easy
* with good mechanisms

# Almost all demanding problems have ample parallelism

# Need to focus on fundamental problems
* extracting parallelism
* load balance
* locality
o load balance and locality can be covered by excess parallelism

Conclusion: We are on the threshold of the explicitly parallel era
* Diminishing returns from sequential processors (ILP)
o no alternative to explicit parallelism
* Enabling technologies have been proven
o interconnection networks, mechanisms, cache coherence
* Fine-grain machines are more efficient than sequential machines

# Fine-grain machines will be constructed from multi-processor/DRAM chips
# Incremental migration to parallel software
-----------------------------------

Good to find *somebody* agrees with me!

Shame Ada isn't leading the pack :(
--
Adrian
 
Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:
Hello again,
Hi!

It seems prudent to highlight that my advice regarding discrepancies
between VHDL simulations and synthesized VHDL was intended for someone
(named Mike Silva) who I suspected would not have been aware of how
common this is, who said in
news:1172344264.555988.157790@z35g2000cwz.googlegroups.com
(which is a different subthread on comp.lang.ada and so does not
appear in References fields on comp.lang.vhdl):

"[..]

Well, I did pick up a VHDL book a while back. Maybe it's a sign. :)
But first I want to get Ada running on a SBC."

Sorry if this caused confusion. Having said that, I would prefer
simulations to reflect reality (but I appreciate that less accuracy
for higher speed can be acceptable if you know what you are
sacrificing and what you are doing).
Yes, you always have to simulate at an appropriate level of
abstraction (otherwise we'd be doing the whole thing in Spice :)

In news:u3b4nwsfx.fsf@trw.com timestamped Fri, 02 Mar 2007 15:55:46
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:
"Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:
[..]


What do you mean by this? The VHDL I simulate behaves the same as the
FPGA, unless I do something bad like doing asynchronous design, or
miss a timing constraint."

Or if you use the enumeration encoding attribute and it is not supported
by both the simulation tool and the synthesis tool;

Well, attributes are all a bit tool specific, I'm not sure this is
important. The sim and the synth will *behave* the same, but the
numbers used to represent states might not be what you expect. Or am
I misunderstanding what you are getting at?"



You seem to have understood it but to not be aware of what I have read
on the matter, which may be just scaremongering, I have never actually
checked whether a simulation tool and a synthesis tool differ on this.
<snip>
OK, so that tool allows you to use attributes to do things that
violate the rules of VHDL.

The interpretation of the ENUM_ENCODING attribute is specific to
Presto VHDL. Other VHDL tools, such as simulators, use the
standard encoding (ordering).
"We changed the way VHDL works". No wonder you get mismatches.

<snip>

'Z' not being treated as high impedance by a synthesis tool;

It will be if the thing you are targetting has tristates. Either as
IOBs or internally."


Maybe Synplify does. Synopsys's pvhdl_5.pdf warns instead:
"5
Inferring Three-State Logic 5
Presto VHDL infers a three-state buffer when you assign the value of
Z to a signal or variable. The Z value represents the high-impedance
state. Presto VHDL infers one three-state buffer per process. You
can assign high-impedance values to single-bit or bused signals (or
variables). [..]

[..]

You cannot use the z value in an expression, except for
concatenation and comparison with z, such as in
if (IN_VAL = 'Z') then y<=0 endif;
This is an example of permissible use of the z value in an
expression, but it always evaluates to false. So it is also a
simulation/synthesis mismatch.
OK, no synthesis tool will allow a comparison with "Z" as it's not a
real physical value that you can "measure" with a gate.

Be careful when using expressions that compare with the z value.
Design Compiler always evaluates these expressions to false, and
the pre-synthesis and post-synthesis simulation results might differ.
For this reason, Presto VHDL issues a warning when it synthesizes
such comparisons."
That's fair enough. Real life getting in the way again! At least you
get warned.

In news:u3b4nwsfx.fsf@trw.com timestamped Fri, 02 Mar 2007 15:55:46
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:
"> default
values being ignored for synthesis;

Works in my tools."


In case someone tries to coerce you into using Synopsys: from
pvhdl_c.pdf :

"[..]

subprogram
Default values for parameters are unsupported. [..]

[..]"
I shan't be using that tool then (not for architectures which do
support initialisation anyway). For ASICs it makes sense, as they
*don't* know how to initialise.

Again, I expect my synth tool to allow me to do things that my chip
can do and warn me (or error!) if I try and do impossible stuff.

<snip>
You can do async design in VHDL and
with synthesis, but proving correctness by simulation does not work
out as I understand it."


I do not have a clue.
Fair enough :)

"[..]
You may rightly deem that claim of mine to be unwarranted, but outside
of testbenches, I do not see what use the language is if it is not
transferrable to actual hardware.
Now I've reread that, I don't know what you are saying. The language
has two uses:
1) Testbenches
2) Defining real devices

Are you saying the language is not useful if it can't *all* be used
for 2)?

<snip>
It is possible to write testbenches for VHDL in a language other than
VHDL. I do not argue whether testbenches in VHDL or another language
are better.
Nor do I.

It is possible to synthesize code in a language other than
a dedicated hardware description language, and we interpret which of
synthesizable and unsynthesizable code are a side effect of
practicalities of reality. I am not trying to convince you on this
point, we simply think about it differently.
OK.

"I wish VHDL had *more* non synthesisable features
(like dynamic array sizing for example)."


I am aware of an initiative to add a feature, which may or may not be
synthesizable, to VHDL to aid verification, but I do not believe I had
heard a desire for VHDL to have "*more* non synthesisable features"
before news:u3b4nwsfx.fsf@trw.com .
I think lots of people would like more support for funkier testbenches!

" I'd like to write my
testbenches in Python :)"

So why don't you?
HTTP://MyHDL.JanDecaluwe.com/doku.php
Then I'd have to write my FPGA code in Python as well. Which I would
also like to do, but integrating it with other IP is tricky at the
moment. It also only produces Verilog output (although VHDL is on the
way) which would mean a mixed-language license for me :-(

I have done some stuff in MyHDL, and I intend to do more (when I get a
PC upgrade at home... 300 MHz Celerons are not ideal for FPGA
development - even small ones!)

"[..]

Martin J. Thompson wrote:

"Multi dimensional arrays have worked (even in synthesis) for years in
my experience.

[..]"

Not always, and not with all tools. E.g. last month, someone
mentioned in
news:548d3iF1vbcf6U1@mid.individual.net
: "Using 2D-Arrays as I/O signals _may_ be a problem for some synthesis
tools. [..]"


Well, that's a bit weak ("*may* be a problem") - what tools do they
currently not work in?"


Ask the author of news:548d3iF1vbcf6U1@mid.individual.net (Ralf
Hildebrandt). I am not personally aware of any.
OK, so it's not strong support for the fact that multidimensional
arrays don't work anywhere...

Martin J. Thompson wrote:

"> I admit my next example is historical, but Table 7.1-1 Supported and
Unsupported Synthesis Constructs of Ben Cohen's second (1998) edition of
"VHDL Answers to Frequently Asked Questions" contains:
"[..]
[..] multidimensional arrays are not allowed
[..]"

Cheers,
C. P. G.

Yes, in the past it has been a problem. [..]"


By coincidence I was checking something else in the book Luca Fanucci,
"Digital Sistems Design Using VHDL", SEU, 2002 last week and it was
mentioned therein that multidimensional arrays are not
synthesizable. I do not know whether or not they actually were
supported for synthesis at that time.
Still 5 years old! A long time in EDA terms...

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
 
Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:

I am unhappy that electronic engineers are very eager to try
to transfer things which are unsuitable for software to hardware for
which they are also unsuitable, e.g. C++ and UML.
That sounds like one for the .sig file!

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
 
"Martin Thompson" <martin.j.thompson@trw.com> wrote in message
news:u3b4ibb0f.fsf@trw.com...
Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:
: "Using 2D-Arrays as I/O signals _may_ be a problem for some synthesis
tools. [..]"


Well, that's a bit weak ("*may* be a problem") - what tools do they
currently not work in?"


Ask the author of news:548d3iF1vbcf6U1@mid.individual.net (Ralf
Hildebrandt). I am not personally aware of any.


OK, so it's not strong support for the fact that multidimensional
arrays don't work anywhere...
2-D arrays work just fine in Quartus, haven't had the need to try higher
dimensions yet...for what it's worth.

Kevin Jennings
 
In news:u3b4ibb0f.fsf@trw.com timestamped Tue, 06 Mar 2007 16:19:12
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:
"Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:
[..]
You may rightly deem that claim of mine to be unwarranted, but outside
of testbenches, I do not see what use the language is if it is not
transferrable to actual hardware.
Now I've reread that, I don't know what you are saying. The language
has two uses:
1) Testbenches
2) Defining real devices

Are you saying the language is not useful if it can't *all* be used
for 2)?"


I really did say 2), but that would be as a hardware language. Writing
testbenches is not useless, and writing testbenches in a (part of a) language
which cannot define real devices does not make that (part of a)
language useless for testbenches, but it does make that (part of a)
language useless for defining real devices, which is what I expect to
be able to do with a hardware description language.


"[..]

"I wish VHDL had *more* non synthesisable features
(like dynamic array sizing for example)."


I am aware of an initiative to add a feature, which may or may not be
synthesizable, to VHDL to aid verification, but I do not believe I had
heard a desire for VHDL to have "*more* non synthesisable features"
before news:u3b4nwsfx.fsf@trw.com .
I think lots of people would like more support for funkier testbenches!

[..]"


In which case the goal is to have "funkier testbenches", but you do not
need to go out of your way to make these unsynthesisable. They may be
unsynthesisable, but that would just be a side effect and not the
goal. Of course, as you do not want to synthesize such features, you
will not waste effort trying to make them synthesisable, but that is
not the same thing as deliberately obstructing any chance to
synthesize them.



"OK, so it's not strong support for the fact that multidimensional
arrays don't work anywhere...

[..]"

Apparently they work with many tools. Most tools are not necessarily
all tools, though it is true that we do not have a definite claim that
even one tool still in use does not support them, merely a hint.

Cheers,
Colin Paul
 
On Mar 6, 8:58 am, Phil Hays <inva...@dont.spam> wrote:
Andy Peters wrote:
On Mar 5, 1:11 pm, "Jean Nicolle" <jean.nico...@sbcglobal.net> wrote:
Is it possible to use an ISE project to compile for multiple devices?

I happen to have a project that can target two different boards with
different FPGAs. Most of the files are the same, besides the UCF. Do I
have to create separate ISE projects? I'd rather have one project with
different variations. But that doesn't seem supported. Anybody can set
me wrong?

Use a Makefile.

A makefile for this build can be a fairly good answer with dual core
computers becoming more common, as the two map and par jobs can be run in
parallel. Write a makefile and use:

make --jobs=2

Make is a cool utility. Can be loaded with the Cygwin package on Windows,
and is native on Linux. For more information on make:

http://www.gnu.org/software/make/

Using make doesn't prevent one from using a Tcl script(s) for the actual
builds as Jim suggested. If this was done the makefile might have just two
items (it might have more as well):

../bld1/board1.bit : *.vhd *.v board1.ucf build.tcl
<tab> xtclsh build.tcl board1

../bld2/board2.bit : *.vhd *.v board2.ucf build.tcl
<tab> xtclsh build.tcl board2

Some explanation of this makefile:

1) The first line of each item is "the target" : "the sources". Make
checks to see that the target is newer than the sources. If not newer or
if the target does not exist, then make executes the commands on following
lines starting with tab characters.

2) " <tab> " is the tab character. Required by make before every command.

3) xtclsh is the Xilinx Tcl shell, used to execute the script. ISE8.2 or
later.

4) build.tcl is the script that builds the designs. This script is
expecting a parameter to define which board to target.

Tcl is not as trivial to multithread as make is. On the other hand, Tcl is
a general purpose language, so it can be used for lots of other tasks that
make can't do, such as creating revision or timestamp values to be loaded
into registers, parsing report files, multiple .ucf files in a design,
etc. For more information on Tcl see:

http://www.tcl.tk/about/features.html

Or the Tcl section in the Xilinx manual.

--
Phil Hays (Xilinx, but writing my own words)
Hi,
I already met the situation you are in now three years ago.

The main culprit is that the VHDL language lacks the capability of
handling conditional compilation statements.

This drawback of VHDL puts small-company engineers in a very
disadvantageous place.

The reason is that in a small company one doesn't have the manpower to
develop a preprocessing program that lets VHDL insert
conditional statements.

It is very hard for anyone to imagine that, without such powerful
conditional-compilation software, Intel would have developed a
multiple-core system.

I mentioned the problem in the VHDL group, but met huge opposition; someone
even suggested using the C/C++ preprocessor to
handle the VHDL problem, or using a makefile. It is a shame for the VHDL language.

Finally I wrote software to do the job. Since then, I can easily
develop several versions for one project using one source file:
product version, ChipScope debugging version, simulation version and
so on.

More than that, I have put 4 project files into one big file; in
other words, 4 projects share one big VHDL file.

Without a similar method, Intel could not manufacture so many product
lines.

Generate/loop statements have very limited capability in reality. For
example, they can only change a signal's width in a module interface; they
cannot insert or delete signals in the module interface.

Weng
 
Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:

In news:u3b4ibb0f.fsf@trw.com timestamped Tue, 06 Mar 2007 16:19:12
+0000, Martin Thompson <martin.j.thompson@trw.com> posted:
"Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> writes:
[..]
"[..]
You may rightly deem that claim of mine to be unwarranted, but outside
of testbenches, I do not see what use the language is if it is not
transferrable to actual hardware.


Now I've reread that, I don't know what you are saying. The language
has two uses:
1) Testbenches
2) Defining real devices

Are you saying the language is not useful if it can't *all* be used
for 2)?"


I really did say 2), but that would be as a hardware language. Writing
testbenches is not useless, and writing testbenches in a (part of a) language
which cannot define real devices does not make that (part of a)
language useless for testbenches, but it does make that (part of a)
language useless for defining real devices, which is what I expect to
be able to do with a hardware description language.
So you'd rather use a different language for testbenches? Or throw
away the non-synth bits.

<aside>Historically, VHDL was for *describing* hardware, and was not
originally intended for synthesis. People made synthesisers later. So it's
not the fault of the language :)</aside>

"[..]



"I wish VHDL had *more* non synthesisable features
(like dynamic array sizing for example)."


I am aware of an initiative to add a feature, which may or may not be
synthesizable, to VHDL to aid verification, but I do not believe I had
heard a desire for VHDL to have "*more* non synthesisable features"
before news:u3b4nwsfx.fsf@trw.com .


I think lots of people would like more support for funkier testbenches!

[..]"


In which case the goal is to have "funkier testbenches", but you do not
need to go out of your way to make these unsynthesisable. They may be
unsynthesisable, but that would just be a side effect and not the
goal.
Is that not the case with current features? No-one is going out of
their way to make them unsynthesisable. I think we just differ on our
point of view here - you would like all of an HDL to be synthesisable,
I'd *like* that, but I'll happily use a subset for synth and more of
it for sims.

Of course, as you do not want to synthesize such features, you
will not waste effort trying to make them synthesisable, but that is
not the same thing as deliberately obstructing any chance to
synthesize them.
Again, no-one is going out of their way.

"OK, so it's not strong support for the fact that multidimensional
arrays don't work anywhere...

[..]"

Apparently they work with many tools. Most tools are not necessarily
all tools, though it is true that we do not have a definite claim that
even one tool still in use does not support them, merely a hint.
Fine, we'll leave it at that then :)

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
 
I have had similar requirements (updating state variables, or some such)
where I used dual-port RAM; I use one port for the read, and the other
(delayed a clock) for the modify-write.

The pipeline needs to be managed properly, but it can save tremendously on
registers, assuming that only one index needs to be updated at a time. If
all entries need concurrent access, well, a memory won't cut it. For my
application(s), typically TDM processing of multiple channels, it works
well.

JTW

"news reader" <newsreader@google.com> wrote in message
news:esrs16$anh$1@reader01.singnet.com.sg...
"Utku Özcan" <utku.ozcan@gmail.com> wrote in message
news:1173384869.194349.20140@q40g2000cwq.googlegroups.com...

Hi "news reader", my humble pearls in between...

news reader wrote:

In the design I have 256 3-bit registers, every time I need to read or
write 16 of them (data_o0, 1, ...15).
The read/write address is not totally random.

It seems that you have an algorithm that handles a deterministic
distribution of the values to be accessed. Therefore you think you can
implement it with logic only.

I assume you are modeling an algorithm for a special matrix operation.


It's not a matrix, but the memory access is intensive and must accomplish
the read/write in a single clock cycle, so registers are used instead of memory.


For example, assuming that I arrange the registers into a 16x16 matrix,
data_o0 accesses only within the zero row or column, data_o1 may access 20
of the registers (but not all 256), data_o2 may access 30 of the variables,
etc.

The values do not give us much info. data_ox (x = 1, 2, ...) is
accessing which elements and in which distribution?


In each clock cycle, 16 addresses are generated, and 16 data items are
read/written. However, each of the 16 data items is read/written to only
n of the 256 addresses (0 < n < 255).


If I code such that every output reads from the 256 registers, the final
logic will be overkill and highly redundant.

You think that the distribution of elements can be accessed with pure
logic.
Therefore you tried to model your logic to cover every case, or you
want to do so.

If I use case statements to list each of the scenarios, the RTL code may
end up at 500 kilobytes.

This is reasonable then.



By means of case statements, I use 32 case statements; in each case
statement there are fewer than 256 choices. Some have only 20 or 30
choices, etc.


Will Design Compiler synthesize a 500 KB design efficiently?

What does "efficiency" mean to you? Speed or minimum logic?
If minimum logic, then please share with us the algorithm you are
trying to implement.

Will NCVerilog compile and simulate it efficiently?

NCVerilog does not care about logic implementation. It defines the
behaviour of the system, no matter how the objects are linked.



For example in read operation,
--------------------- implementation A------------------
input [7:0] addr_i0, addr_i1, ... addr_i15;
output [2:0] dat_o0, dat_o1, ...dat_o15;

reg [2:0] mymemory[0:255]; // Main memory

dat_o0 <= mymemory[addr_i0];
dat_o1 <= mymemory[addr_i1];
....
dat_o15 <= mymemory[addr_i15];
--------------------- End A------------------

--------------------- implementation B------------------

case (addr_i0) // I can calculate these options through simulations.
8'd0 : dat_o0 <= mymemory[0 ];
8'd5 : dat_o0 <= mymemory[5 ];
8'd54 : dat_o0 <= mymemory[54 ];
8'd122: dat_o0 <= mymemory[122];
8'd125: dat_o0 <= mymemory[125];
...
8'd166: dat_o0 <= mymemory[166];
8'd233: dat_o0 <= mymemory[233];
default: dat_o0 <= mymemory[0 ];
endcase



case (addr_i1)
8'd0 : dat_o1 <= mymemory[0 ];
8'd7 : dat_o1 <= mymemory[7 ];
8'd9 : dat_o1 <= mymemory[9 ];
8'd13 : dat_o1 <= mymemory[13 ];
8'd25 : dat_o1 <= mymemory[25 ];
8'd57 : dat_o1 <= mymemory[57 ];
8'd124: dat_o1 <= mymemory[124];
...
8'd133: dat_o1 <= mymemory[133];
8'd155: dat_o1 <= mymemory[155];
8'd277: dat_o1 <= mymemory[277];
default: dat_o1 <= mymemory[0 ];
endcase

...
case (addr_i15)
...
--------------------- End B------------------

In terms of hardware implementation, is it certain that implementation B
saves hardware compared to A? Will the large chunks of RTL code cause DC or
NCVerilog to choke?



Are there any neater techniques to attack this problem?

Since you have not given much data, I think you can implement this
stuff with a RAM.
Why don't you use a RAM? Then you can define the RAM addresses to
model your matrix. You will generate addresses to define the positions
for your matrix which mimics your algorithm.


I used registers instead of RAM due to the memory throughput.



Utku.
 
Sounds interesting to me. Would you like to contribute this or make a
piece of commercial software out of it? I guess it would be no problem
for the small companies to spend a few cents on a tool which saves
hours of programming time over the months.

Currently I am handling conditional synthesis (parameters mostly) with
Excel / Access :)

Thanks
J., currently working for a "small company"
 
