Beginner question: What trigs processes

Jerker Hammarberg
(I'm sorry if this message appears several times; I'm having problems with my newsgroup client.)

Hello all! I'm learning VHDL, but despite a thorough search through several
books I can't find the answer to the following pretty basic question: what
exactly causes a process (with a sensitivity list) to run, and can it run
several times at the same point in time?

Consider the following example:
(Multiplier has two 16-bit inputs a and b and one 32-bit output p)

architecture RTL of Multiplier is
  signal multin1: integer range -32768 to 32767;
  signal multin2: integer range -32768 to 32767;
  signal multout: integer;
begin
  multout <= multin1 * multin2;
  process (a, b, multout)
    variable c: integer := 0;
    variable d: integer := 0;
  begin
    multin1 <= a;
    multin2 <= b;
    p <= multout;
    c := c + 1;
    d := d + p;
  end process;
end;

(I know it's stupid, but let's use it for the purpose of explanation.) Let's
say that new data arrive at a and b at a certain point in time. Will the
process run once or twice? What will c and d end up being? My guess is c = 2
and d = undefined, but then I wonder if this still holds in the synthesized
system?

/Jerker
 
Hello all! I'm learning VHDL, but despite a thorough search through
several books I can't find the answer to the following pretty basic
question: what exactly causes a process (with a sensitivity list) to run,
and can it run several times at the same point in time?
Ooooh, lemme try to answer this one. :)
There are two notions of time in VHDL: normal simulation time (in seconds,
nanoseconds, etc.) and delta time. Delta cycles are successive evaluation
steps at the same point in simulation time; they take zero simulated time.
The following happens:

LOOP
  LOOP while signal updates are pending at the current time t
    update those signals; every process with a changed signal in its
    sensitivity list is resumed and runs until it suspends again
    advance to the next delta cycle (t + 1 delta)
  END LOOP
  advance t to the next time at which an update is scheduled
  (a second, a picosecond, whatever)
END LOOP

In your example, the process will run once or twice, depending on when a and
b change. Example:

a <= '0', '1' AFTER 2 ns;
b <= '1', '0' AFTER 2 ns;

In this case, the process will run once, since both a and b change at time =
2 ns.

a <= '0', '1' AFTER 2 ns;
b <= NOT( a );

In this case, the process will run twice, since a changes at time = 2 ns,
and b changes at time = 2 ns + 1 delta.
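Pieter's two cases can be checked in simulation. The sketch below is a hypothetical self-checking testbench (the entity `delta_demo`, the signal names and the counters `runs1`/`runs2` are illustrative, not from the thread). It counts how often a process sensitive to both inputs runs at 2 ns, expecting one run when both inputs change in the same delta and two runs when the second input is derived from the first one delta later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_demo is
end entity delta_demo;

architecture sim of delta_demo is
  -- case 1: a1 and b1 are both scheduled to change at 2 ns
  signal a1, b1 : std_logic := '0';
  -- case 2: b2 follows a2 one delta cycle later
  signal a2, b2 : std_logic := '0';
  -- activation counters, kept in signals so the checker can read them
  signal runs1, runs2 : natural := 0;
begin
  a1 <= '0', '1' after 2 ns;
  b1 <= '1', '0' after 2 ns;

  a2 <= '0', '1' after 2 ns;
  b2 <= not a2;

  watch1 : process (a1, b1)
  begin
    if now = 2 ns then
      runs1 <= runs1 + 1;
    end if;
  end process watch1;

  watch2 : process (a2, b2)
  begin
    if now = 2 ns then
      runs2 <= runs2 + 1;
    end if;
  end process watch2;

  check : process
  begin
    wait for 5 ns;
    assert runs1 = 1 report "case 1: expected 1 run at 2 ns" severity error;
    assert runs2 = 2 report "case 2: expected 2 runs at 2 ns" severity error;
    report "runs at 2 ns: case 1 = " & integer'image(runs1)
         & ", case 2 = " & integer'image(runs2);
    wait;
  end process check;
end architecture sim;
```

If the delta-cycle description holds, both assertions pass and the final report shows 1 for case 1 and 2 for case 2.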

Hope this helps. :)

Regards,

Pieter Hulshoff
 
Oh wait... let me answer my own question. Since multin1 and multin2 don't
change at delta 2, there is no activity that can trigger the multiplication in
the next delta and therefore it will stop.

So I guess I'm back at my original question: Is it true that the process
will run twice (although a and b arrive at the same simulation time)? And
if so, how will the synthesis tool implement this - will it place two copies
of the logic of the process on the chip?

/Jerker
 
Oh wait... let me answer my own question. Since multin1 and multin2 don't
change at delta 2, there is no activity that can trigger the multiplication in
the next delta and therefore it will stop.
That is correct.

So I guess I'm back at my original question: Is it true that the process
will run twice (although a and b arrive at the same simulation time)?
Yes, it does. I had not taken the multout process into account in the
example I gave you.

And if so, how will the synthesis tool implement this - will it place two
copies of the logic of the process on the chip?
Now we're in a different ballpark: how does synthesis handle this?

First of all, as c and d are never read, they will be ignored and their
logic removed. Your process does little more than rename signals, so it
will most likely disappear as well, though the names might show up as wires
in the netlist. Your multout assignment is also combinatorial, so it will be
implemented as a straight multiplier. Basically what you'll end up with is
a combinatorial (no clock) multiplier, probably just like you intended. :)
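What Pieter describes can be sketched as the logic that survives synthesis: with c and d unread and the intermediate signals only renaming a and b, the design reduces to one combinatorial multiplier. The entity name `mult_comb` is hypothetical, and the integer port types are an assumption, since the original post only gives the bit widths.

```vhdl
-- Sketch of what the Multiplier example reduces to after synthesis,
-- assuming integer ports (the original post gives only bit widths).
entity mult_comb is
  port (
    a, b : in  integer range -32768 to 32767;  -- 16-bit signed inputs
    p    : out integer                         -- 32-bit product
  );
end entity mult_comb;

architecture rtl of mult_comb is
begin
  -- No clock and no storage: p follows a and b combinatorially.
  p <= a * b;
end architecture rtl;
```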

Regards,

Pieter Hulshoff
 
What if c and d WERE read? Let's place a "c_out <= c;" statement at the
end of the process, where c_out is a 32-bit output port. Now, will the
implemented system have two copies of the logic for c inside? Or maybe
it's not even synthesizable at all? If so, is there a "synthesizability
rule" so that I can predict this?
Well, I can't really think of a proper application for this, but I doubt it
would synthesize. If you can give me a proper application for something
like this though (describe the behaviour of what you'd want to build) I
should be able to give you proper code for it. :)

I don't know of a general rule of thumb for what is synthesizable, although
there are many rules about what doesn't synthesize. Pure combinatorial
logic that needs to hold a value usually leads to interesting results
though. :)

Regards,

Pieter Hulshoff
 
Jerker Hammarberg wrote:

If so, is there a "synthesizability rule" so that I can predict this?

A minimal synthesis is an entity port assigned to a constant:
my_port_pin <= '1';

Input assignments and any subsequent processing that affect
no output pins on the device synthesize to nothing.


I agree with Pieter that "What will these equations make?"
is the wrong question.
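To illustrate Mike's point about minimal synthesis, here is a hypothetical entity (`minimal_demo`, `unused_in` and `parity` are illustrative names, not from the thread). The only thing that reaches an output is the constant, so a synthesis tool would typically trim the parity logic and keep just the tie-off; this describes typical tool behaviour, not a guarantee for any specific tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity minimal_demo is
  port (
    unused_in   : in  std_logic_vector(7 downto 0);  -- hypothetical input
    my_port_pin : out std_logic
  );
end entity minimal_demo;

architecture rtl of minimal_demo is
  signal parity : std_logic;  -- computed, but never drives an output
begin
  -- This logic affects no output pin, so synthesis can remove it.
  parity <= unused_in(0) xor unused_in(1) xor unused_in(2) xor unused_in(3)
            xor unused_in(4) xor unused_in(5) xor unused_in(6) xor unused_in(7);

  -- The only logic that survives: an output tied to a constant.
  my_port_pin <= '1';
end architecture rtl;
```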


-- Mike Treseler
 
Well, I can't really think of a proper application for this, but I doubt
it would synthesize. If you can give me a proper application for something
like this though (describe the behaviour of what you'd want to build) I
should be able to give you proper code for it. :)
You're damn right, the example doesn't really make sense! Well basically
what I'm actually trying to do is to implement a complicated mathematical
function containing several multiplications, additions etc. To save chip
space, I use only one multiplier and a state machine to control the access
to it. So in the first state, the FPGA will sample the input variable, and
some clock cycles later it will have reached the last state and will output
the result. Then it starts anew. There's also feedback involved, so some
partial results during the calculation will be used for the next round.

I won't state the whole function here, but I can give a minimal example that
should make sense: Let's say three 16-bit integers a, b and c are repeatedly
sampled, multiplied and accumulated to a 32 bit accumulator o as follows:

o = (last value of o) + a * b * c

Since I only want to use one multiplier, the calculation will be done over
two clock cycles. Then I would like to write as follows:

architecture RTL of Func is
  signal multin1: integer range -32768 to 32767;
  signal multin2: integer range -32768 to 32767;
  signal multout: integer;
  signal state: bit := '0';
  signal next_state: bit;
begin
  multout <= multin1 * multin2;
  process (a, b, multout, state)
    variable c_saved: integer range -32768 to 32767;
    variable o_accum: integer := 0;
    variable axb: integer;
  begin
    case state is
      when '0' =>
        c_saved := c;
        multin1 <= a;
        multin2 <= b;
        axb := multout;
        next_state <= '1';
      when '1' =>
        multin1 <= axb;
        multin2 <= c_saved;
        o_accum := o_accum + multout;
        o <= o_accum;
        next_state <= '0';
    end case;
  end process;
  process (clk)
  begin
    if clk'event and clk = '1' then
      state <= next_state;
    end if;
  end process;
end;

But I understand now that I can't do it like this, because when state '1' is
clocked in, the process will be run twice, thus adding first the old, then
the new value of multout to o_accum. Just out of curiosity I would like to
know if this actually happens in the implemented system too?

And maybe I'm asking too much now but... I would be really grateful to see
the best way to rewrite this so that it works correctly!

/Jerker
 
I can't even begin to think of what your code would synthesize into, if it
would synthesize at all, which I highly doubt. :)

Ok, let's see here:

First step: any value you need to store for a 2nd run needs to be in a
clocked process.

Second step: you need an indication of when your two-step process starts.
Initial values don't synthesize, so you need some kind of enable to start
your process and/or resynchronize it. I'll assume a, b, and c are available
for both clock cycles, and that their flip-flops don't have any logic
between them and this design unit.

Third step: avoid combinatorial loops like the plague! Don't have a signal
in a combinatorial process loop back to itself. It's deadly.
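As an illustration of the first and third steps, here is a sketch of an accumulator whose feedback path goes through a flip-flop. The entity `acc_demo` and its ports are hypothetical; the commented-out line shows the kind of combinatorial self-loop Pieter warns about.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity acc_demo is
  port (
    clk : in  std_logic;
    rst : in  std_logic;                      -- synchronous reset
    din : in  integer range -32768 to 32767;
    acc : out integer
  );
end entity acc_demo;

architecture rtl of acc_demo is
  -- Avoid a combinatorial loop such as
  --   acc_int <= acc_int + din;   -- outside any clocked process
  -- which re-triggers itself in every new delta cycle whenever din /= 0.
  signal acc_int : integer := 0;
begin
  -- The feedback goes through a register, so acc_int changes once per clock.
  acc_p : process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc_int <= 0;
      else
        acc_int <= acc_int + din;
      end if;
    end if;
  end process acc_p;

  acc <= acc_int;
end architecture rtl;
```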

As I like to integrate combinatorial and clocked logic, I'd build your
function like this (using the integers, though I personally prefer using
signed and unsigned). I also like using port mode BUFFER, as this is common
practice in our company. I'll even add a synchronous reset for you:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY func IS  -- "function" is a reserved word in VHDL
  PORT
  (
    clk    : IN     std_logic;
    enable : IN     std_logic;
    reset  : IN     std_logic;
    a      : IN     integer RANGE -32768 TO 32767;
    b      : IN     integer RANGE -32768 TO 32767;
    c      : IN     integer RANGE -32768 TO 32767;
    o      : BUFFER integer
  );
END ENTITY func;

ARCHITECTURE rtl OF func IS
  SIGNAL state : std_logic;
  SIGNAL axb   : integer;
BEGIN
  PROCESS
  BEGIN
    WAIT UNTIL clk = '1';
    IF enable = '1' OR state = '0' THEN
      state <= '1';
      axb   <= a*b;
    ELSE
      state <= '0';
      o     <= o + axb * c;
    END IF;
    IF reset = '1' THEN   -- synchronous reset overrides the above
      state <= '0';
      o     <= 0;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;

Hope this helps.

Regards,

Pieter Hulshoff
 
You should define c_saved, axb and o_accum as DFFs (assigned in a clocked
process) and not as LATCHes (assigned in a combinatorial process).

FE
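A minimal sketch of the DFF/LATCH distinction FE draws (the entity `latch_vs_dff` and all port names are hypothetical). In the first process `q_latch` is not assigned on every path of a combinatorial process, so it must remember its old value and a latch is inferred; in the second, the same assignment sits under a clock edge and becomes a D flip-flop with an enable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity latch_vs_dff is
  port (
    clk, en, d : in  std_logic;
    q_latch    : out std_logic;  -- transparent while en = '1', holds otherwise
    q_dff      : out std_logic   -- updates only on rising clk edges
  );
end entity latch_vs_dff;

architecture rtl of latch_vs_dff is
begin
  -- Combinatorial process with an incomplete assignment (no else branch):
  -- q_latch must keep its old value, so a latch is inferred.
  latch_p : process (en, d)
  begin
    if en = '1' then
      q_latch <= d;
    end if;
  end process latch_p;

  -- The same assignment inside a clocked process: a D flip-flop with enable.
  dff_p : process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        q_dff <= d;
      end if;
    end if;
  end process dff_p;
end architecture rtl;
```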
 
Hi Jerker!


But I understand now that I can't do it like this, because when state '1' is
clocked in, the process will be run twice, thus adding first the old, then
the new value of multout to o_accum. Just out of curiosity I would like to
know if this actually happens in the implemented system too?
Yes, it will. I have written such code several times because of "idiotic
typing errors". The simulator runs into an "infinite loop". Synthesis may
not complain, because it's not the task of synthesis to detect such loops.
(Some synthesis tools will warn you: "timing loop detected".)

I would make a copy of o_accum before the accumulation is done.

Your (latch-based) code may lead to hazards: if you make a copy of o_accum
in state 0 and the state machine switches to state 1, the enable signal of
the latch that holds the copy of o_accum may not be disabled.
Solution: Merge the "state-change"-process with the
"while-state-is"-process. Then all registers become flipflops:

if clk'event and clk = '1' then
  case state is
    when '0' =>
      c_saved := c;
      multin1 <= a;
      multin2 <= b;
      axb := multout;
      state <= '1';
    when '1' =>
      multin1 <= axb;
      multin2 <= c_saved;
      o_accum := o_accum + multout;
      o <= o_accum;
      state <= '0';
  end case;
end if;


... well I hope this is correct. I have no VHDL-compiler at this PC to
check it.

I would recommend the flipflop-based solution, but it should be possible
to optimize it, to use a mixed latch- and ff-based solution or even a
pure latch-based solution. Because optimizing it is not your question,
the ff-based solution should be o.k. ;-)

Ralf
 
Thank you all for your suggestions! You have taught me to put everything in
the clocked process to avoid latches. However, none of the designs that you
suggested seem to produce what I wanted. First Pieter's design:

WAIT UNTIL clk = '1';
IF enable = '1' OR state = '0' THEN
  state <= '1';
  axb <= a * b;
ELSE
  state <= '0';
  o <= o + axb * c;
END IF;
IF reset = '1' THEN
  state <= '0';
  o <= 0;
END IF;

It's very elegant, and I had no idea that one could accumulate to signals
like that. But the whole point with going through the multin-multout thing
was to share the multiplier to save FPGA area. Maybe a good optimizer would
find that the two multipliers aren't used at the same time and apply
resource sharing automatically, but mine (Xilinx) doesn't, so I get two
multipliers here.

Here is Ralf's suggestion:

if clk'event and clk = '1' then
  case state is
    when '0' =>
      c_saved := c;
      multin1 <= a;
      multin2 <= b;
      axb := multout;
      state <= '1';
    when '1' =>
      multin1 <= axb;
      multin2 <= c_saved;
      o_accum := o_accum + multout;
      o <= o_accum;
      state <= '0';
  end case;
end if;

If this works, then I'm confused again about the "What trigs processes"
question, because as far as I understand, it takes two deltas before a * b
actually reaches multout (in state '0'). Since the process will only execute
once (at delta 0), axb will never be assigned this value. The same goes for
o_accum.

So it seems I'm still stuck... If I want to share a multiplier, I guess I
HAVE to send the factors out of the process, and then the process HAS to
execute twice in order to take care of the result in the same clock cycle.
Or am I wrong here?

I'm sorry to keep bugging you!

/Jerker
 

Maybe a good optimizer would find that the two multipliers aren't used at
the same time and apply resource sharing automatically, but mine (Xilinx)
doesn't, so I get two multipliers here.
Aah, the beauty of compiler limitations. :) Ok, let's try it again then
shall we?

ARCHITECTURE rtl OF func IS  -- "function" is a reserved word in VHDL
  SIGNAL multin1 : integer RANGE -32768 TO 32767;  -- a or c
  SIGNAL multin2 : integer;                        -- b or a*b, needs the full range
  SIGNAL multout : integer;
  SIGNAL axb     : integer;
  SIGNAL state   : std_logic;
BEGIN

  -- The single shared multiplier.
  multout <= multin1 * multin2;

  -- Select the multiplier operands for the current step.
  multin_cmb: PROCESS( a, b, c, axb, state, enable )
  BEGIN
    IF enable = '1' OR state = '0' THEN
      multin1 <= a;
      multin2 <= b;
    ELSE
      multin1 <= c;
      multin2 <= axb;
    END IF;
  END PROCESS multin_cmb;

  -- Register a*b in the first cycle, accumulate a*b*c in the second.
  mult_reg: PROCESS
  BEGIN
    WAIT UNTIL clk = '1';
    IF enable = '1' OR state = '0' THEN
      state <= '1';
      axb   <= multout;
    ELSE
      state <= '0';
      o     <= o + multout;
    END IF;
    IF reset = '1' THEN
      state <= '0';
      axb   <= 0;
      o     <= 0;
    END IF;
  END PROCESS mult_reg;

END ARCHITECTURE rtl;

Hope this works better.

Regards,

Pieter Hulshoff
 
If this works, then I'm confused again about the "What trigs processes"
question, because as far as I understand, it takes two deltas before a *
b actually reaches multout (in state '0'). Since the process will only
execute once (at delta 0), axb will never be assigned this value. The
same goes for o_accum.

The process is triggered every time a signal in the sensitivity list
changes, but the if-clause is executed only at rising_edge(clk).
Therefore it is not necessary to have all input signals in the
sensitivity list. Only clk is needed.
I'm sorry Ralf, but Jerker is correct. In your process:

if clk'event and clk = '1' then
  case state is
    when '0' =>
      c_saved := c;
      multin1 <= a;
      multin2 <= b;
      axb := multout;

multin1 and multin2 get their values 1 delta after the rising clock edge.
multout gets its value 1 delta after that. This means that axb will not get
the correct value.

Regards,

Pieter Hulshoff
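Pieter's delta-cycle argument can be reproduced with a small self-checking simulation. The sketch below (the entity `stale_read_demo`, the signal `captured` and the operands 3 and 4 are hypothetical) drives a shared multiplier and reads its output in the same clocked activation, the way Ralf's process does. The value captured at the edge is the product from before the edge; the new product only becomes visible one clock later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity stale_read_demo is
end entity stale_read_demo;

architecture sim of stale_read_demo is
  signal clk              : std_logic := '0';
  signal multin1, multin2 : integer := 0;
  signal multout          : integer := 0;
  signal captured         : integer := -1;
begin
  -- Shared multiplier: its output updates one delta after its inputs.
  multout <= multin1 * multin2;

  -- Drive the multiplier and read its output in the same activation.
  clocked : process (clk)
  begin
    if rising_edge(clk) then
      multin1  <= 3;        -- takes effect 1 delta after the edge
      multin2  <= 4;
      captured <= multout;  -- still sees the product from before the edge
    end if;
  end process clocked;

  check : process
  begin
    clk <= '1';             -- first rising edge
    wait for 1 ns;
    assert multout = 12 report "multiplier output should have settled" severity error;
    assert captured = 0 report "expected the stale product 0 * 0" severity error;
    clk <= '0';
    wait for 1 ns;
    clk <= '1';             -- second rising edge picks up the new product
    wait for 1 ns;
    assert captured = 12 report "expected 12 one clock later" severity error;
    wait;
  end process check;
end architecture sim;
```

Reading multout only at the following clock edge, as Pieter's mult_reg process does, is what lets a single multiplier be shared across the two steps.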
 
Thank you Pieter! This one really seems to do it, and everything is clear to
me now about processes, deltas, DFFs vs latches and so on - at least for
now...

But while we're at it, I think I managed to put together an even simpler
version. Would you review it for me? It should work too, right? (For the
sake of simplicity, I removed the reset and enable signals, although I
understand that at least one of them should be there.)

architecture RTL of Experiment is
  signal state: std_logic := '0';
begin
  process
    variable multin1: integer;
    variable multin2: integer;
    variable multout: integer;
  begin
    wait until clk = '1';
    case state is
      when '0' =>
        multin1 := a;
        multin2 := b;
      when '1' =>
        multin1 := multout;
        multin2 := c;
      when others =>  -- std_logic has nine values; a case must cover all
        null;
    end case;
    multout := multin1 * multin2;
    case state is
      when '0' =>
        state <= '1';
      when '1' =>
        o <= o + multout;
        state <= '0';
      when others =>
        null;
    end case;
  end process;
end;

/Jerker
 
But while we're at it, I think I managed to put together an even simpler
version. Would you review it for me? It should work too, right? (For the
sake of simplicity, I removed the reset and enable signals, although I
understand that at least one of them should be there.)
This one would work fine in my opinion. For personal reasons I just try to
avoid using variables. They tend to lead to more timing issues (unless you
know what you're doing), and it's a pain to find them in a netlist. As said
though: this is just my personal preference. I know plenty of colleagues
that use them all over the place.

Regards,

Pieter Hulshoff
 
Jerker Hammarberg wrote:

But while we're at it, I think I managed to put together an even simpler
version. Would you review it for me? It should work too, right?
Consider writing a testbench, to prove it.
I've enjoyed this thread, so I'll get you started.
Let's add an entity to your process, to make it testable.
--------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

entity mult is
  port (a, b, c  : in integer;
        o        : out integer;
        clk, rst : in std_ulogic );
end mult;

architecture synth of mult is

begin
  this : process( clk, rst) is
    variable step_1  : boolean;
    variable multin1 : integer;
    variable multin2 : integer;
    variable multout : integer;
  begin
    clked : if rst = '1' then
      o <= 0;
      step_1 := true;
    elsif rising_edge(clk) then
      op : case step_1 is
        when true =>
          multin1 := a;
          multin2 := b;
        when false =>
          multin1 := multout;
          multin2 := c;
      end case op;
      multout := multin1 * multin2;
      step_1 := not step_1;
    end if clked;
  end process this;
end synth;
------------------------------------------------

Note that this step clarifies what is local
to the process and what is i/o.

State variables are normally type enumerations,
but a boolean will do fine here.
Using std_logic for state variables would force
us to consider possible states such as 'H' and 'Z'.

I know I am in the minority in this group,
but I would encourage the appropriate use
of variables. The upside for me is that
the code is easier to sim in my head, and
therefore, much more likely to work the first
time.

-- Mike Treseler
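Following Mike's remark that state variables are normally enumerations, here is a sketch of Jerker's single-multiplier process with an enumerated state type and with the accumulation into o filled in (Mike's process above stops after the multiplication and never updates o). Assumptions: the entity name `mac_enum`, the type `step_t` with its literals, and the accumulator variable `accum` (needed because an `out` port cannot be read in VHDL-93) are illustrative, not from the thread, and integer overflow for large a*b*c is not handled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mac_enum is
  port (
    clk, rst : in  std_ulogic;
    a, b, c  : in  integer range -32768 to 32767;
    o        : out integer
  );
end entity mac_enum;

architecture rtl of mac_enum is
begin
  mac : process (clk, rst) is
    type step_t is (mult_ab, mult_abc);  -- hypothetical state names
    variable step    : step_t;
    variable multin1 : integer;
    variable multin2 : integer;
    variable multout : integer;
    variable accum   : integer;          -- readable copy of o
  begin
    if rst = '1' then
      step  := mult_ab;
      accum := 0;
      o     <= 0;
    elsif rising_edge(clk) then
      -- Select the operands for the single shared multiplier.
      case step is
        when mult_ab =>
          multin1 := a;
          multin2 := b;
        when mult_abc =>
          multin1 := multout;            -- a*b from the previous clock
          multin2 := c;
      end case;
      multout := multin1 * multin2;
      -- Advance the state; accumulate after the second multiplication.
      case step is
        when mult_ab =>
          step := mult_abc;
        when mult_abc =>
          accum := accum + multout;
          o     <= accum;
          step  := mult_ab;
      end case;
    end if;
  end process mac;
end architecture rtl;
```

With an enumeration, each step has a name and the case statements need no others choice, which should scale more comfortably when the real function has more multiplications than this two-step example.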
 

Welcome to EDABoard.com
