How to describe a pipeline structure in VHDL

Ingmar Seifert
Hello,

I want to implement an algorithm that is based on multiplication followed
by accumulation. The last step is to store the result of the addition in
registers.

I don't produce just one "product" on this pipeline; I produce several
different "products", so I have a multifunctional pipeline.
The different calculations are, for example:
seq.  MUL        ADD        STORE result in
1     a*b        product+x  m00
2     product*b  product+y  m01

So my question is: how do I describe the structure of the pipeline and the
control FSM?
I have thought about an FSM that has a state for each "product" and outputs
control signals that are delayed by D flip-flops, so that they reach each
pipeline stage at the right time and tell it which operation to perform.

Each pipeline stage lasts one clock cycle.

I would be grateful for a short piece of code or some hints.
Thanks in advance for your help.

Ingmar Seifert
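The approach proposed in the question can be sketched as follows. This is a minimal example under stated assumptions, not a reference design: 16-bit unsigned operands (only the low 16 bits of each product are kept), a synchronous active-high reset, and the two sequences from the table above. The names `mac_pipe`, `ctrl_t`, `SEL_PROD` and so on are hypothetical. The FSM has one state per "product" and emits a control word; that word then passes through one register per stage, so the adder receives its operand select one clock later and the store stage receives its destination two clocks later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac_pipe is
  port (
    clk, rst   : in  std_logic;              -- synchronous, active-high reset
    a, b, x, y : in  unsigned(15 downto 0);
    m00, m01   : out unsigned(15 downto 0));
end entity mac_pipe;

architecture rtl of mac_pipe is
  type mul_sel_t is (SEL_A, SEL_PROD);        -- first multiplier operand
  type add_sel_t is (SEL_X, SEL_Y);           -- adder operand
  type ctrl_t is record                       -- control word for one "product"
    valid   : std_logic;
    mul_sel : mul_sel_t;
    add_sel : add_sel_t;
    dest    : natural range 0 to 1;           -- 0 => m00, 1 => m01
  end record;
  constant IDLE : ctrl_t := ('0', SEL_A, SEL_X, 0);

  type state_t is (S1, S2, DONE);             -- one state per "product"
  signal state        : state_t := S1;
  signal ctrl0        : ctrl_t;               -- issued by the FSM (MUL stage)
  signal ctrl1, ctrl2 : ctrl_t := IDLE;       -- delayed by 1 and 2 D-FF stages
  signal product, sum : unsigned(15 downto 0) := (others => '0');
  signal r00, r01     : unsigned(15 downto 0) := (others => '0');
begin
  with state select ctrl0 <=
    ('1', SEL_A,    SEL_X, 0) when S1,      -- seq. 1: a*b,       +x, -> m00
    ('1', SEL_PROD, SEL_Y, 1) when S2,      -- seq. 2: product*b, +y, -> m01
    IDLE                      when DONE;

  process (clk) is
    variable op1, op2 : unsigned(15 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state   <= S1;
        ctrl1   <= IDLE;
        ctrl2   <= IDLE;
        product <= (others => '0');
        sum     <= (others => '0');
        r00     <= (others => '0');
        r01     <= (others => '0');
      else
        case state is                         -- FSM: advance to next "product"
          when S1   => state <= S2;
          when S2   => state <= DONE;
          when DONE => null;
        end case;

        -- MUL stage: operand select comes straight from the FSM.
        if ctrl0.mul_sel = SEL_A then op1 := a; else op1 := product; end if;
        product <= resize(op1 * b, 16);       -- low 16 bits kept
        ctrl1   <= ctrl0;

        -- ADD stage: control word delayed by one clock.
        if ctrl1.add_sel = SEL_X then op2 := x; else op2 := y; end if;
        sum   <= product + op2;
        ctrl2 <= ctrl1;

        -- STORE stage: control word delayed by two clocks.
        if ctrl2.valid = '1' then
          if ctrl2.dest = 0 then r00 <= sum; else r01 <= sum; end if;
        end if;
      end if;
    end if;
  end process;

  m00 <= r00;
  m01 <= r01;
end architecture rtl;
```

Because sequence 2 enters the multiplier exactly one clock after sequence 1, the `product` register already holds a*b when it is reused, so after reset is released m00 = a*b+x and m01 = a*b*b+y are both written within four clock edges.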
 
How about the following (unchecked for syntax)
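library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- "*" and "+" on unsigned

entity x is
  port (clk, rst_n : in std_logic);
end entity x;

architecture y of x is
  type opr_enum is (MULT, ADD, STORE);
  type Reg_array_typ is array(0 to 4) of unsigned(31 downto 0);
  signal reg1_array : Reg_array_typ;
  signal reg2_array : Reg_array_typ;
  signal reg3array  : Reg_array_typ;
  -- mem and addr were not declared; the size here is an assumption
  type mem_typ is array(0 to 15) of unsigned(31 downto 0);
  signal mem  : mem_typ;
  signal addr : natural range 0 to 15 := 0;
begin  -- architecture y
  Comp_proc : process (clk, rst_n) is
  begin  -- process Comp_proc
    if rst_n = '0' then  -- asynchronous reset (active low)
      reg1_array <= (others => (others => '0'));
    elsif rising_edge(clk) then  -- rising clock edge
      for i in reg1_array'range loop
        case i is  -- i is the stage number
          when 0 => reg1_array(0) <= resize(reg2_array(0) * reg3array(0), 32);  -- mult, low 32 bits kept
          when 1 => reg1_array(1) <= reg2_array(1) + reg3array(1);  -- add
          when 2 => mem(addr) <= reg1_array(1);  -- store
          when others => null;
        end case;
      end loop;  -- i
    end if;
  end process Comp_proc;
end architecture y;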
----------------------------------------------------------------------------
Ben Cohen Publisher, Trainer, Consultant (310) 721-4830
http://www.vhdlcohen.com/ vhdlcohen@aol.com
Author of following textbooks:
* Using PSL/SUGAR with Verilog and VHDL
Guide to Property Specification Language for ABV, 2003 isbn 0-9705394-4-4
* Real Chip Design and Verification Using Verilog and VHDL, 2002 isbn
0-9705394-2-8
* Component Design by Example, 2001 isbn 0-9705394-0-1
* VHDL Coding Styles and Methodologies, 2nd Edition, 1999 isbn 0-7923-8474-1
* VHDL Answers to Frequently Asked Questions, 2nd Edition, isbn 0-7923-8115
------------------------------------------------------------------------------
 
VhdlCohen wrote:
when 1 => reg1_array(1) <= reg2_array(1)+ reg3array(1); --add
Why isn't reg1_array(0) (which is obviously the product) used as an
operand to the adder in the next stage? Is it only a mistake?
I think I understand what will be synthesized.

But my problem is that I have to calculate different things on this
pipeline and then store the results in different registers.
Later I want to use the multiplier and adder as separate units (not
coupled through the pipeline).
To illustrate the problem:

run1 a*b --> product+c --> store in m00
run2 a*product --> product+d --> store in m01
runx ... --> ... --> ...

a*product is done for exponentiation; that is the reason why I can't use a
MAC from a core generator.

How can I describe/control the behaviour of such a pipeline?


Regards
Ingmar Seifert
 

-- Each regX_array represents the registers of a pipe whose depth is
-- regX_array'range. Thus regX_array(0) is the first pipe stage and
-- regX_array(1) the 2nd. In the loop described previously, i is the
-- stage number. As you know, all registers can be read, regardless of
-- where they sit in the pipe. All registers can also be written, but you
-- need to ensure that each one is written by a single source in each cycle
-- (i.e., regx <= xxx;  -- at clock x
--        regx <= yyy;  -- at the same clock x
-- is a NO NO, unless you want the last assignment in the same process to win).
--
-- Thus, if you have different cases or runs, select the operands per run.
-- Below is junk code that demonstrates the concept.
elsif clk'event and clk = '1' then  -- rising clock edge
  case run_mode is
    when run1 =>
      for i in reg1_array'range loop
        case i is  -- i is the stage number
          when 0 =>
            reg1_array(0) <= reg2_array(0) * reg3array(0);  -- mult
            reg3array(0)  <= whatever;
          when 1 => reg1_array(1) <= reg2_array(0) + reg3array(0);  -- add
          when 2 => mem(addr) <= reg1_array(1);  -- store
          when others => null;
        end case;
      end loop;  -- i
    when run2 =>
      for i in reg1_array'range loop
        case i is  -- i is the stage number
          when 0 =>
            reg1_array(0) <= reg2_array(2) * reg3array(2);  -- mult
            reg3array(0)  <= whateverelse;
          when 1 => reg1_array(1) <= reg2_array(0) + reg3array(0);  -- add
          when 2 => mem(addr) <= reg1_array(1);  -- store
          when others => null;
        end case;
      end loop;  -- i
    when others => null;
  end case;  -- run_mode
end if;
--
--



 
Ingmar Seifert <inse@hrz.tu-chemnitz.de> wrote:
run1 a*b --> product+c --> store in m00
run2 a*product --> product+d --> store in m01
runx ... --> ... --> ...
It seems to me you need a structure with a multiplier followed by an
adder, with control logic that allows you to use different input registers
and store in different output registers.
I wonder if you need
y(t)<=a*b;
y(t+1)<=a*y(t);z<=y(t)+c;
Or
y(t)<=a*b+c;
y(t+1)<=a*y(t)+c;

I don't want to write the whole code for this, but it seems to me
you need an adder, a multiplier and a lot of registers plus
multiplexers, but not really a pipeline (except for timing purposes).

bye Thomas
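For illustration, a minimal sketch of the first variant (y(t)<=a*b; y(t+1)<=a*y(t); z<=y(t)+c) built from one multiplier, one adder, a multiplexer and two registers. The 8-bit inputs, the 16-bit registers that keep only the low bits, and the `start` input that selects b instead of the fed-back y are assumptions; the entity name `mul_feedback` is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: y <= a*b while start = '1', then y <= a*y each clock;
-- z <= y + c is registered on the same edge, so it uses the previous y.
entity mul_feedback is
  port (
    clk, start : in  std_logic;
    a, b, c    : in  unsigned(7 downto 0);
    y, z       : out unsigned(15 downto 0));
end entity mul_feedback;

architecture rtl of mul_feedback is
  signal y_r, z_r : unsigned(15 downto 0) := (others => '0');
begin
  process (clk) is
    variable op : unsigned(15 downto 0);
  begin
    if rising_edge(clk) then
      -- Multiplexer in front of the single multiplier: b, or the fed-back y.
      if start = '1' then
        op := resize(b, 16);
      else
        op := y_r;
      end if;
      y_r <= resize(a * op, 16);       -- y(t+1) <= a*y(t), low 16 bits kept
      z_r <= y_r + resize(c, 16);      -- z <= y(t) + c
    end if;
  end process;

  y <= y_r;
  z <= z_r;
end architecture rtl;
```

The operand is selected before the single `*`, so a synthesis tool is more likely to infer one multiplier than if each branch contained its own multiplication; the same idea applies to the adder.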
 
Thomas Stanka wrote:

I wonder if you need
y(t)<=a*b;
y(t+1)<=a*y(t);z<=y(t)+c;
Or
y(t)<=a*b+c;
y(t+1)<=a*y(t)+c;
I need the first one.

I don't want to write the whole code for this, but it seems to me
you need an adder, a multiplier and a lot of registers plus
multiplexers, but not really a pipeline (except for timing purposes).
Yes, indeed. It isn't a typical pipeline as we learned about at university,
but the second stage does get the result of the first one.

Here is a short part of the control FSM code I have written so far. It
works correctly (in hardware too), but I'm not very happy with it: it
isn't easy to extend and doesn't look very clean.

WHEN R1 =>
  state     <= R2;
  factor1   <= EXT(m00_row,11);
  factor2   <= EXT(y,3);
  summand1a <= EXT(product,19);
  summand2a <= EXT(m20_img,19);
  --
  m10_row   <= EXT(sum1,m10_row'LENGTH);

WHEN R2 =>
  state     <= R3;
  factor1   <= EXT(product,11);
  factor2   <= EXT(y,3);
  summand1a <= EXT(product,19);
  summand2a <= EXT(m01_img,19);
  --
  m20_img   <= EXT(sum1,m20_img'LENGTH);

I have described the input "registers" of the multiplier and the adder as
signals that get their value at the rising clock edge. I had to do this
because otherwise the synthesis tool synthesized more than one multiplier
and adder. These operand registers are used by 2 processes
(product<=factor1*factor2; and sum1<=summand1a+summand2a;)


The problem is that in clock cycle 1 I'm in state R1 and have to set the
operands for the multiplier. They get their values at the next clock edge,
so in the following cycle I'm in state R2 and have to control what is
done with the product that is now ready.
So the two operations belong together.
In the solution I posted above, I set the adder operand registers in R2,
even though they belong to R1.

Now my question: is it a good idea to set the signals that belong
together (the multiplier operand choice, the adder operand choice and the
store register choice) in state R1, and to delay these control signals by
1 cycle for the adder and by 2 cycles for a (yet to be written) register bank?
Is there a common way to solve such a problem? Is it a good idea to do so,
or are there other solutions?


Thanks in advance for some hints on this topic
Ingmar Seifert
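One common way to make such a controller easier to extend is to replace the hand-written per-state assignments with a table of control words (a small "microprogram") and to delay each word through the pipeline, as proposed above. The sketch below is illustrative only: it assumes 16-bit unsigned inputs a, b, c, d, keeps only the low 16 bits of each product, uses a synchronous active-high reset, and encodes run1 and run2 from the earlier post. The names `mac_seq`, `uop_t`, `PROG` and `pick` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac_seq is
  port (
    clk, rst   : in  std_logic;               -- synchronous, active-high reset
    a, b, c, d : in  unsigned(15 downto 0);
    m00, m01   : out unsigned(15 downto 0));
end entity mac_seq;

architecture rtl of mac_seq is
  type src_t is (A_IN, B_IN, C_IN, D_IN, PROD);  -- operand sources
  type uop_t is record                            -- control word for one run
    valid      : boolean;
    mul1, mul2 : src_t;                   -- multiplier operands
    add2       : src_t;                   -- adder computes product + add2
    dest       : natural range 0 to 1;    -- result register index
  end record;
  type prog_t is array (natural range <>) of uop_t;

  -- Hypothetical microprogram: one row per run; extend by adding rows.
  constant PROG : prog_t := (
    (true, A_IN, B_IN, C_IN, 0),    -- run1: a*b       + c -> m00
    (true, A_IN, PROD, D_IN, 1));   -- run2: a*product + d -> m01
  constant NOP : uop_t := (false, A_IN, A_IN, A_IN, 0);

  type bank_t is array (0 to 1) of unsigned(15 downto 0);
  signal bank             : bank_t := (others => (others => '0'));
  signal pc               : natural range 0 to PROG'length := 0;
  signal at_add, at_store : uop_t := NOP;  -- control word delayed 1 and 2 cycles
  signal product, sum     : unsigned(15 downto 0) := (others => '0');

  -- Operand multiplexer shared by the multiplier and the adder.
  function pick (s : src_t; va, vb, vc, vd, vp : unsigned(15 downto 0))
    return unsigned is
    variable r : unsigned(15 downto 0);
  begin
    case s is
      when A_IN => r := va;
      when B_IN => r := vb;
      when C_IN => r := vc;
      when D_IN => r := vd;
      when PROD => r := vp;
    end case;
    return r;
  end function pick;
begin
  process (clk) is
    variable u : uop_t;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        pc       <= 0;
        at_add   <= NOP;
        at_store <= NOP;
        product  <= (others => '0');
        sum      <= (others => '0');
        bank     <= (others => (others => '0'));
      else
        -- Issue the next run, or a NOP once the program is finished.
        if pc < PROG'length then
          u  := PROG(pc);
          pc <= pc + 1;
        else
          u := NOP;
        end if;
        -- MUL stage: controlled by the word issued in this cycle.
        product <= resize(pick(u.mul1, a, b, c, d, product) *
                          pick(u.mul2, a, b, c, d, product), 16);
        -- ADD stage: controlled by the word delayed by 1 cycle.
        sum <= product + pick(at_add.add2, a, b, c, d, product);
        -- STORE stage: controlled by the word delayed by 2 cycles.
        if at_store.valid then
          bank(at_store.dest) <= sum;
        end if;
        at_add   <= u;
        at_store <= at_add;
      end if;
    end if;
  end process;

  m00 <= bank(0);
  m01 <= bank(1);
end architecture rtl;
```

Adding a `runx` then means adding one row to `PROG`. The delay registers `at_add` and `at_store` ensure that each row's adder operand and destination are applied one and two cycles after its multiplication, which is the 1-cycle and 2-cycle delay described in the question.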
 
Ingmar Seifert <inse@hrz.tu-chemnitz.de> wrote:
Now my question: is it a good idea to set the signals, that belong
together: the mul operand choice, the add operand choice and the store
register choice in state R1 and delay these control signals for the
adder by 1 cycle and the signal for a (to be done) registerbank by 2 cycles?
Is there a common way to solve such a problem? Is it a good idea to do so
or are there other solutions?
Well, I think that's a matter of style. I would prefer to set all
signals just in time, but maybe you think it's better to verify your
code with delayed signals.

BTW you don't need the pipeline anyway :).
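architecture.....
begin
  P <= m1*m2;  -- the one multiplier
  S <= s1+s2;  -- the one adder

  s1 <= P when normalmode else
        Muxin;
  process(Clk)
  begin
    if Rising_Edge(Clk) then
      case state is
        when st0 =>  -- state names must differ from the signal s1
          normalmode <= false;
          m1 <= a;
          m2 <= b;
        when st1 =>
          normalmode <= true;
          m1 <= a;
          m2 <= P;  -- value of last multiplication
          s2 <= c;
        .....
      end case;
    end if;
  end process;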

bye Thomas
 

Welcome to EDABoard.com
