Using DSP Units...

  • Thread starter gnuarm.del...@gmail.com

I'm working with the Gowin GW1N devices and need to do some serious math. By serious, I mean a number of calculations, not that they have to be fast. In fact, relatively speaking, I pretty much have all the time in the world. The cycle time for performing all the calculations is 5 ms with a 33 MHz clock, so roughly 165,000 cycles.
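The cycle budget follows directly from the frame time and the clock rate. If one DSP block completes one multiply-accumulate per clock (an assumption about how it would be scheduled, not something stated in the docs), the same figure is also an upper bound on the multiply-accumulates a single block can perform per 5 ms frame:

```latex
N_{\text{cycles}} = T_{\text{frame}} \, f_{\text{clk}}
  = \left(5 \times 10^{-3}\ \text{s}\right)\left(33 \times 10^{6}\ \text{Hz}\right)
  = 1.65 \times 10^{5}
```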

What I'm not up to speed on is how to use, or even infer, such logic. Certainly the blocks can be instantiated, which I might do, but the docs are pretty poor. For each configuration, the user guide shows a set of equations it can implement, a block diagram with various control signals and data paths, and then an interface prototype of what I suppose is the inferred object. The equations are very easy to understand...
DOUT = A * B ± C
DOUT = ∑(A * B)
DOUT = A * B + CASI

The full capability is more complex, but copying and pasting it would leave too many things to fix up to bother with. The point is that they don't make it clear how the controls work, or even what can be controlled in real time vs. what needs to be configured. I guess I'll have to write some code and experiment with the synthesis. I can try writing to their support for some answers; it's a person rather than a black hole at a web site, so I usually get an adequate answer.
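A behavioral sketch of the accumulate form, DOUT = ∑(A * B), may help when experimenting with inference. Its only run-time controls are a clock enable and a load signal that restarts the sum; operand and accumulator widths are fixed when the design is elaborated. The 18-bit operands, 48-bit accumulator and all names are assumptions chosen for illustration, not taken from the Gowin documentation, and whether a given tool packs this into a single DSP block is something to confirm in its synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioral multiply-accumulate, DOUT = sum(A * B).
-- Widths (18x18 operands, 48-bit accumulator) are illustrative assumptions.
entity mac_acc is
  port (
    clk      : in  std_logic;
    ce       : in  std_logic;              -- run-time clock enable
    acc_load : in  std_logic;              -- run-time: '1' starts a new sum with A*B
    a, b     : in  signed(17 downto 0);
    dout     : out signed(47 downto 0)
  );
end entity mac_acc;

architecture rtl of mac_acc is
  signal acc : signed(47 downto 0) := (others => '0');
begin
  process (clk)
    variable prod : signed(47 downto 0);
  begin
    if rising_edge(clk) then
      if ce = '1' then
        prod := resize(a * b, acc'length);  -- sign-extend the 36-bit product
        if acc_load = '1' then
          acc <= prod;                      -- restart the sum
        else
          acc <= acc + prod;                -- accumulate
        end if;
      end if;
    end if;
  end process;

  dout <= acc;
end architecture rtl;
```

Loading and then accumulating 3*4 followed by 5*(-2) leaves 2 in the accumulator; dropping ce holds that value.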

I just wondered how this is done with other brands of devices.

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209
 
On 22/11/2020 04:49, gnuarm.del...@gmail.com wrote:
[snip]

By instantiation (in my case with Lattice or Altera), mainly because the
DSP is a very limited resource and I needed full control over how it was
shared. On my one big Vivado project, with a chip with much more resources, I
think I let the tools do a bit of inferring, but mostly I used blocks out
of the Xilinx IP collection.
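Sharing one DSP block between several products can be made explicit in the HDL rather than left to the tool: multiplex the operands into a single multiply and step through them with a counter. The sketch below is a minimal illustration under stated assumptions (three operand pairs, 18-bit signed operands held stable while the sequence runs); the entity and port names are hypothetical. Given the large cycle budget described at the top of the thread, a longer schedule fed from RAM or ROM would follow the same pattern.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one multiplier time-shared across three operand pairs.
-- After start is seen, r0 = a0*b0, r1 = a1*b1 and r2 = a2*b2 are produced on
-- the next three clocks; done pulses for one clock together with r2.
-- Operands are assumed to stay stable while the sequence runs.
entity shared_mult is
  port (
    clk, start : in  std_logic;
    a0, a1, a2 : in  signed(17 downto 0);
    b0, b1, b2 : in  signed(17 downto 0);
    r0, r1, r2 : out signed(35 downto 0);
    done       : out std_logic
  );
end entity shared_mult;

architecture rtl of shared_mult is
  signal step   : integer range 0 to 2 := 0;
  signal busy   : std_logic := '0';
  signal ra, rb : signed(17 downto 0);
begin
  -- Operand multiplexers: the step counter picks which pair uses the multiplier.
  with step select ra <= a0 when 0, a1 when 1, a2 when others;
  with step select rb <= b0 when 0, b1 when 1, b2 when others;

  process (clk)
    variable p : signed(35 downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if busy = '0' then
        if start = '1' then
          busy <= '1';
          step <= 0;
        end if;
      else
        p := ra * rb;                  -- the single shared multiply
        case step is
          when 0      => r0 <= p;
          when 1      => r1 <= p;
          when others => r2 <= p;
        end case;
        if step = 2 then
          busy <= '0';
          done <= '1';
        else
          step <= step + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the multiply appears exactly once, with the selection done by ordinary muxes, there is only one multiplier to map, regardless of how aggressively the tool shares resources on its own.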

MK
 
On Sunday, 11/22/2020 2:12 AM, Michael Kellett wrote:
[snip]

In the Xilinx tools, it's generally easy to infer your logic and let
the tools figure out how to place it in DSPs. I used to try to place
pipeline registers where they were needed based on the DSP architecture,
but soon found out that if you just place a lot of pipeline stages at
the end, the tools will push the registers into the required places. If
you were using third-party tools like Synplify Pro, I would expect the same
behavior regardless of the target FPGA. I'm not familiar with Gowin, so
I couldn't tell you what to expect from their tools. In any case, it's
easy enough to write code for inference and see what the tools do with it.
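The "add stages at the end and let the tools move them" approach can be tried with a sketch like the one below. The multiply is written once, after input registers, and is followed by a configurable chain of plain registers. Whether a particular tool (Gowin's included) actually retimes those trailing registers into the DSP block's internal pipeline depends on the tool and its settings, so check the synthesis report. The STAGES generic and all names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Multiply with input registers and STAGES trailing registers.
-- Total latency is 1 + STAGES clocks. The trailing registers are plain
-- delay stages that a retiming-capable tool may push back into the DSP.
entity mult_pipe is
  generic (STAGES : positive := 3);
  port (
    clk  : in  std_logic;
    a, b : in  signed(17 downto 0);
    p    : out signed(35 downto 0)
  );
end entity mult_pipe;

architecture rtl of mult_pipe is
  type pipe_t is array (1 to STAGES) of signed(35 downto 0);
  signal a_r, b_r : signed(17 downto 0) := (others => '0');
  signal pipe     : pipe_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_r     <= a;                 -- input registers
      b_r     <= b;
      pipe(1) <= a_r * b_r;         -- product register
      for i in 2 to STAGES loop
        pipe(i) <= pipe(i - 1);     -- extra stages at the end
      end loop;
    end if;
  end process;

  p <= pipe(STAGES);
end architecture rtl;
```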

--
Gabor

 
On Sunday, November 22, 2020 at 2:13:03 AM UTC-5, Michael Kellett wrote:
[snip]

I'd be willing to use inference if I could actually understand just what is in the DSP blocks. The adder can do addition or subtraction, but I can't tell if that is configurable at run time. There are various multiplexers, and a variety of inputs to a rather large mux for the large accumulator; it's unclear how to control that one. I guess I'll just have to ask whether there is more documentation.
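If the add/subtract choice does turn out to be a run-time control, inference can express it as an ordinary input that selects between the two expressions on every clock. The sketch below assumes 18-bit signed A and B, a 36-bit C, and hypothetical names. Whether the tool maps the selection onto the DSP block's own add/subtract control or builds extra logic around the block is exactly what the synthesis report would reveal.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- DOUT = A*B + C or A*B - C, selected at run time by sub.
-- The result is one bit wider than the 36-bit operands so the sum cannot overflow.
entity mult_addsub is
  port (
    clk  : in  std_logic;
    sub  : in  std_logic;              -- '0': A*B + C, '1': A*B - C
    a, b : in  signed(17 downto 0);
    c    : in  signed(35 downto 0);
    dout : out signed(36 downto 0)
  );
end entity mult_addsub;

architecture rtl of mult_addsub is
begin
  process (clk)
    variable prod : signed(36 downto 0);
  begin
    if rising_edge(clk) then
      prod := resize(a * b, 37);
      if sub = '1' then
        dout <= prod - resize(c, 37);
      else
        dout <= prod + resize(c, 37);
      end if;
    end if;
  end process;
end architecture rtl;
```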

Thanks for the reply

--

Rick C.

 
On Sunday, November 22, 2020 at 11:45:26 AM UTC-5, Gabor wrote:
[snip]

Thanks for the reply. I tried inference, and it seems to be using a separate DSP unit for the multiply and for the add. Maybe I need to combine the two into a single assignment.
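One way to test the single-assignment idea is to write the multiply and the add as one expression feeding one output register, with input registers in front, so that the code resembles the usual shape of a DSP block (optional input registers, multiplier, adder, output register). This is a sketch with assumed widths and hypothetical names; it may or may not be enough for a given tool to use a single block, so the resource report remains the arbiter.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- A*B + C written as one registered expression, two clocks after the inputs.
entity mult_add_reg is
  port (
    clk  : in  std_logic;
    a, b : in  signed(17 downto 0);
    c    : in  signed(35 downto 0);
    dout : out signed(36 downto 0)
  );
end entity mult_add_reg;

architecture rtl of mult_add_reg is
  signal a_r, b_r : signed(17 downto 0) := (others => '0');
  signal c_r      : signed(35 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Input registers (assumed to correspond to optional input registers in the block).
      a_r <= a;
      b_r <= b;
      c_r <= c;
      -- Multiply and add in a single expression into a single output register.
      dout <= resize(a_r * b_r, 37) + resize(c_r, 37);
    end if;
  end process;
end architecture rtl;
```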

--

Rick C.

 
On 11/22/20 5:04 PM, gnuarm.del...@gmail.com wrote:
[snip]

The IP tool lets you select a number of different operations, chosen by
an input value. As I remember, both the input adder and the output adder
are dynamically configurable for adding or subtracting, but I would have
to double-check that (at least for the part I was using).
 
On Sunday, November 22, 2020 at 5:47:50 PM UTC-5, Richard Damon wrote:
[snip]

Yeah, I can't find where much is configurable in the application; rather, there are parameters (generics) that establish connectivity and function. But the docs don't really explain what they do, only how many bits each one occupies.

GENERIC (
  AREG           : bit     := '0';
  BREG           : bit     := '0';
  ASIGN_REG      : bit     := '0';
  BSIGN_REG      : bit     := '0';
  ACCLOAD_REG    : bit     := '0';
  OUT_REG        : bit     := '0';
  B_ADD_SUB      : bit     := '0';
  C_ADD_SUB      : bit     := '0';
  ALUD_MODE      : integer := 0;
  ALU_RESET_MODE : string  := "SYNC"
);

As far as I can tell, the operations are fixed and there's no way to selectively add or subtract in real time. The ALU info seems to show real-time A and B sign inputs, but gives no real indication of what they do. The combined multiply-ALU functions don't show the sign inputs, but do show A*B±C in one of the equations. Are the multiply and ALU functions in separate DSP blocks, or are both contained in every DSP block?
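On the sign inputs: in many vendors' DSP blocks, per-operand sign controls select whether A and B are treated as two's complement or as unsigned. Assuming that is what the Gowin A and B sign inputs do (nothing in the thread confirms it), the behavior can be modeled by widening each 18-bit operand to 19 bits, sign- or zero-extended according to its control, and then doing a single signed multiply. The entity, port and function names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of per-operand sign control:
-- asign/bsign = '1' treats the operand as two's complement, '0' as unsigned.
entity mult_sign_sel is
  port (
    clk          : in  std_logic;
    asign, bsign : in  std_logic;
    a, b         : in  std_logic_vector(17 downto 0);
    p            : out signed(37 downto 0)   -- 19 x 19 signed product
  );
end entity mult_sign_sel;

architecture rtl of mult_sign_sel is
  -- Widen an operand by one bit: sign-extend if signed, zero-extend if unsigned.
  function widen(v : std_logic_vector; is_signed : std_logic) return signed is
  begin
    if is_signed = '1' then
      return resize(signed(v), v'length + 1);
    else
      return signed(resize(unsigned(v), v'length + 1));
    end if;
  end function widen;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      p <= widen(a, asign) * widen(b, bsign);
    end if;
  end process;
end architecture rtl;
```

With A all ones and B = 2, the product is -2 when A is marked signed and 524286 (262143 * 2) when it is marked unsigned.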

--

Rick C.

 
On 11/22/20 6:43 PM, gnuarm.del...@gmail.com wrote:
[snip]

I tend to use the IP integrator to make a configured version of the DSP
block, and that integrator lets you make a list of operations. I would
need to try it to see if it allows both an add and a subtract, selected
by the 'operation' input to the module.

I haven't figured out the incantations to generate some of these blocks
directly without using the IP integrator, and I haven't found the
documentation for that.

My understanding is that the whole block is one module, and at least on
the part I am using, they come in pairs that can be coupled to make a
bigger block with a faster path for a partial sum from one block to another.
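The paired-block arrangement, where one block's partial sum is handed to the next, appears to correspond to the DOUT = A * B + CASI form quoted at the top of the thread (reading CASI as a cascade input from a neighboring block is an inference from the name, not from the docs). A behavioral sketch of a two-term sum of products written in that cascaded shape follows; the names are hypothetical, and whether the tool uses the dedicated cascade path is up to the tool. The second operand pair is delayed one clock so it lines up with the first stage's registered product.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- dout = a0*b0 + a1*b1, two clocks after the inputs, in cascaded form.
entity sop2_cascade is
  port (
    clk            : in  std_logic;
    a0, b0, a1, b1 : in  signed(17 downto 0);
    dout           : out signed(36 downto 0)
  );
end entity sop2_cascade;

architecture rtl of sop2_cascade is
  signal p0         : signed(35 downto 0) := (others => '0');  -- stage 1 partial sum
  signal a1_d, b1_d : signed(17 downto 0) := (others => '0');  -- aligned with p0
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: first block's product, passed on as a partial sum.
      p0   <= a0 * b0;
      a1_d <= a1;
      b1_d <= b1;
      -- Stage 2: second block adds its own product to the incoming partial sum.
      dout <= resize(p0, 37) + resize(a1_d * b1_d, 37);
    end if;
  end process;
end architecture rtl;
```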
 
