What is your VHDL design flow for a complex project?

fl

Guest
Hi,

I have several years of digital logic design experience, going back to the
days of TTL and CPLDs. I have also designed several small FPGA projects in
VHDL. For one complex FPGA project, I used Xilinx System Generator.

I know the basics of FPGA design, such as timing constraints and some
place-and-route attributes. But I still feel quite incompetent at VHDL on a
large project. Of course, if I had the opportunity to work on a large VHDL
project, I would get there sooner or later. Here I just want your advice on
the procedure for a large VHDL project.

Let me make my question a little clearer. I guess either top-down or
bottom-up can work for a large project. My concern is mainly about clock
timing across different modules (entities?). In System Generator, I can try
adding z^-1 to some modules to get the desired output. In a large VHDL
project, experimenting with delay units that way looks much more troublesome.
For example, in an FFT design, I think I should first get the basic butterfly
unit working. After that, I still feel uncomfortable about the next step:
adding the required index/address calculation in VHDL code.

Could you give me some help? What procedure do you follow on a large VHDL
project?


Thanks,
 
On 5/23/2015 10:08 PM, fl wrote:
(snip)

I'm not sure what to tell you. I do most projects in a similar manner.
I do a top-down design with some idea of the complexity of each
module. If modules are so complex that you have no idea of the pipeline
delays, you need to do more work on those modules to determine how fast
they can run and how many pipeline delays (register delays) there will
be. Once you have that info, you can redesign the interconnect to
keep everything synchronized. Some would call that bottom-up
implementation.
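
A rough sketch of that balancing step, written as a Python behavioural model rather than VHDL. The path names and latency numbers are invented for illustration; in a real design they would come from each module once its pipeline depth is known. `DelayLine` plays the role of the z^-1 blocks from System Generator.

```python
from collections import deque


def balance_delays(path_latencies):
    """Extra register stages per path so every path matches the slowest one."""
    target = max(path_latencies.values())
    return {name: target - lat for name, lat in path_latencies.items()}


class DelayLine:
    """n-stage shift register (z^-n): step() returns the input from n clocks ago."""

    def __init__(self, n, fill=0):
        self.regs = deque([fill] * n)

    def step(self, x):
        self.regs.append(x)
        return self.regs.popleft()


# Hypothetical latencies in clock cycles; real numbers come from each module.
latencies = {"data_path": 5, "twiddle_path": 2, "valid_flag": 1}
padding = balance_delays(latencies)
print(padding)  # {'data_path': 0, 'twiddle_path': 3, 'valid_flag': 4}
```

Each padding value is the depth of the delay line to place on that path so that all three arrive at the join point on the same clock.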

I have always found block diagrams to be my friend and to help me
understand all the relationships between modules. An FFT is actually
easy to implement once you understand how it works. It often needs
pipelining to make it run fast. I have never found pipelining of a
linear flow to be difficult. Do you have feedback paths that make your
design more complex? What else are you using other than FFTs?

--

Rick
 
On Saturday, May 23, 2015 at 8:20:47 PM UTC-7, rickman wrote:
(snip)

Thanks, Rick. I can imagine it could be more difficult when there is
feedback in a high-speed module. The FFT has a simple, regular structure.
For now, I am still at the FFT stage. I know the FFT and how to code it in C,
even in assembly. I have not had time to finish a VHDL FFT yet. The main
difficulties are the memory addressing, twiddle coefficient selection, etc.
Yes, I need to be patient and work through the interconnect between the
memory, twiddle factors, and multipliers.
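
For the addressing and twiddle selection, a minimal Python sketch may help before writing VHDL. It assumes an in-place radix-2 decimation-in-time FFT with bit-reversed input and a half-length twiddle ROM; that is one common arrangement, not necessarily the one intended here. The function names are made up. The address function uses only shifts, ANDs, and ORs on the stage number and a butterfly counter, so it should map onto a couple of counters and some wiring.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def butterfly_addresses(stage, b, log2n):
    """Memory addresses and twiddle ROM index for butterfly b of a stage."""
    half = 1 << stage
    j = b & (half - 1)                       # position inside the group
    top = ((b >> stage) << (stage + 1)) | j  # upper operand address
    bot = top | half                         # lower operand address
    tw = j << (log2n - 1 - stage)            # index into the twiddle ROM
    return top, bot, tw


def fft_radix2(samples):
    n = len(samples)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    log2n = n.bit_length() - 1
    x = [samples[bit_reverse(i, log2n)] for i in range(n)]
    twiddle_rom = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    for stage in range(log2n):
        for b in range(n // 2):              # one butterfly per step
            top, bot, tw = butterfly_addresses(stage, b, log2n)
            t = twiddle_rom[tw] * x[bot]
            x[top], x[bot] = x[top] + t, x[top] - t
    return x


print(butterfly_addresses(1, 3, 3))  # (5, 7, 2)
```

Checking `fft_radix2` against a direct DFT on a few short vectors gives a reference to compare the VHDL simulation against later.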
 
On 5/24/2015 9:43 AM, fl wrote:
(snip)

Maybe this stuff comes easier to me than most. I cut my teeth on signal
processing back in the 80's working on array processors. They were rack
cabinets of boards which did the same thing DSP chips do now. I was
testing boards in the machine and so got to see and debug every part of
the device at a micro level.

I think the key to designing an FFT in hardware is much the same as in
those machines. First understand the timing of the multiplier. Then
everything else is there to feed data to and from the multiplier so it
never rests.
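
To make "never rests" concrete, here is a small cycle-by-cycle Python model of a pipelined multiplier. The function name and the 3-cycle latency are assumptions for illustration only. When a new operand pair is issued on every clock, N products take N + latency cycles in total, so the latency is paid once rather than per product.

```python
from collections import deque


def run_pipelined_multiplier(pairs, latency):
    """Issue one operand pair per clock into a multiplier with `latency` stages.

    Returns the products in order and the number of clocks used.
    None marks an empty pipeline slot (a bubble).
    """
    pipe = deque([None] * latency)
    feed = iter(pairs)
    pending = len(pairs)
    results, cycles = [], 0
    while pending:
        nxt = next(feed, None)
        pipe.append(None if nxt is None else nxt[0] * nxt[1])
        out = pipe.popleft()
        cycles += 1
        if out is not None:
            results.append(out)
            pending -= 1
    return results, cycles


products, clocks = run_pipelined_multiplier([(1, 2), (3, 4), (5, 6), (7, 8)], 3)
print(products, clocks)  # [2, 12, 30, 56] 7
```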

--

Rick
 
fl <rxjwg98@gmail.com> wrote:
> On Saturday, May 23, 2015 at 8:20:47 PM UTC-7, rickman wrote:

(snip)

My favorite use for FPGAs is systolic array processors.

I think you can make a systolic array for FFT, but haven't
actually tried to do it.

Once you figure it out, they are easy to write and debug.

-- glen
 
rickman <gnuarm@gmail.com> wrote:

(snip)
> A systolic array is useful when you have a *lot* of work to be done and
> it can be broken into units that allow the data to flow through the
> processors. I'm probably not doing justice to the "proper" definition.
>
> I think this type of design is not very common and is rather
> specialized. What applications have you found that utilized systolic
> processing?

I have used it for dynamic programming for DNA and protein
sequence comparison. There are nice algorithms for computing
alignment scores including insertions and deletions using
dynamic programming. (The algorithm used by unix diff was first
used for protein sequence comparison, and later for diff.)
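
A sketch of why this kind of dynamic programming suits a systolic array, in Python. It uses plain edit distance (unit costs) as a stand-in for a real alignment scoring scheme, which is an assumption made for brevity. Every cell on an anti-diagonal depends only on the two previous anti-diagonals, so one processing element per row could update a whole diagonal in a single clock.

```python
def edit_distance_wavefront(a, b):
    """Levenshtein distance computed one anti-diagonal (wavefront) at a time."""
    n, m = len(a), len(b)
    # prev2 and prev1 hold diagonals d-2 and d-1, keyed by row index i.
    prev2, prev1 = {}, {0: 0}  # diagonal 0 is the single cell (0, 0)
    for d in range(1, n + m + 1):
        cur = {}
        # All cells in this loop are independent of each other.
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if i == 0:
                cur[i] = j
            elif j == 0:
                cur[i] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                cur[i] = min(prev1[i - 1] + 1,     # from cell (i-1, j)
                             prev1[i] + 1,         # from cell (i, j-1)
                             prev2[i - 1] + cost)  # from cell (i-1, j-1)
        prev2, prev1 = prev1, cur
    return prev1[n]


print(edit_distance_wavefront("kitten", "sitting"))  # 3
```

The sequential version needs on the order of N*M steps; the wavefront version needs N+M diagonal steps if every cell on a diagonal has its own hardware.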

I believe that convolution and FIR filters also make nice systolic
arrays; I am not so sure about IIR, though.
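
For the FIR case, one possible systolic arrangement can be modelled in a few lines of Python. This is a sketch under assumptions: each cell holds one coefficient, samples pass between neighbouring cells through two registers, and partial sums pass through one. Other arrangements exist. The output is the ordinary FIR result delayed by as many clocks as there are taps.

```python
def systolic_fir(coeffs, samples):
    """Cycle-level model of a systolic FIR filter with only local connections."""
    taps = len(coeffs)
    if taps == 0:
        raise ValueError("need at least one coefficient")
    x_regs = [0] * (2 * (taps - 1))  # x_regs[i] holds the sample from i+1 clocks ago
    sum_regs = [0] * taps            # registered partial sum at the output of each cell
    outputs = []
    # Trailing zeros flush the last real samples out of the array.
    for x in list(samples) + [0] * taps:
        outputs.append(sum_regs[-1])  # value visible at the output this clock
        next_sums = []
        for k in range(taps):
            x_in = x if k == 0 else x_regs[2 * k - 1]
            carry_in = 0 if k == 0 else sum_regs[k - 1]
            next_sums.append(carry_in + coeffs[k] * x_in)
        # Clock edge: shift the sample pipeline and latch the new partial sums.
        x_regs = ([x] + x_regs)[:len(x_regs)]
        sum_regs = next_sums
    return outputs[taps:]  # drop the `taps` clocks of pipeline latency


print(systolic_fir([1, 2, 3], [1, 0, 0]))  # [1, 2, 3]
```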

-- glen
 
On 7/9/2015 1:58 PM, glen herrmannsfeldt wrote:
(snip)

A systolic array is useful when you have a *lot* of work to be done and
it can be broken into units that allow the data to flow through the
processors. I'm probably not doing justice to the "proper" definition.

I think this type of design is not very common and is rather
specialized. What applications have you found that utilized systolic
processing?

--

Rick
 
rickman <gnuarm@gmail.com> wrote:

(snip)

> Yeah, I work on protein sequence comparison nearly every day.. lol
> Yes, I'd say that was a rather specialized application.
>
> Do they still use FPGAs for that or have they moved on to GPUs?

For the usual problems, it is fixed point add/subtract with a
small number of bits. It is very efficient in an FPGA, and
not so good in GPU.

Sequencers some years ago could generate 1e9 bases/day,
and they might be much faster now. Dynamic programming
is O(N**2) (which is pretty good when you include insertion
and deletion), but that means 2e19 add/subtract per day to
compare against one human genome. Eight bits, or maybe
only five or six, is enough.

But for HMM calculations, floating point is sometimes better.

-- glen
 
On 7/9/2015 4:49 PM, glen herrmannsfeldt wrote:
(snip)

Yeah, I work on protein sequence comparison nearly every day.. lol
Yes, I'd say that was a rather specialized application.

Do they still use FPGAs for that or have they moved on to GPUs?

I remember *many* years ago when the buzzword was "workstations". Not
sure who it was, but I think someone at NIH was working on providing
molecular interaction simulations using 3D joysticks with force
feedback. I never heard if they continued the research. So I don't
know if they had any problems with computing power or not. I am pretty
sure it was not pursued enough to have a general solution and instead
only simulated specific molecules that had been entered into the system.
This could have used systolic processing at that time, but by now
would be pretty easy on a GPU I am sure.

--

Rick
 
