Phrasing!

Kevin Neilson
Guest

Sun Nov 20, 2016 12:15 am   



Here's an interesting synthesis result. I synthesized this with Vivado for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a 23-long carry chain (6 CARRY4 blocks). This is twice as big as it should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15 total.

Neither is optimal. What I really want is a combination, 12 6-input LUTs followed by 3 CARRY4s.
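
For reference, the combined structure described above could be hand-built roughly as follows. This is only a sketch, not something taken from the thread or checked on hardware. It assumes the Xilinx unisim CARRY4 primitive is available; the module name x_neq_0_carry and the internal signal names are made up for illustration. The 69-bit bus is padded to 72 bits so it splits into 12 groups of 6, and 12 carry muxes fit in 3 CARRY4s.

```verilog
// Sketch only: hand-structured "x != 0" for a 69-bit bus on a 7-series part.
// Assumes the Xilinx unisim CARRY4 primitive; all names here are invented.
module x_neq_0_carry (
    input  wire        clk,
    input  wire [68:0] x,
    output reg         x_neq_0
);
    // Pad to 72 bits so the bus splits into 12 groups of 6.
    wire [71:0] xp = {3'b000, x};

    // First level: one 6-input function per group, high when the group is all zero.
    wire [11:0] grp_zero;
    genvar g;
    generate
        for (g = 0; g < 12; g = g + 1) begin : lut_level
            assign grp_zero[g] = (xp[6*g +: 6] == 6'b000000);
        end
    endgenerate

    // Second level: 12 carry muxes packed into 3 CARRY4s.
    // chain[0] is the constant carry-in; chain[12:1] are the carry-outs.
    wire [12:0] chain;
    assign chain[0] = 1'b0;

    generate
        for (g = 0; g < 3; g = g + 1) begin : carry_level
            // Each mux passes the incoming carry when its group is zero (S = 1)
            // and injects a 1 (DI = 1) when the group is non-zero (S = 0).
            CARRY4 u_carry4 (
                .CO     (chain[4*g + 1 +: 4]),
                .O      (),
                .CI     (chain[4*g]),
                .CYINIT (1'b0),
                .DI     (4'b1111),
                .S      (grp_zero[4*g +: 4])
            );
        end
    endgenerate

    // The last carry-out is 1 if any group was non-zero.
    always @(posedge clk) x_neq_0 <= chain[12];
endmodule
```

The synthesizer is still free to restructure the first-level comparisons, so getting exactly 12 LUT6s may need instantiated LUT primitives or keep-style attributes on grp_zero.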

This is supposed to be the era of high-level synthesis...

Tim Wescott
Guest

Mon Nov 21, 2016 5:43 am   



On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:

Quote:
Here's an interesting synthesis result. I synthesized this with Vivado
for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a 23-long
carry chain (6 CARRY4 blocks). This is twice as big as it should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15
total.

Neither is optimal. What I really want is a combination, 12 6-input
LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...


I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the assembly
that the thing was spitting out. Now, if you've got a good optimizer
(and the gnu C optimizer is better than I am on all but a very few of the
processors I've worked with recently), you just express your intent and
the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular synthesis
tool. It's a pity.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

rickman
Guest

Mon Nov 21, 2016 8:30 am   



On 11/20/2016 5:43 PM, Tim Wescott wrote:
Quote:
On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:

Here's an interesting synthesis result. I synthesized this with Vivado
for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a 23-long
carry chain (6 CARRY4 blocks). This is twice as big as it should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15
total.

Neither is optimal. What I really want is a combination, 12 6-input
LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...

I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the assembly
that the thing was spitting out. Now, if you've got a good optimizer
(and the gnu C optimizer is better than I am on all but a very few of the
processors I've worked with recently), you just express your intent and
the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular synthesis
tool. It's a pity.


'tis true, ’tis pity, And pity ’tis ’tis true

--

Rick C

Tom Gardner
Guest

Mon Nov 21, 2016 5:07 pm   



On 20/11/16 22:43, Tim Wescott wrote:
Quote:
On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:

Here's an interesting synthesis result. I synthesized this with Vivado
for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a 23-long
carry chain (6 CARRY4 blocks). This is twice as big as it should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15
total.

Neither is optimal. What I really want is a combination, 12 6-input
LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...

I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the assembly
that the thing was spitting out. Now, if you've got a good optimizer
(and the gnu C optimizer is better than I am on all but a very few of the
processors I've worked with recently), you just express your intent and
the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular synthesis
tool. It's a pity.


Of course sometimes you don't want optimisation.
Consider, for example, bridging terms in an asynchronous
circuit.

Kevin Neilson
Guest

Mon Nov 21, 2016 9:32 pm   



Quote:
I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the assembly
that the thing was spitting out. Now, if you've got a good optimizer
(and the gnu C optimizer is better than I am on all but a very few of the
processors I've worked with recently), you just express your intent and
the compiler makes it happen most efficiently.

I know! I often feel like I'm a software guy, but stuck in the 80s, poring over every line generated by the assembler to make sure it's optimized.


Mark Curry
Guest

Mon Nov 21, 2016 11:19 pm   



In article <9ae86fdc-dc6a-4d3f-b201-594fe2f6a3cd_at_googlegroups.com>,
Kevin Neilson <kevin.neilson_at_xilinx.com> wrote:
Quote:
I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the assembly
that the thing was spitting out. Now, if you've got a good optimizer
(and the gnu C optimizer is better than I am on all but a very few of the
processors I've worked with recently), you just express your intent and
the compiler makes it happen most efficiently.

I know! I often feel like I'm a software guy, but stuck in the 80s, poring over every line generated by the assembler to make sure it's optimized.


But, but "HLS", and "IP Integrator"... ;)

I actually came back a bit let down from a recent Xilinx user's meeting at just how
much focus Xilinx is putting on their 'high level' tools. I'm of the opinion that
Xilinx is sinking a ton of resources into something that a small minority will
ever use. (And will probably not last long either). To Xilinx, RTL design is
dead...

--Mark

Kevin Neilson
Guest

Tue Nov 22, 2016 12:51 am   



Quote:
I actually came back a bit let down from a recent Xilinx user's meeting at just how
much focus Xilinx is putting on their 'high level' tools. I'm of the opinion that
Xilinx is sinking a ton of resources into something that a small minority will
ever use. (And will probably not last long either). To Xilinx, RTL design is
dead...

--Mark


I wish they would just focus all their effort on the synthesizer and placer. The chips get better and better, but the software seems stuck. I think the high-level tools are not for serious users. You can only use them if you don't care about clock speed, and if you don't care about clock speed, you should be using a processor or something.

Mark Curry
Guest

Tue Nov 22, 2016 2:52 am   



In article <c5206719-b91e-43e5-94ef-dfc84a49d62a_at_googlegroups.com>,
Kevin Neilson <kevin.neilson_at_xilinx.com> wrote:
Quote:
I actually came back a bit let down from a recent Xilinx user's meeting at just how
much focus Xilinx is putting on their 'high level' tools. I'm of the opinion that
Xilinx is sinking a ton of resources into something that a small minority will
ever use. (And will probably not last long either). To Xilinx, RTL design is
dead...

--Mark

I wish they would just focus all their effort on the synthesizer and placer. The chips
get better and better, but the software seems stuck. I think the high-level tools are
not for serious users. You can only use them if you don't care about clock speed, and
if you don't care about clock speed, you should be using a processor or something.


Agreement. Add value where you add value - in your core competencies. Xilinx
adds value here - they design some kick ass technologies, in some very tough
geometries. They add value here. They have some excellent experts in a wide
breadth of technologies, that can help you design and debug some of the most
advanced designs. They add value in their software back end tools
which must map to this technology. They have great reference designs, and documentation.

They don't add value in the front end. They're trying to solve a difficult
problem that's been around for 20 years, that's vexed an entire EDA
software industry. Learn from the ASIC guys here. ASIC companies
punted on their "special sauce" in-house SW 20 years ago, before they got wise and
let the EDA industry do its job. FPGA needs to do the same now.

I'm actually of the opinion that they should punt on synthesis too. Focus on the back
end. I doubt it'll happen - folks are too used to the idea of "free" EDA tools from
the FPGA vendors.

Regards,

Mark

Tim Wescott
Guest

Tue Nov 22, 2016 3:19 am   



On Mon, 21 Nov 2016 10:07:41 +0000, Tom Gardner wrote:

Quote:
On 20/11/16 22:43, Tim Wescott wrote:
On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:

Here's an interesting synthesis result. I synthesized this with
Vivado for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a
23-long carry chain (6 CARRY4 blocks). This is twice as big as it
should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15
total.

Neither is optimal. What I really want is a combination, 12 6-input
LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...

I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the
assembly that the thing was spitting out. Now, if you've got a good
optimizer (and the gnu C optimizer is better than I am on all but a
very few of the processors I've worked with recently), you just express
your intent and the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular
synthesis tool. It's a pity.

Of course sometimes you don't want optimisation. Consider, for example,
bridging terms in an asynchronous circuit.


OK. I give up -- what do you mean by "bridging terms"?

In general, I would say that if this is an issue, then (as with the
'volatile' and 'mutable' keywords in C++), there should be a way in the
language to express your intent to the synthesizer -- either a way to say
"don't optimize this section", or a way to say "keep this signal no
matter what", or a syntax that lets you lay down literal hardware, etc.
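
As a rough illustration of the "keep this signal no matter what" idea: Vivado accepts synthesis attributes written in Verilog attribute syntax, including KEEP and DONT_TOUCH. The sketch below assumes those two attributes; the module and signal names are invented, and how strictly a given tool version honors the attributes is something to check in its documentation.

```verilog
// Sketch only: Vivado-style synthesis attributes on internal nets.
// Module and signal names are invented for illustration.
module keep_example (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    // KEEP asks synthesis not to absorb or remove this net.
    (* keep = "true" *) wire ab = a & b;

    // DONT_TOUCH is the stronger request: it is meant to carry through
    // to the later implementation steps as well.
    (* dont_touch = "true" *) wire ab_or_c = ab | c;

    assign y = ab_or_c;
endmodule
```

These attributes are tool directives rather than part of the language semantics, so a simulator ignores them and another vendor's synthesizer may use different names.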

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

GaborSzakacs
Guest

Tue Nov 22, 2016 3:47 am   



Tim Wescott wrote:
Quote:
On Mon, 21 Nov 2016 10:07:41 +0000, Tom Gardner wrote:

On 20/11/16 22:43, Tim Wescott wrote:
On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:

Here's an interesting synthesis result. I synthesized this with
Vivado for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a
23-long carry chain (6 CARRY4 blocks). This is twice as big as it
should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15
total.

Neither is optimal. What I really want is a combination, 12 6-input
LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...
I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the
assembly that the thing was spitting out. Now, if you've got a good
optimizer (and the gnu C optimizer is better than I am on all but a
very few of the processors I've worked with recently), you just express
your intent and the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular
synthesis tool. It's a pity.
Of course sometimes you don't want optimisation. Consider, for example,
bridging terms in an asynchronous circuit.

OK. I give up -- what do you mean by "bridging terms"?

In general, I would say that if this is an issue, then (as with the
'volatile' and 'mutable' keywords in C++), there should be a way in the
language to express your intent to the synthesizer -- either a way to say
"don't optimize this section", or a way to say "keep this signal no
matter what", or a syntax that lets you lay down literal hardware, etc.


Bridging terms refers to terms that cover transitions in an asynchronous
sequential circuit. Xilinx tools specifically do not honor this sort of
logic and it really has no business in their FPGAs. However, if you
insist on generating asynchronous sequential logic in a Xilinx FPGA, you
will need to instantiate LUTs to get the coverage you're looking for.
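
A minimal sketch of what instantiating a LUT looks like, assuming the Xilinx unisim LUT3 primitive and the DONT_TOUCH attribute; the module and instance names are hypothetical. It pins a 2:1 mux into one LUT. A redundant cover term does not change the truth table, so the INIT value is the same with or without it, and any hazard behaviour then depends on the LUT hardware itself, which should be checked against the vendor documentation.

```verilog
// Sketch only: a 2:1 mux pinned into one LUT via the Xilinx unisim LUT3
// primitive. Module and instance names are hypothetical.
module mux_in_one_lut (
    input  wire a,
    input  wire b,
    input  wire sel,
    output wire y
);
    // INIT is the truth table indexed by {I2, I1, I0} = {sel, b, a}.
    // y = sel ? b : a gives INIT = 8'b1100_1010 = 8'hCA.
    (* dont_touch = "true" *)
    LUT3 #(.INIT(8'hCA)) u_mux_lut (
        .O  (y),
        .I0 (a),
        .I1 (b),
        .I2 (sel)
    );
endmodule
```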

--
Gabor

Tim Wescott
Guest

Tue Nov 22, 2016 4:58 am   



On Mon, 21 Nov 2016 21:19:50 +0000, Mark Curry wrote:

Quote:
In article <9ae86fdc-dc6a-4d3f-b201-594fe2f6a3cd_at_googlegroups.com>,
Kevin Neilson <kevin.neilson_at_xilinx.com> wrote:
I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the
assembly that the thing was spitting out. Now, if you've got a good
optimizer (and the gnu C optimizer is better than I am on all but a
very few of the processors I've worked with recently), you just
express your intent and the compiler makes it happen most efficiently.

I know! I often feel like I'm a software guy, but stuck in the 80s,
poring over every line generated by the assembler to make sure it's
optimized.

But, but "HLS", and "IP Integrator"... ;)

I actually came back a bit let down from a recent Xilinx user's meeting
at just how much focus Xilinx is putting on their 'high level' tools.
I'm of the opinion that Xilinx is sinking a ton of resources into
something that a small minority will ever use. (And will probably not
last long either). To Xilinx, RTL design is dead...

--Mark


If that small minority is the one with the most dollars behind it, then
they win. Dunno if that's the case or not, but it seems like there's a
lot of design of high-volume, cost-sensitive stuff that's done mostly by
applications engineers these days.

Or, Xilinx is wrong, and they'll spend a lot of money on uselessness.
That's never happened before in the history of semiconductors, now has
it? ;)

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

Tim Wescott
Guest

Tue Nov 22, 2016 7:33 am   



On Mon, 21 Nov 2016 14:51:13 -0800, Kevin Neilson wrote:

Quote:
I actually came back a bit let down from a recent Xilinx user's meeting
at just how much focus Xilinx is putting on their 'high level' tools.
I'm of the opinion that Xilinx is sinking a ton of resources into
something that a small minority will ever use. (And will probably not
last long either). To Xilinx, RTL design is dead...

--Mark

I wish they would just focus all their effort on the synthesizer and
placer. The chips get better and better, but the software seems stuck.
I think the high-level tools are not for serious users. You can only
use them if you don't care about clock speed, and if you don't care
about clock speed, you should be using a processor or something.


Maybe if the synthesizer got better the demand for hugely fast chips
would go down, and thus they'd shoot themselves in the foot -- at least
from their perspective.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

Tom Gardner
Guest

Tue Nov 22, 2016 8:25 am   



On 21/11/16 20:19, Tim Wescott wrote:
Quote:
On Mon, 21 Nov 2016 10:07:41 +0000, Tom Gardner wrote:

On 20/11/16 22:43, Tim Wescott wrote:
On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:

Here's an interesting synthesis result. I synthesized this with
Vivado for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a
23-long carry chain (6 CARRY4 blocks). This is twice as big as it
should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15
total.

Neither is optimal. What I really want is a combination, 12 6-input
LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...

I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the
assembly that the thing was spitting out. Now, if you've got a good
optimizer (and the gnu C optimizer is better than I am on all but a
very few of the processors I've worked with recently), you just express
your intent and the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular
synthesis tool. It's a pity.

Of course sometimes you don't want optimisation. Consider, for example,
bridging terms in an asynchronous circuit.

OK. I give up -- what do you mean by "bridging terms"?


https://en.wikipedia.org/wiki/Karnaugh_map#Race_hazards

It is called a bridging term since it is a logically
redundant term that straddles two required minterms.
Its purpose is to remove static hazards (glitches) that
can occur when inputs change, typically when there
are unequal propagation delays inside the implementation.
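
A small sketch of the textbook case, a 2:1 mux written with and without its bridging term. The module and signal names are invented for illustration. Both outputs are the same Boolean function, which is why an optimizer would normally merge them and drop the extra term.

```verilog
// Sketch only: the classic 2:1 mux with and without its bridging term.
// Names are invented for illustration.
module mux_hazard_demo (
    input  wire a,
    input  wire b,
    input  wire sel,
    output wire y_minimal,
    output wire y_covered
);
    // Minimal sum of products: two product terms.
    // With a = b = 1, a change on sel hands the output from one term to the
    // other, and unequal gate delays can let it dip to 0 briefly.
    assign y_minimal = (a & ~sel) | (b & sel);

    // Same function plus the redundant consensus term (a & b), which holds
    // the output at 1 across that sel transition in a gate-level build.
    assign y_covered = (a & ~sel) | (b & sel) | (a & b);
endmodule
```

In a zero-delay RTL simulation the two outputs are identical; the difference only shows up in a gate-level implementation where the three product terms exist as separate gates.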


Quote:
In general, I would say that if this is an issue, then (as with the
'volatile' and 'mutable' keywords in C++), there should be a way in the
language to express your intent to the synthesizer -- either a way to say
"don't optimize this section", or a way to say "keep this signal no
matter what", or a syntax that lets you lay down literal hardware, etc.


It only occurs in asynchronous circuits; the <ahem>
workaround is to only have synchronous designs and
implementations.

Tom Gardner
Guest

Tue Nov 22, 2016 8:29 am   



On 21/11/16 20:47, GaborSzakacs wrote:
Quote:
Tim Wescott wrote:
On Mon, 21 Nov 2016 10:07:41 +0000, Tom Gardner wrote:

On 20/11/16 22:43, Tim Wescott wrote:
On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:

Here's an interesting synthesis result. I synthesized this with
Vivado for Virtex-7:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

reg [68:0] x;
reg x_neq_0;
always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a
23-long carry chain (6 CARRY4 blocks). This is twice as big as it
should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15
total.

Neither is optimal. What I really want is a combination, 12 6-input
LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...
I'm not enough of an FPGA guy to make really deep comments, but this
looks like the state of C compilers about 20 or so years ago. When I
started coding in C one had to write the code with an eye to the
assembly that the thing was spitting out. Now, if you've got a good
optimizer (and the gnu C optimizer is better than I am on all but a
very few of the processors I've worked with recently), you just express
your intent and the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular
synthesis tool. It's a pity.
Of course sometimes you don't want optimisation. Consider, for example,
bridging terms in an asynchronous circuit.

OK. I give up -- what do you mean by "bridging terms"?

In general, I would say that if this is an issue, then (as with the 'volatile'
and 'mutable' keywords in C++), there should be a way in the language to
express your intent to the synthesizer -- either a way to say "don't optimize
this section", or a way to say "keep this signal no matter what", or a syntax
that lets you lay down literal hardware, etc.


Bridging terms refers to terms that cover transitions in an asynchronous
sequential circuit. Xilinx tools specifically do not honor this sort of
logic and it really has no business in their FPGAs. However, if you
insist on generating asynchronous sequential logic in a Xilinx FPGA, you
will need to instantiate LUTs to get the coverage you're looking for.


Agreed. You will probably also have to nail
down the LUTs and the signal routing.

I suspect that, since Xilinx has a very good range
of I/O primitives, there really isn't any benefit
to full async design in their FPGAs.

Tom Gardner
Guest

Tue Nov 22, 2016 8:30 am   



On 22/11/16 00:33, Tim Wescott wrote:
Quote:
On Mon, 21 Nov 2016 14:51:13 -0800, Kevin Neilson wrote:

I actually came back a bit let down from a recent Xilinx user's meeting
at just how much focus Xilinx is putting on their 'high level' tools.
I'm of the opinion that Xilinx is sinking a ton of resources into
something that a small minority will ever use. (And will probably not
last long either). To Xilinx, RTL design is dead...

--Mark

I wish they would just focus all their effort on the synthesizer and
placer. The chips get better and better, but the software seems stuck.
I think the high-level tools are not for serious users. You can only
use them if you don't care about clock speed, and if you don't care
about clock speed, you should be using a processor or something.

Maybe if the synthesizer got better the demand for hugely fast chips
would go down, and thus they'd shoot themselves in the foot -- at least
from their perspective.


Synthesis is easy. Place and route is hard.
A big question is how to either decouple or
integrate them.

Particularly when you see the size of the
big Xilinx chips and consider the relative
time taken to get across the chip and through
a single LUT (and then through the integrated
ARM cores :) )

But I suspect I'm close to teaching you how
to suck eggs :)
