EDK : FSL macros defined by Xilinx are wrong

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
On Thu, 14 Dec 2017 10:02:25 -0500, rickman <gnuarm@gmail.com> wrote:

krw@notreal.com wrote on 12/14/2017 9:12 AM:
On Thu, 14 Dec 2017 02:10:06 -0500, rickman <gnuarm@gmail.com> wrote:

John Larkin wrote on 12/13/2017 10:43 PM:

I have an async signal, call it TRIG, inside a Zynq 7020.

At the rising edge of TRIG, I want to make an async one-shot. It will
leave the chip as RX and reset some outboard ecl logic. Anything from,
say, 2 ns to 10 ns width would work.

The board is built, and we can't easily add more connections to the
FPGA or hack in glue logic. Well, it would be ugly.

Here are some ideas:

https://www.dropbox.com/s/4azi5hzkqzsyeiy/FPGA_OneShots.JPG?raw=1

We could play with i/o cell slew rates and drive strengths to tune the
timing. And use as many delay stages (circuit B) as we like... there
are tons of unused balls.

Or maybe use some internal delay path, if we can find one that is
reasonably repeatable.

The compiler will probably let us do circuit A or B without whining
much, but might object to the third one.

I grew up on async hairball logic, so this seems reasonable to me, but
my FPGA guys are horrified. We don't want to spin up a 250 MHz PLL
here and do it synchronously, for various reasons.

An internal passive pullup resistor charging an i/o pin capacitance
would be fun, but I don't think we could make a short enough blip.

Any other ideas or comments?

The variation in delay in an FPGA for any given route isn't all that bad,
about the same as with regular logic.

It's higher in FPGAs since the wires are longer (higher capacitance),
though distance between gates may (or may not) be similar. The wires
in an FPGA are "fixed" length, where they are only as long as needed
in an ASIC. There is also a lot of capacitance from all of the muxes
(pass gates) hanging off the wire. The higher magnitude will mean a
higher variation, too.

I assume Xilinx still has a manual
chip editing tool. It will give you delays of routes. So you can do a run
of the chip and manually reroute the one delay path to get your time delay.
There are ways to force placement of the FF with attributes in your source
code. So as long as the routes you need are not used it would be a simple
matter to hand route the same path each time. Getting a 2 ns minimum pulse
width shouldn't be hard at all.

A very poor way of doing things but it may be the only way to make
such a kludge.

You seem to be thinking you can tune the loop with I/O pad delays, but that
will still require manual work in the chip editor to make adjustments each
time you get a different route on the delay path and so need different I/O
pad slew rates.

One other thought is to use some number of LUTs as the main delay element;
there are ways to force the use of such a component in HDL source. By
constraining the placement to cells that are in the same logic block you
will get consistent route delays and routing variation should go away. I
believe it is the inter-logic cell routes that have lots of variation.

The main reason why your FPGA guys are reacting in horror is because they
know what a royal PITA it will be to learn the tools well enough to make all
this happen.

Not to mention the maintenance of this kludge for the life of the
product.

Something like this was done in test equipment from a major manufacturer.
They needed to mux a clock and the delay through the chip needed to be
minimal. I don't recall if the LUT was hand placed or not, but the routing
was done by hand in the chip editor. I found out about it because we had to
touch the chip. My boss was the guy who had originally done this and not
documented a single thing on it. He gave a demonstration on how to do the
hand mod to a few of us and that was how he passed the torch, by oral
tradition. lol

What a wonderful way to run a company. I assume they're no longer in
business?
 
On 14 Dec 2017 11:44:34 +0000 (GMT), Theo Markettos
<theom+news@chiark.greenend.org.uk> wrote:

In comp.arch.fpga rickman <gnuarm@gmail.com> wrote:
The main reason why your FPGA guys are reacting in horror is because they
know what a royal PITA it will be to learn the tools well enough to make
all this happen.

Is this a mature product, or one which is likely to see frequent updates?

That may direct your strategy. If it's mature, it might be feasible to use
the ECO tools to manually add cells to an existing design.

If it's still in flux, you probably need to understand how to direct the
tool that this piece of logic needs special treatment and should be
constructed like so. This means it will persist over respins of the rest of
the logic. You will likely still need to verify over a number of respins
that it does in fact persist, given that it's hard to get this right.

Theo

The virtue of putting most of the delay into i/o cells is that they
will behave the same independent of compiles. And the slew/drive
strength params can be set without any hand routing or fighting the
tools to do something they don't want to do. We can hang timing
constraints on the presumably short runs to the dflop to keep that
uncertainty low.

Rob here suggested that an adder/carry chain might have a more
consistent internal delay (it's a fixed structure) than routing
delays, which might change every compile. Maybe a MAC?

I'll ask my folks to add a couple of experiments to the next compile.
We are iterating the design to add and test features once or twice a
week. This thing is maybe 20x as complex as our average design, which
is our excuse for not thinking out everything in advance.




--

John Larkin Highland Technology, Inc

lunatic fringe electronics
 
John Larkin wrote on 12/14/2017 11:54 AM:

The virtue of putting most of the delay into i/o cells is that they
will behave the same independent of compiles. And the slew/drive
strength params can be set without any hand routing or fighting the
tools to do something they don't want to do. We can hang timing
constraints on the presumably short runs to the dflop to keep that
uncertainty low.

Rob here suggested that an adder/carry chain might have a more
consistent internal delay (it's a fixed structure) than routing
delays, which might change every compile. Maybe a MAC?

I'll ask my folks to add a couple of experiments to the next compile.
We are iterating the design to add and test features once or twice a
week. This thing is maybe 20x as complex as our average design, which
is our excuse for not thinking out everything in advance.

There is nothing inherent in the IOBs that makes their delay more consistent
than any other internal logic component. The issue is the routing
*changing* when you recompile the design. *That* will give widely varying
timing unless you lock the placement of each component so there is a direct
route available between them. I have not studied the routing flow of the
newer families of Xilinx devices, but I believe if you stay within a local
block of logic the routing resources have very direct paths, so the timing
will not change appreciably on different passes. This location constraint
can be relative so it simply puts the logic in the same block but allows the
block to "float" anywhere in the device. If you also want to minimize the
delay from the leading edge of the trigger pulse you need to further
constrain the logic to be adjacent to the IOB which should again be able to
use dedicated routing for adjacent blocks.

The adder carry chain has a defined delay just like any other block of
logic, but the delays are very short per bit: around 200 ps as I recall, and
possibly less in the newer devices. Again, there is no advantage other than
being able to customize the pulse width with very high resolution, which you
appear to have indicated is not important.
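
As a rough sizing sketch for the carry-chain idea: taking the recalled figure of about 200 ps per bit at face value (an assumption, not a datasheet number) and ignoring the routing into and out of the chain, the number of bits needed for a given pulse width could be estimated like this. The function name and its default are purely illustrative.

```python
import math

def carry_bits_for_width(width_ns, per_bit_ps=200.0):
    """Smallest number of carry-chain bits whose total delay is >= width_ns.

    per_bit_ps defaults to the ~200 ps/bit figure recalled in the post above.
    That is an assumption; the real value depends on device family, speed
    grade and PVT, and routing to and from the chain is not modelled.
    """
    if width_ns <= 0 or per_bit_ps <= 0:
        raise ValueError("width_ns and per_bit_ps must be positive")
    return math.ceil(width_ns * 1000.0 / per_bit_ps)

if __name__ == "__main__":
    for width_ns in (2.0, 10.0):
        print(f"{width_ns} ns -> {carry_bits_for_width(width_ns)} bits")
```

Under that assumption the 2 ns to 10 ns window corresponds to roughly 10 to 50 bits of carry chain.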

The way to reduce variation in the pulse width is to use logic block local
routing which will have much less potential variation. In all cases you
will have PVT variation in timing which you can't do anything about. It is
not clear if the delay time (trigger to leading edge of pulse) is important.
If it is you will likely still need to deal with widely varying routing
delays between blocks (IOB and logic). The delay from trigger input to the
FF can be constrained I believe although I've never used this. So the
routing from the FF to the output should be optimized through placement.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
krw@notreal.com wrote on 12/14/2017 10:35 AM:

What a wonderful way to run a company. I assume they're no longer in
business?

Ever hear of TTC? I interviewed with TTC but came on board with Acterna.
TTC merged with Dynatech and WWG, along with nearly a billion in debt, which
ultimately sank the company, not anything about their designs. I believe
TTC was very well respected in the test equipment world. I don't know who
bought the pieces.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
On 14 Dec 2017 10:49:41 GMT, Allan Herriman
<allanherriman@hotmail.com> wrote:


"Some internal delay mechanism" on your block diagram could be an IDELAY
(or IODELAY), which gives you a calibrated delay that will be independent
of PVT. Of course, it's independent of PVT because you give it (actually
an IDELAY_CTRL) a reference clock at 200MHz (or some other, higher
frequencies). Max delay is a few ns, delay resolution is some tens of ps,
and jitter is some tens of ps as well.
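
To put rough numbers on the IDELAY suggestion, here is a small sketch. It assumes the 7-series IDELAYE2 figures as commonly quoted from the data sheets: a tap chain with settings 0 to 31 and a nominal tap resolution of 1/(32 x 2 x F_REF). Both are assumptions to check against the data sheet for the exact part and speed grade. The helper names are made up for illustration, and any fixed insertion delay of the primitive is ignored.

```python
def idelay_tap_ps(fref_hz=200e6, taps_per_half_period=32):
    """Nominal tap resolution in ps, assuming 1 / (32 * 2 * F_REF)."""
    if fref_hz <= 0:
        raise ValueError("fref_hz must be positive")
    return 1e12 / (taps_per_half_period * 2 * fref_hz)

def idelay_taps_for(target_ps, fref_hz=200e6, max_tap=31):
    """Tap setting closest to target_ps, clamped to the assumed 0..max_tap range."""
    if target_ps < 0:
        raise ValueError("target_ps must be non-negative")
    tap = round(target_ps / idelay_tap_ps(fref_hz))
    return min(tap, max_tap)

if __name__ == "__main__":
    step = idelay_tap_ps()
    print(f"tap step at 200 MHz: {step} ps, max delay: {31 * step} ps")
    print(f"taps for 2 ns: {idelay_taps_for(2000.0)}")
```

With a 200 MHz reference this gives a step of about 78 ps and a maximum of about 2.4 ns from one element, which matches the "few ns" and "tens of ps" figures above. Under these assumptions a single IDELAY covers the low end of the 2 ns to 10 ns window but not the high end.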

IDELAY sounds ideal for setting my pulse width, because it's
calibrated and tweakable. We'll look into that.

I recently ran some ring oscillator experiments in the same FPGA family.
I used LUTs as delay elements and when coded (in VHDL) such that all
elements were in the same slice and the routing was all in the local
switchbox, I measured a frequency of 945MHz for a 3 element ring. That
should give you some idea of the achievable delays.
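
The measured 945 MHz can be turned into a per-stage delay with the usual ring oscillator relation, in which one output period is two trips around the ring: f = 1 / (2 * N * t_stage). The sketch below assumes that relation holds and that the three LUTs plus their local routing account for all of the loop delay. The function names are illustrative only.

```python
import math

def stage_delay_ps(freq_hz, stages):
    """Per-stage delay in ps from ring frequency: t = 1 / (2 * N * f)."""
    if freq_hz <= 0 or stages <= 0:
        raise ValueError("freq_hz and stages must be positive")
    return 1e12 / (2 * stages * freq_hz)

def stages_for_width(width_ps, per_stage_ps):
    """Smallest number of identical stages whose total delay is >= width_ps."""
    if width_ps <= 0 or per_stage_ps <= 0:
        raise ValueError("width_ps and per_stage_ps must be positive")
    return math.ceil(width_ps / per_stage_ps)

if __name__ == "__main__":
    t = stage_delay_ps(945e6, 3)
    print(f"per-stage delay: {t:.1f} ps")
    for width_ps in (2000.0, 10000.0):
        print(f"{width_ps / 1000:.0f} ns -> {stages_for_width(width_ps, t)} stages")
```

That works out to roughly 176 ps per LUT stage including local routing, so a 2 ns to 10 ns pulse would need somewhere around 12 to 57 such stages, at whatever process, voltage and temperature the measurement was taken.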

I did a ring oscillator to measure chip temperature, on a part that
didn't have an internal sensor.

https://www.dropbox.com/s/hq595kyhlhx5f1y/ESM_Ring_Oscillator.jpg?raw=1



--

John Larkin Highland Technology, Inc
picosecond timing precision measurement

jlarkin att highlandtechnology dott com
http://www.highlandtechnology.com
 
John Larkin wrote on 12/14/2017 1:31 PM:

IDELAY sounds ideal for setting my pulse width, because it's
calibrated and tweakable. We'll look into that.

It still has to be routed to the logic. So you haven't worked around the
problem of wildly variable routing delays which add to your pulse width.

As usual John is trying to work with things he doesn't understand by
applying methods he has used on totally unrelated designs. Sit down and
draw a block diagram showing not just the delay you wish to control, but the
delays on every bit of wire in the design. Then maybe the picture will emerge.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
John Larkin wrote:

I have an async signal, call it TRIG, inside a Zynq 7020.

At the rising edge of TRIG, I want to make an async one-shot. It will
leave the chip as RX and reset some outboard ecl logic. Anything from,
say, 2 ns to 10 ns width would work.

This signal is generated within the FPGA and sent out? And, you just want
to stretch it? Make a counter with a few bits, set it to zero when TRIG
occurs, and count up at the available clock rate, and generate RX. When the
counter reaches the max, turn off RX and don't increment the counter again.

This is so simple, I must be misunderstanding what you want to do.
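
The counter scheme described above can be sketched as a cycle-by-cycle software model. It is only a behavioural illustration, not HDL: it assumes TRIG has already been sampled into the clock domain, restarts on a detected rising edge, and uses names (`one_shot`, `width_cycles`) invented for the example.

```python
def one_shot(trig, width_cycles):
    """Return the RX value for each clock tick given sampled TRIG values.

    The counter idles parked at width_cycles. A rising edge on TRIG clears
    it to zero; it then counts up once per tick and stops at width_cycles.
    RX is high while the counter is below width_cycles.
    """
    if width_cycles < 1:
        raise ValueError("width_cycles must be at least 1")
    count = width_cycles          # idle state: counter at its maximum
    prev = 0
    rx = []
    for t in trig:
        if t and not prev:        # rising edge of TRIG restarts the pulse
            count = 0
        elif count < width_cycles:
            count += 1            # count up, then hold at the maximum
        rx.append(1 if count < width_cycles else 0)
        prev = t
    return rx

if __name__ == "__main__":
    print(one_shot([0, 1, 1, 0, 0, 0], 2))
```

The pulse width is quantized to whole clock periods and the leading edge follows the clock rather than TRIG. With the 250 MHz clock mentioned earlier in the thread, each count is 4 ns, so this gives 4 ns or 8 ns pulses rather than an asynchronous blip.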

Jon
 
On Thu, 14 Dec 2017 12:50:30 -0500, rickman <gnuarm@gmail.com> wrote:

Ever hear of TTC?

No. I think I know why.

I interviewed with TTC but came on board with Acterna.
TTC merged with Dynatech and WWG, along with nearly a billion in debt, which
ultimately sank the company, not anything about their designs. I believe
TTC was very well respected in the test equipment world. I don't know who
bought the pieces.

Makes complete sense. Knowledge locked up in someone's head =>
company in pieces.
 
krw@notreal.com wrote on 12/14/2017 9:59 PM:

Makes complete sense. Knowledge locked up in someone's head =>
company in pieces.

You didn't read a word I wrote. The company failed because, like so many in
the dot-com bubble, they didn't see it was a bubble and borrowed a shit-ton
of money to expand; then when the bubble burst they couldn't pay the debt.
It had nothing to do with the dick-head I had for a boss.

The dick-head came from the company's history developing as a startup and
having loose engineering management principles. So he didn't know any
better because they never taught him any better. But the company was a
technical success. Ask anyone who works in telecom if they've ever used a
T-BERD. It is highly regarded test equipment from what I've heard and very
widely used.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the routing will
be fixed (to an extent).

I haven't done manual routing on an FPGA since the '90s.

I haven't done manual placement for a few months. Even then, it was all
in the form of relative placements in HDL, so the tools still have the
ability to move the entire macro around on the die.

Allan
 
On Fri, 15 Dec 2017 00:56:37 -0500, rickman <gnuarm@gmail.com> wrote:

krw@notreal.com wrote on 12/14/2017 9:59 PM:
On Thu, 14 Dec 2017 12:50:30 -0500, rickman <gnuarm@gmail.com> wrote:

krw@notreal.com wrote on 12/14/2017 10:35 AM:
On Thu, 14 Dec 2017 10:02:25 -0500, rickman <gnuarm@gmail.com> wrote:

krw@notreal.com wrote on 12/14/2017 9:12 AM:
On Thu, 14 Dec 2017 02:10:06 -0500, rickman <gnuarm@gmail.com> wrote:

John Larkin wrote on 12/13/2017 10:43 PM:

I have an async signal, call it TRIG, inside a Zynq 7020.

At the rising edge of TRIG, I want to make an async one-shot. It will
leave the chip as RX and reset some outboard ecl logic. Anything from,
say, 2 ns to 10 ns width would work.

The board is built, and we can't easily add more connections to the
FPGA or hack in glue logic. Well, it would be ugly.

Here are some ideas:

https://www.dropbox.com/s/4azi5hzkqzsyeiy/FPGA_OneShots.JPG?raw=1

We could play with i/o cell slew rates and drive strengths to tune the
timing. And use as many delay stages (circuit B) as we like... there
are tons of unused balls.

Or maybe use some internal delay path, if we can find one that is
reasonably repeatable.

The compiler will probably let us do circuit A or B without whining
much, but might object to the third one.

I grew up on async hairball logic, so this seems reasonable to me, but
my FPGA guys are horrified. We don't want to spin up a 250 MHz PLL
here and do it synchronously, for various reasons.

An internal passive pullup resistor charging an i/o pin capacitance
would be fun, but I don't think we could make a short enough blip.

Any other ideas or comments?

The variation in delay in an FPGA for any given route isn't all that bad,
about the same as with regular logic.

It's higher in FPGAs since the wires are longer (higher capacitance),
though distance between gates may (or may not) be similar. The wires
in an FPGA are "fixed" length, where they are only as long as needed
in an ASIC. There is also a lot of capacitance from all of the muxes
(pass gates) hanging off the wire. The higher magnitude will mean a
higher variation, too.

I assume Xilinx still has a manual
chip editing tool. It will give you delays of routes. So you can do a run
of the chip and manually reroute the one delay path to get your time delay.
There are ways to force placement of the FF with attributes in your source
code. So as long as the routes you need are not used it would be a simple
matter to hand route the same path each time. Getting a 2 ns minimum pulse
width shouldn't be hard at all.

A very poor way of doing things but it may be the only way to make
such a kludge.

You seem to be thinking you can tune the loop with I/O pad delays, but that
will still require manual work in the chip editor to make adjustments each
time you get a different route on the delay path and so need different I/O
pad slew rates.

One other thought is to use some number of LUTs as the main delay element,
there are ways to force the use of such a component in HDL source. By
constraining the placement to cells that are in the same logic block you
will get consistent route delays and routing variation should go away. I
believe it is the inter-logic cell routes that have lots of variation.

The main reason why your FPGA guys are reacting in horror is because they
know what a royal PITA it will be to learn the tools well enough to make all
this happen.

Not to mention the maintenance of this kludge for the life of the
product.

Something like this was done in test equipment from a major manufacturer.
They needed to mux a clock and the delay through the chip needed to be
minimal. I don't recall if the LUT was hand placed or not, but the routing
was done by hand in the chip editor. I found out about it because we had to
touch the chip. My boss was the guy who had originally done this and not
documented a single thing on it. He gave a demonstration on how to do the
hand mod to a few of us and that was how he passed the torch, by oral
tradition. lol

What a wonderful way to run a company. I assume they're no longer in
business?

Ever hear of TTC?

No. I think I know why.

I interviewed with TTC but came on board at Acterna.
TTC merged with Dynatech and WWG along with nearly a billion in debt which
ultimately sunk the company, not anything about their designs. I believe
TTC was very well respected in the test equipment world. I don't know who
bought the pieces.

Makes complete sense. Knowledge locked up in someone's head =
company in pieces.

You didn't read a word I wrote.

You're lying, as always.

The company failed because like so many in
the dot com bubble they didn't see it was a bubble and borrowed a shit-ton
of money to expand, then when the bubble burst they couldn't pay the debt.
It had nothing to do with the dick-head I had for a boss.

No, they failed because their business practices sucked and their
people sucked more.

The dick-head came from the company's history developing as a startup and
having loose engineering management principles. So he didn't know any
better because they never taught him any better. But the company was a
technical success. Ask anyone who works in telecom if they've ever used a
T-BERD. It is highly regarded test equipment from what I've heard and very
widely used.

So you agree with me. Funny that.
 
On 15 Dec 2017 10:21:42 GMT, Allan Herriman
<allanherriman@hotmail.com> wrote:

On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the routing will
be fixed (to an extent).

Nonsense. The next release of the routing software, or even the next
modification to the HDL, may completely change the routing.

I haven't done manual routing on an FPGA since the '90s.

Have you tuned for exact timing of an asynchronous path?

I haven't done manual placement for a few months. Even then, it was all
in the form of relative placements in HDL, so the tools still have the
ability to move the entire macro around on the die.

...and you're not controlling the largest variable, the routing.
 
Allan Herriman wrote on 12/15/2017 5:21 AM:
On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the routing will
be fixed (to an extent).

"To an extent" is one of those things where the devil is in the details.
Routing within a block is dedicated and either a path exists or it doesn't.
But routing between blocks is much more flexible and so subject to
variations depending on what the software chooses.

It has been a long time since I opened up a Xilinx chip in the editor, but
the interconnects have always been in levels. There are direct links
between adjacent blocks, north, east, south and west. These are the fastest
and not subject to much variation from routing choices since there aren't
any choices other than use them or not.

See the diagonal traces between blocks in the chip editor image in the link
you provided? Those will not be direct traces. The software will have to
pick a route through the routing matrix which can be very different from run
to run. When the circuit has an async feedback path, the software can't
time it, so there is no way to place a timing constraint to make it use a
faster path. So the tools won't help you here. This has to be addressed
manually on each run of the tools.

You might be able to put together a script to semi-automate it, but you will
need to handle this outside of the normal automated tool usage. It should
be completely documented so it can be repeated anytime the device is touched.


I haven't done manual routing on an FPGA since the '90s.

I haven't done manual placement for a few months. Even then, it was all
in the form of relative placements in HDL, so the tools still have the
ability to move the entire macro around on the die.

Placement is easy, it can be done in the HDL. Routing is another matter
altogether.

There are (or were) things called "hard" macros. This is a piece of logic
with routing that is already done and can be dropped into a design as an
entity. I don't know if they are supported in HDL. I don't know if they are
still supported at all. Their down side is that nothing could be routed
through the hard macro since the tools have no idea what can be used and
what can't. I suspect it was just too hard a problem to deal with for the
routing tools to work optimally. This is similar to the issues of partial
reconfiguration. The partial reconfiguration was the easy part. Developing
tools to support designing modules to use in partial reconfiguration was the
hard part. Same problem, getting the tools to effectively route in just a
portion of the chip.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
On Fri, 15 Dec 2017 10:57:50 -0500, rickman wrote:

Allan Herriman wrote on 12/15/2017 5:21 AM:
On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the routing
will be fixed (to an extent).

"To an extent" is one of those things where the devil is in the details.
Routing within a block is dedicated and either a path exists or it
doesn't. But routing between blocks is much more flexible and so subject
to variations depending on what the software chooses.

In general, I agree with that. Look sideways at the PAR software or make
a trivial change to the source code and it will do something completely
different, and often something unexpected. Sometimes it will do
something wildly suboptimal.

But in this case, I believe Larkin has just one connection that will be
contained within a single switch box next to the IOB. It will not use
any inter-switchbox routing (with all the uncertainty that entails).

My experience has been that in this particular case (and this particular
case only), locking the pins correctly may remove any choice the router
has, resulting in repeatable routing. (I won't say repeatable timing,
because we still have PVT to worry about.)

Depending on the exact switchbox resources used, this may also require
that other routing be prohibited in that area to work.


Perhaps I should point out that whilst I have done some of this sort of
manual placement and routing recently, I have not done the exact route of
IBUF output to OUTFF clear input. Sometimes there are quirks that do not
become apparent until the design hits the tools.


> It has been a long time since I opened up a Xilinx chip in the editor,

'nuf said.


Allan
 
Allan Herriman wrote on 12/16/2017 4:50 AM:
On Fri, 15 Dec 2017 10:57:50 -0500, rickman wrote:

Allan Herriman wrote on 12/15/2017 5:21 AM:
On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the routing
will be fixed (to an extent).

"To an extent" is one of those things where the devil is in the details.
Routing within a block is dedicated and either a path exists or it
doesn't. But routing between blocks is much more flexible and so subject
to variations depending on what the software chooses.

In general, I agree with that. Look sideways at the PAR software or make
a trivial change to the source code and it will do something completely
different, and often something unexpected. Sometimes it will do
something wildly suboptimal.

But in this case, I believe Larkin has just one connection that will be
contained within a single switch box next to the IOB. It will not use
any inter-switchbox routing (with all the uncertainty that entails).

My experience has been that in this particular case (and this particular
case only), locking the pins correctly may remove any choice the router
has, resulting in repeatable routing. (I won't say repeatable timing,
because we still have PVT to worry about.)

Depending on the exact switchbox resources used, this may also require
that other routing be prohibited in that area to work.


Perhaps I should point out that whilst I have done some of this sort of
manual placement and routing recently, I have not done the exact route of
IBUF output to OUTFF clear input. Sometimes there are quirks that do not
become apparent until the design hits the tools.

I guess my work with Xilinx parts is getting old. I didn't remember the IOB
FFs *having* accessible async Clear/Preset inputs which would have required
a FF from the fabric. But I looked at the Zynq data sheet and the IOB FFs
have accessible Clear/Preset inputs. So there will be routing on the
general fabric as I expect there is no direct connection between the input
and the Clear pin within the IOB.

As to your presumption of this removing "any choice the router has", I think
that is fallacious. The switch matrix is a general purpose routing medium
and I have seen the tool do, as you say, "wildly suboptimal" routes. The
only way to tell is to give it a try.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
On Sat, 16 Dec 2017 09:27:37 -0500, rickman wrote:

Allan Herriman wrote on 12/16/2017 4:50 AM:
On Fri, 15 Dec 2017 10:57:50 -0500, rickman wrote:

Allan Herriman wrote on 12/15/2017 5:21 AM:
On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the routing
will be fixed (to an extent).

"To an extent" is one of those things where the devil is in the
details.
Routing within a block is dedicated and either a path exists or it
doesn't. But routing between blocks is much more flexible and so
subject to variations depending on what the software chooses.

In general, I agree with that. Look sideways at the PAR software or
make a trivial change to the source code and it will do something
completely different, and often something unexpected. Sometimes it
will do something wildly suboptimal.

But in this case, I believe Larkin has just one connection that will be
contained within a single switch box next to the IOB. It will not use
any inter-switchbox routing (with all the uncertainty that entails).

My experience has been that in this particular case (and this
particular case only), locking the pins correctly may remove any choice
the router has, resulting in repeatable routing. (I won't say
repeatable timing, because we still have PVT to worry about.)

Depending on the exact switchbox resources used, this may also require
that other routing be prohibited in that area to work.


Perhaps I should point out that whilst I have done some of this sort of
manual placement and routing recently, I have not done the exact route
of IBUF output to OUTFF clear input. Sometimes there are quirks that
do not become apparent until the design hits the tools.

I guess my work with Xilinx parts is getting old. I didn't remember the
IOB FFs *having* accessible async Clear/Preset inputs which would have
required a FF from the fabric. But I looked at the Zynq data sheet and
the IOB FFs have accessible Clear/Preset inputs. So there will be
routing on the general fabric as I expect there is no direct connection
between the input and the Clear pin within the IOB.

I can't try it here (I'm not at work and I deliberately don't have the
tools installed at home) but I believe both signals appear on the same
switchbox, which is about as close to a direct connection as one can get
outside a slice.


As to your presumption of this removing "any choice the router has", I
think that is fallacious. The switch matrix is a general purpose
routing medium and I have seen the tool do, as you say, "wildly
suboptimal" routes.

I'm fairly sure that rather than being general purpose, the switch matrix
is sparse and doesn't support all input to output connections. (The
exact details are not documented publicly.) With pin locking, one can
force particular paths through the switchbox. This is based on my
observations of tool behaviour rather than an inspection of the die, thus
there is a chance that it is wrong as you suggest.

Please note that I'm only referring to intra-switchbox routing. All bets
are off once the routing goes outside the local switchbox.


> The only way to tell is to give it a try.

Oh yes.


Allan
 
Allan Herriman wrote on 12/16/2017 11:32 AM:
On Sat, 16 Dec 2017 09:27:37 -0500, rickman wrote:

Allan Herriman wrote on 12/16/2017 4:50 AM:
On Fri, 15 Dec 2017 10:57:50 -0500, rickman wrote:

Allan Herriman wrote on 12/15/2017 5:21 AM:
On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the routing
will be fixed (to an extent).

"To an extent" is one of those things where the devil is in the
details.
Routing within a block is dedicated and either a path exists or it
doesn't. But routing between blocks is much more flexible and so
subject to variations depending on what the software chooses.

In general, I agree with that. Look sideways at the PAR software or
make a trivial change to the source code and it will do something
completely different, and often something unexpected. Sometimes it
will do something wildly suboptimal.

But in this case, I believe Larkin has just one connection that will be
contained within a single switch box next to the IOB. It will not use
any inter-switchbox routing (with all the uncertainty that entails).

My experience has been that in this particular case (and this
particular case only), locking the pins correctly may remove any choice
the router has, resulting in repeatable routing. (I won't say
repeatable timing, because we still have PVT to worry about.)

Depending on the exact switchbox resources used, this may also require
that other routing be prohibited in that area to work.


Perhaps I should point out that whilst I have done some of this sort of
manual placement and routing recently, I have not done the exact route
of IBUF output to OUTFF clear input. Sometimes there are quirks that
do not become apparent until the design hits the tools.

I guess my work with Xilinx parts is getting old. I didn't remember the
IOB FFs *having* accessible async Clear/Preset inputs which would have
required a FF from the fabric. But I looked at the Zynq data sheet and
the IOB FFs have accessible Clear/Preset inputs. So there will be
routing on the general fabric as I expect there is no direct connection
between the input and the Clear pin within the IOB.


I can't try it here (I'm not at work and I deliberately don't have the
tools installed at home) but I believe both signals appear on the same
switchbox, which is about as close to a direct connection as one can get
outside a slice.


As to your presumption of this removing "any choice the router has", I
think that is fallacious. The switch matrix is a general purpose
routing medium and I have seen the tool do, as you say, "wildly
suboptimal" routes.


I'm fairly sure that rather than being general purpose, the switch matrix
is sparse and doesn't support all input to output connections. (The
exact details are not documented publicly.) With pin locking, one can
force particular paths through the switchbox. This is based on my
observations of tool behaviour rather than an inspection of the die, thus
there is a chance that it is wrong as you suggest.

Please note that I'm only referring to intra-switchbox routing. All bets
are off once the routing goes outside the local switchbox.


The only way to tell is to give it a try.

Oh yes.

You are more familiar with the newer devices than I am. The issue isn't
even if you can get the route through the local switchbox. It is whether it
will always pick the same route. As you say, the switch matrix is somewhat
sparse, but the issue is whether it goes through a single switchbox or not.
I guess we'll find out when John tries it.

I thought the problem was going to be the lack of a reset pin on the IOB FF
which would have forced the use of a fabric FF with routing to/from the IOB.
Then I think the locking of LUTs (for delay) and the FF to a single CLB
would have been the approach with the best shot at producing a controlled
pulse width even if the routing delay to the IOB would have been variable.
But that can be constrained since it is a path from the "clock" (trigger
input) to the output pin.
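
The fabric alternative described above (a FF plus LUTs used as the delay element) could be sketched roughly as follows. This is an untested illustration, not something from the thread: the entity name fabric_oneshot, the generic N_LUTS and its default of 4 are hypothetical, and the resulting pulse width would have to be measured on hardware. Placement of the FF and LUTs into a single CLB is assumed to be handled separately with placement attributes or constraints.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Hypothetical fabric one-shot: a FF clocked by trig whose output is fed
-- back to its own async clear through a chain of LUT buffers.
entity fabric_oneshot is
  generic (
    N_LUTS : positive := 4  -- assumed delay-chain length; tune on hardware
  );
  port (
    trig  : in  std_logic;
    pulse : out std_logic
  );
end entity fabric_oneshot;

architecture rtl of fabric_oneshot is

  signal q     : std_logic;
  signal delay : std_logic_vector(0 to N_LUTS);

  -- Ask the tools not to optimise the delay nets away. Whether this is
  -- sufficient on its own is an assumption to verify in the netlist.
  attribute DONT_TOUCH : string;
  attribute DONT_TOUCH of delay : signal is "true";

begin

  -- Rising edge of trig loads '1'; the delayed copy of Q clears the FF.
  ff_inst : component FDCE
    generic map (INIT => '0')
    port map (
      Q   => q,
      C   => trig,
      CE  => '1',
      CLR => delay(N_LUTS),
      D   => '1'
    );

  delay(0) <= q;

  -- Each LUT1 with INIT "10" is a non-inverting buffer (O = I0).
  gen_delay : for i in 1 to N_LUTS generate
    lut_inst : component LUT1
      generic map (INIT => "10")
      port map (O => delay(i), I0 => delay(i - 1));
  end generate gen_delay;

  pulse <= q;

end architecture rtl;
```

The pulse width here is set mainly by the FF clock-to-Q delay, the LUT chain and the clear-to-Q delay, all of which still vary with process, voltage and temperature.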

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
 
John Larkin wrote:
I have an async signal, call it TRIG, inside a Zynq 7020.

snip

Any other ideas or comments?

State machines?

--
Les Cargill
 
On Sat, 16 Dec 2017 12:44:35 -0500, rickman wrote:

Allan Herriman wrote on 12/16/2017 11:32 AM:
On Sat, 16 Dec 2017 09:27:37 -0500, rickman wrote:

Allan Herriman wrote on 12/16/2017 4:50 AM:
On Fri, 15 Dec 2017 10:57:50 -0500, rickman wrote:

Allan Herriman wrote on 12/15/2017 5:21 AM:
On Thu, 14 Dec 2017 10:18:25 -0500, rickman wrote:

Allan Herriman wrote on 12/14/2017 6:39 AM:
On Thu, 14 Dec 2017 10:49:41 +0000, Allan Herriman wrote:


The placement and routing is quite easy to control from your favourite
HDL, once you know how. This is important to get right as otherwise the
results will not be repeatable.

This Xilinx forum thread gives some examples of placement and routing in VHDL:

https://forums.xilinx.com/t5/UltraScale-Architecture/Gated-ring-oscillator/m-p/808774/highlight/true#M5557

When you say "routing", it doesn't appear to deal with the actual routing.
He does mention that the attributes assign specific I/Os on the LUTs and so
which pin is connected to which is determined. But the routing
interconnects still need to be wired up in the chip editor I believe.

There is no manual step needed. Once you lock the pins, the
routing will be fixed (to an extent).

"To an extent" is one of those things where the devil is in the
details.
Routing within a block is dedicated and either a path exists or it
doesn't. But routing between blocks is much more flexible and so
subject to variations depending on what the software chooses.

In general, I agree with that. Look sideways at the PAR software or
make a trivial change to the source code and it will do something
completely different, and often something unexpected. Sometimes it
will do something wildly suboptimal.

But in this case, I believe Larkin has just one connection that will
be contained within a single switch box next to the IOB. It will not
use any inter-switchbox routing (with all the uncertainty that
entails).

My experience has been that in this particular case (and this
particular case only), locking the pins correctly may remove any
choice the router has, resulting in repeatable routing. (I won't say
repeatable timing, because we still have PVT to worry about.)

Depending on the exact switchbox resources used, this may also
require that other routing be prohibited in that area to work.


Perhaps I should point out that whilst I have done some of this sort
of manual placement and routing recently, I have not done the exact
route of IBUF output to OUTFF clear input. Sometimes there are
quirks that do not become apparent until the design hits the tools.

I guess my work with Xilinx parts is getting old. I didn't remember
the IOB FFs *having* accessible async Clear/Preset inputs which would
have required a FF from the fabric. But I looked at the Zynq data
sheet and the IOB FFs have accessible Clear/Preset inputs. So there
will be routing on the general fabric as I expect there is no direct
connection between the input and the Clear pin within the IOB.


I can't try it here (I'm not at work and I deliberately don't have the
tools installed at home) but I believe both signals appear on the same
switchbox, which is about as close to a direct connection as one can
get outside a slice.


As to your presumption of this removing "any choice the router has", I
think that is fallacious. The switch matrix is a general purpose
routing medium and I have seen the tool do, as you say, "wildly
suboptimal" routes.


I'm fairly sure that rather than being general purpose, the switch
matrix is sparse and doesn't support all input to output connections.
(The exact details are not documented publicly.) With pin locking, one
can force particular paths through the switchbox. This is based on my
observations of tool behaviour rather than an inspection of the die,
thus there is a chance that it is wrong as you suggest.

Please note that I'm only referring to intra-switchbox routing. All
bets are off once the routing goes outside the local switchbox.


The only way to tell is to give it a try.

Oh yes.

You are more familiar with the newer devices than I am. The issue isn't
even if you can get the route through the local switchbox. It is
whether it will always pick the same route. As you say, the switch
matrix is somewhat sparse, but the issue is whether it goes through a
single switchbox or not. I guess we'll find out when John tries it.

I thought the problem was going to be the lack of a reset pin on the IOB
FF which would have forced the use of a fabric FF with routing to/from
the IOB. Then I think the locking of LUTs (for delay) and the FF to a
single CLB would have been the approach with the best shot at producing a
controlled pulse width even if the routing delay to the IOB would have
been variable. But that can be constrained since it is a path from the
"clock" (trigger input) to the output pin.

I ran an experiment today at work. I used the following VHDL source in
the smallest Kintex 7 part (which has the same fabric as the OP's Zynq).
The net obuf_input (FF Q to pin driver) used dedicated routing and didn't
go through any switchboxes at all.

The net ibuf_output (which connects back to the FF reset input) was
restricted to the local switchbox as expected. It needed multiple passes
through the switchbox, though, as clearly this isn't a connection that
Xilinx expects customers to use often.

I didn't check, but I imagine that the path through the switchbox would
change if other routing was also passing through the switchbox (which
does happen in a dense design).

I have not simulated this code or tested it in any way. Use at own risk.


library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;


entity larkin_oneshot is
    generic (
        IOSTANDARD : string := "LVCMOS18"
    );
    port (
        trig        : in    std_logic;  -- async trigger
        oneshot_pin : inout std_logic   -- pulse output, also read back
    );
end entity larkin_oneshot;


architecture rtl of larkin_oneshot is

    signal obuf_input  : std_logic;  -- FF Q to pin driver
    signal ibuf_output : std_logic;  -- pin readback to FF reset input

begin  -- architecture rtl of entity larkin_oneshot

    -- Bidirectional buffer with the output driver always enabled (T => '0').
    iobuf_inst : component IOBUF
        generic map (
            IBUF_LOW_PWR => FALSE,
            SLEW         => "FAST",
            IOSTANDARD   => IOSTANDARD
        )
        port map (
            O  => ibuf_output,
            IO => oneshot_pin,
            I  => obuf_input,
            T  => '0'
        );

    -- Output register clocked by trig with both data inputs tied high;
    -- its asynchronous reset is driven by the pin readback.
    oddr_inst : component ODDR
        generic map (
            DDR_CLK_EDGE => "SAME_EDGE",
            SRTYPE       => "ASYNC"
        )
        port map (
            Q  => obuf_input,
            C  => trig,
            CE => '1',
            D1 => '1',
            D2 => '1',
            R  => ibuf_output
        );

end architecture rtl;  -- of entity larkin_oneshot
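
Since the code above is stated to be unsimulated, a small testbench along the following lines might be a starting point for a functional check. It is only a sketch: the testbench name larkin_oneshot_tb, the 200 ns start-up wait and the 50 ns trigger timing are arbitrary assumptions, and it presumes larkin_oneshot and the unisim library are compiled for the simulator. Pulse widths seen in behavioural simulation come from the simulation models and say nothing about the width on real hardware.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; names and timing values are illustrative only.
entity larkin_oneshot_tb is
end entity larkin_oneshot_tb;

architecture sim of larkin_oneshot_tb is

  signal trig        : std_logic := '0';
  signal oneshot_pin : std_logic;

begin

  -- Device under test, using the default IOSTANDARD generic.
  dut : entity work.larkin_oneshot
    port map (
      trig        => trig,
      oneshot_pin => oneshot_pin
    );

  -- Apply three trigger pulses after an assumed start-up settling time.
  stimulus : process
  begin
    wait for 200 ns;
    for i in 1 to 3 loop
      trig <= '1';
      wait for 50 ns;
      trig <= '0';
      wait for 50 ns;
    end loop;
    report "stimulus finished";
    wait;
  end process stimulus;

  -- Log every change on the pin together with the simulation time.
  monitor : process (oneshot_pin)
  begin
    report "oneshot_pin = " & std_logic'image(oneshot_pin) &
           " at " & time'image(now);
  end process monitor;

end architecture sim;
```

The monitor logs every transition on the pin, so activity on either edge of trig shows up in the simulator transcript.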
 
