Clock Edge notation

Mike Treseler · Jul 21, 2005

ALuPin@web.de wrote:

I have tried to illustrate the timing in the following diagram:
http://mitglied.lycos.de/vazquez78/

Looks like a standard synchronous handshake to me.
fpga sees NXT then registers DATA and drives STP
on the next clock (or later if collecting a burst)
Good luck.

-- Mike Treseler

Vladislav Muravin · Jul 21, 2005

Andre,

From what I see in the timing diagram on the web, I think it's just a pipeline
which needs to be enabled/disabled.
Maybe I am wrong, but where do you need to do everything in one cycle?
You always "prepare" the next data before opening the bus towards the
external PHY.

Vladislav

<ALuPin@web.de> wrote in message
news:1121932820.228642.199200@z14g2000cwz.googlegroups.com...
Hi,
thank you for your answers.
With "NO time" I mean that I have no time to synchronize the input when
using the external data clock in my FSM.

Some more information on the interface:

DATA[7..0]
8-bit bidirectional data bus.
The FPGA has to drive the bus LOW by default.
By sending a non-zero data pattern called TXCMD (transmit command) the
FPGA initiates transfers.
The direction of DATA[7..0] is controlled by DIR. Contents of the bus
lines must be ignored
for one clock cycle whenever DIR changes value (turnaround)

DIR
Controls direction of data bus. The external PHY drives DIR LOW by
default so that it can listen
to TXCMDs from FPGA. The PHY drives DIR HIGH when it has data for the
FPGA.

STP
The FPGA drives STP HIGH for one clock cycle after the last byte of
data was sent to the PHY

NXT
The PHY drives NXT HIGH to throttle data. If DIR is LOW, the PHY
asserts NXT to notify the FPGA to
place the next data byte on DATA[7..0] in the following clock cycle. If
DIR is HIGH the PHY asserts NXT HIGH to notify the FPGA a valid byte is
on DATA[7..0].

I have tried to illustrate the timing in the following diagram:
http://mitglied.lycos.de/vazquez78/

Some dynamic characteristics of the PHY which provides the 60MHz clock:

Timings with respect to the positive edge of the PHY clock:
tSETUP (input-only pins)             max. 6.0 ns
tHOLD  (input-only pins)             max. 0.0 ns
tOUT   (output-only pins)    2 pF    -
                            12 pF    -
                            30 pF    max. 9.0 ns

As the plot shows, I have to provide data on the next PHY clock cycle
as soon as the PHY accepts my TXCMD. So not really much time to do a good
synchronous job ... ?

Rgds
André
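
The numbers quoted above can be turned into a rough timing budget. This is only a sketch: it uses the 60 MHz clock, the 9.0 ns tOUT and the 6.0 ns tSETUP from the post, and it assumes zero clock skew. The board_delay_ns parameter is a hypothetical placeholder, since the post gives no trace delays.

```python
# Rough timing budget for the 60 MHz PHY interface described in the post.
# Assumptions (not from the post): zero clock skew, board delay supplied by caller.
CLOCK_MHZ = 60.0
PHY_T_OUT_MAX_NS = 9.0   # PHY clock-to-output, 30 pF load (from the post)
PHY_T_SETUP_NS = 6.0     # PHY input setup time (from the post)


def period_ns(freq_mhz):
    """Clock period in nanoseconds."""
    return 1000.0 / freq_mhz


def fpga_input_window_ns(board_delay_ns=0.0):
    """Time a PHY output is stable at the FPGA pin before the next clock edge."""
    return period_ns(CLOCK_MHZ) - PHY_T_OUT_MAX_NS - board_delay_ns


def fpga_output_budget_ns(board_delay_ns=0.0):
    """Largest FPGA clock-to-output delay that still meets the PHY setup time."""
    return period_ns(CLOCK_MHZ) - PHY_T_SETUP_NS - board_delay_ns


if __name__ == "__main__":
    print(f"period             : {period_ns(CLOCK_MHZ):.2f} ns")
    print(f"FPGA input window  : {fpga_input_window_ns():.2f} ns")
    print(f"FPGA output budget : {fpga_output_budget_ns():.2f} ns")
```

Under these assumptions NXT and DIR arrive a little under 8 ns before the edge that must act on them, which is the window any logic between the input pin and the first register has to fit into.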

Bryan · Apr 29, 2005

Then Peter isn't an average engineer. What's the prize? A wizard hat? I
am always highly skeptical of anyone who claims to be the inventor of
ideas (that sounds pompous). I certainly didn't invent a new FIFO;
I just integrated it into my data path for the highest performance. If
you want to make the challenge tricky, then also design the FIFO to
handle variable burst reads from 2 to 10 elements for all combinations
of write and read clock speeds up to the maximum; otherwise it is just
a simple FIFO.

Berty · Apr 29, 2005

Peter,
I have no doubt you have written many FIFOs that work OK, and believe it or not,
many other engineers have done so as well, and even simulated them.

We are all here for the fun and joy of engineering, so let's not make it a
contest of who has the bigger ....

A better approach, which I believe would be more suitable and more
educational: since you feel so strongly about the FIFO you designed,
why don't you write an app note or white paper about how it is done, so that
other engineers who are not aware of how to make an async FIFO can see and
learn? Who knows, maybe some of us who know how will learn something
new, as maybe you have a new way. After all, there are many ways to design
async FIFOs, depending on the requirements and the amount of resources
available (e.g. phase handler, PPM handler, in high out low, in low
out high, any to any; they can be with or without Gray code, using a
pessimistic approach, and so on).

Back to simulation: yes, you can simulate an async FIFO even if
theoretically you have an infinite number of conditions, since many of
those conditions are equivalent. It is just like testing a SONET frame: you
could argue it is impossible, since there is an infinite number of
combinations (each data word can differ, the gap between frames can differ,
the number of frames can differ, etc.). There are many more examples of
infinite conditions where a finite number of tests can verify
your design very well, assuming the test bench is done properly.

To give you an idea, one approach is to have a script that generates two
values in a define file which you later include in your simulation.
So for example the file output can be
`define clk1 19.9
`define clk2 24.9
one time, and another time it can be, for example,
`define clk1 36.1
`define clk2 10.8
and so on, where the number and resolution depend on what you
want to test. (I run everything on Unix, so this file is generated by a
Unix script, but I'm sure there is a way to do it on Windows/DOS
or whatever your platform is.)

Another parameter which should be randomized is the burst of data you write
and how many bursts there are per simulation.
Then you compile everything, and at the end verify automatically that all works
OK; if so, your script starts all over.
After one night (or however long, depending on how powerful your machine is)
you can cover all the ranges you wanted, as well as maybe some predefined
frequencies and definitions for dedicated tests. Using a timescale of 1ns/1ps or
1ps/10fs etc. can help you get the resolution you need.

The important thing, from my experience, is that once you have done all your
dedicated tests and verified everything, you let $random(seed) work in the
ranges of values you want to cover, and make sure the test runs
automatically, as does the checker, so that when you run an overnight test
you get a large range of coverage.
Of course you should keep all the seeds that generate a failure in the test,
so in the morning you can regenerate the same condition that caused the
failure.
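
A minimal sketch of such a generator script, written in Python rather than as a Unix shell script. The period range (10.0 to 40.0 ns), the 0.1 ns resolution, and the burst_len define are assumptions made for illustration; only the clk1/clk2 defines come from the description above. The seed is passed in explicitly so that a failing run can be regenerated.

```python
import random


def make_defines(seed, lo_tenths=100, hi_tenths=400, max_burst=64):
    """Return the text of a Verilog define file for one randomized run.

    Periods are drawn in tenths of a nanosecond (100..400 means 10.0..40.0 ns),
    so the 0.1 ns resolution is exact. All ranges here are assumed values.
    """
    rng = random.Random(seed)          # same seed -> same file, for replay
    clk1 = rng.randint(lo_tenths, hi_tenths) / 10
    clk2 = rng.randint(lo_tenths, hi_tenths) / 10
    burst = rng.randint(1, max_burst)  # hypothetical extra parameter
    return (
        f"// generated with seed {seed}\n"
        f"`define clk1 {clk1:.1f}\n"
        f"`define clk2 {clk2:.1f}\n"
        f"`define burst_len {burst}\n"
    )


if __name__ == "__main__":
    # An overnight loop would write this to a file, run the simulator,
    # and log every seed whose run failed.
    print(make_defines(seed=1), end="")
```

Recording the seed in the generated file is a design choice that makes each failing condition reproducible from the log alone.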

But as always, the most important thing is: have fun.

Peter Alfke · Apr 29, 2005

Here is the URL for a FIFO description that I published a few months
ago.

http://www.xilinx.com/publications/xcellonline/xcell_52/xc_v4fifo52.htm

But back to simulation:
I have tested metastability in our flip-flops, and I found that the
metastability-catching timing window has a width of
0.07 ns for a metastable-caused delay of 1 ns
0.07 femtoseconds for a metastable-caused delay of 1.5 ns.
For every extra half ns of delay, the window becomes a million times
smaller.
For a 2-ns delay you have to hit a timing bulls-eye of 10e-22 seconds.
Please tell me how you can simulate that...
Peter Alfke
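
The scaling quoted above can be written as a one-line formula. This sketch assumes a purely exponential model fitted to the two measured points (0.07 ns at 1 ns delay, a factor of one million per extra 0.5 ns); the post gives only those data points, so the extrapolation is an assumption.

```python
def window_seconds(delay_ns):
    """Width of the metastability-catching window for a given extra delay.

    Exponential model fitted to the figures in the post: 0.07 ns at a 1 ns
    delay, shrinking by 10**6 for every additional 0.5 ns.
    """
    return 0.07e-9 * 10.0 ** (-6.0 * (delay_ns - 1.0) / 0.5)


if __name__ == "__main__":
    for delay in (1.0, 1.5, 2.0):
        print(f"delay {delay:.1f} ns -> window {window_seconds(delay):.1e} s")
```

With this model the 2 ns case comes out a little under 1e-22 s, many orders of magnitude below a 1 ps or even 1 fs simulator time step, which is the point of the question.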

Mohammed A khader · Apr 30, 2005

HI Bert,

Interesting topic to talk about!

I often see newbies writing things like :
[Snipped]..............
????? Since I've seen this a number of times, I'm curious
to know who is telling them this ? and why ? a book ?

I am a newbie (still at the academic level), but I will not accept something from a
book unless it says why.

to know who is telling them this ?
Here is the list which I have read .....

1) RTL Implementation Guide by Jack Marshall (from Tera Systems Inc).
(I got this from the Synopsys SNUG group; check it if you have an account.)
2) Coding style from Synopsys
3)
http://www.utdallas.edu/~shankars/teaching/ee5325/foils/lectures/lecture17.pdf
(Slide number 17)

4) http://min.ecn.purdue.edu/~ee495d/Lectures/synthesis.pdf

All say that a case statement infers parallel logic and if-elsif-else infers
a priority encoder structure ... If the situation doesn't demand
priority logic, why should I choose it?

The case statement in VHDL also has it's unfriendly sides
(you often have to qualify the case expression like in
"case A&B is" + there is also the locally static issue
+ no way to use std_match in a case + case must be complete, etc...).

The problem of qualifying the expression has been addressed by the VHDL committee
and it is going to be fixed soon.

Completeness of the case is an advantage for finding bugs. Let's take an
example....
Take the example you mentioned above

case my_std_logic is
when '0' =>
when '1' =>
when others => -- they HAVE to write this indeed !!
end case;

They seem to believe it would be a better (more efficient !)
style than :
if my_std_logic='1' then
else
end if;

Suppose my_std_logic is a control signal which must be initialized
properly after reset. I cannot see that with your if-else code, but
with the others clause of the case statement I can catch all
uninitialized 'U' signals.

Even using the case statement (and don't cares), obtaining
the minimal logic sometimes requires efforts (as for the one
hot minimal decoder).
I don't know particularly about the one-hot minimal decoder, but

synthesis tools say case statements are good for area optimization.

My recommendation is to usually favor the style which provides
the best expressiveness even at the (hypothetical) cost of a
few gates.
I don't think if-else provides more "expressiveness" than case. For

parallel code, case is more expressive.
In the end the merits of the hardware count, even if it is in the form of a
few gates. Don't underestimate the number of gates, as it is now
causing more static power.

Has anyone different views or experience to share ?

Thank you very much for starting such an interesting topic.

-- Mohammed A Khader.

Neo · May 2, 2005

Info,
The code above given by you for onehot did synthesize differently. The
"case" version synthesized to 3 LUTs involving only OR gates and didn't
infer a priority structure; in fact it optimized, as you mentioned, to a
series of OR gates. But the "if" version synthesized to 6 LUTs and
inferred a priority structure. Leonardo was used for the synthesis.
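
The one-hot code being discussed is not part of this excerpt, so the following is only a behavioral sketch of the difference, using a hypothetical 4-bit one-hot to 2-bit encoder. It says nothing about what a particular synthesis tool will produce; it only shows why the "case" form can reduce to plain ORs while the "if" chain describes a priority order.

```python
def encode_parallel(onehot):
    """Case-style view: each output bit is just an OR of input bits."""
    b0 = ((onehot >> 1) | (onehot >> 3)) & 1
    b1 = ((onehot >> 2) | (onehot >> 3)) & 1
    return (b1 << 1) | b0


def encode_priority(onehot):
    """If/elsif-style view: the lowest set bit wins, later bits are masked."""
    if onehot & 0b0001:
        return 0
    elif onehot & 0b0010:
        return 1
    elif onehot & 0b0100:
        return 2
    elif onehot & 0b1000:
        return 3
    return 0


if __name__ == "__main__":
    for i in range(4):
        code = 1 << i
        print(f"{code:04b}: parallel={encode_parallel(code)} "
              f"priority={encode_priority(code)}")
    # Only an illegal (non-one-hot) input tells the two descriptions apart.
    print(f"0110: parallel={encode_parallel(0b0110)} "
          f"priority={encode_priority(0b0110)}")
```

Both functions agree on every legal one-hot input; the priority version additionally defines behavior for illegal inputs, and that extra behavior is what costs logic if the tool cannot prove the inputs are one-hot.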

Berty · May 2, 2005

First, a personal opinion: I don't generally like the words expert or
average or whatever. The fact that someone did something in the past does not
make him an expert, just as designing a dozen chips
doesn't make someone an ASIC guru.
The only thing it does give one is experience, which a new designer
should listen to, but as always with a bit of skepticism, as even the
most experienced engineer can be wrong, and even if not, the newer engineer might
have a better idea.

As for the async topic, the first step in simulating any clock domain
crossing is to have FFs that are not just a reg which you put in always @
(posedge ... but ones with timing references.

What I would suggest as a first step is to write something very simple with
a clock crossing, even a simple req/ack done as a full handshake,
and then synthesize and place and route
it.

Once you have the post place and route netlist, look at it (assuming
you use Xilinx it is abc_timesim.v; if you use Altera it is abc.vo):
you will see that the FFs now come from the Xilinx or Altera
library files/directories. Moreover, their timing is overridden by the SDF
file which both tools generate.

If you now simulate your design, you will have FFs that have setup
and hold timing requirements to meet, and failing to do so will generate
an "x", which is what you actually want to test and simulate.

Of course you can argue that this doesn't cover all the timing issues
but only catches some of them. Say the setup/hold window is a total of
2 ns compared to a 20 ns period; you might argue that not all of the 2 ns is
checked. To that I will answer that this is not important: the
important thing is to hit this time frame, get the X, and see that
your system handles it properly.
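
As a rough illustration of the 2 ns in 20 ns argument, here is a toy flip-flop model with a setup/hold check, written in Python. It is not a vendor library cell: the 1 ns setup and 1 ns hold values are assumed so that the window totals 2 ns, and the asynchronous data edge is drawn uniformly over one clock period.

```python
import random


def sample_ff(change_ns, edge_ns, old, new, setup_ns=1.0, hold_ns=1.0):
    """Value captured at edge_ns when the data input switches at change_ns.

    Returns "x" if the change falls inside the setup/hold window,
    mimicking what a timing-annotated gate-level FF model does.
    """
    if edge_ns - setup_ns < change_ns < edge_ns + hold_ns:
        return "x"
    return new if change_ns <= edge_ns - setup_ns else old


def violation_fraction(trials, period_ns=20.0, seed=0):
    """Fraction of random asynchronous data edges that produce an X."""
    rng = random.Random(seed)
    edge_ns = period_ns / 2  # clock edge placed mid-period
    hits = 0
    for _ in range(trials):
        change_ns = rng.uniform(0.0, period_ns)
        if sample_ff(change_ns, edge_ns, 0, 1) == "x":
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"X fraction over 10000 edges: {violation_fraction(10000):.3f}")
```

With these assumed numbers roughly one edge in ten lands in the window, so even a short randomized run produces many X events for the downstream logic to handle.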

Just like when you test a counter that can count up to, let's say, 100: you
verify 0, 100, and the frequency at high and low, but even if the counter
is for, let's say, 100 MHz, you are not testing it for ALL frequencies from 0 Hz to
100 MHz, i.e. 0.00000...01 Hz and then 0.000000...02 Hz and so on.

Later on you can write your own FF with whatever timing you want and
optimize it to what ever you want to test.

The big drawback of using post place and route simulation, as you are probably aware, is
time: those simulations run much slower compared to the code you wrote.
However, even in a very large design the relative size of the cross-domain
part is very small, so a good approach is to have a quick test of this
part separately, and only when happy move on to simulating the complete
design.

This is not to say SPICE simulations are not needed; of course
they are, but they are needed to verify the operation and characteristics
of the FF itself inside the FPGA (or the ASIC for that matter). The
digital designer can rely on the above and not go into SPICE
modeling if he assumes, which is a reasonable assumption, that those FFs
were tested by the vendor and that the timing characteristics etc. are
given properly in the library files.

As for the URL for the FIFO on the Xilinx site: while it is a very nice
article with lots of color, unless I missed something it is basically
useless. The reason I say so is that, as I see it, a good article
explains the theory, and when it refers to "basic stuff", of which a FIFO is
one, it should be such that a new engineer who knows how to write
code will be able to make the design based on the article. I believe
this is not the case with this article, and regrettably this is true of
too many articles out there.

Somehow I got the feeling long ago that many times engineers believe they
have a great design and want to tell everyone about it, but at the same
time are concerned someone will learn how they did it, so they write it in a
way that gives you some idea but not the whole solution. For me, unless we
are talking about breakthrough technology where one is going to file a
patent, it simply means a very poor article which doesn't benefit the
engineering community, and as a whole doesn't even benefit the writer, as if the
complete detail were given, MAYBE someone would come up with an
enhancement that would make the design even better.

Many years back I had to do my first design that required a CRC, and
at that time I tried to find answers to how it works, how it is done, and
so on.
All I found were a few articles that talked about the idea and the general
abstract of how to do it but with no real implementation, until after
some more research I finally found one article which showed exactly how
it should be done. Sure, it was for only one bit of data, but after this
article I had the path and could expand it to more bits as I needed.

The first articles I would probably mark anywhere from poor to excellent,
but they were all what I refer to as "university articles"; the last
one probably would get a C grade at the university, but it got an A from me
as an engineer. Finally, for the first time, I understood not only the what and why
but also the how. Sure, math equations are great, but it might not be so
straightforward to figure out from them that all you really need in the end
are some XORs and FFs; at least it was not clear to me at that
time.
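
To make the "some XORs and FFs" remark concrete, here is a sketch of a one-bit-per-clock CRC as a shift register with XOR feedback. The post does not say which CRC was used; the 16-bit polynomial 0x1021 with initial value 0xFFFF (the common CRC-16/CCITT-FALSE parameters) is an assumption chosen only for illustration.

```python
def crc16_serial(bits, poly=0x1021, init=0xFFFF):
    """Bit-serial CRC: one loop iteration models one clock of the hardware.

    reg plays the role of 16 flip-flops; the XOR with poly is the set of
    XOR gates placed at the polynomial's tap positions.
    """
    reg = init
    for bit in bits:
        feedback = ((reg >> 15) & 1) ^ bit   # MSB of register XOR incoming bit
        reg = (reg << 1) & 0xFFFF            # shift by one position
        if feedback:
            reg ^= poly                      # apply the taps
    return reg


def bytes_to_bits(data):
    """Serialize bytes MSB first, as this CRC variant expects."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


if __name__ == "__main__":
    crc = crc16_serial(bytes_to_bits(b"123456789"))
    print(f"CRC = 0x{crc:04X}")  # standard check value for these parameters: 0x29B1
```

Widening this to a byte or word per clock amounts to unrolling the loop eight or more times and flattening the resulting XOR tree, which is the "expand it to more bits" step mentioned above.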

Back to the article and this thread: I saw mention of the somewhat difficult
empty flag, but for example I didn't see any mention of how it is
done in the URL (if I missed it, do let me know where it is mentioned, as I
went through it quickly). Assuming there was indeed no explanation of
how it was done, then what does this article add to the engineering knowledge
base except a few more papers?
On the other hand, if there is a clear explanation, and if this article is
such that a new engineer can take it and write his own async FIFO, then there
is a reason and justification for this article.

Berty · May 2, 2005

No need to apologize. The whole issue, as I see it, is that if someone
has a question we should try to give as full an answer as possible so he can use it, and
not just vague details.
We are here for the fun, so let's not make it a competition of who did
more, as it serves nothing.
If someone is very knowledgeable and gives answers to all the difficult
questions, engineers in this group or any other group, community, etc. will know
he is knowledgeable, and there is no need to go and say "I'm" or "can you
do this and that", as it only brings "the bad out of us" instead of
the good.

And again, a FIFO, while it has its own complications, is not something that
should be put aside or considered "black magic" like EMI.
If you are a new designer, the leader of your group will not give you a huge
design, and if there is an async FIFO he will no doubt check your design. So
go ahead and try and learn, so that when you are the leader you know how it
is done and can teach the next generation of engineers.

Take a moment and think how long it will take you, as someone who knows
how to design an async FIFO, to teach someone new the idea and concept as
well as the drawbacks. No need to do it for all flavors of FIFO;
it is enough to go with one as a start. Let's say it takes you a whole day
(again I refer to the digital part, without going into the physics of
metastability etc.); then the next FIFO will take 4 hours and the next will take 2
hours, and before you know it this engineer knows how to design an async FIFO and
is more productive, as he is no longer one more in the herd of IP
copy/paste engineers.

Sure, if the async FIFO is deep enough and the frequency is high enough you
might eventually also have to do some hand placement, and you might
decide to use the vendor IP for that or any other reason, but this should not be,
in my opinion, the first solution when it comes to a BASIC block of
digital design.
On the other hand, if you are looking at, let's say, PCI and don't want to get
into understanding how PCI works, then go ahead and use the IP core for
PCI; but PCI is not a BASIC block of digital design, and this is the main
difference.

May 4, 2005

If latency is not an issue, I would register the control signal at the
IOB to make sure you don't have setup timing issues. Then add another
delay stage (register) for the DATA[7..0] bus so you can use the
control signal a cycle later.

If you need to reduce latency into your FIFO, you need to create a
timing spec in the ucf file for the control signal like:
OFFSET = IN 7.2 ns BEFORE "RXCLK";
to make sure your state machine does not exceed the input setup
time available.

If you haven't assigned pins yet, I would suggest grouping the control
pin near the data pins so your state machine can be placed easily
near the control input.

How can I make this timing constraint "OFFSET X BEFORE Clk" in
QuartusII ?

Thank you for your help.

Rgds
Andre

guru10 · Aug 9, 2005

I have the same problem. XGpio_Initialize does not return and it hangs the
whole program. I do not know why it is not working. Theoretically it
should. At least it should return some error value. It does not return at
all.

rossb · Aug 24, 2005

Actually, it has nothing to do with your synthesis tool.

I know this is years too late, but hopefully it will help anyone else
having this problem like I did. The solution can be found in the manual
for ModelSim.

the following commands should fix it

vcom -work <troubling library> -refresh
vlog -work <troubling library> -refresh

where <troubling library> is the library that is stuffing up.

Jim Granville · Oct 16, 2005

Peter Alfke wrote:

All members of the Virtex-4 family from Xilinx have a
(hard-coded=full-custom) FIFO controller in each of their BlockRAMs. It
accepts different clocks for read and write (called "asynchronous
operation") at any frequency up to 500 MHz. Capacity is 18 Kbits, the
width is 4 to 36 bits, and the depth is accordingly from 4K to 512
addresses (depth and width can easily be expanded with additional
BlockRAMs)
There is an EMPTY and a FULL flag, and also an ALMOST EMPTY and an
ALMOST FULL flag, both fully programmable (with 1-address granularity).

I designed the crucial asynchronous empty arbitration logic, and it
works perfectly: We tested it by writing data at ~200 MHz into the
FIFO, and reading it out at ~500 MHz, and the asynchronous empty-detect
logic had worked flawlessly for all those >10e14 operations when we
stopped the test after a week.

Why stop after 1 week ?. Sounds like the sort of app nice to have
spinning in the corner of the lab forever....
Did you also test the full detect, or is that expected to be the same
by symmetry ?

No real FIFO application will probably ever go empty 200 million times
a second...
The high performance is due to very fast and compact full-custom logic,
and our long experience in analyzing and dealing with the effects of
metastability.

So does that mean devices without this full-custom logic, can expect
lower performance, and if so, how much lower ?
[eg Spartan 3 / 3E ?]

-jg

Peter Alfke · Oct 16, 2005

Hi, Jim..
We stopped after a week because we were satisfied. In one week, we
proved 10e14, it would take 10 weeks to prove 10e15, and 2 years to
prove 10e16. Diminishing returns...But we definitely did NOT stop
because we found an error. No cheating on my watch!
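
The week-long test figures can be checked with simple arithmetic. This sketch assumes one empty-detect event per write, at the ~200 MHz write rate quoted earlier in the thread, and reads "10e14" as an order of magnitude of 10^14.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def events_in_weeks(rate_hz, weeks):
    """Number of events observed at rate_hz over the given number of weeks."""
    return rate_hz * SECONDS_PER_WEEK * weeks


def weeks_to_reach(target_events, rate_hz):
    """Weeks of continuous testing needed to observe target_events."""
    return target_events / (rate_hz * SECONDS_PER_WEEK)


if __name__ == "__main__":
    rate = 200e6  # assumed: one empty event per write at ~200 MHz
    print(f"events in one week : {events_in_weeks(rate, 1):.3e}")
    print(f"weeks to reach 1e15: {weeks_to_reach(1e15, rate):.1f}")
    print(f"years to reach 1e16: {weeks_to_reach(1e16, rate) / 52:.1f}")
```

Under that assumption one week gives about 1.2e14 events, and the rounded figures of roughly 10 weeks and 2 years for the next two decades follow directly.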

For some strange reason (fixed in "Virtex-5") there is a
one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
FULL is not as important as EMPTY, since a properly designed system
should never overflow the FIFO, whereas it might be nice to empty it
completely. (I often use the savings-account analogy).

Yes, using the fabric to implement the FIFO controller might limit the
speed to 250 MHz.
The reasons for the "hard" FIFO controller were:
Higher performance, guaranteed reliable operation without user
involvement, and saving fabric resources as well as power consumption.
The same reasoning will be used for future "hard" subfunctions. It's
the best way to increase speed, functionality, and user-friendliness.
How else can we improve by a factor 2 or even more?
Peter Alfke

mindenpilot · Oct 17, 2005

In most datacomm applications, filling a buffer can be caused by network
congestion, so to prevent dropped packets, you'd want to correctly detect
FIFO full, and backpressure accordingly.

"Peter Alfke" <alfke@sbcglobal.net> wrote in message
news:1129501822.643986.219070@f14g2000cwb.googlegroups.com...

Hi, Jim..
We stopped after a week because we were satisfied. In one week, we
proved 10e14, it would take 10 weeks to prove 10e15, and 2 years to
prove 10e16. Diminishing returns...But we definitely did NOT stop
because we found an error. No cheating on my watch!

For some strange reason (fixed in "Virtex-5") there is a
one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
FULL is not as important as EMPTY, since a properly designed system
should never overflow the FIFO, whereas it might be nice to empty it
completely. (I often use the savings-account analogy).

Yes, using the fabric to implement the FIFO controller might limit the
speed to 250 MHz.
The reasons for the "hard" FIFO controller were:
Higher performance, guaranteed reliable operation without user
involvement, and saving fabric resources as well as power consumption.
The same reasoning will be used for future "hard" subfunctions. It's
the best way to increase speed, functionality, and user-friendliness.
How else can we improve by a factor 2 or even more?
Peter Alfke

Peter Alfke · Oct 17, 2005

The Virtex-4 has a FULL flag that is synchronous with the write clock
(obviously, the read clock does not care) but the FULL flag is
activated one clock period late. (The EMPTY flag, synchronous with the
read clock does not have this latency, it gets activated by the same
clock edge that read the last valid data. Doing that right and fast is
the art of asynchronous FIFO design...))
I claim that it is easy to use the ALMOST FULL flag, since the exact
max capacity of a FIFO is not critical. Set it for 1020 for a 1024-deep
FIFO, and you will never be bothered by the latency, you actually get
an early warning...
Peter Alfke

Alex Shot · Oct 17, 2005

Xilinx's asynchronous FIFOs have a depth of (power of 2) - 1 bytes.
According to my analysis, using Xilinx's application notes, the reason
for this is that the full flag may actually be generated 1 write clock period
after it is expected. To prevent overflow, the FIFO depth is
decreased by 1.
Alex

Peter Alfke · Oct 17, 2005

Let me correct this:
The addressing depth of Virtex-4 FIFOs is 512, 1024, 2048, or 4096
locations. The word "byte" is meaningful only for the 2048 x 9
configuration.
The FULL flag goes active one write clock cycle after the FIFO has been
filled. That means, in a continuous write situation, the last written
entry will be lost. That's why I recommend using the ALMOST FULL flag
instead of the FULL flag.
EMPTY does not have this problem. It goes active on the same read clock
edge that is reading the last piece of data out of the FIFO. EMPTY
then goes inactive again after a data entry has been written into the
FIFO and the internal signal has been re-synchronized to the read
clock, which takes a few read clock cycles.
This asymmetric behavior assures that the EMPTY flag is appropriately
extremely fast in stopping any further erroneous reads, but is more
"relaxed" in allowing the reading to restart again. Note that this read
latency only occurs after the FIFO had gone empty.
If anybody has questions about the Virtex-4 FIFO, I am the right person
to ask. I have designed FIFOs, on and off, for over 35 years...
Peter Alfke, Xilinx Applications
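
The effect of a flag that arrives one write clock late can be shown with a toy write-side model. This is not the Virtex-4 controller: it assumes no reads, a writer that writes on every clock until it sees the flag, and a flag (FULL or ALMOST FULL alike) that reflects the fill level of the previous cycle.

```python
def fill_without_reads(depth, stop_threshold, cycles):
    """Write every cycle until the (one-cycle-late) flag is seen.

    Returns (entries stored, writes lost to overflow).
    stop_threshold == depth models the FULL flag; a lower value models
    an ALMOST FULL flag programmed to that level.
    """
    count = 0
    lost = 0
    flag_seen = False                          # what the writer sees this cycle
    for _ in range(cycles):
        next_flag = count >= stop_threshold    # level before this cycle's write
        if not flag_seen:
            if count < depth:
                count += 1                     # normal write
            else:
                lost += 1                      # write into a full FIFO is dropped
        flag_seen = next_flag                  # flag becomes visible one clock later
    return count, lost


if __name__ == "__main__":
    print("stop on FULL (1024)       :", fill_without_reads(1024, 1024, 2000))
    print("stop on ALMOST FULL (1020):", fill_without_reads(1024, 1020, 2000))
```

In this model stopping on FULL drops exactly one write, while stopping on a threshold of 1020 leaves the FIFO at 1021 entries with nothing lost, which matches the recommendation to use ALMOST FULL.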

Dave Pollum · Oct 17, 2005

Peter Alfke wrote:

Let me correct this:
The addressing depth of Virtex-4 FIFOs is 512, 1024, 2048, or 4096
locations. The word "byte" is meaningful only for the 2048 x 9
configuration.
The FULL flag goes active one write clock cycle after the FIFO has been
filled. That means, in a continuous write situation, the last written
entry will be lost. That's why I recommend using the ALMOST FULL flag
instead of the FULL flag.
EMPTY does not have this problem. It goes active on the same read clock
edge that is reading the last piece of data out of the FIFO. EMPTY
then goes inactive again after a data entry has been written into the
FIFO and the internal signal has been re-synchronized to the read
clock, which takes a few read clock cycles.
This asymmetric behavior assures that the EMPTY flag is appropriately
extremely fast in stopping any further erroneous reads, but is more
"relaxed" in allowing the reading to restart again. Note that this read
latency only occurs after the FIFO had gone empty.
If anybody has questions about the Virtex-4 FIFO, I am the right person
to ask. I have designed FIFOs, on and off, for over 35 years...
Peter Alfke, Xilinx Applications

So Peter, what do those of us with lowly Spartan-II FPGA's do if we
want say, a 16x9 FIFO?
-Dave
