Mitigating metastability.

Symon · Aug 30, 2003

Hi,
Before I start, metastability is like death and taxes,
unavoidable! That said, I've read the latest metastability thread. I
thought these points were interesting.

Firstly, A quote from Peter, who has carried out a most thorough
experimental investigation :-
"I have never seen strange levels or oscillations ( well, 25 years ago
we had TTL oscillations). Metastability just affects the delay on the
Q output."

Secondly, from Philip's excellent FAQ :-
"Metastable outputs can be

1) Oscillations from Voh to Vol, that eventually stop.
2) Oscillations that occur (and may not even cross) Voh and Vol
3) A stable signal between Voh and Vol, that eventually resolves.
4) A signal that transitions to the opposite state of the pre
clock state, and then some time later (without a clock edge)
transitions back to the original state.
5) A signal that transitions to the opposite state later than the
specified clock-to-output delay.
6) Probably some more that I haven't remembered. "

So, this got me thinking on the best way to mitigate the effects of
metastability. If Peter is correct in his analysis of his experimental
data, and I've no reason to doubt this, then Philip's option 5) is the
form of metastability appearing in Peter's Xilinx FPGA experiments.

So, bearing this in mind, a thought experiment. We have an async
input, moving to a synchronising clock domain at (say) 1000MHz. Say we
have a budget of 5ns of latency to mitigate metastability. The sample
is captured after the metastability mitigation circuit (MMC) with a FF
called the output FF.
My first question is, which of these choices of MMC is least likely
to produce metastability at the output FF?
1) The MMC is a 4 FF long shift register clocked at 1000MHz.
MMC1 : process(clock)
begin
  if rising_edge(clock) then
    -- four-stage shift register; every stage samples on every clock
    FF1 <= input;
    FF2 <= FF1;
    FF3 <= FF2;
    FF4 <= FF3;
    output <= FF4;  -- the output FF
  end if;
end process;

2) The MMC is 4 FFs, each clock enabled every second clock.
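MMC2 : process(clock)
begin
  if rising_edge(clock) then
    toggle <= not toggle;  -- alternates the two paths on successive clocks
    if toggle = '1' then
      -- path A: FF1 -> FF3, each enabled every second clock
      FF1 <= input;
      FF3 <= FF1;
      output <= FF3;
    else
      -- path B: FF2 -> FF4, enabled on the other clocks
      FF2 <= input;
      FF4 <= FF2;
      output <= FF4;
    end if;
  end if;
end process;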

Option 1) offers extra stages of synchronization between the input
and output, but the 1ns gap between FFs means that metastability is
more likely to propagate. Option 2) waits 2ns for the sample FFs to
make up their mind, vastly decreasing the metastability probability.
My second question is, does the type of metastability, i.e. the
things in Philip's list, affect which is the better choice? For
instance, if the first FF in the MMC exhibits oscillations in
metastability, then the second FF in the MMC would have several
chances, as its input oscillates, to sample at the 'wrong' time. This
might favour MMC option 2). If, however, the first FF in the MMC goes
into option 5) metastability, then there's only one chance for the
second FF to sample at the 'wrong' time. This might confer an
advantage on MMC option 1).

Anyway, I'm still thinking about this. I think the clock frequency
may decide which is better for a given FF type. Any comments?
Cheers, Syms.

Hal Murray · Aug 31, 2003

My first question is, which of these choices of MMC is least likely
to produce metastability at the output FF?

1) The MMC is a 4 FF long shift register clocked at 1000MHz.

2) The MMC is 4 FFs, each clock enabled every second clock.

You should also consider 1 FF clocked as late as you can wait.

Each FF has a setup time and a clock-output delay. I'm talking
about the actual measured time, not the data book worst case times.

If you chain FFs together, that time gets subtracted from the
settling time. The settling time is in an exponent. Waiting a
little bit longer helps a lot.
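
To put a rough number on "in an exponent", here is a minimal Python sketch using the usual textbook synchronizer model, MTBF = exp(t_r / tau) / (T0 * f_clk * f_data). The values of tau, T0 and the data rate below are assumptions picked for illustration only; they are not measured figures for any real part.

```python
import math

def mtbf_seconds(t_resolve_ps, tau_ps, t0_ps, f_clk_hz, f_data_hz):
    """Textbook model: MTBF = exp(t_r / tau) / (T0 * f_clk * f_data)."""
    t0_s = t0_ps * 1e-12
    return math.exp(t_resolve_ps / tau_ps) / (t0_s * f_clk_hz * f_data_hz)

# Assumed, illustrative constants (not taken from any data sheet).
TAU_PS = 50.0   # resolving time constant
T0_PS = 100.0   # metastability window parameter
F_CLK = 1e9     # 1000 MHz sampling clock, as in the thought experiment
F_DATA = 1e6    # assumed rate of asynchronous input transitions

base = mtbf_seconds(600.0, TAU_PS, T0_PS, F_CLK, F_DATA)
longer = mtbf_seconds(700.0, TAU_PS, T0_PS, F_CLK, F_DATA)

# The ratio depends only on the extra time and tau: exp(100 / 50) = e^2.
gain = longer / base
print(f"100 ps more settling time multiplies MTBF by about {gain:.2f}")
```

With these assumed constants, 100 ps of extra settling time is worth a factor of e^2 in MTBF, which is why time eaten by setup and clock-to-output in every extra stage hurts so much.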

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.

Symon · Sep 2, 2003

Hi Philip,
Thanks for your post, those pictures were certainly very
interesting! What parts did you use? I notice they seem to disagree
with Peter's quote that "Metastability just affects the delay on the Q
output.". I wonder, Peter, if Xilinx FFs behave differently from the
ones in Philip's photos? (Other than speed, of course.) I must admit,
one reason I posted was that I found it hard to believe that any FF
wouldn't show runt pulses, or funny output levels, albeit for brief
periods of time, during metastable events.
It's also interesting that the straight shift register isn't
necessarily the best way to reduce metastability effects. That was
what I suspected and was another reason behind my post. I agree that
the 'four paths out of phase' solution is better, and preserves the
sampling resolution. Often the sampling resolution needs to be
preserved, which was why I didn't present an 'enabled every third or
fourth go' type circuit.
Anyway, thanks to all for their thoughts, it's an interesting
topic!
cheers, Syms.

Austin Lesea · Sep 2, 2003

Symon,

I think the sampling o-scope shots agree perfectly with what Peter said.
Runt pulses and funny levels are the easiest metastable results to catch,
as they are just before the long unknown settling time behavior that is so
vexing to designers.

It also makes a difference where you look: a master-slave FF reduces the
duration of the unknown transition over a simple FF without a slave to
help "sharpen up" the transitions.

Austin

Symon · Sep 3, 2003

Hi Austin,
Maybe I got the wrong end of the stick, but when Peter said:-
"I have never seen strange levels or oscillations ( well, 25 years ago
we had TTL oscillations). Metastability just affects the delay on the
Q output." I thought he meant that he'd only seen metastability where
the output from the FF was always either on or off, just that
sometimes the transition was delayed. Philip's pictures clearly show
'strange levels'. This is important, I believe, when deciding what the
effects of metastable FFs are on following circuitry. I guess we'll
have to wait until he returns from his Portuguese jaunt before we find
out what he meant!!
Of course, I agree the Master/Slave thing helps. A master FF
on its own is what I'd call a latch, the clock controlling whether
it's transparent or not. The slave is the sameish circuit again, fed
from the output of this, but its clock is inverted. So, I guess that
you're saying because the master and slave are fabricated right next
to each other, the input to the slave can be expected to transition
faster than the input to the master which travels from further away?
Less capacitive interconnect to drive. (BTW, I assumed throughout the
metastability stuff we were talking about D-type FF, rather than
latches.)
thanks, Syms.

rickman · Sep 3, 2003

Austin Lesea wrote:

Rick,

I agree that the effect is the same: you just don't know when it will
resolve, and thus the value that you "see" at the next level is basically
unknown.

It could be that Peter's point is that the next circuit in line does make a
decision, and it most certainly makes the "unknown" into a '1', or into a
'0'. It is unlikely that the next circuit in line will propagate the same
intermediate behavior, and if it can (gain is too low), then things just get
more fuzzy until someone upstream ends up resolving the level back to a '1'
or a '0'.

If that were the case, then metastability would not be an issue at all.
Having an indeterminate voltage is what creates metastability. The
signal does not need to remain in the indeterminate value for any length
of time, so a transition at the wrong time is as bad as any other
indeterminate value. When FFs see an indeterminate input at the sampling
time (very small time and voltage window), they will create an
indeterminate value for an arbitrary period (or at an arbitrary delay) on
the output.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX

Jim Granville · Sep 4, 2003

Austin Lesea wrote:

Symon,

A long time ago, I designed a timing system for telecom that used a rubidium clock. The
reason why this is interesting will become apparent in a moment.

One function of the system was to measure up to five external sync references (presumably
from traffic bearing lines from other offices).

The circuit to do this was basically a counter, that was sampled by a rubidium derived
clock. All clocks were syntonous (same frequency, arbitrary phase -- aka the SONET/SDH
telephone network). As the phase wandered back and forth, the metastable regions would get
exercised, and the measurement board would report that an input signal had arbitrarily
"slipped" by some random number of bits (due to a metastable transition of a bit in the
counter/latch).

This was so frustrating, because in real life, the inputs could slip due to failures,
glitches in the network, etc. So how do you know a real bad slip, from a metastable one?

In the hope (vain and useless) of reducing the occurrence of the metastable sample, we had
three levels of FFs to try to re-synchronize the sample count, along with an elaborate set
of clock enables. We got it to the point where the false slip occurred about every two
months, in a typical network. Since real outages were far more common, it was not a big
deal.

In spite of this, we wrote software to identify a real slip, from a false slip. Basically,
if five successive samples were not equal (as the phase can't change that fast) we threw out
that set of measurements, and took another five. This dropped the occurrence of a false slip
to below the threshold that we could measure (but it could still happen and probably did -
still does out there somewhere).
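
The five-sample check described above can be sketched in a few lines of Python. This is only an illustration of the idea as stated in the post, not the original telecom software; the function name and the sample values are hypothetical.

```python
from typing import Iterable, Iterator, List

def accepted_readings(samples: Iterable[int], group: int = 5) -> Iterator[int]:
    """Yield one reading per group of `group` samples, skipping groups that disagree."""
    batch: List[int] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == group:
            # The phase cannot move that fast, so a disagreeing group is
            # treated as a suspect (possibly metastable) capture and dropped.
            if all(value == batch[0] for value in batch):
                yield batch[0]
            batch = []  # either way, start a fresh group of samples

# Hypothetical counter captures: the middle group contains one bad sample.
captures = [7, 7, 7, 7, 7,
            7, 7, 12, 7, 7,
            8, 8, 8, 8, 8]
readings = list(accepted_readings(captures))
print(readings)
```

Only the first and last groups survive, so the single odd capture never reaches the tracking loop. As the post says, this lowers the rate of false slips without eliminating them.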

So, metastability can sometimes be beaten into submission, but it never goes away....

The ultimate frustration is that one of these references gets used to track the rubidium
(steer it in a phase locked loop), so a fake glitch can cause quite a hit, which then
causes slips throughout the network as the rubidium runs off to the "wrong"
frequency/phase. A real glitch can also cause the same behavior, so the locked loop is
quite loosely coupled, and before each update, one checks absolutely everything to be sure
that what you are trying to track is real....

I had heard there were some cases with equipment designed by others where the maintenance
folks would just disconnect the inputs, as the bare rubidium ran so clean, that it was
better to not track at all (fewer slips) than to bother with trying to track the references
and falsely running off into the weeds due to metastability and poor reference checking.

All of this became obsolete when GPS became available, as now precise time (frequency) was
broadcast for free.

Very interesting account - worth adding to the FAQ under metastable ?
-jg

Mike Treseler · Sep 4, 2003

Jim Granville wrote:

Very interesting account

Yes. Great story.

My old story involves *missing* a synchronizer
rather than having failures *in* a synchronizer.

At least I didn't have to wait two months
for symptoms to occur in that case

-- Mike Treseler

rickman · Aug 30, 2003

All you need to answer these questions is the equation that describes
your metastability. That is contained in most of the references that
have been given. The settling time is found in an exponent, so using
two FFs with half the time of a single FF will make the problem worse,
not better.

The best (and only) way to resolve metastability is to provide more
time. The probability never goes to zero, but you can get arbitrarily
close.
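
A rough way to see why splitting the time across more FFs hurts: each extra stage pays its own setup, clock-to-output, routing and skew, and that overhead comes straight out of the resolving time. The Python sketch below assumes a fixed latency budget split evenly, and borrows the 400 ps per-stage overhead from the figures Philip uses elsewhere in this thread; both numbers are illustrative assumptions.

```python
def total_resolving_time_ps(budget_ps, stages, overhead_ps):
    """Sum of per-stage slack when a fixed latency budget is split evenly."""
    per_stage = budget_ps / stages - overhead_ps
    return stages * max(per_stage, 0.0)

BUDGET_PS = 4000.0   # assumed total latency budget
OVERHEAD_PS = 400.0  # assumed setup + clock-to-Q + routing + skew per stage

resolving = {n: total_resolving_time_ps(BUDGET_PS, n, OVERHEAD_PS) for n in (1, 2, 4)}
for stages, slack in resolving.items():
    print(f"{stages} stage(s): {slack:.0f} ps of resolving time")
```

Under these assumptions one stage keeps 3600 ps, two stages keep 3200 ps and four stages keep 2400 ps. Since the resolving time sits in the exponent, every stage added for the same total latency costs MTBF.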

Philip Freidin · Sep 1, 2003

On 29 Aug 2003 16:23:32 -0700, symon_brewer@hotmail.com (Symon) wrote:

Hi,
Before I start, metastability is like death and taxes,
unavoidable! That said, I've read the latest metastability thread. I
thought these points were interesting.

Firstly, A quote from Peter, who has carried out a most thorough
experimental investigation :-
"I have never seen strange levels or oscillations ( well, 25 years ago
we had TTL oscillations). Metastability just affects the delay on the
Q output."

Secondly, from Philip's excellent FAQ :-

Thanks.

"Metastable outputs can be

1) Oscillations from Voh to Vol, that eventually stop.
2) Oscillations that occur (and may not even cross) Voh and Vol
3) A stable signal between Voh and Vol, that eventually resolves.
4) A signal that transitions to the opposite state of the pre
clock state, and then some time later (without a clock edge)
transitions back to the original state.
5) A signal that transitions to the opposite state later than the
specified clock-to-output delay.
6) Probably some more that I haven't remembered. "

So, this got me thinking on the best way to mitigate the effects of
metastability. If Peter is correct in his analysis of his experimental
data, and I've no reason to doubt this, then Philip's option 5) is the
form of metastability appearing in Peter's Xilinx FPGA experiments.

Peter's experimental data revolves around detecting metastables, and
counting them to create the data we use for our calculations.
Very good stuff!

I too have created metastability test systems, which not only count
the metastables, but also display them on an oscilloscope.

I would like to make a very strong distinction about the typically
presented scope pictures of metastability, and the data that I have
taken.

What you normally see published (in terms of scope photos, as opposed
to drawn diagrams) is a screen of dots representing samples of the Q
output. These scopes are high bandwidth sampling scopes that typically
take 1 sample per sweep, and rely on the signal being repetitive to
build up a picture of what is going on. Examples are the Tek 11801 and
11803, as well as the newer TDS7000 and TDS8000 . The CSA11803 and
CSA8000 are basically the same scopes with some extra software.

The picture at the top of page 3 of this document is typical:

http://www.onsemi.com/pub/Collateral/AN1504-D.PDF

The scope is triggered by the same clock as the clock to the device
under test (DUT), and the scope takes a random sample (or maybe a
few samples) over the duration of the sweep. Most sweeps are of
the flip flop not going metastable, and so the dots accumulate and
show the trajectory of the flip flop. Occasionally the flip flop
goes metastable, and sometimes the random sample occurs during the
metastable time. These show up as the dots that are to the right
of the solid rising edge on the left. Every dot that is not on
that left edge represents times when the flip flop had a longer
than normal transition time, after you take into consideration
clock jitter, data output jitter, scope trigger jitter, and scope
sweep jitter. All of these can be characterized by first doing a
test run that does not violate the setup and hold times of the DUT.

The problem with these test systems is that when you do record
a metastable event, you only get 1 sample point on the trajectory
and you can say very little about the trajectory, other than it
passed through that point. Even when these scopes take multiple
samples per sweep, they are often microseconds apart, and of
little interest in the domain we are talking about here.

Although the collected data is predominantly of non metastable
transitions, these all pile up on top of each other as the left
edge of the trace, and do not significantly detract from seeing
the more interesting dots to the right.

The test systems that I have designed are quite different. These
test systems only collect trajectory data when the flip flop
goes metastable, and they sample the DUT output at 1GSamples per
second, thus taking a sample every nanosecond. The result is
that the scope pictures I have show the actual trajectory of
the metastable.

For your viewing pleasure, I have put them up on the web:

www.fpga-faq.com/Images/meta_pic_1.jpg
www.fpga-faq.com/Images/meta_pic_2.jpg
www.fpga-faq.com/Images/meta_pic_3.jpg

These are far from just delayed outputs! The end result though
is still the same, systems that fail. But seeing these scope
pictures of the actual Q output might make you think about how
you measure metastability.

For example, on meta_pic_1.jpg lower trace, the vertical scale
is 1V per division, and the 0V level is 1/2 a division above
the bottom of the screen. The horizontal scale is 4ns per
division. Now what if your test system took a sample at 10ns,
and used a threshold of 1.5 volts (2 div up from the bottom of
the picture). You would say that the signal is always high
at this point. If you sampled again at 20 ns (middle of the
screen), you would say that it has resolved for all the traces
shown, and you would count all the transitions that returned
to ground (because they were high at 10ns). All those traces
that ended up high would not be counted. This would be bad
if in the real system the device listening to the DUT
happened to have a threshold of 2.1 volts (right in the
middle of that cute little hump).

This also shows why using a signal like this as a clock
could be a real disaster.

Knowing what the trajectory of the DUT output looks like
can make you think a lot harder about how you test it.

So, bearing this in mind, a thought experiment. We have an async
input, moving to a synchronising clock domain at (say) 1000MHz. Say we
have a budget of 5ns of latency to mitigate metastability. The sample
is captured after the metastability mitigation circuit (MMC) with a FF
called the output FF.
My first question is, which of these choices of MMC is least likely
to produce metastability at the output FF?
1) The MMC is a 4 FF long shift register clocked at 1000MHz.
MMC1 : process(clock)
begin
  if rising_edge(clock) then
    FF1 <= input;
    FF2 <= FF1;
    FF3 <= FF2;
    FF4 <= FF3;
    output <= FF4;
  end if;
end process;

Basically the improvement in MTBF is a function of the slack time
you give it to resolve. This is the sum of the slack time between
FF1 and FF2, FF2 and FF3, FF3 and FF4, and FF4 and output.
Let's throw some numbers at it. Setup time is 75ps, clock to Q is
200ps, routing delay between any pair of Q to D paths is 100ps.
Clock distribution skew is 25ps (in the unfortunate direction).
So we have 4 paths of 1000ps - (75+200+100+25) = 600ps each, and 4 * 600ps = 2.4ns

2) The MMC is 4 FFs, each clock enabled every second clock.
MMC2 : process(clock)
begin
  if rising_edge(clock) then
    toggle <= not toggle;
    if toggle = '1' then
      FF1 <= input;
      FF3 <= FF1;
      output <= FF3;
    else
      FF2 <= input;
      FF4 <= FF2;
      output <= FF4;
    end if;
  end if;
end process;

Ok, so this is weird, and it adds a mux

Transit time through muxes is 200ps (assume that
getting toggle to it is a non issue), and its
output connects to the D of the output FF.
No extra routing delay.

Path 1:
slack from FF1 to FF3 plus slack from FF3 to output

(2000-(75+200+100+25)) + (2000-(75+200+100+25+200)) = 1600ps + 1400ps = 3.0ns

Path 2:
same slacks, just different FFs 3.0ns

So: weird but better

I could of course screw up the results by changing
the delay numbers, but they are pretty realistic for
current technology.
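
The slack sums for MMC #1 and MMC #2 can be re-derived with a few lines of Python. The delay figures are the ones quoted above; the constant and function names are made up for this sketch.

```python
# Delay figures from the post, in picoseconds.
SETUP_PS = 75
CLK_TO_Q_PS = 200
ROUTE_PS = 100
SKEW_PS = 25
MUX_PS = 200
CLOCK_PS = 1000  # 1000 MHz clock

def hop_slack_ps(clocks_between_samples, extra_ps=0):
    """Resolving slack on one FF-to-FF hop, given how many clocks apart the FFs sample."""
    overhead = SETUP_PS + CLK_TO_Q_PS + ROUTE_PS + SKEW_PS + extra_ps
    return clocks_between_samples * CLOCK_PS - overhead

# MMC #1: four hops, each one clock long.
mmc1_ps = 4 * hop_slack_ps(1)

# MMC #2: two hops, each two clocks long; the second hop also crosses the mux.
mmc2_ps = hop_slack_ps(2) + hop_slack_ps(2, extra_ps=MUX_PS)

print(mmc1_ps, mmc2_ps, mmc2_ps - mmc1_ps)
```

This gives 2400 ps for MMC #1 and 3000 ps for MMC #2, a difference of 600 ps in favour of MMC #2.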

Option 1) offers extra stages of synchronization between the input
and output, but the 1ns gap between FFs means that metastability is
more likely to propagate. Option 2) waits 2ns for the sample FFs to
make up their mind, vastly decreasing the metastability probability.

Yep.

My second question is, does the type of metastability, i.e. the
things in Philip's list, affect which is the better choice? For
instance, if the first FF in the MMC exhibits oscillations in
metastability, then the second FF in the MMC would have several
chances, as its input oscillates, to sample at the 'wrong' time. This
might favour MMC option 2).

Actually MMC 2 is favored regardless of oscillations or not
because of the 600ps of additional slack time.

If, however, the first FF in the MMC goes
into option 5) metastability, then there's only one chance for the
second FF to sample at the 'wrong' time. This might confer an
advantage on MMC option 1).

My thinking on this has always been that the only thing
that matters is the resolving time (slack) and the thought
experiments about later stages sampling at just the right time
to catch the previous FF resolving only cloud the issue. I am
not as confident on this issue as I am on others though.

What I am confident on though, is that there is a better MMC
than your two, and it follows on from MMC #2.

Just use 2 FFs, and clock them every 4 ns:

(that is, enable them every 4th clock cycle. This
would mean though that unlike MMC #2, which runs 2
parallel paths and avoids some latency, this could
have up to 4 ns of extra latency, if you just miss
the input change)

slack from FF1 to output:

4000-(75+200+100+25) = 3.6ns
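
The latency cost of enabling both FFs only every 4th clock can be seen with a small cycle-level model in Python. This is a behavioural sketch with hypothetical names; it counts clock edges only and does not model metastability or analogue timing.

```python
def latency_in_clocks(change_edge, enable_period=4, horizon=64):
    """Clock edges from the first edge that can see the new input until output shows it."""
    ff1 = 0
    output = 0
    for edge in range(horizon):
        data_in = 1 if edge >= change_edge else 0
        if edge % enable_period == 0:
            # Both FFs update on the same enabled edge, so output takes the old ff1.
            output, ff1 = ff1, data_in
        if output == 1:
            return edge - change_edge
    raise ValueError("horizon too short for this change_edge")

# Input change first visible at edges 8, 9, 10 and 11 (edge 8 is an enabled edge).
latencies = [latency_in_clocks(edge) for edge in range(8, 12)]
print(latencies)
```

A change caught right on an enabled edge appears at the output 4 clocks later; one that just misses an enabled edge takes 7 clocks at this edge-level resolution. In continuous time the extra wait approaches the full 4 ns mentioned above.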

If the latency is really a problem, you could build
on the MMC #2 design and have 4 paths each out of phase
by 1 clock cycle. Since the path is now only 2 FFs long
you would have to have 4 output FFs, and the selector
mux would be after these 4 FFs. On the bright side, the
mux delay does not eat into the resolving slack time,
but it would eat some of the available cycle time in
the logic that follows the output FF.

Anyway, I'm still thinking about this. I think the clock frequency
may decide which is better for a given FF type. Any comments?
Cheers, Syms.

Thanks for an interesting question. Comments above

Philip Freidin
Fliptronics

rickman · Sep 3, 2003

Symon wrote:

Hi Austin,
Maybe I got the wrong end of the stick, but when Peter said:-
"I have never seen strange levels or oscillations ( well, 25 years ago
we had TTL oscillations). Metastability just affects the delay on the
Q output." I thought he meant that he'd only seen metastability where
the output from the FF was always either on or off, just that
sometimes the transition was delayed. Philip's pictures clearly show
'strange levels'. This is important, I believe, when deciding what the
effects of metastable FFs are on following circuitry. I guess we'll
have to wait until he returns from his Portuguese jaunt before we find
out what he meant!!

I think the exact behavior is largely irrelevant since a simple delay is
just as disastrous as anything else you would encounter. Since you
don't know *when* the transition would happen, it could happen at the
moment the next FF is latching the intermediate value. That is enough
for the next FF and all following logic to behave badly as well.

Austin Lesea · Sep 3, 2003

Symon,

Agreed. D FF is what I was assuming here, but folks do have different ways of building them, and
not all master-slave implementations are the same (in fact I have seen perhaps a dozen different
versions).

Peter will have some explaining to do when he gets back .... as the quote does sound odd. There
are most definitely voltage levels that remain in the undecided region for various lengths of time
until the circuit resolves its state.

Austin

Austin Lesea · Sep 3, 2003

Rick,

I agree that the effect is the same: you just don't know when it will
resolve, and thus the value that you "see" at the next level is basically
unknown.

It could be that Peter's point is that the next circuit in line does make a
decision, and it most certainly makes the "unknown" into a '1', or into a
'0'. It is unlikely that the next circuit in line will propagate the same
intermediate behavior, and if it can (gain is too low), then things just get
more fuzzy until someone upstream ends up resolving the level back to a '1'
or a '0'.

Austin

rickman wrote:

Symon wrote:

Hi Austin,
Maybe I got the wrong end of the stick, but when Peter said:-
"I have never seen strange levels or oscillations ( well, 25 years ago
we had TTL oscillations). Metastability just affects the delay on the
Q output." I thought he meant that he'd only seen metastability where
the output from the FF was always either on or off, just that
sometimes the transition was delayed. Philip's pictures clearly show
'strange levels'. This is important, I believe, when deciding what the
effects of metastable FFs are on following circuitry. I guess we'll
have to wait until he returns from his Portuguese jaunt before we find
out what he meant!!

I think the exact behavior is largely irrelevant since a simple delay is
just as disastrous as anything else you would encounter. Since you
don't know *when* the transition would happen, it could happen at the
moment the next FF is latching the intermediate value. That is enough
for the next FF and all following logic to behave badly as well.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX

Symon · Sep 3, 2003

Hi Rick,
Good point, the unknown delay is bad enough. Thinking about it,
I can't think of a 'sensible' digital metastability reduction circuit
and scenario which would differentiate between a simple delay, and
(say) a runt pulse. (Perhaps if the second FF was clocking faster than
the 'sample' FF? Doesn't make much sense!)
OTOH, my concern is this. If the following FF also goes, or is
very likely to go, metastable if its D input is at a 'funny' level,
then the 'funny' level metastability is more likely to propagate than
a simple delay. This is because the simple delay has to hit a tiny
time window to propagate the metastability, whereas Philip's photos
show the funny levels lasting a while.
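
To put rough numbers on that concern, here is a toy model in Python. It
is only a sketch under stated assumptions, not measured data: it assumes
the usual exponential settling model (the chance that a metastable FF is
still unresolved after time t falls off as exp(-t/tau)), and the clock
period, the sampling aperture of the next FF and tau are all made-up
illustrative values. In the 'simple delay' case the late edge only
upsets the next FF if it lands inside the aperture; in the 'funny level'
case the bad level is assumed to sit on D until the first FF resolves.

```python
import math


def p_still_unresolved(t_ns: float, tau_ns: float) -> float:
    """Chance that an FF which went metastable is still unresolved after t_ns.

    Assumes the standard exponential settling model; tau_ns is hypothetical.
    """
    return math.exp(-t_ns / tau_ns)


def p_next_ff_disturbed(period_ns: float, aperture_ns: float,
                        tau_ns: float, funny_levels: bool) -> float:
    """Chance the next FF sees a bad input, given the first FF went metastable."""
    start = period_ns - aperture_ns / 2  # aperture opens around the next edge
    end = period_ns + aperture_ns / 2    # aperture closes
    if funny_levels:
        # Bad level persists until resolution, so any resolution later than
        # the start of the aperture disturbs the next FF.
        return p_still_unresolved(start, tau_ns)
    # Clean but late edge: it must land inside the aperture itself.
    return p_still_unresolved(start, tau_ns) - p_still_unresolved(end, tau_ns)


# Illustrative values only (assumptions, not taken from any data sheet).
PERIOD_NS, APERTURE_NS, TAU_NS = 1.0, 0.05, 0.05

p_delay = p_next_ff_disturbed(PERIOD_NS, APERTURE_NS, TAU_NS, funny_levels=False)
p_funny = p_next_ff_disturbed(PERIOD_NS, APERTURE_NS, TAU_NS, funny_levels=True)
print(f"delay only : {p_delay:.3e}")
print(f"funny level: {p_funny:.3e}")
print(f"ratio      : {p_delay / p_funny:.3f}")
```

In this model the ratio of the two cases is 1 - exp(-aperture/tau), so
the gap only becomes large when the aperture is much shorter than tau.
Both cases still fall off with the same exp(-period/tau) factor, which
fits Rick's point that the unknown delay alone is already bad enough.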
thanks, Syms.

Austin Lesea · Sep 3, 2003

Symon,

A long time ago, I designed a timing system for telecom that used a rubidium clock. The
reason why this is interesting will become apparent in a moment.

One function of the system was to measure up to five external sync references (presumably
from traffic bearing lines from other offices).

The circuit to do this was basically a counter, that was sampled by a rubidium derived
clock. All clocks were syntonous (same frequency, arbitrary phase -- aka the SONET/SDH
telephone network). As the phase wandered back and forth, the metastable regions would get
exercised, and the measurement board would report that an input signal had arbitrarily
"slipped" by some random number of bits (due to a metastable transition of a bit in the
counter/latch).

This was so frustrating, because in real life, the inputs could slip due to failures,
glitches in the network, etc. So how do you tell a real slip from a metastable one?

In the hope (vain and useless) of reducing the occurrence of the metastable sample, we had
three levels of FFs to try to re-synchronize the sample count, along with an elaborate set
of clock enables. We got it to the point where the false slip occurred about every two
months, in a typical network. Since real outages were far more common, it was not a big
deal.

In spite of this, we wrote software to distinguish a real slip from a false slip. Basically,
if five successive samples were not equal (as the phase can't change that fast) we threw out
that set of measurements, and took another five. This dropped the occurrence of a false slip
to below the threshold that we could measure (but it could still happen and probably did -
still does out there somewhere).
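
The filtering rule can be sketched in a few lines of Python. This is only an illustration of
the idea as described above, not the original software: the function name, the `read_sample`
callback and the `max_groups` retry limit are all hypothetical, and the simulated readings
are made up.

```python
from typing import Callable, Optional


def read_filtered_count(read_sample: Callable[[], int],
                        group_size: int = 5,
                        max_groups: int = 10) -> Optional[int]:
    """Return a count only when a whole group of successive samples agrees.

    read_sample is a hypothetical callback that returns one sampled counter
    value. max_groups is an assumed retry limit so the loop cannot spin
    forever on a truly unstable input.
    """
    for _ in range(max_groups):
        group = [read_sample() for _ in range(group_size)]
        if all(sample == group[0] for sample in group):
            return group[0]  # all samples equal: accept the measurement
        # Samples disagree: the phase cannot move that fast, so treat the
        # group as corrupted, throw it out and take another one.
    return None


# Simulated readings: the first group holds one corrupted sample (228),
# so it is discarded and the second group is used instead.
samples = iter([100, 100, 228, 100, 100,
                100, 100, 100, 100, 100])
count = read_filtered_count(lambda: next(samples))
print(count)
```

Note that a glitch which corrupted all five samples of a group in the same way would still get
through, which is why the false slip rate drops but never reaches zero.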

So, metastability can sometimes be beaten into submission, but it never goes away....

The ultimate frustration is that one of these references gets used to track the rubidium
(steer it in a phase locked loop), so a fake glitch can cause quite a hit, which then
causes slips throughout the network as the rubidium runs off to the "wrong"
frequency/phase. A real glitch can also cause the same behavior, so the locked loop is
quite loosely coupled, and before each update, one checks absolutely everything to be sure
that what you are trying to track is real....

I had heard there were some cases with equipment designed by others where the maintenance
folks would just disconnect the inputs, as the bare rubidium ran so clean, that it was
better to not track at all (fewer slips) than to bother with trying to track the references
and falsely running off into the weeds due to metastability and poor reference checking.

All of this became obsolete when GPS became available, as now precise time (frequency) was
broadcast for free.

Austin

