Clock Edge notation

"jacko" <jackokring@gmail.com> wrote in message
news:43a279d5-60c0-4a7d-9fab-d90132d279c7@m73g2000hsh.googlegroups.com...
On 10 Aug, 16:37, Mike Treseler <mtrese...@gmail.com> wrote:

I feel this feature should be added by altera as it could boost fmax
of many designs not featuring carry sum primitives, just convoluted
critical paths.
Then perhaps suggest this to Altera with examples that actually demonstrate
your claimed performance difference on final routed designs via their 'My
Support' channel.

KJ
 
"whygee" <whygee@yg.yg> wrote in message
news:48a0ae9d$0$294$7a628cd7@news.club-internet.fr...
Hello,

Can somebody spend a few minutes downloading and trying the code
on Altera or Xilinx tools and chips ?

Thanks in advance,
YG
Yes, somebody can. You! Xilinx offer their software for free in a thing
called 'Webpack', and Altera has a similar thing called 'Web Edition', that
you can use to try out your code. You'll find the debug process goes a lot
quicker without the Usenet posting-response loops! ;-)
Good luck, Syms.
 
"KJ" <kkjennings@sbcglobal.net> wrote in message
news:bwmnk.32987$co7.7807@nlpi066.nbdc.sbc.com...
I'll strongly second Mike's advice and I'd go further and state that
latches are never safe unless
- The device actually has a hardware latch as a resource (unlikely
now-a-daze)
- The synthesized code ends up mapping the source code to the above
mentioned latch
- The latch enable signal is sourced from a flip flop.


KJ
Hi Kevin,

All the Xilinx FPGAs I use have latches built in. The storage elements in
the CLBs and the IOBs can be designated either a FF or a latch. Also, I
think your third requirement is awry if by 'latch enable' you mean what I
would call the 'gate'. Normally my latched designs have the gate fed from a
clock.

See Fig.10 of this app note.
http://www.xilinx.com/support/documentation/application_notes/xapp233.pdf

Sometimes latches are used because of their superior speed. That said, in
FPGAs, clearly FFs are preferable in almost all circumstances. Including the
case where the designer can't be bothered to code for FFs. ;-)

HTH., Syms.
 
All,
FWIW this link gives a lot of useful information about level sensitive
latches and their inference, both intentional and accidental, in FPGAs. It
covers both VHDL and Verilog.
HTH, Syms.
http://www.synplicity.com/literature/pdf/synplify_ref_1001.pdf
 
"jacko" <jackokring@gmail.com> wrote in message
news:af53aa15-bf45-4a86-be34-419694c6ad88@l42g2000hsc.googlegroups.com...
Take nibz12.vhd from http://nibz.googlecode.com and eliminate the
enumeration state indqq. This is to prevent post increment on register
assign.
It does more than just that from a logic perspective. Just a simple perusal
by searching for 'indqq' shows that deleting the 'indqq' enumeration would
change the following:
- Logic for signals 'p', 'q', 'r' and 's' would be different. As written,
there would be instances (i.e. when 'indirect = indqq') where p, q, r, s would
not be updated; by deleting the 'indqq' enumeration, one of these four would
always be doing something. Refer to lines 122-134 of nibz12.vhd; the
snippet of the code is at the end of this post as "Nibz12.vhd example #1".
There are other instances of this as well.
- Logic for signal 'pre' would be different. Refer to lines 223-234 of
nibz12.vhd or "Nibz12.vhd example #2" at the end of this post. Without
'indqq' as an enumeration, the statement "pre <= q;" that occurs in the
case when 'indirect = indqq' would need some modification.
- Logic for the signal 'ir' would be different. Refer to the case statement
starting at line 273 (or "Nibz12.vhd example #3" at the end of this post)
and, in particular, the assignment "ind <= indqq;" on line 282, which would
produce a compile error if you deleted the 'indqq' enumeration.

I have no idea why you would toss this out as an example of the particular
sub-topic of demonstrating what you claim to be differences when there are
multiple assignments to a signal within a single process...but at this point
I don't really much care.

Obviously you didn't even take the time to see that simply deleting the
'indqq' enumeration...
- Would produce code that wouldn't compile
- Would change the logical function being implemented.
- Is not an example to support your claim that multiple assignments to a
signal within a process produces different synthesis results in terms of
either resource usage or performance.

I'm guessing that your claims are based more on the arrogance of ignorance
than anything else.

According to you the fact that post-increment code occurs
before register assignment code, register assignment should override,
and the state indqq would not be required.
No I didn't say that at all. What I said was that from the perspective of
- Logic function
- Synthesis resources
- Synthesized performance
the following two forms are exactly identical. Perhaps you should take some
more time reading and understanding what is being presented instead of going
off on various tangents stating things that you don't really know about.
Unfortunately, a person who doesn't know what they don't know is in a far
worse situation than someone who at least knows what they don't know.

-- #1
process(clk)
begin
    if rising_edge(clk) then
        if (reset = '1') then
            -- do some sync resets
        else
            -- do something else
        end if;
    end if;
end process;

-- #2
process(clk)
begin
    if rising_edge(clk) then
        -- do something else
        if (reset = '1') then
            -- do some sync resets
        end if;
    end if;
end process;

I have tried it, it makes it larger and slower! A real example of
using or avoiding 'double' assignment.
It's a real example of something, but it is not an example of how avoiding a
'double' assignment changes anything. It may be an example of pipelining,
I'm not interested enough to find out. In order to show your point, you
would have to produce two designs that are
- Logically exactly equivalent (every signal has the same value in both
designs at every clock cycle).
- The only source code difference is that at least one signal has a 'double'
assignment in the one design but not the other.
- Demonstrates different resource usage or clock cycle performance
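As a minimal sketch of what such a pair might look like (the signal names 'q', 'd' and 'en' are illustrative only, not taken from any posted design):

```vhdl
-- Form A: one assignment per branch.
process(clk)
begin
    if rising_edge(clk) then
        if en = '1' then
            q <= d;
        else
            q <= '0';
        end if;
    end if;
end process;

-- Form B: 'double' assignment. A signal assignment in a process does not
-- take effect until the process suspends, so the last assignment executed
-- wins; the default of '0' is overridden whenever en = '1'. Both forms
-- describe the same register and synthesize to identical logic.
process(clk)
begin
    if rising_edge(clk) then
        q <= '0';
        if en = '1' then
            q <= d;
        end if;
    end if;
end process;
```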

cheers
jacko

p.s. wouldn't consider doing a pointless simple test as reduction of
logic form just too obvious to any silicon compiler.
But not so obvious to you for some reason.

The "pointless simple test" as you call it has nothing to do with scale,
those two templates would produce the exact same results whether the process
in question was one of a handful of lines (as was presented) or 10,000 lines
with multiple loops, case, if statements and whatever. You seem to feel
otherwise, even in spite of
- The comments of multiple people who know what they are talking about.
- The presentation of a complete design (not just a snippet of the relevant
code) that was provided in the previous post, which you could use to
prove it to yourself by simply copying it and trying it out.

In any case, I took the time to review what you suggested and pointed out
the flaws in your argument. The reason for your resource usage and clock
cycle differences has nothing to do with double assignment in the source
code; it has entirely to do with changing the logical function itself, which
generally does produce changes in both of these metrics. A simple example
of this is pipelining, where you break up a computationally 'expensive' logic
function into smaller ones that span several clock cycles. While at some
higher level of abstraction the two designs can be thought of as being
equivalent, the fact remains that the one with pipelining has more latency
and will produce results at a different (later) time; the logic function
being implemented is different. That is not news, that is well known.
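A minimal sketch of that pipelining point (the names 'f1', 'f2', 'stage1' and 'result' are illustrative only):

```vhdl
-- Unpipelined: the whole 'expensive' function in one clock.
process(clk)
begin
    if rising_edge(clk) then
        result <= f2(f1(din));  -- one long combinatorial path, lower fmax
    end if;
end process;

-- Pipelined: same computation split across two cycles. fmax improves, but
-- 'result' now arrives one clock later -- the logic function as seen at
-- the outputs is different, which is why resources and timing change.
process(clk)
begin
    if rising_edge(clk) then
        stage1 <= f1(din);
        result <= f2(stage1);
    end if;
end process;
```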

As a final point, since you pointed me to your code, here are some other
suggestions:
* You don't know how to name signals and constants in a meaningful way to
indicate what they logically represent. Some examples are...
signal p, q, r, s, c, x0, a0, x1, a1, car, ctmp
constant z, z4

* You don't understand what signals belong in the sensitivity list of a
synchronous process. Example:
process (CLK_I, RST_I, ACK_I) -- KJ: ACK_I is not needed.

* You don't understand what signals belong in the sensitivity list of a
combinatorial process. Example:
process(ir)
But this process (starting at line 206 of nibz12.vhd) depends on the
following signals as well: 'cycle', 'indirect', 'dir', etc. This will
synthesize to something that is functionally different from simulation.
That is a huge blunder; debugging in the simulator is way more productive
than on the bench...once you have sufficient skill, that is.

* You probably don't simulate your source code.

Good luck on your learning experience, I'm done with this one.

Kevin Jennings

---- Nibz12.vhd example #1
case indirect is
    -- pre decrement??
    when indp =>
        p <= ADR_O;
    when indq =>
        q <= ADR_O;
    when indr =>
        r <= ADR_O;
    when inds =>
        s <= ADR_O;
    when indqq =>

end case;

---- Nibz12.vhd example #2
case indirect is
    when indp =>
        pre <= p;
    when indq =>
        pre <= q;
    when indr =>
        pre <= r;
    when inds =>
        pre <= s;
    when indqq =>
        pre <= q;
end case;

---- Nibz12.vhd example #3
case ir(3 DOWNTO 0) is
    when "0000" =>
        -- BAck
        ind <= indr;
        wrt <= rd;
        dir <= dirp;
        post <= din;
    when "0001" =>
        -- Fetch In
        ind <= indqq;
        wrt <= rd;
        dir <= dirq;
        post <= din;
    ....
 
"jacko" <jackokring@gmail.com> wrote in message
news:a3352fc4-9d3a-4513-aca7-644ad2aacd65@m44g2000hsc.googlegroups.com...
Hi

On the subject of
sensitivity lists, I tend to exclude things which have no relevance
until clock, or selection.
This statement demonstrates a lack of knowledge of what the following type
of synthesis warning means:
"Incomplete sensitivity list: assuming completeness"

Given that premise, I would hazard a guess to say that you also don't
understand the implications of this 'warning' and how it means that your
simulation model will behave differently under certain conditions than the
real hardware....ah well, life's lessons are best remembered when taught by
direct exposure.

This possibly allows such designs to be
synthesized using latches based on sensitivity, preventing many
possible power wasting transitions.
Not to mention creating opportunities for a design that behaves
differently as a function of temperature (i.e. warm the part up or cool it
off and watch it stop working) because the targeted part either doesn't
have a hardware latch as a basic resource or the synthesis tool doesn't map
it to such a latch for whatever reason. Most people don't consider correct
operation only under very stringent temperature conditions to be much of a
feature; they typically want the entire commercial operating temperature
range or something close to that.

KJ
 
"jacko" <jackokring@gmail.com> wrote in message
news:c7f50dca-89fe-4c83-a885-22dbc0ac2439@s50g2000hsb.googlegroups.com...
Unfortunately, a person who doesn't know what they don't know is in a
far
worse situation than someone who at least knows what they don't know.

if they knew what they don't know I'd say they were confused!
And if you would understand what was actually written maybe you wouldn't
come off looking rather foolish/arrogant in your postings.

Now as some people have said, and maybe would say about me, it is often
not to do with a lack of understanding; I'm just an erratic genius.
Rest assured, I would not be counted among those who might consider you to
be any sort of genius.

KJ
 
"jacko" <jackokring@gmail.com> wrote in message
news:33823413-d5a6-47c5-9a17-f3e25b9bfd71@m36g2000hse.googlegroups.com...
On 16 Aug, 20:22, "KJ" <kkjenni...@sbcglobal.net> wrote:
"jacko" <jackokr...@gmail.com> wrote in message

news:a3352fc4-9d3a-4513-aca7-644ad2aacd65@m44g2000hsc.googlegroups.com...

Hi

On the subject of
sensitivity lists, I tend to exclude things which have no relevance
until clock, or selection.

This statement demonstrates a lack of knowledge of what the following type
of synthesis warning means:
"Incomplete sensitivity list: assuming completeness"

If assumption is possible, and accurate to control design, then purpose
of inclusion is $/klocs wage scheme or to provide late combination
entry performance queues and uPower latch halting of logic oscillation/
ringing/set-up transitions.
I'll refrain from stating what this statement seems to demonstrate about
your expertise in this area..

Given that premise, I would hazard a guess to say that you also don't
understand the implications of this 'warning' and how it means that your
simulation model will behave differently under certain conditions than the
real hardware....ah well, life's lessons are best remembered when taught by
direct exposure.

I expect a simulation model to behave differently, after all it is
just a simulation, where accuracy is not perfect. I assume you're
talking about VHDL simulation, and not spice modeling of resulting
Yes, VHDL simulation...you still don't have a clue about what the
differences are though, now do you?

This process...
process(a)
begin
    c <= a and b;
end process;

will simulate one way (i.e. changes in 'b' will not cause a change in
'c'...not until a change in 'a' also occurs to 'wake up' the process). Get a
simulator, try it out, hold 'a' constant and toggle 'b' all you want and
watch 'c' not change at all.

Synthesis will give you a warning about 'b' not being in the sensitivity
list and generate code equivalent to this...
c<= a and b;

These are functionally different things; they will behave differently.
Depending on what you intended, you might like the way the synthesis tools
handled it, or you might not; the point is that they are going to do
radically different things, so you won't be able to use the simulator to
debug a problem in the real world...you don't seem to grasp that this is a
very bad situation to be in, but that's because you don't simulate...which
is yet another problem.

This is a simple example to demonstrate the point, your code has more
complicated examples that will misbehave on you as well.

This possibly allows such designs to be
synthesized using latches based on sensitivity, preventing many
possible power wasting transitions.

Not to mention creating opportunities for a design that behaves
differently as a function of temperature (i.e. warm the part up or cool it
off and watch it stop working) because the targeted part either doesn't
have a hardware latch as a basic resource or the synthesis tool doesn't map
it to such a latch for whatever reason. Most people don't consider correct
operation only under very stringent temperature conditions to be much of a
feature; they typically want the entire commercial operating temperature
range or something close to that.

If the synthesizer does not know the holding properties of the part,
I'd say you're well screwed.
Not true at all. If you ignore the warnings that the synthesizer gives you
then you'll be screwed. Every synthesis tool will properly generate a
bitstream that can be used to program a part to implement the following
transparent latch...

process(c, d)
begin
    if (c = '1') then
        q <= d;
    end if;
end process;

If the targeted device does not have a hard latch primitive then it will be
cobbled together from the basic logic elements that are available...and that
device will likely fail either immediately or when the device is heated up
or cooled down a bit. In any case, it's not the tool's fault; it implemented
the logic that you specified with the most appropriate elements available to
it. The fault lies with the designer for using that code in a device that
does not have hardware transparent latches.

KJ
 
"Mike Treseler" <mtreseler@gmail.com> wrote in message
news:6gp9grFh39i9U1@mid.individual.net...
Jonathan Bromley wrote:
...
Which leads me to a scheme
that I've proposed here before, but not so elegantly
as Kenn's exposition: MAKE Q BE A VARIABLE, not a signal,

Now you've let the cat out of the bag.
If designers learn that C-like variables can make
registers, this newsgroup will become a ghost town ;)
I doubt it will become a ghost town, there will just be something else to be
used and misused.

It will just add a new topic to go along with the favorites...
- Two (or three) process vs one process state machine
- Mealy and Moore vs Larry, Curly, Moe
- Async vs. sync resets
- std_logic_arith

KJ
 
"jacko" <jackokring@gmail.com> wrote in message
news:bef61521-51a7-4ad3-8f39-3456d61886fc@x41g2000hsb.googlegroups.com...
Also get the FREE DAC2 source. A neat little digital to analog
converter using phase ultrasonics, for output on a single digital pin
for feeding into a LPF. This is the b***ocks!

cheers
jacko
'C' compiler?

Thanks, Syms.

p.s. A fair few folks on this newsgroups are 'merican. They might think
b***ocks are buttocks, rather than bollocks. In any case, the bollocks need
to belong to a dog to be any good. Unless I'm talking bollocks.
 
"Andy" <jonesandy@comcast.net> wrote in message
news:d4ece4bd-1bcd-45b1-bf08-782fd22b5df7@x41g2000hsb.googlegroups.com...
rickman wrote:
One suggestion. When implementing counters, it is slightly more
efficient to implement them as loadable down counters.
This is because in most technologies there is a carry chain built in
that can detect when counter is 0. If you are counting up to (M-1)
the synthesizer has to use LUTs to detect the final state if M is not
a power of 2.
Rick
Many synthesis tools will not infer the carry bit from the decrement
for a comparison = 0. They will implement an AND function to test each
bit.

However, if you use integers for counters, it is easy to detect
rollovers with the carry bit.
signal counter : integer range 0 to 2**n-1;

if counter - 1 < 0 then -- check the carry bit
    counter <= start_val;
    do_something;
else
    counter <= counter - 1; -- reuse same decrementer
end if;
Note that integer operations are always signed, 32-bit, so the result
of the decrement in the conditional expression can in fact be less
than zero. Not to worry, synthesis will figure out which of the 32
signed bits really gets used, and throw away the rest.
Andy
I tried an experiment today with a unit with a 22-bit unsigned counter
(count down). I had initially implemented it as std_logic_vector, and it
turned out that I had to stop at one instead of zero. I changed
the logic to a 22-bit unsigned, and changed the load value by one so that
the "stop" point came at cnt_out = 0. Then, I tried it with an integer range
0 to 2**22-1 and the cnt_out - 1 test mentioned by Andy.

Using Synplify Pro 8.8 into ISE 9.1, the *unsigned* came out noticeably
smaller (both worked). The *integer* was largest (even bigger than the
initial SLV code). Looking at the RTL view, it seemed like a big AND was
still implied in the unsigned case, but I imagine that could be misleading.

I'm going to try a few more and see if the trend continues (at least see if
unsigned works better than SLV, though that difference may have just come
from changing the test from cnt_out = "0000000000000000000001" to cnt_out =
0).

Just one data point...
Marty (a physicist/systems engineer who's rapidly learning VHDL on the fly)
 
On Aug 21, 9:35 am, Andy <jonesa...@comcast.net> wrote:



On Aug 21, 7:27 am, rickman <gnu...@gmail.com> wrote:

On Aug 21, 2:52 am, Mike Treseler <miket_trese...@comcast.net> wrote:

Kim Enkovaara wrote:
Yes, I agree that variables are sometimes painful and make debugging
much harder, but on the other hand they help to make cleaner code
usually that is easier to read.

...and less likely to have a logical error in the first place.

-- Mike Treseler

People keep saying this, but I have not seen one example. Can anyone
come up with a compelling example of why we should suffer the use of
variables in our code?


"We" have already given several examples that are compelling to many
of us:

Ease of discerning cycle based behavior from the code


The 'ease of discerning' depends much on the skill of the person
writing the code, not whether or not variables were used.


Decoupling, etc. without resorting to separate files/entities/
architectures

Decoupling and hierarchy have nothing to do with the use of variables.
If you don't like the typing overhead of separate entities to express
hierarchy (a valid complaint for some) you're free to express
hierarchy within a block or generate statement and keep it all in one
file...the amount of typing would be the same (slightly less I guess,
since 'block' is a shorter word than 'process').

If your point here, though, was about keeping things (in this case,
variables) invisible outside the scope of the process, then this is
exactly analogous to keeping other things (in this case, signals) local
to a block or generate...and inside that block you can still plop down
a process with its variables if you so choose. Processes are
handicapped in that they cannot define a local signal if needed, so
you're forced to use variables to keep things local.
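A sketch of that analogy (names are illustrative): a block can declare a signal that is local to it, which a process cannot:

```vhdl
-- Inside an architecture body. 'local_q' is visible only within the block,
-- just as 'tmp' is visible only within the process.
my_block : block
    signal local_q : std_logic;
begin
    process(clk)
        variable tmp : std_logic;
    begin
        if rising_edge(clk) then
            tmp     := a and b;
            local_q <= tmp;
        end if;
    end process;
    y <= local_q;
end block;
```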



Proximity of the variable definition to where it is used

The proximity of a variable definition to its use is identical to
that of a signal definition within a block to its point of use.


Simulation efficiency

I measured ~10-15% a while back...so that's one advantage.


Apparently these are not compelling reasons to you, and you are free
to limit your use of the VHDL language and tool capabilities
accordingly.

You listed four reasons, only one of which is a valid advantage (which
in turn must be balanced against the disadvantages previously
mentioned).

@Rick
Where I tend to use variables in synthesis is where the logic to
express the function is best handled with sequential statements (i.e.
'if', 'case', etc.) and I don't want to bother writing it as a
function for whatever reason so that I could use the function output
in a concurrent signal assignment.


The other case I would use variables in synthesis is to compute some
intermediate thing which, if I didn't do it that way, would result in
basically copy/pasting code, or otherwise cluttering up the source
code...in other words, use of the variable becomes much like a
shorthand notation.


The biggest place I find for variables though has to be in the
testbench code where I model the various widgets on the board and
where synthesizability (if that's a word) is not a concern.


The 'compelling example' is only something that you find compelling.
Variables are just a tool in the toolkit for getting the job done.
Like any tool they can be used well or misused. The quality/
readability/*-ity of the resulting code that pops out depends solely
on the skill and knowledge of the designer. Being limited in either
area reduces the *-ity measure.


KJ
 
"rickman" <gnuarm@gmail.com> wrote in message news:99d6d66f-322e-4bce-98a3-
In fact, I was under the impression that
there is no "standard" for what parts of the language were supported
for synthesis. Am I wrong about this?
IEEE 1076.6

http://www.google.com/search?hl=en&q=vhdl+1076.6

KJ
 
"whygee" <whygee@yg.yg> wrote in message
news:48b0287a$0$292$7a628cd7@news.club-internet.fr...
whygee wrote:
rickman wrote:
On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:
Hi,
...snip...

Oh, now I remember exactly why I use the current, seemingly crazy method.

This is because the SPI clock is different from the CPU clock.
The shift register (in shared/single configuration) must have 2 clock
sources,
one for loading the register, another for shifting.
When only the CPU clock controls the shift, it is very easy, but
Since you said you're implementing the SPI master side, that implies that
you're generating the SPI clock itself which *should* be derived from the
CPU clock...there should be no need then for more than a single clock domain
(more later).

If someone has a better idea, please tell me ...
But when 2 different unrelated clocks are used, there is no solution
using less than 1 FF per side. Then again, I'm pretty sure that
I am reinventing the wheel for the 100001th time :)
The CPU clock period and the desired SPI clock period are known constants.
Therefore one can create a counter that counts from 0 to Spi_Clock_Period /
Cpu_Clock_Period - 1. When the counter is 0, set your Spi_Sclk output
signal to 1; when that counter reaches one half the max value (i.e.
Spi_Clock_Period / Cpu_Clock_Period / 2) then set Spi_Sclk back to 0.

The point where the counter = 0 can also then be used to define the 'rising
edge of Spi_Sclk' state. So any place where you'd like to use
"rising_edge(Spi_Sclk)" you would instead use "Counter = 0". The same can
be done for the falling edge of Spi_Sclk; that point would occur when
Counter = Spi_Clock_Period / Cpu_Clock_Period/2.

Every flop in the design then is synchronously clocked by the Cpu_Clock,
there are no other clock domains therefore no clock domain crossings. The
counter is used as a divider to signal internally for when things have
reached a particular state.
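A minimal sketch of that divider scheme (constant and signal names are illustrative, not from any posted design):

```vhdl
-- All flops run off Cpu_Clock; Spi_Sclk is an ordinary registered output,
-- not a clock, so there is only one clock domain.
constant Clks_Per_Spi_Bit : positive := Spi_Clock_Period / Cpu_Clock_Period;
signal   counter          : natural range 0 to Clks_Per_Spi_Bit - 1 := 0;

process(Cpu_Clock)
begin
    if rising_edge(Cpu_Clock) then
        if counter = Clks_Per_Spi_Bit - 1 then
            counter <= 0;
        else
            counter <= counter + 1;
        end if;
        if counter = 0 then
            Spi_Sclk <= '1';  -- the 'rising edge of Spi_Sclk' state:
                              -- drive the next MOSI bit here
        elsif counter = Clks_Per_Spi_Bit / 2 then
            Spi_Sclk <= '0';  -- the 'falling edge' state: sample MISO here
        end if;
    end if;
end process;
```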

KJ
 
"whygee" <whygee@yg.yg> wrote in message
news:48b00011$0$290$7a628cd7@news.club-internet.fr...
rickman wrote:
On Aug 21, 4:39 pm, whygee <why...@yg.yg> wrote:

However,

my master SPI controller emits the clock itself (and resynchronises it)
No need for the master to resynchronize something that it generates itself
(see my other post).

so for MOSI, the system can be considered as "source clocked", even
if the slave provides some clock (it is looped back in my circuit).
I don't think you understand SPI. The master always generates the clock, it
is up to the slave to synchronize to that clock. The master never has to
synchronize to the SPI clock since it generates it.

So I can also sample the incoming MISO bit on the same clock edge as MOSI:
the time it takes for my clock signal to be output, transmitted,
received by the slave, trigger the shift, and come back, this is
well enough time for sample & hold.
See my other post for the details, but basically you're making this harder
than it need be. Since the master is generating the SPI clock it knows when
it is about to switch the SPI clock from low to high or from high to low,
there is no need for it to detect the actual SPI clock edge, it simply needs
to generate output data and sample input data at the point that corresponds
to where it is going to be switching the SPI clock.

KJ
 
"whygee" <whygee@yg.yg> wrote in message
news:48b0e3a4$0$294$7a628cd7@news.club-internet.fr...
Hi !

KJ wrote:
Since you said you're implementing the SPI master side, that implies that
you're generating the SPI clock itself which *should* be derived from the
CPU clock...there should be no need then for more than a single clock
domain (more later).

As pointed out in my previous post, there is at least one peripheral
(ENC28J60 revB4) that has clocking restrictions
(also known as "errata") and I happen to have some ready-to-use
modules equipped with this otherwise nice chip...
It's always fun when someone refers to mystery stuff like "clocking
restrictions (also known as "errata")" instead of simply stating what they
are talking about. There is setup time (Tsu), hold time (Th), clock to
output (Tco), max frequency (Fmax). That suffices for nearly all timing
analysis, although sometimes there are others as well, such as minimum
frequency (Fmin), refresh cycle time, latency time, yadda, yadda, yadda. I
did a quick search for the errata sheet and came up with...
http://ww1.microchip.com/downloads/en/DeviceDoc/80257d.pdf

In there is the following blurb which simply puts a minimum frequency
requirement of 8 MHz on your SPI controller design, nothing else. I'd go
with the work around #1 approach myself since it keeps the ultimate source
of the SPI clock at the master where it *should* be for a normal SPI system.

-- Start of relevant errata
1. Module: MAC Interface

When the SPI clock from the host microcontroller
is run at frequencies of less than 8 MHz, reading or
writing to the MAC registers may be unreliable.

Work around 1
Run the SPI at frequencies of at least 8 MHz.

Work around 2
Generate an SPI clock of 25/2 (12.5 MHz), 25/3
(8.333 MHz), 25/4 (6.25 MHz), 25/5 (5 MHz), etc.
and synchronize with the 25 MHz clock entering
OSC1 on the ENC28J60. This could potentially be
accomplished by feeding the same 25 MHz clock
into the ENC28J60 and host controller. Alternatively,
the host controller could potentially be
clocked off of the CLKOUT output of the
ENC28J60.
-- End of relevant errata


I don't know if my chip revision is B4, and the errata
suggest using a clock between 8 and 10MHz.
However, it also suggests using the ENC28J60-provided 12.5MHz
output:
Read it again. That suggestion was one possible work around; there is
nothing there to indicate that this is a preferred solution, just that it is
a solution.

I'm ready to add an external clock input in the master
if I'm allowed to "legally" go beyond the 10MHz rating
(a 25% bandwidth increase is always a good thing, particularly
with real-time communications).
You can run SPI at whatever clock frequency you choose. What matters is
whether you meet the timing requirements of each of the devices on your SPI
bus. In this case, you have a minimum frequency clock requirement of 8 MHz
when communicating with the ENC28J60. If you have other SPI devices on this
same bus, this clock frequency does not need to be used when communicating
with those devices...unless of course the ENC28J60 is expecting a free
running SPI clock. They don't mention it that way, but I'd be suspicious of
it. Many times the SPI clock is stopped completely when no comms are ongoing
and Figures 4-4 and 4-4 of the datasheet seem to imply that the clock is
expected to stop for this device as well.

As another "unintended case", an external clock input opens
the possibility to bit-bang data with some PC or uC.
I know it sounds stupid :)
Many times that's the most cost effective approach since the 'cost' is 4
general purpose I/O pins that are usually available. In this case though,
maintaining an 8 MHz

The CPU clock period and the desired SPI clock period are known
constants.
They are indicated in the datasheet of each individual product.
And there is no "SPI standard" contrary to I2C or others.
( http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus#Standards )
Yes, all the more freedom you have.

Some chips accept a falling CLK edge after CS goes low,
and some other chips don't (even chips by the same manufacturer vary).

So i have read the datasheets of the chips i want to interface,
and adapted the master interface to their various needs (and errata).
Sounds good.

Therefore one can create a counter that counts from 0 to Spi_Clock_Period
/ Cpu_Clock_Period - 1. When the counter is 0, set your Spi_Sclk output
signal to 1; when that counter reaches one half the max value (i.e.
Spi_Clock_Period / Cpu_Clock_Period / 2) then set Spi_Sclk back to 0.
I have (more or less) that already, which is active when the internal
CPU clock is selected. This is used when booting the CPU soft core
from an external SPI EEPROM.

Note however that your version does not allow using the CPU clock at full
speed;
what happens if you set your "max value" to "00000"?
That's correct, but I wouldn't set the max value to anything; it would be a
computed constant like this:

constant Spi_Clks_Per_Cpu_Clk : positive range 2 to positive'high :=
    Spi_Clk_Period / Cpu_Clk_Period;

Synthesis (and sim) would fail immediately if the two clock periods were the
same, since that would result in 'Spi_Clks_Per_Cpu_Clk' coming out to be 1,
which is outside of the defined range. Running SPI at the CPU speed is
rarely needed since the CPU typically runs much faster than the external SPI
bus. If that's not your case, then you've got a wimpy CPU, but in that
situation you wouldn't have a clock divider, and the data handling would be
done differently. This type of information, though, is generally known at
design time and is not some selectable option, so if your CPU did run that
slow you wouldn't even bother to write code that puts in a divider; your
whole point of "what happens if you set your "max value" to "00000"" is
moot.

And it does not guarantee
that the high and low levels have equal durations.
That's not usually a requirement either. If it is a requirement for some
particular application, then one can simply write a function to compute the
constant so that it comes out to be an even number. In the case of the
ENC28J60 the only specification (Table 16-6) on the SPI clock itself is that
it be in the range of DC to 20 MHz, with the errata then amending that to
be 8 MHz min *while* writing to that device. It can still be at DC when not
accessing the device. In any case, there is no specific 'SPI clock high
time' or 'SPI clock low time' requirement for the device, so unless there is
some other errata there is no requirement for this device to have a 50% duty
cycle clock.

The point where the counter = 0 can also then be used to define the
'rising edge of Spi_Sclk' state. So any place where you'd like to use
"rising_edge(Spi_Sclk)" you would instead use "Counter = 0". The same
can be done for the falling edge of Spi_Sclk; that point would occur when
Counter = Spi_Clock_Period / Cpu_Clock_Period/2.

Every flop in the design then is synchronously clocked by the Cpu_Clock,
there are no other clock domains therefore no clock domain crossings.
The counter is used as a divider to signal internally for when things
have reached a particular state.
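The "use the count value instead of the divided clock edge" idea might look like this (a sketch with assumed signal names; it assumes mode-0-style timing, sampling on the SPI rising edge and shifting on the falling edge):

```vhdl
-- Hypothetical shift logic: MISO is sampled at the point that becomes the
-- rising SPI edge, MOSI changes at the point that becomes the falling edge.
-- No rising_edge(Spi_Sclk) anywhere -- only the single Cpu_Clk domain.
process(Cpu_Clk)
begin
  if rising_edge(Cpu_Clk) then
    if Counter = 0 then                           -- 'rising edge' of Spi_Sclk
      Rx_Shift <= Rx_Shift(6 downto 0) & Spi_Miso;
    elsif Counter = Spi_Clks_Per_Cpu_Clk / 2 then -- 'falling edge' of Spi_Sclk
      Spi_Mosi <= Tx_Shift(7);
      Tx_Shift <= Tx_Shift(6 downto 0) & '0';
    end if;
  end if;
end process;
```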

I understand that well, as this is how I started my first design iteration.
I soon reached some inherent limitations, however.
I doubt those limitations were because of device requirements though...they
seem to be your own limitations. If not, then specify what those
limitations are. Just like with your previously mentioned "clocking
restrictions (also know as "errata")" comment I doubt that these
limitations are due to anything in the device requirements.

As the RTL code grows, the synthesizer infers more and more stuff,
often not foreseen, which leads to bloat: muxes everywhere,
and duplicated logic cells that are necessary to drive higher fanouts.
I guess that this is because I focused more on the "expression"
of my need than on the actual result (but I was careful anyway).
Don't write bloated code. Use the feedback you're seeing from running your
code through synthesis to sharpen your skills on how to write good
synthesizable code...there is no substitute for actual experience in gaining
knowledge.

my master SPI controller emits the clock itself (and resynchronises it)

No need for the master to resynchronize something that it generates itself
(see my other post).

In fact, there IS a need to resynchronise the clock, even when
it is generated by the CPU, because of the divider.
No there isn't. Everything is clocked by the high speed clock (the CPU
clock I presume). The counter being at a specific count value is all one
needs to know in order to sample the data at the proper time. Since the
master generates the SPI clock from the counter there is no need for it to
then *use* the SPI clock in any fashion. You could 'choose' to do so, but
it is not a requirement, it would mainly depend on how you transfer the
receive data back to the CPU, but I suspect either method would work just
fine...but again, that doesn't make it a requirement.

Imagine (I'm picky here) that the CPU runs at 100MHz (my target)
and the slave at 100KHz (an imaginary old chip).
The data transfer is setup in the control register, then
the write to the data register triggers the transfer.
But this can happen at any time, whatever the value of the predivider's
counter.
So the clock output may be toggled the first time well below
the required setup time of the slave. That's a glitch.
So don't write such bad code for a design. There is no need for the clock
divider to be running when you're not transmitting. It should sit at 0
until the CPU write comes along, then it would step through a 1000+ CPU
clock cycle state machine, with the first few clocks used for setting up
timing of data relative to chip select and the start of SPI clock. Then
there are a few CPU clocks on the back end for shutting off the chip select
and then of course the 1000 CPU clocks needed in order to generate the 100
kHz SPI clock itself. Any time the counter is greater than 0, the SPI
controller must be telling the CPU interface to 'wait' while it completes
the transfer.

You should make sure your design works in the above scenario as a test case.

In this case, the solution is easy: reset the counter
whenever a transfer is requested. That's what I did too,
the first time.

but there is an even simpler solution: add a "clear" input condition
to the FFs that are used to resynchronise the clocks, as in
http://i.cmpnet.com/eedesign/2003/jun/mahmud3.jpg
so the next clock cycle will be well-formed, whether the
source is internal or external. The created delay is not an issue.
You haven't guarded against the CPU coming in and attempting to start a
second write while the first one is ongoing. You need a handshake on the
CPU side to insert wait states while the controller is spitting out the
bits. When you look at it in that perspective and design it correctly,
there will be no chance of any glitchy clocks or anything else. If you
don't have a 'wait' signal back to the CPU, then certainly you have an
interrupt that you can use to send back to the CPU to indicate that it
fouled up by writing too quickly...many possible solutions.
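One way the wait handshake could be sketched (hypothetical names throughout; Busy could equally drive a status bit the CPU polls, or an interrupt, as the post says):

```vhdl
-- Hypothetical guard: start a transfer only when idle, and stall a second
-- CPU write until the current transfer finishes. Counter is the same
-- divider counter discussed earlier, reset at the start of each transfer.
process(Cpu_Clk)
begin
  if rising_edge(Cpu_Clk) then
    if Cpu_Write = '1' and Busy = '0' then
      Busy     <= '1';             -- accept the write, reset the divider
      Counter  <= 0;
      Tx_Shift <= Cpu_Write_Data;
    elsif Last_Bit_Done = '1' then
      Busy <= '0';                 -- transfer complete, controller idle again
    end if;
  end if;
end process;

Cpu_Wait <= Cpu_Write and Busy;    -- insert wait states on a premature write
```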

So I can also sample the incoming MISO bit on the same clock edge as MOSI:
the time it takes for my clock signal to be output, transmitted,
received by the slave, trigger the shift, and come back is
well enough time for sample & hold.
See my other post for the details, but basically you're making this
harder than it needs to be.
Though sometimes there needs to be something
a bit more than the "theoretically practically enough".
I've done this, it's not a theoretical exercise on my part either. It's not
that hard.

Since the master is generating the SPI clock it knows when it is about to
switch the SPI clock from low to high or from high to low; there is no need
for it to detect the actual SPI clock edge, it simply needs to generate
output data and sample input data at the point that corresponds to where it
is going to be switching the SPI clock.
This is what I did in the first design iteration.

However, now, I avoid large single-clock processes
because there is less control over what the synthesiser does.
That makes no sense.

Finally, I have the impression that you misunderstood the initial post
about "SPI clocking". The idea was that the SPI master "could" sample MISO
with the same (internal) clock signal and edge that samples MOSI. The issue
this would "solve" is when capacitance and propagation delays on the PCB,
along with a relatively high clock speed (the 25AA1024 by Microchip goes up
to 20 MHz), delay the MISO signal enough to miss the normal clock edge.
Your proposed solution wouldn't solve anything. If you have a highly loaded
MISO this means you have a lot of loads (or the master and slave are
faaaaaaaar apart on separate boards). It also likely means you have a
highly loaded SPI clock since each slave device needs a clock. You likely
won't be able to find a driver capable of switching the SPI clock so that it
is monotonic at each of the loads (which is a requirement) which will force
you to split SPI clock into multiple drivers just to handle the electrical
load at switching...but now you've changed the topology so that trying to
feed back SPI clock to somehow compensate for delays will not be correct.
Far easier to simply sample MISO a tick or two later. For example, using
the 100 MHz/100kHz example you mentioned, the half way point would be 500,
but there is nothing to say that you can't sample it at 501, 502 or
whatever, it doesn't matter.
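The "sample a tick or two later" point is just a different compare value on the same counter (a sketch with assumed names, using the 100 MHz / 100 kHz figures from the example):

```vhdl
-- Hypothetical late sample point: the nominal half-period compare is
-- Counter = 500; comparing at 502 instead buys 20 ns of extra round-trip
-- margin on MISO, with no feedback of the SPI clock required.
constant Sample_Point : natural := Spi_Clks_Per_Cpu_Clk / 2 + 2;

process(Cpu_Clk)
begin
  if rising_edge(Cpu_Clk) then
    if Counter = Sample_Point then
      Rx_Shift <= Rx_Shift(6 downto 0) & Spi_Miso;
    end if;
  end if;
end process;
```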

Kevin Jennings
 
"Tricky" <Trickyhead@gmail.com> wrote in message
news:d6513c7f-63ae-4ef3-ab71-b657efcb3ab5@k7g2000hsd.googlegroups.com...
Modelsim appears to be working fine. The sensitivity list only applies
in simulation. The "problem" appears to be the else clause on the same
level as the clock, and the lack of the other signals in the
sensitivity list.
read_ver_E_reg : process(pci_lclk_i)
begin
  if rising_edge(pci_lclk_i) then
    if (main_rst_h = '0') and (LRD1WR0 = DIR_READ) and (reg_access_req = '1') then
      case laddr_latched(7 downto 0) is
        when DAC_CTRL_OFFSET =>
          pci_data_out_p(7 downto 0)   <= dac_ctrl_regval;
          pci_data_out_p(63 downto 8)  <= (others => '0');
        when SYNCCTRL_OFFSET =>
          pci_data_out_p(10 downto 0)  <= sync_regval_rb;
          pci_data_out_p(63 downto 11) <= (others => '0');
        when COREID_OFFSET =>
          pci_data_out_p(7 downto 0)   <= X"02";   -- minor revision - 02
          pci_data_out_p(23 downto 8)  <= X"0100"; -- major revision - 100; T10_C1_image00_xx
          pci_data_out_p(63 downto 24) <= (others => '0');
        when others =>
          pci_data_out_p(63 downto 0)  <= (others => 'Z');
      end case;
    else
      pci_data_out_p <= (others => 'Z'); -- do I need this?
    end if;
  end if;
end process;
An update: yes, finally removing that last else clause and getting rid of
some other junk left over from when I collapsed three of the vendor
structures into that case statement seems to work. Now I have to debate the
merits of fixing the vendor's code in some of the sub-modules to have the
same structure. Right now, the register calls from my code work in both
Modelsim and Synplify; the calls to the vendor's modules break in simulation
but work fine in the chip. For the life of me I can't figure out what
possible switch/library setting changed in my Modelsim setup to cause the
behavior to change between runs. For even when I used the configuration
repository to roll back to the precise same code base as I had used earlier,
it no longer worked and I hadn't (to my knowledge) messed with my Modelsim
setup. Sigh. Maybe systems engineering is easier after all.

-Marty
 
"Brian Drummond" <brian_drummond@btconnect.com> wrote in message
news:2j82c4d1t0nuqqmito1lp35qlgkgoqr347@4ax.com...
I just want to ask what drives "hold".
If (a) "hold" is used as an enable by more than one process, and (b)
"hold" is NOT driven by a FF or register clocked by "clk", you have a
problem, whichever coding style you use. (Different processes sample
"hold" at different times according to routing delays in the FPGA; if it
is asynchronous, they may see it at different levels in the same clock
cycle).

- Brian

Hi Brian,
I don't think it's a problem if 'hold' is driven by combinational logic,
iff the combinatorial logic is driven by FFs in the domain of 'clk'. In this
case, the FPGA static timing analysis tools will alert the designer to any
timing issues.
Cheers, Syms.
 
thutt wrote:

All the I/O is actually constrained, but I have not done anything with
timing yet. I guess I'll try to check out information about timing.
Thanks for the info. Hopefully this will pan out.
I totally agree with what Sean said.

I recently ran into *exactly* the same type of problem, on a project based
on the "small" 100K gates 3E fpga.

Just assigning a signal or not to an output spare pin for debug purposes had
the power to make the entire design totally unusable. The SPI port
communicating to an external device used to crash within a few seconds after
power-on.

I've been instructed to add a "timing constraint" and now it seems (it
seems!) that the design is more stable, and changing that assignment no
longer affects reliability. This is what I've added:

NET clk_pin TNM_NET = clk_ref_grp;
TIMESPEC TS01 = PERIOD : clk_ref_grp : 20.00 : PRIORITY 1; # 50.00 MHz

This was for the main 50MHz clock. "clk_pin" is the name of the net where
the 50MHz osc is attached. This did not bring any improvement. But when I
added another constraint, to the signal output from DCM (75MHz) then the
problem disappeared:

NET clk_pll TNM_NET = clk_ref_grp_pll;
TIMESPEC TS01 = PERIOD : clk_ref_grp_pll : 12.00 : PRIORITY 1; # 75.00 MHz

I also added this:
TIMESPEC TS11=FROM:pADS:TO:FFS : 30 ns;
TIMESPEC TS12=FROM:FFS:TO:pADS : 30 ns;

because it was included in a Xilinx 3E-board example. I don't know if it is
useful or not.

Ciao!
Alessandro


 
