Clock Edge notation

Ralf Hildebrandt · Mar 30, 2005

Mohammed A khader wrote:

Extend this vector by some zeros to the left to have a fractional
part.

Suppose y is the integer variable which can be represented in 8 bits
in <8.0> format, and 1/1.36 = 0.73529 is represented in 16 bits in
<1.15> format. Then under such conditions multiplication can be done
between unsymmetrical formats, i.e. <8.0> * <1.15> gives
<9.15>. I think there is no need to pad the integer to represent a
fractional part. Correct me if I am going wrong.

Do you mean

result <= integer_signal * unsigned_constant;

while unsigned_constant is in <9.15> format? Well - seems to be a good
option (I have no simulator at hand to check it). I was thinking of
something like

result <= unsigned_signal * unsigned_constant;

and there adding a (zero) fractional part is necessary. But your
solution seems to be more elegant.

Ralf

Bert Cuzeau · Mar 30, 2005

If you want to divide a 24-bit value by a real (floating point) number just with a
/, do you want the synthesizer to work this out, or do you need it only
in a test bench?

In a test bench, no problem. Just write legal VHDL.
Like :
to_integer(unsigned(Sample)) / 2 (returns an integer) or :
real(to_integer(unsigned(sample))) / 1.122 which returns a real
etc...
unsigned() is a type conversion, to_integer() is a conversion function.

For synthesis, the efficient method is different.
For example you could :
(to_integer(unsigned(sample)) * 7301 ) / 8192
Multiplying by a constant is easy for most synthesizers and efficiently
implemented (3 add/sub ?). Dividing by a power of 2 is trivial (no
hardware necessary).
For RTL, the solution is even smaller if you can spread the calculation
over 13 clock cycles (in the case above), so you will only need an
accumulator and a shiftregister.
This problem has a lot of solutions, well documented in many books.
Choose the one best suited to your needs and constraints.

Bert

info_ · Mar 30, 2005

Hi,

Not exactly hardware-friendly code!

You write a huge barrel shifter without necessity, I think.
Running that at clock speed (no multicycle), in case the synthesis
tool doesn't choke on it, is probably not guaranteed either.
You'd probably do better to arrange your line buffer as a shift register,
shifting 5 bits at a time.

LineBuffer <= UV (7 downto 3) & LineBuffer(LineBuffer'high downto 5);
This _will_ run at full speed.
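
A minimal sketch of the shift-register arrangement suggested above, written as a complete entity. The entity name frameCaptureShift, the internal signal buf, and the 1780-bit width (356 pixels of 5 bits each, so the length is a multiple of 5) are assumptions for illustration, not taken from the original design. The internal signal is there because an out port cannot be read back before VHDL-2008.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity frameCaptureShift is  -- hypothetical name
  port (
    clk        : in  std_logic;
    UV         : in  std_logic_vector(7 downto 0);
    -- Assumed width: 356 pixels * 5 bits = 1780 bits
    lineBuffer : out std_logic_vector(1779 downto 0)
  );
end entity;

architecture rtl of frameCaptureShift is
  signal buf : std_logic_vector(1779 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Shift the whole buffer right by 5 bits and insert the
      -- 5 most significant bits of the new pixel at the top.
      buf <= UV(7 downto 3) & buf(buf'high downto 5);
    end if;
  end process;

  lineBuffer <= buf;
end architecture;
```

Every slice here has a static position and a static length, so the tool only has to build registers and wires, with no index arithmetic on the write side.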

It's not very good either to write a non-static length for your
variable-position assignment. At least, you should use only
one variable (the MSB position, e.g.) and use a constant for the
slice's number of bits. I don't think the synthesis tool can "see"
that the slice length is in fact constant. You did hide that nicely.
But really I don't see (in this snippet) why you use this huge barrel shifter.

I don't know your application, but maybe it's not most efficient
to store your line information under this form.
Memories have many advantages for storing arrays of data sequentially...

Bert Cuzeau

Andromodon wrote:

Hello. I am writing VHDL code to be synthesized on a Xilinx Spartan II
FPGA. For the past month, I've been beating my head against one
problem with synthesizing the latest version of my code. I am using
the Xilinx Webpack IDE V7.1.01i available for free from
http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=ISE+WebPack.
Synthesis usually takes about a minute with my previous code versions.
When I added a few lines about a buffer and references to that buffer,
and try to synthesize, the synthesis goes to 66% of completion
normally, but then stays there for about 15 minutes before spitting out
an out of memory error:

ERROR:Portability:3 - This Xilinx application has run out of memory or
has encountered a memory conflict. Current memory usage is 1954536 kb.
Memory problems may require a simple increase in available system
memory, or possibly a fix to the software or a special workaround. To
troubleshoot or remedy the problem, first: Try increasing your
system's RAM. Alternatively, you may try increasing your system's
virtual memory or swap space. If this does not fix the problem, please
try the following: Search the Answers Database at support.xilinx.com
to locate information on this error message. If neither of the above
resources produces an available solution, please use Web Support to
open a case with Xilinx Technical Support off of support.xilinx.com.
As it is likely that this may be an unforeseen problem, please be
prepared to submit relevant design files if necessary.
ERROR: XST failed
Process "Synthesize" did not complete.

I have stripped my program down to the simplest version that still
exhibits the problem:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity frameCapture is
port (
clk :in std_logic;
lineBuffer : out std_logic_vector(1780 downto 0); -- will buffer most significant 5 bits of each pixel on even lines.
UV : in std_logic_vector(7 downto 0)); -- Digital output from camera. UV bus.
end frameCapture;

architecture reg_transfer of frameCapture is

signal bufPosOne, bufPosTwo, bufPosThree : integer range 0 to 1785;
-- are five more than they need to be to prevent negative numbers.
signal colCount : integer range 5 to 25 := 5;
-- I picked these values in this example so that bufPosTwo and bufPosOne
-- are obviously between 0 and 1785.
begin

bufPosTwo<= colCount*5+4;
bufPosOne<= colCount*5+0;

process (clk)
begin
if clk'event and clk = '1' then
if colCount<= 25 then
colCount <= colCount + 1;
else
colCount <= 5;
end if;
LineBuffer(bufPosTwo downto bufPosOne) <=UV (7 downto 3);
end if;
end process;

end reg_transfer;

-------
I have been trying to troubleshoot this problem for the past month, and
have a deadline soon approaching, so I am in desperate need of help.
I'm even willing to compensate you for your time spent if you resolve
the problem. Is my bit of code doing anything illegal? Can you see
any reason why my code should make the ISE work so hard it runs out of
memory? From my experimentation, the problem does not have to do with
colCount, but the indexing of LineBuffer with bufPosTwo and bufPosOne.

Thanks for any help you can provide. It will be greatly appreciated!

Sincerely,
Andrew Doucette

Doug · Mar 31, 2005

Hal is spot on here.

Just multiply by K=0.7352941176. And how do you do that?

One way is to represent K by a 24 bit natural, let's call it Ki, where
Ki = (2**24) * K = (2**24) / 1.36 = 12336188 (rounded off).

I will assume your 24 bit number is unsigned, let's call it G. You will
then end up with a 48 bit result, R.
R = G * Ki; -- R is 48 bits, G is 24 bits, and Ki is 24 bits.
G is a 24 bit integer with no fractional part.
Ki is a 24 bit fraction where 0xFFFFFF represents a value very
close to 1.0. Actually it is ((2**24)-1)/(2**24) = 0.999999940395

The result, R, is a 48 bit number with 24 bit integer part and 24 bit
fractional part. You can drop the fractional part and retain the
24 MSBs and there you have it.
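
As a rough illustration of the recipe above, here is a minimal combinational sketch using numeric_std. The entity name div_by_1p36 and the port names g and q are made up for this example; only the constant 12336188 and the 24/48-bit widths come from the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity div_by_1p36 is  -- hypothetical name
  port (
    g : in  unsigned(23 downto 0);  -- 24-bit integer input (G)
    q : out unsigned(23 downto 0)   -- approximately G / 1.36
  );
end entity;

architecture rtl of div_by_1p36 is
  -- Ki = round(2**24 / 1.36), a 24-bit fraction
  constant KI : unsigned(23 downto 0) := to_unsigned(12336188, 24);
  signal r : unsigned(47 downto 0);  -- 24 integer bits, 24 fraction bits
begin
  r <= g * KI;             -- 24 x 24 -> 48-bit product
  q <= r(47 downto 24);    -- keep the integer part, drop the fraction
end architecture;
```

Dropping the fraction truncates rather than rounds; adding 2**23 to r before taking the upper bits would be one way to round instead.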

Doug

"Hal Murray" <hmurray@suespammers.org> wrote in message
news:LqWdnQIR1vJFgdffRVn-pw@megapath.net...

I am basically trying to divide a 24 bit vector by 1.36.(output result
eventually being a bit vector)

One trick is to multiply by the inverse.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or
unsolicited
commercial e-mail to my suespammers.org address or any of my other
addresses.
These are my opinions, not necessarily my employer's. I hate spam.

Jerry · Mar 31, 2005

report "Channel " & integer'image(J) & " failed";

"dwerdna" <dwerdna@yahoo.com> wrote in message
news:1112155307.872777.8440@l41g2000cwc.googlegroups.com...

Hello all

I have this loop, but I can't work out how to display the appropriate
channel on the screen when it fails.

The example below prints out the sentence as you see it (of course),
but I want the value of J. I've tried a few things, similar to when
you write values to a file, but haven't been able to work it out.

Thanks

Andrew

for J in 0 to 3 loop
if (exp_analog_channel(J) /= tb_channel_out(J)) then
assert false
report "Channel J failed " severity note; -- how do I get 'J' to show value??
end if;
end loop;
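
Putting Jerry's suggestion into the quoted loop might look like the following self-contained test bench. The entity name report_tb, the int_array type, and the constant values are stand-ins invented for this sketch, since the original declarations of exp_analog_channel and tb_channel_out are not shown.

```vhdl
entity report_tb is  -- hypothetical test bench name
end entity;

architecture sim of report_tb is
  type int_array is array (0 to 3) of integer;
  -- Stand-in data; channel 1 deliberately mismatches
  constant exp_analog_channel : int_array := (10, 20, 30, 40);
  constant tb_channel_out     : int_array := (10, 21, 30, 40);
begin
  process
  begin
    for J in 0 to 3 loop
      if exp_analog_channel(J) /= tb_channel_out(J) then
        -- integer'image converts the loop index to a string
        report "Channel " & integer'image(J) & " failed" severity note;
      end if;
    end loop;
    wait;  -- stop the process after one pass
  end process;
end architecture;
```

With the stand-in data above, the simulator reports "Channel 1 failed" once. A bare report statement (VHDL-93 and later) replaces the "assert false" idiom.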

Jonathan Bromley · Mar 31, 2005

On 30 Mar 2005 21:44:06 -0800, vizziee@gmail.com wrote:

I am looking for a concurrent implementation of merge-sort wherein two
sorted arrays (with same number of elements) are to be merged in a
single sorted array.

If by "concurrent" you mean single-cycle with no iteration, then
I think you need rather a lot of comparators and other logic.

I tried using generate statement to form an insertion vector for final
implementation. But it is not working.

.... which doesn't help us much ... what does "not working" mean?

Is it possible to use generate statement to form two arrays which use
(i-1)th element values of each other (ie array 1 uses array2 values and
vice versa) to compute ith element value?

Yes, definitely. Especially given that the two arrays are the same
size. Why not just iterate one generate loop over a common
array subscript? Maybe you would need to use a nested
generate-if to deal with any special cases at the beginning
and end of the arrays.

If anyone has inputs on merge-sort (parallel implementation) in VHDL,
plz. help. I have written a skeletal code (but is defective with the
problems I just discussed) and will be interested in discussing that.

How about this.... It's parallel, sure enough. The "for" loop
with its embedded incrementers and comparators will synthesise
to a frighteningly large amount of logic - I just did a very
rough trial synthesis run, targeting Virtex-2, and got 605 LUTs
and a delay of 56ns for the default-sized design (signed 8-bit
inputs, 5 elements per array). However, I guess you don't want
an O(2) implementation of an O(1) algorithm, do you?

Tell us what the architecture of your merge-sorter will be,
and perhaps we can guide you to a suitable VHDL implementation.
Meanwhile, you could at least use my code as a reference
model, to help you build a test fixture.

I am aware of an architecture that, for input arrays of N elements
each, needs O(1) magnitude comparators and O(2) multiplexers; it
exhibits O(1) propagation delay and would be fairly easy to
code in VHDL. I wonder if you have a better idea? I am
disappointed that I can't think of a strictly O(1) implementation.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

package sorter_types is
subtype word is integer range -128 to 127; -- signed byte
type word_array is array (natural range <>) of word;
end;

use work.sorter_types.all;
entity merge_sort is
generic (input_size: positive := 5);
port (
inA, inB: in word_array(0 to input_size-1);
sorted : out word_array(0 to 2*input_size-1)
);
end;

architecture Naive of merge_sort is
begin
process (inA, inB)
variable indexA, indexB: natural range 0 to input_size;
variable pickB: boolean;
begin

indexA := 0;
indexB := 0;

for i in sorted'range loop

-- Decide which list to pick from...
pickB := FALSE;
if (indexA = input_size) then
pickB := TRUE;
elsif (indexB /= input_size) then
pickB := inA(indexA) > inB(indexB);
end if;

-- Pick
if pickB then
sorted(i) <= inB(indexB);
indexB := indexB + 1;
else
sorted(i) <= inA(indexA);
indexA := indexA + 1;
end if;

end loop;

end process;

end;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Jonathan Bromley · Mar 31, 2005

On 31 Mar 2005 06:01:41 -0800, nisheethg@gmail.com (Nisheeth) wrote:

My project requires computing
(84-bit number * 42-bit number),
(42-bit number * 42-bit number),
and comparing 126-bit numbers.
I don't understand how to go about it.

Imagine a constant F = 2^42. Then your 84-bit number
can be represented as (F*A)+B, where A is the upper 42
bits and B is the lower 42 bits. If the other 42-bit
number is C, then you want

result = ((F*A)+B) * C
= F*A*C + B*C

In other words:
- get B*C, an 84-bit result
- get A*C, an 84-bit result
- add the upper 42 bits of B*C to A*C to get the top 84
bits of result
- use the lower 42 bits of B*C as the bottom 42 bits of
result

It's just long multiplication, same as we learnt at school.
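
The steps above can be sketched directly with numeric_std. The entity name mul84x42 and the port and signal names are assumptions for illustration; the split into A (upper 42 bits), B (lower 42 bits) and C follows the description in the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mul84x42 is  -- hypothetical name
  port (
    x : in  unsigned(83 downto 0);   -- 84-bit operand, (F*A)+B
    c : in  unsigned(41 downto 0);   -- 42-bit operand, C
    p : out unsigned(125 downto 0)   -- 126-bit product
  );
end entity;

architecture split of mul84x42 is
  signal bc, ac, top : unsigned(83 downto 0);
begin
  bc  <= x(41 downto 0)  * c;   -- B*C, 84 bits
  ac  <= x(83 downto 42) * c;   -- A*C, 84 bits
  -- Add the upper 42 bits of B*C to A*C. The sum cannot overflow
  -- 84 bits: (2**42 - 1)**2 + (2**42 - 1) is less than 2**84.
  top <= ac + resize(bc(83 downto 42), 84);
  -- Top 84 bits from the sum, bottom 42 bits straight from B*C
  p   <= top & bc(41 downto 0);
end architecture;
```

This version is purely combinational; registering bc and ac before the addition would be one simple way to start pipelining it.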

ip core supports maximum 64 bit no * 64 bit no.

So - don't use the IP core. Hell, it's only a multiplier.
What about your synthesis tool? Try feeding the calculation
directly to it. The tool should infer a suitable collection
of multipliers and adders for you. For extra credit,
think about pipelining.

Kai Harrekilde-Petersen · Mar 31, 2005

"Divyang M" <divyangm@gmail.com> writes:

Hi,
I am looking for some insight into how I can go about pipelining my
system.

The system is an image interpolator which contains a buffer (on-chip
dual-port RAM) and an interpolator block.
The buffer stores incoming data (real-time say at X MHz). The
interpolator requires 4 data elements from this buffer to produce 1
output.

To keep the system real-time, I am running my system at X MHz and also
write to the buffer at this rate, but read the data out of the buffer
at 4X MHz so that at each clock cycle I have all 4 data elements
that the interpolator unit needs. This has limited X to 50 MHz because
the internal block RAMs max out at 200 MHz (well, according to Altera they
can go up to 287 MHz, but I hit the wall at some point or another).

I was wondering if there is a way to pipeline the design so that I can
run the whole system at a single clock frequency but still not have a
huge backlog of data accumulation (since there is finite amount of
on-chip storage) or if there are examples of such in any books?

If you have the storage available, write to four RAMs in parallel and
read 1 element from each RAM per clock cycle.
Other combinations are also possible.

Regards,

Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>

info_ · Mar 31, 2005

Are the four elements consecutive?
Then it's a simple pipelined FIR, and it runs at full speed,
producing data at X Ms/s with an X MHz clock.

This becomes a little more interesting when you come to 2D correlation
with 8 "adjacent" pixels spread over 3 lines.

Bert Cuzeau

Divyang M wrote:

Hello Kai,

I do not have that much storage space to store the data 4 times since
my data is a 240x320 gray level (8-bit/pixel) image.

The other option I was thinking of is to delay the first of the four
outputs by 4 cycles (registers), the second by 3 cycles, the third
by 2 cycles, and the fourth by 1 cycle. These four
points will then be aligned, but if I use this strategy, then I get a
valid output out of my system once every 4 cycles, so a throughput of
0.25 (if I'm using the definition of throughput correctly). But I would
ideally like the throughput to be 1.

Any others suggestions you have would be welcome.

Thanks,
Divyang M

Ben Twijnstra · Mar 31, 2005

Hi Divyang,

Is there some scrambled addressing way that you can write the data so that
the words you need to read are always consecutive? That way you could set
your write port width to 8, and your read port width to 32.

If this is possible, your write bandwidth would be limited to the Tpd of the
logic that sets up the write address, but at least you will be able to
clock both ends at write speed.

Best regards,

Ben

info_ · Mar 31, 2005

I think you just need to pipeline almost one line (N-2 depth), with two
FF stages at the entrance and two at the exit. That way all four elements
are available for combining on every clock cycle.

The big pipeline fits nicely in a dual port memory used as circular buffer,
or simply a ready-made Fifo.

Assuming the pixels per line is n+1 :

FF -> FF -> [[[[[ Fifo(n-2) ]]]]] -> FF -> FF
D2n D2n-1 D2n-2 ... D21 D20 D1n D1n-1

so you have D1n, D1n-1, D2n, D2n-1 available at the same time.
Just be careful at the edges...
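
A minimal sketch of the same idea, using one plain shift register in place of the FF/FIFO/FF chain so that the tap positions are easy to see. The entity name window2x2, the port names, and the default LINE_LEN of 320 are assumptions for illustration. A real design would more likely put the long middle section into a block RAM or FIFO as described above, and the exact depth depends on how the pixels per line are counted.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity window2x2 is  -- hypothetical name
  generic (LINE_LEN : positive := 320);  -- pixels per line (assumed)
  port (
    clk      : in  std_logic;
    pix_in   : in  std_logic_vector(7 downto 0);
    cur_x    : out std_logic_vector(7 downto 0);  -- pixel (x,   y)
    cur_xm1  : out std_logic_vector(7 downto 0);  -- pixel (x-1, y)
    prev_x   : out std_logic_vector(7 downto 0);  -- pixel (x,   y-1)
    prev_xm1 : out std_logic_vector(7 downto 0)   -- pixel (x-1, y-1)
  );
end entity;

architecture rtl of window2x2 is
  type pix_array is array (0 to LINE_LEN + 1) of std_logic_vector(7 downto 0);
  signal taps : pix_array := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- One pixel enters per clock; everything else moves along by one
      taps(0)                 <= pix_in;
      taps(1 to LINE_LEN + 1) <= taps(0 to LINE_LEN);
    end if;
  end process;

  -- Taps exactly one line apart give vertically adjacent pixels
  cur_x    <= taps(0);
  cur_xm1  <= taps(1);
  prev_x   <= taps(LINE_LEN);
  prev_xm1 <= taps(LINE_LEN + 1);
end architecture;
```

Once the register has filled, a new 2x2 neighbourhood is available on every clock, so the throughput is one result per cycle at a single clock frequency. Windows that straddle the end of a line are not valid, which is the edge case mentioned above.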

What can we not do with pipelining?

It's late here, I hope I didn't goof.

Bert

Divyang M wrote:

Hi Bert,

The four elements are not consecutive. I am actually working with an
image. So the four elements are essentially 2 elements each in 2 rows,
something like A11, A12, A21, A22.

But these can be any 4 "adjacent" elements in the image, so they do not
go by a particular order either.

Thanks,
Divyang

Ben Twijnstra · Mar 31, 2005

Hi Divyang M,

The four elements are not consecutive. I am actually working with an
image. So the four elements are essentially 2 elements each in 2 rows,
something like A11, A12, A21, A22.

But these can be any 4 "adjacent" elements in the image, so they do not
go by a particular order either.

I once wrote a linear interpolation algorithm for a Bayer-matrix camera
interface for one of my customers. I can't publish the code due to legal
reasons, but the idea is as follows:

There is no frame buffer. You use an FSM to capture video into one of four
line buffers when writing. Thus, three buffers are available for reading,
and the fourth is being updated.

Thus, assuming that your destination pixel (xd,yd) is a function of one or
two (x+d,y-1), one or two (x+d,y) and one or two (x+d, y+1) pixels (where d
is some arbitrary X distance) you can read pixels at write speed from the
three line buffers.

In my case, the worst case computation was
p(x-1,y-1)*C1+p(x+1,y-1)*C2+p(x-1,y+1)*C3+p(x+1,y+1)*C4, which would
normally cause 4 fetches from a frame buffer, 2*2 fetches using the
abovementioned technique, or simply 2*1 fetch if you store the x-1 and x
fetches in 8-bit 'cache' registers.

In this case I could simply work from left to right, so I used 6*8 DFFs to
hold

(x-1, y-1), (x, y-1)
(x-1, y) , (x, y )
(x-1, y+1), (x, y+1)

and compute my formula by setting the address on the line buffers to x+1.
The data outputs would then yield

(x+1, y-1)
(x+1, y )
(x+1, y+1)

giving me all the necessary data to compute RGB values for every pixel in
the Bayer pattern, or basically any 3x3 Fourier kernel on grayscale data.

The whole idea yields a 3-scanline delay between input and output, but this
way you can run at high clock speeds and be creative about the
interpolation method.

Best regards,

Ben

David Bishop · Apr 1, 2005

genlock wrote:

Hi,

Is there a way to divide an integer by a real number (decimal number)?

Can we simply use the operator '/' as follows:
eg: a <= 1234/ 1.36;
where we define 'a' as an integer.

Are there any specific libraries to be included for such a VHDL design
file.

If so, should these libraries be included in a particular order?

The fixed point packages can deal with this.

http://www.eda.org/vhdl-200x/vhdl-200x-ft/packages/files.html
get "fixed_pkg.vhd" and "fixed_pkg_body.vhd".

Write it this way:

signal b : ufixed (15 downto 0);
signal c : ufixed (0 downto -10);
signal a : ufixed (26 downto -1);
begin
b <= to_ufixed (1234, b'high, b'low);
c <= to_ufixed (1.36, c'high, c'low);
a <= b / c;

Should synthesize OK if your tool can deal with an unsigned divide.

David Bishop · Apr 1, 2005

Nisheeth wrote:

hello
i have been tryin to use the fp library available here...
http://perso.ens-lyon.fr/jeremie.detrey/FPLibrary/

I am not able to properly use the library...
there r so many files in it...is there any easy way to add them ?
what do we mean by precompiling the library ? what exactly do i have
to compile in the library ?

plz if someone could d/l the library and let me know step by step how
to go about using it in A PROPER WAY...though help is given with
library, i found it insufficient for beginner like me...

Try the vhdl-200x ones.

http://www.eda.org/vhdl-200x/vhdl-200x-ft/packages/files.html

download:
fphdl_base_pkg.vhd
fphdl_base_pkg_body.vhd
There is a dependency on:
fixed_pkg.vhd
fixed_pkg_body.vhd

Bert Cuzeau · Apr 1, 2005

Who said arithmetic was difficult in an FPGA ;-)

It's just an example. It can be further optimized.
It's a parallel full speed solution...

-- Divide a 24 bits unsigned by 1.122
-- Author : Bert Cuzeau
-- not overly optimized (yet under 150 LCs of plain logic)

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- ---------------------------------------
Entity DIVBYR is -- Divide a 24 bits by 1.122
-- ---------------------------------------
Port ( Clk : In std_logic; -- Main System Clock
Rst : In std_logic; -- Asynchronous reset, active high
D : in unsigned (23 downto 0); -- use std_logic_vector !
Q : out unsigned (23 downto 0) -- use std_logic_vector !
); --
end;

-- ---------------------------------------
Architecture RTL of DIVBYR is
-- ---------------------------------------
begin

process (Rst,Clk)
begin
if Rst='1' then
Q <= (others=>'0');
elsif rising_edge (Clk) then
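-- Multiply in unsigned arithmetic: to_integer(D) * 7301 would overflow
-- the 32-bit integer range for large 24-bit inputs.
Q <= resize(shift_right(D * to_unsigned(7301, 13), 13), 24);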
end if;
end process;

end RTL;

Egbert Molenkamp · Apr 1, 2005

"Taras_96" <tagas96@hotmail.com> wrote in message
news:1112339672.440271.240930@o13g2000cwo.googlegroups.com...

Hi everyone

I've just started to learn VHDL, for the purpose of synthesising the
code onto an FPGA. I have previously worked with synthesisable Verilog
at RTL, and am trying to get my head around a couple of the mechanisms
VHDL offers. Warning: the following post is a bit long, but I tried to
make myself as clear as possible. My uncertainty involves race
conditions, and how VHDL handles signal assignments inside processes.

VHDL is rather easy

Processes can only communicate with each other with signals (I forget about
shared variables here).

One thing to remember: "signals are never updated immediately"! Simulation
performs two successive steps:
1. execution of the processes (with the frozen signal values!)
2. update of signal values (if no explicit delay is given, this delay is
often called a delta delay)
In this way you won't have order-dependent behaviour.

your example:

process (clock)
begin
if clock'event and clock = '1' then
b<=c;
end if;
end process;

process (clock)
begin
if clock'event and clock = '1' then
a<=b;
end if;
end process;

This is exactly the same as:
process (clock)
begin
if clock'event and clock = '1' then
a<=b;
b<=c;
end if;
end process;

And this is exactly the same as (notice the reordering):
process (clock)
begin
if clock'event and clock = '1' then
b<=c;
a<=b;
end if;
end process;

In all these cases your synthesis tool will realize this with two flip-flops;
c is the input of the first FF, its output (b) is connected to the second
FF, and the output of that FF is a.
(Simulation shows you the same result.)

And variables? Variables may only be declared in the sequential part of the
language (in a process, function, procedure); therefore a variable can never
be used to communicate between concurrent statements. Variables are always
updated immediately. If the value of a variable is needed in the next clock
cycle, a FF is inserted by the synthesis tool.
process (clock)
variable b : std_logic;
begin
if clock'event and clock = '1' then
a<=b;
b:=c;
end if;
end process;
In this example, when "a <= b" is executed the variable b has not yet been
assigned a new value. So the tool will insert a FF to remember the old value.
The result is again two FFs (as above).

process (clock)
variable b : std_logic;
begin
if clock'event and clock = '1' then
b:=c;
a<=b;
end if;
end process;
And this? Well, 'b' is updated immediately. That new value is assigned to 'a',
and after a delta delay 'a' is updated. This results in only 1 FF:
c is the input and a is the output.

Hope this helps,
Egbert Molenkamp

Jonathan Bromley · Apr 1, 2005

On 31 Mar 2005 23:14:32 -0800, "Taras_96" <tagas96@hotmail.com> wrote:

I've just started to learn VHDL, for the purpose of synthesising the
code onto an FPGA. I have previously worked with synthesisable Verilog
at RTL, and am trying to get my head around a couple of the mechanisms
VHDL offers.

Luckily you have a pretty good insight into the way Verilog does it,
so you should have absolutely no trouble understanding what VHDL does.

A Verilog Example.

always @ (posedge sysclk)
begin
b<=c
end

//other code....

always @ (posedge sysclk)
begin
a<=b
end

The type of assignment above is called 'non-blocking' assignment.

Indeed. Signal assignment in VHDL behaves somewhat like nonblocking
assignment in Verilog, although there are important differences
(which I'll try to outline later). The equivalent VHDL is
race-free in exactly the same way that NBA assures freedom from
races in Verilog.

[snip commentary on Verilog mechanisms]

always @ (posedge sysclk)
a = b;

always @ (posedge sysclk)
b = c;

How does VHDL protect against such race conditions?

It doesn't have to. VHDL has a perfectly good blocking
assignment mechanism: the := assignment to variables.
However, in VHDL a variable is declared, and visible,
locally to a process and cannot be accessed in any
other process. Consequently, the race you coded
in Verilog cannot occur in VHDL. (There's a caveat:
To allow for certain types of software description,
VHDL has the concept of "shared variables". These
can, indeed, be shared among processes. However,
they are not synthesisable and don't concern us here.)

On a similar topic, how do you model pipelines in VHDL?
Perhaps something like the following code...

process (clock)
begin
if clock'event and clock = '1' then
Please use the much more readable

"if rising_edge(clock) then"

b<=c
end if;
end process

process (clock)
begin
if clock'event and clock = '1' then
a<=b
end if;
end process

I'm not sure what would happen here

Two flip-flops, no races - a pipeline.

As I understand it, if a process gets triggered, and a signal
assignment is made within the statement, a *transaction* is added to
the queue of the signal that was assigned to. The actual change of the
variable (Rushton describes this as "the point where a transaction
becomes due on a signal, that signal becomes active") occurs when the
process execution phase is finished; it occurs at the beginning of the
event processing phase (and this assignment can cause new processes to
be triggered).

How does the queueing mechanism (of the queued transactions) work?
Does it queue an assignment to take place, using the value of the RHS
at the time the signal assignment was come across

Yes. Just like nonblocking assignment in Verilog.

If the
[sensible, VHDL]
option of queueing I mentioned is actually the one used,
the pipeline example would work ('a' would get the 'old' value of 'b')
if *all processes that were triggered in the same delta time period run
to completion*. This would mean that during process execution using
the example above, 'b' would get *scheduled* to be assigned the value
of 'c', and 'a' would be *scheduled* to be assigned the (old) value of
'b'; these assignments will take place in the next event processing
cycle. Is this right? Is this what happens when multiple processes are
triggered at the same time?

Yes.

What would happen in this case?
[two missing semicolons added]
process (clock)
begin
if clock'event and clock = '1' then
b<=c;
a<=b;
end if;
end process

Would it result in similar behaviour to the above case? (Intuitively I
think it should). However, if the first queueing algorithm was used
(where signals get assigned 'old' values - the value of the variable
when it's assignment was encountered in the process execution stage),
wouldn't this result in non-sequential assignment inside the process
block ('b' would be assigned the 'old' value of 'c', 'a' will be
assigned the 'old' value of 'b' - if we switch the order of these
signal assignments around, then the result will be the same, thus the
assignments aren't sequential)?

Yes and no. The *results* would certainly be the same. The key is
that you are making nonblocking assignment ("signal assignment" in
VHDL-speak) to two DIFFERENT signals, so it doesn't matter in which
order they are executed. Consider, however, the following...

process (clock) begin
if rising_edge(clock) then
a <= '0';
if (some_complicated_condition) then
a <= '1';
end if;
end if;
end process;

The statement "a <= '0'" puts a transaction on to a's queue
for one delta cycle into the future; this transaction will
assign '0' to a. However, if the condition is true, the
second assignment statement "a <= '1'" will also execute,
and it will put a different transaction on to a's queue,
also for one delta cycle into the future.

If you did this in Verilog, both scheduled assignments would
be present on the nonblocking assignment (NBA) queue, and
both would be executed at the moment when NBAs take effect;
however, the Verilog NBA queue preserves the order in which
assignments were executed, and so the '1' assignment would
happen *after* the '0' assignment, and would therefore "win".
In principle there is a zero-width glitch on "a", but in
practice this is unimportant.

In VHDL, the mechanism is somewhat different but (for all
synthesisable code) it has the same effect. Signal assignment
statements place their update transactions on to the queue
using "inertial" delay. In effect, the second assignment
of '1' to "a" will delete all other transactions from the
queue, so that ONLY the later assignment will take effect.
NOTE: this description is somewhat over-simplified, but
for synthesisable code where you always make assignments
with no delay, it is entirely adequate.

Now I've confused myself

No need; VHDL's model is far simpler and less confusing
than Verilog's.

Coming from an RTL for synthesis point of view, is there any
application of the process (whose main feature, as I see it, is that
assignments are done sequentially rather than concurrently), other than
for the purpose of modelling a flip-flop? I seem to remember a book
saying that "processes are concurrent statements" - does this just mean
the position of processes in the code doesn't matter?

Finally, is there a use of variables in synthesisable VHDL at RTL?

Many questions, same answer: In processes you can write sequential
code. This brings two major benefits:

(1) you can assign something to a signal, and then change your
mind about the assignment, later in the process. In many
realistic situations this is very convenient.
(2) you can do complicated sequences of calculations, typically
using variables to store the intermediate results so that
they can be used immediately by code later in the process,
and then assign the results of those calculations to a signal.

You are of course right to say that "the position of processes in
the code doesn't matter", in just the same way that the ordering
of "always" blocks in a Verilog module is irrelevant.

Note that VHDL does not possess Verilog's distinction between
nets and registers. You can make continuous assignments in
VHDL just as in Verilog...

Verilog: assign some_wire = some_expression;
VHDL: some_signal <= some_expression;

but in VHDL this is EXACTLY equivalent to the process...

process (all_signals_used_in_expression)
begin
some_signal <= some_expression;
end process;

And there is a big difference between VHDL and Verilog
concerning what happens when two processes write to the
same signal at different times: in Verilog the most
recent update takes effect and all others are ignored;
in VHDL each process independently updates its own
driver on the signal, and the values of all drivers on
the signal are then "resolved" in rather the same way
that the combined results of all continuous assignments
to the same net are resolved in Verilog.
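
A small testbench sketch of the VHDL side of this, assuming a std_logic signal (a resolved type) driven from two processes. The names drivers_tb, bus_line, driver_a and driver_b are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity drivers_tb is
end entity drivers_tb;

architecture sim of drivers_tb is
  -- std_logic is a resolved type, so more than one driver is legal.
  signal bus_line : std_logic;
begin
  -- Each process owns its own driver on bus_line.
  driver_a : process
  begin
    bus_line <= '0';
    wait for 20 ns;
    bus_line <= 'Z';  -- release the line
    wait;
  end process;

  driver_b : process
  begin
    bus_line <= 'Z';
    wait for 10 ns;
    bus_line <= '1';
    wait;
  end process;

  check : process
  begin
    wait for 5 ns;   -- t = 5 ns: drivers are '0' and 'Z'
    assert bus_line = '0' report "expected '0'" severity error;
    wait for 10 ns;  -- t = 15 ns: drivers are '0' and '1', a conflict
    assert bus_line = 'X' report "expected 'X'" severity error;
    wait for 10 ns;  -- t = 25 ns: drivers are 'Z' and '1'
    assert bus_line = '1' report "expected '1'" severity error;
    wait;
  end process;
end architecture sim;
```

Between 10 ns and 20 ns neither driver "wins": the resolution function combines '0' and '1' into 'X', regardless of which process wrote most recently.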

Hope this helps.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Jonathan Bromley · Apr 1, 2005

On 31 Mar 2005 23:20:37 -0800, "Taras_96" <tagas96@hotmail.com> wrote:

From what I've read about VHDL, concurrent seems to be a bad
description.

No; it's very useful for describing small pieces of combinational
logic.

One might be tempted to interpret the concurrent
assignment:

a <= b;
c <= a;

As at the same time a gets assigned the value of b, and c gets assigned
the OLD value of a.

No. A concurrent assignment is exactly a process, sensitive to
all signals used in its right-hand-side expression. Consequently,
the sequence of activity is....

* when signal 'b' changes, the assignment "a<=b" is triggered
* this schedules an update on 'a' for one delta cycle later
* when signal 'a' updates, the second assignment is triggered
* this schedules an update on 'c' for one delta cycle later,
using the NEW value of 'a', of course
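
The sequence above can be watched in a simulator with a sketch like the following. The testbench name delta_tb is made up, and each "wait for 0 ns" is used to step the stimulus process forward by exactly one delta cycle.

```vhdl
entity delta_tb is
end entity delta_tb;

architecture sim of delta_tb is
  signal a, b, c : bit := '0';
begin
  -- The two concurrent assignments under discussion.
  a <= b;
  c <= a;

  stimulus : process
  begin
    b <= '1';
    wait for 0 ns;  -- one delta later: 'b' has updated, "a <= b" is triggered
    assert b = '1' and a = '0' and c = '0'
      report "after delta 1: only b should have changed" severity error;

    wait for 0 ns;  -- next delta: 'a' has updated, "c <= a" is triggered
    assert a = '1' and c = '0'
      report "after delta 2: a should be new, c still old" severity error;

    wait for 0 ns;  -- next delta: 'c' has picked up the NEW value of 'a'
    assert c = '1'
      report "after delta 3: c should have followed a" severity error;

    wait;
  end process;
end architecture sim;
```

All three steps happen at simulation time 0 ns; only the delta count advances. Swapping the two concurrent assignments in the source text gives the same result.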

Would a better way of describing concurrent
assignment be "the order of assignment doesn't matter" - this is
because of the event processing/process execution cycles, right?

That's a useful rule of thumb.

I'm pretty sure that concurrent assignment from a synthesisable VHDL
point of view just represents how signals will be 'wired'.

a <= b;
c <= a;

just describes the fact that b is wired to a, and a is wired to c:

Correct. Note, however, that each concurrent assignment introduces
a delta-cycle delay (roughly the same as the delay between executing
a nonblocking assignment in Verilog and its signal being updated).
By contrast, the Verilog continuous assignment effectively uses
blocking assignment and introduces no delay whatsoever (unless
you specify one, of course). This is VERY important if you try
to create gated clocks.
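
A sketch of why that matters, assuming a two-stage shift register whose second stage runs from a gated copy of the clock. All names (delta_skew_tb, clk_gated, enable, q1, q2) are hypothetical, and the enable is tied high so the gated clock is functionally identical to clk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_skew_tb is
end entity delta_skew_tb;

architecture sim of delta_skew_tb is
  signal clk       : std_logic := '0';
  signal enable    : std_logic := '1';
  signal clk_gated : std_logic := '0';
  signal d         : std_logic := '0';
  signal q1, q2    : std_logic := '0';
begin
  -- Concurrent assignment: clk_gated follows clk one delta cycle late.
  clk_gated <= clk and enable;

  stage1 : process (clk)
  begin
    if rising_edge(clk) then
      q1 <= d;        -- q1 updates one delta after the clk edge
    end if;
  end process;

  stage2 : process (clk_gated)
  begin
    if rising_edge(clk_gated) then
      q2 <= q1;       -- runs in the same delta in which q1 updates
    end if;
  end process;

  stimulus : process
  begin
    d <= '1';
    wait for 1 ns;
    clk <= '1';       -- a single rising edge
    wait for 1 ns;
    assert q1 = '1' report "q1 should have captured d" severity error;
    -- A real two-stage shift register would still hold '0' in q2 here.
    -- In this model the new q1 is already visible when stage2 runs.
    assert q2 = '1'
      report "expected q2 to show the shoot-through value" severity error;
    wait;
  end process;
end architecture sim;
```

After one clock edge the data has passed through both stages in simulation, because q1 and clk_gated update in the same delta cycle.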
--
Jonathan Bromley, Consultant

Ralf Hildebrandt · Apr 1, 2005

Egbert Molenkamp wrote:

[some good examples]

Let me add a "rule for the beginner":
Verilog blocking assignment ("=") is similar to assignment to VHDL
variables, while non-blocking assignment ("<=") is similar to
assignment to VHDL signals.
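
The VHDL half of that rule, as a minimal simulation sketch. The names var_vs_sig_tb, v and s are made up for the example.

```vhdl
entity var_vs_sig_tb is
end entity var_vs_sig_tb;

architecture sim of var_vs_sig_tb is
  signal s : integer := 0;
begin
  process
    variable v : integer := 0;
  begin
    v := 1;   -- variable: new value is visible immediately
    s <= 1;   -- signal: update is only scheduled

    assert v = 1 report "variable should already be 1" severity error;
    assert s = 0 report "signal should still hold 0" severity error;

    wait for 0 ns;  -- let one delta cycle pass
    assert s = 1 report "signal should now be 1" severity error;

    wait;
  end process;
end architecture sim;
```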

Ralf

Subroto Datta · Apr 1, 2005

When you double-click on a symbol in the bdf, the behavior is to open the
design file that was used as part of the compilation. In this case it was
the file with the vhd extension (as a vhd extension has a higher priority
than a bdf extension). If you want to open the bdf instead, do a File Open,
traverse to the directory which has the bdf file and open it.

If you want the bdf file to be used for compilation, add the bdf file
explicitly to the file list.

Hope this helps,
Subroto Datta
Altera Corp.

"ALuPin" <ALuPin@web.de> wrote in message
news:b8a9a7b0.0504010346.7f9b3bb0@posting.google.com...

Hi,

in QuartusII v4.2 SP1
I have the following problem:

My top level file is a schematic file called
"H_top.bdf"

In this top level file I have several symbols, for example
"cache_search.bsf".

The component cache_search is also a schematic file from which
I created a .vhd file in Quartus
so that I have one "cache_search.bdf" and one "cache_search.vhd"
which are of course identical.

Now in my top level file I include under PROJECT --> ADD/REMOVE FILES
IN PROJECT the file "H_top.bdf"
Under USER LIBRARIES I include the directory in which the
"cache_search.bdf"
and "cache_search.vhd" are included.

When I click on the symbol "cache_search.bsf" in the top level
schematic "H_top.bdf" file
the "cache_search.vhd" is opened.
I would like to choose whether to look at the "cache_search.vhd" or
the
"cache_search.bdf" ...

How can I do that ?

Any suggestions are appreciated.

Rgds
André
