Criticism requested of Verilog multiplier

James Harris · Sep 29, 2010

I've made a couple of attempts to write a multiplier to operate using
a technique presented in a Berkeley course video. The first version
did work (under Icarus) but it was rather ad-hoc and I hadn't really
learned anything from it. After a lot of head scratching trying to
guess how different constructs might (ought to?) be translated to
hardware I felt I could make it fully synchronous and the result is
below.

This stuff is fairly new to me, so I'd very much like to get some feedback
on the code. I can't change the basic technique which was part of the
problem definition but could there be any issue with getting the
circuit to synthesize and could it be made faster or less power
hungry? Is anything anomalous and could the design be made more
"normal"? Based on info in a separate CPU design course I've tried to
ensure that

* the combinatorial signal paths between registers are short and
balanced
* the fundamental add and shift operations can happen together in a
single cycle
* the counter decrement can happen in parallel with the addition.
These, I think, should be the slowest two operations as they are the
only ones with information to propagate between bit positions.

The basic idea for this multiplier mark 2 is that all the necessary
combinatorial signals are generated by continuous assignments and are
simply latched into registers on each leading clock edge - so all I
have to wait for between clocks is the combinatorial logic to settle.

Well, here's the code. *All* criticisms and suggestions welcome.

//
// Multiplier for N-bit operands, 2N-bit product
//

module mult2(product, finished, clock, input_a, input_b, reset);

  parameter BITS = 32;
  localparam LOG2BITS = $clog2(BITS + 1); // (May not need the + 1)

  output reg [BITS * 2 - 1 : 0] product;
  output reg finished;
  input [BITS - 1 : 0] input_a, input_b;
  input reset;
  input clock;

  reg [BITS - 1 : 0] multiplicand;
  reg [LOG2BITS : 0] counter; // floor(log2(BITS)) + 1 in size

  wire [BITS - 1 : 0] proxy = product[0] ? multiplicand : 0;
  wire [BITS : 0] sum = product[BITS * 2 - 1 : BITS] + proxy; // NB. N + 1 bits

  always @(posedge clock) begin
    // $write("%b %b %b %d %b\n", clock, reset, finished, counter, product);
    if (reset == 1'b1) begin
      multiplicand <= input_a;
      product <= { 1'b0, input_b }; // Zero extend input_b
      counter <= BITS;
      finished <= 0;
    end
    else if (counter != 0) begin
      product <= { sum, product[BITS - 1 : 1] }; // Add and shift
      counter <= counter - 1;
    end
    else /* Counter is zero */ begin
      finished <= 1;
    end
  end
endmodule
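
On the "(May not need the + 1)" comment in the localparam line: the counter is loaded with the value BITS itself, so it needs enough bits to hold BITS, not just BITS - 1. The small demo below is only a sketch (the module clog2_demo and the names narrow and wide are made up for illustration); it suggests the + 1 is needed when BITS is a power of two. Note also that a declaration of the form [LOG2BITS : 0] has LOG2BITS + 1 bits, which is one more than strictly required.

```verilog
// Hypothetical demo module, not part of the original design:
// shows how many bits a counter needs to hold the value BITS.
module clog2_demo;
  parameter BITS = 32;
  localparam WITHOUT_PLUS1 = $clog2(BITS);     // 5 for BITS = 32
  localparam WITH_PLUS1    = $clog2(BITS + 1); // 6 for BITS = 32

  reg [WITHOUT_PLUS1 - 1 : 0] narrow;  // 5 bits: holds 0..31 only
  reg [WITH_PLUS1 - 1 : 0]    wide;    // 6 bits: holds 0..63

  initial begin
    narrow = BITS;   // 32 does not fit in 5 bits and truncates to 0
    wide   = BITS;   // 32 fits in 6 bits
    $display("narrow = %0d, wide = %0d", narrow, wide);
  end
endmodule
```

With BITS = 32 this should print "narrow = 0, wide = 32", so a counter sized with $clog2(BITS) alone would never count down from 32.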

That's it. Hit me!

James

gabor · Sep 30, 2010

On Sep 29, 3:19 pm, James Harris <james.harri...@googlemail.com>
wrote:

....

I guess that criticism really depends on what you're trying to
do with this multiplier. It is not pipelined, so you can't
start a new multiplication until the previous one is finished.
Nowadays I tend to use FPGA's with parallel multipliers in
them, so even if I wanted to do a multiplication once a
minute I'd just use them (if any are left) and then I only
need to write something like:

assign product = input_a * input_b;

A couple other points:

I usually think of an input called "reset" as something
you apply when the system powers on to get everything
into a known state. Your "reset" is what I would consider
a "start" signal, it gets a new multiplication started.
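
As a rough sketch of that suggestion, the variant below keeps "reset" for power-on initialisation and adds a separate "start" input that loads the operands. The module name mult2_start, the extra port, the counter sizing and the testbench are all illustrative assumptions rather than code from this thread; the add-and-shift datapath is the same as in the posted mult2.

```verilog
// Hypothetical variant of mult2: "reset" only clears state,
// "start" loads the operands and begins a multiplication.
module mult2_start(product, finished, clock, reset, start, input_a, input_b);
  parameter BITS = 32;
  localparam CBITS = $clog2(BITS + 1);   // enough bits to hold the value BITS

  output reg [BITS * 2 - 1 : 0] product;
  output reg finished;
  input clock, reset, start;
  input [BITS - 1 : 0] input_a, input_b;

  reg [BITS - 1 : 0] multiplicand;
  reg [CBITS - 1 : 0] counter;

  wire [BITS - 1 : 0] proxy = product[0] ? multiplicand : {BITS{1'b0}};
  wire [BITS : 0] sum = product[BITS * 2 - 1 : BITS] + proxy; // N + 1 bits

  always @(posedge clock) begin
    if (reset) begin                   // power-on: known idle state
      product <= 0;
      multiplicand <= 0;
      counter <= 0;
      finished <= 0;
    end
    else if (start) begin              // begin a new multiplication
      multiplicand <= input_a;
      product <= { {BITS{1'b0}}, input_b };
      counter <= BITS;
      finished <= 0;
    end
    else if (counter != 0) begin       // add and shift, one bit per cycle
      product <= { sum, product[BITS - 1 : 1] };
      counter <= counter - 1;
      // Flag completion together with the final product write. This is one
      // cycle earlier than in the posted code, and it stops "finished" from
      // rising after a power-on reset when nothing has been multiplied.
      if (counter == 1) finished <= 1;
    end
  end
endmodule

// Minimal simulation-only testbench: reset once, then pulse start.
module tb_mult2_start;
  reg clock = 0, reset = 1, start = 0;
  reg [7:0] a, b;
  wire [15:0] product;
  wire finished;

  mult2_start #(.BITS(8)) dut(product, finished, clock, reset, start, a, b);

  always #5 clock = ~clock;

  initial begin
    a = 8'd200; b = 8'd123;
    @(negedge clock); reset = 0; start = 1;
    @(negedge clock); start = 0;
    wait (finished);
    @(negedge clock);
    if (product === 16'd24600) $display("PASS: 200 * 123 = %0d", product);
    else                       $display("FAIL: got %0d", product);
    $finish;
  end
endmodule
```

The priority order (reset, then start, then the running state) is a design choice in this sketch; a real design might instead ignore "start" while a multiplication is in progress.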

I don't see how separating the combinatorial logic from
the sequential (clocked) always block changes the logic
that will be synthesized. In your case it probably
helps the readability since you are first doing a
sum and then a bit concatenation, and that might look
clunky in a single statement. However in the end it's
still an adder followed by a D register.
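
To illustrate the point with something smaller than the multiplier, here is a sketch of an 8-bit accumulator written both ways. The module names acc_split and acc_inline and the testbench are made up for this example. In simulation the two should match cycle for cycle, and one would expect a synthesis tool to build an adder feeding a register in both cases, though the exact netlist depends on the tool.

```verilog
// Style 1: adder as a continuous assignment, register in the clocked block.
module acc_split(clock, clear, din, acc);
  input clock, clear;
  input [7:0] din;
  output reg [7:0] acc;

  wire [7:0] next = acc + din;          // combinational adder

  always @(posedge clock)
    if (clear) acc <= 0;
    else       acc <= next;             // D register
endmodule

// Style 2: the same adder written inside the clocked block.
module acc_inline(clock, clear, din, acc);
  input clock, clear;
  input [7:0] din;
  output reg [7:0] acc;

  always @(posedge clock)
    if (clear) acc <= 0;
    else       acc <= acc + din;        // adder and register in one statement
endmodule

// Simulation check that both styles behave identically on random input.
module tb_acc_styles;
  reg clock = 0, clear = 1;
  reg [7:0] din = 0;
  wire [7:0] q_split, q_inline;
  integer i, errors;

  acc_split  u1(clock, clear, din, q_split);
  acc_inline u2(clock, clear, din, q_inline);

  always #5 clock = ~clock;

  initial begin
    errors = 0;
    @(negedge clock); clear = 0;
    for (i = 0; i < 50; i = i + 1) begin
      din = $random;                    // truncated to 8 bits
      @(negedge clock);
      if (q_split !== q_inline) errors = errors + 1;
    end
    if (errors == 0) $display("PASS: both styles match");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```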

Anyway the code seems very clean and easy to read, and
nothing really esoteric other than the log2 business.

Regards,
Gabor

James Harris · Oct 1, 2010

On 30 Sep, 17:49, gabor <ga...@alacron.com> wrote:

....

> I guess that criticism really depends on what you're trying to
> do with this multiplier.

It was essentially a response to a Berkeley class assignment so was
expected to be done a certain way. I should say I'm not at Berkeley. I
was working from the 2003 course video on their web site. If anyone
else is interested it is CS152 at

http://webcast.berkeley.edu/courses.php?semesterid=16

> It is not pipelined, so you can't
> start a new multiplication until the previous one is finished.
> Nowadays I tend to use FPGA's with parallel multipliers in
> them, so even if I wanted to do a multiplication once a
> minute I'd just use them (if any are left) and then I only
> need to write something like:
>
> assign product = input_a * input_b;

Neither was an option in this case but I see the point. I took it that
there wasn't even an option to shift multiple bits at a time which
could have made a big difference to the speed.
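
For what it's worth, shifting two bits at a time could look roughly like the sketch below. This is not part of the assignment and not code from the course; the module name mult2_radix4, the "start" port name and the STEPS and CBITS constants are assumptions made for illustration, and BITS is assumed to be even and at least 4. It halves the number of cycles, but the extra multiple (3 times the multiplicand) costs additional adder logic in each cycle, so whether it is faster overall would depend on the target device.

```verilog
// Hypothetical radix-4 variant of mult2: retires two multiplier bits per clock.
// Assumes BITS is even and at least 4.
module mult2_radix4(product, finished, clock, input_a, input_b, start);
  parameter BITS = 32;
  localparam STEPS = BITS / 2;
  localparam CBITS = $clog2(STEPS + 1);   // bits needed to hold the value STEPS

  output reg [BITS * 2 - 1 : 0] product;
  output reg finished;
  input clock, start;
  input [BITS - 1 : 0] input_a, input_b;

  reg [BITS - 1 : 0] multiplicand;
  reg [CBITS - 1 : 0] counter;

  // 0, 1, 2 or 3 times the multiplicand, chosen by the two low product bits.
  wire [BITS + 1 : 0] proxy = product[1:0] * multiplicand;
  wire [BITS + 1 : 0] sum = product[BITS * 2 - 1 : BITS] + proxy; // N + 2 bits

  always @(posedge clock) begin
    if (start) begin
      multiplicand <= input_a;
      product <= { {BITS{1'b0}}, input_b };
      counter <= STEPS;
      finished <= 0;
    end
    else if (counter != 0) begin
      product <= { sum, product[BITS - 1 : 2] };  // add and shift by two
      counter <= counter - 1;
    end
    else begin
      finished <= 1;
    end
  end
endmodule
```

As in the one-bit version, the concatenation is exactly 2N bits wide: N + 2 bits of sum plus the N - 2 remaining multiplier bits.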

In general is it fair to say that synthesis tools can be relied upon
to choose faster or less power hungry operations? For example, if I
wanted a fast adder is it better to code one from gates or just to
code an addition operation and trust the synthesis tools to use or
make a fast adder rather than something with a ripple-carry?

> A couple other points:
>
> I usually think of an input called "reset" as something
> you apply when the system powers on to get everything
> into a known state. Your "reset" is what I would consider
> a "start" signal, it gets a new multiplication started.

OK, I'll bear that in mind.

> I don't see how separating the combinatorial logic from
> the sequential (clocked) always block changes the logic
> that will be synthesized. In your case it probably
> helps the readability since you are first doing a
> sum and then a bit concatenation, and that might look
> clunky in a single statement. However in the end it's
> still an adder followed by a D register.

Understood. I've been unsure what hardware will be generated. Now that
I'm using the Xilinx tools I see that they report what devices have
been recognised and I've just found that their XST manual says what
constructs the software looks for to synthesize certain devices. I may
try another version putting everything in the clocked logic.

> Anyway the code seems very clean and easy to read, and
> nothing really esoteric other than the log2 business.

Good to hear.

Thanks for the feedback.

James

gabor · Oct 1, 2010

....

> In general is it fair to say that synthesis tools can be relied upon
> to choose faster or less power hungry operations? For example, if I
> wanted a fast adder is it better to code one from gates or just to
> code an addition operation and trust the synthesis tools to use or
> make a fast adder rather than something with a ripple-carry?

Most modern synthesis tools are very good at doing
standard structures like adders, and in fact unless
you know the best implementation in your target device,
you are much better off letting synthesis generate the
gates (or LUT's), flip-flops, etc. Synthesis tools
are not as good at reading your mind, so if you try
to put an adder together from gates, they don't
necessarily see that you want an adder and therefore
won't optimize your logic as well as if you just
coded the addition operator.
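
A small sketch of the two approaches, for comparison. The module names full_adder, ripple_add4 and plus_add4 and the testbench are invented for this example. Both versions compute the same 5-bit sum in simulation; the difference is only in how much freedom the synthesis tool is likely to have, and how a given tool treats each one is not something this sketch can show.

```verilog
// One-bit full adder written as gate-level equations.
module full_adder(a, b, cin, s, cout);
  input a, b, cin;
  output s, cout;
  assign s    = a ^ b ^ cin;
  assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Hand-built 4-bit ripple-carry adder: the structure is spelled out.
module ripple_add4(a, b, sum);
  input [3:0] a, b;
  output [4:0] sum;
  wire [4:0] c;
  genvar i;

  assign c[0] = 1'b0;
  generate
    for (i = 0; i < 4; i = i + 1) begin : stage
      full_adder fa(a[i], b[i], c[i], sum[i], c[i + 1]);
    end
  endgenerate
  assign sum[4] = c[4];                 // carry out becomes the top sum bit
endmodule

// The same function using the addition operator: structure left to the tool.
module plus_add4(a, b, sum);
  input [3:0] a, b;
  output [4:0] sum;
  assign sum = a + b;
endmodule

// Exhaustive simulation check over all 256 input combinations.
module tb_add4;
  reg [3:0] a, b;
  wire [4:0] s_gates, s_plus;
  integer i, errors;

  ripple_add4 u1(a, b, s_gates);
  plus_add4   u2(a, b, s_plus);

  initial begin
    errors = 0;
    for (i = 0; i < 256; i = i + 1) begin
      {a, b} = i[7:0];
      #1;
      if (s_gates !== s_plus) errors = errors + 1;
    end
    if (errors == 0) $display("PASS: adders agree on all inputs");
    else             $display("FAIL: %0d mismatches", errors);
  end
endmodule
```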

Working with multiplication is different. Given that
your multiplier is sequential, spreading the work over
many clock cycles, you need to show the tools what you
want to accomplish in each cycle, so you can't just use
the multiplication operator, which infers a
combinatorial multiplier. That being said,
I have heard of synthesis tools that can pipeline
logic for you if you just create a combinatorial
function followed by a number of flip-flop stages.
Haven't seen it work myself, though.
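
The coding style in question would look something like the sketch below: the multiplication operator followed by a chain of plain registers. The module name mult_regs and the STAGES parameter are assumptions for illustration. As written it simply delays the product by STAGES clocks; whether a given synthesis tool moves those registers back into the multiplier logic depends on the tool and its settings, which this sketch does not demonstrate.

```verilog
// Hypothetical example: combinational multiplier followed by register stages.
module mult_regs(clock, input_a, input_b, product);
  parameter BITS = 16;
  parameter STAGES = 3;                 // assumed number of stages, at least 1

  input clock;
  input [BITS - 1 : 0] input_a, input_b;
  output [BITS * 2 - 1 : 0] product;

  reg [BITS * 2 - 1 : 0] pipe [0 : STAGES - 1];
  integer i;

  always @(posedge clock) begin
    pipe[0] <= input_a * input_b;       // full 2N-bit product
    for (i = 1; i < STAGES; i = i + 1)
      pipe[i] <= pipe[i - 1];           // plain register stages
  end

  assign product = pipe[STAGES - 1];    // valid STAGES clocks after the inputs
endmodule
```

Unlike the sequential multiplier, this arrangement accepts a new pair of operands on every clock.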

Regards,
Gabor
