# Squaring of a binary number

## Ask a question - edaboard.com

elektroda.net NewsGroups Forum Index - VHDL Language - Squaring of a binary number

Goto page Previous  1, 2

lokesh kumar
Guest

Sat Jul 20, 2013 10:52 pm

On Sunday, July 21, 2013 12:53:23 AM UTC+5:30, rickman wrote:
Quote:
On 7/20/2013 11:46 AM, lokesh kumar wrote:

On Saturday, July 20, 2013 7:50:20 PM UTC+5:30, rickman wrote:

On 7/20/2013 8:37 AM, lokesh kumar wrote:

On Saturday, July 20, 2013 1:39:08 PM UTC+5:30, rickman wrote:

On 7/19/2013 11:38 PM, Gabor wrote:

On 7/19/2013 6:23 PM, Fredxx wrote:

On 19/07/2013 18:27, lokesh kumar wrote:

Hi,

Can anyone help me to design a code to square binary number?

Suppose "A" is a 5 bit number (a4a3a2a1a0)

If we do A x A then the output result will be "0 a4 0 a3 0 a2 0 a1 0

a0 " ( a 10 bit number)

All these numbers look bigger than 5 or 10 bits!

The syntax is not VHDL. He means for A to be a 5-bit vector

and the result ended up as:

'0'& A(4) \$ '0'& A(3) \$ '0'& A(2) \$ '0'& A(1) \$ '0'& A(0)

For example : Suppose A= 11111

Then A x A = 11111 x 11111 = 0101010101 ( Xor operation is done for

the adding to get the final result).

31 x 31 = 961 (11 1100 0001)

So clearly XORing is incorrect.

For the OP clearly "multiplication" or "squaring" is incorrect. He

apparently wants a different function that is similar to multiplication

but lacks any carries on the intermediate addition.

squaring of a 5-bit number to get an output like this?

A square operation is precisely that, A * A. Most FPGAs have some

pretty good multipliers, best to use them.

If he is doing a calculation on a polynomial, I understand why he wants

a multiply with no carries. Each term of the polynomial has a

coefficient which is all the terms of the same value summed together mod

2 (XOR). But I don't understand his other statements. As you showed

earlier his general form above for the product is not accurate. Or he

is saying something we don't understand.

From what I understand (or think I understand) this should be code he

could use.

subtype binary_num_5 is std_logic_vector (4 downto 0);

signal A : binary_num_5;

signal B : binary_num_5;

function square (arg : std_logic_vector) return std_logic_vector is

constant arg_Hi : integer := arg'HIGH;

constant arg_Lo : integer := arg'LOW;

constant arg_Len : integer := arg'LENGTH;

variable prod : std_logic_vector ((arg_Hi + arg_Len) downto arg_L)

:= (others => '0');

begin

for i in arg'range loop

prod := prod XOR std_logic_vector (

SHIFT_LEFT (RESIZE (unsigned(arg), 2*arg_Len), i));

end loop;

return prod;

end square;

...

B<= square (A);

I think this will do the job but I haven't tested it, so many errors can

be present! If nothing else, it should give a good idea on how to

proceed. I will say the whole thing is a little bit simpler if it is

done with unsigned type signals rather than std_logic_vector. This

would eliminate the type casts in the loop assignment statement.

prod := prod XOR SHIFT_LEFT (RESIZE (arg, 2*arg_Len), i));

--

Rick

library IEEE;

use IEEE.std_logic_1164.all;

use IEEE.std_logic_arith.all;

use IEEE.std_logic_unsigned.all;

--use work.my_package.all;

entity square_163_7_6_3 is

port (

a: in std_logic_vector(162 downto 0);

z: out std_logic_vector(162 downto 0)

);

end square_163_7_6_3;

architecture circuit of square_163_7_6_3 is

signal s, t, u, s_plus_t: std_logic_vector(162 downto 0);

signal xor1, xor2: std_logic;

begin

vector_s: for i in 0 to 80 generate

s(2*i)<= a(i);

s(2*i + 1)<= a(i+82);

end generate;

s(162)<= a(81);

vector_t1: for j in 0 to 6 generate

t(j)<= '0';

end generate;

t(7)<= a(82);

vector_t2: for i in 4 to 80 generate

t(2*i)<= a(i+7;

t(2*i + 1)<= a(i+79);

end generate;

t(162)<= a(159);

xor1<= a(160) xor a(161);

xor2<= a(161) xor a(162);

u(0)<= a(160);

u(1)<= a(160) xor a(162);

u(2)<= a(161);

u(3)<= xor1;

u(4)<= a(82) xor a(160);

u(5)<= xor2;

u(6)<= a(83) xor xor1;

u(7)<= '0';

u(<= a(84) xor xor1;

u(9)<= '0';

u(10)<= a(85) xor xor2;

u(11)<= '0';

u(12)<= a(86) xor a(162);

u(13)<= '0';

vector_u: for i in 7 to 80 generate

u(2*i)<= a(i+80);

u(2*i + 1)<= '0';

end generate;

u(162)<= a(161);

xor_gates1: for j in 0 to 162 generate

s_plus_t(j)<= s(j) xor t(j);

end generate;

xor_gates2: for j in 0 to 162 generate

z(j)<= s_plus_t(j) xor u(j);

end generate;

end circuit;

This the the exact code I found online, I think. But it is for 163-bit. So it is difficult to test and verify. Can you please help me to convert it for a 5-bit to make me understand it?

Many Thanks!

Hmmm... I don't think I can help you understand the code above. The

purpose of the code you posted, or at least how it was derived, is not

clear to me. If it helps you any, I have replaced it with a version

containing more white space for clarity.

Polynomial arithmetic is not my strong suit, but it seems familiar, so I

must have done it somewhere, sometime. Maybe it was that class in

multivalued logic which was actually a thinly disguised course in

abstract algebra taught in the EE department. Or more likely it is just

familiar from working with CRC calculations.

Here is my take on why you came up with the description of the formula

that you did. I am assuming that multiplication is the AND operation

and addition is the XOR operation. So '*' really means AND while '+'

really means XOR.

With that in mind here are some identities...

a(n) * a(n) = a(n)

a(n) * a(m) + a(m) * a(n) = a(n) * a(m) + a(n) * a(m) = 0

a(4 downto 0) is your input and z(8 downto 0) is your output.

a4, a3, a2, a1, a0 * a0

a4, a3, a2, a1, a0 * a1

a4, a3, a2, a1, a0 * a2

a4, a3, a2, a1, a0 * a3

a4, a3, a2, a1, a0 * a4

+ ----------------------------------

z8, z7, z6, z5, z4, z3, z2, z1, z0

z0 = a0 * a0 = a0

z1 = a0 * a1 + a1 * a0 = 0

z2 = a0 * a2 + a1 * a1 + a2 * a0 = a1

z3 = a0 * a3 + a1 * a2 + a2 * a1 + a3 * a0 = 0

z4 = a0 * a4 + a1 * a3 + a2 * a2 + a3 * a1 + a4 * a0 = a2

z5 = a1 * a4 + a2 * a3 + a3 * a2 + a4 * a1 = 0

z6 = a2 * a4 + a3 * a3 + a4 * a2 = a3

z7 = a3 * a4 + a4 * a3 = 0

z8 = a4 * a4 = a4

So this shows (at least for this case) the square of a polynomial *is*

represented by the formula you gave at the beginning (which includes one

more bit than needed).

'If we do AxA then the output result will be "0 a4 0 a3 0 a2 0 a1 0 a0"'

So here is the code for your square...

output_even: for i in 0 to 4 generate

z(2*i-1)<= a(i);

end generate;

output_odd: for i in 0 to 3 generate

z(2*i)<= '0';

end generate;

Replace the constants with the appropriate parameters and I expect you

can make a general function.

function poly_square (arg : std_logic_vector) return std_logic_vector is

constant arg_Hi : integer := arg'HIGH;

constant arg_Lo : integer := arg'LOW;

constant arg_Len : integer := arg'LENGTH;

variable prod : std_logic_vector ((2 * (arg_Len - 1)) downto 0)

:= (others => '0');

begin

for i in arg'range loop

prod(2*(i-arg_Lo)-1) := arg(i);

end loop;

return prod;

end poly_square;

...

signal A : std_logic_vector (4 downto 0);

signal B : std_logic_vector (8 downto 0);

...

B<= poly_square (A);

Again, not tested so there are likely errors.

--

Rick

It is a bit confusing to me.

Can I have your email id please? I can send you a relevant paper for the circuit with the algorithm. May be you will be able to understand it. I am unable to find more information related to it.

vector_s: for i in 0 to 80 generate s(2*i)<= a(i); s(2*i + 1)<= a(i+82); end generate;

s(162)<= a(81);

Do you understand what this code is doing? It is generating the signal

s() from the signal a(). Is that clear?

vector_t1: for j in 0 to 6 generate t(j)<= '0'; end generate;

t(7)<= a(82);

vector_t2: for i in 4 to 80 generate t(2*i)<= a(i+7; t(2*i + 1)<= a(i+79); end generate;

t(162)<= a(159);

This code generates t() from a().

Thanks!

I don't get it either. What are a(), s(), t() and z()? Is this

supposed to be squaring a() to get z()? If so, why is a() the same

length as z()? I can't find any of this code using Google.

I got your other email which was largely the same as your earlier post I

think. I prefer to discuss this here. There may be others who would

like to understand this or who can explain it. In fact, you might try

explaining what you are doing and ask how to do this in other groups

such as comp.dsp. Working from an undocumented code section is not a

great way to understand an algorithm.

I can't tell you anything about the code you posted. I have no idea why

they are doing all the calculations they are doing. I can read the

VHDL, but I can't read the mind of the person who wrote it.

I might be able to help you figure this out if you give more background.

What are you trying to do? What is the bigger picture? There is

often more than one way to skin a cat. If I understand what you are

trying to do with polynomials I think I have already explained what you

need to do to square a() and given you code to do it. If I don't

understand, perhaps you can explain better?

BTW, when you use Google Groups to post in newsgroups you need to fix

the quoted material and after a couple of quotes becomes unreadable.

--

Rick

Lets consider a 5 bit number, A = 10100

If we do the squaring of it then we will get,

A = 10100
x 10100
-------------
00000
00000
10100
00000
10100
-------------------
c= 100010000 (XOR operation is performed to add)

Now Irreducible polynomial is used to reduce it to 5-bit. (The aim is: if the input is out 5-bit then we need to reduce the output to 5-bit)

So for a 5-bit number the irreducible polynomial is , F(x) = x^5 + x^2 + 1 (we can write it in binary form as 100101) (It is a standard value)

Now both c and F(x) are added to reduce it to 5-bit. (from the MSB)

100010000 (value of c that we got after the squaring)
xor 100101 ( value of F(x) that we calculated)
-------------------
000111000 ( it is not 5-bit)

So now again do the xor operation to the result with the irreducible polynomial.

111000 (Do not consider 3 zeros from the left)
xor 100101 (irreducible polynomial)
---------------
011101 ( now it is reduced to a 5-bit number,(do not consider the zero at left side))

If you take a close look on the result then, we have taken A = 10100 and we got c = 100010000 (before reduction)

so simply we insert one zero between all the bits of A, then we will also get the same result.

c = 0 1 0 0 0 1 0 0 0 0 ( zeros are indicated)
| | | | |

If you remove the indicated zeros then tht value is equal to "A"

So now my main aim is to design a generalised code for 5-bit. Suppose I am giving a 5-bit input, A = a4 a3 a2 a1 a0

Then after squaring, I am getting a result C = 0-a4-0-a3-0-a2-0-a1-0-a0 ( zeros are added between all the bits of A)

Now I have to use same irreducible polynomial (100101) to reduce it to 5-bit.

lokesh kumar
Guest

Sat Jul 20, 2013 10:53 pm

On Sunday, July 21, 2013 12:53:23 AM UTC+5:30, rickman wrote:
Quote:
On 7/20/2013 11:46 AM, lokesh kumar wrote:

On Saturday, July 20, 2013 7:50:20 PM UTC+5:30, rickman wrote:

On 7/20/2013 8:37 AM, lokesh kumar wrote:

On Saturday, July 20, 2013 1:39:08 PM UTC+5:30, rickman wrote:

On 7/19/2013 11:38 PM, Gabor wrote:

On 7/19/2013 6:23 PM, Fredxx wrote:

On 19/07/2013 18:27, lokesh kumar wrote:

Hi,

Can anyone help me to design a code to square binary number?

Suppose "A" is a 5 bit number (a4a3a2a1a0)

If we do A x A then the output result will be "0 a4 0 a3 0 a2 0 a1 0

a0 " ( a 10 bit number)

All these numbers look bigger than 5 or 10 bits!

The syntax is not VHDL. He means for A to be a 5-bit vector

and the result ended up as:

'0'& A(4) \$ '0'& A(3) \$ '0'& A(2) \$ '0'& A(1) \$ '0'& A(0)

For example : Suppose A= 11111

Then A x A = 11111 x 11111 = 0101010101 ( Xor operation is done for

the adding to get the final result).

31 x 31 = 961 (11 1100 0001)

So clearly XORing is incorrect.

For the OP clearly "multiplication" or "squaring" is incorrect. He

apparently wants a different function that is similar to multiplication

but lacks any carries on the intermediate addition.

squaring of a 5-bit number to get an output like this?

A square operation is precisely that, A * A. Most FPGAs have some

pretty good multipliers, best to use them.

If he is doing a calculation on a polynomial, I understand why he wants

a multiply with no carries. Each term of the polynomial has a

coefficient which is all the terms of the same value summed together mod

2 (XOR). But I don't understand his other statements. As you showed

earlier his general form above for the product is not accurate. Or he

is saying something we don't understand.

From what I understand (or think I understand) this should be code he

could use.

subtype binary_num_5 is std_logic_vector (4 downto 0);

signal A : binary_num_5;

signal B : binary_num_5;

function square (arg : std_logic_vector) return std_logic_vector is

constant arg_Hi : integer := arg'HIGH;

constant arg_Lo : integer := arg'LOW;

constant arg_Len : integer := arg'LENGTH;

variable prod : std_logic_vector ((arg_Hi + arg_Len) downto arg_L)

:= (others => '0');

begin

for i in arg'range loop

prod := prod XOR std_logic_vector (

SHIFT_LEFT (RESIZE (unsigned(arg), 2*arg_Len), i));

end loop;

return prod;

end square;

...

B<= square (A);

I think this will do the job but I haven't tested it, so many errors can

be present! If nothing else, it should give a good idea on how to

proceed. I will say the whole thing is a little bit simpler if it is

done with unsigned type signals rather than std_logic_vector. This

would eliminate the type casts in the loop assignment statement.

prod := prod XOR SHIFT_LEFT (RESIZE (arg, 2*arg_Len), i));

--

Rick

library IEEE;

use IEEE.std_logic_1164.all;

use IEEE.std_logic_arith.all;

use IEEE.std_logic_unsigned.all;

--use work.my_package.all;

entity square_163_7_6_3 is

port (

a: in std_logic_vector(162 downto 0);

z: out std_logic_vector(162 downto 0)

);

end square_163_7_6_3;

architecture circuit of square_163_7_6_3 is

signal s, t, u, s_plus_t: std_logic_vector(162 downto 0);

signal xor1, xor2: std_logic;

begin

vector_s: for i in 0 to 80 generate

s(2*i)<= a(i);

s(2*i + 1)<= a(i+82);

end generate;

s(162)<= a(81);

vector_t1: for j in 0 to 6 generate

t(j)<= '0';

end generate;

t(7)<= a(82);

vector_t2: for i in 4 to 80 generate

t(2*i)<= a(i+7;

t(2*i + 1)<= a(i+79);

end generate;

t(162)<= a(159);

xor1<= a(160) xor a(161);

xor2<= a(161) xor a(162);

u(0)<= a(160);

u(1)<= a(160) xor a(162);

u(2)<= a(161);

u(3)<= xor1;

u(4)<= a(82) xor a(160);

u(5)<= xor2;

u(6)<= a(83) xor xor1;

u(7)<= '0';

u(<= a(84) xor xor1;

u(9)<= '0';

u(10)<= a(85) xor xor2;

u(11)<= '0';

u(12)<= a(86) xor a(162);

u(13)<= '0';

vector_u: for i in 7 to 80 generate

u(2*i)<= a(i+80);

u(2*i + 1)<= '0';

end generate;

u(162)<= a(161);

xor_gates1: for j in 0 to 162 generate

s_plus_t(j)<= s(j) xor t(j);

end generate;

xor_gates2: for j in 0 to 162 generate

z(j)<= s_plus_t(j) xor u(j);

end generate;

end circuit;

This the the exact code I found online, I think. But it is for 163-bit. So it is difficult to test and verify. Can you please help me to convert it for a 5-bit to make me understand it?

Many Thanks!

Hmmm... I don't think I can help you understand the code above. The

purpose of the code you posted, or at least how it was derived, is not

clear to me. If it helps you any, I have replaced it with a version

containing more white space for clarity.

Polynomial arithmetic is not my strong suit, but it seems familiar, so I

must have done it somewhere, sometime. Maybe it was that class in

multivalued logic which was actually a thinly disguised course in

abstract algebra taught in the EE department. Or more likely it is just

familiar from working with CRC calculations.

Here is my take on why you came up with the description of the formula

that you did. I am assuming that multiplication is the AND operation

and addition is the XOR operation. So '*' really means AND while '+'

really means XOR.

With that in mind here are some identities...

a(n) * a(n) = a(n)

a(n) * a(m) + a(m) * a(n) = a(n) * a(m) + a(n) * a(m) = 0

a(4 downto 0) is your input and z(8 downto 0) is your output.

a4, a3, a2, a1, a0 * a0

a4, a3, a2, a1, a0 * a1

a4, a3, a2, a1, a0 * a2

a4, a3, a2, a1, a0 * a3

a4, a3, a2, a1, a0 * a4

+ ----------------------------------

z8, z7, z6, z5, z4, z3, z2, z1, z0

z0 = a0 * a0 = a0

z1 = a0 * a1 + a1 * a0 = 0

z2 = a0 * a2 + a1 * a1 + a2 * a0 = a1

z3 = a0 * a3 + a1 * a2 + a2 * a1 + a3 * a0 = 0

z4 = a0 * a4 + a1 * a3 + a2 * a2 + a3 * a1 + a4 * a0 = a2

z5 = a1 * a4 + a2 * a3 + a3 * a2 + a4 * a1 = 0

z6 = a2 * a4 + a3 * a3 + a4 * a2 = a3

z7 = a3 * a4 + a4 * a3 = 0

z8 = a4 * a4 = a4

So this shows (at least for this case) the square of a polynomial *is*

represented by the formula you gave at the beginning (which includes one

more bit than needed).

'If we do AxA then the output result will be "0 a4 0 a3 0 a2 0 a1 0 a0"'

So here is the code for your square...

output_even: for i in 0 to 4 generate

z(2*i-1)<= a(i);

end generate;

output_odd: for i in 0 to 3 generate

z(2*i)<= '0';

end generate;

Replace the constants with the appropriate parameters and I expect you

can make a general function.

function poly_square (arg : std_logic_vector) return std_logic_vector is

constant arg_Hi : integer := arg'HIGH;

constant arg_Lo : integer := arg'LOW;

constant arg_Len : integer := arg'LENGTH;

variable prod : std_logic_vector ((2 * (arg_Len - 1)) downto 0)

:= (others => '0');

begin

for i in arg'range loop

prod(2*(i-arg_Lo)-1) := arg(i);

end loop;

return prod;

end poly_square;

...

signal A : std_logic_vector (4 downto 0);

signal B : std_logic_vector (8 downto 0);

...

B<= poly_square (A);

Again, not tested so there are likely errors.

--

Rick

It is a bit confusing to me.

Can I have your email id please? I can send you a relevant paper for the circuit with the algorithm. May be you will be able to understand it. I am unable to find more information related to it.

vector_s: for i in 0 to 80 generate s(2*i)<= a(i); s(2*i + 1)<= a(i+82); end generate;

s(162)<= a(81);

Do you understand what this code is doing? It is generating the signal

s() from the signal a(). Is that clear?

vector_t1: for j in 0 to 6 generate t(j)<= '0'; end generate;

t(7)<= a(82);

vector_t2: for i in 4 to 80 generate t(2*i)<= a(i+7; t(2*i + 1)<= a(i+79); end generate;

t(162)<= a(159);

This code generates t() from a().

Thanks!

I don't get it either. What are a(), s(), t() and z()? Is this

supposed to be squaring a() to get z()? If so, why is a() the same

length as z()? I can't find any of this code using Google.

I got your other email which was largely the same as your earlier post I

think. I prefer to discuss this here. There may be others who would

like to understand this or who can explain it. In fact, you might try

explaining what you are doing and ask how to do this in other groups

such as comp.dsp. Working from an undocumented code section is not a

great way to understand an algorithm.

I can't tell you anything about the code you posted. I have no idea why

they are doing all the calculations they are doing. I can read the

VHDL, but I can't read the mind of the person who wrote it.

I might be able to help you figure this out if you give more background.

What are you trying to do? What is the bigger picture? There is

often more than one way to skin a cat. If I understand what you are

trying to do with polynomials I think I have already explained what you

need to do to square a() and given you code to do it. If I don't

understand, perhaps you can explain better?

BTW, when you use Google Groups to post in newsgroups you need to fix

the quoted material and after a couple of quotes becomes unreadable.

--

Rick

Lets consider a 5 bit number, A = 10100

If we do the squaring of it then we will get,

A = 10100
x 10100
-------------
00000
00000
10100
00000
10100
-------------------
c= 100010000 (XOR operation is performed to add)

Now Irreducible polynomial is used to reduce it to 5-bit. (The aim is: if the input is out 5-bit then we need to reduce the output to 5-bit)

So for a 5-bit number the irreducible polynomial is , F(x) = x^5 + x^2 + 1 (we can write it in binary form as 100101) (It is a standard value)

Now both c and F(x) are added to reduce it to 5-bit. (from the MSB)

100010000 (value of c that we got after the squaring)
xor 100101 ( value of F(x) that we calculated)
-------------------
000111000 ( it is not 5-bit)

So now again do the xor operation to the result with the irreducible polynomial.

111000 (Do not consider 3 zeros from the left)
xor 100101 (irreducible polynomial)
---------------
011101 ( now it is reduced to a 5-bit number,(do not consider the zero at left side))

If you take a close look on the result then, we have taken A = 10100 and we got c = 100010000 (before reduction)

so simply we insert one zero between all the bits of A, then we will also get the same result.

c = 0 1 0 0 0 1 0 0 0 0 ( zeros are indicated)
| | | | |

If you remove the indicated zeros then tht value is equal to "A"

So now my main aim is to design a generalised code for 5-bit. Suppose I am giving a 5-bit input, A = a4 a3 a2 a1 a0

Then after squaring, I am getting a result C = 0-a4-0-a3-0-a2-0-a1-0-a0 ( zeros are added between all the bits of A)

Now I have to use same irreducible polynomial (100101) to reduce it to 5-bit.

rickman
Guest

Sat Jul 20, 2013 10:58 pm

Ok, we seem to be getting somewhere with this post. I am snipping
formatting problems.

On 7/20/2013 4:52 PM, lokesh kumar wrote:
Quote:

Lets consider a 5 bit number, A = 10100

If we do the squaring of it then we will get,

A = 10100
x 10100
-------------
00000
00000
10100
00000
10100
-------------------
c= 100010000 (XOR operation is performed to add)

This agrees with what we covered before.

Quote:
Now Irreducible polynomial is used to reduce it to 5-bit. (The aim is: if the input is out 5-bit then we need to reduce the output to 5-bit)

Ah! That's the part that was missing.

Quote:
So for a 5-bit number the irreducible polynomial is , F(x) = x^5 + x^2 + 1 (we can write it in binary form as 100101) (It is a standard value)

Now both c and F(x) are added to reduce it to 5-bit. (from the MSB)

Is this the same as dividing? I'd like to understand the theory behind
this... or maybe not... ;^) lol

But seriously, this rings a bell that division and multiplication are
similar if not the same in polynomial arithmetic.

Quote:

100010000 (value of c that we got after the squaring)
xor 100101 ( value of F(x) that we calculated)
-------------------
000111000 ( it is not 5-bit)

So now again do the xor operation to the result with the irreducible polynomial.

111000 (Do not consider 3 zeros from the left)
xor 100101 (irreducible polynomial)
---------------
011101 ( now it is reduced to a 5-bit number,(do not consider the zero at left side))

If you take a close look on the result then, we have taken A = 10100 and we got c = 100010000 (before reduction)

so simply we insert one zero between all the bits of A, then we will also get the same result.

c = 0 1 0 0 0 1 0 0 0 0 ( zeros are indicated)
| | | | |

Yes, this part actually doesn't require any hardware, just a bit of code
to assign different wires.

Quote:
If you remove the indicated zeros then tht value is equal to "A"

So now my main aim is to design a generalised code for 5-bit. Suppose I am giving a 5-bit input, A = a4 a3 a2 a1 a0

Then after squaring, I am getting a result C = 0-a4-0-a3-0-a2-0-a1-0-a0 ( zeros are added between all the bits of A)

Now I have to use same irreducible polynomial (100101) to reduce it to 5-bit.

Yes, more or less. Applying the irreducible polynomial (ip) to the
result is not hard to do, but the iteration can be done a couple of
ways. Is there a way to *know* how many times it must be applied?
Obviously this is what the code you provided is doing. I'm guessing
there must be some way of knowing how many times the ip must be applied
and perhaps even it is always the same number? If so, I can easily
generate the code similar to the 163 bit wide solution. If not, the
code will have to loop on applying the ip I think.

We could have a value of c = 101010101

applying the ip
101010101
100101 - once
001111101
100101 - twice
0110111
100101 - three times
010010 result!

This took three iterations and the number is not fixed although I think
we can set a max limit at three. So we will need to loop and either
exit conditionally or loop for the max count and conditionally execute
the XOR. Better, just make the 1's in each ip the value of the msb in
the appropriate bit of each intermediate c value. I bet if you dig into
the 163 bit version that is what they are doing. I can't figure it out
myself. First thing they do is to fold the input vector so the msbs are
in the bit positions that would be filled with zeros. Then they
generate vectors s, t and u which are all then XOR'd together to produce
the output.

Do you have a preference about whether this is clocked in a register and
takes multiple clock cycles or goes through successive stages of logic
without registers or clocks?

Any reason to generalize this for N bit wide inputs? I think we have
all the pieces now so we could do that as long as we can define the ip
for each value of N. Just doing it for N=5 is simpler of course.

If you do this normalization on a few test values to make sure it gives
the right answer and tell me if you want it as straight logic,
registered iteratively or pipelined, I'll give you an implementation.

--

Rick

lokesh kumar
Guest

Sun Jul 21, 2013 1:57 am

Yes, I am also thinking there must be some ways to know about the number of loops. We can calculate small numbers like 5-bit or 6-bit by using our hand. But it is very hard to calculate for 163-bit.

In your last message you told that, 3 loops are used to reduced. But it is not correct. After the first loop, you are getting two zeros in the beginning. So we do not need to apply the reduction polynomial value.So the 2nd loop is not needed. We can directly consider 1111101 and XOR with 100101 to get the 5-bit number.

http://www-brs.ub.ruhr-uni-bochum.de/netahtml/HSS/Diss/KumarSandeepS/diss.pdf

I am quite not sure if you can access this link.

If you are unable, then please google "elliptic curve cryptography for constrained devices" and check out the first link. On page number 132 (figure 8..2), the multiplier circuit for 163 bit is given. The reduction polynomial for 163 bit is, F(x) = x^163 + x^7 + x^6 + x^3 +1

So if you closely take a look on the circuit, then you can see that almost all the parts are similar. But there are 3 extra XOR gates are connected. The first extra XOR gate is connected after C2 register ( it is for x^3 which is a part of irreducible polynomial). The second extra XOR gate is connected after C5 register ( for x^6 which is a part of irreducible polynomial) and the third extra XOR gate is connected after C6 register ( for x^7 term in the polynomial). The output of C162 is directly connected to the XOR gate before C0. Because we do not need to add XOR gates for x^162 and 1.

Similarly for a 5-bit number, the reduction polynomial is F(x) = x^5 + x^2 + 1
So in this case we need to connect only one extra XOR gate after C1 (or before C0). And the output from c4 is directly connected to the XOR gate before C0.

In this case the reduction polynomial is attached. So the result if we take both A and B are of 5-bit numbers, then we will get the 5-bit number as an output.
This is the basic operation for the multiplier.

Now please look at the squaring circuit on page 133 (figure 8.3). I am unable to understand the circuit. I am not sure if the reduction polynomial is attached here. Please have a look on it and let me know if you understand.

-
Lokesh

lokesh kumar
Guest

Sun Jul 21, 2013 2:04 am

On Sunday, July 21, 2013 4:28:57 AM UTC+5:30, rickman wrote:
Quote:
Ok, we seem to be getting somewhere with this post. I am snipping

formatting problems.

On 7/20/2013 4:52 PM, lokesh kumar wrote:

Lets consider a 5 bit number, A = 10100

If we do the squaring of it then we will get,

A = 10100

x 10100

-------------

00000

00000

10100

00000

10100

-------------------

c= 100010000 (XOR operation is performed to add)

This agrees with what we covered before.

Now Irreducible polynomial is used to reduce it to 5-bit. (The aim is: if the input is out 5-bit then we need to reduce the output to 5-bit)

Ah! That's the part that was missing.

So for a 5-bit number the irreducible polynomial is , F(x) = x^5 + x^2 + 1 (we can write it in binary form as 100101) (It is a standard value)

Now both c and F(x) are added to reduce it to 5-bit. (from the MSB)

Is this the same as dividing? I'd like to understand the theory behind

this... or maybe not... ;^) lol

But seriously, this rings a bell that division and multiplication are

similar if not the same in polynomial arithmetic.

100010000 (value of c that we got after the squaring)

xor 100101 ( value of F(x) that we calculated)

-------------------

000111000 ( it is not 5-bit)

So now again do the xor operation to the result with the irreducible polynomial.

111000 (Do not consider 3 zeros from the left)

xor 100101 (irreducible polynomial)

---------------

011101 ( now it is reduced to a 5-bit number,(do not consider the zero at left side))

If you take a close look on the result then, we have taken A = 10100 and we got c = 100010000 (before reduction)

so simply we insert one zero between all the bits of A, then we will also get the same result.

c = 0 1 0 0 0 1 0 0 0 0 ( zeros are indicated)

| | | | |

Yes, this part actually doesn't require any hardware, just a bit of code

to assign different wires.

If you remove the indicated zeros then tht value is equal to "A"

So now my main aim is to design a generalised code for 5-bit. Suppose I am giving a 5-bit input, A = a4 a3 a2 a1 a0

Then after squaring, I am getting a result C = 0-a4-0-a3-0-a2-0-a1-0-a0 ( zeros are added between all the bits of A)

Now I have to use same irreducible polynomial (100101) to reduce it to 5-bit.

Yes, more or less. Applying the irreducible polynomial (ip) to the

result is not hard to do, but the iteration can be done a couple of

ways. Is there a way to *know* how many times it must be applied?

Obviously this is what the code you provided is doing. I'm guessing

there must be some way of knowing how many times the ip must be applied

and perhaps even it is always the same number? If so, I can easily

generate the code similar to the 163 bit wide solution. If not, the

code will have to loop on applying the ip I think.

We could have a value of c = 101010101

applying the ip

101010101

100101 - once

001111101

100101 - twice

0110111

100101 - three times

010010 result!

This took three iterations and the number is not fixed although I think

we can set a max limit at three. So we will need to loop and either

exit conditionally or loop for the max count and conditionally execute

the XOR. Better, just make the 1's in each ip the value of the msb in

the appropriate bit of each intermediate c value. I bet if you dig into

the 163 bit version that is what they are doing. I can't figure it out

myself. First thing they do is to fold the input vector so the msbs are

in the bit positions that would be filled with zeros. Then they

generate vectors s, t and u which are all then XOR'd together to produce

the output.

Do you have a preference about whether this is clocked in a register and

takes multiple clock cycles or goes through successive stages of logic

without registers or clocks?

Any reason to generalize this for N bit wide inputs? I think we have

all the pieces now so we could do that as long as we can define the ip

for each value of N. Just doing it for N=5 is simpler of course.

If you do this normalization on a few test values to make sure it gives

the right answer and tell me if you want it as straight logic,

registered iteratively or pipelined, I'll give you an implementation.

--

Rick

I think it is possible. We can use the shift registers and then using the irreducible polynomial.

rickman
Guest

Sun Jul 21, 2013 3:24 am

On 7/20/2013 7:57 PM, lokesh kumar wrote:
Quote:
Yes, I am also thinking there must be some ways to know about the number of loops. We can calculate small numbers like 5-bit or 6-bit by using our hand. But it is very hard to calculate for 163-bit.

In your last message you told that, 3 loops are used to reduced. But it is not correct. After the first loop, you are getting two zeros in the beginning. So we do not need to apply the reduction polynomial value.So the 2nd loop is not needed. We can directly consider 1111101 and XOR with 100101 to get the 5-bit number.

I think you didn't look at my example carefully enough. If a = 11111
then c = 101010101 which is 9 bits. The IP is used once which removes
the one in the msb and as you say, the next bit is a zero which is not
changed. So we get two bits in the first step. But that still leaves
us with a seven bit number and each further application of the IP only
removes a single bit from the resulting value of c.

Quote:
http://www-brs.ub.ruhr-uni-bochum.de/netahtml/HSS/Diss/KumarSandeepS/diss.pdf

I am quite not sure if you can access this link.

Yes, I downloaded it and can open it, just not in Acrobat, I have an
older copy which won't open this file, I have to use SumatraPDF reader.

Quote:
If you are unable, then please google "elliptic curve cryptography for constrained devices" and check out the first link. On page number 132 (figure 8..2), the multiplier circuit for 163 bit is given. The reduction polynomial for 163 bit is, F(x) = x^163 + x^7 + x^6 + x^3 +1

I think you mean figure 8.1 on page 132? That is what I am seeing.

Quote:
So if you closely take a look on the circuit, then you can see that almost all the parts are similar. But there are 3 extra XOR gates are connected. The first extra XOR gate is connected after C2 register ( it is for x^3 which is a part of irreducible polynomial). The second extra XOR gate is connected after C5 register ( for x^6 which is a part of irreducible polynomial) and the third extra XOR gate is connected after C6 register ( for x^7 term in the polynomial). The output of C162 is directly connected to the XOR gate before C0. Because we do not need to add XOR gates for x^162 and 1.

Similarly for a 5-bit number, the reduction polynomial is F(x) = x^5 + x^2 + 1
So in this case we need to connect only one extra XOR gate after C1 (or before C0). And the output from c4 is directly connected to the XOR gate before C0.

In this case the reduction polynomial is attached. So the result if we take both A and B are of 5-bit numbers, then we will get the 5-bit number as an output.
This is the basic operation for the multiplier.

Now please look at the squaring circuit on page 133 (figure 8.3). I am unable to understand the circuit. I am not sure if the reduction polynomial is attached here. Please have a look on it and let me know if you understand.

I assume you mean figure 8.2 on page 133.

Sorry, I can follow your directions above, but I understand this less
than you do. However, I think there is a lot not shown in Figure 8.2 on
page 133. I guess most, if not each output bit has at least one XOR
gate. I am guessing that they have manually "unrolled" the loop of the
calculation and it distills to a series of XOR gates on each output. In
fact, I seem to recall doing exactly that with a CRC computation once.
We had to calculate CRC on some 32 bits of input at a time. I took the
single bit calculation and applied it 32 times algebraically to produce
a fairly messy, but accurate result. We can do that in VHDL without
manually calculating the coefficients. We just apply the IP ANDed with
the corresponding bit of the temp variable and the calculations will be
implemented.

So if you want a fully combinatorial circuit like they are describing in
the squaring circuit of Figure 8.2 I will write a loop of combinatorial
logic. I can only think the complication of the code you provided where
they copy other higher order bits into the "zero" bits of c must be a
way to simplify the circuit and save the logic of the zero bits.
Although things like zero inputs are often optimized automatically in
FPGA design tools making this unnecessary.

--

Rick

rickman
Guest

Sun Jul 21, 2013 3:25 am

On 7/20/2013 7:57 PM, lokesh kumar wrote:
Quote:

Now please look at the squaring circuit on page 133 (figure 8.3). I am unable to understand the circuit. I am not sure if the reduction polynomial is attached here. Please have a look on it and let me know if you understand.

I almost forgot. To test the circuit will require that some examples be
provided with solutions. Can you come up with enough examples that
testing with those will give you confidence the circuit works correctly?

--

Rick

lokesh kumar
Guest

Sun Jul 21, 2013 12:26 pm

On Sunday, July 21, 2013 8:55:35 AM UTC+5:30, rickman wrote:
Quote:
On 7/20/2013 7:57 PM, lokesh kumar wrote:

Now please look at the squaring circuit on page 133 (figure 8.3). I am unable to understand the circuit. I am not sure if the reduction polynomial is attached here. Please have a look on it and let me know if you understand.

I almost forgot. To test the circuit will require that some examples be

provided with solutions. Can you come up with enough examples that

testing with those will give you confidence the circuit works correctly?

--

Rick

I think we can start with a small number like a 5-bit number. It would be easy to check.But for 163-bit, it is very difficult to check the result.So if we get an appropriate result for 5-bit then we can design 163-bit by using the same logic.As per your last message, if you can design a full combinatorial circuit for a 5-bit number by using the squaring circuit of figure 8..2, then we can check the results with several examples.If the circuit works fine then we can proceed for the implementation of 163-bit.

We can take the reduction polynomial as F(x)= x^5 + x^2 + 1 for a 5-bit number.

---
Lokesh

lokesh kumar
Guest

Mon Jul 22, 2013 8:21 pm

Hi Rick,

Finally I got the logic properly. And as per your comment I am sending you some more examples. Please have a look on the logic.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Example (Part-1)
----------------
Suppose we consider a 5-bit number, A = 11010
If we do the square of the number then,

AxA = 11010
11010
---------
00000
11010
00000
11010
11010
---------------
C= 101000100 (9-bit) (XOR operation is performed to add in this case, NOT the binary addition )

But we can make it a 10-bit number by adding a zero as MSB.
Hence we can write, C= 0101000100 (Now its a 10-bit number and the value is not changed)

Now if we can obtain the same value of C, only by adding adding zeros between all the bits of A.
In the example, A = 11010
So after adding zeros between all the bits, then we will get

1 0 1 0 0 0 1 0 0
| | | |

It is the value of C that we calculated earlier. Its of 9-bit.
But we can add one extra zero at the MSB to make it 0101000100 ( Now it is of 10-bits)

Or we can directly write

0 1 0 1 0 0 0 1 0 0
| | | | |
Like this we do not need to multiply A, 2 times if we need to calculate the square.
We can simply add zeros between all the bits of A to get the square.
--------------------------------------------------------------------------------
Step-1 (For design)

So in general lets take

A = a4a3a2a1a0 ( 5-bit number)

If we take the square of the number, then we will get

C = c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ( it is of 10 bit)

C = 0 a4 0 a3 0 a2 0 a1 0 a0 ( This is how zeros are added and value of C is calculate)

-----------------------------------------------------------------------------------------------
Example (part-2)
----------------
Now our main aim is to reduce the 10-bit result to 5-bit.
For that purpose, we use reduction polynomial (it is a standard value)

The reduction polynomial for a 5-bit number is : F(x) = x^5 + x^2 + 1

We can write the reduction polynomial as 100101 in binary form.
To reduce,

0101000100 (10-bit number, that we obtained)
10101 (reduction polynomial)
---------------
0000010100 (XOR operation)

Hence the first five MSB are zero, then if we do not consider these bits. Then it is a 5-bit number. It is our final result
--------------------------------------------------------------------------------------------------------
Step-2 (For design)

The value of reduction polynomial is = 100101

We have already got the 10-bit result.
We have to do the XOR operation of the 10-bit result with the reduction polynomial in the same way (as shown is the example)

For that purpose, we need to see if the MSB of the 10-bit result is "1" or not. If it is "1" then we will do the XOR operation with the reduction polynomial.
If the MSB of 10-bit number is zero, then we need to check the second MSB of it. If the second MSB is "1" then we need to do the XOR operation there.

We need to do this loop till we obtain the 5-bit result.

Now this is the final result.
--------------------------------------------------------------------------------
I have explained you the logic with an example. I have started with an example. I have used example (1st part) to show the result after squaring. And in Step-1, I have described the general logic. In example (part-2) , I have shown the reduction of the result to 5-bit. And in Step-2, I have provided the general logic.
So basically we need to follow both step-1 and step-2.

Now I hope everything is clear. Please help me to design the code for it now. Hope this time we will be able to do it.

Many Thanks!
Lokesh

lokesh kumar
Guest

Mon Jul 22, 2013 9:17 pm

Example (Part-1)
----------------
Suppose we consider a 5-bit number, A = 11010
If we do the square of the number then,

AxA = 11010
11010
---------
00000
11010
00000
11010
11010
---------------
C= 101000100 (9-bit) (XOR operation is performed to add in this case, NOT the binary addition )

But we can make it a 10-bit number by adding a zero as MSB.
Hence we can write, C= 0101000100 (Now its a 10-bit number and the value is not changed)

Now if we can obtain the same value of C, only by adding adding zeros between all the bits of A.
In the example, A = 11010
So after adding zeros between all the bits, then we will get

1 0 1 0 0 0 1 0 0
| | | |

It is the value of C that we calculated earlier. Its of 9-bit.
But we can add one extra zero at the MSB to make it 0101000100 ( Now it is of 10-bits)

Like this we do not need to multiply A, 2 times if we need to calculate the square.
We can simply add zeros between all the bits of A to get the square.
--------------------------------------------------------------------------------
Step-1 (For design)

So in general lets take

A = a4a3a2a1a0 ( 5-bit number)

If we take the square of the number, then we will get

C = c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ( it is of 10 bit)

C = 0 a4 0 a3 0 a2 0 a1 0 a0 ( This is how zeros are added and value of C is calculate)

-----------------------------------------------------------------------------------------------
Example (part-2)
----------------
Now our main aim is to reduce the 10-bit result to 5-bit.
For that purpose, we use reduction polynomial (it is a standard value)

The reduction polynomial for a 5-bit number is : F(x) = x^5 + x^2 + 1

We can write the reduction polynomial as 100101 in binary form.
To reduce,

0101000100 (10-bit number, that we obtained)
10101 (reduction polynomial)
---------------
0000010100 (XOR operation)

Hence the first five MSB are zero, then if we do not consider these bits. Then it is a 5-bit number. It is our final result
--------------------------------------------------------------------------------------------------------
Step-2 (For design)

The value of reduction polynomial is = 100101

We have already got the 10-bit result.
We have to do the XOR operation of the 10-bit result with the reduction polynomial in the same way (as shown is the example)

For that purpose, we need to see if the MSB of the 10-bit result is "1" or not. If it is "1" then we will do the XOR operation with the reduction polynomial.
If the MSB of 10-bit number is zero, then we need to check the second MSB of it. If the second MSB is "1" then we need to do the XOR operation there.

We need to do this loop till we obtain the 5-bit result.

Now this is the final result. How can I design the code? Please help me to implement.

Many Thanks!
Lokesh

Guest

Mon Feb 03, 2020 2:45 am

There has to be something for there to be nothing

Goto page Previous  1, 2

elektroda.net NewsGroups Forum Index - VHDL Language - Squaring of a binary number