Here is a new definition for keyword "if_2", version 2.

Weng Tianxiang


It was developed from the many discussions that followed my first post: "New keyword "if_2" is suggested for dealing with 2-write port memory."

The new keyword "if_2" moves m-write, n-read memory modules out of the chip manufacturers' toolboxes and into the HDL itself. With it, any m-write, n-read memory module can be fully specified in HDL with very simple code; circuit designers need no special techniques, no special knowledge of memory modules, and no instantiated memory modules. All the related complexity is left to the synthesizer vendors.

if_2_statement ::=
    [ if_2_label : ]
        if_2 condition then
            sequence_of_statements
        { elsif condition then
            sequence_of_statements }
        [ else
            sequence_of_statements ]
        end if [ if_2_label ] ;

1. When the target of an assignment statement in a sequence_of_statements under an if_2 statement is an array, the assignment is an independent write to a memory. That write must be executed and does not obey the statement order of the process, regardless of how many other writes to the same array are coded before or after it.

2. When the target of an assignment statement in a sequence_of_statements under an if_2 statement is a non-array signal, the assignment obeys the statement order of the process.

3. An if statement nested under an if_2 statement is treated as an if_2 statement.

4. An if_2 statement may appear only within a clocked process.


Here is a code example that specifies a 3-write, 2-read memory module:

p1: process(CLK) is
begin
    if CLK'event and CLK = '1' then
        if C1 then
            An_Array(a) <= D1; -- first write to array An_Array
        end if;

        if_2 C2 then
            An_Array(b) <= D2; -- second write to array An_Array
        end if;

        if_2 C3 then
            An_Array(c) <= D3; -- third write to array An_Array
        end if;

        X <= An_Array(j); -- first read from array An_Array
        Y <= An_Array(k); -- second read from array An_Array
    end if;
end process;

Special thanks to the creative respondents: those who mentioned the keyword "if_3", the one who gave me the Cyclone specification and discussed it with me in depth, and Hans from HT-Lab, whose implementation of an 8-write, 8-read memory for a CPU chip made a deep impression on me long before this idea was born.

Weng
 
On Thursday, September 26, 2019 at 11:42:35 PM UTC-4, Weng Tianxiang wrote:
Here is new definition for keyword "if_2", version 2.

Same comments as with the first 'definition' which is that it provides no benefit to anyone that uses VHDL and it expands the keyword list of the standard without providing any benefit. Good luck with that. Now that VHDL-2019 is near the end of the finish line, VHDL-2030 won't be far behind, that will be your next opportunity.

Each instance of 'if_2' in your example process can be replaced with today's 'if' and the example process works with every VHDL standard that has been released to date. So, if 'if_2' ever became part of the standard, then anyone who would use it is locking themselves into requiring use of a particular standard when it is not needed. That is poor design practice. I guess users will just have to muddle through by typing the exact same thing except for the needless '_2'.
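A minimal sketch of that point, using only standard VHDL: the process from the first post with every 'if_2' replaced by 'if'. The entity name mem_3w2r, the 8-bit data width, the 16-word depth, and the port types are assumptions added to make the example self-contained; they are not from the original post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity; names, widths and depth are assumptions.
entity mem_3w2r is
    port (
        CLK        : in  std_logic;
        C1, C2, C3 : in  boolean;                        -- write enables
        a, b, c    : in  natural range 0 to 15;          -- write addresses
        j, k       : in  natural range 0 to 15;          -- read addresses
        D1, D2, D3 : in  std_logic_vector(7 downto 0);   -- write data
        X, Y       : out std_logic_vector(7 downto 0)    -- read data
    );
end entity;

architecture rtl of mem_3w2r is
    type array_t is array (0 to 15) of std_logic_vector(7 downto 0);
    signal An_Array : array_t;
begin
    p1 : process (CLK) is
    begin
        if CLK'event and CLK = '1' then
            if C1 then
                An_Array(a) <= D1;  -- first write
            end if;
            if C2 then
                An_Array(b) <= D2;  -- second write
            end if;
            if C3 then
                An_Array(c) <= D3;  -- third write
            end if;
            X <= An_Array(j);       -- first read (registered)
            Y <= An_Array(k);       -- second read (registered)
        end if;
    end process;
end architecture;
```

In simulation the assignments are evaluated in order, so if two enabled writes target the same address in the same clock cycle, the later assignment in the process wins. Whether a given synthesis tool maps this description onto RAM blocks or onto flip-flops is tool-dependent.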

Kevin Jennings
 
I downloaded someone's key code from pastebin.com and copied it here for ease of discussion. He claims that the following code describes a 10-write, 10-read memory module, but he admits that a synthesizer does not handle it well.

architecture example of memtest is
    ...
    shared variable memory : memory_t; -- !!!
begin
    blks : for i in 0 to PORT_COUNT-1 generate
        memport : process(clocks(i))
        begin
            if rising_edge(clocks(i)) then
                if stbs(i) = '1' then
                    memory(addrs(i)) := writes(i);
                end if;
                reads(i) <= memory(addrs(i));
            end if;
        end process;
    end generate;
end architecture;

Does it mean a 10-write and 10-read port memory module?

I really don't understand what the code means or how a synthesizer handles it, and I hope some experts can explain it to me further.

In fact, he says that "The fact that this probably wouldn't go so well when you ran it comes down to the synthesizer not the language." That falls well short of what you claimed earlier: that the method for generating an n-write, m-read memory module is well established in HDL grammar. If well-defined, grammatical code cannot be handled well by a synthesizer, can I believe what you say?

If it really is a 10-port memory module, why does a synthesizer not handle it well?

Thank you.

Weng
 
On Friday, September 27, 2019 at 7:25:33 PM UTC-4, Weng Tianxiang wrote:
Does it mean a 10-write and 10-read port memory module?

Yes, it describes a single memory with N ports where N is defined by PORT_COUNT. What part of the code do you find confusing?

There is no shortcoming in the language. This code describes the memory properly. If a synthesizer can't synthesize this code for 10 ports that is a problem with the synthesizer, not the language. I'm willing to bet you will have a hard time finding a library module for a 10 port memory.

If you don't understand the language enough to know this code describes an N port memory, you really are not in a position to tell the rest of us how the language should be changed to accommodate your lack of understanding.
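For readers who want to compile the snippet, here is one possible self-contained version. The package memtest_pkg, the 8-bit word, the 16-word depth, and the port types are assumptions; the posted code only shows the architecture body. Note that a shared variable of an ordinary (non-protected) type is VHDL-93 style. VHDL-2002 and later require shared variables to be of a protected type, so some tools may need a VHDL-93 or relaxed-checking mode to accept it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: all sizes and type names below are assumptions.
package memtest_pkg is
    constant PORT_COUNT : positive := 10;
    constant DEPTH      : positive := 16;
    subtype word_t is std_logic_vector(7 downto 0);
    subtype addr_t is natural range 0 to DEPTH - 1;
    type memory_t   is array (addr_t) of word_t;
    type word_vec_t is array (0 to PORT_COUNT - 1) of word_t;
    type addr_vec_t is array (0 to PORT_COUNT - 1) of addr_t;
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.memtest_pkg.all;

entity memtest is
    port (
        clocks : in  std_logic_vector(0 to PORT_COUNT - 1); -- one clock per port
        stbs   : in  std_logic_vector(0 to PORT_COUNT - 1); -- write strobes
        addrs  : in  addr_vec_t;                            -- one address per port
        writes : in  word_vec_t;                            -- write data per port
        reads  : out word_vec_t                             -- read data per port
    );
end entity;

architecture example of memtest is
    -- VHDL-93 style shared variable (not a protected type).
    shared variable memory : memory_t;
begin
    -- One process per port; every process accesses the same storage.
    blks : for i in 0 to PORT_COUNT - 1 generate
        memport : process (clocks(i))
        begin
            if rising_edge(clocks(i)) then
                if stbs(i) = '1' then
                    memory(addrs(i)) := writes(i);
                end if;
                -- The variable is updated immediately, so a port reads
                -- back its own write in the same cycle.
                reads(i) <= memory(addrs(i));
            end if;
        end process;
    end generate;
end architecture;
```

Each port addresses the memory independently, which is what makes this a description of an N-port memory. When two ports write the same address on the same clock edge, the result depends on the order in which the simulator runs the processes, which the language does not define.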

--

Rick C.

- Get 2,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209
 
On Friday, September 27, 2019 at 5:05:55 PM UTC-7, Rick C wrote:
There is no shortcoming in the language. This code describes the memory properly. If a synthesizer can't synthesize this code for 10 ports that is a problem with the synthesizer, not the language. I'm willing to bet you will have a hard time finding a library module for a 10 port memory.

Rick,
I think you drew your conclusion too early.

Here is how the code's author responded:

The synthesizer doesn’t do well because any additional port to a memory makes it exponentially harder to implement.

You can implement a multi-ported memory in 3 ways:

1. A real, physically designed hard memory block. These are the dual-ported memories of FPGAs.

2. Flip-flop or latch-based arrays. These are very area inefficient.

3. Weird architectures that use dual-ported memories to build memories with a larger number of ports. That’s the paper that you linked to. It is extremely area inefficient as well.

In practice, designers avoid multi-ported memories like the plague because they are very costly. It has nothing to do with language features. As shown above: writing the RTL for a 10-ported memory is trivial. You don’t need new keywords for it.

For example: a 10 ported read/write memory would require on the order of 100 RAMs using the paper that you linked to.

That is why synthesis tools don’t infer them: you’d give designers a lot of rope to hang themselves with a feature for which there is no demand.

Rick,
After seeing the code author's response, do you have any new thoughts?

Weng
 
On Friday, September 27, 2019 at 9:21:02 PM UTC-4, Weng Tianxiang wrote:
After seeing the code author's response do you have any new idea?

I'm not clear on what your points are. I don't see anything in this post that contradicts anything I've said. What did I say that you are addressing?

BTW, it is hard to follow the conversation when you keep starting new threads on the same topic.

--

Rick C.

 
On 28/09/2019 00:25, Weng Tianxiang wrote:
If it really is a 10-port memory module, why does a synthesizer not handle it well?

Because there are no suitable primitives for the synthesis tool to map
to. This is not to say the synthesis vendor couldn't infer a decuple
(had to look this up) port memory block using existing techniques like
templates, attributes, synthesis directives, etc., but I suspect the number
of configurations would be too large for very little return.

As many others have told you, adding a new keyword to the language will
not make this any easier!

I would be interested to find out what circuit needs a true decuple port
memory block. Processor register files and network controllers require a
large number of read/write ports but I am sure it is not as high as 10.

Regards,
Hans
www.ht-lab.com


 
On Saturday, September 28, 2019 at 1:24:26 AM UTC-7, HT-Lab wrote:
I would be interested to find out what circuit needs a true decuple port
memory block. Processor register files and network controllers require a
large number of read/write ports but I am sure it is not as high as 10.

Hi Hans,

I remember you mentioning that you implemented an 8*8-port memory module using a technique from the paper "Efficient Multi-Ported Memories for FPGAs".

Can you share more details about your implementation and your experience with it? And what, in your opinion, is the best technique for designing a CPU register file?

In my project I need multiple 2-write, 2-read port memories; true dual-port memory does not meet my requirements. I estimate that I need 4 RAMs, each with 1 write port and 1 read port.

My project is still in the logic design stage and I have no problem simulating the logic. The current design relies on an array that can be read n times and written m times. When a process writes to an array several times, I guess a simulator writes the data to the addressed location each time it meets an assignment statement, which would guarantee that the last write is the valid one if the write addresses are the same.

The technique in the paper needs n*m RAM blocks if each RAM block has one write port and one read port. What role can a dual-port memory block play?

Thank you.

Weng
 
On Saturday, September 28, 2019 at 10:02:42 AM UTC-4, Weng Tianxiang wrote:
When a process writes to an array several times, I guess a simulator writes the data to the addressed location each time it meets an assignment statement, which would guarantee that the last write is the valid one if the write addresses are the same.

You think too much in terms of the HDL you have written. There is no way for the HDL to know the two addresses are equal, so the first/last thing doesn't enter into the matter. That is also why the suggested code to infer a multiple write port memory uses a shared variable and separate processes.

Remember that an HDL is a hardware description language. Exactly what hardware are you trying to describe? That is, how do you expect the tools to implement your multiple write port memory?

The fact that your code simulated means nothing if the code can't be synthesized to working hardware.

--

Rick C.

 
On 28/09/2019 15:02, Weng Tianxiang wrote:
...


Hi Hans,

I remember you mentioning that you implemented an 8*8-port memory module using a technique from the paper "Efficient Multi-Ported Memories for FPGAs".

Hi Weng,

I actually used the XOR variant (not multipumped) to implement a 4W8R
port. You can find the paper here:

http://fpgacpu.ca/multiport/FPGA2012-LaForest-XOR-Paper.pdf

and more papers on the main page:

http://fpgacpu.ca/multiport/

Can you share more details about your implementation and your experience with it? And what, in your opinion, is the best technique for designing a CPU register file?

That all depends on your design. In my case I could use the XOR variant
as I have a pipelined design where I could latch the register file's read
request early on in the pipeline and then in a later stage XOR with the
new results for the write request. The XOR variant is the most area efficient
but was the most complicated to add to my design (due to data hazards
and the fact that each write request also needs a read request).
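The XOR idea can be illustrated with a much-simplified behavioural sketch. This is not Hans's design and not the paper's exact architecture: the entity name xor_mem_2w1r, the 2W1R size, the widths, and the asynchronous bank reads are all assumptions chosen to keep the example short. Each write port owns one bank; a write stores the new data XORed with the other bank's current word at that address, and a read XORs both banks, which cancels the other bank's contribution.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-write, 1-read memory built from two single-writer banks.
entity xor_mem_2w1r is
    port (
        clk            : in  std_logic;
        we0, we1       : in  std_logic;
        waddr0, waddr1 : in  natural range 0 to 15;
        wdata0, wdata1 : in  std_logic_vector(7 downto 0);
        raddr          : in  natural range 0 to 15;
        rdata          : out std_logic_vector(7 downto 0)
    );
end entity;

architecture sketch of xor_mem_2w1r is
    type bank_t is array (0 to 15) of std_logic_vector(7 downto 0);
    -- Both banks start at zero, so every location initially reads as zero.
    signal bank0 : bank_t := (others => (others => '0'));
    signal bank1 : bank_t := (others => (others => '0'));
begin
    process (clk) is
    begin
        if rising_edge(clk) then
            -- Port 0 writes only bank0; it needs a read of bank1.
            if we0 = '1' then
                bank0(waddr0) <= wdata0 xor bank1(waddr0);
            end if;
            -- Port 1 writes only bank1; it needs a read of bank0.
            if we1 = '1' then
                bank1(waddr1) <= wdata1 xor bank0(waddr1);
            end if;
        end if;
    end process;

    -- bank0(A) xor bank1(A) is always the last value written to A.
    rdata <= bank0(raddr) xor bank1(raddr);
end architecture;
```

Every write here also reads the other bank, which matches the remark that each write request needs a read request. FPGA block RAMs typically have synchronous reads, so a real implementation has to pipeline that read, and that is where the data hazards come from. Simultaneous writes to the same address from both ports are not handled in this sketch.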

In my project I need multiple 2-write, 2-read port memories; true dual-port memory does not meet my requirements. I estimate that I need 4 RAMs, each with 1 write port and 1 read port.

In that case forget about the LaForest et al. paper and simply use one of the
core wizards like Intel's MegaWizard, Xilinx's Coregen etc. You get a 2W2R
area/speed optimised design with lots of configurable options.

My project is still in the logic design stage and I have no problem simulating the logic. The current design relies on an array that can be read n times and written m times. When a process writes to an array several times, I guess a simulator writes the data to the addressed location each time it meets an assignment statement, which would guarantee that the last write is the valid one if the write addresses are the same.

The core wizards give you the option to choose what should happen if you
read/write to the same address.

The technique in the paper needs n*m RAM blocks if each RAM block has one write port and one read port. What role can a dual-port memory block play?

Not sure what you are asking; you need DPRAMs as the basic building
block for a multi-port design. If you have the time I would suggest
implementing the various versions and seeing how they behave. I learned a lot
from it.

Good luck,
Hans
www.ht-lab.com


 
On Saturday, September 28, 2019 at 9:29:54 AM UTC-7, HT-Lab wrote:
...

Hans,
Thank you very much for your help and for sharing your experience with me; the 2 links are valuable. I will spend time reading those specifications and papers carefully.

Because VHDL already has the means to describe an n*m-port memory, my if_2 idea is meaningless and dead. I am sorry for the time you spent on those related posts. Of course, I learned a lot.

Thanks also to Rick and JK for your time.

Weng
 
On 28/09/2019 21:27, Weng Tianxiang wrote:
Because VHDL already has the means to describe an n*m-port memory, my if_2 idea is meaningless and dead. I am sorry for the time you spent on those related posts. Of course, I learned a lot.

No need to apologise, how many of us can say we have 4(?) granted US
patents. Please continue to share your ideas and questions.

Regards,
Hans.
www.ht-lab.com
 
On Sunday, September 29,
No need to apologise, how many of us can say we have 4(?) granted US
patents. Please continue to share your ideas and questions.


Hans,
Thank you for your experience, help and encouragement.

Your 4*8 memory module and 4-computer system are great achievements!

Weng
 
