EDK : FSL macros defined by Xilinx are wrong

On 8/13/15 9:44 PM, thomas.entner99@gmail.com wrote:
Again SDCC has few
developers, and at least recently, the most active ones don't seem that
interested in the PICs.

Back to the topic of the open FPGA tool chain, I think there would
be many "PICs", i.e. topics which are addressed by no / too few developers.

But the whole discussion is quite theoretical as long as A & X do
not open their bitstream formats. And I do not think that they will do
anything that will support an open-source solution, as software is the
main entry obstacle for FPGA startups. If there were a flexible
open-source tool chain with a large developer and user base that could be
ported to new architectures easily, it would make it much easier for
new competition. (Think gcc...)

Also (as mentioned above) I think that with the good and free tool chains
from the suppliers, there would not be much demand for such an open-source
tool chain. There are other points where I would see more
motivation, and even there not much is happening:
- A good open-source Verilog/VHDL editor (yes, I have heard of Emacs...),
as the integrated editors are average (Altera) or bad (Xilinx).
(Currently I am evaluating two commercial VHDL editors...)
- A kind of graphical editor for VHDL and Verilog, as the top/higher
levels of bigger projects are often a pain IMHO (like writing netlists
by hand). I would even start such a project myself if I had the time...

But even with such things, where I think there would be quite some demand,
the "critical mass" of the FPGA community is too low to get projects
started and especially to keep them running.

Thomas

One big factor against an open-source tool chain is that while the FPGA
vendors describe in general terms the routing inside the devices, the
precise details are not given, and I suspect that these details may be
considered part of the "secret sauce" that makes the device work. The
devices have gotten so big and complicated that it is impractical to
use fully populated muxes, and how you choose what gets to what is important.

Processors can also have little details like this, but for processors it
tends to just affect the execution speed, and a compiler that doesn't
take them into account can still do a reasonable job. For an FPGA,
without ALL the details for this you can't even do the routing.
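A rough back-of-the-envelope sketch of why fully populated muxes become impractical at this scale. The channel and mux sizes below are invented for illustration, not figures for any real device:

```python
import math

# Hypothetical routing channel -- every number here is an assumption.
N_SOURCES = 1024   # wires a destination input could in principle see
N_DESTS = 1024     # mux outputs (logic-cell inputs, wire drivers, ...)
SPARSE_WIDTH = 16  # sources actually wired to each destination's mux

# A fully populated mux needs a switch at every source/destination
# crossing; a sparse mux only at the few crossings it offers.
full_switches = N_SOURCES * N_DESTS
sparse_switches = SPARSE_WIDTH * N_DESTS

# Encoded select bits in the bitstream tell the same story.
full_bits = N_DESTS * math.ceil(math.log2(N_SOURCES))
sparse_bits = N_DESTS * math.ceil(math.log2(SPARSE_WIDTH))

print(full_switches, sparse_switches)  # 1048576 vs 16384
print(full_bits, sparse_bits)          # 10240 vs 4096
```

The 64x drop in switch count is the saving the vendors are chasing; deciding *which* 16 of the 1024 sources each mux sees is the unpublished part.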
 
On 8/13/2015 11:17 PM, Richard Damon wrote:
[...] the precise details are not given, and I suspect that these details
may be considered part of the "secret sauce" that makes the device
work. [...]

I'm not sure what details of routing aren't available. There may not
be a document which details it all, but last I saw, there were
chip-level design tools which allow you to see all of the routing and
interconnects. The delay info can be extracted from the timing analysis
tools. As far as I am aware, there is no "secret sauce".


Timing data in an FPGA may be difficult to extract, but otherwise I
think all the routing info is readily available.

--

Rick
 
On 8/14/15 1:29 PM, rickman wrote:
[...]

Timing data in an FPGA may be difficult to extract, but otherwise I
think all the routing info is readily available.

My experience is that you get to see what location a given piece of
logic is placed in, and which channels its signal travels through. You do
NOT see which particular wire in that channel is being used. In general,
each logic cell does not have routing to every wire in its channel, and
every wire does not have access to every cross wire. These details tend
to be the secret sauce, as when they do it well, you aren't supposed to
notice the incomplete connections.

I have had to work with the factory on things like this. I had a very
full FPGA and needed to make a small change. With the change I had some
badly congested routing, but if I removed all internal constraints the
fitter couldn't find a fit. Working with someone who did know the
details, we were able to relax just a few internal constraints and get
the design to fit. He did comment that my design was probably
the fullest design he had seen in the wild; we had grown to about 95%
logic utilization.
 
On 8/14/2015 9:32 PM, Richard Damon wrote:
[...]

Don't they still have the chip editor? That *must* show everything of
importance.


[...] we had grown to about 95% logic utilization.

Yeah, that's pretty full. I start to worry around 80%, but I've never
actually had one fail to route other than the ones I tried to help by
doing placement, lol.

--

Rick
 
On 8/14/15 10:59 PM, rickman wrote:
[...]

Don't they still have the chip editor? That *must* show everything of
importance.

The chip editors tend to show just the LOGIC resources, not the details
of the routing resources. The manufacturers tend to do a good job of
giving the details of the logic blocks you are working with, as this is
the part of the design you tend to specify. Routing, on the other hand,
tends not to be something you care about, just that the routing 'works'.
When they have done a good job of designing the routing you don't notice
it, but there have been cases where the routing turned out not quite
flexible enough, and you notice that you can't fill the device as well
before hitting routing issues.

[...]

Yeah, that's pretty full. I start to worry around 80%, but I've never
actually had one fail to route other than the ones I tried to help by
doing placement, lol.

They suggest that you consider 75-80% to be "Full". This design started
at the 70% level, but we were adding capability to the system and the
density grew. (And we were already using the largest chip for the
footprint.) Our next step was to redo the board and get the usage back
down. When we hit the issue we had a mostly working design but were
fixing the one last bug, and that was when the fitter threw its fit.
 
On Tuesday, August 18, 2015 at 11:35:55 AM UTC-4, rickman wrote:
I'm not sure what details of the routing the chip editors leave out.
You only need to know what is connected to what, through what and what
the delays for all those cases are.

If you're trying to implement an open source toolchain you would likely need to know *how* to specify those connections via the programming bitstream.

Kevin
 
On 8/15/2015 8:32 AM, Richard Damon wrote:
[...] The chip editors tend to just show the LOGIC resources, not the
details of the routing resources. [...]

I'm not sure what details of the routing the chip editors leave out.
You only need to know what is connected to what, through what, and what
the delays for all those cases are. Other than that, the routing does
just "work".


[...] They suggest that you consider 75-80% to be "Full". [...]

The "full" utilization number is approximate because it depends on the
details of the design. Some designs can get to higher utilization
numbers, others to less. As a way of pointing out that the routing is the
part of the chip that uses the most space while the logic is smaller,
Xilinx sales people used to say, "We sell you the routing and give you
the logic for free." The point is that the routing usually limits your
design rather than the logic. If you want to be upset about utilization
numbers, ask them how much of your routing gets used! It's *way* below
80%.

--

Rick
 
On 8/18/15 11:35 AM, rickman wrote:
[...]

I'm not sure what details of the routing the chip editors leave out. You
only need to know what is connected to what, through what and what the
delays for all those cases are. Other than that, the routing does just
"work".

Look closely. The chip editor will normally show you the exact logic
element you are using, with a precise location. The output will then go
out into a routing channel and on to the next logic cell(s) that
it goes to. It may even show you the various rows and columns of
routing it is going through. Those rows and columns are made of a
(large) number of distinct wires, with routing resources connecting
outputs to select lines and select lines being brought into the next
piece of routing/logic. Which wire is being used will not be indicated,
nor are all the wires interchangeable, so which wire is used can matter
for fitting. THIS is the missing information.
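This point can be modeled as a bipartite matching problem: each signal may only occupy a subset of the wires in a channel, so a channel with "enough" wires in total can still fail to fit. A toy sketch, with invented connectivity sets:

```python
# Toy channel-routing model: assigning signals to distinct wires when
# each signal can only reach some wires is bipartite matching.
def max_matching(can_use):
    """can_use[s] is the set of wires signal s may occupy.
    Returns how many signals get distinct wires (Kuhn's algorithm)."""
    wire_owner = {}

    def try_assign(s, visited):
        for w in can_use[s]:
            if w in visited:
                continue
            visited.add(w)
            # Take a free wire, or evict the owner if it can move.
            if w not in wire_owner or try_assign(wire_owner[w], visited):
                wire_owner[w] = s
                return True
        return False

    return sum(try_assign(s, set()) for s in range(len(can_use)))

# Four wires for three signals -- but in the congested case all three
# signals can only reach wires 0 and 1, so one signal cannot be routed.
congested = [{0, 1}, {0, 1}, {0, 1}]
flexible = [{0, 1}, {1, 2}, {2, 3}]
print(max_matching(congested))  # 2
print(max_matching(flexible))   # 3
```

Which wires each signal can reach is exactly the connectivity data the vendors don't publish.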

[...] If you want to be upset about utilization
numbers, ask them how much of your routing gets used! It's *way* below
80%.
And this is why they keep the real details of the routing proprietary
(not to keep you from getting upset). The serious design work goes into
figuring out how much routing they really need per cell. If they could
figure out a better allocation that let them cut the routing per cell by
10%, they could give you 10% more logic for free. If they goof and provide
too little routing, you see the resources that you were sold (since they
advertise the logic capability) as being wasted by some 'dumb design
limitation'. There have been families that got black eyes for having
routing problems, and thus should be avoided for 'serious' work.
 
On 8/18/2015 2:21 PM, KJ wrote:
On Tuesday, August 18, 2015 at 11:35:55 AM UTC-4, rickman wrote:

I'm not sure what details of the routing the chip editors leave out.
You only need to know what is connected to what, through what and what
the delays for all those cases are.

If you're trying to implement an open source toolchain you would likely need to know *how* to specify those connections via the programming bitstream.

Well... yeah. That's the sticky wicket, knowing how to generate the
bitstream. I think you missed the point of this subthread.

--

Rick
 
On 8/18/2015 9:40 PM, Richard Damon wrote:
[...] Which wire is being used will not be indicated,
nor are all the wires interchangeable, so which wire can matter for
fitting. THIS is the missing information.

I can't speak with total authority since I have not used a chip editor
in a decade. But when I have used them, they showed sufficient detail
that I could control every aspect of the routing. In fact, they showed
every routing resource in sufficient detail that the logic components
were rather small and a bit hard to see.

When you say which wire is used is not shown, how would you be able to
do manual routing if the details are not there? Manual routing and
logic editing are the purpose of the chip editors, no?


[...] And this is why they keep the real details of the routing
proprietary. [...]

I don't follow the logic. There are always designs that deviate from
the typical utilization in both directions. What details you can see
in the chip editor has nothing to do with user satisfaction,
since you can read the utilization numbers in the reports and don't need
to see any routing, etc.

--

Rick
 
On Tuesday, March 17, 2015 at 4:58:39 AM UTC+3, princesse91 wrote:
Hi Ahmed,
Can you tell me how you generated VHDL from Handel-C? I'm working on the conversion from C++ to VHDL with Handel-C but I don't know how to do it...
Thanks

It's very easy: use interfaces for inputs and outputs in your Handel-C code. Then change the debug option to VHDL or Verilog or EDIF or SystemC. Be sure to use Mentor Graphics DK Design Suite 5.
 
On Friday, 21 August 2015 18:36:59 UTC+1, ahmed...@gmail.com wrote:
[...] Be sure to use Mentor Graphics DK Design Suite 5.

Do you still have access to Mentor Graphics DK Design Suite 5? I thought this was now obsolete.
 
On 22/08/2015 23:14, martinjpearson wrote:
[...]

Do you still have access to Mentor Graphics DK Design Suite 5? I thought this was now obsolete.

I think DK is not (yet) obsolete but barely alive; the latest version is
5.4_1, released back in 2011. Nowadays anybody interested in
C/C++/SystemC synthesis will have a wide choice, from free tools to Catapult C.

Hans
www.ht-lab.com
 
On Sunday, 23 August 2015 08:50:13 UTC+1, HT-Lab wrote:
[...] I think DK is not (yet) obsolete but barely alive, the latest
version is 5.4_1, released back in 2011. [...]

Our site licence has now expired and Mentor will not renew it. Does anyone have any experience of Impulse C? I'm drawn to its CSP-based architecture.
 
On 08/26/2015 02:03 PM, David Brown wrote:
On 26/08/15 01:20, Johann Klammer wrote:
Hello,
How are typical CPLD input muxen built up?

(For other posters, a CPLD is a "complex programmable logic device". It
is a bit like a simple FPGA - there is no fixed dividing line between
them, but CPLDs tend to be built from a fairly small number of fairly
complex "macrocells" containing a flip-flop and a set of AND/OR trees
for logic operations, while FPGAs tend to have a large number of much
simpler cells and use lookup tables for logic.)


I was looking at the ATMEL .jed files and
their input muxen seem to be a series of sparse, one-
hot encoded bitfields with fewer bits
than inputs in total. So now I am wondering:
How are they distributed?
Just the name of the permutation problem might help, or
some other relevant search terms.


You are unlikely to be able to make sense of the programming file for
even the simplest of PLDs. It is not information that is published by
the manufacturers, making it almost impossible to figure out which bits
are used for the routing, the AND/OR tables, and other features. But it [...]

Their .jed files have comments... In this case I know which is which.
Guessing the AND-matrix assignment is trivial (done from the input equations).
The meaning of the MC fuses can be found by trial & error, I believe.

The mux feeding those input lines is different... there are 40 lines coming into
the AND array from the outside (80 counting both inverted and non-inverted
ones), plus 16 local loopbacks (not muxed)...
but the widths of the 40 muxes are not sufficiently large to do arbitrary
selections from the inputs.

As far as I can tell from their docs and .jed files, their devices have:

dev:          1502  1504  1508
input lines:    68   132   260
mux width:       5     8    27

(40 muxes in their input switch)

[...]
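Decoding the kind of one-hot mux fields described above might look like the sketch below. The field width and candidate table are hypothetical; the real per-mux candidate lists are exactly what would have to be recovered from the .jed comments or by trial and error:

```python
# Sketch: turn one one-hot fuse field into the input line it selects.
def decode_one_hot(bits, candidates):
    """bits: tuple of 0/1 fuse values for one mux field.
    candidates[i]: global input line selected when bit i is set.
    Returns the selected line, or None for an unused mux."""
    set_positions = [i for i, b in enumerate(bits) if b]
    if not set_positions:
        return None
    if len(set_positions) > 1:
        raise ValueError("field is not one-hot: %r" % (bits,))
    return candidates[set_positions[0]]

# Hypothetical 5-wide mux (the 1502's stated mux width): this mux
# position can reach only these 5 of the 68 input lines.
candidates = [3, 17, 29, 41, 66]
print(decode_one_hot((0, 0, 1, 0, 0), candidates))  # 29
print(decode_one_hot((0, 0, 0, 0, 0), candidates))  # None
```

The hard part is not the decoding but filling in the candidate table for each of the 40 mux positions.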
 
Excuse me for the very long gap; I was absent.

On Tue, 18 Aug 2015 21:40:27 -0400, Richard Damon wrote:

On 8/18/15 11:35 AM, rickman wrote:
I'm not sure what details of the routing the chip editors leave out.
You only need to know what is connected to what, through what and what
the delays for all those cases are. Other than that, the routing does
just "work".

Look closely. [...] Those rows and columns are made of a
(large) number of distinct wires with routing resources connecting
outputs to select lines and select lines being brought into the next
piece of routing/logic. Which wire is being used will not be indicated,
nor are all the wires interchangeable, so which wire can matter for
fitting. THIS is the missing information.

A comment:

All this information sounds like it can be teased out of the physical
chips. But I find there are other considerations. First, doing this
might jeopardize the FPGA manufacturer. It's no good to have a FOSS
toolchain but no FPGAs to use it on. Second, there is the problem of an
FPGA manufacturer releasing a small tweak that would invalidate the
entire work done. Third, and this is where it gets interesting, the time
and effort spent reverse-engineering a great number of FPGA models is
probably better spent engineering a FOSH ASIC toolchain together with the
assorted manufacturing technology. Because - honestly - if you are
willing to program FPGAs, you are really not very far away from forging
ASICs, are you?

Speaking for myself, I'm working alone on FPGAs far away from silicon
powerhouses and I have to jump through hoops to get the chips.
Jumping through hoops to get my design forged into an ASIC is
not really that different.
 
On 8/30/2016 1:11 AM, rickman wrote:
On 8/30/2016 12:03 AM, Cecil Bayona wrote:
On 8/29/2016 7:55 PM, rickman wrote:
On 8/29/2016 4:30 PM, Cecil Bayona wrote:
Nothing fancy; that is why in my earlier post I mentioned that I don't
have a lot of experience. I've been working on a 32-bit stack-based CPU,
but it's a work in progress and I'm still sorting it out. It has taken
less than 20% of the chip, but stack CPUs are rather simple compared to
other CPUs. When finished it should be pretty nice: most instructions
take one clock to execute, and it uses packed instructions, 5
instructions to a word fetch. Originally it was on a Lattice Brevia2; I
am now converting it to an Artix-7 board, but there is software involved
too so it's going slow and I'm learning as I go.

Just a comment on your stack processor. I've done some design work with
stack processors and read about a lot of designs. In my humble opinion,
if you have multiple cycle instructions, you are doing it wrong. I
don't want to steal the thread. If you care to discuss this we can
start another thread.

I'm not sure why you think that it uses multiple-clock instructions; I
mentioned that most execute in one clock. The exception is load immediate:
it takes the instruction fetch, then a second fetch for the 32-bit value
to push on the stack; all others take one clock. Even that one could take
place in one clock, with extra hardware to fetch RAM into a buffer that
holds two 32-bit words; an alternative is to use two clocks, one to
execute and the other to fetch program instructions.

What it does have is multiple instructions in one program word: it's
five instructions or one, depending on the instruction. Jump and call take
the whole 32-bit word and are not packed; everything else is 5
instructions to a 32-bit program word, so you have fewer memory fetches.

Are you using external memory? The CPUs I've designed were 100%
internal to the FPGA so there was no advantage to fetching multiple
instructions. In fact, the multiplexer needed was just more delay.

I thought your design was not one clock per instruction because of what
you said. Your design is using a single memory interface. It is common
for stack machines to separate data and instruction space, but if you
are working out of external memory that is not so easy.

I assume you have read the various similar work that has been done? The
J1 is very interesting in that it was so durn simple. A true MISC
design and effective. If you meander over to comp.lang.forth there are
folks there who have designed some MISC CPUs which have proven their
worth. Bernd Paysan designed the B16 which seems like an effective
design. I've never tried to work with it, but it sounds similar to
yours in the way it combines multiple instructions into a single
instruction word.
I think the multiple instructions are to save on program space; 32 bits
per instruction would eat up the memory space way too quickly. It is a
von Neumann machine with a single address space; code and data are all in
the same space.

It uses three kinds of instruction formats: short, long, and double-word.

A short instruction is 6 bits, so you can cram 5 instructions into a
memory word. That allows 64 possible instructions, but the bare machine
uses 27 opcodes, so there is room for additional instructions. These are
things that require no addresses, such as dup, swap, return, add, etc.
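That packing scheme can be sketched in a few lines. The helper names and
the slot order (slot 0 in the lowest bits, top 2 bits of the word unused)
are my assumptions for illustration, not anything specified in the design:

```python
MASK6 = 0x3F  # each short instruction is 6 bits wide

def pack(slots):
    """Pack up to 5 six-bit opcodes into one 32-bit program word.
    Slot 0 occupies the lowest 6 bits; the top 2 bits stay unused."""
    assert len(slots) <= 5
    word = 0
    for i, op in enumerate(slots):
        assert 0 <= op <= MASK6
        word |= op << (6 * i)
    return word

def unpack(word):
    """Recover the 5 six-bit opcode slots from a 32-bit program word."""
    return [(word >> (6 * i)) & MASK6 for i in range(5)]

w = pack([1, 2, 3, 4, 5])
assert unpack(w) == [1, 2, 3, 4, 5]
```

In a Forth compiler this would be the back end's job: it fills slots left
to right and pads the remainder with a no-op when a control-flow word
forces a new program word.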

Long instructions take 30 bits; they are things like jump and call,
including conditional versions. They include a 24-bit address as part of
the instruction, which can address 16MB of program space; the upper two
bits are unused at present.

There is a single two-word instruction, which pushes the second 32-bit
word onto the stack. The instruction itself is 6 bits, and there can be
more than one "LIT" instruction, each one having an additional word to
use as a literal. A 32-bit program word could hold 5 LIT instructions,
each with an additional 32-bit word following the program code word; so
if you have five literals packed into a single 32-bit program word, it is
followed by five 32-bit words containing the literal values, and the
program continues 6 words after the current instruction word. Of course
one can have fewer literals.
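A software model of that LIT mechanism might look roughly like this; the
opcode value chosen for LIT is invented for illustration, and the other
opcodes are left undecoded:

```python
LIT = 0x01  # hypothetical opcode number for the literal instruction

def execute_word(memory, pc, stack):
    """Execute one packed program word at memory[pc].
    Each LIT slot consumes one extra 32-bit word from the instruction
    stream and pushes it on the stack; returns the address of the next
    program word (pc + 1 + number of literals consumed)."""
    word = memory[pc]
    next_literal = pc + 1
    for i in range(5):
        op = (word >> (6 * i)) & 0x3F
        if op == LIT:
            stack.append(memory[next_literal])
            next_literal += 1
        # ... decode of the other 6-bit opcodes would go here
    return next_literal

# A packed word holding two LITs, followed by its two literal values:
prog = [(LIT << 0) | (LIT << 6), 111, 222, 0]
stack = []
pc = execute_word(prog, 0, stack)
assert stack == [111, 222] and pc == 3
```

This also shows why a taken branch has to dump the queue: the literal
words are interleaved with the code stream, so the fetch position only
makes sense relative to the packed word being executed.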

The whole thing is like a packed instruction pipeline; it minimizes the
number of fetches, but any control-flow instruction cancels and dumps the
current instruction queue and does a fetch at a different place, since
the IP has changed.

The job of packing the instructions into one word is handled by the
Forth compiler, so the user does not see it; it just makes the program
code smaller automatically. I did some tests, and 5 is too long a queue.
When doing a 16-bit version it packs 3 instructions to a word, and that
version ends up with a shorter program area, so it's more efficient, in
part because Forth changes the program counter often, so a longer
instruction queue wastes instruction slots. On a 16-bit version, adding
some extra instructions would be nice, so you end up with add with carry,
subtract with borrow, etc., and it can handle 32-bit operations efficiently.

Overall it's an efficient design, but it could be improved, as anything
else could. One could add combined instructions, where a return is
combined with some regular instruction so it executes both in one clock.

Eventually I want to work with the J1. It's a simpler 16-bit machine, but
since the instruction word contains multiple fields it can have
instructions that do multiple operations at the same time naturally;
with a more complex compiler packing multiple instructions it might save
on program space. It does have very limited addressing space, due to its
instructions being limited to 16 bits.
--
Cecil - k5nwa
 
On 8/30/2016 4:49 AM, Cecil Bayona wrote:
On 8/30/2016 1:11 AM, rickman wrote:
On 8/30/2016 12:03 AM, Cecil Bayona wrote:
On 8/29/2016 7:55 PM, rickman wrote:
On 8/29/2016 4:30 PM, Cecil Bayona wrote:
Nothing fancy; that is why in my earlier post I mentioned that I don't
have a lot of experience. I've been working on a 32-bit stack-based CPU,
but it's a work in progress and I'm still sorting it out. It takes less
than 20% of the chip, but a stack CPU is rather simple compared to other
CPUs. When finished it should be pretty nice: most instructions take
one clock to execute, and it uses packed instructions, 5 instructions to
a word fetch. Originally it was on a Lattice Brevia2; I am now
converting it to an Artix-7 board, but there is software involved too, so
it's going slow and I'm learning as I go.

Just a comment on your stack processor. I've done some design work
with
stack processors and read about a lot of designs. In my humble
opinion,
if you have multiple cycle instructions, you are doing it wrong. I
don't want to steal the thread. If you care to discuss this we can
start another thread.

I'm not sure why you think that it uses multiple-clock instructions; I
mentioned that most execute in one clock. The exception is load immediate:
it takes the instruction fetch, then a second fetch for the 32-bit value
to push on the stack; all others take one clock. Even that one could take
place in one clock, with extra hardware to fetch RAM into a buffer that
holds two 32-bit words; an alternative is to use two clocks, one to
execute and the other to fetch program instructions.

What it does have is multiple instructions in one program word: it's
five instructions or one, depending on the instruction. Jump and call take
the whole 32-bit word and are not packed; everything else is 5
instructions to a 32-bit program word, so you have fewer memory fetches.

Are you using external memory? The CPUs I've designed were 100%
internal to the FPGA so there was no advantage to fetching multiple
instructions. In fact, the multiplexer needed was just more delay.

I thought your design was not one clock per instruction because of what
you said. Your design is using a single memory interface. It is common
for stack machines to separate data and instruction space, but if you
are working out of external memory that is not so easy.

I assume you have read the various similar work that has been done? The
J1 is very interesting in that it was so durn simple. A true MISC
design and effective. If you meander over to comp.lang.forth there are
folks there who have designed some MISC CPUs which have proven their
worth. Bernd Paysan designed the B16 which seems like an effective
design. I've never tried to work with it, but it sounds similar to
yours in the way it combines multiple instructions into a single
instruction word.

I think the multiple instructions are to save on program space; 32 bits
per instruction would eat up the memory space way too quickly. It is a
von Neumann machine with a single address space; code and data are all in
the same space.

Again, that is a reflection of having a common address space for
instructions and data. My CPUs use 8 or 9 bits for instructions while
being data size independent. The instruction format does not imply any
particular data bus size.


It uses three kinds of instruction formats: short, long, and double-word.

A short instruction is 6 bits, so you can cram 5 instructions into a
memory word. That allows 64 possible instructions, but the bare machine
uses 27 opcodes, so there is room for additional instructions. These are
things that require no addresses, such as dup, swap, return, add, etc.

Long instructions take 30 bits; they are things like jump and call,
including conditional versions. They include a 24-bit address as part of
the instruction, which can address 16MB of program space; the upper two
bits are unused at present.

There is a single two-word instruction, which pushes the second 32-bit
word onto the stack. The instruction itself is 6 bits, and there can be
more than one "LIT" instruction, each one having an additional word to
use as a literal. A 32-bit program word could hold 5 LIT instructions,
each with an additional 32-bit word following the program code word; so
if you have five literals packed into a single 32-bit program word, it is
followed by five 32-bit words containing the literal values, and the
program continues 6 words after the current instruction word. Of course
one can have fewer literals.

I've shied away from multicycle instructions because it means more bits
(1 bit in this case) to indicate the cycle count which is more input(s)
to the decoder. I wanted to try to keep the decoder as simple as possible.


The whole thing is like a packed instruction pipeline; it minimizes the
number of fetches, but any control-flow instruction cancels and dumps the
current instruction queue and does a fetch at a different place, since
the IP has changed.

The F18A does that. Learning how to pack instructions into the word is
a bit tricky. Even harder is learning how to time execution, but that's
because it is async and does not use a fixed frequency clock.


The job of packing the instructions into one word is handled by the
Forth compiler, so the user does not see it; it just makes the program
code smaller automatically. I did some tests, and 5 is too long a queue.
When doing a 16-bit version it packs 3 instructions to a word, and that
version ends up with a shorter program area, so it's more efficient, in
part because Forth changes the program counter often, so a longer
instruction queue wastes instruction slots. On a 16-bit version, adding
some extra instructions would be nice, so you end up with add with carry,
subtract with borrow, etc., and it can handle 32-bit operations efficiently.

Overall it's an efficient design, but it could be improved, as anything
else could. One could add combined instructions, where a return is
combined with some regular instruction so it executes both in one clock.

One of the things I have looked at briefly is breaking out the three
"engines" so each one is separately controlled by fields in the
instruction. This will require a larger instruction, but 16 bits should
be enough. Then many types of instructions could be combined. I didn't
pursue it because I didn't want to work on the assembler that would
handle it.


Eventually I want to work with the J1. It's a simpler 16-bit machine, but
since the instruction word contains multiple fields it can have
instructions that do multiple operations at the same time naturally;
with a more complex compiler packing multiple instructions it might save
on program space. It does have very limited addressing space, due to its
instructions being limited to 16 bits.

I seem to recall discussing this with someone else not too long ago. I
don't think there actually is much parallelism possible with the J1
design. You can combine a return instruction with arithmetic
instructions, and there are fields to adjust the stack independently.
So you might be able to combine, say, 2DUP + in one instruction by using
+ with a DSTACK +1 instead of a -1. But the useful combos will be
limited. The utility is also limited by the instruction frequency:
only 35% of the instructions in the app the J1 was designed for are ALU
instructions, which are the only ones that can be parallelized.
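For reference, the field layout that makes this fusing possible can be
sketched as a small encoder. The bit positions and the T' code for T+N
follow James Bowman's published J1 description, but this covers only a
few of the fields and should be treated as illustrative, not as a
verified reference encoding:

```python
# J1 ALU instruction fields (per Bowman's J1 paper; illustrative):
#   15..13 = 011 (ALU class)   12 = R->PC (fold a return in)
#   11..8  = T' (ALU op)        7 = T->N
#    3..2  = rstack delta    1..0 = dstack delta (2-bit two's complement)

def alu(t_prime, r_to_pc=False, t_to_n=False, ds=0, rs=0):
    """Assemble a 16-bit J1 ALU instruction (subset of the fields only)."""
    enc2 = lambda d: d & 0x3  # two's complement in 2 bits
    return ((0b011 << 13) | (r_to_pc << 12) | (t_prime << 8) |
            (t_to_n << 7) | (enc2(rs) << 2) | enc2(ds))

T_PLUS_N = 0x2  # T' code for "T + N" in the J1 paper

# Plain "+": add, net stack depth -1
plus = alu(T_PLUS_N, ds=-1)
# "+" fused with ";": same ALU op, with R->PC set and rstack popped
plus_ret = alu(T_PLUS_N, r_to_pc=True, ds=-1, rs=-1)
# The "2DUP +" fusion from above: T+N with T->N and dstack delta +1
twodup_plus = alu(T_PLUS_N, t_to_n=True, ds=+1)

assert plus_ret & (1 << 12)  # the return bit really is folded in
```

The point of the sketch is that the fusing is free in hardware (the
fields are decoded independently); the limit is how often the compiler
sees a combination worth emitting.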

In my CPU design I had separate "engines" for the data stack (ALU), the
return stack (destination for literals, addresses for all operations and
looping control) and the instruction fetch. If each of these had
separate fields in the instruction word there might be more opportunity
for parallelism. But as I said, I didn't pursue this because the
software support would be messy. I'd like to get back to that, but it
won't happen any time soon.
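A sketch of what that three-field instruction word might look like; the
field names and widths here are arbitrary guesses, not anything from the
actual design:

```python
# Hypothetical split of a 16-bit word into three independent fields,
# one per "engine": data stack (ALU), return stack, instruction fetch.
FIELDS = {"data": (0, 6), "return": (6, 5), "fetch": (11, 5)}

def build(data_op, return_op, fetch_op):
    """Combine one opcode per engine into a single 16-bit word."""
    ops = {"data": data_op, "return": return_op, "fetch": fetch_op}
    word = 0
    for name, (shift, width) in FIELDS.items():
        assert 0 <= ops[name] < (1 << width)
        word |= ops[name] << shift
    return word

def split(word):
    """Recover the per-engine opcodes from a 16-bit word."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}

w = build(data_op=5, return_op=3, fetch_op=1)
assert split(w) == {"data": 5, "return": 3, "fetch": 1}
```

The hardware decode stays trivial (each engine just looks at its own
slice), which is exactly why the hard part is the assembler that has to
schedule useful combinations into the three slots.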

--

Rick C
 
I wonder if they ever gave it to him ?
Would they do it now that these chips are no longer produced?
 
lolinka04@gmail.com wrote:
I wonder if they ever gave it to him ?
Would they do it now that these chips are no longer produced?

If you're going to reply to an 18-year-old post, it would
be nice to quote the thread for those who don't keep that
many headers downloaded.

In any case, AMD is long out of the SPLD business, but the
22V10 still lives on. Atmel is making them as the
ATF22V10:

http://www.atmel.com/Images/doc0735.pdf

The data sheet mentions "flash" technology, so I assume the
programming algorithm has changed since the EEPROM versions
made by Lattice, TI, and AMD.

Are you planning to build your own programmer?

--
Gabor
 
