Tiny CPUs for Slow Logic

Theo
Guest

Wed Mar 20, 2019 12:45 pm   



gnuarm.deletethisbit_at_gmail.com wrote:
Quote:
On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos wrote:

When people talk about things like "software running on such heterogeneous
cores" it makes me think they don't really understand how this could be
used. If you treat these small cores like logic elements, you don't have
such lofty descriptions of "system software" since the software isn't
created out of some global software package. Each core is designed to do
a specific job just like any other piece of hardware and it has discrete
inputs and outputs just like any other piece of hardware. If the hardware
clock is not too fast, the software can synchronize with and literally
function like hardware, but implement more complex logic than the same
area of FPGA fabric might.


The point is that we need to understand what the whole system is doing. In
the XMOS case, we can look at a piece of software with N threads, running
across the cores provided on the chip. One piece of software, distributed
over the hardware resource available - the system is doing one thing.

Your bottom-up approach means it's difficult to see the big picture of
what's going on. That means it's hard to understand the whole system, and
to program from a whole-system perspective.

Quote:
Not sure what is hard to think about. It's a small CPU with limited
memory that implements small tasks; compared to a state machine it can do
rather complex operations, and it includes memory, arithmetic and logic as
well as I/O without having to write a single line of HDL. Only the actual
app needs to be written.


Here are the semantic descriptions of basic logic elements:

LUT: q = f(x,y,z)
FF: q <= d_in (delay of one cycle)
BRAM: q = array[addr]
DSP: q = a*b + c

A P&R tool can build a system out of these building blocks. It's notable
that the state-holding elements in this schema do nothing except hold
state. That makes writing the tools easier (and we all know how
difficult the tools already are). In general, we don't tend to instantiate
these primitives manually but describe the higher level functions (eg a 64
bit add) in HDL and allow the tools to select appropriate primitives for us
(eg a number of fast-adder blocks chained together).
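For example, a behavioural description as small as the sketch below is all
the tools need in order to pull in the fast-carry primitives (plain VHDL,
vendor neutral):

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  -- A 64-bit add described behaviourally; the synthesis tools choose and
  -- chain the fast-adder/carry primitives for us.
  entity add64 is
    port (
      a, b : in  unsigned(63 downto 0);
      sum  : out unsigned(63 downto 0)
    );
  end entity;

  architecture rtl of add64 is
  begin
    sum <= a + b;
  end architecture;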

What's the logic equation of a processor? It has state, but vastly more
state than the simplicity of a flipflop. What pattern does the P&R tool
need to match to infer a processor? How is any verification tool going
to understand whether the processor with software is doing the right thing?

If your answer is 'we don't need verification tools, we program by hand'
then a) software has bugs, and automated verification is a handy way to
catch them, and b) you're never going to be writing hundreds of different
mini-programs to run on each core, let alone make them correct.

If we scale the processors up a bit, I could see the merits in a bank
of, say, 32 Cortex-M0s that could be interconnected as part of the FPGA
fabric and programmed in software for dedicated tasks (for instance, read
the I2C EEPROM on the DRAM DIMM and configure the DRAM controller at boot).
But this is an SoC construct (built using SoC builder tools, and over which
the programmer has some purview although, as it turns out, sketchier than
you might think[1]). Such CPUs would likely be running bigger corpora of
software (for instance, the DRAM controller vendor's provided initialisation
code) which would likely be in C. But in this case we could just use a
soft-core today (the CPU ISA is mostly irrelevant for this application, so a
RISC-V/Microblaze/NIOS would be fine).

[1] https://inf.ethz.ch/personal/troscoe/pubs/hotos15-gerber.pdf

I can also see another niche, at the extreme bottom end, where a CPLD might
have one of your processors plus a few hundred logic cells. That's
essentially a microcontroller with FPGA, or an FPGA with microcontroller -
which some of the vendors already produce (although possibly not
small/cheap/low power enough). Here I can't see the advantages of using a
stack-based CPU versus paying a bit more to program in C. Although I don't
have experience in markets where the retail price of the product is $1, and so
every $0.001 matters.

Quote:
I would be interested to know what applications might use heterogeneous
many-cores and what performance is achievable.

Yes, clearly not getting the concept. Asking about heterogeneous
performance is totally antithetical to this idea.


You keep mentioning 700 MIPS, which suggests performance is important. If
these are simple state machine replacements, why do we care about
performance?


In essence, your proposal has a disconnect between the situations in which
existing FPGA blocks are used (implemented automatically by P&R tools) and
the situations in which software is currently used (human-driven software and
architectural design). It's unclear how you claim to bridge this gap.

Theo


Guest

Wed Mar 20, 2019 12:45 pm   



On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
Quote:
On 19/03/19 17:35, already5chosen_at_yahoo.com wrote:
On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
The "granularity" of the computation and communication will be a key to
understanding what the OP is thinking.

I don't know what Rick had in mind. I personally would go for one "hard-CPU"
block per 4000-5000 6-input logic elements (i.e. Altera ALMs or Xilinx CLBs).
Each block could be configured either as one 64-bit core or a pair of 32-bit
cores. The block would contain hard instruction decoders/ALUs/shifters and
hard register files. It can optionally borrow adjacent DSP blocks for
multipliers. Adjacent embedded memory blocks can be used for data memory.
Code memory should be a bit more flexible, giving the designer a choice between
embedded memory blocks or distributed memory (X)/MLABs(A).

It would be interesting to find an application level
description (i.e. language constructs) that
- could be automatically mapped onto those primitives
by a toolset
- was useful for more than a niche subset of applications
- was significantly better than existing tools

I wouldn't hold my breath Smile


I think you are looking at it from the wrong angle.
One doesn't really need new tools to design and simulate such things. What's needed is a combination of existing tools - compilers, assemblers, probably software simulator plug-ins for existing HDL simulators, but the latter is just a luxury for speeding up simulations; in principle, feeding the HDL simulator an RTL model of the CPU core will work too.

As to niches, all "hard" blocks that we currently have in FPGAs are about niches. It's extremely rare that a user's design uses all or a majority of the features of a given FPGA device and needs LUTs, embedded memories, PLLs, multipliers, SERDESs, DDR DRAM I/O blocks etc. in the exact amounts appearing in the device.
It still makes sense, economically, to have them all built in, because masks and other NREs are mighty expensive while silicon itself is relatively cheap. Multiple small hard CPU cores are really not very different from the features mentioned above.
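To make that concrete: to the designer, such a block might look much the same as today's DSP or RAM primitives do - something along the lines of the sketch below. The component name, generic and ports are invented purely for illustration; no vendor offers this primitive today.

  -- Hypothetical hard-CPU primitive, invented here for illustration only.
  component HARD_CPU32 is
    generic (
      CODE_INIT : string := "task0.hex"  -- hypothetical code-memory image
    );
    port (
      clk     : in  std_logic;
      reset_n : in  std_logic;
      gpi     : in  std_logic_vector(31 downto 0);  -- inputs from the fabric
      gpo     : out std_logic_vector(31 downto 0)   -- outputs to the fabric
    );
  end component;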

Tom Gardner
Guest

Wed Mar 20, 2019 12:45 pm   



On 20/03/19 10:41, already5chosen_at_yahoo.com wrote:
Quote:
On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
On 19/03/19 17:35, already5chosen_at_yahoo.com wrote:
On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:

The UK Parliament is an unmitigated dysfunctional mess.


Do you prefer dysfunctional mesh ;)

:) I'll settle for anything that /works/ predictably :(


The UK political system is completely off-topic in comp.arch.fpga. However, I'd say that IMHO right now your parliament is facing an unusually difficult problem on one hand, but at the same time it's not really a "life or death" sort of problem. Having trouble and appearing indecisive in such a situation is normal. It does not mean that the system is broken.


Firstly, you chose to snip the analogy, thus removing the context.

Secondly, there are currently /very/ plausible reasons
to believe it might be life or death for my 98yo mother, and
may hasten my death. No, I'm not going to elaborate on a public
forum.

I will note that Operation Yellowhammer will, barring miracles,
be started on Monday, and that a prominent *brexiteer* (Michael Gove)
is shit scared of a no-deal exit because all the chemicals required
to purify our drinking water come from Europe.

Theo Markettos
Guest

Wed Mar 20, 2019 1:45 pm   



already5chosen_at_yahoo.com wrote:
Quote:
As to niches, all "hard" blocks that we currently have in FPGAs are about
niches. It's extremely rare that a user's design uses all or a majority of
the features of a given FPGA device and needs LUTs, embedded memories, PLLs,
multipliers, SERDESs, DDR DRAM I/O blocks etc. in the exact amounts appearing
in the device. It still makes sense, economically, to have them all built
in, because masks and other NREs are mighty expensive while silicon itself
is relatively cheap. Multiple small hard CPU cores are really not very
different from the features mentioned above.


A lot of these 'niches' have been proven in soft-logic.

Implement your system in soft-logic, discover that there's lots of
multiply-adds and they're slow and take up area. A DSP block is thus an
'accelerator' (or 'most compact representation') of the same concept in
soft-logic.

The same goes for BRAMs (can be implemented via registers but too much
area), adders (slow when implemented with generic LUTs), etc.
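To make the DSP case concrete, a behavioural multiply-add like the sketch
below is the sort of thing the tools will typically pack into a single DSP
block rather than build out of LUTs (the widths here are arbitrary choices):

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  -- Registered multiply-add: q = a*b + c, usually mapped onto one DSP slice.
  entity mac is
    port (
      clk  : in  std_logic;
      a, b : in  signed(17 downto 0);
      c    : in  signed(47 downto 0);
      q    : out signed(47 downto 0)
    );
  end entity;

  architecture rtl of mac is
  begin
    process(clk)
    begin
      if rising_edge(clk) then
        q <= resize(a * b, 48) + c;
      end if;
    end process;
  end architecture;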

Other features (SERDES, PLLs, DDR, etc) can't be done at all without
hard-logic support. If you want those features, you need the hard logic,
simple as that.

Through analysis of existing designs we can have a provable win of the hard
over soft logic, to make it worthwhile putting it on the silicon and
integrating into the tools. In some of these cases, I'd guess the win over
the soft-logic is 10x or more saving in area.

Rick's idea can be done today in soft-logic. So someone could build a proof
of concept and measure the cases where it improves things over the baseline.
If that case is compelling, let's put it in the hard logic.

But thus far we haven't seen a clear case for why someone should build a
proof of concept. I'm not saying it doesn't exist, but we need a clear
elucidation of the problem that it might solve.

Theo

Tom Gardner
Guest

Wed Mar 20, 2019 2:45 pm   



On 20/03/19 10:56, already5chosen_at_yahoo.com wrote:
Quote:
On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
On 19/03/19 17:35, already5chosen_at_yahoo.com wrote:
On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
The "granularity" of the computation and communication will be a key
to understanding what the OP is thinking.

I don't know what Rick had in mind. I personally would go for one
"hard-CPU" block per 4000-5000 6-input logic elements (i.e. Altera ALMs
or Xilinx CLBs). Each block could be configured either as one 64-bit core
or a pair of 32-bit cores. The block would contain hard instruction
decoders/ALUs/shifters and hard register files. It can optionally borrow
adjacent DSP blocks for multipliers. Adjacent embedded memory blocks can
be used for data memory. Code memory should be a bit more flexible, giving
the designer a choice between embedded memory blocks or distributed memory
(X)/MLABs(A).

It would be interesting to find an application level
description (i.e. language constructs) that
- could be automatically mapped onto those primitives by a toolset
- was useful for more than a niche subset of applications
- was significantly better than existing tools

I wouldn't hold my breath :)


I think you are looking at it from the wrong angle. One doesn't really need
new tools to design and simulate such things. What's needed is a combination
of existing tools - compilers, assemblers, probably software simulator
plug-ins for existing HDL simulators, but the latter is just a luxury for
speeding up simulations; in principle, feeding the HDL simulator an RTL
model of the CPU core will work too.


That would be one perfectly acceptable embodiment of a toolset
that I mentioned.

But more difficult than creating such a toolset is defining
an application level description that a toolset can munge.

So, define (initially by example, later more formally) inputs
to the toolset and outputs from it. Then we can judge whether
the concepts are more than handwaving wishes.



Quote:
As to niches, all "hard" blocks that we currently have in FPGAs are about
niches. It's extremely rare that a user's design uses all or a majority of the
features of a given FPGA device and needs LUTs, embedded memories, PLLs,
multipliers, SERDESs, DDR DRAM I/O blocks etc. in the exact amounts appearing in
the device. It still makes sense, economically, to have them all built in,
because masks and other NREs are mighty expensive while silicon itself is
relatively cheap. Multiple small hard CPU cores are really not very different
from the features mentioned above.


All the blocks you mention have a simple API and an easily
enumerated set of behaviours.

The whole point of processors is that they enable much more
complex behaviour that is practically impossible to enumerate.

Alternatively, if it is possible to enumerate the behaviour
of a processor, then it would be easy and more efficient to
implement the behaviour in conventional logic blocks.

Tom Gardner
Guest

Wed Mar 20, 2019 3:45 pm   



On 20/03/19 14:11, already5chosen_at_yahoo.com wrote:
Quote:
On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:

But more difficult than creating such a toolset is defining an application
level description that a toolset can munge.

So, define (initially by example, later more formally) inputs to the
toolset and outputs from it. Then we can judge whether the concepts are
more than handwaving wishes.


I don't understand what you are asking for.


Go back and read the parts of my post that you chose to snip.

Give a handwaving indication of the concepts that avoid the
conceptual problems that I mentioned.

Or better still, get the OP to do it.



Quote:
If I had such a thing, I'd use it in exactly the same way that I use soft
cores (Nios2) today. I will just use them more frequently, because today it
costs me logic resources (often acceptable, but not always) and synthesis and
fitter time (and that is what I really hate). On the other hand, a "hard" core
would be almost free in both aspects. It would be as expensive as "soft", or
even costlier, in HDL simulations, but until now I have managed to avoid "full
system" simulations that cover everything including the CPU core and the
program that runs on it. Or maybe I did it once or twice years ago and don't
remember. Anyway, for me it's not an important concern and I consider myself
a rather heavy user of soft cores.

Also, theoretically, if performance of the hard core is non-trivially higher
than that of soft cores, either due to higher IPC (I didn't measure, but
would guess that for the majority of tasks Nios2-f IPC is 20-30% lower than ARM
Cortex-M4) or due to higher clock rate, then it will open up even more
niches. However, I'd expect that the performance factor would be less important
for me, personally, than other factors mentioned above.



Guest

Wed Mar 20, 2019 3:45 pm   



On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
Quote:

But more difficult than creating
an application level description that a toolset can munge.

So, define (initially by example, later more formally) inputs
to the toolset and outputs from it. Then we can judge whether
the concepts are more than handwaving wishes.


I don't understand what you are asking for.

If I had such a thing, I'd use it in exactly the same way that I use soft cores (Nios2) today. I will just use them more frequently, because today it costs me logic resources (often acceptable, but not always) and synthesis and fitter time (and that is what I really hate). On the other hand, a "hard" core would be almost free in both aspects.
It would be as expensive as "soft", or even costlier, in HDL simulations, but until now I have managed to avoid "full system" simulations that cover everything including the CPU core and the program that runs on it. Or maybe I did it once or twice years ago and don't remember. Anyway, for me it's not an important concern and I consider myself a rather heavy user of soft cores.

Also, theoretically, if performance of the hard core is non-trivially higher than that of soft cores, either due to higher IPC (I didn't measure, but would guess that for the majority of tasks Nios2-f IPC is 20-30% lower than ARM Cortex-M4) or due to higher clock rate, then it will open up even more niches. However, I'd expect that the performance factor would be less important for me, personally, than other factors mentioned above.


Guest

Wed Mar 20, 2019 4:45 pm   



On Wednesday, March 20, 2019 at 6:14:21 AM UTC-4, David Brown wrote:
Quote:
On 20/03/2019 03:30, gnuarm.deletethisbit_at_gmail.com wrote:
On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos
wrote:
Tom Gardner <spamjunk_at_blueyonder.co.uk> wrote:
Understand XMOS's xCORE processors and xC language, see how they
complement and support each other. I found the net result
stunningly easy to get working first time, without having to
continually read obscure errata!

I can see the merits of the XMOS approach. But I'm unclear how
this relates to the OP's proposal, which (I think) is having tiny
CPUs as hard logic blocks on an FPGA, like DSP blocks.

I completely understand the problem of running out of hardware
threads, so a means of 'just add another one' is handy. But the
issue is how to combine such things with other synthesised logic.

The XMOS approach is fine when the hardware is uniform and the
software sits on top, but when the hardware is synthesised and the
'CPUs' sit as pieces in a fabric containing random logic (as I
think the OP is suggesting) it becomes a lot harder to reason about
what the system is doing and what the software running on such
heterogeneous cores should look like. Only the FPGA tools have a
full view of what the system looks like, and it seems stretching
them to have them also generate software to run on these cores.

When people talk about things like "software running on such
heterogeneous cores" it makes me think they don't really understand
how this could be used. If you treat these small cores like logic
elements, you don't have such lofty descriptions of "system software"
since the software isn't created out of some global software package.
Each core is designed to do a specific job just like any other piece
of hardware and it has discrete inputs and outputs just like any
other piece of hardware. If the hardware clock is not too fast, the
software can synchronize with and literally function like hardware,
but implement more complex logic than the same area of FPGA fabric
might.


That is software.

If you want to try to get cycle-precise control of the software and use
that precision for direct hardware interfacing, you are almost certainly
going to have a poor, inefficient and difficult design. It doesn't
matter if you say "think of it like logic" - it is /not/ logic, it is
software, and you don't use that for cycle-precise control. You use it
when you need flexibility, calculations, and decisions.


I suppose you can make anything difficult if you try hard enough.

The point is you don't have to make it difficult by talking about "software running on such heterogeneous cores". Just talk about it being a small hunk of software that is doing a specific job. Then the mystery is gone and the task can be made as easy as the task is.

In VHDL this would be a process(). VHDL programs are typically chock full of processes and no one wrings their hands worrying about how they will design the "software running on such heterogeneous cores".

BTW, VHDL is software too.
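As a concrete (if trivial) example of the sort of process I mean - one small,
self-contained job with its own inputs and outputs - here is a sketch of a
millisecond tick generator; the clock rate is an assumption:

  library ieee;
  use ieee.std_logic_1164.all;

  -- One small job: divide the clock down to a 1 ms tick.
  entity ms_tick is
    generic (CLK_HZ : natural := 50_000_000);  -- assumed clock frequency
    port (
      clk  : in  std_logic;
      tick : out std_logic
    );
  end entity;

  architecture rtl of ms_tick is
    signal count : natural range 0 to CLK_HZ/1000 - 1 := 0;
  begin
    process(clk)
    begin
      if rising_edge(clk) then
        if count = CLK_HZ/1000 - 1 then
          count <= 0;
          tick  <= '1';
        else
          count <= count + 1;
          tick  <= '0';
        end if;
      end if;
    end process;
  end architecture;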


Quote:
There is no need to think about how the CPUs would communicate unless
there is a specific need for them to do so. The F18A uses a
handshaked parallel port in their design. They seem to have done a
pretty slick job of it and can actually hang the processor waiting
for the acknowledgement saving power and getting an instantaneous
wake up following the handshake. This can be used with other CPUs or


Fair enough.


Ok, that's a start.

Rick C.


Guest

Wed Mar 20, 2019 4:45 pm   



On Wednesday, March 20, 2019 at 4:31:27 PM UTC+2, Tom Gardner wrote:
Quote:
On 20/03/19 14:11, already5chosen_at_yahoo.com wrote:
On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:

But more difficult than creating such a toolset is defining an application
level description that a toolset can munge.

So, define (initially by example, later more formally) inputs to the
toolset and outputs from it. Then we can judge whether the concepts are
more than handwaving wishes.


I don't understand what you are asking for.

Go back and read the parts of my post that you chose to snip.

Give a handwaving indication of the concepts that avoid the
conceptual problems that I mentioned.


Frankly, it is starting to sound like you have never used soft CPU cores in your designs.
So, for somebody like myself, who has used them routinely for different tasks since 2006, you are really not easy to understand.
Concept? Concepts are good for new things, not for something that is a variation of something old and routine and obviously working.

Quote:

Or better still, get the OP to do it.


With that part I agree.


Guest

Wed Mar 20, 2019 4:45 pm   



On Wednesday, March 20, 2019 at 6:29:50 AM UTC-4, already...@yahoo.com wrote:
Quote:
On Wednesday, March 20, 2019 at 4:32:07 AM UTC+2, gnuarm.del...@gmail.com wrote:
On Tuesday, March 19, 2019 at 11:24:33 AM UTC-4, Svenn Are Bjerkem wrote:
On Tuesday, March 19, 2019 at 1:13:38 AM UTC+1, gnuarm.del...@gmail.com wrote:
Most of us have implemented small processors for logic operations that don't need to happen at high speed. Simple CPUs can be built into an FPGA using a very small footprint, much like the ALU blocks. There are stack-based processors that are very small, smaller than even a few kB of memory.

If they were easily programmable in something other than C would anyone be interested? Or is a C compiler mandatory even for processors running very small programs?

I am picturing this not terribly unlike the sequencer I used many years ago on an I/O board for an array processor, which had its own assembler. It was very simple and easy to use, but very much not a high level language. This would have a language that was high level, just not C; rather something extensible, simple to use and potentially interactive.

Rick C.

Picoblaze is such a small CPU and I would like to program it in something other than its assembly language.

Yes, it is small. How large is the program you are interested in?

Rick C.

I don't know about Svenn Are Bjerkem, but I can tell you about myself.
The last time I considered something like that and wrote enough of the program to make measurements, the program contained ~250 Nios2 instructions. I'd guess on a minimalistic stack machine it would take 350-400 instructions.
In the end, I didn't do it in software. Coding the same functionality in HDL turned out not to be hard, which probably suggests that my case was smaller than average.

At the other extreme, where I did end up using a "small" soft core, it was much more like "real" software: 2300 Nios2 instructions.


What sorts of applications were these?

Rick C.


Guest

Wed Mar 20, 2019 4:45 pm   



On Wednesday, March 20, 2019 at 6:41:55 AM UTC-4, already...@yahoo.com wrote:
Quote:
On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
On 19/03/19 17:35, already5chosen_at_yahoo.com wrote:
On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:

The UK Parliament is an unmitigated dysfunctional mess.


Do you prefer dysfunctional mesh ;)

:) I'll settle for anything that /works/ predictably :(


The UK political system is completely off-topic in comp.arch.fpga. However, I'd say that IMHO right now your parliament is facing an unusually difficult problem on one hand, but at the same time it's not really a "life or death" sort of problem. Having trouble and appearing indecisive in such a situation is normal. It does not mean that the system is broken.


I was watching a video of a guy who bangs together Teslas from salvage cars. This one was about him actually buying a used Tesla from Tesla and the many trials and tribulations he had. He had traveled to a dealership over an hour's drive away and they said they didn't have anything for him. At one point he says he is not going to get too wigged out over all this because it is a "first world problem". That gave me insight into my own issues, realizing that what seems at first to me to be a major issue is an issue that much of the world would LOVE to have.

I'm wondering if Brexit is not one of those issues... I'm just sayin'...

FPGA design is similar. Consider which of your issues are "first world" issues when you design.

Rick C.


Guest

Wed Mar 20, 2019 4:45 pm   



On Wednesday, March 20, 2019 at 6:53:07 AM UTC-4, Theo wrote:
Quote:
gnuarm.deletethisbit_at_gmail.com wrote:
On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos wrote:

When people talk about things like "software running on such heterogeneous
cores" it makes me think they don't really understand how this could be
used. If you treat these small cores like logic elements, you don't have
such lofty descriptions of "system software" since the software isn't
created out of some global software package. Each core is designed to do
a specific job just like any other piece of hardware and it has discrete
inputs and outputs just like any other piece of hardware. If the hardware
clock is not too fast, the software can synchronize with and literally
function like hardware, but implement more complex logic than the same
area of FPGA fabric might.

The point is that we need to understand what the whole system is doing. In
the XMOS case, we can look at a piece of software with N threads, running
across the cores provided on the chip. One piece of software, distributed
over the hardware resource available - the system is doing one thing.

Your bottom-up approach means it's difficult to see the big picture of
what's going on. That means it's hard to understand the whole system, and
to program from a whole-system perspective.


I never mentioned a bottom up or a top down approach to design. Nothing about using these small CPUs is about the design "direction". I am pretty sure that you have to define the circuit they will work in before you can start designing the code.


Quote:
Not sure what is hard to think about. It's a small CPU with limited
memory that implements small tasks; compared to a state machine it can do
rather complex operations, and it includes memory, arithmetic and logic as
well as I/O without having to write a single line of HDL. Only the actual
app needs to be written.

Here are the semantic descriptions of basic logic elements:

LUT: q = f(x,y,z)
FF: q <= d_in (delay of one cycle)
BRAM: q = array[addr]
DSP: q = a*b + c

A P&R tool can build a system out of these building blocks. It's notable
that the state-holding elements in this schema do nothing except hold
state. That makes writing the tools easier (and we all know how
difficult the tools already are). In general, we don't tend to instantiate
these primitives manually but describe the higher level functions (eg a 64
bit add) in HDL and allow the tools to select appropriate primitives for us
(eg a number of fast-adder blocks chained together).

What's the logic equation of a processor?


Obviously it is like a combination of LUTs with FFs, able to implement any logic you wish, including math. BTW, in many devices the elements are not at all so simple. Xilinx LUTs can be used as shift registers. There is additional logic within the logic blocks that allows math with carry chains, combining LUTs to form larger LUTs, breaking LUTs into smaller LUTs, and let's not forget about routing, which may not be used much anymore, I'm not sure.

So your simple world of four elements is really not so valid.
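To give one example of what I mean about LUTs as shift registers: a plain behavioural description like the sketch below is typically mapped onto a single SRL-type LUT on Xilinx parts rather than a string of flip-flops (the depth is an arbitrary choice):

  library ieee;
  use ieee.std_logic_1164.all;

  -- 16-deep, 1-bit shift register; tools usually infer one SRL LUT for this.
  entity srl16 is
    port (
      clk  : in  std_logic;
      din  : in  std_logic;
      dout : out std_logic
    );
  end entity;

  architecture rtl of srl16 is
    signal sr : std_logic_vector(15 downto 0) := (others => '0');
  begin
    process(clk)
    begin
      if rising_edge(clk) then
        sr <= sr(14 downto 0) & din;
      end if;
    end process;
    dout <= sr(15);
  end architecture;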


Quote:
It has state, but vastly more
state than the simplicity of a flipflop. What pattern does the P&R tool
need to match to infer a processor?


Why does it need to be inferred? If you want to write an HDL tool to turn HDL into processor code, have at it. But then there are other methods. Someone mentioned that his MO is to use other tools for designing his algorithms and letting that tool generate the software for a processor or the HDL for an FPGA. That would seem easy enough to integrate.


Quote:
How is any verification tool going
to understand whether the processor with software is doing the right thing?


Huh? You can't simulate code on a processor???


Quote:
If your answer is 'we don't need verification tools, we program by hand'
then a) software has bugs, and automated verification is a handy way to
catch them, and b) you're never going to be writing hundreds of different
mini-programs to run on each core, let alone make them correct.


You seem to have left the roadway here. I'm lost.


Quote:
If we scale the processors up a bit, I could see the merits in a bank
of, say, 32 Cortex-M0s that could be interconnected as part of the FPGA
fabric and programmed in software for dedicated tasks (for instance, read
the I2C EEPROM on the DRAM DIMM and configure the DRAM controller at boot).


I don't follow your logic. What is different about the ARM processor from the stack processor other than that it is larger and slower and requires a royalty on each one? Are you talking about writing the code in C vs. what ever is used for the stack processor?


Quote:
But this is an SoC construct (built using SoC builder tools, and over which
the programmer has some purview although, as it turns out, sketchier than
you might think[1]). Such CPUs would likely be running bigger corpora of
software (for instance, the DRAM controller vendor's provided initialisation
code) which would likely be in C. But in this case we could just use a
soft-core today (the CPU ISA is mostly irrelevant for this application, so a
RISC-V/Microblaze/NIOS would be fine).

[1] https://inf.ethz.ch/personal/troscoe/pubs/hotos15-gerber.pdf


The point of the many hard cores is the saving of resources. Soft cores would be the most wasteful way to implement logic. If the application is large enough they can implement things in software that aren't as practical in HDL, but that would be a different class of logic from the tiny CPUs I'm talking about.


Quote:
I can also see another niche, at the extreme bottom end, where a CPLD might
have one of your processors plus a few hundred logic cells. That's
essentially a microcontroller with FPGA, or an FPGA with microcontroller -
which some of the vendors already produce (although possibly not
small/cheap/low power enough). Here I can't see the advantages of using a
stack-based CPU versus paying a bit more to program in C. Although I don't
have experience in markets where the retail price of the product is $1, and so
every $0.001 matters.

I would be interested to know what applications might use heterogeneous
many-cores and what performance is achievable.

Yes, clearly not getting the concept. Asking about heterogeneous
performance is totally antithetical to this idea.

You keep mentioning 700 MIPS, which suggests performance is important. If
these are simple state machine replacements, why do we care about
performance?


You lost me with the gear shift. The mention of instruction rate is about the CPU being fast enough to keep up with FPGA logic. The issue with "heterogeneous performance" is the "heterogeneous" part, lumping the many CPUs together to create some sort of number cruncher. That's not what this is about. Like in the GA144, I fully expect most CPUs to be sitting around most of the time idling, waiting for data. This is a good thing actually. These CPUs could consume significant current if they run at GHz all the time. I believe in the GA144 at that slower rate each processor can use around 2.5 mA. Not sure if a smaller process would use more or less power when running flat out. It's been too many years since I worked with those sorts of numbers.


Quote:
In essence, your proposal has a disconnect between the situations in which
existing FPGA blocks are used (implemented automatically by P&R tools) and
the situations in which software is currently used (human-driven software and
architectural design). It's unclear how you claim to bridge this gap.


I don't usually think of designing in those terms. If I want to design something, I design it. I ignore many tools, only using the ones I find useful. In this case I would have no problem writing code for the processor and, if needed, rolling into the FPGA simulation a model of the processor to run the code. In a professional implementation I would expect these models to be written for me in modules that run much faster than HDL so the simulation speed is not impacted.

I certainly don't see how P&R tools would be a problem. They accommodate multipliers, DSP blocks, memory blocks and many, many special bits of assorted components inside the FPGAs, which vary from vendor to vendor. Clock generators and distribution are pretty unique to each manufacturer. Lattice has all sorts of modules to offer like I2C and embedded Flash. Then there are entire CPUs embedded in FPGAs. Why would supporting them be so different from what I am talking about?

Rick C.

David Brown
Guest

Wed Mar 20, 2019 4:45 pm   



On 20/03/2019 15:50, gnuarm.deletethisbit_at_gmail.com wrote:
Quote:
On Wednesday, March 20, 2019 at 6:14:21 AM UTC-4, David Brown wrote:
On 20/03/2019 03:30, gnuarm.deletethisbit_at_gmail.com wrote:
On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos
wrote:
Tom Gardner <spamjunk_at_blueyonder.co.uk> wrote:
Understand XMOS's xCORE processors and xC language, see how
they complement and support each other. I found the net
result stunningly easy to get working first time, without
having to continually read obscure errata!

I can see the merits of the XMOS approach. But I'm unclear
how this relates to the OP's proposal, which (I think) is
having tiny CPUs as hard logic blocks on an FPGA, like DSP
blocks.

I completely understand the problem of running out of hardware
threads, so a means of 'just add another one' is handy. But
the issue is how to combine such things with other synthesised
logic.

The XMOS approach is fine when the hardware is uniform and the
software sits on top, but when the hardware is synthesised and
the 'CPUs' sit as pieces in a fabric containing random logic
(as I think the OP is suggesting) it becomes a lot harder to
reason about what the system is doing and what the software
running on such heterogeneous cores should look like. Only the
FPGA tools have a full view of what the system looks like, and
it seems stretching them to have them also generate software to
run on these cores.

When people talk about things like "software running on such
heterogeneous cores" it makes me think they don't really
understand how this could be used. If you treat these small
cores like logic elements, you don't have such lofty descriptions
of "system software" since the software isn't created out of some
global software package. Each core is designed to do a specific
job just like any other piece of hardware and it has discrete
inputs and outputs just like any other piece of hardware. If the
hardware clock is not too fast, the software can synchronize with
and literally function like hardware, but implement more
complex logic than the same area of FPGA fabric might.


That is software.

If you want to try to get cycle-precise control of the software and
use that precision for direct hardware interfacing, you are almost
certainly going to have a poor, inefficient and difficult design.
It doesn't matter if you say "think of it like logic" - it is /not/
logic, it is software, and you don't use that for cycle-precise
control. You use it when you need flexibility, calculations, and
decisions.

I suppose you can make anything difficult if you try hard enough.


Equally, you can make anything sound simple if you are vague enough and
wave your hands around.

Quote:
The point is you don't have to make it difficult by talking about
"software running on such heterogeneous cores". Just talk about it
being a small hunk of software that is doing a specific job. Then
the mystery is gone and the task can be made as easy as the task is.


I did not use the phrase "software running on such heterogeneous cores"
- and I am not trying to make anything difficult. You are making cpu
cores. They run software. Saying they are "like logic elements" or
"they connect directly to hardware" does not make it so - and it does
not mean that what they run is not software.

Quote:

In VHDL this would be a process(). VHDL programs are typically chock
full of processes and no one wrings their hands worrying about how
they will design the "software running on such heterogeneous cores".


BTW, VHDL is software too.


I agree that VHDL is software. And yes, there are usually processes in
VHDL designs.

I am not /worrying/ about these devices running software - I am simply
saying that they /will/ be running software. I can't comprehend why you
want to deny that. It seems that you are frightened of software or
programmers, and want to call it anything /but/ software.

If the software a core is running is simple enough to be described in
VHDL, then it should be a VHDL process - not software in a cpu core. If
it is too complex for that, it is going to have to be programmed
separately in an appropriate language. That is not necessarily harder
or easier than VHDL design - it is just different.

If you try to force the software to be synchronous with timing on the
hardware, /then/ you are going to be in big difficulties. So don't do
that - use hardware for the tightest timing, and software for the bits
that software is good for.


Quote:

There is no need to think about how the CPUs would communicate
unless there is a specific need for them to do so. The F18A uses
a handshaked parallel port in their design. They seem to have
done a pretty slick job of it and can actually hang the processor
waiting for the acknowledgement saving power and getting an
instantaneous wake up following the handshake. This can be used
with other CPUs or


Fair enough.

Ok, that's a start.


I'd expect that the sensible way to pass data between these, if you need
to do so much, is using FIFOs.
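For what it's worth, something along the lines of the single-clock FIFO
sketched below is what I have in mind; the width, depth and port names are
arbitrary choices for illustration:

  library ieee;
  use ieee.std_logic_1164.all;

  -- A small single-clock FIFO with full/empty flags.
  entity small_fifo is
    generic (
      WIDTH : natural := 8;
      DEPTH : natural := 16
    );
    port (
      clk    : in  std_logic;
      rst    : in  std_logic;
      wr_en  : in  std_logic;
      wr_dat : in  std_logic_vector(WIDTH-1 downto 0);
      full   : out std_logic;
      rd_en  : in  std_logic;
      rd_dat : out std_logic_vector(WIDTH-1 downto 0);
      empty  : out std_logic
    );
  end entity;

  architecture rtl of small_fifo is
    type ram_t is array (0 to DEPTH-1) of std_logic_vector(WIDTH-1 downto 0);
    signal ram            : ram_t;
    signal wr_ptr, rd_ptr : natural range 0 to DEPTH-1 := 0;
    signal count          : natural range 0 to DEPTH   := 0;
  begin
    full  <= '1' when count = DEPTH else '0';
    empty <= '1' when count = 0     else '0';

    process(clk)
      variable do_wr, do_rd : boolean;
    begin
      if rising_edge(clk) then
        if rst = '1' then
          wr_ptr <= 0; rd_ptr <= 0; count <= 0;
        else
          do_wr := wr_en = '1' and count < DEPTH;
          do_rd := rd_en = '1' and count > 0;
          if do_wr then
            ram(wr_ptr) <= wr_dat;               -- accept the write
            wr_ptr      <= (wr_ptr + 1) mod DEPTH;
          end if;
          if do_rd then
            rd_dat <= ram(rd_ptr);               -- registered read data
            rd_ptr <= (rd_ptr + 1) mod DEPTH;
          end if;
          if do_wr and not do_rd then
            count <= count + 1;
          elsif do_rd and not do_wr then
            count <= count - 1;
          end if;
        end if;
      end if;
    end process;
  end architecture;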


Guest

Wed Mar 20, 2019 5:45 pm   



On Wednesday, March 20, 2019 at 5:51:21 PM UTC+2, Tom Gardner wrote:
Quote:
On 20/03/19 14:51, already5chosen_at_yahoo.com wrote:
On Wednesday, March 20, 2019 at 4:31:27 PM UTC+2, Tom Gardner wrote:
On 20/03/19 14:11, already5chosen_at_yahoo.com wrote:
On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:

But more difficult than creating such a toolset is defining an application
level description that a toolset can munge.

So, define (initially by example, later more formally) inputs to the
toolset and outputs from it. Then we can judge whether the concepts are
more than handwaving wishes.


I don't understand what you are asking for.

Go back and read the parts of my post that you chose to snip.

Give a handwaving indication of the concepts that avoid the
conceptual problems that I mentioned.

Frankly, it is starting to sound like you have never used soft CPU cores in your designs.
So, for somebody like myself, who has used them routinely for different tasks since 2006, you are really not easy to understand.

Professionally, since 1978 I've done everything from low noise
analogue electronics, many hardware-software systems using
all sorts of technologies, networking at all levels of the
protocol stack, "up" to high availability distributed soft
real-time systems.

And almost all of that has been on the bleeding edge.

So, yes, I do have more than a passing acquaintance with
the characteristics of many hardware and software technologies,
and where partitions between them can, should and should not
be drawn.


Is that a sort of admission that you indeed never designed with soft cores?

Quote:

Concept? Concepts are good for new things, not for something that is a variation of something old and routine and obviously working.

Whatever is being proposed, is it old or new?

If old then the OP needs enlightenment and concrete
examples can easily be noted.

If new, then provide the concepts.


It is a new variation of an old concept.
A cross between the PPCs in the ancient Virtex-II Pro and the soft cores used virtually everywhere in more modern times.
It is probably best characterized by what it is not like: it is not like Xilinx Zynq or Altera Cyclone V HPS.

"New" part comes more from new economics of sub-20nm processes than from abstractions that you try to draf into it. NRE is more and more expensive, gates are more and more cheap (Well, the cost of gates started to stagnate in last couple of years, but that does not matter. What's matter is that at something like TSMC 12nm gate are already quite cheap). So, adding multiple small CPU cores that could be used as replacement for multiple soft CPU cores that people already used to use today, now starts to make sense. May be, it's not a really good proposition, but at these silicon geometries it can't be written out as obviously stupid proposition.

It appears that I don't agree with Rick about "how small is small", and correspondingly about how many of them should be placed on a die, but we probably agree about the percentage of the FPGA area that intuitively seems worth allocating to such a feature - more than 1% but less than 5%.
Also, he appears to like stack-based ISAs while I lean toward a more conventional 32-bit or 32/64-bit RISC, or maybe even a modern CISC akin to the Renesas RX, but those are relatively minor details.


Quote:

Or better still, get the OP to do it.


With that part I agree.



Guest

Wed Mar 20, 2019 5:45 pm   



On Wednesday, March 20, 2019 at 11:30:15 AM UTC-4, David Brown wrote:
Quote:
On 20/03/2019 15:50, gnuarm.deletethisbit_at_gmail.com wrote:
On Wednesday, March 20, 2019 at 6:14:21 AM UTC-4, David Brown wrote:
On 20/03/2019 03:30, gnuarm.deletethisbit_at_gmail.com wrote:
On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos
wrote:
Tom Gardner <spamjunk_at_blueyonder.co.uk> wrote:
Understand XMOS's xCORE processors and xC language, see how
they complement and support each other. I found the net
result stunningly easy to get working first time, without
having to continually read obscure errata!

I can see the merits of the XMOS approach. But I'm unclear
how this relates to the OP's proposal, which (I think) is
having tiny CPUs as hard logic blocks on an FPGA, like DSP
blocks.

I completely understand the problem of running out of hardware
threads, so a means of 'just add another one' is handy. But
the issue is how to combine such things with other synthesised
logic.

The XMOS approach is fine when the hardware is uniform and the
software sits on top, but when the hardware is synthesised and
the 'CPUs' sit as pieces in a fabric containing random logic
(as I think the OP is suggesting) it becomes a lot harder to
reason about what the system is doing and what the software
running on such heterogeneous cores should look like. Only the
FPGA tools have a full view of what the system looks like, and
it seems stretching them to have them also generate software to
run on these cores.

When people talk about things like "software running on such
heterogeneous cores" it makes me think they don't really
understand how this could be used. If you treat these small
cores like logic elements, you don't have such lofty descriptions
of "system software" since the software isn't created out of some
global software package. Each core is designed to do a specific
job just like any other piece of hardware and it has discrete
inputs and outputs just like any other piece of hardware. If the
hardware clock is not too fast, the software can synchronize with
and literally function like hardware, but implement more
complex logic than the same area of FPGA fabric might.


That is software.

If you want to try to get cycle-precise control of the software and
use that precision for direct hardware interfacing, you are almost
certainly going to have a poor, inefficient and difficult design.
It doesn't matter if you say "think of it like logic" - it is /not/
logic, it is software, and you don't use that for cycle-precise
control. You use it when you need flexibility, calculations, and
decisions.

I suppose you can make anything difficult if you try hard enough.


Equally, you can make anything sound simple if you are vague enough and
wave your hands around.


Not trying to make it sound "simple". Just saying it can be useful and not the same as designing a chip with many CPUs for the purpose of providing lots of MIPS to crunch numbers. Those ideas and methods don't apply here.


Quote:
The point is you don't have to make it difficult by talking about
"software running on such heterogeneous cores". Just talk about it
being a small hunk of software that is doing a specific job. Then
the mystery is gone and the task can be made as easy as the task is.


I did not use the phrase "software running on such heterogeneous cores"
- and I am not trying to make anything difficult. You are making cpu
cores. They run software. Saying they are "like logic elements" or
"they connect directly to hardware" does not make it so - and it does
not mean that what they run is not software.


You don't need to complicate the design by applying all the limitations of multi-processing when this is NOT at all the same. I call them logic elements because that is the intent, for them to implement logic. Yes, it is software, but that in itself creates no problems I am aware of.

As to the connection, I really don't get your point. They either connect directly to the hardware because that's how they are designed, or they don't.... because that's how they are designed. I don't know what you are saying about that.


Quote:
In VHDL this would be a process(). VHDL programs are typically chock
full of processes and no one wrings their hands worrying about how
they will design the "software running on such heterogeneous cores".


BTW, VHDL is software too.

I agree that VHDL is software. And yes, there are usually processes in
VHDL designs.

I am not /worrying/ about these devices running software - I am simply
saying that they /will/ be running software. I can't comprehend why you
want to deny that.


Enough! The CPUs run software. Now, what is YOUR point?


Quote:
It seems that you are frightened of software or
programmers, and want to call it anything /but/ software.

If the software a core is running is simple enough to be described in
VHDL, then it should be a VHDL process - not software in a cpu core.


Ok, now you have crossed into a philosophical domain. If you want to think in these terms I won't dissuade you, but it has no meaning in digital design and I won't discuss it further.


Quote:
If
it is too complex for that, it is going to have to be programmed
separately in an appropriate language. That is not necessarily harder
or easier than VHDL design - it is just different.


Ok, so what?


Quote:
If you try to force the software to be synchronous with timing on the
hardware, /then/ you are going to be in big difficulties. So don't do
that - use hardware for the tightest timing, and software for the bits
that software is good for.


LOL! You are thinking in terms that are very obsolete. Read about how the F18A synchronizes with other processors and you will find that this is an excellent way to interface to the hardware as well. Just like logic, when the CPU handshakes with a logic clock, it only has to meet the timing of a clock cycle, just like all the logic in the same design. In a VHDL process the steps are written out in sequence and not assumed to be running in parallel, just like software. When the process reaches a point of synchronization it will halt, just like logic.
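To illustrate the style I mean, here is a sketch of a sequential VHDL process that halts at its synchronization points until the other side acknowledges. The signal names are made up, and since many synthesis tools restrict a process to a single wait, take it as illustrating the idea in simulation rather than as production code:

  library ieee;
  use ieee.std_logic_1164.all;

  entity handshake_task is
    port (
      clk  : in  std_logic;
      ack  : in  std_logic;
      req  : out std_logic;
      data : out std_logic_vector(7 downto 0)
    );
  end entity;

  architecture rtl of handshake_task is
  begin
    -- Steps written out in sequence; the process stops at each wait until
    -- the handshake condition is met, then carries on.
    process
    begin
      wait until rising_edge(clk);
      data <= x"5A";                              -- present the next item
      req  <= '1';                                -- ask the consumer to take it
      wait until rising_edge(clk) and ack = '1';  -- halt here until acknowledged
      req  <= '0';
      wait until rising_edge(clk) and ack = '0';  -- wait for the handshake to clear
    end process;
  end architecture;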


Quote:
There is no need to think about how the CPUs would communicate
unless there is a specific need for them to do so. The F18A uses
a handshaked parallel port in their design. They seem to have
done a pretty slick job of it and can actually hang the processor
waiting for the acknowledgement saving power and getting an
instantaneous wake up following the handshake. This can be used
with other CPUs or


Fair enough.

Ok, that's a start.


I'd expect that the sensible way to pass data between these, if you need
to do so much, is using FIFOs.


Between what exactly??? You are designing a system that is not before you. More importantly you don't actually know anything about the ideas used in the F18A and GA144 designs.

I'm not trying to be rude, but you should learn more about them before you assume they need to work like every other processor you've ever used. The F18A and GA144 really only have two particularly unique ideas. One is that the processor is very, very small and as a consequence, fast. The other is the communications technique.

Charles Moore is a unique thinker and he realized that with the advance of processing technology CPUs could be made very small and so become MIPS fodder. By that I mean you no longer need to focus on utilizing all the MIPS in a CPU. Instead, they can be treated as disposable and only a tiny fraction of the available MIPS used to implement some function... usefully.

While the GA144 is a commercial failure for many reasons, it does illustrate some very innovative ideas and is what prompted me to consider what happens when you can scatter CPUs around an FPGA as if they were logic blocks.

No, I don't have a fully developed "business plan". I am just interested in exploring the idea. Moore's (GreenArrays' actually, CM isn't actively working with them at this point I believe) chip isn't very practical because Moore isn't terribly interested in being practical exactly. But that isn't to say it doesn't embody some very interesting ideas.

Rick C.
