
Forth in VHDL


Anton Ertl
Guest

Tue Jul 26, 2016 10:00 pm   



rickman <gnuarm_at_gmail.com> writes:
Quote:
I have been
told the clock tree in a large chip can dissipate half the power.


IIRC the clock for the 21064 (1992) consumed 30% of the power, and the
final driver of the clock had a gate width of 35cm. That was at
200MHz.

Of course that could not scale, so quite some time ago designers
divided the chips into smaller clock domains (and later also power
domains); e.g., the Willamette (first Pentium 4, 2001, 1400MHz) had a
very fast integer ALU core that, however, did not include
multiplication or shifting. So integer multiplication and shifting
were achieved by shipping the data over to the FPU, and then shipping
the result back. The data had to cross several clock domain borders
on the way, losing a cycle on every crossing; that's why integer
multiplication is slower than FP multiplication on the Pentium 4.

Quote:
The bottom line is asynchronously clocked CPU chips have been designed
before but have never made an impact on the market. I recall one that
was an 8051 I believe and I seem to recall an ARM being designed this
way. Of course, the GA144 is the most notable and possibly the most
successful example so far.


AFAIK the GA144 is not an asynchronous design; it's a clocked design,
but the clock is generated internally (one clock per core). At least
an earlier chip by Chuck Moore worked that way (IIRC the MuP21), and
the idea that this was an async design was already rampant (and
contradicted) at the time.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

rickman
Guest

Tue Jul 26, 2016 10:58 pm   



On 7/26/2016 12:00 PM, Anton Ertl wrote:
Quote:
rickman <gnuarm_at_gmail.com> writes:
I have been
told the clock tree in a large chip can dissipate half the power.

IIRC the clock for the 21064 (1992) consumed 30% of the power, and the
final driver of the clock had a gate width of 35cm. That was at
200MHz.

Of course that could not scale, so quite some time ago designers
divided the chips into smaller clock domains (and later also power
domains); e.g., the Willamette (first Pentium 4, 2001, 1400MHz) had a
very fast integer ALU core that, however, did not include
multiplication or shifting. So integer multiplication and shifting
were achieved by shipping the data over to the FPU, and then shipping
the result back. The data had to cross several clock domain borders
on the way, losing a cycle on every crossing; that's why integer
multiplication is slower than FP multiplication on the Pentium 4.

The bottom line is asynchronously clocked CPU chips have been designed
before but have never made an impact on the market. I recall one that
was an 8051 I believe and I seem to recall an ARM being designed this
way. Of course, the GA144 is the most notable and possibly the most
successful example so far.

AFAIK the GA144 is not an asynchronous design; it's a clocked design,
but the clock is generated internally (one clock per core). At least
an earlier chip by Chuck Moore worked that way (IIRC the MuP21), and
the idea that this was an async design was already rampant (and
contradicted) at the time.


None of the CPUs described as "asynchronous" are truly that. They are
asynchronously clocked. In the GA144 there are delay paths that are
activated for each type of instruction with a delay matched to the time
taken for that class of instruction. Someone here argued with me that
this constituted an astable oscillator but that is just semantics. Of
course it will oscillate as the end of any one clock period has to
coincide with the beginning of the next. But just like all the other
"async" CPUs, the GA144 is async in the same way, asynchronously clocked.

True asynchronous logic is different. It has no clocked registers. The
logic is self-latching, like RS FFs, and has to be designed very
differently even from asynchronously clocked logic. I remember an async
logic state machine available many years ago when PLDs were still new.
It was true async logic, but never made much of a dent in the market. I
saw it used on one design which likely became out of date due to the
part becoming obsolete not too long after.

--

Rick C

Anton Ertl
Guest

Wed Jul 27, 2016 3:39 pm   



rickman <gnuarm_at_gmail.com> writes:
Quote:
On 7/26/2016 12:00 PM, Anton Ertl wrote:
In the GA144 there are delay paths that are
activated for each type of instruction with a delay matched to the time
taken for that class of instruction.


So you no longer have to do three nops before (or was it after?) a
full-length "+"? That's new then. The way I understood the
description of the MuP21 in the earlier discussion, it had some kind
of on-chip oscillator that clocked the whole core, and the addition
could take up to four cycles.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

rickman
Guest

Wed Jul 27, 2016 8:01 pm   



On 7/27/2016 5:39 AM, Anton Ertl wrote:
Quote:
rickman <gnuarm_at_gmail.com> writes:
On 7/26/2016 12:00 PM, Anton Ertl wrote:
In the GA144 there are delay paths that are
activated for each type of instruction with a delay matched to the time
taken for that class of instruction.

So you no longer have to do three nops before (or was it after?) a
full-length "+"? That's new then. The way I understood the
description of the MuP21 in the earlier discussion, it had some kind
of on-chip oscillator that clocked the whole core, and the addition
could take up to four cycles.


All of the ALU instructions are timed with the same timing path. The
add requires extra time for the carry to settle, so one nop is required
before an addition unless the previous instruction does not modify
either of the two operands in which case no nop is needed. Other,
non-ALU instructions have various timings and have other timing paths.
It is definitely *not* one timing path for the "whole core".
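
The nop rule described above can be sketched as a small scheduling pass. This is only an illustration of the rule as stated in this post, not GreenArrays tooling: the mnemonics are plain strings, "nop" stands in for the F18A no-op, and the set of instructions that modify the add's operands is supplied by the caller (the set used in the example is hypothetical). A "+" at the very start is treated conservatively, since the preceding instruction is unknown.

```python
def insert_nops(program, modifies_operands, add="+", nop="nop"):
    """Return a copy of `program` with a nop placed before each add that needs one.

    program           -- list of instruction mnemonics (strings)
    modifies_operands -- caller-supplied set of mnemonics that change either
                         operand of the add (an assumption of this sketch)
    """
    out = []
    for instr in program:
        if instr == add:
            prev = out[-1] if out else None
            # Unknown predecessor (start of sequence) is handled conservatively.
            if prev is None or prev in modifies_operands:
                out.append(nop)
        out.append(instr)
    return out


# Hypothetical set, for illustration only.
CHANGES_OPERANDS = {"dup", "over", "+"}

example = insert_nops(["dup", "+", "+"], CHANGES_OPERANDS)
```

With the hypothetical set above, `example` becomes `["dup", "nop", "+", "nop", "+"]`, while a sequence that already has a nop before the add is returned unchanged. Word-boundary effects on timing are not modelled.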

I don't recall the various classes of instructions that have separate
timing from the ALU, but at one point I made a timing-based tool in a
spreadsheet. Type in the instructions and it gave you the timing. I
think it even accounted for instruction word boundaries which require
additional timing for the next word fetch under some conditions.

--

Rick C

Jan Coombs
Guest

Thu Jul 28, 2016 3:46 am   



On Tue, 26 Jul 2016 16:00:23 GMT
anton_at_mips.complang.tuwien.ac.at (Anton Ertl) wrote:

Quote:
AFAIK the GA144 is not an asynchronous design; it's a clocked
design, but the clock is generated internally (one clock per
core). At least an earlier chip by Chuck Moore worked that
way (IIRC the MuP21), and the idea that this was an async
design was already rampant (and contradicted) at the time.


Yes, both GreenArrays and IntellaSys chips [1][2] need 'nop' or
another instruction that does not alter T or S to precede an
addition. This is in order to allow the carry to stabilise. The
earlier manual states that the carry propagates nine bits in
each processor cycle.
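
To make the carry argument concrete, here is a rough model of how far a carry ripples in an 18-bit add and how many cycles that takes at nine bits per cycle. It is a simplified sketch, assuming a plain ripple-carry chain and taking the nine-bit figure at face value; the function names are made up for this example and nothing here is taken from the F18A documentation beyond the two constants.

```python
WIDTH = 18           # F18A word size in bits
BITS_PER_CYCLE = 9   # propagation figure quoted above from the earlier manual


def longest_carry_run(a, b, width=WIDTH):
    """Return the longest distance, in bit positions, that a carry ripples in a + b."""
    longest = 0
    run = 0  # how far the carry arriving at the current bit has already travelled
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        if a_bit & b_bit:
            run = 1            # generate: a fresh carry starts at this bit
        elif (a_bit ^ b_bit) and run:
            run += 1           # propagate: the incoming carry moves one bit further
        else:
            run = 0            # no carry leaves this bit
        longest = max(longest, run)
    return longest


def settle_cycles(a, b, bits_per_cycle=BITS_PER_CYCLE):
    """Cycles the carry chain needs under the simplified nine-bits-per-cycle model."""
    run = longest_carry_run(a, b)
    return -(-run // bits_per_cycle)  # ceiling division


worst_case = settle_cycles(0x3FFFF, 1)  # all ones plus one: carry crosses all 18 bits
```

Under this model the worst case for 18 bits comes out at two cycles, which is consistent with needing one extra instruction slot before the add.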

The GA144 is asynchronous at the boundary of each processor
module. AFAICT this is a common 'asynchronous' design method.
Otherwise the Wikipedia article [3] "Asynchronous CPU" needs
revision. There are perhaps zero recent CPU designs built of
purely asynchronous logic? (And, for good patent readers, what
are Achronix async FPGA parts made of?)

The most efficient asynchronous signalling method for random
logic seems to be "four state encoding". This uses two wires to
carry a single bit and a 'time stamp'. The 'time stamp' is the
data cycle to which the data bit belongs, mod 2.

The encoding is arranged so that in the transition to each new
time phase only one of the two wires changes state, regardless
of whether or not the data has changed. This avoids race
problems between the two wires.
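
One encoding with the properties described above is known in the literature as level-encoded dual-rail (LEDR); whether that is exactly the scheme meant here is an assumption. In this sketch one wire carries the data bit and the other carries the data bit XORed with the phase, so the phase can be recovered as the XOR of the two wires. The function names are invented for the example.

```python
def encode(bit, phase):
    """Return the (value, parity) wire levels for a data bit in a given phase (0 or 1)."""
    return bit, bit ^ phase


def decode(value, parity):
    """Recover (bit, phase) from the two wire levels."""
    return value, value ^ parity


def wires_changed(old, new):
    """Count how many of the two wires differ between two wire states."""
    return sum(o != n for o, n in zip(old, new))


def transmit(bits):
    """Encode a bit stream with an alternating phase and return the wire states."""
    return [encode(bit, i % 2) for i, bit in enumerate(bits)]


states = transmit([0, 0, 1, 1, 0, 1])
changes = [wires_changed(a, b) for a, b in zip(states, states[1:])]
```

Every entry of `changes` is 1: when the data bit repeats, only the parity wire toggles, and when the data bit flips, only the value wire toggles. The receiver detects a new phase by seeing the XOR of the two wires change.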

I'd like to know what the cost for this is in transistors,
having guessed it is about 10 _times_ more than simple
conventional logic.

Jan Coombs
--
[1] DB001-110412-F18A.pdf "F18A Technology Reference" pg8
[2] "SEAforth 40C18 Data Sheet (Preliminary)" pg44
[3] https://en.wikipedia.org/wiki/Asynchronous_circuit

rickman
Guest

Thu Jul 28, 2016 4:15 am   



On 7/24/2016 11:01 PM, rickman wrote:
Quote:
On 7/24/2016 10:29 PM, rickman wrote:
I wonder how hard it would be to write a Forth in VHDL? It would likely
be as easy to do in non-synthesizable code as any other language. It
might be a bit harder in synthesizable code. For one, the I/O would
need to be constructed from scratch based on some hardware interface.
The non-synthesizable code could just read from a file... I wonder if
you can read from the console in VHDL? I've never tried that before.

I did a little digging and it looks like you *can* do console I/O in
VHDL using the textio package. So I can't think of anything to stop a
vforth from being written... unless the vforth name has already been used.


I took a look at a C Forth implementation, pForth. I'm not sure I
follow everything he is doing; I guess my C is a bit rusty. I see he
used a large CASE statement for the primitive words. I'm not clear on how
the inner interpreter works through the XTs in a word definition for
words compiled into other words. I suppose it is just a matter of
indexes into the dictionary (or pointers in C), but I don't see the code
he is using to manipulate them. He uses a ton of #defines, which hide the
details, and it is a bit of work for me to figure this out.
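
For reference, one common way such an inner interpreter can be arranged is sketched below. It is not taken from pForth's source: the token numbering, the flat `code` list standing in for the dictionary, and all names are assumptions made for illustration. Tokens below `NUM_PRIMS` are primitives handled by one big dispatch; any larger token is treated as the index of a colon definition, which is entered by saving the instruction pointer on the return stack.

```python
# Hypothetical primitive tokens; any token >= NUM_PRIMS is the index of a colon word.
EXIT, LIT, ADD, MUL, DUP = range(5)
NUM_PRIMS = 5


def run(code, entry):
    """Execute the colon definition starting at index `entry`; return the data stack."""
    data, ret = [], []
    ip = entry
    while True:
        token = code[ip]
        ip += 1
        if token >= NUM_PRIMS:      # XT of a colon word: nest into it
            ret.append(ip)
            ip = token
        elif token == EXIT:         # unnest, or stop at the outermost level
            if not ret:
                return data
            ip = ret.pop()
        elif token == LIT:          # next cell is an inline literal
            data.append(code[ip])
            ip += 1
        elif token == ADD:
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif token == MUL:
            b, a = data.pop(), data.pop()
            data.append(a * b)
        elif token == DUP:
            data.append(data[-1])


# The first NUM_PRIMS cells are reserved so no colon word shares an index with a primitive.
code = [0] * NUM_PRIMS
SQUARE = len(code)
code += [DUP, MUL, EXIT]                              # : SQUARE  DUP * ;
MAIN = len(code)
code += [LIT, 3, SQUARE, LIT, 4, SQUARE, ADD, EXIT]   # 3 SQUARE 4 SQUARE +

result = run(code, MAIN)
```

Here `result` is `[25]`. The point is only that a compiled word is a list of XTs, and the interpreter loop either dispatches a primitive or pushes the return address and jumps to the called word's index.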

I'll get it sooner or later. I just need to keep reading.

--

Rick C
