Forth in VHDL

rickman
I wonder how hard it would be to write a Forth in VHDL? It would likely
be as easy to do in non-synthesizable code as any other language. It
might be a bit harder in synthesizable code. For one, the I/O would
need to be constructed from scratch based on some hardware interface.
The non-synthesizable code could just read from a file... I wonder if
you can read from the console in VHDL? I've never tried that before.

--

Rick C
 
rickman <gnuarm@gmail.com> writes:
The speed advantage of async logic is essentially a myth in that you
have to design for real time needs which means deadlines.

I thought the idea was instructions could have variable latency
depending on the data, and async meant you could start the next
instruction as soon as the previous one finished.

You really can't design async logic in FPGAs that I know of. It has
to be custom chips.

Hmm, I wonder what the obstacle is. Certainly async was used in old
discrete or low-integration machines like the PDP-6 and KA10 from the
1970s. I think it fell out of favor as machines got more complex, due
to debugging headaches etc.

> I looked at what amounts to VLIW for MISC processors

That's sort of what the Novix was, I thought.
 
On 7/25/2016 12:57 AM, Cecil Bayona wrote:
On 7/24/2016 11:08 PM, rickman wrote:
On 7/24/2016 11:25 PM, Cecil Bayona wrote:
On 7/24/2016 10:05 PM, rickman wrote:

Are you still recovering?

Recovery is going pretty well, thanks for asking. I have regained most
of the muscle strength lost from limping very abnormally for some years.
I still have swelling in the operated leg but no one is very concerned
and it seems to be better the last week or so. They did an ultrasound
to rule out DVT so it should just be some more time.

The funny part is that my work no longer requires much of me other than
pushing paper and staying on top of things. I accept orders for
hardware I sell and place the orders with my fabrication house. When
they complete testing and ship the boards I send the invoice to the
customer and pay the fab house. Lots of profit with very little work.
In fact, technically I was working during my surgery (waiting for boards
to be fabbed). That's the main reason why I have free time.

Good to hear you are doing better; it might take a while of therapy and
exercises until you are closer to being your old self physically.

As for me, tomorrow morning I will go to the doctor to figure out what
to try next; I'm not responding to very expensive medicine, so I might
have to take desperate measures to bring my health under control.

If you mentioned this before I don't recall. Sounds like it might be
serious.


As to a "directly executing" Forth CPU, what would that be exactly?
Hardware is hardware. To design it requires forming an idea of the
hardware. The only hardware I have thought of to execute Forth would be
the same sort of stack machines I've seen many times and designed
myself. Not much new there. What would a "native" Forth machine look
like other than a dual stack CPU which is what the Forth virtual machine
is?

I've seen CPUs made from asynchronous random logic rather than your
typical synchronous logic like almost all standard CPUs; they tend to be
a little faster but a nightmare to debug. You would still have the
registers, but most of the logic would be random logic versus nice state
machines and everything running off a master clock. No major advantages
other than being a little bit faster, but lots of disadvantages: timing
is a nightmare, and so is debugging it.

The speed advantage of async logic is essentially a myth in that you
have to design for real time needs which means deadlines. An async
processor can be faster due to the various factors which affect timing,
but you have to allow for all these factors being the worst possible in
calculating if you can meet your deadlines. Same as clocked logic, but
rather than at the clock cycle, at the system level.

I had this discussion once and the only app anyone came up with was
networking. It is not uncommon for CPUs to become saturated so that
packets fall on the floor when the CPU is slammed. An async processor
would be able to utilize any speed advantage it might have to process
more packets before dropping any.


Not enough gains to justify the problems, so I stick to nice synchronous
logic. One is better off with CPUs like the ep32: five packed
instructions to a 32-bit word, TOS and NOS in registers, single-clock
execution of instructions, a nice machine. Another nice one is the J1,
capable of multiple instruction execution in one clock.

You really can't design async logic in FPGAs that I know of. It has to
be custom chips. Achronix is developing async logic FPGAs, but I think
they are intent on the really big customers only, small users need not
apply.

I looked at what amounts to VLIW for MISC processors (sounds like an
oxymoron, no?). The idea was that there are three processors in any two
stack machine: 1) the instruction processor, 2) the data (stack)
processor, 3) the return (I call it the address) processor. Each one
could do something independent on each instruction cycle. The
instructions can be encoded (standard MISC instructions) or the
instructions can be separate, one for each of the three processors
(which is the main idea behind VLIW). Various combinations of
instructions give the basic primitives, but many other useful
combinations as well, such as a return on any instruction that doesn't
conflict with the hardware usage.
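
As a rough illustration of the encoding idea, here is a minimal VHDL
sketch of a three-field word, one field per unit; the 18-bit width, the
field positions and the names are invented for this sketch, not taken
from any existing design:

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative only: one way to pack three independent fields into an
-- 18-bit word (the width of many FPGA block RAMs), one field per unit.
entity misc_vliw_decode is
  port (insn    : in  std_logic_vector(17 downto 0);
        inst_op : out std_logic_vector(5 downto 0);   -- instruction/flow unit
        data_op : out std_logic_vector(5 downto 0);   -- data-stack/ALU unit
        ret_op  : out std_logic_vector(5 downto 0));  -- return/address unit
end entity;

architecture rtl of misc_vliw_decode is
begin
  -- Each unit gets its own field and can act on every cycle, e.g. the
  -- return unit can issue a return while the data unit does an ALU op.
  inst_op <= insn(17 downto 12);
  data_op <= insn(11 downto 6);
  ret_op  <= insn(5 downto 0);
end architecture;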

I only looked at this for a bit. I believe that is how the J1 gets some
parallelism. My stumbling block was in writing a compiler to take
advantage of this. But then I don't really intend to program any of my
apps in Forth. I would just code it in the stack machine assembly, not
unlike what Chuck Moore does for the F18A. Forth is used to write the
macro assembler.

--

Rick C
 
On 7/24/2016 11:08 PM, rickman wrote:
On 7/24/2016 11:25 PM, Cecil Bayona wrote:
On 7/24/2016 10:05 PM, rickman wrote:

Are you still recovering?

Recovery is going pretty well, thanks for asking. I have regained most
of the muscle strength lost from limping very abnormally for some years.
I still have swelling in the operated leg but no one is very concerned
and it seems to be better the last week or so. They did an ultrasound
to rule out DVT so it should just be some more time.

The funny part is that my work no longer requires much of me other than
pushing paper and staying on top of things. I accept orders for
hardware I sell and place the orders with my fabrication house. When
they complete testing and ship the boards I send the invoice to the
customer and pay the fab house. Lots of profit with very little work.
In fact, technically I was working during my surgery (waiting for boards
to be fabbed). That's the main reason why I have free time.
Good to hear you are doing better; it might take a while of therapy and
exercises until you are closer to being your old self physically.

As for me, tomorrow morning I will go to the doctor to figure out what
to try next; I'm not responding to very expensive medicine, so I might
have to take desperate measures to bring my health under control.

As to a "directly executing" Forth CPU, what would that be exactly?
Hardware is hardware. To design it requires forming an idea of the
hardware. The only hardware I have thought of to execute Forth would be
the same sort of stack machines I've seen many times and designed
myself. Not much new there. What would a "native" Forth machine look
like other than a dual stack CPU which is what the Forth virtual machine
is?
I've seen CPUs made from asynchronous random logic rather than your
typical synchronous logic like almost all standard CPUs; they tend to be
a little faster but a nightmare to debug. You would still have the
registers, but most of the logic would be random logic versus nice state
machines and everything running off a master clock. No major advantages
other than being a little bit faster, but lots of disadvantages: timing
is a nightmare, and so is debugging it.

Not enough gains to justify the problems, so I stick to nice synchronous
logic. One is better off with CPUs like the ep32: five packed
instructions to a 32-bit word, TOS and NOS in registers, single-clock
execution of instructions, a nice machine. Another nice one is the J1,
capable of multiple instruction execution in one clock.


--
Cecil - k5nwa
 
On 7/24/2016 11:25 PM, Cecil Bayona wrote:
On 7/24/2016 10:05 PM, rickman wrote:
On 7/24/2016 11:01 PM, rickman wrote:
On 7/24/2016 10:29 PM, rickman wrote:
I wonder how hard it would be to write a Forth in VHDL? It would
likely
be as easy to do in non-synthesizable code as any other language. It
might be a bit harder in synthesizable code. For one, the I/O would
need to be constructed from scratch based on some hardware interface.
The non-synthesizable code could just read from a file... I wonder if
you can read from the console in VHDL? I've never tried that before.

I did a little digging and it looks like you *can* do console I/O in
VHDL using the textio package. So I can't think of anything to stop a
vforth from being written... unless the vforth name has already been
used.

Looks like the name vforth has been used before... twice! Once for a
VIC forth and once, more recently for a VAX forth? Really? Do people
still have VAX computers? Or is this run on a VAX emulator? Maybe
that's a retro thing?

So it looks like nothing stands in the way of writing a Forth in VHDL
other than free time... which I seem to have a lot of. ;)

But what would you gain? It would be horribly slow with no benefit that I
can see. Better to use VHDL to create a CPU that runs Forth natively;
that is why I've been working on some of the Forth CPUs that run in FPGAs.

Aside from the above mentioned, it would be interesting to create
hardware that runs Forth natively: not a CPU, but a set of hardware that
runs the code natively.

Are you still recovering?

Recovery is going pretty well, thanks for asking. I have regained most
of the muscle strength lost from limping very abnormally for some years.
I still have swelling in the operated leg but no one is very concerned
and it seems to be better the last week or so. They did an ultrasound
to rule out DVT so it should just be some more time.

The funny part is that my work no longer requires much of me other than
pushing paper and staying on top of things. I accept orders for
hardware I sell and place the orders with my fabrication house. When
they complete testing and ship the boards I send the invoice to the
customer and pay the fab house. Lots of profit with very little work.
In fact, technically I was working during my surgery (waiting for boards
to be fabbed). That's the main reason why I have free time.

I was thinking in terms of a Forth interpreter for use as a test bench.
Currently I write test benches in one of three ways. Either I write ad
hoc code that generates arbitrary signals based on the requirements of
the object being tested, or I use another copy of the hardware being
tested as the tester when it uses a symmetrical interface, or, for
interfaces with well-parameterized functionality, I write a text
interpreter which reads commands from a file to control the testing.

Using a Forth interpreter would allow me to handle any of the above in
Forth: directly reading and writing I/Os to the device under test, or
connecting inputs and outputs of various modules, or reading commands
from a file to manipulate interfaces.
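
As a rough sketch of that last approach, here is a minimal file-driven
command interpreter for a test bench; the file name, the one-letter
commands and the dut_in signal are all invented for illustration:

library ieee;
use ieee.std_logic_1164.all;
use std.textio.all;

entity cmd_tb is
end entity;

architecture sim of cmd_tb is
  signal clk    : std_logic := '0';
  signal dut_in : std_logic := '0';     -- stand-in for a device-under-test input
begin
  clk <= not clk after 5 ns;

  stimulus : process
    file cmds   : text open read_mode is "commands.txt";
    variable l  : line;
    variable c  : character;
    variable n  : integer;
    variable ok : boolean;
  begin
    while not endfile(cmds) loop
      readline(cmds, l);
      read(l, c, ok);                   -- one-letter command at the start of the line
      if ok then
        case c is
          when 'h'    => dut_in <= '1';             -- drive the input high
          when 'l'    => dut_in <= '0';             -- drive the input low
          when 'w'    => read(l, n);                -- wait n clock cycles
                         for i in 1 to n loop
                           wait until rising_edge(clk);
                         end loop;
          when others => null;                      -- ignore unknown/comment lines
        end case;
      end if;
    end loop;
    wait;                               -- end of command file: stop
  end process;
end architecture;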

You are thinking of speed in terms of something real time. VHDL
simulations are nowhere near real time. But it would be nice to program
a test bench in something other than VHDL.

As to a "directly executing" Forth CPU, what would that be exactly?
Hardware is hardware. To design it requires forming an idea of the
hardware. The only hardware I have thought of to execute Forth would be
the same sort of stack machines I've seen many times and designed
myself. Not much new there. What would a "native" Forth machine look
like other than a dual stack CPU which is what the Forth virtual machine
is?

--

Rick C
 
On 7/24/2016 10:05 PM, rickman wrote:
On 7/24/2016 11:01 PM, rickman wrote:
On 7/24/2016 10:29 PM, rickman wrote:
I wonder how hard it would be to write a Forth in VHDL? It would likely
be as easy to do in non-synthesizable code as any other language. It
might be a bit harder in synthesizable code. For one, the I/O would
need to be constructed from scratch based on some hardware interface.
The non-synthesizable code could just read from a file... I wonder if
you can read from the console in VHDL? I've never tried that before.

I did a little digging and it looks like you *can* do console I/O in
VHDL using the textio package. So I can't think of anything to stop a
vforth from being written... unless the vforth name has already been
used.

Looks like the name vforth has been used before... twice! Once for a
VIC forth and once, more recently for a VAX forth? Really? Do people
still have VAX computers? Or is this run on a VAX emulator? Maybe
that's a retro thing?

So it looks like nothing stands in the way of writing a Forth in VHDL
other than free time... which I seem to have a lot of. ;)
But what would you gain? It would be horribly slow with no benefit that I
can see. Better to use VHDL to create a CPU that runs Forth natively;
that is why I've been working on some of the Forth CPUs that run in FPGAs.

Aside from the above mentioned, it would be interesting to create
hardware that runs Forth natively: not a CPU, but a set of hardware that
runs the code natively.

Are you still recovering?

--
Cecil - k5nwa
 
On 7/24/2016 11:01 PM, rickman wrote:
On 7/24/2016 10:29 PM, rickman wrote:
I wonder how hard it would be to write a Forth in VHDL? It would likely
be as easy to do in non-synthesizable code as any other language. It
might be a bit harder in synthesizable code. For one, the I/O would
need to be constructed from scratch based on some hardware interface.
The non-synthesizable code could just read from a file... I wonder if
you can read from the console in VHDL? I've never tried that before.

I did a little digging and it looks like you *can* do console I/O in
VHDL using the textio package. So I can't think of anything to stop a
vforth from being written... unless the vforth name has already been used.

Looks like the name vforth has been used before... twice! Once for a
VIC forth and once, more recently for a VAX forth? Really? Do people
still have VAX computers? Or is this run on a VAX emulator? Maybe
that's a retro thing?

So it looks like nothing stands in the way of writing a Forth in VHDL
other than free time... which I seem to have a lot of. ;)

--

Rick C
 
On 7/24/2016 10:29 PM, rickman wrote:
I wonder how hard it would be to write a Forth in VHDL? It would likely
be as easy to do in non-synthesizable code as any other language. It
might be a bit harder in synthesizable code. For one, the I/O would
need to be constructed from scratch based on some hardware interface.
The non-synthesizable code could just read from a file... I wonder if
you can read from the console in VHDL? I've never tried that before.

I did a little digging and it looks like you *can* do console I/O in
VHDL using the textio package. So I can't think of anything to stop a
vforth from being written... unless the vforth name has already been used.
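
For what it's worth, here is a minimal sketch of the console side using
textio; most simulators map the predefined input and output files to
stdin/stdout, but that mapping is simulator-dependent:

use std.textio.all;

entity console_echo is
end entity;

architecture sim of console_echo is
begin
  process
    variable l_in, l_out : line;
  begin
    while not endfile(input) loop
      readline(input, l_in);                -- read one line typed at the console
      write(l_out, string'("you typed: "));
      write(l_out, l_in.all);
      writeline(output, l_out);             -- echo it back
    end loop;
    wait;                                   -- end of input: stop the process
  end process;
end architecture;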

--

Rick C
 
On 7/25/2016 1:38 AM, Paul Rubin wrote:
rickman <gnuarm@gmail.com> writes:
The speed advantage of async logic is essentially a myth in that you
have to design for real time needs which means deadlines.

I thought the idea was instructions could have variable latency
depending on the data, and async meant you could start the next
instruction as soon as the previous one finished.

It is not really without a clock, just no single system clock. Each
data path has to be timed by a parallel path of logic to create the
clock. In the F18A, there are a number of paths which are separately
timed, but roughly they fall into three or four groups with the same
delay. This is not unlike clocked processors with multiple clock
instructions. The slow paths are broken into more than one clock cycle.

I shouldn't say there is "no" speedup. But I've not seen anything that
says it is very significant.


You really can't design async logic in FPGAs that I know of. It has
to be custom chips.

Hmm, I wonder what the obstacle is. Certainly async was used in old
discrete or low-integration machines like the PDP-6 and KA10 from the
1970s. I think it fell out of favor as machines got more complex, due
to debugging headaches etc.

Async logic requires a very different design approach. There is truly
async logic, which has no clocks at all. This is hard to design,
requiring "coverage" of the boundaries between product terms to make
sure there is no glitching when inputs change that should result in no
output changes. But what is talked about in processors should be called
"self-timed" logic.

Self timed logic requires a lot more timing data than FPGA makers are
willing to commit to. It would be a PITA for them to characterize it
and then test to it. All for fairly marginal advantages... unless
Achronix finds an easier way to make it all happen.


I looked at what amounts to VLIW for MISC processors

That's sort of what the Novix was, I thought.

I don't think it is literally separate instructions for the various
internal processors, but I haven't looked at it in depth. It does
provide for data movements in parallel. But then the CPU I designed had
very simple data paths. I don't know so much about the Novix, but I
think it had a lot of data path features.

--

Rick C
 
On 7/25/2016 12:22 AM, rickman wrote:
On 7/25/2016 12:57 AM, Cecil Bayona wrote:
On 7/24/2016 11:08 PM, rickman wrote:

As for me, tomorrow morning I will go to the doctor to figure out what
to try next; I'm not responding to very expensive medicine, so I might
have to take desperate measures to bring my health under control.

If you mentioned this before I don't recall. Sounds like it might be
serious.

It is. I'm diabetic, and for the past 6 months or so I have not been
responding very well to my medicine; the doctor has tried several
different drugs with poor results. My A1C was 11.3 six months ago.
Slowly things have been improving, so 3 months ago it went down to 8.3
on the A1C test, and this morning my result from lab work shows that I
have an A1C of 7.1 this time around.

Unfortunately the medicine that seems to work (Victoza) is not covered
by my insurance and I can't afford it, so this morning she gave me some
samples of 75%/25% HumaLog insulin, which will cost $212 a month. It
will be difficult and a strain on my budget, but I have little choice.
It will take a few weeks before I know how well it's working, as it will
take a week for the older medicine to work out of my system.

The desperate measures have to do with eating. The latest research
shows that if someone can tolerate it, fasting will help a lot in
bringing down your blood sugar, and it helps you lose weight, which also
is a big help. I can tolerate fasting well and tried it out a few times
and it worked quite well; after 2 days of fasting my blood sugar was way
down into the really good range, 85-100 in the morning and 150 two hours
after a meal, which is totally normal for a non-diabetic. So a starting
plan would be to fast for three days, eat moderately for two days, then
repeat the cycle. After a month or two of this you can't help but start
losing weight, and if your weight gets down low enough you don't need to
fast as much and you won't need the drugs as much.


Not enough gains to justify the problems, so I stick to nice synchronous
logic. One is better off with CPUs like the ep32: five packed
instructions to a 32-bit word, TOS and NOS in registers, single-clock
execution of instructions, a nice machine. Another nice one is the J1,
capable of multiple instruction execution in one clock.

You really can't design async logic in FPGAs that I know of. It has to
be custom chips. Achronix is developing async logic FPGAs, but I think
they are intent on the really big customers only, small users need not
apply.

I have not looked into it, but you might be right, as the LUTs have
flip-flops on their outputs and would need to be clocked.

I looked at what amounts to VLIW for MISC processors (sounds like an
oxymoron, no?). The idea was that there are three processors in any two
stack machine: 1) the instruction processor, 2) the data (stack)
processor, 3) the return (I call it the address) processor. Each one
could do something independent on each instruction cycle. The
instructions can be encoded (standard MISC instructions) or the
instructions can be separate, one for each of the three processors
(which is the main idea behind VLIW). Various combinations of
instructions give the basic primitives, but many other useful
combinations as well, such as a return on any instruction that doesn't
conflict with the hardware usage.

I only looked at this for a bit. I believe that is how the J1 gets some
parallelism. My stumbling block was in writing a compiler to take
advantage of this. But then I don't really intend to program any of my
apps in Forth. I would just code it in the stack machine assembly, not
unlike what Chuck Moore does for the F18A. Forth is used to write the
macro assembler.

The J1 uses micro-programming: the instructions are the control bits for
the internal CPU units, so it can do multiple transfers and logic/math
operations at the same time; just set the right bits. So you could do a
math operation, save the result, and return in one instruction.

--
Cecil - k5nwa
 
On 7/25/2016 3:11 PM, Cecil Bayona wrote:
On 7/25/2016 12:22 AM, rickman wrote:
On 7/25/2016 12:57 AM, Cecil Bayona wrote:
On 7/24/2016 11:08 PM, rickman wrote:

As for me, tomorrow morning I will go to the doctor to figure out what
to try next; I'm not responding to very expensive medicine, so I might
have to take desperate measures to bring my health under control.

If you mentioned this before I don't recall. Sounds like it might be
serious.



It is. I'm diabetic, and for the past 6 months or so I have not been
responding very well to my medicine; the doctor has tried several
different drugs with poor results. My A1C was 11.3 six months ago.
Slowly things have been improving, so 3 months ago it went down to 8.3
on the A1C test, and this morning my result from lab work shows that I
have an A1C of 7.1 this time around.

Unfortunately the medicine that seems to work (Victoza) is not covered
by my insurance and I can't afford it, so this morning she gave me some
samples of 75%/25% HumaLog insulin, which will cost $212 a month. It
will be difficult and a strain on my budget, but I have little choice.
It will take a few weeks before I know how well it's working, as it will
take a week for the older medicine to work out of my system.

The desperate measures have to do with eating. The latest research
shows that if someone can tolerate it, fasting will help a lot in
bringing down your blood sugar, and it helps you lose weight, which also
is a big help. I can tolerate fasting well and tried it out a few times
and it worked quite well; after 2 days of fasting my blood sugar was way
down into the really good range, 85-100 in the morning and 150 two hours
after a meal, which is totally normal for a non-diabetic. So a starting
plan would be to fast for three days, eat moderately for two days, then
repeat the cycle. After a month or two of this you can't help but start
losing weight, and if your weight gets down low enough you don't need to
fast as much and you won't need the drugs as much.

Diabetes runs in my family, but so far I have avoided it. Maybe because
at *only* 35 lbs overweight I am a lighter member. It's been a while
since I was tested so I should do that. My last check was a doctor
doing the blood sugar tests with the little strips in a supermarket. I
had something small to eat some hours before and still was in the 70
range, I seem to recall. He said that was great!


Not enough gains to justify the problems, so I stick to nice synchronous
logic. One is better off with CPUs like the ep32: five packed
instructions to a 32-bit word, TOS and NOS in registers, single-clock
execution of instructions, a nice machine. Another nice one is the J1,
capable of multiple instruction execution in one clock.

You really can't design async logic in FPGAs that I know of. It has to
be custom chips. Achronix is developing async logic FPGAs, but I think
they are intent on the really big customers only, small users need not
apply.

I have not looked into it, but you might be right, as the LUTs have
flip-flops on their outputs and would need to be clocked.

Async clocked logic still has clocks, they just aren't a single clock
running the whole chip. Think of traffic with lights. Cars arrive at a
light and are stopped until it turns green. Cars arrive at different
times (different logic delays) but they all must wait until they are all
there before the light can turn green. The whole city is running all
lights on the same cycle. So obviously there will be some lights that
could run faster.

So instead each block is timed independently. Each light is adjusted
for the length and speed of the street so it can cycle faster if the
street is shorter. But now all the lights are out of sync and more
control signals are needed to allow the traffic to flow smoothly. Much
harder to organize to let traffic flow smoothly.

But there are still lights and traffic stops waiting for them...


I looked at what amounts to VLIW for MISC processors (sounds like an
oxymoron, no?). The idea was that there are three processors in any two
stack machine: 1) the instruction processor, 2) the data (stack)
processor, 3) the return (I call it the address) processor. Each one
could do something independent on each instruction cycle. The
instructions can be encoded (standard MISC instructions) or the
instructions can be separate, one for each of the three processors
(which is the main idea behind VLIW). Various combinations of
instructions give the basic primitives, but many other useful
combinations as well, such as a return on any instruction that doesn't
conflict with the hardware usage.

I only looked at this for a bit. I believe that is how the J1 gets some
parallelism. My stumbling block was in writing a compiler to take
advantage of this. But then I don't really intend to program any of my
apps in Forth. I would just code it in the stack machine assembly, not
unlike what Chuck Moore does for the F18A. Forth is used to write the
macro assembler.

The J1 uses micro-programming: the instructions are the control bits for
the internal CPU units, so it can do multiple transfers and logic/math
operations at the same time; just set the right bits. So you could do a
math operation, save the result, and return in one instruction.

The J1 does not use microprogramming. Microprogramming is using
instructions internal to the CPU which then control the operations of
the CPU over multiple clock cycles. The J1 instructions run in a single
clock cycle. I see no indication in the Verilog code this processor
uses any internal instruction memory or that it uses more than one clock
cycle per instruction. I think you are misusing the term
microprogramming to refer to the fact that the J1 instructions use
separate fields to directly control CPU functions. That is simply a
matter of horizontal vs. vertical instruction format (less encoded vs.
more encoded). The GA144 would be a vertical format with no fields
within the instruction, just 5 bit opcodes to control the entire CPU.

The J1 instruction format does have some unencoded fields for internal
processor control. But that is only for the ALU and data op
instructions. Other instructions usurp these fields for literal
addresses or data. The fields of the data op instructions include a bit
for return and can manipulate the return stack to pop and push data, but
that's all. There is one bit that appears to be unused.

The only operation that can be done in parallel is to perform a return
in an instruction that is also performing a data op. With the
independent T>R control bit, an instruction can in parallel do R>PC, T>R
and R>T. This would be a single-instruction co-routine switch. I've
never used coroutines and I expect few others have.
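
For reference, here is a sketch of how those fields might be pulled
apart in VHDL; the bit positions follow my reading of the J1 paper and
should be checked against the Verilog source rather than taken as
authoritative:

library ieee;
use ieee.std_logic_1164.all;

entity j1_alu_fields is
  port (insn      : in  std_logic_vector(15 downto 0);
        is_alu_op : out std_logic;                     -- instruction class select
        r_to_pc   : out std_logic;                     -- fold a return into this instruction
        alu_sel   : out std_logic_vector(3 downto 0);  -- which ALU function produces T'
        t_to_n    : out std_logic;                     -- copy T to N
        t_to_r    : out std_logic;                     -- copy T to the return stack
        n_to_mem  : out std_logic;                     -- store N at address T
        rs_delta  : out std_logic_vector(1 downto 0);  -- return-stack pointer adjust
        ds_delta  : out std_logic_vector(1 downto 0)); -- data-stack pointer adjust
end entity;

architecture rtl of j1_alu_fields is
begin
  -- Bit 4 appears to be unused, matching the comment above.
  is_alu_op <= '1' when insn(15 downto 13) = "011" else '0';
  r_to_pc   <= insn(12);
  alu_sel   <= insn(11 downto 8);
  t_to_n    <= insn(7);
  t_to_r    <= insn(6);
  n_to_mem  <= insn(5);
  rs_delta  <= insn(3 downto 2);
  ds_delta  <= insn(1 downto 0);
end architecture;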

When I mentioned separate processors I was referring to the fact that in
my CPU design the data stack and ALU operate in parallel with the return
stack and its ALU, along with the instruction fetch. I only explored
the possibilities briefly, but a 16-bit instruction would have supported
a fairly complete instruction set for each unit independently. The
18-bit memory of many FPGAs would support an even richer instruction set.

I expect such a VLIW-type instruction format to maximize parallelism and
also provide for multiple Forth-level instructions to be executed in
parallel, or at least for higher-level Forth instructions to be a single
instruction rather than multiple, not just executing a return in
parallel with a data op.

--

Rick C
 
On 7/25/2016 4:39 PM, rickman wrote:
On 7/25/2016 3:11 PM, Cecil Bayona wrote:
On 7/25/2016 12:22 AM, rickman wrote:


Diabetes runs in my family, but so far I have avoided it. Maybe because
at *only* 35 lbs overweight I am a lighter member. It's been a while
since I was tested so I should do that. My last check was a doctor
doing the blood sugar tests with the little strips in a supermarket. I
had something small to eat some hours before and still was in the 70
range, I seem to recall. He said that was great!

Normal blood sugar is 100 if you have gone without food for 4 hours or
more; 70 is too low, to the point that you need to eat something to
bring it up, so I'm not sure what kind of test they did. It used to be
they would test for ketones in your urine, which was very inaccurate;
the modern way is with a blood sample.

Weight is a major factor in preventing diabetes, so try to keep your
weight down and close to your optimum weight. Once you have diabetes
it's hard to lose the weight.


--
Cecil - k5nwa
 
On 7/25/2016 7:01 PM, Cecil Bayona wrote:
On 7/25/2016 4:39 PM, rickman wrote:
On 7/25/2016 3:11 PM, Cecil Bayona wrote:
On 7/25/2016 12:22 AM, rickman wrote:


Diabetes runs in my family, but so far I have avoided it. Maybe because
at *only* 35 lbs overweight I am a lighter member. It's been a while
since I was tested so I should do that. My last check was a doctor
doing the blood sugar tests with the little strips in a supermarket. I
had something small to eat some hours before and still was in the 70
range, I seem to recall. He said that was great!



Normal blood sugar is 100 if you have gone without food for 4 hours or
more; 70 is too low, to the point that you need to eat something to
bring it up, so I'm not sure what kind of test they did. It used to be
they would test for ketones in your urine, which was very inaccurate;
the modern way is with a blood sample.

Weight is a major factor in preventing diabetes, so try to keep your
weight down and close to your optimum weight. Once you have diabetes
it's hard to lose the weight.

This was perhaps 2 in the afternoon and I had only had some yogurt
earlier in the morning. I was fasting to get the blood test done as I
had heard that is how they do it, but I slipped and ate some yogurt.
The strips are the ones where you prick your finger, put the blood on
the strip and slip it into the glucometer.

Web pages I found indicate blood sugar levels of 70 to 100 mg/dL are
normal when fasting. Below 60 and they start to worry.

--

Rick C
 
On Mon, 25 Jul 2016 14:11:16 -0500
Cecil Bayona <cbayona@cbayona.com> wrote:

On 7/25/2016 12:22 AM, rickman wrote:

You really can't design async logic in FPGAs that I know
of. It has to be custom chips. Achronix is developing
async logic FPGAs, but I think they are intent on the really
big customers only, small users need not apply.

I have not looked into it, but you might be right, as the LUTs
have flip-flops on their outputs and would need to be clocked.

Yes, if a logic device uses LUTs, then it needs to latch the LUT
output when it is stable, and the longest logic propagation time
must fit between two clock edges.

In the fine-grained Microsemi/Actel Flash FPGAs there are no
LUTs, the programmable element is either logic or a flop. There
are only eight synchronous library parts, and the majority of
the rest are purely combinatorial. [1]

Perhaps these parts could be used to synthesize async logic
designs.

Jan Coombs
--
[1] IGLOO, ProASIC3, SmartFusion and Fusion Macro Library Guide
http://www.microsemi.com/document-portal/doc_view/130886-igloo-proasic3-smartfusion-and-fusion-macro-library-guide-for-software-v9-0
[2] https://en.wikipedia.org/wiki/Asynchronous_circuit
 
On 7/26/2016 6:04 AM, Jan Coombs wrote:
On Mon, 25 Jul 2016 14:11:16 -0500
Cecil Bayona <cbayona@cbayona.com> wrote:

On 7/25/2016 12:22 AM, rickman wrote:

You really can't design async logic in FPGAs that I know
of. It has to be custom chips. Achronix is developing
async logic FPGAs, but I think they are intent on the really
big customers only, small users need not apply.

I have not looked into it, but you might be right, as the LUTs
have flip-flops on their outputs and would need to be clocked.

Yes, if a logic device uses LUTs, then it needs to latch the LUT
output when it is stable, and the longest logic propagation time
must fit between two clock edges.

In the fine-grained Microsemi/Actel Flash FPGAs there are no
LUTs, the programmable element is either logic or a flop. There
are only eight synchronous library parts, and the majority of
the rest are purely combinatorial. [1]

Perhaps these parts could be used to synthesize async logic
designs.

I think there is some misunderstanding of the basic LUT-FF cell design.
The output of the LUT is always available outside the logic cell. The
FF can be used to register the output of the LUT or not. It can also be
used to register another signal from outside the logic cell. So the
presence of the FF in the logic cell has nothing to do with the ease of
implementing async clocked logic in an FPGA.

The reason FPGAs are not suited for async clocked logic is that this
design method requires the specification of both max and min delays on
logic paths. The logic specified by the user has a parallel path which
is used to clock the output FF. This parallel path must have a minimum
delay that is longer than the maximum delay of the logic. FPGAs are not
specified in a way to show a design meets this requirement.
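
To make the requirement concrete, here is a purely behavioral sketch
(simulation only) of one matched-delay, self-timed stage; the delay
numbers are invented, and the point of the argument above is that an
FPGA vendor gives you no guaranteed minimum delay for the matched path:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity self_timed_stage is
  port (start  : in  std_logic;                 -- request from the previous stage
        a, b   : in  unsigned(7 downto 0);
        result : out unsigned(7 downto 0);
        done   : out std_logic);                -- acknowledge to the next stage
end entity;

architecture sim of self_timed_stage is
  signal sum       : unsigned(7 downto 0);
  signal local_clk : std_logic;
begin
  sum       <= a + b after 3 ns;   -- data path: assumed maximum delay
  local_clk <= start  after 4 ns;  -- matched delay path: must have a minimum delay > 3 ns

  process (local_clk)
  begin
    if rising_edge(local_clk) then
      result <= sum;               -- capture the result when the matched delay fires
      done   <= '1';
    end if;
  end process;
end architecture;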

If you wish to take it on yourself to figure out appropriate
specifications for the various logic and routing paths within an FPGA,
you can do your own timing analysis. Then you could design async
clocked logic in that FPGA. But this would be a very difficult task
without knowing details on the variations of timing across a large
number of chips from a large number of batches. If the FPGA vendor
makes any changes to the process (which does happen from time to time
for yield purposes), your measurements are no longer valid, or at least
the margin provided by your tools is now suspect.

I don't think you will find much speed improvement over just clocking
the design synchronously. Yes, on the test bench you can see speed
improvements. But once you allow for the three basic variations in
timing (process, voltage, and temperature), you will likely see no real
advantage, and you may find the async logic to be worse than sync logic
because of the margins required.

As someone pointed out, there will be power improvements. I have been
told the clock tree in a large chip can dissipate half the power. That
is pretty amazing, but when you think about the huge distributed
capacitance and the many matched buffers required to keep the clock skew
to a minimum across the chip, I shouldn't doubt this.

The bottom line is asynchronously clocked CPU chips have been designed
before but have never made an impact on the market. I recall one that
was an 8051 I believe and I seem to recall an ARM being designed this
way. Of course, the GA144 is the most notable and possibly the most
successful example so far. None have made an impact on the market.
Partly that is because one big advantage of the sync clocked device is
that it can tell time! The digital world works off of timing. Nearly
all interfaces require timing. Nearly all applications require timing.
Adding a processor that free runs means another way has to be found to
sync the processes to the outside world. Once you do that many of the
speed advantages go away.

We are on a road that gives us faster processors every year. The power
for a given level of complexity in processors continues to decrease.
Design an async clocked processor this year and next year it will be
usurped by the next generation of sync clocked processor. It's hard to
hit a moving target.

I think FPGAs can be used best to design Forth processors using
conventional logic techniques focusing on interesting CPU architectures
rather than misusing the FPGA.

This post got pretty long... not nearly a manifesto though...

--

Rick C
 
rickman <gnuarm@gmail.com> writes:
I have been
told the clock tree in a large chip can dissipate half the power.

IIRC the clock for the 21064 (1992) consumed 30% of the power, and the
final driver of the clock had a gate width of 35cm. That was at
200MHz.

Of course that could not scale, so quite some time ago they divided
the chips into smaller clock domains (and later also power
domains); e.g., the Willamette (first Pentium 4, 2001, 1400MHz) had a
very fast integer ALU core that, however, did not include
multiplication or shifting. So integer multiplication and shifting
were achieved by shipping the data over to the FPU, and then shipping
the result back. The data had to cross several clock domain borders
on the way, losing a cycle on every crossing; that's why integer
multiplication is slower than FP multiplication on the Pentium 4.

The bottom line is asynchronously clocked CPU chips have been designed
before but have never made an impact on the market. I recall one that
was an 8051 I believe and I seem to recall an ARM being designed this
way. Of course, the GA144 is the most notable and possibly the most
successful example so far.

AFAIK the GA144 is not an asynchronous design; it's a clocked design,
but the clock is generated internally (one clock per core). At least
an earlier chip by Chuck Moore worked that way (IIRC the MuP21), and
the idea that this was an async design was already rampant (and
contradicted) at the time.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/
 
On 7/26/2016 12:00 PM, Anton Ertl wrote:
rickman <gnuarm@gmail.com> writes:
I have been
told the clock tree in a large chip can dissipate half the power.

IIRC the clock for the 21064 (1992) consumed 30% of the power, and the
final driver of the clock had a gate width of 35cm. That was at
200MHz.

Of course that could not scale, so quite some time ago they divided
the chips into smaller clock domains (and later also power
domains); e.g., the Willamette (first Pentium 4, 2001, 1400MHz) had a
very fast integer ALU core that, however, did not include
multiplication or shifting. So integer multiplication and shifting
were achieved by shipping the data over to the FPU, and then shipping
the result back. The data had to cross several clock domain borders
on the way, losing a cycle on every crossing; that's why integer
multiplication is slower than FP multiplication on the Pentium 4.

The bottom line is asynchronously clocked CPU chips have been designed
before but have never made an impact on the market. I recall one that
was an 8051 I believe and I seem to recall an ARM being designed this
way. Of course, the GA144 is the most notable and possibly the most
successful example so far.

AFAIK the GA144 is not an asynchronous design; it's a clocked design,
but the clock is generated internally (one clock per core). At least
an earlier chip by Chuck Moore worked that way (IIRC the MuP21), and
the idea that this was an async design was already rampant (and
contradicted) at the time.

None of the CPUs described as "asynchronous" are truly that. They are
asynchronously clocked. In the GA144 there are delay paths that are
activated for each type of instruction with a delay matched to the time
taken for that class of instruction. Someone here argued with me that
this constituted an astable oscillator but that is just semantics. Of
course it will oscillate as the end of any one clock period has to
coincide with the beginning of the next. But just like all the other
"async" CPUs, the GA144 is async in the same way, asynchronously clocked.

True asynchronous logic is different. It has no clocked registers. The
logic is self latching like RS FFs and has to be designed very
differently even from asynchronously clocked logic. I remember an async
logic state machine available many years ago when PLDs were still new.
It was true async logic, but never made much of a dent in the market. I
saw it used on one design which likely became out of date due to the
part becoming obsolete not too long after.

--

Rick C
 
rickman <gnuarm@gmail.com> writes:
On 7/26/2016 12:00 PM, Anton Ertl wrote:
In the GA144 there are delay paths that are
activated for each type of instruction with a delay matched to the time
taken for that class of instruction.

So you no longer have to do three nops before (or was it after?) a
full-length "+"? That's new then. The way I understood the
description of the MuP21 in the earlier discussion, it had some kind
of on-chip oscillator that clocked the whole core, and the addition
could take up to four cycles.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/
 
On 7/27/2016 5:39 AM, Anton Ertl wrote:
rickman <gnuarm@gmail.com> writes:
On 7/26/2016 12:00 PM, Anton Ertl wrote:
In the GA144 there are delay paths that are
activated for each type of instruction with a delay matched to the time
taken for that class of instruction.

So you no longer have to do three nops before (or was it after?) a
full-length "+"? That's new then. The way I understood the
description of the MuP21 in the earlier discussion, it had some kind
of on-chip oscillator that clocked the whole core, and the addition
could take up to four cycles.

All of the ALU instructions are timed with the same timing path. The
add requires extra time for the carry to settle, so one nop is required
before an addition unless the previous instruction does not modify
either of the two operands in which case no nop is needed. Other,
non-alu instructions have various timings and have other timing paths.
It is definitely *not* one timing path for the "whole core".

I don't recall the various classes of instructions that have separate
timing from the ALU, but at one point I made a timing-based tool in a
spreadsheet. Type in the instructions and it gave you the timing. I
think it even accounted for instruction word boundaries which require
additional timing for the next word fetch under some conditions.

--

Rick C
 
On Tue, 26 Jul 2016 16:00:23 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

AFAIK the GA144 is not an asynchronous design; it's a clocked
design, but the clock is generated internally (one clock per
core). At least an earlier chip by Chuck Moore worked that
way (IIRC the MuP21), and the idea that this was an async
design was already rampant (and contradicted) at the time.

Yes, both Green Arrays and IntellaSys chips [1][2] need 'nop' or
another instruction that does not alter T or S to precede an
addition. This is in order to allow the carry to stabilise. The
earlier manual states that the carry propagates nine bits in
each processor cycle.

The GA144 is asynchronous at the boundary of each processor
module. AFAICTell this is a common 'asynchronous' design method.
Otherwise the Wikipedia article [3] "Asynchronous CPU" needs
revision. There are perhaps zero recent CPU designs built of
purely asynchronous logic? (And, for good patent readers, what
are Achronix async FPGA parts made of?)

The most efficient asynchronous signalling method for random
logic seems to be "four state encoding". This uses two wires to
carry a single bit and a 'time stamp'. The 'time stamp' is the
data cycle to which the data bit belongs, mod 2.

The encoding is arranged so that in the transition to each new
time phase only one of the two wires changes state, regardless
of whether or not the data has changed. This avoids race
problems between the two wires.
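
A behavioral sketch of one common realization of that two-wire scheme
(level-encoded dual-rail); here the clock merely stands in for "a new
data item is ready", and the port names are invented:

library ieee;
use ieee.std_logic_1164.all;

entity ledr_encoder is
  port (clk   : in  std_logic;     -- stands in for "a new data item is ready"
        d     : in  std_logic;     -- the data bit to send
        w_val : out std_logic;     -- wire 1: the data value itself
        w_par : out std_logic);    -- wire 2: data XOR phase (the mod-2 "time stamp")
end entity;

architecture sim of ledr_encoder is
  signal phase : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      phase <= not phase;             -- alternate the phase on every data cycle
      w_val <= d;                     -- if d is unchanged this wire stays quiet...
      w_par <= d xor (not phase);     -- ...and this one toggles, and vice versa
    end if;
  end process;
end architecture;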

I'd like to know what the cost for this is in transistors,
having guessed it is about 10 _times_ more than simple
conventional logic.

Jan Coombs
--
[1] DB001-110412-F18A.pdf "F18A Technology Reference" pg8
[2] "SEAforth 40C18 Data Sheet (Preliminary)" pg44
[3] https://en.wikipedia.org/wiki/Asynchronous_circuit
 
