
EDK : FSL macros defined by Xilinx are wrong


elektroda.net NewsGroups Forum Index - FPGA - EDK : FSL macros defined by Xilinx are wrong



Guest

Fri Aug 21, 2015 7:36 pm   



On Tuesday, March 17, 2015 at 4:58:39 AM UTC+3, princesse91 wrote:
Quote:
Hi Ahmed,
Can you tell me how you generated VHDL from Handel-C? I'm working on converting C++ to VHDL with Handel-C, but I don't know how to do it.
Thanks


It's very easy: use interfaces for the inputs and outputs in your Handel-C code, then change the build target from Debug to VHDL, Verilog, EDIF, or SystemC output. Be sure to use Mentor Graphics DK Design Suite 5.

martinjpearson
Guest

Sun Aug 23, 2015 12:14 am   



On Friday, 21 August 2015 18:36:59 UTC+1, ahmed...@gmail.com wrote:
Quote:
On Tuesday, March 17, 2015 at 4:58:39 AM UTC+3, princesse91 wrote:
Hi Ahmed,
Can you tell me how you generated VHDL from Handel-C? I'm working on converting C++ to VHDL with Handel-C, but I don't know how to do it.
Thanks

It's very easy: use interfaces for the inputs and outputs in your Handel-C code, then change the build target from Debug to VHDL, Verilog, EDIF, or SystemC output. Be sure to use Mentor Graphics DK Design Suite 5.


Do you still have access to Mentor Graphics DK Design Suite 5? I thought it was now obsolete.

HT-Lab
Guest

Sun Aug 23, 2015 1:50 pm   



On 22/08/2015 23:14, martinjpearson wrote:
Quote:
On Friday, 21 August 2015 18:36:59 UTC+1, ahmed...@gmail.com wrote:
On Tuesday, March 17, 2015 at 4:58:39 AM UTC+3, princesse91 wrote:
Hi Ahmed,
Can you tell me how you generated VHDL from Handel-C? I'm working on converting C++ to VHDL with Handel-C, but I don't know how to do it.
Thanks

It's very easy: use interfaces for the inputs and outputs in your Handel-C code, then change the build target from Debug to VHDL, Verilog, EDIF, or SystemC output. Be sure to use Mentor Graphics DK Design Suite 5.

do you still have access to Mentor Graphics DK Design Suite 5? I thought this was now obsolete

I think DK is not (yet) obsolete, but it is barely alive; the latest version is 5.4_1, released back in 2011. Nowadays anybody interested in C/C++/SystemC synthesis has a wide choice of tools, from free offerings up to Catapult C.

Hans
www.ht-lab.com

martinjpearson
Guest

Mon Aug 24, 2015 10:17 am   



On Sunday, 23 August 2015 08:50:13 UTC+1, HT-Lab wrote:
Quote:
On 22/08/2015 23:14, martinjpearson wrote:
On Friday, 21 August 2015 18:36:59 UTC+1, ahmed...@gmail.com wrote:
On Tuesday, March 17, 2015 at 4:58:39 AM UTC+3, princesse91 wrote:
Hi Ahmed,
Can you tell me how you generated VHDL from Handel-C? I'm working on converting C++ to VHDL with Handel-C, but I don't know how to do it.
Thanks

It's very easy: use interfaces for the inputs and outputs in your Handel-C code, then change the build target from Debug to VHDL, Verilog, EDIF, or SystemC output. Be sure to use Mentor Graphics DK Design Suite 5.

do you still have access to Mentor Graphics DK Design Suite 5? I thought this was now obsolete

I think DK is not (yet) obsolete but barely alive, the latest version is
5.4_1 released back in 2011. Nowadays anybody interested in
C/C++/SystemC synthesis will have a wide choice from free to CatapultC.

Hans
www.ht-lab.com


Our site licence has now expired and Mentor will not renew it. Does anyone have experience with Impulse C? I'm drawn to its CSP-based architecture.

Johann Klammer
Guest

Wed Aug 26, 2015 8:38 pm   



On 08/26/2015 02:03 PM, David Brown wrote:
Quote:
On 26/08/15 01:20, Johann Klammer wrote:
Hello,
How are typical CPLD input muxes built?

(For other posters, a CPLD is a "complex programmable logic device". It
is a bit like a simple FPGA; there is no fixed dividing line between
them, but CPLDs tend to be built from a fairly small number of fairly
complex "macrocells" containing a flip-flop and a set of AND/OR trees
for logic operations, while FPGAs tend to have a large number of much
simpler cells and use lookup tables for logic.)


I was looking at the ATMEL .jed files, and their input muxes seem to be
a series of sparse, one-hot encoded bitfields with fewer bits than
inputs in total. So now I am wondering: how are the inputs distributed
among the muxes? Just the name of the permutation problem might help, or
some other relevant search terms.


You are unlikely to be able to make sense of the programming file for
even the simplest of PLDs. It is not information that is published by
the manufacturers, making it almost impossible to figure out which bits
are used for the routing, the AND/OR tables, and other features. But it

Their .jed files have comments... In this case I know which is which.
Guessing the AND-matrix assignment is trivial (done from the input equations).
The meaning of the MC fuses can be found by trial and error, I believe.

The mux feeding those input lines is different... There are 40 lines coming into
the AND array from the outside (80 counting both inverted and non-inverted ones), plus 16 local loopbacks (not muxed)...
but the widths of the 40 muxes are not large enough to make arbitrary selections from
the inputs.

As far as I can tell from their docs and .jed files, their devices have:

    dev:          1502  1504  1508
    input lines:    68   132   260
    mux width:       5     8    27

(40 muxes in their input switch)


[...]
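The one-hot select scheme described above can be sketched in Python. This is only a model under stated assumptions: the 5-bit select width and the "40 muxes / 68 inputs" counts are the 1502 figures from the table, but the candidate list mapping each select bit to a global input line is hypothetical; that distribution is exactly the unknown being asked about.

```python
def decode_onehot_select(bits, candidates):
    """Decode one mux's sparse one-hot select field.

    bits: fuse values (0/1) for this mux, one per select position.
    candidates: which global input line each position selects.
    Returns the selected input line, or None if the field is not
    one-hot (mux unused, or the fuse map was misread).
    """
    set_positions = [i for i, b in enumerate(bits) if b]
    if len(set_positions) != 1:
        return None
    return candidates[set_positions[0]]

# Hypothetical candidate list for mux 0 of an ATF1502-style part:
# a 5-wide mux can reach only 5 of the 68 input lines, which is why
# arbitrary selections from the inputs are impossible.
candidates_mux0 = [0, 8, 16, 24, 32]
print(decode_onehot_select([0, 0, 1, 0, 0], candidates_mux0))  # -> 16
print(decode_onehot_select([0, 1, 1, 0, 0], candidates_mux0))  # -> None
```

With the real per-mux candidate lists recovered, the whole input switch would reduce to 40 such decodes.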

Aleksandar Kuktin
Guest

Sun Sep 13, 2015 12:38 pm   



Excuse me for the very long absence.

On Tue, 18 Aug 2015 21:40:27 -0400, Richard Damon wrote:

Quote:
On 8/18/15 11:35 AM, rickman wrote:
I'm not sure what details of the routing the chip editors leave out.
You only need to know what is connected to what, through what and what
the delays for all those cases are. Other than that, the routing does
just "work".

Look closely. [...] Those rows and columns are made of a
(large) number of distinct wires with routing resources connecting
outputs to select lines and select lines being brought into the next
piece of routing/logic. Which wire is being used will not be indicated,
nor are all the wires interchangeable, so which wire can matter for
fitting. THIS is the missing information.


A comment:

All this information sounds like it could be teased out of the physical
chips. But I find there are other considerations. First, doing this
might jeopardize the FPGA manufacturer; it's no good to have a FOSS
toolchain but no FPGAs to use it on. Second, there is the risk of an
FPGA manufacturer releasing a small tweak that would invalidate the
entire effort. Third, and this is where it gets interesting, the time
and effort spent reverse-engineering a great number of FPGA models is
probably better spent engineering a FOSH ASIC toolchain together with the
assorted manufacturing technology. Because, honestly, if you are
willing to program FPGAs, you are really not very far from forging
ASICs, are you?

Speaking for myself, I'm working alone on FPGAs far away from the silicon
powerhouses, and I have to jump through hoops just to get the chips.
Jumping through hoops to get my design forged into an ASIC is not really
that different.

Cecil Bayona
Guest

Tue Aug 30, 2016 2:49 pm   



On 8/30/2016 1:11 AM, rickman wrote:
Quote:
On 8/30/2016 12:03 AM, Cecil Bayona wrote:
On 8/29/2016 7:55 PM, rickman wrote:
On 8/29/2016 4:30 PM, Cecil Bayona wrote:
Nothing fancy; that is why in my earlier post I mentioned that I don't
have a lot of experience. I've been working on a 32-bit stack-based CPU,
but it's a work in progress and I'm still sorting it out. It has taken
less than 20% of the chip, but stack CPUs are rather simple compared to
other CPUs. When finished it should be pretty nice: most instructions
take one clock to execute, and it uses packed instructions, 5
instructions to a word fetch. Originally it was on a Lattice Brevia2; I
am now converting it to an Artix-7 board, but there is software involved
too, so it's going slow and I'm learning as I go.

Just a comment on your stack processor. I've done some design work with
stack processors and read about a lot of designs. In my humble opinion,
if you have multiple cycle instructions, you are doing it wrong. I
don't want to steal the thread. If you care to discuss this we can
start another thread.

I'm not sure why you think that it uses multiple-clock instructions; I
mentioned that most execute in one clock. The exception is load immediate:
it takes the instruction fetch, then a second fetch for the 32-bit value
to push on the stack. All others take one clock. Even that one could
execute in one clock with extra hardware to fetch RAM into a buffer
holding two 32-bit words; an alternative is to use two clocks, one to
execute and the other to fetch program instructions.

What it does have is multiple instructions in one program word. It's
five instructions or one, depending on the instruction: jump and call
take the whole 32-bit word and are not packed, while everything else is
packed 5 instructions to a 32-bit program word, so you have fewer memory fetches.

Are you using external memory? The CPUs I've designed were 100%
internal to the FPGA so there was no advantage to fetching multiple
instructions. In fact, the multiplexer needed was just more delay.

I thought your design was not one clock per instruction because of what
you said. Your design is using a single memory interface. It is common
for stack machines to separate data and instruction space. But if you
are working out of external memory that is not so easy.

I assume you have read the various similar work that has been done? The
J1 is very interesting in that it was so durn simple. A true MISC
design and effective. If you meander over to comp.lang.forth there are
folks there who have designed some MISC CPUs which have proven their
worth. Bernd Paysan designed the B16 which seems like an effective
design. I've never tried to work with it, but it sounds similar to
yours in the way it combines multiple instructions into a single
instruction word.

I think the multiple instructions are there to save program space; 32 bits
per instruction would eat up the memory space way too quickly. It is a
von Neumann machine with a single address space: code and data all live
in the same space.

It uses three kinds of instruction formats: short, long, and double word.

A short instruction is 6 bits, so you can cram 5 instructions into a
memory word. That allows 64 possible instructions, but the bare machine
uses 27 opcodes, so there is room for additional instructions. These are
operations that require no addresses, such as dup, swap, return, add, etc.

Long instructions take 30 bits; they are things like jump and call,
including conditional versions. They carry a 24-bit address as part of
the instruction, which can address 16 MB of program space; the upper two
bits are unused at present.

There is a single two-word instruction, which pushes the following 32-bit
word onto the stack. It is itself a 6-bit instruction, and there can be
more than one "LIT" instruction in a word, each one having an additional
word to use as a literal. Each 32-bit program word could hold five LIT
instructions, each with its own 32-bit word following the program code
word. So if you have five literals packed into a single 32-bit program
word, it is followed by five 32-bit words containing the literal values,
and the program continues six words after the current instruction word;
of course one can have fewer literals.
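The packing described above can be sketched in Python, assuming five 6-bit slots per 32-bit word filled from the low bits upward (the slot order is an assumption), with one trailing 32-bit literal word per LIT opcode. The opcode values (LIT, DUP, ADD) are hypothetical, chosen only for illustration.

```python
LIT = 0x01  # hypothetical opcode value for "push literal"

def pack(opcodes, literals):
    """Pack up to five 6-bit opcodes into one 32-bit word, then
    append one 32-bit literal word per LIT opcode, in order."""
    assert len(opcodes) <= 5 and all(op < 64 for op in opcodes)
    word = 0
    for slot, op in enumerate(opcodes):
        word |= op << (6 * slot)          # slot 0 in the low bits
    lits = iter(literals)
    trailing = [next(lits) for op in opcodes if op == LIT]
    return [word & 0xFFFFFFFF] + trailing

def unpack(words):
    """Inverse of pack: returns (opcodes, literals)."""
    word, rest = words[0], iter(words[1:])
    opcodes = [(word >> (6 * slot)) & 0x3F for slot in range(5)]
    literals = [next(rest) for op in opcodes if op == LIT]
    return opcodes, literals

DUP, ADD = 0x02, 0x03                     # hypothetical opcodes
words = pack([LIT, DUP, ADD, LIT, 0], [100, 200])
ops, lits = unpack(words)
print(len(words), lits)                   # -> 3 [100, 200]
```

A jump or call would instead occupy the whole 32-bit word, and any control-flow instruction would flush the remaining slots of the current word, as described below.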

The whole thing is like a packed instruction pipeline: it minimizes the
number of fetches, but any control-flow instruction cancels and dumps the
current instruction queue and fetches from a different place, since the
IP has changed.

The job of packing the instructions into one word is handled by the
Forth compiler, so the user does not see it; it just makes the program
code smaller automatically. I did some tests, and 5 is too long a queue:
a 16-bit version packs 3 instructions to a word and ends up with a
smaller program area, so it's more efficient, in part because Forth
changes the program counter often, and a longer instruction queue wastes
instruction slots. On a 16-bit version, adding some extra instructions
would be nice, so you end up with add with carry, subtract with borrow,
etc., and it can handle 32-bit operations efficiently.

Overall it's an efficient design, but it could be improved, like anything
else. One could add combined instructions where a return is merged with a
regular instruction so both execute in one clock.

Eventually I want to work with the J1. It's a simpler 16-bit machine, but
since its instruction word contains multiple fields it can naturally have
instructions that do multiple operations at the same time; with a more
complex compiler packing multiple instructions, it might save on program
space. It does have very limited address space, due to its instructions
being limited to 16 bits.
--
Cecil - k5nwa

rickman
Guest

Tue Aug 30, 2016 3:35 pm   



On 8/30/2016 4:49 AM, Cecil Bayona wrote:
Quote:


On 8/30/2016 1:11 AM, rickman wrote:
On 8/30/2016 12:03 AM, Cecil Bayona wrote:
On 8/29/2016 7:55 PM, rickman wrote:
On 8/29/2016 4:30 PM, Cecil Bayona wrote:
Nothing Fancy, that is why in my earlier post I mentioned that I don't
have a lot of experience. I been working a 32 bit stack based CPU,
but
it's a work in progress, I'm still sorting it out, it taken less than
20% of the chip, but a stack CPU are rather simple compared to other
CPU's, when finished it should be pretty nice, most instructions take
one clock to execute, and it used packed instructions, 5
instructions to
a word fetch. Originally it was on a Lattice Brevia2, I am now
converting it to a Artix-7 board, but there is software involved
too so
it's going slow and I'm learning as I go.

Just a comment on your stack processor. I've done some design work
with
stack processors and read about a lot of designs. In my humble
opinion,
if you have multiple cycle instructions, you are doing it wrong. I
don't want to steal the thread. If you care to discuss this we can
start another thread.

I'm not sure why you think that it uses multiple clock instructions, I
mentioned that most occur in one clock, the exception is load immediate
it takes the instruction fetch, then a second fetch for the 32 bit value
to push on the stack, all others take one clock. Even that one can take
place in one clock with extra hardware to fetch RAM with a buffer to
hold two 32 bit words, an alternative is to use two clocks one to
execute the other to fetch program instructions.

What it does have is multiple instructions on one program word, it's
five instructions or one depending on what it is, jump, and call take
the whole 32 bit word and is not packed, everything else is 5
instructions to a 32 bit program word so you have fewer memory fetches.

Are you using external memory? The CPUs I've designed were 100%
internal to the FPGA so there was no advantage to fetching multiple
instructions. In fact, the multiplexer needed was just more delay.

I thought your design was not one clock per instruction because of what
you said. Your design is using a single memory interface. It is common
for stack machines to separate data and instruction space. But if you
are working out of external memory that is not so easy.

I assume you have read the various similar work that has been done? The
J1 is very interesting in that it was so durn simple. A true MISC
design and effective. If you meander over to comp.lang.forth there are
folks there who have designed some MISC CPUs which have proven their
worth. Bernd Paysan designed the B16 which seems like an effective
design. I've never tried to work with it, but it sounds similar to
yours in the way it combines multiple instructions into a single
instruction word.

I think the multiple instructions is to save on program space, 32 bits
to an instruction will eat up the memory space way too quickly. It is a
von Neumann machine with a single address space, code and data is all in
the same space.


Again, that is a reflection of having a common address space for
instructions and data. My CPUs use 8 or 9 bits for instructions while
being data size independent. The instruction format does not imply any
particular data bus size.


Quote:
It uses three kinds of instruction formats, short and long, and double
word.

A short instruction is 6 bits so you can cram 5 instruction to a memory
word, so there are 64 possible instructions but the bare machine uses 27
opcodes so there is room for additional instructions. These are things
that require no addresses such as dup, swap, return, add, etc.

Long instructions take 30 bits, they are things like jump, call,
including conditional versions, they include a 24 bit address as part of
the instruction which can address 16MB of program space, the upper two
bits are unused at present

There is a single two word instruction, which pushes the second 32 bit
word into the stack, it itself is a 6 bit instruction and there can be
more that one "LIT" instruction, each one has an additional word to use
as a literal. Each 32 bit program word could have 6 LIT instructions
with each one having an additional 32 bit word following the program
code word so if you have six literal words packed into single 32 bit
program word, it is followed by six 32 words containing the literal
values, the program continues 7 words after the current 6 instruction
word, of course one can have fewer literals.


I've shied away from multicycle instructions because it means more bits
(1 bit in this case) to indicate the cycle count which is more input(s)
to the decoder. I wanted to try to keep the decoder as simple as possible.


Quote:
The whole thing is like a packed instruction pipeline it minimizes the
number of fetches but any control flow instruction cancels and dumps the
current instruction queue and does a fetch at a different place since
the IP has changed.


The F18A does that. Learning how to pack instructions into the word is
a bit tricky. Even harder is learning how to time execution, but that's
because it is async and does not use a fixed frequency clock.


Quote:
The job of packing the instructions into one word is handled by the
Forth Compiler so the user does not see it, it just makes the program
code smaller automatically. I did some test and 5 is too long a queue,
when doing a 16 bit version it packs 3 instructions to a word and that
version ends up with a shorter program area so it's more efficient, in
part because Forth changes the Program Counter often so having a longer
instruction queue waste instruction slots. on a 16 bit version adding
some extra instructions would be nice so you end up with add with carry,
subtract with borrow etc so it can handle 32 operations efficiently.

Overall its an efficient design but it could be improved as anything
else could. One could add combined instructions where a return is
combined with some regular instructions so it execute both in one clock.


One of the things I have looked at briefly is breaking out the three
"engines" so each one is separately controlled by fields in the
instruction. This will require a larger instruction, but 16 bits should
be enough. Then many types of instructions could be combined. I didn't
pursue it because I didn't want to work on the assembler that would
handle it.


Quote:
Eventually I want to work with the J1, its a simpler 16 bit machine but
since the instruction word contains multiple fields it can have
instructions that do multiple operations at the same time naturally,
with a more complex compiler packing multiple instructions it might save
on program space. It does have very limited addressing space due to it's
instructions limited to 16 bits.


I seem to recall discussing this with someone else not too long ago. I
don't think there actually is much parallelism possible with the J1
design. You can combine a return instruction with arithmetic
instructions, and there are fields to adjust the stack independently.
So you might be able to combine, say, 2DUP + in one instruction by using
+ with a DSTACK delta of +1 instead of -1. But the useful combos will be
limited. The utility is also limited by instruction frequency: only 35%
of the instructions in the app the J1 was designed for are ALU
instructions, which are the only ones that can be parallelized.
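The 2DUP + combination above can be sketched in Python: with the ALU operation and the data-stack delta in independent instruction fields, the very same add becomes either a plain + (delta -1) or a fused 2DUP + (delta +1). The bit layout below is hypothetical, chosen for illustration; it is not the actual J1 encoding.

```python
ALU_ADD = 0x2                       # hypothetical ALU-op code for T = N + T

def alu_insn(op, dstack_delta):
    """Encode an ALU-class instruction: op in bits 11:8, a 2-bit
    two's-complement stack delta (-2..+1) in bits 1:0, and 0b011 in
    bits 15:13 marking the ALU class (all hypothetical positions)."""
    assert -2 <= dstack_delta <= 1
    return (0b011 << 13) | (op << 8) | (dstack_delta & 0x3)

def step(insn, dstack):
    """Execute one ALU instruction on a list-based data stack."""
    op = (insn >> 8) & 0xF
    delta = insn & 0x3
    delta = delta - 4 if delta >= 2 else delta    # sign-extend 2 bits
    n, t = dstack[-2], dstack[-1]
    result = (n + t) & 0xFFFF if op == ALU_ADD else t
    dstack = dstack[:len(dstack) - 1 + delta]     # adjust stack pointer
    dstack.append(result)                         # T' lands on the new top
    return dstack

plain_add = alu_insn(ALU_ADD, -1)   # ordinary "+": net pop of one cell
fused     = alu_insn(ALU_ADD, +1)   # "+" with dstack +1: acts like 2DUP +
print(step(plain_add, [3, 4]))      # -> [7]
print(step(fused, [3, 4]))          # -> [3, 4, 7]
```

The point of the sketch is that the fused form costs no extra decode logic: the adder runs either way, and only the delta field changes what the stack pointer does.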

In my CPU design I had separate "engines" for the data stack (ALU), the
return stack (destination for literals, addresses for all operations and
looping control) and the instruction fetch. If each of these had
separate fields in the instruction word there might be more opportunity
for parallelism. But as I said, I didn't pursue this because the
software support would be messy. I'd like to get back to that, but it
won't happen any time soon.

--

Rick C


Guest

Thu Sep 01, 2016 6:51 pm   



I wonder if they ever gave it to him ?
Would they do it now that these chips are no longer produced?

GaborSzakacs
Guest

Fri Sep 02, 2016 12:09 am   



lolinka04_at_gmail.com wrote:
Quote:
I wonder if they ever gave it to him ?
Would they do it now that these chips are no longer produced?



If you're going to reply to an 18-year-old post, it would
be nice to quote the thread for those who don't keep that
many headers downloaded.

In any case, AMD is long out of the SPLD business, but the
22V10 still lives on. Atmel is making them as the
ATF22V10:

http://www.atmel.com/Images/doc0735.pdf

The data sheet mentions "flash" technology, so I assume the
programming algorithm has changed since the EEPROM versions
made by Lattice, TI, and AMD.

Are you planning to build your own programmer?

--
Gabor

Jan Coombs
Guest

Fri Sep 02, 2016 1:07 am   



On Thu, 1 Sep 2016 09:51:36 -0700 (PDT)
lolinka04_at_gmail.com wrote:

Quote:
I wonder if they ever gave it to him ?
Would they do it now that these chips are no longer produced?


I did a project to program early Flash logic parts. This
included a PC plugin board, borrowed equation compiler, and
configuration stream generator.

Why might this ancient history be of interest?

Jan Coombs


Guest

Fri Sep 02, 2016 10:53 pm   



In my youth I tried to build a GAL programmer, but I never got it to work with the samples I had.

Later I found out that there appeared to be quite different programming algorithms for different parts with the same name, so be aware...

Thomas

Jan Coombs
Guest

Sun Oct 02, 2016 8:46 pm   



On Fri, 2 Sep 2016 13:53:25 -0700 (PDT)
thomas.entner99_at_gmail.com wrote:

Quote:
In my youth I tried to build a GAL programmer, but I never got
it to work with the samples I had.


Youth... that's the bit I missed out on; I was a mature student when I
did the EPLD work.

Quote:
Later I found out that there appeared to be quite different
programming algorithms for different parts with the same name,
so be aware...


I did have the (NDA) programming documents from Lattice Semiconductor and
Intel for these parts, and I seem to remember that the Lattice
programming algorithms were not complete in the regular chip
documentation.


While moving Lattice tools around I noticed that here:

/home/.../Diamond/IspVMsystem/isptools/ispvmsystem/Database/ee9/22v10a/

are these files:

ispgal22v10avp28_spi_loader.jed
ispgal22v10avq32_spi_loader.jed
ispVM_005b.xdf

Perhaps some of the tools do still exist?

> Thomas

Jan Coombs
--


Guest

Thu Oct 20, 2016 1:47 am   



> What other content would you like to see?

They claim something impressive ("Translate Wikipedia in less than a Tenth of a Second") but give no details about the task or the system.

If the claim is not total marketing nonsense, I would infer that they mean translating from one language to another (e.g. English to German).

From the article link (and the picture) you could also infer that one FPGA (or the card in the guy's hand) does this by itself. But this is simply unbelievable. So the question is: how many FPGAs are involved? Without this, the claimed time is simply not meaningful, as doubling the number of FPGAs will halve the time (every Wikipedia article can be translated individually, so it is easy to execute the task in parallel...).

But I guess this is all not Microsoft's fault, but rather the problem of that specific link. I found the following, which gives much more insight at the end of the page:
https://www.top500.org/news/microsoft-goes-all-in-for-fpgas-to-build-out-cloud-based-ai/

There it says that 4 FPGAs (Stratix V D5, ca. 500k LE) would require 4 hours to translate Wikipedia. The 0.1 seconds are achieved with a huge cloud of such FPGA-equipped systems...
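A quick sanity check of those figures: since each article translates independently, the time scales inversely with FPGA count, so hitting 0.1 seconds from the 4-FPGA, 4-hour baseline implies on the order of half a million FPGAs.

```python
# Figures from the top500 article quoted above.
baseline_fpgas = 4
baseline_hours = 4
target_seconds_x10 = 1           # 0.1 s, kept in tenths to stay integer

# Embarrassingly parallel: time scales as 1/N, so the FPGA count
# scales directly with the desired speedup.
speedup = baseline_hours * 3600 * 10 // target_seconds_x10   # 144,000x
fpgas_needed = baseline_fpgas * speedup
print(fpgas_needed)              # -> 576000
```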

Of course still impressive, but not the same as most people might think after reading the headline. (And it also makes me wonder about the future of the Altera/Intel low-cost FPGAs, when they want to sell a Stratix into every server...)

Regards,

Thomas

www.entner-electronics.com - Home of EEBlaster and JPEG Codec

rickman
Guest

Mon Oct 24, 2016 9:06 pm   



On 10/19/2016 7:47 PM, thomas.entner99_at_gmail.com wrote:
Quote:
What other content would you like to see?

They claim something impressive ("Translate Wikipedia in less than a Tenth of a Second") but give no details about the task nor the system.

If the claim is not total marketing nonsense I would imply that they mean translating from one language to another (e.g. English to German).

From the article link (and the picture) you could also infer that one FPGA (or the card in the guy's hand) does this by itself. But this is simply unbelievable. So the question is: how many FPGAs are involved? Without this, the claimed time is simply not meaningful, as doubling the number of FPGAs will halve the time (every Wikipedia article can be translated individually, so it is easy to execute the task in parallel...).

But I guess this is all not Microsoft's fault, but the problem of that specific link. I found following which gives much more insight at the end of the page:
https://www.top500.org/news/microsoft-goes-all-in-for-fpgas-to-build-out-cloud-based-ai/

There it says that 4 FPGAs (Stratix V D5, ca. 500k LE) would require 4 hours to translate Wikipedia. The 0.1 seconds are achieved with a huge cloud of such FPGA equipped systems...

Of course still impressive, but not the same as most people might think after reading the headline. (And it also makes me wonder about the future of the Altera/Intel low-cost FPGAs, when they want to sell a Stratix into every server...)


For sure the release is short on engineering data... it *is* a marketing
pitch. The point is that they plan to provide a combination of FPGA and
CPU which will run much faster and use less power than the CPU alone.
No, they aren't offering hard numbers, and the task of translating
Wikipedia is not really the best benchmark for serving up or searching
web pages. It is meant to offer a metric that even laymen can relate to.

In other words, it's meant to sound good to those who would not
understand more engineering information.

Microsoft has no incentive to sell FPGAs. Their incentive is to provide
the software on faster hardware. If the hardware doesn't pan out,
Microsoft gets nothing but expenses.

--

Rick C



