Spectral Purity Measurement

rickman · Dec 24, 2014

On 12/24/2014 3:24 AM, Rob Doyle wrote:

On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

this did not seem to get posted so i am reposting. sorry for any
repeated post.

On 12/22/14 5:17 PM, Rob Doyle wrote:
On 12/21/2014 5:13 PM, Eric Jacobsen wrote:
On Sun, 21 Dec 2014 14:52:40 -0500, robert bristow-johnson
<rbj@audioimagination.com> wrote:

On 12/19/14 11:04 PM, Eric Jacobsen wrote:
On Fri, 19 Dec 2014 18:19:24 -0500, robert bristow-johnson
<rbj@audioimagination.com> wrote:

On 12/19/14 10:06 AM, rickman wrote:
I want to analyze the output of a DDS circuit and am
wondering if an FFT is the best way to do this. I'm
mainly concerned with the "close in"
spurs that are often generated by a DDS.

i still get the concepts of DDS and NCO mixed up. what are
the differences?

One is spelled DDS and the other is spelled NCO.

is the NCO the typical table-lookup kind (with phase
accumulator)? or can it be algorithmic? like

y[n] = (2*cos(omega_0))*y[n-1] - y[n-2]

where omega_0 is the normalized angular frequency of the
sinusoid and with appropriate initial states, y[-1] and y[-2]
to result in the amplitude and initial phase desired.

is that an NCO that can be used in this DDS? or must it be
LUT?
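
A minimal sketch of the recursion asked about above, assuming floating-point arithmetic. The function name and the choice of initial states are illustrative only; y[-1] and y[-2] are set so the output is amplitude*sin(omega_0*n + phase).

```python
import math

def recursive_sine(omega_0, amplitude, phase, count):
    """Generate amplitude*sin(omega_0*n + phase) via y[n] = 2*cos(omega_0)*y[n-1] - y[n-2]."""
    coeff = 2.0 * math.cos(omega_0)
    # The two initial states fix the amplitude and the starting phase.
    y1 = amplitude * math.sin(phase - omega_0)        # y[-1]
    y2 = amplitude * math.sin(phase - 2.0 * omega_0)  # y[-2]
    out = []
    for _ in range(count):
        y0 = coeff * y1 - y2   # one multiply and one subtract per sample
        out.append(y0)
        y2, y1 = y1, y0
    return out
```

The poles sit exactly on the unit circle, so in fixed point the rounding of the coefficient and of each product shows up as frequency error and slow amplitude drift, which is one reason to ask questions about an NCO built this way.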

Generally NCO or DDS refers to a phase accumulator with a LUT,
since it is easily implemented in hardware. That's a general
architecture that is well-known and can be adjusted to produce
very clean local oscillators. If somebody tried to sell me a
block of IP with an "NCO" built some other way I'd be asking a
lot of questions.

I have built NCOs using CORDIC rotators. No lookup tables. They
pipeline nicely and are therefore very fast, they require no
multipliers [1],

???

i don't s'pose Ray Andraka is hanging around (he was Dr. CORDIC here
a while back), but i always thought that CORDIC did essentially

x[n] = cos(2*pi*f0/Fs) * x[n-1] - sin(2*pi*f0/Fs) * y[n-1]
y[n] = sin(2*pi*f0/Fs) * x[n-1] + cos(2*pi*f0/Fs) * y[n-1]

Yes. So far. So good. These are my notes if anyone is interested...

[snip]

Assume theta = 2*pi*f0*t/fs, i.e., theta is the output of a phase
accumulator for an NCO application.

Factor out the cos(theta):

x[n] = cos(theta) {x[n-1] - y[n-1] tan(theta)}
y[n] = cos(theta) {y[n-1] + x[n-1] tan(theta)}

If you select tan(theta) from the set of 1/(2**i) then [1] this becomes:

x[n] = cos(theta) {x[n-1] - y[n-1] / 2**i}
y[n] = cos(theta) {y[n-1] + x[n-1] / 2**i}

At this point you might be thinking "Holy crap. That's one heck of a
constraint!" Yeh... but keep reading anyway.

You can drop the cos(theta) common term. It's just a gain term that
rapidly converges to 1.647. Therefore the gain of a CORDIC is not 0 dB.

x[n] = x[n-1] - y[n-1] / 2**i
y[n] = y[n-1] + x[n-1] / 2**i

or (assuming twos complement math) - simply:

x[n] = x[n-1] - y[n-1] >> i
y[n] = y[n-1] + x[n-1] >> i

where >> is a shift right operation

[1] At this point it seems as if an *extreme* limitation has been placed
on the selection of rotation angles. The equation above only describes
how to rotate an input signal by tan(theta) = 1/(2**i) - or by one of
the following angles:

atan(1) (45.000000000000000000000000000000 degrees)
atan(1/2) (26.565051177077989351572193720453 degrees)
atan(1/4) (14.036243467926478582892320159163 degrees)
atan(1/8) (7.1250163489017975619533008412068 degrees)
atan(1/16) (3.5763343749973510306847789144588 degrees)
atan(1/32) (1.7899106082460693071502497760791 degrees)

...and so forth.

The equation above does not describe how to rotate an input signal by an
arbitrary angle! Although this is true, all is not lost.

Notice that in general theta/2 < tan(theta).

This truth allows the CORDIC to be used iteratively to rotate any input
to any angle with any precision. IMO this is the genius of the CORDIC.

I probably should have mentioned that you swap the rotation direction by
flipping the additions and subtractions.

The term z[n] is introduced to accumulate the angle as the CORDIC
iterates. The term d[n] swaps the direction of rotation. Finally the
familiar recursive CORDIC equation can be written as follows:

x[n] = x[n-1] - d[n] y[n-1] >> i
y[n] = y[n-1] + d[n] x[n-1] >> i
z[n] = z[n-1] - d[n] atan(1/2**i)

where:

d[n] is +1 for z[n-1] < theta. Clockwise rotation next.
d[n] is -1 for z[n-1] > theta. Counter-clockwise rotation next.

No multiplies here.
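
A rough floating-point sketch of the iteration in these notes, for anyone who wants to see it converge. Assumptions: z tracks the angle rotated so far (so it is updated with a plus sign here, one consistent sign convention), the multiplications by 2**-i stand in for the right shifts used in fixed point, and the function names are made up for illustration.

```python
import math

def cordic_rotate(x, y, theta, iterations=24):
    """Rotate (x, y) by theta radians (|theta| below about 1.74) with CORDIC steps."""
    z = 0.0  # angle rotated so far
    for i in range(iterations):
        d = 1 if z < theta else -1       # which way to turn on this step
        # In hardware the 2**-i factors are right shifts by i, not multiplies.
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z += d * math.atan(2.0**-i)      # a constant per iteration (tiny table)
    return x, y

def cordic_gain(iterations=24):
    """Gain picked up by dropping the cos(theta) factor at every step."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 4.0**-i)  # 1/cos(atan(2**-i))
    return gain
```

Dividing the outputs of cordic_rotate(1.0, 0.0, theta) by cordic_gain() gives cos(theta) and sin(theta) to within roughly atan(2**-(iterations-1)).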

But this is the same as a multiply in terms of complexity, no? One
large difference is that a multiply can be supported in commonly
available hardware while this algorithm requires dedicated hardware or
iterative software.

I agree that the CORDIC has the same complexity as a multiply. I agree
that table-based algorithms using multipliers use less FPGA fabric.

I was simply pointing out that there might be places where a CORDIC has
advantages over LUT-based NCOs.

Especially if you have ROM or multiplier limitations.

I also wanted to point out that if you need to do a 20-bit (using your
120dB example) complex downconversion for example, the CORDIC still
requires zero multipliers.

If you want to do a 20-bit complex downconversion using a table-based
NCO followed by a complex mixer, you might need a *lot* of multipliers.
If you only have an 18-bit multiplier, each multiplication requires
(maybe up to) 4 multiplier blocks and you need 8 multiplications.

I also /suspect/ that for any given device technology the CORDIC will
execute at higher speeds.

That's all...

I understand, but the distinction between a multiplier and a CORDIC
implementation is pretty pointless these days. If you have the space to
implement a CORDIC wouldn't you have the space to implement an
iterative multiply? I did a linear interpolation just because I could
do the multiply iteratively while the previous sample was shifted out to
the CODEC. One adder is the same as the CORDIC, no?

I guess the difference is you only need one CORDIC while for a sine that
is not an approximation you need two multipliers. I don't see how one
would be faster than the other except for the case of a dedicated
multiplier being much faster.

--

Rick

robert bristow-johnson · Dec 25, 2014

On 12/24/14 4:13 AM, rickman wrote:

On 12/24/2014 3:24 AM, Rob Doyle wrote:
On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

..... (a whole bunch of stuff)

so, Rick, did that built-in notch filter make any sense to you? it's so
cheap in software that i would think it would be cheap in VHDL or
whatever your hardware language is.

and i would just do LUT with linear interpolation. extend the table by
one point so that LUT[N] = LUT[0] and you won't have to worry about an
additional wrap-around in the linear interpolation. linear
interpolation has a sinc^2 frequency response, so it puts zeros smack
into the middle of images which reduces their amplitude a lot if the
content frequency is much less than the sample rate. if your LUT length
is decently long (like 512 or 1K or more), you'll do pretty good
regarding the "purity" of your sinusoid.
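
A small sketch of the extended-table idea, assuming a 32-bit phase accumulator and a 512-entry table (both sizes are picked for illustration, as are the names). The extra entry LUT[N] = LUT[0] means the interpolation never has to wrap the index.

```python
import math

PHASE_BITS = 32                      # assumed accumulator width
INDEX_BITS = 9                       # 2**9 = 512 table entries
N = 1 << INDEX_BITS
FRACTION_BITS = PHASE_BITS - INDEX_BITS

# N + 1 entries: the last one repeats the first, so LUT[k + 1] always exists.
LUT = [math.sin(2.0 * math.pi * k / N) for k in range(N)]
LUT.append(LUT[0])

def lut_sine(acc):
    """Sine for a phase accumulator value, using linear interpolation."""
    acc &= (1 << PHASE_BITS) - 1
    k = acc >> FRACTION_BITS                                   # top bits pick the entry
    frac = (acc & ((1 << FRACTION_BITS) - 1)) / (1 << FRACTION_BITS)
    return LUT[k] + frac * (LUT[k + 1] - LUT[k])
```

With 512 entries the interpolation error of an unquantized table stays below about 2e-5 of full scale, since it is bounded by (step**2)/8 for a unit-amplitude sine.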

and with a perfectly tuned notch filter with, say, a 1/3 octave BW,
you'll know exactly what your impurities are in either time domain or
frequency domain (if you FFT it).
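
One way the notch could look in software, as a hedged sketch: a biquad using the widely circulated Audio EQ Cookbook notch coefficients with bandwidth in octaves. The function names and the 1/3 octave default are assumptions for illustration, not a statement of what was actually built.

```python
import math

def notch_coefficients(f0, fs, bw_octaves=1.0 / 3.0):
    """Normalized biquad notch coefficients (b, a) centered on f0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(
        math.log(2.0) / 2.0 * bw_octaves * w0 / math.sin(w0))
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def notch_filter(samples, b, a):
    """Direct form I biquad; returns the filtered samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

Once the start-up transient has died away, what is left at the output is everything except the carrier. Anything close to the carrier is attenuated by the notch as well.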

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."

rickman · Dec 25, 2014

On 12/24/2014 3:19 PM, robert bristow-johnson wrote:

On 12/24/14 4:13 AM, rickman wrote:
On 12/24/2014 3:24 AM, Rob Doyle wrote:
On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

..... (a whole bunch of stuff)

so, Rick, did that built-in notch filter make any sense to you? it's so
cheap in software that i would think it would be cheap in VHDL or
whatever your hardware language is.

and i would just do LUT with linear interpolation. extend the table by
one point so that LUT[N] = LUT[0] and you won't have to worry about an
additional wrap-around in the linear interpolation. linear
interpolation has a sinc^2 frequency response, so it puts zeros smack
into the middle of images which reduces their amplitude a lot if the
content frequency is much less than the sample rate. if your LUT length
is decently long (like 512 or 1K or more), you'll do pretty good
regarding the "purity" of your sinusoid.

and with a perfectly tuned notch filter with, say, a 1/3 octave BW,
you'll know exactly what your impurities are in either time domain or
frequency domain (if you FFT it).

I thought I replied about the notch filter. I'm not clear on what it
buys me. If I FFT the data without the filter I get the same spectrum
with the signal present which does not interfere with the spectrum. So
what is the point? None of the analysis stuff will be implemented in
hardware, so that is not an issue.

BTW, in a sine table for linear interpolation, I don't use sine(0) as
the value in LUT(0). I give the points a half step phase offset with
the linear interp signed. I also offset the values to minimize the
error over the step range which will be interpolated. BTW, LUT(N) won't
equal LUT(0). Only 90° is stored so that in your table LUT(0) = 0 and
LUT(256) = 1. In my table each of the values are non-zero and not 1
although if the table is large enough, the value of LUT(N-1) is also 1.
Having a table of 2^N+1 entries is a PITA in hardware.
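
A sketch of the quarter-wave table with the half-step phase offset, assuming 256 entries and unquantized values. It leaves out the signed interpolation and the error-minimizing value offsets mentioned above, and the names are hypothetical.

```python
import math

QUARTER = 256   # entries covering 0 to 90 degrees (assumed size)
# Each entry is sampled in the middle of its step, so none is exactly 0 or 1.
QTABLE = [math.sin((k + 0.5) * (math.pi / 2.0) / QUARTER) for k in range(QUARTER)]

def quarter_wave_sine(index):
    """Full-cycle lookup; index runs over 0 .. 4*QUARTER - 1 for one period."""
    index %= 4 * QUARTER
    quadrant, k = divmod(index, QUARTER)
    if quadrant & 1:             # quadrants 1 and 3 read the table backwards
        k = QUARTER - 1 - k      # in hardware: invert the address bits
    value = QTABLE[k]
    return -value if quadrant & 2 else value   # second half-cycle is negated
```

Because of the half-step offset the mirrored read needs only a bitwise inversion of the address, and the table stays at exactly 2^N entries.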

--

Rick

robert bristow-johnson · Dec 25, 2014

On 12/24/14 4:43 PM, rickman wrote:

On 12/24/2014 3:19 PM, robert bristow-johnson wrote:
On 12/24/14 4:13 AM, rickman wrote:
On 12/24/2014 3:24 AM, Rob Doyle wrote:
On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

..... (a whole bunch of stuff)

....

and with a perfectly tuned notch filter with, say, a 1/3 octave BW,
you'll know exactly what your impurities are in either time domain or
frequency domain (if you FFT it).

I thought I replied about the notch filter. I'm not clear on what it
buys me. If I FFT the data without the filter I get the same spectrum
with the signal present which does not interfere with the spectrum.

do you know exactly what to subtract from the FFT to get whatever your
residual nasty stuff is? is that bump a sidelobe of your windowed
sinusoid or is it part of the "impurity" that you're measuring?

So
what is the point? None of the analysis stuff will be implemented in
hardware, so that is not an issue.

BTW, in a sine table for linear interpolation, I don't use sine(0) as
the value in LUT(0).

i don't think that matters.

I give the points a half step phase offset with the
linear interp signed.

nor that.

I also offset the values to minimize the error
over the step range which will be interpolated.

so it's kinda an optimal phase offset that gets your quantized sine
values the least error (however the total error is defined).

> BTW, LUT(N) won't equal LUT(0).

it's N+1 points instead of N. so it's the same N points you would have
had anyway, with one more added.

> Only 90° is stored so that in your table LUT(0) = 0 and LUT(256) = 1.

okay. i guess i'm looking at resources more like a software guy would.
if i were coding this for a DSP chip or something, i would just
quadruple the number of entries and have a single cycle of the waveform,
whatever it is.

In my table each of the values are non-zero and not 1 although if
the table is large enough, the value of LUT(N-1) is also 1. Having a
table of 2^N+1 entries is a PITA in hardware.

i understand. ((2*pi)/N)/2 radians offset so the points are the same
symmetry for each quadrant. and then it's sign manipulation that the
hardware folk don't mind fiddling with.

but you still know the frequency in advance and a notch filter can be
tuned to that frequency.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."

rickman · Dec 25, 2014

On 12/24/2014 6:56 PM, robert bristow-johnson wrote:

On 12/24/14 4:43 PM, rickman wrote:
On 12/24/2014 3:19 PM, robert bristow-johnson wrote:
On 12/24/14 4:13 AM, rickman wrote:
On 12/24/2014 3:24 AM, Rob Doyle wrote:
On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

..... (a whole bunch of stuff)

...

and with a perfectly tuned notch filter with, say, a 1/3 octave BW,
you'll know exactly what your impurities are in either time domain or
frequency domain (if you FFT it).

I thought I replied about the notch filter. I'm not clear on what it
buys me. If I FFT the data without the filter I get the same spectrum
with the signal present which does not interfere with the spectrum.

do you know exactly what to subtract from the FFT to get whatever your
residual nasty stuff is? is that bump a sidelobe of your windowed
sinusoid or is it part of the "impurity" that you're measuring?

I believe you are making this too complex. The measurement is a one
time thing. I can use as large a transform as I am willing to wait for
and therefore minimize the sidelobes. The measurement should be good
enough that if I can't measure it, I won't really care about it.
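
A toy version of that measurement, as a sketch only. It assumes a deliberately tiny DDS (8-bit accumulator, phase truncated to 5 bits, tuning word 37, all made-up numbers) so a plain DFT is fast, and it uses a record that holds a whole number of carrier cycles, so the carrier lands on a single bin and no window is needed.

```python
import cmath
import math

ACC_BITS = 8       # phase accumulator width (tiny on purpose)
TABLE_BITS = 5     # phase bits kept for the sine lookup; the rest are truncated
TUNING_WORD = 37   # odd, so the output repeats every 2**ACC_BITS samples
RECORD = 1 << ACC_BITS

def dds_samples():
    """One full period of the phase-truncated DDS output."""
    acc, out = 0, []
    for _ in range(RECORD):
        index = acc >> (ACC_BITS - TABLE_BITS)          # phase truncation
        out.append(math.sin(2.0 * math.pi * index / (1 << TABLE_BITS)))
        acc = (acc + TUNING_WORD) % RECORD
    return out

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0 .. n/2 (direct O(n^2) evaluation)."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * m / n)
                    for m, s in enumerate(samples)))
            for k in range(n // 2 + 1)]

def worst_spur_dbc(mags, carrier_bin):
    """Largest non-carrier bin relative to the carrier, in dBc."""
    spur = max(m for k, m in enumerate(mags) if k != carrier_bin)
    return 20.0 * math.log10(spur / mags[carrier_bin])
```

Calling worst_spur_dbc(dft_magnitudes(dds_samples()), TUNING_WORD) reports the phase-truncation spur level for this toy configuration. A real measurement would swap in an FFT and a much longer record.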

Check out these plots...

https://sites.google.com/site/fpgastuff/dds_oddities.pdf

The last page has some interesting data.

So
what is the point? None of the analysis stuff will be implemented in
hardware, so that is not an issue.

BTW, in a sine table for linear interpolation, I don't use sine(0) as
the value in LUT(0).

i don't think that matters.

I give the points a half step phase offset with the
linear interp signed.

nor that.

I also offset the values to minimize the error
over the step range which will be interpolated.

so it's kinda an optimal phase offset that gets your quantized sine
values the least error (however the total error is defined).

BTW, LUT(N) won't equal LUT(0).

it's N+1 points instead of N. so it's the same N points you would have
had anyway, with one more added.

Only 90° is stored so that in your table LUT(0) = 0 and LUT(256) = 1.

okay. i guess i'm looking at resources more like a software guy would.
if i were coding this for a DSP chip or something, i would just
quadruple the number of entries and have a single cycle of the waveform,
whatever it is.

Even in software that can get expensive. The LUT is O(2^N) in size so
you don't want N to get too large. *Much* better to use your N for
storing useful data rather than duplicate info.

In my table each of the values are non-zero and not 1 although if
the table is large enough, the value of LUT(N-1) is also 1. Having a
table of 2^N+1 entries is a PITA in hardware.

i understand. ((2*pi)/N)/2 radians offset so the points are the same
symmetry for each quadrant. and then it's sign manipulation that the
hardware folk don't mind fiddling with.

but you still know the frequency in advance and a notch filter can be
tuned to that frequency.

If there is a purpose to it.

--

Rick

rickman · Dec 25, 2014

On 12/24/2014 7:48 PM, rickman wrote:

On 12/24/2014 6:56 PM, robert bristow-johnson wrote:
On 12/24/14 4:43 PM, rickman wrote:
On 12/24/2014 3:19 PM, robert bristow-johnson wrote:
On 12/24/14 4:13 AM, rickman wrote:
On 12/24/2014 3:24 AM, Rob Doyle wrote:
On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

..... (a whole bunch of stuff)

...

and with a perfectly tuned notch filter with, say, a 1/3 octave BW,
you'll know exactly what your impurities are in either time domain or
frequency domain (if you FFT it).

I thought I replied about the notch filter. I'm not clear on what it
buys me. If I FFT the data without the filter I get the same spectrum
with the signal present which does not interfere with the spectrum.

do you know exactly what to subtract from the FFT to get whatever your
residual nasty stuff is? is that bump a sidelobe of your windowed
sinusoid or is it part of the "impurity" that you're measuring?

I believe you are making this too complex. The measurement is a one
time thing. I can use as large a transform as I am willing to wait for
and therefore minimize the sidelobes. The measurement should be good
enough that if I can't measure it, I won't really care about it.

Check out these plots...

https://sites.google.com/site/fpgastuff/dds_oddities.pdf

The last page has some interesting data.

I hit send too quickly. I also meant to point out that the spurs of
interest are the ones closer to the carrier. How well can I filter the
carrier without filtering the side lobes?

So
what is the point? None of the analysis stuff will be implemented in
hardware, so that is not an issue.

BTW, in a sine table for linear interpolation, I don't use sine(0) as
the value in LUT(0).

i don't think that matters.

I give the points a half step phase offset with the
linear interp signed.

nor that.

I also offset the values to minimize the error
over the step range which will be interpolated.

so it's kinda an optimal phase offset that gets your quantized sine
values the least error (however the total error is defined).

BTW, LUT(N) won't equal LUT(0).

it's N+1 points instead of N. so it's the same N points you would have
had anyway, with one more added.

Only 90° is stored so that in your table LUT(0) = 0 and LUT(256) = 1.

okay. i guess i'm looking at resources more like a software guy would.
if i were coding this for a DSP chip or something, i would just
quadruple the number of entries and have a single cycle of the waveform,
whatever it is.

Even in software that can get expensive. The LUT is O(2^N) in size so
you don't want N to get too large. *Much* better to use your N for
storing useful data rather than duplicate info.

In my table each of the values are non-zero and not 1 although if
the table is large enough, the value of LUT(N-1) is also 1. Having a
table of 2^N+1 entries is a PITA in hardware.

i understand. ((2*pi)/N)/2 radians offset so the points are the same
symmetry for each quadrant. and then it's sign manipulation that the
hardware folk don't mind fiddling with.

but you still know the frequency in advance and a notch filter can be
tuned to that frequency.

If there is a purpose to it.

--

Rick

glen herrmannsfeldt · Dec 25, 2014

In comp.dsp Rob Doyle <radioengr@gmail.com> wrote:
> On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

(snip)

i don't s'pose Ray Andraka is hanging around (he was Dr. CORDIC here
a while back), but i always thought that CORDIC did essentially

x[n] = cos(2*pi*f0/Fs) * x[n-1] - sin(2*pi*f0/Fs) * y[n-1]
y[n] = sin(2*pi*f0/Fs) * x[n-1] + cos(2*pi*f0/Fs) * y[n-1]

Yes. So far. So good. These are my notes if anyone is interested...

[snip]

Assume theta = 2*pi*f0*t/fs, i.e., theta is the output of a phase
accumulator for an NCO application.

Factor out the cos(theta):

x[n] = cos(theta) {x[n-1] - y[n-1] tan(theta)}
y[n] = cos(theta) {y[n-1] + x[n-1] tan(theta)}

If you select tan(theta) from the set of 1/(2**i)
then [1] this becomes:

Nice if you are doing it in binary, but many hand calculators
do it in decimal. I believe I have seen the explanation once, but
it is much harder to find than the binary version.

This goes back to at least the beginning of HP hand calculators.

x[n] = cos(theta) {x[n-1] - y[n-1] / 2**i}
y[n] = cos(theta) {y[n-1] + x[n-1] / 2**i}

At this point you might be thinking "Holy crap. That's one heck of a
constraint!" Yeh... but keep reading anyway.

You can drop the cos(theta) common term. It's just a gain term that
rapidly converges to 1.647. Therefore the gain of a CORDIC is not 0 dB.

x[n] = x[n-1] - y[n-1] / 2**i
y[n] = y[n-1] + x[n-1] / 2**i

or (assuming twos complement math) - simply:

x[n] = x[n-1] - y[n-1] >> i
y[n] = y[n-1] + x[n-1] >> i

where >> is a shift right operation

[1] At this point it seems as if an *extreme* limitation has been placed
on the selection of rotation angles. The equation above only describes
how to rotate an input signal by tan(theta) = 1/(2**i) - or by one of
the following angles:

atan(1) (45.000000000000000000000000000000 degrees)
atan(1/2) (26.565051177077989351572193720453 degrees)
atan(1/4) (14.036243467926478582892320159163 degrees)
atan(1/8) (7.1250163489017975619533008412068 degrees)
atan(1/16) (3.5763343749973510306847789144588 degrees)
atan(1/32) (1.7899106082460693071502497760791 degrees)

(snip)

-- glen

Eric Jacobsen · Dec 25, 2014

On Tue, 23 Dec 2014 18:10:43 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 4:48 PM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 11:06:39 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 10:35 AM, Eric Jacobsen wrote:
On Mon, 22 Dec 2014 15:17:23 -0700, Rob Doyle <radioengr@gmail.com>
wrote:

On 12/21/2014 5:13 PM, Eric Jacobsen wrote:
On Sun, 21 Dec 2014 14:52:40 -0500, robert bristow-johnson
<rbj@audioimagination.com> wrote:

On 12/19/14 11:04 PM, Eric Jacobsen wrote:
On Fri, 19 Dec 2014 18:19:24 -0500, robert bristow-johnson
<rbj@audioimagination.com> wrote:

On 12/19/14 10:06 AM, rickman wrote:
I want to analyze the output of a DDS circuit and am wondering if an FFT
is the best way to do this. I'm mainly concerned with the "close in"
spurs that are often generated by a DDS.

i still get the concepts of DDS and NCO mixed up. what are the differences?

One is spelled DDS and the other is spelled NCO.

is the NCO the typical table-lookup kind (with phase accumulator)? or
can it be algorithmic? like

y[n] = (2*cos(omega_0))*y[n-1] - y[n-2]

where omega_0 is the normalized angular frequency of the sinusoid and
with appropriate initial states, y[-1] and y[-2] to result in the
amplitude and initial phase desired.

is that an NCO that can be used in this DDS? or must it be LUT?

Generally NCO or DDS refers to a phase accumulator with a LUT, since
it is easily implemented in hardware. That's a general architecture
that is well-known and can be adjusted to produce very clean local
oscillators. If somebody tried to sell me a block of IP with an
"NCO" built some other way I'd be asking a lot of questions.

I have built NCOs using CORDIC rotators. No lookup tables. They pipeline
nicely and are therefore very fast, they require no multipliers [1],
they generate quadrature outputs for free, they can perform frequency
translations for free (again no multipliers), and it is simple to prove
their numerical accuracy.

CORDICs are fine when and where they make sense, but they are often
not the best tradeoff. If you have no memory, no multipliers, or
gates are way cheaper than memory, and if the latency is tolerable,
then a CORDIC may be a good option.

[1] Maybe not a huge issue these days. The LUT-based NCOs require two
multipliers to combine the coarse and fine LUTs (four multipliers if you
need a complex NCO output) and perhaps another four multipliers if you
need to do a frequency translation.

Many applications don't need separate LUTs to get the required
performance, and even then, or even in the case of complex output, it
can be done without multipliers.

Care to elaborate on this? I'm not at all clear on how you make a LUT
based NCO without LUTs and unless you are using a very coarse
approximation, without multipliers.

Not sure what you're asking. You need a LUT, but just one in many
cases. Having a dual-ported single LUT is easy in an FPGA and
usually in silicon as well.

What makes a multiplier necessary? I've never found the need, but my
apps are limited to comm.

Maybe we aren't on the same page. The multiplier is there for the fine
adjustment. If you are happy with some -60 or -80 dB spurs one LUT is
fine. But if you want better performance the single LUT approach
requires *very* large tables.

There are a lot of tricks that can be used to keep the table size
down. I've mentioned one already.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

Eric Jacobsen · Dec 25, 2014

On Wed, 24 Dec 2014 01:24:42 -0700, Rob Doyle <radioengr@gmail.com>
wrote:

On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

this did not seem to get posted so i am reposting. sorry for any
repeated post.

On 12/22/14 5:17 PM, Rob Doyle wrote:
On 12/21/2014 5:13 PM, Eric Jacobsen wrote:
On Sun, 21 Dec 2014 14:52:40 -0500, robert bristow-johnson
<rbj@audioimagination.com> wrote:

On 12/19/14 11:04 PM, Eric Jacobsen wrote:
On Fri, 19 Dec 2014 18:19:24 -0500, robert bristow-johnson
<rbj@audioimagination.com> wrote:

On 12/19/14 10:06 AM, rickman wrote:
I want to analyze the output of a DDS circuit and am
wondering if an FFT is the best way to do this. I'm
mainly concerned with the "close in"
spurs that are often generated by a DDS.

i still get the concepts of DDS and NCO mixed up. what are
the differences?

One is spelled DDS and the other is spelled NCO.

is the NCO the typical table-lookup kind (with phase
accumulator)? or can it be algorithmic? like

y[n] = (2*cos(omega_0))*y[n-1] - y[n-2]

where omega_0 is the normalized angular frequency of the
sinusoid and with appropriate initial states, y[-1] and y[-2]
to result in the amplitude and initial phase desired.

is that an NCO that can be used in this DDS? or must it be
LUT?

Generally NCO or DDS refers to a phase accumulator with a LUT,
since it is easily implemented in hardware. That's a general
architecture that is well-known and can be adjusted to produce
very clean local oscillators. If somebody tried to sell me a
block of IP with an "NCO" built some other way I'd be asking a
lot of questions.

I have built NCOs using CORDIC rotators. No lookup tables. They
pipeline nicely and are therefore very fast, they require no
multipliers [1],

???

i don't s'pose Ray Andraka is hanging around (he was Dr. CORDIC here
a while back), but i always thought that CORDIC did essentially

x[n] = cos(2*pi*f0/Fs) * x[n-1] - sin(2*pi*f0/Fs) * y[n-1]
y[n] = sin(2*pi*f0/Fs) * x[n-1] + cos(2*pi*f0/Fs) * y[n-1]

Yes. So far. So good. These are my notes if anyone is interested...

[snip]

Assume theta = 2*pi*f0*t/fs, i.e., theta is the output of a phase
accumulator for an NCO application.

Factor out the cos(theta):

x[n] = cos(theta) {x[n-1] - y[n-1] tan(theta)}
y[n] = cos(theta) {y[n-1] + x[n-1] tan(theta)}

If you select tan(theta) from the set of 1/(2**i) then [1] this becomes:

x[n] = cos(theta) {x[n-1] - y[n-1] / 2**i}
y[n] = cos(theta) {y[n-1] + x[n-1] / 2**i}

At this point you might be thinking "Holy crap. That's one heck of a
constraint!" Yeh... but keep reading anyway.

You can drop the cos(theta) common term. It's just a gain term that
rapidly converges to 1.647. Therefore the gain of a CORDIC is not 0 dB.

x[n] = x[n-1] - y[n-1] / 2**i
y[n] = y[n-1] + x[n-1] / 2**i

or (assuming twos complement math) - simply:

x[n] = x[n-1] - y[n-1] >> i
y[n] = y[n-1] + x[n-1] >> i

where >> is a shift right operation

[1] At this point it seems as if an *extreme* limitation has been placed
on the selection of rotation angles. The equation above only describes
how to rotate an input signal by tan(theta) = 1/(2**i) - or by one of
the following angles:

atan(1) (45.000000000000000000000000000000 degrees)
atan(1/2) (26.565051177077989351572193720453 degrees)
atan(1/4) (14.036243467926478582892320159163 degrees)
atan(1/8) (7.1250163489017975619533008412068 degrees)
atan(1/16) (3.5763343749973510306847789144588 degrees)
atan(1/32) (1.7899106082460693071502497760791 degrees)

...and so forth.

The equation above does not describe how to rotate an input signal by an
arbitrary angle! Although this is true, all is not lost.

Notice that in general theta/2 < tan(theta).

This truth allows the CORDIC to be used iteratively to rotate any input
to any angle with any precision. IMO this is the genius of the CORDIC.

I probably should have mentioned that you swap the rotation direction by
flipping the additions and subtractions.

The term z[n] is introduced to accumulate the angle as the CORDIC
iterates. The term d[n] swaps the direction of rotation. Finally the
familiar recursive CORDIC equation can be written as follows:

x[n] = x[n-1] - d[n] y[n-1] >> i
y[n] = y[n-1] + d[n] x[n-1] >> i
z[n] = z[n-1] - d[n] atan(1/2**i)

where:

d[n] is +1 for z[n-1] < theta. Clockwise rotation next.
d[n] is -1 for z[n-1] > theta. Counter-clockwise rotation next.

No multiplies here.

But this is the same as a multiply in terms of complexity, no? One
large difference is that a multiply can be supported in commonly
available hardware while this algorithm requires dedicated hardware or
iterative software.

I agree that the CORDIC has the same complexity as a multiply. I agree
that table-based algorithms using multipliers use less FPGA fabric.

I was simply pointing out that there might be places where a CORDIC has
advantages over LUT-based NCOs.

Especially if you have ROM or multiplier limitations.

I also wanted to point out that if you need to do a 20-bit (using your
120dB example) complex downconversion for example, the CORDIC still
requires zero multipliers.

If you want to do a 20-bit complex downconversion using a table-based
NCO followed by a complex mixer, you might need a *lot* of multipliers.
If you only have an 18-bit multiplier, each multiplication requires
(maybe up to) 4 multiplier blocks and you need 8 multiplications.

I also /suspect/ that for any given device technology the CORDIC will
execute at higher speeds.

That's all...

That's been my experience; that if multipliers are scarce or too
expensive, or memory is scarce or too expensive, then a CORDIC is a
nice back-up option. These days multipliers and memory are both
plentiful in most platforms, so CORDICs just aren't as useful as they
used to be.

The latency is sometimes an issue as well.

There are still some places where they make sense, though.

The CORDIC simply does a successive approximation to the angle -
rotating the angle clockwise or counter-clockwise by these limited
selection of angles as necessary to converge on the desired angle. Each
time the iteration occurs, the angle error is reduced by at least half.

doesn't that require a few multiplications?

Nope. Just adds/subtracts - the sign of the angle error determines which
direction to rotate on the next iteration. If this is pipelined, the
shifts aren't real - they just select which bits of the previous
iteration are used on the next iteration. The atan(1/2**i) term is a
constant for each iteration.

As an implementation detail, it saves hardware if you iterate from
the angle toward zero instead of from zero toward the angle. If you do
that, the sign of z is the direction of rotation. It saves a
magnitude compare for each iteration.
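
A fixed-point sketch of that variant, where z starts at the target angle and is driven toward zero, so the sign of z alone selects the direction. The Q16 format, the 16 iterations, and all names are assumptions made for illustration.

```python
import math

Q_BITS = 16    # assumed fixed-point format: 16 fractional bits
STEPS = 16     # number of CORDIC iterations
# atan(2**-i) constants in Q16 radians, one per iteration.
ATAN_Q = [round(math.atan(2.0**-i) * (1 << Q_BITS)) for i in range(STEPS)]
# Start x at 1/gain so the result comes out with unit amplitude.
X_START = round((1 << Q_BITS) / math.prod(
    math.sqrt(1.0 + 4.0**-i) for i in range(STEPS)))

def cordic_sincos_fixed(theta):
    """Return (cos, sin) of theta (radians, |theta| <= pi/2) as Q16 integers."""
    x, y = X_START, 0
    z = round(theta * (1 << Q_BITS))     # residual angle still to rotate
    for i in range(STEPS):
        if z >= 0:                       # sign bit of z picks the direction
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_Q[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_Q[i]
    return x, y
```

Only adds, subtracts, shifts, and a sign test appear in the loop. The truncating shifts cost a few LSBs of accuracy, so a real design would normally carry some guard bits.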

Rob.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

Eric Jacobsen · Dec 25, 2014

On Wed, 24 Dec 2014 18:56:41 -0500, robert bristow-johnson
<rbj@audioimagination.com> wrote:

On 12/24/14 4:43 PM, rickman wrote:
On 12/24/2014 3:19 PM, robert bristow-johnson wrote:
On 12/24/14 4:13 AM, rickman wrote:
On 12/24/2014 3:24 AM, Rob Doyle wrote:
On 12/23/2014 9:40 PM, rickman wrote:
On 12/23/2014 11:02 PM, Rob Doyle wrote:
On 12/23/2014 6:10 PM, robert bristow-johnson wrote:

..... (a whole bunch of stuff)

...

and with a perfectly tuned notch filter with, say, a 1/3 octave BW,
you'll know exactly what your impurities are in either time domain or
frequency domain (if you FFT it).

I thought I replied about the notch filter. I'm not clear on what it
buys me. If I FFT the data without the filter I get the same spectrum
with the signal present which does not interfere with the spectrum.

do you know exactly what to subtract from the FFT to get whatever your
residual nasty stuff is? is that bump a sidelobe of your windowed
sinusoid or is it part of the "impurity" that you're measuring?

So
what is the point? None of the analysis stuff will be implemented in
hardware, so that is not an issue.

BTW, in a sine table for linear interpolation, I don't use sine(0) as
the value in LUT(0).

i don't think that matters.

I give the points a half step phase offset with the
linear interp signed.

nor that.

Actually, not having zero entries in the table for the zero
crossings can help solve some common problems with artifacts like
spurs (for oscillators) and jitter (for clock generators).

The case for the clock output is easy to explain, since the MSB duty
cycle is not symmetric when there are two zero entries. Just
offsetting the phase a tiny bit, even to just one LSB present in the
table output near the zero crossing, makes the MSB duty cycle 50% in
the table.

I also offset the values to minimize the error
over the step range which will be interpolated.

so it's kinda an optimal phase offset that gets your quantized sine
values the least error (however the total error is defined).

BTW, LUT(N) won't equal LUT(0).

it's N+1 points instead of N. so it's the same N points you would have
had anyway, with one more added.

Which means you just doubled the size of the memory.

Only 90° is stored so that in your table LUT(0) = 0 and LUT(256) = 1.

okay. i guess i'm looking at resources more like a software guy would.
if i were coding this for a DSP chip or something, i would just
quadruple the number of entries and have a single cycle of the waveform,
whatever it is.

A quarter cycle is all that's really needed.

In my table each of the values is non-zero and not 1, although if
the table is large enough, the value of LUT(N-1) is also 1. Having a
table of 2^N+1 entries is a PITA in hardware.

i understand. ((2*pi)/N)/2 radians offset so the points are the same
symmetry for each quadrant. and then it's sign manipulation that the
hardware folk don't mind fiddling with.
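
A minimal sketch of the half-step-offset quarter-wave table discussed above, assuming a 2^N-entry quadrant and simple rounding. The names quarter_wave_table and lookup are hypothetical, and the extra value offset mentioned for minimizing interpolation error is not modeled here. With the half-step offset, the mirrored quadrants just read the table with the index complemented (n-1-k), no entry is zero, and positive and negative samples balance.

```python
import math

def quarter_wave_table(n_quarter, amp_bits):
    """First-quadrant sine samples taken at (k + 0.5) steps (hypothetical helper)."""
    full_scale = (1 << (amp_bits - 1)) - 1
    step = (math.pi / 2.0) / n_quarter
    return [round(full_scale * math.sin((k + 0.5) * step)) for k in range(n_quarter)]

def lookup(table, phase_index):
    """phase_index counts 0 .. 4*len(table)-1 around one full cycle."""
    n = len(table)
    quadrant, k = divmod(phase_index % (4 * n), n)
    if quadrant & 1:
        # Second and fourth quadrants read the table backwards;
        # for a power-of-two n this is a bitwise complement of the address.
        k = n - 1 - k
    value = table[k]
    # Third and fourth quadrants are the negated first half of the cycle.
    return -value if quadrant & 2 else value
```

For example, a 256-entry, 16-bit table built this way gives a 1024-point cycle with exactly 512 positive and 512 negative samples, which is the symmetric sign-bit duty cycle described earlier in the thread.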

but you still know the frequency in advance and a notch filter can be
tuned to that frequency.

If there's only one frequency needed and it's known in advance, you
may not need a DDS/NCO. Usually an NCO is used because it needs to
vary a bit either for tracking or tuning or other adjustments.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

rickman · Dec 25, 2014

On 12/25/2014 10:52 AM, Eric Jacobsen wrote:

On Tue, 23 Dec 2014 18:10:43 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 4:48 PM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 11:06:39 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 10:35 AM, Eric Jacobsen wrote:

Many applications don't need separate LUTs to get the required
performance, and even then, or even in the case of complex output, it
can be done without multipliers.

Care to elaborate on this? I'm not at all clear on how you make a LUT
based NCO without LUTs and unless you are using a very coarse
approximation, without multipliers.

Not sure what you're asking. You need a LUT, but just one in many
cases. Having a dual-ported single LUT is easy in an FPGA and
usually in silicon as well.

What makes a multiplier necessary? I've never found the need, but my
apps are limited to comm.

Maybe we aren't on the same page. The multiplier is there for the fine
adjustment. If you are happy with some -60 or -80 dB spurs one LUT is
fine. But if you want better performance the single LUT approach
requires *very* large tables.
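
To make the "multiplier for the fine adjustment" point concrete, here is a rough sketch comparing a plain truncated-phase lookup with a linearly interpolated one on the same small table. Everything in it is an assumption for illustration: the 256-entry full-cycle table (with one wrap-around entry), the 32-bit accumulator, the tuning word, and the use of floating point rather than fixed-point words. It reports worst-case sample error, which is related to but not the same thing as spur level in dB.

```python
import math

TABLE_BITS = 8
# Full-cycle table with one extra wrap-around entry so index + 1 is always valid.
TABLE = [math.sin(2.0 * math.pi * k / (1 << TABLE_BITS))
         for k in range((1 << TABLE_BITS) + 1)]

def nco_sample(phase, acc_bits=32, interpolate=True):
    """Map a phase-accumulator value to a sine sample (hypothetical helper)."""
    frac_bits = acc_bits - TABLE_BITS
    index = phase >> frac_bits                  # top bits address the table
    if not interpolate:
        return TABLE[index]
    # Remaining low bits give the position between two table entries, 0 <= frac < 1.
    frac = (phase & ((1 << frac_bits) - 1)) / (1 << frac_bits)
    # This product is the one multiply that the fine adjustment costs.
    return TABLE[index] + frac * (TABLE[index + 1] - TABLE[index])

def worst_error(interpolate, acc_bits=32, tuning_word=0x00345679, samples=20000):
    """Largest deviation from an ideal sine over a run of accumulator steps."""
    phase, worst = 0, 0.0
    mask = (1 << acc_bits) - 1
    for _ in range(samples):
        ideal = math.sin(2.0 * math.pi * phase / (1 << acc_bits))
        worst = max(worst, abs(nco_sample(phase, acc_bits, interpolate) - ideal))
        phase = (phase + tuning_word) & mask    # accumulator wraps naturally
    return worst
```

Comparing worst_error(False) with worst_error(True) on this table shows the interpolated output tracking the ideal sine far more closely; matching that with a plain lookup would take a much larger table.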

There are a lot of tricks that can be used to keep the table size
down. I've mentioned one already.

And what was that? You have made some 20 or more posts in this thread,
I don't feel like weeding through all of them to find this. Reading
back through this thread it seems like your posts are intended to be
mysterious rather than informative. Every one leaves enough unsaid that
more questions are needed.

--

Rick

rickman · Dec 25, 2014

On 12/25/2014 10:55 AM, Eric Jacobsen wrote:

On Wed, 24 Dec 2014 01:24:42 -0700, Rob Doyle <radioengr@gmail.com>
wrote:

I agree that the CORDIC has the same complexity as a multiply. I agree
that table-based algorithms using multipliers use less FPGA fabric.

I was simply pointing out that there might be places where a CORDIC has
advantages over LUT-based NCOs.

Especially if you have ROM or multiplier limitations.

I also wanted to point out that if you need to do a 20-bit (using your
120dB example) complex downconversion for example, the CORDIC still
requires zero multipliers.

If you want to do a 20-bit complex downconversion using a table-based
NCO followed by a complex mixer, you might need a *lot* of multipliers.
If you only have an 18-bit multiplier, each multiplication requires
(maybe up to) 4 multiplier blocks and you need 8 multiplications.

I also /suspect/ that for any given device technology the CORDIC will
execute at higher speeds.

That's all...

That's been my experience; that if multipliers are scarce or too
expensive, or memory is scarce or too expensive, then a CORDIC is a
nice back-up option. These days multipliers and memory are both
plentiful in most platforms, so CORDICs just aren't as useful as they
used to be.

I think the distinction between a multiply and the CORDIC technique is
bogus. CORDIC is an iterative process including all the operations that
make up a multiply. The only difference is that in many cases there is
hardware available that facilitates execution of generic multiplies
while the CORDIC must be implemented in detail in every case.

The latency is sometimes an issue as well.

There are still some places where they make sense, though.

Care to explain?

--

Rick

Eric Jacobsen · Dec 26, 2014

On Thu, 25 Dec 2014 11:56:15 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:52 AM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 18:10:43 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 4:48 PM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 11:06:39 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 10:35 AM, Eric Jacobsen wrote:

Many applications don't need separate LUTs to get the required
performance, and even then, or even in the case of complex output, it
can be done without multipliers.

Care to elaborate on this? I'm not at all clear on how you make a LUT
based NCO without LUTs and unless you are using a very coarse
approximation, without multipliers.

Not sure what you're asking. You need a LUT, but just one in many
cases. Having a dual-ported single LUT is easy in an FPGA and
usually in silicon as well.

What makes a multiplier necessary? I've never found the need, but my
apps are limited to comm.

Maybe we aren't on the same page. The multiplier is there for the fine
adjustment. If you are happy with some -60 or -80 dB spurs one LUT is
fine. But if you want better performance the single LUT approach
requires *very* large tables.

There are a lot of tricks that can be used to keep the table size
down. I've mentioned one already.

And what was that? You have made some 20 or more posts in this thread,
I don't feel like weeding through all of them to find this. Reading
back through this thread it seems like your posts are intended to be
mysterious rather than informative. Every one leaves enough unsaid that
more questions are needed.

I can't divulge trade secrets or proprietary information that doesn't
belong to me. I can, however, hint in directions of benefit. Take
it or leave it.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

Eric Jacobsen · Dec 26, 2014

On Thu, 25 Dec 2014 12:02:57 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:55 AM, Eric Jacobsen wrote:
On Wed, 24 Dec 2014 01:24:42 -0700, Rob Doyle <radioengr@gmail.com>
wrote:

I agree that the CORDIC has the same complexity as a multiply. I agree
that table-based algorithms using multipliers use less FPGA fabric.

I was simply pointing out that there might be places where a CORDIC has
advantages over LUT-based NCOs.

Especially if you have ROM or multiplier limitations.

I also wanted to point out that if you need to do a 20-bit (using your
120dB example) complex downconversion for example, the CORDIC still
requires zero multipliers.

If you want to do a 20-bit complex downconversion using a table-based
NCO followed by a complex mixer, you might need a *lot* of multipliers.
If you only have an 18-bit multiplier, each multiplication requires
(maybe up to) 4 multiplier blocks and you need 8 multiplications.

I also /suspect/ that for any given device technology the CORDIC will
execute at higher speeds.

That's all...

That's been my experience; that if multipliers are scarce or too
expensive, or memory is scarce or too expensive, then a CORDIC is a
nice back-up option. These days multipliers and memory are both
plentiful in most platforms, so CORDICs just aren't as useful as they
used to be.

I think the distinction between a multiply and the CORDIC technique is
bogus. CORDIC is an iterative process including all the operations that
make up a multiply. The only difference is that in many cases there is
hardware available that facilitates execution of generic multiplies
while the CORDIC must be implemented in detail in every case.

In the past (some of it long ago) when we did tradeoffs on using a
CORDIC or an NCO, or a CORDIC or a complex mix implemented with
multipliers, it comes down to resource availability. If multipliers
are available (either in FPGA fabric or as a module in silicon), then
a mixer is generally much more efficient with multipliers. If the
memory is available, then a LUT with a phase accumulator is hard to
beat for a numeric oscillator. The latency may also tilt the
tradeoff further away from the CORDIC.

They certainly have their place, but those places have gotten more
limited as silicon resources get cheaper.

The latency is sometimes an issue as well.

There are still some places where they make sense, though.

Care to explain?

--

Rick

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

rickman · Dec 26, 2014

On 12/25/2014 3:56 PM, Eric Jacobsen wrote:

On Thu, 25 Dec 2014 11:56:15 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:52 AM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 18:10:43 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 4:48 PM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 11:06:39 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 10:35 AM, Eric Jacobsen wrote:

Many applications don't need separate LUTs to get the required
performance, and even then, or even in the case of complex output, it
can be done without multipliers.

Care to elaborate on this? I'm not at all clear on how you make a LUT
based NCO without LUTs and unless you are using a very coarse
approximation, without multipliers.

Not sure what you're asking. You need a LUT, but just one in many
cases. Having a dual-ported single LUT is easy in an FPGA and
usually in silicon as well.

What makes a multiplier necessary? I've never found the need, but my
apps are limited to comm.

Maybe we aren't on the same page. The multiplier is there for the fine
adjustment. If you are happy with some -60 or -80 dB spurs one LUT is
fine. But if you want better performance the single LUT approach
requires *very* large tables.

There are a lot of tricks that can be used to keep the table size
down. I've mentioned one already.

And what was that? You have made some 20 or more posts in this thread,
I don't feel like weeding through all of them to find this. Reading
back through this thread it seems like your posts are intended to be
mysterious rather than informative. Every one leaves enough unsaid that
more questions are needed.

I can't divulge trade secrets or proprietary information that doesn't
belong to me. I can, however, hint in directions of benefit. Take
it or leave it.

I have no idea what you are talking about. If you don't have anything
to say, why are you bothering to post? I don't even recall the hints.
Or are you forbidden from pointing out what those are?

You said you had already mentioned a way to reduce table size. What was
that?

--

Rick

rickman · Dec 26, 2014

On 12/25/2014 4:01 PM, Eric Jacobsen wrote:

On Thu, 25 Dec 2014 12:02:57 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:55 AM, Eric Jacobsen wrote:
On Wed, 24 Dec 2014 01:24:42 -0700, Rob Doyle <radioengr@gmail.com>
wrote:

I agree that the CORDIC has the same complexity as a multiply. I agree
that table-based algorithms using multipliers use less FPGA fabric.

I was simply pointing out that there might be places where a CORDIC has
advantages over LUT-based NCOs.

Especially if you have ROM or multiplier limitations.

I also wanted to point out that if you need to do a 20-bit (using your
120dB example) complex downconversion for example, the CORDIC still
requires zero multipliers.

If you want to do a 20-bit complex downconversion using a table-based
NCO followed by a complex mixer, you might need a *lot* of multipliers.
If you only have an 18-bit multiplier, each multiplication requires
(maybe up to) 4 multiplier blocks and you need 8 multiplications.

I also /suspect/ that for any given device technology the CORDIC will
execute at higher speeds.

That's all...

That's been my experience; that if multipliers are scarce or too
expensive, or memory is scarce or too expensive, then a CORDIC is a
nice back-up option. These days multipliers and memory are both
plentiful in most platforms, so CORDICs just aren't as useful as they
used to be.

I think the distinction between a multiply and the CORDIC technique is
bogus. CORDIC is an iterative process including all the operations that
make up a multiply. The only difference is that in many cases there is
hardware available that facilitates execution of generic multiplies
while the CORDIC must be implemented in detail in every case.

In the past (some of it long ago) when we did tradeoffs on using a
CORDIC or an NCO, or a CORDIC or a complex mix implemented with
multipliers, it comes down to resource availability. If multipliers
are available (either in FPGA fabric or as a module in silicon), then
a mixer is generally much more efficient with multipliers. If the
memory is available, then a LUT with a phase accumulator is hard to
beat for a numeric oscillator. The latency may also tilt the
tradeoff further away from the CORDIC.

They certainly have their place, but those places have gotten more
limited as silicon resources get cheaper.

Let's assume there are no multiplier blocks and no LUTs. Now how is the
CORDIC better than using multiplies? Just like the CORDIC, the
multiplies can be done iteratively using virtually the same logic.
Speed, in most cases, will be determined by the carry chain in the adder
so speed should be about the same.

The latency is sometimes an issue as well.

There are still some places where they make sense, though.

Care to explain?

So where are the places where the CORDIC makes sense?

--

Rick

Eric Jacobsen · Dec 26, 2014

On Thu, 25 Dec 2014 16:08:45 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 3:56 PM, Eric Jacobsen wrote:
On Thu, 25 Dec 2014 11:56:15 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:52 AM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 18:10:43 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 4:48 PM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 11:06:39 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 10:35 AM, Eric Jacobsen wrote:

Many applications don't need separate LUTs to get the required
performance, and even then, or even in the case of complex output, it
can be done without multipliers.

Care to elaborate on this? I'm not at all clear on how you make a LUT
based NCO without LUTs and unless you are using a very coarse
approximation, without multipliers.

Not sure what you're asking. You need a LUT, but just one in many
cases. Having a dual-ported single LUT is easy in an FPGA and
usually in silicon as well.

What makes a multiplier necessary? I've never found the need, but my
apps are limited to comm.

Maybe we aren't on the same page. The multiplier is there for the fine
adjustment. If you are happy with some -60 or -80 dB spurs one LUT is
fine. But if you want better performance the single LUT approach
requires *very* large tables.

There are a lot of tricks that can be used to keep the table size
down. I've mentioned one already.

And what was that? You have made some 20 or more posts in this thread,
I don't feel like weeding through all of them to find this. Reading
back through this thread it seems like your posts are intended to be
mysterious rather than informative. Every one leaves enough unsaid that
more questions are needed.

I can't divulge trade secrets or proprietary information that doesn't
belong to me. I can, however, hint in directions of benefit. Take
it or leave it.

I have no idea what you are talking about. If you don't have anything
to say, why are you bothering to post?

Why do you care whether I post or not? Feel free to put me in your
kill file if you don't like my posts.

I don't even recall the hints.
Or are you forbidden from pointing out what those are?

No, but it seems to me like unnecessary duplication. There are lots
of hints from multiple people scattered throughout the thread, as well
as in some of the literature mentioned previously.

You said you had already mentioned a way to reduce table size. What was
that?

One way is to store a quarter wave instead of a full cycle. I think
that was mentioned more than once, but here it is again just for you.

--

Rick

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

Eric Jacobsen · Dec 26, 2014

On Thu, 25 Dec 2014 16:13:15 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 4:01 PM, Eric Jacobsen wrote:
On Thu, 25 Dec 2014 12:02:57 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:55 AM, Eric Jacobsen wrote:
On Wed, 24 Dec 2014 01:24:42 -0700, Rob Doyle <radioengr@gmail.com>
wrote:

I agree that the CORDIC has the same complexity as a multiply. I agree
that table-based algorithms using multipliers use less FPGA fabric.

I was simply pointing out that there might be places where a CORDIC has
advantages over LUT-based NCOs.

Especially if you have ROM or multiplier limitations.

I also wanted to point out that if you need to do a 20-bit (using your
120dB example) complex downconversion for example, the CORDIC still
requires zero multipliers.

If you want to do a 20-bit complex downconversion using a table-based
NCO followed by a complex mixer, you might need a *lot* of multipliers.
If you only have an 18-bit multiplier, each multiplication requires
(maybe up to) 4 multiplier blocks and you need 8 multiplications.

I also /suspect/ that for any given device technology the CORDIC will
execute at higher speeds.

That's all...

That's been my experience; that if multipliers are scarce or too
expensive, or memory is scarce or too expensive, then a CORDIC is a
nice back-up option. These days multipliers and memory are both
plentiful in most platforms, so CORDICs just aren't as useful as they
used to be.

I think the distinction between a multiply and the CORDIC technique is
bogus. CORDIC is an iterative process including all the operations that
make up a multiply. The only difference is that in many cases there is
hardware available that facilitates execution of generic multiplies
while the CORDIC must be implemented in detail in every case.

In the past (some of it long ago) when we did tradeoffs on using a
CORDIC or an NCO, or a CORDIC or a complex mix implemented with
multipliers, it comes down to resource availability. If multipliers
are available (either in FPGA fabric or as a module in silicon), then
a mixer is generally much more efficient with multipliers. If the
memory is available, then a LUT with a phase accumulator is hard to
beat for a numeric oscillator. The latency may also tilt the
tradeoff further away from the CORDIC.

They certainly have their place, but those places have gotten more
limited as silicon resources get cheaper.

Let's assume there are no multiplier blocks and no LUTs. Now how is the
CORDIC better than using multiplies? Just like the CORDIC, the
multiplies can be done iteratively using virtually the same logic.
Speed, in most cases, will be determined by the carry chain in the adder
so speed should be about the same.

It could be there isn't much difference, which means if you have a
CORDIC laying around, there may not be a reason to not use it.

The latency is sometimes an issue as well.

There are still some places where they make sense, though.

Care to explain?

So where are the places where the CORDIC makes sense?

--

Rick

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

rickman · Dec 26, 2014

On 12/25/2014 4:32 PM, Eric Jacobsen wrote:

On Thu, 25 Dec 2014 16:08:45 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 3:56 PM, Eric Jacobsen wrote:
On Thu, 25 Dec 2014 11:56:15 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:52 AM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 18:10:43 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 4:48 PM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 11:06:39 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 10:35 AM, Eric Jacobsen wrote:

Many applications don't need separate LUTs to get the required
performance, and even then, or even in the case of complex output, it
can be done without multipliers.

Care to elaborate on this? I'm not at all clear on how you make a LUT
based NCO without LUTs and unless you are using a very coarse
approximation, without multipliers.

Not sure what you're asking. You need a LUT, but just one in many
cases. Having a dual-ported single LUT is easy in an FPGA and
usually in silicon as well.

What makes a multiplier necessary? I've never found the need, but my
apps are limited to comm.

Maybe we aren't on the same page. The multiplier is there for the fine
adjustment. If you are happy with some -60 or -80 dB spurs one LUT is
fine. But if you want better performance the single LUT approach
requires *very* large tables.

There are a lot of tricks that can be used to keep the table size
down. I've mentioned one already.

And what was that? You have made some 20 or more posts in this thread,
I don't feel like weeding through all of them to find this. Reading
back through this thread it seems like your posts are intended to be
mysterious rather than informative. Every one leaves enough unsaid that
more questions are needed.

I can't divulge trade secrets or proprietary information that doesn't
belong to me. I can, however, hint in directions of benefit. Take
it or leave it.

I have no idea what you are talking about. If you don't have anything
to say, why are you bothering to post?

Why do you care whether I post or not? Feel free to put me in your
kill file if you don't like my posts.

If you aren't interested in having a conversation, why do you bother to
type? Above you said you had already mentioned "one" method already.
Clearly that one is not a trade secret. Care to explain what method you
are referring to?

I don't even recall the hints.
Or are you forbidden from pointing out what those are?

No, but it seems to me like unnecessary duplication. There are lots
of hints from multiple people scattered throughout the thread, as well
as in some of the literature mentioned previously.

Exactly, scattered in some 50 or so messages. If you have something to
say, why not say it instead of being so vague? Just tell me which
message you are referring to.

You said you had already mentioned a way to reduce table size. What was
that?

One way is to store a quarter wave instead of a full cycle. I think
that was mentioned more than once, but here it is again just for you.

Thank you for the response.

Yes, that is table reduction 101. Anyone other than a newbie is aware
of that. I believe *I* was the one who in this thread pointed it out to
someone who said memory is cheap not fully appreciating that memory is
order 2^N in size. Even so it is just a factor of four and does nothing
to change the fact that memory is anything but cheap if you are looking
for high resolution and low distortion. Using MBs of memory to store a
LUT is usually not a good trade off.
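
The scaling argument above is easy to put numbers on. This is a back-of-the-envelope sketch, assuming a table addressed directly by the phase bits with no interpolation; the helper names and the word widths tried are hypothetical. Each added phase bit doubles the table, while the quarter-wave trick only ever removes two bits' worth.

```python
def lut_entries(phase_bits, quarter_wave=False):
    """Number of stored samples for a table addressed by phase_bits of phase."""
    entries = 1 << phase_bits            # a full cycle needs 2**phase_bits samples
    return entries >> 2 if quarter_wave else entries

def lut_kibibytes(phase_bits, word_bits, quarter_wave=False):
    """Storage in KiB for the table, assuming word_bits per sample."""
    return lut_entries(phase_bits, quarter_wave) * word_bits / 8 / 1024

if __name__ == "__main__":
    for phase_bits in (10, 14, 18, 22):
        full = lut_kibibytes(phase_bits, 20)
        quarter = lut_kibibytes(phase_bits, 20, quarter_wave=True)
        print(f"{phase_bits:2d} phase bits: full {full:10.1f} KiB, "
              f"quarter {quarter:10.1f} KiB")
```

Running the loop shows the quarter-wave column staying a constant factor of four below the full-cycle column while both grow exponentially with the number of phase bits.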

What was the technique *you* mentioned as you indicate above?

--

Rick

Eric Jacobsen · Dec 26, 2014

On Thu, 25 Dec 2014 18:08:54 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 4:32 PM, Eric Jacobsen wrote:
On Thu, 25 Dec 2014 16:08:45 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 3:56 PM, Eric Jacobsen wrote:
On Thu, 25 Dec 2014 11:56:15 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/25/2014 10:52 AM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 18:10:43 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 4:48 PM, Eric Jacobsen wrote:
On Tue, 23 Dec 2014 11:06:39 -0500, rickman <gnuarm@gmail.com> wrote:

On 12/23/2014 10:35 AM, Eric Jacobsen wrote:

Many applications don't need separate LUTs to get the required
performance, and even then, or even in the case of complex output, it
can be done without multipliers.

Care to elaborate on this? I'm not at all clear on how you make a LUT
based NCO without LUTs and unless you are using a very coarse
approximation, without multipliers.

Not sure what you're asking. You need a LUT, but just one in many
cases. Having a dual-ported single LUT is easy in an FPGA and
usually in silicon as well.

What makes a multiplier necessary? I've never found the need, but my
apps are limited to comm.

Maybe we aren't on the same page. The multiplier is there for the fine
adjustment. If you are happy with some -60 or -80 dB spurs one LUT is
fine. But if you want better performance the single LUT approach
requires *very* large tables.

There are a lot of tricks that can be used to keep the table size
down. I've mentioned one already.

And what was that? You have made some 20 or more posts in this thread,
I don't feel like weeding through all of them to find this. Reading
back through this thread it seems like your posts are intended to be
mysterious rather than informative. Every one leaves enough unsaid that
more questions are needed.

I can't divulge trade secrets or proprietary information that doesn't
belong to me. I can, however, hint in directions of benefit. Take
it or leave it.

I have no idea what you are talking about. If you don't have anything
to say, why are you bothering to post?

Why do you care whether I post or not? Feel free to put me in your
kill file if you don't like my posts.

If you aren't interested in having a conversation, why do you bother to
type? Above you said you had already mentioned "one" method already.
Clearly that one is not a trade secret. Care to explain what method you
are referring to?

I did later in the same post.

I don't even recall the hints.
Or are you forbidden from pointing out what those are?

No, but it seems to me like unnecessary duplication. There are lots
of hints from multiple people scattered throughout the thread, as well
as in some of the literature mentioned previously.

Exactly, scattered in some 50 or so messages. If you have something to
say, why not say it instead of being so vague? Just tell me which
message you are referring to.

So you want me to go back and search through the thread for you? Are
you not capable of doing that? I'm not at all clear why you think
that I should do the search if you're the one that wants the
information.

You said you had already mentioned a way to reduce table size. What was
that?

One way is to store a quarter wave instead of a full cycle. I think
that was mentioned more than once, but here it is again just for you.

Thank you for the response.

Yes, that is table reduction 101. Anyone other than a newbie is aware
of that. I believe *I* was the one who in this thread pointed it out to
someone who said memory is cheap not fully appreciating that memory is
order 2^N in size. Even so it is just a factor of four and does nothing
to change the fact that memory is anything but cheap if you are looking
for high resolution and low distortion. Using MBs of memory to store a
LUT is usually not a good trade off.

What was the technique *you* mentioned as you indicate above?

That was one of them. I mentioned it more than once. In my
experience a 4x reduction in memory can be significant, and the
quarter-wave trick isn't obvious to some people so I didn't make the
assumption that it was.

You're welcome.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
