
Q on "average"


elektroda.net NewsGroups Forum Index - Electronics Design - Q on "average"


Robert Baer
Guest

Thu Jan 19, 2012 2:42 am   



Martin Brown wrote:
Quote:
Don Y wrote:
Hi Fred,

On 1/17/2012 1:10 PM, Fred Bloggs wrote:
On Jan 17, 12:36 pm, Tim Wescott<t...@seemywebsite.com> wrote:
On Tue, 17 Jan 2012 01:57:54 -0800, Robert Baer
<robertb...@localnet.com> wrote:

Keywords: average, mean, RMS.
Take N samples of a "noisy" level.
Typical scheme seems to be sum(readings)/N. Now, let N be odd, then
sort the data and pick the middle value. A minimum seems to be 7, but
I prefer 11 as a minimum sample count.

A few outliers have little effect on the result - no matter how
outrageous they are; cannot say the same for any of the other
methods.

Is there any test or paper that supports this oddball scheme?
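A minimal Python sketch of the median-of-N scheme described above, assuming an odd sample count and using hypothetical readings (a level near 5.0 with two wild outliers). The function name is illustrative; the result is contrasted with the plain sum(readings)/N average:

```python
import statistics

def median_of_odd(samples):
    """Sort an odd number of samples and return the middle one."""
    if len(samples) % 2 == 0:
        raise ValueError("this scheme assumes an odd sample count")
    ordered = sorted(samples)
    return ordered[len(ordered) // 2]

# Hypothetical readings: a level near 5.0 plus two outrageous outliers.
readings = [5.1, 4.9, 5.0, 5.2, 1000.0, 4.8, 5.0, 5.1, -900.0, 4.9, 5.0]

print("mean:  ", statistics.mean(readings))   # dragged away by the outliers
print("median:", median_of_odd(readings))     # stays at the typical level
```

With these numbers the median stays at 5.0 while the two outliers pull the mean to roughly 13.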

Take N samples of a signal whose noise is exactly equal to 99 for 1% of
the time, and -1 for 99% of the time.

Do your median scheme.

Do you see a problem?
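A quick check of this counterexample, assuming a sample that contains exactly the stated proportions (one +99 reading per hundred). The noise is zero-mean (0.01 * 99 + 0.99 * (-1) = 0), so the average recovers the underlying level, while the median reports the common value -1:

```python
import statistics

# Noise model from the post: +99 with probability 0.01, -1 with probability 0.99.
# A sample of 100 readings with exactly those proportions:
noise = [99] + [-1] * 99

print("mean:  ", statistics.mean(noise))    # zero-mean noise averages out
print("median:", statistics.median(noise))  # biased to the common value
```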

The problem is that example has no median.

No, for N >= 1 there is always an average, median, mode, etc.
(though I admit the mode could be ambiguous in some cases!)

The problem is that those may not be good measures of central tendency
for a given population/sample set.

You have to think about what you want the measure to *tell* you.
And, how it can be biased in a population.

One of the relevant ones to electronics is MTBF for a filament lamp.
(and other units subject to infant mortality and limited service life)

Hardly any filament lamps actually fail at their nominal MTBF. A cohort
of a few percent die very quickly soon after being fitted and the rest
tend to live to a ripe old age. If you swap lamps out too soon to avoid
having any in-service failures, you actually increase the risk of
downtime due to the replacement dying suddenly during its burn-in phase.

The average in this case significantly underestimates the working life
of lamps that survive the first few days - the median is usually a far
better estimator where distributions are skewed or subject to outliers.

If you have the set of measured values it is as well to plot the
histogram of their distribution to avoid surprises. Crunching it down to
a single number can easily lead you astray.
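To illustrate the lamp example and the advice to look at the histogram first, here is a sketch with hypothetical lifetimes (three early failures and seventeen long-lived lamps) and a crude text histogram in hypothetical 200-hour bins:

```python
import statistics

# Hypothetical lamp lifetimes in hours: a small infant-mortality cohort
# followed by lamps that all run to a long life.
lifetimes = [2, 3, 5] + [1000] * 17

print("mean:  ", statistics.mean(lifetimes))
print("median:", statistics.median(lifetimes))

# Crude text histogram: count lifetimes per 200-hour bin.
counts = {}
for t in lifetimes:
    bin_start = (t // 200) * 200
    counts[bin_start] = counts.get(bin_start, 0) + 1
for bin_start in sorted(counts):
    print(f"{bin_start:5d}-{bin_start + 199:5d} h | {'#' * counts[bin_start]}")
```

Here the mean (850.5 h) falls below the life of every lamp that survives burn-in, while the median (1000 h) matches it, and the histogram makes the two populations obvious at a glance.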

E.g., the median home value and average home value in any
particular neighborhood *tend* to be pretty close. (homes
tend to have similar values within a neighborhood... odd to
find 7,000 sq ft palaces alongside 600 sq ft "shacks")

The average salary of people riding a *bus* may closely
correlate with the median of that same population -- on
*some* busses! On others, it may be wildly different!

Median and L1-Norm model fitting is generally preferable when the data
are extremely noisy and subject to serious interference and outliers. It
wasn't popular in the past because apart from a handful of special cases
it is much harded to compute. But these days with almost unlimited
computing power on your desktop it is not a real problem.

The L2-Norm (aka least squares fit) tends to be dominated by fitting the
largest values whether or not they are accurate. Pulse counting devices
have systematic errors from dead time at high count rates and so can
warp the calibration of otherwise highly linear instruments.
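One common idealization of the dead-time effect mentioned above is the non-paralyzable detector model, in which each recorded count blinds the counter for a fixed time tau, so the measured rate is m = n / (1 + n*tau) for a true rate n. The sketch below assumes that model and a hypothetical 1 microsecond dead time; it shows the fractional loss growing with count rate, plus the usual inversion n = m / (1 - m*tau):

```python
def measured_rate(true_rate, dead_time):
    """Non-paralyzable model: each count blinds the counter for dead_time seconds."""
    return true_rate / (1.0 + true_rate * dead_time)

def corrected_rate(measured, dead_time):
    """Invert the non-paralyzable model (valid while measured * dead_time < 1)."""
    return measured / (1.0 - measured * dead_time)

tau = 1e-6  # hypothetical dead time: 1 microsecond
for n in (1e3, 1e4, 1e5, 1e6):
    m = measured_rate(n, tau)
    print(f"true {n:9.0f}/s  measured {m:9.1f}/s  lost {100 * (1 - m / n):5.2f}%")
```

Because the loss is a nonlinear function of rate, an uncorrected least-squares calibration gets pulled hardest by the high-rate points, which is the warping described above.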

Regards,
Martin Brown
"Harder to compute"????

What is so hard to sort a pile of numbers?
If the quantity is small, a stupid bubble sort is fast and simple, and
adding one interchange flag improves it a lot.
And the merge sort is simple, and almost the fastest of them all in
almost all cases; again, adding one interchange flag improves it as well.

Then again, you have never written a program to multiply two multi-million-digit
numbers together while managing memory allocation to prevent
memory leaks in OS/2 (which _loved_ to create them if you did not
actively and mercilessly prevent their possibility).
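For reference, a sketch of bubble sort with the interchange flag described above, instrumented with an illustrative comparison counter. The flag lets an already-ordered list finish after a single pass of N-1 comparisons, but a reversed list still needs every pass, N(N-1)/2 comparisons in total, so the worst case remains O(N^2):

```python
def bubble_sort_with_flag(values):
    """Bubble sort that stops as soon as a pass makes no interchanges.

    Returns (sorted_list, number_of_comparisons)."""
    items = list(values)
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        swapped = False  # the "interchange flag"
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # no interchanges in this pass: the list is in order
    return items, comparisons

print(bubble_sort_with_flag(range(11)))          # already in order
print(bubble_sort_with_flag(range(10, -1, -1)))  # reversed order
```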

Robert Baer
Guest

Thu Jan 19, 2012 2:44 am   



Fred Bloggs wrote:
Quote:
On Jan 18, 3:56 am, Martin Brown <|||newspam...@nezumi.demon.co.uk>
wrote:
One of the relevant ones to electronics is MTBF for a filament lamp.
(and other units subject to infant mortality and limited service life)

Hardly any filament lamps actually fail at their nominal MTBF. A cohort
of a few percent die very quickly soon after being fitted and the rest
tend to live to a ripe old age. If you swap lamps out too soon to avoid
having any in-service failures, you actually increase the risk of
downtime due to the replacement dying suddenly during its burn-in phase.

The average in this case significantly underestimates the working life
of lamps that survive the first few days - the median is usually a far
better estimator where distributions are skewed or subject to outliers.

Not sure if that is really a good example because, in the case of
reliability statistics, what you have is a time-dependent failure rate
parameter throughout the burn-in phase. In other words, the
distribution itself is changing. Once past the burn-in phase, the
distribution steady-states to a constant failure rate exponential
distribution, in the simplest case, until the wear-out phase wherein
the failure rate becomes time-dependent again, i.e. the distribution
changes once more. The exponential might look skewed, but in the case
of the constant failure rate exponential, the MTBF does coincide with
the 50% point of the cumulative and is therefore also the median. So
RB's technique of using a sample median as an approximation of
population mean is valid and will converge.
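As a numerical check on the constant-failure-rate case, the sketch below assumes the standard exponential model with a hypothetical MTBF of 1000 hours. Under that model the cumulative failure fraction at t = MTBF is 1 - e^-1, about 0.632, and the 50% point of the cumulative (the median) is MTBF * ln 2, about 0.69 * MTBF:

```python
import math

mtbf = 1000.0      # hypothetical MTBF in hours
rate = 1.0 / mtbf  # constant failure rate of the exponential model

def exponential_cdf(t):
    """Fraction of units failed by time t under a constant failure rate."""
    return 1.0 - math.exp(-rate * t)

median_life = math.log(2) / rate  # time at which the CDF reaches 0.5

print(f"CDF at the MTBF: {exponential_cdf(mtbf):.4f}")
print(f"median life: {median_life:.1f} h (= MTBF * ln 2)")
```

So for a pure exponential, the sample median converges to ln 2 times the MTBF rather than to the MTBF itself.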


Sounds like you are talking about the Weibull failure rate curve, aka the
"bathtub" curve.

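A note on the Weibull connection: a single Weibull hazard h(t) = (k/lambda)(t/lambda)^(k-1) is monotone (falling for shape k < 1, constant for k = 1, rising for k > 1), so one simple way to sketch a bathtub curve is to add three such terms. The shapes and scales below are hypothetical, chosen only to make the three phases visible:

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (instantaneous failure rate): (k/lam) * (t/lam)**(k - 1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    """Hypothetical bathtub curve built as the sum of three Weibull hazards."""
    return (weibull_hazard(t, 0.5, 5000.0)      # shape < 1: falling (infant mortality)
            + weibull_hazard(t, 1.0, 20000.0)   # shape = 1: constant (useful life)
            + weibull_hazard(t, 5.0, 8000.0))   # shape > 1: rising (wear-out)

for hours in (10, 100, 1000, 3000, 6000, 9000):
    print(f"t = {hours:5d} h   h(t) = {bathtub_hazard(hours):.2e} per hour")
```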
Don Y
Guest

Thu Jan 19, 2012 4:09 am   



Hi Robert,

On 1/18/2012 6:42 PM, Robert Baer wrote:

Quote:
"Harder to compute"????

While you copied the entire previous post, I was unable to see
a reference to "harder to compute" anywhere therein.

Be that as it may...

Quote:
What is so hard to sort a pile of numbers?
If the quantity is small, a stupid bubble sort is fast and simple, and
adding one interchange flag improves it a lot.
And the merge-sort is simple, and almost the fastest of them all in
almost all cases; again, adding one interchange flag improves it as well.

It is more expensive to find the median of an arbitrary list
of values than it is to find the mean. To find the median,
you need to track more state than to find the mean.

E.g., with just two (suitably wide) "registers", you can compute
the mean of an arbitrarily long stream of values -- without ever
needing to RE-examine a value once it has been "tabulated".

In terms of computational power, the mean requires one addition per
datum. By comparison, any sort performs "at least one" comparison
on each datum (compare ~= subtraction ~= addition). Plus, of course,
the added "state".

*Look* at your algorithm and count operators/registers. You'll
find that it's simple, but the arithmetic mean is "simpler". Think of
how you would process a list of 2^32+1 individual values
(or 2^64+1, etc.).
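The two-register approach above can be sketched as a small Python class (the class name is illustrative). Each value is folded into a running total and a count and never revisited, so memory stays constant no matter how long the stream is:

```python
class RunningMean:
    """Mean of a stream using two "registers": a running total and a count."""

    def __init__(self):
        self.total = 0  # Python ints never overflow; in hardware, size this suitably wide
        self.count = 0

    def add(self, value):
        """Tabulate one value; it never needs to be examined again."""
        self.total += value
        self.count += 1

    def mean(self):
        return self.total / self.count

rm = RunningMean()
for v in range(1, 101):
    rm.add(v)
print(rm.mean())
```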

Martin Brown
Guest

Thu Jan 19, 2012 9:56 am   



Robert Baer wrote:
Quote:
Martin Brown wrote:

Median and L1-Norm model fitting is generally preferable when the data
are extremely noisy and subject to serious interference and outliers.
It wasn't popular in the past because apart from a handful of special
cases it is much harded to compute. But these days with almost
unlimited computing power on your desktop it is not a real problem.

The L2-Norm (aka least squares fit) tends to be dominated by fitting
the largest values whether or not they are accurate. Pulse counting
devices have systematic errors from dead time at high count rates and
so can warp the calibration of otherwise highly linear instruments.


"Harder to compute"????
What is so hard to sort a pile of numbers?

That is one of the handful of special cases. But it is still O(NlogN) vs
O(N). If you only want the median value you can do it in O(Nlog(log(N)))
or O(N^(2/3)logN) - see Knuth Sorting & Searching for details.
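If only the median is wanted, a full sort is not required. The sketch below uses Hoare's quickselect with random pivots, a technique not named in the post: it runs in expected O(N) time (worst case O(N^2)), and the median-of-medians pivot rule brings the worst case down to O(N). The helper names are illustrative:

```python
import random

def quickselect(values, k):
    """Return the k-th smallest value (k = 0 is the minimum) without a full sort.

    Random pivots give expected O(N) time; the worst case is still O(N^2)."""
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)
        less = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(less):
            items = less  # the answer lies among the smaller values
        elif k < len(less) + equal_count:
            return pivot  # the answer equals the pivot
        else:
            k -= len(less) + equal_count  # the answer lies among the larger values
            items = [x for x in items if x > pivot]

def median_by_selection(values):
    """Middle value of an odd-length list, found by selection instead of sorting."""
    return quickselect(values, len(values) // 2)

data = [5, 3, 9, 1, 7, 3, 8, 2, 6]
print(median_by_selection(data))
```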

Another is for a robust L1-Norm linear fit of data where a closed form
solution exists. But anything more than that and you are into much more
complex optimisation codes to solve a non-linear fitting problem

minimise sum_i ABS( data[i] - model[i])

One way it is done in practice is to solve instead

minimise sum_i lim e->0 sqrt( (data[i]-model[i])^2 + e)

For various decreasing values of e and extrapolate to e = 0
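A minimal sketch of the smoothed L1 fit described above, for the simplest model y = a + b*x. It assumes iteratively reweighted least squares as the inner solver (a standard choice, not specified in the post): minimising sum_i sqrt(r_i^2 + e) is approached by repeated weighted least-squares fits with weights 1/sqrt(r_i^2 + e). Rather than extrapolating to e = 0, this sketch simply stops at a small e; the data are hypothetical, a perfect line with one wild outlier:

```python
import math

def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx  # non-zero as long as the x values are not all equal
    b = (sw * sxy - sx * sy) / det
    a = (sy - b * sx) / sw
    return a, b

def smoothed_l1_line_fit(xs, ys, eps_schedule=(1.0, 1e-2, 1e-4, 1e-6, 1e-8), iterations=50):
    """Approximately minimise sum_i sqrt((y_i - a - b*x_i)^2 + e) for shrinking e."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))  # start from ordinary least squares
    for eps in eps_schedule:
        for _ in range(iterations):
            # IRLS step: weight each point by 1/sqrt(r^2 + e), then refit.
            ws = [1.0 / math.sqrt((y - (a + b * x)) ** 2 + eps) for x, y in zip(xs, ys)]
            a, b = weighted_line_fit(xs, ys, ws)
    return a, b

# Hypothetical data: the line y = 1 + 2x with one wild outlier at x = 5.
xs = list(range(11))
ys = [1.0 + 2.0 * x for x in xs]
ys[5] = 100.0

print("least squares:", weighted_line_fit(xs, ys, [1.0] * len(xs)))
print("smoothed L1:  ", smoothed_l1_line_fit(xs, ys))
```

Ordinary least squares shifts the intercept from 1 to about 9.1 to accommodate the single outlier; the smoothed L1 fit stays on the line through the other ten points.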

Quote:
If the quantity is small, a stupid bubble sort is fast and simple, and
adding one interchange flag improves it a lot.
And the merge-sort is simple, and almost the fastest of them all in
almost all cases; again, adding one interchange flag improves it as well.

You really are out of the stone age, aren't you? Compared to the mean,
which is an O(N) process, a moronic bubble sort is O(N^2) and no-one with
any sort of clue would ever recommend using it.

Heapsort and quicksort manage O(NlogN) which is respectable but still
much slower than a single fast pass over the data for the mean and SD.
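The single fast pass for the mean and SD can be done with Welford's running update (named here as a swapped-in technique; the post does not say which method it has in mind), which avoids the cancellation problems of accumulating sum(x) and sum(x^2) separately:

```python
import math

def mean_and_sd(stream):
    """Single pass (Welford's method): return (mean, sample standard deviation)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses both the old and the updated mean
    if count < 2:
        raise ValueError("need at least two values for a sample SD")
    return mean, math.sqrt(m2 / (count - 1))

print(mean_and_sd([2, 4, 4, 4, 5, 5, 7, 9]))
```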

Regards,
Martin Brown

Robert Baer
Guest

Sat Jan 21, 2012 8:57 am   



Martin Brown wrote:
Quote:
You really are out of the stone age, aren't you? Compared to the mean,
which is an O(N) process, a moronic bubble sort is O(N^2) and no-one with
any sort of clue would ever recommend using it.

I know about bubble sort... remember I indicated a *small* group of
numbers, and I said an interchange flag improved it (no longer n^2).
Merge sort is, I think, another name for quicksort.
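For comparison, a sketch of a top-down merge sort: it splits the list in half, sorts each half recursively, and merges the two sorted halves. Quicksort instead partitions around a pivot element before recursing, so the two are distinct algorithms. Merge sort is O(N log N) even in the worst case, at the cost of extra memory for merging:

```python
def merge_sort(values):
    """Top-down merge sort: split in half, sort each half, merge the results."""
    items = list(values)
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal keys in original order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))
```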

josephkk
Guest

Mon Jan 23, 2012 6:26 am   



On Fri, 20 Jan 2012 23:57:39 -0800, Robert Baer <robertbaer_at_localnet.com>
wrote:

Quote:
I know about bubble sort... remember I indicated a *small* group of
numbers, and I said an interchange flag improved it (no longer n^2).
Merge sort is, I think, another name for quicksort.

No. It is not. Quicksort is the flag-enhanced bubble sort. Faster than
O(n^2) but not as fast as any of several O(n * log(n)) sorts. There are
about 4 of them. Merge sort and shaker sort are two of them.

Got an A+ on that project in school; all of the canonical sorts are in
CALGO, published by the ACM. There are about 7 of them. That is how many I
used in the assignment.

?-)
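For reference, a sketch of shaker (cocktail) sort, which is bubble sort with passes alternating in direction plus the same interchange flag. Instrumented with an illustrative comparison counter, it shows that a reversed list still costs N(N-1)/2 comparisons, so the worst case stays O(N^2):

```python
def shaker_sort(values):
    """Cocktail shaker sort: bubble passes alternating left-to-right and right-to-left.

    Returns (sorted_list, number_of_comparisons)."""
    items = list(values)
    lo, hi = 0, len(items) - 1
    comparisons = 0
    swapped = True
    while swapped and lo < hi:
        swapped = False
        for i in range(lo, hi):  # forward pass carries the largest value to hi
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        hi -= 1
        if not swapped:
            break  # no interchanges: already in order
        swapped = False
        for i in range(hi, lo, -1):  # backward pass carries the smallest value to lo
            comparisons += 1
            if items[i - 1] > items[i]:
                items[i - 1], items[i] = items[i], items[i - 1]
                swapped = True
        lo += 1
    return items, comparisons

print(shaker_sort(range(11)))          # already in order
print(shaker_sort(range(10, -1, -1)))  # reversed order
```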

Phil Hobbs
Guest

Mon Jan 23, 2012 7:45 am   



josephkk wrote:

You're perhaps confusing Shell's method with quicksort. Shell's method
is O(n**4/3) or something like that, and is often the fastest algorithm
on moderate-sized arrays. Quicksort and heapsort are the two classical
n*log(n) methods--on average. Vanilla quicksort is actually an n**2
method if you try sorting an already-sorted list!
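The already-sorted worst case is easy to demonstrate. The sketch below assumes "vanilla" means the first element is always taken as the pivot (an assumption; implementations differ) and counts one comparison per element partitioned. On sorted input every partition is maximally lopsided, giving N(N-1)/2 comparisons:

```python
import random

def quicksort_first_pivot(values, counter):
    """Textbook quicksort that always takes the first element as the pivot.

    counter is a one-element list that accumulates comparisons with the pivot."""
    if len(values) <= 1:
        return list(values)
    pivot, rest = values[0], values[1:]
    smaller, larger = [], []
    for x in rest:
        counter[0] += 1  # one comparison against the pivot
        if x < pivot:
            smaller.append(x)
        else:
            larger.append(x)
    return (quicksort_first_pivot(smaller, counter) + [pivot]
            + quicksort_first_pivot(larger, counter))

sorted_input = list(range(50))
shuffled_input = list(range(50))
random.Random(0).shuffle(shuffled_input)  # fixed seed so the run is repeatable

for name, data in (("sorted", sorted_input), ("shuffled", shuffled_input)):
    counter = [0]
    quicksort_first_pivot(data, counter)
    print(f"{name:8s} input: {counter[0]} comparisons")
```

Random pivots or median-of-three pivot selection are the usual remedies.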

Cheers

Phil Hobbs
--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net

josephkk
Guest

Tue Jan 24, 2012 5:17 am   



On Mon, 23 Jan 2012 01:45:33 -0500, Phil Hobbs
<pcdhSpamMeSenseless_at_electrooptical.net> wrote:


I just saw my CALGO book yesterday; I can look all of them up again. That
will be much faster than trying to find my school program again.

?-)
