EDAboard.com | EDAboard.de | EDAboard.co.uk | WTWH Media

Median filter in Verilog

Ask a question - edaboard.com

elektroda.net NewsGroups Forum Index - Verilog Language - Median filter in Verilog

Goto page 1, 2  Next

john
Guest

Tue Jun 24, 2008 6:37 pm   



Hi all,
I am trying to design a median filter in Verilog. There are plenty of
papers on median filter designs for image/audio applications. However,
I cannot find any starting point for a median filter which needs to
sort 100 numbers (14bit wide each).

Any suggestions? Thanks.....

Mike Treseler
Guest

Tue Jun 24, 2008 6:38 pm   



john wrote:

Quote:
I am trying to design a median filter in Verilog. There are plenty of
papers on median filter designs for image/audio applications.

Any suggestions? Thanks.....

I like to prototype that sort of thing in python
before writing rtl. If python suits you
there is a python package called MyHDL that
can generate verilog code. Related example:
http://myhdl.jandecaluwe.com/doku.php/cookbook:bitonic


-- Mike Treseler

glen herrmannsfeldt
Guest

Tue Jun 24, 2008 8:05 pm   



john wrote:

Quote:
I am trying to design a median filter in Verilog. There are plenty of
papers on median filter designs for image/audio applications. However,
I cannot find any starting point for a median filter which needs to
sort 100 numbers (14bit wide each).

My favorite architecture for such processors is the
systolic array, which may or may not be useful in this case.

First thought is that, given an already sorted array you
remove one point and add the new one in the appropriate position.
(Note that http://en.wikipedia.org/wiki/Median_filter says
you should have an odd number, and 100 isn't odd.)

The sort/insert should be done in a fixed number of clock
cycles, like one or two. That would seem to require N
comparators, and N bit priority encoder, and N multiplexers
to shift and insert as appropriate. That sounds like a lot
of logic, but it shouldn't be hard in a reasonable sized
ASIC or FPGA to fit it in. It requires signals going
long distances across the array, which might slow it
down too much.

My first thought on seeing Median was an algorithm
in Knuth, "The Art of Computer Programming," vol. 3,
on an O(n) algorithm for median that doesn't sort
the whole array.

A systolic array solution isn't immediately obvious,
but that might work better for high speed processing.
It might be that the algorithm in Knuth could be
implemented in some type of parallel array not
requiring long distance communication.

-- glen

Jonathan Bromley
Guest

Thu Jun 26, 2008 12:13 pm   



On Tue, 24 Jun 2008 11:05:34 -0800, glen herrmannsfeldt wrote:

Quote:
john wrote:

I am trying to design a median filter in Verilog. There are plenty of
papers on median filter designs for image/audio applications. However,
I cannot find any starting point for a median filter which needs to
sort 100 numbers (14bit wide each).

A systolic array solution isn't immediately obvious

I believe there is a quasi-systolic solution, but it isn't simple.

Consider an array of cells, each of which holds a 14-bit data value
and a 7-bit link value. The cells are maintained in order sorted
on the data value, and therefore the median is trivially the
data value of the middle cell. Each cell's link value contains
the cell number of the next cell in arrival-time order, forming
a linked list. Two additional 7-bit registers hold the cell
numbers of the oldest and newest points (head and tail pointers
of the linked list).

On arrival of a new data point, you must...
- locate the oldest cell, pointed-to by the head pointer;
this cell's value will be lost
- locate the position in the sorted array that will be
occupied by the new data point - this is simple enough,
requiring a 14-bit magnitude comparator at each cell
- for every cell, shift its contents
right, left or remain in place, depending on where it
is relative to the new datum's insertion point, the
oldest cell and the newest cell (tail pointer)
- for every cell, decrement or increment its link value
depending on where the link points to relative to those
three critical positions
- finally, update the head and tail pointers to reflect
the new arrangements.

The majority of the logic and routing in this scheme is
highly localised to individual cells and their immediate
neighbours. However, some values must be distributed to
every cell, and some information (the location at which
the new data point is to be inserted) must be inferred
from a thermometer code that's spread over the whole
array (a single-bit output from each cell).
So it's not truly systolic, but I think it's workable
for ~100 points. I guess it would be very much easier
to do if you had three or four clock cycles to handle
each new data point.

The resource cost appears to be, for the OP's problem:
- one data register per point (14 bits)
- one link register per point (7 bits), with an
incrementer/decrementer on it
- one data value comparator per point (14-bit magnitude comp)
- two link value comparators per point (7-bit magnitude)
- a very small amount of local control logic per point
- 7-bit head and tail pointer registers
- a global 100-to-7 thermometer coder to compute the
location at which the new value will be inserted
- buffers to distribute the input value, head and tail
pointers, and new value location to all cells

I'll try to work through this, because it's fun; but I
don't have too much free time right now, so no promises.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley_at_MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

john
Guest

Thu Jun 26, 2008 2:41 pm   



On Jun 24, 7:38 pm, Mike Treseler <mike_trese...@comcast.net> wrote:
Quote:
john wrote:
I am trying to design a median filter in Verilog. There are plenty of
papers on median filter designs for image/audio applications.
Any suggestions? Thanks.....

I like to prototype that sort of thing in python
before writing rtl. If python suits you
there is a python package called MyHDL that
can generate verilog code. Related example:http://myhdl.jandecaluwe.com/doku.php/cookbook:bitonic

-- Mike Treseler

I tested the verilog code in the bitonic example and works fine. I
would like to do some mods on the MyHDL implementation
and simulate the results with my verilog simulator. Is there an easy
way to use this Python package? I am not familiar with Python at all.
Thank you Verymuch. Much appreciated.
John

Jonathan Bromley
Guest

Thu Jun 26, 2008 3:25 pm   



On Thu, 26 Jun 2008 06:54:36 -0700 (PDT), john wrote:

Quote:
I do not get why the arrival-time order information is
important here and where it is used in the scheme
you are proposing.

The idea I have (and it's only an idea at present) is to
keep the most recent 100 data points as a sorted array.
In this way, when each new point arrives it's only necessary
to find where to put it in the sorted array, because the
other 99 points are already sorted; it seems like a crazy
waste of resources to do all the sorting all over again.
Inserting one new item in an already-sorted array is
quite easy to do in hardware, if the array is not too big.

However, when each new point arrives you also need
to throw away the oldest (100th) point. But the points
have been sorted into the array - so you don't know which
entry in the array holds the oldest point. That was the
problem I was trying to solve with the linked-list idea,
but I now think I perhaps have a simpler way to do it...
when you get to my age the wheels go round quite a bit
slower than they used to, so you'll have to wait awhile
for the result of that "I now think" !!!

Quote:
I have 10 clock cycles to sort each new data point.

That should be plenty, if what I'm planning can be
made to work.

Your current idea seems to be keeping the most recent
100 data points in a bunch of 100 registers, and feeding
them all into a sorting network. This is a very
extravagant way of doing it. The classic bitonic sort
needs 24 compare/swap elements to sort 8 data points;
using Batcher's ingenious modification of the sort,
you can get that down to 19 compare/swaps. Batcher's
algorithm requires - wait for it - 1077 compare/swap
elements to sort 100 points. That's BIG. That's why
I've been trying to think of more compact methods Smile
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley_at_MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Jonathan Bromley
Guest

Thu Jun 26, 2008 4:52 pm   



On Thu, 26 Jun 2008 08:34:20 -0700 (PDT), john wrote:

Quote:
It seems like I did not explain my problem clearly since the very
beginning. I need to sort an array of 100 points. I am receiving each
data point every 10 clock cycles. When I have 101 points in the array,
I output the median value and clear the array to start all over again.
This will probably be simpler than what you were thinking - no arrival
time order info required.

Whoa! That's not a median filter, in any sense that I'm
familiar with. However, if that's what you want to do then it's
MUCH easier. Unless your clock is rather fast, you can probably
do one sample per clock quite easily.

Build an array of 101 registers. Each register has a comparator
that compares the current register contents with the incoming
data bus.

Initialise all these registers to the lowest possible value (zero?).

When each new value arrives, each register's comparator compares
the new value with the register's contents and yields a "greater
than" result - call that GT[i] for element [i], where [i] is in
the range [0:100]. If the incoming number is bigger than the
register, then GT[i] is true.

Each register now behaves in the following way:

if (GT[i] && GT[i+1])
register[i] <= register[i+1];
else if (GT[i] && !GT[i+1])
register[i] <= incoming_value;
// else (!GT[i]), hold the value of register[i]

The result of this is that the register array is always
in sorted order. After you've received the 101st incoming
value, you can pick off the median from register[50]
before clearing all registers back to zero again.

The only tiny wrinkle is what happens at the highest-numbered
element; that's actually quite easy - you just create an extra
signal GT[101] that is stuck at false.

I've used this insertion-sort mechanism many times. It's
nice and scalable, except that you need to distribute the
incoming value to all 101 registers and comparators. You
can probably trust your synthesis tool to make a reasonable
job of that, but it will certainly create some buffering.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley_at_MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

john
Guest

Thu Jun 26, 2008 4:54 pm   



On Jun 26, 1:13 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
Quote:
On Tue, 24 Jun 2008 11:05:34 -0800, glen herrmannsfeldt wrote:
john wrote:

I am trying to design a median filter in Verilog. There are plenty of
papers on median filter designs for image/audio applications. However,
I cannot find any starting point for a median filter which needs to
sort 100 numbers (14bit wide each).
A systolic array solution isn't immediately obvious

I believe there is a quasi-systolic solution, but it isn't simple.

Consider an array of cells, each of which holds a 14-bit data value
and a 7-bit link value. The cells are maintained in order sorted
on the data value, and therefore the median is trivially the
data value of the middle cell. Each cell's link value contains
the cell number of the next cell in arrival-time order, forming
a linked list. Two additional 7-bit registers hold the cell
numbers of the oldest and newest points (head and tail pointers
of the linked list).

On arrival of a new data point, you must...
- locate the oldest cell, pointed-to by the head pointer;
this cell's value will be lost
- locate the position in the sorted array that will be
occupied by the new data point - this is simple enough,
requiring a 14-bit magnitude comparator at each cell
- for every cell, shift its contents
right, left or remain in place, depending on where it
is relative to the new datum's insertion point, the
oldest cell and the newest cell (tail pointer)
- for every cell, decrement or increment its link value
depending on where the link points to relative to those
three critical positions
- finally, update the head and tail pointers to reflect
the new arrangements.

The majority of the logic and routing in this scheme is
highly localised to individual cells and their immediate
neighbours. However, some values must be distributed to
every cell, and some information (the location at which
the new data point is to be inserted) must be inferred
from a thermometer code that's spread over the whole
array (a single-bit output from each cell).
So it's not truly systolic, but I think it's workable
for ~100 points. I guess it would be very much easier
to do if you had three or four clock cycles to handle
each new data point.

The resource cost appears to be, for the OP's problem:
- one data register per point (14 bits)
- one link register per point (7 bits), with an
incrementer/decrementer on it
- one data value comparator per point (14-bit magnitude comp)
- two link value comparators per point (7-bit magnitude)
- a very small amount of local control logic per point
- 7-bit head and tail pointer registers
- a global 100-to-7 thermometer coder to compute the
location at which the new value will be inserted
- buffers to distribute the input value, head and tail
pointers, and new value location to all cells

I'll try to work through this, because it's fun; but I
don't have too much free time right now, so no promises.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.brom...@MYCOMPANY.comhttp://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Thanks for your explanation Jonathan. I do not get why the arrival-
time
order information is important here and where it is used in the scheme
you are proposing. I have 10 clock cycles to sort each new data point.

Jonathan Bromley
Guest

Thu Jun 26, 2008 4:56 pm   



On Thu, 26 Jun 2008 16:52:33 +0100, Jonathan Bromley wrote:

Quote:
Build an array of 101 registers.
[...]
Each register now behaves in the following way:
[...]


Afterthought, for algorithm wonks: This is an O(N^2)
sort with the interesting property that it is O(N) in
size and O(N) in time - to sort an array of N elements
you need N comparators and registers, and it takes
N clock cycles to do the sort.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley_at_MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Mike Treseler
Guest

Thu Jun 26, 2008 5:59 pm   



john wrote:

Quote:
I tested the verilog code in the bitonic example and works fine. I
would like to do some mods on the MyHDL implementation
and simulate the results with my verilog simulator.

Check out Jan's manual:
http://www.jandecaluwe.com/Tools/MyHDL/manual/MyHDL.html
Logical simulation is built in.

Quote:
Is there an easy way to use this Python package?

Not without reading the docs and working an example.

I am not familiar with Python at all.

It's worth learning.

56 Thu Jun 26 /evtfs/home/tres> python
Type "help", "copyright", "credits" or "license" for more information.

Quote:
2**32
4294967296L
hex(4294967296)
'0x100000000L'



-- Mike Treseler

john
Guest

Thu Jun 26, 2008 6:34 pm   



On Jun 26, 4:25 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
Quote:
On Thu, 26 Jun 2008 06:54:36 -0700 (PDT), john wrote:
I do not get why the arrival-time order information is
important here and where it is used in the scheme
you are proposing.

The idea I have (and it's only an idea at present) is to
keep the most recent 100 data points as a sorted array.
In this way, when each new point arrives it's only necessary
to find where to put it in the sorted array, because the
other 99 points are already sorted; it seems like a crazy
waste of resources to do all the sorting all over again.
Inserting one new item in an already-sorted array is
quite easy to do in hardware, if the array is not too big.

However, when each new point arrives you also need
to throw away the oldest (100th) point. But the points
have been sorted into the array - so you don't know which
entry in the array holds the oldest point. That was the
problem I was trying to solve with the linked-list idea,
but I now think I perhaps have a simpler way to do it...
when you get to my age the wheels go round quite a bit
slower than they used to, so you'll have to wait awhile
for the result of that "I now think" !!!

I have 10 clock cycles to sort each new data point.

That should be plenty, if what I'm planning can be
made to work.

Your current idea seems to be keeping the most recent
100 data points in a bunch of 100 registers, and feeding
them all into a sorting network. This is a very
extravagant way of doing it. The classic bitonic sort
needs 24 compare/swap elements to sort 8 data points;
using Batcher's ingenious modification of the sort,
you can get that down to 19 compare/swaps. Batcher's
algorithm requires - wait for it - 1077 compare/swap
elements to sort 100 points. That's BIG. That's why
I've been trying to think of more compact methods Smile
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.brom...@MYCOMPANY.comhttp://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

It seems like I did not explain my problem clearly since the very
beginning. I need to sort an array of 100 points. I am receiving each
data point every 10 clock cycles. When I have 101 points in the array,
I output the median value and clear the array to start all over again.
This will probably be simpler than what you were thinking - no arrival
time order info required. I am just trying to get a good solution in
terms of area and power. Thanks a million. John.

glen herrmannsfeldt
Guest

Fri Jun 27, 2008 10:19 pm   



john wrote:
(snip)

Quote:
It seems like I did not explain my problem clearly since the very
beginning. I need to sort an array of 100 points. I am receiving each
data point every 10 clock cycles. When I have 101 points in the array,
I output the median value and clear the array to start all over again.
This will probably be simpler than what you were thinking - no arrival
time order info required. I am just trying to get a good solution in
terms of area and power. Thanks a million. John.

Much easier. (Though you want to sort all 101.)

Still, you might look at the algorithm in Knuth.

-- glen

glen herrmannsfeldt
Guest

Fri Jun 27, 2008 10:45 pm   



Jonathan Bromley wrote:

(snip, I wrote)

Quote:
A systolic array solution isn't immediately obvious

I believe there is a quasi-systolic solution, but it isn't simple.

How about a binary tree shaped systolic array. Not that
I completely see it yet, but since 10 clock cycles
are allowed for each data point, it would seem that
one could get through a log2(100) level tree in less
than 10 cycles. (with the small complication that
100 is not a power of two.)

-- glen

john
Guest

Tue Jul 01, 2008 8:21 pm   



On Jun 26, 5:56 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
Quote:
On Thu, 26 Jun 2008 16:52:33 +0100, Jonathan Bromley wrote:
Build an array of 101 registers.
[...]
Each register now behaves in the following way:

[...]

Afterthought, for algorithm wonks: This is an O(N^2)
sort with the interesting property that it is O(N) in
size and O(N) in time - to sort an array of N elements
you need N comparators and registers, and it takes
N clock cycles to do the sort.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.brom...@MYCOMPANY.comhttp://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Thanks a million for your help. 25 comparators with a mux for my 100
points seems to be working
fine in 4 clock cycles whiile saving some are.

gupta
Guest

Mon Aug 18, 2008 5:11 pm   



On Jun 24, 5:37 pm, john <alcar...@hotmail.com> wrote:
Quote:
Hi all,
I am trying to design a median filter in Verilog. There are plenty of
papers on median filter designs for image/audio applications. However,
I cannot find any starting point for a median filter which needs to
sort 100 numbers (14bit wide each).

Any suggestions? Thanks.....

Hello John,

I saw one of your posts in google groups regarding design of median
filter in verilog. I am also working on the same topic and i need your
help. If u are already completed the task, is it possbile to send me
the code and please specify a book to study more about median filter
( regarding the behavior or working of the filter)

Thanks in Advance.

Gupta.

Goto page 1, 2  Next

elektroda.net NewsGroups Forum Index - Verilog Language - Median filter in Verilog

Ask a question - edaboard.com

Arabic version Bulgarian version Catalan version Czech version Danish version German version Greek version English version Spanish version Finnish version French version Hindi version Croatian version Indonesian version Italian version Hebrew version Japanese version Korean version Lithuanian version Latvian version Dutch version Norwegian version Polish version Portuguese version Romanian version Russian version Slovak version Slovenian version Serbian version Swedish version Tagalog version Ukrainian version Vietnamese version Chinese version Turkish version
EDAboard.com map