Monday, February 9, 2009

Wavelet Transform in brief

The wavelet transform replaces the Fourier transform's sinusoidal waves by a family generated by translations and dilations of a window called a wavelet.

It takes two arguments: time and scale.


Outline

The wavelet transform is defined by

$\displaystyle Wf(u,s)=\int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t-u}{s}\right) dt$

where the base atom $\psi$ is a zero-average function, centered around zero, with finite energy. The family of vectors is obtained by translations and dilations of the base atom:

$\displaystyle \psi_{u,s}(t)=\frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-u}{s}\right)$

This function is centered around u, like a windowed Fourier atom. If the frequency center of $\psi$ is $\eta$, then the frequency center of the dilated function is $\eta/s$.

Its time spread is proportional to s, and its frequency spread is proportional to 1/s, so the Heisenberg boxes of wavelet atoms keep a constant area across scales.

At the finer scales, more Heisenberg boxes can be placed side by side because the time resolution is better.
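As a concrete sketch (ours, not part of the original text), the transform can be approximated numerically by correlating the signal with sampled, translated and dilated copies of a wavelet; the Mexican-hat atom used here is one common zero-mean, finite-energy choice:

```python
import numpy as np

def mexican_hat(t):
    # A zero-mean, finite-energy wavelet (second derivative of a Gaussian).
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(signal, scales, dt):
    # W(u, s) = (1/sqrt(s)) * integral f(t) psi((t - u)/s) dt,
    # approximated by a discrete convolution on a regular grid.
    out = np.empty((len(scales), len(signal)))
    for i, s in enumerate(scales):
        tau = np.arange(-5.0 * s, 5.0 * s + dt, dt)   # wavelet support
        atom = mexican_hat(tau / s) / np.sqrt(s)
        out[i] = np.convolve(signal, atom, mode="same") * dt
    return out

dt = 0.01
t = np.arange(0.0, 20.0, dt)
f = np.cos(2.0 * np.pi * t)            # a 1 Hz test signal
W = cwt(f, [0.1, 0.2, 0.4], dt)        # one row of coefficients per scale
print(W.shape)
```

Because the atom has zero mean, a constant signal produces coefficients close to zero at every scale.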

Properties

The wavelet transform thus has a time-frequency resolution which depends on the scale s. Under the admissibility condition

$\displaystyle C_\psi=\int_0^{+\infty}\frac{\mid\hat{\psi}(\omega)\mid^2}{\omega}\, d\omega<+\infty$

it is a complete, stable and redundant representation of the signal; in particular, the wavelet transform is left invertible. The redundancy implies the existence of a reproducing kernel.
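The admissibility constant can be checked numerically. The sketch below (our illustration) uses the Mexican-hat wavelet, whose Fourier transform is proportional to $\omega^2 e^{-\omega^2/2}$: the integrand then behaves like $\omega^3 e^{-\omega^2}$, which is integrable, and the integral evaluates to 1/2.

```python
import numpy as np

# Fourier transform of the Mexican-hat wavelet, up to a constant factor:
# psi_hat(w) = w^2 * exp(-w^2 / 2), so |psi_hat|^2 / w = w^3 * exp(-w^2).
w = np.linspace(0.0, 20.0, 200001)
integrand = w**3 * np.exp(-w**2)

# Trapezoidal rule for C_psi = int_0^inf |psi_hat(w)|^2 / w dw:
C_psi = np.sum((integrand[1:] + integrand[:-1]) / 2.0) * (w[1] - w[0])

# Analytically, int_0^inf w^3 exp(-w^2) dw = 1/2, so C_psi is finite.
print(C_psi)
```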

Scalogram

If $\eta$ denotes the frequency center of the base wavelet, then the frequency center of a dilated wavelet is $\xi=\eta/s$. The scalogram of a signal is defined by

$\displaystyle P_Wf(u,\xi)=\mid Wf(u,s)\mid^2$

The normalized scalogram is $\displaystyle \frac{\xi}{\eta}\, P_Wf(u,\xi)=\frac{\mid Wf(u,s)\mid^2}{s}$.
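As a numerical illustration (ours, assuming a Mexican-hat wavelet, whose frequency center is $\eta=\sqrt 2$), the normalized scalogram of a pure cosine at frequency $\omega_0$ peaks near the scale $s=\eta/\omega_0$:

```python
import numpy as np

def mexican_hat(t):
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

dt = 0.01
t = np.arange(0.0, 40.0, dt)
w0 = 2.0 * np.pi                        # signal frequency in rad/s
f = np.cos(w0 * t)

scales = np.linspace(0.05, 1.0, 60)
energy = []
for s in scales:
    tau = np.arange(-5.0 * s, 5.0 * s + dt, dt)
    atom = mexican_hat(tau / s) / np.sqrt(s)
    W = np.convolve(f, atom, mode="same") * dt
    # Normalized scalogram |W(u, s)|^2 / s, averaged over u away from
    # the borders:
    energy.append(np.mean(W[1000:-1000]**2) / s)

s_peak = scales[int(np.argmax(energy))]
print(s_peak)                           # expected near sqrt(2)/w0 ~ 0.225
```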

Choice of Window

As far as the continuous wavelet transform is concerned, a wavelet is simply a finite-energy function with a zero mean. Besides its Heisenberg box, the most important feature of a wavelet is its number of vanishing moments:

$\displaystyle \int_{-\infty}^{+\infty} t^k\, \psi(t)\, dt=0 \quad \mbox{for } 0\leq k<n$

The vanishing moments property makes it possible to analyse the local regularity of a signal.

A theorem characterizes fast-decaying wavelets with n vanishing moments as the nth derivatives of fast-decaying functions.
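For instance, the Mexican-hat wavelet is (minus) the second derivative of a Gaussian, so by the theorem it should have exactly two vanishing moments; this is easy to verify numerically (our sketch):

```python
import numpy as np

# Mexican-hat wavelet: psi(t) = (1 - t^2) exp(-t^2 / 2), i.e. minus the
# second derivative of exp(-t^2 / 2), hence n = 2 vanishing moments.
dt = 0.001
t = np.arange(-10.0, 10.0, dt)
psi = (1.0 - t**2) * np.exp(-t**2 / 2.0)

# Moments int t^k psi(t) dt for k = 0, 1, 2 by Riemann summation:
moments = [float(np.sum(t**k * psi) * dt) for k in range(3)]
print(moments)    # the first two vanish, the third does not
```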

Implementation

The wavelet transform is computed with the Fast Wavelet Transform, which evaluates a discrete transform with circular convolutions, themselves computed with an FFT.

To speed up computations, dyadic wavelets are often used. The dyadic wavelet transform is implemented by filter banks.


FIR Filter Implementation

The simplest FIR filter to apply to a signal is the rectangular or boxcar filter, which is implemented with IDL's SMOOTH function, or the closely related MEDIAN function.

Applying other FIR filters to signals is straightforward since the filter is non-recursive. The filtered signal is simply the convolution of the impulse response of the filter with the original signal. The impulse response of the filter is computed with the DIGITAL_FILTER function or by the procedure in the previous section.

IDL's BLK_CON function provides a simple and efficient way to convolve a filter with a signal. Convolving u(k) from the previous example with the bandstop filter created above produces the plot shown in the figure below.

The frequency response of the filtered signal shows that the frequency component at 11.0 cycles / second has been filtered out, while the frequency components at 2.8 and 6.25 cycles / second, as well as the DC component, have been passed by the filter.

Enter the following commands at the IDL prompt to create the plot:

; Compute time data sequence u:
@sigprc01
; Compute the Kaiser filter coefficients
; with the sampling period in seconds:
delt = 0.02
; Frequencies above f_low will be passed:
f_low = 15.
; Frequencies below f_high will be passed:
f_high = 7.
; Ripple amplitude will be less than -50 dB:
a_ripple = 50.
; The order of the filter:
nterms = 40
; Compute the impulse response = the filter coefficients:
bs_ir_k = DIGITAL_FILTER(f_low*2*delt, f_high*2*delt, $
a_ripple, nterms)
; Convolve the Kaiser filter with the signal:
u_filt = BLK_CON(bs_ir_k, u)
; Spectrum of original signal:
v = FFT(u)
; Spectrum of the filtered signal:
v_filt = FFT(u_filt)
; Create a log-log plot of power spectra.
; Number of samples in u:
N = N_ELEMENTS(u)
; F = [0.0, 1.0/(N*delt), ... , 1.0/(2.0*delt)]
F = FINDGEN(N/2+1) / (N*delt)
PLOT, F, ABS(v(0:N/2))^2, YTITLE='Power Spectrum', /YLOG, $
   XTITLE='Frequency in cycles / second', /XLOG, $
   XRANGE=[1.0,1.0/(2.0*delt)], XSTYLE=1, $
   TITLE='Spectrum of u(k) Before (solid) and!CAfter (dashed) Digital Filtering'
; Overplot the filtered spectrum with a dashed line:
OPLOT, F, ABS(v_filt(0:N/2))^2, LINESTYLE=2

Alternately, you can run the following batch file to create the plot:

@sigprc12

See Running the Example Code if IDL does not find the batch file.

Tikhonov's regularization and multiresolution analysis

If $w_j^{(I)}$ are the wavelet coefficients of the image I at the scale j, we have:

\begin{eqnarray*} \hat{w}_j^{(I)}(u,v) & = & \hat{g}(2^{j-1}u, 2^{j-1}v) \prod_{i=0}^{j-2}\hat{h}(2^{i}u, 2^{i}v)\, \hat{I}(u,v) \\ & = & {\hat{\psi}(2^{j}u, 2^{j}v) \over \hat{\phi}(u,v)}\, \hat{P}(u,v)\, \hat{O}(u,v) \qquad (14.106) \\ & = & \hat{w}_j^{(P)}(u,v)\, \hat{O}(u,v) \end{eqnarray*}

where $w_j^{(P)}$ are the wavelet coefficients of the PSF at the scale j. The wavelet coefficients of the image I are the convolution product of the object O with the wavelet coefficients of the PSF.

To deconvolve the image, we have to minimize for each scale j:

$\displaystyle \parallel {\hat \psi(2^ju, 2^jv)\over \hat\phi(u, v)} \hat P(u,v) \hat O(u,v) - \hat w_j^{(I)}(u,v)\parallel^2$ (14.107)

and for the plane at the lower resolution:
$\displaystyle \parallel {\hat \phi(2^{n-1}u, 2^{n-1}v)\over \hat\phi(u, v)} \hat P(u,v) \hat O(u,v) - \hat c_{n-1}^{(I)}(u,v)\parallel^2$ (14.108)

n being the number of planes of the wavelet transform ((n-1) wavelet coefficient planes and one plane for the image at the lower resolution). The problem does not generally have a unique solution, and we need to regularize it [40]. At each scale, we add the term to be minimized:
$\displaystyle \gamma_j \parallel w_j^{(O)} \parallel^2$ (14.109)

This is a smoothness constraint. We want to have the minimum information in the restored object. From equations 14.107, 14.108, 14.109, we find:
$\displaystyle \hat D(u,v) \hat O(u,v) = \hat N(u,v)$ (14.110)

with:
$\displaystyle \hat D(u,v) = \sum_j \mid \hat\psi(2^ju, 2^jv) \mid^2 \left(\mid\hat P(u,v)\mid^2 + \gamma_j\right) + \mid \hat \phi(2^{n-1}u,2^{n-1}v)\, \hat P(u,v) \mid^2$

and:
$\displaystyle \hat N(u,v) = \hat\phi(u, v) \left[ \sum_j \hat P^*(u,v)\, \hat\psi^*(2^ju, 2^jv)\, \hat w_j^{(I)}(u,v) + \hat P^*(u,v)\, \hat\phi^*(2^{n-1}u,2^{n-1}v)\, \hat c_{n-1}^{(I)}(u,v) \right]$

If the equation is well constrained, the object can be computed by a simple division of $\hat N$ by $\hat D$. An iterative algorithm can be used for this inversion if we want to add other constraints such as positivity. We have in fact a multiresolution Tikhonov regularization. This method has the advantage of furnishing a solution quickly, but the optimal regularization parameters $\gamma_j$ cannot be found directly, and several trials are generally necessary before an acceptable solution is found. However, the method can be interesting if we need to deconvolve a large number of images with the same noise characteristics; in this case, the parameters have to be determined only once. In general, we prefer to use one of the following iterative algorithms.
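A single-scale analogue of this inversion (our sketch, not the multiresolution scheme itself) is Tikhonov-regularized Fourier deconvolution, where $\hat D = \mid\hat P\mid^2 + \gamma$ and $\hat N = \hat P^* \hat I$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth 1D "object" and a Gaussian PSF, both periodic of length N
# (hypothetical test data, just for illustration):
N = 256
x = np.arange(N)
obj = np.exp(-((x - 100) / 8.0)**2) + 0.5 * np.exp(-((x - 170) / 5.0)**2)
psf = np.exp(-((x - N / 2) / 3.0)**2)
psf = np.roll(psf / psf.sum(), N // 2)          # center the PSF at index 0

# Observed image: circular convolution of object and PSF, plus noise.
img = np.real(np.fft.ifft(np.fft.fft(obj) * np.fft.fft(psf)))
img = img + 1e-4 * rng.standard_normal(N)

# Tikhonov inversion: O_hat = N_hat / D_hat with
# D_hat = |P_hat|^2 + gamma and N_hat = conj(P_hat) * I_hat.
P_hat, I_hat = np.fft.fft(psf), np.fft.fft(img)
gamma = 1e-4
restored = np.real(np.fft.ifft(np.conj(P_hat) * I_hat /
                               (np.abs(P_hat)**2 + gamma)))

print(np.max(np.abs(restored - obj)), np.max(np.abs(img - obj)))
```

The regularizer $\gamma$ plays the role of the $\gamma_j$ above: it damps the frequencies where $\hat P$ is small instead of amplifying the noise there.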

Adaptive filtering from the wavelet transform

In the preceding algorithm we assumed the properties of the signal and the noise to be stationary. The wavelet transform was first used to obtain an algorithm faster than classical Wiener filtering; then the correlation between two different scales was taken into account. In this way we obtained a filtering with stationary properties. In fact, these hypotheses were too simple, because in general the signal may not arise from a Gaussian stochastic process. Knowing the noise distribution, we can determine the statistically significant level of the measured wavelet coefficients at each scale. If $w_i(x)$ is very weak, it is not significant and could be due to noise; then the hypothesis that the value $W_i(x)$ is null cannot be rejected. In the opposite case, where $w_i(x)$ is significant, we keep its value. If the noise is Gaussian, we write:

$\displaystyle P \cdot W_i(x) = W_i^{(s)}(x)$ (14.92)

where P is the nonlinear operator which performs the inverse transform, the wavelet transform, and the thresholding. An alternative is to use the following iterative solution, which is similar to Van Cittert's algorithm:

$\displaystyle W_i^{(n)}(x) = W_i^{(s)}(x) + W_i^{(n-1)}(x) - P \cdot W_i^{(n-1)}(x)$ (14.93)

for the significant coefficients ( $W_i^{(s)}(x) \neq 0$) and:

$\displaystyle W_i^{(n)}(x) = W_i^{(n-1)}(x)$ (14.94)

for the non-significant coefficients ( $W_i^{(s)}(x) = 0$).

The algorithm is the following one:

1. Compute the wavelet transform of the data. We get the $w_i$.
2. Estimate the standard deviation $B_0$ of the noise in the first plane from the histogram of $w_0$.
3. Estimate the standard deviation $B_i$ of the noise at each scale from $B_0$.
4. Estimate the significant level at each scale, and threshold.
5. Initialize: $W_i^{(0)}(x) = W_i^{(s)}(x)$.
6. Reconstruct the picture by using the iterative method.

The thresholding may introduce negative values in the resulting image. A positivity constraint can be introduced in the iterative process, by thresholding the restored image. The algorithm converges after five or six iterations.
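Steps 2 to 4 can be sketched on a single detail plane (our illustration; the Haar-like differences, the MAD noise estimator and the 3-sigma threshold are assumptions, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical test data: a strong isolated spike buried in
# unit-variance Gaussian noise.
n = 4096
signal = np.zeros(n)
signal[2000] = 10.0
data = signal + rng.standard_normal(n)

# One detail plane of a Haar-like transform: half-differences of pairs.
# For i.i.d. unit-variance noise, Var[(a - b) / 2] = 1/2.
w = (data[0::2] - data[1::2]) / 2.0

# Step 2: robust estimate of the noise standard deviation of the plane,
# via the median absolute deviation (MAD).
sigma = np.median(np.abs(w)) / 0.6745

# Step 4: keep only the statistically significant coefficients (3 sigma).
w_s = np.where(np.abs(w) > 3.0 * sigma, w, 0.0)

print(sigma, np.count_nonzero(w_s))
```

The coefficient carrying the spike survives the threshold, while almost all pure-noise coefficients are set to zero.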

The convolution from the continuous wavelet transform

We will examine here the computation of a convolution by using the continuous wavelet transform, in order to get a framework for linear smoothings. Let us consider the convolution product of two functions:
$\displaystyle h(x)=\int_{-\infty}^{+\infty} f(u)g(x-u)du$ (14.68)

We introduce two real wavelet functions $\psi(x)$ and $\chi(x)$ such that:
$\displaystyle C=\int_0^{+\infty} \frac{\hat{\psi}^*(\nu)\hat{\chi}(\nu)}{\nu}d\nu$ (14.69)

is defined. Wg(a,b) denotes the wavelet transform of g with the wavelet function $\psi(x)$:
$\displaystyle W_g(a,b)=\frac{1}{\sqrt a}\int_{-\infty}^{+\infty}g(x)\psi^*(\frac{x-b}{a})dx$ (14.70)

We restore g(x) with the wavelet function $\chi(x)$:
$\displaystyle g(x)=\frac{1}{C}\int_0^{+\infty}\int_{-\infty}^{+\infty}\frac{1}{\sqrt a} W_g(a,b)\chi(\frac{x-b}{a})\frac{dadb}{a^2}$ (14.71)

The convolution product can be written as:
$\displaystyle h(x)=\frac{1}{C}\int_0^{+\infty}\frac{da}{a^{\frac{5}{ 2}}}\int_{-\infty}^{+\infty} W_g(a,b)db\int_{-\infty}^{+\infty}f(u)\chi(\frac{x-u-b}{a}) du$ (14.72)

Let us denote $\tilde{\chi}(x)=\chi(-x)$. The wavelet transform $\tilde W_f(a,b)$ of f(x) with the wavelet $\tilde{\chi}(x)$ is:
$\displaystyle \tilde W_f(a,b)=\frac{1}{\sqrt a}\int_{-\infty}^{+\infty}f(x)\tilde{\chi}(\frac{x-b}{a})dx$ (14.73)

That leads to:
$\displaystyle h(x)=\frac{1}{C}\int_0^{+\infty}\frac{da}{a^2}\int_{-\infty}^{+\infty} \tilde W_f(a,x-b)W_g(a,b)db$ (14.74)

Then we get the final result:
$\displaystyle h(x)={1\over C}\int_0^{+\infty}\tilde W_f(a,x)\otimes W_g(a,x)\frac{da}{ a^2}$ (14.75)

In order to compute a convolution with the continuous wavelet transform:
  • We compute the wavelet transform $\tilde W_f(a,b)$ of the function f(x) with the wavelet function $\tilde{\chi}(x)$;
  • We compute the wavelet transform Wg(a,b) of the function g(x) with the wavelet function $\psi (x)$;
  • We sum the convolution product of the wavelet transforms, scale by scale.

The wavelet transform permits us to perform any linear filtering. Its efficiency depends on the number of terms in the wavelet transform associated with g(x) for a given signal f(x). If we have a filter whose number of significant coefficients is small at each scale, the complexity of the algorithm is proportional to N. For a classical convolution, the complexity is also proportional to N, but the number of operations is additionally proportional to the length of the convolution mask. The main advantage of the present technique lies in the possibility of having a filter with long-scale terms without computing the convolution on a large window. If we carry out the convolution with the FFT algorithm, the complexity is of order $N\log_2N$; the computing time is then longer than that of the wavelet transform when the energy is concentrated in very few coefficients.
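The FFT route mentioned above can be illustrated with a minimal check (ours) that circular convolution computed in $O(N^2)$ operations agrees with its $O(N\log_2 N)$ FFT implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(64)
g = rng.standard_normal(64)

# Circular convolution computed directly, in O(N^2) operations:
direct = np.array([sum(f[u] * g[(x - u) % 64] for u in range(64))
                   for x in range(64)])

# The same convolution via the FFT, in O(N log N) operations:
fast = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

print(np.max(np.abs(direct - fast)))   # agreement to machine precision
```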

Multiresolution Analysis

Multiresolution analysis [25] results from the embedded subsets generated by the interpolations at different scales.

A function f(x) is projected at each step j onto the subset $V_j$. This projection is defined by the scalar product $c_j(k)$ of f(x) with the scaling function $\phi(x)$, which is dilated and translated:

$\displaystyle c_j(k) = \langle f(x),\, 2^{-j}\phi(2^{-j}x-k) \rangle$ (14.9)

$\phi(x)$ is a scaling function, which satisfies the property:
$\displaystyle {1 \over 2} \phi({x \over 2}) = \sum_n h(n) \phi(x-n)$ (14.10)

or
$\displaystyle \hat{\phi}(2\nu)=\hat h(\nu)\hat{\phi}(\nu)$ (14.11)

where $\hat h(\nu)$ is the Fourier transform of the function $\sum_n h(n)\delta(x-n)$:
$\displaystyle \hat h(\nu)=\sum_n h(n)e^{-2\pi i n \nu}$ (14.12)

Equation 14.10 permits us to compute the set cj+1(k) directly from cj(k). If we start from the set c0(k), we compute all the sets cj(k), with j>0, without directly computing any other scalar product:
$\displaystyle c_{j+1}(k)=\sum_n h(n-2 k)c_{j}(n)$ (14.13)

At each step, the number of scalar products is divided by 2. Step by step the signal is smoothed and information is lost. The remaining information can be restored using the complementary subspace Wj+1 of Vj+1 in Vj. This subspace can be generated by a suitable wavelet function $\psi(x)$ with translation and dilation.

$\displaystyle {1 \over 2}\psi({x\over 2})=\sum_n g(n)\phi(x-n)$ (14.14)

or
$\displaystyle \hat{\psi}(2\nu)=\hat{g}(\nu)\hat{\phi}(\nu)$ (14.15)

We compute the scalar products $w_{j+1}(k) = \langle f(x),\, 2^{-(j+1)}\psi(2^{-(j+1)}x-k) \rangle$ with:

$\displaystyle w_{j+1}(k)=\sum_n g(n-2k)c_{j}(n)$ (14.16)

With this analysis, we have built the first part of a filter bank [34]. In order to restore the original data, Mallat uses the properties of orthogonal wavelets, but the theory has been generalized to a large class of filters [8] by introducing two other filters $\tilde h$ and $\tilde g$, called the conjugates of h and g. The restoration is performed with:

$\displaystyle c_{j}(k)=2\sum_l [c_{j+1}(l)\tilde h(k+2l)+w_{j+1}(l)\tilde g(k+2l)]$ (14.17)

In order to get an exact restoration, two conditions are required for the conjugate filters:

  • Dealiasing condition:
    $\displaystyle \hat h(\nu+\frac{1}{2})\hat{\tilde h}(\nu)+\hat g(\nu+\frac{1}{2})\hat{\tilde g}(\nu)=0$ (14.18)

  • Exact restoration:
    $\displaystyle \hat h(\nu)\hat{\tilde h}(\nu)+\hat g(\nu)\hat{\tilde g}(\nu)=1$ (14.19)


Figure 14.3: The filter bank associated with the multiresolution analysis

In the decomposition, the function is successively convolved with the two filters H (low frequencies) and G (high frequencies). Each resulting function is decimated by suppressing one sample out of two. The high-frequency signal is kept aside, and we iterate on the low-frequency signal (upper part of figure 14.3). In the reconstruction, we restore the sampling by inserting a 0 between each pair of samples, then we convolve with the conjugate filters $\tilde H$ and $\tilde G$, add the resulting functions, and multiply the result by 2. We iterate up to the smallest scale (lower part of figure 14.3).
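A minimal instance of this filter bank (our sketch) uses the Haar filters h(0)=h(1)=1/2 and g(0)=1/2, g(1)=-1/2, for which the analysis equations 14.13 and 14.16 reduce to pairwise averages and differences, and the synthesis equation 14.17, with its factor 2, restores the signal exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
c0 = rng.standard_normal(16)           # c_0(k): the input samples

# Analysis (eqs. 14.13 and 14.16) with Haar filters
# h(0) = h(1) = 1/2 and g(0) = 1/2, g(1) = -1/2:
c1 = (c0[0::2] + c0[1::2]) / 2.0       # smoothed signal c_1(k)
w1 = (c0[0::2] - c0[1::2]) / 2.0       # wavelet coefficients w_1(k)

# Synthesis (eq. 14.17): upsample, filter with the conjugate filters and
# multiply by 2; for Haar this reduces to sums and differences.
rec = np.empty_like(c0)
rec[0::2] = c1 + w1
rec[1::2] = c1 - w1

print(np.max(np.abs(rec - c0)))        # exact reconstruction
```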

Orthogonal wavelets correspond to the restricted case where:

$\displaystyle \hat g(\nu)$ = $\displaystyle e^{-2\pi i\nu}\hat h^*(\nu+{1\over 2})$ (14.20)
$\displaystyle \hat{\tilde h}(\nu)$ = $\displaystyle \hat h^*(\nu)$ (14.21)
$\displaystyle \hat{\tilde g}(\nu)$ = $\displaystyle \hat g^*(\nu)$ (14.22)

and
$\displaystyle \mid \hat h(\nu) \mid^2+ \mid \hat h(\nu+{1\over 2}) \mid^2=1$ (14.23)

We can easily see that this set satisfies the two basic relations 14.18 and 14.19. Daubechies wavelets are the only compactly supported solutions. For biorthogonal wavelets [8] we have the relations:
$\displaystyle \hat g(\nu)$ = $\displaystyle e^{-2\pi i\nu}\hat{\tilde h}^*(\nu+{1\over 2})$ (14.24)
$\displaystyle \hat{\tilde g}(\nu)$ = $\displaystyle e^{2\pi i\nu}\hat h^*(\nu+{1\over 2})$ (14.25)

and
$\displaystyle \hat h(\nu)\hat{\tilde h}(\nu)+\hat h^*(\nu+{1\over 2}) \hat{\tilde h}^*(\nu+{1\over 2})=1$ (14.26)

These also satisfy relations 14.18 and 14.19. A large class of compactly supported wavelet functions can be derived. Many sets of filters have been proposed, especially for coding. It was shown [9] that the choice of these filters must be guided by the regularity of the scaling and wavelet functions. The complexity is proportional to N, and the algorithm provides a pyramid of N elements.

The 2D algorithm is based on separable variables, which leads to privileging the x and y directions. The scaling function is defined by:

$\displaystyle \phi(x,y) = \phi(x)\phi(y)$ (14.27)

The passage from one resolution to the next is performed by:
$\displaystyle f_{j+1}(k_x,k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} h(l_x- 2k_x) h(l_y -2k_y) f_{j}(l_x,l_y)$ (14.28)

The detail signal is obtained from three wavelets:
  • a vertical wavelet:
    \begin{displaymath}\psi^1(x,y) = \phi(x)\psi(y) \end{displaymath}

  • a horizontal wavelet:
    \begin{displaymath}\psi^2(x,y) = \psi(x)\phi(y) \end{displaymath}

  • a diagonal wavelet:
    \begin{displaymath}\psi^3(x,y) = \psi(x)\psi(y) \end{displaymath}

which leads to three sub-images:
\begin{displaymath}C_{j+1}^1 (k_x, k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} g(l_x- 2k_x) h(l_y -2k_y) f_{j}(l_x,l_y)\end{displaymath}


\begin{displaymath}C_{j+1}^2 (k_x, k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} h(l_x- 2k_x) g(l_y -2k_y) f_{j}(l_x,l_y)\end{displaymath}


\begin{displaymath}C_{j+1}^3 (k_x, k_y) = \sum_{l_x=-\infty}^{+\infty} \sum_{l_y=-\infty}^{+\infty} g(l_x- 2k_x) g(l_y -2k_y) f_{j}(l_x,l_y)\end{displaymath}


Figure 14.4: Wavelet transform representation of an image

The wavelet transform can thus be interpreted as a decomposition onto frequency sets with a spatial orientation.
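The separable decomposition above can be sketched with Haar filters (our illustration; the low-pass h averages sample pairs, the high-pass g differences them). One pass along each axis yields the smoothed array and the three detail sub-images, from which each even-indexed sample can be recovered:

```python
import numpy as np

rng = np.random.default_rng(4)
img = rng.standard_normal((8, 8))      # f_j(k_x, k_y)

def h(a, axis):
    # Haar low-pass along one axis: pairwise averages, decimated by 2.
    s0 = np.take(a, np.arange(0, a.shape[axis], 2), axis=axis)
    s1 = np.take(a, np.arange(1, a.shape[axis], 2), axis=axis)
    return (s0 + s1) / 2.0

def g(a, axis):
    # Haar high-pass along one axis: pairwise differences, decimated by 2.
    s0 = np.take(a, np.arange(0, a.shape[axis], 2), axis=axis)
    s1 = np.take(a, np.arange(1, a.shape[axis], 2), axis=axis)
    return (s0 - s1) / 2.0

smooth = h(h(img, 0), 1)               # f_{j+1}: low-pass along both axes
d1 = h(g(img, 0), 1)                   # detail along one axis
d2 = g(h(img, 0), 1)                   # detail along the other axis
d3 = g(g(img, 0), 1)                   # diagonal detail

print(smooth.shape, d1.shape, d2.shape, d3.shape)
```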