Adaptive filters are systems that are able to adapt their coefficients to the properties of their environment, in order to satisfy a given objective. Furthermore, they may also be able to adapt themselves to modifications of the environment and track them. Many real-world applications employ adaptive filters, such as hearing aids, localization and tracking, active noise control (anti-noise), noise suppression, audio upmix of stereo signals, adaptive beamforming, MPEG audio coding, non-linear echo cancellation, adaptation of neural networks, etc. The following figure, taken from [Ref][1], presents some possible applications:
We will first begin by describing the general filtering problem and derive the optimal solution, known as the Wiener filter. We will then explain how the solution can be obtained through iterative algorithms. Finally, we will describe how these algorithms can be turned into adaptive filters.
[1]: M. Harteneck and R. W. Stewart, "Adaptive Digital Signal Processing JAVA Teaching Tool," IEEE Transactions on Education, vol. 44, no. 2, May 2001 (ISSN 0018-9359).
In what follows, we will consider a general problem, which is sometimes called the Wiener problem. Many actual problems, including filter identification, noise cancellation, linear prediction, etc., can be formulated as special cases of this Wiener problem.
The classical formulation is as follows: given a random signal $u(n)$, we would like to find a transform $T\{u\}$ such that the result is as close as possible to some desired response $d(n)$. We will restrict this general problem in two respects.

First, we will only consider *linear* transforms of the sequence $\{u(n)\}_{n=0..N-1}$, that is, filterings of $u(n)$. Furthermore, we will even restrict ourselves to causal, finite impulse response filters with $p$ taps. We denote by $w$ (with $w$ for Wiener) the impulse response. For now, we assume that the system is stationary, which implies that the impulse response does not depend on time $n$. Hence, the output can be computed as the convolution product
$$y(n) = [w * u](n) = \sum_{l=0}^{p-1} w(l)\, u(n-l). \tag{2}$$

Second, as a measure of closeness, we will use a function of the error $e(n) = y(n) - d(n)$. Any cost function could be used, such as $|e(n)|$, $|e(n)|^2$, $|e(n)|^3$ or even $\sinh(e(n))$... Among these possibilities, the square of the error yields interesting, closed-form solutions and simple computations.
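As a quick numerical check of (2), the output of a small FIR filter can be computed both by the explicit convolution sum and with `scipy.signal.lfilter` (the function used later in this notebook); the filter and signal values below are arbitrary, chosen only for illustration:

```python
import numpy as np
from scipy.signal import lfilter

w = np.array([1.0, 0.5, 0.25])            # arbitrary 3-tap impulse response
u = np.array([1.0, 2.0, 0.0, -1.0, 3.0])  # arbitrary input samples

# Direct evaluation of y(n) = sum_l w(l) u(n-l), with u(n) = 0 for n < 0
y_direct = np.array([sum(w[l] * (u[n - l] if n - l >= 0 else 0.0)
                         for l in range(len(w)))
                     for n in range(len(u))])

y_lfilter = lfilter(w, [1], u)             # same computation with scipy
print(np.allclose(y_direct, y_lfilter))    # True
```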
We can choose to work only with the sequences at hand and look at an integrated error such as
$$J_{\mathrm{ls}}(w, n_0, n_1) = \sum_{n=n_0}^{n_1} e(n)^2. \tag{4}$$
Such a criterion is called the Least Squares criterion.
We may also choose to work with the stochastic processes on average, and consider a mean square error
$$J_{\mathrm{mse}}(w, n) = E\left[e(n)^2\right].$$
The corresponding criterion is the Minimum Mean Square Error (MMSE) criterion.
Definitions
Let us define by $u(n)$ the $p\times 1$ column vector collecting $p$ consecutive samples of the input, $u(n)^T = [u(n), u(n-1), \ldots, u(n-p+1)]$, and by $w$ the vector collecting the samples of the impulse response: $w^T = [w(0), w(1), \ldots, w(p-1)]$. Clearly, the output (2) of filter $w$ can be written as the dot product $y(n) = w^T u(n)$, and the error is simply $e(n) = w^T u(n) - d(n)$. Observe that $J_{\mathrm{mse}}(w,n)$ is a quadratic form in $w$. Therefore, the criterion admits a single global minimum. To see this, let us develop the MSE:
$$
\begin{aligned}
J_{\mathrm{mse}}(w,n) &= E\left[(w^T u(n) - d(n))(u(n)^T w - d(n))\right]\\
&= w^T E\left[u(n)u(n)^T\right] w - 2\, w^T E\left[u(n) d(n)\right] + E\left[d(n)^2\right]\\
&= w^T R_{uu}\, w - 2\, w^T R_{du} + \sigma_d^2
\end{aligned}
\tag{12}
$$
where we denoted
$$
\begin{cases}
R_{uu} = E\left[u(n) u(n)^T\right] & \text{the correlation matrix of } u(n),\\
R_{du} = E\left[d(n) u(n)\right] & \text{the correlation vector of } d(n) \text{ and } u(n).
\end{cases}
$$
We also used the fact that the dot product between two vectors is a scalar and is therefore equal to its transpose: e.g. $w^T u(n) = u(n)^T w$.
From formula (12), it can be checked that the MSE can also be put into the form of a perfect square, as
$$J_{\mathrm{mse}}(w,n) = (w - w^\star)^T R_{uu}\, (w - w^\star) - {w^\star}^T R_{uu}\, w^\star + \sigma_d^2 \tag{13}$$
if $w^\star$ is defined by
$$R_{uu}\, w^\star = R_{du}. \tag{14}$$
Since the quadratic form in (13) is always nonnegative, we see that the MSE is minimum if and only if
$$w = w^\star = R_{uu}^{-1} R_{du}, \tag{15}$$
assuming that $R_{uu}$ is invertible. The minimum error is then given by
$$J_{\mathrm{mse}}(w^\star, n) = \sigma_d^2 - {w^\star}^T R_{du}. \tag{16}$$
Alternatively, the minimum can also be found by equating the derivative of the criterion to zero. Indeed, this derivative is
$$\frac{d}{dw} J_{\mathrm{mse}}(w,n) = \frac{d\, E\left[e(n)^2\right]}{dw} = 2\, E\left[\frac{d e(n)}{dw}\, e(n)\right].$$
Since $e(n) = w^T u(n) - d(n)$, its derivative with respect to $w$ is $u(n)$, and it remains
$$\frac{d}{dw} J_{\mathrm{mse}}(w,n) = 2\, E\left[u(n) e(n)\right] = 2\, E\left[u(n)\left(u(n)^T w - d(n)\right)\right] = 2\left(R_{uu}\, w - R_{du}\right).$$
Hence, the derivative is zero if and only if $R_{uu}\, w = R_{du}$, which again yields the solution (15).
Interestingly, we see that the optimum estimator depends only on the second order properties of the desired response and the input sequence. This is a consequence of our choice of restricting ourselves to a quadratic criterion and a linear transform.
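To make these formulas concrete, here is a tiny numerical illustration of (15) and (16); the second-order statistics below are arbitrary (but mutually consistent) values chosen purely for the example:

```python
import numpy as np
from scipy.linalg import toeplitz

# Illustrative second-order statistics for a 3-tap Wiener problem (arbitrary values)
Ruu = toeplitz([1.0, 0.5, 0.25])    # correlation matrix of u
Rdu = np.array([0.9, 0.4, 0.1])     # correlation vector between d and u
sigma2d = 1.0                       # power of the desired signal d

w_star = np.linalg.solve(Ruu, Rdu)  # solution of the normal equations, eq. (15)
jmin = sigma2d - w_star.dot(Rdu)    # minimum MSE, eq. (16)
print("w* =", w_star, " minimum MSE =", jmin)
```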
The derivation of the least-squares estimator closely follows the steps we used for the MMSE estimator. This follows easily once the problem is formulated in matrix form. Define the error vector as $e(n_0,n_1)^T = [e(n_0), e(n_0+1), \ldots, e(n_1)]$. Each component of the error, say $e(k)$, is equal to $e(k) = u(k)^T w - d(k)$. Therefore, we have
$$
\begin{bmatrix}
u(n_0) & u(n_0-1) & \ldots & u(n_0-p+1)\\
u(n_0+1) & u(n_0) & \ldots & u(n_0-p+2)\\
\vdots & & \ddots & \vdots\\
u(n_1) & u(n_1-1) & \ldots & u(n_1-p+1)
\end{bmatrix} w -
\begin{bmatrix} d(n_0)\\ d(n_0+1)\\ \vdots\\ d(n_1) \end{bmatrix} =
\begin{bmatrix} e(n_0)\\ e(n_0+1)\\ \vdots\\ e(n_1) \end{bmatrix}.
$$
This can also be written in compact form as $U(n_0,n_1)\, w - d(n_0,n_1) = e(n_0,n_1)$. Then, the LS criterion (4) can be reformulated as
$$J_{\mathrm{ls}}(w,n_0,n_1) = \sum_{n=n_0}^{n_1} e(n)^2 = e(n_0,n_1)^T e(n_0,n_1),$$
that is,
$$J_{\mathrm{ls}}(w,n_0,n_1) = \left(U(n_0,n_1)\, w - d(n_0,n_1)\right)^T \left(U(n_0,n_1)\, w - d(n_0,n_1)\right).$$
Now it is a simple task to compute the derivative of this LS criterion with respect to $w$. One readily obtains
$$\frac{d}{dw} J_{\mathrm{ls}}(w,n_0,n_1) = 2\, U(n_0,n_1)^T \left(U(n_0,n_1)\, w - d(n_0,n_1)\right),$$
which is equal to zero if and only if
$$U(n_0,n_1)^T U(n_0,n_1)\, w = U(n_0,n_1)^T d(n_0,n_1).$$
The different matrices and vectors above depend on two indexes $n_0$ and $n_1$. It is now time to discuss the meaning of these indexes and the possible choices for their values. Suppose that the data are available over $N$ samples, from $n=0$ to $n=N-1$. When we want to compute the error $e(k)$ with $k<p$, we see that the result depends on unobserved values. The same kind of problem occurs if we want to compute the error for $k>N-1$. Therefore we face the problem of assigning a value to the unobserved samples. One possibility is to set the unobserved values to zero. Another possibility is to assign values by periodization, modulo $N$, of the available data. A last possibility is to avoid the situations that require unknown values.
The two main choices are the following:

- the *covariance form*, in which only observed values are used: the errors are computed for $k$ from $n_0=p-1$ to $n_1=N-1$, and the data matrix has dimensions $(N-p+1)\times p$;
- the *correlation form*, in which the unobserved values are set to zero: the errors are computed for $k$ from $n_0=0$ to $n_1=N+p-2$, and the data matrix now has dimensions $(N+p-1)\times p$.

It is now easy to see that the generic term of $\left[U(n_0,n_1)^T U(n_0,n_1)\right]_{ij}$ has the form $\sum_n u(n-i)\, u(n-j)$, that is, it is (up to a factor) an estimate of the correlation $R_{uu}(i-j)$. Consequently, we have an estimate of the correlation matrix given by $\hat{R}_{uu} = U(n_0,n_1)^T U(n_0,n_1)$. With the correlation form of the data matrix, the resulting estimate of the correlation matrix has Toeplitz symmetry. It is interesting to note that, by construction, the estimated correlation matrix is automatically non-negative definite. Similarly, $R_{du}$ can be estimated as $\hat{R}_{du} = U(n_0,n_1)^T d(n_0,n_1)$. Finally, the LS estimate is
$$w^\star_{\mathrm{ls}} = \left[U(n_0,n_1)^T U(n_0,n_1)\right]^{-1} U(n_0,n_1)^T d(n_0,n_1) = \hat{R}_{uu}^{-1} \hat{R}_{du}.$$

*(Figure illustrating the identification problem, taken from cnx.org.)*
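As a complement, the LS estimate can also be obtained directly from the data matrix, without forming the correlation estimates explicitly. The sketch below builds the covariance-form data matrix and solves the normal equations with `np.linalg.lstsq`; the function name `ls_filter` and the synthetic data are only illustrative:

```python
import numpy as np

def ls_filter(u, d, p):
    """Least-squares FIR estimate of length p, covariance form (only observed samples used)."""
    N = len(u)
    # Row k of U contains [u(k), u(k-1), ..., u(k-p+1)], for k = p-1 .. N-1
    U = np.column_stack([u[p - 1 - j: N - j] for j in range(p)])
    w, *_ = np.linalg.lstsq(U, d[p - 1: N], rcond=None)
    return w

# Quick self-check on synthetic data (values are illustrative only)
rng = np.random.default_rng(0)
u = rng.standard_normal(1000)
h = np.array([1.0, 0.5, -0.3])                                   # "true" filter
d = np.convolve(u, h)[:len(u)] + 0.01 * rng.standard_normal(len(u))
print(ls_filter(u, d, p=3))                                      # should be close to h
```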
We begin by simulating the problem. You may use the function `lfilter` to compute the output of the system. Take for $x$ a Gaussian noise, `np.random.normal` or `np.random.randn`, with unit variance on $N$ points, and add a Gaussian noise with scale factor 0.1 on the output.
# DO IT YOURSELF!
#
from scipy.signal import lfilter
N=0 # update this
x=0 # update this
htest=10*np.array([1, 0.7, 0.7, 0.7, 0.3, 0 ])
y0=0 # FILL IN SOMETHING CORRECT HERE
y=0 # FILL IN SOMETHING CORRECT HERE
#y0 = #noiseless output
#y= #noisy output
plt.plot(y)
plt.xlabel("Time")
plt.title("Observation")
figcaption("System output in an identification problem")
from scipy.signal import lfilter
# test
N=200
x=np.random.randn(N)
htest=10*np.array([1, 0.7, 0.7, 0.7, 0.3, 0 ])
#L=size(htest)
#yo=zeros(N)
#for t in range(L,200):
# yo[t]=htest.dot(x[t:t-L:-1])
#y=yo+ 0.1*randn(N)
y=lfilter(htest,[1],x)+0.1*np.random.randn(N)
plt.plot(y)
plt.xlabel("Time")
plt.title("Observation")
figcaption("System output in an identification problem")
Once this is done, we shall solve the normal equation (14). Of course, we first need to estimate the correlation matrix $R_{uu}$ and the correlation vector $R_{du}$. This can be done with the functions `xcorr` and `toeplitz`. Beware that `xcorr` returns two vectors and that the returned correlation sequence is symmetric, with both positive and negative indexes.
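If the `correlation.xcorr` helper provided with the course is not at hand, a minimal substitute can be written with `np.correlate`. This sketch assumes the same calling convention (the correlation values for lags $-$maxlags..$+$maxlags returned first, then the lags); any normalization factor is irrelevant here since it cancels in $\hat{R}_{uu}^{-1}\hat{R}_{du}$:

```python
import numpy as np

def xcorr_basic(x, y, maxlags):
    """Raw correlation sums c(k) = sum_n x(n) y(n-k), for k = -maxlags .. +maxlags."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full = np.correlate(x, y, mode="full")   # lags from -(len(y)-1) to +(len(x)-1)
    zero = len(y) - 1                        # index of lag 0 in 'full'
    lags = np.arange(-maxlags, maxlags + 1)
    return full[zero - maxlags: zero + maxlags + 1], lags

# Usage mirrors the cells below, e.g. c = xcorr_basic(u, u, maxlags=q)[0][q:]
```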
Now, in order to implement the identification procedure, one has to cast the problem as a Wiener problem and identify the input sequence $u$ and the desired one $d$. Actually, one should simply observe that we look for a filter which, excited by the same $x(n)$, should yield an output $z(n)$ as similar to $y_0(n)$ as possible. So, what would you take for $u$ and $d$?
One thus takes `u=x` and `d=y` (the desired sequence is $y_0(n)$, which is substituted by $y(n)$, since $y_0$ is unknown).
We now have to implement the estimation of the correlations and then compute the solution of the normal equation. We denote by $q+1$ the size of the filter (and thus of the correlation vector and matrix). The inverse of a matrix can be obtained using the function `inv` in the module `np.linalg`. The matrix multiplication can be done using the `.dot()` method. Finally, you may evaluate the performance by displaying the identified coefficients and by computing the MMSE according to (16).
# DO IT YOURSELF!
from correlation import xcorr
from scipy.linalg import toeplitz
from numpy.linalg import inv
q=5
z=np.zeros(q+1)
u=z #update this
d=z #update this
c=z #update this #correlation vector
Ruu=np.outer(z,z) #update this
Rdu=z #update this
w=z #update this
print("Estimated filter", w)
print("True filter", htest)
# Minimum error
sigma2d=np.mean(d**2)
mmse=sigma2d-w.dot(Rdu)
print("MMSE: ",mmse)
from correlation import xcorr
from scipy.linalg import toeplitz
from numpy.linalg import inv
q=5
u=x
d=y
c=xcorr(u,u,maxlags=q)[0][q::] #correlation vector
Ruu=toeplitz(c)
Rdu=xcorr(d,u,maxlags=q)[0][q::]
w=inv(Ruu).dot(Rdu)
print("Estimated filter", w)
print("True filter", htest)
# Minimum error
sigma2d=np.mean(d**2)
mmse=sigma2d-w.dot(Rdu)
print("MMSE: ",mmse)
Finally, it is interesting to adapt the lines above in order to plot the MMSE as a function of $q$.
from correlation import xcorr
from scipy.linalg import toeplitz
from numpy.linalg import inv
u=x
d=y
qmax=18 # max value for q
mmse=np.zeros(qmax) # initialize the vector of errors
for q in range(0,qmax):
    c=xcorr(u,u,maxlags=q)[0][q::] #correlation vector
    Ruu=toeplitz(c)
    Rdu=xcorr(d,u,maxlags=q)[0][q::]
    w=inv(Ruu).dot(Rdu)
    # Minimum error
    sigma2d=np.mean(d**2)
    mmse[q]=sigma2d-w.dot(Rdu)
print("MMSE: ", mmse)
plt.plot(range(0,qmax),mmse)
plt.xlabel("Order of the filter")
plt.ylabel("MMSE")
plt.title("MMSE as a function of the length of the identification filter")
figcaption("MMSE as a function of the length of the identification filter")
The evolution of the MMSE with respect to $q$ shows that the MMSE remains large as long as the length of the identification filter is underestimated. The MMSE falls to a "floor" once the length is equal to or larger than the true value. This offers an easy way to detect an "optimal" order for the identification.
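To automate this detection, one crude possibility is to pick the first order whose MMSE is close to the final floor. The sketch below assumes the `mmse` vector computed above; the tolerance factor of 2 is an arbitrary choice:

```python
# Crude order selection from the MMSE curve computed above
floor = mmse[-1]                           # MMSE floor reached for large orders
q_opt = int(np.argmax(mmse <= 2 * floor))  # first q within a factor 2 of the floor (arbitrary tolerance)
print("Detected order:", q_opt, "i.e.", q_opt + 1, "filter taps")
```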