# Nonlinear Prediction and Noise Reduction

## 11/7/00 Lecture #10 in Physics 505

### Comments on Homework #8 (Chirikov Map)

• Everyone had something that resembles the Chirikov map
• The map is not normally plotted in polar coordinates despite the variables r and θ
• Here are the first few iterations for a square of initial conditions:

• Hard to verify area conservation numerically, but the stretching is very evident
• Easy to verify area conservation analytically:
• r_{n+1} = [r_n - (K/2π) sin(2πθ_n)] mod 1 = f(r, θ)
• θ_{n+1} = [θ_n + r_{n+1}] mod 1 = [θ_n + r_n - (K/2π) sin(2πθ_n)] mod 1 = g(r, θ)
• A_{n+1}/A_n = |det J| = |f_r g_θ - f_θ g_r| = 1
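
Below is a minimal numerical sketch (Python; the value of K and the starting point are arbitrary illustrative choices) that iterates the map as written above and evaluates |det J|, which should be exactly 1 at every point:

```python
import numpy as np

K = 1.0  # kick strength (arbitrary illustrative value)

def chirikov(r, theta):
    """One iteration of the map as written above (both variables taken mod 1)."""
    r_next = (r - (K / (2 * np.pi)) * np.sin(2 * np.pi * theta)) % 1.0
    theta_next = (theta + r_next) % 1.0
    return r_next, theta_next

def jacobian_det(theta):
    """|det J| = |f_r g_theta - f_theta g_r|; analytically equal to 1."""
    df_dr, df_dtheta = 1.0, -K * np.cos(2 * np.pi * theta)
    dg_dr, dg_dtheta = df_dr, 1.0 + df_dtheta
    return abs(df_dr * dg_dtheta - df_dtheta * dg_dr)

# Iterate one initial condition and confirm area conservation along the orbit
r, theta = 0.3, 0.7
for n in range(5):
    print(n, round(r, 4), round(theta, 4), jacobian_det(theta))
    r, theta = chirikov(r, theta)
```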

### Review (last week) - Time-Series Properties

• Time-series analysis more art than science
• Allows prediction, noise reduction, physical insight
• Easy to get wrong answers
• Hierarchy of Dynamical Behavior
• Regular
• Quasiperiodic
• Chaotic
• Pseudo-random
• Random
• Examples of Experimental Time Series
• Numerical experiments
• Real moving systems
• Abstract dynamical systems
• Non-temporal sequences
• Practical Considerations
• What variable(s) do you measure?
• How much data do you need?
• How accurately must it be measured?
• What is the effect of filtering?
• What if the system is not stationary?
• Case Study

### Autocorrelation Function

• Calculating the power spectrum is difficult (use a canned FFT or MEM routine; see Numerical Recipes)
• The autocorrelation function is easier and equivalent
• The autocorrelation function and the power spectrum are a Fourier-transform pair (Wiener-Khinchin theorem)
• Let g(τ) = <x(t) x(t+τ)>  (< ... > denotes a time average)
• Note:  g(0) = <x(t)²> is the mean-square value of x
• Normalize:  g(τ) = <x(t) x(t+τ)> / <x(t)²>
• For discrete data:  g(n) = Σ X_i X_{i+n} / Σ X_i²
• Two problems:
• i + n cannot exceed N (the number of data points)
• Spurious correlation if X_av = <X> is not zero
• Use:  g(n) = Σ (X_i - X_av)(X_{i+n} - X_av) / Σ (X_i - X_av)²
• Do the sums above from i = 1 to N - n (a minimal sketch of this calculation follows below)
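
A minimal sketch (Python; the white-noise test record is illustrative) of the mean-removed, normalized autocorrelation defined above:

```python
import numpy as np

def autocorrelation(x, n_max):
    """g(n) = sum (X_i - X_av)(X_{i+n} - X_av) / sum (X_i - X_av)^2, for n = 0..n_max."""
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()               # remove the mean to avoid spurious correlation
    denom = np.sum(dx * dx)         # normalization, so that g(0) = 1
    N = len(x)
    g = np.empty(n_max + 1)
    for n in range(n_max + 1):
        g[n] = np.sum(dx[:N - n] * dx[n:]) / denom   # sum runs from i = 1 to N - n
    return g

# Example: 2000 points of Gaussian white noise; g(n) should be near 0 for n > 0
g = autocorrelation(np.random.normal(size=2000), n_max=50)
print(g[:5])
```
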
• Examples (data records of 2000 points):
• Gaussian white noise
• Logit transform of the logistic equation
• Hénon map
• Sine wave
• Lorenz attractor (x variable, step size 0.05)
• A broad power spectrum gives a narrow correlation function and vice versa [cf. the uncertainty principle, τ·Δf ~ 1]
• Colored (correlated) noise is indistinguishable from chaos
• The correlation time is the width of the g(τ) function (call it τ)
• It's hard to define a unique value of this width
• This curve is really symmetric about τ = 0 (hence the width is 2τ)
• 0.5/τ is sometimes called a "poor-man's Lyapunov exponent"
• Noise:  LE = infinity  ==>  τ = 0
• Logistic map:  LE = ln 2 = 0.693  ==>  τ = 0.72
• Hénon map:  LE = 0.419  ==>  τ = 1.19
• Sine wave:  LE = 0  ==>  τ = infinity
• Lorenz attractor:  LE = 0.906/sec = 0.060/step  ==>  τ = 8.38
• This really only works for τ > 1
• Testing this would make a good student project
• The correlation time is a measure of how much "memory" the system has
• g(1) is the linear correlation coefficient (or "Pearson's r")

• It measures the correlation of each point with the preceding point
• If g(1) = 0 there is no correlation: white (f⁰) noise
• If g(1) ~ 1 there is strong correlation: pink (1/f) noise
• If g(1) < 0 there is anti-correlation: blue (f²) noise (produced by taking the time derivative of white noise)
• From the correlation function g(n), the power spectrum P(f) can be found (a minimal sketch follows below):
• P(f) = 2 Σ g(n) cos(2πfnΔt) Δt  (ref: Tsonis)
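
A minimal sketch (Python; the frequency grid and Δt = 1 are illustrative) of the discrete formula quoted above, reusing the autocorrelation function from the earlier sketch:

```python
import numpy as np

def power_spectrum(g, freqs, dt=1.0):
    """P(f) = 2 * sum_n g(n) cos(2 pi f n dt) dt, following the formula above."""
    n = np.arange(len(g))
    return np.array([2.0 * dt * np.sum(g * np.cos(2.0 * np.pi * f * n * dt))
                     for f in freqs])

# White noise should give an approximately flat (broad) spectrum
g = autocorrelation(np.random.normal(size=2000), n_max=50)   # from the sketch above
freqs = np.linspace(0.0, 0.5, 101)                           # up to the Nyquist frequency
print(power_spectrum(g, freqs)[:5])
```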

### Time-Delayed Embeddings

• How do you know what variable to measure in an experiment?
• How many variables do you have to measure?
• The wonderful answer is that (usually) it doesn't matter!
• Example (Lorenz attractor - HW #10):
• Plot of y versus x
• Plot of dx/dt versus x
• Plot of x(t) versus x(t - 0.1)
• These look like three views of the same object
• They are "diffeomorphisms"
• They have same topological properties (dimension, etc.)
• Whitney's embedding theorem says this result is general
• Takens has shown that D_E = 2m + 1
• m is the smallest dimension that contains the attractor (3 for Lorenz)
• D_E is the minimum time-delay embedding dimension (7 for Lorenz)
• This guarantees a smooth embedding (no intersections)
• This is the price we pay for choosing an arbitrary variable
• Removal of all intersections may be unnecessary
• Recent work has shown that 2m may be sufficient (6 for Lorenz)
• In practice m often seems to suffice (a sketch of constructing a delay embedding follows below)
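
A minimal sketch (Python; the helper name delay_embed and the test signal are hypothetical) of constructing a time-delay embedding from a scalar series:

```python
import numpy as np

def delay_embed(x, dim, lag=1):
    """Return the matrix of delay vectors [x(t), x(t - lag), ..., x(t - (dim-1)*lag)]."""
    x = np.asarray(x, dtype=float)
    rows = len(x) - (dim - 1) * lag
    return np.column_stack([x[(dim - 1 - k) * lag:(dim - 1 - k) * lag + rows]
                            for k in range(dim)])

# Example: a 3-dimensional embedding of a sampled sine wave
x = np.sin(0.1 * np.arange(1000))
E = delay_embed(x, dim=3, lag=5)
print(E.shape)   # (990, 3)
```
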
• Example (Hénon viewed in various ways):

• There is obvious folding, but topology is preserved
• How do we choose an appropriate embedding dimension D_E?
• Increase D_E until the topology (dimension, etc.) of the attractor stops changing
• This may require more data than you have to do properly
• Saturation of the attractor dimension is usually not sharp
• Example: 3-torus (attractor dimension versus D_E, 1000 points)
• Can also use the method of false nearest neighbors (a sketch follows the example below):
• Find the nearest neighbor to each point in embedding dimension D_E
• Increase D_E by 1 and see how many former nearest neighbors are no longer nearest
• When the fraction of these false neighbors falls to nearly zero, we have found the correct embedding
• Example: 3-torus
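
A minimal sketch (Python; a simplified variant of the test rather than the standard criterion, with an illustrative quasiperiodic test signal) of counting false nearest neighbors, reusing delay_embed from the sketch above:

```python
import numpy as np

def false_neighbor_fraction(x, dim, lag=1):
    """Fraction of points whose nearest neighbor in dimension dim is no longer
    the nearest neighbor in dimension dim + 1."""
    E1 = delay_embed(x, dim, lag)[lag:]     # align times with the higher embedding
    E2 = delay_embed(x, dim + 1, lag)
    n = len(E2)
    false = 0
    for i in range(n):
        d1 = np.linalg.norm(E1[:n] - E1[i], axis=1)
        d2 = np.linalg.norm(E2 - E2[i], axis=1)
        d1[i] = d2[i] = np.inf              # exclude the point itself
        if np.argmin(d1) != np.argmin(d2):  # neighbor changed => false neighbor
            false += 1
    return false / n

# Example: a quasiperiodic signal (2-torus); the fraction should fall near zero by dim = 3
t = np.arange(2000)
x = np.sin(0.03 * t) + np.sin(0.03 * np.sqrt(2.0) * t)
for dim in range(1, 6):
    print(dim, false_neighbor_fraction(x, dim, lag=10))
```
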
• How do we choose an appropriate Δt for sampling a flow?
• In principle, it should not matter
• In practice there is an optimum
• Rule of thumb:  Δt ~ τ / D_E
• Vary Δt until τ is about D_E samples (3 to 7 for Lorenz)
• A better method is to use minimum mutual information (see Abarbanel)

### Summary of Important Dimensions

• Configuration space (number of independent dynamical variables)
• Solution manifold (the space in which the solution "lives" - an integer)
• Attractor dimension (fractional if it's a strange attractor)
• Kaplan-Yorke (Lyapunov) dimension
• Hausdorff dimension
• Capacity dimension (see below)
• Information dimension
• Correlation dimension (next week)
• ... (infinitely many more)
• Observable (1-D for a univariate time series: X_i)
• Reconstructed (time-delayed) state space (can be chosen arbitrarily)
• Time-delayed embedding (the minimum time-delayed state space that preserves the topology of the solution)

### Nonlinear Prediction

• There are many forecasting (prediction) methods:
• Extrapolation (fitting data to a function)
• Moving average (MA) methods
• Linear autoregression (ARMA)
• State-space averaging (see below)
• Principal component analysis (PCA), also called singular value decomposition (SVD)
• Machine learning / AI (neural nets, genetic algorithms, etc.)
• Conventional linear prediction methods apply in the time domain
• Fit the data to a mathematical function (polynomial, sine, etc.)
• The fitted function is usually not linear, but the equations governing the dynamics are assumed to be linear (hence no chaos)
• Evaluate the function at some future time
• This works well for smooth and quasi-periodic data
• It (usually) fails badly for chaotic data
• Nonlinear methods usually apply in state space
• Lorenz proposed this method for predicting the weather
• Example (predicting next term in Hénon map - HW # 11):
• We know X_{n+1} = 1 - C X_n² + B X_{n-1}
• In a 2-D embedding, the next value is unique

• Find the M nearest points in X_n-X_{n-1} space
• Calculate their average displacement ΔX = <X_{n+1} - X_n>
• Use ΔX to predict the next value in the time series
• Repeat as necessary to get future time steps (a minimal sketch of this procedure follows below)
• Sensitive dependence will eventually spoil the method
• Growth of prediction error crudely gives the Lyapunov exponent
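
A minimal sketch (Python; M, the record length, and the map parameters B = 0.3, C = 1.4 are illustrative) of the state-space averaging predictor described above, applied to a Hénon-map time series:

```python
import numpy as np

def henon_series(n, b=0.3, c=1.4, x0=0.1, x1=0.1):
    """Generate a time series from X_{n+1} = 1 - C X_n^2 + B X_{n-1}."""
    x = [x0, x1]
    for _ in range(n - 2):
        x.append(1.0 - c * x[-1] ** 2 + b * x[-2])
    return np.array(x)

def predict_next(x, M=10):
    """Predict the next value by averaging the displacements of the M nearest
    neighbors of the current point in (X_n, X_{n-1}) space."""
    pts = np.column_stack([x[1:-1], x[:-2]])         # (X_n, X_{n-1}) for past points
    current = np.array([x[-1], x[-2]])
    d = np.linalg.norm(pts - current, axis=1)
    nearest = np.argsort(d)[:M]                      # indices of the M nearest neighbors
    dX = np.mean(x[nearest + 2] - x[nearest + 1])    # average displacement <X_{n+1} - X_n>
    return x[-1] + dX

x = henon_series(2000)
print(predict_next(x), 1.0 - 1.4 * x[-1] ** 2 + 0.3 * x[-2])   # predicted vs. true next value
```
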
• Example (Hénon map average error):
• If LE = 0.604 bits/iteration, the error should double every 1.7 iterations
• Saturation occurs once the error has grown to the size of the attractor
• The method can also remove some noise
• Predict all points, not just the next point
• Need to choose D_E and M optimally
• An alternate, related method is to construct a function approximation f(X_n, X_{n-1}, ...)
• This improves noise reduction but is less accurate
• Best method is to make a local function approximation
• Usually linear or quadratic functions are used
• This offers best of both worlds but is hard to implement and slow

### Lyapunov Exponent of Experimental Data

• We previously calculated largest LE from known equations
• Getting the LE from experimental data is much more difficult (canned routines are recommended; see Wolf)
• Finding a value for LE may not be very useful
• Noise and chaos both have positive LEs (LE = infinity for white noise)
• Quasiperiodic dynamics have zero LE, but there are better ways to detect it (look for discrete power spectrum)
• The value obtained is usually not very accurate
• Conceptually, it's easy to see what to do:
• Find two nearby points in an embedding DE
• Follow them a while and calculate <log(rate of separation)>
• Repeat with other nearby points until you get a good average (a minimal sketch appears at the end of this section)
• There are many practical difficulties:
• How close do the points have to be?
• What if they are spuriously close because of noise?
• What if they are not oriented in the right direction?
• How far can they safely separate?
• What do you do when they get too far apart?
• How many pairs must be followed?
• How do you choose the proper embedding?
• It's especially hard to get exponents other than the largest
• The sum of the positive exponents is called the entropy
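
A minimal sketch (Python; this follows the conceptual recipe above rather than the Wolf algorithm, and the parameters follow and min_sep are illustrative), reusing henon_series from the prediction sketch:

```python
import numpy as np

def largest_le(E, follow=5, min_sep=10):
    """Crude largest-LE estimate: average the log rate of separation of initially
    nearby points in an embedded time series E (one row per delay vector)."""
    n = len(E) - follow
    rates = []
    for i in range(n):
        d = np.linalg.norm(E[:n] - E[i], axis=1)
        d[max(0, i - min_sep):i + min_sep] = np.inf    # exclude temporally close points
        j = int(np.argmin(d))                          # nearest neighbor in the embedding
        d1 = np.linalg.norm(E[i + follow] - E[j + follow])
        if d[j] > 0 and d1 > 0:
            rates.append(np.log(d1 / d[j]) / follow)   # log separation rate per step
    return np.mean(rates)

# Hénon series in a 2-D delay embedding (cf. the value LE = 0.419 per iteration quoted above)
x = henon_series(2000)                                 # from the prediction sketch
E = np.column_stack([x[1:], x[:-1]])
print(largest_le(E))
```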

### Capacity Dimension

• The most direct indication of chaos is a strange attractor
• Strange attractors will generally have a low, non-integer dimension
• There are many ways to define and calculate the dimension
• We already encountered the Kaplan-Yorke dimension, but it requires knowledge of all the Lyapunov exponents
• A more direct method is to calculate the capacity dimension (D_0)
• Capacity dimension is closely related to the Hausdorff dimension
• It is also sometimes called the "cover dimension"
• Consider data representing a line and a surface embedded in 2-D
• The number of squares N of size ε required to cover the line (1-D) is proportional to 1/ε
• The number of squares N of size ε required to cover the surface (2-D) is proportional to 1/ε²
• The number of squares N of size ε required to cover a fractal (dimension D_0) is proportional to 1/ε^D_0
• Hence the fractal dimension is given by  D_0 = d log(N) / d log(1/ε)
• This is equivalent to  D_0 = -d log(N) / d log(ε)
• Plot log(N) versus log(ε) and take the slope to get D_0
• Example (2000 data points from the Hénon map with D_E = 2)
• This derivative should be taken in the limit ε --> 0
• The idea can be generalized to D_E > 2 using (hyper)cubes
• Many data points are required to get a good result
• The number required increases exponentially with D_0 (a minimal box-counting sketch follows below)
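
A minimal sketch (Python; the box sizes are illustrative, and henon_series comes from the prediction sketch) of the box-counting estimate of D_0 for data embedded in 2-D:

```python
import numpy as np

def capacity_dimension(points, sizes):
    """Estimate D_0 as the slope of log(N) versus log(1/eps), where N is the
    number of boxes of side eps occupied by the data points."""
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    counts = []
    for eps in sizes:
        boxes = np.floor((points - lo) / eps).astype(int)   # box index of each point
        counts.append(len({tuple(b) for b in boxes}))       # number of occupied boxes
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope

# 2000 Hénon points in a 2-D embedding; the accepted value is about 1.26,
# and the modest data record makes the estimate only approximate
x = henon_series(2000)
pts = np.column_stack([x[1:], x[:-1]])
print(capacity_dimension(pts, sizes=[0.2, 0.1, 0.05, 0.025]))
```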
