Chapter 1
The change-point problem

1

Abstract

We describe an explicit procedure for detecting a change-point in the Binomial and Gaussian cases.

Source: J. Bucklew, Large Deviation Techniques in Decision, Simulation and Estimation. Wiley, New York, 1990.

1  Introduction

The key question in a quality control problem is: has a change occurred? For example, one observes some parameter of a manufactured device and determines whether the device should be accepted or rejected. Even when the assembly line is performing well, a certain percentage of devices will be rejected. When the assembly line goes awry because of some sort of failure, the first sign will often be a higher percentage of defective devices. One wants to detect this change-point as quickly as possible in order to shut down production and fix the assembly line. But of course one does not want to shut it down unless it is absolutely necessary.

Change-point problems also arise in segmentation of speech (Milosavljevic 1988), edge detection in images (Basseville 1981), EKG analysis (Gustafson 1978), and link failure in communication networks (Kazakos 1979).

2  Statement of problem

Suppose that we observe $x_1, x_2, \dots$, one item at a time. We assume that each $x_i$ follows one of two densities: $g_0$ (the good one) or $g_1$ (the failure). At time $n$ we have accumulated a sample of size $n$. The null hypothesis is that the change has not occurred. The alternative is that the change has already occurred.

The simplest formalization of this is to choose the following two hypotheses to test

$H_0$: $(x_i)$ are distributed with density function $g_0(x)$.

$H_a$: $(x_i)_{i \le k}$ are distributed with density function $g_0(x)$ and $(x_i)_{i > k}$ are distributed with density $g_1(x)$.

Notice that this corresponds to a batch-control regime: we check the assembly line every $n$ units of time and decide whether a change-point has occurred. Another regime that could be employed is dynamic: at every instant $n$, check whether a change-point has occurred.

Question: Are these two regimes equivalent? Are the optimal procedures essentially identical?

3  Large Deviation Rate

In order to reject the null hypothesis, we plot certain empirical curves and compare them to "control limits": we decide that a change-point has occurred when the empirical curve crosses a control line. To construct the control line we use the large deviation analysis from Theorem .

Suppose first that the change-point can occur only at a specific position $nt$, where $0 < t < 1$. Then we have to test between two simple hypotheses: is the density $g_0$ or $g_1$ for $j \ge nt$? By the Neyman-Pearson Lemma, the optimal test in this situation is the likelihood ratio test: we compute the normalized log-likelihood ratio

$$l_n(t) = \frac{1}{n}\sum_{j=nt}^{n} \log\frac{g_1(x_j)}{g_0(x_j)},$$

and reject the null hypothesis when $l_n(t)$ exceeds a critical value $b(t) = b_\alpha(t)$ corresponding to the significance level $\alpha$.

Example 1

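Suppose that $g_0$ and $g_1$ are normal densities with means $m_0$, $m_1$ and common variance $\sigma^2$. In this case, the log-likelihood simplifies to

$$l_n(t) = \frac{1}{n}\sum_{j=nt}^{n} \log\frac{g_1(x_j)}{g_0(x_j)} = \frac{1}{2\sigma^2}\cdot\frac{1}{n}\sum_{j=nt}^{n}\left(-(x_j-m_1)^2 + (x_j-m_0)^2\right) = \frac{1}{2\sigma^2}\left((1-t)(m_0^2-m_1^2) + 2(m_1-m_0)\,\frac{1}{n}\sum_{j=nt}^{n} x_j\right),$$

where the sum is taken to have $n(1-t)$ terms. Thus for $m_1 > m_0$ the test reduces to checking whether $\frac{1}{n}\sum_{j=nt}^{n} x_j$ exceeds a critical value $b(t)$. Of course, the critical value $b(t)$ for the averages is not identical to the critical value for the log-likelihoods, but they are closely related.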
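As a quick numerical check of this simplification, the following Python sketch evaluates the tail log-likelihood ratio both directly and through the linear form. It assumes normal densities with a common variance `sigma**2`, so the ratio carries a factor $1/(2\sigma^2)$. The function names and the 0-based start index `k` (standing in for $nt$) are illustrative choices, not notation from the source.

```python
import math

def loglik_normal(xs, k, m0, m1, sigma=1.0):
    """Direct form: (1/n) * sum over the tail xs[k:] of log(g1(x)/g0(x))."""
    n = len(xs)
    return sum((-(x - m1) ** 2 + (x - m0) ** 2) / (2 * sigma ** 2)
               for x in xs[k:]) / n

def loglik_normal_linear(xs, k, m0, m1, sigma=1.0):
    """Linear form: a constant term plus a multiple of the tail average."""
    n = len(xs)
    tail = xs[k:]
    frac = len(tail) / n   # plays the role of 1 - t
    avg = sum(tail) / n    # (1/n) * sum of the tail observations
    return (frac * (m0 ** 2 - m1 ** 2) + 2 * (m1 - m0) * avg) / (2 * sigma ** 2)

xs = [0.3, -1.2, 0.8, 1.9, 0.4, 2.1]   # made-up observations
direct = loglik_normal(xs, 2, 0.0, 0.5)
linear = loglik_normal_linear(xs, 2, 0.0, 0.5)
print(direct, linear)
```

Both values agree up to rounding, which is why the test can be run on the tail averages alone.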

As a second example, consider the binomial experiment.

Example 2

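Here each $x_j$ is 0 or 1, with success probability $p_0$ under $g_0$ and $p_1$ under $g_1$, so

$$\frac{g_1(x_j)}{g_0(x_j)} = \left(\frac{p_1}{p_0}\right)^{x_j}\left(\frac{1-p_1}{1-p_0}\right)^{1-x_j},$$

and the log-likelihood is

$$l_n(t) = (1-t)\log\frac{1-p_1}{1-p_0} + \log\frac{p_1(1-p_0)}{p_0(1-p_1)}\cdot\frac{1}{n}\sum_{j=nt}^{n} x_j.$$

For $p_1 > p_0$, the test again reduces to rejecting the null hypothesis when $\hat p_n(t) := \frac{1}{n}\sum_{j=nt}^{n} x_j > b(t)$.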
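The same kind of check can be sketched for the binomial case. The Python code below compares the direct sum of log-ratios with the linear form in the tail average. Note that the constant term is multiplied by the tail fraction (the role of $1-t$), because the sum has about $n(1-t)$ terms. The function names and the 0-based start index `k` are hypothetical, chosen only for illustration.

```python
import math

def loglik_bernoulli(xs, k, p0, p1):
    """Direct form: (1/n) * sum over the tail xs[k:] of log(g1(x)/g0(x)), x in {0, 1}."""
    n = len(xs)
    a = math.log(p1 / p0)              # contribution of a success
    b = math.log((1 - p1) / (1 - p0))  # contribution of a failure
    return sum(x * a + (1 - x) * b for x in xs[k:]) / n

def loglik_bernoulli_linear(xs, k, p0, p1):
    """Linear form: (1-t) * log((1-p1)/(1-p0)) + log-odds ratio * tail average."""
    n = len(xs)
    tail = xs[k:]
    frac = len(tail) / n   # plays the role of 1 - t
    p_hat = sum(tail) / n  # (1/n) * number of successes in the tail
    log_odds_ratio = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    return frac * math.log((1 - p1) / (1 - p0)) + log_odds_ratio * p_hat

xs = [0, 1, 0, 0, 1, 1, 0, 1]   # made-up 0/1 observations
print(loglik_bernoulli(xs, 3, 0.1, 0.3), loglik_bernoulli_linear(xs, 3, 0.1, 0.3))
```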

Suppose now that the change-point can occur at any location $tn$. Using a cross-over curve $b$, we reject $H_0$ when there is $t \in (0,1)$ such that $l_n(t) > b(t)$. The probability of Type I error is the cross-over probability for the random curve $\bar X_n(t) = l_n(1-t)$. That is, the level of significance of a test based on the curve $b$ is

$$\alpha \asymp \exp\left(-n \inf_t\, t\, i\!\left(\frac{b(t)}{t}\right)\right),$$

where $i(x) = \sup_u\{ux - L(u)\}$ and

$$L(u) = \log\int e^{u\log\frac{g_1(x)}{g_0(x)}}\, g_0(x)\,dx = \log\int g_1^{u}(x)\, g_0^{1-u}(x)\,dx.$$
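To make the definitions of $L(u)$ and $i(x)$ concrete, here is a small Python sketch for one assumed special case: Bernoulli densities with success probabilities `p0` and `p1`, where the integral becomes a two-term sum. The supremum is approximated by a ternary search over a bounded range of $u$. This is an illustrative numerical shortcut, and the search bounds and iteration count are arbitrary assumptions.

```python
import math

def cgf(u, p0, p1):
    """L(u) = log sum_x g1(x)^u * g0(x)^(1-u) for Bernoulli(p0) versus Bernoulli(p1)."""
    return math.log(p1 ** u * p0 ** (1 - u) + (1 - p1) ** u * (1 - p0) ** (1 - u))

def rate(x, p0, p1, lo=-50.0, hi=50.0, iters=200):
    """Approximate i(x) = sup_u {u*x - L(u)} by ternary search on [lo, hi].

    The objective is concave in u, so ternary search finds its maximum over
    the interval. If the true supremum is infinite, this returns only the
    (large) value attained at the boundary of the search range.
    """
    def objective(u):
        return u * x - cgf(u, p0, p1)
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if objective(b) > objective(a):
            lo = a   # maximum lies to the right of a
        else:
            hi = b   # maximum lies to the left of b
    return objective((lo + hi) / 2)

p0, p1 = 0.1, 0.3
# Mean of log(g1/g0) under g0: the rate should vanish here.
mean = p0 * math.log(p1 / p0) + (1 - p0) * math.log((1 - p1) / (1 - p0))
print(rate(mean, p0, p1), rate(0.0, p0, p1))
```

Under $g_0$ the rate vanishes at the mean of $\log(g_1/g_0)$ and is positive away from it, which is what makes crossing a curve above the mean exponentially unlikely.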

One natural choice of $b(t)$ is to treat all crossovers equally:

$$t\, i\!\left(\frac{b(t)}{t}\right) = -\frac{1}{n}\log\alpha,$$

see ().

Question: Another natural choice is to select $b$ to maximize the power of the test. Is this doable? Is it the same?

3.1  Example: Normal data

Rather than using "generic theory", it is more convenient to recast the question directly in terms of partial averages, computed from the last data first. Thus we are asking when $Y_n(t) = \frac{1}{n}\sum_{j=nt}^{n} X_j > b(t)$. If we know $m_0$ and $\sigma$, then without loss of generality we may assume that $m_0 = 0$, $\sigma = 1$; otherwise, we can standardize the data before performing the test. After reversing the order of $t$, we choose $b(t) = \sqrt{\frac{2(1-t)\log(1/p)}{n}}$ from (), where $p$ plays the role of the significance level. Figure 1 illustrates how the test works.
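The control curve above follows in one step from the equal-treatment condition, under the assumption (standard for $N(0,1)$ data, though not spelled out in the text) that the rate function of the averages is $i(x) = x^2/2$. Writing $s = 1-t$ for the fraction of observations in the tail and $p$ for the significance level:

$$s\, i\!\left(\frac{b}{s}\right) = \frac{b^2}{2s} = \frac{1}{n}\log\frac{1}{p} \quad\Longrightarrow\quad b = \sqrt{\frac{2s\log(1/p)}{n}} = \sqrt{\frac{2(1-t)\log(1/p)}{n}}.$$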

Figure 1: Detecting change-points by graphing $t \mapsto \frac{1}{n}\sum_{j=nt}^{n} X_j$. In this simulation of four runs of an experiment with $n = 20$, some change-points were not detected. The actual change-points were: D at 5, E at 15, H none, K at 15.
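The normal-data test of this subsection can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the figure: it assumes the data are already standardized ($m_0 = 0$, $\sigma = 1$), uses a 0-based start index `k` in place of $nt$, and the function name `normal_crossings` is hypothetical.

```python
import math

def normal_crossings(xs, alpha):
    """Return the 0-based tail start indices k at which the partial average
    (1/n) * (xs[k] + ... + xs[n-1]) exceeds the control curve
    sqrt(2 * (1 - t) * log(1/alpha) / n), where 1 - t = (n - k) / n.
    """
    n = len(xs)
    hits = []
    tail_sum = 0.0
    for k in range(n - 1, -1, -1):   # last data first
        tail_sum += xs[k]
        m = n - k                    # number of terms in the tail
        y = tail_sum / n             # partial average Y_n(t)
        b = math.sqrt(2 * (m / n) * math.log(1 / alpha) / n)
        if y > b:
            hits.append(k)
    return hits

# Made-up data: ten zeros followed by ten observations equal to 2.
xs = [0.0] * 10 + [2.0] * 10
print(normal_crossings(xs, 0.1))
```

An empty list means the curve was never crossed, so the null hypothesis is not rejected. A non-empty list gives the tail start positions at which a crossing occurred.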

3.2  Example: Binomial data

Rather than using generic theory, it is more convenient to work with "sample proportions" $\bar X_n(t) = \frac{1}{n}\sum_{j=nt}^{n} X_j$ and compare them with the cutoff curve $b(t)$. Since $i(x) = x\log\frac{x}{p_0} + (1-x)\log\frac{1-x}{1-p_0}$, equation () is a bit more complicated here. Instead of solving the nonlinear equation for $b(t)$ and then checking whether $\bar X_n(t) > b(t)$, we can use the fact that $i(x)$ is non-decreasing for $x > p_0$. This means that with a little bit of care we can base our test on the graph of $t \mapsto (1-t)\, i\!\left(\frac{\bar X_n(t)}{1-t}\right)$ and compare it to the constant cutoff level $\frac{1}{n}\log\frac{2}{\alpha}$. Notice that we take a two-tail significance level with $\alpha/2$ per tail.
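A minimal Python sketch of this binomial test follows. It assumes 0/1 data, a known $p_0$, and the convention $0\log 0 = 0$. The function names and the 0-based start index `k` (in place of $nt$) are illustrative assumptions. Note that $\bar X_n(t)/(1-t)$ is simply the proportion of successes within the tail.

```python
import math

def kl_bernoulli(x, p0):
    """i(x) = x*log(x/p0) + (1-x)*log((1-x)/(1-p0)), with 0*log(0) taken as 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / p0)
    if 1 - x > 0:
        total += (1 - x) * math.log((1 - x) / (1 - p0))
    return total

def binomial_crossings(xs, p0, alpha):
    """Return the 0-based tail start indices k where (1-t) * i(tail proportion)
    exceeds the constant cutoff (1/n) * log(2/alpha), with 1 - t = (n - k) / n.
    """
    n = len(xs)
    cutoff = math.log(2 / alpha) / n
    hits = []
    successes = 0
    for k in range(n - 1, -1, -1):   # last data first
        successes += xs[k]
        m = n - k                    # number of terms in the tail
        stat = (m / n) * kl_bernoulli(successes / m, p0)
        if stat > cutoff:
            hits.append(k)
    return hits

# Made-up data: thirty failures followed by ten successes, with p0 = 0.1.
xs = [0] * 30 + [1] * 10
print(binomial_crossings(xs, 0.1, 0.1))
```

Because $i(x)$ grows on both sides of $p_0$, this statistic flags tails with unusually many successes and tails with unusually few, which matches the two-tail level used above. Restricting attention to tail proportions above $p_0$ would give a one-sided version.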

4  Exercises

Exercise 1 Consider the following three sets of normal observations $\{x_k\}$:

(These data are also on the web at http://math.uc.edu/ brycw/classes/576)

Two of these change from $N(0,1)$ to $N(1/2,1)$ at some point. Can you find which two? (Since these sets have small sample sizes, and we are dealing with rough asymptotics, use the level of significance $\alpha = 0.1$.)


Footnotes:

1 Printed: Feb 16, 2000

