Experiment Design
Ron Graham
Design of Experiments, or DOE, is the practice of applying statistical knowledge to help you

  • Maximize the information gleaned from a given number of experiments.
  • Minimize the number of experiments required to validate (or invalidate) a hypothesis.

To take advantage of DOE principles, what you need is basically a good problem definition. A good one will include

  • an agreed-upon objective
  • one or more hypotheses to examine
  • a selection of measurements/response variables/"Ys"
  • a selection of factors/predictor variables/"Xs"
  • a desired degree of certainty for everything that's uncertain when you start
  • records of relevant material properties and machine specs
  • predictions of results (to the extent possible)
  • "buy-in" to the design from all participants in the experiment

Each hypothesis you will examine falls into one of two categories:

  • significant differences in Ys due to selection of some X
  • significant relationships between some Y and some Xs

Determining significant relationships generally involves performing regression analysis (you might call this "least-squares") on the data; the number of coefficients you can fit in the relationship depends on the number of levels an X may take on. Determining significant differences involves performing an analysis of variance of some sort on the data.
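
As a rough illustration of the "least-squares" idea, here is a minimal sketch that fits a straight line to one Y against one X. The function name `fit_line` and the data are made up for the example; they are not from any of the references. With only two levels of X a straight line is all you can fit; the three levels used here would also support a curvature term, which this sketch leaves out.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squares of X about its mean, and cross-products of X and Y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: one X at three levels, two replicates per level.
xs = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
ys = [2.1, 1.9, 4.2, 3.8, 6.1, 5.9]

intercept, slope = fit_line(xs, ys)
print(f"Y = {intercept:.3f} + {slope:.3f} * X")
```

Whether the fitted slope is significant is a separate question that still needs a test against the scatter in the replicates.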

The concerns in DOE are as follows:

  1. Population size, meaning the number of observations in each data set. This goes to the power of the test. Too small a population can cause you to imagine a significant difference between sets of data where none exists, or to imagine no difference exists where there really is one.

  2. Components of variation. What should be allowed to vary in your experiment (i.e. occurs naturally)? And what shouldn't (i.e. you have to fix it or account for it)? If you have control over some component of variation, you may be called upon to use that control; if you don't, you may be called upon to block out some component of variation.

  3. Randomization. To remove time-varying effects, you select random times to run tests.

  4. Blocking. To remove known or suspected biases, you arrange the tests in such a way that you can be sure you're running them both with and without the cause of the bias present.

The actual design in DOE involves using the statistical and intuitive tools at your disposal to define the four concerns above. For instance, a desired error rate may be used to determine a number of blocks; a desired magnitude of discernible difference between data sets may be used to determine population size.
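
Randomization and blocking can be combined in a run plan. The sketch below is one possible way to do it, not a prescribed method: every treatment is run once in every block, and the run order is shuffled independently inside each block. The function name, the treatment labels, and the choice of "day" as the blocking variable are all assumptions made for the example.

```python
import random

def randomized_block_plan(treatments, blocks, seed=None):
    """Return a list of (block, treatment) runs.

    Each treatment appears once per block (blocking), and the order
    within each block is shuffled (randomization).
    """
    rng = random.Random(seed)  # a seed makes the plan reproducible
    plan = []
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        plan.extend((block, treatment) for treatment in order)
    return plan

# Hypothetical case: three settings of some X, run on two different days.
plan = randomized_block_plan(["A", "B", "C"], ["day 1", "day 2"], seed=1)
for block, treatment in plan:
    print(block, treatment)
```

Because every treatment sees every day, a day-to-day bias shows up as a block effect instead of being mistaken for a treatment effect.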

If you have two sets of data, here's how to compare them to one another:

  • Check the sets for normality. If they aren't all normally distributed, then you can't do the rest of this stuff until you transform the data to make them so. Once they are, find the mean and standard deviation of each set.

  • Perform an F-test on the standard deviations to see if they are significantly different from one another.

    • If they are, check for heteroscedasticity: are the standard deviations a function of the means? Are there any outliers? You may need to recheck the way the experiment was performed if heteroscedasticity arises -- if it wasn't done right, or if data were recorded incorrectly, well, there's your outlier. :-)
    • If they aren't, you get to "pool" the data.

  • Perform an appropriate t-test on the means (meaning that the right test depends on whether the standard deviations are significantly different) to determine significant differences.
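
The arithmetic behind the last two steps can be sketched with the standard library alone. This is a sketch under assumptions: the data are made up, the normality check is skipped, and Welch's form is used here as the t-test for the unequal-variance case (the text above does not name a specific one). The functions return test statistics and degrees of freedom only; you would still compare them against F and t tables (or a statistics package) at your chosen error rate.

```python
from math import sqrt
from statistics import mean, variance

def variance_ratio(a, b):
    """F statistic (larger sample variance over smaller) and its degrees of freedom."""
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

def pooled_t(a, b):
    """t statistic using a pooled variance (standard deviations not significantly different)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """t statistic without pooling (Welch), with approximate degrees of freedom."""
    na, nb = len(a), len(b)
    se2_a, se2_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical measurements from two settings of one X.
set_a = [10.1, 9.8, 10.3, 10.0, 9.9]
set_b = [10.6, 10.4, 10.9, 10.5, 10.7]

f_stat, df_num, df_den = variance_ratio(set_a, set_b)
print("F =", f_stat, "with", df_num, "and", df_den, "degrees of freedom")
print("pooled t, df:", pooled_t(set_a, set_b))
print("Welch t, df:", welch_t(set_a, set_b))
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ; with unequal sizes the choice matters more.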

When you have more than two data sets, the procedure is roughly the same, except that the comparison procedures for standard deviations (Foster-Burr test) and for means (Student-Newman-Keuls if not pooled, 1-way ANOVA if pooled) must take into account the multiplicity of data sets.
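
For the pooled case, the 1-way ANOVA boils down to comparing the variation between set means with the variation within the sets. Here is a minimal sketch of that F statistic; the function name and the data are illustrative assumptions, and the result still has to be compared against an F table for the stated degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of data sets."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the set means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation about its own set mean (pooled).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical data: three settings of one X, three replicates each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print("F =", f_stat, "with", df_between, "and", df_within, "degrees of freedom")
```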

An "optimal" sample size for the data sets depends on

  • how big an effect (i.e. difference in means) you want to detect
  • how much you want to risk missing it (type II error rate)
  • how much you want to risk seeing it when it isn't really there (type I error rate)
  • how much variation you expect in your final results (pooled standard deviation).
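
One common textbook approximation ties these four quantities together for a two-sided comparison of two means: n per set is about 2 * ((z_alpha + z_beta) * sigma / delta)^2, where alpha is the risk of declaring a difference that isn't there and beta is the risk of missing one that is. The sketch below uses that normal approximation; it is an assumption of this example, not a formula taken from the references, and it slightly understates n for small samples. The parameter names are likewise made up.

```python
from math import ceil
from statistics import NormalDist

def samples_per_group(delta, sigma, alpha=0.05, beta=0.20):
    """Approximate sample size per data set for a two-sided, two-set comparison.

    delta: smallest difference in means you want to detect
    sigma: expected (pooled) standard deviation
    alpha: risk of a false alarm; beta: risk of missing a real difference
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detecting a difference of one standard deviation needs far fewer runs
# than detecting half a standard deviation.
print(samples_per_group(delta=1.0, sigma=1.0))  # 16
print(samples_per_group(delta=0.5, sigma=1.0))  # 63
```

Halving the difference you want to detect roughly quadruples the number of runs, which is usually what pushes people toward fractional designs.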

So how many experiments do you run? If you know an "optimal" sample size, and can run the experiment over and over for each combination of settings for each X, then you have what is called a full factorial experiment. If instead this turns out to involve more data taking than you can afford, then you must find a way to "fractionalize" the factorial experiment. When you do this, you lose some accuracy in your results.

One way you can do this is to recognize that a factorial experiment involves orthogonal arrays. For instance: if each X has two settings, you can pretend those settings are +1 and -1. Then, if you take the "dot product" of the settings for any two Xs over all the runs, the "dot product" is zero. Simple example:

X1 -- +1 +1 -1 -1
X2 -- +1 -1 -1 +1

Two parameters, two settings each, four runs of the experiment, dot product of settings is zero. A fractional factorial experiment would take only part of this information, changing the settings of more than one X at a time. For instance,

X1 -- +1 -1
X2 -- -1 +1
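
The dot-product check is easy to automate. The sketch below (helper names are made up for the example) builds the full two-level factorial for any number of Xs and tests whether every pair of columns has a zero dot product. It also runs the same check on the two-run subset shown above, which does not keep that property: with only those two runs, X1 and X2 always change together, so their effects can no longer be told apart.

```python
from itertools import combinations, product

def full_factorial(n_factors):
    """All combinations of +1/-1 settings, one tuple per run."""
    return list(product((+1, -1), repeat=n_factors))

def is_orthogonal(runs):
    """True if the settings columns of every pair of Xs have a zero dot product."""
    cols = list(zip(*runs))  # one column of settings per X
    return all(
        sum(a * b for a, b in zip(c1, c2)) == 0
        for c1, c2 in combinations(cols, 2)
    )

full = full_factorial(2)          # four runs: (1, 1), (1, -1), (-1, 1), (-1, -1)
fraction = [(+1, -1), (-1, +1)]   # the two-run subset from the text

print(is_orthogonal(full))        # True
print(is_orthogonal(fraction))    # False
```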

Taguchi is one expert who recommends highly fractional factorial experiments. His orthogonal arrays may involve a small percentage of the total runs involved in a full factorial experiment, and thus can be run quickly and at low cost. The tradeoff involved in Taguchi design is as follows:

  1. You lose the effects of interactions between Xs, which must be assumed not to exist.
  2. Taguchi assumes processes that are already under statistical control, and used in a context of continuous improvement. His design therefore assumes that, even though you can't be quite sure of the results, they're only provisional anyway.
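
The first tradeoff can be seen directly in a small case. The sketch below builds a standard half-fraction for three two-level Xs in four runs by setting the third X equal to the product of the first two. This construction is a textbook one chosen for illustration; it is not copied from Taguchi's tables. The main-effect columns stay orthogonal, but the column you would use to estimate the X1-X2 interaction is identical to the X3 column, so the two cannot be separated.

```python
from itertools import combinations, product

# Four runs covering every combination of X1 and X2.
base = list(product((-1, +1), repeat=2))

# Half-fraction of the eight-run full factorial: X3 is set to X1 * X2.
half_fraction = [(x1, x2, x1 * x2) for x1, x2 in base]

x1_col, x2_col, x3_col = (list(col) for col in zip(*half_fraction))
interaction_12 = [a * b for a, b in zip(x1_col, x2_col)]

# Main-effect columns are still mutually orthogonal...
for c1, c2 in combinations((x1_col, x2_col, x3_col), 2):
    print(sum(a * b for a, b in zip(c1, c2)))  # 0 each time

# ...but the X1*X2 interaction column is the same as the X3 column.
print(interaction_12 == x3_col)  # True
```

If the interaction really is negligible, the four runs give clean main effects at half the cost; if it isn't, it gets silently credited to X3.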

Tuve says most engineering experiments are verifications; that findings should be planned and predicted. This is often only possible in a general sense: you know what a system is supposed to do; you don't know how well it'll do it, or for how long, or with what side-effects.

References

Juran, J. M. Quality Control Handbook. ISBN 0-07033-176-6
Keller, D. Introduction to DOE. Training course manual. Cleveland: Real World Quality Systems, 1993.
Tuve, G. Engineering Experimentation. NYC: McGraw-Hill, 1961. ISBN 0-07065-595-2

