Design of Experiments, or DOE, is a procedure that applies
statistical knowledge to help you
- maximize the information gleaned from a given number of
experiments.
- minimize the number of experiments required to validate
(or invalidate) a hypothesis.
To take advantage of DOE principles, what you need most is
a good problem definition. A good one will include
- an agreed-upon objective
- one or more hypotheses to examine
- a selection of measurements/response variables/"Ys"
- a selection of factors/predictor variables/"Xs"
- a desired degree of certainty for everything that's
uncertain when you start
- records of relevant material properties and machine
specs
- predictions of results (to the extent possible)
- "buy-in" to the design from all participants in the
experiment
Each hypothesis you will examine falls into one of two categories:
- significant differences in Ys due to selection of some X
- significant relationships between some Y and some Xs
Determining significant relationships generally involves
performing regression analysis (you might call this
"least-squares") on the data. The number of coefficients you can
fit in the relationship depends on the number of levels an X may
take on. Determining significant differences involves performing
an analysis of variance of some sort on the data.
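As a minimal sketch of the regression side, the following fits a straight line Y = intercept + slope * X by ordinary least squares. The data values and the helper name `fit_line` are invented for illustration. A straight line has two coefficients, so it needs an X with at least two levels; here X takes three.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squares of X and sum of cross-products of X and Y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: one X at three levels, two replicate runs per level.
xs = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
ys = [2.1, 1.9, 4.0, 4.2, 5.9, 6.1]
intercept, slope = fit_line(xs, ys)
print(f"Y = {intercept:.3f} + {slope:.3f} * X")
```

Whether the fitted slope is significant is a separate question that depends on the scatter of the replicates about the line.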
The concerns in DOE are as follows:
- Population size (the number of observations in each set of
data). This determines the power of the test. Too small a
population can cause you to imagine a significant difference
between sets of data where none exists, or to imagine no
difference exists where there really is one.
- Components of variation. What should be
allowed to vary in your experiment (i.e. occurs naturally)?
And what shouldn't (i.e. you have to fix it or account for
it)? If you have control over some component of variation,
you may be called upon to use that control; if you don't,
you may be called upon to block out some component of variation.
- Randomization. To keep time-varying effects from biasing
your results, you run the tests in random order or at random
times.
- Blocking. To remove known or suspected
biases, you arrange the tests in such a way that you can be
sure you're running them both with and without the cause of
the bias present.
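Randomization and blocking can be combined in one run plan. The sketch below is one possible arrangement, not a prescribed method: every treatment is run once in every block, and the order inside each block is shuffled. The function name, the treatment labels and the use of days as blocks are assumptions made for the example.

```python
import random

def randomized_block_plan(treatments, blocks, seed=None):
    """Return a list of (block, treatment) runs.

    Every treatment appears once in every block, and the run order
    inside each block is shuffled independently.
    """
    rng = random.Random(seed)  # a seed makes the plan reproducible
    plan = []
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        plan.extend((block, t) for t in order)
    return plan

# Hypothetical case: three treatments, with "day" as the suspected bias.
plan = randomized_block_plan(["A", "B", "C"], ["day 1", "day 2"], seed=1)
for block, treatment in plan:
    print(block, treatment)
```

Because each day contains all three treatments, a day-to-day shift affects every treatment equally and can be separated from the treatment differences.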
The actual design in DOE involves using the
statistical and intuitive tools at your disposal to settle the
four concerns above. For instance, a desired error rate may be
used to determine a number of blocks; a desired magnitude of
discernible difference between data sets may be used to
determine population size.
If you have two sets of data, here's how to compare them to one
another:
- Check sets for normality (if they aren't all normally
distributed, then you can't do the rest of this stuff
until you transform the data to make them so; once they
are, find the mean and standard deviation of each set).
- Perform an F-test on the variances (the squared standard
deviations) to see if they are significantly different from
one another.
- If they are, check for heteroscedasticity: are the standard
deviations a function of the means? are there any outliers?
You may need to recheck the way the experiment was performed
if heteroscedasticity arises -- if it's not done right, or
if data is recorded incorrectly, well, there's your outlier.
:-)
- If they aren't, you get to "pool" the data.
- Perform an appropriate t-test on the
means (meaning that the right test depends on whether
the standard deviations are significantly different) to
determine significant differences.
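The arithmetic behind the F-test and t-test steps can be sketched as follows. This is an illustration under stated assumptions: the data are invented, the normality check is skipped, and the function only computes the test statistics. Judging significance still means comparing them against tabulated F and t critical values for your chosen error rate. The name `compare_two_sets` is hypothetical.

```python
import math
import statistics

def compare_two_sets(a, b):
    """Compute the F ratio and both forms of the two-sample t statistic."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances

    # F ratio: larger variance over smaller, so the ratio is >= 1.
    f_ratio = max(var_a, var_b) / min(var_a, var_b)

    # Pooled t: use when the F-test finds no significant difference.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    t_pooled = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

    # Unpooled (Welch) t: use when the variances differ significantly.
    t_welch = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)

    return {"f_ratio": f_ratio, "t_pooled": t_pooled, "t_welch": t_welch}

# Hypothetical measurements from two settings of one X.
a = [10.1, 9.8, 10.3, 10.0, 9.9]
b = [10.6, 10.9, 10.4, 10.7, 10.8]
result = compare_two_sets(a, b)
print(result)
```

When the two sets have the same number of observations, the pooled and unpooled t statistics are equal; what changes is the number of degrees of freedom used to look up the critical value.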
When you have more than two data sets, the procedure is
roughly the same, except that the comparison
procedures for standard deviations (Foster-Burr test) and means
(Student-Newman-Keuls if not pooled, 1-way ANOVA if pooled) must
take into account the multiplicity of data sets.
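For the pooled case, the 1-way ANOVA F statistic can be computed by hand as sketched below. The groups are invented, the function name is hypothetical, and the result still has to be compared against a tabulated F critical value with the returned degrees of freedom.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means about the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations about their own group mean.
    ss_within = sum(
        sum((y - statistics.mean(g)) ** 2 for y in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three data sets of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```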
An "optimal" sample size for the data sets depends on
- how big an effect (i.e. difference in means) you want to
detect
- how much you want to risk seeing it when it isn't really
there (type I error rate)
- how much you want to risk missing it when it is there
(type II error rate)
- how much variation you expect in your final results (pooled
standard deviation).
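One common textbook approximation ties these four quantities together for a two-sided comparison of two means: n per set = 2 * ((z_alpha + z_beta) * sigma / delta)^2. The sketch below uses it under the assumption of normally distributed data with a known pooled standard deviation, so treat the answer as a starting estimate. Here alpha is the chance of declaring a difference that is not there, and beta is the chance of missing one that is; the function and parameter names are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, beta=0.20):
    """Approximate observations needed in each of two data sets.

    delta -- smallest difference in means worth detecting
    sigma -- expected pooled standard deviation
    alpha -- risk of declaring a difference that is not there
    beta  -- risk of missing a difference of size delta
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(1 - beta)
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# Detecting a one-sigma shift versus a half-sigma shift.
n_one_sigma = sample_size_per_group(delta=1.0, sigma=1.0)
n_half_sigma = sample_size_per_group(delta=0.5, sigma=1.0)
print(n_one_sigma, n_half_sigma)
```

Halving the difference you want to detect roughly quadruples the number of observations, which is why the size of the effect of interest is worth agreeing on before any data are taken.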
So how many experiments do you run? If you know an "optimal"
sample size, and can run the experiment over and over for each
combination of settings for each X, then you have what is called
a full factorial experiment. If instead this turns out to involve
more data taking than you can afford, then you must find a way to
"fractionalize" the factorial experiment. When you do this, you
lose some information: effects that a full factorial would
separate can no longer be told apart. One way to do this is to
recognize that a factorial experiment involves orthogonal arrays.
For instance: if each X has two settings, you can pretend those
settings are +1 and -1. Then, if you take the "dot product" of
the settings for any two Xs, the "dot product" is zero. Simple
example:
X1 -- +1 +1 -1 -1
X2 -- +1 -1 -1 +1
Two parameters, two settings each, four runs of the experiment,
dot product of settings is zero. A fractional factorial
experiment would take only part of this information, changing the
settings of more than one X at a time. For instance,
X1 -- +1 -1
X2 -- -1 +1
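The two arrays above can be checked numerically. This sketch generates the full two-level factorial (the run order differs from the listing above, which does not affect the dot product) and computes the dot product of the X1 and X2 settings for both the full design and the two-run fraction. The helper names are illustrative.

```python
from itertools import product

def full_factorial(n_factors):
    """All 2**n_factors combinations of +1/-1 settings, one tuple per run."""
    return list(product((+1, -1), repeat=n_factors))

def column(runs, i):
    """Settings of factor i across all runs."""
    return [run[i] for run in runs]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Full factorial: two Xs, two settings each, four runs.
runs = full_factorial(2)
full_dot = dot(column(runs, 0), column(runs, 1))

# The two-run fraction shown above.
half_x1 = [+1, -1]
half_x2 = [-1, +1]
half_dot = dot(half_x1, half_x2)

print(full_dot, half_dot)
```

The full design gives a dot product of zero. The two-run fraction does not: X2 is the opposite of X1 in every run, so with these two runs alone a change in Y cannot be attributed to one X rather than the other.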
Taguchi is one expert who recommends highly fractional
factorial experiments. His orthogonal arrays may involve a small
percentage of the total runs involved in a full factorial
experiment, and thus can be run quickly and at low cost. The
tradeoffs involved in Taguchi design are as follows:
- You lose the effects of interactions between Xs, which must
be assumed not to exist.
- Taguchi assumes processes that are already under statistical
control, and used in a context of continuous improvement.
His design therefore assumes that, even though you can't be
quite sure of the results, they're only provisional anyway.
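The loss of interaction effects can be seen in a small case. The sketch below builds a four-run design for three two-level Xs by setting X3 = X1 * X2, a standard half fraction of the eight-run full factorial. It is offered as an illustration of the idea and is not taken from Taguchi's own tables. The columns stay mutually orthogonal, but the X3 column is identical to the X1-by-X2 interaction column.

```python
from itertools import product

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Four runs for three Xs: X1 and X2 form a full factorial, X3 = X1 * X2.
runs = [(x1, x2, x1 * x2) for x1, x2 in product((+1, -1), repeat=2)]

x1 = [r[0] for r in runs]
x2 = [r[1] for r in runs]
x3 = [r[2] for r in runs]

# The three main-effect columns are still mutually orthogonal.
pairwise = [dot(x1, x2), dot(x1, x3), dot(x2, x3)]

# But the X1*X2 interaction column is the same as the X3 column.
interaction_12 = [a * b for a, b in zip(x1, x2)]
aliased = interaction_12 == x3

for r in runs:
    print(r)
print(pairwise, aliased)
```

Any effect estimated from the X3 column is therefore the sum of the X3 effect and the X1-by-X2 interaction. Reading it as X3 alone is only safe if the interaction is assumed not to exist.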
Tuve says most engineering experiments are verifications, and
that findings should be planned and predicted. This is often
only possible in a general sense: you know what a system is
supposed to do; you don't know how well it'll do it, or for how
long, or with what side-effects.
References
Juran, J. M. Quality Control Handbook. ISBN 0-07033-176-6
Keller, D. Introduction to DOE. Training course manual.
Cleveland: Real World Quality Systems, 1993.
Tuve, G. Engineering Experimentation. NYC: McGraw-Hill, 1961.
ISBN 0-07065-595-2