Design of Experiments: Basic Concepts, Principles, CRD, RBD, LSD, Efficiency, Box Plot
Design of Experiments
Introduction
Experiments are conducted by investigators in all fields of study, either to discover something about a particular process or to compare the effects of several conditions on some phenomenon. The general procedure in scientific research is to formulate hypotheses and then to verify them, directly or through their consequences. This verification requires the collection of observations. The design of experiments is essentially the pattern of the observations to be collected. For valid inferences, one must follow a scientific approach, which includes:
a) Planning the experiment,
b) Obtaining relevant information from it regarding the statistical hypothesis under study, and
c) Making a statistical analysis of the data.
Design of Experiments
It is defined as the process of planning an experiment so that appropriate data are collected, which may then be analyzed by statistical methods, resulting in valid and objective conclusions. Or:
The logical construction of an experiment in which the degree of uncertainty with which inferences are drawn can be well defined.
Terms and definitions
1. Experiment
An experiment is a means (device) of getting an answer to the question that the experimenter has in mind. OR
An experiment is an act or operation undertaken in order to discover some unknown principle or to test a suggested or known truth. For example, to obtain an average I.Q., we measure the I.Q.s of the given persons; this process is the experiment for our problem.
Experiments can be classified into two categories: (i) absolute experiments and (ii) comparative experiments.
Absolute Experiments:
Absolute experiments consist of determining a particular characteristic of a specified population. For example, obtaining the average IQ of the students in a certain college, or calculating the correlation coefficient between two characteristics.
Comparative Experiments:
A comparative experiment is one in which two or more objects (treatments) are compared with respect to their effects. For example, the comparison of different fertilizers in field experiments, or of different medicines in medical experiments, is a comparative experiment.
2. Treatment:
The various objects of comparison in a comparative experiment are termed treatments. For example, in medical experimentation the different diets or medicines under comparison are the treatments. In field experimentation, different fertilizers, crop varieties or methods of cultivation are the treatments.
3. Experimental Unit:
The smallest division of the experimental material to which a treatment is applied and on which the observation is recorded is termed an experimental unit. For example, in medical experiments a patient is an experimental unit; in field experiments, a plot to which a treatment is applied is an experimental unit.
4. Blocks:
In many experiments, the whole experimental material is divided into relatively homogeneous sub-groups or strata. These sub-groups, within which the units are more homogeneous than in the material as a whole, are known as blocks.
5. Yield:
The measurement of the
variable under study on an experimental unit is known as yield.
6. Experimental Error:
A fundamental phenomenon in replicated experiments is the variation in the measurements made on different experimental units, even when they receive the same treatment. Part of this variation is systematic and can be explained, whereas the remainder is to be regarded as random. The unexplained random part of the variation is termed experimental error. Experimental error includes all types of extraneous variation due to (i) inherent variability in the experimental units, (ii) errors associated with the measurements made, and (iii) lack of representativeness of the sample of the population under study.
Definition:
The variation in the observations caused by uncontrolled factors is known as experimental error.
The experimental error provides a basis for the confidence to be placed in inferences about the population, and hence it is important to estimate and control the experimental error. An estimate of the experimental error is obtained through replication (repetition), and the error can be controlled by the principle of local control.
Precision of the experiment
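The precision of an experiment is measured by the reciprocal of the variance of the mean. If σ² is the error variance per experimental unit, the variance of a treatment mean based on r replications is σ²/r, so that
$$\text{Precision} = \frac{1}{V(\bar{y})} = \frac{r}{\sigma^2},$$
where r is the number of replications.
As r increases, the precision also increases; therefore, to increase the precision of the experiment, the number of replications should be increased. Since precision is inversely proportional to σ², another way to increase precision is to reduce σ², i.e., to control the experimental error.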
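The relation between replication and precision can be checked numerically. The sketch below is a minimal illustration, assuming normally distributed errors with a hypothetical error variance σ² = 4; `precision` and `simulated_var_of_mean` are illustrative names, not part of any library.

```python
import random
import statistics


def precision(r: int, sigma2: float) -> float:
    """Precision of a mean of r replicates: 1 / V(mean) = r / sigma^2."""
    return r / sigma2


def simulated_var_of_mean(r: int, sigma: float, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of V(mean) when each replicate is N(0, sigma^2)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(r))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    sigma2 = 4.0  # hypothetical error variance per experimental unit
    for r in (2, 4, 8):
        simulated = 1 / simulated_var_of_mean(r, sigma2 ** 0.5)
        print(f"r={r}: r/sigma^2 = {precision(r, sigma2):.2f}, simulated = {simulated:.2f}")
```

Doubling r doubles r/σ², and the simulated reciprocal variance of the mean tracks it closely.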
According to Prof. R. A. Fisher, there are three basic principles of the design of experiments: (1) Replication, (2) Randomization and (3) Local Control.
1) Replication:
Replication means the repetition of the treatments under study. A treatment is repeated a number of times in order to obtain a more reliable estimate than is possible from a single observation. Thus, replication is necessary to increase the accuracy or precision of the estimates of the treatment effects. It also provides an estimate of the error variance, which is a function of the differences among experimental units receiving identical treatments.
The functions of replication are (i) to provide an estimate of the experimental error, which is essential for comparing the treatments, and (ii) to reduce the experimental error, which enables us to obtain more precise estimates of the treatment effects.
The most effective way to increase the precision of an experiment is to increase the number of replications. The number of replications in a particular case depends on the variability of the experimental material, the cost of taking observations, etc. A rule of thumb is to have about 10 degrees of freedom for the experimental error; generally, one should not use fewer than four replicates.
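The rule of thumb can be turned into a small calculation. The sketch below assumes, purely for illustration, a CRD with k treatments each replicated r times, for which the error degrees of freedom are N − k = k(r − 1); the function names are hypothetical.

```python
def crd_error_df(k: int, r: int) -> int:
    """Error degrees of freedom of a CRD with k treatments, r replicates each: k(r - 1)."""
    return k * (r - 1)


def suggested_replicates(k: int, target_df: int = 10, min_r: int = 4) -> int:
    """Smallest r giving at least target_df error d.f., but never fewer than min_r."""
    r = 2
    while crd_error_df(k, r) < target_df:
        r += 1
    return max(r, min_r)


if __name__ == "__main__":
    for k in (2, 3, 6):
        r = suggested_replicates(k)
        print(f"k={k}: r={r}, error d.f.={crd_error_df(k, r)}")
```

With two treatments the 10 d.f. target is what drives the choice (r = 6); with six treatments the target is met at r = 3, so the four-replicate minimum is the binding constraint.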
2) Randomization:
Randomization is the process of assigning the treatments to the various experimental units in a purely chance manner. The principle of randomization is essential for a valid estimate of the experimental error and also to minimize bias in the results. The main objectives of randomization are:
i. The validity of statistical tests of significance depends on the statistic under consideration following some known statistical distribution. Randomization provides a logical basis for this and makes it possible to draw rigorous inductive inferences using statistical theory based on probability. These distributional results require the observations X1, X2, …, Xn to be independently distributed, and this independence is achieved through randomization.
ii. Randomization ensures that the sources of uncontrolled variation operate randomly, so that their average effect on any group of units is zero; i.e., randomization ensures that the different treatments are, on average, subject to equal environmental effects.
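The second objective can be seen in a small simulation. The sketch assumes 20 hypothetical units whose uncontrolled "fertility" rises steadily from 0 to 19, and two treatments A and B with 10 units each. A systematic allocation (A on the first 10 units) leaves A with a mean fertility 10 units below B's, while the average difference over many random allocations is close to zero.

```python
import random
import statistics


def mean_gap(values, group_a):
    """Mean of the units in group_a minus the mean of the remaining units."""
    a = [v for i, v in enumerate(values) if i in group_a]
    b = [v for i, v in enumerate(values) if i not in group_a]
    return statistics.fmean(a) - statistics.fmean(b)


def average_gap_under_randomization(values, trials=5000, seed=7):
    """Average of mean_gap over many random equal-size allocations."""
    rng = random.Random(seed)
    n = len(values)
    gaps = []
    for _ in range(trials):
        group_a = set(rng.sample(range(n), n // 2))
        gaps.append(mean_gap(values, group_a))
    return statistics.fmean(gaps)


fertility = [float(i) for i in range(20)]  # hypothetical fertility gradient
systematic = mean_gap(fertility, set(range(10)))  # A always on units 0..9
randomized = average_gap_under_randomization(fertility)
print(f"systematic allocation gap: {systematic:.2f}")
print(f"average gap under randomization: {randomized:.2f}")
```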
3) Local control:
If the experimental material is heterogeneous and the treatments are allocated to the units at random over the entire experimental material, the heterogeneity of the experimental units becomes part of the uncontrolled factors and thus increases the experimental error. It is desirable to reduce the experimental error as far as possible without increasing the number of replications and without interfering with the randomness that is essential for statistical analysis.
When the experimental material is heterogeneous, the experimental error can be reduced by the principle of local control, in which the heterogeneous experimental material is divided into homogeneous groups (blocks) such that the variation within each block is minimum and the variation between blocks is maximum. The treatments are then allocated at random within each block. Thus, local control is the process of reducing the experimental error by dividing relatively heterogeneous experimental material into homogeneous blocks.
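The effect of local control on the error can be illustrated with a few hypothetical numbers. The sketch assumes four blocks whose fertility levels differ by 5 units, with three units per block deviating by −1, 0 and +1 from their block level. Ignoring the blocks, the variance among all units includes the block-to-block differences; the pooled within-block variance, which is what remains as error once blocks are formed, is much smaller. This illustrates only the variance reduction, not the full analysis of a blocked design.

```python
import statistics


def pooled_within_block_variance(blocks):
    """Sum of within-block squared deviations divided by the sum of (n_b - 1)."""
    ss = 0.0
    df = 0
    for block in blocks:
        m = statistics.fmean(block)
        ss += sum((y - m) ** 2 for y in block)
        df += len(block) - 1
    return ss / df


# Hypothetical material: block levels 0, 5, 10, 15; unit deviations -1, 0, +1.
blocks = [[level + d for d in (-1.0, 0.0, 1.0)] for level in (0.0, 5.0, 10.0, 15.0)]
all_units = [y for block in blocks for y in block]

print(f"variance ignoring blocks: {statistics.variance(all_units):.2f}")
print(f"pooled within-block variance: {pooled_within_block_variance(blocks):.2f}")
```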
Choice of size and shape of plots
In field experiments, the size and shape of the plots influence the experimental error. If the total experimental area is fixed, an increase in plot size automatically decreases the number of plots and hence the number of replications.
To reduce the flow of experimental material from one plot to another, strips of plants are left between consecutive plots. These non-experimental areas are known as 'guard areas'. As the number of plots increases, the non-experimental area also increases. H. F. Smith studied the effect of the size and shape of plots on the precision of an experiment. He found that the variance per unit area for plots of area x is
$$V_x = \frac{V_1}{x^b},$$
where $V_1$ is the variance among plots of unit size and b is a characteristic of the soil.
i) If b = 0, then $V_x = V_1$; in this case an increase in plot size does not result in any gain in precision.
ii) If b > 0, then $V_x = V_1/x^b < V_1$ for plots larger than the unit plot (x > 1), i.e., in this case the precision of the experiment increases with an increase in plot size.
Usually, 0 < b < 1.
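Smith's law is easy to tabulate. The sketch below evaluates $V_x = V_1/x^b$ for a hypothetical unit-plot variance $V_1 = 100$ and a few illustrative values of b; the function name is hypothetical.

```python
def smith_variance(v1: float, x: float, b: float) -> float:
    """Variance per unit area for plots of size x under Smith's law V_x = V_1 / x**b."""
    return v1 / x ** b


if __name__ == "__main__":
    v1 = 100.0  # hypothetical variance among plots of unit size
    for b in (0.0, 0.5, 1.0):
        row = [round(smith_variance(v1, x, b), 2) for x in (1, 2, 4, 8)]
        print(f"b={b}: V_x for x=1,2,4,8 -> {row}")
```

The b = 0 row stays at 100 for every plot size, matching case (i); for b > 0 the variance falls as the plot grows, matching case (ii).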
When definite fertility contours are known, maximum precision is obtained by arranging the plots in a block with their long sides parallel to the direction of the fertility gradient, and by placing the blocks one after another in the direction of the gradient. In the absence of any knowledge of the fertility contours, it is better to use square plots, and it is best to have small plots.
1.2 Completely Randomized Design (C.R.D.)
The simplest design, which uses only two of the basic principles of experimental design, namely replication and randomization, is known as the C.R.D.
C.R.D. may be defined as ‘a design in
which the treatments are randomly allocated to the various experimental units
over the whole experimental material’.
Suppose that we have k treatments under comparison and the ith treatment is replicated $r_i$ times (i = 1, 2, …, k); then the total number of experimental units required for the experiment is $\sum_{i=1}^{k} r_i = N$. In a CRD, we allocate the k treatments at random to the N experimental units, subject to the condition that the ith treatment appears in $r_i$ units (i = 1, 2, …, k).
In particular, if $r_i = r$ (i = 1, 2, …, k), that is, each treatment is replicated an equal number of times r, then N = kr, and randomization gives every group of r units an equal chance of receiving any of the treatments.
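Since units are allocated purely by chance, every distinct layout is equally likely. The number of distinct CRD layouts is the multinomial coefficient $N!/(r_1!\,r_2!\cdots r_k!)$; the sketch below computes it (`crd_allocations` is a hypothetical helper name).

```python
from math import factorial, prod


def crd_allocations(reps):
    """Number of distinct ways to allocate treatments with replications reps to N = sum(reps) units."""
    n = sum(reps)
    return factorial(n) // prod(factorial(r) for r in reps)


if __name__ == "__main__":
    print(crd_allocations([5, 3, 4]))  # 27720
    print(crd_allocations([2, 2, 2]))  # 90
```

For replications 5, 3 and 4 on 12 units, each particular layout therefore has probability 1/27720.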
The
CRD is useful in small preliminary experiments and also in certain types of
animal or laboratory experiments where the experimental units are homogeneous.
Layout means the placement of the treatments under study on the various experimental units.
Let us consider a CRD with three treatments A, B and C, replicated 5, 3 and 4 times respectively.
To allocate the treatments to the 12 experimental units, we first obtain a random permutation of the numbers 1, 2, …, 12. Such a random permutation can be obtained using random number tables.
We number the experimental units from 1 to 12 in any convenient way. Treatment A is then applied to the five experimental units whose numbers are the first five numbers in the random permutation, treatment B to the three units whose numbers are the next three numbers in the permutation, and treatment C to the remaining four experimental units.
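The layout procedure just described can be sketched in Python, with `random.sample` playing the role of the random number table. The function name and the dictionary input format are assumptions made for this illustration.

```python
import random


def crd_layout(reps, seed=None):
    """Allocate treatments to units 1..N following the random-permutation procedure.

    reps maps each treatment label to its number of replications; treatments
    take consecutive stretches of the permutation in the dictionary's order.
    """
    rng = random.Random(seed)
    n = sum(reps.values())
    perm = rng.sample(range(1, n + 1), n)  # random permutation of 1..N
    layout = {}
    start = 0
    for treatment, r in reps.items():
        for unit in perm[start:start + r]:
            layout[unit] = treatment
        start += r
    return dict(sorted(layout.items()))  # unit number -> treatment


if __name__ == "__main__":
    layout = crd_layout({"A": 5, "B": 3, "C": 4}, seed=2024)
    for unit, treatment in layout.items():
        print(f"unit {unit:2d}: {treatment}")
```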
Let us consider a CRD with k treatments and N experimental units, where the ith treatment is replicated $r_i$ times, i = 1, 2, …, k, and $\sum_{i=1}^{k} r_i = N$.
Let $y_{ij}$ be the yield of the jth experimental unit receiving the ith treatment, i = 1, 2, …, k and j = 1, 2, …, $r_i$.
Mathematical Model
The fixed effect model for CRD is
$$y_{ij} = \mu + T_i + \varepsilon_{ij}, \quad i = 1, 2, \dots, k;\; j = 1, 2, \dots, r_i,$$
where µ is the general mean effect,
$T_i$ is the effect due to the ith treatment, and
$\varepsilon_{ij}$ is the error associated with the observation $y_{ij}$.
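The model can also be written as
$$y_{ij} = \mu_i + \varepsilon_{ij},$$
where $\mu_i$ is the mean effect of the ith treatment.
Let us define
$$\mu = \frac{1}{N}\sum_{i=1}^{k} r_i \mu_i \quad \text{and} \quad T_i = \mu_i - \mu,$$
so that $T_i$ is the additional effect due to the ith treatment, and $\sum_{i=1}^{k} r_i T_i = 0$.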
Assumptions
Null hypothesis
In CRD we have to compare the given k treatments; for that, the null hypothesis is stated as
H0: There is no difference between the treatment effects,
or H0: The treatments are homogeneous,
or $H_0: T_1 = T_2 = \dots = T_k = 0$.
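Least square estimates of the parameters µ and $T_i$
The mathematical model is
$$y_{ij} = \mu + T_i + \varepsilon_{ij}.$$
The least square estimates of the parameters are obtained by minimizing the error sum of squares
$$E = \sum_{i=1}^{k}\sum_{j=1}^{r_i} \varepsilon_{ij}^2 = \sum_{i=1}^{k}\sum_{j=1}^{r_i} (y_{ij} - \mu - T_i)^2,$$
i.e., by solving the equations
$$\frac{\partial E}{\partial \mu} = -2\sum_{i=1}^{k}\sum_{j=1}^{r_i} (y_{ij} - \mu - T_i) = 0, \qquad \frac{\partial E}{\partial T_i} = -2\sum_{j=1}^{r_i} (y_{ij} - \mu - T_i) = 0, \quad i = 1, 2, \dots, k.$$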
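Solving these equations under the usual restriction $\sum_i r_i T_i = 0$ gives $\hat{\mu} = \bar{y}_{..}$ (the grand mean) and $\hat{T}_i = \bar{y}_{i.} - \bar{y}_{..}$. The sketch below computes these estimates together with the usual one-way analysis-of-variance F ratio for testing H0, using small hypothetical yields; the computed F would be compared with the tabulated F value on (k − 1, N − k) degrees of freedom. The function name and the input format are assumptions for illustration.

```python
def crd_analysis(data):
    """Least square estimates and ANOVA F ratio for a CRD.

    data maps each treatment label to the list of its yields y_ij.
    Returns (mu_hat, t_hat, f_ratio, df_treatment, df_error).
    """
    all_yields = [y for ys in data.values() for y in ys]
    n, k = len(all_yields), len(data)
    mu_hat = sum(all_yields) / n  # grand mean
    means = {t: sum(ys) / len(ys) for t, ys in data.items()}
    t_hat = {t: means[t] - mu_hat for t in data}  # estimated treatment effects
    ss_treatment = sum(len(ys) * t_hat[t] ** 2 for t, ys in data.items())
    ss_error = sum((y - means[t]) ** 2 for t, ys in data.items() for y in ys)
    f_ratio = (ss_treatment / (k - 1)) / (ss_error / (n - k))
    return mu_hat, t_hat, f_ratio, k - 1, n - k


if __name__ == "__main__":
    yields = {"A": [10, 12, 11], "B": [14, 15, 16], "C": [6, 7, 8]}  # hypothetical
    mu_hat, t_hat, f_ratio, df1, df2 = crd_analysis(yields)
    print(mu_hat, t_hat, f_ratio, df1, df2)
    # 11.0 {'A': 0.0, 'B': 4.0, 'C': -4.0} 48.0 2 6
```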
Layout of an experiment
The term layout refers to the placement of the treatments on the experimental units according to the conditions of the design.
Advantages and disadvantages of CRD
Advantages
1. CRD is a flexible design: there is complete flexibility in the number of treatments and in the number of their replications, which may vary from treatment to treatment.
2. The design is easy to lay out.
3. It results in maximum use of the experimental units, since all the experimental material can be used.
4. Since the design is flexible, the analysis remains simple when data on some experimental units, or on an entire treatment, are missing.
5. It provides the maximum number of degrees of freedom for estimating the experimental error, which increases the precision of the experiment.
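Advantage 5 can be made concrete by counting error degrees of freedom. Assuming k treatments each replicated r times, a CRD leaves N − k = k(r − 1) degrees of freedom for error, whereas arranging the same units in r blocks (as in a randomized block design) uses r − 1 of them for blocks, leaving (k − 1)(r − 1). The sketch below, with hypothetical function names, compares the two.

```python
def crd_error_df(k: int, r: int) -> int:
    """Error d.f. for a CRD: N - k with N = k * r."""
    return k * r - k


def rbd_error_df(k: int, r: int) -> int:
    """Error d.f. when the same k * r units form r blocks: (k - 1)(r - 1)."""
    return (k - 1) * (r - 1)


if __name__ == "__main__":
    for k, r in ((3, 4), (5, 3)):
        print(f"k={k}, r={r}: CRD {crd_error_df(k, r)}, RBD {rbd_error_df(k, r)}")
```

The difference is always r − 1, the degrees of freedom absorbed by the blocks; a blocked design gives these up in exchange for a smaller error variance when the blocks really are more homogeneous.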
Disadvantages
1. The main disadvantage is that the principle of local control is not used in this design, so the experimental error includes the entire variation among the experimental units. This makes the design less efficient and less sensitive in detecting significant effects. Therefore, CRD is seldom used in field experiments, as it is difficult to obtain homogeneous experimental material in the field.
2. It is less accurate than other designs when the experimental material is heterogeneous.
Applications
1. The CRD may be used in a chemical or baking experiment where the experimental units are portions of a thoroughly mixed chemical or powder.
***
Randomized block design (R.B.D.)
If the experimental units are not homogeneous, we cannot use a completely randomized design, because the variation among the units would inflate the experimental error and disturb the test of significance of the treatment effects. The simplest design that takes care of the variability among the experimental units is the randomized block design (R.B.D.). In R.B.D., all three principles of design are used.
Suppose we want to compare the effects of k treatments, each replicated an equal number of times, say r times. Then we need N = rk experimental units, which are not homogeneous. R.B.D. consists of two steps. The first step is to divide the experimental units into r homogeneous groups, known as blocks. The number of units in each block is equal to the number of treatments. In agricultural field experiments, a fertility gradient is sometimes present. In such a situation, the blocks are placed across (perpendicular to) the fertility gradient in order to obtain homogeneous blocks and to have more variation between the blocks.
The second step is to assign the treatments at random to the units in a block, with a fresh randomization done for each block. Thus, in R.B.D., randomization is restricted to within the homogeneous blocks, and the variation among blocks is removed from the variation due to error.
Layout of R.B.D.
The treatments are first numbered from 1 to k, and the experimental units in each block are also numbered from 1 to k. The k treatments are then allocated at random to the k units in each block. The random allocation can be made by selecting a random permutation of the numbers 1 to k using a random number table.
Let us obtain the layout of an R.B.D. with five treatments, each replicated three times. Here we require 15 experimental units grouped into three homogeneous blocks, each containing five units. We conveniently number the treatments and the units in each block. Let A, B, C, D and E be the five treatments (numbered 1 to 5). Suppose the random permutation of the numbers 1 to 5 obtained for the first block is 3, 1, 5, 2, 4; then treatment A is applied to unit 3, B to unit 1, C to unit 5, D to unit 2 and E to unit 4, as follows.
Unit No.  | 1 | 2 | 3 | 4 | 5
Treatment | B | D | A | E | C
A fresh random permutation is obtained for block 2, and the same process is applied to the remaining block.
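The R.B.D. layout procedure, including the worked permutation 3, 1, 5, 2, 4 for the first block, can be sketched as follows. Following the example, the permutation is read as "the ith treatment goes to the unit whose number is the ith entry of the permutation"; `rbd_layout` and `apply_permutation` are hypothetical names.

```python
import random


def apply_permutation(treatments, perm):
    """Return the treatments on units 1..k when treatment i goes to unit perm[i]."""
    units = [None] * len(treatments)
    for treatment, unit in zip(treatments, perm):
        units[unit - 1] = treatment
    return units


def rbd_layout(treatments, blocks, seed=None):
    """One fresh random permutation of 1..k per block (randomization within blocks)."""
    rng = random.Random(seed)
    k = len(treatments)
    return [apply_permutation(treatments, rng.sample(range(1, k + 1), k)) for _ in range(blocks)]


if __name__ == "__main__":
    print(apply_permutation(list("ABCDE"), [3, 1, 5, 2, 4]))  # ['B', 'D', 'A', 'E', 'C']
    for b, block in enumerate(rbd_layout(list("ABCDE"), blocks=3, seed=11), start=1):
        print(f"block {b}: {block}")
```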
T.Y. B.Sc. (Design of Experiments)
Q 1. Factors such as spacing, date of sowing and breed are often used as:
a) Experimental unit b) Treatment c) Replicate d) None of the above
Q 2. Randomization is a process in which the treatments are allocated to the
experimental units:
a) At the will of the investigator b) In a sequence
c) With the probability d) None of the above
Q 3. Randomization is the process which enables the experimenter to:
a) Apply mathematical theories b) Make probability statements
c) Treat errors as independent d) All the above
Q 4. Replication in an experiment means:
a) The number of blocks b) Total number of treatments
c) The number of times a treatment occurs in an experiment d) None of the above
Q 5. The decision about the number of replications is taken in view of:
a) Size of experimental units b) Competition among experimental units
c) Fraction to be sampled d) All the above
Q 6. How many basic principles of the design of experiments are there?
a) 1 b) 2 c) 3 d) 4
Q 7. Experimental error is due to:
a) Experimenter's mistakes b) Extraneous factors
c) Variation in treatment effects d) None of the above
Q 8. Errors in statistical model are always taken to be:
a) a map b) a plan of experiment c) an architect d) all of these
Q 10. Local control is a device to maintain:
a) Homogeneity among blocks b) Homogeneity within blocks
c) Both (a) and (b) d) Neither (a) nor (b)
Answers: 1-b, 2-c, 3-d, 4-c, 5-a, 6-c, 7-b, 8-c, 9-b, 10-b
Latin Square Design (LSD)
In RBD, the principle of local control is applied by grouping the experimental units in one way, i.e., into blocks. The grouping of units can also be carried out in two ways: row-wise grouping and column-wise grouping. A design in which the experimental units are grouped in two ways is called a 'Latin Square Design'. This design is used to advantage where the fertility gradient runs in two directions or the fertility contours are not known. The LSD eliminates the initial variability among the units in two directions.
In L.S.D., the number of replications of each treatment is equal to the number of treatments. If there are m treatments, then the number of experimental units required is m². These m² units are arranged in m rows and m columns. The m treatments are then allocated to these m² units at random, such that each treatment occurs once and only once in each row and each column.
Definition
A design in which the treatments are allocated at random to the experimental units so that each treatment occurs once and only once in each row and in each column of the arrangement of experimental units is known as an L.S.D.
Layout of L.S.D.
Let there be m treatments under comparison. The number of experimental units required is m². The whole experimental material is divided into m² units arranged in m rows and m columns. The m treatments are then allocated to these units at random so that each treatment occurs once and only once in each row and in each column.
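One common way to obtain a random layout is sketched below: start from a cyclic square, then permute its rows, its columns and the treatment labels at random; each step preserves the "once in every row and every column" property. This is a standard construction rather than the text's method of choosing among all possible squares, and for m ≥ 4 it does not give every Latin square the same chance of selection. The function names are hypothetical.

```python
import random


def is_latin_square(square):
    """True if each symbol occurs exactly once in every row and every column."""
    m = len(square)
    symbols = set(square[0])
    if len(symbols) != m or any(len(row) != m or set(row) != symbols for row in square):
        return False
    return all({square[i][j] for i in range(m)} == symbols for j in range(m))


def random_latin_square(treatments, seed=None):
    """Randomize the rows, columns and labels of the cyclic m x m square."""
    rng = random.Random(seed)
    m = len(treatments)
    base = [[(i + j) % m for j in range(m)] for i in range(m)]  # cyclic square
    rows = rng.sample(range(m), m)
    cols = rng.sample(range(m), m)
    labels = rng.sample(list(treatments), m)
    return [[labels[base[r][c]] for c in cols] for r in rows]


if __name__ == "__main__":
    square = random_latin_square("ABCD", seed=5)
    for row in square:
        print(" ".join(row))
    print(is_latin_square(square))  # True
```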
Consider a design with three treatments A, B and C. These treatments can be allocated to the 3² = 9 units arranged in three rows and three columns in 12 different ways. We may select one of these at random and allocate the treatments accordingly. For 2 × 2 and 3 × 3 Latin squares, there exists only one standard Latin square. (A Latin square in which the treatments, say A, B, C and so on, occur in alphabetical order in the first row and in the first column is called a standard Latin square.)
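In general, the total number of m × m Latin squares is
$$m! \times (m-1)! \times (\text{number of standard } m \times m \text{ Latin squares}).$$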
For a 4 × 4 design there are 576 arrangements, and for a 5 × 5 design the possible number of arrangements is 161,280.
In a 3 × 3 LSD, there are 12 possible arrangements.
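For example, there are four standard 4 × 4 Latin squares, so the total number of possible 4 × 4 Latin squares = 4! × 3! × 4 = 576. In a 2 × 2 LSD, there are 2 possible arrangements: (A B / B A) and (B A / A B), where '/' separates the two rows.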
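The counts above can be checked by brute force for small m. The sketch below builds squares row by row (backtracking) and recovers the number of standard squares from the relation total = m!(m − 1)! × (number of standard squares); it is intended only for small m, since the search grows very quickly.

```python
from itertools import permutations
from math import factorial


def count_latin_squares(m: int) -> int:
    """Count all m x m Latin squares on symbols 0..m-1 by backtracking over rows."""
    rows = list(permutations(range(m)))

    def extend(square):
        if len(square) == m:
            return 1
        return sum(
            extend(square + [row])
            for row in rows
            if all(row[j] != prev[j] for prev in square for j in range(m))
        )

    return extend([])


def count_standard_squares(m: int) -> int:
    """Standard squares = total / (m! (m-1)!)."""
    return count_latin_squares(m) // (factorial(m) * factorial(m - 1))


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, count_latin_squares(m), count_standard_squares(m))
    # 2 2 1
    # 3 12 1
    # 4 576 4
```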
Box Plots
Box plots (also known as box-and-whisker diagrams) are a graphical representation of data that makes descriptive statistics easy to visualize. A box plot provides more information about the data than a bar graph does.
Things to know about box plots
· The sample is represented as a box.
· The spacing between the different parts of the box helps indicate the degree of dispersion (spread) and skewness in the data, and helps identify outliers.
· A box plot shows a five-number summary of the data: minimum, first (lower) quartile, median, third (upper) quartile and maximum.
· The box is divided at the median.
· The length of the box is the interquartile range (IQR).
· The 1st quartile is the bottom line of the box.
· The 3rd quartile is the top line of the box.
Quartiles divide a frequency distribution into four equal parts:
• Q1: 1st or lower quartile: cuts off the lowest 25% of the data
• Q2: 2nd quartile or median: the 50% point; cuts the data set in half
• Q3: 3rd or upper quartile: cuts off the lowest 75% of the data (or the highest 25%)
Q1 is the median of the lower half of the ordered data set, and Q3 is the median of the upper half.
The difference between the upper and lower quartiles, IQR = Q3 − Q1, is called the interquartile range. The interquartile range spans the middle 50% of a data set and is not influenced by outliers, because the highest and lowest quarters of the data are excluded.
A biologist samples 12 red oak trees in a forest plot and counts the number of caterpillars on each tree. The following is a list of the number of caterpillars on each tree: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.
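The five-number summary for the caterpillar counts can be computed directly. The sketch follows the text's rule that Q1 and Q3 are the medians of the lower and upper halves of the ordered data; for an odd number of observations it assumes the overall median is excluded from both halves. Statistical software often uses interpolation-based quartile definitions, which can give slightly different values. The 1.5 × IQR outlier fences are a common convention, not something stated in the text above.

```python
from statistics import median


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) using medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]  # skip the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


caterpillars = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
mn, q1, q2, q3, mx = five_number_summary(caterpillars)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in caterpillars if x < low_fence or x > high_fence]

print(mn, q1, q2, q3, mx)  # 1 17.0 26.0 42.0 57
print(iqr, outliers)       # 25.0 []
```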