Design of Experiments: Basic Concepts, Principles, CRD, RBD, LSD, Efficiency, Box Plot

 

Design of Experiments

Introduction

Experiments are conducted by investigators in all fields of study, either to discover something about a particular process or to compare the effects of several conditions on some phenomenon. The general procedure in scientific research is to formulate hypotheses and then to verify them, directly or through their consequences. This verification requires the collection of observations, and the design of an experiment is essentially the pattern of the observations to be collected. For valid inferences, one must follow a scientific approach, which includes

a) Planning of the experiment

b) Obtaining relevant information from it regarding the statistical hypothesis under study and

c)  Making a statistical analysis of the data

Design of Experiments

It is defined as the process of planning the experiment so as to collect appropriate data that can be analyzed by statistical methods, resulting in valid and objective conclusions. OR

The logical construction of the experiment in which the degree of uncertainty with which the inference is drawn may be well defined.

Terms and definitions

Experiment:

An experiment is a means (device) of getting an answer to the question that the experimenter has in mind. OR

An experiment is an act or operation undertaken in order to discover some unknown principle or to test a suggested or known truth. For example, to obtain an average I.Q. we measure the I.Q.s of the given persons; this process is the experiment for our problem.

The experiments can be classified into two categories,

(i) Absolute experiments  and (ii) Comparative experiments.

 

Absolute Experiments:

Absolute experiments consist of determining a particular characteristic of a specified population, for example obtaining the average IQ of the students in a certain college, or calculating the correlation coefficient between two characteristics.

 

Comparative Experiments:

A comparative experiment is an experiment in which two or more objects (treatments) are compared with respect to their effects. For example, the comparison of different fertilizers in field experiments, or of different medicines in medical experiments, are comparative experiments.

 

Treatment:

 

The various objects of comparison in a comparative experiment are termed treatments. For example, in medical experimentation the different diets or medicines under comparison are the treatments. In field experimentation, different fertilizers, varieties of crops and methods of cultivation are the treatments.

 Experimental Unit (Plot):

The smallest division of the experimental material to which a treatment is applied and on which the observation is recorded is termed an experimental unit. For example, in medical experiments a patient is an experimental unit; in field experiments a plot to which a treatment is applied is an experimental unit.

Blocks:

While performing experiments, the whole experimental material is often divided into relatively homogeneous sub-groups or strata. These sub-groups, whose units are more homogeneous among themselves than the material as a whole, are known as blocks.

Yield:

The measurement of the variable under study on an experimental unit is known as yield.

Experimental Error:

A fundamental phenomenon in replicated experiments is the variation in the measurements made on different experimental units, even when they receive the same treatment. A part of this variation is systematic and can be explained, whereas the remainder is taken to be of the random type. The unexplained random part of the variation is termed experimental error. Experimental error includes all types of extraneous variation due to (i) inherent variability in the experimental units, (ii) errors associated with the measurements made and (iii) lack of representativeness of the sample with respect to the population under study.

Definition:

The variations in the observations caused by uncontrolled factors are known as experimental error.

The experimental error provides a basis for the confidence to be placed in the inference about the population, and hence it is important to estimate and control it. The estimate of experimental error is obtained by replication (repetition), and the error can be controlled by the principle of local control.

Precision of the experiment

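The precision of an experiment is measured by the reciprocal of the variance of the mean, i.e.

$$\text{Precision} = \frac{1}{\operatorname{Var}(\bar{x})} = \frac{r}{\sigma^2},$$

where r is the number of replications and $\sigma^2$ is the error variance per experimental unit.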

As r increases, the precision also increases; therefore, to increase the precision of the experiment, the number of replications must be increased. Since precision is inversely proportional to $\sigma^2$, another way to increase the precision is to control (reduce) $\sigma^2$.
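The relation between replication and precision can be checked numerically. The following is a minimal sketch, assuming the standard result Var(mean) = σ²/r for r independent observations with common variance σ²; the value σ = 2 and the function names are hypothetical, chosen only for illustration.

```python
import random
import statistics


def precision(sigma2, r):
    """Precision of a treatment mean, taken as 1 / Var(mean) = r / sigma2."""
    return r / sigma2


def simulated_variance_of_mean(sigma, r, n_experiments=20000, seed=0):
    """Empirical variance of the mean of r independent N(0, sigma^2) yields."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(r))
        for _ in range(n_experiments)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    sigma = 2.0  # hypothetical error standard deviation, for illustration only
    for r in (1, 2, 4, 8):
        theory = sigma ** 2 / r
        simulated = simulated_variance_of_mean(sigma, r)
        print(f"r={r}: Var(mean) theory={theory:.3f} simulated={simulated:.3f} "
              f"precision={precision(sigma ** 2, r):.2f}")
```

Doubling r halves the variance of the mean and so doubles the precision, while halving σ² has the same effect without any extra units.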

Principles of Design:

According to Prof. R. A. Fisher, there are three basic principles of the design of experiments,

(1) Replication  (2) Randomization and (3) Local Control.

1) Replication: Replication means repetition of the treatments under study. A treatment is repeated a number of times in order to obtain a more reliable estimate than is possible from a single observation. Thus, replication is necessary to increase the accuracy or precision of the estimates of the treatment effects. It also provides an estimate of the error variance, which is a function of the differences among experimental units under identical treatments.

The functions of replications are (i) to provide an estimate of the error which is essential for comparison of the treatments, (ii) to reduce the experimental error which enables us to obtain more precise estimate of the treatment effects.

The most effective way to increase the precision of an experiment is to increase the number of replications. The number of replications in a particular case depends on the variability of the experimental material, the cost of taking observations, etc. A rule of thumb is to get about 10 degrees of freedom for the experimental error, and generally one should not use fewer than four replicates.
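The rule of thumb can be turned into a small calculation. This is a sketch only: it assumes the standard error degrees of freedom k(r - 1) for a CRD with equal replication and (k - 1)(r - 1) for an RBD, which are not stated in the text above, and the function names are hypothetical.

```python
def error_df_crd(k, r):
    """Error degrees of freedom in a CRD with k treatments, r replicates each."""
    return k * (r - 1)


def error_df_rbd(k, r):
    """Error degrees of freedom in an RBD with k treatments in r blocks."""
    return (k - 1) * (r - 1)


def min_replications(k, error_df, target_df=10, floor=4):
    """Smallest r >= floor giving at least target_df error degrees of freedom."""
    if k < 2:
        raise ValueError("need at least two treatments to compare")
    r = floor
    while error_df(k, r) < target_df:
        r += 1
    return r


if __name__ == "__main__":
    for k in (2, 3, 5, 8):
        print(k, min_replications(k, error_df_crd), min_replications(k, error_df_rbd))
```

With few treatments the 10 degrees of freedom target dominates; with many treatments the floor of four replicates is the binding constraint.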

2) Randomization:

Randomization is the process of assigning the treatments to the various experimental units in a purely chance manner. The principle of randomization is essential for a valid estimate of the experimental error and also for minimizing bias in the results. The main objects of randomization are,

i. The validity of statistical tests of significance depends on the fact that the statistic under consideration obeys some statistical distribution. Randomization provides a logical basis for this and makes it possible to draw rigorous inductive inferences by the use of statistical theories based on probability theory. For example, we have

$$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n},$$

where X1, X2, …, Xn are independently distributed with common variance $\sigma^2$. This independence is achieved through randomization.

ii. Randomization assures that the sources of uncontrolled variation operate randomly so that their average effect on any group of units is zero. i.e.  randomization ensures that different treatments on the average are subject to equal environmental effect.

3) Local control:

If the experimental material is heterogeneous and the different treatments are allocated to the various units at random over the entire experimental material, then the heterogeneity of the experimental units enters the uncontrolled factors and thus increases the experimental error. It is desirable to reduce the experimental error as far as possible without increasing the number of replications and without interfering with the randomness that is essential for statistical analysis.

When the experimental material is heterogeneous, the experimental error can be reduced by the principle of local control, in which the heterogeneous experimental material is divided into homogeneous groups (blocks) such that the variation within each block is minimum and the variation between blocks is maximum. The treatments are then allocated at random within each block. Thus local control is the process of reducing the experimental error by dividing the relatively heterogeneous experimental material into homogeneous blocks.

Choice of size and shape of plots

In field experiments the size and shape of plots influence the experimental error. If the total experimental area is fixed, increase in size of plots automatically decreases the number of plots and hence the number of replications.

In order to reduce the flow of experimental material from one plot to another, strips are left between consecutive plots. These non-experimental areas are known as 'guard areas'. As the number of plots increases, the non-experimental area also increases. H. F. Smith studied the effect of the size and shape of plots on the precision of the experiment. He found that the variance per unit area for plots of area x was

$$V_x = \frac{V_1}{x^b},$$

where $V_1$ is the variance among plots of unit size and b is a soil characteristic.

i) If b = 0, then $V_x = V_1$; in this case an increase in plot size does not result in any gain in precision.

ii) If b > 0, then $V_x = V_1 / x^b < V_1$ for x > 1, i.e. in this case the precision of the experiment increases with an increase in plot size.

Usually, 0 < b < 1.
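A few numbers make the role of b concrete. The sketch below simply evaluates Smith's relation for hypothetical values of the unit-plot variance, plot size and b; none of these values come from the text.

```python
def variance_per_unit_area(v1, x, b):
    """Smith's empirical relation V_x = V_1 / x**b for a plot of x basic units."""
    if x <= 0:
        raise ValueError("plot size x must be positive")
    return v1 / x ** b


if __name__ == "__main__":
    v1 = 1.0  # hypothetical variance among plots of unit size
    for b in (0.0, 0.25, 0.5, 1.0):
        row = [round(variance_per_unit_area(v1, x, b), 3) for x in (1, 2, 4, 8)]
        print(f"b={b}: {row}")
```

For b = 0 the variance stays at V_1 whatever the plot size, while for larger b the variance per unit area falls faster as the plot grows.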

When definite fertility contours are known, the maximum precision is obtained by arranging the plots in a block with their long sides parallel to the direction of the fertility gradient, and by taking the blocks one after another in the direction of the gradient. In the absence of any knowledge of fertility contours, it is better to use square plots, and it is best to have small plots.

 


 1.2 Completely Randomized Design (C.R.D.)

The simplest design, using only two of the basic principles of design of experiments, namely replication and randomization, is known as the completely randomized design (C.R.D.).

      C.R.D. may be defined as ‘a design in which the treatments are randomly allocated to the various experimental units over the whole experimental material’. 

Suppose that we have k treatments under comparison and the ith treatment is replicated $r_i$ times (i = 1, 2, …, k); then the total number of experimental units required for the experiment is $N = \sum_{i=1}^{k} r_i$. In CRD we allocate the k treatments at random to the N experimental units, subject to the condition that the ith treatment appears in $r_i$ units (i = 1, 2, …, k).

In particular, if $r_i = r$ for all i, that is, each treatment is replicated an equal number of times r, then N = kr and randomization gives every group of r units an equal chance of receiving each treatment.

The CRD is useful in small preliminary experiments and also in certain types of animal or laboratory experiments where the experimental units are homogeneous.

 Layout of CRD

Layout means the placement of the treatments under study on the various experimental units.

Let us consider a CRD with three treatments A, B and C, with 5, 3 and 4 replications respectively.

In order to allocate the given treatments to the 12 experimental units, we first obtain a random permutation of the twelve numbers 1, 2, 3, …, 12.

The random permutation of the numbers can be obtained by using random number tables.

We number the given experimental units from 1 to 12 in any convenient way. Then the treatment A is applied to the five experimental units having the numbers equal to the first five numbers in the random permutation.  Treatment B is applied to those three experimental units whose numbers are equal to the next three numbers in the random permutation and treatment C is applied to the remaining four experimental units.
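The allocation procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions: a pseudo-random shuffle stands in for the random number table, and the function name `crd_layout` and the seed are hypothetical.

```python
import random


def crd_layout(replications, seed=None):
    """Allocate treatments to units 1..N completely at random.

    replications maps each treatment to its number of replications r_i.
    Returns a dict mapping unit number to treatment, ordered by unit number.
    """
    rng = random.Random(seed)
    n_units = sum(replications.values())
    permutation = list(range(1, n_units + 1))
    rng.shuffle(permutation)  # stands in for the random number table

    layout = {}
    start = 0
    for treatment, r in replications.items():
        # the next r numbers of the permutation receive this treatment
        for unit in permutation[start:start + r]:
            layout[unit] = treatment
        start += r
    return dict(sorted(layout.items()))


if __name__ == "__main__":
    layout = crd_layout({"A": 5, "B": 3, "C": 4}, seed=2023)
    for unit, treatment in layout.items():
        print(unit, treatment)
```

Each run with a different seed gives a different layout, but treatment A always occupies five units, B three and C four.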

 Statistical analysis of CRD

Let us consider a CRD with k treatments and N experimental units, where the ith treatment is replicated $r_i$ times, i = 1, 2, …, k, and $\sum_{i} r_i = N$.

Let $y_{ij}$ be the yield of the jth experimental unit receiving the ith treatment, i = 1, 2, …, k and j = 1, 2, …, $r_i$.

 The observations can be put in tabular form as follows:

  


Mathematical Model  

 

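The fixed effect model for CRD is

$$y_{ij} = \mu + T_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, k; \quad j = 1, 2, \dots, r_i,$$

where µ is the general mean effect,

$T_i$ is the effect due to the ith treatment, and

$\varepsilon_{ij}$ is the error associated with the observation $y_{ij}$.

This form is obtained from the model

$$y_{ij} = \mu_i + \varepsilon_{ij},$$

where $\mu_i$ is the mean yield under the ith treatment. Let us define

$$\mu = \frac{1}{N} \sum_{i=1}^{k} r_i \mu_i$$

and $T_i = \mu_i - \mu$, so that $T_i$ is the additional effect due to the ith treatment. Then

$$y_{ij} = \mu + T_i + \varepsilon_{ij}.$$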

 

Assumptions



Null hypothesis

 

In CRD we have to compare the given k treatments. The null hypothesis is stated as

H0: There is no significant difference between the treatment effects,

or H0: the treatments are homogeneous,

or H0: $T_1 = T_2 = \dots = T_k = 0$ (i.e. $T_i = 0$ for all i).

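Least square estimates of the parameters µ and $T_i$

The mathematical model is

$$y_{ij} = \mu + T_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, k; \quad j = 1, 2, \dots, r_i.$$

The least square estimates of the parameters are obtained by minimizing the error sum of squares $E = \sum_i \sum_j \varepsilon_{ij}^2$, i.e. by solving the equations $\partial E / \partial \mu = 0$ and $\partial E / \partial T_i = 0$ for i = 1, 2, …, k.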
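Written out, the minimization can be sketched as follows. Two things here are assumptions rather than statements from the text: the side condition $\sum_i r_i T_i = 0$, which holds when µ is taken as the weighted mean of the treatment means, and the dot notation, where $y_{i\cdot}$ is the total for the ith treatment, $y_{\cdot\cdot}$ is the grand total, and bars denote the corresponding means.

```latex
\begin{align*}
E &= \sum_{i=1}^{k}\sum_{j=1}^{r_i} \left(y_{ij} - \mu - T_i\right)^2 \\
\frac{\partial E}{\partial \mu} &= -2\sum_{i=1}^{k}\sum_{j=1}^{r_i}\left(y_{ij}-\mu-T_i\right) = 0
  \;\Longrightarrow\; y_{\cdot\cdot} = N\mu + \sum_{i=1}^{k} r_i T_i \\
\frac{\partial E}{\partial T_i} &= -2\sum_{j=1}^{r_i}\left(y_{ij}-\mu-T_i\right) = 0
  \;\Longrightarrow\; y_{i\cdot} = r_i\mu + r_i T_i, \qquad i = 1,\dots,k \\
\sum_{i=1}^{k} r_i T_i = 0 &\;\Longrightarrow\; \hat{\mu} = \frac{y_{\cdot\cdot}}{N} = \bar{y}_{\cdot\cdot},
  \qquad \hat{T}_i = \frac{y_{i\cdot}}{r_i} - \hat{\mu} = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}
\end{align*}
```

Under these assumptions the general mean is estimated by the grand mean of all N observations, and each treatment effect by the deviation of its treatment mean from the grand mean.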

 





Layout of an experiment

The term layout refers to the placement of the treatments on the experimental units according to the conditions of the design.

 Advantages and disadvantages of CRD

 Advantages

 1. CRD is a flexible design. There is complete flexibility in the number of treatments and in the number of replications, which may vary from treatment to treatment.

 2. The design is easy to lay out.

 3. It results in maximum use of the experimental units since all the experimental material can be   used.

 4. Since the design is flexible, it simplifies the analysis when data on some experimental units or an entire treatment is missing.

 5. It provides maximum number of degrees of freedom for estimation of experimental error which increases the precision of the experiment.

 Disadvantages

 1. The main disadvantage is that the principle of local control is not used in this design, so the experimental error is inflated by the entire variation among the experimental units. This makes the design less efficient and less sensitive in detecting significant effects. CRD is therefore seldom used in field experiments, where it is difficult to get homogeneous experimental material.

 2. It is less accurate than other designs.   

 Applications

 1. The CRD may be used in a chemical or a baking experiment where the experimental units are the parts of the thoroughly mixed chemical or powder.      

 ***

 Randomized block design (R.B.D.)

If the experimental units are not homogeneous, we cannot use a completely randomized design, because the variation among the units will disturb the test of significance of the treatment effects. The simplest design that takes care of the variability among the experimental units is the randomized block design (R.B.D.). In R.B.D. all three principles of design are used.

Suppose we want to compare the effects of k treatments, each treatment being replicated an equal number of times, say r times. Then we need N = rk experimental units, which are not homogeneous. R.B.D. consists of two steps. The first step is to divide the experimental units into r homogeneous groups, known as blocks. The number of units in each block is equal to the number of treatments. In agricultural field experiments a fertility gradient is sometimes present. In such a situation the blocks are placed across (perpendicular to) the fertility gradient, in order to get homogeneous blocks and to have more variation between the blocks.

The second step is to assign the treatments at random to the units within a block; a fresh randomization is done for each block. Thus, in R.B.D., randomization is restricted to within the homogeneous blocks, and the variation among the blocks is removed from the variation due to error.

Layout of R.B.D.

The treatments are first numbered from 1 to k.  Experimental units in each block are also numbered from 1 to k.  The k treatments are then allocated at random to the k units in a block.  The random allocation can be made by selecting a random permutation of numbers 1 to k by using random number table.

Let us obtain the layout of an R.B.D. with five treatments, each replicated three times. Here we require 15 experimental units grouped into three homogeneous blocks, each containing five units. We conveniently number the treatments and the units in each block. Let A, B, C, D and E be the five treatments, numbered 1 to 5. Suppose we get the random permutation 3, 1, 5, 2, 4 of the numbers 1 to 5 for the first block; the jth treatment is then applied to the unit whose number stands in the jth place of the permutation, as follows.

 

Unit No.      1    2    3    4    5

Treatment     B    D    A    E    C

We find a fresh random permutation for block 2, and the same process is applied to the remaining blocks.
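The block-by-block randomization can be sketched as follows. The sketch assumes the reading used in the example above, namely that the jth treatment goes to the unit whose number is in the jth place of the permutation; the function names and the seed are hypothetical, and a pseudo-random shuffle stands in for the random number table.

```python
import random


def apply_permutation(treatments, permutation):
    """Return the treatments in unit order 1..k.

    The j-th treatment is placed on the unit whose number is the j-th
    entry of the permutation.
    """
    unit_to_treatment = dict(zip(permutation, treatments))
    return [unit_to_treatment[unit] for unit in range(1, len(treatments) + 1)]


def rbd_layout(treatments, n_blocks, seed=None):
    """One independently randomized arrangement of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        permutation = list(range(1, len(treatments) + 1))
        rng.shuffle(permutation)  # fresh randomization for every block
        layout.append(apply_permutation(treatments, permutation))
    return layout


if __name__ == "__main__":
    # Block 1 of the example: permutation 3, 1, 5, 2, 4
    print(apply_permutation(list("ABCDE"), [3, 1, 5, 2, 4]))  # ['B', 'D', 'A', 'E', 'C']
    for number, block in enumerate(rbd_layout(list("ABCDE"), 3, seed=1), start=1):
        print("Block", number, block)
```

Every block contains each treatment exactly once, which is what distinguishes this layout from the unrestricted randomization of a CRD.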












 Surprise Test: 01                                       Date: 11/08/2023 (10.15-10.30)

                             T.Y. B.Sc.  (Design of Experiments)   

       

Q 1. The factors like spacing, date of sowing and breeds are often used as:

a)   Experimental unit    b) Treatment    c) Replicate   d) None of the above


Q 2. Randomization is a process in which the treatments are allocated to the experimental units:

a)   At the will of the investigator    b)   In a sequence

c)   With the probability               d)   None of the above

Q 3. Randomization is the process which enables the experimenter to:

a)       Apply mathematical theories

b)      Make probability statements

c)       Treat error independent

d)      All the above

Q 4. Replication in the experiment means

a)       The number of blocks

b)      Total number of treatments

c)       The number of times a treatment occurs in an experiment

d)      None of the above

Q. 5. The decision about the number of replications is taken in view of:

   a)       Size of experimental units

  b)      Competition among experimental units

  c)       Fraction to be sampled

  d)      All the above

  Q. 6. How many basic principles are in design of experiment?

     a) 1            b) 2             c) 3             d)4

Q. 7. Experimental error is due to:

     a)       Experimenter’s mistakes

    b)      Extraneous factors

   c)       Variation in treatment effects

   d)      None of the above

Q.8 Errors in statistical model are always taken to be:


Q.9 An experimental design is:

a) a map               b) a plan of experiment          c)an architect      d) all of these

 

Q.10 Local control is a device to maintain…

a) Homogeneity among blocks              b) homogeneity within block

      c) Both (a) and (b)                                d) neither (a) nor (b)

 Answers: 1-b, 2-c, 3-d, 4-c, 5-a, 6-c, 7-b, 8-c, 9-b, 10-b

 

Latin Square Design (LSD)

In RBD, the principle of local control is used by grouping the experimental units in one way, i.e. according to blocks. This grouping of units can also be carried out in two ways: row-wise grouping and column-wise grouping. A design in which the experimental units are grouped in two ways is called a 'Latin Square Design'. This design is used with advantage where the fertility gradient runs in two directions or the fertility contours are not known. The LSD eliminates the initial variability among the units in two directions.

In L.S.D. the number of replications of each treatment is equal to the number of treatments. If there are m treatments, then the number of experimental units required is $m^2$. These $m^2$ units are arranged in m rows and m columns. The m treatments are then allocated to these $m^2$ units at random such that each treatment occurs once and only once in each row and each column.

Definition:

         A design in which treatments are allocated at random in the experimental units,  so that each treatment occurs once and only once in each row and in each column of the arrangement of experimental units, is known as L.S.D.

Layout  of  L. S. D.

Let there be m treatments under comparison. The number of experimental units required is $m^2$. The whole experimental material is divided into $m^2$ units arranged in m rows and m columns. Then the m treatments are allocated to these units at random so that each treatment occurs once and only once in each row and in each column.

Consider a design with three treatments A, B and C. These treatments can be allocated to the $3^2 = 9$ units, arranged in three rows and three columns, in 12 different ways. We may select one of these at random and allocate the treatments accordingly. For 2 × 2 and 3 × 3 Latin squares there exists only one standard Latin square. (A Latin square in which the treatments, say A, B, C and so on, occur in the first row and first column in alphabetical order is called a standard Latin square.)
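One way to pick an arrangement at random is sketched below, under the assumption that we start from the cyclic standard square and randomly permute its rows, columns and treatment labels. The function names and the seed are hypothetical. For m ≥ 4 this procedure only reaches the squares obtainable from that one standard square, so it is not a uniform draw from all Latin squares.

```python
import random


def standard_cyclic_square(m):
    """Standard m x m square: first row and first column are 0, 1, ..., m-1."""
    return [[(i + j) % m for j in range(m)] for i in range(m)]


def random_latin_square(treatments, seed=None):
    """Randomize rows, columns and treatment labels of the cyclic square."""
    rng = random.Random(seed)
    m = len(treatments)
    square = standard_cyclic_square(m)
    rows = list(range(m))
    cols = list(range(m))
    labels = list(treatments)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    return [[labels[square[i][j]] for j in cols] for i in rows]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    m = len(square)
    symbols = set(square[0])
    if len(symbols) != m:
        return False
    rows_ok = all(len(row) == m and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(m))
    return rows_ok and cols_ok


if __name__ == "__main__":
    for row in random_latin_square("ABC", seed=7):
        print(" ".join(row))
```

The `is_latin_square` check is independent of how the square was produced, so it can also be used to validate a layout drawn by hand.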

      

From a standard Latin square we can generate a number of Latin squares by permuting rows, columns and treatments; these are known as transformation sets of Latin squares. The total number of possible m × m Latin squares is

$$m! \times (m-1)! \times (\text{number of standard Latin squares}).$$

For example, there are 4 standard 4 × 4 Latin squares, so the total number of possible 4 × 4 Latin squares is 4! × (4 − 1)! × 4 = 24 × 6 × 4 = 576. For a 5 × 5 design the possible number of arrangements is 161280.

In a 3 × 3 LSD there are 12 possible arrangements, and in a 2 × 2 LSD there are 2.
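These counts can be verified by brute force for small m. The sketch below fills the square cell by cell with backtracking and compares the result with m! × (m − 1)! × (number of standard squares); the function name is hypothetical, and the approach is only practical for small squares.

```python
from math import factorial


def count_latin_squares(m, standard_only=False):
    """Count m x m Latin squares by filling cells row by row with backtracking.

    With standard_only=True, only squares whose first row and first column
    are in natural order 0, 1, ..., m-1 are counted.
    """
    rows_used = [set() for _ in range(m)]
    cols_used = [set() for _ in range(m)]

    def fill(cell):
        if cell == m * m:
            return 1
        i, j = divmod(cell, m)
        total = 0
        for s in range(m):
            if s in rows_used[i] or s in cols_used[j]:
                continue
            if standard_only and ((i == 0 and s != j) or (j == 0 and s != i)):
                continue
            rows_used[i].add(s)
            cols_used[j].add(s)
            total += fill(cell + 1)
            rows_used[i].discard(s)
            cols_used[j].discard(s)
        return total

    return fill(0)


if __name__ == "__main__":
    for m in (2, 3, 4):
        total = count_latin_squares(m)
        standard = count_latin_squares(m, standard_only=True)
        by_formula = factorial(m) * factorial(m - 1) * standard
        print(m, standard, total, by_formula)
```

For m = 2, 3 and 4 the direct count and the formula agree (2, 12 and 576 squares respectively).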

 




 

  




















Box Plots

Box plots are a graphical representation of data (easy to visualize descriptive statistics); they are also known as box-and-whisker diagrams.  A box plot provides more information about the data than does a bar graph.

Things to know about box plots

·   The sample is presented as a box.

·   The spacing between the different parts of the box helps to indicate the degree of dispersion (spread) and skewness in the data, and to identify outliers.

·   A box plot shows a 5-number data summary: minimum, first (lower) quartile, median, third (upper) quartile, maximum.

·   The box is divided at the median.

·   The length of the box is the interquartile range (IQR).

·   The 1st quartile is the bottom line of the box.

·   The 3rd quartile is the top line of the box.

 Example


Quartiles divide a frequency distribution into four equal parts:

          Q1: 1st or lower quartile; cuts off the lowest 25% of the data

          Q2: 2nd quartile or median; the 50% point, cuts the data set in half

          Q3: 3rd or upper quartile; cuts off the lowest 75% of the data (or the highest 25%)

Q1 is the median of the lower half of the data set. Q3 is the median of the upper half of the data set.

The difference between the upper and lower quartiles is called the interquartile range. The interquartile range spans the middle 50% of a data set and is not influenced by outliers, because the highest and lowest quarters are excluded.

 Example:

A biologist samples 12 red oak trees in a forest plot and counts the number of caterpillars on each tree. The following is a list of the number of caterpillars on each tree: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.
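The five-number summary for these counts can be computed as follows. This is a minimal sketch using the median-of-halves rule for the quartiles; other conventions for quartiles exist and may give slightly different values, and the function name is hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; the middle value is left out of both halves when n is odd.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = values[:half]
    upper = values[half + n % 2:]
    return values[0], median(lower), median(values), median(upper), values[-1]


caterpillars = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
minimum, q1, med, q3, maximum = five_number_summary(caterpillars)
iqr = q3 - q1

print(minimum, q1, med, q3, maximum)  # 1 17.0 26.0 42.0 57
print("IQR =", iqr)                   # IQR = 25.0
```

The sorted counts are 1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57, so the box runs from 17 to 42, is divided at 26, and the whiskers extend to 1 and 57.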



If these k treatments are compared using r replications without considering the blocks, then the design becomes a CRD and the mathematical model for this design is

$$y_{ij} = \mu + T_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, k; \quad j = 1, 2, \dots, r.$$























 



 

