Basic concepts in Testing of Hypothesis

Tests of Hypothesis

Sampling theory

             Sampling is a part of our day-to-day life. Sampling is preferred to complete enumeration because it is less time-consuming, less expensive, and more accurate and reliable. Moreover, in some cases sampling is the only possible method of data collection.

Population

            A population is a group of items, units or subjects which is under reference of study. It may consist of a finite or an infinite number of units (the universe).

Sample

         A sample is a part or fraction of a population selected on some basis. It consists of a few items of the population. In principle, a sample should be a true representative of the population. Usually a random sample is selected. By random sampling, we mean sampling in which each and every unit of the population has an equal and independent chance of being included in the sample.
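As a small sketch, simple random sampling can be simulated in Python; the population values and sample size below are hypothetical.

```python
import random

# Hypothetical population of 10 labelled units
population = list(range(1, 11))

random.seed(0)  # fixed seed so the draw is reproducible

# Draw a simple random sample of 4 units without replacement:
# every unit has the same chance of being included in the sample.
sample = random.sample(population, k=4)
print(sample)
```

Note that sampling without replacement gives every unit an equal chance of inclusion; for strictly independent draws one would sample with replacement (e.g. `random.choices`).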

       In order to draw inferences about a phenomenon, sampling is a well-accepted tool. The entire population cannot be studied, for the reasons stated above, and in such situations sampling is used. A properly drawn sample is very useful in drawing reliable conclusions. Here we draw a sample from a probability distribution rather than from a group of objects.

Random sample from a continuous probability distribution

              A random sample from a continuous probability distribution f(x, ϴ) is nothing but the set of values of independent and identically distributed (i.i.d.) random variables with the common probability distribution f(x, ϴ).

Definition: If x1, x2, ----, xn are i.i.d. random variables with p.d.f. f(x, ϴ), then they form a random sample from the population with p.d.f. f(x, ϴ).

Note: i) For drawing inference we use the numerical values of x1, x2, ----, xn.

ii) By independence, the joint p.d.f. of x1, x2, ----, xn is the product of the individual p.d.f.s: f(x1, ϴ) · f(x2, ϴ) · --- · f(xn, ϴ).

Statistic

              Using the random sample x1, x2, ----, xn we draw conclusions about the unknown probability distribution. However, the probability distribution can be studied only if the parameter ϴ is known, so we use the sample observations for this purpose. The sample observations are summarized, and the summarized quantity is called a statistic (estimator).

Definition: If x1, x2, ----, xn is a random sample from a probability distribution f(x, ϴ), then t = t(x1, x2, ----, xn), a function of the sample values which does not involve the unknown parameter ϴ, is called a statistic (estimator).

Parameter

               A function based on population values is called a parameter. If f(x, ϴ) is a p.d.f., then the constant ϴ involved in it is called a parameter. A statistic is a random variable, while a parameter is a constant. Since a statistic is a random variable, it possesses some probability distribution, which need not be the same as that of the parent distribution f(x, ϴ). A parameter, being a constant, does not possess a probability distribution.

Estimator  

               An estimator is a function of sample values for estimating the population parameter. A particular value of an estimator, obtained from a fixed set of values of a random sample, is known as an estimate. An estimate stands for the value of a parameter. For example, the sample mean x̄ is an estimate of the population mean µ, and the sample variance S² is an estimate of the population variance σ².
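The sample mean and sample variance mentioned above can be computed directly; the mileage readings below are hypothetical illustrative data.

```python
import statistics

# Hypothetical sample of mileage readings (km per litre)
x = [48.2, 51.0, 49.5, 50.3, 47.8, 52.1]

x_bar = statistics.mean(x)    # estimate of the population mean mu
s2 = statistics.variance(x)   # estimate of the population variance sigma^2 (divisor n - 1)

print(x_bar, s2)
```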

Unbiased Estimate

                                  A statistic or estimator t is said to be an unbiased estimate of the population parameter ϴ if E(t) = ϴ, i.e. E(statistic or estimator) = parameter.
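A quick Monte Carlo sketch of unbiasedness, assuming a uniform(0, 10) population (so the true mean is µ = 5): averaging the sample mean over many repeated samples should come out close to µ, reflecting E(x̄) = µ.

```python
import random
import statistics

random.seed(1)
mu = 5.0          # true population mean of the uniform(0, 10) distribution
estimates = []
for _ in range(5000):
    # draw a sample of size 20 and record the sample mean
    sample = [random.uniform(0, 10) for _ in range(20)]
    estimates.append(statistics.mean(sample))

# The average of the 5000 estimates approximates E(x-bar), which equals mu
print(statistics.mean(estimates))
```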

Sampling distribution of statistic and Standard error

                   If x1, x2, ----, xn is a random sample from a probability distribution f(x, ϴ), then the probability distribution of T = t(x1, x2, ----, xn) is called its sampling distribution, and the standard deviation of T is called its standard error (S.E.).

                      
In testing of hypothesis, the standard error of T is important. Some typical statistics along with their standard errors are:

S.E.(x̄) = σ/√n
S.E.(p) = √(PQ/n), where Q = 1 − P
S.E.(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2)
S.E.(p1 − p2) = √(P1Q1/n1 + P2Q2/n2)

where p1 and p2 are proportions obtained using two samples from two populations with proportions P1 and P2.
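These standard-error formulas are easy to evaluate numerically; the values of σ, n, P1, P2 etc. below are hypothetical.

```python
import math

sigma, n = 4.0, 64
se_mean = sigma / math.sqrt(n)            # S.E. of the sample mean: 4/8 = 0.5

P = 0.4                                   # hypothetical population proportion
se_prop = math.sqrt(P * (1 - P) / n)      # S.E. of a sample proportion, Q = 1 - P

P1, n1, P2, n2 = 0.4, 100, 0.5, 120
se_diff = math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)  # S.E. of p1 - p2

print(se_mean, se_prop, se_diff)
```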

***

Tests of significance

      A very important aspect of sampling theory is the study of tests of significance. By tests of significance, we decide, on the basis of sample results, whether the deviation between the observed sample statistic and the parameter value, or the deviation between two independent sample statistics, is significant or insignificant (i.e. due to chance or sampling fluctuations).

Hypothesis

A definite statement about a population parameter is called a hypothesis (a hypothesis is a claim to be tested). For example: a particular scooter gives an average of 50 km per litre; the proportion of unemployed persons is the same in two different states; the average life of an article produced by company A is greater than that of company B.

Null Hypothesis

A hypothesis of no difference is called the null hypothesis. Alternatively, the null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true (Prof. R. A. Fisher). For example, in the case of a single statistic, H0 will be that the sample statistic does not differ significantly from the parameter value, i.e. H0: μ = μ0, and in the case of two statistics, H0 will be that the two sample statistics do not differ significantly, i.e. H0: μ1 = μ2.

 Choice of null hypothesis

i) A hypothesis whose faulty rejection is more harmful.

ii) A hypothesis under which we can find the probability distribution of the test statistic.

Alternative Hypothesis

Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis. It is denoted by H1. For example, if H0: μ = μ0, i.e. the population has a specified mean μ0, then the alternative hypothesis could be

i) H1: μ ≠ μ0 (i.e. μ > μ0 or μ < μ0)   ii) H1: μ > μ0   iii) H1: μ < μ0

The alternative hypothesis in (i) is known as a two-sided (two-tailed) alternative, while those in (ii) and (iii) are known as one-sided alternatives: right-tailed and left-tailed respectively.
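As a sketch of how the three alternatives lead to different p-values, here is a large-sample z-test of H0: μ = μ0 for the scooter-mileage example; the sample figures (x̄ = 48.5, σ = 4, n = 64) are hypothetical.

```python
import math

def phi(z):
    # standard normal cumulative distribution function via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

x_bar, mu0, sigma, n = 48.5, 50.0, 4.0, 64
z = (x_bar - mu0) / (sigma / math.sqrt(n))   # test statistic; equals -3.0 here

p_two_sided = 2 * (1 - phi(abs(z)))  # H1: mu != mu0 (two-tailed)
p_right = 1 - phi(z)                 # H1: mu > mu0  (right-tailed)
p_left = phi(z)                      # H1: mu < mu0  (left-tailed)

print(z, p_two_sided, p_right, p_left)
```

With z = −3, the two-tailed p-value is about 0.0027, so H0 would be rejected at the usual 5% level against the two-sided alternative.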
