Statistics is the science of collecting, measuring, and analyzing data. As in many other disciplines, the concept of a hypothesis plays a central role in it. A hypothesis in statistics is a statement that must be either accepted or rejected on the basis of data. There are several types of such assumptions, similar by definition but different in practice. The null hypothesis is the subject of this article.
From general to particular: hypotheses in statistics
A statistical hypothesis narrows the general definition of an assumption: it is a statement about a general population, that is, the full set of objects of interest about which researchers draw conclusions. It can be tested using a sample (a part of the population). Here are some examples of statistical hypotheses:
1. The performance of the entire class may depend on the level of education of each student.
2. Children who started school at age 6 and children who started at age 7 master the introductory mathematics course equally well.
A simple hypothesis is an assumption that uniquely specifies the value of a parameter of the quantity being studied (for example, that a mean equals one specific number).
A composite (complex) hypothesis consists of several, or infinitely many, simple hypotheses. It specifies a range of values rather than one exact value (for example, that a mean is greater than a specific number).
It is useful to understand several definitions of hypotheses in statistics so as not to confuse them in practice.
The concept of the null hypothesis
The null hypothesis is the assumption that two populations do not differ from each other. In statistical terms this is phrased not as "they do not differ" but as "the difference between them is zero", which is where the name comes from. The null hypothesis is denoted H0. The threshold below which an outcome is considered too unlikely to be due to chance (the significance level) is usually set at 0.05 or 0.01, sometimes lower.
An example from life helps to understand what the null hypothesis is. A university teacher suggested that the difference in how well two groups of students prepared for a test is caused by insignificant, random factors that do not affect the overall level of education (the difference in preparation between the two groups is zero).
For contrast, here is an example of an alternative hypothesis (H1) - an assumption that contradicts the null hypothesis. For example: the director of the university suggested that the difference in how well the two groups prepared for the test is caused by the different teaching methods their teachers used (the difference in preparation between the two groups is significant and has an explanation).
Now you can immediately see the difference between the concepts of “null hypothesis” and “alternative hypothesis”. Examples illustrate these concepts.
Hypothesis Testing
Formulating an assumption is only half the job. The real challenge for beginners is testing the null hypothesis, and this is where many run into difficulties.
By setting the null hypothesis against an alternative hypothesis that claims the opposite, you can compare the two and decide which one the data support. This is how statistics works.
Let H0 be the null hypothesis and H1 the alternative. Then:
H0: c = c0;
H1: c ≠ c0.
Here c is the population mean, whose true value is unknown, and c0 is a value specified in advance, against which the hypothesis is tested. There is also a number X - the sample mean, which serves as an estimate of c.
The test consists in comparing X with c0. If X is equal to c0 within the limits of random sampling error, the null hypothesis is accepted. If X differs from c0 by more than chance can explain, the alternative is considered true.
Confidence Interval Method
There is an effective way to test the null hypothesis in practice. It consists in constructing a confidence interval, usually at the 95% confidence level.
First you need to know the formula for calculating the confidence interval:
X - t * Sx ≤ c ≤ X + t * Sx,
where X is the sample mean (in the example below, the value on which the alternative hypothesis is based);
t is the tabulated value of Student's coefficient for the chosen confidence level and sample size;
Sx is the standard error of the mean, calculated as Sx = σ / √n, where σ is the standard deviation and n is the sample size.
Consider the following situation. Before a repair, a conveyor produced 32.1 kg of final product per day. After the repair, according to the owner, efficiency increased: a week of daily checks showed an average output of 39.6 kg per day.
The null hypothesis states that the repair did not affect the efficiency of the conveyor. The alternative hypothesis states that the repair changed the efficiency of the conveyor, so its productivity improved.
For n = 7 (one week of daily measurements), the table gives t = 2.447. With a standard error Sx = 4.2, the formula takes the following form:
39.6 - 2.447 * 4.2 ≤ c ≤ 39.6 + 2.447 * 4.2;
29.3 ≤ c ≤ 49.9.
It turns out that the value 32.1 lies within this range, so the null hypothesis cannot be rejected, and the value proposed by the alternative - 39.6 - is not automatically accepted. Remember that the null hypothesis is checked first, and only then its opposite.
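The conveyor calculation can be reproduced in a few lines of Python. This is only a sketch: the function name `confidence_interval` and the variable names are chosen here for illustration, and the numbers (39.6, 4.2, 2.447, 32.1) are taken from the example above.

```python
def confidence_interval(sample_mean, std_error, t_value):
    """Return (lower, upper) for X - t * Sx <= c <= X + t * Sx."""
    margin = t_value * std_error
    return sample_mean - margin, sample_mean + margin

# Values from the conveyor example in the text.
X = 39.6    # average daily output after the repair, kg
Sx = 4.2    # standard error used in the calculation
t = 2.447   # tabulated Student's coefficient for n = 7
c0 = 32.1   # daily output before the repair, kg

lower, upper = confidence_interval(X, Sx, t)

# H0 is rejected only if the old value falls outside the interval.
null_rejected = not (lower <= c0 <= upper)

print(f"{lower:.1f} <= c <= {upper:.1f}")
print("H0 rejected" if null_rejected else "H0 not rejected")
```

Since 32.1 falls between the two bounds, the sketch reports that H0 is not rejected, in line with the conclusion above.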
Forms of the Alternative
So far we have considered the construction in which H0 asserts something and H1 refutes it, which gave the following system:
H0: c = c0;
H1: c ≠ c0.
But there are two more related forms, in which the alternative is one-sided. For example, the null hypothesis states that the average grade of a class is at least 4.54, and the alternative then states that the average grade of the same class is less than 4.54. The system then looks like this:
H0: c ≥ 4.54;
H1: c < 4.54.
Note that the null hypothesis states that the value is greater than or equal to 4.54, while the alternative states that it is strictly less. The strictness of the inequality sign is of great importance! The mirrored form, H0: c ≤ c0 against H1: c > c0, is built in the same way.
Statistical verification
A statistical test of a null hypothesis relies on a statistical criterion (a test statistic). Such criteria follow various distribution laws.
For example, there is the F-test, which is based on the Fisher distribution. There is the t-test, the one most often used in practice, which is based on Student's distribution. There is also Pearson's chi-square goodness-of-fit test, and others.
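To give a feel for what a criterion is, here is a minimal sketch of Pearson's chi-square statistic. The data are hypothetical (60 rolls of a die, invented for illustration), the function name `pearson_chi_square` is an assumption, and the critical value 11.070 is the standard tabulated chi-square value for 5 degrees of freedom at the 0.05 level.

```python
def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 60 rolls of a die.
# H0: all six faces are equally likely.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = pearson_chi_square(observed, expected)

# Tabulated critical value: 6 categories -> 5 degrees of freedom, level 0.05.
critical = 11.070
reject_h0 = chi2 > critical

print(f"chi-square = {chi2:.2f}, reject H0: {reject_h0}")
```

The statistic stays far below the critical value, so with these made-up counts there is no reason to reject the assumption of a fair die.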
Area of acceptance of the null hypothesis
By analogy with the "range of permissible values" in algebra, statistics has the acceptance region of the null hypothesis. It is a segment (or a point) on the X axis containing the values of the test statistic for which the null hypothesis is accepted. The end points of the segment are the critical values. The rays to the right and left of the segment are the critical regions. If the computed value falls into one of them, the null hypothesis is rejected and the alternative is accepted.
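The idea of a segment bounded by critical values can be sketched as a small check. The function `in_acceptance_region` and its `alternative` argument are illustrative names, not part of any library. For a one-sided alternative only one ray is critical, so the region extends to infinity on the other side.

```python
def in_acceptance_region(statistic, critical, alternative="two-sided"):
    """Return True if the statistic lies in the acceptance region of H0.

    alternative: "two-sided" (H1: c != c0), "less" (H1: c < c0),
                 or "greater" (H1: c > c0).
    critical: the positive tabulated critical value for the chosen test.
    """
    if alternative == "two-sided":
        return -critical <= statistic <= critical
    if alternative == "less":
        # Only the left ray is a critical region.
        return statistic >= -critical
    if alternative == "greater":
        # Only the right ray is a critical region.
        return statistic <= critical
    raise ValueError(f"unknown alternative: {alternative!r}")

# Two-sided check with a critical value of 2.447.
print(in_acceptance_region(1.8, 2.447))    # inside the segment
print(in_acceptance_region(-3.1, 2.447))   # in the left critical region
```

Note that at the same significance level the tabulated critical value for a one-sided test differs from the two-sided one, so it has to be looked up separately.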
Null hypothesis rebuttal
The null hypothesis in statistics can be a tricky concept. When it is tested, two types of errors can be made:
1. Rejection of a true null hypothesis. This is an error of the first kind, denoted a = 1.
2. Acceptance of a false null hypothesis. This is an error of the second kind, denoted a = 2.
It should be understood that these are not the same thing: the consequences of the two errors can differ significantly, and they can occur in different samples.
An example of two types of errors
Complex concepts are easier to figure out with an example.
In the production of a certain medicine, extreme caution is required, since exceeding the dose of one of the components makes the finished drug highly toxic, and patients taking it can die. However, an overdose cannot be detected by chemical analysis.
Because of this, before the medicine goes on sale, a small dose is tested on rats or rabbits. If most of the animals die, the medicine is not allowed for sale; if they survive, the medicine is allowed to be sold in pharmacies. Here the null hypothesis is that the medicine is not toxic.
The first case: the medicine was in fact not toxic, but a mistake was made during the experiment, and the drug was classified as toxic and was not allowed for sale. This is an error of the first kind (a = 1).
The second case: in another experiment, when checking another batch of medicine, it was decided that the drug was not toxic, and it was allowed to go on sale, although in fact the drug was poisonous. This is an error of the second kind (a = 2).
The first case entails large financial costs for the supplier, since the entire batch of medicine has to be destroyed and production started from scratch.
The second case leads to the death of patients who bought and used this medicine.
Probability theory
Not only the null hypothesis but all hypotheses in statistics and economics are tested at a chosen level of significance.
The significance level is the probability of an error of the first kind (rejection of a true null hypothesis). Three levels are commonly used:
• the first level is 5%, or 0.05: the probability of a mistake is 5 in 100, or 1 in 20;
• the second level is 1%, or 0.01: the probability is 1 in 100;
• the third level is 0.1%, or 0.001: the probability is 1 in 1000.
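The link between the significance level and errors of the first kind can be checked with a small simulation. This is a sketch under stated assumptions, not a method from the text: it uses a two-sided z-test with a known standard deviation (instead of Student's test) so that only the Python standard library is needed. The function name, the sample size of 30, the 2000 trials, and the fixed seed are all arbitrary choices.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha, trials=2000, n=30, seed=0):
    """Estimate how often a true H0 is rejected at significance level alpha."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    mu0, sigma = 0.0, 1.0   # H0 is true: the population mean really is mu0
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1   # a true H0 was rejected: error of the first kind
    return rejections / trials

for alpha in (0.05, 0.01, 0.001):
    print(alpha, type_i_error_rate(alpha))
```

The estimated share of false rejections should come out close to the chosen level, which is exactly what the significance level promises.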
Hypothesis Test Criteria
Once a null hypothesis has been put forward, it must be tested in order to rule out an error. The basic procedure for testing the null hypothesis consists of several stages:
1. The permissible error probability P = 0.05 is chosen.
2. A test statistic is selected for the criterion.
3. The acceptance region is found by a known method.
4. The value of the statistic T is calculated.
5. If T belongs to the acceptance region of the null hypothesis (as in the confidence interval method), the null hypothesis is not rejected and is considered to hold.
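The five stages can be sketched in code for a one-sample Student's test. The seven daily measurements below are hypothetical: the article gives only their average, so these values were invented to have a mean of 39.6 and do not reproduce the standard error used earlier. The function name `one_sample_t` is also an assumption. The critical value 2.447 is the tabulated Student's coefficient for n = 7.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, c0):
    """T = (X - c0) / Sx, where Sx = s / sqrt(n)."""
    n = len(sample)
    std_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - c0) / std_error

# Stage 1: permissible error probability.
P = 0.05

# Stage 2: the statistic is Student's T for one sample.
# Hypothetical daily output after the repair, kg (invented data, mean 39.6).
daily_output = [25.0, 52.0, 38.0, 45.0, 30.0, 48.0, 39.2]
c0 = 32.1   # daily output before the repair, kg

# Stage 3: acceptance region [-t_critical, t_critical] from the table
# (n = 7, so 6 degrees of freedom, two-sided test at P = 0.05).
t_critical = 2.447

# Stage 4: value of the statistic.
T = one_sample_t(daily_output, c0)

# Stage 5: decision.
reject_h0 = abs(T) > t_critical
print(f"T = {T:.3f}, reject H0: {reject_h0}")
```

With these invented measurements T stays inside the acceptance region, so the null hypothesis is not rejected. Different data with the same mean could lead to a different decision.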
This is how statistics works. With proper testing, the null hypothesis will be either accepted or rejected.
It is worth noting that for ordinary entrepreneurs and users the first three stages can be very difficult to perform accurately, so they are best entrusted to professional mathematicians. Stages 4 and 5, however, can be performed by anyone with sufficient knowledge of statistical testing methods.