Types of sampling in statistics. Abstract: Sampling method in statistics


Plan

  • Introduction
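  • 1. The role of sampling
  • 2. Probabilistic selection methods ensuring representativeness
  • 3. Organizational and methodological features of random, mechanical, typical and serial sampling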
  • Conclusion
  • Bibliography

Introduction

Statistics is an analytical science that every modern specialist needs. A specialist cannot be considered literate without a command of statistical methodology. Statistics is one of the most important tools linking an enterprise with society. It is also one of the most important disciplines in the curriculum of every specialty: statistical literacy is an integral component of higher education, and statistics ranks among the first in the number of hours allocated to it in the curriculum. When working with numbers, every specialist must know how the data were obtained, how they were calculated, and how complete and reliable they are.

1. The role of sampling

The set of all units of a population that have a certain characteristic and are subject to study is called the general population in statistics.

In practice, for one reason or another, it is not always possible or practical to examine the entire population. The study is then limited to a certain part of it, with the ultimate goal of extending the results obtained to the entire general population, i.e. the sampling method is used.

To do this, a part of the elements, the so-called sample, is selected from the general population in a special way, and the results of processing the sample data (for example, arithmetic averages) are generalized to the entire population.

The theoretical basis of the sampling method is the law of large numbers. By virtue of this law, if the variance of a characteristic in the population is finite and the sample is sufficiently large, then with a probability close to one the sample mean will be arbitrarily close to the general mean. This law, which comprises a group of theorems, has been proven rigorously. Thus, the arithmetic mean calculated from the sample can reasonably be regarded as an indicator characterizing the population as a whole.
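The effect described by the law of large numbers can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of 100,000 normally distributed values, the sample sizes and the function name `sample_mean_errors` are hypothetical and do not come from the text.

```python
import random
import statistics


def sample_mean_errors(population, sizes, seed=0):
    """Return {n: |sample mean - general mean|} for one random sample of each size n."""
    rng = random.Random(seed)
    general_mean = statistics.fmean(population)
    errors = {}
    for n in sizes:
        sample = rng.sample(population, n)  # non-repetitive random selection
        errors[n] = abs(statistics.fmean(sample) - general_mean)
    return errors


# Hypothetical general population with finite variance (mean 50, std. deviation 10).
population_rng = random.Random(42)
population = [population_rng.gauss(50, 10) for _ in range(100_000)]

errors = sample_mean_errors(population, sizes=[10, 100, 1_000, 10_000])
for n, err in errors.items():
    print(f"n = {n:>6}: |sample mean - general mean| = {err:.3f}")
```

A single run does not guarantee that the error shrinks at every step, but for large samples the deviation from the general mean is typically small.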

2. Probabilistic selection methods ensuring representativeness

In order to draw conclusions about the properties of the general population from the sample, the sample must be representative, i.e. it must fully and adequately reflect the properties of the general population. Representativeness can be ensured only if the selection of data is objective.

The sample population is formed according to the principle of mass probabilistic processes without any exceptions from the accepted selection scheme; it is necessary to ensure relative homogeneity of the sample population or its division into homogeneous groups of units. When forming a sample population, a clear definition of the sampling unit must be given. An approximately equal size of sampling units is desirable, and the smaller the sampling unit, the more accurate the results will be.

Three selection methods are possible: random selection, selection of units according to a specific scheme, and a combination of the two.

If selection under the accepted scheme is carried out from a general population previously divided into types (layers, or strata), the sample is called typical (or stratified, or zoned). Another classification of samples depends on whether the sampling unit is a single unit of observation or a series of units (the term "nest" is sometimes used). In the latter case the sample is called serial, or nested. In practice, a combination of typical and serial sampling is often used. In mathematical statistics, any discussion of data selection must distinguish between repeated and non-repetitive sampling. The first corresponds to the scheme in which the drawn ball is returned, the second to the scheme in which it is not (when data selection is illustrated by drawing balls of different colors from an urn). In socio-economic statistics there is no point in using repeated sampling, so non-repetitive sampling is usually meant.

Since socio-economic objects have a complex structure, the sample can be quite difficult to organize. For example, to select households when studying consumption by the population of a large city, it is easier to first select territorial cells, then residential buildings, then apartments or households, and then the respondent. This type of sampling is called multi-stage sampling. At each stage, different selection units are used: larger ones at the initial stages; at the last stage, the selection unit coincides with the observation unit.

Another type of sample observation is multiphase sampling. Such a sample includes a certain number of phases, each of which differs in the level of detail of the observation program. For example, 25% of the entire population is surveyed under a short program, every 4th unit of this sample is surveyed under a fuller program, and so on.

For any type of sampling, units are selected in one of the three ways noted above. Let us consider the random selection procedure. First of all, a list of population units is compiled, in which each unit is assigned a digital code (a number or label). Then a draw is made. Balls with the corresponding numbers are placed in a drum and mixed, and the balls are drawn. The numbers drawn correspond to the units included in the sample; the number of balls drawn is equal to the planned sample size.

Selection by lot may be subject to biases caused by technical deficiencies (the quality of the balls or the drum) and other reasons. Selection using a table of random numbers is more reliable from the point of view of objectivity. Such a table contains a series of randomly alternating digits generated by electronic signals. Since we use the decimal number system with the digits 0, 1, 2, ..., 9, the probability of any given digit appearing is 1/10. Therefore, if a table of random numbers containing 500 digits were created, about 50 of them would be 0, about the same number would be 1, and so on.

Selection according to some scheme (so-called directed sampling) is often used. The selection scheme is designed to reflect the basic properties and proportions of the population. The simplest way is as follows: using lists of units of the general population, compiled so that the ordering of units is unrelated to the properties being studied, units are selected mechanically with a step equal to N : n, where N is the size of the population and n is the size of the sample. Usually the selection begins not with the first unit but half a step in, to reduce the possibility of sampling bias. The frequency of occurrence of units with certain characteristics (for example, students with a certain level of academic performance, students living in a dormitory, etc.) will be determined by the structure that has developed in the general population.

To be more confident that the sample reflects the structure of the population, the latter is divided into types (strata or areas), and random or mechanical selection is carried out within each type. The total number of units selected from the different types must equal the planned sample size.
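One possible way to organize such a selection is sketched below. It assumes proportional allocation (each type contributes to the sample in proportion to its size), which the text does not prescribe; the strata, their sizes and the function names are hypothetical.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Split the sample size n across strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(value) for name, value in exact.items()}
    # Give the units lost to rounding to the strata with the largest remainders,
    # so that the allocations add up to exactly n.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def stratified_sample(strata, n, seed=0):
    """Random non-repetitive selection within each stratum."""
    rng = random.Random(seed)
    alloc = proportional_allocation({name: len(units) for name, units in strata.items()}, n)
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}


# Hypothetical population of 1,000 units divided into three types.
strata = {
    "urban": [f"u{i}" for i in range(600)],
    "suburban": [f"s{i}" for i in range(300)],
    "rural": [f"r{i}" for i in range(100)],
}
allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 50)
sample = stratified_sample(strata, 50)
print(allocation)
```

With these hypothetical sizes the 50 sample units are split 30, 15 and 5, so the total matches the planned sample size.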

Particular difficulties arise when there is no list of units, and selection must be made either on the ground or from product samples in a finished goods warehouse. In these cases, it is important to develop in detail a terrain orientation scheme and a selection scheme and follow it, avoiding deviations. For example, the enumerator is instructed to move from a certain bus stop north along the even side of the street and, having counted two houses from the first corner, enter the third and conduct a survey in every 5th residential building. Strict adherence to the adopted scheme ensures the fulfillment of the main condition for the formation of a representative sample - the objectivity of the selection of units.

Quota selection, in which the sample is built from units of certain categories (quotas) that must be represented in given proportions, should be distinguished from random sampling. For example, when surveying department store customers, it may be planned to select 160 respondents: 90 women, of whom 25 are girls, 20 are young women with small children, 35 are middle-aged women in business suits and 10 are women aged 50 and older; and 70 men, of whom 25 are teenagers and young men, 20 are young men with children, 15 are men in suits and 10 are men in sportswear. Such a sample may be good for determining consumer orientations and preferences, but if we use it to establish the average amount of purchases and their structure, we will get unrepresentative results. This is because quota sampling aims to select certain categories, not to select units objectively.

A sample may also be unrepresentative even if it is formed in accordance with known proportions of the population but the selection is carried out without any scheme: units are recruited in any way, so long as their categories are represented in the same proportions as in the population (for example, the ratio of men to women, or of respondents below working age, of working age and above working age).

These remarks should caution against such sampling approaches and re-emphasize the need for objective selection.

3. Organizational and methodological features of random, mechanical, typical and serial sampling

Depending on how the population elements are selected for the sample, there are several types of sampling surveys. Selection can be random, mechanical, typical and serial.

Random selection is selection in which every element of the population has an equal chance of being included in the sample.

The requirement of random selection is achieved in practice using lots or a table of random numbers.

When selecting by drawing lots, all elements of the general population are numbered in advance and their numbers are written on cards. After thorough shuffling, the required number of cards, corresponding to the sample size, is drawn from the pack in any way (in a row or in any other order). The selected cards can either be put aside (so-called non-repetitive selection) or, once a card has been drawn and its number written down, be returned to the pack, which gives the card the opportunity to appear in the sample again (repeated selection). With repeated selection, the pack must be thoroughly shuffled each time a card is returned.
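The two variants of drawing lots can be sketched as follows. This is a minimal illustration using Python's standard `random` module in place of a physical pack of cards; the function name `draw_lots` and the population of 50 elements are hypothetical.

```python
import random


def draw_lots(population_size, sample_size, repeated=False, seed=None):
    """Draw element numbers 1..population_size by lot.

    repeated=False: drawn cards are put aside (non-repetitive selection).
    repeated=True:  each card is returned to the pack before the next draw.
    """
    rng = random.Random(seed)
    cards = list(range(1, population_size + 1))
    if repeated:
        # Every draw is made from the full pack, so a number may appear again.
        return [rng.choice(cards) for _ in range(sample_size)]
    rng.shuffle(cards)          # thorough shuffling of the pack
    return cards[:sample_size]  # take the required number of cards in a row


non_repetitive = draw_lots(50, 10, repeated=False, seed=1)
repetitive = draw_lots(50, 10, repeated=True, seed=1)
print(non_repetitive)
print(repetitive)
```

In the non-repetitive variant all drawn numbers are distinct by construction; in the repeated variant duplicates are possible.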

The drawing of lots method is used in cases where the number of elements of the entire population being studied is small. With a large population, random selection by drawing lots becomes difficult. More reliable and less labor-intensive in the case of a large volume of processed data is the method of using a table of random numbers.

Mechanical selection is carried out as follows. If a 10% sample is formed, i.e. one element out of every ten must be selected, the entire population is conditionally divided into equal parts of 10 elements. An element is then selected at random from the first ten. Suppose, for example, that the draw gives number nine. The remaining sample elements are then completely determined by the selection step and the number of the first selected element. In the case under consideration, the sample will consist of elements 9, 19, 29, etc.
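The procedure can be written down in a few lines. In this sketch the population size of 100 and the function names are assumptions made for illustration; the step of 10 and the starting number 9 follow the example in the text.

```python
import random


def mechanical_selection(population_size, step, start):
    """Return the numbers start, start + step, start + 2*step, ... (elements numbered from 1)."""
    if not 1 <= start <= step:
        raise ValueError("the first element must be chosen from the first `step` elements")
    return list(range(start, population_size + 1, step))


def random_start(step, seed=None):
    """Draw the number of the first element at random from 1..step."""
    return random.Random(seed).randint(1, step)


# 10% sample from a hypothetical population of 100 elements, first element number 9.
sample = mechanical_selection(population_size=100, step=10, start=9)
print(sample)
```

Only the first element is random; every later element follows from the step, which is why the ordering of the list matters so much for this method.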

Mechanical selection should be used carefully, as there is a real danger of so-called systematic errors. Therefore, before carrying out mechanical sampling, it is necessary to analyze the population being studied. If its elements are arranged randomly, the sample obtained mechanically will be random. Often, however, the elements of the original population are partially or even completely ordered. An ordering of elements with a regular periodicity whose period may coincide with the step of mechanical sampling is highly undesirable for mechanical selection.

Often the elements of a population are ordered according to the value of the characteristic being studied in descending or ascending order and do not have periodicity. Mechanical selection from such a population takes on the character of directed selection, since individual parts of the population are represented in the sample in proportion to their numbers in the entire population, i.e. selection aims to make the sample representative.

Another type of directional selection is typical selection. It is necessary to distinguish typical selection from the selection of typical objects. The selection of typical objects was used in zemstvo statistics, as well as in budget surveys. At the same time, the selection of “typical villages” or “typical farms” was made according to certain economic characteristics, for example, by the size of land ownership per yard, by the occupation of the residents, etc. A selection of this kind cannot be the basis for the application of the sampling method, since its main requirement - randomness of selection - is not met here.

In the actual typical selection in the sampling method, the population is divided into groups that are qualitatively homogeneous, and then a random selection is made within each group. Typical selection is more difficult to organize than random selection itself, since certain knowledge about the composition and properties of the general population is required, but it gives more accurate results.

In serial selection, the entire population is divided into groups (series). Then, by random or mechanical selection, a certain part of these series is isolated and they are completely processed. In essence, serial selection is a random or mechanical selection carried out on aggregated elements of the original population.
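A minimal sketch of serial selection is given below, assuming the series are chosen by random non-repetitive selection. The villages and households are hypothetical data and `serial_sample` is an illustrative name.

```python
import random


def serial_sample(series, n_series, seed=0):
    """Select n_series whole series at random; every unit of a chosen series is examined."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(series), n_series)
    return {name: list(series[name]) for name in chosen}


# Hypothetical population: 10 villages (series) of 5 households each.
series = {
    f"village {i}": [f"household {i}.{j}" for j in range(1, 6)]
    for i in range(1, 11)
}
sample = serial_sample(series, n_series=3)
for name, units in sample.items():
    print(name, units)
```

The random choice is made among the series, not among individual units, so the number of surveyed units depends on the sizes of the series that happen to be drawn.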

In theoretical terms, serial sampling is the most imperfect of those considered. As a rule, it is not used for processing material, but it provides certain convenience when organizing a survey, especially in the study of agriculture. For example, annual sample surveys of peasant farms in the years preceding collectivization were carried out using serial sampling. It is useful for the historian to know about serial sampling because he may encounter the results of such surveys.

In addition to the classical methods of selection described above, other methods are also used in the practice of the sampling method. Let's look at two of them.

The population under study may have a multi-stage structure; it may consist of units of the first stage, which, in turn, consist of units of the second stage, etc. For example, provinces include counties, counties can be considered as a collection of volosts, volosts consist of villages, and villages consist of courtyards.

Multi-stage selection can be applied to such populations, i.e. carry out selection consistently at each stage. Thus, from a set of provinces, you can select counties (first stage) using a mechanical, typical or random method, then select volosts using one of the indicated methods (second stage), then select villages (third stage) and, finally, courtyards (fourth stage).
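Selection in several stages can be sketched recursively. The hierarchy below (counties, volosts, villages, courtyards) mirrors the example in the text, but its sizes, the numbers selected at each stage and the function name `multistage_sample` are hypothetical, and simple random selection is assumed at every stage.

```python
import random


def multistage_sample(node, per_stage, rng):
    """Select per_stage[0] units at this stage, then recurse into each selected unit.

    `node` is a nested dict whose innermost values are lists of final units.
    """
    k, rest = per_stage[0], per_stage[1:]
    if isinstance(node, dict):
        chosen = rng.sample(sorted(node), min(k, len(node)))
        result = []
        for name in chosen:
            result.extend(multistage_sample(node[name], rest, rng))
        return result
    # Last stage: the selection unit coincides with the observation unit.
    return rng.sample(node, min(k, len(node)))


# Hypothetical province: 3 counties x 4 volosts x 5 villages x 20 courtyards.
province = {
    f"county {c}": {
        f"volost {c}.{v}": {
            f"village {c}.{v}.{w}": [f"courtyard {c}.{v}.{w}.{h}" for h in range(1, 21)]
            for w in range(1, 6)
        }
        for v in range(1, 5)
    }
    for c in range(1, 4)
}

# 2 counties, 2 volosts in each, 2 villages in each, 5 courtyards in each village.
sample = multistage_sample(province, per_stage=[2, 2, 2, 5], rng=random.Random(1))
print(len(sample), sample[:3])
```

With these settings the final sample contains 2 * 2 * 2 * 5 = 40 courtyards.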

An example of two-stage mechanical selection is the long-practiced selection of workers' budgets. At the first stage, enterprises are mechanically selected, at the second - workers, whose budget is examined.

The variability of the characteristics of the objects under study may differ. For example, the provision of peasant farms with their own labor force fluctuates less than, say, the size of their crops. In this regard, a smaller sample on labor supply will be as representative as a larger sample on crop size. In this case, from the sample used to determine the size of crops, it is possible to draw a subsample that is sufficiently representative to determine the supply of labor, thereby carrying out a two-phase selection. In the general case further phases can be added, i.e. another subsample can be drawn from the resulting subsample, and so on. The same selection method is used when the objectives of the study require different accuracy in calculating different indicators.

Task 1. Descriptive statistics

At the exam, 20 students received the following marks (on a 100-point scale):

1) Construct a distribution series of frequencies, relative frequencies and cumulative frequencies for 5 intervals;

2) Construct a polygon, histogram and cumulative polygon;

3) Find the arithmetic mean, mode, median, first and third quartiles, interquartile range, standard deviation and coefficient of variation. Analyze the data using these characteristics and specify an interval that contains the central 50% of the values.

1) x (min) =53, x (max) =98

R=x (max) - x (min) =98-53=45

h = R / (1 + 3.32 lg n), where n is the sample size, n = 20

h = 45 / (1 + 3.32 * lg 20) ≈ 8.46, which is rounded up to 9
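The interval width can be checked numerically. The sketch below uses the values from the solution (minimum 53, maximum 98, n = 20); the function name `sturges_width` and the choice to round up to a whole number are assumptions.

```python
import math


def sturges_width(x_min, x_max, n):
    """Interval width h = R / (1 + 3.32 * lg n), where R = x_max - x_min."""
    r = x_max - x_min
    return r / (1 + 3.32 * math.log10(n))


h_exact = sturges_width(53, 98, 20)
h = math.ceil(h_exact)  # rounded up to a whole number of points
print(round(h_exact, 2), h)
```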

a (i) is the lower limit of the interval, b (i) is the upper limit of the interval.

a (1) = x (min) - h/2, b (1) = a (1) +h, then if b (i) is the upper limit of the i-th interval (and a (i+1) = b (i)), then b (2) =a (2) +h, b (3) =a (3) +h, etc. The construction of intervals continues until the beginning of the next interval in order is equal to or greater than x (max).

a (1) = 47.5 b (1) = 56.5

a (2) = 56.5 b (2) = 65.5

a (3) = 65.5 b (3) = 74.5

a (4) = 74.5 b (4) = 83.5

a (5) = 83.5 b (5) = 92.5

a (6) = 92.5 b (6) = 101.5

Table columns: intervals a (i) - b (i); frequency tally; frequency n (i); cumulative frequency n (hi).
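The frequencies for these intervals can be recomputed from the ordered marks that appear later in the solution. The sketch below is only an illustration: it takes the interval boundaries exactly as listed above (starting at 47.5 with width 9), and the variable names are hypothetical.

```python
from itertools import accumulate

# The 20 marks, as listed in ascending order in the solution.
marks = [53, 58, 59, 59, 63, 67, 68, 69, 71, 73,
         78, 79, 85, 86, 87, 89, 91, 91, 98, 98]

lower, width, count = 47.5, 9, 6  # a(1), h and the number of intervals from the text
intervals = [(lower + i * width, lower + (i + 1) * width) for i in range(count)]

freq = [sum(a <= x < b for x in marks) for a, b in intervals]  # n(i)
cum_freq = list(accumulate(freq))                               # n(hi)
rel_freq = [f / len(marks) for f in freq]                       # W(i) = n(i) / n

for (a, b), f, c, w in zip(intervals, freq, cum_freq, rel_freq):
    print(f"{a:5.1f} - {b:5.1f}   n(i) = {f}   n(hi) = {c:2d}   W(i) = {w:.2f}")
```

Because all marks are whole numbers and all boundaries end in .5, no mark falls on a boundary, so the choice of half-open intervals does not affect the counts.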

2) To construct graphs, we write down the variation series of the distribution (interval and discrete) of relative frequencies W (i) = n (i) /n, accumulated relative frequencies W (hi) and find the ratio W (i) /h by filling out the table.

x (i) = (a (i) + b (i)) / 2, where x (i) is the midpoint of the i-th interval; W (hi) = n (hi) / n

Statistical series of assessment distribution:

Intervals, a (i) - b (i)

To construct a histogram of relative frequencies, we plot the partial intervals along the abscissa axis, and on each of them we construct a rectangle whose area is equal to the relative frequency W (i) of the i-th interval. The height of each rectangle must then be equal to W (i) / h.

From the histogram you can obtain a polygon of the same distribution if the midpoints of the upper bases of the rectangles are connected by straight segments.

To construct the cumulative curve of a discrete series, we plot the values of the characteristic along the abscissa axis and the cumulative relative frequencies W (hi) along the ordinate axis. We connect the resulting points with straight line segments. For an interval series, we plot the upper boundaries of the groups along the abscissa axis.

3) We find the arithmetic mean using the formula:

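The mode is calculated by the formula:

Mo = x (0) + h * (n (Mo) - n (Mo-1)) / ((n (Mo) - n (Mo-1)) + (n (Mo) - n (Mo+1))),

where x (0) is the lower limit of the modal interval; h is the width of the grouping interval; n (Mo) is the frequency of the modal interval; n (Mo-1) is the frequency of the interval preceding the modal one; n (Mo+1) is the frequency of the interval following the modal one. = 23.125.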

Let's find the median:

n=20: 53,58,59,59,63,67,68,69,71,73,78,79,85,86,87,89,91,91,98,98

Substituting the values, we get: Q1=65;

The second quartile value coincides with the median value, so Q2=75.5; Q3= 88.
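The quartiles quoted here can be reproduced from the ordered marks. The sketch below assumes the "median of halves" convention (Q1 and Q3 are the medians of the lower and upper halves of the ordered data), which matches the values in the text; other quartile conventions give slightly different numbers. The function name `quartiles` is hypothetical.

```python
import statistics

marks = [53, 58, 59, 59, 63, 67, 68, 69, 71, 73,
         78, 79, 85, 86, 87, 89, 91, 91, 98, 98]


def quartiles(values):
    """Return (Q1, Q2, Q3) using the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]  # the middle element is excluded when n is odd
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


q1, q2, q3 = quartiles(marks)
iqr = q3 - q1
mean = statistics.fmean(marks)
print(q1, q2, q3, iqr, mean)
```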

The interquartile range is: IQR = Q3 - Q1 = 88 - 65 = 23.

The root mean square (standard) deviation is found using the formula:

The coefficient of variation:

From these calculations it is clear that the central 50% of the values lies between the first and third quartiles, i.e. in the interval 65 - 88.

Task 2. Statistical testing of hypotheses.

Preferences in sports for men, women and teenagers are as follows:

Test the hypothesis that preference is independent of gender and age at the significance level α = 0.05.

1) Testing the hypothesis about the independence of preferences in sports.

Pearson coefficient:

The table value of the chi-square test with 4 degrees of freedom at α = 0.05 is χ² (table) = 9.488.

So, the hypothesis is rejected. The differences in preferences are significant.
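The computation behind such a test can be sketched as follows. The table of preferences is not reproduced in the text, so the counts below are hypothetical and serve only to show the mechanics; the critical value 9.488 is the one quoted above, and `chi_square_independence` is an illustrative name.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows are men, women, teenagers; columns are three sports.
observed = [
    [30, 20, 10],
    [15, 25, 20],
    [10, 15, 35],
]
chi2, dof = chi_square_independence(observed)
critical = 9.488  # table value for 4 degrees of freedom at the 0.05 level
print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}, reject: {chi2 > critical}")
```

A 3 x 3 table gives (3 - 1) * (3 - 1) = 4 degrees of freedom, which is why the table value for 4 degrees of freedom is used.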

2. Conformity hypothesis.

Volleyball as a sport is closest to basketball. Let's check the consistency in preferences for men, women and teenagers.

Ф 2 =0.1896+0.1531+0.1624+0.1786+0.1415+0.1533 = 0.979.

With a significance level α = 0.05 and k = 2 degrees of freedom, the table value is χ² (table) = 5.991.

Since Ф 2 >, the differences in preferences are significant.

Task 3. Correlation and regression analysis.

An analysis of road traffic accidents yielded the following statistics regarding the percentage of drivers under 21 years of age and the number of accidents with serious consequences per 1000 drivers:

Conduct a graphical and correlation-regression analysis of the data, and predict the number of accidents with serious consequences for a city in which drivers under 21 years old make up 20% of the total number of drivers.

We get a sample size of n = 10.

x is the percentage of drivers under 21 years of age,

y is the number of accidents per 1000 drivers.

The linear regression equation is:

We sequentially calculate:

Similarly we find

Sample regression coefficient

The connection between x, y is strong.

The linear regression equation takes the form: y = 0.29x - 1.46.

The figure shows the scatter plot and the linear regression line. We make the forecast for x (n) = 20.

We get y (n) = 0.29 * 20 - 1.46 = 4.34.

The forecast value turned out to be larger than all the values presented in the original table. This is a consequence of the fact that the correlation dependence is direct and the coefficient, equal to 0.29, is fairly large: each unit increment Δx gives an increment Δy ≈ 0.3.
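The forecast step, and the least-squares fit that produces such coefficients, can be sketched as follows. The original data table is not reproduced in the text, so the fit is demonstrated on a small hypothetical data set; only the coefficients 0.29 and -1.46 and the value x = 20 come from the solution. The function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Value of the regression line at x."""
    return a + b * x


# Hypothetical points lying exactly on y = 1 + 2x, used only to check the fit.
a_demo, b_demo = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Forecast with the coefficients obtained in the solution.
forecast = predict(a=-1.46, b=0.29, x=20)
print(round(forecast, 2))
```

Since x = 20 lies at or beyond the edge of the observed range, the forecast is an extrapolation and should be treated with corresponding caution.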

Task 4. Time series analysis and forecasting.

Predict index values for the coming week using:

a) the moving average method, choosing three-week data for its calculation;

b) the exponentially weighted average, choosing α = 0.1.

From the table of random numbers we find the numbers 41, 51, 69, 135, 124, 93, 91, 144, 10, 24.

We arrange them in ascending order: 10, 24, 41, 51, 69, 91, 93, 124, 135, 144.

We carry out a new numbering from 1 to 10. We obtain the initial data for ten weeks:

Exponential smoothing at α = 0.1 gives only one value.

For the middle of the entire period we get three forecasts: 12.855; 13.09; 12.895.

There is agreement between these forecasts.
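Both forecasting methods named in the task can be sketched in a few lines. The weekly index values are not reproduced in the text, so the series below is hypothetical; the window of three weeks and the smoothing constant 0.1 follow the task, and initializing the smoothed value with the first observation is an assumption.

```python
def moving_average_forecast(series, window=3):
    """Forecast for the next period: the mean of the last `window` observations."""
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha=0.1):
    """Forecast for the next period by simple exponential smoothing.

    The smoothed value starts from the first observation (an assumed convention).
    """
    smoothed = series[0]
    for value in series[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


# Hypothetical index values for ten weeks.
weekly_index = [12.6, 12.8, 12.7, 12.9, 13.0, 12.8, 13.1, 12.9, 13.0, 13.2]

ma = moving_average_forecast(weekly_index, window=3)
es = exponential_smoothing_forecast(weekly_index, alpha=0.1)
print(f"moving average: {ma:.3f}, exponential smoothing: {es:.3f}")
```

A small smoothing constant such as 0.1 gives most of the weight to the accumulated history, so the exponential forecast reacts slowly to recent changes.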

Task 5. Index analysis.

The company is engaged in the transportation of goods. There are data for a number of years on the volume of transportation of 4 types of cargo and the cost of transporting a unit of cargo.

Determine the simple price, quantity and value indices for each type of cargo, as well as the Laspeyres and Paasche indices and the value index. Comment on the results.

Solution. Let's calculate simple indices:

Laspeyres index:

Paasche index:

Value index:

Individual indices show how the changes in prices and quantities differ for cargo A, B, C and D. Aggregate indices show the general trends of change. Overall, the value of the goods transported decreased by 13%. The reason is that the quantity of the most expensive cargo decreased by 42%, while its tariff remained almost unchanged.
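The aggregate indices can be computed as in the sketch below. The original table of tariffs and volumes is not reproduced in the text, so the figures for the four cargo types are hypothetical, and the function names are illustrative. The formulas are the standard price-index definitions: Laspeyres weights prices by base-period quantities, Paasche by current-period quantities.

```python
def laspeyres_price_index(p0, p1, q0):
    """Sum(p1 * q0) / Sum(p0 * q0): prices weighted by base-period quantities."""
    return sum(a * q for a, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0))


def paasche_price_index(p0, p1, q1):
    """Sum(p1 * q1) / Sum(p0 * q1): prices weighted by current-period quantities."""
    return sum(a * q for a, q in zip(p1, q1)) / sum(a * q for a, q in zip(p0, q1))


def value_index(p0, p1, q0, q1):
    """Sum(p1 * q1) / Sum(p0 * q0): change in the total value of transported cargo."""
    return sum(a * q for a, q in zip(p1, q1)) / sum(a * q for a, q in zip(p0, q0))


# Hypothetical tariffs (p) and volumes (q) for cargo A, B, C, D in two periods.
p0, p1 = [10.0, 20.0, 5.0, 40.0], [12.0, 18.0, 6.0, 41.0]
q0, q1 = [100, 50, 200, 30], [110, 45, 210, 18]

laspeyres = laspeyres_price_index(p0, p1, q0)
paasche = paasche_price_index(p0, p1, q1)
value = value_index(p0, p1, q0, q1)
# Laspeyres quantity index: quantities weighted by base-period prices.
laspeyres_quantity = sum(p * q for p, q in zip(p0, q1)) / sum(p * q for p, q in zip(p0, q0))
print(f"Laspeyres = {laspeyres:.3f}, Paasche = {paasche:.3f}, value = {value:.3f}")
```

A useful check is that the value index equals the Paasche price index multiplied by the Laspeyres quantity index.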

Years 16-20 are numbered in order from 1 to 5. The initial data takes the form:

First, let us examine the dynamics of the amount of cargo A.

Table columns: indicator; absolute increase; growth rate, %; rate of increase, %.

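The growth rates are averaged by the formulas:

T (gr, avg) = (T (1) * T (2) * ... * T (n-1)) ^ (1 / (n-1)), T (gr, avg) = (y (n) / y (1)) ^ (1 / (n-1)).

For the rate of increase, in every case T (inc) = T (gr) - 1.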
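The dynamics indicators can be computed as in the sketch below. The cargo volumes are not reproduced in the text, so the five yearly levels are hypothetical; the average growth rate is taken as the geometric mean of the chain growth rates, which is an assumption about the formulas used in the solution. The relation "rate of increase = growth rate - 1" follows the text.

```python
def chain_dynamics(levels):
    """Chain indicators for each period relative to the previous one."""
    rows = []
    for previous, current in zip(levels, levels[1:]):
        growth = current / previous
        rows.append({
            "absolute_increase": current - previous,
            "growth_rate_pct": 100 * growth,          # growth rate
            "increase_rate_pct": 100 * (growth - 1),  # rate of increase = growth rate - 1
        })
    return rows


def average_growth_rate(levels):
    """Geometric mean of the chain growth rates: (y_n / y_1) ** (1 / (n - 1))."""
    return (levels[-1] / levels[0]) ** (1 / (len(levels) - 1))


# Hypothetical yearly volumes for five years (numbered 1 to 5).
volumes = [200, 210, 190, 180, 170]

rows = chain_dynamics(volumes)
avg_growth = average_growth_rate(volumes)
avg_increase = avg_growth - 1
for year, row in enumerate(rows, start=2):
    print(year, row)
print(f"average growth rate = {avg_growth:.4f}, average rate of increase = {avg_increase:.4f}")
```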

Now let us consider cargo D.

Table columns: indicator; absolute increase; growth rate, %; rate of increase, %.

Conclusion

Average values and their varieties play a big role in statistics. Average indicators are widely used in analysis, since it is in them that the patterns of mass phenomena and processes manifest themselves, both in time and in space. For example, the pattern of increasing labor productivity is expressed in statistical indicators of the growth of average output per worker in industry, and the pattern of steady growth in the well-being of the population is manifested in statistical indicators of an increase in the average income of workers and employees.

Such descriptive characteristics of the distribution of a varying characteristic as mode and median are widely used. They are specific characteristics; their meaning is assigned to any specific option in the variation series.

So, to characterize the most frequently occurring value of a characteristic, a mode is used, and to show the quantitative limit of the value of a varying characteristic, which half of the members of the population have reached, the median is used.

Thus, average values help to study the patterns of development of industry as a whole, of a specific industry, of society and of the country.

Bibliography

1. Theory of statistics: Textbook / R.A. Shmoilova, V.G. Minashkin, N.A. Sadovnikova, E.B. Shuvalova; ed. by R.A. Shmoilova. - 4th ed., revised and expanded. - M.: Finance and Statistics, 2005. - 656 p.

2. Gusarov V.M. Statistics: Textbook for universities. - M.: UNITY-DANA, 2001.

4. Collection of problems in the theory of statistics: Textbook / Ed. by Prof. V.V. Glinsky and Ph.D., Assoc. Prof. L.K. Serga. - 3rd ed. - M.: INFRA-M; Novosibirsk: Siberian Agreement, 2002.

5. Statistics: Textbook / L.P. Kharchenko, V.G. Dolzhenkova, V.G. Ionin et al.; ed. by V.G. Ionin. - 2nd ed., revised and expanded. - M.: INFRA-M, 2003.

Sample

A sample, or sample population, is a set of cases (subjects, objects, events, specimens) selected from the general population by a certain procedure to take part in a study.

Sample characteristics:

  • Qualitative characteristics of the sample - who exactly we choose and what sampling methods we use for this.
  • Quantitative characteristics of the sample - how many cases we select, in other words, sample size.

Necessity of sampling

  • The object of study is very extensive. For example, consumers of a global company’s products include a huge number of geographically dispersed markets.
  • There is a need to collect primary information.

Sample size

Sample size is the number of cases included in the sample population. For statistical reasons, it is recommended that the number of cases be at least 30-35.

Dependent and independent samples

When comparing two (or more) samples, an important property is whether they are dependent. If a one-to-one correspondence can be established between the cases of two samples (that is, each case in sample X corresponds to one and only one case in sample Y and vice versa), and the basis of this correspondence is relevant to the trait being measured, the samples are called dependent. Examples of dependent samples:

  • pairs of twins,
  • two measurements of any trait before and after experimental exposure,
  • husbands and wives
  • and so on.

If there is no such relationship between the samples, they are considered independent.

Accordingly, dependent samples always have the same size, while the size of independent samples may differ.

Samples are compared using various statistical criteria.

Representativeness

The sample may be considered representative or non-representative.

Example of a non-representative sample

  1. A study with experimental and control groups, which are placed in different conditions.
    • Study with experimental and control groups using a pairwise selection strategy
  2. A study using only one group - an experimental one.
  3. A study using a mixed (factorial) design - all groups are placed in different conditions.

Sampling types

Samples are divided into two types:

  • probabilistic
  • non-probabilistic

Probability samples

  1. Simple probability sampling:
    • Simple repeated sampling (with replacement). This method assumes that each respondent is equally likely to be included in the sample. Cards with respondent numbers are prepared from the list of the general population. They are placed in a deck and shuffled, a card is drawn at random, its number is written down, and the card is returned to the deck. The procedure is repeated as many times as the required sample size. Disadvantage: selection units may be repeated.

The procedure for constructing a simple random sample includes the following steps:

1. obtain a complete list of the members of the population and number it. Such a list is called a sampling frame;

2. determine the planned sample size, that is, the expected number of respondents;

3. draw as many numbers from a table of random numbers as there are sample units required. If the sample is to contain 100 people, 100 random numbers are taken from the table. These random numbers can also be generated by a computer program;

4. select from the sampling frame those observations whose numbers correspond to the random numbers drawn.

  • Simple random sampling has obvious advantages. The method is extremely easy to understand. The results of the study can be generalized to the population being studied. Most approaches to statistical inference assume that the information was collected using a simple random sample. However, simple random sampling has at least four significant limitations:

1. It is often difficult to create a sampling frame that would allow simple random sampling.

2. Simple random sampling may produce a sample that is large or spread over a large geographic area, which significantly increases the time and cost of data collection.

3. The results of simple random sampling are often characterized by lower precision and a larger standard error than the results of other probability methods.

4. Simple random sampling (SRS) may produce a non-representative sample. Although samples obtained by SRS represent the population adequately on average, some of them badly misrepresent the population being studied. This is especially likely when the sample size is small.

  • Simple sampling without replacement (non-repeated sampling). The procedure is the same, except that the cards with respondent numbers are not returned to the deck.
  2. Systematic probability sampling. This is a simplified version of simple probability sampling. Respondents are selected from the list of the general population at a fixed interval (K), and the starting unit is chosen at random. The most reliable result is achieved with a homogeneous population; otherwise the step size may coincide with some internal cyclic pattern of the list and bias the sample. Disadvantages: the same as for simple probability sampling.
  3. Serial (cluster) sampling. The selection units are statistical series (clusters) such as a family, a school or a team. All elements of the selected series are examined. The series themselves can be selected by random or systematic sampling. Disadvantage: units within a series may be more homogeneous than the general population.
  4. Regionalized (stratified) sampling. If the population is heterogeneous, it is recommended to divide it into homogeneous parts before applying probability sampling with any selection technique; such a sample is called regionalized. The groups can be natural formations (for example, city districts) or can be defined by any characteristic that underlies the study. The characteristic used for the division is called the stratification (regionalization) characteristic.
  5. "Convenience" sample. The procedure consists of contacting "convenient" sampling units: a group of students, a sports team, friends and neighbors. This type of sampling is reasonable when the goal is to gauge people's reactions to a new concept, and it is often used to pretest questionnaires. Because units are chosen by availability rather than at random, convenience sampling is, strictly speaking, a non-probability method.
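Systematic selection from the list above can be sketched as follows. The sketch assumes the common convention that the interval K is the population size divided by the sample size and that only the starting position is random; the function name and the frame of 2000 units are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units at a fixed interval K = len(frame) // n from a random start."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical ordered list of 2000 numbered units; K = 2000 // 100 = 20.
frame = list(range(1, 2001))
sample = systematic_sample(frame, 100, seed=7)
```

If the list has a cycle whose length matches K (for example, every 20th unit is a corner apartment), every selected unit falls on the same point of the cycle, which is the bias described above.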

Non-probability samples

Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.

  1. Quota sampling. The sample is built as a model that reproduces the structure of the general population in the form of quotas (proportions) of the characteristics being studied. The number of sample elements with each combination of the studied characteristics is set so that it corresponds to their share (proportion) in the general population. For example, if the general population consists of 5,000 people, of whom 2,000 are women and 3,000 are men, then the quota sample will contain 20 women and 30 men, or 200 women and 300 men. Quota samples are most often based on demographic criteria: gender, age, region, income, education, and others. Disadvantages: such samples are usually not representative, because it is impossible to take several social parameters into account at once. Advantages: the material is readily available.
  2. Snowball method. The sample is built as follows. Each respondent, starting with the first, is asked for the contact details of friends, colleagues and acquaintances who fit the selection conditions and could take part in the study. Thus, except for the first step, the sample is formed with the participation of the research subjects themselves. The method is often used when it is necessary to find and interview hard-to-reach groups of respondents (for example, respondents with a high income, respondents belonging to the same professional group, respondents with similar hobbies or interests, etc.).
  3. Spontaneous sampling: sampling of the so-called “first person you come across”. It is often used in television and radio polls. The size and composition of a spontaneous sample are not known in advance and are determined by a single parameter, the activity of the respondents. Disadvantages: it is impossible to establish which population the respondents represent and, as a result, impossible to determine representativeness.
  4. Route survey: often used when the unit of study is the family. All streets are numbered on the map of the locality in which the survey will be carried out. Large numbers are then selected using a table (generator) of random numbers. Each large number is treated as consisting of 3 components: street number (the first 2-3 digits), house number and apartment number. For example, in the number 14832, 14 is the street number on the map, 8 is the house number and 32 is the apartment number.
  5. Regionalized sampling with selection of typical objects. If, after regionalization, a typical object is selected from each group, i.e. an object that is close to the average on most of the characteristics studied, such a sample is called regionalized with the selection of typical objects.
  6. Modal sampling.
  7. Expert sampling.
  8. Heterogeneous sample.
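The quota arithmetic from the first item of this list (2,000 women and 3,000 men in a population of 5,000) can be checked with a short sketch. The function name `quota_sizes` and the dictionary layout are assumptions for illustration; in general the results would also need rounding to whole respondents.

```python
def quota_sizes(population_counts, sample_size):
    """Split sample_size across groups in proportion to their population counts."""
    total = sum(population_counts.values())
    return {group: sample_size * count / total
            for group, count in population_counts.items()}

population = {"women": 2000, "men": 3000}
quotas_50 = quota_sizes(population, 50)    # 20 women and 30 men
quotas_500 = quota_sizes(population, 500)  # 200 women and 300 men
```

The quotas only fix how many respondents of each kind are needed; who fills each quota is still chosen by the interviewer, which is why the method remains non-probabilistic.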

Group Building Strategies

The selection of groups for participation in a psychological experiment is carried out using various strategies to ensure that internal and external validity are maintained to the greatest possible extent.

Randomization

Randomization, or random selection, is used to create simple random samples. The use of such a sample is based on the assumption that each member of the population is equally likely to be included in the sample. For example, to make a random sample of 100 university students, you can put pieces of paper with the names of all university students in a hat, and then take 100 pieces of paper out of it - this will be a random selection (Goodwin J., p. 147).

Pairwise selection

Pairwise selection is a strategy for building sample groups in which the groups are made up of subjects who are equivalent on secondary parameters that matter for the experiment. The strategy is effective for experiments with experimental and control groups; the best option is to recruit twin pairs (mono- and dizygotic), as it allows you to create...

Stratometric selection

Stratometric selection is randomization with the allocation of strata (or clusters). With this method, the general population is divided into groups (strata) with certain characteristics (gender, age, political preferences, education, income level, etc.), and subjects with the corresponding characteristics are selected.

Approximate Modeling

Approximate modeling is drawing limited samples and generalizing the conclusions from such a sample to a wider population. For example, when 2nd-year university students take part in a study, its results are extended to “people aged 17 to 21 years”. The admissibility of such generalizations is extremely limited.

Approximate modeling is the formation of a model that, for a clearly defined class of systems (processes), describes its behavior (or desired phenomena) with acceptable accuracy.

Sample study.

The concept of the sampling method.

Sample observation is a non-continuous observation in which the population units to be studied are selected at random, the selected part is examined, and the results are then extended to the entire population.

The sampling method is used when:

1. the observation itself damages or destroys the observed units (for example, testing yarn for strength or light bulbs for burning time);

2. the population is very large;

3. complete observation is too costly (in money and labor).

Typically, 5-10% of the entire population is subjected to a sample survey, less often 15-25%.

The purpose of sample observation is to determine the characteristics of the general population: the general mean ($\bar{x}$) and the general share ($P$). The characteristics of the sample population, the sample mean ($\tilde{x}$) and the sample share ($w$), differ from the general characteristics by the amount of the sampling error ($\Delta$). Therefore, it is necessary to calculate the sampling error, or representativeness error, which is determined using formulas developed in probability theory for each type of sample and selection method.

There are the following methods for selecting units:

1. Selection according to the returned-ball scheme, usually called repeated sampling (sampling with replacement).

With repeated selection, the probability of each individual unit being included in the sample remains constant, because after a unit is selected it is returned to the population and can be selected again.

2. Selection according to the unreturned-ball scheme, called non-repeated sampling (sampling without replacement). In this case each selected unit is not returned, and the probability of the remaining units getting into the sample changes all the time (it increases). Examples are drawing lots and using random number tables, for example to select 75 units out of 780.

Types of samples.

1. Purely random sampling.

In a purely random sample, units are selected directly from the entire mass of units in the general population.

The number of selected units is usually determined from the accepted sampling fraction.

The sampling fraction is the ratio of the number of units in the sample population, $n$, to the number of units in the general population, $N$: $n/N$.

So, with a 5% sample from a batch of goods of 2000 units, the sample size is $n = 0.05 \cdot 2000 = 100$ units, and with a 20% sample it is $n = 0.20 \cdot 2000 = 400$ units.

An important condition for a purely random sample is that each unit in the population has an equal chance of being included in the sample.

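With purely random repeated sampling, the maximum sampling error for the mean is

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}}$$

where

$\sigma^2$ is the variance of the sample population;

$n$ is the sample size;

$t$ is the confidence coefficient, determined from the table of values of the Laplace integral function for a given probability.

With non-repeated sampling, the maximum sampling error for the mean is determined by the formula

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$$

where $N$ is the population size. For the share, $\sigma^2$ is replaced by the variance of the alternative attribute, $w(1 - w)$:

$$\Delta_w = t\sqrt{\frac{w(1 - w)}{n}\left(1 - \frac{n}{N}\right)}$$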

To determine the ash content of coal, 100 coal samples were examined in a random sample. The survey found that the average ash content of coal in the sample was 16% with $\sigma = 5\%$, and that in 10 samples the ash content exceeded 20%. With a probability of 0.954, determine the limits within which the average ash content of coal in the deposit and the share of coal with an ash content above 20% will lie.

Average ash content

At a probability of 0.954, $t = 2$. The maximum sampling error is

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}} = 2\sqrt{\frac{5^2}{100}} = 2 \cdot 0.5 = 1\%$$

so the average ash content of coal in the deposit lies within $16\% \pm 1\%$, that is, $15\% \le \bar{x} \le 17\%$.

Share of coal with an ash content above 20%

The sample share is

$$w = \frac{m}{n} = \frac{10}{100} = 0.10$$

where $m$ is the number of units possessing the trait.

The sampling error for the share is

$$\Delta_w = t\sqrt{\frac{w(1 - w)}{n}} = 2\sqrt{\frac{0.1 \cdot 0.9}{100}} = 0.06 = 6\%$$

With a probability of 0.954, it can be stated that the share of coal with an ash content of more than 20% in the deposit will be within

$P = 10\% \pm 6\%$, or $4\% \le P \le 16\%$.
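The coal calculation can be reproduced with a short sketch. The function names `margin_mean` and `margin_share` are hypothetical; the optional argument `N` applies the non-repeated-selection factor $(1 - n/N)$ and is left out here because the size of the deposit is not given.

```python
import math

def margin_mean(sigma, n, t, N=None):
    """Maximum sampling error of the mean; pass N for non-repeated selection."""
    variance_of_mean = sigma ** 2 / n
    if N is not None:
        variance_of_mean *= 1 - n / N  # correction for non-repeated selection
    return t * math.sqrt(variance_of_mean)

def margin_share(w, n, t, N=None):
    """Maximum sampling error of the share; pass N for non-repeated selection."""
    variance_of_share = w * (1 - w) / n
    if N is not None:
        variance_of_share *= 1 - n / N
    return t * math.sqrt(variance_of_share)

# Coal example: n = 100, mean 16%, sigma = 5%, 10 samples above 20%, t = 2.
delta_mean = margin_mean(sigma=5, n=100, t=2)    # 1.0 percentage point
delta_share = margin_share(w=0.10, n=100, t=2)   # about 0.06, i.e. 6%
```

The same functions cover non-repeated and mechanical selection when the population size is known, for example `margin_mean(9, 100, 2, N=2000)` for a 5% sample of 100 units.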

Mechanical sampling.

This is a variety of purely random sampling. The entire population is divided into $n$ equal parts, and one unit is selected from each part.

All units of the general population must be arranged in a certain order. With respect to the indicator being studied, the units can be ordered by an essential, a secondary or a neutral characteristic. The unit in the middle of each group should be selected; this avoids systematic sampling error.

It is used, for example, when surveying customers in stores or visitors to clinics, where every 5th, 4th, 3rd, etc. person is selected.

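Example: mechanical sampling

To determine the average period of use of a short-term loan at a bank, a 5% mechanical sample of 100 accounts was taken. The survey found that the average period of use of a short-term loan is 30 days with $\sigma = 9$ days, and that in 5 accounts the loan period exceeds 60 days.

Sampling error for the mean (a 5% sample means $n/N = 0.05$; at a probability of 0.954, $t = 2$):

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)} = 2\sqrt{\frac{9^2}{100}(1 - 0.05)} \approx 1.75 \approx 2 \text{ days}$$

That is, with a probability of 0.954 it can be stated that:

1. the period of use of the loan lies within 30 days $\pm$ 2 days, i.e. $28 \le \bar{x} \le 32$ days;

2. for the share of loans with a period of more than 60 days, the sample share is

$$w = \frac{m}{n} = \frac{5}{100} = 0.05$$

and the error of the share is

$$\Delta_w = t\sqrt{\frac{w(1 - w)}{n}\left(1 - \frac{n}{N}\right)} = 2\sqrt{\frac{0.05 \cdot 0.95}{100}(1 - 0.05)} \approx 0.042$$

With a probability of 0.954 it can be stated that the share of loans in the bank with a period of more than 60 days will be within $5\% \pm 4.2\%$, that is, $0.8\% \le P \le 9.2\%$.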

Typical sample.

The general population is divided into homogeneous typical groups. Then units are selected into the sample population from each typical group by purely random or mechanical sampling.

For example: the labor productivity of workers, who fall into separate groups by qualification.

Important feature: it gives more accurate results than the other methods, because units from every typical group are included in the sample.

Units can be selected into the sample population in various ways. Let us consider a typical sample with proportional selection within typical groups.

When selection is proportional to the size of the typical groups, the sample size from a typical group is determined by the formula

$$n_i = n \cdot \frac{N_i}{N}$$

where $n_i$ is the size of the sample from the typical group and

$N_i$ is the size of the typical group.

The maximum errors of the sample mean and share with non-repeated random or mechanical selection within typical groups are calculated using the formulas

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\overline{\sigma^2}}{n}\left(1 - \frac{n}{N}\right)}, \qquad \Delta_w = t\sqrt{\frac{\overline{w(1 - w)}}{n}\left(1 - \frac{n}{N}\right)}$$

where $\overline{\sigma^2}$ is the average of the within-group variances of the sample population.
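Proportional selection within typical groups can be sketched as follows. The group names, group sizes and total sample size are hypothetical. Rounding each $n_i$ to a whole number is an assumption of this sketch, and in general the rounded sizes may not add up exactly to $n$.

```python
import random

def proportional_allocation(group_sizes, n):
    """n_i = n * N_i / N for each typical group, rounded to whole units."""
    N = sum(group_sizes.values())
    return {g: round(n * N_i / N) for g, N_i in group_sizes.items()}

def typical_sample(groups, n, seed=None):
    """Draw a non-repeated random sample from every typical group, proportionally."""
    rng = random.Random(seed)
    sizes = proportional_allocation({g: len(u) for g, u in groups.items()}, n)
    return {g: rng.sample(units, sizes[g]) for g, units in groups.items()}

# Hypothetical workforce of 3000 split into three qualification groups.
groups = {
    "low": list(range(0, 1000)),
    "medium": list(range(1000, 2500)),
    "high": list(range(2500, 3000)),
}
allocation = proportional_allocation({g: len(u) for g, u in groups.items()}, 150)
sample = typical_sample(groups, 150, seed=3)
```

Because every group contributes units in proportion to its size, no qualification group can be missed by chance, which is the source of the accuracy gain mentioned above.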

Example: typical sample

To determine the average age of men getting married in the region, a 5% sample was taken, with the selection of units proportional to the size of the typical groups. Mechanical selection was used within the groups.

With a probability of 0.954, determine the limits within which lie the average age of men getting married and the share of men getting married for the second time.

The average age at which men in the sample get married is

$$\tilde{x} = \frac{\sum \tilde{x}_i n_i}{\sum n_i}$$

The marginal sampling error is

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\overline{\sigma^2}}{n}\left(1 - \frac{n}{N}\right)}$$

With a probability of 0.954 it can be stated that the average age of men getting married will be within $\tilde{x} \pm \Delta_{\tilde{x}}$.

For the share of men entering into a second marriage, the sample share is determined as

$$w = \frac{\sum w_i n_i}{\sum n_i}$$

The sample variance of the alternative attribute is equal to

$$\overline{w(1 - w)} = \frac{\sum w_i (1 - w_i)\, n_i}{\sum n_i}$$

With a probability of 0.954 it can be stated that the share of men getting married for the second time is within $w \pm \Delta_w$.

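Serial sampling.

In serial sampling, the population is divided into groups of equal size, called series. Series are selected into the sample population, and all units within the selected series are observed.

With non-repeated selection, the maximum sampling error of the mean is determined by the formula

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\delta^2}{r}\left(1 - \frac{r}{R}\right)}$$

where $\delta^2$ is the inter-series variance,

$$\delta^2 = \frac{\sum (\tilde{x}_i - \tilde{x})^2}{r}$$

where

$\tilde{x}_i$ is the sample mean of a series;

$\tilde{x}$ is the sample mean of the serial sample;

$R$ is the number of series in the population;

$r$ is the number of selected series.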

Example: in a workshop of 10 teams, a 20% serial sample comprising 2 teams was taken in order to study labor productivity. From the results of the examination, determine with a probability of 0.997 the limits within which the average output of the workshop workers lies.

The sample mean of a serial sample with series of equal size is determined by the formula

$$\tilde{x} = \frac{\sum \tilde{x}_i}{r}$$

With a probability of 0.997 ($t = 3$) it can be stated that the average output of workshop workers is within $\tilde{x} \pm \Delta_{\tilde{x}}$.

In the workshop's finished product warehouse there are 200 boxes of parts, 40 pieces in each box. To check the quality of the finished products, a 10% serial sample was taken. The sample showed that the share of defective parts is 15%. The variance of the serial sample is 0.0049.

With a probability of 0.997, determine the limits within which the share of defective products in the batch of boxes lies.

A 10% sample of $R = 200$ boxes gives $r = 20$ boxes; at a probability of 0.997, $t = 3$. The maximum sampling error for the share is determined by the formula

$$\Delta_w = t\sqrt{\frac{\delta_w^2}{r}\left(1 - \frac{r}{R}\right)} = 3\sqrt{\frac{0.0049}{20}(1 - 0.1)} \approx 0.045$$

With a probability of 0.997 it can be stated that the share of defective parts in the batch is within $15\% \pm 4.5\%$, that is, $10.5\% \le P \le 19.5\%$.
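The serial-sampling error for the box example can be checked with a short sketch. It assumes the non-repeated form of the error with the factor $(1 - r/R)$; the function names and the two-element list of series means are hypothetical.

```python
import math

def inter_series_variance(series_means):
    """Variance of the series means around their overall mean (equal-size series)."""
    r = len(series_means)
    overall = sum(series_means) / r
    return sum((m - overall) ** 2 for m in series_means) / r

def serial_margin(delta2, r, R, t):
    """Maximum sampling error for non-repeated selection of r series out of R."""
    return t * math.sqrt(delta2 / r * (1 - r / R))

# Box example: R = 200 boxes, 10% sample -> r = 20, variance 0.0049, t = 3.
margin = serial_margin(0.0049, r=20, R=200, t=3)  # roughly 0.045

# Hypothetical series means, only to show how the inter-series variance is formed.
example_variance = inter_series_variance([4.0, 6.0])
```

Only the variation between series enters the formula, because every unit inside a selected series is observed and contributes no sampling error of its own.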

When designing a sample observation, it is necessary to find the sample size that ensures a given accuracy in calculating the general characteristics, the mean and the share.

The maximum sampling error, its probability and the variation of the characteristic are known in advance.

With random repeated selection, the sample size is determined by the formula

$$n = \frac{t^2 \sigma^2}{\Delta^2}$$

with random non-repeated and mechanical sampling, the sample size is

$$n = \frac{t^2 \sigma^2 N}{\Delta^2 N + t^2 \sigma^2}$$

for a typical sample

$$n = \frac{t^2\, \overline{\sigma^2}\, N}{\Delta^2 N + t^2\, \overline{\sigma^2}}$$

for serial sampling

$$r = \frac{t^2 \delta^2 R}{\Delta^2 R + t^2 \delta^2}$$

Example: there are 2,000 families living in an area.

It is planned to conduct a sample survey of them using random non-repeated selection to find the average family size.

Determine the required sample size, provided that with a probability of 0.954 the sampling error will not exceed 1 person, with a standard deviation of 3 people.

A city has 10 thousand families. Using mechanical sampling, it is proposed to determine the share of families with three or more children. What should the sample size be so that, with a probability of 0.954, the sampling error does not exceed 0.02, if it is known from previous surveys that the variance is 0.02?
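Both problems can be worked through with the sample size formula for non-repeated and mechanical selection, $n = t^2 \sigma^2 N / (\Delta^2 N + t^2 \sigma^2)$, taking $t = 2$ for a probability of 0.954. The function name `sample_size` is hypothetical, and rounding the result up to a whole unit is an assumption of this sketch.

```python
import math

def sample_size(t, variance, delta, N=None):
    """Required sample size; pass N for non-repeated or mechanical selection."""
    if N is None:
        n = t ** 2 * variance / delta ** 2  # repeated selection
    else:
        n = t ** 2 * variance * N / (delta ** 2 * N + t ** 2 * variance)
    return math.ceil(n)  # round up so the error limit is not exceeded

# 2,000 families, sigma = 3 people, error limit 1 person.
families_for_mean = sample_size(t=2, variance=3 ** 2, delta=1, N=2000)

# 10,000 families, variance 0.02, error limit 0.02.
families_for_share = sample_size(t=2, variance=0.02, delta=0.02, N=10_000)
```

Under these assumptions the first survey needs 36 families (72000 / 2036 ≈ 35.4) and the second needs 197 families (800 / 4.08 ≈ 196.1).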

Sample

A sample, or sample population, is a set of cases (subjects, objects, events, samples) selected from the general population by a certain procedure to take part in a study.

Sample characteristics:

  • Qualitative characteristics of the sample - who exactly we choose and what sampling methods we use for this.
  • Quantitative characteristics of the sample - how many cases we select, in other words, sample size.

Necessity of sampling

  • The object of study is very extensive. For example, the consumers of a global company's products are spread across a huge number of geographically dispersed markets.
  • There is a need to collect primary information.

Sample size

Sample size is the number of cases included in the sample population. For statistical reasons, it is recommended that the number of cases be at least 30-35.

Dependent and independent samples

When comparing two (or more) samples, an important parameter is whether they are dependent. If a homomorphic (one-to-one) pairing can be established between two samples, that is, each case in sample X corresponds to one and only one case in sample Y and vice versa, and the basis of this pairing matters for the trait being measured, the samples are called dependent. Examples of dependent samples:

  • pairs of twins,
  • two measurements of any trait before and after experimental exposure,
  • husbands and wives
  • and so on.

If there is no such relationship between the samples, they are considered independent.

Accordingly, dependent samples always have the same size, while the sizes of independent samples may differ.

Samples are compared using various statistical criteria.

Representativeness

One of the main components of a well-designed study is defining the sample and understanding what a representative sample is. Think of a cake: you do not have to eat the whole dessert to understand its taste. A small piece is enough.

So, the cake is the general population (that is, all respondents who are eligible for the survey). It can be defined geographically, for example only residents of the Moscow region; by gender, for example women only; or by age, for example Russians over 65 years old.

Calculating the general population exactly is difficult: you need data from a population census or from preliminary assessment surveys. Therefore the size of the general population is usually estimated, and the sample population, or sample, is calculated from the resulting number.

What is a representative sample?

A sample is a clearly defined number of respondents. Its structure should match the structure of the general population as closely as possible in terms of the main selection characteristics.

For example, if the potential respondents are the entire population of Russia, of whom 54% are women and 46% are men, then the sample should contain the same proportions. If the parameters coincide, the sample can be called representative. This means that inaccuracies and errors in the study are kept to a minimum.
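As a rough illustration, the 54% / 46% split can be turned into whole-number quotas for a sample. The sketch below assumes a hypothetical sample of 384 respondents and uses largest-remainder rounding so that the quotas add up to the sample size; the function name `allocate_quotas` and the rounding rule are assumptions made for this example, not a method taken from the article.

```python
def allocate_quotas(sample_size, shares):
    """Split sample_size into whole-number quotas that follow the population shares."""
    exact = {group: sample_size * share for group, share in shares.items()}
    quotas = {group: int(value) for group, value in exact.items()}  # round down first

    # Hand the remaining places to the groups with the largest fractional parts
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for group in by_remainder[:leftover]:
        quotas[group] += 1
    return quotas

# Population shares from the example above; the sample size is hypothetical
quotas = allocate_quotas(384, {"women": 0.54, "men": 0.46})
print(quotas)
```

With these inputs the exact quotas are 207.36 and 176.64, so the sample would contain 207 women and 177 men.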

The sample size is determined taking into account the requirements of accuracy and economy. These requirements pull in opposite directions: the larger the sample, the more accurate the result, but the higher the accuracy, the greater the cost of conducting the study. Conversely, the smaller the sample, the lower the cost, but the less accurately and the more randomly the properties of the general population are reproduced.
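The trade-off can be made concrete with a small sketch. It assumes the standard normal-approximation formula for the margin of error of a proportion, e = z * sqrt(p(1 - p) / n), with z = 1.96 for a 95% confidence level and p = 0.5 as the most cautious guess about the true share. The function name `margin_of_error` and the chosen sample sizes are illustrative assumptions.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a share p estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Each step multiplies the sample size (and, roughly, the cost) by four
for n in (96, 384, 1536):
    print(n, round(margin_of_error(n), 3))
```

Under this formula the error shrinks only with the square root of the sample size: to halve the error, the sample has to be made four times larger.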

Therefore, to calculate the sample size, sociologists developed a formula and created a special calculator.

Confidence probability and confidence error

What do the terms “confidence probability” and “confidence error” mean? Confidence probability is an indicator of the reliability of the measurement, and the confidence error is the possible error in the research results. For example, with a population of more than 500,000 people (say, the residents of Novokuznetsk), the sample will be 384 people at a confidence probability of 95% and an error of 5% (that is, a 95% confidence level with a margin of error of ±5%).
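The article does not give the formula itself. A commonly used one that reproduces the figure of 384 is Cochran's formula with a finite population correction; the sketch below assumes that formula, with z = 1.96 for a 95% confidence probability and p = 0.5 as the most cautious assumption about the share being measured. The function name `sample_size` and its parameters are hypothetical.

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Required sample size for a given population size and margin of error."""
    # Sample size for an infinitely large population
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction: small populations need fewer respondents
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(500_000))  # a city of about half a million people
print(sample_size(1_000))    # a much smaller population
```

For a population of 500,000 this gives 384 respondents, which matches the example above; for a population of 1,000 the correction reduces the requirement to 278.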

What follows from this? If 100 studies are conducted with such a sample (384 people), then in about 95 of them the answers obtained will, according to the laws of statistics, be within ±5% of the true value for the general population. In this way we obtain a representative sample with a minimal probability of statistical error.
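This interpretation can be checked with a small simulation. The sketch below assumes a very large population in which the true share of "yes" answers is 50%, repeats a survey of 384 random respondents many times and counts how often the sample share lands within ±5 percentage points of the true share. The function name `coverage`, the number of repetitions and the seed are arbitrary choices made for illustration.

```python
import random

def coverage(true_share=0.5, n=384, margin=0.05, trials=2000, seed=1):
    """Fraction of simulated surveys whose result lies within the margin of error."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        # One survey: n independent respondents, each says "yes" with probability true_share
        yes = sum(rng.random() < true_share for _ in range(n))
        if abs(yes / n - true_share) <= margin:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should come out close to 0.95, in line with the stated confidence probability.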
