Stratified sampling with proportional allocation
Description of the algorithms
We use the same algorithms as for simple random sampling and systematic sampling without replacement. The difference, of course, lies in how the sample is selected. We use the same principles to calculate totals and proportions, and finish with a statistical evaluation of the results.
Selection algorithm
The selection algorithm is as follows (example with samples of 220):
/* sort the data by stratum */
proc sort data=tPres2007 out=tPres2007;
by Strate;
run;
/* stratified selection of the samples */
proc surveyselect data=tPres2007 n=220 reps=50 seed=1213 out=sasPres2007
stats;
strata Strate / alloc=prop;
size exprime; /* balance by stratum size */
run;
/* sort the data by replicate and stratum */
proc sort data=sasPres2007 out=sasPres2007;
by Replicate Strate;
run;
We use the SAS procedure "SURVEYSELECT", stratifying with a "strata" statement on the variable Strate.
The "size" statement is used to balance the strata by population size. With the "alloc=prop" option, the procedure allocates sample units to the strata in proportion to stratum size.
We used the "reps" parameter to generate 50 replications, with 1213 as the seed for random number generation. This produces 50 * 220 sampled units, pooled in the dataset "sasPres2007".
The "stats" option adds the selection probabilities and sampling weights to the output dataset; we then use them to calculate the totals with the "SURVEYMEANS" procedure.
For this algorithm, we sort the data by the variable Strate before starting the procedure.
After selection, we sort again by replicate number and Strate.
The code for the samples of 500 is identical, except that "n" is 500.
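To make the idea of proportional allocation concrete, here is a minimal sketch in Python (not SAS). It assigns each stratum a share of the sample equal to n * N_h / N and rounds with a largest-remainder rule. The rounding rule, the function name and the stratum sizes are assumptions chosen for illustration; SURVEYSELECT may round differently, and the real stratum sizes of the election data are not reproduced here.

```python
from math import floor

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Exact (fractional) quota for each stratum: n * N_h / N.
    quotas = {h: n * size / total for h, size in stratum_sizes.items()}
    # Start from the integer part of each quota.
    alloc = {h: floor(q) for h, q in quotas.items()}
    # Give the units still missing to the largest fractional parts
    # (assumed rounding rule, not necessarily the one SAS applies).
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda h: quotas[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical stratum sizes, for illustration only.
sizes = {"A": 500, "B": 300, "C": 200}
print(proportional_allocation(sizes, 10))
```

Whatever the rounding rule, the allocated sizes must add up to the requested n, which is what the per-stratum counts for 220 and 500 do.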
Size of Strata
To find the size of the strata produced by the procedure, we use the following "freq" procedure:
/* count the units per stratum */
proc freq data=sasPres2007;
table Strate;
by replicate;
run;
For samples of 220 and 500, this algorithm gives the sample sizes per stratum shown in the following table.
Strate | Number of Samples for 220 | Number of Samples for 500 |
1 | 25 | 56 |
2 | 26 | 58 |
3 | 43 | 99 |
4 | 32 | 73 |
5 | 47 | 107 |
6 | 47 | 107 |
Calculation of total
The call to the "surveymeans" procedure is extended with a new statement, "strata", which defines the stratification variable.
/* stratified totals over the samples */
proc surveymeans data=sasPres2007 total=65466 SUM;
var ROYA2 SARK2 EXPRIME2;
by Replicate;
strata Strate;
weight Samplingweight;
ods output Statistics=estimations_sasdetail;
run;
Example output from the first sample (first replication):
Variable | Total | Standard deviation |
Royal | 16582766 | 160889 |
Sarkozy | 18474255 | 161341 |
Votes cast | 35057020 | 92186 |
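The weighted total that "surveymeans" reports can be illustrated with a small Python sketch (not SAS). It assumes simple random sampling within each stratum, so that each sampled unit carries the weight N_h / n_h, the inverse of its selection probability. The function name and the data are hypothetical and unrelated to the election figures.

```python
def stratified_total(sample, stratum_sizes):
    """Estimate a population total from a stratified sample.

    sample: list of (stratum, value) pairs
    stratum_sizes: dict mapping stratum -> population size N_h
    """
    # Count the sampled units per stratum (n_h).
    n_h = {}
    for stratum, _ in sample:
        n_h[stratum] = n_h.get(stratum, 0) + 1
    # Each unit represents N_h / n_h population units.
    total = 0.0
    for stratum, value in sample:
        weight = stratum_sizes[stratum] / n_h[stratum]
        total += weight * value
    return total

# Hypothetical data: two strata of 100 and 50 units.
sample = [("A", 10), ("A", 14), ("B", 30)]
sizes = {"A": 100, "B": 50}
print(stratified_total(sample, sizes))
```

In the SAS code, this role is played by the Samplingweight variable passed to the "weight" statement.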
Results of candidates
With samples of 220
Comment and detailed results of Ségolène Royal
Mean = 0.46888, standard deviation = 0.00589, 95% confidence interval = [0.46721, 0.47055]
For a sample of the same size, compared with systematic sampling and simple random sampling without replacement, we see better results, with a design effect of 0.59. For the same number of polling stations, we obtain much better accuracy here. The effect of stratification is clear and effective.
Comment and detailed results of Nicolas Sarkozy
Mean = 0.53112, standard deviation = 0.00589, 95% confidence interval = [0.52945, 0.53279]
The confidence interval is narrow and more precise, and brings us close to the true value for candidate Nicolas Sarkozy.
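The post does not say how the confidence intervals were computed. The reported bounds are, however, consistent with a Student t interval on the mean of the 50 replicate estimates, mean ± t * sd / sqrt(50). The Python sketch below checks this under that assumption. The t quantile is taken from standard tables and the function name is hypothetical.

```python
from math import sqrt

# Assumption: 0.975 quantile of Student's t with 49 degrees of freedom
# (50 replicates minus 1), taken from standard tables.
T_975_DF49 = 2.0096

def replicate_ci(mean, sd, reps=50, t=T_975_DF49):
    """95% interval for the mean of `reps` replicate estimates."""
    half_width = t * sd / sqrt(reps)
    return round(mean - half_width, 5), round(mean + half_width, 5)

print(replicate_ci(0.46888, 0.00589))  # Ségolène Royal, samples of 220
print(replicate_ci(0.53112, 0.00589))  # Nicolas Sarkozy, samples of 220
```

Under this reading, the interval describes the precision of the average over the 50 replications, not the precision of a single sample of 220.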
With samples of 500
Comment and detailed results of Ségolène Royal
Mean = 0.47020, standard deviation = 0.00346, 95% confidence interval = [0.46922, 0.47118]
These are the most accurate results of this first study. None of the other survey methods used gave a result this good.
Here, the design effect is 0.65. By combining a large number of polling stations with a good stratification, we get a good result.
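The post does not say how the design effect was calculated. Under the usual definition (the variance of the estimator under the actual design divided by its variance under simple random sampling of the same size), it can be sketched in Python as follows. The function names are hypothetical, and the implied standard deviation is only a back-calculation under that assumed definition, not a figure from the study.

```python
from math import sqrt

def design_effect(var_design, var_srs):
    """Ratio of the variance under the design to the variance under SRS."""
    return var_design / var_srs

def implied_srs_sd(sd_design, deff):
    """Standard deviation that SRS would need for the given design effect."""
    return sd_design / sqrt(deff)

# Back-calculation from the reported values (assumed definition of the effect).
print(implied_srs_sd(0.00346, 0.65))
```

A design effect below 1 means the stratified design is more precise than simple random sampling with the same number of polling stations.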
Comment and detailed results of Nicolas Sarkozy
Mean = 0.52980, standard deviation = 0.00346, 95% confidence interval = [0.52882, 0.53078]
This extremely accurate result highlights the quality of the methodology used and of the associated algorithms.