Tuesday, March 11, 2014

Stratified sampling with proportional allocation


Description of the algorithms


We use the same algorithms as for simple random sampling and systematic sampling without replacement. The difference, of course, is how the sample is selected. We use the same principles for calculating totals and proportions, and finish with a statistical evaluation of the results.

Selection algorithm

The selection algorithm is as follows (example with a sample size of 220):

/* sort the data by stratum */
proc sort data=tPres2007 out=tPres2007;
    by Strate;
run;
/* stratified sampling with proportional allocation */
proc surveyselect data=tPres2007 n=220 reps=50 seed=1213 out=sasPres2007 stats;
    strata Strate / alloc=prop;
    size exprime; /* balance by stratum size */
run;
/* sort the data by replicate and stratum */
proc sort data=sasPres2007 out=sasPres2007;
    by Replicate Strate;
run;

We use the SAS procedure SURVEYSELECT, requesting stratification on the variable Strate with the "strata" statement.
The "size" statement is used to balance the strata by population size. With the "alloc=prop" option, the procedure allocates the sample units to the strata in proportion to stratum size.

We used the "reps" parameter to generate 50 replicates, and 1213 as the seed for random number generation. This produces 50 × 220 sampled units, pooled in the dataset sasPres2007.
The "stats" option adds the selection probabilities and sampling weights to the output dataset; we then use these weights to calculate totals with the SURVEYMEANS procedure.
For this algorithm, we sort the data by Strate before running the procedure.
After selection, we sort again by replicate number and Strate.
The code for the samples of 500 is identical, except that "n" is 500.
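To make the proportional allocation rule concrete, here is a minimal Python sketch, outside the SAS workflow, that gives each stratum a share of the n sample units proportional to its size. The stratum sizes and the function name proportional_allocation are hypothetical, and the largest-remainder rounding is an assumption made for illustration; SURVEYSELECT may round differently.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units to strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations sum exactly to n.
    """
    total = sum(stratum_sizes.values())
    # Exact (fractional) share of the sample for each stratum.
    exact = {h: n * size / total for h, size in stratum_sizes.items()}
    # Start from the integer part of each share.
    alloc = {h: int(x) for h, x in exact.items()}
    # Hand the leftover units to the strata with the largest remainders.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical stratum sizes (NOT the real tPres2007 counts).
sizes = {1: 1000, 2: 3000, 3: 6000}
allocation = proportional_allocation(sizes, 220)
print(allocation)
```

A stratum holding 10% of the population receives roughly 10% of the sample, which is the behaviour the "alloc=prop" option asks for.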

Size of Strata

To find the stratum sample sizes produced by the procedure, we use the following FREQ procedure:
/* count the units in each stratum */
proc freq data=sasPres2007;
    table Strate;
    by Replicate;
run;
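As a rough illustration of what this frequency count produces, the Python sketch below tallies sampled units per stratum within each replicate. The rows are hypothetical stand-ins for records of sasPres2007, not real data.

```python
from collections import Counter

# Hypothetical (replicate, stratum) pairs standing in for rows of sasPres2007.
rows = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (2, 2)]

# Count sampled units per stratum within each replicate,
# analogous to a one-way frequency table computed by replicate.
counts = Counter(rows)
for (replicate, stratum), n_units in sorted(counts.items()):
    print(f"replicate={replicate} stratum={stratum} count={n_units}")
```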

The sample sizes per stratum given by this algorithm, for samples of 220 and 500, are shown in the following table.

Strate   Number of samples for 220   Number of samples for 500
1        25                          56
2        26                          58
3        43                          99
4        32                          73
5        47                          107
6        47                          107
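A quick consistency check on the table, sketched in Python: each column should add up to the requested sample size, and under proportional allocation each stratum's share of the sample should be nearly the same for both sizes. The per-stratum values are read from the table; the 0.005 tolerance mentioned in the note is only an illustrative choice.

```python
# Per-stratum sample sizes read from the table above.
alloc_220 = {1: 25, 2: 26, 3: 43, 4: 32, 5: 47, 6: 47}
alloc_500 = {1: 56, 2: 58, 3: 99, 4: 73, 5: 107, 6: 107}

# Each column should add up to the requested sample size.
total_220 = sum(alloc_220.values())
total_500 = sum(alloc_500.values())
print(total_220, total_500)

# Under proportional allocation, each stratum's share of the sample
# should be nearly the same for both sample sizes (up to rounding).
for stratum in alloc_220:
    share_220 = alloc_220[stratum] / total_220
    share_500 = alloc_500[stratum] / total_500
    print(stratum, round(share_220, 3), round(share_500, 3))
```

The shares differ by less than 0.005 between the two columns, which is what rounding to whole units would be expected to produce.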

Calculation of total

The SURVEYMEANS procedure call is extended with a new "strata" statement, which defines the stratification variable.

/* stratified totals over the samples */
proc surveymeans data=sasPres2007 total=65466 sum;
    var ROYA2 SARK2 EXPRIME2;
    by Replicate;
    strata Strate;
    weight SamplingWeight;
    ods output Statistics=estimations_sasdetail;
run;
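The idea behind the weighted total can be sketched in a few lines of Python: each sampled unit contributes its value multiplied by its sampling weight. The weights and vote counts below are hypothetical, and the sketch covers only the point estimate, not the variance estimation that SURVEYMEANS also performs.

```python
# Hypothetical sampled units: (sampling weight, votes for one candidate).
# The weight is the inverse of the unit's selection probability.
sample = [(300.0, 410), (300.0, 385), (295.5, 520)]

# Weighted estimate of the population total: sum of weight * value.
estimated_total = sum(weight * votes for weight, votes in sample)
print(estimated_total)
```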

Example output from the first sample (first replicate):

Variable     Total       Standard deviation
Royal        16582766    160889
Sarkozy      18474255    161341
Votes cast   35057020    92186
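As a worked example of going from totals to proportions, the Python sketch below divides each candidate's estimated total by the estimated total of votes cast for the first replicate. It assumes the proportion is taken as the ratio of the two estimated totals, which the text implies but does not spell out.

```python
# Estimated totals from the first replicate (values from the output above).
total_royal = 16582766
total_sarkozy = 18474255
total_votes_cast = 35057020

# Estimated share of each candidate among votes cast.
share_royal = total_royal / total_votes_cast
share_sarkozy = total_sarkozy / total_votes_cast
print(round(share_royal, 4), round(share_sarkozy, 4))
```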


Results of candidates 

Samples of size 220

Comments and detailed results for Ségolène Royal



Mean = 0.46888, standard deviation = 0.00589, 95% confidence interval = [0.46721, 0.47055].
For a similar sample size, compared with systematic sampling and simple random sampling without replacement, we see better results, with a design effect of 0.59. With the same number of polling stations, accuracy is much better. The effect of stratification is clear.

Comments and detailed results for Nicolas Sarkozy



Mean = 0.53112, standard deviation = 0.00589, 95% confidence interval = [0.52945, 0.53279].
The confidence interval is narrow and more accurate, and brings us close to the true value for candidate Nicolas Sarkozy.
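The reported intervals appear consistent with a Student t interval on the mean of the 50 replicate estimates, that is, mean ± t × sd / √50. This is an inference from the numbers, not something the text states. The Python sketch below reproduces the arithmetic under that assumption; the function name replicate_ci is hypothetical, and the t quantile for 49 degrees of freedom is hard-coded.

```python
import math


def replicate_ci(mean, sd, reps=50, t_quantile=2.009575):
    """Two-sided 95% interval for the mean of `reps` replicate estimates.

    t_quantile is the 0.975 quantile of Student's t with reps - 1 = 49
    degrees of freedom, hard-coded here to avoid a SciPy dependency.
    """
    half_width = t_quantile * sd / math.sqrt(reps)
    return mean - half_width, mean + half_width


royal_low, royal_high = replicate_ci(0.46888, 0.00589)
sarkozy_low, sarkozy_high = replicate_ci(0.53112, 0.00589)
print(round(royal_low, 5), round(royal_high, 5))
print(round(sarkozy_low, 5), round(sarkozy_high, 5))
```

Rounded to five decimals, the bounds agree with the intervals quoted for both candidates.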

Samples of size 500


Comments and detailed results for Ségolène Royal




Mean = 0.47020, standard deviation = 0.00346, 95% confidence interval = [0.46922, 0.47118].
These are the most accurate results of this first study: none of the other survey methods used gave a result this good.
Here the design effect is 0.65. By combining a large number of polling stations with good stratification, we get a good result.

Comments and detailed results for Nicolas Sarkozy


Mean = 0.52980, standard deviation = 0.00346, 95% confidence interval = [0.52882, 0.53078].
This extremely accurate result highlights the quality of the methodology and the associated algorithms.


