Monday, March 10, 2014

Simple random sampling without replacement

Introduction

The first part describes the algorithms implemented: the selection of the sample, the estimation of totals and proportions, and finally the statistical calculations.

The second part presents the results obtained for each candidate.
Since each candidate's share (and then its mean over the replications) is derived from estimated totals, all our calculations are carried out on totals.

Algorithms

The method is explained using samples of 220 polling stations. We begin with the selection of the sample,
then estimate the totals from which our proportions are derived, and finally calculate
the mean, standard deviation and histogram over our 50 replications.

Selection of the sample

We use the SAS procedure SURVEYSELECT with the SRS method. This method relies on Floyd's ordered hash table algorithm. The
selection is made with equal probability (n/N) and without replacement.
Here is the code for samples of 220:

/* SAS sampling: 50 samples of 220 */
proc surveyselect data=tPres2007 method=SRS n=220 reps=50 seed=1213
                  out=sasPres2007n220 stats;
run;

We used the REPS= option to generate 50 replications and set the seed of the random number generator to 1213.
This produces 50 × 220 selected units, pooled in the data set "sasPres2007n220".
Each replication is identified by a column named "Replicate". The STATS option adds the selection probabilities and sampling weights, which
the SURVEYMEANS procedure then uses to calculate the totals. The code for samples of 500 is identical except that n is 500.
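To make the selection step concrete, here is a minimal Python sketch of Floyd's classic algorithm for drawing n distinct units out of N with equal probability. It is an illustration only: it uses Python's `random` module instead of the SAS generator (so seed 1213 does not reproduce the SAS samples) and does not reproduce SURVEYSELECT's own implementation. The helper names `floyd_sample` and `replicate_samples` are hypothetical. Each selected unit carries the sampling weight N/n, the inverse of its selection probability.

```python
import random


def floyd_sample(N, n, rng):
    """Draw n distinct indices from range(N); every n-subset is equally likely (Floyd's algorithm)."""
    selected = set()
    for j in range(N - n, N):
        t = rng.randrange(j + 1)  # uniform integer in 0..j
        # If t was already drawn, take j instead; j itself cannot be in the set yet.
        selected.add(j if t in selected else t)
    return sorted(selected)


def replicate_samples(N, n, reps, seed):
    """Hypothetical helper mirroring REPS=: `reps` independent SRS samples of size n."""
    rng = random.Random(seed)  # Python's Mersenne Twister, not the SAS generator
    return [floyd_sample(N, n, rng) for _ in range(reps)]


N, n = 65466, 220          # population size (used as TOTAL= later) and sample size
weight = N / n             # sampling weight = 1 / selection probability
samples = replicate_samples(N, n, reps=50, seed=1213)
print(len(samples), len(samples[0]), round(weight, 2))  # prints: 50 220 297.57
```

Floyd's algorithm needs only n random draws and a set of size n, which is why it suits drawing a small sample from a large file.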

Calculation of totals

To calculate the totals, we use the SURVEYMEANS procedure:

/* SAS totals: samples of 220 */
proc surveymeans data=sasPres2007n220 total=65466 SUM;
  var ROYA2 SARK2 EXPRIME2;
  by Replicate;
  weight Samplingweight;
  ods output Statistics=estimations_sasdetail;
run;

To calculate the totals, the WEIGHT statement uses the sampling weights computed by SURVEYSELECT, and the SUM keyword requests the estimated totals.
The TOTAL= option gives the total number of rows in the full data set (the population size N), which the procedure uses for the finite population correction in its variance calculation.
The BY statement computes the estimates separately for each replication, using the variable "Replicate" created by SURVEYSELECT.
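For a simple random sample, weighting each selected unit by N/n gives the expansion estimator of a total, and with the finite population correction its estimated variance reduces to N²(1 - n/N)s²/n, where s² is the sample variance. The Python sketch below computes both quantities for one replication; the function name `srs_total` and the toy data are hypothetical, and it illustrates the formula rather than every option of SURVEYMEANS.

```python
import math
from statistics import variance


def srs_total(y, N):
    """Expansion estimate of a population total from an SRS without replacement.

    Returns (estimated total, estimated standard deviation of the total),
    using the weight N/n and the finite population correction (1 - n/N).
    """
    n = len(y)
    weight = N / n                      # same weight for every sampled unit under SRS
    total = weight * sum(y)
    fpc = 1 - n / N                     # finite population correction
    var_total = N**2 * fpc * variance(y) / n
    return total, math.sqrt(var_total)


# Toy example: 4 units sampled from a population of 8
total, sd = srs_total([1, 2, 3, 4], N=8)
print(total, round(sd, 4))  # prints: 20.0 3.6515
```

With 220 units drawn out of 65466, the correction factor is about 0.997, so it barely changes the standard deviation here.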

Example output from the first sample (first replication):

Variable     Total       Standard deviation
Royal        16573313    777767
Sarkozy      18283464    807676
Expressed    34856777    1486216

This estimate is close to the 35158378 valid votes actually cast in the election.

Calculating proportions

This simple calculation requires chaining several procedure calls and DATA steps to put the data in order.

/* transpose the statistics */
proc transpose data=estimations_sasdetail out=FinalResultSasN220
               label=varname;
  by replicate;
run;

/* put the results in separate columns */
data FinalResultSasN220;
  set FinalResultSasN220;
  if _NAME_='Sum' then roya=col1;
  if _NAME_='Sum' then sark=col2;
  if _NAME_='Sum' then expr=col3;
  if _NAME_='StdDev' then errorRoya=col1;
  if _NAME_='StdDev' then errorSark=col2;
run;

/* sum the columns within each replicate to get one row per replicate */
proc summary data=FinalResultSasN220 SUM;
  BY replicate;
  VAR roya sark expr errorRoya errorSark;
  output out=FinalResultSasN220 sum(roya)=roya sum(sark)=sark sum(expr)=expr
         sum(errorRoya)=errorRoya sum(errorSark)=errorSark;
run;

/* compute the proportions and relative errors */
data FinalResultSasN220;
  set FinalResultSasN220;
  pctRoya=roya/expr;
  pctSark=sark/expr;
  errorRoya=errorRoya/roya;
  errorSark=errorSark/sark;
run;

/* keep only the results */
data FinalResultSasN220;
  set FinalResultSasN220(keep=replicate pctRoya pctSark errorRoya
                         errorSark);
run;

/* display the results */
proc print data=FinalResultSasN220 double;
  var pctRoya pctSark errorRoya errorSark;
  title 'Royal and Sarkozy shares, second round';
run;

The data go through several transformations. First, PROC TRANSPOSE turns the statistics into rows (_NAME_ is Sum or StdDev), with one column per variable (COL1 to COL3).
The DATA step then moves each result into its own named column. Each row still holds only part of the results, so the SUMMARY procedure collapses them into one row per replication. Since each column has a single non-missing value per replication,
summing does not really add anything up; it simply merges the rows. We then calculate the proportion and the relative error for each candidate.
A final filtering step keeps only the variables needed for display and for later calculations.
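The reshaping above (one row per variable and replication turned into one row per replication, followed by ratios) can be sketched in Python with a dictionary keyed by replication. This is an illustrative equivalent, not a translation of PROC TRANSPOSE or PROC SUMMARY; the function name and the input tuple layout are assumptions, while the variable names ROYA2, SARK2 and EXPRIME2 and the figures come from the first replication shown earlier. As in the SAS code, each candidate's "error" is the standard deviation of the total divided by the total.

```python
from collections import defaultdict


def proportions_by_replicate(rows):
    """rows: (replicate, variable, total, std_dev) tuples, one per variable per replication.

    Returns {replicate: {"pctRoya", "pctSark", "errorRoya", "errorSark"}}.
    """
    wide = defaultdict(dict)
    for rep, var, total, sd in rows:       # long format -> one dict per replication
        wide[rep][var] = (total, sd)

    results = {}
    for rep in sorted(wide):
        roya, sd_roya = wide[rep]["ROYA2"]
        sark, sd_sark = wide[rep]["SARK2"]
        expr, _ = wide[rep]["EXPRIME2"]
        results[rep] = {
            "pctRoya": roya / expr,          # share of valid votes
            "pctSark": sark / expr,
            "errorRoya": sd_roya / roya,     # relative error of the estimated total
            "errorSark": sd_sark / sark,
        }
    return results


first_replication = [
    (1, "ROYA2", 16573313, 777767),
    (1, "SARK2", 18283464, 807676),
    (1, "EXPRIME2", 34856777, 1486216),
]
res = proportions_by_replicate(first_replication)[1]
print(round(res["pctRoya"], 5), round(res["pctSark"], 5))  # prints: 0.47547 0.52453
```

These two values match the first replication in the table of proportions.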

Proportions obtained over the 50 replications:

Obs  Royal    Sarkozy    Obs  Royal    Sarkozy    Obs  Royal    Sarkozy
  1  0.47547  0.52453     18  0.46046  0.53954     35  0.46325  0.53675
  2  0.47870  0.52130     19  0.48539  0.51461     36  0.47052  0.52948
  3  0.49331  0.50669     20  0.49280  0.50720     37  0.47407  0.52593
  4  0.45881  0.54119     21  0.47781  0.52219     38  0.46603  0.53397
  5  0.47755  0.52245     22  0.45219  0.54781     39  0.47959  0.52041
  6  0.47066  0.52934     23  0.44684  0.55316     40  0.47485  0.52515
  7  0.46234  0.53766     24  0.48239  0.51761     41  0.46467  0.53533
  8  0.46670  0.53330     25  0.48068  0.51932     42  0.47303  0.52697
  9  0.45911  0.54089     26  0.46711  0.53289     43  0.46183  0.53817
 10  0.47545  0.52455     27  0.47091  0.52909     44  0.45905  0.54095
 11  0.45046  0.54954     28  0.46730  0.53270     45  0.47578  0.52422
 12  0.47080  0.52920     29  0.48126  0.51874     46  0.48050  0.51950
 13  0.47902  0.52098     30  0.46158  0.53842     47  0.46588  0.53412
 14  0.47572  0.52428     31  0.46829  0.53171     48  0.48308  0.51692
 15  0.46341  0.53659     32  0.47071  0.52929     49  0.47887  0.52113
 16  0.46310  0.53690     33  0.47182  0.52818     50  0.46013  0.53987
 17  0.47338  0.52662     34  0.46566  0.53434

Calculating the mean, standard deviation, confidence interval and histogram of the results

For this we use the SAS procedure UNIVARIATE:

/* compute the mean, standard deviation, confidence interval and
   histogram */
proc univariate data=FinalResultSasN220 CIBASIC;
  var pctRoya pctSark;
  histogram pctRoya pctSark;
run;

The results for samples of 220 and 500 are discussed below; both were
generated with the procedure described above.
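With CIBASIC, the 95% confidence interval for the mean of r replications is x̄ ± t(0.975, r - 1)·s/√r, where s is the sample standard deviation. The Python sketch below applies that formula to the 50 Royal proportions listed in the table of replications above (samples of 220). Python's standard library has no Student t quantile, so t(0.975, 49) ≈ 2.0096 is hard-coded as an assumption; with SciPy available, `scipy.stats.t.ppf(0.975, 49)` would give it instead. The name `mean_ci` is hypothetical.

```python
import math
from statistics import mean, stdev

# Royal's share in each of the 50 replications of 220 polling stations (from the table)
ROYAL_220 = [
    0.47547, 0.47870, 0.49331, 0.45881, 0.47755, 0.47066, 0.46234, 0.46670, 0.45911, 0.47545,
    0.45046, 0.47080, 0.47902, 0.47572, 0.46341, 0.46310, 0.47338, 0.46046, 0.48539, 0.49280,
    0.47781, 0.45219, 0.44684, 0.48239, 0.48068, 0.46711, 0.47091, 0.46730, 0.48126, 0.46158,
    0.46829, 0.47071, 0.47182, 0.46566, 0.46325, 0.47052, 0.47407, 0.46603, 0.47959, 0.47485,
    0.46467, 0.47303, 0.46183, 0.45905, 0.47578, 0.48050, 0.46588, 0.48308, 0.47887, 0.46013,
]

T_975_49 = 2.0096  # Student t quantile at 0.975 with 49 degrees of freedom (hard-coded)


def mean_ci(values, t_quantile):
    """Mean, standard deviation and t-based confidence interval for the mean."""
    r = len(values)
    m = mean(values)
    s = stdev(values)                     # sample standard deviation (divisor r - 1)
    half_width = t_quantile * s / math.sqrt(r)
    return m, s, (m - half_width, m + half_width)


m, s, (lo, hi) = mean_ci(ROYAL_220, T_975_49)
print(round(m, 5), round(s, 5), round(lo, 5), round(hi, 5))
```

Computed from the rounded table values, the result agrees with the SAS output reported for Ségolène Royal (mean 0.47057, standard deviation 0.00993, interval [0.46775, 0.47339]) to within rounding.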

Results for each candidate

With samples of 220 polling stations

Ségolène Royal: detailed results and comments




Mean = 0.47057, standard deviation = 0.00993,
95% confidence interval = [0.46775, 0.47339]
Ségolène Royal clearly scores lower than Nicolas Sarkozy. The distribution of
the results is concentrated between 0.46 and 0.48. The standard deviation is close to one percentage point, which is already fairly precise.
The confidence interval pins down the mean more precisely: at the 95% level it lies
between 0.46775 and 0.47339.

Nicolas Sarkozy: detailed results and comments



Mean = 0.52943, standard deviation = 0.00993, 95% confidence interval = [0.52661, 0.53225]
Nicolas Sarkozy scores well above Ségolène Royal. The distribution of the results is concentrated between 0.52 and 0.54. The standard deviation is close to one percentage point.
The confidence interval places the mean between 0.52661 and
0.53225 at the 95% level.

With samples of 500 polling stations

Ségolène Royal: detailed results and comments


Mean = 0.47019, standard deviation = 0.00527
95% confidence interval = [0.46869, 0.47169]
Ségolène Royal scores lower than Nicolas Sarkozy. The distribution of the results is concentrated between 0.4625 and 0.4775. The standard deviation is close to half a percentage point,
which is more precise than with samples of 220 polling stations. The confidence interval places the mean between 0.46869 and
0.47169 at the 95% level.

Nicolas Sarkozy: detailed results and comments

Mean = 0.52981, standard deviation = 0.00527, 95% confidence interval = [0.52831, 0.53131]
Nicolas Sarkozy scores higher than Ségolène Royal. The distribution of the results is concentrated. The standard deviation is close to half a percentage point.


