Statistics Study: March 2014

Tuesday, March 11, 2014

Election night simulation

Introduction

We build a regression estimator for the second-round scores of Ségolène Royal and Nicolas Sarkozy; the regression is fitted on first-round data. A single estimate
is made from a sample of n = 220 polling stations, drawn by stratified sampling with proportional allocation on the variable Strate (the stratification is not taken into account in the regression, so as not to complicate the
calculation of the estimator).

Summary of the study on election night.

We apply the regression to estimated totals, then convert
the adjusted totals into second-round proportions.
As in the previous study, I start with a first summary of the results
that are detailed later.
Here is the final table:

It shows a first estimate made from a stratified sample with proportional
allocation. The adjustment then corrects these results,
and we compare them with the actual outcome of the election. The results are rather accurate, which highlights the effectiveness of the adjustment
in improving the final estimate.
In the following sections we go through all the steps that led
to this result, from sampling, through regression, to the final adjustment. The algorithms are described one by one.

Description of the algorithms

Selection of polling stations by stratification

The following program stays close to the algorithm used previously, but without the
replication.

/* sort the data by stratum */
proc sort data=tPres2007 out=tPres2007;
  by Strate;
run;

/* stratified selection of the sample */
proc surveyselect data=tPres2007 n=220 seed=92217 out=sasPres2007 stats;
  strata Strate / alloc=prop;
  size exprime; /* balance the strata sizes */
run;

/* sort the selected sample by stratum */
proc sort data=sasPres2007 out=sasPres2007;
  by Strate;
run;

The parameters are the same as for stratified sampling with proportional allocation.
To check for measurable bias and sampling error, we
analyze the result of the draw by municipality, department, region, and slice of urban area.

Distribution by town

Here are the top 20 municipalities:

The municipality most represented in the draw is Paris. The selection
also contains many municipalities in which a single polling station was selected. In practice, it seems quite difficult on election day to cover
polling stations scattered as isolated points across the territory.

Distribution survey by department

Here are the top 30 departments:

The departments of the Paris region are strongly represented, with 92, 93 and 75.
The Nord department was drawn most heavily, followed by Moselle. About twenty municipalities were therefore probably selected in the Nord. Our
draw seems quite unbalanced at the department level because of these extremes. This
can be the source of a significant bias.

Distribution by Region

The Île-de-France region is the most represented, followed by Nord-Pas-de-Calais and Rhône-Alpes. This second position is certainly due to the extreme selection
in the Nord department seen previously.

Distribution Slice of Urban Area

The table here shows the true proportions of the French population by slice of
urban area. Comparing them with the proportions in our sample, we see
that the sample reproduces this distribution fairly well, which reduces part of the selection bias of the sample.

Estimate before adjustment

We estimate the totals as before, removing the replication parameter and adding the variables to be used for the adjustment.

/* estimated totals on the stratified sample */
proc surveymeans data=sasPres2007 total=65466 SUM;
  var ROYA2 SARK2 EXPRIME2
      /* first-round data for the regression */
      BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN abstention;
  strata Strate;
  weight Samplingweight;
  ods output Statistics=estimations_sasdetail;
run;

The added variables are the first-round results of the candidates.

Estimation results

Reorganizing the results after estimation

We take advantage of this step to rename the adjustment variables by prefixing their names with "s".
/* transpose */
proc transpose data=estimations_sasdetail out=FinalResultSas
  label=varname;
run;

/* put the results in separate columns */
data FinalResultSas;
  set FinalResultSas;
  if _NAME_='Sum' then do;
    roya=col1; sark=col2; expr=col3;
    sBAYR=col4; sBESA=col5; sBOVE=col6; sBUFF=col7;
    sLAGU=col8; sLEPE=col9; sNIHO=col10; sROYA=col11;
    sSARK=col12; sSCHI=col13; sVILL=col14; sVOYN=col15;
    sabstention=col16;
  end;
  if _NAME_='StdDev' then do;
    errorRoya=col1; errorSark=col2;
  end;
run;

/* sum the columns so that the rows collapse into a single one */
proc summary data=FinalResultSas SUM;
  var roya sark expr errorRoya errorSark
      sBAYR sBESA sBOVE sBUFF sLAGU sLEPE
      sNIHO sROYA sSARK sSCHI sVILL sVOYN sabstention;
  output out=FinalResultSas
    sum(roya)=roya sum(sark)=sark sum(expr)=expr
    sum(errorRoya)=errorRoya sum(errorSark)=errorSark
    sum(sBAYR)=sBAYR sum(sBESA)=sBESA sum(sBOVE)=sBOVE
    sum(sBUFF)=sBUFF sum(sLAGU)=sLAGU sum(sLEPE)=sLEPE
    sum(sNIHO)=sNIHO sum(sROYA)=sROYA sum(sSARK)=sSARK
    sum(sSCHI)=sSCHI sum(sVILL)=sVILL
    sum(sVOYN)=sVOYN sum(sabstention)=sabstention;
run;

Calculation of real total

The "summary" procedure is used to sum the results of the first round
by candidate.

/* compute the totals of the first-round data */
proc summary data=tPres2007;
  var BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN
      abstention;
  output out=ProcSumOutTotal
    (rename=(BAYR=tBAYR BESA=tBESA BOVE=tBOVE BUFF=tBUFF LAGU=tLAGU
             LEPE=tLEPE NIHO=tNIHO ROYA=tROYA SARK=tSARK SCHI=tSCHI VILL=tVILL
             VOYN=tVOYN abstention=tabstention))
    sum=;
run;

The results are:

Regression on data from the first round

We take the opportunity to rename the regression results by prefixing the
regression coefficients: "es" for the coefficients of Sarkozy, "er" for the coefficients of Royal, and "e" for the coefficients of the votes cast in the second round.

/* regression for Sarkozy */
proc reg data=sasPres2007 outest=estimSark
    (rename=(_TYPE_=T1 BAYR=esBAYR BESA=esBESA BOVE=esBOVE BUFF=esBUFF
             LAGU=esLAGU LEPE=esLEPE NIHO=esNIHO ROYA=esROYA SARK=esSARK
             SCHI=esSCHI VILL=esVILL VOYN=esVOYN abstention=esabstention));
  model SARK2=BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN
        abstention;
  output out=sortieSark;
run;

/* regression for Royal */
proc reg data=sasPres2007 outest=estimRoya
    (rename=(_TYPE_=T2 BAYR=erBAYR BESA=erBESA BOVE=erBOVE BUFF=erBUFF
             LAGU=erLAGU LEPE=erLEPE NIHO=erNIHO ROYA=erROYA SARK=erSARK
             SCHI=erSCHI VILL=erVILL VOYN=erVOYN abstention=erabstention));
  model ROYA2=BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN
        abstention;
  output out=sortieRoya;
run;

/* regression for the votes cast */
proc reg data=sasPres2007 outest=estimExprime
    (rename=(_TYPE_=T3 BAYR=eBAYR BESA=eBESA BOVE=eBOVE BUFF=eBUFF
             LAGU=eLAGU LEPE=eLEPE NIHO=eNIHO ROYA=eROYA SARK=eSARK
             SCHI=eSCHI VILL=eVILL VOYN=eVOYN abstention=eabstention));
  model EXPRIME2=BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN
        abstention;
  output out=sortieExprime;
run;

The resulting coefficients are:

This regression reveals three political tendencies among voters. The first covers people who voted BAYR, BOVE, NIHO and VOYN. These
voters had obvious difficulty choosing between the two second-round candidates, or voted less in the second round.

The second tendency is that of people voting for candidate Sarkozy. It
covers all shades of the right, with the candidates LEPE, SARK and VILL.

The third tendency is that of candidate Royal, who ends up gathering the voters of more candidates: BESA, BUFF, LAGU, ROYA and SCHI.

Abstention may benefit both candidates.

Consolidation of Results

We keep only the data needed for the final calculation.

data FinalResultSas (keep=Roya Sark expr
    sBAYR sBESA sBOVE sBUFF sLAGU sLEPE sNIHO sROYA sSARK sSCHI sVILL sVOYN
    sabstention sExprime sVotants
    tBAYR tBESA tBOVE tBUFF tLAGU tLEPE tNIHO tROYA tSARK tSCHI tVILL tVOYN
    tabstention texprime tvotants
    eBAYR eBESA eBOVE eBUFF eLAGU eLEPE eNIHO eROYA eSARK eSCHI eVILL eVOYN
    eabstention
    erBAYR erBESA erBOVE erBUFF erLAGU erLEPE erNIHO erROYA erSARK erSCHI
    erVILL erVOYN erabstention
    esBAYR esBESA esBOVE esBUFF esLAGU esLEPE esNIHO esROYA esSARK esSCHI
    esVILL esVOYN esabstention);
  set FinalResultSas;
  merge ProcSumOutTotal estimExprime estimRoya estimSark;
run;

Adjustments and estimates of total proportions

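The regression estimate is calculated as follows:

$$\hat{Y}_{reg} = \hat{Y} + \sum_{j} \hat{\beta}_j \, (T_j - \hat{T}_j)$$

where $\hat{Y}$ is the sample estimate of a second-round total (Royal, Sarkozy or votes cast), $T_j$ is the actual total of each first-round variable (prefixed by "t"),
$\hat{T}_j$ (prefixed by "s") is the sample estimate of that total, and $\hat{\beta}_j$ are the regression coefficients for each candidate and for the votes cast.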

/* final calculation with adjustment */
data FinalResultSas;
  set FinalResultSas;

  /* adjustment for Royal */
  Roya2=Roya+erBAYR*(tBAYR-sBAYR)+erBESA*(tBESA-sBESA)+erBOVE*(tBOVE-sBOVE)
    +erBUFF*(tBUFF-sBUFF)+erLAGU*(tLAGU-sLAGU)+erLEPE*(tLEPE-sLEPE)
    +erNIHO*(tNIHO-sNIHO)+erROYA*(tROYA-sROYA)+erSARK*(tSARK-sSARK)
    +erSCHI*(tSCHI-sSCHI)+erVILL*(tVILL-sVILL)+erVOYN*(tVOYN-sVOYN)
    +erabstention*(tabstention-sabstention);

  /* adjustment for Sarkozy */
  Sark2=Sark+esBAYR*(tBAYR-sBAYR)+esBESA*(tBESA-sBESA)+esBOVE*(tBOVE-sBOVE)
    +esBUFF*(tBUFF-sBUFF)+esLAGU*(tLAGU-sLAGU)+esLEPE*(tLEPE-sLEPE)
    +esNIHO*(tNIHO-sNIHO)+esROYA*(tROYA-sROYA)+esSARK*(tSARK-sSARK)
    +esSCHI*(tSCHI-sSCHI)+esVILL*(tVILL-sVILL)+esVOYN*(tVOYN-sVOYN)
    +esabstention*(tabstention-sabstention);

  /* adjustment for the number of votes cast */
  Exprime2=Expr+eBAYR*(tBAYR-sBAYR)+eBESA*(tBESA-sBESA)+eBOVE*(tBOVE-sBOVE)
    +eBUFF*(tBUFF-sBUFF)+eLAGU*(tLAGU-sLAGU)+eLEPE*(tLEPE-sLEPE)
    +eNIHO*(tNIHO-sNIHO)+eROYA*(tROYA-sROYA)+eSARK*(tSARK-sSARK)
    +eSCHI*(tSCHI-sSCHI)+eVILL*(tVILL-sVILL)+eVOYN*(tVOYN-sVOYN)
    +eabstention*(tabstention-sabstention);

  /* proportions before and after adjustment */
  pctRoya=Roya/Expr;
  pctSark=Sark/Expr;
  pctRoyaRedresse=Roya2/Exprime2;
  pctSarkRedresse=Sark2/Exprime2;
run;
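
The adjustment in the data step above has the same shape for each of the three targets: the sample estimate plus a sum of coefficient times (true total minus estimated total). The following Python sketch, which is not the author's SAS code, shows that shape on two auxiliary variables. The function name `regression_estimate` and all the numbers are hypothetical and chosen only for illustration.

```python
def regression_estimate(y_hat, betas, true_totals, est_totals):
    """Adjust a sample estimate y_hat with auxiliary totals.

    Computes y_hat + sum_j beta_j * (T_j - T_hat_j), the same form as the
    Roya2, Sark2 and Exprime2 assignments in the SAS data step.
    """
    adjustment = sum(
        betas[name] * (true_totals[name] - est_totals[name])
        for name in betas
    )
    return y_hat + adjustment


# Hypothetical toy values, NOT taken from the study.
betas = {"ROYA": 0.5, "SARK": 0.25}            # regression coefficients
true_totals = {"ROYA": 1000.0, "SARK": 1200.0}  # known first-round totals
est_totals = {"ROYA": 1040.0, "SARK": 1150.0}   # sample estimates of them

roya2 = regression_estimate(1500.0, betas, true_totals, est_totals)
print(roya2)  # 1500 + 0.5 * (-40) + 0.25 * 50 = 1492.5
```

An auxiliary total that was overestimated in the sample (here ROYA) pulls the adjusted value down when its coefficient is positive, and an underestimated one pulls it up.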

The adjustment in numbers

This single table gathers all the parameters used to adjust our estimator of the total. It lets us see the estimation error on the
totals. Here we see that we overestimated abstention and the candidates VOYN, ROYA, LEPE, LAGU, BUFF and BESA, and that we underestimated the candidates VILL, SCHI, SARK,
NIHO, BOVE and BAYR.

Results before and after adjustment

Here are the final results, with their estimated values and then their adjusted values.

The table also includes the actual final result, so that both totals and proportions can be compared.
First, we find that the result after adjustment is extremely close to the actual final result, both in share and in total.
The adjustment reduced the weight of candidate Royal in favor of candidate
Sarkozy. It also helped restore the number of votes cast. This result is due to an excellent draw, which selected a
balanced distribution of the population across slices of urban area. It might have been more accurate still if the selection had not been so concentrated in a single department, as described above.
So we see two phenomena:

The choice of polling stations through stratification.
The adjustment.

Together, these two phenomena improved precision and, secondarily, reduced
selection and coverage bias.

Conclusion

The results of all draws are:

Survey size	Method	Sarkozy	Royal	Standard deviation
220	Simple sampling	0.52943	0.47057	0.00993
220	Systematic sampling	0.53117	0.46883	0.00731
220	Stratified sampling	0.53112	0.46888	0.00589
500	Simple sampling	0.52981	0.47019	0.00527
500	Systematic sampling	0.52948	0.47052	0.00425
500	Stratified sampling	0.52980	0.47020	0.00346

We observe two phenomena that lead to a better estimate of the variables of interest.

The first is sample size. Whatever the sampling method, increasing the number of sampled units greatly improves the results. The variance of the estimator of a mean under simple random sampling without replacement is

$$\mathrm{Var}(\bar{y}) = \left(1 - \frac{n}{N}\right) \frac{S^2}{n}$$

where N is the number of polling stations in the population, n the sample size and S² the population variance of the variable. As n increases, the variance decreases, which is what we observe here.
The second is the sampling method. Simple random sampling without replacement is the simplest design; its result serves as the baseline for comparison, and its standard deviation is higher than that of the other designs. Systematic sampling gives a better result, although its practical implementation raises some questions about the result obtained. Stratified sampling with proportional allocation gives the lowest standard deviation, because units within a stratum are alike. It is better by construction. Here practice agrees well with theory.
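
Both observations can be checked against the table with a few lines of arithmetic. The Python sketch below (not part of the original SAS programs) uses only the standard deviations from the table and N = 65466, the `total=` value used in the SAS code. The helper name `srs_sd_factor` is made up for this example.

```python
import math

N = 65466  # number of polling stations (total= in the SAS programs)

# Standard deviations of the 50 replicate estimates, from the table above.
sd = {
    ("simple", 220): 0.00993,
    ("systematic", 220): 0.00731,
    ("stratified", 220): 0.00589,
    ("simple", 500): 0.00527,
    ("systematic", 500): 0.00425,
    ("stratified", 500): 0.00346,
}


def srs_sd_factor(n, N):
    """sqrt((1 - n/N) / n): how the SRS standard deviation scales with n."""
    return math.sqrt((1 - n / N) / n)


# Effect of sample size under simple random sampling.
theoretical_ratio = srs_sd_factor(220, N) / srs_sd_factor(500, N)  # about 1.51
observed_ratio = sd[("simple", 220)] / sd[("simple", 500)]         # about 1.88

# Effect of the method: each standard deviation relative to SRS at equal n.
relative_sd = {key: value / sd[("simple", key[1])] for key, value in sd.items()}

print(round(theoretical_ratio, 2), round(observed_ratio, 2))
for key in sorted(relative_sd):
    print(key, round(relative_sd[key], 3))
```

The direction matches the formula: going from 220 to 500 units shrinks the standard deviation. The observed shrinkage is larger than the theoretical factor, possibly because a standard deviation computed from only 50 replications is itself noisy. The ratios in `relative_sd` are the values quoted in the other sections as the sampling effect (0.59, 0.65, 0.73 and 0.8).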

Stratified sampling with proportional allocation

Description of the algorithms

We use the same algorithms as for simple random sampling and
systematic sampling without replacement. The difference is of course the selection of the sample. We use the same principles for calculating totals and proportions, and finish with the
statistical evaluation of the results.

Selection algorithm

The selection algorithm is as follows (example with 220 samples):

/* sort the data by stratum */
proc sort data=tPres2007 out=tPres2007;
  by Strate;
run;

/* stratified selection of the samples */
proc surveyselect data=tPres2007 n=220 reps=50 seed=1213 out=sasPres2007 stats;
  strata Strate / alloc=prop;
  size exprime; /* balance the strata sizes */
run;

/* sort the samples by replicate and by stratum */
proc sort data=sasPres2007 out=sasPres2007;
  by Replicate Strate;
run;

We use the SAS procedure SURVEYSELECT, requesting stratification with the "strata" statement on the variable Strate.
The "size" statement is used to balance the strata by population size. With the "alloc=prop" option, the algorithm allocates the units to the strata in proportion to their size.

We used the "reps" option to generate 50 replications, with the seed 1213 for the random generator. This produces 50 * 220 sampled units, pooled in the dataset "sasPres2007".
The "stats" option provides the sampling weights that we then use to calculate the totals with the SURVEYMEANS procedure.
For this algorithm, we sort the data by Strate before running the procedure.
After the selection, we sort again by replicate number and by Strate.
The code for the draw of 500 samples is identical, except that "n" is 500.

Size of Strata

To find the size of the strata resulting from the procedure, we use the following FREQ procedure:

/* count the units per stratum */
proc freq data=sasPres2007;
  table Strate;
  by replicate;
run;

For 220 and 500 samples, this algorithm gives the sample sizes per stratum shown
in the following table.

Strate	Number of Samples for 220	Number of Samples for 500
1	25	56
2	26	58
3	43	99
4	32	73
5	47	107
6	47	107
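
The per-stratum sizes in this table come from proportional allocation: each stratum receives a share of the n units equal to its share of the population. The Python sketch below illustrates the idea. The stratum sizes are hypothetical (the real ones from tPres2007 are not given here), and the largest-remainder rounding is an assumption made so that the allocations add up to exactly n; it is not claimed to be the rounding rule SURVEYSELECT applies.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a sample of size n across strata in proportion to their sizes.

    Rounding uses the largest-remainder rule (an assumption of this sketch)
    so that the allocations sum to exactly n.
    """
    total = sum(stratum_sizes.values())
    exact = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical stratum sizes, NOT the real ones from the study.
sizes = {1: 1500, 2: 2500, 3: 2700, 4: 3300}
alloc = proportional_allocation(sizes, 220)
print(alloc)  # exact shares are 33.0, 55.0, 59.4 and 72.6
```

With these toy sizes the exact shares sum to 220 but their integer parts sum to 219, so the one remaining unit goes to stratum 4, which has the largest fractional part.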

Calculation of total

The SURVEYMEANS procedure is used as before, with a new "strata" statement
that defines the stratification variable.

/* estimated totals on the stratified samples */
proc surveymeans data=sasPres2007 total=65466 SUM;
  var ROYA2 SARK2 EXPRIME2;
  by Replicate;
  strata Strate;
  weight Samplingweight;
  ods output Statistics=estimations_sasdetail;
run;

Example output from the first sample (first replication):

Variable	Total	Standard deviation
Royal	16582766	160889
Sarkozy	18474255	161341
Expressed	35057020	92186

Results of candidates

On 220 samples

Comment and detailed results of Ségolène Royal

Mean = 0.46888, standard deviation = 0.00589, 95% confidence interval = [0.46721, 0.47055]
At equal sample size, compared with systematic sampling and simple random sampling
without replacement, we see better results, with a sampling effect of 0.59 (the ratio of this standard deviation to that of simple random sampling, 0.00589 / 0.00993). With the same number of polling stations, the precision is much better. The effect of stratification
is clear.

Comment and detailed results of Nicolas Sarkozy

Mean = 0.53112, standard deviation = 0.00589, 95% confidence interval = [0.52945, 0.53279]
The confidence interval is narrower and more precise, and brings us closer to the true
value for candidate Nicolas Sarkozy.

On 500 samples

Comment and detailed results of Ségolène Royal

Mean = 0.47020, standard deviation = 0.00346, 95% confidence interval = [0.46922, 0.47118]
These are the most accurate results of this first study. None of the other survey methods used gave a result as good.
Here, the sampling effect is 0.65. By combining a large number of polling stations with
good stratification, we get a good result.

Comment and detailed results of Nicolas Sarkozy

Mean = 0.52980, standard deviation = 0.00346, 95% confidence interval = [0.52882, 0.53078]
Extremely accurate, this result highlights the quality of the methodology used and of the associated algorithms.

Systematic survey

Description of the algorithms

We use the same algorithms as for simple random sampling without replacement.
The difference is of course the selection of the sample. We use the same principles for calculating totals and proportions, and finish with the statistical evaluation of the results.

Selection algorithm

The selection algorithm, which is close to that of simple random sampling
without replacement, is as follows (example with 220 samples):

/* systematic selection of the samples */
proc surveyselect data=tPres2007 method=SYS n=220
  reps=50 seed=1213 out=sasPres2007 stats;
run;

We use the SAS procedure SURVEYSELECT with the "SYS" method. The algorithm selects units at a fixed interval. The size of the interval is
N / n; here it is 297 (65466 / 220, rounded down). Every 297 units, the algorithm
selects one. The starting unit, which therefore lies between 1 and 297, is chosen at random. Selection is made with equal probability (n / N) and without replacement.
We used the "reps" option to generate 50 replications, with the seed 1213 for the random generator. This produces 50 * 220 sampled units, pooled in the
dataset "sasPres2007". The "stats" option provides the sampling weights that we then use to calculate the totals with
the SURVEYMEANS procedure. The code for the draw of 500 samples is the same, except that "n" is 500.
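
The fixed-interval selection can be sketched in a few lines of Python. This is an illustration of the principle only, not the SURVEYSELECT implementation: it uses the fractional interval N / n (an assumption of this sketch, so that exactly n units are returned), and Python's random generator, whose stream for seed 1213 has nothing to do with the SAS one. The function name `systematic_sample` is made up here.

```python
import random


def systematic_sample(N, n, rng):
    """Return n distinct unit numbers in 1..N taken at a fixed interval N / n."""
    step = N / n                    # 65466 / 220 is about 297.57
    start = rng.random() * step     # random start inside the first interval
    # Clamp to N to guard against floating-point rounding on the last unit.
    return [min(N, int(start + i * step) + 1) for i in range(n)]


rng = random.Random(1213)  # seed value reused from the text, for illustration
sample = systematic_sample(65466, 220, rng)

print(int(65466 / 220))   # 297, the interval quoted in the text
print(len(sample))        # 220
```

Because the list is walked in its stored order, any structure in that order (for example, polling stations grouped by department) carries over into the sample, which is why sorting the list before the draw matters for this method.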

Results of candidates

On 220 samples

Comment and detailed results of Ségolène Royal

Mean = 0.46883, standard deviation = 0.00731, 95% confidence interval = [0.46675, 0.47091]

This draw is immediately more precise than simple random sampling without replacement at equal sample size.

The sampling effect is 0.73 (0.00731 / 0.00993): we gained almost 26% on the standard deviation of the result.

Comment and detailed results of Nicolas Sarkozy

Mean = 0.53117, standard deviation = 0.00731, 95% confidence interval = [0.52909, 0.53325]

This draw gives Sarkozy as the winner with 53.117%. This result is close to the final value, 53.04%. The estimate is more precise than under simple random sampling, with a sampling effect of about 0.73.

On 500 samples

Comment and detailed results of Ségolène Royal

Mean = 0.47052, standard deviation = 0.00425, 95% confidence interval = [0.46931, 0.47173]

Studying this distribution, one quickly sees a staircase effect that is tied to the drawing method. The list of polling stations was not sorted before the draw. We are certainly facing a sampling error, periodic or not, even if it is not easy to pin down. This may explain this surprising result. A prior sort by slice of urban area would certainly give a better result. The standard deviation here is smaller than for simple random sampling without replacement at equal sample size (n = 500).

The sampling effect for this sample is 0.8, which is greater than that found for the sample of size 220.

The Gaussian assumption seems compromised.

Comment and detailed results of Nicolas Sarkozy

Mean = 0.52948, standard deviation = 0.00425, 95% confidence interval = [0.52827, 0.53069]

The distribution shows the opposite effect to that seen for candidate Royal: a descending staircase. Because the list was not sorted into a suitable order before the draw, we get this result.

Monday, March 10, 2014

Simple random sampling without replacement

Introduction

The first part describes the algorithms implemented: the selection of the sample, the calculation of totals and proportions, and finally the statistical calculation.

The second part shows the results obtained for each candidate.
To calculate the means we rely on totals, so all our calculations are done on totals.

Algorithms

The method is explained for 220 samples. We begin with the selection of the sample.
Then we compute the totals, convert them to proportions, and finally calculate
the mean, the standard deviation and the histogram of our 50 replications.

Selection of the sample

We use the SAS procedure SURVEYSELECT with the "SRS" method. This method uses Floyd's ordered hash table algorithm. Selection is made with equal probability (n / N) and without replacement.

Here is the code for 220 samples:

/* simple random selection of 220 units */
proc surveyselect data=tPres2007 method=SRS n=220 reps=50 seed=1213
  out=sasPres2007n220 stats;
run;

We used the "reps" option to generate 50 replications, with the seed 1213 for the random generator.
This produces 50 * 220 sampled units, pooled in the dataset "sasPres2007n220".
Each replication is identified by a column named "replicate". The "stats" option provides the sampling weights that we then use to calculate the totals with the
SURVEYMEANS procedure. The code for the draw of 500 samples is identical, except that "n" is 500.

Calculation of totals

To calculate the totals, we use the following procedure:

/* estimated totals on the 220-unit samples */
proc surveymeans data=sasPres2007n220 total=65466 SUM;
  var ROYA2 SARK2 EXPRIME2;
  by Replicate;
  weight Samplingweight;
  ods output Statistics=estimations_sasdetail;
run;

To calculate the totals we use the "weight" statement with the sampling weights
computed by SURVEYSELECT, which depend on the sampling method.
The "total" option gives the total number of rows in the dataset (the population size), which the procedure needs for its calculation.
We use the "by" statement to compute the totals per replication, using the variable "replicate" generated by the SURVEYSELECT procedure.
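
The role of the sampling weights can be shown with a small Python sketch (not the SURVEYMEANS implementation). Under simple random sampling each selected unit has inclusion probability n / N, so its weight is N / n, and the estimated total is the weighted sum of the sampled values. The population size, sample and vote counts below are hypothetical and kept tiny so the arithmetic can be followed by hand.

```python
def weighted_total(values, weights):
    """Estimate a population total as the weighted sum of the sampled values."""
    return sum(w * y for y, w in zip(values, weights))


# Hypothetical toy design: 4 polling stations drawn out of 10.
N, n = 10, 4
weights = [N / n] * n            # every unit stands for N / n = 2.5 stations
votes = [100, 120, 80, 100]      # made-up vote counts in the sampled stations

total_hat = weighted_total(votes, weights)
print(total_hat)  # 2.5 * (100 + 120 + 80 + 100) = 1000.0
```

In the study itself the same logic applies with N = 65466 and n = 220, so each sampled polling station carries a weight of about 297.6.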

Example output from the first sample (first replication):

Variable	Total	Standard Deviation
Royal	16573313	777767
Sarkozy	18283464	807676
Expressed	34856777	1486216

This estimate is close to the 35158378 votes actually cast in the election.

Calculating proportions

To perform this simple calculation, we chain several procedure calls and put the data in order.

/* transpose */
proc transpose data=estimations_sasdetail out=FinalResultSasN220
  label=varname;
  by replicate;
run;

/* put the results in separate columns */
data FinalResultSasN220;
  set FinalResultSasN220;
  if _NAME_='Sum' then roya=col1;
  if _NAME_='Sum' then sark=col2;
  if _NAME_='Sum' then expr=col3;
  if _NAME_='StdDev' then errorRoya=col1;
  if _NAME_='StdDev' then errorSark=col2;
run;

/* sum the columns within each replicate to get one row per replicate */
proc summary data=FinalResultSasN220 SUM;
  by replicate;
  var roya sark expr errorRoya errorSark;
  output out=FinalResultSasN220 sum(roya)=roya sum(sark)=sark sum(expr)=expr
    sum(errorRoya)=errorRoya sum(errorSark)=errorSark;
run;

/* compute the proportions and the relative errors */
data FinalResultSasN220;
  set FinalResultSasN220;
  pctRoya=roya/expr;
  pctSark=sark/expr;
  errorRoya=errorRoya/roya;
  errorSark=errorSark/sark;
run;

/* keep only the columns of interest */
data FinalResultSasN220;
  set FinalResultSasN220(keep=replicate pctRoya pctSark errorRoya errorSark);
run;

/* print the results */
proc print data=FinalResultSasN220 double;
  var pctRoya pctSark errorRoya errorSark;
  title 'Pct Royal et Sarkozy deuxième tour';
run;

The data go through several transformations. The first is the transposition of the result columns into rows.
We then spread the results over several columns. With the SUMMARY procedure, we get one row per replication. Since the sums
are performed by replication, on columns that hold a single value per replication, nothing
is really added up. Then we calculate the proportions and the relative errors for the candidates.
We filter one last time to display the results and to simplify later calculations.

Results on the proportions of 50 replications:

Obs	Royal	Sarkozy	Obs	Royal	Sarkozy	Obs	Royal	Sarkozy
1	0.47547	0.52453	18	0.46046	0.53954	35	0.46325	0.53675
2	0.47870	0.52130	19	0.48539	0.51461	36	0.47052	0.52948
3	0.49331	0.50669	20	0.49280	0.50720	37	0.47407	0.52593
4	0.45881	0.54119	21	0.47781	0.52219	38	0.46603	0.53397
5	0.47755	0.52245	22	0.45219	0.54781	39	0.47959	0.52041
6	0.47066	0.52934	23	0.44684	0.55316	40	0.47485	0.52515
7	0.46234	0.53766	24	0.48239	0.51761	41	0.46467	0.53533
8	0.46670	0.53330	25	0.48068	0.51932	42	0.47303	0.52697
9	0.45911	0.54089	26	0.46711	0.53289	43	0.46183	0.53817
10	0.47545	0.52455	27	0.47091	0.52909	44	0.45905	0.54095
11	0.45046	0.54954	28	0.46730	0.53270	45	0.47578	0.52422
12	0.47080	0.52920	29	0.48126	0.51874	46	0.48050	0.51950
13	0.47902	0.52098	30	0.46158	0.53842	47	0.46588	0.53412
14	0.47572	0.52428	31	0.46829	0.53171	48	0.48308	0.51692
15	0.46341	0.53659	32	0.47071	0.52929	49	0.47887	0.52113
16	0.46310	0.53690	33	0.47182	0.52818	50	0.46013	0.53987
17	0.47338	0.52662	34	0.46566	0.53434

We then calculate the mean, the standard deviation, the confidence interval and the histogram of the results.

For this we use the SAS procedure UNIVARIATE:

/* compute the mean, standard deviation, confidence interval and histogram */
proc univariate data=FinalResultSasN220 CIBASIC;
  var pctRoya pctSark;
  histogram pctRoya pctSark;
run;

The results for 220 and 500 samples are discussed below. They were
generated with the procedure described above.
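
As a cross-check of these summary statistics, the Python sketch below (not the UNIVARIATE implementation) recomputes the mean, the sample standard deviation and a 95% confidence interval for the mean from the 50 Royal proportions listed in the table above. The Student quantile 2.0096 for 49 degrees of freedom is hard-coded, which is an assumption of this sketch made to avoid a dependency on a statistics library.

```python
import math
import statistics

# Royal proportions for replications 1 to 50, copied from the table above.
pct_roya = [
    0.47547, 0.47870, 0.49331, 0.45881, 0.47755, 0.47066, 0.46234, 0.46670,
    0.45911, 0.47545, 0.45046, 0.47080, 0.47902, 0.47572, 0.46341, 0.46310,
    0.47338, 0.46046, 0.48539, 0.49280, 0.47781, 0.45219, 0.44684, 0.48239,
    0.48068, 0.46711, 0.47091, 0.46730, 0.48126, 0.46158, 0.46829, 0.47071,
    0.47182, 0.46566, 0.46325, 0.47052, 0.47407, 0.46603, 0.47959, 0.47485,
    0.46467, 0.47303, 0.46183, 0.45905, 0.47578, 0.48050, 0.46588, 0.48308,
    0.47887, 0.46013,
]

mean = statistics.mean(pct_roya)
sd = statistics.stdev(pct_roya)   # sample standard deviation (divisor n - 1)

T_CRIT = 2.0096                   # assumed 97.5% Student quantile, 49 d.f.
half_width = T_CRIT * sd / math.sqrt(len(pct_roya))
ci = (mean - half_width, mean + half_width)

print(round(mean, 5), round(sd, 5))
print(round(ci[0], 5), round(ci[1], 5))
```

The confidence interval is an interval for the mean of the 50 replications, so its half-width is the standard deviation divided by the square root of 50 and scaled by the Student quantile. This is why it is much narrower than the spread of the individual replications.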

Results of candidates

On 220 samples

Comment and detailed results of Ségolène Royal

Mean = 0.47057, standard deviation = 0.00993, 95% confidence interval = [0.46775, 0.47339]
Ségolène Royal clearly has a lower score than Nicolas Sarkozy. The distribution of
the result is very dense between 0.46 and 0.48. The standard deviation is close to one percentage point, which is already precise.
The confidence interval narrows this down further: it places the
mean between 0.46775 and 0.47339 at the 95% level.

Comment and detailed results of Nicolas Sarkozy

Mean = 0.52943, standard deviation = 0.00993, 95% confidence interval = [0.52661, 0.53225]
Nicolas Sarkozy has a clearly higher score than Ségolène Royal. The distribution of the result is very dense between 0.52 and 0.54. The standard deviation is close to one percentage point.
The confidence interval places the mean between 0.52661 and
0.53225.

On 500 samples

Comment and detailed results of Ségolène Royal

Mean = 0.47019, standard deviation = 0.00527, 95% confidence interval = [0.46869, 0.47169]
Ségolène Royal has a lower score than Nicolas Sarkozy. The distribution of the result is very dense between 0.4625 and 0.4775. The standard deviation is close to half a percentage point,
which is more precise than with the sample of 220 polling stations. The confidence interval places the mean between 0.46869 and
0.47169.

Comment and detailed results of Nicolas Sarkozy

Mean = 0.52981, standard deviation = 0.00527, 95% confidence interval = [0.52831, 0.53131]
Nicolas Sarkozy has a higher score than Ségolène Royal. The distribution of the result is dense. The standard deviation is close to half a percentage point.