Tuesday, March 11, 2014

Election night simulation




Introduction

A regression estimator of the second-round scores of Ségolène Royal and Nicolas Sarkozy is built by regressing on first-round data. The estimate is made from a sample of n = 220 polling stations drawn by stratified sampling with proportional allocation on the variable Strate (the stratification is not taken into account in the estimator, so as not to complicate its calculation).


Summary of the election night study

We use the regression estimator based on the estimated totals, then convert the totals into second-round proportions.
   As in the previous study, I start with a summary of the results obtained later.
   Here is the final table:



It shows a first estimate made from a stratified sample with proportional allocation. The adjustment (redressement) is then applied to those results, and the outcome is compared with the actual election results. The accuracy is rather good, which highlights how effective the adjustment is at improving the final estimate.
In the following sections we break down every step needed to reach this result, from sampling through regression to the final adjustment. The algorithms are described one by one.



Description of the algorithms 

  Stratified selection of polling stations


The following program stays close to the algorithm used previously, but without the replication.

/* sort the data by stratum */
proc sort data=tPres2007 out=tPres2007;
  by Strate;
run;

/* stratified draw of 220 polling stations with proportional allocation */
proc surveyselect data=tPres2007 n=220 seed=92217 out=sasPres2007 stats;
  strata Strate / alloc=prop;
  size exprime; /* size measure: number of votes cast */
run;

/* sort the sample by stratum */
proc sort data=sasPres2007 out=sasPres2007;
  by Strate;
run;

The parameters are the same as for stratified sampling with proportional allocation.
To check for measurable bias and sampling error, we analyze the resulting sample by town, department, region and urban-area size band.
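Proportional allocation gives each stratum a share of the n = 220 polling stations that is proportional to its size. The minimal Python sketch below shows that rule. It rounds with the largest-remainder method, which is an assumption: PROC SURVEYSELECT may round differently. The function name and the stratum sizes are hypothetical.

```python
from math import floor


def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Allocate n sample units to strata in proportion to their sizes N_h.

    Assumption: largest-remainder rounding; no minimum of one unit per
    stratum is enforced.
    """
    N = sum(stratum_sizes.values())
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: floor(x) for h, x in exact.items()}
    # Hand out the units lost by rounding down, largest fractional part first.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical stratum sizes (numbers of polling stations), not real data.
    sizes = {"A": 30000, "B": 25466, "C": 10000}
    print(proportional_allocation(sizes, 220))
```

The allocated sizes always add up to n, and each stratum receives either the rounded-down or the rounded-up value of its exact share.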


Distribution by town 

Here are the top 20 towns:



The town most represented in the sample is Paris. The selection also reveals many towns where only one polling station was drawn. In practice, it would be rather difficult on election day to cover polling stations scattered this widely across the territory.


Distribution of the sample by department

Here are the top 30 departments:



The Paris-area departments (92, 93 and 75) are strongly represented.
The Nord department was drawn most often, followed by Moselle, so presumably some twenty towns were selected in the Nord. Our draw therefore looks quite unbalanced with respect to this department because of these extremes, which could be a source of significant bias.


Distribution by Region




Île-de-France is the most represented region, followed by Nord-Pas-de-Calais and Rhône-Alpes. The second place of Nord-Pas-de-Calais is presumably due to the extreme selection in the Nord department noted above.


Distribution by urban-area size band





The table here shows the true proportions of the French population in each urban-area size band. Comparing them with the proportions in our sample, we see that the sample reproduces this distribution quite well, which reduces one source of selection bias.
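The comparison above comes down to subtracting each urban-area size band's population share from its share in the sample. The small Python sketch below makes that concrete. It assumes that the band label of each sampled polling station and the population count per band are available. The band names, the figures and the function name are hypothetical.

```python
from collections import Counter


def share_gaps(sample_labels: list[str], population_counts: dict[str, int]) -> dict[str, float]:
    """Sample share minus population share for each category.

    A positive gap means the category is over-represented in the sample.
    Labels that appear only in the sample are ignored.
    """
    n = len(sample_labels)
    N = sum(population_counts.values())
    sample_counts = Counter(sample_labels)  # missing labels count as 0
    return {
        band: sample_counts[band] / n - count / N
        for band, count in population_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical size bands and counts, not the real French figures.
    population = {"rural": 500, "small": 300, "large": 200}
    sample = ["rural"] * 6 + ["small"] * 2 + ["large"] * 2
    for band, gap in share_gaps(sample, population).items():
        print(f"{band:>6}: {gap:+.1%}")
```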


Estimate before adjustment


We estimate the totals as before, removing the replication parameter and adding the variables used for the adjustment.
/* estimated totals over the stratified sample */
proc surveymeans data=sasPres2007 total=65466 sum;
  var ROYA2 SARK2 EXPRIME2
      /* first-round variables, used later for the regression */
      BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN abstention;
  strata Strate;
  weight SamplingWeight;
  ods output Statistics=estimations_sasdetail;
run;

The added variables are the first-round results of each candidate.
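PROC SURVEYMEANS weights each sampled polling station and sums the weighted values. As a rough illustration, here is the textbook stratified estimator of a total and its standard error. It assumes simple random sampling without replacement within each stratum. This is a simplification: the draw above also uses a size measure, so the SAS figures need not match this formula exactly. The function name and data layout are hypothetical.

```python
from math import sqrt
from statistics import variance


def stratified_total(samples: dict[str, list[float]],
                     stratum_sizes: dict[str, int]) -> tuple[float, float]:
    """Estimated total and its standard error under stratified SRS without replacement.

    samples[h] holds the observed values in stratum h (at least 2 per stratum);
    stratum_sizes[h] is N_h, the number of polling stations in stratum h.
    """
    total = 0.0
    var = 0.0
    for h, ys in samples.items():
        n_h, N_h = len(ys), stratum_sizes[h]
        weight = N_h / n_h                    # sampling weight of each unit in stratum h
        total += weight * sum(ys)
        fpc = 1 - n_h / N_h                   # finite population correction
        var += N_h ** 2 * fpc * variance(ys) / n_h
    return total, sqrt(var)


if __name__ == "__main__":
    # Hypothetical vote counts in two strata, not real data.
    est, se = stratified_total({"A": [1, 2, 3], "B": [10, 20]}, {"A": 30, "B": 20})
    print(est, se)
```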
Estimation results


Reorganizing the results after estimation

We use this step to rename the variables that will be adjusted, prefixing their names with "s".
/* transposition: one row per statistic, one column per variable */
proc transpose data=estimations_sasdetail out=FinalResultSas label=varname;
run;

/* put each result in its own named column */
data FinalResultSas;
  set FinalResultSas;
  if _NAME_='Sum' then do;
    roya=col1; sark=col2; expr=col3;
    sBAYR=col4; sBESA=col5; sBOVE=col6; sBUFF=col7;
    sLAGU=col8; sLEPE=col9; sNIHO=col10; sROYA=col11;
    sSARK=col12; sSCHI=col13; sVILL=col14; sVOYN=col15;
    sabstention=col16;
  end;
  else if _NAME_='StdDev' then do;
    errorRoya=col1; errorSark=col2;
  end;
run;

/* collapse the rows into a single row by summing each column */
proc summary data=FinalResultSas sum;
  var roya sark expr errorRoya errorSark
      sBAYR sBESA sBOVE sBUFF sLAGU sLEPE
      sNIHO sROYA sSARK sSCHI sVILL sVOYN sabstention;
  output out=FinalResultSas
    sum(roya)=roya sum(sark)=sark sum(expr)=expr
    sum(errorRoya)=errorRoya sum(errorSark)=errorSark
    sum(sBAYR)=sBAYR sum(sBESA)=sBESA sum(sBOVE)=sBOVE
    sum(sBUFF)=sBUFF sum(sLAGU)=sLAGU sum(sLEPE)=sLEPE
    sum(sNIHO)=sNIHO sum(sROYA)=sROYA sum(sSARK)=sSARK
    sum(sSCHI)=sSCHI sum(sVILL)=sVILL
    sum(sVOYN)=sVOYN sum(sabstention)=sabstention;
run;
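The transpose-then-collapse step above turns one row per variable into a single record with renamed columns. Here is a Python sketch of the same reshaping. It assumes that the rows carry the fields VarName, Sum and StdDev, named after the SURVEYMEANS statistics. The mapping tables and the function name are hypothetical.

```python
# Hypothetical mappings mirroring the SAS renaming above.
SECOND_ROUND_NAMES = {"ROYA2": "roya", "SARK2": "sark", "EXPRIME2": "expr"}
STD_ERROR_NAMES = {"ROYA2": "errorRoya", "SARK2": "errorSark"}


def flatten_estimates(rows: list[dict]) -> dict[str, float]:
    """Turn one row per variable (VarName, Sum, StdDev) into a single flat record."""
    record: dict[str, float] = {}
    for row in rows:
        name = row["VarName"]
        # Second-round totals get short names; first-round totals get an "s" prefix.
        record[SECOND_ROUND_NAMES.get(name, "s" + name)] = row["Sum"]
        # Only the Royal and Sarkozy standard errors are kept.
        if name in STD_ERROR_NAMES:
            record[STD_ERROR_NAMES[name]] = row["StdDev"]
    return record


if __name__ == "__main__":
    rows = [
        {"VarName": "ROYA2", "Sum": 100.0, "StdDev": 5.0},
        {"VarName": "BAYR", "Sum": 40.0, "StdDev": 2.0},
    ]
    print(flatten_estimates(rows))
```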


Calculation of the true totals

The SUMMARY procedure sums the first-round results of each candidate over all polling stations.

/* compute the totals of the first-round data */
proc summary data=tPres2007;
  var BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN abstention;
  output out=ProcSumOutTotal
    (rename=(BAYR=tBAYR BESA=tBESA BOVE=tBOVE BUFF=tBUFF LAGU=tLAGU
             LEPE=tLEPE NIHO=tNIHO ROYA=tROYA SARK=tSARK SCHI=tSCHI VILL=tVILL
             VOYN=tVOYN abstention=tabstention))
    sum=;
run;

The results are:



Regression on data from the first round 

We take the opportunity to rename the regression coefficients with a prefix: "es" for the Sarkozy coefficients, "er" for the Royal coefficients and "e" for the coefficients of the second-round votes cast.

/* regression for Sarkozy */
PROC REG DATA=sasPres2007
    OUTEST=estimSark(rename=(_TYPE_=T1 BAYR=esBAYR BESA=esBESA BOVE=esBOVE
        BUFF=esBUFF LAGU=esLAGU LEPE=esLEPE NIHO=esNIHO ROYA=esROYA
        SARK=esSARK SCHI=esSCHI VILL=esVILL VOYN=esVOYN
        abstention=esabstention));
  MODEL SARK2=BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN
        abstention;
  OUTPUT OUT=sortieSark;
RUN;

/* regression for Royal */
PROC REG DATA=sasPres2007
    OUTEST=estimRoya(rename=(_TYPE_=T2 BAYR=erBAYR BESA=erBESA BOVE=erBOVE
        BUFF=erBUFF LAGU=erLAGU LEPE=erLEPE NIHO=erNIHO ROYA=erROYA
        SARK=erSARK SCHI=erSCHI VILL=erVILL VOYN=erVOYN
        abstention=erabstention));
  MODEL ROYA2=BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN
        abstention;
  OUTPUT OUT=sortieRoya;
RUN;

/* regression for the votes cast */
PROC REG DATA=sasPres2007
    OUTEST=estimExprime(rename=(_TYPE_=T3 BAYR=eBAYR BESA=eBESA BOVE=eBOVE
        BUFF=eBUFF LAGU=eLAGU LEPE=eLEPE NIHO=eNIHO ROYA=eROYA
        SARK=eSARK SCHI=eSCHI VILL=eVILL VOYN=eVOYN
        abstention=eabstention));
  MODEL EXPRIME2=BAYR BESA BOVE BUFF LAGU LEPE NIHO ROYA SARK SCHI VILL VOYN
        abstention;
  OUTPUT OUT=sortieExprime;
RUN;

The estimated coefficients are:

This regression reveals three political tendencies among voters. The first involves people who voted for BAYR, BOVE, NIHO and VOYN: these voters clearly found it hard to choose between the two second-round candidates, or even voted less in the second round.
The second is that of voters who moved to Sarkozy; here we find the whole spectrum of the right, with the LEPE, SARK and VILL candidates.
The third is that of Royal, who gathers more support from BESA, BUFF, LAGU, ROYA and SCHI voters.

First-round abstainers may also have brought votes to the candidates.
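Each PROC REG step above is an ordinary least squares fit with an intercept and no weights. The pure-Python sketch below reproduces that kind of fit so the mechanics are visible. It solves the normal equations by Gaussian elimination. That approach is adequate for about a dozen regressors, but a QR-based solver (what a statistics package would normally use) is numerically safer. The function name and the toy data are hypothetical.

```python
def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares with an intercept; returns [intercept, b1, ..., bp].

    Solves (A^T A) b = A^T y by Gaussian elimination with partial pivoting.
    Assumes the design matrix has full column rank.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]   # prepend intercept column
    k = len(A[0])
    # Augmented normal-equation system [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (M[i][k] - s) / M[i][i]
    return beta


if __name__ == "__main__":
    # Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical).
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1, 3, 4, 6, 8]
    print(ols(X, y))
```

In the election setting, each row of X would hold the first-round counts of one sampled polling station, and y would hold its second-round count for the candidate being modeled.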


Consolidation of Results 

We keep only the data needed for the final calculation.


data FinalResultSas (keep=Roya Sark expr
    sBAYR sBESA sBOVE sBUFF sLAGU sLEPE sNIHO sROYA sSARK sSCHI sVILL sVOYN sabstention
    tBAYR tBESA tBOVE tBUFF tLAGU tLEPE tNIHO tROYA tSARK tSCHI tVILL tVOYN tabstention
    eBAYR eBESA eBOVE eBUFF eLAGU eLEPE eNIHO eROYA eSARK eSCHI eVILL eVOYN eabstention
    erBAYR erBESA erBOVE erBUFF erLAGU erLEPE erNIHO erROYA erSARK erSCHI erVILL erVOYN erabstention
    esBAYR esBESA esBOVE esBUFF esLAGU esLEPE esNIHO esROYA esSARK esSCHI esVILL esVOYN esabstention);
  /* one-to-one merge of the single-row tables */
  merge FinalResultSas ProcSumOutTotal estimExprime estimRoya estimSark;
run;

Adjusted totals and estimated proportions

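The regression estimator is computed as follows:

$$\hat{Y}_{\mathrm{reg}} = \hat{Y} + \sum_{j} \hat{\beta}_j \left( T_j - \hat{T}_j \right)$$

where $j$ runs over the first-round candidates and abstention, $T_j$ is the true first-round total of variable $j$ (prefixed by "t"), $\hat{T}_j$ (prefixed by "s") is its estimate from the sample, $\hat{Y}$ is the estimated total of the second-round variable (Royal, Sarkozy or votes cast), and $\hat{\beta}_j$ are the regression coefficients for each candidate and for the votes cast.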

/* final calculation with adjustment */
data FinalResultSas;
  set FinalResultSas;

  /* adjustment for Royal */
  Roya2 = Roya + erBAYR*(tBAYR-sBAYR) + erBESA*(tBESA-sBESA) + erBOVE*(tBOVE-sBOVE)
        + erBUFF*(tBUFF-sBUFF) + erLAGU*(tLAGU-sLAGU) + erLEPE*(tLEPE-sLEPE)
        + erNIHO*(tNIHO-sNIHO) + erROYA*(tROYA-sROYA) + erSARK*(tSARK-sSARK)
        + erSCHI*(tSCHI-sSCHI) + erVILL*(tVILL-sVILL) + erVOYN*(tVOYN-sVOYN)
        + erabstention*(tabstention-sabstention);

  /* adjustment for Sarkozy */
  Sark2 = Sark + esBAYR*(tBAYR-sBAYR) + esBESA*(tBESA-sBESA) + esBOVE*(tBOVE-sBOVE)
        + esBUFF*(tBUFF-sBUFF) + esLAGU*(tLAGU-sLAGU) + esLEPE*(tLEPE-sLEPE)
        + esNIHO*(tNIHO-sNIHO) + esROYA*(tROYA-sROYA) + esSARK*(tSARK-sSARK)
        + esSCHI*(tSCHI-sSCHI) + esVILL*(tVILL-sVILL) + esVOYN*(tVOYN-sVOYN)
        + esabstention*(tabstention-sabstention);

  /* adjustment of the number of votes cast */
  Exprime2 = Expr + eBAYR*(tBAYR-sBAYR) + eBESA*(tBESA-sBESA) + eBOVE*(tBOVE-sBOVE)
        + eBUFF*(tBUFF-sBUFF) + eLAGU*(tLAGU-sLAGU) + eLEPE*(tLEPE-sLEPE)
        + eNIHO*(tNIHO-sNIHO) + eROYA*(tROYA-sROYA) + eSARK*(tSARK-sSARK)
        + eSCHI*(tSCHI-sSCHI) + eVILL*(tVILL-sVILL) + eVOYN*(tVOYN-sVOYN)
        + eabstention*(tabstention-sabstention);

  /* proportions before and after adjustment */
  pctRoya = Roya/Expr;
  pctSark = Sark/Expr;
  pctRoyaRedresse = Roya2/Exprime2;
  pctSarkRedresse = Sark2/Exprime2;
run;
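The data step above applies the same formula three times: for Royal, for Sarkozy and for the votes cast. It then divides the adjusted totals to get shares. Here is a compact Python sketch of that calculation. Like the SAS code, it uses no intercept term in the adjustment. The function name and all figures are hypothetical.

```python
def regression_estimate(y_hat: float, betas: dict[str, float],
                        true_totals: dict[str, float],
                        est_totals: dict[str, float]) -> float:
    """Regression estimator: y_hat + sum_j beta_j * (T_j - T_hat_j)."""
    return y_hat + sum(b * (true_totals[v] - est_totals[v]) for v, b in betas.items())


if __name__ == "__main__":
    # Hypothetical first-round totals: true (prefix "t") and estimated (prefix "s").
    true_x = {"BAYR": 1000.0, "ROYA": 2000.0}
    est_x = {"BAYR": 1100.0, "ROYA": 1950.0}
    roya2 = regression_estimate(3000.0, {"BAYR": 0.4, "ROYA": 0.9}, true_x, est_x)
    expr2 = regression_estimate(6000.0, {"BAYR": 0.8, "ROYA": 1.0}, true_x, est_x)
    print(roya2, expr2, roya2 / expr2)  # adjusted Royal total, votes cast, share
```

When the sample reproduces the first-round totals exactly, every difference is zero and the estimate is left unchanged.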

The adjustment in figures

This table gathers all the parameters used to adjust our estimators of the totals, and shows the estimation error on each first-round total. We overestimated abstention and the VOYN, ROYA, LEPE, LAGU, BUFF and BESA candidates, and underestimated the VILL, SCHI, SARK, NIHO, BOVE and BAYR candidates.
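The over- and underestimation reading above comes from the sign of each difference T_j minus its sample estimate. A negative difference means the sample overestimated that first-round total, so the adjustment pulls the estimate the other way (in the direction given by the sign of the coefficient). A minimal Python sketch of that classification follows. The function name and figures are hypothetical.

```python
def estimation_direction(true_totals: dict[str, float],
                         est_totals: dict[str, float]) -> dict[str, str]:
    """Label each first-round variable as over-, under- or exactly estimated."""
    labels = {}
    for v, t in true_totals.items():
        s = est_totals[v]
        if s > t:
            labels[v] = "over"    # t - s < 0
        elif s < t:
            labels[v] = "under"   # t - s > 0
        else:
            labels[v] = "exact"
    return labels


if __name__ == "__main__":
    # Hypothetical true ("t") and estimated ("s") totals.
    print(estimation_direction({"BAYR": 1000.0, "ROYA": 2000.0},
                               {"BAYR": 1100.0, "ROYA": 1950.0}))
```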


Results before and after adjustment

Here are the final results, with the estimated values before and after adjustment.

The table also includes the actual final result, so that both the totals and the proportions can be compared.
First, the result after adjustment is extremely close to the actual final result, for the proportions as well as for the totals.
The adjustment reduced Royal's weight in favor of Sarkozy, and also corrected the number of votes cast. This result owes much to an excellent draw, which gave a balanced distribution across the urban-area size bands of the population. It might have been even more accurate had the selection not been concentrated in the same department, as described above.
Two phenomena are therefore at work:

  • The choice of polling stations and the stratification.
  • The adjustment.

The first improved the precision, and the second reduced selection and coverage bias.
