Walking you through sep

sep can be downloaded as follows:

devtools::install_github("bonschorno/sep")
library(sep)
library(dplyr)
library(ggplot2)

Currently, sep offers the following functions:

#>  [1] "%>%"               "cb_mistable"       "cb_page"          
#>  [4] "cb_pages"          "cb_sumplot"        "cb_sumtab"        
#>  [7] "cb_table"          "eda_crosstable"    "eda_freqtable"    
#> [10] "helper_percentage" "n_par"             "plot_bar_h"       
#> [13] "plot_bar_v"        "plot_groupbar_h"   "plot_groupbar_v"  
#> [16] "sep_palette"       "sep_palettes"      "share_dk"         
#> [19] "theme_sep"

The prefixes indicate the step in the workflow. The functions beginning with cb simplify creating the codebook. The eda functions are for exploratory data analysis. The functions starting with plot, in turn, help to visualise the data. Finally, there are some additional helper functions.

FYI: The dataset used here is not publicly accessible for data protection reasons. However, it can be requested here.

load("upanel_w5.RData")

Very often we need to know the relative frequencies of the responses to a survey item. eda_freqtable helps to get these numbers quickly. By default, the logical operator is rm.dk = FALSE, i.e. the relative proportion of “Don’t know” responses is included.

eda_freqtable(w5, item = w5_q27)
#> Relative frequencies:
#>    -8     1     2     3     4     5     6     7 
#>  6.37  0.90  2.60 10.98 43.55 18.76 12.11  4.72
eda_freqtable(w5, item = w5_q27, rm.dk = T)
#> Relative frequencies:
#>     1     2     3     4     5     6     7 
#>  0.97  2.78 11.73 46.51 20.03 12.94  5.04

To calculate the relative frequencies based on two variables, you can use eda_crosstable. The two variables must be specified as characters in a vector. Here, too, “Don’t knows” can be included/excluded via the logical operator rm.dk.

eda_crosstable(w5, items = c("w5_q2", "w5_q11"))
#>      w5_q11
#> w5_q2    -8     1     2     3     4     5
#>     1  7.00  3.73 10.01 16.03 11.27  1.66
#>     2  4.71  2.87  6.83 13.65 17.56  4.67
eda_crosstable(w5, items = c("w5_q2", "w5_q11"), rm.dk = TRUE)
#>      w5_q11
#> w5_q2     1     2     3     4     5
#>     1  4.22 11.34 18.16 12.77  1.88
#>     2  3.26  7.73 15.46 19.89  5.29

Visualizations

Four plot functions are currently available: plot_bar_v, plot_bar_h, plot_groupbar_v, plot_groupbar_h. As the respective names imply, the first two functions are used to visualise one variable, while the last two allow visualisation with a grouping variable.

In their simplest forms, plot_bar_v and plot_bar_h plot the relative frequency of the item in a vertical and horizontal plot, respectively. The values of the items are not yet labelled and the proportion “Don’t know” responses must also be specified.

plot_bar_v(w5, item = w5_q11, textsize = 11)

There is a corresponding argument in the function for the share of “Don’t know” responses (dk_share). For the labels, you can use the underlying ggplot2 structure of the functions by simply adding further layers (in this case, scale_x_discrete to add labels). NB: When using the horizontal bar plots (with one or two variables), you have to reverse the order of your manual labels to display them correctly!! With further optional arguments, the bars’ colour or the relative frequencies’ position can be changed, among other things.

plot_bar_h(w5, item = w5_q11) +
    scale_x_discrete(labels = rev(c("Item One", "Item Two", "Item Three", "Item Four", "Item Five")))

Before moving on to the visualisations of grouped variables, it’s worth highlighting the helper function `sep_palette”. This function allows users to choose from a range of official ETH colour palettes. Available are the following palettes:

"Standard", "Darkblues", "Greens", "Lightblues", "Lightgreens", "Pinks"

You can select the palette of your choice by either saving it as an object or directly calling it in the plot function.

plot_groupbar_v(w5, item = w5_q11, by = w5_q2, labels = c("Female", "Male"), values = sep_palette("Standard")) +
 scale_x_discrete(labels = c("Item One", "Item Two", "Item Three", "Item Four", "Item Five"))

Again, note that with the horizontal plots, you have to change reverse the order of the manual labels to display them correctly!

plot_groupbar_h(w5, item = w5_q11, by = w5_q2, labels = c("Female", "Male"), values = sep_palette("Darkblues")) +
 scale_x_discrete(labels = rev(c("Item One", "Item Two", "Item Three", "Item Four", "Item Five")))