graph.jpg

Quantitative Analysis

The ability to quantify relationships is an extremely useful way to test political science theories and arguments. Simple analysis and visualization of data can be easy, but requires understanding an underlying grammar based on datasets, variables, and some basic mathematical functions. This page overviews that underlying grammar using two tools for quantitative analysis - Microsoft Excel and the R software environment.

 
 
 

These two tutorials teach the basic skills that you will need to analyze and visualize the data using either Excel (tutorial 1) or R (tutorial 2).

 

Each tutorial will cover:

1.     How to view and sort your data.

2.     How to manipulate your data to create new variables.

3.     How to create basic summaries of your variables.

4.     How to visualize your data.

 
 
 

Both tutorials use the running example of studying the relationship between regime type, freedom of the speech, and economic growth across countries of the world.

  • Regime types means the system of government in a country: democracy, autocracy, monarchy, etc. 

The tutorial will review data analysis to answer two questions:

(1) Do democracies maintain greater freedom of speech than non-democracies?

(2) Does the combination of freedom of speech and democracy lead to greater economic growth? 

Answering these questions requires data on regime type, freedom of speech, and economic growth by country.  These variables can be found in a dataset called the Quality of Government Database from the Quality of Government Institute. 

In each example, the basic question we will try to answer is whether democracy, freedom of speech, and economic output are related. We will recode variables to put them into shape for analysis, then use functions to look for average levels of freedom of speech across democracies and non-democracies, and then average economic output, for democracies and non-democracies, by differing levels of freedom of speech.  After finding the averages, we will visualize those averages using barcharts. 

To begin with either tutorial, use the links below download the data and codebook and save them to a location on your computer. If you do not already have this software on your computer, you will want to download and install Excel, R, and/or RStudio as well.

Click to download!

Click to download!

 

Each tutorial will examine four variables:

1.     cname

2.     ciri_speech,

3.     unna_gdppc

4.     ht_regtype1

Find those in the codebook and familiarize yourself with what the variable represents about each country and the values it can take on. 

To download the final plots generated in the Excel tutorial, click here. To download the script used in the R tutorial, click here.   

 

Tutorial 1: Quantitative Analysis in Excel

Viewing and Sorting your Data

Begin this video after opening your data in Excel. 

Commands covered in this video:

1.     Control F or Find & Select to find variables

2.     Control C or Clipboard > Copy to copy columns

3.     Control V or Clipboard > Paste to paste columns in new worksheet

Removing Missing Data & Creating New Variables

Commands covered in this video:

1.     Remove rows with missing values: after selecting columns from which to delete missing rows, Editing > Find & Select > Go to Special > Blanks then Cells > Delete > Delete Sheet Rows

2.     Recode a variable: =IF(E2=100,1,0) to create new variable setting E2 to 1 if E2 is 100, setting E2 to 0 otherwise.  This can be copied down to the entire column to change all cells in that column.

Finding Conditional Averages of Variables

Commands covered in this video:

1.     Find a conditional average (the average of a variable conditional on the value of another variable): =AVERAGEIF(E:E,1,B:B) to find average value of column B if column E is 1. =AVERAGEIF(E:E,0,B:B)  to find average value of column B if column E is 0.

2.     Create a pivot table (a table averaging a variable conditional on two or more variables): Select all columns and Insert > Pivot Table. To set up table, use “Pivot Table Fields”: 

a.     Drag independent variables to “Columns” and “Rows”. 

b.     Drag dependent variable to Values

c.     Values > Vale Field Settings to change summary of dependent variable, for example from sum to average. 

Visualizing your Data

Commands covered in this video:

1.     To turn a pivot table into a bar chart: select data in table and Insert > Charts > Bar Chart

Tutorial 2: Quantitative Analysis in R

Preparing your session, loading and viewing your data

Commands covered in this video:

1.     To install needed packages: install.packages(‘Tidyverse’), Tidyverse is the name of the package

2.     loan packages: library(tidyverse), tidyverse is the name of the package to be loaded

3.     load data set: MyData <- read_csv("qog_std_cs_jan18.csv"), the data "qog_std_cs_jan18.csv" from my computer is assigned to the object MyData

Removing missing data and creating new variables

Commands covered in this video:

#select certain variables from MyData
#add a new variable called democracy
#remove missing values,
##assign this all to a new object called Usedata

UseData = MyData %>%
   select(cname,ciri_speech,unna_gdppc,ht_regtype1) %>%
   mutate(democracy = ifelse(ht_regtype1==100,1,0)) %>%                                   filter(is.na(ciri_speech)==F,is.na(unna_gdppc)==F,is.na(ht_regtype1)==F)

Finding conditional averages of variables

Commands covered in this video:

1.     #find the average freedom of speech in democracies and non-democracies

average.speech.table = UseData %>%

 group_by(democracy) %>%

 summarize(average.speech = mean(ciri_speech)),

 

2.     #find the average gdp in democracies and non-democracies, by level of freedom of speech

average.econ.table = UseData %>%

  group_by(democracy,ciri_speech) %>%

  summarize(average.gdp = mean(unna_gdppc)),

Visualizing your Data

Commands covered in this video:

#make a plot of the average gdp in democracies and non-democracies, by level of freedom of speech and add descriptive axis labels

 

ggplot(data = average.econ.table, aes(y = average.gdp, x = as.factor(democracy), fill = as.factor(ciri_speech))) +

  geom_bar(stat = "identity", position = position_dodge()) +

  labs(fill = "Freedom of Speech",

       x = "Democracy",

       y = "Average GDP"),