Covid Travel Restrictions
Tableau, Python, Excel
To say that the Covid-19 pandemic, one of the biggest life-changing events of our time, presented an opportunity for data analysts and scientists everywhere to flex their analysis and presentation skills is a bit of an understatement. Almost every data analyst I know has worked on some Covid-related project, and I for sure wasn't about to be left out of that group. So for the final project in my Data Visualization class, I decided to combine my love of travel with my knowledge of Python and Tableau to create an easy-to-update visualization that shows users which countries they can travel to as foreign citizens.
The dataset used in this project was downloaded from: https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md
To the best of my understanding, this was a research project by over 200 Oxford faculty and students that tracked a variety of Covid restrictions and how they affected infection and death rates in each country. But what I was looking for was just the foreign travel restrictions.
I then imported the data into Python to clean and structure it.
The dataset was cleaned to reduce it down to just the four variables that were needed: ["CountryName","CountryCode","Date","C8_International travel controls"].
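The original code is not reproduced in this write-up, so here is a minimal sketch of what that column selection might look like in pandas. The sample rows and the `SomeOtherIndicator` column are made up for illustration, and `select_travel_columns` is a hypothetical helper name. Only the four column names come from the project.

```python
import io

import pandas as pd

# The four columns named in the write-up; everything else is dropped.
KEEP_COLUMNS = [
    "CountryName",
    "CountryCode",
    "Date",
    "C8_International travel controls",
]


def select_travel_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the raw data restricted to the four needed columns."""
    return raw[KEEP_COLUMNS].copy()


# Tiny made-up sample standing in for the downloaded CSV (illustrative values only).
sample_csv = io.StringIO(
    "CountryName,CountryCode,Date,SomeOtherIndicator,C8_International travel controls\n"
    "Aruba,ABW,20200101,0,0\n"
    "Aruba,ABW,20200102,0,3\n"
    "France,FRA,20200101,0,1\n"
)
raw = pd.read_csv(sample_csv)
travel = select_travel_columns(raw)
print(travel)
```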
Here is a snippet of what the dataframe looked like after this step:
After that, I created a new dataframe to hold only the unique country codes and the most recent International Travel Control ranking. This is also where the most difficult part of the project presented itself: the most recent Travel Control ranking was not on the same date for every country. Certain countries had values that were up to date as of the current day, while others hadn't been updated in a few weeks. Because the dataset had a Travel Control value for every day the Oxford team could gather data for, some countries had many more entries than others, so the position of the most recent value was different for each country. This is why I didn't use the date to find the most recent value. It is definitely possible, but not easily doable. You can see my solution for this in the comments of my code down below.
Now all that's left is to combine the country name and travel restriction in one dataframe.
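Since the original code is not included here, the following is only a sketch of one way to pick each country's most recent available score without comparing dates. It assumes the rows are already ordered oldest to newest within each country, and that days without data are stored as missing values. The function name and sample data are hypothetical.

```python
import pandas as pd

COL = "C8_International travel controls"


def latest_restriction(travel: pd.DataFrame) -> pd.DataFrame:
    """One row per country code holding its last non-missing score.

    Relies on row order (oldest to newest within each country),
    not on comparing the Date column.
    """
    scored = travel.dropna(subset=[COL])  # drop days with no recorded score
    latest = scored.groupby("CountryCode", sort=False)[COL].last()
    return latest.reset_index()


# Made-up sample: Aruba's newest row has no score yet, France's does.
travel = pd.DataFrame(
    {
        "CountryName": ["Aruba"] * 3 + ["France"] * 2,
        "CountryCode": ["ABW"] * 3 + ["FRA"] * 2,
        "Date": [20200101, 20200102, 20200103, 20200101, 20200102],
        COL: [0, 3, None, 1, 2],
    }
)
latest = latest_restriction(travel)
print(latest)
```

Dropping the missing rows first means "last row per country" and "most recent known score" become the same thing, whatever date that row falls on.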
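As a rough illustration (not the original code), the combination could be done with a merge on the country code. The helper name, sample values, and the output file name in the comment are assumptions.

```python
import pandas as pd

COL = "C8_International travel controls"


def attach_country_names(travel: pd.DataFrame, latest: pd.DataFrame) -> pd.DataFrame:
    """Join each country's latest score back to its display name."""
    names = travel[["CountryCode", "CountryName"]].drop_duplicates("CountryCode")
    combined = names.merge(latest, on="CountryCode", how="inner")
    return combined[["CountryName", "CountryCode", COL]]


# Made-up inputs: the daily table and a one-row-per-country table of latest scores.
travel = pd.DataFrame(
    {
        "CountryName": ["Aruba", "Aruba", "France"],
        "CountryCode": ["ABW", "ABW", "FRA"],
        COL: [0, 3, 2],
    }
)
latest = pd.DataFrame({"CountryCode": ["ABW", "FRA"], COL: [3, 2]})

combined = attach_country_names(travel, latest)
print(combined)

# Exporting for Tableau could then be a single call (needs an Excel writer
# such as openpyxl installed; the file name is a placeholder):
# combined.to_excel("travel_restrictions.xlsx", index=False)
```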
This ends up looking something like this (which I can easily export to Excel to start the Tableau portion of the project):
The visualizations for this project consisted of two choropleth plots of the entire world and one horizontal bar chart. In the first choropleth plot, each country is colored from grey to orange based on a 0 – 4 travel restriction scale. Travelers can use this plot to get up-to-date information on countries they plan to visit.
The second choropleth plot displays each country's mean travel restriction score (0 – 4) over time, colored from very light blue/purple to very dark blue. This plot is better suited to people who want to see how a country has applied restrictions on average (which may help them predict future travel restriction levels).
A horizontal bar chart with the average score per country was also included for easier direct comparison of countries' overall scores. This chart was arranged from lowest to highest score so that countries with similar scores can be easily grouped together and analyzed. Apologies for the low quality of the graph; I no longer have access to the Tableau file to get a clearer picture.
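The averages themselves could be computed in Tableau or beforehand in Python. As a hedged sketch of the Python route, assuming the daily dataframe from the cleaning step (the `MeanScore` column name, helper name, and sample values are made up):

```python
import pandas as pd

COL = "C8_International travel controls"


def mean_restriction(travel: pd.DataFrame) -> pd.DataFrame:
    """Mean score per country, sorted lowest to highest like the bar chart."""
    return (
        travel.groupby("CountryName")[COL]
        .mean()  # missing days are skipped by default
        .sort_values()
        .reset_index(name="MeanScore")
    )


# Made-up daily scores for two countries.
travel = pd.DataFrame(
    {
        "CountryName": ["Aruba", "Aruba", "France", "France"],
        COL: [0, 3, 1, 1],
    }
)
means = mean_restriction(travel)
print(means)
```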
The travel restriction point scale goes as follows:
0 - no restrictions
1 - screening arrivals
2 - quarantine arrivals from some or all regions
3 - ban arrivals from some regions
4 - ban on all regions or total border closure
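If readable labels are wanted in a legend or tooltip, the scale above could be kept as a small lookup table. This is just an illustrative sketch. `SCALE_LABELS` and `label_scores` are hypothetical names, while the label text is copied from the list above.

```python
import pandas as pd

# Label text copied from the scale above.
SCALE_LABELS = {
    0: "no restrictions",
    1: "screening arrivals",
    2: "quarantine arrivals from some or all regions",
    3: "ban arrivals from some regions",
    4: "ban on all regions or total border closure",
}


def label_scores(scores: pd.Series) -> pd.Series:
    """Map numeric scores to their descriptions; unknown values become None."""
    return scores.map(SCALE_LABELS.get)


labels = label_scores(pd.Series([0, 3.0, 4]))
print(labels.tolist())
```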
In the future it would be interesting to try to automate this process. Right now the data must be manually downloaded and run through the Python commands each time you would like to load a new day's data, and then the result must be uploaded to Tableau. If I used a headless Chrome browser with packages like BS4 and Selenium and hosted this online, I could automatically pull the data from the GitHub link once a day. A script could then run these commands and send the filtered data to a Tableau cloud server that automatically updates the visualizations.
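As a rough sketch of what the scripted middle step might look like, the function below reads a CSV, keeps each country's latest available score, and writes the result out. It assumes the data is reachable as a plain CSV. `pd.read_csv` accepts a URL as well as a file path, so a headless browser may not be needed for the download itself. The function name, the in-memory sample, and the choice of CSV output are all assumptions. Scheduling and the upload to Tableau are left out.

```python
import io

import pandas as pd

COL = "C8_International travel controls"
NEEDED = ["CountryName", "CountryCode", "Date", COL]


def refresh(source, destination) -> pd.DataFrame:
    """Read the raw CSV, keep each country's latest known score, write it out.

    `source` may be a local path, a URL, or a file-like object;
    `destination` may be a path or a writable buffer.
    Assumes rows are ordered oldest to newest within each country.
    """
    raw = pd.read_csv(source, usecols=NEEDED)
    scored = raw.dropna(subset=[COL])
    latest = (
        scored.groupby(["CountryCode", "CountryName"], sort=False)[COL]
        .last()
        .reset_index()
    )
    latest.to_csv(destination, index=False)
    return latest


# Demo with made-up in-memory data instead of a real download.
sample = io.StringIO(
    "CountryName,CountryCode,Date,C8_International travel controls\n"
    "Aruba,ABW,20200101,0\n"
    "Aruba,ABW,20200102,3\n"
    "Aruba,ABW,20200103,\n"
    "France,FRA,20200101,1\n"
)
out = io.StringIO()
result = refresh(sample, out)
print(out.getvalue())
```

A daily scheduler (cron, for example) could call `refresh` with the real data location and a file path that Tableau reads from.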
This visualization could also become obsolete if the Oxford researchers gathering the data decided to stop the research, which eventually happened. Overall, the hardest part of the project was shaping the data into one row per country with its most recent score: many countries had no score for the latest date, so the most recent available score had to be found instead. This was an interesting project that tested my ability to manipulate data and then visualize it in a useful way, and even better, I got to check off "Covid-19 Visualization" from my Data Analyst bucket list!