{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.6.1" }, "colab": { "name": "stats306_review_empty.ipynb", "provenance": [], "collapsed_sections": [] } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "-OFp-J4uI_2o" }, "source": [ "# Lab 5: Midterm Review! (`gapminder` dataset)" ] }, { "cell_type": "code", "metadata": { "scrolled": true, "id": "xTo6sij2I_2v", "outputId": "50b61a07-9f6b-4df5-e1a6-a06340b4a23d" }, "source": [ "install.packages('dslabs') # install this package\n", "library(dslabs)\n", "library(tidyverse)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──\n", "✔ ggplot2 3.2.1 ✔ purrr 0.3.3\n", "✔ tibble 2.1.3 ✔ dplyr 0.8.4\n", "✔ tidyr 1.0.2 ✔ stringr 1.4.0\n", "✔ readr 1.3.1 ✔ forcats 0.4.0\n", "── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──\n", "✖ dplyr::filter() masks stats::filter()\n", "✖ dplyr::lag() masks stats::lag()\n" ], "name": "stderr" } ] }, { "cell_type": "code", "metadata": { "scrolled": true, "id": "gASqOzYPI_2w", "outputId": "771dcfd1-0ede-4889-d2f8-b6a6bfb41f89" }, "source": [ "gapminder %>% glimpse # compact column-by-column preview, similar to str()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Observations: 10,545\n", "Variables: 9\n", "$ country Albania, Algeria, Angola, Antigua and Barbuda, Argen…\n", "$ year 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960…\n", "$ infant_mortality 115.40, 148.20, 208.00, NA, 59.87, NA, NA, 20.30, 37…\n", "$ life_expectancy 62.87, 47.50, 35.98, 62.97, 65.39, 66.86, 65.66, 70.…\n", "$ fertility 6.19, 7.65, 7.32, 4.43, 3.11, 4.55, 4.82, 3.45, 2.70…\n", "$ population 1636054, 11124892, 5270844, 54681, 20619075, 1867396…\n", "$ gdp NA, 13828152297, NA, NA, 108322326649, NA, NA, 96677…\n", "$ continent Europe, Africa, Africa, Americas, Americas, Asia, Am…\n", "$ region Southern Europe, Northern Africa, Middle Africa, Car…\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "NWYWBEt2I_2x" }, "source": [ "### 1. Filter out all rows where `gdp` is `NA` and assign the filtered dataset to a variable called `df`." ] }, { "cell_type": "code", "metadata": { "id": "xPJe8A4EI_2y", "outputId": "d26bb5c1-2bd9-42b8-e89f-48425452324c" }, "source": [ "" ], "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/html": [ "<table>\n", "<thead><tr><th scope=col>country</th><th scope=col>year</th><th scope=col>infant_mortality</th><th scope=col>life_expectancy</th><th scope=col>fertility</th><th scope=col>population</th><th scope=col>gdp</th><th scope=col>continent</th><th scope=col>region</th></tr></thead>\n", "<tbody>\n", "\t<tr><td>Algeria</td><td>1960</td><td>148.20</td><td>47.50</td><td>7.65</td><td>11124892</td><td>13828152297</td><td>Africa</td><td>Northern Africa</td></tr>\n", "\t<tr><td>Argentina</td><td>1960</td><td>59.87</td><td>65.39</td><td>3.11</td><td>20619075</td><td>108322326649</td><td>Americas</td><td>South America</td></tr>\n", "\t<tr><td>Australia</td><td>1960</td><td>20.30</td><td>70.87</td><td>3.45</td><td>10292328</td><td>96677859364</td><td>Oceania</td><td>Australia and New Zealand</td></tr>\n", "\t<tr><td>Austria</td><td>1960</td><td>37.30</td><td>68.75</td><td>2.70</td><td>7065525</td><td>52392699681</td><td>Europe</td><td>Western Europe</td></tr>\n", "\t<tr><td>Bahamas</td><td>1960</td><td>51.00</td><td>62.00</td><td>4.50</td><td>109526</td><td>1306269490</td><td>Americas</td><td>Caribbean</td></tr>\n", "\t<tr><td>Bangladesh</td><td>1960</td><td>176.30</td><td>46.20</td><td>6.73</td><td>48200702</td><td>12767231590</td><td>Asia</td><td>Southern Asia</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": "\\begin{tabular}{r|lllllllll}\n country & year & infant\\_mortality & life\\_expectancy & fertility & population & gdp & continent & region\\\\\n\\hline\n\t Algeria & 1960 & 148.20 & 47.50 & 7.65 & 11124892 & 13828152297 & Africa & Northern Africa \\\\\n\t Argentina & 1960 & 59.87 & 65.39 & 3.11 & 20619075 & 108322326649 & Americas & South America \\\\\n\t Australia & 1960 & 20.30 & 70.87 & 3.45 & 10292328 & 96677859364 & Oceania & Australia and New Zealand\\\\\n\t Austria & 1960 & 37.30 & 68.75 & 2.70 & 7065525 & 52392699681 & Europe & Western Europe \\\\\n\t Bahamas & 1960 & 51.00 & 62.00 & 4.50 & 109526 & 1306269490 & Americas & Caribbean \\\\\n\t Bangladesh & 1960 & 176.30 & 46.20 & 6.73 & 48200702 & 12767231590 & Asia & Southern Asia \\\\\n\\end{tabular}\n", "text/markdown": "\n| country | year | infant_mortality | life_expectancy | fertility | population | gdp | continent | region |\n|---|---|---|---|---|---|---|---|---|\n| Algeria | 1960 | 148.20 | 47.50 | 7.65 | 11124892 | 13828152297 | Africa | Northern Africa |\n| Argentina | 1960 | 59.87 | 65.39 | 3.11 | 20619075 | 108322326649 | Americas | South America |\n| Australia | 1960 | 20.30 | 70.87 | 3.45 | 10292328 | 96677859364 | Oceania | Australia and New Zealand |\n| Austria | 1960 | 37.30 | 68.75 | 2.70 | 7065525 | 52392699681 | Europe | Western Europe |\n| Bahamas | 1960 | 51.00 | 62.00 | 4.50 | 109526 | 1306269490 | Americas | Caribbean |\n| Bangladesh | 1960 | 176.30 | 46.20 | 6.73 | 48200702 | 12767231590 | Asia | Southern Asia |\n\n", "text/plain": [ " country year infant_mortality life_expectancy fertility population\n", "1 Algeria 1960 148.20 47.50 7.65 11124892 \n", "2 Argentina 1960 59.87 65.39 3.11 20619075 \n", "3 Australia 1960 20.30 70.87 3.45 10292328 \n", "4 Austria 1960 37.30 68.75 2.70 7065525 \n", "5 Bahamas 1960 51.00 62.00 4.50 109526 \n", "6 Bangladesh 1960 176.30 46.20 6.73 48200702 \n", " gdp continent region \n", "1 13828152297 Africa Northern Africa \n", "2 108322326649 Americas South America \n", "3 96677859364 Oceania Australia and New Zealand\n", "4 52392699681 Europe Western Europe \n", "5 1306269490 Americas Caribbean \n", "6 12767231590 Asia Southern Asia " ] }, "metadata": { "tags": [] } } ] }, { "cell_type": "markdown", "metadata": { "scrolled": true, "id": "Bzw36rUtI_20" }, "source": [ "### 2. `gdp` is in dollars. Convert it to billions of dollars and save the result in the same dataset, `df`." ] }, { "cell_type": "code", "metadata": { "id": "KGBgV-tZI_21" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Xr3RxZ0KI_22" }, "source": [ "### 3. Filter `df` to keep only the rows for the year 2011 and save the result to `df2011`." ] }, { "cell_type": "code", "metadata": { "id": "Qw9d-FjwI_22" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "scrolled": false, "id": "WdK17QkLI_23" }, "source": [ "### 4. Which countries have data in both 1960 and 2011?" ] }, { "cell_type": "code", "metadata": { "id": "kXQiotf1I_24" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "YMru_1viI_24" }, "source": [ "### 5. Using `df2011`, create a box plot of `gdp` for each region. Make sure the region names are legible, and use a `log10` scale for `gdp`!" ] }, { "cell_type": "code", "metadata": { "id": "j_cEePqZI_25" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "sTQ1L8s9I_25" }, "source": [ "### 6. Using `df2011`, find the maximum and minimum `gdp` in each region." ] }, { "cell_type": "code", "metadata": { "id": "Wn0EV03uI_26" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "U7tkk1phI_27" }, "source": [ "### 7. Using `df2011`, find the average world gdp using two kinds of average: the median and the mean. Then, for each region, calculate the percentage of countries whose gdp is greater than the world average, using both the median and the mean." ] }, { "cell_type": "code", "metadata": { "id": "lv0pa65LI_28" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "S2UhgoKrI_2-" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "RDgL4qEhI_2_" }, "source": [ "### 8. Using `df2011`, find the gdp per capita, which is simply $\\text{GDP}_{\\text{per capita}} = \\frac{\\text{GDP}}{\\text{Population}}$. Choose an appropriate plot to show its relationship with `gdp`, conditioned on `continent`. Remember that `gdp` in `df2011` is in billions of dollars, so multiply it by $10^9$ when computing gdp per capita to get the result back in dollars. When graphing, use a log10 scale for `gdp`, as before!" ] }, { "cell_type": "code", "metadata": { "scrolled": false, "id": "wqA7MORNI_3A" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "XQuRLBQZI_3A" }, "source": [ "### 9. Among the countries with $gdp > 1000$ billion, which one has the largest gdp per capita?" ] }, { "cell_type": "code", "metadata": { "id": "NunLnn-jI_3C" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "ldUTYVSkI_3F" }, "source": [ "### 10. Which country has the life expectancy closest to that of the US? Is it a single country, or several countries?" ] }, { "cell_type": "code", "metadata": { "id": "vI--Ub7yI_3F" }, "source": [ "" ], "execution_count": null, "outputs": [] } ] }