Newer
Older
DataReport / family-scope / geo-scope.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Investigating the coverage of the trade datasets family\n",
    "## Setup SPARQL API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from SPARQLWrapper import SPARQLWrapper2\n",
    "import pandas as pd\n",
    "from IPython.display import HTML\n",
    "\n",
    "endpoint = \"https://production-drafter-ons-alpha.publishmydata.com/v1/sparql/live\"\n",
    "sparql = SPARQLWrapper2(endpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retreive Data Structure Definitions\n",
    "This retreives the codelist(s) for each dimension, for each dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('./dataset-structure-geo.sparql', 'r') as query_file:\n",
    "    query_str = query_file.read()\n",
    "\n",
    "sparql.setQuery(query_str)\n",
    "\n",
    "results = pd.DataFrame(sparql.query().bindings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create URI presenters\n",
    "We can create presenters that will provide, for each URI, an html link to the resource (i.e. the URI itself) with the `rdfs:label` used as the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def uri_to_label(uri_column,label_column):\n",
    "    return(dict(zip(results[uri_column].map(lambda x: x.value),\n",
    "                    results[label_column].map(lambda x: x.value))))\n",
    "\n",
    "labeller = {**uri_to_label('dataset_uri','dataset'),\n",
    "            **uri_to_label('dimension_uri','dimension'),\n",
    "            **uri_to_label('codelist_view_uri','codelist')}\n",
    "\n",
    "def presenter(uri):\n",
    "    return(f'<a href=\"{uri}\">{labeller[uri]}</a> ')\n",
    "\n",
    "def lister(presenters):\n",
    "    return(presenters.str.join(''))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Present Codelists in a matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Prepare a matrix of datasets vs dimensions, with each cell summarising the codelists being used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "codelists = results[['dataset_uri','dimension_uri','codelist_view_uri']].applymap(lambda x: presenter(x.value))\n",
    "codelists_pivot = codelists.pivot_table(index='dataset_uri',columns='dimension_uri',values='codelist_view_uri',aggfunc=lister)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sort the matrix such that the most commonly re-used dimensions are to the left hand side."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>dimension_uri</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/trade-partner-geography\"&gt;Trade Partner Geography&lt;/a&gt;</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/citizenship\"&gt;Citizenship&lt;/a&gt;</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/residence\"&gt;Residence&lt;/a&gt;</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/trade-reporter-geography\"&gt;Trade Reporter Geography&lt;/a&gt;</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/ons-fdi-area\"&gt;ONS FDI Area&lt;/a&gt;</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/nationality\"&gt;Nationality&lt;/a&gt;</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/foreign-geography\"&gt;Foreign geography&lt;/a&gt;</th>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/def/dimension/country-of-ownership\"&gt;Country of Ownership&lt;/a&gt;</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dataset_uri</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/dwp-nin-registrations-to-overseas-nationals\"&gt;National Insurance number allocations to adult overseas nationals to March 2018&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/nationality\"&gt;Nationality&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/ho-entry-visas\"&gt;Immigration statistics, October to December 2017: data tables second edition&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ho-citizenship\"&gt;HO Citizenship&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/ons-ltim-citizenship\"&gt;Long-term international migration 2.01a, citizenship, UK and England and Wales&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ltim-citizenship\"&gt;LTIM Citizenship&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/ons-ltim-country-of-residence\"&gt;Long-term international migration 2.02, country of last or next residence, UK and England and Wales&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/country-of-residence\"&gt;Country of Residence&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/ons-ltim-passenger-survey-4-01\"&gt;International Passenger Survey 4.01, citizenship group by sex by age by country of last or next residence&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ips-citizenship\"&gt;IPS Citizenship&lt;/a&gt;</td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/country-of-residence\"&gt;Country of Residence&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/ons-ltim-passenger-survey-4-02\"&gt;International Passenger Survey 4.02, main reason for migration by citizenship&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ips-citizenship\"&gt;IPS Citizenship&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/ons-ltim-passenger-survey-4-04\"&gt;International Passenger Survey 4.04, area of destination or origin within the UK by citizenship&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ips-citizenship\"&gt;IPS Citizenship&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/migration/scotland-overseas\"&gt;Migration between Scotland and Overseas&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/foreign-geography\"&gt;Foreign geography&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/hmrc_rts\"&gt;UK Regional Trade Statistics (RTS&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/hmrc-geographies\"&gt;HMRC Geographies&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/hmrc-regions\"&gt;HMRC Regions&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/ons-fdi\"&gt;Foreign direct investment involving UK companies: outward&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ons-fdi-area\"&gt;ONS FDI Area&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/ons-international-trade-in-services\"&gt;International trade in services&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ons-trade-areas-itis\"&gt;ONS Trade Areas ITIS&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/ons-trade-in-goods\"&gt;Trade in goods: country-by-commodity imports and exports&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ons-trade-areas-cord\"&gt;ONS Trade Areas CORD&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/ons-uk-trade-in-services\"&gt;UK trade in services: service type by partner country, non-seasonally adjusted&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ons-trade-areas-cord\"&gt;ONS Trade Areas CORD&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/ons_abs\"&gt;Annual Business Survey importers and exporters&lt;/a&gt;</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/countries-of-ownership\"&gt;Countries of Ownership&lt;/a&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/ons_cpa\"&gt;UK trade in goods by classification of product by activity time series&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ons-trade-areas-cord\"&gt;ONS Trade Areas CORD&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/gss_data/trade/ons_mrets\"&gt;UK trade time series&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ons-trade-areas-cord\"&gt;ONS Trade Areas CORD&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/hmrc-overseas-trade-statistics\"&gt;HMRC Overseas Trade Statistics&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/hmrc-geographies\"&gt;HMRC Geographies&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;a href=\"http://gss-data.org.uk/data/ons-bop-individual-country-data\"&gt;ONS BoP Individual Country Data&lt;/a&gt;</th>\n",
       "      <td>&lt;a href=\"http://gss-data.org.uk/concept-scheme?uri=http://gss-data.org.uk/def/concept-scheme/ons-trade-areas-cord\"&gt;ONS Trade Areas CORD&lt;/a&gt;</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dimension_counts = codelists_pivot.count('rows').sort_values(ascending=False)\n",
    "codelists_pivot = codelists_pivot.reindex(columns=dimension_counts.index).fillna('')\n",
    "pd.set_option('max_colwidth', -1) # don't truncate cells\n",
    "with open(\"dataset-dimensions-geo.html\", \"w\") as f:\n",
    "    f.write(codelists_pivot.to_html(escape=False))\n",
    "HTML(codelists_pivot.to_html(escape=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that although multiple datasets re-use the same common dimensions, they often use their own codelists. We know that there exist correspondances between these (e.g. BPM6 codes can be used to relate BOP and PB definitions) although we've not published the links for those yet."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}