National Data Platform (UDAL)

NHS England Datastore

SQL
R
NHS England
Author

Simon Wellesley-Miller

Published

February 5, 2023

Guide on how to query data in UDAL using R. A video tutorial is available on the NHSE-R FutureNHS site.

Query UDAL in R

First load in appropriate libraries; DBI is the main one for the connection to UDAL, and tidyverse allows you to to use dbpylr verbs.

library (DBI)
library (odbc)
library (tidyverse)
library (dbplyr)
source ('personal_credentials.R')
Important!

It is good practice not to publish connection strings! See R-Stuido’s guide on managing credentials.

One option is to put your credentials into a separate file (not under version control) as follows:

# personal_credentials.R
serv  <-"<database-server-address>"
db  <- "<database-name>"
user  <- "<udal-account-username>"

See the NHSE-R FutureNHS site for specifics on UDAL credentials.

Open Connection

This is the main connection command:

con_udal <- dbConnect(
  drv = odbc(),
  driver = "ODBC Driver 17 for SQL Server",
  server = serv,
  database = db,
  UID = user,
  authentication = "ActiveDirectoryInteractive"
)
Note

When this is run a separate window will open where you will need to enter two part authentication

Once that has been run you will see a connection appear in the connections tab - top right

Running SQL Query

You can parse a standard sql query using a dbGetQuery() command, it requires the connection string (that you have already specified above) and the sql code.

  • Each query can return one dataframe of data, if you have multiple queries use separate queries (there is a way around this but it is quite complex)
  • The dbGetQuery() does not like temporary tables (they are bad practice anyway), but it will accept common table expressions (CTEs) or subqueries
df_one <- dbGetQuery(
  conn = con_udal,
  statement = "

  SELECT  appointment_date,
          appointment_status,
          HCP_Type,
          Appointment_Mode,
          Count_Of_Appointments

  FROM  [UKHF_Appts_In_General_Practice].[Appts_GP_Daily1]

  WHERE Appointment_Date >= '2022-11-25'

  "
  )

Alternatively we can keep our sql separate and read in a script:

df_two <- dbGetQuery(con_udal, 
                     read_file("test_query.sql"))

Where test_query.sql is given as:

  SELECT  appointment_date,
          appointment_status,
          HCP_Type,
          Appointment_Mode,
          Count_Of_Appointments

  FROM  [UKHF_Appts_In_General_Practice].[Appts_GP_Daily1]

  WHERE Appointment_Date >= '2022-11-25'

Using dbpylr

Finally we can skip sql entirely and query using dplyr like verbs. First we make a connection to the table we want - note that this does not actually create a table - more of a virtual one:

dbpylr_table <- tbl(con_udal, 
                    in_schema("UKHF_Appts_In_General_Practice", 
                              "Appts_GP_Daily1"))

However we can then use our virtual table and query it using dplyr verbs.

Ensure to add collect at the end to convert the final results into an actual dataframe:

df_three <- dbpylr_table %>%
  select (Appointment_Date,
          Appointment_Status,
          HCP_Type,
          Appointment_Mode,
          Count_Of_Appointments) %>%
  filter (Appointment_Date >= '2022-11-25') %>%
  collect()
Important!

When you are experimenting with R code, do not use collect(). Only when you have finalised the code for the information being extracted from the database, use collect() to read the complete output into the R session.

Close Connection

When you have finished with the connection it is good practice to close it:

dbDisconnect(con_udal)

ta da!