National Data Platform (UDAL)
NHS England Datastore
Guide on how to query data in UDAL using R. A video tutorial is available on the NHSE-R FutureNHS site.
Query UDAL in R
First load in appropriate libraries; DBI is the main one for the connection to UDAL, and tidyverse allows you to to use dbpylr verbs.
library (DBI)
library (odbc)
library (tidyverse)
library (dbplyr)
source ('personal_credentials.R')It is good practice not to publish connection strings! See R-Stuido’s guide on managing credentials.
One option is to put your credentials into a separate file (not under version control) as follows:
# personal_credentials.R
serv <-"<database-server-address>"
db <- "<database-name>"
user <- "<udal-account-username>"See the NHSE-R FutureNHS site for specifics on UDAL credentials.
Open Connection
This is the main connection command:
con_udal <- dbConnect(
drv = odbc(),
driver = "ODBC Driver 17 for SQL Server",
server = serv,
database = db,
UID = user,
authentication = "ActiveDirectoryInteractive"
)When this is run a separate window will open where you will need to enter two part authentication
Once that has been run you will see a connection appear in the connections tab - top right
Running SQL Query
You can parse a standard sql query using a dbGetQuery() command, it requires the connection string (that you have already specified above) and the sql code.
- Each query can return one dataframe of data, if you have multiple queries use separate queries (there is a way around this but it is quite complex)
- The
dbGetQuery()does not like temporary tables (they are bad practice anyway), but it will accept common table expressions (CTEs) or subqueries
df_one <- dbGetQuery(
conn = con_udal,
statement = "
SELECT appointment_date,
appointment_status,
HCP_Type,
Appointment_Mode,
Count_Of_Appointments
FROM [UKHF_Appts_In_General_Practice].[Appts_GP_Daily1]
WHERE Appointment_Date >= '2022-11-25'
"
)Alternatively we can keep our sql separate and read in a script:
df_two <- dbGetQuery(con_udal,
read_file("test_query.sql"))Where test_query.sql is given as:
SELECT appointment_date,
appointment_status,
HCP_Type,
Appointment_Mode,
Count_Of_Appointments
FROM [UKHF_Appts_In_General_Practice].[Appts_GP_Daily1]
WHERE Appointment_Date >= '2022-11-25'Using dbpylr
Finally we can skip sql entirely and query using dplyr like verbs. First we make a connection to the table we want - note that this does not actually create a table - more of a virtual one:
dbpylr_table <- tbl(con_udal,
in_schema("UKHF_Appts_In_General_Practice",
"Appts_GP_Daily1"))However we can then use our virtual table and query it using dplyr verbs.
Ensure to add collect at the end to convert the final results into an actual dataframe:
df_three <- dbpylr_table %>%
select (Appointment_Date,
Appointment_Status,
HCP_Type,
Appointment_Mode,
Count_Of_Appointments) %>%
filter (Appointment_Date >= '2022-11-25') %>%
collect()When you are experimenting with R code, do not use collect(). Only when you have finalised the code for the information being extracted from the database, use collect() to read the complete output into the R session.
Close Connection
When you have finished with the connection it is good practice to close it:
dbDisconnect(con_udal)ta da!