sugr: data preparation

Preparation of Continouos Glucose Measurements for usage in a shiny app

data cleaning
timeseries
medical
ETL
Author

Tobias

Published

Apr 16, 2023

tl;dr: CGMs are a great source of (timeseries) data to tinker with. Here I describe the process of cleaning up and preparing the data of Medtronics systems in order to use it in a shiny app.

Abstract

I’m a type I diabetic and therefore use an insulin pump and a connected CGM - continouos glucose measurement - system from Medtronic. The company kindly provides access to the raw data, even though only by the use of propriatory software and a web interface. This means I can’t access the data through some sort of API to automate the collection thereof and have to run through that process manually on a regular basis.

Apart from that minor nuisance what’s more important though is the fact that the data needs some touch-ups before you can actually use them:

  • data from the pump and the GCM sensor are basically two different datasets but kept in the same csv
  • while the time interval is 5 minutes (a reading from the GCM), not much else is going on otherwise
  • direct blood glucose measurements, insulin doses, etc. are delivered whenever and not on-point of the 5min intervals

Here I’ll walk through the process of cleaning that up.

Reading the Raw Data

First let’s have a look at what we’ve got here. To save ourselves some headaches, I already put all the settings in all the places such that we get a nice tabular dataset. Most should be pretty self-explanatory: entries are delimited by a semicolon, the date format has to be sepcified and so on. I set skip = 6 because, well, that’s where the actual csv entries actually begin. Beforehand it’s just some unstructured data about the device, patient, date of recordng, etc.

data_raw <- read_delim("data/sugr/carelink-export-220217.csv", 
                  delim = ";", escape_double = FALSE, 
                  col_types = cols(Date = col_date(format = "%Y/%m/%d"), 
                                   Time = col_time(format = "%H:%M:%S")), 
                  locale = locale(decimal_mark = ","), 
                  trim_ws = TRUE, skip = 6)
New names:
• `` -> `...51`
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
glimpse(data_raw)
Rows: 9,236
Columns: 51
$ Index                                 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10…
$ Date                                  <date> 2022-02-17, 2022-02-17, 2022-02…
$ Time                                  <time> 11:07:38, 11:02:39, 11:00:00, 1…
$ `New Device Time`                     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BG Source`                           <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BG Reading (mg/dL)`                  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Linked BG Meter ID`                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Basal Rate (U/h)`                    <dbl> NA, NA, 0.65, NA, NA, NA, NA, NA…
$ `Temp Basal Amount`                   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Temp Basal Type`                     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Temp Basal Duration (h:mm:ss)`       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Bolus Type`                          <chr> "Normal", "Normal", NA, "Normal"…
$ `Bolus Volume Selected (U)`           <dbl> 0.03, 0.05, NA, 0.05, 0.03, 0.03…
$ `Bolus Volume Delivered (U)`          <dbl> 0.03, 0.05, NA, 0.05, 0.03, 0.03…
$ `Bolus Duration (h:mm:ss)`            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Prime Type`                          <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Prime Volume Delivered (U)`          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ Alarm                                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ Suspend                               <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ Rewind                                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Estimate (U)`                    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Target High BG (mg/dL)`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Target Low BG (mg/dL)`           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Carb Ratio (g/U)`                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Insulin Sensitivity (mg/dL/U)`   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Carb Input (grams)`              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ BG Input (mg/dL)`                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Correction Estimate (U)`         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Food Estimate (U)`               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Active Insulin (U)`              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Status`                          <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Sensor Calibration BG (mg/dL)`       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Sensor Glucose (mg/dL)`              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `ISIG Value`                          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Event Marker`                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Bolus Number`                        <dbl> 236, 235, NA, 234, 233, 232, 231…
$ `Bolus Cancellation Reason`           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `BWZ Unabsorbed Insulin Total (U)`    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Final Bolus Estimate`                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Scroll Step Size`                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Insulin Action Curve Time`           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Sensor Calibration Rejected Reason`  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Preset Bolus`                        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Bolus Source`                        <chr> "CLOSED_LOOP_MICRO_BOLUS", "CLOS…
$ `BLE Network Device`                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Network Device Associated Reason`    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Network Device Disassociated Reason` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Network Device Disconnected Reason`  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Sensor Exception`                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ `Preset Temp Basal Name`              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
$ ...51                                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, …
problems(data_raw) %>% select(-file) %>% print(n = 120)
# A tibble: 112 × 4
      row   col expected           actual                             
    <int> <int> <chr>              <chr>                              
  1   115    41 1/0/T/F/TRUE/FALSE 240                                
  2   493    22 1/0/T/F/TRUE/FALSE 120                                
  3   493    23 1/0/T/F/TRUE/FALSE 90                                 
  4   493    25 1/0/T/F/TRUE/FALSE 40                                 
  5   493    30 1/0/T/F/TRUE/FALSE 2,10                               
  6   505    22 1/0/T/F/TRUE/FALSE 120                                
  7   505    23 1/0/T/F/TRUE/FALSE 90                                 
  8   505    25 1/0/T/F/TRUE/FALSE 40                                 
  9   505    30 1/0/T/F/TRUE/FALSE 0,0                                
 10   922    22 1/0/T/F/TRUE/FALSE 120                                
 11   922    23 1/0/T/F/TRUE/FALSE 90                                 
 12   922    25 1/0/T/F/TRUE/FALSE 40                                 
 13   922    30 1/0/T/F/TRUE/FALSE 0,20                               
 14   933    22 1/0/T/F/TRUE/FALSE 160                                
 15   933    23 1/0/T/F/TRUE/FALSE 120                                
 16   933    25 1/0/T/F/TRUE/FALSE 40                                 
 17   933    30 1/0/T/F/TRUE/FALSE 4,20                               
 18   937    22 1/0/T/F/TRUE/FALSE 160                                
 19   937    23 1/0/T/F/TRUE/FALSE 120                                
 20   937    25 1/0/T/F/TRUE/FALSE 40                                 
 21   937    30 1/0/T/F/TRUE/FALSE 3,10                               
 22   944    22 1/0/T/F/TRUE/FALSE 160                                
 23   944    23 1/0/T/F/TRUE/FALSE 120                                
 24   944    25 1/0/T/F/TRUE/FALSE 40                                 
 25   944    30 1/0/T/F/TRUE/FALSE 2,40                               
 26   955    22 1/0/T/F/TRUE/FALSE 120                                
 27   955    23 1/0/T/F/TRUE/FALSE 90                                 
 28   955    25 1/0/T/F/TRUE/FALSE 40                                 
 29   955    30 1/0/T/F/TRUE/FALSE 0,20                               
 30   981    48 1/0/T/F/TRUE/FALSE USER_INITIATED                     
 31  1002    22 1/0/T/F/TRUE/FALSE 120                                
 32  1002    23 1/0/T/F/TRUE/FALSE 90                                 
 33  1002    25 1/0/T/F/TRUE/FALSE 40                                 
 34  1002    30 1/0/T/F/TRUE/FALSE 0,0                                
 35  1417    20 1/0/T/F/TRUE/FALSE Rewind                             
 36  2715    22 1/0/T/F/TRUE/FALSE 120                                
 37  2715    23 1/0/T/F/TRUE/FALSE 90                                 
 38  2715    25 1/0/T/F/TRUE/FALSE 40                                 
 39  2715    30 1/0/T/F/TRUE/FALSE 1,30                               
 40  2726    22 1/0/T/F/TRUE/FALSE 120                                
 41  2726    23 1/0/T/F/TRUE/FALSE 90                                 
 42  2726    25 1/0/T/F/TRUE/FALSE 40                                 
 43  2726    30 1/0/T/F/TRUE/FALSE 0,0                                
 44  2746    22 1/0/T/F/TRUE/FALSE 140                                
 45  2746    23 1/0/T/F/TRUE/FALSE 90                                 
 46  2746    25 1/0/T/F/TRUE/FALSE 40                                 
 47  2746    30 1/0/T/F/TRUE/FALSE 0,80                               
 48  2760    22 1/0/T/F/TRUE/FALSE 120                                
 49  2760    23 1/0/T/F/TRUE/FALSE 90                                 
 50  2760    25 1/0/T/F/TRUE/FALSE 40                                 
 51  2760    30 1/0/T/F/TRUE/FALSE 5,10                               
 52  2762    22 1/0/T/F/TRUE/FALSE 120                                
 53  2762    23 1/0/T/F/TRUE/FALSE 90                                 
 54  2762    25 1/0/T/F/TRUE/FALSE 40                                 
 55  2762    30 1/0/T/F/TRUE/FALSE 5,10                               
 56  2768    22 1/0/T/F/TRUE/FALSE 140                                
 57  2768    23 1/0/T/F/TRUE/FALSE 90                                 
 58  2768    25 1/0/T/F/TRUE/FALSE 40                                 
 59  2768    30 1/0/T/F/TRUE/FALSE 4,80                               
 60  2772    22 1/0/T/F/TRUE/FALSE 140                                
 61  2772    23 1/0/T/F/TRUE/FALSE 90                                 
 62  2772    25 1/0/T/F/TRUE/FALSE 40                                 
 63  2772    30 1/0/T/F/TRUE/FALSE 0,0                                
 64  2779    48 1/0/T/F/TRUE/FALSE USER_INITIATED                     
 65  2955    37 1/0/T/F/TRUE/FALSE User Request                       
 66  3286    20 1/0/T/F/TRUE/FALSE Rewind                             
 67  3321    37 1/0/T/F/TRUE/FALSE User Request                       
 68  3806    37 1/0/T/F/TRUE/FALSE User Request                       
 69  4338     1 a double           -------                            
 70  4338     2 date like %Y/%m/%d MiniMed 670G MMT-1781              
 71  4338     3 time like %H:%M:%S Sensor                             
 72  4338     4 1/0/T/F/TRUE/FALSE NG2307024H                         
 73  4338     5 51 columns         5 columns                          
 74  4339     1 a double           Index                              
 75  4339     2 date like %Y/%m/%d Date                               
 76  4339     3 time like %H:%M:%S Time                               
 77  4339     4 1/0/T/F/TRUE/FALSE New Device Time                    
 78  4339     6 a double           BG Reading (mg/dL)                 
 79  4339     8 a double           Basal Rate (U/h)                   
 80  4339     9 1/0/T/F/TRUE/FALSE Temp Basal Amount                  
 81  4339    10 1/0/T/F/TRUE/FALSE Temp Basal Type                    
 82  4339    11 1/0/T/F/TRUE/FALSE Temp Basal Duration (h:mm:ss)      
 83  4339    13 a double           Bolus Volume Selected (U)          
 84  4339    14 a double           Bolus Volume Delivered (U)         
 85  4339    15 1/0/T/F/TRUE/FALSE Bolus Duration (h:mm:ss)           
 86  4339    17 a double           Prime Volume Delivered (U)         
 87  4339    20 1/0/T/F/TRUE/FALSE Rewind                             
 88  4339    21 a double           BWZ Estimate (U)                   
 89  4339    22 1/0/T/F/TRUE/FALSE BWZ Target High BG (mg/dL)         
 90  4339    23 1/0/T/F/TRUE/FALSE BWZ Target Low BG (mg/dL)          
 91  4339    24 a double           BWZ Carb Ratio (g/U)               
 92  4339    25 1/0/T/F/TRUE/FALSE BWZ Insulin Sensitivity (mg/dL/U)  
 93  4339    26 a double           BWZ Carb Input (grams)             
 94  4339    27 a double           BWZ BG Input (mg/dL)               
 95  4339    28 a double           BWZ Correction Estimate (U)        
 96  4339    29 a double           BWZ Food Estimate (U)              
 97  4339    30 1/0/T/F/TRUE/FALSE BWZ Active Insulin (U)             
 98  4339    32 a double           Sensor Calibration BG (mg/dL)      
 99  4339    33 a double           Sensor Glucose (mg/dL)             
100  4339    34 a double           ISIG Value                         
101  4339    36 a double           Bolus Number                       
102  4339    37 1/0/T/F/TRUE/FALSE Bolus Cancellation Reason          
103  4339    38 a double           BWZ Unabsorbed Insulin Total (U)   
104  4339    39 a double           Final Bolus Estimate               
105  4339    41 1/0/T/F/TRUE/FALSE Insulin Action Curve Time          
106  4339    42 1/0/T/F/TRUE/FALSE Sensor Calibration Rejected Reason 
107  4339    43 1/0/T/F/TRUE/FALSE Preset Bolus                       
108  4339    45 1/0/T/F/TRUE/FALSE BLE Network Device                 
109  4339    46 1/0/T/F/TRUE/FALSE Network Device Associated Reason   
110  4339    47 1/0/T/F/TRUE/FALSE Network Device Disassociated Reason
111  4339    48 1/0/T/F/TRUE/FALSE Network Device Disconnected Reason 
112  4339    50 1/0/T/F/TRUE/FALSE Preset Temp Basal Name             

Okay, so, a Warning. What’s that about?

Alright, the first ~70ish are actual parsing errors that lead to empty cells. Not great, but also no entries of huge importance1, as far as I can tell. The rest though result from the circumstance that the file is basically seperated in CGM sensor data and insulin pump data, as mentioned in the beginning. Luckily though, readr/vroom just ignores the two rows where the new table begins - a device description and exactly same column names in the same order as before - and just continues parsing the CGM data. Let’s take a look at those whopping 51 Variables:

Looks pretty decent already. That’s a whole lotta <NA>s, though. We’ll have to look into that in another article.

The Clean-up Procedure

Here’s what needs to be done still:

  • clean up the column names
  • throw out the variables we don’t need (spoiler: most of them)
  • time precision by minutes should suffice
  • a combined datetime column would be nice
  • a column of weekdays, too, just for some quality of life improvement

It also totally makes sense to split pump from CGM data, of course. Since both are structurally identical, though, we’ll do that after the clean-up. At this point, we’re also only interested in variables that directly deal with glucose measurements, insulin dosage and carbohydrate intake2.

Codebook

The variables of interest are:

Variable Renamed to Description
Index index kinda self-explanatory
Date date Year-Month-Day
Time time hh:mm:ss
BG Reading (mg/dL) bg_direct a direct measurement of blood, usually by pricking a finger
Sensor Glucose (mg/dL) bg_sensor an indirect measurement of interstitial fluid by the gcm sensor
Basal Rate (U/h) basal_rate the hourly rate of insulin given (in ‘Units’; roughly 100U/ml)
BWZ BG Input (mg/dL) wiz_bg blood glucose level (manual input)
BWZ Carb Input (grams) wiz_carbs carbohydrates (manual input)
BWZ Carb Ratio (g/U) wiz_ratio ratio of carbs per unit of insulin
BWZ Correction Estimate (U) wiz_est_correction insulin to deliver to correct for too high bg level
BWZ Food Estimate (U) wiz_est_food insulin units to deliver for the amount of carbohydrates (carbs / ratio)
BWZ Unabsorbed Insulin Total (U) wiz_est_unabsorbed insulin to not deliver in order to not overcompensate
Final Bolus Estimate bolus_final amount of insulin to deliver
Bolus Volume Delivered (U) bolus_delivered the actual amount of insulin delivered

Cleaning and Variable Selection

To clean up column names, we could of course use some convenient helper function like clean_names() from the janitor package:

janitor::clean_names(data_raw) %>% names() %>% sort()
 [1] "alarm"                               "basal_rate_u_h"                     
 [3] "bg_reading_mg_d_l"                   "bg_source"                          
 [5] "ble_network_device"                  "bolus_cancellation_reason"          
 [7] "bolus_duration_h_mm_ss"              "bolus_number"                       
 [9] "bolus_source"                        "bolus_type"                         
[11] "bolus_volume_delivered_u"            "bolus_volume_selected_u"            
[13] "bwz_active_insulin_u"                "bwz_bg_input_mg_d_l"                
[15] "bwz_carb_input_grams"                "bwz_carb_ratio_g_u"                 
[17] "bwz_correction_estimate_u"           "bwz_estimate_u"                     
[19] "bwz_food_estimate_u"                 "bwz_insulin_sensitivity_mg_d_l_u"   
[21] "bwz_status"                          "bwz_target_high_bg_mg_d_l"          
[23] "bwz_target_low_bg_mg_d_l"            "bwz_unabsorbed_insulin_total_u"     
[25] "date"                                "event_marker"                       
[27] "final_bolus_estimate"                "index"                              
[29] "insulin_action_curve_time"           "isig_value"                         
[31] "linked_bg_meter_id"                  "network_device_associated_reason"   
[33] "network_device_disassociated_reason" "network_device_disconnected_reason" 
[35] "new_device_time"                     "preset_bolus"                       
[37] "preset_temp_basal_name"              "prime_type"                         
[39] "prime_volume_delivered_u"            "rewind"                             
[41] "scroll_step_size"                    "sensor_calibration_bg_mg_d_l"       
[43] "sensor_calibration_rejected_reason"  "sensor_exception"                   
[45] "sensor_glucose_mg_d_l"               "suspend"                            
[47] "temp_basal_amount"                   "temp_basal_duration_h_mm_ss"        
[49] "temp_basal_type"                     "time"                               
[51] "x51"                                

As you can see, that does indeed make them “workable”, but it’s still pretty convoluted and even occasionally confusing. Therefore, in a second step, we do some touch-ups manually and get rid of all the columns we’re not interested in. And since we’re already at it, why not tick off the rest of the items on our list, too?

data_full <- data_raw %>% 
  janitor::clean_names() %>% 
  transmute(
    index = index,
    datetime = round_date(ymd_hms(paste(date, time)), unit = "minute"),
    date  = date,
    wday  = wday(date, label = TRUE, week_start = 1),
    time  = time,
    basal_rate = basal_rate_u_h,
    bg_direct  = bg_reading_mg_d_l,
    bg_sensor  = sensor_glucose_mg_d_l,
    wiz_ratio = bwz_carb_ratio_g_u,
    wiz_carbs = bwz_carb_input_grams,
    wiz_bg = bwz_bg_input_mg_d_l,
    wiz_est_correction = bwz_correction_estimate_u,
    wiz_est_food = bwz_food_estimate_u,
    wiz_est_unabsorbed = bwz_unabsorbed_insulin_total_u,
    bolus_final = final_bolus_estimate, 
    bolus_delivered = bolus_volume_delivered_u
  )
Warning: There was 1 warning in `transmute()`.
ℹ In argument: `datetime = round_date(ymd_hms(paste(date, time)), unit =
  "minute")`.
Caused by warning:
!  2 failed to parse.

Another Warning, this time a parsing failure? Most certainly either when calling for the weekday labels or concoctenating and rounding the datetime:

anyNA(data_full$wday)
[1] TRUE
data_full %>% 
  filter(is.na(wday)) %>% 
  glimpse()
Rows: 2
Columns: 16
$ index              <dbl> NA, NA
$ datetime           <dttm> NA, NA
$ date               <date> NA, NA
$ wday               <ord> NA, NA
$ time               <time> NA, NA
$ basal_rate         <dbl> NA, NA
$ bg_direct          <dbl> NA, NA
$ bg_sensor          <dbl> NA, NA
$ wiz_ratio          <dbl> NA, NA
$ wiz_carbs          <dbl> NA, NA
$ wiz_bg             <dbl> NA, NA
$ wiz_est_correction <dbl> NA, NA
$ wiz_est_food       <dbl> NA, NA
$ wiz_est_unabsorbed <dbl> NA, NA
$ bolus_final        <dbl> NA, NA
$ bolus_delivered    <dbl> NA, NA

Two completely empty rows alright. Since even the index column is empty, it’s probably just an artefact of the “two datasets in one file issue”. We better check that out, too, though.

anyNA(data_raw$Index)
[1] TRUE
which(is.na(data_raw$Index))
[1] 4337 4338
data_raw[4336:4339, ] %>% 
  glimpse()
Rows: 4
Columns: 51
$ Index                                 <dbl> 4335, NA, NA, 4336
$ Date                                  <date> 2022-01-30, NA, NA, 2022-02-17
$ Time                                  <time> 23:50:04,       NA,       NA, 11…
$ `New Device Time`                     <lgl> NA, NA, NA, NA
$ `BG Source`                           <chr> NA, "-------", "BG Source", NA
$ `BG Reading (mg/dL)`                  <dbl> NA, NA, NA, NA
$ `Linked BG Meter ID`                  <chr> NA, NA, "Linked BG Meter ID", NA
$ `Basal Rate (U/h)`                    <dbl> NA, NA, NA, NA
$ `Temp Basal Amount`                   <lgl> NA, NA, NA, NA
$ `Temp Basal Type`                     <lgl> NA, NA, NA, NA
$ `Temp Basal Duration (h:mm:ss)`       <lgl> NA, NA, NA, NA
$ `Bolus Type`                          <chr> NA, NA, "Bolus Type", NA
$ `Bolus Volume Selected (U)`           <dbl> NA, NA, NA, NA
$ `Bolus Volume Delivered (U)`          <dbl> NA, NA, NA, NA
$ `Bolus Duration (h:mm:ss)`            <lgl> NA, NA, NA, NA
$ `Prime Type`                          <chr> NA, NA, "Prime Type", NA
$ `Prime Volume Delivered (U)`          <dbl> NA, NA, NA, NA
$ Alarm                                 <chr> NA, NA, "Alarm", NA
$ Suspend                               <chr> NA, NA, "Suspend", NA
$ Rewind                                <lgl> NA, NA, NA, NA
$ `BWZ Estimate (U)`                    <dbl> NA, NA, NA, NA
$ `BWZ Target High BG (mg/dL)`          <lgl> NA, NA, NA, NA
$ `BWZ Target Low BG (mg/dL)`           <lgl> NA, NA, NA, NA
$ `BWZ Carb Ratio (g/U)`                <dbl> NA, NA, NA, NA
$ `BWZ Insulin Sensitivity (mg/dL/U)`   <lgl> NA, NA, NA, NA
$ `BWZ Carb Input (grams)`              <dbl> NA, NA, NA, NA
$ `BWZ BG Input (mg/dL)`                <dbl> NA, NA, NA, NA
$ `BWZ Correction Estimate (U)`         <dbl> NA, NA, NA, NA
$ `BWZ Food Estimate (U)`               <dbl> NA, NA, NA, NA
$ `BWZ Active Insulin (U)`              <lgl> NA, NA, NA, NA
$ `BWZ Status`                          <chr> NA, NA, "BWZ Status", NA
$ `Sensor Calibration BG (mg/dL)`       <dbl> NA, NA, NA, NA
$ `Sensor Glucose (mg/dL)`              <dbl> NA, NA, NA, 122
$ `ISIG Value`                          <dbl> NA, NA, NA, 23.9
$ `Event Marker`                        <chr> NA, NA, "Event Marker", NA
$ `Bolus Number`                        <dbl> NA, NA, NA, NA
$ `Bolus Cancellation Reason`           <lgl> NA, NA, NA, NA
$ `BWZ Unabsorbed Insulin Total (U)`    <dbl> NA, NA, NA, NA
$ `Final Bolus Estimate`                <dbl> NA, NA, NA, NA
$ `Scroll Step Size`                    <chr> NA, NA, "Scroll Step Size", NA
$ `Insulin Action Curve Time`           <lgl> NA, NA, NA, NA
$ `Sensor Calibration Rejected Reason`  <lgl> NA, NA, NA, NA
$ `Preset Bolus`                        <lgl> NA, NA, NA, NA
$ `Bolus Source`                        <chr> NA, NA, "Bolus Source", NA
$ `BLE Network Device`                  <lgl> NA, NA, NA, NA
$ `Network Device Associated Reason`    <lgl> NA, NA, NA, NA
$ `Network Device Disassociated Reason` <lgl> NA, NA, NA, NA
$ `Network Device Disconnected Reason`  <lgl> NA, NA, NA, NA
$ `Sensor Exception`                    <chr> NA, NA, "Sensor Exception", NA
$ `Preset Temp Basal Name`              <lgl> NA, NA, NA, NA
$ ...51                                 <lgl> NA, NA, NA, NA

That confirms it. Actually, we should throw those lines out right at the beginning of the process.

Wrap-Up

Doing it all in one go might then look like this:

data_full <- read_delim("data/sugr/carelink-export-220217.csv", 
                  delim = ";", escape_double = FALSE, 
                  col_types = cols(Date = col_date(format = "%Y/%m/%d"), 
                                   Time = col_time(format = "%H:%M:%S")), 
                  locale = locale(decimal_mark = ","), 
                  trim_ws = TRUE, skip = 6) %>% 
  janitor::clean_names() %>% 
  filter(!is.na(index)) %>% 
  transmute(
    index              = index,
    datetime_rounded   = round_date(ymd_hms(paste(date, time)), unit = "minute"),
    datetime           = ymd_hms(paste(date, time)),
    date               = date,
    wday               = wday(date, label = TRUE, week_start = 1),
    time               = time,
    basal_rate         = basal_rate_u_h,
    bg_direct          = bg_reading_mg_d_l,
    bg_sensor          = sensor_glucose_mg_d_l,
    wiz_ratio          = bwz_carb_ratio_g_u,
    wiz_carbs          = bwz_carb_input_grams,
    wiz_bg             = bwz_bg_input_mg_d_l,
    wiz_est_correction = bwz_correction_estimate_u,
    wiz_est_food       = bwz_food_estimate_u,
    wiz_est_unabsorbed = bwz_unabsorbed_insulin_total_u,
    bolus_final        = final_bolus_estimate, 
    bolus_delivered    = bolus_volume_delivered_u
  )

Splitting CGM & Pump Data

Now that the whole thing is cleaned up the only thing left to do is splitting pump and sensor data into seperate dataframes. We could of course leave it as is, but a split brings some advantages. The CGM is not in sync with the (semi-manual) pump data; while the former happens very much exactly every 5 minutes3, the latter just happens whenever. Additionally, at the initial reading of the data we already noticed the - professionally speaking - sheer insane amount of NAs, which we can drastically reduced by the split. When we color every cell of the data by whether it contains a value or not, it looks like this:

Code
data_full %>% 
    map_df(is.na) %>% 
    map_df(function(.){ifelse(., "<NA>", "not <NA>")}) %>% 
    mutate(row_num = length(index) - seq_along(index)) %>% 
    pivot_longer(cols = -row_num,
                 names_to = "col", 
                 names_transform = as_factor,
                 values_to = "na") %>% 
  ggplot(aes(col, row_num, fill = na)) +
    geom_tile(height = .95, width = .95, alpha = .8) +
    scale_fill_manual(values = c("#ff3300","#00cc00")) +
    theme(legend.position = "top",
          legend.title = element_blank(),
          axis.text.x  = element_blank(),
          axis.text.y  = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())

That’s a lot of unused tabular real estate, what a waste… We can clearly see where pump data ends and CGM data begins, though. The ‘green block’ on the left hand side naturally consists of the timedate and indexing columns. We can also see that there are still otherwise empty rows in the lower part (the CGM data). There are two approaches by which to split the thing: (1) simply filter the bg_sensor column by checking for NA values and conveniently drop the empty CGM rows, too, or (2) use the previously identified index-row, such that those empty rows are kept and can be used to further investigate whats going on there4. For the sake of thoroughness, let’s go with Option 2, have a look at what’s going on there and possibly remove those empty rows later.

cutoff_idx <- data_full$index[min(which(!is.na(data_full$bg_sensor)))]

cgm <- data_full %>% 
  filter(index > cutoff_idx) %>% 
  select(index, datetime, date, wday, time, bg_sensor)

pump <- data_full %>% 
  filter(index <= cutoff_idx) %>% 
  select(-bg_sensor)
Code
cgm %>% 
  map_df(is.na) %>% 
  map_df(function(.){ifelse(., "<NA>", "not <NA>")}) %>% 
    mutate(row_num = length(index) - seq_along(index)) %>% 
    pivot_longer(cols = -row_num,
                 names_to = "col", 
                 names_transform = as_factor,
                 values_to = "na") %>% 
  ggplot(aes(col, row_num, fill = na)) +
    geom_tile(height = .95, width = .95, alpha = .8) +
    scale_fill_manual(values = c("#ff3300","#00cc00")) +
    theme(legend.position = "none",
          legend.title = element_blank(),
          axis.text.x  = element_blank(),
          axis.text.y  = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())
pump %>% 
  map_df(is.na) %>% 
  map_df(function(.){ifelse(., "<NA>", "not <NA>")}) %>% 
    mutate(row_num = length(index) - seq_along(index)) %>% 
    pivot_longer(cols = -row_num,
                 names_to = "col", 
                 names_transform = as_factor,
                 values_to = "na") %>% 
  ggplot(aes(col, row_num, fill = na)) +
    geom_tile(height = .95, width = .95, alpha = .8) +
    scale_fill_manual(values = c("#ff3300","#00cc00")) +
    theme(legend.position = "none",
          legend.title = element_blank(),
          axis.text.x  = element_blank(),
          axis.text.y  = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())

CGM Data

Pump Data

Well, better. There’s still room for improvement, especially regarding the pump data: Lots of small bolus deliveries (the last column) not accompanied by any other action, which we could further split up, for example. Doing so is mainly a matter of iterating on the steps we’ve taken so far and at this point we’re already in a good position from which we one can start further exploration and analysis of the data. The initial clean- and split-up procedure can therefore be considered concluded. Cheers!

Footnotes

  1. mostly hard-coded things like ratios or target values that change slightly during a day but stay the same across all days↩︎

  2. other variables contain aforementioned “hard codings” or diagnostics on device usage such as low battery warnings↩︎

  3. with an unspecific shift happening once a week whenever the sensor has to be replaced↩︎

  4. Actually, a third possibility is the extraction of only those empty rows within the CGM block, but losing the context in which they appear is not very practical.↩︎