1. Introduction
1.1. Using AI agents and the case for prompting
We introduced the AI pair programmer, Copilot, in an earlier post on working with the tidyverse. In this post we walk the reader through the fundamental engine behind the successful use of AI agents: prompting. Prompting is the method of instructing an LLM to perform a task by carefully designing an input text. Our focus will be "regular" or "zero-shot" prompting, the most widely used prompting technique, where you simply ask the agent a question and receive an answer. Iterating on the prompt is usually recommended to get a more accurate answer, especially with coding outputs. This technique is flexible and easy to use, but in my other, more complex coding and analysis workflows I have also used few-shot prompting, where you give the LLM examples and/or context in your prompt. In this post we will concentrate on iterative zero-shot prompting.
1.2. A summary of working with Copilot in RStudio
Setting up Copilot in RStudio: Adding Copilot as a pair programmer to RStudio is simple, and the interface is seamless. See the Posit documentation here. Make sure you switch it on! See below.
2. Using Copilot to get data summaries
After cleaning up our data as we do in this snippet, we will now use Copilot for our first task: getting some useful summaries of the fields that we think are likely to influence patient care outcomes. We do this in section 2 of our analysis post. Note that to give Copilot the right guidance for these summaries, we must know which fields we are summarizing and their underlying class within the R workspace, all of which we specify in our prompt.
These are the fields that we will summarize.
Field | Role | Description |
primary_diag | First admission covariate | Primary diagnosis on admission |
a1c | First admission covariate | Whether or not a patient received the A1C test |
diabetesMed | Intermediate response | Whether or not a patient received diabetes medication |
acarbose:troglitazone | Intermediate response | 23 individual diabetes medications, each either not prescribed or, if prescribed, held steady, increased, or decreased |
readmitted | Final response | Whether the patient was readmitted early (<30 days) |
Table 1: Important data fields and their descriptions
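Before writing the prompt, it can help to confirm the class of each field in the workspace. The following is a minimal sketch, assuming a small toy data frame (`toy`, with made-up values) standing in for `D` and using field names from Table 1. The classes in your own workspace may differ, for example character instead of factor.

```r
# Toy stand-in for D (hypothetical values), using field names from Table 1
toy <- data.frame(
  primary_diag = factor(c("Circulatory", "Diabetes", "Circulatory")),
  a1c          = factor(c("measured", "not measured", "not measured")),
  diabetesMed  = factor(c("Yes", "No", "Yes")),
  readmitted   = factor(c("<30", "NO", ">30"))
)

# Class of each field we plan to mention in the prompt
field_classes <- sapply(toy, class)
field_classes

# Levels of one field, a useful detail to include in a prompt
levels(toy$readmitted)
```

Stating these classes explicitly in the prompt gives Copilot less to guess about.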
Notes: Watch the clip where I show the prompt that I wrote to arrive at the summary code. Note the following:
- The ghost-text autocomplete that is typical of Copilot, which appears as soon as you start typing the first line of the code.
- The details of the prompt: the field names, the fact that they are factor fields, etc.
Note that I went back and adjusted the prompt because I forgot to ask for frequency percentages. I also went back and added arrange() myself to sort for a nicer look. See the final code used in the analysis immediately after the video. Note that this does not get us to the final kable outputs and table layout, which were done manually.
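To make the prompt details concrete, here is a sketch of what such a prompt could look like when written as an R comment, followed by one compact way to express the same summary. The prompt wording, the helper `summarise_field()`, and the toy data are all hypothetical illustrations. They are not the prompt from the clip or Copilot's actual output.

```r
library(dplyr)

# Hypothetical prompt, written as a comment for Copilot to complete:
# "D is a data frame. a1c, primary_diag and readmitted are factor fields.
#  For each field, give me counts and frequency percentages,
#  sorted by descending percent."

# One possible compact equivalent of the repeated summaries
# (summarise_field is a hypothetical helper, not from the post)
summarise_field <- function(data, field) {
  data %>%
    group_by({{ field }}) %>%
    summarise(count = n(), percent = n() / nrow(data) * 100) %>%
    arrange(desc(percent))
}

# Toy data (hypothetical values)
toy <- data.frame(
  a1c = c("measured", "not measured", "not measured", "not measured")
)
res <- summarise_field(toy, a1c)
res
```

Naming the fields, their class, the statistics wanted, and the sort order in one prompt covers the details I initially forgot and had to add on a second pass.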
Click on “Show the code” to view the Copilot code output from the interaction above.
Show the code
# ---------- ---------- ---------- ---------- #
# Final code output after iterating with Copilot
# ---------- ---------- ---------- ---------- #
summary_a1c <- D %>%
  group_by(a1c) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_primary_diag <- D %>%
  group_by(primary_diag) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_readmitted <- D %>%
  group_by(readmitted) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_change <- D %>%
  group_by(change) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_diabetesMed <- D %>%
  group_by(diabetesMed) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
Click on “Show the code” to see the final code used to clean the data and generate the summaries.
Show the code
# --------------------------------------------- #
# Load necessary libraries
# --------------------------------------------- #
library(dplyr)
library(ggplot2)
library(tidyr)
library(kableExtra)
library(gridExtra)
library(grid)
library(lattice)
# --------------------------------------------- #
# Read in the data
# --------------------------------------------- #
D <- read.csv("https://raw.githubusercontent.com/VidishaVac/healthcare-analytics/refs/heads/main/dataset_diabetes/diabetic_data.csv", sep=",")
# -------------------------------------------------------- #
# Clean and re-define HbA1c and primary diag
# -------------------------------------------------------- #
D <- D %>%
  mutate(a1c=ifelse(A1Cresult=="None", "not measured", "measured"),
         primary_diag = case_when(
           diag_1 %in% c(390:459, 785) ~ "Circulatory",
           diag_1 %in% c(460:519, 786) ~ "Respiratory",
           diag_1 %in% c(520:579, 787) ~ "Digestive",
           diag_1 %in% c(580:629, 788) ~ "Genitourinary",
           diag_1 %in% c(630:679) ~ "Pregnancy",
           diag_1 %in% c(680:709, 782) ~ "Skin",
           diag_1 %in% c(710:739) ~ "Musculoskeletal",
           diag_1 %in% c(740:759) ~ "Congenital",
           diag_1 %in% c(800:999) ~ "Injury",
           grepl("^250", diag_1) ~ "Diabetes",
           is.na(diag_1) ~ "Missing",
           TRUE ~ "Other"
         ))
# Remove columns not used, patients with a discharge disposition of "expired" or "hospice"
D <- D %>% select(1,2,7:9,25:52) %>%
  filter(!discharge_disposition_id %in% c('11','13','14','19','20','21'))
# Summarize
summary_a1c <- D %>%
  group_by(a1c) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_primary_diag <- D %>%
  group_by(primary_diag) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_readmitted <- D %>%
  group_by(readmitted) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_change <- D %>%
  group_by(change) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
summary_diabetesMed <- D %>%
  group_by(diabetesMed) %>%
  summarise(count = n(), percent = n() / nrow(D) * 100) %>%
  arrange(desc(percent))
# summary_primary_diag %>%
#   kable(digits = 1, format.args = list(big.mark = ","),
#         caption = "Breakdown of the primary patient diagnosis upon admission for a diabetic encounter") %>%
#   row_spec(1, background = "yellow", font_size = "larger")
We show a sample summary of the patient's primary diagnosis on admission. Note how circulatory diagnoses are the most common reason for hospital admission in diabetic encounters. This is an essential insight and an important theme that runs through our analyses.
To make the summary data easier to digest, we use a circular barplot. Although I did not use Copilot for the actual barplot, I used GPT-4o (directly in a browser) to arrive at this clean 2-column look, with the plot on the left and the notes on the right. See here for the full conversation.
Finally, see "Plot notes" for more information about the design.

The circular barplot offers a visually engaging and effective way to convey categorical comparisons, combining aesthetic appeal with immediate interpretability. One strength of the design here is its ability to emphasize absolute and relative differences at the same time, through clear axis labels and proportional arc lengths. Circulatory disease, for example, accounts for 30% of the sample, about 30,000 encounters, while diabetes as a primary admitting diagnosis represents only 9%, or fewer than 10,000. These contrasts are immediately clear not only in the numbers but also in the visual impact of the plot. The minimalist style, combined with concise explanatory text above the graph, makes each visual a self-contained and accessible summary that is as analytical as it is reader-friendly.
Show the code
df <- summary_primary_diag
df <- df %>% mutate(
  label=c(paste(round(df$percent[1:5]), "%", sep=""),rep(NA,nrow(df)-5)))
p0 <- ggplot(data=df, aes(x=reorder(primary_diag,count), y=count, fill=percent)) +
  geom_bar(stat="identity", color="black") +
  coord_polar() +
  scale_fill_gradientn(
    "Percent",
    colours = c( "#6C5B7B","#C06C84","#F67280","#F8B195")) +
  geom_text(aes(x=primary_diag, y=count/2,label = label), size=3.5,
            color="black",
            fontface="bold") +
  scale_y_continuous(breaks = seq(0, 40000, by = 10000),
                     limits = c(-10000, 40000),
                     expand = c(0, 0)) +
  annotate("text", x = 11.95, y = 12000, label = "10,000", size = 2.5, color = "black") +
  annotate("text", x = 11.95, y = 22000, label = "20,000", size = 2.5, color = "black") +
  annotate("text", x = 11.95, y = 32000, label = "30,000", size = 2.5, color = "black") +
  theme(
    # Remove axis ticks and text
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    axis.text.y = element_blank(),
    # Use gray text for the region names
    axis.text.x = element_text(color = "gray12", size = 10),
    legend.position = "none",
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 9)) +
  # Add labels
  labs(
    title = "\nPatient primary diagnoses",
    subtitle = paste(
      "\nPrimary diagnoses of admitted patients vary considerably.",
      "Even for diabetic encounters, the most common admitting diagnosis",
      "is circulatory or heart disease. We will call this out in our analysis,",
      "comparing patient outcomes for these high-risk but non-diabetic",
      "admissions with diabetic admissions (which are only 9% of the",
      "admitting or primary diagnoses).",
      sep = "\n"
    ))
3. Prompting pitfalls: what you ask for is what you get
Copilot is powerful and easy to use, but in this section we dive a little deeper into the importance of iteration: how effective prompting often needs fine-tuning, and how insufficient or non-specific guidance can lead you astray.
Consider the 23 medication prescriptions, each categorized into one of four levels: 𝘕𝘰, 𝘚𝘵𝘦𝘢𝘥𝘺, 𝘜𝘱 or 𝘋𝘰𝘸𝘯. The medications include well-known drugs such as insulin and metformin. I wanted a strong visual for viewing all 23 in one plot, and I started with my own code for a barplot. Then I wanted to explore alternative plot types that could reveal more nuanced insights.
🖼️ In the image below you can follow my conversation with Copilot: I started with a vague prompt (“Can you fix this blank plot?”), then refined it a bit (“Can you use a different type of plot?”).
🔍 Only when I gave clear, well-scoped instructions did Copilot return a useful, meaningful visualization.
💡 Takeaway: The quality of your output depends on the quality of your prompt. Copilot is powerful, but only if you guide it with precision.
![]()
Click on “Show the code” to see the actual back and forth conversation with Copilot that was the basis of the final visualization.
Show the code
# ---------- #
# Prompt 1
# ---------- #
# Consider "meds" and "p0" above.
# Can you recommend a plot that is not geom_col
# or geom_bar but that will capture how they are
# being prescribed, using "freq"?
# ---------- #
# Output 1
# ---------- #
p1 <- ggplot(data=meds, aes(x=medication, y=freq, fill=value)) +
  geom_area(position = "fill") +
  theme_minimal() +
  coord_flip() +
  labs(title = "Proportion of Medication Prescriptions", y = "Proportion (%)", x = "Medication")
# ---------- #
# Prompt 2
# ---------- #
# This plot above is blank, can you fix this?
# ---------- #
# Output 2
# ---------- #
p1 <- ggplot(data=meds, aes(x=medication, y=freq, fill=value)) +
  geom_area(position = "fill") +
  theme_minimal() +
  coord_flip() +
  labs(title = "Proportion of Medication Prescriptions", y = "Proportion (%)", x = "Medication") +
  scale_y_continuous(labels = scales::percent_format(scale = 1))
# ---------- #
# Prompt 3
# ---------- #
# It is still blank, can you use a different type of plot?
# ---------- #
# Output 3
# ---------- #
p2 <- ggplot(data=meds, aes(x=medication, y=freq, fill=value)) +
  geom_tile() +
  theme_minimal() +
  coord_flip() +
  labs(title = "Medication Prescription Frequency", y = "Frequency (%)", x = "Medication") +
  scale_y_continuous(labels = scales::percent_format(scale = 1))
# ---------- #
# Prompt 4
# ---------- #
# Do you think geom_tile needs "medication" and "value" to be on the axes and "freq"
# to be the fill?
# ---------- #
# Output 4
# ---------- #
p3 <- ggplot(data=meds, aes(x=value, y=medication, fill=freq)) +
  geom_tile() +
  theme_minimal() +
  labs(title = "Medication Prescription Frequency", y = "Medication", x = "Value") +
  scale_fill_gradient(low = "blue", high = "red") +
  scale_x_continuous(labels = scales::percent_format(scale = 1))
# ---------- #
# Final tweak -- Prompt 5
# ---------- #
# It seems discrete values have been supplied to continuous scale, can you fix?
# ---------- #
# Output 5
# ---------- #
p4 <- ggplot(data=meds, aes(x=value, y=medication, fill=freq)) +
  geom_tile() +
  theme_minimal() +
  labs(title = "Medication Prescription Frequency", y = "Medication", x = "Value") +
  scale_fill_gradient(low = "blue", high = "red") +
  scale_x_discrete(labels = scales::percent_format(scale = 1))
Let’s view the final code used for the visualization outputs, generated from the above Copilot conversation. See the plot notes below for the motivation behind the original barplot, the crucial difference between the specifications of the 2 plots, and how it relates to the above Copilot iteration. Click on “Show the code” to see the underlying plot-generating code.
Plot 1: Original barplot with flipped coordinates
Show the code
# Reshaping wide to get a better view of medications
meds <- D %>%
  select(metformin:metformin.pioglitazone) %>%
  gather(key = "medication", value = "value") %>%
  group_by(medication, value) %>% summarise(n=n()) %>% mutate(pct = (n / sum(n))*100)
# My original barplot with flipped coordinates
p0 <- ggplot(data=meds, aes(x=medication, y=pct, fill=value)) + geom_col() +
  labs(title = "Diabetes medications bar plot",
       x = "Medication name",
       y = "Percent prescribed", fill="Medication value") + theme_minimal() + coord_flip() +
  theme(plot.title = element_text(hjust = 0.5))
p0
Plot notes: The original barplot on the left is a slightly unconventional use of geom_col(). Although it is normally used for bar charts, here it works as a fairly useful "heatmap". By using coord_flip(), we are able to clearly see the full name of each medication. In my opinion this looks slightly better than having the medication names on the x-axis at a 45-degree angle. We improve on this further by also using geom_tile().
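The reshaping step in the Plot 1 code uses gather(), which tidyr has superseded in favor of pivot_longer(). The following is a minimal sketch of the same wide-to-long reshape and percentage calculation with pivot_longer(), assuming a toy data frame (`toy`, with made-up values) holding two of the medication columns. It is an alternative illustration, not the code used in the analysis.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the medication columns of D (hypothetical values)
toy <- data.frame(
  metformin = c("No", "Steady", "No", "Up"),
  insulin   = c("Steady", "Steady", "Down", "No")
)

meds_toy <- toy %>%
  # One row per (medication, value) observation
  pivot_longer(everything(), names_to = "medication", values_to = "value") %>%
  # Count each level within each medication
  count(medication, value, name = "n") %>%
  # Percent of encounters per level, within each medication
  group_by(medication) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()
meds_toy
```

The resulting columns (`medication`, `value`, `n`, `pct`) match the shape that both plots expect.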
Plot 2: Heatmap with an improved palette and labels
Show the code
# Heatmap with improved color palette and labels
p1 <- ggplot(data=meds, aes(x=value, y=medication, fill=pct)) + geom_tile() +
  labs(title = "Diabetes medication heatmap",
       x = "Medication value",
       y = "Medication name", fill="Percent prescribed") + theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_gradientn(colors = palette.colors(9),
                       limits = c(0, 100)) +
  geom_text(aes(label = paste(round(pct, 1), "%", sep="")), color = "white", size = 3)
p1
Plot notes: When switching to a geom_tile() heatmap, the more nuanced prescription variations become much clearer. However, the default color parameters in Copilot's Output 4 do not show this as clearly. I adjusted these color parameters to make them more robust to color vision deficiencies, so that they are easy to distinguish for all viewers and the actual variations between the prescribed medications stand out. I also added data labels to the final plot.
Finally, pay attention to the very important difference in the aes() specifications between the 2 plots, which is where Copilot's Output 3 goes off track and needs a refined prompt: for the geom_col() plot on the left, the fill parameter is the value, or the levels of the medication, while for the geom_tile() plot on the right, it is the actual percentage. The legend to the right of each plot shows this.
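The difference between the two aes() specifications can be seen in isolation on a small example. This is a minimal sketch, assuming a toy data frame (`toy`, with made-up percentages) shaped like `meds`. The object names `p_col` and `p_tile` are illustrative only.

```r
library(ggplot2)

# Toy data (hypothetical values): percent of encounters per medication and level
toy <- data.frame(
  medication = rep(c("insulin", "metformin"), each = 2),
  value      = rep(c("No", "Steady"), times = 2),
  pct        = c(47, 53, 80, 20)
)

# geom_col(): medication on an axis, pct as bar length, value as the fill
p_col <- ggplot(toy, aes(x = medication, y = pct, fill = value)) +
  geom_col() +
  coord_flip()

# geom_tile(): both categorical fields on the axes, pct as the fill
p_tile <- ggplot(toy, aes(x = value, y = medication, fill = pct)) +
  geom_tile()
```

In the first plot the fill is discrete, one color per medication level. In the second it is continuous, which is what makes the tile plot read as a heatmap.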
4. What's next
We will continue to showcase the use of AI agents in our healthcare analytics workflow with the tidyverse, to show how health technology stakeholders in this industry can benefit, especially given the wealth of data in our health systems. In my opinion, using this data effectively, with the right tools, to extract insights that influence care delivery decisions is not only our end goal but also crucial for advancing the use of technology in healthcare.