How AI Coding Assistants Can Impact Your Coding and Analysis

[This article was first published on Seascapemodels, and kindly contributed to R-bloggers.]



I am a frequent user of AI coding assistants to speed up my coding. I usually use GitHub Copilot and the Roo Code agentic extension for VSCode, with several large language models (LLMs).

We hear a lot about bias in LLMs, which made me wonder: what biases could using coding assistants introduce into my coding and data analysis?

Bias is a concern in other workplace applications of LLMs. For example, LLMs are known to be sexist, biased toward recommending scientific articles from certain parts of the world, and even biased toward Elon Musk's opinions.

I had thought that using LLMs primarily to support R code would mean that I would be fairly immune to bias issues in LLMs.

But after a recent experience with Python, I realized that there are subtle types of biases in LLMs that can have a big impact on the quality of an analysis.

I had considered that LLMs might recommend certain types of statistical tests more often than others, perhaps giving me a preference for the dominant statistical methodologies (just as they seem to recommend frequentist rather than Bayesian generalized linear models more often).
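For example, ask an assistant for a generalized linear model and it will usually offer the frequentist default before a Bayesian alternative. Here is a minimal R sketch of the two routes, using a toy dataset and rstanarm as one possible Bayesian option (both the data and the package choice are my own illustrative assumptions):

    # Illustrative toy data: counts across three treatments
    dat <- data.frame(
      counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
      treatment = gl(3, 3)
    )

    # Frequentist Poisson GLM (base R) -- the version an assistant tends to suggest first
    fit_freq <- glm(counts ~ treatment, family = poisson(), data = dat)
    summary(fit_freq)

    # Bayesian equivalent with default priors -- rarely offered unprompted
    # (rstanarm is one option; brms would work just as well)
    library(rstanarm)
    fit_bayes <- stan_glm(counts ~ treatment, family = poisson(), data = dat)
    summary(fit_bayes)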

But I didn’t worry about that. I have a fairly broad view of the statistics in my field, and I don’t usually use LLMs to recommend new methods. I look to the scientific literature for that advice.

LLMs may have their own specific ways of implementing code, but the end result of any given statistical test is the same; it doesn't matter much how you get there.

Using Python has changed my view on this. I’m a lifelong R coder, not a Python coder. However, I wanted to learn some deep learning analytics, for which I needed Python. So I started using my coding assistant to write a tutorial for me on how to learn deep learning in Python.

One way LLM coding assistants affect your code is that most are set up to be sycophantic, so they reinforce your own preconceptions by confirming that everything you already have is good.

This became clear when I, the R coder, brought my R-informed ideas for writing code to Python. The LLM led me down paths for implementing code that make sense in R-land but are not best practice in Python.

I realized this when I started referencing actual Python tutorials.

More generally, LLMs tend to go along with your requests. For a data analysis workflow, this means they tend to reinforce the biases you start with.

Errors that cause the code to fail are relatively harmless because they are easy to detect. More worrying are subtle errors, such as in the choice of metrics, which the LLM can amplify rather than question.
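To illustrate the kind of subtle error I mean (this is my own hypothetical example, not one from the post): code that evaluates a model of a rare event with plain accuracy will run without complaint and look impressive, even when the model has no skill at all.

    # Hypothetical example: a metric choice that silently misleads
    set.seed(1)
    y_true <- rbinom(1000, 1, 0.05)   # rare event, roughly 5% presences
    y_pred <- rep(0, 1000)            # a "model" that always predicts absence

    accuracy    <- mean(y_pred == y_true)                              # ~0.95, looks great
    sensitivity <- sum(y_pred == 1 & y_true == 1) / sum(y_true == 1)   # 0, detects nothing

    accuracy
    sensitivity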

I have a pre-print on using LLMs and AI coding assistants for ecological data analysis. One of our recommendations is to split your workflow into parts.

It especially helps to decide on the analysis before you plan how the code will be written (and which software or packages you will use). When you combine these two decisions, you are more likely to make mistakes that compound. It is best to choose your analysis with input from AI, but also by reading the literature, and only then ask the AI to help you implement it.

The other problem comes when you have finished your analysis and the LLM has led you to believe it is excellent. You submit it for peer review and the reviewers think differently.

So it's a good idea to test your work against your discipline's standards, and ideally to have expert colleagues look at it as well. Take everything an LLM says with a grain of salt, especially if it agrees with you.

