Do you want to share your content on R-bloggers? Click here if you have a blog, or here If you don’t.
Introduction
In the first part From this series I looked at the use of R in Excel to obtain descriptive statistics. In this second part of the series I will look at the use of R in Excel to perform linear regression, specifically with the help of the lm() function. lm() is a real workhorse function. It can be used to perform both a few and multiple regression and different types of variance analyzes. For this demonstration I will only concentrate on a few and multiple regression.
The workbook for this part of the series is: “Part II – R in Excel – Linear regression.xlsx”. As earlier, the worksheet contains ‘references’ links to external references. The worksheet ‘Libraries’ loads extra (non-default) packages. In this demonstration I use the datarium And broom Packages. The ‘Datasets’ worksheet contains the data that is referred to in the worksheets.
The data
The data set is the marketing data set of the datarium package. It consists of a data frame with the impact of three advertising media (YouTube, Facebook and newspaper) on sale. Data is the advertising budget in thousands of dollars together with the sale. The advertisement experiment has been repeated 200 times. The data set is described here.
If the data set is not available, check that the datarium Library is loaded by going to the worktop of the libraries and evaluating:
=RScript.Evaluate(library(datarium), TRUE)
Simple linear regression
We make a simple linear model in the simple linear regression worktop.
smodel <- lm(sales ~ youtube, data = marketing)
As we did in the first part of this series, we unpack the ‘Smodel’. Usually this is just a matter of evaluating the corresponding R objects, for example the residues and the coefficients. We also ask for the reliability intervals of 95% for the coefficients.
![]()
We can currently compare the results with the summary output that is supplied by the regression function of the analysis tool suit (including the reliability intervals of 95%).
It is worth emphasizing that the output returned to Excel is not ‘formatted’ via the add -in, such as in R/Rstudio. When we call summary(smodel) In R we get a well -known table output that summarizes the most important characteristics of the model. However, what is sent back to Excel via the add -in is easier. So it’s worth looking at this. We can see from the ADD environment that the smodel_summary is a list of 11 items. To use this, we must unpack some of the individual list items.
![]()
Here we can see the actual call and the details of the model including the statistics (sigma” r-squared and so forth). It needs some extra work to unpack the summary data of the model residues to produce a table output.
as.data.frame(as.array(summary(smodel$residuals)))
This formula summarizes the model residues and first forces the data in an array (to get the names) and then as a data frame. We perform a similar processing to get the coefficients in a table layout.
With all the data available in this form, and with a little effort we can now construct a summary output that is comparable to R.
![]()
Multiple linear regression
The following worksheet shows several linear regression with the same marketing data. This is comparable to the previous worksheet and illustrates how relevant values can be removed from the returned model data.
In this case, however, we use the package broom. This can help tidy up the output data.
![]()
In earlier examples we have extracted model results by combining the returned model name with labels of the model. To make it easier to pick up results, the add -in offers some extra functions for searching for models: Model.Results performs a list of results from the model. Model.Result Performs the result that has been obtained from one item from the list of model results. Optionally, the result can be made as a data frame. This is a bit more convenient than evaluating scripts of the form model name'$coeffcientsetc. By extracting the model data and the summary, we can construct an output that is comparable to that of R.
Logistics
The last example shows logistics regression. At this point you may think that this massage and extracting output data is not entirely satisfactory. It is a lot of work and it can be pretty brittle (if the cell references change, for example). Moreover, the models are becoming more complicated (see Part III in the series) This approach may become more messier. To reduce this, the add-in offers a few wrapper functions around regression. These help with both the setup and the export of more complex models.
![]()
The Modelingangen are specified as a block of parameters and the function Regression.GLM is used instead of the R -Script -equivalent using glm. The model outputs can be ‘requested’ using the Model.Results and the Model.Result Formulas. Even with this, however, there is some effort to perform a summary that is comparable to that of R.
Pack
In this message we looked at how R in Excel to use linear regression, and we spent some time demonstrating how the various components can be removed from the model data. This will be useful in the following two parts of this series.
Related
#Part #Excel #Linear #regression #RBloggers


