Every time there is a new major update from The R Foundation (like the recent 3.6.0 release in April), I’m always happy to see the continuing progress and the combination of new features and bug fixes, but I also dread the upgrade because it means I have to address the issue of what to do about the burgeoning number of packages (libraries) I have installed.
Up until now I confess I have simply sort of “winged it”: done the upgrade and either manually thought about what packages I “really” needed or just grabbed a few essentials and then let my needs dictate whatever else I reloaded. This time I decided to get serious about the process and pay attention not only to what I was doing but also to documenting it and keeping a record via some amount of coding (and this post).
I’m aware that there are full-fledged package managers like packrat and checkpoint, and even a package designed to manage the upgrade for you on Windows, but I’m a Mac user, I wanted to do things my own way, and I don’t need that level of sophistication.
So I set out to do the following:
- Capture a list of everything I had installed under R 3.5.3 and, very importantly, as much as I could about where I got each package, e.g. CRAN or GitHub or ???
- Keep a copy for my own edification and potential future use.
- Do a clean R 3.6.0 install and not copy any library directories manually.
- Take a look at the list I produced in the first step, but mainly use it to download and install the exact same packages if I can find them.
- Make the process mainly scripted and automatic and available again for the future.
Helpful background
As I was searching the web I found a few helpful posts that saved me time in building my own solution. The primary one was this post on Stack Overflow. I wanted to extend the function listed there to do a little more of my work for me. Instead of just being able to generate a listing of what I had installed from GitHub, I wanted to be able to determine most of the places I get packages from, which are CRAN, GitHub and R-Forge.
So let’s load tidyverse to have access to all its various functions and features and then build a dataframe called allmypackages with the basic information about the packages I currently have installed in R 3.5.3.
Note: I’m writing this after already upgrading, so there will be a few inconsistencies in the output.
- This could just as easily be a tibble, but I chose as.data.frame.
- I am deliberately removing base packages from the dataframe with filter.
- I am eliminating columns I really don’t care about with select.
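The code for this step isn’t reproduced here, so what follows is a minimal sketch of what it might look like given the notes above. It loads only dplyr (the tidyverse package that supplies filter() and select()), and the particular set of columns kept by select() is my assumption.

```r
library(dplyr)

allmypackages <- installed.packages() %>%
  # a plain data frame rather than a tibble; keep strings as character
  as.data.frame(stringsAsFactors = FALSE) %>%
  # drop base packages; packages with no Priority (NA) are kept
  filter(is.na(Priority) | Priority != "base") %>%
  # keep only the columns of interest (this selection is an assumption)
  select(Package, LibPath, Version, Priority, Built)
```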
A function to do the hard work
As I mentioned above, the Stack Overflow post was a good start, but I wanted more information from the function. Rather than a TRUE/FALSE answer to “is it from GitHub?”, I would like as much information as possible about where I got the package. The package_source function will be applied to the Package column for each row of our dataframe. For example, as.character(packageDescription('ggplot2')$Repository) will get back “CRAN”, and as.character(packageDescription('CHAID')$Repository) will yield “R-Forge”. For GitHub packages the result is character(0), which has a length of zero, so we’ll test with an if else clause. If we get an answer like “CRAN” we’ll just return it. If not, we’ll see if there is a GitHub repo listed with as.character(packageDescription(pkg)$GithubRepo) as well as a GitHub username with as.character(packageDescription(pkg)$GithubUsername). If they exist we’ll concatenate and return them. If not, we’ll return “Other”. Besides being good defensive programming, this may catch a package you have built for yourself, as is the case for me.
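A minimal sketch of a helper along those lines, assuming the package is installed (packageDescription() returns NA with a warning otherwise). The “username/repo” format used when concatenating the GitHub fields is an assumption, chosen because it is the form devtools::install_github() accepts.

```r
package_source <- function(pkg) {
  desc <- packageDescription(pkg)
  # CRAN and R-Forge installs record a Repository field, e.g. "CRAN";
  # a missing field is NULL, and as.character(NULL) is character(0)
  repo <- as.character(desc$Repository)
  if (length(repo) > 0) {
    return(repo)
  }
  # GitHub installs record the repo name and the username instead
  github_repo <- as.character(desc$GithubRepo)
  github_user <- as.character(desc$GithubUsername)
  if (length(github_repo) > 0 && length(github_user) > 0) {
    # assumed output format: "username/repo"
    return(paste(github_user, github_repo, sep = "/"))
  }
  # anything else: locally built packages, base packages, ...
  "Other"
}

# a base package such as stats has no Repository or GitHub fields
package_source("stats")
```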
What’s in your libraries?
Now that we have the package_source function we can add a column to our data frame and do a little looking.
And just to be on the safe side we’ll also write a copy out as a csv file so we have it around in case we ever need to refer back.
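One way the add-a-column-and-save step could look, sketched with base R only. To keep the block self-contained it uses a simplified inline stand-in for package_source() that reads just the Repository field; the whereat column name and the file-naming scheme are hypothetical, and tempdir() is used only so the demo doesn’t litter your working directory.

```r
# Non-base installed packages
pkgs <- installed.packages()[, c("Package", "Version", "Priority"), drop = FALSE]
snapshot <- as.data.frame(pkgs, stringsAsFactors = FALSE)
snapshot <- snapshot[is.na(snapshot$Priority) | snapshot$Priority != "base", ]

# Simplified stand-in for package_source(): only reads the Repository field
snapshot$whereat <- vapply(snapshot$Package, function(p) {
  repo <- packageDescription(p)$Repository
  if (is.null(repo)) "Other" else as.character(repo)
}, character(1), USE.NAMES = FALSE)

# How many packages came from each source?
print(table(snapshot$whereat))

# Hypothetical file name that records which R version the list came from
csv_file <- file.path(tempdir(), paste0("installed-packages-R", getRversion(), ".csv"))
write.csv(snapshot, csv_file, row.names = FALSE)

# Read it back with every column as character
reloaded <- read.csv(csv_file, colClasses = "character")
```

Reading the file back with colClasses = "character" is a small safeguard: if every entry in the Version column happened to look like a plain number, read.csv() would otherwise convert the column, and a version such as "1.10" would come back as 1.1.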
Go ahead and install R 3.6.0
At this point we have what we need, so go ahead and download and install R 3.6.0. At the end of the installation process you’ll have a pristine copy with a new library directory. The next time you restart R and RStudio you’ll see a clean new version. Let’s make use of our data frame to automate most of the process of getting nice clean copies of the libraries we want.
We’ll start by getting the entire tidyverse, since we need several parts of it and because installing it will trigger the installation of quite a few dependencies and bootstrap our work.
Now we have R 3.6.0 and some additional packages. Let’s see what we can do. First let’s create two dataframes, one with our old list and one with what we have right now. Then we can use anti_join to make a dataframe called thediff that lists the differences. We can use filter and pull to generate a vector of just the CRAN packages we want to install.
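A sketch of the anti_join()/filter()/pull() step, wrapped in a hypothetical cran_to_install() helper and run on a small toy pair of data frames (illustrative values, not my real package list). The whereat column is assumed to hold whatever package_source() returned for each package.

```r
library(dplyr)

# Hypothetical helper: CRAN packages present in the old list but missing now
cran_to_install <- function(oldpackages, nowpackages) {
  oldpackages %>%
    anti_join(nowpackages, by = "Package") %>%  # rows only in the old list
    filter(whereat == "CRAN") %>%               # keep the CRAN ones
    pull(Package)                               # plain character vector
}

# Toy data for illustration only
old <- data.frame(
  Package = c("ggplot2", "dplyr", "CHAID", "slopegraph"),
  whereat = c("CRAN", "CRAN", "R-Forge", "leeper/slopegraph"),
  stringsAsFactors = FALSE
)
now <- data.frame(Package = "dplyr", stringsAsFactors = FALSE)

cran_to_install(old, now)
```

With this toy input only "ggplot2" survives: dplyr is already installed, and CHAID and slopegraph are not from CRAN.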
**Note: I’m faking the output rather than reinstalling all these packages on my machine, so you will see packages from the tidyverse in the listing.**
Just do it!
Now that you have a nice automated list of everything that is a CRAN package, you can give it a final look and see if there is anything else you’d like to filter out. Once you are sure the list is right, one final pipe will set the process in motion.
Depending on the speed of your network connection and the number of packages you have, that will run for a few minutes.
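The final step itself is just install.packages() applied to that vector. A slightly more defensive variant, the hypothetical install_missing() helper below, first drops anything that is already installed, so the script can be rerun after an interrupted session without downloading everything again.

```r
# Hypothetical helper: install only the packages that are not yet present
install_missing <- function(pkgs, ...) {
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing, ...)  # extra arguments, e.g. repos, passed through
  }
  invisible(missing)  # report what was (or would have been) installed
}

# Already-installed packages are skipped, so nothing is downloaded here
install_missing(c("stats", "utils"))
```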
That takes care of our CRAN packages. What about GitHub?
Here’s another chance to review what you have and whether you still want or need these packages. I could automate the process and once again feed the right vector to devtools::install_github(), but instead I choose to handle these manually, as in devtools::install_github('leeper/slopegraph').
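If you do want to automate the GitHub side, here is a sketch that assumes GitHub packages were recorded in the source column as “username/repo” strings, while everything else holds a plain label such as “CRAN”, “R-Forge” or “Other”. The github_specs() helper is hypothetical, and the actual install loop is left commented out because it needs devtools and network access.

```r
# Hypothetical helper: keep only entries shaped like "username/repo"
github_specs <- function(whereat) {
  unique(grep("^[^/]+/[^/]+$", whereat, value = TRUE))
}

specs <- github_specs(c("CRAN", "leeper/slopegraph", "R-Forge", "Other"))

# To install them all (requires devtools and network access):
# for (spec in specs) devtools::install_github(spec)
```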
Same with the one package I get from R-Forge…
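For R-Forge the usual route is install.packages() pointed at the R-Forge repository. The hypothetical install_from_rforge() wrapper below also lists a CRAN mirror (cloud.r-project.org is my choice here) so that any dependencies that live on CRAN can be resolved in the same call. The call itself is commented out since it needs network access.

```r
# Hypothetical wrapper: R-Forge first, with CRAN as a fallback for dependencies
install_from_rforge <- function(pkgs) {
  install.packages(pkgs, repos = c("https://R-Forge.R-project.org",
                                   "https://cloud.r-project.org"))
}

# install_from_rforge("CHAID")
```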
At the end of this process you should have a nice clean R install that has all the packages you choose to maintain, as well as a detailed listing of what those are.
Done!
I hope you’ve found this useful. I am always open to comments, corrections and suggestions.
Chuck (ibecav at gmail dot com)