R Coding Style Best Practices - Datanovia (2024)


06 Mar

R Coding Style Best Practices

Alboukadel | R Programming

This article describes the essentials of R coding style best practices. It’s based on the tidyverse style guide. Google’s current guide is also derived from the tidyverse style guide.

Two important R packages are available to help you apply the R coding style best practices:

  • styler allows you to interactively restyle selected text, files, or entire projects. It includes an RStudio add-in, the easiest way to re-style existing code.

The goal of styler is to provide non-invasive pretty-printing of R source code while adhering to the tidyverse formatting rules. It can be installed using the following R code: install.packages("styler"). Key functions are:

  • style_file(): styles .R, .Rmd, .Rnw, and .Rprofile files.
  • style_dir(): styles all .R and/or .Rmd files in a directory.
  • style_pkg(): styles the source files of an R package.

The styler functionality is also available through other tools, most notably usethis::use_tidy_style(), which styles your project according to the tidyverse style guide.
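As a minimal sketch of how styler might be called from the console, assuming the package is installed. The sample string and the file name analysis.R are hypothetical, and style_text() is a styler function for restyling a character string:

```r
# Sketch only: requires the styler package; skipped when it is missing.
if (requireNamespace("styler", quietly = TRUE)) {
  # Restyle a snippet held in a character string (nothing is written to disk).
  styled <- styler::style_text("x=c(1,2)")
  print(styled)

  # Restyle a file in place; "analysis.R" is a hypothetical path.
  # styler::style_file("analysis.R")
}
```

Note that style_file(), style_dir() and style_pkg() overwrite files in place, so it is safer to run them on code that is under version control.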

  • lintr performs automated checks to confirm that you conform to the style guide. It can be installed using the following R code: install.packages("lintr").

The lintr lints are automatically displayed in the RStudio Markers pane (RStudio versions > v0.99.206). To show the “Markers” pane in RStudio, open the menu “Tools” -> “Global Options…”; a window titled “Options” will pop up. In that window, click “Code” on the left, click the “Diagnostics” tab, and check “Show diagnostics for R”.

To lint a source file test.R type in the Console lintr::lint("test.R") and look at the result in the “Markers” pane.
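The same call can be tried without an existing test.R. The following is a minimal sketch, assuming lintr is installed; the temporary file and its one-line content are made up for illustration:

```r
# Sketch only: requires the lintr package; skipped when it is missing.
if (requireNamespace("lintr", quietly = TRUE)) {
  # Write a deliberately badly styled line to a temporary file.
  path <- tempfile(fileext = ".R")
  writeLines("x = c(1,2)", path)

  # lint() returns the style problems it found; printing lists them.
  lints <- lintr::lint(path)
  print(lints)
}
```

Outside RStudio, the lints are simply printed to the console instead of appearing in the “Markers” pane.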

This package also includes two addins, one for linting the current source file and one for linting the package. To bind an addin to a keyboard shortcut, navigate to Tools > Addins > Browse Addins > Keyboard Shortcuts. It’s recommended to use Alt+Shift+L for linting the current source code and Ctrl+Shift+Alt+L to lint the package. These are easy to remember: Alt+Shift+L(int) 😉

In this tutorial, you will learn best practices for:

  • File naming and content structures
  • Variables and object naming conventions
  • R syntax and pipes best practices


Contents:

  • Files
    • File naming conventions
    • File content structure
  • Syntax
    • Object naming convention
    • Spacing
    • Argument names
    • Code blocks
    • Long lines
    • Assignment
    • Semicolons
    • Quotes
    • Comments
  • Functions
    • Function naming convention
    • Long line function definition
    • return() function
  • Pipes
    • Introduction
    • Whitespace
    • Long lines
    • Short pipes
  • References

Files

File naming conventions

File names should be meaningful and end in .R. Avoid using special characters in file names - stick with numbers, letters, -, and _.

# Good
fit_models.R
utility_functions.R

# Bad
fit models.R
foo.r
stuff.r

If files should be run in a particular order, prefix them with numbers. If it seems likely you’ll have more than 10 files, left pad with zero:

00_download.R
01_explore.R
...
09_model.R
10_visualize.R

If you later realise that you’ve missed some steps, it’s tempting to use 02a, 02b, etc. However, it’s generally better to bite the bullet and rename all files.
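Zero-padded prefixes like these can also be generated rather than typed by hand. A minimal sketch in base R, where the step names are hypothetical:

```r
# Hypothetical step names for an analysis pipeline.
steps <- c("download", "explore", "model", "visualize")

# "%02d" left-pads the step number with a zero to a width of two digits.
file_names <- sprintf("%02d_%s.R", seq_along(steps) - 1, steps)
file_names
# "00_download.R" "01_explore.R" "02_model.R" "03_visualize.R"
```

Regenerating the names this way is one option when a step has to be inserted and every later file needs a new number.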

File content structure

  • Load all required packages at the very beginning of the file
  • Use commented lines of - and = to break up your file into easily readable chunks.

# Load data ---------------------------

# Plot data ---------------------------

Syntax

Object naming convention

  • Use only lowercase letters and numbers.
  • Use underscores (_) to separate words within a name (so-called snake case).
  • Use names that are concise and meaningful (this is not easy!).
  • Generally, variable names should be nouns and function names should be verbs.
  • Where possible, avoid re-using names of common functions and variables. This will cause confusion for the readers of your code.
  • If you find yourself attempting to cram data into variable names (e.g. model_2018, model_2019, model_2020), consider using a list or data frame instead.

# Examples of variable names ------------
# Good
day_one
day_1

# Bad
DayOne
dayone
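The advice about not cramming data into variable names can be sketched as follows. The data frame sales and its columns are hypothetical; the point is only that one named list replaces model_2018, model_2019 and model_2020:

```r
# Hypothetical data: one row per observation, with the year as a column.
sales <- data.frame(
  year = rep(c(2018, 2019, 2020), each = 3),
  x = rep(1:3, times = 3),
  y = c(2, 4, 6, 3, 6, 9, 4, 8, 12)
)

# Fit one model per year and keep them together in a named list,
# instead of creating model_2018, model_2019 and model_2020.
models <- lapply(split(sales, sales$year), function(d) lm(y ~ x, data = d))

names(models)
coef(models[["2019"]])
```

Adding another year then only requires more rows in the data, not another variable name.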

Spacing

Commas

Always put a space after a comma, never before, just like in regular English.

# Good
x[, 1]

# Bad
x[,1]
x[ ,1]
x[ , 1]

Parentheses

Do not put spaces inside or outside parentheses for regular function calls.

# Good
mean(x, na.rm = TRUE)

# Bad
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )

Place a space before and after () when used with if, for, or while.

# Good
if (debug) {
  show(x)
}

# Bad
if(debug){
  show(x)
}

Place a space after () used for function arguments:

# Good
function(x) {}

# Bad
function (x) {}
function(x){}

Infix operators

Most infix operators (==, +, -, <-, etc.) should always be surrounded by spaces:

# Good
height <- (feet * 12) + inches
mean(x, na.rm = TRUE)

# Bad
height<-feet*12+inches
mean(x, na.rm=TRUE)

There are a few exceptions to this rule: ::, :::, $, @, [, [[, ^, unary -, unary +, and :.

# Good
sqrt(x^2 + y^2)
df$z
x <- 1:10

# Bad
sqrt(x ^ 2 + y ^ 2)
df $ z
x <- 1 : 10

Extra spaces

Adding extra spaces is OK if it improves the alignment of = or <-.

# Good
list(
  total = a + b + c,
  mean  = (a + b + c) / n
)

# Also fine
list(
  total = a + b + c,
  mean = (a + b + c) / n
)

Do not add extra spaces to places where space is not usually allowed.

Argument names

A function’s arguments typically fall into two broad categories: one supplies the data to compute on; the other controls the details of computation. When you call a function, you typically omit the names of data arguments, because they are used so commonly. If you override the default value of an argument, use the full name:

# Good
mean(1:10, na.rm = TRUE)

# Bad
mean(x = 1:10, , FALSE)
mean(, TRUE, x = c(1:10, NA))

Avoid partial matching.
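To illustrate what partial matching looks like, here is a small base R sketch. It relies on seq() accepting len as an abbreviation of length.out; the option warnPartialMatchArgs is a standard R option that should make such abbreviations raise a warning:

```r
# Preferred: spell out the full argument name.
full <- seq(1, 10, length.out = 4)

# Partial matching: `len` is silently expanded to `length.out`.
partial <- seq(1, 10, len = 4)

# Both calls give the same result, which is why the shortcut goes unnoticed.
identical(full, partial)

# Optional: ask R to warn whenever an argument is partially matched.
old <- options(warnPartialMatchArgs = TRUE)
# seq(1, 10, len = 4) should now emit a warning.
options(old)  # restore the previous setting
```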

Code blocks

Curly braces, {}, define the most important hierarchy of R code. To make this hierarchy easy to see:

  • { should be the last character on the line. Related code (e.g., an if clause, a function declaration, a trailing comma, …) must be on the same line as the opening brace.
  • The contents should be indented by two spaces.
  • } should be the first character on the line.

# Good
if (y < 0 && debug) {
  message("y is negative")
}

if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y^x
}

# Bad
if (y < 0 && debug) {
message("Y is negative")
}

if (y == 0)
{
    if (x > 0) {
      log(x)
    } else {
  message("x is negative or zero")
    }
} else { y ^ x }

Long lines

  • Strive to limit your code to 80 characters per line.
  • If a function call is too long to fit on a single line, use one line each for the function name, each argument, and the closing ). This makes the code easier to read and to change later.
  • You can place several arguments on the same line if they are closely related to each other.

# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# Bad
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )

Assignment

Use <-, not =, for assignment.

# Good
x <- 5

# Bad
x = 5

Semicolons

Don’t put ; at the end of a line, and don’t use ; to put multiple commands on one line.
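A short illustration of the rule, written in the same Good/Bad form as the other examples; the variable names are arbitrary:

```r
# Good
x <- 1
y <- 2

# Bad
x <- 1; y <- 2
x <- 1;
```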

Quotes

Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.

# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'

# Bad
'Text'
'Text with "double" and \'single\' quotes'

Comments

If you need comments to explain what your code is doing, consider rewriting your code to be clearer. If you discover that you have more comments than code, consider switching to R Markdown.

Functions

Function naming convention

Use verbs for function names:

# Good
add_row()
permute()

# Bad
row_adder()
permutation()

Long line function definition

If a function definition runs over multiple lines, indent the second line to where the definition starts.

# Good
long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

# Bad
long_function_name <- function(a = "a long argument",
  b = "another argument",
  c = "another long argument") {
  # Here it's hard to spot where the definition ends and the
  # code begins
}

return() function

  • Only use return() for early returns. Otherwise, rely on R to return the result of the last evaluated expression.
  • Return statements should always be on their own line.

# Good
find_abs <- function(x) {
  if (x > 0) {
    return(x)
  }
  x * -1
}
add_two <- function(x, y) {
  x + y
}

# Bad
add_two <- function(x, y) {
  return(x + y)
}

If your function is called primarily for its side-effects (like printing, plotting, or saving to disk), it should return the first argument invisibly. This makes it possible to use the function as part of a pipe. print methods should usually do this, like this example from httr:

print.url <- function(x, ...) {
  cat("Url: ", build_url(x), "\n", sep = "")
  invisible(x)
}
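To see why the invisible return matters in a pipe, consider this minimal sketch. The helper report_rows() is hypothetical, and the native pipe |> assumes R 4.1 or later (%>% from magrittr would behave the same way here):

```r
# Hypothetical helper called for its side effect (printing a message).
report_rows <- function(df) {
  message("Rows: ", nrow(df))
  invisible(df)  # hand the input back so a pipeline can continue
}

# The data frame flows through report_rows() unchanged and on to head().
first_rows <- mtcars |>
  report_rows() |>
  head(3)

nrow(first_rows)
```

Because the value is returned invisibly, calling report_rows(mtcars) on its own at the console prints the message but not the whole data frame.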

Pipes

Introduction

Use %>% to emphasise a sequence of actions.

Avoid using the pipe when:

  • You need to manipulate more than one object at a time. Reserve pipes for a sequence of steps applied to one primary object.
  • There are meaningful intermediate objects that could be given informative names.
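
The second case can be sketched in base R with the built-in mtcars data. The object names and the 3.5 weight cut-off are chosen only for illustration:

```r
# Intermediate results with informative names instead of one long pipe.
heavy_cars <- subset(mtcars, wt > 3.5)
mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = heavy_cars, FUN = mean)

mean_mpg_by_cyl
```

Each named object can be inspected or reused on its own, which a single pipe would hide.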

Whitespace

%>% should always have a space before it, and should usually be followed by a new line. After the first step, each line should be indented by two spaces.

# Good
iris %>%
  group_by(Species) %>%
  summarize_if(is.numeric, mean) %>%
  ungroup() %>%
  gather(measure, value, -Species) %>%
  arrange(value)

# Bad
iris %>% group_by(Species) %>% summarize_all(mean) %>%
ungroup %>% gather(measure, value, -Species) %>%
arrange(value)

Long lines

If the arguments to a function don’t all fit on one line, put each argument on its own line and indent:

iris %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width),
    Species = n_distinct(Species)
  )

Short pipes

A one-step pipe can stay on one line, but unless you plan to expand it later on, you should consider rewriting it to a regular function call.

# Good
iris %>% arrange(Species)

iris %>%
  arrange(Species)

arrange(iris, Species)

