Robust data management and statistical analysis have become increasingly important in horticultural science. Greater complexity in experimental design as well as the rise in “-omics” technologies (e.g., genomics, metabolomics, phenomics) have required the use of computers to execute customized and sophisticated analyses that draw from a large amount of information. Computers must also meet the demand to store and curate large amounts of data generated through experimental equipment, sensors, or surveys. The amount of these data across all scientific disciplines has been expanding exponentially since the 1990s, and the data often require uniform curation and detailed documentation to be shared and analyzed across research groups (Howe et al., 2008). This uniformity in both data curation and analysis improves repeatability and allows others to make use of the data and data products to further support scientific discovery.
Many computer programs are available to perform statistical analysis. Programs vary in their user interface (text, graphics); programming language, including C (Bell Laboratories, Murray Hill, NJ), R (University of Auckland, Auckland, New Zealand), or Python (Centrum Wiskunde & Informatica, Amsterdam, The Netherlands); and licensing (proprietary, open source). Although requests for statistical computing skills in job postings have increased broadly since 2005, postings seeking R programming skills increased by more than 400%, underscoring this software’s significance in data-oriented positions. In comparison, postings seeking SAS (SAS Institute, Cary, NC) skills increased by just 25% during this period (Indeed.com, 2016). R uses a text-based, command-line interface and is run through its own interpreted computing language, also called R, which promotes consistent style and documentation. In addition to the command-line interface, a graphical user interface and integrated development environment is available for R (RStudio, Boston, MA), which makes R accessible to beginners and powerful for advanced users. R has attracted strong interest in the biological sciences because of its open-source nature and an active development community that spans many sectors and disciplines. Such open-source software is free of charge, and anyone can participate in software development and contribute program packages containing custom functions and operations (Ihaka and Gentleman, 1996). Numerous packages relevant to the agricultural sciences exist, such as those for data management [dplyr (Wickham, 2015)], elaborate graphing [ggplot2 (Wickham, 2009)], analysis of genomic data [Bioconductor (Huber et al., 2015)], or mixed linear model analysis [lme4 (Bates et al., 2015)]. Despite open access to R and its many resources, programming in R requires users to make a large early investment in learning about software development in a general programming environment.
We designed a 90-min workshop to introduce horticultural scientists to basic computer programming with R and to help beginning users navigate this learning curve. This workshop was inspired by the data management education methods developed by Data Carpentry (Teal et al., 2015a). We created a slide presentation as well as an R script that was distributed to participants to facilitate their learning during the workshop. The workshop specifically addressed the following points: data structures and workflow; how to find help and install additional open-source packages; and how to import, subset, and export data. The workshop took place on 4 Aug. 2015 at the ASHS Annual Conference in New Orleans, LA, and was sponsored jointly by the Computer Applications in Horticulture and Graduate Student working groups.
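The import, subset, and export steps named above can be illustrated with a minimal sketch in base R. This is not the script distributed at the workshop; the data frame, its column names, and its values are hypothetical and invented purely for illustration, and temporary files stand in for a real data file.

```r
# Hypothetical trial data (invented for illustration only).
trial <- data.frame(
  cultivar = c("A", "A", "B", "B"),
  block    = c(1, 2, 1, 2),
  yield_kg = c(12.5, 14.0, 9.8, 11.2)
)

# Write the data to a temporary CSV file so the import step has a file to read.
in_file <- tempfile(fileext = ".csv")
write.csv(trial, in_file, row.names = FALSE)

# Import: read the CSV into a data frame and inspect its structure.
dat <- read.csv(in_file, stringsAsFactors = FALSE)
str(dat)

# Finding help and installing packages (run interactively; left as comments here):
# ?read.csv
# install.packages("dplyr")

# Subset: keep only the rows for cultivar "A" and two of the three columns.
sub <- dat[dat$cultivar == "A", c("cultivar", "yield_kg")]

# Export: write the subset to a new CSV file.
out_file <- tempfile(fileext = ".csv")
write.csv(sub, out_file, row.names = FALSE)
```

In practice, `in_file` and `out_file` would be paths to files in the project directory rather than temporary files.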
Howe, D., Costanzo, M., Fey, P., Gojobori, T., Hannick, L., Hide, W., Hill, D.P., Kania, R., Schaeffer, M., St Pierre, S. & Twigger, S. 2008 Big data: The future of biocuration Nature 455 47 50
Huber, W., Carey, V.J., Gentleman, R., Anders, S., Carlson, M., Carvalho, B.S., Bravo, H.C., Davis, S., Gatto, L., Girke, T. & Gottardo, R. 2015 Orchestrating high-throughput genomic analysis with Bioconductor Nat. Methods 12 115 121
Indeed.com 2016 R statistics, SAS statistics, and SPSS statistics job trends. 1 Aug. 2015. <http://www.indeed.com/jobtrends/Sas%2CR%2CSPSS.html>
R Project 2015 The R Project for Statistical Computing. 4 Aug. 2015. <https://www.r-project.org/>
Susko, A.Q. & Brym, Z.T. 2015 2015 ASHS computing workshop materials. 4 Aug. 2015. <https://figshare.com/articles/2015_ASHS_Computing_Workshop_Materials/2068122>
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K. & Pawlik, A. 2015a Data carpentry: Workshops to increase data literacy for researchers Intl. J. Digital Curation 10 135 143
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K. & Pawlik, A. 2015b Quick reference sheet. 4 Aug. 2015. <http://www.datacarpentry.org/semester-biology/materials/r-intro/>
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K. & Pawlik, A. 2015c Self-guided student materials. 4 Aug. 2015. <http://www.datacarpentry.org/semester-biology/START-for-self-guided-students/>
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K. & Pawlik, A. 2015d Teaching materials for instructors. 4 Aug. 2015. <http://www.datacarpentry.org/lessons/>
White, J.W., Beattie, D.J. & Kubek, P. 1990 Inquiry learning with videodiscs and computers: An innovative teaching method for horticulture courses HortScience 25 385 388
Wickham, H. 2009 ggplot2: Elegant graphics for data analysis. 1st ed. Springer, Berlin, Germany
Wickham, H. 2015 dplyr: A grammar of data manipulation. 4 Aug. 2015. <https://cran.r-project.org/web/packages/dplyr/dplyr.pdf>
Wilson, G., Aruliah, D.A., Brown, C.T., Chue Hong, N.P., Davis, M., Guy, R.T., Haddock, S.H.D., Huff, K.D., Mitchell, I.M., Plumbley, M.D. & Waugh, B. 2014 Best practices for scientific computing PLoS Biol. 12 e1001745