Robust data management and statistical analysis have become increasingly important in horticultural science. Greater complexity in experimental design, together with the rise of “-omics” technologies (e.g., genomics, metabolomics, phenomics), has required the use of computers to execute customized and sophisticated analyses that draw on large amounts of information. Computers must also meet the demand to store and curate the large amounts of data generated by experimental equipment, sensors, or surveys. The volume of data across all scientific disciplines has been expanding exponentially since the 1990s, and these data often require uniform curation and detailed documentation to be shared and analyzed across research groups (Howe et al., 2008). This uniformity in both data curation and analysis improves repeatability and allows others to make use of the data and data products to further support scientific discovery.
Many computer programs are available to perform statistical analysis. Programs vary in their user interface (text, graphics); programming language, including C (Bell Laboratories, Murray Hill, NJ), R (University of Auckland, Auckland, New Zealand), or Python (Centrum Wiskunde & Informatica, Amsterdam, The Netherlands); and licensing (proprietary, open source). Although requests for statistical computing skills in job postings have increased broadly since 2005, postings seeking R programming skills increased by over 400%, underscoring this software’s significance in data-oriented positions. In comparison, there was just a 25% increase in postings seeking SAS (SAS Institute, Cary, NC) skills during this period (Indeed.com, 2016). R uses a text-based, command-line interface and is operated through its own programming language, also called R, which promotes consistent style and documentation. In addition to the command-line interface, a graphical user interface and integrated development environment are available for R (RStudio, Boston, MA), making R accessible to beginners and powerful for advanced users. R has attracted strong interest in the biological sciences because of its open-source nature and an active development community that spans many sectors and disciplines. Such open-source software is free of charge, and anyone can participate in software development and contribute packages containing custom functions and operations (Ihaka and Gentleman, 1996). Numerous packages relevant to the agricultural sciences exist, such as those for data management [dplyr (Wickham, 2015)], elaborate graphing [ggplot2 (Wickham, 2009)], analysis of genomic data [Bioconductor (Huber et al., 2015)], or mixed linear model analysis [lme4 (Bates et al., 2015)]. Despite the open access to R and its many resources, programming in R requires users to make a large early investment in learning about software development in a general programming environment.
To help beginning users navigate this learning curve, we designed a 90-min workshop to introduce horticultural scientists to basic computer programming with R. This workshop was inspired by the data management education methods developed by Data Carpentry (Teal et al., 2015a). We created a slide presentation and an R script, both of which were distributed to participants to facilitate their learning during the workshop. The workshop specifically addressed the following points: data structures and workflow; how to find help and install additional open-source packages; and how to import, subset, and export data. The workshop took place on 4 Aug. 2015 at the ASHS Annual Conference in New Orleans, LA, and was sponsored jointly by the Computer Applications in Horticulture and Graduate Student working groups.
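The workshop topics listed above can be illustrated with a minimal R sketch. This is not the script distributed at the workshop; the data frame `trial`, its column names, its values, and the file name `trial.csv` are hypothetical and were made up for illustration. The sketch uses only base R functions, and the help and package-installation calls are left commented out because they need an interactive session or internet access.

```r
# Hypothetical example data; all values are invented for illustration only.
trial <- data.frame(
  plot      = 1:6,
  cultivar  = c("A", "A", "B", "B", "C", "C"),
  height_cm = c(41.2, 39.8, 55.1, 53.4, 47.0, 48.3)
)

# Data structures: inspect the object's layout and column types.
str(trial)
class(trial$height_cm)

# Finding help (run these in an interactive session).
# ?read.csv
# help.search("linear model")

# Installing and loading a contributed package (requires internet access).
# install.packages("dplyr")
# library(dplyr)

# Export: write the data frame to a CSV file in a temporary directory.
csv_path <- file.path(tempdir(), "trial.csv")
write.csv(trial, csv_path, row.names = FALSE)

# Import: read the file back into a new data frame.
imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Subset by row condition and column name using bracket indexing.
cultivar_b <- imported[imported$cultivar == "B", c("plot", "height_cm")]
print(cultivar_b)

# Subset by a numeric condition using subset().
tall <- subset(imported, height_cm > 45)
nrow(tall)
```

Writing to `tempdir()` keeps the example from touching the user's working directory; in practice a project folder would be used so that data and scripts stay together.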
Howe, D., Costanzo, M., Fey, P., Gojobori, T., Hannick, L., Hide, W., Hill, D.P., Kania, R., Schaeffer, M., St Pierre, S., Twigger, S. 2008. Big data: The future of biocuration. Nature 455:47-50.
Huber, W., Carey, V.J., Gentleman, R., Anders, S., Carlson, M., Carvalho, B.S., Bravo, H.C., Davis, S., Gatto, L., Girke, T., Gottardo, R. 2015. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12:115-121.
Indeed.com. 2016. R statistics, SAS statistics, and SPSS statistics job trends. 1 Aug. 2015. <http://www.indeed.com/jobtrends/Sas%2CR%2CSPSS.html>
R Project. 2015. The R Project for Statistical Computing. 4 Aug. 2015. <https://www.r-project.org/>
Susko, A.Q., Brym, Z.T. 2015. 2015 ASHS computing workshop materials. 4 Aug. 2015. <https://figshare.com/articles/2015_ASHS_Computing_Workshop_Materials/2068122>
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K., Pawlik, A. 2015a. Data carpentry: Workshops to increase data literacy for researchers. Intl. J. Digital Curation 10:135-143.
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K., Pawlik, A. 2015b. Quick reference sheet. 4 Aug. 2015. <http://www.datacarpentry.org/semester-biology/materials/r-intro/>
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K., Pawlik, A. 2015c. Self-guided student materials. 4 Aug. 2015. <http://www.datacarpentry.org/semester-biology/START-for-self-guided-students/>
Teal, T.K., Cranston, K.A., Lapp, H., White, E., Wilson, G., Ram, K., Pawlik, A. 2015d. Teaching materials for instructors. 4 Aug. 2015. <http://www.datacarpentry.org/lessons/>
White, J.W., Beattie, D.J., Kubek, P. 1990. Inquiry learning with videodiscs and computers: An innovative teaching method for horticulture courses. HortScience 25:385-388.
Wickham, H. 2009. ggplot2: Elegant graphics for data analysis. 1st ed. Springer, Berlin, Germany.
Wickham, H. 2015. dplyr: A grammar of data manipulation. 4 Aug. 2015. <https://cran.r-project.org/web/packages/dplyr/dplyr.pdf>
Wilson, G., Aruliah, D.A., Brown, C.T., Chue Hong, N.P., Davis, M., Guy, R.T., Haddock, S.H.D., Huff, K.D., Mitchell, I.M., Plumbley, M.D., Waugh, B. 2014. Best practices for scientific computing. PLoS Biol. 12:e1001745.