These notes were co-written with a number of colleagues, including Greg Kiar, Eric Bridgeford, JB Poline, and Tristan Glatard. Inspired by the FAIR Guiding Principles for scientific data management and stewardship (see here), we devised the FIRM guidelines for scientific software development and stewardship, specifically for numerical packages. The FIRM guidelines stipulate that code should be Findable, Installable, Runnable, and Modifiable by anybody in the world. Below is a working draft of our ideas; as always, your feedback is solicited.
- To make your code findable, we recommend three steps:
- Make the code open source on a searchable code repository (e.g., GitHub or GitLab).
- Generate a permanent Digital Object Identifier (DOI) so that you can freely move the code to other web services, if you so desire, without breaking links (e.g., using Zenodo).
- Add a license so that others can freely use your code without worrying about legal ramifications (see here for options).
- To make your code installable, we recommend two steps:
- Provide installation guidelines, including one-line installation instructions with system requirements (hardware and OS), software dependencies, and expected install time.
- Deposit your code into a standard package manager, such as CRAN for R or PyPI for Python. You might also provide a container or virtual machine image with your package pre-installed, for example, using Docker, Singularity, or Gigantum.
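For a Python package, depositing to PyPI starts from packaging metadata. Below is a minimal `pyproject.toml` sketch; the package name, version, and dependency pin are hypothetical placeholders, and it assumes the common setuptools build backend:

```toml
[build-system]
requires = ["setuptools>=61"]          # assumed build backend
build-backend = "setuptools.build_meta"

[project]
name = "mynumpkg"                      # hypothetical package name
version = "0.1.0"
description = "A numerical package following the FIRM guidelines"
requires-python = ">=3.9"
dependencies = ["numpy>=1.24"]         # declare dependencies so installs are one line
```

With metadata like this in place, the one-line install instruction is simply `pip install mynumpkg`.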
- To make your code runnable, we recommend two steps:
- Provide a demo, including requisite data, expected results, and runtime on specified hardware. The demo should be simple, intuitive, and fast to run. We recommend using R Markdown for R and a Jupyter notebook for Python.
- Write a README with a quick-start guide, including installation instructions and a simplified (plain-text) version of the demo.
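A demo along these lines can be a single self-contained script. The least-squares fitter below is a hypothetical stand-in for your package's API; the point is that the demo bundles its own data, states its expected result, and reports its runtime:

```python
"""Demo sketch: ships its own data, states expected results, reports runtime."""
import time

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (standard library only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Requisite data: tiny and bundled with the demo itself.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # exactly y = 2x + 1

start = time.perf_counter()
slope, intercept = fit_line(xs, ys)
runtime = time.perf_counter() - start

print(f"slope={slope}, intercept={intercept}")  # expected: slope=2.0, intercept=1.0
print(f"runtime: {runtime:.6f} s")  # report runtime so users can compare hardware
```

A real demo would call your installed package instead of defining the function inline, but the same three ingredients (data, expected results, runtime) apply.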
- To make your code modifiable, we recommend the following:
- Include contribution guidelines so that others know how to propose changes.
- Write unit tests for each function, for example, using testthat for R or unittest for Python.
- Incorporate continuous integration, for example, using either TravisCI or CircleCI.
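As a sketch of what such a unit test looks like with Python's built-in unittest module (`zscore` is a hypothetical example function standing in for one of your package's functions):

```python
"""Unit-test sketch: zscore is a hypothetical stand-in for your package's API."""
import statistics
import unittest

def zscore(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

class TestZscore(unittest.TestCase):
    def test_known_values(self):
        # Compare against hand-computed values for a tiny input.
        result = zscore([1.0, 2.0, 3.0])
        for got, want in zip(result, [-1.2247, 0.0, 1.2247]):
            self.assertAlmostEqual(got, want, places=4)

    def test_mean_is_zero(self):
        # Standardized values should always average to zero.
        result = zscore([2.0, 4.0, 6.0, 8.0])
        self.assertAlmostEqual(sum(result) / len(result), 0.0)

if __name__ == "__main__":
    # exit=False lets other tooling run this file without the interpreter exiting.
    unittest.main(argv=["zscore-tests"], exit=False, verbosity=2)
```

A continuous integration service then runs this test suite automatically on every push.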
- Add the following badges to your repo:
- stable release version so people know which release they are on (from package manager),
- documentation to indicate that you generated documentation,
- code quality to indicate that your code is written using modern best practices,
- coverage to indicate the extent to which you have written tests for your functions,
- build status to indicate whether the latest version of your code builds and passes its tests on your continuous integration service,
- total number of downloads.
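In a GitHub README, each badge is an image that links to the corresponding service. A sketch using the shields.io badge service, with hypothetical repository (`myorg/mynumpkg`) and package names:

```markdown
[![PyPI version](https://img.shields.io/pypi/v/mynumpkg.svg)](https://pypi.org/project/mynumpkg/)
[![Build status](https://img.shields.io/travis/myorg/mynumpkg.svg)](https://travis-ci.org/myorg/mynumpkg)
[![Coverage](https://img.shields.io/codecov/c/github/myorg/mynumpkg.svg)](https://codecov.io/gh/myorg/mynumpkg)
[![Downloads](https://img.shields.io/pypi/dm/mynumpkg.svg)](https://pypi.org/project/mynumpkg/)
```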
- Finally, provide benchmarks establishing current performance (using appropriate metrics) on standard problems, and, better yet, comparing against other standard methods. Ideally, the code that generates the benchmark numbers is provided in Jupyter notebooks, for example, in your Gigantum project.
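A benchmark can start as small as a micro-benchmark script using the standard library's timeit module; `cumsum_loop` below is a hypothetical baseline standing in for a method you would compare against your own:

```python
"""Micro-benchmark sketch using timeit; the timed function is a hypothetical baseline."""
import random
import timeit

def cumsum_loop(values):
    """Baseline: cumulative sum via an explicit loop."""
    out, total = [], 0.0
    for v in values:
        total += v
        out.append(total)
    return out

random.seed(0)  # fix the data so benchmark runs are comparable
data = [random.random() for _ in range(10_000)]

# Report the best of several repeats, a common convention for stable numbers.
elapsed = min(timeit.repeat(lambda: cumsum_loop(data), number=10, repeat=5))
print(f"cumsum_loop: {elapsed:.4f} s for 10 runs on 10,000 elements")
```

A fuller benchmark notebook would time each competing method on the same data and report the chosen metrics alongside the hardware used.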
A few examples of numerical packages that we have released that satisfy all (or most of) these rules include: