Industry and academia face many of the same challenges: we all struggle to make our code easy to run and to collaborate with colleagues across departments, or even within a single department. Even within one lab, it is hard to make sure that somebody else can reuse the code. Ultimately, every piece of code that has value should be transparent, reproducible, and easy for others to use.
Reproducibility is extremely important, because if your model or analysis fails, you have to understand why. You have to go back to the drawing board and work out how to make the model more robust. How can you do that if the work was not reproducible in the first place?
In pharmaceutical companies as well as academic labs, there is a lot of turnover. People with AI expertise tend to be recruited by other companies and other labs. When somebody leaves and you have to put somebody else on the project, you need very good documentation and a high level of transparency and reproducibility. Otherwise, the new person will have a hard time figuring out how the original complex model was designed, and may or may not be able to run the code successfully, let alone improve it.
The Princess Margaret Research Centre (PM) is dedicated to transparency and reproducibility. When I saw how Code Ocean works, how easy it is to use, and how it guarantees reproducibility and transparency, I made its use mandatory within the lab from that moment on.
Another aspect of Code Ocean that scientists will find extremely valuable is that it is built on open architecture standards. You can export your work and use it outside Code Ocean with open source tools such as Docker, git, Jupyter, RStudio, and more. This is critical if scientific work is to have its maximum impact: anyone can benefit from reproducible work, even if they are not Code Ocean users.
For scientists this has huge value, because you can look into the code, change the parameters, click “run again”, and get new results in a matter of seconds, minutes, or hours, depending on the length of the computation.
The old path was to download the paper, read the text description, and, if you were lucky, find the code. That process usually took anywhere from a couple of months to years, if you didn't give up first. Code Ocean reduced it to a matter of minutes or hours. It was a game changer for us, so we embarked on that journey and have used the platform ever since.
This blog post is Part 2 of our three-part series on Benjamin Haibe-Kains’ view of the larger state of genomic research and the major challenges and opportunities facing the field today. Read Part 1 on hiring and onboarding here.