I have to admit that I lost track of how many conversations I had last year with computational researchers at all levels, in organizations ranging from tiny to large, about the challenges they face every day in their jobs.
I am grateful for each and every one of these discussions. To say I learned a lot would be an understatement. While all organizations grapple with their unique set of challenges, some common themes have emerged that I want to share with you.
- Most computational biology leaders struggle to define best practices, e.g. standardizing methodologies, analysis and workflows and leveraging previous work.
- Most if not all computational research teams rely on open source packages developed by the scientific academic community. The challenge is that these tools don’t always play well together. Computational researchers end up spending about 40% of their time trying to make various packages and tools work together and reproducing results.
- DevOps tasks are the bane of every computational scientist’s existence. Setting up cloud infrastructure and security, or configuring and accessing data, is not a core competency of computational biologists, who often struggle with these tasks or need help from a software engineer. As a rule, scientists want to do science, not software engineering, and many feel that technology is getting in their way.
- Speaking of technology, some trends and patterns can be observed in smaller biotech companies. These companies:
  - generally start out in the cloud
  - might initially be R shops or Python shops but invariably end up using both, mainly so they have access to a larger talent pool. R and Python continue to dominate. Although Julia offers some benefits, most researchers haven’t heard of this newer language.
  - predominantly prefer AWS, but GCP’s generous cloud credits for startups and BigQuery are major incentives. Azure’s market share is negligible to nonexistent.
  - mostly adopt Benchling as their ELN/LIMS
- The larger the organization, the more siloed information becomes an issue. The lack of an easy way to share work in progress or results, the inability to collaborate efficiently with internal and external partners, and the difficulty of iterating quickly all slow down progress. These issues also tend to take the fun out of being a scientist who does cutting-edge research, and with that they increase the risk of researchers leaving.
- There is a consensus among senior leaders that the most important success factor in drug discovery is the ability to process research data efficiently. There is also a consensus that efficient data processing leaves a lot of room for improvement.
The common thread in these conversations is that there is a skills gap between bioinformaticians and computational biologists on one side and software engineers on the other. While the skills of both groups are needed to answer the big scientific questions the organization is tackling, no common, centralized and standardized platform exists that allows easy collaboration and sharing. Such a platform is needed so that computational researchers can do their work without running into coding issues they aren’t trained to handle, and so that software engineers aren’t forced to dive deep into the science.
Adding bench scientists or external partners like clinicians to the mix stresses the system even more. Their input and insights are critically important, but they are not trained in computational approaches, and even less so in coding. A centralized platform therefore needs to include options like user-friendly apps that allow lab scientists to interact with the data without coding.
At Code Ocean, we are dedicated to developing such a centralized platform and closing that skills gap. Our Compute Capsules contain code, data, computing environment and results, and allow researchers with all levels of coding experience to efficiently analyze data, share their work and reproduce results.