In the world of scientific research, reproducibility has long been a persistent challenge. A survey conducted by Nature found that more than 70% of researchers encountered difficulties when attempting to reproduce experiments conducted by their peers, and more than half have struggled to reproduce their own work. This issue hampers team progress and collaboration, and for decision makers it introduces doubt into the scientific process.
Code Ocean, now with a new Result Lineage feature, delivers trusted science by offering a visual representation of the complete history of every data result generated within its Digital Lab. This first-of-its-kind feature addresses the disconnect from original source data that plagues computational research. As research data travels from scientist to scientist, it becomes scattered across various tools and locations, making it difficult to share data and code — and almost impossible to reproduce results.
Simon Adar, Co-founder and CEO of Code Ocean, emphasizes the significance of Result Lineage as a solution to the ‘last mile’ problem in computational research. “Scientists spend too much of their time and energy struggling to reproduce previous results to determine whether they are valid. Result Lineage guarantees the traceability and reproducibility of scientific results by illustrating the entire history of each scientific result at a glance. This enhances both the speed and quality of scientific discoveries. It means results presented to decision makers can be trusted, as every result is generated with full lineage for traceability and reproducibility.”
Every data asset generated in Code Ocean’s platform has a result provenance metadata box containing information about how the result was created, where the input data came from, and how it has been used. This metadata is automatically generated. Result Lineage takes this information and presents it in a color-coded graph, so researchers can understand the reproducibility of a result at a glance. Each point on the lineage graph links back to the original source data, providing an easy way to trace its origin and track its usage history. Result Lineage currently covers the entire backward provenance of a result, including the input data, Capsules and pipeline output. Any computational step in the Result Lineage can be reproduced with the click of a button, making the process highly streamlined and efficient.
Code Ocean Lineage Graph showing Result Provenance of a Bulk RNA-Seq Pipeline. A visual representation of the input Data, Capsules, and Pipeline used to generate the final result, “Star Output GSE157194”. Each node of the lineage graph includes a direct link to the asset used for analysis.
David Feng, Director of Scientific Computing at the Allen Institute for Neural Dynamics adds, “We often talk about the importance of provenance tracking, but Code Ocean makes it a transparent part of the analytical workflow. The lineage graph gives us confidence that we will always be able to reproduce our work.”
By addressing the long-standing challenges of traceability, reproducibility, and reuse, Code Ocean continues to empower researchers to accelerate their discoveries, collaborate effectively, and ensure the trustworthiness of their work.