Bioinformatics: Data Knowledge Management With Complex Datasets

Pharma IQ
05/31/2016

With the growth of biologics showing no signs of stopping, bioinformatics continues to be a key driver of R&D. Laboratories and their data systems are now expected to handle and manage biotechnology’s densely structured proteins, with some antibodies standing at 20,000 atoms each. (3)

“The ability to maximise the utility of biological data emerging from pharmaceutical, biopharmaceutical, and biotechnology research and development can significantly improve development timelines, success and project costs,” according to Andrew Lemon, co-founder and CEO at The Edge Software Consultancy. (1) However, turning this into a reality has been difficult due to data storage solutions lacking adequate management capabilities.

We spoke to Patrick Ansems, EMEAI Director at PerkinElmer, who said, “Biology is a strategic area for us, as big data analytics can play an important role in unlocking insights from complex biological data. For example, if you look at translational science, it’s all about aggregating the data together to understand the disease.”

The meaning of the term bioinformatics is perhaps changing. Historically, it was similar to cheminformatics, dealing with how to analyse biological entity structures and how to align and manipulate biological sequences. The definition is now moving much further toward biological process data management, notes Paul Denny-Gouldson (IDBS).

In response to this market surge, more investment is forecast for biologics informatics than for small molecule informatics. This was seen in Q1 of 2016, with Sanofi dedicating a 300 million capital injection to its biologics site in Belgium to support its monoclonal antibodies portfolio, and Core Informatics launching a suite of new applications for therapeutic biologics discovery and development. This monetary focus should help address some of the dominant complexities in bioinformatics: the vast data volume, process complexity and complications with integration.

Data volume

Biologics, genomics, proteomics and metabolomics, for example, produce large volumes of data by their very nature. Paul Denny-Gouldson (IDBS) noted that both the complexity and volume of the data create the need to enrich it so it can be visualised and interpreted, as some imaging processes can result in millions of data points and pictures per run. He added that resolving this will require sophisticated mathematical calculations and AI tools to look for patterns and notify the scientists.
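As a concrete illustration of that kind of automated pattern-spotting, the minimal sketch below flags unusually extreme readouts from a high-volume assay so a scientist can be notified. The plate-and-well layout, threshold and numbers are assumptions made purely for illustration, not a description of any specific vendor’s tooling.

```python
import numpy as np

def flag_outlier_wells(intensities, z_threshold=3.0):
    """Flag readouts that deviate strongly from their plate's mean.

    intensities : 2D array (plates x wells) of imaging-derived values.
    Returns a boolean mask of the same shape marking wells for review.
    """
    mean = intensities.mean(axis=1, keepdims=True)
    std = intensities.std(axis=1, keepdims=True)
    z = np.abs(intensities - mean) / np.where(std == 0, 1.0, std)
    return z > z_threshold

# Illustrative run: 10 plates of 384 wells with one injected anomaly.
rng = np.random.default_rng(0)
data = rng.normal(loc=1000, scale=50, size=(10, 384))
data[3, 42] = 5000  # the kind of data point a scientist should be told about
mask = flag_outlier_wells(data)
print(f"{mask.sum()} wells flagged across {data.shape[0]} plates")
```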

Process complexity

A great deal of work is being conducted in biological sample tracking, and much of the data being generated is big data, which raises questions about how these more complex data sets should be managed. The focus is not just on gathering the data to store it but on measuring the biology, combined with the omics and imaging sessions.

Cell line development presents its own tranche of challenges in terms of tracking and developing the cell line. The informatics systems must track and identify the provenance of each finding, so that at a later date the scientist can easily see the methods that led to the results. Sophisticated informatics support is also required where several departments need to view the data simultaneously.
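To make that tracking requirement concrete, here is a minimal, hypothetical sketch of a cell line lineage record. In practice this would live in a LIMS or ELN database rather than an in-memory dictionary, and the identifiers and methods shown are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CellLineRecord:
    """One step in a cell line's history: the parent it came from and the
    method that produced it, so a result can be traced back later."""
    line_id: str
    parent_id: Optional[str]
    method: str

registry = {}  # hypothetical in-memory stand-in for a LIMS/ELN store

def register(record):
    registry[record.line_id] = record

def trace_lineage(line_id):
    """Walk back from a clone to its origin, listing every method used."""
    chain = []
    current = registry.get(line_id)
    while current is not None:
        chain.append(current)
        current = registry.get(current.parent_id) if current.parent_id else None
    return chain

register(CellLineRecord("CHO-parent", None, "thaw from master cell bank"))
register(CellLineRecord("CHO-T01", "CHO-parent", "transfection with expression vector"))
register(CellLineRecord("CHO-T01-C7", "CHO-T01", "single-cell cloning by limiting dilution"))

for step in trace_lineage("CHO-T01-C7"):
    print(step.line_id, "<-", step.method)
```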

Fermentation, which works with instrument and real-time data, is another example: after analysing the data, the variables that will impact the outcome the most can be identified, and strategies can then be created to manage them. However, in some cases thousands of variables are revealed, leaving labs with the task of mapping out the best form of analytics to apply.
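One common way to rank which of those thousands of logged variables most influence the outcome is a model-based importance score. The sketch below uses scikit-learn’s random forest importances on synthetic batch data purely to illustrate the idea; the variable counts, the choice of model and the use of titre as the outcome are assumptions, not a method prescribed by the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a fermentation dataset: hundreds of logged variables
# (pH, dissolved oxygen, feed rates, temperatures, ...) per batch, with titre
# as the outcome. Real data would come from the process historian.
rng = np.random.default_rng(1)
n_batches, n_variables = 200, 500
X = rng.normal(size=(n_batches, n_variables))
# Assume only a handful of variables truly drive titre in this toy example.
y = 2.0 * X[:, 10] - 1.5 * X[:, 42] + 0.5 * X[:, 99] + rng.normal(scale=0.1, size=n_batches)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]
print("Top 5 variables by importance:", ranking[:5])
```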
