AI in Pharma Industry: Large Language Models in Digital Documentation

July 18, 2023

The majority of life science enterprises are now embracing artificial intelligence (AI) for a wide range of applications within their organizations. While many people are familiar with the use of AI in pharma (download our free whitepaper) and biotech to enhance drug discovery and design, there are numerous other high-impact applications worth noting.

Recently, major players in the field have announced significant advancements in Natural Language Processing (NLP) models, such as OpenAI's GPT-3 and Google's PaLM. What do these developments mean for the life science industry, and specifically, for pharmaceutical companies? 

Pharma companies have been transitioning from paper-based to digital documentation. However, the efforts required to standardize data structures for both current and historical data can be immense. In some cases, the cost-benefit analysis may even favor redoing experiments rather than mining old data due to the extensive time and resources required. With the latest advancements in the NLP space, this situation could change dramatically. 

What are Large Language Models

Large Language Models (LLMs) are a class of machine learning models trained to work with text. They are typically trained on a huge collection of text (a corpus) containing examples of how people use words. Because training relies on self-supervised learning, in which the text itself supplies the training signal, there is no need for a costly data labelling procedure. In industries where large volumes of textual data are analyzed, LLMs are becoming hard to avoid. Compared to other kinds of models, they are trained on larger data sets, have more complex architectures, and approach human-level accuracy in tasks such as text generation, summarization, translation, question answering, and sentiment analysis.
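To make the self-supervision idea concrete, here is a minimal Python sketch of how training examples can be derived from raw text alone: each word serves as the label for the words that precede it. The function name `next_word_pairs`, the whitespace tokenization, and the small fixed context size are simplifying assumptions for illustration; real LLMs use subword tokenizers and far longer contexts.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs.

    No human labelling is needed: the target is simply the word that
    follows each context window in the original text.
    """
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    pairs = []
    for i in range(1, len(words)):
        start = max(0, i - context_size)
        context = tuple(words[start:i])
        pairs.append((context, words[i]))
    return pairs


sample = "The compound was dissolved in buffer"
for context, target in next_word_pairs(sample):
    print(context, "->", target)
```

Every sentence in a corpus yields many such pairs for free, which is why very large text collections can be used without manual annotation.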

One of the most prominent LLMs is GPT-3. Although it was not the first of its kind, it and similar models have come close to human-level accuracy in various tasks. With platforms like ChatGPT, LLMs have become accessible to the public, leading to skyrocketing usage and exploration of their applications at work, school, and home. Companies can now leverage the immense computing power behind these models at a reasonable cost, implementing ideas faster and more affordably without always needing to consult field experts.

How can LLMs be leveraged by pharma



NLP technology and LLMs have already started to be implemented within the pharmaceutical industry, and the recent breakthroughs in these technologies are increasing their usability in various use cases, including:

  1. Predicting drug interactions and side effects: Cross-analyzing data on existing drugs, drawn from the scientific literature or from the organization's internal database, can reveal similarities between known drugs that point to potential side effects and interactions of drug candidates. This can help detect and prevent adverse reactions in patients at an early stage by suggesting specific tests to run before moving forward with a candidate.
  1. Unlocking correlations leading to personalized medicine: LLMs can extract critical information from Electronic Health Records written by doctors, identifying hidden symptoms and surfacing correlations with other cases. This knowledge can then be leveraged by physicians to make informed predictions about the medications that may work best for individual patients.  
  1. Optimizing clinical trials: LLMs can scan patients’ medical history to identify eligible participants for a study according to the clinical trial requirements. When done manually, this process is extremely time-consuming and can delay the introduction of crucial drugs to the market. 
  1. Diagnosis assistance: Doctors can use specifically trained LLMs to quickly determine the relationship between a patient's symptoms and their potential diagnosis. This wealth of data can be harnessed to improve patient outcomes. 
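As an illustration of the clinical trial use case above, the following Python sketch assumes that an LLM has already extracted structured fields from free-text medical histories, and shows only the final matching step against the trial requirements. The record fields (`age`, `diagnoses`, `medications`) and the example criteria are hypothetical and are not taken from any real trial or product.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # Hypothetical fields, assumed to be extracted from free text upstream.
    patient_id: str
    age: int
    diagnoses: set = field(default_factory=set)
    medications: set = field(default_factory=set)


@dataclass
class TrialCriteria:
    min_age: int
    max_age: int
    required_diagnosis: str
    excluded_medications: set = field(default_factory=set)


def is_eligible(patient, criteria):
    """Return True if the patient meets every inclusion/exclusion rule."""
    if not criteria.min_age <= patient.age <= criteria.max_age:
        return False
    if criteria.required_diagnosis not in patient.diagnoses:
        return False
    # Any overlap with the excluded medications disqualifies the patient.
    if patient.medications & criteria.excluded_medications:
        return False
    return True


criteria = TrialCriteria(18, 65, "type 2 diabetes", {"warfarin"})
patients = [
    PatientRecord("P001", 54, {"type 2 diabetes"}, {"metformin"}),
    PatientRecord("P002", 71, {"type 2 diabetes"}, set()),
    PatientRecord("P003", 40, {"type 2 diabetes"}, {"warfarin"}),
]
eligible_ids = [p.patient_id for p in patients if is_eligible(p, criteria)]
print(eligible_ids)
```

Keeping the rules explicit and separate from the text-extraction step means every inclusion or exclusion decision can be traced back to a specific criterion, which matters when the model's output must be validated.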

Interestingly, the analysis and optimization of experiments and tests before clinical trials remains a relatively unexplored area, owing to the small amount of digitally available documentation collected during R&D experiments. Mining data in real time to provide insights to scientists while they design new experiments or work at the bench could significantly enhance laboratory operations. NLP and high-performing LLMs have the potential to boost every aspect of a scientist's work in the lab.

While these models are incredibly powerful, there is still much to learn about the underlying mechanisms that drive their functionality. Often, they act as "black boxes," making it difficult to understand exactly how they arrive at a particular result. Like any new advancement, there are risks associated with their use. In particular, users must watch for false answers that nonetheless appear convincing to a human reader.

Numerous legal questions surrounding intellectual property protection, liability for incorrect outputs, and copyright have yet to be fully addressed by authorities. However, the FDA and other regulatory bodies are engaging with experts to explore the impact of AI in the pharmaceutical industry, and have emphasized the importance of validating these models. It is evident that these organizations acknowledge the potential benefits of AI in this field and are taking a cautious approach to evaluating how it can be used safely and effectively. This represents a significant shift in the landscape of AI and offers numerous opportunities for progress and innovation.

How can LLMs be leveraged by digital lab assistants



NLP is at the core of digital lab assistants, such as the LabTwin app. Large Language Models can take over routine, established laboratory tasks, such as scheduling instrument bookings and detecting patterns in data. Pattern detection can, for example, help provide contextual prompts when a parameter falls outside its set range. LLMs can also automate reagent ordering by inferring from company-wide documentation how much of each reagent is used on a daily basis. Checking reagent levels is still a tedious task that humans perform on a regular basis. At LabTwin, we aim to use LLMs to facilitate these types of tasks and to surface insights from the collected data.
Conversely, retrieving data from lab informatics systems in a digestible way at the bench is a challenge that LLMs can also tackle. LabTwin is leveraging the technology to process content queried through our digital lab assistant so that it can be easily voiced back and interacted with. Incorporating such AI processes in our technology not only saves time in experiment planning but is also what makes it smart and user-friendly.
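The range-check prompts and reagent ordering described above could rest on very simple logic once the relevant values have been extracted from lab documentation. The Python sketch below is a hypothetical illustration only: the function names, the pH limits, and the reorder rule (days of stock left versus delivery lead time plus a safety buffer) are assumptions, not a description of how LabTwin implements these features.

```python
def range_prompt(name, value, low, high):
    """Return a spoken-style prompt if value is outside [low, high], else None."""
    if low <= value <= high:
        return None
    direction = "below" if value < low else "above"
    return f"{name} is {value}, {direction} the set range of {low} to {high}."


def needs_reorder(stock, daily_usage, lead_time_days, safety_days=2):
    """Flag a reagent when stock will not last through delivery plus a buffer."""
    if daily_usage <= 0:
        return False  # no recorded consumption, nothing to forecast
    days_left = stock / daily_usage
    return days_left < lead_time_days + safety_days


# Example values are made up for illustration.
print(range_prompt("pH", 8.4, 6.8, 7.6))
print(needs_reorder(stock=120.0, daily_usage=30.0, lead_time_days=5))
```

In this picture, the language model's contribution is upstream: turning spoken notes and free-text records into the numbers these checks consume, and turning the results back into a sentence that can be voiced at the bench.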


As with any emerging technology, enterprises must carefully evaluate the potential applications of Large Language Models (LLMs), considering factors such as use cases, performance, and legal and technical limitations. However, it is vital to explore and understand the significant ROI that LLMs can offer. Failing to keep up with this rapidly evolving trend may make it challenging to catch up later on. For instance, in March 2023, OpenAI launched GPT-4, a more advanced successor to GPT-3.

It is essential to remember that implementing AI in the pharmaceutical industry is a journey. Starting now to develop a strategy, along with its execution plan, will ensure that your organization keeps learning as the trend evolves. Embrace trial and error, as some use cases and their applications could be game-changing for your industry or company.

At LabTwin GmbH, we have been focusing on leveraging LLMs for operational lab efficiency from the beginning. While there have been significant advances in the field, certain limitations persist, and predicting the future is always a challenge. Nonetheless, we are committed to staying at the forefront of this exciting technology. If you would like to learn more about how LLMs can benefit your organization, please feel free to schedule a demo with us.

This post is the first in the AI Conversations blog series, which discusses the latest advances in the field of AI and NLP. Subscribe to our quarterly newsletter to stay up to date with our latest content.

