Pharmaceutical clinical trials present a use case for Artificial Intelligence that is both compelling and challenging. The promise is clear: trials are packed with data, require extensive analysis and summarization, and operate in an industry where speed to market gives a drug more time on the market before its patent expires. However, data summarization must be error-free, complete, delivered in exactly the right form, and thoroughly traceable in case questions arise.
Threading that needle is tough. One company that’s attempting to do so is Yseop, with offices in Paris, New York and Lyon. Its story shows both the opportunity and the obstacles in the way.
The Problem to be Solved
Clinical trials lead to submissions to regulatory authorities such as the US Food & Drug Administration or the European Medicines Agency. These submissions must follow exacting standards and cannot contain mistakes – errors can derail a submission and cause both delay and pain for the companies involved. While the submissions make an argument, they do so with massive amounts of data – who took the drug or a placebo, whether it was effective, whether there was an adverse event, and so on. The data needs summarization, and the process must be described as well: how the trial was run, what population took the drug, and what results were observed.
The submissions are voluminous, critical, and time-sensitive, and much in the status quo can be improved upon. Yseop’s CEO, Emmanuel Walckenaer, said in an interview with me: “It takes 28 weeks on average between the database lock until the submission. We want to provide an acceleration.”
The Requirements
Some uses of AI are relatively low stakes. If an AI system places a sub-optimal advertisement onto a Facebook page, it’s not the end of the world. But clinical trials are different – data points cannot be omitted, they must be correct, and they definitely cannot be hallucinated by the AI system. So, while a Large Language Model is ideal for creating text summaries of data, it cannot fall prey to the occasional failings of LLMs.
The requirements don’t stop there, though. Security is another issue. Walckenaer explains, “Our material is clinical data, which is the most important asset for big pharma companies. Absolutely, you cannot train your model with that data; we use synthetic or public data. And there can be no leakage.”
Another job that customers need done is maintaining control. “All of our customers are telling us we don’t want a black box,” notes Walckenaer. “They want to control exactly what they write and how they write it. One customer, for instance, has hundreds of medical writers who have a consistent way of writing and they want full control. That’s a big deal that’s very difficult to achieve.”
This relates to one of the biggest obstacles to AI adoption. “The management is super-excited to do this, but then you’ve got the end users who are actually scared,” says Walckenaer. “To help cross the chasm, you have to be perfect in user control, it must be very easy to use, and the outputs have to achieve perfection as well.”
The Approach
For Yseop, the solution to these problems required a combination of approaches. Multiple LLMs are used based on their fitness for the purpose – OpenAI is “complete overkill” for simple summarization, believes Walckenaer. In a coming version of the system, the company will use an LLM to supervise the work of other LLMs, performing quality checks. And the company also uses classic change management: picking initial champions, constantly measuring their experience and adjusting the system, and thereby building the confidence of others to adopt the technology as well.
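The two architectural ideas described above – routing a task to a model sized for the job, and having a supervising model quality-check another model's output – can be sketched in a few lines. This is a hypothetical illustration, not Yseop's actual implementation; the model functions here are stand-ins for real LLM calls, and the names (`small_model`, `large_model`, `supervisor_check`, `generate`) are invented for this example.

```python
def small_model(task: str) -> str:
    """Stand-in for a lightweight, cheaper summarization model."""
    return f"summary of: {task}"

def large_model(task: str) -> str:
    """Stand-in for a larger, more capable (and costlier) model."""
    return f"detailed analysis of: {task}"

def supervisor_check(task: str, output: str) -> bool:
    """Stand-in for a supervising LLM that quality-checks another
    model's draft, e.g. confirming the summary stays grounded in
    the source task. Here it is a trivial placeholder check."""
    return task in output

def generate(task: str, complex_task: bool) -> str:
    """Route to a model by fitness for purpose, then verify the result."""
    model = large_model if complex_task else small_model
    draft = model(task)
    if not supervisor_check(task, draft):
        # In a real pipeline, a failed check would trigger a retry
        # or escalate the draft to a human medical writer.
        raise ValueError("supervisor rejected draft")
    return draft
```

The design point is that verification is structurally separate from generation: the supervisor never writes the text it judges, which keeps the quality gate independent of whichever model produced the draft.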