
We are scientists

One of our four core values as a company is ‘stay curious’ – valuing science and evidence generation above all else. 

The Science Behind TORTUS

Introducing the CREOLA platform.

As fast as we can, as slow as we need

At TORTUS, our commitment to safety and innovation is embodied in the CREOLA platform. This clinician-in-the-loop framework is designed to ensure the accuracy and reliability of our AI systems before they are deployed in clinical settings. Through continuous evaluation by medical professionals, CREOLA helps identify and correct potential errors in AI-generated content, ranging from speech-to-text conversions to medical summaries.

We are pioneers in LLM clinical evaluation.

Deploying AI technology into healthcare has required not only following a rigorous assessment framework for LLMs, but building those frameworks and the tools to apply them in the first place. CREOLA is a first-of-its-kind system that enables us to work with real human clinicians to iterate our systems safely and continuously.

This is only the beginning.

Our ongoing research looks ahead to evaluating every element of our AI stack, today and tomorrow, including new methodologies, as well as to automating our existing human-in-the-loop validation systems so that a live production system can guardrail and evaluate outputs continuously.

Our Features

CREOLA is actively being built with the help of over 100 clinicians

Clinician-in-the-loop validation

Tailored for clinical AI applications in healthcare.

Safe iteration

Enables new models and techniques to be integrated without compromising safety.

Extensive clinician involvement

Over 100 clinicians actively engaged.

Proven effectiveness

More than 500 labeled episodes generated in just a few months.

FAQ

1) You need to closely define the task – for example, in our use case we take the audio of a consultation and create medical documentation.

2) Then you need to define the metrics of success – in this case we look in high detail at time saving: the total consultation time, direct versus indirect care time, the quality of the documentation (using a standardized tool), and the clinician and patient subjective experience, including cognitive load (measured with the NASA Task Load Index).

3) You then need to define the metrics of failure – what could go wrong? For this task, generative AI models can add specific details to the note that weren't in the audio – known as a hallucination – or miss important clinical details from the audio – termed an omission. We then classify these as minor or major, with major meaning a clinical impact on the patient's diagnosis, treatment or management in the context of the consultation.

4) Then you need to define how you are going to measure success and failure – in our studies we compare before and after implementation against the standard of care, and use in-clinic human observers to capture very granular detail. On the failure side, we have a bespoke platform – CREOLA – for creating, running and labelling these errors with real clinicians. This gives us high detail on both sides of a very specific task; a minimal sketch of such a labelled episode follows this list.

5) Lastly, you need to collect real-world evidence – running trials in a real-world setting with sufficient numbers to demonstrate the hypothesis.
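
To make steps 2–4 concrete, here is a minimal sketch of how a single labelled episode might be represented, with success metrics alongside clinician-assigned error labels. The class and field names below are hypothetical illustrations, not CREOLA's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ErrorType(Enum):
    HALLUCINATION = "hallucination"  # detail in the note that was not in the audio
    OMISSION = "omission"            # important clinical detail missing from the note


class Severity(Enum):
    MINOR = "minor"
    MAJOR = "major"  # clinical impact on diagnosis, treatment or management


@dataclass
class ErrorLabel:
    error_type: ErrorType
    severity: Severity
    description: str                 # free-text note from the labelling clinician


@dataclass
class LabelledEpisode:
    episode_id: str
    consultation_minutes: float      # total length of the consultation
    direct_care_minutes: float       # direct care time
    indirect_care_minutes: float     # indirect care (e.g. documentation) time
    documentation_quality: float     # score from a standardized documentation tool
    nasa_tlx_score: float            # clinician cognitive load (NASA Task Load Index)
    errors: List[ErrorLabel] = field(default_factory=list)

    def has_major_error(self) -> bool:
        """True if any labelled error carries potential clinical impact."""
        return any(e.severity is Severity.MAJOR for e in self.errors)
```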

This is the basic framework we use to assess every part of our AI tool stack – starting with ambient scribe technology and moving into other areas such as coding, ordering and diagnosis.

We are unique in our space in that we don't train or build many of our own models – we constantly clinically evaluate the market, and once we are satisfied with a model's safety and data privacy, we incorporate it into our AI stack, guardrailing it and carefully controlling it as needed. This approach requires constant clinical evaluation of every model, hence the need to build and test our own systems. The advantage is that we are incredibly agile – for example, we are now on our fifth primary speech-to-text model in the last 12 months. We can also stack models and prioritize them according to their clinical results, allowing multiple layers of high-quality redundancy. The other advantage is that O.S.L.E.R is always state-of-the-art as a system by design, and customers escape vendor lock-in at a time when clinical AI is moving so fast.
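
As a rough illustration of this stacking idea (a hypothetical sketch, not our actual implementation; the names are invented), a prioritized set of interchangeable speech-to-text models with fallback might look like this:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RankedModel:
    name: str
    clinical_score: float                # e.g. score from clinician-labelled evaluation
    transcribe: Callable[[bytes], str]   # audio in, transcript out


class SpeechToTextStack:
    """Hypothetical prioritized stack: try the best clinically evaluated model first,
    falling back to the next one on failure, giving layered redundancy."""

    def __init__(self, models: List[RankedModel]):
        # The model with the highest clinical score gets priority.
        self.models = sorted(models, key=lambda m: m.clinical_score, reverse=True)

    def transcribe(self, audio: bytes) -> Optional[str]:
        for model in self.models:
            try:
                return model.transcribe(audio)
            except Exception:
                continue  # fall through to the next-best model
        return None  # every layer failed
```

Swapping in a newly evaluated model is then just a matter of adding it to the list with its clinical score, rather than retraining or rebuilding the system.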

TORTUS is an expanding team of clinicians and ML scientists, with over 100 publications between us. The co-founders are both academics and scientists, and we have multiple post-PhD ML scientists in house, as well as academic clinicians. Our advisors include Eric Jang, VP of AI at 1X Robotics, and Dr Sarah Gebauer, MD, an advisor on AI to the RAND Corporation in the US. Our partnerships include some of the most innovative and digitally advanced hospitals on the planet, such as GOSH.

Every module of our AI interface will, as we build it, be tested and researched with the same rigor as above before being incorporated into the project. That means we will have clinical-trial-grade evidence for every future feature, carefully evaluated, which will allow us to explore even more advanced clinical diagnostic tools and assistant capabilities in the right way in the future.
