One of our four core values as a company is ‘stay curious’ – valuing science and evidence generation above all else. 

The Science Behind Tortus

Introducing the CREOLA Platform

Advanced AI for Safer Healthcare

At Tortus, our commitment to safety and innovation is embodied in the CREOLA platform. This clinician-in-the-loop framework is designed to ensure the accuracy and reliability of our AI systems before they are deployed in clinical settings. Through continuous evaluation by medical professionals, CREOLA helps identify and correct potential errors in AI-generated content, ranging from speech-to-text conversions to medical summaries.

Core Values: Scientific Integrity and Curiosity

Central to Tortus’s mission is our core value of ‘Stay Curious.’ We prioritize the scientific method in all our endeavors, ensuring that our developments in AI not only meet rigorous standards of safety and effectiveness but also continuously push the boundaries of what is possible in medical technology.

Publications and Ongoing Research

Our dedication to advancing healthcare AI is also evident in our active research agenda. We are exploring the effects of electronic health records (EHR) on cognitive load among healthcare providers. By understanding these impacts, Tortus aims to refine AI tools to better support clinicians, reducing the mental strain associated with documentation and administrative tasks.


Clinician-in-the-loop validation:

Tailored for clinical AI applications in healthcare.

Safe iteration:

Ensures the integration of new models and techniques without compromising safety.

Extensive clinician involvement:

Over 100 clinicians actively engaged.

Proven effectiveness:

More than 500 labeled episodes generated in just a few months.


1) First, you need to closely define the task – in our use case, we take audio of a consultation and create medical documentation from it.

2) Then you need to define the metrics of success – in this case we look at time saving in high detail: the total length of the consultation and the split between direct and indirect care time. We also look at the quality of the documentation (using a standardized tool) and the subjective experience of the clinician and the patient, including cognitive load (measured using the NASA Task Load Index).

3) You then need to define the metrics of failure – what could go wrong? For this task, a generative AI model can add specific details to the note that weren’t in the audio – this is known as a hallucination. Or the note can miss important clinical details from the audio – this is termed an omission. We then classify each error as minor or major, with major meaning a clinical impact on the patient’s diagnosis, treatment or management in the context of the consultation.

4) Then you need to define how you are going to measure success and failure – in our studies we compare before and after implementation against standard of care, and use in-clinic human observers to capture very granular detail. On the failure side, we have a bespoke platform, CREOLA, on which real clinicians review AI outputs and label these errors. This gives us high detail on both sides of a very specific task.

5) Lastly, you need to collect real-world evidence – running trials in a real-world setting with sufficient numbers to test the hypothesis.
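As a rough illustration of step 3, the error categories above (hallucination or omission, minor or major) can be written down as a small labelling schema and tallied per episode. This is a minimal sketch only, not the CREOLA implementation: the names ErrorLabel, summarize and major_error_episode_rate are hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    HALLUCINATION = "hallucination"  # detail in the note that was not in the audio
    OMISSION = "omission"            # clinical detail in the audio missing from the note


class Severity(Enum):
    MINOR = "minor"
    MAJOR = "major"  # impacts diagnosis, treatment or management


@dataclass(frozen=True)
class ErrorLabel:
    """One clinician-assigned label (hypothetical schema, for illustration)."""
    episode_id: str
    error_type: ErrorType
    severity: Severity


def summarize(labels, n_episodes):
    """Count errors per type/severity and the share of episodes with a major error."""
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    counts = Counter((label.error_type, label.severity) for label in labels)
    major_episodes = {
        label.episode_id for label in labels if label.severity is Severity.MAJOR
    }
    return {
        "counts": {f"{t.value}/{s.value}": n for (t, s), n in counts.items()},
        "major_error_episode_rate": len(major_episodes) / n_episodes,
    }


# Example: four reviewed episodes, two of which contain a major error.
example_labels = [
    ErrorLabel("ep1", ErrorType.HALLUCINATION, Severity.MAJOR),
    ErrorLabel("ep1", ErrorType.OMISSION, Severity.MINOR),
    ErrorLabel("ep2", ErrorType.OMISSION, Severity.MAJOR),
    ErrorLabel("ep2", ErrorType.OMISSION, Severity.MAJOR),
]
example_summary = summarize(example_labels, n_episodes=4)
```

Counting episodes with at least one major error, rather than raw error counts alone, is one possible design choice: it keeps a single badly affected note from dominating the headline figure.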

This is the basic framework we use to assess every part of our AI tool stack – starting with ambient scribe technology and moving into other areas such as coding, ordering and diagnosis.

We are unique in our space in that we don’t train or build many of our own models. Instead, we constantly evaluate the market clinically and, once we are satisfied with a model’s safety and data privacy, we incorporate it into our AI stack, adding guardrails and carefully controlling it as needed. This approach necessitates constant clinical evaluation of every model, hence the need to build and test our systems. It has the advantage of being incredibly agile: in the last 12 months, for example, we have moved on to our fifth primary speech-to-text model. We can also stack models and prioritize them in the stack according to their clinical results, allowing multiple layers of high-quality redundancy. The other advantage is that O.S.L.E.R is always state-of-the-art as a system by design, and customers can escape vendor lock-in at a time when clinical AI is moving so fast.
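The idea of stacking models and prioritizing them by clinical results can be sketched as a ranked list with fallback. This is an illustrative sketch under stated assumptions, not a description of how O.S.L.E.R is built: StackedModel, major_error_rate and transcribe_with_fallback are hypothetical names, and using a single error rate as the ranking key is an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StackedModel:
    name: str
    transcribe: Callable[[bytes], str]
    # Hypothetical score from clinical evaluation; lower is better.
    major_error_rate: float


def rank(models):
    """Order models so the best clinically evaluated one is tried first."""
    return sorted(models, key=lambda m: m.major_error_rate)


def transcribe_with_fallback(models, audio):
    """Try each model in priority order; fall back to the next if one fails."""
    failures = []
    for model in rank(models):
        try:
            return model.name, model.transcribe(audio)
        except Exception as exc:  # any failure triggers the next layer
            failures.append(f"{model.name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(failures))
```

Because the ranking is recomputed from evaluation scores on each call, swapping the primary model in this sketch amounts to updating a score or adding an entry to the list, rather than changing the calling code.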

TORTUS is an expanding team of clinicians and ML scientists, with over 100 publications between us. The co-founders are both academics and scientists, and we have multiple post-PhD ML scientists in house, as well as academic clinicians. Our advisors include Eric Jang, VP of AI at 1X Robotics, and Dr Sarah Gebauer MD, advisor on AI to the RAND Corporation in the US. Our partnerships include some of the most innovative and digitally advanced hospitals on the planet, such as GOSH.

Every single module of our AI interface will be tested and researched with the same rigor as above before being incorporated into the project. That means we will have clinical-trial-grade evidence for every future feature, carefully evaluated, which will allow us to explore even more advanced clinical diagnostic tools and assistant capabilities in the right way.