Audio and text data processing.

Mar 7, 2024

In collaboration with our partner Humanitas, I carried out a proof of concept on using recent AI technologies (mainly LLMs, including Zephyr, OpenZephyrChat and Mixtral) to build a chatbot system that integrates with internal tools. The project was then extended to also support:

Audio transcription, useful for producing meeting minutes, for example.
Real-time audio transcription, which gives immediate feedback during a conversation.
Voice synthesis, to assist visually impaired people.

The solutions chosen for the audio part are OpenAI Whisper for audio transcription, and pyttsx3 (with espeak) for voice synthesis.
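To give an idea of how these two libraries fit together, here is a minimal sketch, not the project's actual code. It assumes the `openai-whisper` and `pyttsx3` packages are installed (Whisper also needs ffmpeg, and pyttsx3 uses espeak on Linux). The helper names `format_minutes`, `transcribe` and `speak`, the `base` model size, the speech rate and the file name `meeting.wav` are all illustrative choices of mine.

```python
from typing import Iterable, Mapping


def format_minutes(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments into timestamped lines (hypothetical helper)."""
    lines = []
    for seg in segments:
        # Whisper reports segment start times in seconds, as floats.
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper and return timestamped text."""
    import whisper  # imported lazily: loading the library and model is heavy

    model = whisper.load_model(model_name)  # "base" is an assumed model size
    result = model.transcribe(path)  # dict with "text" and "segments" keys
    return format_minutes(result["segments"])


def speak(text: str, rate: int = 150) -> None:
    """Read text aloud with pyttsx3 (espeak backend on Linux)."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate; 150 is an assumed value
    engine.say(text)
    engine.runAndWait()  # blocks until the speech queue is empty


# Example usage, assuming a local recording named "meeting.wav":
# minutes = transcribe("meeting.wav")
# print(minutes)
# speak(minutes)
```

Keeping the formatting step separate from the model call means it can be checked without audio hardware or a downloaded model, which is convenient on a device with limited resources.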

A demo is available at https://chronos.spiderweak.fr. The demonstration is quite slow because it runs on a device with limited resources.

This project is the first step towards three other internal projects at the partner:

Voice assistant integrated into a VoIP server (Asterisk)
Personal assistant for people with visual impairments
Drone control interface (Mission Planner)

LLM Speech-to-text

Antoine BERNARD

Postdoctoral Fellow @ Polytechnique Montréal

Tech enthusiast, interested in computer networks, distributed processing and a bit of AI.