Hi folks, I'm building an application with Membrane and MembraneRTCEngine, and I'm running into garbled and dropped audio caused by issues with timestamping audio from an HTTP source.
The application lets a user join a WebRTC audio call with a chatbot "peer".
The user can click a button in their browser which triggers the chatbot to "speak" -- under the hood, this requests audio data via HTTP which is then piped through a chatbot "peer" endpoint.
Currently, I am able to hear the audio received from the HTTP source, but it is garbled, truncated, or otherwise incoherent.
My Chatbot endpoint uses `Membrane.LiveMixer` to mix audio from two sources:

- An HTTP source, parsed with `Membrane.RawAudioParser` to apply timestamps
- A silence generator, `Membrane.SilenceGenerator`, passed through `Membrane.RealTimer`

`Membrane.RawAudioParser` allows you to set a `pts_offset` value, which `Membrane.LiveMixer` uses to select the correct audio to mix from each stream.
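For context, here is a simplified sketch of the relevant part of my endpoint's spec (the HTTP source element, URL, stream format, and exact option names are placeholders/approximations for my actual setup):

```elixir
import Membrane.ChildrenSpec

# Placeholder values -- my real pipeline gets these elsewhere
raw_format = %Membrane.RawAudio{channels: 1, sample_rate: 16_000, sample_format: :s16le}
speech_url = "https://example.com/speech.raw"

spec = [
  # Branch 1: speech audio fetched over HTTP, parsed so its buffers carry timestamps
  # (MyHTTPSource is a stand-in for my actual HTTP source element)
  child(:http_source, %MyHTTPSource{url: speech_url})
  |> child(:parser, %Membrane.RawAudioParser{
    stream_format: raw_format,
    overwrite_pts?: true
    # pts_offset: ??? <- the value I don't know how to determine up front
  })
  |> get_child(:mixer),

  # Branch 2: constant silence, throttled to real time
  child(:silence, %Membrane.SilenceGenerator{stream_format: raw_format, duration: :infinity})
  |> child(:realtimer, Membrane.RealTimer)
  |> get_child(:mixer),

  # Live mixer combining both branches; its output goes on to the RTC engine track
  child(:mixer, %Membrane.LiveMixer{stream_format: raw_format})
]

# returned from the endpoint bin as {[spec: spec], state}
```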
The examples for `Membrane.RawAudioParser` and `Membrane.LiveMixer` use constant offsets (e.g. "If Source B starts 5 seconds after Source A...").
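In other words, something like this, where the offset is a constant known when the spec is built (values purely illustrative):

```elixir
# Docs-style usage: Source B is known to start 5 seconds after Source A,
# so its parser gets a fixed offset (raw_format as in the sketch above)
child(:parser_b, %Membrane.RawAudioParser{
  stream_format: raw_format,
  overwrite_pts?: true,
  pts_offset: Membrane.Time.seconds(5)
})
```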
However, I do not know the offset in advance: it should be the difference between when the Chatbot endpoint / silence generator starts and when the stream from the HTTP source starts.
Do you have any suggestions for how to correctly set the `pts_offset` for `Membrane.RawAudioParser` when I do not know the offset in advance? Is there a way to set it dynamically (e.g. using `start_of_stream` events)?
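The only idea I've come up with so far: since the HTTP branch is only spawned when the user clicks the button, I could record when the endpoint (and therefore the silence branch) started playing, and compute the offset at the moment I return the spec for the HTTP branch. A rough, untested sketch of what I mean (`MyHTTPSource` is again a stand-in, option names as above):

```elixir
defmodule Chatbot.Endpoint do
  use Membrane.Bin

  @impl true
  def handle_init(_ctx, _opts) do
    format = %Membrane.RawAudio{channels: 1, sample_rate: 16_000, sample_format: :s16le}
    {[], %{start_time: nil, raw_format: format}}
  end

  @impl true
  def handle_playing(_ctx, state) do
    # Remember when the endpoint (and so the silence generator) started
    {[], %{state | start_time: Membrane.Time.monotonic_time()}}
  end

  # Invoked (e.g. from handle_parent_notification) when the user clicks "speak":
  # offset the new branch by however long we have already been playing silence
  defp speak_spec(speech_url, state) do
    pts_offset = Membrane.Time.monotonic_time() - state.start_time

    child(:http_source, %MyHTTPSource{url: speech_url})
    |> child(:parser, %Membrane.RawAudioParser{
      stream_format: state.raw_format,
      overwrite_pts?: true,
      pts_offset: pts_offset
    })
    |> get_child(:mixer)
  end
end
```

I'm not sure that approach is sound, though, since any delay between returning that spec and the first buffer actually arriving from the HTTP source would presumably still skew the timestamps.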