Hi folks, I'm building an application with Membrane and MembraneRTCEngine and am having some trouble with garbled & dropped audio resulting from some issues with timestamping audio from an HTTP source.
The application lets a user join a WebRTC audio call with a chatbot "peer".
The user can click a button in their browser which triggers the chatbot to "speak" -- under the hood, this requests audio data via HTTP which is then piped through a chatbot "peer" endpoint.
Currently, I am able to hear the audio received from the HTTP source, but it is garbled, truncated, or otherwise incoherent.
My Chatbot endpoint uses Membrane.LiveMixer to mix audio from two sources:
- An HTTP Source, parsed with Membrane.RawAudioParser to apply timestamps
- A Silence generator which is
Membrane.RawAudioParser allows you to set a pts_offset value, which Membrane.LiveMixer uses to select the correct audio to mix from each stream.
The examples for Membrane.LiveMixer use constant offsets (e.g. "If Source B starts 5 seconds after Source A...").
However, I do not know the offset to set in advance -- it should be the difference between the start of the Chatbot endpoint / silence generator, and the start of the stream from the HTTP source.
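To make the offset described above concrete, here is a minimal sketch of the arithmetic only. It assumes the offset can be measured as the difference between two monotonic clock readings, one taken when the endpoint (and silence generator) starts and one taken when the HTTP stream starts. The module PtsOffsetSketch and its functions are hypothetical names, not part of Membrane, and whether the resulting nanosecond value can be handed to pts_offset at that point is exactly the open question.

```elixir
defmodule PtsOffsetSketch do
  @moduledoc """
  Hypothetical helper (not a Membrane API): derives an offset in nanoseconds
  from two monotonic clock readings.
  """

  @doc "Reads the monotonic clock in nanoseconds."
  def now_ns, do: System.monotonic_time(:nanosecond)

  @doc """
  Returns how long after the reference stream (endpoint / silence generator)
  the HTTP stream started, in nanoseconds. Clamped at zero so a reading taken
  out of order never yields a negative offset.
  """
  def offset_ns(reference_start_ns, http_start_ns)
      when is_integer(reference_start_ns) and is_integer(http_start_ns) do
    max(http_start_ns - reference_start_ns, 0)
  end
end

# Usage: record the first reading when the endpoint starts...
endpoint_started_at = PtsOffsetSketch.now_ns()

# ...and the second when the HTTP request for audio is issued.
http_started_at = PtsOffsetSketch.now_ns()

# Assumption: this value would be used as the offset for the HTTP branch.
offset = PtsOffsetSketch.offset_ns(endpoint_started_at, http_started_at)
```

The monotonic clock is used rather than wall-clock time so that system clock adjustments during the call cannot distort the measured difference.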
Do you have any suggestions as to how to correctly set the pts_offset for Membrane.RawAudioParser when I do not know the offset in advance? Is there a way to set it dynamically (e.g. using