RTC Engine, HTTP Sources, and pts offsets

Hi folks, I'm building an application with Membrane and the Membrane RTC Engine, and I'm running into garbled and dropped audio caused by problems timestamping audio from an HTTP source.

The application lets a user join a WebRTC audio call with a chatbot "peer".

The user can click a button in their browser which triggers the chatbot to "speak" -- under the hood, this requests audio data via HTTP, which is then piped through the chatbot "peer" endpoint.

Currently, I am able to hear the audio received from the HTTP source, but it is garbled, truncated, or otherwise incoherent. My Chatbot endpoint uses Membrane.LiveMixer to mix audio from two sources:

  • An HTTP Source, parsed with Membrane.RawAudioParser to apply timestamps
  • A silence generator, Membrane.SilenceGenerator, passed through Membrane.Realtimer (see the pipeline sketch after this list)
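For concreteness, here's roughly what the relevant part of my endpoint's spec looks like. This is simplified: ChatbotHTTPSource stands in for my own HTTP source element, and the stream format values are illustrative, not what my app actually negotiates.

```elixir
import Membrane.ChildrenSpec

# Illustrative raw audio format (placeholder values).
stream_format = %Membrane.RawAudio{
  sample_format: :s16le,
  sample_rate: 48_000,
  channels: 1
}

spec = [
  # Chatbot speech fetched over HTTP; RawAudioParser attaches timestamps.
  child(:http_source, ChatbotHTTPSource)
  |> child(:parser, %Membrane.RawAudioParser{
    stream_format: stream_format,
    overwrite_pts?: true,
    # The value in question -- I don't know it ahead of time:
    pts_offset: 0
  })
  |> get_child(:mixer),

  # Background silence, paced in real time so the mixer always has input.
  child(:silence, %Membrane.SilenceGenerator{
    stream_format: stream_format,
    duration: :infinity
  })
  |> child(:realtimer, Membrane.Realtimer)
  |> get_child(:mixer),

  child(:mixer, Membrane.LiveMixer)
]
```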

Membrane.RawAudioParser allows you to set a pts_offset value, which Membrane.LiveMixer uses to pick the correct audio from each stream to mix.

The examples for Membrane.RawAudioParser and Membrane.LiveMixer use constant offsets (e.g. "If Source B starts 5 seconds after Source A..."). However, I do not know the offset in advance -- it should be the difference between when the chatbot endpoint (and its silence generator) starts and when the stream from the HTTP source starts.
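In other words, the docs-style constant case looks something like this (reusing the illustrative stream_format from the sketch above):

```elixir
# Constant-offset case from the docs' example: Source B is known to start
# 5 seconds after Source A, so B's parser gets a fixed pts_offset.
%Membrane.RawAudioParser{
  stream_format: stream_format,
  overwrite_pts?: true,
  pts_offset: Membrane.Time.seconds(5)
}
```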

Do you have any suggestions for correctly setting the pts_offset on Membrane.RawAudioParser when the offset is not known in advance? Is there a way to set it dynamically (e.g. using start_of_stream events)?
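For reference, here's a rough, untested sketch of the kind of dynamic approach I'm imagining: the endpoint records when playback started and computes the offset at the moment the HTTP branch is spawned. The {:speak, url} notification, ChatbotHTTPSource, and @stream_format are placeholders from my setup, not real library APIs.

```elixir
@impl true
def handle_playing(_ctx, state) do
  # The silence branch starts producing audio as soon as we're playing,
  # so remember that instant as the origin of the mixer's timeline.
  {[], %{state | playback_started_at: Membrane.Time.monotonic_time()}}
end

@impl true
def handle_parent_notification({:speak, url}, _ctx, state) do
  # Place the clip at "now" on the mixer's timeline, i.e. offset it by
  # how long the call has already been running.
  pts_offset = Membrane.Time.monotonic_time() - state.playback_started_at

  spec =
    child({:http_source, url}, %ChatbotHTTPSource{url: url})
    |> child({:parser, url}, %Membrane.RawAudioParser{
      stream_format: @stream_format,
      overwrite_pts?: true,
      pts_offset: pts_offset
    })
    |> get_child(:mixer)

  {[spec: spec], state}
end
```

If something like this is a reasonable direction, I'd also love to know whether computing the offset from monotonic time is safe here, or whether I should be keying off start_of_stream instead.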
