[Standards] UPDATED: XEP-0167 (Jingle RTP Sessions)
XMPP Extensions Editor
06-04-2008, 11:54 PM
Version 0.20 of XEP-0167 (Jingle RTP Sessions) has been released.
Abstract: This specification defines a Jingle application type for negotiating a session that uses the Real-time Transport Protocol (RTP) to exchange media such as voice or video. The application type includes a straightforward mapping to Session Description Protocol (SDP) for interworking with SIP media endpoints.
Changelog: In accordance with list consensus, generalized to cover all RTP media, not just audio; corrected text regarding payload types sent by responder in order to match SDP approach. (psa)
Diff: http://is.gd/r20
URL: http://www.xmpp.org/extensions/xep-0167.html
Jeff Muller
06-06-2008, 09:24 PM
Just a quick question:
I didn't quite glean this from the spec and am not sure if it's been
discussed in this forum, but is there a way to associate two streams (or two
<content /> entities)? Typically, for a video "call", there are two streams,
audio and video. You want these two streams associated in the client a) so
that they can be presented in an associated way (camera and speaker controls
near each other), and b) so that they can be associated for lip sync.
Especially if there are two video streams (for example, there's a document
camera), you want to know which is the "main" stream that goes (by default)
in the main window with the audio controls. Or for that matter, if you only
want to allow one video stream, you know which one to do a content-remove
on.
Or, is it to be inferred that for a single session, there can be at most one
entry for each content type, and that any others would be yet another
session (not sure I like that). I have no idea which approach maps better to
SIP.
Also, it seems to me that, although "ringing" and "hold", would typically be
associated with a session, I could see how "mute" would be associated with
individual streams (<content/>). I may be in a voice-video session, but
temporarily want to mute only video, because I need to pick my nose, or
scratch an intimate area, or whatever, and then un-mute again. Otherwise,
how would session-mute be different than session-hold? Perhaps <mute />
could include an optional "name" property which, if present, specified the
name of a particular <content /> entity???
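To make the suggestion concrete, here is a rough sketch of what building and reading such a <mute /> element might look like. The namespace string and the helper names are made up for illustration and are not taken from the spec; only the optional "name" attribute comes from the proposal above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace, used only for this illustration (not from XEP-0167).
INFO_NS = "urn:example:jingle:rtp:info"


def build_mute(content_name=None):
    """Build a <mute/> element; a name scopes it to one <content/> entity."""
    mute = ET.Element(f"{{{INFO_NS}}}mute")
    if content_name is not None:
        mute.set("name", content_name)
    return mute


def mute_target(mute):
    """Return the targeted content name, or None for a session-wide mute."""
    return mute.get("name")


session_wide = build_mute()       # no name: applies to the whole session
video_only = build_mute("video")  # name present: applies to one content
print(ET.tostring(video_only, encoding="unicode"))
```

With no "name", the receiver would treat the mute as covering every content in the session, which keeps the element backward compatible with a session-level mute.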
Thanks for listening,
Jeff
Peter Saint-Andre
06-09-2008, 08:31 PM
On 06/06/2008 1:23 PM, Jeff Muller wrote:
> Just a quick question:
>
> I didn't quite glean this from the spec and am not sure if it's been
> discussed in this forum, but is there a way to associate two streams (or
> two <content /> entities)? Typically, for a video "call", there are two
> streams, audio and video. You want these two streams associated in the
> client a) so that they can be presented in an associated way (camera and
> speaker controls near each other), and b) so that they can be associated
> for lip sync. Especially if there are two video streams (for example,
> there's a document camera), you want to know which is the "main" stream
> that goes (by default) in the main window with the audio controls. Or
> for that matter, if you only want to allow one video stream, you know
> which one to do a content-remove on.
Wouldn't the associated media simply be part of the same RTP session? Or
do you want the ability to associate media across RTP sessions?
> Or, is it to be inferred that for a single session, there can be at most
> one entry for each content type, and that any others would be yet
> another session (not sure I like that). I have no idea which approach
> maps better to SIP.
No, I think you can have multiple entries per media type -- for example,
a room pan and a podium view for video from a conference.
> Also, it seems to me that, although "ringing" and "hold", would
> typically be associated with a session, I could see how "mute" would be
> associated with individual streams (<content/>). I may be in a
> voice-video session, but temporarily want to mute only video, because I
> need to pick my nose, or scratch an intimate area, or whatever, and then
> un-mute again. Otherwise, how would session-mute be different than
> session-hold? Perhaps <mute /> could include an optional "name" property
> which, if present, specified the name of a particular <content /> entity???
That makes sense, I'll modify XEP-0167 accordingly.
> Thanks for listening,
No, thank you! ;-)
Peter
--
Peter Saint-Andre
https://stpeter.im/
Jeff Muller
06-09-2008, 10:19 PM
----- Original Message -----
From: "Peter Saint-Andre" <stpeter (AT) stpeter (DOT) im>
Newsgroups: gmane.network.jabber.standards-jig
To: "XMPP Extension Discussion List" <standards (AT) xmpp (DOT) org>
Sent: Monday, June 09, 2008 2:29 PM
Subject: Re: UPDATED: XEP-0167 (Jingle RTP Sessions)
> On 06/06/2008 1:23 PM, Jeff Muller wrote:
>> Just a quick question:
>>
>> I didn't quite glean this from the spec and am not sure if it's been
>> discussed in this forum, but is there a way to associate two streams (or
>> two <content /> entities)? Typically, for a video "call", there are two
>> streams, audio and video. You want these two streams associated in the
>> client a) so that they can be presented in an associated way (camera and
>> speaker controls near each other), and b) so that they can be associated
>> for lip sync. Especially if there are two video streams (for example,
>> there's a document camera), you want to know which is the "main" stream
>> that goes (by default) in the main window with the audio controls. Or
>> for that matter, if you only want to allow one video stream, you know
>> which one to do a content-remove on.
>
> Wouldn't the associated media simply be part of the same RTP session? Or
> do you want the ability to associate media across RTP sessions?
I'm definitely not an RTP expert here. But from a quick web search... Isn't
each multimedia type limited to a separate RTP session? From what I read, a
session really just consists of the port pairs for the (single) RTP and
(single) RTCP streams. Maybe?
>> Or, is it to be inferred that for a single session, there can be at most
>> one entry for each content type, and that any others would be yet
>> another session (not sure I like that). I have no idea which approach
>> maps better to SIP.
>
> No, I think you can have multiple entries per media type -- for example,
> a room pan and a podium view for video from a conference.
That's what I would have hoped/expected. Although that poses another
interesting situation. In your example, either of those streams could be
associated with the audio, as opposed to a completely separate video stream.
So, let's say we combine your example with also sending an auxiliary
audio/video stream (let's say, we're streaming a local multimedia file
that's a training video). How would we associate the speaker's voice stream
with the two in-room video views, and the training video's audio with the
video? I realize this is quite an elaborate scenario, but at least in terms
of protocol, we should be able to express it.
>> Also, it seems to me that, although "ringing" and "hold", would
>> typically be associated with a session, I could see how "mute" would be
>> associated with individual streams (<content/>). I may be in a
>> voice-video session, but temporarily want to mute only video, because I
>> need to pick my nose, or scratch an intimate area, or whatever, and then
>> un-mute again. Otherwise, how would session-mute be different than
>> session-hold? Perhaps <mute /> could include an optional "name" property
>> which, if present, specified the name of a particular <content />
>> entity???
>
> That makes sense, I'll modify XEP-0167 accordingly.
Coolio!
Jeff Muller
06-09-2008, 10:31 PM
"Peter Saint-Andre" <stpeter (AT) stpeter (DOT) im> wrote in message
news:484D76A6.8020003 (AT) stpeter (DOT) im...
> On 06/06/2008 1:23 PM, Jeff Muller wrote:
>> Just a quick question:
>>
>> I didn't quite glean this from the spec and am not sure if it's been
>> discussed in this forum, but is there a way to associate two streams (or
>> two <content /> entities)? Typically, for a video "call", there are two
>> streams, audio and video. You want these two streams associated in the
>> client a) so that they can be presented in an associated way (camera and
>> speaker controls near each other), and b) so that they can be associated
>> for lip sync. Especially if there are two video streams (for example,
>> there's a document camera), you want to know which is the "main" stream
>> that goes (by default) in the main window with the audio controls. Or
>> for that matter, if you only want to allow one video stream, you know
>> which one to do a content-remove on.
>
> Wouldn't the associated media simply be part of the same RTP session? Or
> do you want the ability to associate media across RTP sessions?
>
>> Or, is it to be inferred that for a single session, there can be at most
>> one entry for each content type, and that any others would be yet
>> another session (not sure I like that). I have no idea which approach
>> maps better to SIP.
>
> No, I think you can have multiple entries per media type -- for example,
> a room pan and a podium view for video from a conference.
>
>> Also, it seems to me that, although "ringing" and "hold", would
>> typically be associated with a session, I could see how "mute" would be
>> associated with individual streams (<content/>). I may be in a
>> voice-video session, but temporarily want to mute only video, because I
>> need to pick my nose, or scratch an intimate area, or whatever, and then
>> un-mute again. Otherwise, how would session-mute be different than
>> session-hold? Perhaps <mute /> could include an optional "name" property
>> which, if present, specified the name of a particular <content />
>> entity???
>
> That makes sense, I'll modify XEP-0167 accordingly.
OK, I hate to be a pest, but...
If individual <content/> streams are able to be muted, we also need a way to
individually "unmute" them. Now, you could also add a "name" attribute to
<active/>, but then <active/> becomes a little overloaded (and unwieldy).
For example, let's say I mute video, then put the call on hold, and then want
to "unhold". In my opinion, video should still be muted. In my mind, "mute"
and "hold" are different enough concepts, that they need independent ways of
un-doing them (although, from a streaming perspective, putting a call on
"hold" would essentially "mute" all channels, but from a user's perspective,
they're different states).
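A minimal sketch of the state model I have in mind, assuming hold is tracked per session and mute per content name. The class and method names are invented for this example and do not correspond to anything in the spec.

```python
class CallState:
    """Tracks hold (session-wide) and mute (per content) as independent flags."""

    def __init__(self, content_names):
        self.on_hold = False
        self.muted = {name: False for name in content_names}

    def mute(self, name):
        self.muted[name] = True

    def unmute(self, name):
        self.muted[name] = False

    def hold(self):
        self.on_hold = True

    def unhold(self):
        # Deliberately leaves the per-content mute flags untouched.
        self.on_hold = False

    def is_sending(self, name):
        # Media flows only when the call is not held and the content is not muted.
        return not self.on_hold and not self.muted[name]


call = CallState(["audio", "video"])
call.mute("video")
call.hold()
call.unhold()
print(call.is_sending("audio"), call.is_sending("video"))  # True False
```

Because the two flags are stored separately, taking the call off hold brings audio back while video stays muted, which is the behavior described above.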
What does everyone else think?
Olivier Crête
06-09-2008, 10:31 PM
On Mon, 2008-06-09 at 16:17 -0400, Jeff Muller wrote:
> > On 06/06/2008 1:23 PM, Jeff Muller wrote:
> >> I didn't quite glean this from the spec and am not sure if it's been
> >> discussed in this forum, but is there a way to associate two streams (or
> >> two <content /> entities)? Typically, for a video "call", there are two
> >> streams, audio and video. You want these two streams associated in the
> >> client a) so that they can be presented in an associated way (camera and
> >> speaker controls near each other), and b) so that they can be associated
> >> for lip sync. Especially if there are two video streams (for example,
> >> there's a document camera), you want to know which is the "main" stream
> >> that goes (by default) in the main window with the audio controls. Or
> >> for that matter, if you only want to allow one video stream, you know
> >> which one to do a content-remove on.
> >
> > Wouldn't the associated media simply be part of the same RTP session? Or
> > do you want the ability to associate media across RTP sessions?
>
> I'm definitely not an RTP expert here. But from a quick web search... Isn't
> each multimedia type limited to a separate RTP session? From what I read, a
> session really just consists of the port pairs for the (single) RTP and
> (single) RTCP streams. Maybe?
You definitely want to be able to associate multiple RTP sessions to
synchronize them. We should define that all the sessions within the same
Jingle negotiation should be synchronized.
All the RTP sessions (call media aka m= lines) inside the same SDP are
supposed to be synchronized too.
> >> Or, is it to be inferred that for a single session, there can be at most
> >> one entry for each content type, and that any others would be yet
> >> another session (not sure I like that). I have no idea which approach
> >> maps better to SIP.
> >
> > No, I think you can have multiple entries per media type -- for example,
> > a room pan and a podium view for video from a conference.
>
> That's what I would have hoped/expected. Although that poses another
> interesting situation. In your example, either of those streams could be
> associated with the audio, as opposed to a completely separate video stream.
> So, lets say we combine your example, with also sending a auxiliary
> audio/video stream (let's say, we're streaming a local multimedia file
> that's a training video). How would we associate the speaker's voice stream
> with the two in-room video views, and the training video's audio with the
> video? I realize the is quite an elaborate scenario, but at least in terms
> of protocol, we should be able to express it.
Imho, if you have multiple audio/video inputs, they should all be
synchronized. If you want to have "semantic" associations, then you
should probably put something meaningful in the "name" attribute and
have the application do the magic.
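As a rough illustration of the "application magic", an application could adopt its own naming convention and group contents by a shared prefix. The convention below (group, a hyphen, then a role) and the content names are purely hypothetical; nothing in Jingle defines them.

```python
from collections import defaultdict


def group_contents(content_names, separator="-"):
    """Group content names by the text before the first separator."""
    groups = defaultdict(list)
    for name in content_names:
        prefix, _, _ = name.partition(separator)
        groups[prefix].append(name)
    return dict(groups)


# Hypothetical names for the conference-plus-training-video scenario.
names = [
    "room-audio",
    "room-video-pan",
    "room-video-podium",
    "training-audio",
    "training-video",
]
for group, members in group_contents(names).items():
    print(group, members)
```

Both ends would have to agree on the convention out of band, so this only helps between applications that share it.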
--
Olivier Crête
olivier.crete (AT) collabora (DOT) co.uk
Collabora Ltd
Jeff Muller
06-09-2008, 10:36 PM
"Olivier Crête" <olivier.crete (AT) collabora (DOT) co.uk> wrote in message
news:1213043408.21627.11.camel (AT) TesterTop3 (DOT) tester.ca...
[snip]
>> That's what I would have hoped/expected. Although that poses another
>> interesting situation. In your example, either of those streams could be
>> associated with the audio, as opposed to a completely separate video
>> stream.
>> So, lets say we combine your example, with also sending a auxiliary
>> audio/video stream (let's say, we're streaming a local multimedia file
>> that's a training video). How would we associate the speaker's voice
>> stream
>> with the two in-room video views, and the training video's audio with the
>> video? I realize the is quite an elaborate scenario, but at least in
>> terms
>> of protocol, we should be able to express it.
>
>Imho, if you have multiple audio/video inputs, they should all be
>synchronized. If you want to have "semantic" associations, then you
>should probably put something meaningful in the "name" attribute and
>have the application do the magic.
Sounds reasonable. Thanks.
Peter Saint-Andre
06-09-2008, 11:57 PM
On 06/09/2008 2:29 PM, Jeff Muller wrote:
>
> "Peter Saint-Andre" <stpeter (AT) stpeter (DOT) im> wrote in message
> news:484D76A6.8020003 (AT) stpeter (DOT) im...
>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
>>> Just a quick question:
>>>
>>> I didn't quite glean this from the spec and am not sure if it's been
>>> discussed in this forum, but is there a way to associate two streams (or
>>> two <content /> entities)? Typically, for a video "call", there are two
>>> streams, audio and video. You want these two streams associated in the
>>> client a) so that they can be presented in an associated way (camera and
>>> speaker controls near each other), and b) so that they can be associated
>>> for lip sync. Especially if there are two video streams (for example,
>>> there's a document camera), you want to know which is the "main" stream
>>> that goes (by default) in the main window with the audio controls. Or
>>> for that matter, if you only want to allow one video stream, you know
>>> which one to do a content-remove on.
>>
>> Wouldn't the associated media simply be part of the same RTP session? Or
>> do you want the ability to associate media across RTP sessions?
>>
>>> Or, is it to be inferred that for a single session, there can be at most
>>> one entry for each content type, and that any others would be yet
>>> another session (not sure I like that). I have no idea which approach
>>> maps better to SIP.
>>
>> No, I think you can have multiple entries per media type -- for example,
>> a room pan and a podium view for video from a conference.
>>
>>> Also, it seems to me that, although "ringing" and "hold", would
>>> typically be associated with a session, I could see how "mute" would be
>>> associated with individual streams (<content/>). I may be in a
>>> voice-video session, but temporarily want to mute only video, because I
>>> need to pick my nose, or scratch an intimate area, or whatever, and then
>>> un-mute again. Otherwise, how would session-mute be different than
>>> session-hold? Perhaps <mute /> could include an optional "name" property
>>> which, if present, specified the name of a particular <content />
>>> entity???
>>
>> That makes sense, I'll modify XEP-0167 accordingly.
>
> OK, I hate to be a pest, but...
> If individual <content/> streams are able to be muted, we also need a
> way to individually "unmute" them. Now, you could also add a "name"
> attribute to <active/>, but then <active/> becomes a little overloaded
> (and unwieldy). For example, lets say I mute video, then put the call on
> hold, and then want to "unhold". In my opinion, video should still be
> muted. In my mind, "mute" and "hold" are different enough concepts, that
> they need independent ways of un-doing them (although, from a streaming
> perspective, putting a call on "hold" would essentially "mute" all
> channels, but from a user's perspective, they're different states).
Naturally, that is quite sensible. I'll update the spec yet again. ;-)
Peter
--
Peter Saint-Andre
https://stpeter.im/
Peter Saint-Andre
06-10-2008, 12:04 AM
On 06/09/2008 2:30 PM, Olivier Crête wrote:
> On Mon, 2008-06-09 at 16:17 -0400, Jeff Muller wrote:
>>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
>>>> I didn't quite glean this from the spec and am not sure if it's been
>>>> discussed in this forum, but is there a way to associate two streams (or
>>>> two <content /> entities)? Typically, for a video "call", there are two
>>>> streams, audio and video. You want these two streams associated in the
>>>> client a) so that they can be presented in an associated way (camera and
>>>> speaker controls near each other), and b) so that they can be associated
>>>> for lip sync. Especially if there are two video streams (for example,
>>>> there's a document camera), you want to know which is the "main" stream
>>>> that goes (by default) in the main window with the audio controls. Or
>>>> for that matter, if you only want to allow one video stream, you know
>>>> which one to do a content-remove on.
>>> Wouldn't the associated media simply be part of the same RTP session? Or
>>> do you want the ability to associate media across RTP sessions?
>> I'm definitely not an RTP expert here. But from a quick web search... Isn't
>> each multimedia type limited to a separate RTP session? From what I read, a
>> session really just consists of the port pairs for the (single) RTP and
>> (single) RTCP streams. Maybe?
>
> You definitely want to be able to associate multiple RTP sessions to
> synchronize them. We should define that all the sessions within the same
> Jingle negotiation should be synchronized.
>
> All the RTP sessions (call media aka m= lines) inside the same SDP are
> supposed to be synchronized too.
So what is the right term for a synchronized set of RTP sessions (e.g.,
the audio and video sessions from Section 9.3 of XEP-0167)?
Peter
--
Peter Saint-Andre
https://stpeter.im/
Olivier Crête
06-10-2008, 12:17 AM
On Mon, 2008-06-09 at 16:02 -0600, Peter Saint-Andre wrote:
> On 06/09/2008 2:30 PM, Olivier Crête wrote:
> > On Mon, 2008-06-09 at 16:17 -0400, Jeff Muller wrote:
> >>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
> >>>> I didn't quite glean this from the spec and am not sure if it's been
> >>>> discussed in this forum, but is there a way to associate two streams (or
> >>>> two <content /> entities)? Typically, for a video "call", there are two
> >>>> streams, audio and video. You want these two streams associated in the
> >>>> client a) so that they can be presented in an associated way (camera and
> >>>> speaker controls near each other), and b) so that they can be associated
> >>>> for lip sync. Especially if there are two video streams (for example,
> >>>> there's a document camera), you want to know which is the "main" stream
> >>>> that goes (by default) in the main window with the audio controls. Or
> >>>> for that matter, if you only want to allow one video stream, you know
> >>>> which one to do a content-remove on.
> >>> Wouldn't the associated media simply be part of the same RTP session? Or
> >>> do you want the ability to associate media across RTP sessions?
> >> I'm definitely not an RTP expert here. But from a quick web search... Isn't
> >> each multimedia type limited to a separate RTP session? From what I read, a
> >> session really just consists of the port pairs for the (single) RTP and
> >> (single) RTCP streams. Maybe?
> >
> > You definitely want to be able to associate multiple RTP sessions to
> > synchronize them. We should define that all the sessions within the same
> > Jingle negotiation should be synchronized.
> >
> > All the RTP sessions (call media aka m= lines) inside the same SDP are
> > supposed to be synchronized too.
>
> So what is the right term for a synchronized set of RTP sessions (e.g.,
> the audio and video sessions from Section 9.3 of XEP-0167)?
There does not seem to be a standard name for the set of synchronized
RTP sessions. In SIP, they call it a SIP session (how confusing can that
be). In Farsight2, we call it a conference (but it may not be the
greatest name). I think you can just write something like "all RTP
sessions defined in the same Jingle channel should be synchronized" or
something to that effect.
--
Olivier Crête
olivier.crete (AT) collabora (DOT) co.uk
Collabora Ltd
Peter Saint-Andre
06-10-2008, 12:19 AM
On 06/09/2008 4:16 PM, Olivier Crête wrote:
> On Mon, 2008-06-09 at 16:02 -0600, Peter Saint-Andre wrote:
>> On 06/09/2008 2:30 PM, Olivier Crête wrote:
>>> On Mon, 2008-06-09 at 16:17 -0400, Jeff Muller wrote:
>>>>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
>>>>>> I didn't quite glean this from the spec and am not sure if it's been
>>>>>> discussed in this forum, but is there a way to associate two streams (or
>>>>>> two <content /> entities)? Typically, for a video "call", there are two
>>>>>> streams, audio and video. You want these two streams associated in the
>>>>>> client a) so that they can be presented in an associated way (camera and
>>>>>> speaker controls near each other), and b) so that they can be associated
>>>>>> for lip sync. Especially if there are two video streams (for example,
>>>>>> there's a document camera), you want to know which is the "main" stream
>>>>>> that goes (by default) in the main window with the audio controls. Or
>>>>>> for that matter, if you only want to allow one video stream, you know
>>>>>> which one to do a content-remove on.
>>>>> Wouldn't the associated media simply be part of the same RTP session? Or
>>>>> do you want the ability to associate media across RTP sessions?
>>>> I'm definitely not an RTP expert here. But from a quick web search... Isn't
>>>> each multimedia type limited to a separate RTP session? From what I read, a
>>>> session really just consists of the port pairs for the (single) RTP and
>>>> (single) RTCP streams. Maybe?
>>> You definitely want to be able to associate multiple RTP sessions to
>>> synchronize them. We should define that all the sessions within the same
>>> Jingle negotiation should be synchronized.
>>>
>>> All the RTP sessions (call media aka m= lines) inside the same SDP are
>>> supposed to be synchronized too.
>> So what is the right term for a synchronized set of RTP sessions (e.g.,
>> the audio and video sessions from Section 9.3 of XEP-0167)?
>
> There does not seem to be a standard name for the set of synchronized
> RTP sessions. In SIP, they call it a SIP session (how confusing can that
> be). In Farsight2, we call it a conference (but it may not be the
> greatest name). I think you can just write something like "all RTP
> sessions defined in the same Jingle channel should be synchronized" or
> something to that effect.
Right now I have this:
***
A Jingle negotiation MAY result in the establishment of multiple RTP
sessions (e.g., one for audio and one for video). An application SHOULD
consider all of the RTP sessions that are established via the same
Jingle negotiation to be synchronized for purposes of streaming,
playback, recording, etc.
***
Perhaps it's not a good idea to include the text about purposes...
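For implementers, the rule might be sketched roughly as follows: key each RTP session by the Jingle session that negotiated it and treat each resulting group as one synchronization set. The RtpSession type and its field names are assumptions made for this example, not part of the spec.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RtpSession:
    jingle_sid: str    # id of the Jingle negotiation that set up this RTP session
    content_name: str  # name of the corresponding <content/> element
    media: str         # e.g. "audio" or "video"


def sync_groups(sessions):
    """Group RTP sessions so that all sessions from one Jingle negotiation share a group."""
    groups = defaultdict(list)
    for session in sessions:
        groups[session.jingle_sid].append(session)
    return dict(groups)


sessions = [
    RtpSession("sid-1", "voice", "audio"),
    RtpSession("sid-1", "webcam", "video"),
    RtpSession("sid-2", "voice", "audio"),
]
for sid, members in sync_groups(sessions).items():
    print(sid, [m.content_name for m in members])
```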
Peter
--
Peter Saint-Andre
https://stpeter.im/
Olivier Crête
06-10-2008, 12:22 AM
On Mon, 2008-06-09 at 16:17 -0600, Peter Saint-Andre wrote:
> On 06/09/2008 4:16 PM, Olivier Crête wrote:
> > On Mon, 2008-06-09 at 16:02 -0600, Peter Saint-Andre wrote:
> >> On 06/09/2008 2:30 PM, Olivier Crête wrote:
> >>> On Mon, 2008-06-09 at 16:17 -0400, Jeff Muller wrote:
> >>>>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
> >>>>>> I didn't quite glean this from the spec and am not sure if it's been
> >>>>>> discussed in this forum, but is there a way to associate two streams (or
> >>>>>> two <content /> entities)? Typically, for a video "call", there are two
> >>>>>> streams, audio and video. You want these two streams associated in the
> >>>>>> client a) so that they can be presented in an associated way (camera and
> >>>>>> speaker controls near each other), and b) so that they can be associated
> >>>>>> for lip sync. Especially if there are two video streams (for example,
> >>>>>> there's a document camera), you want to know which is the "main" stream
> >>>>>> that goes (by default) in the main window with the audio controls. Or
> >>>>>> for that matter, if you only want to allow one video stream, you know
> >>>>>> which one to do a content-remove on.
> >>>>> Wouldn't the associated media simply be part of the same RTP session? Or
> >>>>> do you want the ability to associate media across RTP sessions?
> >>>> I'm definitely not an RTP expert here. But from a quick web search.... Isn't
> >>>> each multimedia type limited to a separate RTP session? From what I read, a
> >>>> session really just consists of the port pairs for the (single) RTP and
> >>>> (single) RTCP streams. Maybe?
> >>> You definitely want to be able to associate multiple RTP sessions to
> >>> synchronize them. We should define that all the sessions within the same
> >>> Jingle negotiation should be synchronized.
> >>>
> >>> All the RTP sessions (call media aka m= lines) inside the same SDP are
> >>> supposed to be synchronized too.
> >> So what is the right term for a synchronized set of RTP sessions (e.g.,
> >> the audio and video sessions from Section 9.3 of XEP-0167)?
> >
> > There does not seem to be a standard name for the set of synchronized
> > RTP sessions. In SIP, they call it a SIP session (how confusing can that
> > be). In Farsight2, we call it a conference (but it may not be the
> > greatest name). I think you can just write something like "all RTP
> > sessions defined in the same Jingle channel should be synchronized" or
> > something to that effect.
>
> Right now I have this:
>
> ***
>
> A Jingle negotiation MAY result in the establishment of multiple RTP
> sessions (e.g., one for audio and one for video). An application SHOULD
> consider all of the RTP sessions that are established via the same
> Jingle negotiation to be synchronized for purposes of streaming,
> playback, recording, etc.
>
> ***
>
> Perhaps it's not a good idea to include the text about purposes...
That seems good to me, with or without the purposes.
--
Olivier Crête
olivier.crete (AT) collabora (DOT) co.uk
Collabora Ltd
Jeff Muller
06-10-2008, 08:37 PM
At the end of Section 5, the spec mentions "directionality", "sendonly",
"recvonly", and "sendrecv", but these keywords and concepts aren't discussed
anywhere else in the document. Is this incomplete, a cut-n-paste error, or
something else?
I'm hoping there's more to this, as I was wondering how one would negotiate
a single-direction stream, such as a webcam.
Jeff
Peter Saint-Andre
06-10-2008, 08:42 PM
On 06/10/2008 12:36 PM, Jeff Muller wrote:
> At the end of Section 5, ir mentions "directionality", "sendonly",
> "recvonly", and "sendrecv", but these keywords and concepts aren't
> discussed anywhere else in the document. Is this incomplete, a
> cut-n-paste error, or something else?
>
> I'm hoping there's more to this, as I was wondering how one would
> negotiate a single-direction stream, such as a webcam.
Oops, I used the wrong terms. What we want is defined in XEP-0166 via
the 'senders' attribute, which can have a value of "initiator",
"responder", or "both", as specified here:
http://www.xmpp.org/extensions/xep-0166.html#def-content
I'll correct the error.
Peter
--
Peter Saint-Andre
https://stpeter.im/
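For readers wondering what the single-direction webcam case might look like, here is a minimal sketch in Python that builds a <content/> element carrying the 'senders' attribute described above. The make_content helper and the content name "webcam" are hypothetical. The <description/> and <transport/> children that a real offer would need are left out, and no XML namespace is set. Only the 'creator', 'name', and 'senders' attributes and the three 'senders' values come from XEP-0166 and this thread.

```python
import xml.etree.ElementTree as ET

# Values of the 'senders' attribute as listed in this thread.
ALLOWED_SENDERS = {"initiator", "responder", "both"}


def make_content(name, senders="both", creator="initiator"):
    """Build a bare <content/> element (hypothetical helper, no child elements)."""
    if senders not in ALLOWED_SENDERS:
        raise ValueError(f"unsupported 'senders' value: {senders!r}")
    return ET.Element(
        "content",
        {"creator": creator, "name": name, "senders": senders},
    )


# A webcam stream that only the initiator sends.
webcam = make_content("webcam", senders="initiator")
print(ET.tostring(webcam, encoding="unicode"))
```

Rejecting "sendonly" and friends up front mirrors the correction above: those are SDP terms, and the Jingle equivalent is expressed through 'senders'.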
Peter Saint-Andre
06-10-2008, 08:48 PM
On 06/10/2008 12:40 PM, Peter Saint-Andre wrote:
> On 06/10/2008 12:36 PM, Jeff Muller wrote:
>> "XMPP Extensions Editor" <editor (AT) xmpp (DOT) org> wrote in message
>> news:E1K40uV-0002v7-00 (AT) apollo (DOT) ..
>>> Version 0.20 of XEP-0167 (Jingle RTP Sessions) has been released.
>>>
>>> Abstract: This specification defines a Jingle application type for
>>> negotiating a session that uses the Real-time Transport Protocol (RTP)
>>> to exchange media such as voice or video. The application type
>>> includes a straightforward mapping to Session Description Protocol
>>> (SDP) for interworking with SIP media endpoints.
>>>
>>> Changelog: In accordance with list consensus, generalized to cover all
>>> RTP media, not just audio; corrected text regarding payload types sent
>>> by responder in order to match SDP approach. (psa)
>>>
>>> Diff: http://is.gd/r20
>>>
>>> URL: http://www.xmpp.org/extensions/xep-0167.html
>>>
>>>
>> At the end of Section 5, it mentions "directionality", "sendonly",
>> "recvonly", and "sendrecv", but these keywords and concepts aren't
>> discussed anywhere else in the document. Is this incomplete, a
>> cut-n-paste error, or something else?
>>
>> I'm hoping there's more to this, as I was wondering how one would
>> negotiate a single-direction stream, such as a webcam.
>
> Oops, I used the wrong terms. What we want is defined in XEP-0166 via
> the 'senders' attribute, which can have a value of "initiator",
> "responder", or "both", as specified here:
>
> http://www.xmpp.org/extensions/xep-0166.html#def-content
>
> I'll correct the error.
Done:
http://is.gd/uxG
/psa
Olivier Crête
06-15-2008, 12:24 AM
On Mon, 2008-06-09 at 18:21 -0400, Olivier Crête wrote:
> On Mon, 2008-06-09 at 16:17 -0600, Peter Saint-Andre wrote:
> > On 06/09/2008 4:16 PM, Olivier Crête wrote:
> > > On Mon, 2008-06-09 at 16:02 -0600, Peter Saint-Andre wrote:
> > >> On 06/09/2008 2:30 PM, Olivier Crête wrote:
> > >>> On Mon, 2008-06-09 at 16:17 -0400, Jeff Muller wrote:
> > >>>>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
> > >>>>>> I didn't quite glean this from the spec and am not sure if it's been
> > >>>>>> discussed in this forum, but is there a way to associate two streams (or
> > >>>>>> two <content /> entities)? Typically, for a video "call", there are two
> > >>>>>> streams, audio and video. You want these two streams associated in the
> > >>>>>> client a) so that they can be presented in an associated way (camera and
> > >>>>>> speaker controls near each other), and b) so that they can be associated
> > >>>>>> for lip sync. Especially if there are two video streams (for example,
> > >>>>>> there's a document camera), you want to know which is the "main" stream
> > >>>>>> that goes (by default) in the main window with the audio controls. Or
> > >>>>>> for that matter, if you only want to allow one video stream, you know
> > >>>>>> which one to do a content-remove on.
> > >>>>> Wouldn't the associated media simply be part of the same RTP session? Or
> > >>>>> do you want the ability to associate media across RTP sessions?
> > >>>> I'm definitely not an RTP expert here. But from a quick web search.... Isn't
> > >>>> each multimedia type limited to a separate RTP session? From what I read, a
> > >>>> session really just consists of the port pairs for the (single) RTP and
> > >>>> (single) RTCP streams. Maybe?
> > >>> You definitely want to be able to associate multiple RTP sessions to
> > >>> synchronize them. We should define that all the sessions within the same
> > >>> Jingle negotiation should be synchronized.
> > >>>
> > >>> All the RTP sessions (call media aka m= lines) inside the same SDP are
> > >>> supposed to be synchronized too.
> > >> So what is the right term for a synchronized set of RTP sessions (e.g.,
> > >> the audio and video sessions from Section 9.3 of XEP-0167)?
> > >
> > > There does not seem to be a standard name for the set of synchronized
> > > RTP sessions. In SIP, they call it a SIP session (how confusing can that
> > > be). In Farsight2, we call it a conference (but it may not be the
> > > greatest name). I think you can just write something like "all RTP
> > > sessions defined in the same Jingle channel should be synchronized" or
> > > something to that effect.
> >
> > Right now I have this:
> >
> > ***
> >
> > A Jingle negotiation MAY result in the establishment of multiple RTP
> > sessions (e.g., one for audio and one for video). An application SHOULD
> > consider all of the RTP sessions that are established via the same
> > Jingle negotiation to be synchronized for purposes of streaming,
> > playback, recording, etc.
> >
> > ***
> >
> > Perhaps it's not a good idea to include the text about purposes...
>
> That seems good to me, with or without the purposes
I don't know if we want to go this way, but a new IETF draft was just
published that adds a way to explicitly state the grouping for
synchronization purposes.
URL:
http://www.ietf.org/internet-drafts/draft-ietf-mmusic-rfc3388bis-00.txt
--
Olivier Crête
olivier.crete (AT) collabora (DOT) co.uk
Collabora Ltd
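The rule proposed in the quoted draft text (treat every RTP session set up by the same Jingle negotiation as synchronized) can be sketched as a simple grouping step. This is only an illustration in Python: the RtpContent class, the sync_groups function, and the sample session IDs and content names are all hypothetical, and keying the groups on the Jingle session ID ('sid') is an assumption about how a client might identify "the same negotiation".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RtpContent:
    sid: str    # Jingle session ID the content was negotiated in
    name: str   # value of the <content/> 'name' attribute
    media: str  # e.g. "audio" or "video"


def sync_groups(contents):
    """Map each session ID to the content names to be played in sync."""
    groups = defaultdict(list)
    for content in contents:
        groups[content.sid].append(content.name)
    return dict(groups)


contents = [
    RtpContent("session-a", "voice", "audio"),
    RtpContent("session-a", "webcam", "video"),
    RtpContent("session-b", "doc-camera", "video"),
]
groups = sync_groups(contents)
```

With this implicit rule nothing extra goes on the wire. The explicit alternative is what the SDP grouping framework (RFC 3388, which the draft above revises) does with "a=group:LS" and "a=mid" lines.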
Unnikrishnan V
06-15-2008, 08:24 AM
My 2 cents, still: use plain SDP instead of XML-ized SDP in Jingle, and
avoid chasing every change made to SDP.
Let jingle do session management and not session description.
thanx
unni
On Sat, Jun 14, 2008 at 3:23 PM, Olivier Crête <
olivier.crete (AT) collabora (DOT) co.uk> wrote:
>
> On Mon, 2008-06-09 at 18:21 -0400, Olivier Crête wrote:
> > On Mon, 2008-06-09 at 16:17 -0600, Peter Saint-Andre wrote:
> > > On 06/09/2008 4:16 PM, Olivier Crête wrote:
> > > > On Mon, 2008-06-09 at 16:02 -0600, Peter Saint-Andre wrote:
> > > >> On 06/09/2008 2:30 PM, Olivier Crête wrote:
> > > >>> On Mon, 2008-06-09 at 16:17 -0400, Jeff Muller wrote:
> > > >>>>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
> > > >>>>>> I didn't quite glean this from the spec and am not sure if it's been
> > > >>>>>> discussed in this forum, but is there a way to associate two streams (or
> > > >>>>>> two <content /> entities)? Typically, for a video "call", there are two
> > > >>>>>> streams, audio and video. You want these two streams associated in the
> > > >>>>>> client a) so that they can be presented in an associated way (camera and
> > > >>>>>> speaker controls near each other), and b) so that they can be associated
> > > >>>>>> for lip sync. Especially if there are two video streams (for example,
> > > >>>>>> there's a document camera), you want to know which is the "main" stream
> > > >>>>>> that goes (by default) in the main window with the audio controls. Or
> > > >>>>>> for that matter, if you only want to allow one video stream, you know
> > > >>>>>> which one to do a content-remove on.
> > > >>>>> Wouldn't the associated media simply be part of the same RTP session? Or
> > > >>>>> do you want the ability to associate media across RTP sessions?
> > > >>>> I'm definitely not an RTP expert here. But from a quick web search... Isn't
> > > >>>> each multimedia type limited to a separate RTP session? From what I read, a
> > > >>>> session really just consists of the port pairs for the (single) RTP and
> > > >>>> (single) RTCP streams. Maybe?
> > > >>> You definitely want to be able to associate multiple RTP sessions to
> > > >>> synchronize them. We should define that all the sessions within the same
> > > >>> Jingle negotiation should be synchronized.
> > > >>>
> > > >>> All the RTP sessions (call media aka m= lines) inside the same SDP are
> > > >>> supposed to be synchronized too.
> > > >> So what is the right term for a synchronized set of RTP sessions (e.g.,
> > > >> the audio and video sessions from Section 9.3 of XEP-0167)?
> > > >
> > > > There does not seem to be a standard name for the set of synchronized
> > > > RTP sessions. In SIP, they call it a SIP session (how confusing can that
> > > > be). In Farsight2, we call it a conference (but it may not be the
> > > > greatest name). I think you can just write something like "all RTP
> > > > sessions defined in the same Jingle channel should be synchronized" or
> > > > something to that effect.
> > >
> > > Right now I have this:
> > >
> > > ***
> > >
> > > A Jingle negotiation MAY result in the establishment of multiple RTP
> > > sessions (e.g., one for audio and one for video). An application SHOULD
> > > consider all of the RTP sessions that are established via the same
> > > Jingle negotiation to be synchronized for purposes of streaming,
> > > playback, recording, etc.
> > >
> > > ***
> > >
> > > Perhaps it's not a good idea to include the text about purposes...
> >
> > That seems good to me, with or without the purposes
>
> I don't know if we want to go this way, but a new IETF draft was just
> published that adds a way to explicitly state the grouping for
> synchronization purposes.
>
> URL:
> http://www.ietf.org/internet-drafts/draft-ietf-mmusic-rfc3388bis-00.txt
>
> --
> Olivier Crête
> olivier.crete (AT) collabora (DOT) co.uk
> Collabora Ltd
>
Peter Saint-Andre
06-16-2008, 05:40 PM
On 06/15/2008 12:22 AM, Unnikrishnan V wrote:
> Still my 2 cents to use SDP instead of xmlized SDP in jingle and avoid
> running behind all changes.
>
> Let jingle do session management and not session description.
I think that ship sailed long ago.
Peter
--
Peter Saint-Andre
https://stpeter.im/
Jeff Muller
08-06-2008, 03:24 PM
"Peter Saint-Andre" <stpeter (AT) stpeter (DOT) im> wrote in message
news:484DA6CA.8030404 (AT) stpeter (DOT) im...
> On 06/09/2008 2:29 PM, Jeff Muller wrote:
>>
>> "Peter Saint-Andre" <stpeter (AT) stpeter (DOT) im> wrote in message
>> news:484D76A6.8020003 (AT) stpeter (DOT) im...
>>> On 06/06/2008 1:23 PM, Jeff Muller wrote:
>>>> Just a quick question:
>>>>
>>>> I didn't quite glean this from the spec and am not sure if it's been
>>>> discussed in this forum, but is there a way to associate two streams
>>>> (or
>>>> two <content /> entities)? Typically, for a video "call", there are two
>>>> streams, audio and video. You want these two streams associated in the
>>>> client a) so that they can be presented in an associated way (camera
>>>> and
>>>> speaker controls near each other), and b) so that they can be
>>>> associated
>>>> for lip sync. Especially if there are two video streams (for example,
>>>> there's a document camera), you want to know which is the "main" stream
>>>> that goes (by default) in the main window with the audio controls. Or
>>>> for that matter, if you only want to allow one video stream, you know
>>>> which one to do a content-remove on.
>>>
>>> Wouldn't the associated media simply be part of the same RTP session? Or
>>> do you want the ability to associate media across RTP sessions?
>>>
>>>> Or, is it to be inferred that for a single session, there can be at
>>>> most
>>>> one entry for each content type, and that any others would be yet
>>>> another session (not sure I like that). I have no idea which approach
>>>> maps better to SIP.
>>>
>>> No, I think you can have multiple entries per media type -- for example,
>>> a room pan and a podium view for video from a conference.
>>>
>>>> Also, it seems to me that, although "ringing" and "hold", would
>>>> typically be associated with a session, I could see how "mute" would be
>>>> associated with individual streams (<content/>). I may be in a
>>>> voice-video session, but temporarily want to mute only video, because I
>>>> need to pick my nose, or scratch an intimate area, or whatever, and
>>>> then
>>>> un-mute again. Otherwise, how would session-mute be different than
>>>> session-hold? Perhaps <mute /> could include an optional "name"
>>>> property
>>>> which, if present, specified the name of a particular <content />
>>>> entity???
>>>
>>> That makes sense, I'll modify XEP-0167 accordingly.
>>
>> OK, I hate to be a pest, but...
>> If individual <content/> streams are able to be muted, we also need a
>> way to individually "unmute" them. Now, you could also add a "name"
>> attribute to <active/>, but then <active/> becomes a little overloaded
>> (and unwieldy). For example, let's say I mute video, then put the call on
>> hold, and then want to "unhold". In my opinion, video should still be
>> muted. In my mind, "mute" and "hold" are different enough concepts, that
>> they need independent ways of un-doing them (although, from a streaming
>> perspective, putting a call on "hold" would essentially "mute" all
>> channels, but from a user's perspective, they're different states).
>
> Naturally, that is quite sensible. I'll update the spec yet again. ;-)
>
Peter, in the text for the <active/> element, it says "If no 'name'
attribute is included, the recipient MUST assume that all sessions are
active". There is also similar text for the other informational messages.
Instead of "all sessions", shouldn't it be "all content streams for the
session", or some other appropriately worded text?
Jeff
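The mute/hold distinction discussed above, together with the "no 'name' means every content in this session" reading, can be sketched as a small state tracker. This is a rough Python illustration, not the spec's model: the SessionState class and its method names are hypothetical, and treating an omitted name as "all content streams of this one session" follows the wording suggested in this message rather than the published text.

```python
class SessionState:
    """Tracks hold (session-wide) and mute (per content) independently."""

    def __init__(self, content_names):
        self.on_hold = False
        self.muted = {name: False for name in content_names}

    def _set_muted(self, name, value):
        # No name given: apply to every content stream in this session.
        targets = list(self.muted) if name is None else [name]
        for target in targets:
            if target not in self.muted:
                raise KeyError(f"unknown content: {target!r}")
            self.muted[target] = value

    def mute(self, name=None):
        self._set_muted(name, True)

    def unmute(self, name=None):
        self._set_muted(name, False)

    def hold(self):
        self.on_hold = True

    def unhold(self):
        # Deliberately leaves per-content mute flags untouched.
        self.on_hold = False

    def is_flowing(self, name):
        return not self.on_hold and not self.muted[name]


state = SessionState(["voice", "webcam"])
state.mute("webcam")  # mute video only
state.hold()          # hold silences everything
state.unhold()        # voice resumes, video stays muted
```

Keeping the two flags separate is what lets "unhold" bring back the voice stream while the video stays muted, which is the behaviour argued for in the quoted exchange.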