[Standards] Pub/Sub & RSS
Kelly S
05-22-2008, 02:28 AM
Hi all. I am just getting back into XMPP after a few years away from it and I am digging into the Pub/Sub XEP. I am looking for a bit of feedback. I am writing a service which will pull data from RSS and "publish" it via Jabber. However, I need a way to relate the unique RSS URL to the XMPP pub/sub node name so that I know where to store the entries every time. In the leaf node, would I place the RSS URL in the <item node="">? This doesn't sound too intuitive because I imagine some Pub/Sub "browsers" use this as the node name; is this correct? This would look ugly if you had http://www.site.com/rss.xml and hundreds of others in there lol. I don't think the node name necessarily needs to be the unique identifier. I'm sure I am just misunderstanding :) Any clarification would be great! Thanks!
Peter Saint-Andre
05-22-2008, 05:22 AM
On 05/21/2008 6:26 PM, Kelly S wrote:
>
> Hi all. I am just getting back into XMPP after a few years away from
> it and I am digging into the Pub/Sub XEP. I am looking for a bit of
> feedback. I am writing the service which will pull data from RSS and
> "publish" it via Jabber. However I need a way to relate the unique
> RSS url to the XMPP pub/sub name so that I know where to store these
> every time. In the leaf node would I place the RSS url in the <item
> node="">? This doesn't sound too intuitive because I imagine some
> Pub/Sub "browsers" use this as the node name, is this correct? This
> would look ugly if you had http://www.site.com/rss.xml and 100's of
> others in there lol. I don't think the nodename necessarily needs to
> be the unique identifier. I'm sure I am just understanding wrong :)
> Any clarification would be great! Thanks!
NodeIDs are opaque. You can make them whatever you want. If it's
easiest for you to make them the same as your RSS feed URLs, go for it.
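To make the "NodeID is opaque" point concrete, here is a minimal sketch of a XEP-0060 node-creation request that uses the feed URL directly as the NodeID. The service address `pubsub.example.org`, the stanza id, and the helper name `build_create_node_iq` are made up for illustration; only the stanza shape comes from the spec.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def build_create_node_iq(service_jid: str, feed_url: str, stanza_id: str) -> str:
    """Build a node-creation IQ whose NodeID is simply the feed URL."""
    iq = ET.Element("iq", {"type": "set", "to": service_jid, "id": stanza_id})
    # "xmlns" is written as a plain attribute so the serialized stanza
    # carries the namespace declaration on <pubsub/> itself.
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    # The NodeID is opaque to the service, so any string works, URLs included.
    ET.SubElement(pubsub, "create", {"node": feed_url})
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical service JID and stanza id.
    print(build_create_node_iq("pubsub.example.org",
                               "http://www.site.com/rss.xml", "create1"))
```

Characters such as `&` in a feed URL are escaped by the serializer, so the URL can go into the `node` attribute unchanged.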
Just curious: what flavor of RSS are you using? Are you indeed sending
RSS over XMPP, or are you sending Atom instead?
Peter
--
Peter Saint-Andre
https://stpeter.im/
Kelly S
05-22-2008, 06:11 AM
Thanks for the reply!
I took a look at what you're saying about NodeID and I understand a bit more clearly now. Wow, the Pub/Sub spec is large!
Anyways. I'm thinking of writing a service where users can "subscribe" to feeds off the web. A service will be monitoring all "feeds" and pulling the RSS/Atom/whatever off of the web and populating the Pub/Sub nodes in batch so users get notifications.
So really the service will create the new node to represent the news feed if it's not already found in the Pub/Sub query and then add it to its "queue" of feeds to poll.
I'm not quite sure what XEP to use to allow a user to "request" pulling of a feed. I haven't figured that part out yet.
I'm also not sure if I want to mix Atom nodes with RSS nodes etc. I could create 1 format to use for the "entry" and transform them to all match but for sites which add extra metadata to entries such as Digg with its DiggCount I would like to maintain that. That is the beauty of XML :)
That way Jabber clients who understand extensible items can display them.
I like the idea of the NodeID being the feed's URL because then this "polling" service can easily use the feed's URL as the target of the SET after downloading content.
Any suggestions / recommendations would be great to hear!
Another item I am unclear about: as I am polling for news data, how can I easily check if an "entry" exists already? I'd rather not have to keep a cache somewhere of all the items I have created already.
Although performance is going to suck if I have to check every entry before inserting. Is there a way to batch insert and disallow duplicate entries based on *something* like the entry title?
It's nice to be back in Jabber land :) I'm back into the old mindset where I have a ZILLION ideas rushing into my head about all the crazy things I can do with these XEPs lol.
Any help would be great!
Thanks so much!
Peter Saint-Andre
05-22-2008, 06:56 PM
On 05/21/2008 10:09 PM, Kelly S wrote:
> Thanks for the reply!
>
> I took a look at what you're saying about NodeID and I understand a bit
> more clearly now. Wow the Pub/Sub spec is large!
Don't be afraid, it's just comprehensive. :)
> Anyways. I'm thinking of writing a service where users can
> "subscribe" to feeds off the web. A service will be monitoring all
> "feeds" and pulling the RSS/Atom/whatever off of the web and
> populating the Pub/Sub nodes in batch so users get notifications.
>
> So really the service will create the new node to represent the news
> feed if it's not already found in the Pub/Sub query and then add it to
> its "queue" of feeds to poll.
>
> I'm not quite sure what XEP to use to allow a user to "request"
> pulling of a feed. I haven't figured that part out yet.
You can request the items in a feed by using this:
http://www.xmpp.org/extensions/xep-0060.html#subscriber-retrieve
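For reference, the retrieve-items request behind that link can be sketched as below. XEP-0060 lets a subscriber cap the result with the `max_items` attribute on `<items/>`, which matters for clients that only want the newest few entries. The helper name, service address, and stanza id are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def build_items_request(service_jid: str, node_id: str, stanza_id: str,
                        max_items: Optional[int] = None) -> str:
    """Build an IQ-get asking the pubsub service for items stored at a node."""
    iq = ET.Element("iq", {"type": "get", "to": service_jid, "id": stanza_id})
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    items = ET.SubElement(pubsub, "items", {"node": node_id})
    if max_items is not None:
        # Ask only for the N most recent items instead of the full history.
        items.set("max_items", str(max_items))
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical service JID and stanza id; the NodeID is the feed URL.
    print(build_items_request("pubsub.example.org",
                              "http://www.site.com/rss.xml", "items1",
                              max_items=10))
```

Whether a given server honours `max_items` should be checked against that server's advertised features.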
> I'm also not sure if I want to mix Atom nodes with RSS nodes etc. I
> could create 1 format to use for the "entry" and transform them to
> all match but for sites which add extra metadata to entries such as
> Digg with its DiggCount I would like to maintain that. That is the
> beauty of XML :)
But RSS isn't XML. :P
> That way Jabber clients who understand extensible items can display
> them.
>
> I like the idea of the NodeID being the feed's URL because then this
> "polling" service can use the feed's URL easily as the SET after
> downloading content.
Sure. :)
> Any suggestions / recommendations would be great to hear! Another item
> I am unclear about is as I am polling for news data, how can I easily
> check if an "entry" exists already? I'd rather not have to keep a
> cache somewhere of all the items I have created already.
>
> Although performance is going to suck if I have to check every entry
> before inserting. Is there a way to batch insert, and disallow
> duplicate entries based on *something* like entry title or something?
>
>
> It's nice to be back in Jabber land :) I'm back into the old mindset
> where I have a ZILLION ideas rushing into my head about all the crazy
> things I can do with these XEPs lol.
>
> Any help would be great!
>
> Thanks so much!
You might want to join this list for discussion:
http://mail.jabber.org/mailman/listinfo/social
"XMPP and Social Networking, Two Great Tastes That Taste Great Together!"
/psa
Kelly S
05-22-2008, 07:53 PM
> > I'm not quite sure what XEP to use to allow a user to "request"
> > pulling of a feed. I haven't figured that part out yet.
>
> You can request the items in a feed by using this:
>
> http://www.xmpp.org/extensions/xep-0060.html#subscriber-retrieve
Sorry, I don't think I explained this too well. Users will be requesting to add feeds to their account, which in turn will subscribe them to the corresponding pub/sub node.
But if someone requests a feed we are not "publishing", we need to begin publishing that new feed and subscribe the user to it.
From then on that feed will keep publishing new entries, ready for when additional people start subscribing to it.
I'm not sure how to handle the user requesting a new feed that doesn't exist yet so that we can start pulling/publishing it.
> > I'm also not sure if I want to mix Atom nodes with RSS nodes etc. I
> > could create 1 format to use for the "entry" and transform them to
> > all match but for sites which add extra metadata to entries such as
> > Digg with its DiggCount I would like to maintain that. That is the
> > beauty of XML :)
>
> But RSS isn't XML. :P
RSS isn't XML? I'm not sure what you mean lol. Are they not exposed as XML documents? lol.
Also, a bunch of concerns that I'm still unclear about pop into my head when I think about maintaining all this "feed" data.
1. Is there any way I can just publish the latest entries I pulled down and have pubsub discard any duplicates that may already exist? Or is the right way to handle this to query every single entry individually to check whether it exists before publishing it?
1.a If I have to query each individually, would I make the entry ID the URL of the entry so that I can check for duplicates as I pull them down off the web and read the entries? (Example: 4 old entries, but 1 new entry since the last pull.)
2. Entry requesting. Is there any sort of querying we can use against them? If we are publishing tons of entries, someone may want to browse/read them but only request X at a time, or only those newer than a certain date, etc. I don't think pulling the entire entry history of a couple of months is going to be too efficient.
My concern in #2 comes from the fact that we are going to take this mobile, and data plans are expensive for mobile devices. In this country it can be $25/1.5MB/month.
We want to make our mobile client query data efficiently, but do these capabilities exist in Pub/Sub?
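One possible way to approach questions 1 and 1.a, offered as a sketch rather than an answer from the thread: derive a stable ItemID from each entry's link and filter a freshly pulled batch against the set of ItemIDs the node already holds. As far as I read XEP-0060, publishing with an ItemID that already exists overwrites the stored item (and can trigger a new notification) rather than silently discarding it, so filtering before publishing still seems worthwhile. The hashing scheme, the helper names, and the assumption that `known_item_ids` comes from one item listing per poll (not one query per entry) are all mine.

```python
import hashlib
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
ATOM_NS = "http://www.w3.org/2005/Atom"


def item_id_for(entry_link: str) -> str:
    """Derive a stable ItemID from the entry URL (hypothetical scheme)."""
    return hashlib.sha1(entry_link.encode("utf-8")).hexdigest()


def select_new_entries(entries, known_item_ids):
    """Return (item_id, entry) pairs not yet on the node, deduplicated in-batch."""
    seen = set(known_item_ids)
    fresh = []
    for entry in entries:
        item_id = item_id_for(entry["link"])
        if item_id not in seen:
            seen.add(item_id)
            fresh.append((item_id, entry))
    return fresh


def build_publish_iq(service_jid, node_id, item_id, entry, stanza_id):
    """Build a publish request carrying one Atom-style entry as the payload."""
    iq = ET.Element("iq", {"type": "set", "to": service_jid, "id": stanza_id})
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node_id})
    item = ET.SubElement(publish, "item", {"id": item_id})
    payload = ET.SubElement(item, "entry", {"xmlns": ATOM_NS})
    ET.SubElement(payload, "title").text = entry.get("title", "")
    ET.SubElement(payload, "link", {"href": entry["link"]})
    return ET.tostring(iq, encoding="unicode")
```

With this shape, a poll that finds 4 old entries and 1 new one results in a single publish request, and the only state the poller needs is the list of ItemIDs it can read back from the node.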
Thanks so much for the help :)
Kelly S
05-23-2008, 12:22 AM
Oh sorry I shouldn't be posting dev questions in standards ML right?
Should I bring this to jdev or that social ML you mentioned?
Sorry I didn't mean to spam the standards mailing list.
Peter Saint-Andre
05-23-2008, 05:56 AM
On 05/22/2008 4:21 PM, Kelly S wrote:
> Oh sorry I shouldn't be posting dev questions in standards ML right?
>
> Should I bring this to jdev or that social ML you mentioned?
>
> Sorry I didn't mean to spam the standards mailing list.
No worries. In general this list is for development of new protocols
(and clarification of existing ones), whereas the jdev list is for
discussion about how to use those protocols to build applications. But
the lines are a bit fuzzy sometimes. :)
Peter
--
Peter Saint-Andre
https://stpeter.im/
Tomasz Sterna
05-26-2008, 11:25 AM
On Thu, 2008-05-22 at 13:52 -0400, Kelly S wrote:
> But if someone requests a feed we are not "publishing" we need to
> begin publishing that new feed and subscribe the user to it.
> From then on that feed will be publishing new entries for when
> additional people start subscribing to it.
> I'm not sure how to handle the user requesting a new feed that doesn't
> exist yet so that we can start pulling/publishing it.
Couldn't you just treat a subscription request to a not-yet-existing node
as a feed request? Fetch it in the background and return the items, or a
non-existing node error when the given URL is not an RSS feed.
This is similar to how conference room creation works: when you enter a
non-existing room, it is created.
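A rough sketch of that idea, with the XMPP plumbing left out: the service keeps a registry of nodes keyed by feed URL, and a subscribe request for an unknown node triggers a fetch. If the URL turns out to be a feed, the node is created and queued for polling; otherwise the subscriber gets the `item-not-found` condition that XEP-0060 uses for a missing node. The class name, the injected `fetch_feed` callable, and the synchronous fetch are simplifications of mine (the suggestion above is to fetch in the background).

```python
from typing import Callable, Dict, List, Optional, Set


class FeedPubSub:
    """In-memory model of 'subscribing to an unknown node requests the feed'."""

    def __init__(self, fetch_feed: Callable[[str], Optional[list]]):
        # fetch_feed returns a list of entries, or None if the URL is not a feed.
        self._fetch_feed = fetch_feed
        self.nodes: Dict[str, Set[str]] = {}  # NodeID (feed URL) -> subscribers
        self.poll_queue: List[str] = []       # feeds the poller should watch

    def handle_subscribe(self, subscriber_jid: str, node_id: str) -> str:
        if node_id not in self.nodes:
            # Unknown node: treat the subscription as a request to add the feed.
            if self._fetch_feed(node_id) is None:
                return "item-not-found"
            self.nodes[node_id] = set()
            self.poll_queue.append(node_id)
        self.nodes[node_id].add(subscriber_jid)
        return "subscribed"


if __name__ == "__main__":
    # Stand-in fetcher: only URLs ending in .xml count as feeds here.
    service = FeedPubSub(lambda url: [] if url.endswith(".xml") else None)
    print(service.handle_subscribe("user@example.org", "http://www.site.com/rss.xml"))
    print(service.handle_subscribe("user@example.org", "http://www.site.com/index.html"))
```

A second subscriber to the same URL reuses the existing node, so each feed is polled once no matter how many users follow it.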
--
/\_./o__ Tomasz Sterna
(/^/(_^^' http://www.xiaoka.com/
.._.(_.)_ im:smoku (AT) xiaoka (DOT) com
Peter Saint-Andre
05-27-2008, 11:50 PM
On 05/26/2008 3:23 AM, Tomasz Sterna wrote:
> On Thu, 2008-05-22 at 13:52 -0400, Kelly S wrote:
>> But if someone requests a feed we are not "publishing" we need to
>> begin publishing that new feed and subscribe the user to it.
>> From then on that feed will be publishing new entries for when
>> additional people start subscribing to it.
>> I'm not sure how to handle the user requesting a new feed that doesn't
>> exist yet so that we can start pulling/publishing it.
>
> Couldn't you just treat a subscription request to a not-yet-existing node
> as a feed request? Fetch it in the background and return the items, or a
> non-existing node error when the given URL is not an RSS feed.
>
> Similarly to how conference room creation works - when you enter a
> non-existing room, it is created.
Yes, that would be the smart thing to do.
Peter
--
Peter Saint-Andre
https://stpeter.im/