5.2.23
12/09/2023

[#297] Cannot properly parse xhtml-im using gloox
Summary Cannot properly parse xhtml-im using gloox
Queue gloox
Queue Version SVN
Type Enhancement
State Resolved
Priority 1. Low
Owners js (at) camaya (dot) net
Requester pulkomandy (at) gmail (dot) com
Created 02/13/2022 (664 days ago)
Due
Updated 03/14/2023 (270 days ago)
Assigned
Resolved 03/14/2023 (270 days ago)

History
03/14/2023 07:13:25 PM Jakob Schröter Comment #3
Assigned to Jakob Schröter
State ⇒ Resolved
Thanks for the report and the inspiration.
This has now been added in svn and will be in 1.0.25. Compile with 
--enable-xhtmlim to expose the API.
02/13/2022 07:31:10 PM pulkomandy (at) gmail (dot) com Comment #2
I made it work by making the Tag::NodeList class public and exposing 
an API to access the nodes. I can then iterate over them in the 
correct order.

You can find this patch here: 
https://github.com/haikuports/haikuports/blob/master/net-libs/gloox/patches/gloox-1.0.24.patchset#L102
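
To illustrate the idea of iterating over nodes in document order, here is a minimal, self-contained sketch. The Node struct and the plainText(), text() and span() helpers are hypothetical names made up for this example; they are not the gloox API and not the code from the patch above.

```cpp
#include <string>
#include <vector>

// Hypothetical node type for illustration only; this is NOT the gloox API.
struct Node {
    bool isElement;              // true: child element, false: character data
    std::string text;            // character data (left empty for elements)
    std::vector<Node> children;  // ordered child nodes (used by elements only)
};

// Walk the nodes in document order and concatenate all character data,
// descending into child elements where they occur.
std::string plainText(const std::vector<Node>& nodes) {
    std::string out;
    for (const Node& n : nodes) {
        if (n.isElement)
            out += plainText(n.children);
        else
            out += n.text;
    }
    return out;
}

// Helpers: build a text node, or an element wrapping a single text node.
Node text(const std::string& s) { return Node{false, s, {}}; }
Node span(const std::string& s) { return Node{true, "", {text(s)}}; }
```

Because text and elements live in one ordered list, calling plainText() on { text("["), span("HaikuArchives/Renga"), text("] "), span("pulkomandy") } puts each span back between the text pieces that surround it.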
02/13/2022 05:26:40 PM pulkomandy (at) gmail (dot) com Comment #1
State ⇒ Unconfirmed
Priority ⇒ 1. Low
Type ⇒ Enhancement
Summary ⇒ Cannot properly parse xhtml-im using gloox
Queue ⇒ gloox
I am trying to parse XHTML-IM using Gloox.

I have a message that looks like this:

<html xmlns='http://jabber.org/protocol/xhtml-im'><body 
xmlns='http://www.w3.org/1999/xhtml'>[<span 
style='color:blue;'>HaikuArchives/Renga</span>] <span 
style='color:brown;'>pulkomandy</span> pushed <span 
style='color:green;'>1</span> commit to <span 
style='color:green;'>master</span> [+0/-0/±1] <span 
style='color:lightmagenta;'>https://github.com/HaikuArchives/Renga/compare/8d709ab45869...98879d2afdec</span></body></html>

The gloox Tag representation for this is a Tag with a cdata string 
containing "[]  pushed  commit to  [+0/-0/±1]", and 5 children for 
the first-level spans.

Nothing indicates where the spans should be inserted in the cdata 
string, so I don't know how to rebuild the correct message from this. 
It looks like the Tag class may store the cdata as separate strings 
for each piece, but even that may not be enough, as I don't know 
whether the message starts with cdata or with a span.
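
A small sketch of why this is ambiguous, using a simplified model that keeps the text and the child elements separately. FlatElement, Piece and flatten() are hypothetical names for this example only and do not reflect how gloox actually implements Tag.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model (NOT gloox's Tag class): all text is
// concatenated into one string and children are kept in a separate list.
struct FlatElement {
    std::string cdata;                  // all text pieces, concatenated
    std::vector<std::string> children;  // child element contents, in order
};

// One piece of mixed content, in document order.
struct Piece {
    bool isElement;    // true: child element, false: character data
    std::string text;  // element content or character data
};

// Store mixed content the "flat" way: the relative order between text
// pieces and elements is dropped here.
FlatElement flatten(const std::vector<Piece>& pieces) {
    FlatElement out;
    for (const Piece& p : pieces) {
        if (p.isElement)
            out.children.push_back(p.text);
        else
            out.cdata += p.text;
    }
    return out;
}
```

With this model, the inputs "a", element X, "b" and element X, "ab" both flatten to cdata "ab" with one child X, so the original order cannot be recovered from the result.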

Do I need to use a separate XML parser for the XHTML handling? That 
would be a bit unfortunate.