5.2.23
10/01/2022

[#297] Cannot properly parse xhtml-im using gloox
Summary Cannot properly parse xhtml-im using gloox
Queue gloox
Queue Version SVN
Type Enhancement
State Unconfirmed
Priority 1. Low
Owners
Requester pulkomandy (at) gmail (dot) com
Created 02/13/2022 (230 days ago)
Due
Updated 02/13/2022 (230 days ago)
Assigned
Resolved

History
02/13/2022 07:31:10 PM pulkomandy (at) gmail (dot) com Comment #2
I made it work by making the Tag::NodeList class public and exposing 
an API to access the nodes, so I can iterate over them in the correct 
order.

You can find this patch here: 
https://github.com/haikuports/haikuports/blob/master/net-libs/gloox/patches/gloox-1.0.24.patchset#L102
02/13/2022 05:26:40 PM pulkomandy (at) gmail (dot) com Comment #1
State ⇒ Unconfirmed
Priority ⇒ 1. Low
Type ⇒ Enhancement
Summary ⇒ Cannot properly parse xhtml-im using gloox
Queue ⇒ gloox
I am trying to parse XHTML-IM using Gloox.

I have a message that looks like this:

<html xmlns='http://jabber.org/protocol/xhtml-im'><body 
xmlns='http://www.w3.org/1999/xhtml'>[<span 
style='color:blue;'>HaikuArchives/Renga</span>] <span 
style='color:brown;'>pulkomandy</span> pushed <span 
style='color:green;'>1</span> commit to <span 
style='color:green;'>master</span> [+0/-0/±1] <span 
style='color:lightmagenta;'>https://github.com/HaikuArchives/Renga/compare/8d709ab45869...98879d2afdec</span></body></html>

gloox represents this as a Tag with the cdata string 
"[]  pushed  commit to  [+0/-0/±1]" and 5 children for the 
first-level spans.

Nothing indicates where the spans should be inserted into the cdata 
string, so I don't know how to rebuild the correct message from it. 
The Tag class may store the cdata internally as a separate string for 
each piece, but even that may not be enough, because I can't tell 
whether the message starts with a cdata piece or with a span.

Do I need a separate XML parser to handle the XHTML? That would be 
a bit unfortunate.