Channel: UNIX and Linux Forums

C - libxml - Get raw content

Hi everyone and happy new year,

I'm currently working on a project that involves parsing XML documents, and I'm using the libxml2 library.

In my XML document I have this line (it's only a test...):
Code:

<firstname>&#xe9;&#xe0;&#xa3;&#xe8;!</firstname>
and when I retrieve the content of this node, I get
Code:

éà£è!
This is normal behaviour: the parser replaces the character references with the corresponding Unicode characters and returns the text as UTF-8. But these values are not really Unicode; they are in a custom encoding that I cannot change and that depends on several parameters.
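For reference, a numeric character reference such as `&#xe9;` names a Unicode code point (here U+00E9), and libxml2 hands text back as UTF-8, so that character arrives as the two bytes C3 A9. The minimal sketch below is independent of libxml2 and uses a hypothetical `encode_utf8()` helper to show the conversion. It also hints at why an input encoding handler does not change the result: `&#xe9;` is plain ASCII in the file and is expanded by the parser after the input encoding has already been dealt with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written to out (1 to 4),
   or 0 if cp is beyond U+10FFFF. Surrogates are not rejected here. */
size_t encode_utf8(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 1 byte:  0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* not a valid code point */
}
```

For example, `encode_utf8(0xE9, buf)` returns 2 and fills `buf` with C3 A9, while U+0021 (`!`) stays a single byte. Every character in the test line therefore comes back as a two-byte sequence except the final `!`.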

So I tried to retrieve the raw data from the node without the UTF-8 decoding; in other words, I want to retrieve
Code:

&#xe9;&#xe0;&#xa3;&#xe8;!
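If the goal is literally the string above, one possible workaround (an assumption, not a libxml2 feature) is to let libxml2 decode the content as usual and then rebuild hexadecimal character references from the UTF-8 result. The sketch below does this in plain C; `decode_utf8()` and `utf8_to_charrefs()` are hypothetical helper names. Note the limits: it cannot tell whether the file contained `&#xe9;` or a literal `é`, and ASCII characters are always written as-is, even if the source used a reference for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decode one UTF-8 sequence at s into *cp and return its length (1-4),
   or 0 on a bad lead or continuation byte. Assumes well-formed UTF-8
   (such as the string from xmlNodeGetContent()); overlong forms are not
   rejected. A '\0' in continuation position fails the check, so the
   function never reads past the terminator. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;
    if (s[0] < 0x80)                { *cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Hypothetical helper: copy ASCII characters unchanged and turn every
   other character into a lowercase hex reference such as "&#xe9;".
   out receives at most outsz bytes, including the terminating '\0'.
   Returns 0 on success, -1 on malformed input or if out is too small. */
int utf8_to_charrefs(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    while (*p != '\0') {
        unsigned long cp;
        size_t len = decode_utf8(p, &cp);
        int n;
        if (len == 0)
            return -1;                    /* malformed UTF-8 */
        if (cp < 0x80)
            n = snprintf(out + used, outsz - used, "%c", (int)cp);
        else
            n = snprintf(out + used, outsz - used, "&#x%lx;", cp);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;                    /* output buffer too small */
        used += (size_t)n;
        p += len;
    }
    out[used] = '\0';
    return 0;
}
```

Fed the UTF-8 bytes of "éà£è!", `utf8_to_charrefs()` produces `&#xe9;&#xe0;&#xa3;&#xe8;!`, matching the original markup in this particular case.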
I have already tried several approaches:
  • A custom encoding handler
  • No handler
  • An encoding context
  • The function xmlGetRawString
  • Dump functions
  • And several more that I don't remember
None of them worked; I always get "éà£è!".

Here is a summary of my code, which doesn't work:
Code:

// [...]
xmlDocument = xmlParseFile(xmlFileName);
// [...]
// with XPath I find the node
// [...]
tempChar = xmlNodeGetContent(tempNode);
// [...]
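Another option, assuming every character of the custom encoding maps to a code point no higher than U+00FF (true for the test line, but an assumption in general), is to convert the UTF-8 string from `xmlNodeGetContent()` back to one byte per character. U+00E9 becomes the byte 0xE9 again, and those raw bytes can then go to your own decoder. The helper `utf8_to_bytes()` below is hypothetical; libxml2 also declares `UTF8Toisolat1()` in `<libxml/encoding.h>` for a similar conversion, so check your version's documentation before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: turn UTF-8 text whose characters are all
   <= U+00FF back into one byte per character (U+00E9 -> 0xE9).
   The result is a raw byte buffer for a custom decoder, so its length
   is stored in *outlen and no terminator is written.
   Returns 0 on success, -1 if a character above U+00FF or a malformed
   sequence is found, or if out (outsz bytes) is too small. */
int utf8_to_bytes(const char *in, unsigned char *out, size_t outsz,
                  size_t *outlen)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL && outlen != NULL);
    while (*p != '\0') {
        unsigned char byte;
        if (p[0] < 0x80) {
            byte = p[0];                  /* ASCII: already one byte */
            p += 1;
        } else if ((p[0] == 0xC2 || p[0] == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF are encoded as C2 80 .. C3 BF */
            byte = (unsigned char)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;                    /* above U+00FF or malformed */
        }
        if (n == outsz)
            return -1;                    /* output buffer too small */
        out[n++] = byte;
    }
    *outlen = n;
    return 0;
}
```

In the code above, this would be called on `tempChar` right after `xmlNodeGetContent()`; the string itself still has to be released with `xmlFree()` afterwards.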

Do you have any idea how to solve this problem?

Thanks in advance.

P.S. Sorry for my poor English; it's not my mother tongue and I still need to practise it.
