Large XML File will not Format

Posted: Fri Mar 09, 2012 6:56 am
by sblaylock
I have a large xml file that won't format in PrimalXML 2011 or PrimalScript 2011. It's over 8000 lines long according to PrimalScript. I have emailed the xml to support@sapien.com.
Scott.

Posted: Fri Mar 09, 2012 7:00 am
by davidc
What version of PrimalXML are you using?
David

Posted: Fri Mar 09, 2012 7:07 am
by sblaylock
2011

Posted: Sun Mar 11, 2012 9:53 pm
by davidc
Which build of PrimalXML 2011? When I open the file, it is formatted automatically. Verify that the following setting is enabled: Options -> Format XML On Open. Also try changing the extension from txt to xml, although either should work. If the issue persists, please send in a screenshot.
David

Posted: Mon Mar 12, 2012 3:06 am
by sblaylock
Hi David,
Sorry about the version question. I'm running the latest build - 2.0.5. I have the option set to Format XML On Open.

The issue is happening when I paste the xml from our log files. I have sent you another file that won't format on open.
Scott.

Posted: Mon Mar 12, 2012 4:37 am
by davidc
OK, I see the problem. There are carriage return / newline characters in the middle of some tags, and they cause errors when the file is parsed. If you remove the line breaks from the middle of the tags, it should format.
Note: Error messages are displayed in the Output panel.
David

Posted: Wed Mar 14, 2012 6:20 am
by sblaylock

Hi David,
So the XML formatter/parser doesn't have the ability to understand a cr/lf in the middle of a tag then.

Our xml is being streamed into a log file, and we have no way of knowing when the log will add a cr/lf.

For us to walk 8000+ lines of xml and pull the cr/lfs out is a non-starter.

Is there a way to have the parser be more intelligent to understand the cr/lf is in the middle of a tag? cr/lf before and after a tag could be legit, but not in the middle of a tag.

Scott

Posted: Wed Mar 14, 2012 6:58 am
by Alexander Riedel
Your xml example contains this:
</n1:attr
ibuteValue>

The file is actually only 26 lines; they are just very long. The CRLF makes the XML malformed, and every parser I know of will reject it. The initial parse shows this error:
Line 2, Column 0. Error 104: Unable to retrieve a token; Missing end bracket while parsing end tag 'n1:attr'.
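
The same rejection can be reproduced outside PrimalXML. The following is a minimal Python sketch using the standard library's ElementTree parser. The sample document, the `urn:example` namespace URI, and the helper name `first_error_position` are made up for illustration and are not taken from the original log file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the log content; the namespace URI is invented.
GOOD = (
    '<n1:root xmlns:n1="urn:example">'
    "<n1:attributeValue>x</n1:attributeValue>"
    "</n1:root>"
)

# The same document with a CRLF dropped into the middle of an end-tag name,
# mirroring the "</n1:attr" / "ibuteValue>" split quoted above.
BROKEN = GOOD.replace("</n1:attributeValue>", "</n1:attr\r\nibuteValue>")


def first_error_position(text):
    """Return None if text parses, else the (line, column) of the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


print(first_error_position(GOOD))    # parses cleanly
print(first_error_position(BROKEN))  # the parser reports an error position
```

The parser is not being picky here: after the name `n1:attr` an end tag may contain only optional whitespace and then `>`, so the stray `ibuteValue` is a syntax error.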

If you have control over the process that streams the XML, simply add a CRLF after each close tag symbol '>' so that the log won't add them at random.
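
As a rough illustration of that suggestion, here is a Python sketch of a filter the producing side could apply before writing to the log. The function name `break_after_tags` is hypothetical, and the sketch assumes the producer can post-process its output as a string.

```python
import re


def break_after_tags(xml_text, newline="\r\n"):
    """Insert a line break after every '>' that is not already followed by one."""
    # The negative lookahead keeps the function idempotent: a '>' that already
    # ends its line is left alone.
    return re.sub(r">(?!\r?\n)", lambda m: ">" + newline, xml_text)


sample = "<a><b>x</b></a>"
print(break_after_tags(sample))
```

One caveat: this treats every `>` alike, so a break inserted after a start tag becomes part of the element's text content, and a `>` inside text or an attribute value also gets a break. That is harmless for log inspection but matters to consumers that treat whitespace as significant.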

Posted: Wed Mar 14, 2012 8:19 am
by sblaylock
Hi Alexander,

Unfortunately, we don't have control over the streams that dump the XML.

Somewhere along the way when I was trying to get the xml formatted, I dropped it into PrimalScript 2011 and selected Format XML. It came back with a warning that the xml was over something like 8120 lines - can't remember the exact message.

This is one of the largest dumps of xml we've dropped into PrimalXML, and it seems odd to me that all the other log files from the same system have been fine. Could there be a limit to the amount of xml it can parse? What also seems odd is that it would throw a cr/lf into line 2, i.e. near the top; I would think that if the xml from the mainframe was too large, it would drop the cr/lf near the bottom.

Anyway, it's rare for us to work with these large xml responses, so we can live with the issue.

Thanks for getting back to me,
Scott.

Posted: Wed Mar 14, 2012 9:02 am
by Alexander Riedel
The message warns about line length, not the number of lines. There really is no limit on the amount; it's just that if the XML is malformed, the parser doesn't know what to do with it. Having a CR/LF in the middle of a tag name is simply a syntax error. Size, line length, etc. have nothing to do with that.

The line number refers to the line in the original XML stream, not the reformatted one, because PrimalScript tries to parse the XML FIRST, and only if that fails does it reformat it as text.
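
When the producer cannot be changed, one option is to repair the text before handing it to a formatter. The following Python sketch is a heuristic, not something PrimalXML or PrimalScript is known to do. It assumes the source never legitimately puts a line break inside angle brackets, so any CR/LF found there is an artifact of the log. The function name `strip_breaks_inside_tags`, the sample string, and the `urn:example` namespace URI are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# One tag-like span: '<', then any characters except '<' and '>', then '>'.
_TAG = re.compile(r"<[^<>]*>")


def strip_breaks_inside_tags(text):
    """Delete CR and LF characters that fall between '<' and the next '>'."""
    return _TAG.sub(
        lambda m: m.group(0).replace("\r", "").replace("\n", ""), text
    )


# Hypothetical damaged input: one break in text content (legal, kept)
# and one in the middle of an end-tag name (illegal, removed).
broken = (
    '<n1:root xmlns:n1="urn:example">'
    "<n1:attributeValue>a\r\nb</n1:attr\r\nibuteValue>"
    "</n1:root>"
)

repaired = strip_breaks_inside_tags(broken)
ET.fromstring(repaired)  # raises ParseError if the repair was not enough
```

The assumption matters: if a tag legitimately spans lines, for example with attributes on separate lines, removing the break would glue the pieces together and corrupt the tag. An attribute value containing `>` would also end the matched span early. Line breaks between tags are left untouched.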