Andre's Blog
Perfection is when there is nothing left to take away
Generating HTML 4.01 in FCKeditor

FCKeditor is a great application, and I cannot say enough good things about it. However, one small problem for those who need HTML output is that the editor generates only XHTML, and XML constructs such as <br /> make HTML 4.01 pages invalid.

Searching FCKeditor's forums and asking there didn't help much. Some of the FCKeditor folks simply dismissed HTML as something that should not be used in the first place (passionate, but not very smart); others suggested searching the XHTML for "/>" sequences and replacing them with ">". This technique works for most text input, but fails for constructs like <input name="in1" value="/>"/>, which is well-formed XHTML because the right angle bracket doesn't have to be escaped in XML attribute values.
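To see why the "/>" replacement is fragile, here is a minimal sketch in JavaScript. The function name naiveXhtmlToHtml is hypothetical and used only for illustration; it is not part of FCKeditor or of the forum suggestion.

```javascript
// Naive conversion as suggested on the forum: turn every "/>" into ">".
// naiveXhtmlToHtml is a hypothetical name, used only for this illustration.
function naiveXhtmlToHtml(xhtml) {
  return xhtml.replace(/\/>/g, ">");
}

// Works for the common case:
var simple = naiveXhtmlToHtml("<p>line one<br />line two</p>");
// simple is "<p>line one<br >line two</p>"

// Fails when "/>" appears inside an attribute value:
var broken = naiveXhtmlToHtml('<input name="in1" value="/>"/>');
// broken is '<input name="in1" value=">">', so the value changed from "/>" to ">"
```

The tag itself is closed correctly in the second case, but the attribute value is silently altered, which is exactly the kind of damage a plain text replacement cannot avoid.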

I wanted a solution that would avoid any form of search-and-replace in XHTML, so I turned to server-side XSL and created a style sheet that strips xml:* attributes, such as xml:space, and processing instructions, and outputs the result of the transformation as HTML:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:html="http://www.w3.org/1999/xhtml">
<xsl:output method="html" version="4.01" 
    omit-xml-declaration="yes" 
    media-type="text/html"/>

<!-- apply templates recursively for each child and attribute -->
<xsl:template match="node()">
<xsl:copy>
<xsl:apply-templates select="attribute::*|child::node()"/>
</xsl:copy>
</xsl:template>

<!-- copy attributes, text nodes and comments -->
<xsl:template match="attribute::*|text()|comment()">
<xsl:copy-of select="."/>
</xsl:template>

<!-- ignore xml:* attributes and processing instructions -->
<xsl:template match="attribute::xml:*|processing-instruction()"/>
</xsl:stylesheet>

Soon I realized that this doesn't really work, because the editor outputs a collection of sibling HTML paragraphs with no common root element, so its output cannot be loaded as an XML document on its own. So I changed the back-end code to wrap the editor output in an artificial root element and added this fragment to the original XSL stylesheet:

<!-- ignore the artificial root element -->
<xsl:template match="/root">
<xsl:apply-templates select="child::node()"/>
</xsl:template>

This took care of the scattered XHTML elements, but I ran into another problem: some XHTML character entity references, such as &nbsp;, are not defined in XML itself and caused an error when the XHTML input was being loaded into a DOM document. The simple solution was to turn off HTML entity processing in FCKeditor:

FCKConfig.ProcessHTMLEntities = false;

With this setting, &nbsp; is rendered as &#160;, which the XML parser can process just fine.

On the server side, I added a shared XSL document to the application scope in the global.asa file:

<script language="JScript" runat="server">
function Application_OnStart()
{
   var xslDoc = Server.CreateObject("Msxml2.FreeThreadedDOMDocument");
   
   xslDoc.async = false;
   xslDoc.load(Server.MapPath("/library/xhtml.xsl"));

   if(xslDoc.parseError.errorCode == 0)
      Application.Contents("XSLT_XHTML_stylesheet") = xslDoc;
   else
      xslDoc = null;
}
</script>

I also added this function to convert XHTML to HTML:

function XHTML2HTML(xhtml)
{
   var xmlDoc;    // XHTML document
   var xslDoc;    // shared XSL document
   var xslt;      // XSL template object
   var xslProc;   // XSL processor object

   try {
      // use the XSL document created in global.asa on application start
      if((xslDoc = Application.Contents("XSLT_XHTML_stylesheet")) == null)
         return xhtml;

      // create the XML document for XHTML
      xmlDoc = Server.CreateObject("Msxml2.DOMDocument");
      xmlDoc.async = false;
      xmlDoc.preserveWhiteSpace = true;
      
      // add an artificial root, so scattered XHTML elements can be loaded
      xmlDoc.loadXML("<root>" + xhtml + "</root>");

      // return the original XHTML in case of an error
      if(xmlDoc.parseError.errorCode != 0)
         return xhtml;

      // now create an XSL template and a processor
      xslt = Server.CreateObject("Msxml2.XSLTemplate");
      xslt.stylesheet = xslDoc;
      
      xslProc = xslt.createProcessor();
      
      // and transform XHTML to HTML (root node is removed in XSL)
      xslProc.input = xmlDoc;
      xslProc.transform();
      
      // got HTML, send it back
      return xslProc.output;
   }
   catch (err) {
      // return the original XHTML in case of an error
      return xhtml;
   }
}

That was it! The only thing still bugging me was losing the ability to retain character entity references (e.g. &nbsp;), so I decided to see if I could hook the XHTML DTD up to the back-end XML parser, so that it could pick up character entity references.

Well, that didn't quite work out the way I was hoping. At first, MSXML simply refused to open the XHTML DTD. It turned out that MSXML6 prohibits DTDs by default, for security reasons. When I set the ProhibitDTD property to false and ResolveExternals to true, MSXML started to complain about an unspecified error in the XHTML DTD, so I left character entity references alone, for the time being.
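For reference, the two MSXML settings mentioned above might be applied roughly as follows. This is only a sketch: allowXhtmlDtd is a hypothetical helper name, and the versioned ProgID in the comment is an assumption (the code earlier in this post uses the version-independent "Msxml2.DOMDocument"). As described above, relaxing these settings was not enough to get the XHTML DTD to load.

```javascript
// Sketch only: relax the MSXML 6 DTD restrictions on a DOMDocument object.
// allowXhtmlDtd is a hypothetical helper, not part of the original code.
function allowXhtmlDtd(xmlDoc) {
  // ProhibitDTD is true by default in MSXML 6; false allows a DOCTYPE in the input.
  xmlDoc.setProperty("ProhibitDTD", false);

  // resolveExternals is false by default in MSXML 6; true lets the parser
  // fetch external resources such as the XHTML DTD.
  xmlDoc.resolveExternals = true;

  return xmlDoc;
}

// In classic ASP (JScript) this would be called along these lines:
//   var xmlDoc = Server.CreateObject("Msxml2.DOMDocument.6.0");
//   xmlDoc.async = false;
//   allowXhtmlDtd(xmlDoc);
```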
