While most content management systems, such as blogs, allow users to edit HTML directly, more specialized ones, such as discussion forums, let users write in an alternative syntax that is easier to control and adapt to particular needs. BBCode, which stands for Bulletin Board Code, is one example of such an alternative.
BBCode tags are enclosed in square brackets instead of the angle brackets used in HTML. This makes it easy to mix BBCode and HTML, because square brackets have no special meaning in the latter.
In other words, when a forum page is rendered, posts can be HTML-encoded first to avoid any HTML security issues, and BBCode tags can then be converted to HTML using regular expressions. Any mismatched BBCode tags are either ignored or forced to close, so that the generated HTML is well-formed.
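A minimal sketch of that two-step order, assuming a hypothetical htmlEncode helper and a hypothetical renderPost function that only knows the [b] tag. Because the post is encoded before any BBCode is converted, angle brackets typed by the user can never turn into live markup:

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
// '&' is replaced first so the entities produced by the later
// replacements are not encoded a second time.
function htmlEncode(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical renderer: step 1 encodes the raw post,
// step 2 converts BBCode (only [b] here) into HTML.
function renderPost(post) {
  var safe = htmlEncode(post);
  return safe.replace(/\[b\](.+?)\[\/b\]/gi, "<b>$1</b>");
}

console.log(renderPost('[b]bold[/b] <script>alert("x")</script>'));
// prints: <b>bold</b> &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```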
A typical approach to replacing BBCode tags is to use a set of regular expressions, one for each tag, similar to the one below, which replaces every [b]...[/b] pair in the specified string variable with its HTML equivalent:
text = text.replace(/\[b\](.+?)\[\/b\]/gi, "<b>$1</b>");
This regular expression ensures that incomplete tags, missing either the start or the end tag, are ignored. However, it will also replace mismatched tags, such as these:
[b][i]abc[/b]def[/i]
Once the [i] expression is applied as well, the output is <b><i>abc</b>def</i>, which is malformed HTML.
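The failure is easy to reproduce. The sketch below applies one regular expression per tag, in the spirit of the typical approach described above; the hypothetical naiveBBCode function and its three-tag list are assumptions for illustration:

```javascript
// One regular expression per simple tag, applied one after another.
var simpleTags = ["b", "i", "u"];

function naiveBBCode(text) {
  for (var k = 0; k < simpleTags.length; k++) {
    var tag = simpleTags[k];
    // builds /\[b\](.+?)\[\/b\]/gi for the [b] tag, and so on
    var re = new RegExp("\\[" + tag + "\\](.+?)\\[\\/" + tag + "\\]", "gi");
    text = text.replace(re, "<" + tag + ">$1</" + tag + ">");
  }
  return text;
}

console.log(naiveBBCode("[b][i]abc[/b]def[/i]"));
// prints: <b><i>abc</b>def</i>  (</b> closes before </i>)
```

Each expression is correct on its own; the problem is that none of them knows which other tags are open.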
One way to handle mismatched tags is to process start and end tags individually and maintain a parsing state, which can be used to detect malformed BBCode markup. The source code below is written in JavaScript and uses a single regular expression that matches three patterns (a line break, a start tag with an optional =option value, and an end tag), calling the textToHtmlCB function every time a match is found. The function keeps all open tags in a stack and checks the top of the stack when it encounters an end tag. Mismatched end tags are highlighted in red, and any tags still open at the end of the post are closed.
// -----------------------------------------------------------------------
// Copyright (c) 2008, Stone Steps Inc.
// All rights reserved
// http://www.stonesteps.ca/legal/bsd-license/
//
// This is a BBCode parser written in JavaScript. The parser is intended
// to demonstrate how to parse text containing BBCode tags in one pass
// using regular expressions.
//
// The parser may be used as a backend component in ASP or in the browser,
// after the text containing BBCode tags has been served to the client.
//
// The following BBCode expressions are recognized:
//
// [b]bold[/b]
// [i]italic[/i]
// [u]underlined[/u]
// [s]strike-through[/s]
// [samp]sample[/samp]
//
// [color=red]red[/color]
// [color=#FF0000]red[/color]
// [size=1.2]1.2em[/size]
//
// [url]http://blogs.stonesteps.ca/showpost.asp?pid=33[/url]
// [url=http://blogs.stonesteps.ca/showpost.asp?pid=33][b]BBCode[/b] Parser[/url]
//
// [q=http://blogs.stonesteps.ca/showpost.asp?pid=33]inline quote[/q]
// [q]inline quote[/q]
// [blockquote=http://blogs.stonesteps.ca/showpost.asp?pid=33]block quote[/blockquote]
// [blockquote]block quote[/blockquote]
//
// [pre]formatted
//      text[/pre]
// [code]if(a == b)
//         print("done");[/code]
//
// text containing [noparse] [brackets][/noparse]
//
// -----------------------------------------------------------------------

var opentags;           // open tag stack
var crlf2br = true;     // convert CRLF to <br>?
var noparse = false;    // ignore BBCode tags?
var urlstart = -1;      // beginning of the URL if zero or greater (ignored if -1)

// acceptable BBCode tags, optionally prefixed with a slash
var tagname_re = /^\/?(?:b|i|u|pre|samp|code|colou?r|size|noparse|url|s|q|blockquote)$/;

// color names or hex color
var color_re = /^(?:black|silver|gray|white|maroon|red|purple|fuchsia|green|lime|olive|yellow|navy|blue|teal|aqua|#(?:[0-9a-f]{3})?[0-9a-f]{3})$/i;

// numbers
var number_re = /^[\\.0-9]{1,8}$/i;

// reserved, unreserved, escaped and alpha-numeric [RFC2396]
var uri_re = /^[-;\/\?:@&=\+\$,_\.!~\*'\(\)%0-9a-z]{1,512}$/i;

// main regular expression: CRLF, [tag=option], [tag] or [/tag]
var postfmt_re = /([\r\n])|(?:\[([a-z]{1,16})(?:=([^\x00-\x1F"'\(\)<>\[\]]{1,256}))?\])|(?:\[\/([a-z]{1,16})\])/ig;

// stack frame object
function taginfo_t(bbtag, etag)
{
   this.bbtag = bbtag;
   this.etag = etag;
}

// check if it's a valid BBCode tag
function isValidTag(str)
{
   if(!str || !str.length)
      return false;

   return tagname_re.test(str);
}

// check if the text being processed is the address of an option-less [url]
function inRawUrl()
{
   return urlstart >= 0 && opentags.length > 0 && opentags[opentags.length-1].bbtag == "url";
}

//
// m1 - CR or LF
// m2 - the tag of the [tag=option] expression
// m3 - the option of the [tag=option] expression
// m4 - the end tag of the [/tag] expression
//
function textToHtmlCB(mstr, m1, m2, m3, m4, offset, string)
{
   //
   // CR LF sequences
   //
   if(m1 && m1.length) {
      if(!crlf2br)
         return mstr;

      switch (m1) {
         case '\r':
            return "";
         case '\n':
            return "<br>";
      }
   }

   //
   // handle start tags
   //
   if(isValidTag(m2)) {
      // if in the noparse state, just echo the tag, including its option
      if(noparse)
         return mstr;

      // ignore any tags if there's an open option-less [url] tag
      if(inRawUrl())
         return mstr;

      switch (m2) {
         case "code":
            opentags.push(new taginfo_t(m2, "</code></pre>"));
            crlf2br = false;
            return "<pre><code>";

         case "pre":
            opentags.push(new taginfo_t(m2, "</pre>"));
            crlf2br = false;
            return "<pre>";

         case "color":
         case "colour":
            if(!m3 || !color_re.test(m3))
               m3 = "inherit";
            opentags.push(new taginfo_t(m2, "</span>"));
            return "<span style=\"color: " + m3 + "\">";

         case "size":
            // number_re also accepts non-numbers such as "1.2.3" or "\",
            // so fall back to 1 if the option cannot be parsed
            var size = m3 && number_re.test(m3) ? parseFloat(m3) : NaN;
            if(isNaN(size))
               size = 1;
            opentags.push(new taginfo_t(m2, "</span>"));
            return "<span style=\"font-size: " + Math.min(Math.max(size, 0.7), 3) + "em\">";

         case "s":
            opentags.push(new taginfo_t(m2, "</span>"));
            return "<span style=\"text-decoration: line-through\">";

         case "noparse":
            noparse = true;
            return "";

         case "url":
            opentags.push(new taginfo_t(m2, "</a>"));

            // check if there's a valid option
            if(m3 && uri_re.test(m3)) {
               // if there is, output a complete start anchor tag
               urlstart = -1;
               return "<a href=\"" + m3 + "\">";
            }

            // otherwise, remember the URL offset
            urlstart = mstr.length + offset;

            // and treat the text following [url] as a URL
            return "<a href=\"";

         case "q":
         case "blockquote":
            opentags.push(new taginfo_t(m2, "</" + m2 + ">"));
            return m3 && m3.length && uri_re.test(m3) ? "<" + m2 + " cite=\"" + m3 + "\">" : "<" + m2 + ">";

         default:
            // [samp], [b], [i] and [u] don't need special processing
            opentags.push(new taginfo_t(m2, "</" + m2 + ">"));
            return "<" + m2 + ">";
      }
   }

   //
   // process end tags
   //
   if(isValidTag(m4)) {
      if(noparse) {
         // if it's the closing noparse tag, flip the noparse state
         if(m4 == "noparse") {
            noparse = false;
            return "";
         }

         // otherwise just output the original text
         return "[/" + m4 + "]";
      }

      // inside an option-less [url], echo any end tag other than [/url],
      // so that no markup ends up in the href attribute
      if(m4 != "url" && inRawUrl())
         return mstr;

      // highlight mismatched end tags
      if(!opentags.length || opentags[opentags.length-1].bbtag != m4)
         return "<span style=\"color: red\">[/" + m4 + "]</span>";

      if(m4 == "url") {
         // if there was no option, use the content of the [url] tag
         if(urlstart >= 0) {
            var url = string.substr(urlstart, offset-urlstart);
            urlstart = -1;
            return "\">" + url + opentags.pop().etag;
         }

         // otherwise just close the tag
         return opentags.pop().etag;
      }
      else if(m4 == "code" || m4 == "pre")
         crlf2br = true;

      // other tags require no special processing, just output the end tag
      return opentags.pop().etag;
   }

   return mstr;
}

//
// post must be HTML-encoded
//
function parseBBCode(post)
{
   var result, endtags;

   // convert CRLF to <br> by default
   crlf2br = true;

   // reset the parsing state that may be left over from a previous call
   noparse = false;
   urlstart = -1;
   opentags = [];

   // run the text through main regular expression matcher
   result = post.replace(postfmt_re, textToHtmlCB);

   // reset noparse, if it was unbalanced
   if(noparse)
      noparse = false;

   // if there are any unbalanced tags, make sure to close them
   if(opentags.length) {
      endtags = "";

      // if there's an open option-less [url] at the top, close it
      if(inRawUrl()) {
         opentags.pop();
         endtags += "\">" + post.substr(urlstart, post.length-urlstart) + "</a>";
         urlstart = -1;
      }

      // close remaining open tags
      while(opentags.length)
         endtags += opentags.pop().etag;
   }

   return endtags ? result + endtags : result;
}
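For readers who want the core idea without the special cases, here is a stripped-down sketch of the same stack technique that handles only [b], [i] and [u]. The stackBBCode name is hypothetical, and the function is an illustration rather than the parser above; like parseBBCode, it assumes the input is already HTML-encoded:

```javascript
// Minimal stack-based converter for [b], [i] and [u] only (illustrative sketch).
function stackBBCode(post) {
  var open = [];                      // stack of currently open tag names
  var tagRe = /\[(\/?)(b|i|u)\]/gi;   // [tag] or [/tag]

  var html = post.replace(tagRe, function (match, slash, name) {
    name = name.toLowerCase();
    if (!slash) {                     // start tag: remember it, emit HTML
      open.push(name);
      return "<" + name + ">";
    }
    if (open.length && open[open.length - 1] === name) {
      open.pop();                     // matches the innermost open tag
      return "</" + name + ">";
    }
    return match;                     // mismatched end tag: leave it as text
  });

  // close anything still open, innermost first
  while (open.length)
    html += "</" + open.pop() + ">";
  return html;
}

console.log(stackBBCode("[b][i]abc[/b]def[/i]"));
// prints: <b><i>abc[/b]def</i></b>
```

Unlike textToHtmlCB, this sketch leaves a mismatched end tag as plain text instead of highlighting it in red, but the output is well-formed in both cases.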
The HTML below can be used to see the parser in action. Save the parser in a file called bbcode.js and save the HTML in another file in the same directory.
<html>
<head>
<title>BBCode Test</title>
<script type="text/javascript" src="bbcode.js"></script>
<script type="text/javascript">
// parseBBCode expects HTML-encoded text, so encode the raw textarea value first
function htmlEncode(text)
{
   return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function outputBBCode(textarea)
{
   var out = document.getElementById("out");
   var out_html = document.getElementById("out_html");
   var html = parseBBCode(htmlEncode(textarea.value));

   // show the generated HTML as plain text
   if(!out.firstChild)
      out.appendChild(document.createTextNode(html));
   else
      out.replaceChild(document.createTextNode(html), out.firstChild);

   // and render it
   out_html.innerHTML = html;
}
</script>
</head>
<body>
<textarea id="in" rows="12" cols="80">[b]bold[/b], [i]italic[/i], [u]underlined[/u], [s]strike-through[/s], [samp]sample[/samp]
[url]http://blogs.stonesteps.ca/showpost.asp?pid=33[/url]
[url=http://blogs.stonesteps.ca/showpost.asp?pid=33][i]BBCode[/i] Parser[/url]
Inline [q=http://blogs.stonesteps.ca/showpost.asp?pid=33]quote[/q]
[blockquote=http://blogs.stonesteps.ca/showpost.asp?pid=33]Block quote[/blockquote][pre]formatted
text[/pre][code]if(a == b)
   print("done");[/code]text containing [noparse] [brackets] [/noparse]
c[b][color=red]o[/color][/b][b][color=green]l[/color][/b][b][color=blue]o[/color][/b]rs and [size=1.2]text size[/size]
[b][i]mismatched [/b] tags[/i] remaining text should not affect page HTML.
</textarea>
<div>
<input type="submit" value="Submit" onclick="outputBBCode(document.getElementById('in'))" style="vertical-align: top">
</div>
<div id="out_html" style="border: 1px solid #777; margin: 1em auto; padding: 5px 3px;">
</div>
<p>This paragraph should not be formatted in any way after BBCode is converted to HTML, even if there are mismatched or mixed BBCode tags.</p>
<div id="out" style="border: 1px solid #777; margin: 1em auto; padding: 5px 3px;">
</div>
</body>
</html>
The parser may be used with ASP, as long as the script language is identified as JScript. Alternatively, it can be used as a client script to convert BBCode tags to HTML directly in the browser.
A simple one-level list would be similar to noparse: once a list is opened, emit <ul>, then for each [*] emit <li> (for the first item) or </li><li> (for the rest) until the closing list tag, at which point emit </li></ul>. A multi-level list, with formatting in between, would be more complex to implement.
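A minimal sketch of the one-level approach described in this reply, written as a separate pass over text that has already been through parseBBCode (which leaves [list], [/list] and [*] untouched, because they are not in tagname_re). The listBBCode name and its behavior are assumptions, not part of bbcode.js. Nested lists are not supported, and because this pass runs separately it cannot guarantee well-formed output when other tags straddle list items; folding the same state into textToHtmlCB, as suggested above, would address that:

```javascript
// One-level [list]/[*] conversion (hypothetical extension, not part of bbcode.js).
function listBBCode(text) {
  var inList = false;    // inside [list]...[/list]?
  var itemOpen = false;  // has an <li> been emitted and not yet closed?

  var html = text.replace(/\[list\]|\[\*\]|\[\/list\]/gi, function (match) {
    var tag = match.toLowerCase();
    if (tag === "[list]") {
      if (inList) return match;        // nested lists are not supported
      inList = true;
      itemOpen = false;
      return "<ul>";
    }
    if (!inList) return match;         // [*] or [/list] outside a list
    if (tag === "[*]") {
      var prefix = itemOpen ? "</li>" : "";
      itemOpen = true;
      return prefix + "<li>";
    }
    // [/list]: close the last item, if any, and the list itself
    var suffix = (itemOpen ? "</li>" : "") + "</ul>";
    inList = false;
    itemOpen = false;
    return suffix;
  });

  // close a list that was never terminated
  if (inList)
    html += (itemOpen ? "</li>" : "") + "</ul>";
  return html;
}

console.log(listBBCode("[list][*]one[*]two[/list]"));
// prints: <ul><li>one</li><li>two</li></ul>
```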
I know this post is old, but is there any chance you could advise on how to add [list][/list] and [*]?
thanks
Hey,
just wanted to say I've designed and implemented something called BBCode 2.0, which closes tags by guessing, like this: [b][i]test[/][/].
I hope someone finds it useful. It's specified at http://doc.ke.mu/doc/bb, and a JavaScript (node.js) implementation is available as an npm package called bb2. So, simply:
npm install bb2
Hi.
Nice post man, thank you for this.
url=foo.bar#id doesn't work. The following patch fixes it:
--- bbcode.js.orig 2011-06-30 14:54:18.000000000 +0400
+++ bbcode.js 2011-06-30 15:08:48.000000000 +0400
@@ -53,7 +53,7 @@
var number_re = /^[\\.0-9]{1,8}$/i;
// reserved, unreserved, escaped and alpha-numeric [RFC2396]
-var uri_re = /^[-;\/\?:@&=\+\$,_\.!~\*'\(\)%0-9a-z]{1,512}$/i;
+var uri_re = /^[-;\/\?:@&=\+\$,_\.!~\*'\(\)%0-9a-z#]{1,512}$/i;
// main regular expression: CRLF, [tag=option], [tag] or [/tag]
var postfmt_re = /([\r\n])|(?:\[([a-z]{1,16})(?:=([^\x00-\x1F"'\(\)<>\[\]]{1,256}))?\])|(?:\[\/([a-z]{1,16})\])/ig;
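A quick way to confirm the patch is to compare the original and patched uri_re patterns from the diff against the option value from the report. The '#' already passes the option part of postfmt_re, so uri_re is the only pattern rejecting it; with the original pattern, the [url] handler then treats the text after the tag as the address instead of using the option:

```javascript
// uri_re before and after the patch above
var uri_re_orig = /^[-;\/\?:@&=\+\$,_\.!~\*'\(\)%0-9a-z]{1,512}$/i;
var uri_re_patched = /^[-;\/\?:@&=\+\$,_\.!~\*'\(\)%0-9a-z#]{1,512}$/i;

var option = "foo.bar#id";
console.log(uri_re_orig.test(option));    // prints: false ('#' is not in the class)
console.log(uri_re_patched.test(option)); // prints: true
```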