Shortcode Shortcomings

Dec 20, 2008   //   by Hackadelic   //   WordPress  //  7 Comments

This post requires a certain degree of technical savvy.

When experimenting with various shortcode locations in the post content, I noticed that under some conditions the shortcode behaved a bit strangely with regard to paragraph arrangement. So I implemented a couple of “diagnostic” shortcodes and filters to investigate the problem further. But before I go into that, follow me on a short tour of what the Shortcode API is.

The Shortcode API

From a user’s perspective, a shortcode is denoted by a pair of tags of the form [SHORTCODE]some text[/SHORTCODE] (see note 1), a notation borrowed from BBCode.

Now, plug-ins had been providing functionality through similar (but varying) notations long before there was a Shortcode API. But they had to parse the entire post content as a whole and determine on their own what’s inside and what’s outside the shortcode tags.

Because this pattern was so widespread, the WordPress team decided to make everybody’s life easier by (a) standardizing the notation and (b) providing standard protocols to follow, along with built-in mechanisms for accessing shortcode elements: the Shortcode API. Here’s what WordPress says about it:

The API handles all the tricky parsing, eliminating the need for writing a custom regular expression for each shortcode. Helper functions are included for setting and fetching default attributes. The API supports both self-closing and enclosing shortcodes.
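To make that description concrete, here is a minimal sketch of the idea in Python. It is a toy model, not WordPress code: the names `add_shortcode` and `do_shortcode` are borrowed from WordPress for readability, but the registry, the regular expression, the attribute parsing, and the `shout` handler are all assumptions made up for illustration. The point is only that the handler receives the attributes and the enclosed content, and nothing else.

```python
import re

# Toy registry standing in for the shortcode table (hypothetical, not WordPress).
_handlers = {}

def add_shortcode(tag, handler):
    """Register handler(attrs, content) for [tag]...[/tag] or a bare [tag]."""
    _handlers[tag] = handler

def _parse_attrs(text):
    # Only understands key="value" pairs; a real parser accepts more forms.
    return dict(re.findall(r'(\w+)="([^"]*)"', text))

def do_shortcode(text):
    """Replace every registered shortcode in text with its handler's output."""
    if not _handlers:
        return text
    tags = "|".join(re.escape(tag) for tag in _handlers)
    pattern = re.compile(
        r"\[(?P<tag>%s)\b(?P<attrs>[^\]]*)\]"   # opening tag plus attributes
        r"(?:(?P<content>.*?)\[/(?P=tag)\])?"   # optional content and closing tag
        % tags,
        re.DOTALL,
    )

    def replace(match):
        handler = _handlers[match.group("tag")]
        # The handler never sees the text surrounding the shortcode.
        return handler(_parse_attrs(match.group("attrs")), match.group("content"))

    return pattern.sub(replace, text)

def shout(attrs, content):
    """Hypothetical handler: upper-case the content and append a suffix."""
    return (content or "").upper() + attrs.get("suffix", "!")

add_shortcode("shout", shout)
result = do_shortcode('Before [shout suffix="?"]some text[/shout] after.')
print(result)  # Before SOME TEXT? after.
```

Unregistered tags are left untouched in this sketch, and a self-closing `[shout]` reaches the handler with no content at all.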

All a shortcode plug-in needs to do is handle what’s in the shortcode. In fact, it doesn’t even see anything outside the shortcode. It gets its meal served nicely pre-digested by a friendly WordPress. Now, this sounds anything but bad, doesn’t it? It actually sounds pretty great, really!

That is… if only it worked! 🙁

Malformed XHTML

Let’s take a look at this example entry as it is entered in the editor:

Some text here [SHORTCODE]

Enclosed

text[/SHORTCODE] more text.

Before the plugin that handles the shortcode is activated, the entry is internally represented as:

<p>Some text here [SHORTCODE]</p>
<p>Enclosed</p>
<p>text[/SHORTCODE] more text.</p>

Assuming the plug-in, once activated, replaces the shortcode with, say, “XXX”, this is the result:

<p>Some text here XXX more text.</p>

Now, assume the plug-in wanted to do something more complicated, and wrapped the shortcode content in a div, say

<div class="my_fancy_css_class">shortcode content</div>

This will result in:

<p>Some text here <div class="my_fancy_css_class"></p>
<p>Enclosed</p>
<p>text</div> more text.</p>

Oops! Bad luck! That’s broken HTML: the div opens inside one paragraph and closes inside another. Browsers can’t even parse it consistently (see note 2). (I’ve faced this problem already when I was working on my Sliding Notes plug-in. On the plug-in homepage I’ve explained some aspects of it, too.)
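The two substitutions above can be replayed with a few lines of Python. This is only a sketch of the effect, not of how WordPress does it: the `substitute` and `nesting_errors` helpers and the `NestingChecker` class are hypothetical names, and the nesting check is a deliberately naive one that just insists every closing tag match the innermost open tag.

```python
import re
from html.parser import HTMLParser

# The paragraph-wrapped entry from the example above.
WRAPPED = (
    "<p>Some text here [SHORTCODE]</p>\n"
    "<p>Enclosed</p>\n"
    "<p>text[/SHORTCODE] more text.</p>"
)

def substitute(html, render):
    """Replace [SHORTCODE]...[/SHORTCODE] with render(enclosed_content)."""
    return re.sub(
        r"\[SHORTCODE\](.*?)\[/SHORTCODE\]",
        lambda match: render(match.group(1)),
        html,
        flags=re.DOTALL,
    )

class NestingChecker(HTMLParser):
    """Records closing tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def nesting_errors(html):
    """Return a list of nesting problems; an empty list means well nested."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"<{tag}> never closed" for tag in checker.stack]

xxx_version = substitute(WRAPPED, lambda content: "XXX")
div_version = substitute(
    WRAPPED, lambda content: f'<div class="my_fancy_css_class">{content}</div>'
)

print(xxx_version)                  # <p>Some text here XXX more text.</p>
print(nesting_errors(xxx_version))  # []
print(div_version)
print(nesting_errors(div_version))  # non-empty: the div and p tags interleave
```

Discarding the enclosed content hides the problem, because the stray paragraph tags vanish along with it. As soon as the handler returns the content it was given, those tags come back inside the div.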

Dangling And Leaking Paragraphs

O.K., I thought, the Shortcode API is not as useful as I first assumed. It does relieve me of some work on the parsing side, but I have to invest effort in working around its shortcomings instead. But at least it is consistent, and once I’ve found a workaround…

Consistent?

Yeah, would have been nice if it was! But nope! It is not!

See this example, with a sample shortcode that simply replaces itself with itself (and hence it should make no difference whether the shortcode is active or not):

Some text here

[SHORTCODE] the tag is first on a line

Enclosed

text[/SHORTCODE] more text.

This is what it internally looks like, before it’s been through the Shortcode API:

<p>Some text here </p>
<p>[SHORTCODE] the tag is first on a line</p>
<p>Enclosed</p>
<p>text[/SHORTCODE] more text.</p>

And this is the HTML after pre-digestion by the Shortcode API:

<p>Some text here </p>
[SHORTCODE] the tag is first on a line</p>
<p>Enclosed</p>
<p>text[/SHORTCODE] more text.</p>

Note the missing opening paragraph tag on line 2?!
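A dangling tag like this is easy to miss by eye, so a mechanical check helps. The following Python sketch is not the diagnostic filter mentioned at the start of this post. It is a hypothetical helper, `unbalanced_paragraph_lines`, that assumes the markup keeps one paragraph per line, as in the listings here, and simply compares the number of opening and closing paragraph tags on each line.

```python
import re

def unbalanced_paragraph_lines(html):
    """Return (line_number, opens, closes) for lines whose <p> and </p> counts differ."""
    report = []
    for number, line in enumerate(html.splitlines(), start=1):
        # Match <p> or <p attr="...">, but not other tags such as <pre>.
        opens = len(re.findall(r"<p(?:\s[^>]*)?>", line))
        closes = line.count("</p>")
        if opens != closes:
            report.append((number, opens, closes))
    return report

# The HTML after it has been through the Shortcode API, as listed above.
AFTER_API = """<p>Some text here </p>
[SHORTCODE] the tag is first on a line</p>
<p>Enclosed</p>
<p>text[/SHORTCODE] more text.</p>"""

print(unbalanced_paragraph_lines(AFTER_API))  # [(2, 0, 1)]
```

Line 2 has no opening tag but one closing tag, which is exactly the dangling paragraph described above.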

On other occasions, it was the closing paragraph tag that was missing. On others yet, both were, like here:

<p>Some text.</p>
[SHORTCODE]
<p>Enclosed text</p>
<p>[/SHORTCODE]</p>
<p>More text.</p>

The corresponding text as entered in the editor is:

Some text.

[SHORTCODE]

Enclosed text

[/SHORTCODE]

More text.

Sometimes it ate a paragraph tag near the opening shortcode tag, sometimes near the closing one. It was enough to make me despair. Hence my:

Conclusions

  • The Shortcode API “eats” the opening or closing paragraph tag, or both.
  • In none of these cases can the plug-in handler do anything about it, because it has no influence on what’s going on outside of the shortcode tags.
  • The conditions under which the API misbehaves are unpredictable. You never know when it’ll hit you, until it does. In this regard, it is similar to a “chaotic system”.
  • Life is too short to mess around with stupid issues like this.

Ergo: I’m resorting to “good old” filters until the issue has been resolved.

  1. This is a simplified representation, sufficient for the purpose of this article. A full description is available at the WordPress site.
  2. A careful reader adept with WordPress APIs might have correctly noticed that this effect is not entirely the fault of the Shortcode API. At first sight it may seem to have something to do with how TinyMCE, WordPress’ visual editor, processes the entered text. But my diagnosis shows that the processing in question is done by WordPress’ filters, not by TinyMCE itself. (Filters of higher than normal priority see a “normalized” version of the text, which is pretty close to what the text looks like in the editor. Ergo, the text stored in the database does not manifest the issue.) I still see it as a shortcoming of the Shortcode API, because it lacks awareness of the processing that takes place in its native environment.
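The ordering argument in note 2 can be illustrated with a toy filter chain in Python. Everything here is an assumption for illustration: the `add_filter` and `apply_filters` names mimic WordPress (where lower priority numbers run earlier and 10 is the default), `toy_autop` is a crude stand-in for the paragraph-wrapping filter, and `spy` is a hypothetical diagnostic that records what it sees. "Higher than normal priority" is read here as "runs earlier", i.e. a number below 10.

```python
from collections import defaultdict

# Toy filter chain (hypothetical; loosely modeled on add_filter/apply_filters).
_filters = defaultdict(list)

def add_filter(hook, callback, priority=10):
    _filters[hook].append((priority, callback))

def apply_filters(hook, value):
    # Lower priority numbers run first; sorted() is stable, so callbacks
    # with equal priority keep their registration order.
    for _, callback in sorted(_filters[hook], key=lambda entry: entry[0]):
        value = callback(value)
    return value

def toy_autop(text):
    """Crude stand-in for paragraph wrapping: one <p> per blank-line-separated block."""
    blocks = [block.strip() for block in text.split("\n\n") if block.strip()]
    return "\n".join(f"<p>{block}</p>" for block in blocks)

seen = {}

def spy(label):
    def record(text):
        seen[label] = text
        return text  # diagnostic only: pass the text through unchanged
    return record

add_filter("the_content", spy("early"), priority=9)
add_filter("the_content", toy_autop)  # default priority 10
add_filter("the_content", spy("late"), priority=11)

raw = "Some text here [SHORTCODE]\n\nEnclosed\n\ntext[/SHORTCODE] more text."
final = apply_filters("the_content", raw)

print(seen["early"] == raw)  # True: the early filter sees the text as entered
print(seen["late"])          # the paragraph-wrapped version
```

In this model, anything that runs after the paragraph wrapper has to cope with tags it did not ask for, which is the position a shortcode handler finds itself in.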

7 Comments

  • I recently had the same issue on a WordPress site I’ve built. The problem for me was that I was using a plug-in to disable the “wpautop” auto-formatting of content, but this plug-in was not disabling the “shortcode_unautop” filter as well, causing WordPress to “eat” paragraph tags in the content as you described.

    The plug-in I am using is called “ps_disable_auto_formatting”: http://wordpress.org/extend/plugins/ps-disable-auto-formatting/

    I fixed it by adding: remove_filter('the_content', 'shortcode_unautop'); after line 72 of the main plug-in file.

    This kind person has a post about it here, for a different plug-in “wp_unformatted”: http://spielwiese.fontein.de/2010/09/27/a-non-wordpress-bug/

    Perhaps this is related? It certainly worked for me, and it might explain why the WordPress guys got a bit annoyed, since this bug doesn’t seem to be their fault.

    TinyMCE has nothing to do with this particular bug, and I find it to be a pretty solid wysiwyg editor, given what it has to actually do. I’d also comment that the very reason I use “ps_disable_auto_formatting” is that I find it makes TinyMCE work the way it was intended, as wpautop seems to completely destroy paragraph tags as you edit content in the editor which is very annoying.

    Good luck!

    • Travis, you are right about TinyMCE. The problem is indeed related to wpautop and shortcode_unautop. Alas, simply turning them off is not the solution.

      But fortunately, I’ve just found a workable fix for that. 🙂

      At least so far, it seems to fix things the way I want.

  • First, I know this is an older post but I still had to comment… these are exactly the issues I have experienced with the WP editor and shortcodes… and it seems to be ongoing, and the WP development team just won’t listen (in fact one got quite upset with me when I mentioned they needed to get a quality editor). Long story short, shortcodes are a great idea for WP users, but at the same time they’re extremely buggy for typical users who have to work with the editor WP has. I’ve never been a fan of TinyMCE, which is ultimately a nasty, nasty editor, and it’s what many others are based on. It changes my code, it doesn’t play well with shortcodes, and it’s too proprietary. The irony here is that I use Joomla as well, and the editor I use there is the JCE editor, which is based on TinyMCE, but the development team there is top notch with this one.

  • Any progress made on this issue? This is my first stop because you were first up in Google… I may find out more as I dig deeper…

    • Hi Dave,

      it is not up to me to improve the shortcode handling in WordPress. It is in the hands of the WordPress team.

      One thing that I’ve discovered since is that it is probably not the API, or not the API alone. Much of it is “courtesy” of WP’s TinyMCE integration.

  • The shortcode API is a big problem with WordPress.

    I know there are content creators who use the Visual Editor to input their content and shortcodes.

    When they do this, paragraph tags wrap around the shortcode and get rendered when they publish their article.

    This, of course, produces HTML that does not validate!

    This is something WordPress has to address!

    • Ray, you are so damn right. Shortcodes could be so useful for assembling macro-content from micro-content, if it weren’t for the crappy API implementation.

      The “paragraph wrapping” thing (and others) occurs even in the HTML editor. AFAICT it is due to built-in WordPress filters that are triggered before editor content is saved to the database, and after it is read from the database. I’m experimenting with another (not yet published) shortcode right now, and it blows my mind that when the shortcode returns something wrapped in a block element (like a div), WordPress prepends a <p> tag without closing it, which my browser interprets as a <p></p> (but other browsers have other interpretations).

      I sure hope WP improves on this dramatically.
