Before WordPress 5.0, post content was stored as a string. It may or may not have contained HTML; line breaks and other formatting were meaningful, and so were shortcodes. WordPress 5.0 introduced a block-based editor. It stores content as a string of HTML, with extra annotations as well as semantic HTML markup and attributes representing the data model. That HTML is parsed server-side -- removing extra annotations and replacing dynamic content -- before outputting HTML.
Some developers would prefer that the block attributes, including content, were stored in separate, queryable columns in the database or were presented as an object. Maybe that would have been better -- I disagree later in this post -- but it's not how it works. More importantly, you can parse the blocks into a structured object with PHP and JavaScript. That's the best of both worlds: we get the interoperability of using HTML to represent HTML, and developers can work with content as an object to modify the content or how it is rendered as HTML.
I recently published a helpful tool for using the block parser in React apps. It helps when you want to replace WordPress' default block parsing with your own React components. In this article, I'll get to why I like WordPress' very imperfect approach, and how what I'm working on makes it easier to modify. I'll also look at how it compares to markdown-based content in Jamstack-style sites. I am a big fan of both approaches and this is not about debating one versus the other. Both are good, and either may be a better choice than the other, depending on the circumstances.
Shelob9 / block-content
Renders "raw" post content with WordPress block markup in it using React components, which you optionally provide.
Block Content Renderer
Renders "raw" post content with WordPress block markup in it using React components you optionally provide. Uses @wordpress/block-serialization-default-parser.
This works with the "raw" value returned by WordPress REST API for post title, content, excerpt, etc. You must request with ?context=edit, which requires permission to edit the post.
BETA Probably don't use. An experiment by Josh Pollock.
Why / Status
WordPress parses block-based content to HTML before displaying it in a front-end theme. This HTML is also returned by the REST API and WPGraphQL. With a JavaScript front-end, in a headless site or what not, you may want to treat the block content as an object for several reasons.
- Change the markup -- add classes to paragraphs, change element types, etc.
- Sanitize content
- Re-order or reformat content.
WordPress' block parser converts blocks to objects. These objects have block attributes and the inner HTML. This library will…
Content Can Be An Object In A String
First, everything is a string in the database. A JSON column is a string with special annotations for translating it into an object. Relational databases like MySQL are great for putting it all back together. And if every block was its own row in a table, we could query for all blocks of a certain type. That and similar queries would make a GraphQL API for WordPress even cooler.
It is common when developing a Gatsby site to have a directory of markdown files stored in the same git repo as the code. When Gatsby generates the site, it parses the markdown in those files to an abstract syntax tree and then uses that tree to render HTML. Generally MDX is used to provide React components for each HTML element.
Gatsby provides APIs to hook in while that's happening so you can add business logic like "always add this class to paragraphs" or "make a fancy blockquote markup" or "insert an ad between sections."
I'm overgeneralizing a lot here. But the point is that minimal markup is stored with the content. The markup is generated at build-time, by treating the string of markup as an object.
Back To WordPress
When content is edited by the block editor, there is a lot of HTML markup in the post_content field of the database. It's semantic HTML, making heavy use of comments and data attributes. The extra annotations, the Gutenberg grammar, are removed before sending the content to the browser in most settings.
The WordPress REST API returns an object for post content. It contains one or more properties. It should always return a "rendered" property. That is the same HTML as we get in the front-end. If you have permission to edit posts and append ?context=edit you will have a "raw" property in the content object.
That has the unparsed content. You can do the same thing WordPress does with it: use a PHP or JavaScript parser to convert it into an array of block objects and then walk that array to generate HTML.
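To make "walk that array" concrete, here is a minimal sketch. It assumes block objects shaped like the ones the default parser returns (blockName, attrs, innerBlocks, innerHTML, innerContent), and the sample data is hand-written rather than real parser output. WordPress itself does more than this, for example running render callbacks for dynamic blocks, so treat it as an illustration of the idea only.

```javascript
// Hand-written sample in the shape the default parser returns: a group
// block wrapping one paragraph. A real app would get this array by
// parsing post.content.raw instead.
const sampleBlocks = [
  {
    blockName: "core/group",
    attrs: {},
    innerBlocks: [
      {
        blockName: "core/paragraph",
        attrs: {},
        innerBlocks: [],
        innerHTML: "<p>Hi Roy</p>",
        innerContent: ["<p>Hi Roy</p>"],
      },
    ],
    innerHTML: '<div class="wp-block-group"></div>',
    innerContent: ['<div class="wp-block-group">', null, "</div>"],
  },
];

// Walk one block: strings in innerContent are literal HTML, and each
// null marks the position of the next inner block.
function renderBlock(block) {
  let nextInner = 0;
  return block.innerContent
    .map((chunk) =>
      chunk === null ? renderBlock(block.innerBlocks[nextInner++]) : chunk
    )
    .join("");
}

// Render a whole array of top-level blocks to one HTML string.
function renderBlocks(blocks) {
  return blocks.map(renderBlock).join("");
}

const html = renderBlocks(sampleBlocks);
```

Because every block passes through renderBlock, that function is the natural place to add business logic, such as skipping a block type or wrapping it in extra markup.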
This article covers parsing with JavaScript. Micah Wood wrote a good post on using the PHP parser and exposing it on a REST API endpoint. I also recommend this explanation of how dynamic block parsing works server-side by default. You may also want to look at Roy Sivan's Gutenberg Object Plugin, which copies block data to a separate table and exposes it on a REST API endpoint.
Why This Matters
The rendered HTML returned by the REST API can be rendered with React, using dangerouslySetInnerHTML:
const PostContent = ({rendered}) => {
  function createMarkup() {
    return { __html: rendered };
  }
  return <div dangerouslySetInnerHTML={createMarkup()} />;
}
This is not the best solution, because you are opening yourself up to XSS attacks by letting React inject unescaped HTML like that. What if you have React components you want to use for rendering the content, for consistency with the rest of the site?
In these situations, using a block parser may be helpful. For example, you can parse out links and replace them with React components, such as Gatsby's Link component.
Customizing Block Rendering
As I said earlier, I'm working on a helper for working with the parser in React apps for headless WordPress sites. WordPress always returns post content with a "rendered" property, which contains the pre-parsed HTML. If you request a post with the context=edit query param and have permission to edit, you also get back a "raw" property. That's what we're working with here.
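Here is a rough sketch of making that request and picking the right property. The function names, siteUrl, postId and authHeader are all placeholders of mine. It assumes WordPress is installed at the site root with pretty permalinks, and how you authenticate depends on your site.

```javascript
// Build the REST API URL for one post, asking for the edit context.
// Assumes WordPress lives at the site root and uses pretty permalinks.
function buildEditContextUrl(siteUrl, postId) {
  const url = new URL(`/wp-json/wp/v2/posts/${postId}`, siteUrl);
  url.searchParams.set("context", "edit");
  return url.toString();
}

// Prefer the raw block markup; fall back to the rendered HTML when the
// response was not made in the edit context.
function pickContent(post) {
  const { raw, rendered } = post.content;
  return typeof raw === "string"
    ? { type: "raw", value: raw }
    : { type: "rendered", value: rendered };
}

// Fetch the post. `authHeader` is an assumption: the value depends on
// how your site authenticates REST API requests.
async function fetchPostForEditing(siteUrl, postId, authHeader) {
  const response = await fetch(buildEditContextUrl(siteUrl, postId), {
    headers: { Authorization: authHeader },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Checking for "raw" before using it means the same code path still works for visitors who can't edit the post; they just get the rendered HTML.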
WordPress' block parser works with that raw string, like this:
import { parse } from "@wordpress/block-serialization-default-parser";
const blocks = parse( `<!-- wp:paragraph -->
<p>Hi Roy</p>
<!-- /wp:paragraph -->`);
That returns an array of block objects, some of which have blocks inside them. I'm working on a utility that makes it easier to use this parser to render content using components supplied by the developer.
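Since blocks can be nested, most things you do with the parsed result end up recursive. As a small sketch, this finds every block of one type, however deep it sits. The sample array is hand-written in the shape the parser returns, including an entry with a null blockName, which is how the parser represents content outside of any block; flattenBlocks and findBlocksByName are helper names I made up.

```javascript
// Helper to keep the hand-written sample short.
function makeBlock(blockName, innerBlocks, innerHTML) {
  return { blockName, attrs: {}, innerBlocks, innerHTML, innerContent: [innerHTML] };
}

// Hand-written sample: a paragraph nested two levels deep, whitespace
// between blocks (blockName is null), and a top-level paragraph.
const parsed = [
  makeBlock(
    "core/columns",
    [makeBlock("core/column", [makeBlock("core/paragraph", [], "<p>Nested</p>")], "")],
    ""
  ),
  makeBlock(null, [], "\n\n"),
  makeBlock("core/paragraph", [], "<p>Top level</p>"),
];

// Depth-first list of every block, parents before their children.
function flattenBlocks(blocks) {
  return blocks.flatMap((block) => [block, ...flattenBlocks(block.innerBlocks)]);
}

// All blocks of one type, regardless of nesting depth.
function findBlocksByName(blocks, name) {
  return flattenBlocks(blocks).filter((block) => block.blockName === name);
}

const paragraphs = findBlocksByName(parsed, "core/paragraph");
```

This is the in-memory version of the "query for all blocks of a certain type" idea from earlier, just scoped to one post instead of the whole database.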
This library includes a component called BlockContent, which renders raw block content by parsing the blocks, sanitizing the content and converting it to React elements. Remember, the request for the raw content must be made by a user with permission to edit the post, and with the context query param set to edit.
Here's an example of a Post component that uses it:
import {BlockContent} from "@shelob9/block-content";
export const Post = ({post}) => {
  return (
    <article>
      <BlockContent rawContent={post.content.raw} />
    </article>
  )
}
That's cool, but it doesn't help customize what React components are used to render the block content or to add business logic to the rendering. To do that, you need to set up a provider and supply it with components.
Here is an example of the components you could use. In this first example, all "a" elements in post content are replaced with Gatsby's Link component:
import {Link} from "gatsby";
const components = {
  //replace links with Gatsby's link component.
  a: ({children, className, href}) => (
    <Link className={className} to={href}>{children}</Link>
  ),
}
In this example, we add an additional class name to all paragraphs:
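const components = {
  //Add a custom class to paragraphs, keeping any existing class.
  //filter(Boolean) avoids printing "undefined" when className is not set.
  p: ({children, className}) => (
    <p className={[className, "custom-class"].filter(Boolean).join(" ")}>{children}</p>
  ),
}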
There is no need to supply all elements. If, for example, no component for p elements is provided, a generic one is generated.
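To illustrate the fallback idea, here is a simplified sketch. It is not the library's actual implementation: resolveRenderer and genericRenderer are names I made up, and the renderers return HTML strings instead of React elements so the example runs without React. Children are not escaped here, so don't use this with untrusted input.

```javascript
// Generic renderer used when no custom one is supplied for a tag.
function genericRenderer(tag) {
  return ({ children, className }) =>
    className
      ? `<${tag} class="${className}">${children}</${tag}>`
      : `<${tag}>${children}</${tag}>`;
}

// Use the supplied renderer for a tag if there is one, else the generic one.
function resolveRenderer(components, tag) {
  return Object.prototype.hasOwnProperty.call(components, tag)
    ? components[tag]
    : genericRenderer(tag);
}

// Only "p" is customized; every other tag falls back to the generic renderer.
const components = {
  p: ({ children, className }) => {
    const classes = [className, "custom-class"].filter(Boolean).join(" ");
    return `<p class="${classes}">${children}</p>`;
  },
};

const paragraph = resolveRenderer(components, "p")({ children: "Hi Roy" });
const heading = resolveRenderer(components, "h2")({ children: "Hello" });
```

The point of this design is that you only write components for the elements you care about, and everything else keeps rendering the way it would have anyway.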
These components are passed to the ThemeProvider component. That provider needs to go around all elements that use BlockContent:
import {ThemeProvider} from "@shelob9/block-content";
import {Link} from "gatsby";
import {Post} from "your/post/component";
import components from "your/components";
//Mock data
let raw = `<!-- wp:paragraph -->
<p>Hi Roy</p>
<!-- /wp:paragraph -->`;
let post = {
  content: {
    raw,
    rendered: '<p>Hi Roy</p>'
  }
};
//Wrap everything in the theme provider
const App = () => {
  return (
    <ThemeProvider components={components}>
      <Post post={post} />
    </ThemeProvider>
  )
}
Try It And Let Me Know What You Think
This is a new project. If you have a chance to use it, let me know what you think, in the comments or on Twitter. I will add more control over sanitizing content and attributes next, but would be super happy to know what you wish this could do.
yarn add @shelob9/block-content
npm install @shelob9/block-content
I Think This Is Good
Yes, a table structure for block data would make it easier to run MySQL queries based on blocks. I love to think about an alternate reality or possible future where blocks can be used as a graph database of some sort.
In the strange world we do live in, post content is a string and I think that's good. With a table-based system, you would need MySQL and PHP to convert the content -- what site owners care about -- to HTML.
Gutenberg markup in HTML makes parsing optional, and the parsing can be done without PHP and MySQL. There are JS and PHP clients. Also, it's a spec you could implement in Go, because you're Chris Wiegman or whatever.
That's why I think this tradeoff makes sense. But, if querying against block attributes is a requirement, then those block attributes should be saved in post meta, so queries can be done based on those meta fields. I recommend this post Helen Hou-Sandí wrote about working with meta fields in the block editor if you want to learn more about how to do that.
I know this may be a contrarian opinion, but using strings of HTML is not a bad way to represent content blocks. It is way more human readable and interoperable than JSON or storing in MySQL. With the parsers available, when the rendered HTML doesn't fit our needs, we can customize how the rendering works.
Sane defaults and plenty of ways to modify core behavior. Yes, it's a little messy, but it works and is very extensible when it needs to be. That's the vibe that makes WordPress so useful, right?
Featured Image by Joeri Römer on Unsplash