Once we know how to perform the HTML-to-Structured-Text conversion, only minor changes are needed to make it work for Markdown content as well.
As we just saw, the datocms-html-to-structured-text package knows how to convert a hast tree (HTML) to a dast tree (Structured Text), so if we can convert a Markdown string to hast, the rest of the code will be basically the same.
Luckily, hast is part of the unified ecosystem, which also includes:
- an analogous specification, called mdast, for representing Markdown in a syntax tree;
- a tool to convert Markdown strings to mdast;
- a tool to convert mdast trees to hast.
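To make the relationship between the two tree formats more concrete, here is a minimal sketch of what the mdast-to-hast step does. The node types below are heavily simplified, and toyMdastToHast is a hypothetical stand-in written for this illustration; it is not the real mdast-util-to-hast, which handles many more node types and options.

```typescript
// Simplified node shapes. The real mdast and hast specifications define
// many more node types and fields (positions, data, and so on).
type MdastNode =
  | { type: 'root'; children: MdastNode[] }
  | { type: 'heading'; depth: 1 | 2 | 3 | 4 | 5 | 6; children: MdastNode[] }
  | { type: 'paragraph'; children: MdastNode[] }
  | { type: 'strong'; children: MdastNode[] }
  | { type: 'text'; value: string };

type HastNode =
  | { type: 'root'; children: HastNode[] }
  | {
      type: 'element';
      tagName: string;
      properties: Record<string, unknown>;
      children: HastNode[];
    }
  | { type: 'text'; value: string };

// Hypothetical toy converter: maps each Markdown node to its HTML counterpart.
function toyMdastToHast(node: MdastNode): HastNode {
  switch (node.type) {
    case 'root':
      return { type: 'root', children: node.children.map(toyMdastToHast) };
    case 'heading':
      return element(`h${node.depth}`, node.children);
    case 'paragraph':
      return element('p', node.children);
    case 'strong':
      return element('strong', node.children);
    case 'text':
      return { type: 'text', value: node.value };
  }
}

// Builds a hast element whose children are the converted mdast children.
function element(tagName: string, children: MdastNode[]): HastNode {
  return {
    type: 'element',
    tagName,
    properties: {},
    children: children.map(toyMdastToHast),
  };
}

// Hand-written mdast tree for the Markdown string "# Hello **world**"
const mdast: MdastNode = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [
        { type: 'text', value: 'Hello ' },
        { type: 'strong', children: [{ type: 'text', value: 'world' }] },
      ],
    },
  ],
};

// The result is a root containing an h1 element, which in turn contains
// the text "Hello " and a strong element.
const hast = toyMdastToHast(mdast);
```

Once the content is in hast form, it no longer matters whether it started out as HTML or Markdown, which is why the existing hast-to-dast conversion can be reused unchanged.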
Let's install all the packages we need:
npm install --save-dev unified@9 remark-parse@9 mdast-util-to-hast@10
We can now create a function called markdownToStructuredText, similar to htmlToStructuredText, that connects all the dots:
    // ./migrations/utils/markdownToStructuredText.ts

    import unified from 'unified';
    import toHast from 'mdast-util-to-hast';
    import parse from 'remark-parse';
    import {
      hastToStructuredText,
      Options,
      HastRootNode,
    } from 'datocms-html-to-structured-text';
    import { validate } from 'datocms-structured-text-utils';

    export default async function markdownToStructuredText(
      markdown: string,
      options: Options,
    ) {
      // Nothing to convert for empty or missing content
      if (!markdown) {
        return null;
      }

      // Markdown string -> mdast -> hast -> dast
      const mdastTree = unified().use(parse).parse(markdown);
      const hastTree = toHast(mdastTree) as HastRootNode;
      const result = await hastToStructuredText(hastTree, options);

      // Fail loudly if the resulting document is not valid dast
      const validationResult = validate(result);

      if (!validationResult.valid) {
        throw new Error(validationResult.message);
      }

      return result;
    }
We can now create a new migration script:
    > datocms migrations:new convertMarkdownArticles
    Created migrations/1612340785_convertMarkdownArticles.ts
Then we basically copy the previous migration, replacing only the name of the model (from html_article to markdown_article) and the call to htmlToStructuredText with a call to markdownToStructuredText:
    // ./migrations/1612340785_convertMarkdownArticles.ts

    import getModelIdsByApiKey from './utils/getModelIdsByApiKey';
    import createStructuredTextFieldFrom from './utils/createStructuredTextFieldFrom';
    import markdownToStructuredText from './utils/markdownToStructuredText';
    import convertImgsToBlocks from './utils/convertImgsToBlocks';
    import getAllRecords from './utils/getAllRecords';
    import swapFields from './utils/swapFields';
    import { Client, SimpleSchemaTypes } from '@datocms/cli/lib/cma-client-node';

    type MdArticleType = SimpleSchemaTypes.Item & {
      title: string;
      content: string;
    };

    export default async function (client: Client) {
      const modelIds = await getModelIdsByApiKey(client);

      await createStructuredTextFieldFrom(client, 'markdown_article', 'content', [
        modelIds.image_block.id,
      ]);

      const records = (await getAllRecords(
        client,
        'markdown_article',
      )) as MdArticleType[];

      for (const record of records) {
        const structuredTextContent = await markdownToStructuredText(
          record.content,
          convertImgsToBlocks(client, modelIds),
        );

        await client.items.update(record.id, {
          structured_text_content: structuredTextContent,
        });

        if (record.meta.status !== 'draft') {
          await client.items.publish(record.id);
        }
      }

      await swapFields(client, 'markdown_article', 'content');
    }
We can now run the new migration inside the sandbox environment we already created for the first migration:
    > datocms migrations:run --source=with-structured-text --in-place
    ✔ Running 1612340785_convertMarkdownArticles.ts...Done!
To migrate Modular Content fields to Structured Text fields, note that both field types can hold nested block records. The difference is that a Modular Content field is essentially an array of block records, while in Structured Text the block records live inside the dast tree, in nodes of type block. In other words, for every Modular Content field, our task is to transform an array of block records into a single dast document. It's up to us to decide how each block we encounter maps to one or more nodes in that document.
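The difference between the two shapes can be sketched as follows. The types are hypothetical simplifications made up for this illustration (real block records carry more properties, and the real dast types live in datocms-structured-text-utils), and wrapBlocksInDast shows only the most basic mapping, where every block becomes one block node.

```typescript
// Hypothetical, simplified block record used for illustration only.
type BlockRecord = { itemTypeId: string; fields: Record<string, unknown> };

// A Modular Content value is a flat array of block records.
type ModularContentValue = BlockRecord[];

// A Structured Text value is a dast document; block records sit inside
// nodes of type `block` within the tree.
type BlockNode = { type: 'block'; item: BlockRecord };
type DastDocument = {
  schema: 'dast';
  document: { type: 'root'; children: BlockNode[] };
};

// Simplest possible conversion: one `block` node per record, order preserved.
function wrapBlocksInDast(blocks: ModularContentValue): DastDocument {
  const children = blocks.map(
    (item): BlockNode => ({ type: 'block', item }),
  );
  return { schema: 'dast', document: { type: 'root', children } };
}

const modularContent: ModularContentValue = [
  { itemTypeId: 'image_block', fields: { image: 'some-upload-id' } },
  { itemTypeId: 'code_block', fields: { language: 'js', code: '1 + 1' } },
];

const structuredText = wrapBlocksInDast(modularContent);
```

Wrapping every block this way would already be a valid migration strategy; converting some blocks into native dast nodes instead is a choice that depends on the content.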
Let's take a look at the project schema again:
The existing Modular Content field supports three block types:
- Text (which in turn contains a text Markdown field);
- Code (which has two fields: one contains the actual code, the other stores the language);
- Image (which, as we already know, contains a single-asset field called image).
Here's the code for our migration:
    // ./migrations/1612340785_convertModularArticles.ts

    import { Document, Node, validate } from 'datocms-structured-text-utils';
    import getModelIdsByApiKey from './utils/getModelIdsByApiKey';
    import createStructuredTextFieldFrom from './utils/createStructuredTextFieldFrom';
    import getAllRecords from './utils/getAllRecords';
    import swapFields from './utils/swapFields';
    import markdownToStructuredText from './utils/markdownToStructuredText';
    import convertImgsToBlocks from './utils/convertImgsToBlocks';
    import { Client, SimpleSchemaTypes } from '@datocms/cli/lib/cma-client-node';

    type ModularArticleType = SimpleSchemaTypes.Item & {
      title: string;
      content: any;
    };

    export default async function (client: Client) {
      const modelIds = await getModelIdsByApiKey(client);

      await createStructuredTextFieldFrom(
        client,
        'modular_content_article',
        'content',
        [modelIds.image_block.id, modelIds.text_block.id, modelIds.code_block.id],
      );

      const records = (await getAllRecords(
        client,
        'modular_content_article',
      )) as ModularArticleType[];

      for (const record of records) {
        const rootNode = {
          type: 'root',
          children: [] as Node[],
        };

        for (const block of record.content) {
          switch (block.relationships.item_type.id) {
            case modelIds.text_block.id: {
              const markdownSt = await markdownToStructuredText(
                block.text,
                convertImgsToBlocks(client, modelIds),
              );

              if (markdownSt) {
                rootNode.children = [
                  ...rootNode.children,
                  ...markdownSt.document.children,
                ];
              }

              break;
            }
            case modelIds.code_block.id: {
              rootNode.children.push({
                type: 'code',
                language: block.language,
                code: block.code,
              });
              break;
            }
            default: {
              delete block.id;
              delete block.meta;
              delete block.createdAt;
              delete block.updatedAt;
              rootNode.children.push({
                type: 'block',
                item: block,
              });
              break;
            }
          }
        }

        const result = {
          schema: 'dast',
          document: rootNode,
        } as Document;

        const validationResult = validate(result);

        if (!validationResult.valid) {
          throw new Error(validationResult.message);
        }

        await client.items.update(record.id, {
          structured_text_content: result,
        });

        if (record.meta.status !== 'draft') {
          await client.items.publish(record.id);
        }
      }

      await swapFields(client, 'modular_content_article', 'content');
    }
Every time we need to convert a Modular Content field, we start by creating an empty dast root node (that is, one with no children, line 33).
Then, for every block contained in the modular content (line 38), we accumulate children inside the root node:
- If it is a Text block (line 40), we use the markdownToStructuredText function to convert its Markdown content into a dast tree, then take the children of the resulting root node and add them to our accumulator;
- Since dast supports nodes of type code, if we encounter a Code block (line 55), we simply convert it to a code node and add it to the accumulator;
- If we find an Image block (line 63), we remove its id, meta, createdAt and updatedAt properties, wrap the block in a dast block node, and add it to the accumulator as is.
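The accumulation steps above can be isolated into a small pure function, which makes the per-block decision easy to test without touching the API. This is only a sketch under simplifying assumptions: blocks are told apart by a hypothetical kind property instead of their item type ID, the Markdown converter is injected and synchronous (the real markdownToStructuredText is async), and stubConverter is a made-up placeholder that turns each line into a paragraph.

```typescript
// Simplified, hypothetical types for illustration only.
type DastNode = { type: string; [key: string]: unknown };
type DastDocument = {
  schema: 'dast';
  document: { type: 'root'; children: DastNode[] };
};
type Block =
  | { kind: 'text'; text: string }
  | { kind: 'code'; language: string; code: string }
  | { kind: 'image'; image: string };

type MarkdownConverter = (markdown: string) => DastDocument | null;

function blocksToDast(
  blocks: Block[],
  convert: MarkdownConverter,
): DastDocument {
  // Start from an empty list of root children and accumulate into it.
  const children: DastNode[] = [];

  for (const block of blocks) {
    switch (block.kind) {
      case 'text': {
        // Convert the Markdown, then splice in the children of its root.
        const converted = convert(block.text);
        if (converted) {
          children.push(...converted.document.children);
        }
        break;
      }
      case 'code':
        // dast has a native `code` node, so no block record is needed.
        children.push({
          type: 'code',
          language: block.language,
          code: block.code,
        });
        break;
      default:
        // Anything else (here: images) stays a block, wrapped in a node.
        children.push({ type: 'block', item: block });
        break;
    }
  }

  return { schema: 'dast', document: { type: 'root', children } };
}

// Placeholder converter: one paragraph per non-empty line, null if empty.
const stubConverter: MarkdownConverter = (markdown) => {
  const lines = markdown.split('\n').filter((line) => line.trim() !== '');
  if (lines.length === 0) {
    return null;
  }
  return {
    schema: 'dast',
    document: {
      type: 'root',
      children: lines.map((line) => ({
        type: 'paragraph',
        children: [{ type: 'span', value: line }],
      })),
    },
  };
};

// Produces four root children: paragraph, paragraph, code, block.
const doc = blocksToDast(
  [
    { kind: 'text', text: 'First line\nSecond line' },
    { kind: 'code', language: 'javascript', code: 'console.log(1);' },
    { kind: 'image', image: 'some-upload-id' },
  ],
  stubConverter,
);
```

Note that a single Text block can contribute several nodes to the document, so the number of root children does not have to match the number of original blocks.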
Once you get to know the Structured Text format, converting to and from its dast tree representation becomes quite straightforward, and the DatoCMS API, coupled with migrations and sandbox environments, makes it easy to apply any kind of transformation to your content.
You can download the final code from this GitHub repo.