Introduction to paragraphs migrations in Drupal

Submitted by dinarcon on Wed, 08/14/2019 - 23:00

Today we will present an introduction to paragraphs migrations in Drupal. The example consists of migrating paragraphs of one type, then connecting the migrated paragraphs to nodes. A separate image migration is included to demonstrate how they are different. At the end, we will talk about behavior that deletes paragraphs when the host entity is deleted. Let’s get started.

Example mapping for paragraph reference field

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is `UD paragraphs migration introduction`, whose machine name is `ud_migrations_paragraph_intro`. It comes with three migrations: `ud_migrations_paragraph_intro_paragraph`, `ud_migrations_paragraph_intro_image`, and `ud_migrations_paragraph_intro_node`. One content type, one paragraph type, and four fields will be created when the module is installed.

Note: Configuration placed in a module’s `config/install` directory will be copied to Drupal’s active configuration. And if those files have a `dependencies/enforced/module` key, the configuration will be removed when the listed modules are uninstalled. That is how the content type, the paragraph type, and the fields are automatically created and deleted.
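As a small illustration of that key, a configuration file shipped in `config/install` might carry a fragment like the one below. This is a sketch rather than a copy of the module's files; only the module machine name comes from this article.

```yaml
# Fragment of a configuration file placed in config/install.
# The enforced dependency ties this configuration to the module, so it is
# removed when the module is uninstalled.
dependencies:
  enforced:
    module:
      - ud_migrations_paragraph_intro
```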

You can get the Paragraphs module using Composer: `composer require drupal/paragraphs`. This will also download its dependency: the Entity Reference Revisions module. If your Drupal site is not Composer-based, you can get the code for both modules manually.

Understanding the example set up

The example code creates one paragraph type named UD book paragraph (`ud_book_paragraph`). It has two “Text (plain)” fields: Title (`field_ud_book_paragraph_title`) and Author (`field_ud_book_paragraph_author`). A new UD Paragraphs (`ud_paragraphs`) content type is also created. This has two fields: Image (`field_ud_image`) and Favorite book (`field_ud_favorite_book`) containing references to images and book paragraphs imported in separate migrations. The words in parentheses represent the machine names of the different elements.

The paragraph migration

Migrating into a paragraph type is very similar to migrating into a content type. You specify the source, process the fields making any required transformation, and set the destination entity and bundle. The following code snippet shows the source, process, and destination sections:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8
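A minimal sketch of what those three sections could look like is shown here. It assumes an `embedded_data` source; the column names (`book_id`, `book_title`, `book_author`) and the sample row are hypothetical, while the field names, destination plugin, and bundle come from this article.

```yaml
source:
  plugin: embedded_data
  data_rows:
    # Hypothetical sample row; the column names are assumptions.
    - book_id: 'B10'
      book_title: 'Example book title'
      book_author: 'Example author'
  ids:
    # Unique identifier that other migrations can look up later.
    book_id:
      type: string
process:
  # Destination field: source column.
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: book_author
destination:
  # Destination plugin provided by Entity Reference Revisions.
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: ud_book_paragraph
```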

The most important part of a paragraph migration is setting the destination plugin to `entity_reference_revisions:paragraph`. This plugin is actually provided by the Entity Reference Revisions module. It is very important to note that paragraphs entities are revisioned. This means that when you want to create a reference to them, you need to provide two IDs: `target_id` and `target_revision_id`. Regular entity reference fields like files, images, and taxonomy terms only require the `target_id`. This will be further explained with the node migration.

The other configuration that you can optionally set in the destination section is `default_bundle`. The value will be the machine name of the paragraph type you are migrating into. You can do this when all the paragraphs for a particular migration definition file will be of the same type. If that is not the case, you can leave out the `default_bundle` configuration and add a mapping for the `type` entity property in the process section.
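For the case where one migration feeds several paragraph types, the alternative could be sketched as follows. The `paragraph_type` source column is a hypothetical name; it is assumed to hold the machine name of the paragraph type for each record.

```yaml
process:
  # Set the bundle per record instead of using default_bundle.
  # 'paragraph_type' is a hypothetical source column.
  type: paragraph_type
  field_ud_book_paragraph_title: book_title    # assumed source column
destination:
  plugin: 'entity_reference_revisions:paragraph'
  # No default_bundle here: the bundle comes from the 'type' mapping above.
```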

You can execute the paragraph migration with this command: `drush migrate:import ud_migrations_paragraph_intro_paragraph`. After running the migration, there is not much you can do to verify that it worked. Contrary to other entities, there is no user interface available out of the box that lists all paragraphs in the system. One way to verify that the migration worked is to manually create a View that shows paragraphs. Another way is to query the database directly. You can inspect the tables that store the paragraph fields’ data. In this example, the tables would be:

  • `paragraph__field_ud_book_paragraph_author` for the current author.
  • `paragraph__field_ud_book_paragraph_title` for the current title.
  • `paragraph_r__8c3a9563ac` for all the author revisions.
  • `paragraph_r__3fa7e9863a` for all the title revisions.

Each of those tables contains information about the bundle (paragraph type), the entity id, the revision id, and the migrated field value. Table names are derived from the machine names of the fields. If they are too long, the field name will be hashed to produce a shorter table name. Having to query the database is not ideal. Unfortunately, the options available to check if a paragraph migration worked are limited at the moment.

The node migration

The node migration will serve as the host for both referenced entities: images and paragraphs. The image migration is very similar to the one explained in a previous article. This time, the focus will be the paragraph migration. Both of them are set as dependencies of the node migration, so they need to be executed in advance. The following snippet shows how the source, destination, and dependencies are set:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8
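A rough sketch of those sections might look like this. The `embedded_data` source, the `unique_id` and `name` columns, and the sample values are assumptions; `photo_file`, `book_ref`, the bundle, and the migration IDs come from this article.

```yaml
source:
  plugin: embedded_data
  data_rows:
    # Hypothetical sample row.
    - unique_id: 1
      name: 'Example person'
      photo_file: 'P01'   # matches an ID in the image migration
      book_ref: 'B10'     # matches an ID in the paragraph migration
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  # Both referenced entities must exist before nodes are imported.
  required:
    - ud_migrations_paragraph_intro_image
    - ud_migrations_paragraph_intro_paragraph
  optional: []
```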

Note that `photo_file` and `book_ref` both contain the unique identifier of records in the image and paragraph migrations, respectively. These can be used with the `migration_lookup` plugin to map the reference fields in the nodes to be migrated. `ud_paragraphs` is the machine name of the target content type.

The mapping of the image reference field follows the same pattern as the one explained in the article on migration dependencies. Using the `migration_lookup` plugin, you indicate which is the migration that should be searched for the images. You also specify which source column contains the unique identifiers that match those in the image migration. This operation will return a single value: the file ID (`fid`) of the image. This value can be assigned to the `target_id` subfield of `field_ud_image` to establish the relationship. The following code snippet shows how to do it:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8
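Based on the description above, the image mapping could be written along these lines. It is a sketch built only from the names mentioned in this article.

```yaml
process:
  # Assign the looked-up file ID to the target_id subfield of the image field.
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: ud_migrations_paragraph_intro_image
    source: photo_file
```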

Paragraph field mappings

Before diving into the paragraph field mapping, let’s think about what needs to be done. Paragraphs are revisioned entities. To make a reference to them, you need two IDs: their entity id and their entity revision id. These two values need to be assigned to two subfields of the paragraph reference field: `target_id` and `target_revision_id` respectively. You have to come up with a process pipeline that complies with this requirement. There are many ways to do it, and the specifics will depend on your field configuration. In this example, the paragraph reference field allows an unlimited number of paragraphs to be associated, but only of one type: `ud_book_paragraph`. Another thing to note is that even though the field allows you to add as many paragraphs as you want, the example migrates exactly one paragraph.

With those considerations in mind, the mapping of the paragraph field will be a two-step process. First, use the `migration_lookup` plugin to get a reference to the paragraph. Second, use the fetched values to set the paragraph reference subfields. The following code snippet shows how to do it:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8
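Put together from the names used in this article, the two steps might be sketched like this:

```yaml
process:
  # Step 1: look up the paragraph. The result is an array holding
  # the entity id and the revision id, e.g. [3, 7].
  pseudo_mbe_book_paragraph:
    plugin: migration_lookup
    migration: ud_migrations_paragraph_intro_paragraph
    source: book_ref
  # Step 2: iterate over the list of references and set both subfields.
  field_ud_favorite_book:
    plugin: sub_process
    source:
      - '@pseudo_mbe_book_paragraph'
    process:
      target_id: '0'
      target_revision_id: '1'
```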

The first step is a normal `migration_lookup` procedure. The important difference is that instead of getting a single value, like with images, the paragraph lookup operation will return an array of two values. The format is like `[3, 7]` where the `3` represents the entity id and the `7` represents the entity revision id of the paragraph. Note that the array keys are not named. To access those values, you would use the index of the elements starting with zero (0). This will be important later. The returned array is stored in the `pseudo_mbe_book_paragraph` pseudofield.

The second step is to set the `target_id` and `target_revision_id` subfields. In this example, `field_ud_favorite_book` is the machine name of the paragraph reference field. Remember that it is configured to accept an arbitrary number of paragraphs, and each will require passing an array of two elements. This means you need to process an array of arrays. To do that, you use the `sub_process` plugin to iterate over an array of paragraph references. In this example, the structure to iterate over would be like this:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8
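Using the sample values `3` and `7` from the lookup described earlier, that array of arrays can be pictured in YAML notation as:

```yaml
# Outer list: one item per paragraph reference.
# Inner list: [entity id, revision id], with no named keys.
- [3, 7]
```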

Let’s dissect how to do the mapping of the paragraph reference field. The `source` configuration of the `sub_process` plugin contains an array of paragraph references. In the example, that array has a single element: the `'@pseudo_mbe_book_paragraph'` pseudofield. The quotes (') and at sign (@) are required to reuse an element that appears before in the process pipeline. Then, in the `process` configuration, you set the subfields for the paragraph reference field. It is worth noting that at this point you are iterating over a list of paragraph references, even if that list contains only one element. If you had more than one paragraph to migrate, whatever you defined in `process` will apply to all of them.

The `process` configuration is an array of subfield mappings. The left side of the assignment is the name of the subfield you want to set. The right side of the assignment is an array index of the paragraph reference being processed. Remember that this array does not have named keys, so you use their numerical index to refer to them. The example sets the `target_id` subfield to the element in the `0` index and the `target_revision_id` subfield to the element in the `1` index. Using the example data, this would be `target_id: 3` and `target_revision_id: 7`. The quotes around the numerical indexes are important. If not used, the migration will not find the indexes and the paragraphs will not be associated. The end result of this operation will be something like this:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8
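Expressed in YAML notation with the same sample values, the resulting field value would look roughly like this. It is only a representation of the structure, not actual debug output.

```yaml
field_ud_favorite_book:
  # One item per referenced paragraph (delta 0 here).
  - target_id: 3
    target_revision_id: 7
```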

There are three ways to run the migrations: manually, executing dependencies, and using tags. The following code snippet shows the three options:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8
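The second and third options rely on keys in the migration definition files. As a sketch, the relevant keys might look like the following. The tag value `UD Paragraphs Intro` is an assumption; use whatever tag your migrations actually declare.

```yaml
# Assumed to be shared by all three migration definitions so they can be
# run as a group, e.g. drush migrate:import --tag='UD Paragraphs Intro'
migration_tags:
  - UD Paragraphs Intro   # assumed tag name
# Declared in the node migration. With the --execute-dependencies flag,
# importing the node migration runs the required migrations first.
migration_dependencies:
  required:
    - ud_migrations_paragraph_intro_image
    - ud_migrations_paragraph_intro_paragraph
  optional: []
```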

And that is one way to map paragraph reference fields. In the end, all you have to do is set the `target_id` and `target_revision_id` subfields. The process pipeline that gets you to that point can vary depending on how your paragraphs are configured. The following is a non-exhaustive list of things to consider when migrating paragraphs:

  • How many paragraph types can be referenced?
  • How many paragraph instances are being migrated? Is this a multivalue field?
  • Do paragraphs have translations?
  • Do paragraphs have revisions?

Do migrated paragraphs disappear upon node rollback?

Paragraphs migrations are affected by a particular behavior of revisioned entities. If the host entity is deleted, and the paragraphs do not have translations, the whole paragraph gets deleted. That means that deleting a node will cause the referenced paragraphs’ data to be removed. How does this affect your migration workflow? If the migration of the host entity is rolled back, the paragraphs will be removed, but the Migrate API will not know about it. In this example, if you run a migrate status command after rolling back the node migration, you will see that the paragraph migration indicates that there are no pending elements to process. The file migration for the images will report the same, but in that case, the images will remain on the system.

In any migration project, it is common that you do rollback operations to test new field mappings or fix errors. Thus, chances are very high that you will stumble upon this behavior. Thanks to Damien McKenna for helping me understand this behavior and tracking it to the `rollback()` method of the `EntityReferenceRevisions` destination plugin. So, what do you do to recover the deleted paragraphs? You have to roll back both migrations: node and paragraph. Then, you have to import the two again. The following snippet shows how to do it:

Loading gist https://gist.github.com/dinarcon/7608b3afd5db0793a66b1ca6d94e5dd8

What did you learn in today’s blog post? Have you migrated paragraphs before? If so, what challenges have you found? Did you know paragraph reference fields require two subfields to be set? Did you know that deleting the host entity also deletes referenced paragraphs? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.



Comments

Jay (not verified)

Tue, 08/20/2019 - 10:06

I've seen a lot of articles explain paragraph migration, but yours seems better than most... thanks.
One thing not discussed (or I missed it)... what happens to multiple paragraphs per node? How does the delta get assigned? Is there a way to map that so that the paragraphs stay in order?

Hi Jay, thanks for your comment.

The example already supports multiple paragraphs for the same reference field. In the `sub_process` plugin, you can list many paragraphs in the `source` configuration array. The plugin will iterate over all of them and assign deltas in the order in which they are listed. Another option is to set deltas manually. This is mentioned in the article on migrating taxonomy terms. https://understanddrupal.com/articles/migrating-taxonomy-terms-and-multivalue-fields-drupal
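As a sketch of the first option, a node with two paragraphs could be mapped like this. The `second_book_ref` source column and both pseudofield names are hypothetical; the sketch also assumes every lookup finds a match.

```yaml
process:
  pseudo_first_book:
    plugin: migration_lookup
    migration: ud_migrations_paragraph_intro_paragraph
    source: book_ref
  pseudo_second_book:
    plugin: migration_lookup
    migration: ud_migrations_paragraph_intro_paragraph
    source: second_book_ref   # hypothetical source column
  field_ud_favorite_book:
    plugin: sub_process
    source:
      # Listing order determines the delta of each paragraph.
      - '@pseudo_first_book'    # delta 0
      - '@pseudo_second_book'   # delta 1
    process:
      target_id: '0'
      target_revision_id: '1'
```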
