Using constants and pseudofields as data placeholders in the Drupal migration process pipeline

Submitted by dinarcon on Mon, 08/05/2019 - 23:00

So far we have learned how to write basic Drupal migrations and how to use process plugins to transform data into the format expected by the destination. In the previous entry, we learned one of many approaches to migrating images. In today’s example, we will modify it slightly to introduce two new migration concepts: constants and pseudofields. Both can be used as data placeholders in the migration timeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the migrate process pipeline.

Syntax for constants and pseudofields in the Drupal migration process pipeline

Setting and using constants

In the Migrate API, a constant is an arbitrary value that can be used later in the process pipeline. Constants are set as direct children of the source section: you write a `constants` key whose value is a list of name-value pairs. Even though they are defined in the source section, they are independent of the particular source plugin in use. The following code snippet shows a generalization for setting and using constants:

Code snippet: https://gist.github.com/dinarcon/6c4b5bdc5dae5455bd9641a05e3701a4

You can set as many constants as you need. Although not required by the API, it is a common convention to write constant names in all uppercase, using underscores (_) to separate words. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section, you type its name, just like any other column provided by the source plugin. Note that to use a constant, you need to name the full hierarchy under the source section: the word `constants` and the constant name, separated by a slash (/) symbol. Constants can be used to copy their value directly to the destination or as part of any process plugin configuration.
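
A minimal sketch of this syntax follows. It assumes the core `embedded_data` source plugin and a node destination; the constant names and values, the `first_name` column, and the sample row are hypothetical and are not taken from the original gist.

```yaml
source:
  plugin: embedded_data
  data_rows:
    # Hypothetical sample row.
    - unique_id: 1
      first_name: 'Jane'
  ids:
    unique_id:
      type: integer
  # Hypothetical constants: direct children of the source section.
  constants:
    GREETING: 'Hello'    # a string
    DEFAULT_STATUS: 1    # an integer
process:
  # Copy a constant directly to a destination property.
  status: constants/DEFAULT_STATUS
  # Use a constant as part of a process plugin configuration.
  title:
    plugin: concat
    source:
      - constants/GREETING
      - first_name
    delimiter: ' '
destination:
  plugin: 'entity:node'
  default_bundle: page
```

With this sample row, `status` receives the value 1 and `title` becomes the greeting and the first name joined by a space. In both cases the constant is referenced with its full `constants/NAME` path.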

Technical note: The word `constants` for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of your particular source plugin. One reason to use a different name is if your source actually contains a column named `constants`. In that case, you could use `defaults` or something else. The one restriction is that whatever name you choose, you have to use it in the process section to refer to any constant. For example:

Code snippet: https://gist.github.com/dinarcon/6c4b5bdc5dae5455bd9641a05e3701a4
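
For instance, suppose a hypothetical source already has a column named `constants`. A sketch that stores the placeholder values under `defaults` instead might look like the following. The column names, values, and the `DEFAULT_STATUS` name are made up for illustration; the point is that the process section must then use the `defaults/` prefix.

```yaml
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      title: 'Example title'
      # Hypothetical source column that happens to be named "constants".
      constants: 'some source value'
  ids:
    unique_id:
      type: integer
  # Placeholder values stored under a different, non-colliding key.
  defaults:
    DEFAULT_STATUS: 1
process:
  title: title
  # The key chosen in the source section is used here as well.
  status: defaults/DEFAULT_STATUS
destination:
  plugin: 'entity:node'
  default_bundle: page
```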

Setting and using pseudofields

Similar to constants, pseudofields store arbitrary values for use later in the process pipeline. There are some key differences, though. Pseudofields are set in the process section. The name is arbitrary as long as it does not conflict with a property name or field name of the destination. The value can be a verbatim copy from the source (a column or a constant), or it can be the result of process plugins that transform the data. The following code snippet shows a generalization for setting and using pseudofields:

Code snippet: https://gist.github.com/dinarcon/6c4b5bdc5dae5455bd9641a05e3701a4

In the above example, `my_pseudofield_1` is set to the result of a `concat` process transformation that joins a constant and a column from the source section. The resulting value is later used as part of a `urlencode` process transformation. Note that to use the value from `my_pseudofield_1`, you have to enclose it in quotes (') and prepend an at sign (@) to the name. The new value obtained from the URL encode operation is stored in `my_pseudofield_2`. This last pseudofield is used to set the value of the `uri` subfield for `field_link`. The example could be simplified, for instance, by using a single pseudofield and chaining process plugins. It is presented this way to demonstrate that a pseudofield can be used in direct assignments or as part of process plugin configuration values.
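
The following sketch mirrors that description under a few assumptions: the `BASE_URL` constant, the `relative_path` column, and the sample row are hypothetical, and the destination bundle is assumed to have a Link field named `field_link`. It relies on the core `concat` and `urlencode` process plugins.

```yaml
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      title: 'Example page'
      relative_path: '/about us'
  ids:
    unique_id:
      type: integer
  constants:
    BASE_URL: 'https://example.com'
process:
  title: title
  # First pseudofield: join a constant and a source column.
  my_pseudofield_1:
    plugin: concat
    source:
      - constants/BASE_URL
      - relative_path
  # Second pseudofield: the first one, quoted and prefixed with @,
  # is used inside a process plugin configuration.
  my_pseudofield_2:
    plugin: urlencode
    source: '@my_pseudofield_1'
  # Direct assignment of a pseudofield to a subfield.
  field_link/uri: '@my_pseudofield_2'
destination:
  plugin: 'entity:node'
  # Assumes this bundle has a Link field named field_link.
  default_bundle: page
```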

Technical note: If the name of a pseudofield can be arbitrary, how can you prevent name clashes with destination property names and field names? You might have to look at the source code for the entity and the configuration of the bundle. In the case of a node migration, look at the baseFieldDefinitions() method of the Node class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the “Manage fields” section of the content type you are migrating into. The Field API prefixes any field created via the administration interface with the string `field_`, which reduces the likelihood of name clashes. Other than these two name restrictions, any name can be used. Eventually, the Migrate API performs an entity save operation, which discards the pseudofields.

Understanding Drupal Migrate API process pipeline

The migrate process pipeline is a mechanism by which the value of any destination property, field, or pseudofield that has been set can be used by anything defined later in the process section. Having to enclose a pseudofield name in quotes and prepend an at sign is actually a requirement of the process pipeline, not something specific to pseudofields. Let’s see some examples using a node migration:

  • To use the `title` property of the node entity, you would write `@title`
  • To use the `body` field of the `Basic page` content type, you would write `@body`
  • To use the `my_temp_value` pseudofield, you would write `@my_temp_value`

In the process pipeline, these values can be used just like constants and columns from the source. The only restriction is that they need to be set before being used. For those familiar with the "rewrite results" feature of Views, it follows the same idea: you have access to everything defined previously. Any time you enclose a name in quotes and prepend it with an at sign, you are telling the Migrate API to look for that element in the process section instead of the source section.
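
As a minimal sketch of the pipeline at work, consider the process section below. It assumes a hypothetical `name` source column and a hypothetical `PREFIX` constant defined in the source section; `psf_heading` is an illustrative pseudofield name.

```yaml
process:
  # Set first: the node title comes straight from the source.
  title: name
  # Later entries can read the destination property back with '@title'.
  psf_heading:
    plugin: concat
    source:
      - constants/PREFIX
      - '@title'
    delimiter: ' '
  # The pseudofield, in turn, feeds the body field.
  body/value: '@psf_heading'
```

Order matters here: if `psf_heading` were listed before `title`, then `@title` would not have been set yet when `psf_heading` is processed.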

Migrating images using the image_import plugin

Let’s practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous entry. The Migrate Files module provides another process plugin named `image_import` that allows you to directly set all the subfield values in the plugin configuration itself.

As in previous examples, we will create a new module and write a migration definition file to perform the migration. It is assumed that Drupal was installed using the `standard` installation profile. The code snippets will be compact to focus on particular elements of the migration. The full code is available at https://github.com/dinarcon/ud_migrations. The module name is `UD Migration constants and pseudofields` and its machine name is `ud_migrations_constants_pseudofields`. The `id` of the example migration is `udm_constants_pseudofields`. Refer to this article for instructions on how to enable the module and run the migration. Make sure to download and enable the Migrate Files module. Otherwise, you will get an error like: “In DiscoveryTrait.php line 53: The "image_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:...”. Let’s see part of the source definition:

Code snippet: https://gist.github.com/dinarcon/6c4b5bdc5dae5455bd9641a05e3701a4

Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image. Note that this time, the `photo_url` does not provide an absolute URL. Instead, it is a path relative to the domain hosting the images. In this example, the domain is `https://agaric.coop`, so that value is stored in the BASE_URL constant, which is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant stores the prefix to add to the name to create a photo description. Now, let’s see the process definition:
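
For readers who cannot load the gist, here is a rough sketch of a source section matching that description. Only `name`, `photo_url`, `BASE_URL`, and the `https://agaric.coop` domain come from the text above; the `profile`, `photo_width`, and `photo_height` column names, all row values, and the prefix value are hypothetical placeholders.

```yaml
source:
  plugin: embedded_data
  data_rows:
    # One hypothetical record; the real migration contains more.
    - unique_id: 1
      name: 'Jane Doe'
      profile: 'A short profile.'
      # A path relative to the domain hosting the images.
      photo_url: '/sites/default/files/example.jpg'
      photo_width: 600
      photo_height: 400
  ids:
    unique_id:
      type: integer
  constants:
    BASE_URL: 'https://agaric.coop'
    # Hypothetical prefix value.
    PHOTO_DESCRIPTION_PREFIX: 'Photo of'
```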

Code snippet: https://gist.github.com/dinarcon/6c4b5bdc5dae5455bd9641a05e3701a4

The `title` node property is set directly to the value of the `name` column from the source. Then, two pseudofields are set. `psf_image_url` stores a valid absolute URL to the image, built from the BASE_URL constant and the `photo_url` column from the source. `psf_image_description` uses the PHOTO_DESCRIPTION_PREFIX constant and the `name` column from the source to store a description for the image.

For the `field_image` field, the `image_import` process plugin is used. This time, the subfields are not set manually like in the previous article. The absence of the `id_only` configuration key allows you to assign values to subfields simply by configuring the `image_import` plugin. The URL to the image is set in the `source` key and uses the `psf_image_url` pseudofield. The `alt` key allows you to set the alternative attribute for the image; in this case, the `psf_image_description` pseudofield is used. The `title` key sets the text of the subfield with the same name; in this case, it is assigned the value of the `title` node property, which was set at the beginning of the process pipeline. Remember that pseudofields are not the only values available in the pipeline. Finally, the `width` and `height` configuration keys use columns from the source to set the values of the corresponding subfields.
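
A process section consistent with that walkthrough could look roughly like this. It is a sketch, not the exact gist: it assumes the Migrate Files module is enabled, that `image_import` accepts the `alt`, `title`, `width`, and `height` keys as described above (check the module documentation for your installed version), and that the width and height columns are named `photo_width` and `photo_height` (hypothetical names). The `article` bundle of the `standard` profile provides `field_image`.

```yaml
process:
  # Set first, so that '@title' is available further down.
  title: name
  # Absolute URL: base domain constant plus relative path column.
  psf_image_url:
    plugin: concat
    source:
      - constants/BASE_URL
      - photo_url
  # Description: prefix constant plus the person's name.
  psf_image_description:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: ' '
  field_image:
    plugin: image_import
    # No id_only key, so the subfields below are set by the plugin.
    source: '@psf_image_url'
    alt: '@psf_image_description'
    title: '@title'
    width: photo_width
    height: photo_height
destination:
  plugin: 'entity:node'
  default_bundle: article
```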

What did you learn in today’s blog post? Did you know you can define constants in your source as data placeholders for use in the process section? Were you aware that pseudofields can be created in the process section to store intermediary data for process definitions that come next? Have you ever wondered what the migration process pipeline is and how it works? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.

This blog post series is made possible thanks to these generous sponsors. Contact us if your organization would like to support this documentation project, whether the migration series or other topics.
