Understanding the syntax of Drupal migrations

Submitted by dinarcon on Tuesday, 7 April 2020 - 09:00

In the 31 days of Drupal migrations series we explained different aspects of the syntax used by the Migrate API. In today’s article, we are going to dive deeper to understand how the API interprets our migration definition files. We will explain how to configure process plugins and set subfields and deltas for multi-value field migrations. We will also talk about process plugin chains, source constants, pseudofields, and the process pipeline. After reading this article, you will better comprehend existing migration definition files and improve your own. Let’s get started.

Drupal migration syntax snippet

Field mappings: process plugin configuration

The Migrate API provides syntactic sugar to make migration definition files more readable. The field mappings under the `process` section are a good example of this. To demonstrate the syntax, consider a multi-value Link field that stores links to online profiles. The field machine name is `field_online_profiles` and it is configured to accept the URL and the link text. For brevity, only the `process` section will be shown, but it is assumed that the source includes the following columns: `source_drupal_profile`, `source_gitlab_profile`, and `source_github_profile`.

process:
  field_online_profiles: source_drupal_profile

In this case, we are directly assigning the value from `source_drupal_profile` in the `source` to the `field_online_profiles` in the `destination` entity. For now, we are ignoring the fact that the field accepts multiple values. We are not setting the link text either, just the URL. Even in this example, the Migrate API is making some assumptions for us. Every field mapping requires at least one `process` plugin to be configured. If none is set, the `get` plugin is assumed. It copies a value from the source to the destination without making any changes. The previous snippet is equivalent to the next one:

process:
  field_online_profiles:
    plugin: get
    source: source_drupal_profile

The `process` plugin configuration options should be placed as direct children of the field that is being mapped. In the previous snippet, `plugin` and `source` are indented one level to the right under `field_online_profiles`. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:

process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3

Check out the article on using process plugins for data transformation for a working example.

Field mappings: setting sub-fields

Let’s expand the example by setting a value for the Link text in addition to the URL. To accomplish this, we will migrate data into subfields. Fields can store complex data and in many cases they have multiple components. For example, a rich text field has a subfield to store the text value and another for the text format. Address fields have 13 subfields available. Our example uses Link fields which have three subfields:

  • `uri`: The URI of the link.
  • `title`: The link text.
  • `options`: Serialized array of options for the link.

For now, only the `uri` and `title` subfields will be set. This also demonstrates that, depending on the field, it is not necessary to provide values for all the subfields. One more thing we will implement is to include the name of the online profile in the Link text. For example: “Drupal.org profile”.

process:
  field_online_profiles/uri: source_drupal_profile
  field_online_profiles/title:
    plugin: default_value
    default_value: 'Drupal.org profile'

If you want to set a value for a subfield, you use the `field_name/subfield` syntax. Then, each subfield can define its own mapping. Note that when setting the `uri` we are taking advantage of the `get` plugin considered the default to simplify the value assignment. In the case of `title`, the `default_value` process plugin is used to set a fixed value to comply with our example requirement.
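The same `field_name/subfield` syntax applies to the rich text fields mentioned earlier. The following is a minimal sketch, not taken from a real migration: it assumes a destination field named `body`, a hypothetical source column named `source_body`, and a text format with the machine name `basic_html` available on the destination site.

```yaml
process:
  # `value` subfield: copied from a hypothetical source column by the implicit `get` plugin.
  body/value: source_body
  # `format` subfield: fixed text format; `basic_html` is assumed to exist on the site.
  body/format:
    plugin: default_value
    default_value: basic_html
```

If the text format does not exist on your site, replace it with one that does.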

When setting subfields, it is very important to understand what format is expected. You need to make sure the process plugins return data in the expected format or the migration will fail. In particular, you need to know if they return a scalar value or an array. In the case of scalar values, you need to verify if numbers or strings are expected. In the previous example, the `uri` subfield of the Link field expects a string containing the URL. On the other hand, File fields have a `target_id` subfield that expects an integer representing the File ID that is being referenced. Some process plugins might return an array or let you set subfields directly as part of the plugin configuration. For an example of the latter, have a look at the article on migrating images using the image_import plugin. `image_import` lets you set the `alt`, `title`, `width`, and `height` subfields for images directly in the plugin configuration. The following snippet shows a generalization for setting subfields:

process:
  destination_field/subfield_1:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/subfield_2:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2

If a field can have multiple subfields, how can you know which ones are available? For easy reference, our next blog post will include a list of subfields for different types of fields. To find out by yourself, check out this article that covers available subfields. In summary, you need to locate the class that provides the `FieldType` plugin and inspect its `schema` method. The latter defines the database columns used by the field to store its data. Because of object-oriented practices, sometimes you need to look at the parent class to know all the subfields that are available. When migrating into subfields, you are actually migrating into those particular database columns. Any restriction set by the database schema needs to be respected. Link fields are provided by the `LinkItem` class whose `schema` method defines the three subfields we listed before.

If a field can have multiple subfields, how does the Migrate API know which one to set when none is manually specified? Every Drupal field has at least one subfield. If it has more, the field type itself specifies which one is the default. For easy reference, our next blog post will indicate the default subfield for different types of fields. To find out by yourself, check out this article that covers default subfields. In summary, you need to locate the class that provides the `FieldType` plugin and inspect its `mainPropertyName` method. Its return value will be the default subfield used by the Migrate API. Because of object-oriented practices, sometimes you need to look at the parent class to find the method that defines the default subfield. Link fields are provided by the `LinkItem` class whose `mainPropertyName` returns `uri`. That is why in the first example there was no need to specify a subfield to set the value for the link URL.
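To make the role of the default subfield concrete, here is a small sketch of two alternative mappings (two separate YAML documents, not meant to be used together). Assuming `uri` is the default subfield of Link fields as described above, both should store the same URL.

```yaml
# Option A: no subfield given, so the default subfield (`uri`) is assumed.
process:
  field_online_profiles: source_drupal_profile
---
# Option B: the same mapping with the subfield spelled out.
process:
  field_online_profiles/uri: source_drupal_profile
```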

Field mappings: setting deltas for multi-value fields

Once more, let’s expand the example by populating multiple values for the same field. To accomplish this, we will specify field deltas. A delta is a numeric index starting at 0 and incrementing by 1 for each subsequent element in the multi-value field. Remember that our example assumes that the `source` has the following columns: `source_drupal_profile`, `source_gitlab_profile`, and `source_github_profile`. One way to migrate all of them into the multi-value link field is:

process:
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title:
    plugin: default_value
    default_value: 'Drupal.org profile'
  field_online_profiles/1/uri: source_gitlab_profile
  field_online_profiles/1/title:
    plugin: default_value
    default_value: 'GitLab profile'
  field_online_profiles/2/uri: source_github_profile
  field_online_profiles/2/title:
    plugin: default_value
    default_value: 'GitHub profile'

If you want to set a value for a specific delta and subfield, you use the `field_name/delta/subfield` syntax. Then, every combination of delta and subfield can define its own mapping. Both `delta` and `subfield` are optional. If no delta is specified, 0 is assumed, which corresponds to the first element of a (multi-value) field. If no `subfield` is specified, the default subfield is assumed as explained before. In the previous example, if there is no need to set the link text, the configuration would become:

process:
  field_online_profiles/0: source_drupal_profile
  field_online_profiles/1: source_gitlab_profile
  field_online_profiles/2: source_github_profile

In this example, we wanted to highlight syntax variations that can be used with the Migrate API. Nevertheless, this way of migrating multi-value fields is not very flexible. You are required to know in advance how many deltas you want to migrate. Depending on your particular configurations, you can write complex process pipelines that take into account an unknown number of deltas. Sometimes, writing a custom migration process plugin is easier and/or the only option to accomplish a task. Even if you can write a migration with existing process plugins, that might not be the best solution. When writing migrations, strive for them to be easy to read, understand, and maintain. For reference, the generic configuration for mapping fields with deltas and subfields is:

process:
  destination_field/0/subfield_1:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/0/subfield_2:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/1/subfield_1:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/1/subfield_2:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
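When the number of deltas is not known in advance, one possible approach is to let a process plugin return an array and assign it to the field as a whole. The sketch below is only an illustration under stated assumptions: it supposes a hypothetical source column named `source_profile_urls` that holds comma-separated URLs with no surrounding spaces, and it uses the core `explode` plugin to split it. Each resulting element should populate one delta through the default subfield (`uri`); the link text is not set.

```yaml
process:
  field_online_profiles:
    # Split the hypothetical column into an array of URLs, one per delta.
    plugin: explode
    source: source_profile_urls
    delimiter: ','
```

The trade-off is that no per-delta configuration is possible this way. If each delta needs its own link text, a longer pipeline or a custom process plugin may be a better fit.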

Process plugin chains

So far, for every `field_name/delta/subfield` combination we have only used one process plugin. The Migrate API does not impose any restrictions on the number of transformations that the source data can undergo before being assigned to a destination property or field. You can have as many as needed. Chaining of process plugins works similarly to Unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned. We have covered this topic in greater detail in the article on using process plugins for data transformation. For now, let’s consider an example chain of two process plugins:

process:
  title:
    - plugin: concat
      source:
        - source_first_name
        - source_last_name
      delimiter: ' '
    - plugin: callback
      callable: strtoupper

In this example, we are using the `concat` plugin to glue together the `source_first_name` and `source_last_name`. A space is placed in between as specified by the `delimiter` configuration. The result of this is later passed to the `callback` plugin which executes the `strtoupper` PHP function on the concatenated value effectively making the string uppercase. Because there are no more process plugins in the chain, the string transformed to uppercase is assigned to the `title` destination property. If `source_first_name` is ‘Mauricio’ and `source_last_name` is ‘Dinarte’, then `title` would be set to ‘MAURICIO DINARTE’. Refer to the article mentioned before for other things to consider when manipulating strings. The configuration of process plugin chains can be generalized as follows:

process:
  destination_field:
    - plugin: plugin_name
      source: source_column_name
      config_1: value_1
      config_2: value_2
    - plugin: plugin_name
      config_1: value_1
      config_2: value_2
    - plugin: plugin_name
      config_1: value_1
      config_2: value_2

It is very important to note that only the first process plugin in the chain should set a `source` configuration. Remember that the output of the previous process plugin is the input for the next one. Setting the `source` configuration in subsequent process plugins is unnecessary and can actually make the chain produce unexpected results or fail altogether.
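Chains can also be combined with the delta and subfield syntax. The following sketch assumes a hypothetical source column named `source_drupal_link_text` that may be empty. Only the first plugin declares a `source`; the second one receives the output of the first and falls back to a fixed value when that output is empty.

```yaml
process:
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title:
    # First plugin in the chain: the only one that sets `source`.
    - plugin: get
      source: source_drupal_link_text
    # No `source` here: the input is the output of the previous plugin.
    - plugin: default_value
      default_value: 'Drupal.org profile'
```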

Source constants, pseudofields, and the process pipeline

We have covered source constants, pseudofields, and the process pipeline in the article on using data placeholders in the migration process. This time, we are only going to give an overview to explain their syntax. Constants are arbitrary values that can be used later in the process pipeline. They are set as direct children of the `source` section. Let’s consider this example:

source:
  constant:
    DRUPAL_LINK_TITLE: 'Drupal.org profile'
    GITLAB_LINK_TITLE: 'GitLab profile'
    GITHUB_LINK_TITLE: 'GitHub profile'
process:
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title: constant/DRUPAL_LINK_TITLE
  field_online_profiles/1/uri: source_gitlab_profile
  field_online_profiles/1/title: constant/GITLAB_LINK_TITLE
  field_online_profiles/2/uri: source_github_profile
  field_online_profiles/2/title: constant/GITHUB_LINK_TITLE

To define source constants, you write a `constant` key, as in the previous snippet, and set its value to an array of name-value pairs. When you need to refer to them in the `process` section, you use `constant/NAME` and they behave like any other column present in the source. Although not required, it is customary to name constants in uppercase. This makes it easier to distinguish them from regular source columns. Notice how their use makes assigning the link titles simpler. Instead of using the `default_value` plugin, we read the value directly from the source constants.
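Because constants behave like source columns, they can also be passed to process plugins. As a sketch, assume the source provides only a username in a hypothetical `source_drupal_username` column and that profile URLs follow the pattern shown in the hypothetical `DRUPAL_BASE_URL` constant. The `concat` plugin can then build the full URL:

```yaml
source:
  constant:
    # Hypothetical base URL; adjust it to the real profile URL pattern.
    DRUPAL_BASE_URL: 'https://www.drupal.org/u/'
process:
  field_online_profiles/0/uri:
    # Joins the constant and the username with no delimiter in between.
    plugin: concat
    source:
      - constant/DRUPAL_BASE_URL
      - source_drupal_username
```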

Pseudofields also store arbitrary values for use later, but they are defined in the `process` section. Their names can be arbitrary as long as they do not conflict with a property name or field name in the destination. The value can be set to a verbatim copy from the source (a column or a constant) or they can use process plugins for data transformations. For the next example, consider that there is no need for the link text to be different among online profiles. Additionally, there is another Link field that can only store one value. This new field is used to store the URL to the primary profile. The example can be rewritten as follows:

source:
  constant:
    LINK_TITLE: 'Online profile'
process:
  pseudo_link_text:
    - plugin: get
      source: constant/LINK_TITLE
    - plugin: callback
      callable: strtoupper
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title: '@pseudo_link_text'
  field_online_profiles/1/uri: source_gitlab_profile
  field_online_profiles/1/title: '@pseudo_link_text'
  field_online_profiles/2/uri: source_github_profile
  field_online_profiles/2/title: '@pseudo_link_text'
  field_primary_profile: '@field_online_profiles/0'

A pseudofield named `pseudo_link_text` has been created. It has its own process pipeline to provide the link text that will be used for all online profiles. When you want to use the pseudofield, you have to enclose it in quotes (') and prepend an at sign (@) to the name. The `pseudo_` prefix in the name is not required. In this case it is used to make it easier to distinguish between pseudofields and regular property or field names.

The previous snippet is also a good example of how the migrate process pipeline works. When setting `field_primary_profile`, we are reusing a value stored in another field: the first delta of `field_online_profiles`. There are many things to note here:

  • The migrate process pipeline lets you reuse anything that has been defined previously in the file. It can be source constants, pseudofields, or regular destination properties and fields. The only requirement is that whatever you want to use needs to be previously defined in the migration definition file.
  • Source columns are accessed directly by name. Source constants are accessed using the `constant/NAME` syntax.
  • Any element defined in the `process` section can be reused later in the process pipeline by enclosing its name in quotes (') and prepending an at sign (@). This applies to pseudofields and regular destination properties and fields.

When reusing an element in the process pipeline, its whole structure becomes available. In the previous example, we set `field_primary_profile` to `'@field_online_profiles/0'`. This means that all subfields in the first delta of the `field_online_profiles` field will be assigned to `field_primary_profile`. Effectively, both the `uri` and `title` properties will be set. Be mindful that when you reuse a field, all its deltas and subfields are copied along unless specifically restricted. For example, if you only want to reuse the `uri` of the first delta you would use `'@field_online_profiles/0/uri'`. In none of these scenarios does indicating that you want to reuse something guarantee that it will be stored in the new element assignment. For example, the `field_primary_profile` field only accepts one value. Even if we used `'@field_online_profiles'` to reuse all the deltas of the multi-value field, only the first one will be stored per the field’s (cardinality) definition.
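As a sketch of restricting what gets reused, the following variation copies only the `uri` of the first delta into `field_primary_profile` and sets its link text separately. The `PRIMARY_LINK_TITLE` constant is a hypothetical name introduced for this illustration.

```yaml
source:
  constant:
    # Hypothetical constant used only for the primary profile link text.
    PRIMARY_LINK_TITLE: 'Primary profile'
process:
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title:
    plugin: default_value
    default_value: 'Drupal.org profile'
  # Reuse only the URL of the first delta; the title is set independently.
  field_primary_profile/uri: '@field_online_profiles/0/uri'
  field_primary_profile/title: constant/PRIMARY_LINK_TITLE
```

Note that `field_online_profiles/0/uri` is defined before it is reused, in line with the requirement listed above.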

The Migrate API is pretty flexible and you can write very complex process pipelines. The examples we have presented today have been exaggerated to demonstrate many syntax variations. Again, when writing migrations, strive for process pipelines that are easy to read, understand, and maintain.

What did you learn in today’s article? Did you know that it is possible to specify deltas and subfields in field mappings? Were you aware that process plugins can be chained for multiple data transformations? How have you used source constants and pseudofields before? Please share your answers in the comments. Also, we would be grateful if you shared this article with your friends and colleagues.
