How to debug Drupal migrations? - Part 2

Submitted by dinarcon on Wed, 08/28/2019 - 23:00

In the previous article we began talking about debugging Drupal migrations. We gave some recommendations of things to do before diving deep into debugging. We also introduced the `log` process plugin. Today, we are going to show how to use the Migrate Devel module and the `debug` process plugin. Then we will give some guidelines on using a real debugger like XDebug. Next, we will share tips so you get used to migration errors. Finally, we are going to briefly talk about the `migrate:fields-source` Drush command. Let’s get started.

The migrate_devel module

The Migrate Devel module is very helpful for debugging migrations. It allows you to visualize the data as it is received from the source, the result of field transformation in the process pipeline, and values that are stored in the destination. It works by adding extra options to Drush commands. When these options are used, you will see more output in the terminal with details on how rows are being processed.

As of this writing, you will need to apply a patch to use this module's command line options with Drush 9. Migrate Devel was originally written for Drush 8, which is still supported but no longer recommended. Instead, you should use at least version 9 of Drush. Between versions 8 and 9 there were major changes in Drush internals, and commands need to be updated to work with the new version. Unfortunately, the Migrate Devel module is not fully compatible with Drush 9 yet. Most of the benefits listed on the project page have not been ported. For instance, automatically reverting the migrations and applying the changes to the migration files is not yet available. The partial support is still useful, and to get it you need to apply the patch from this issue. If you are using the Drush commands provided by Migrate Plus, you will also want to apply this patch. If you are using the Drupal composer template, you can add this to your composer.json to apply both patches:

Loading gist https://gist.github.com/dinarcon/f0d262f78657d36591db1dc9a07ee2f2

With the patches applied and the modules installed, you will get two new command line options for the `migrate:import` command: `--migrate-debug` and `--migrate-debug-pre`. The major difference between them is that the latter runs before the destination is saved. Therefore, `--migrate-debug-pre` does not provide debug information about the destination.

Using either flag will produce a lot of debug information for each row being processed. Many times, analyzing a subset of the records is enough to spot potential issues. The patch to Migrate Tools will allow you to use the `--limit` and `--idlist` options with the `migrate:import` command to limit the number of elements to process.

To demonstrate the output generated by the module, let’s use the image migration from the CSV source example. You can get the code at https://github.com/dinarcon/ud_migrations. The following snippets show how to execute the import command with the extra debugging options and the resulting output:

Loading gist https://gist.github.com/dinarcon/f0d262f78657d36591db1dc9a07ee2f2

In the terminal you can see the data as it is passed along in the Migrate API. In `$Source`, you can see how the source plugin was configured and the different columns for the row being processed. In `$Destination`, you can see all the fields that were mapped in the process section and their values after executing all the process plugin transformations. In `$DestinationIDValues`, you can see the unique identifier of the destination entity that was created. This migration created an image, so the destination array has only one element: the file ID (`fid`). For paragraphs, which are revisioned entities, you will get two values: the `id` and the `revision_id`. The following snippet shows the `$Destination` and `$DestinationIDValues` sections for the paragraph migration in the same example module:

Loading gist https://gist.github.com/dinarcon/f0d262f78657d36591db1dc9a07ee2f2

The debug process plugin

The Migrate Devel module also provides a new process plugin called `debug`. The plugin works by printing the value it receives to the terminal. As Benji Fisher explains in this issue, the `debug` plugin offers the following advantages over the `log` plugin provided by the core Migrate API:

  • The use of `print_r()` handles both arrays and scalar values gracefully.
  • It is easy to differentiate debugging code that should be removed from logging plugin configuration that should stay.
  • It saves time as there is no need to run the `migrate:messages` command to read the logged values.

In short, you can use the `debug` plugin in place of `log`. There is one case where `debug` is particularly useful: placed between the steps of a process plugin chain, it lets you see how elements are transformed at each step. The following snippet shows an example of this setup and the output it produces:

Loading gist https://gist.github.com/dinarcon/f0d262f78657d36591db1dc9a07ee2f2

The process pipeline is part of the node migration from the entity_generate plugin example. In the code snippet, a `debug` step is added after each plugin in the chain. That way, you can verify that the transformations are happening as expected. In the last step you get an array of the taxonomy term IDs (`tid`) that will be associated with the `field_tags` field. Note that this plugin accepts two optional parameters:

  • `label` is a string to print before the debug output. It can be used to give context of what is being printed.
  • `multiple` is a boolean that when set to `true` signals the next plugin in the pipeline to process each element of an array individually. The functionality is similar to the `multiple_values` plugin provided by Migrate Plus.
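As a rough sketch of the setup described above, the fragment below interleaves `debug` steps in a `field_tags` pipeline. The source column `src_fruit_list`, the `tags` vocabulary, and the label texts are assumptions made for illustration and may not match the example module. Setting `multiple: true` after a step that produces an array follows the parameter description above; whether it is strictly required at each point depends on the plugins involved.

```yaml
process:
  field_tags:
    - plugin: skip_on_empty
      source: src_fruit_list        # assumed source column name
      method: process
    - plugin: debug
      label: 'Step 1: value received from the source: '
    - plugin: explode
      delimiter: ','
    - plugin: debug
      label: 'Step 2: exploded term names: '
      # Ask the next plugin to handle each array element individually.
      multiple: true
    - plugin: callback
      callable: trim
    - plugin: debug
      label: 'Step 3: trimmed term names: '
      multiple: true
    - plugin: entity_generate
      entity_type: taxonomy_term
      value_key: name
      bundle_key: vid
      bundle: tags                  # assumed vocabulary machine name
    - plugin: debug
      label: 'Step 4: generated term IDs: '
```

Once the pipeline behaves as expected, the `debug` steps can be deleted without touching the rest of the chain.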

Using the right tool for the job: a debugger

Many migration issues can be solved by following the recommendations from the previous article and using the tools provided by Migrate Devel. But some problems are so complex that you need a full-blown debugger. The many layers of abstraction in Drupal, and the fact that multiple modules might be involved in a single migration, make the use of debuggers very appealing. With them, you can step through each line of code across multiple files and see how each variable changes over time.

In the next article we will explain how to configure XDebug to work with PHPStorm and DrupalVM. For now, let’s consider good places to add breakpoints. In this article, Lucas Hedding recommends adding them in:

  • The `import` method of the MigrateExecutable class.
  • The `processRow` method of the MigrateExecutable class.
  • The process plugin if you know which one might be causing an issue. The `transform` method is a good place to set the breakpoint.

The use of a debugger is no guarantee that you will find the solution to your issue. It will depend on many factors, including your familiarity with the system and how deep the problem lies. Previous debugging experience, even if not directly related to migrations, will help a lot. Do not get discouraged if it takes you a long time to discover what is causing the problem or if you cannot find it at all. Each time you will get a better understanding of the system.

Adam Globus-Hoenich, a migrate maintainer, once told me that the Migrate API "is impossible to understand for people that are not migrate maintainers." That was after spending about an hour together trying to debug an issue and failing to make it work. I mention this not to discourage you, but to illustrate that no single person knows everything about the Migrate API and that even its maintainers can have a hard time debugging issues. Personally, I have spent countless hours in the debugger tracking how the data flows from the source to the destination entities. It is mind-blowing and I barely understand what is going on. The community has come together to produce a fantastic piece of software. Anyone who uses the Migrate API is standing on the shoulders of giants.

If it is not broken, break it on purpose

One of the best ways to reduce the time you spend debugging an issue is having experience with a similar problem. A great way to learn is finding a working example and breaking it on purpose. This will let you get familiar with the requirements and assumptions made by the system and the errors it produces.

Throughout the series, we have created many examples. We have made our best effort to explain how each example works. But we were not able to document every detail in the articles, in part to keep them within a reasonable length, but also because we do not fully comprehend the system. In any case, we highly encourage you to take the examples and break them in every imaginable way. Make one change at a time, then see how the migration behaves and what errors are produced. These are some things to try:

  • Do not leave a space after a colon (:) when setting a configuration option. Example: `id:this_is_going_to_be_fun`.
  • Change the indentation of plugin definitions.
  • Try to use a plugin provided by a contributed module that is not enabled.
  • Do not set a required plugin configuration option.
  • Leave out a full section like source, process, or destination.
  • Mix the upper and lowercase letters in configuration options, variables, pseudofields, etc.
  • Try to convert a migration managed as code to configuration; and vice versa.
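To make the first item in the list concrete, here is a minimal sketch of a migration definition header. The `label` value is made up for illustration. In YAML, a colon only separates a key from its value when it is followed by a space, so removing that space changes how the line is parsed.

```yaml
# Valid: a space follows each colon, so every line is a key/value pair.
id: this_is_going_to_be_fun
label: 'A migration to break on purpose'   # hypothetical label

# Broken on purpose (swap it in for the valid "id" line to try it):
# without the space, YAML reads the whole line as a single plain string
# instead of a key and a value.
# id:this_is_going_to_be_fun
```

Swap in one broken line at a time, rebuild caches, and run the migration to see which error surfaces and at what stage.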

The migrate:fields-source Drush command

Before wrapping up the discussion on debugging migrations, let’s quickly cover the `migrate:fields-source` Drush command. It lists all the fields available in the source that can be used later in the process section. Many source plugins require that you manually set the list of fields to fetch from the source. Because of this, the information provided by this command is redundant most of the time. However, it is particularly useful with CSV source migrations. The CSV plugin automatically includes all the columns in the file. Executing this command will let you know which columns are available. For example, running `drush migrate:fields-source udm_csv_source_node` produces the following output in the terminal:

Loading gist https://gist.github.com/dinarcon/f0d262f78657d36591db1dc9a07ee2f2

The migration is part of the CSV source example. By running the command you can see that the file contains four columns. The values under "Machine Name" are the ones you are going to use for field mappings in the process section. The Drush command has a `--format` option that lets you change the format of the output. Execute `drush migrate:fields-source --help` to get a list of valid formats.
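As a small illustration of how those machine names are used, the fragment below maps two source columns to destination fields. All names here (`src_title`, `src_color`, `field_color`) are hypothetical placeholders, not the actual columns reported for the example migration.

```yaml
process:
  # Left side: destination field.
  # Right side: a value listed under "Machine Name" in the command output.
  title: src_title            # hypothetical column name
  field_color: src_color      # hypothetical column and field names
```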

What did you learn in today’s blog post? Have you ever used the migrate devel module for debugging purposes? What is your strategy when using a debugger like XDebug? Any debugging tips that have been useful to you? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series is made possible thanks to these generous sponsors. Contact us if your organization would like to support this documentation project, whether it is the migration series or other topics.

Comments

Benji Fisher (not verified)

Tue, 09/03/2019 - 12:15

The CLI options added by the migrate_devel module require a patch to work with drush 9, but the debug process plugin provided by that module works OOTB with drush 8 or drush 9.
