Migrating addresses into Drupal

Today we will learn how to migrate addresses into Drupal. We are going to use the field provided by the Address module which depends on the third-party library `commerceguys/addressing`. When migrating addresses you need to be careful with the data that Drupal expects. The address components can change per country. The way to store those components also varies per country. These and other important consideration will be explained. Let’s get started.

Example address field process mapping

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations The module to enable is `UD address` whose machine name is `ud_migrations_address`. The migration to execute is `udm_address`. Notice that this migration writes to a content type called `UD Address` and one field: `field_ud_address`. This content type and field will be created when the module is installed. They will also be removed when the module is uninstalled. The demo module itself depends on the following modules: `address` and `migrate`.

Note: Configuration placed in a module’s `config/install` directory will be copied to Drupal’s active configuration. And if those files have a `dependencies/enforced/module` key, the configuration will be removed when the listed modules are uninstalled. That is how the content type and fields are automatically created and deleted.

The recommended way to install the Address module is using composer: `composer require drupal/address`. This will grab the Drupal module and the `commerceguys/addressing` library that it depends on. If your Drupal site is not composer-based, an alternative is to use the Ludwig module. Read this article if you want to learn more about this option. In the example, it is assumed that the module and its dependency were obtained via composer. Also, keep an eye on the Composer Support in Core Initiative as they make progress.

Source and destination sections

The example will migrate three addresses from the following countries: Nicaragua, Germany, and the United States of America (USA). This makes it possible to show how different countries expect different address data. As usual, for any migration you need to understand the source. The following code snippet shows how the source and destination sections are configured:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: 'Michele'
      last_name: 'Metts'
      company: 'Agaric LLC'
      city: 'Boston'
      state: 'MA'
      zip: '02111'
      country: 'US'
    - unique_id: 2
      first_name: 'Stefan'
      last_name: 'Freudenberg'
      company: 'Agaric GmbH'
      city: 'Hamburg'
      state: ''
      zip: '21073'
      country: 'DE'
    - unique_id: 3
      first_name: 'Benjamin'
      last_name: 'Melançon'
      company: 'Agaric SA'
      city: 'Managua'
      state: 'Managua'
      zip: ''
      country: 'NI'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_address

Note that not every address component is set for all addresses. For example, the Nicaraguan address does not contain a ZIP code. And the German address does not contain a state. Also, the Nicaraguan state is fully spelled out: `Managua`. On the contrary, the USA state is a two letter abbreviation: `MA` for `Massachusetts`. One more thing that might not be apparent is that the USA ZIP code belongs to the state of Massachusetts. All of this is important because the module does validation of addresses. The destination is the custom `ud_address` content type created by the module.

Available subfields

The Address field has 13 subfields available. They can be found in the schema() method of the AddresItem class. Fields are not required to have a one-to-one mapping between their schema and the form widgets used for entering content. This is particularly true for addresses because input elements, labels, and validations change dynamically based on the selected country. The following is a reference list of all subfields for addresses:

  1. `langcode` for language code.
  2. `country_code` for country.
  3. `administrative_area` for administrative area (e.g., state or province).
  4. `locality` for locality (e.g. city).
  5. `dependent_locality` for dependent locality (e.g. neighbourhood).
  6. `postal_code` for postal or ZIP code.
  7. `sorting_code` for sorting code.
  8. `address_line1` for address line 1.
  9. `address_line2` for address line 2.
  10. `organization` for company.
  11. `given_name` for first name.
  12. `additional_name` for middle name.
  13. `family_name` for last name:

Properly describing an address is not trivial. For example, there are discussions to add a third address line component. Check this issue if you need this functionality or would like to participate in the discussion.

Address subfield mappings

In the example, only 9 out of the 13 subfields will be mapped. The following code snippet shows how to do the processing of the address field:

field_ud_address/given_name: first_name
field_ud_address/family_name: last_name
field_ud_address/organization: company
field_ud_address/address_line1:
  plugin: default_value
  default_value: 'It is a secret ;)'
field_ud_address/address_line2:
  plugin: default_value
  default_value: 'Do not tell anyone :)'
field_ud_address/locality: city
field_ud_address/administrative_area: state
field_ud_address/postal_code: zip
field_ud_address/country_code: country

The mapping is relatively simple. You specify a value for each subfield. The tricky part is to know the name of the subfield and the value to store in it. The format for an address component can change among countries. The easiest way to see what components are expected for each country is to create a node for a content type that has an address field. With this example, you can go to `/node/add/ud_address` and try it yourself. For simplicity sake, let’s consider only 3 countries:

  • For USA, citystate, and ZIP code are all required. And for state, you have a specific list form which you need to select from.
  • For Germany, the company is moved above first and last name. The ZIP code label changes to Postal code and it is required. The city is also required. It is not possible to set a state.
  • For Nicaragua, the Postal code is optional. The State label changes to Department. It is required and offers a predefined list to choose from. The city is also required.

Pay very close attention. The available subfields will depend on the country. Also, the form labels change per country or language settings. They do not necessarily match the subfield names. Moreover, the values that you see on the screen might not match what is stored in the database. For example, a Nicaraguan address will store the full department name like `Managua`. On the other hand, a USA address will only store a two-letter code for the state like `MA` for `Massachusetts`.

Something else that is not apparent even from the user interface is data validation. For example, let’s consider that you have a USA address and select `Massachusetts` as the state. Entering the ZIP code `55111` will produce the following error: `Zip code field is not in the right format.` At first glance, the format is correct, a five-digits code. The real problem is that the Address module is validating if that ZIP code is valid for the selected state. It is not valid for Massachusetts. `55111` is a ZIP code for the state of Minnesota which makes the validation fail. Unfortunately, the error message does not indicate that. Nine-digits ZIP codes are accepted as long as they belong to the state that is selected.

Note: If you are upgrading from Drupal 7, the D8 Address module offers a process plugin to upgrade from the D7 Address Field module.

Finding expected values

Values for the same subfield can vary per country. How can you find out which value to use? There are a few ways, but they all require varying levels of technical knowledge or access to resources:

  • You can inspect the source code of the address field widget. When the country and state components are rendered as select input fields (dropdowns), you can have a look at the `value` attribute for the `option` that you want to select. This will contain the two-letter code for countries, the two-letter abbreviations for USA states, and the fully spelled string for Nicaraguan departments.
  • You can use the Devel module. Create a node containing an address. Then use the `devel` tab of the node to inspect how the values are stored. It is not recommended to have the `devel` module in a production site. In fact, do not deploy the code even if the module is not enabled. This approach should only be used in a local development environment. Make sure no module or configuration is committed to the repo nor deployed.
  • You can inspect the database. Look for the records in a table named `node__field_[field_machine_name]`, if migrating nodes. First create some example nodes via the user interface and then query the table. You will see how Drupal stores the values in the database.

If you know a better way, please share it in the comments.

The commerceguys addressing library

With version 8 came many changes in the way Drupal is developed. Now there is an intentional effort to integrate with the greater PHP ecosystem. This involves using already existing libraries and frameworks, like Symfony. But also, making code written for Drupal available as external libraries that could be used by other projects. `commerceguys\addressing` is one example of a library that was made available as an external library. That being said, the Address module also makes use of it.

Explaining how the library works or where its fetches its database is beyond the scope of this article. Refer to the library documentation for more details on the topic. We are only going to point out some things that are relevant for the migration. For example, the ZIP code validation happens at the validatePostalCode() method of the AddressFormatConstraintValidator class. There is no need to know this for a migration project. But the key thing to remember is that the migration can be affected by third-party libraries outside of Drupal core or contributed modules. Another example, is the value for the state subfield. Address module expects a `subdivision` code as listed in one of the files in the `resources/subdivision` directory.

Does the validation really affect the migration? We have already mentioned that the Migrate API bypasses Form API validations. And that is true for address fields as well. You can migrate a USA address with state `Florida` and ZIP code `55111`. Both are invalid because you need to use the two-letter state code `FL` and use a valid ZIP code within the state. Notwithstanding, the migration will not fail in this case. In fact, if you visit the migrated node you will see that Drupal happily shows the address with the data that you entered. The problems arrives when you need to use the address. If you try to edit the node you will see that the state will not be preselected. And if you try to save the node after selecting `Florida` you will get the validation error for the ZIP code.

This validation issues might be hard to track because no error will be thrown by the migration. The recommendation is to migrate a sample combination of countries and address components. Then, manually check if editing a node shows the migrated data for all the subfields. Also check that the address passes Form API validations upon saving. This manual testing can save you a lot of time and money down the road. After all, if you have an ecommerce site, you do not want to be shipping your products to wrong or invalid addresses. 😉

Technical note: The `commerceguys/addressing` library actually follows ISO standards. Particularly, ISO 3166 for country and state codes. It also uses CLDR and Google’s address data. The dataset is stored as part of the library’s code in JSON format.

Migrating country and zone fields

The Address module offer two more fields types: `Country` and `Zone`. Both have only one subfield `value` which is selected by default. For country, you store the two-letter country code. For zone, you store a serialized version of a Zone object.

What did you learn in today’s blog post? Have you migrated address before? Did you know the full list of subcomponents available? Did you know that data expectations change per country? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.