This article explains how to connect an external data source to Drupal 7. It requires custom code and leverages existing tools like Views. While challenging, this approach pushes the boundaries of Drupal entities.
We were approached by the Danish company KTP Data regarding a project where they needed some help integrating an external data source into a Drupal site. In short, they wanted to load external content as Drupal remote entities, while storing local field data on these entities. The actual requirements were the following:
- Entities are always read live from a web service, no entity data is stored in the Drupal database.
- Entities are fieldable, meaning that the basic entity properties can be complemented with any type of fields available from the Field API, from simple text fields to image galleries and geographic coordinates.
- Entities can be listed using views. That listing should also be loading content live from the web service.
We have many examples of alternative storage for fields (like storing them in MongoDB), and there’s also a good amount of documentation on how to create custom entities that are stored in the database, but I couldn’t find any mention of someone using externally stored entities with locally stored fields. What I did recall, however, was a few discussions from the Fields in Core code sprint that took place in Boston the week before Christmas of 2008. There had been a good amount of discussion regarding the flexibility of the Field API, and it had been decided that fieldable entities (I’m not sure the term “entity” had been coined in this context yet) could be remote. So I knew at least that it could be done.
Warning
This article is not meant to be a comprehensive guide to integrating external entities in Drupal, but since I couldn’t find any documentation on the topic I wanted to at least share my findings. Please also note that such a direct integration is not always the best solution and that importing external data into Drupal first rather than loading it on demand can often provide a lot of benefits.
What you need
- A web service which lets you read individual records. Creating, updating and deleting records is also possible but depending on your case might not be available or relevant.
- A web service which lets you list records. The possibility to specify individual fields or filter and sort the result set would, of course, influence what can be done.
- Code to connect to and read from the service. This will be specific to the type of web service you use (REST, SOAP, XML-RPC, etc.) and the kind of data being sent. For the sake of simplicity, I’ll assume that you have the relevant code in an include file.
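As a rough idea of what such an include file might contain, here is a minimal sketch. It assumes a JSON-over-HTTP service; the URL, the "id" and "name" keys, the helper mymodule_parse_remote_record() and the optional $fetch parameter are all made up for illustration. In a real Drupal 7 module you would more likely make the request with drupal_http_request() and log failures with watchdog().

```php
<?php
/**
 * Turns a raw JSON response body into an entity object, or FALSE on failure.
 *
 * Assumption: the service returns a JSON object with "id" and "name" keys.
 */
function mymodule_parse_remote_record($json) {
  $record = json_decode($json, TRUE);
  if (!is_array($record) || !isset($record['id'])) {
    return FALSE;
  }
  $entity = new stdClass();
  $entity->my_remote_entity_id = (int) $record['id'];
  $entity->my_remote_entity_name = isset($record['name']) ? (string) $record['name'] : '';
  return $entity;
}

/**
 * Loads a single record from the (hypothetical) web service.
 *
 * $fetch is any callable that takes a URL and returns the response body as a
 * string, or FALSE on error. It defaults to file_get_contents() and mainly
 * exists so that the transport can be swapped out.
 */
function remote_web_service_load($id, $fetch = 'file_get_contents') {
  // Hypothetical endpoint, replace with the address of your own service.
  $url = 'https://example.com/api/records/' . rawurlencode($id);
  $body = call_user_func($fetch, $url);
  return ($body === FALSE) ? FALSE : mymodule_parse_remote_record($body);
}
```

Keeping the parsing separate from the transport makes it possible to try out the conversion logic without a live service.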
Existing tools
Before we dive into the custom code, it’s probably good to look first at what tools are available. Again, this list is not comprehensive.
The core entity API
Drupal core provides an API for defining custom entities. It also provides a default mechanism for loading entities, but no similar mechanism for creating/updating/deleting entities.
The contrib entity API
The entity module was created to address some of the things missing from the core entity API. It doesn’t define a separate API, but it extends the core API with additional properties, many helpful tools and an improved default controller. The entity module also includes an entity property API to complement the entity API, which makes it possible for example to have direct rules or token integration for any custom entity. Almost all contrib modules that define custom entities make use of the improved API from the entity module, and we will depend on it as well.
EntityFieldQuery and EFQ_Views
The EntityFieldQuery class is great for getting a list of entities matching certain criteria, whether they come from entity properties or fields. What’s great about EntityFieldQuery is that it works with different field storage engines. EFQ_Views is an alternative views backend using EntityFieldQuery instead of a normal SQL query.
Unfortunately for us though, EntityFieldQuery assumes that entity properties are stored in the database, so this is not a solution we can implement directly (this should be different in Drupal 8).
Entity Construction Kit
The Entity Construction Kit (ECK) provides a user interface for defining custom entities quickly and easily. However, ECK also assumes that these custom entities will have a base table in the database, so it does not help when creating remote entities.
Custom views query plugin
The views module provides an incredible amount of flexibility, including the possibility to define new data sources that can be queried using a custom views query plugin. There are many good examples, such as the Search API, which makes it possible to create views from search indexes, or Sparql Views, which makes it possible to create views by querying RDFa data. We can create such a custom views query plugin to load data from our web service.
Defining a remote entity
The starting point for defining a custom entity is to implement hook_entity_info() in your own module. This hook must return an array of entity definitions, keyed by the entity name. The various properties that are added in these entity definitions sometimes behave in slightly different ways: some of them are used by the core Entity API, while others are used by the entity module’s Entity API.
<?php
/**
 * Implements hook_entity_info().
 */
function mymodule_entity_info() {
  $return = array(
    'my_remote_entity_type' => array(
      // First we define some basic information.
      'label' => t('My remote entity type'),
      'module' => 'mymodule',
      // Then we add some properties that influence how our entities are treated.
      // These keys are the names of properties of entity objects.
      'entity keys' => array(
        'id' => 'my_remote_entity_id',
        'label' => 'my_remote_entity_name',
      ),
      // We want to be able to attach fields to our entity.
      'fieldable' => TRUE,
      'admin ui' => array(
        'path' => 'admin/content/ktp',
        'controller class' => 'MyRemoteEntityUIController',
      ),
      // We don't have a base table, since entities are remote.
      'base table' => NULL,
      'bundles' => array(
        // For the sake of simplicity, we only define one bundle.
        'my_remote_entity_type' => array(
          'label' => t('My remote entity type'),
          'admin' => array(
            // Field configuration pages for our entity will live at this address.
            'path' => 'admin/config/mymodule',
          ),
        ),
      ),
      // Finally, we specify what part of our code will be acting on our
      // entities, overriding the defaults. This can be done by specifying
      // callbacks or methods on the entity controller class.
      // The controller is a class located in a separate include file.
      // We'll get into more details later on.
      'controller class' => 'MyRemoteEntityController',
      'access callback' => 'mymodule_my_remote_entity_access',
      'uri callback' => 'mymodule_my_remote_entity_uri',
    ),
  );
  return $return;
}
?>
Note that the callbacks you indicate should be functions defined by your module, but they’re no different for remote entities than they are for any other entity so I won’t cover them here. Let’s look at our entity controller instead, since that’s where most of the important stuff happens.
As usual, when defining a custom entity controller, we extend the standard one so that we only need to specify things that work differently. In our case, this concerns loading and saving entities.
<?php
class MyRemoteEntityController extends EntityAPIController {

  public function load($ids = array(), $conditions = array()) {
    $entities = array();
    // This method takes an array of IDs, but our web service only supports
    // loading one entity at a time.
    foreach ($ids as $id) {
      // This function should contain all the code to make a request to the
      // web service and handle any errors.
      if ($entity = remote_web_service_load($id)) {
        // Entities must be keyed by entity ID in order for field values to be
        // correctly attached.
        $entities[$entity->my_remote_entity_id] = $entity;
      }
    }
    return $entities;
  }

  public function save($entity) {
    // There is nothing to save locally for the entity itself,
    // we just save the fields.
    field_attach_presave('my_remote_entity_type', $entity);
    field_attach_update('my_remote_entity_type', $entity);
    // If some entity properties can be modified, you would save them here.
    remote_web_service_save($entity);
    // We don't call parent::save(), because we don't have anything to save
    // locally.
  }

}
?>
A note about entity IDs
As the comment above hints, entities are keyed by ID, and these IDs need to be single numeric values if you want your entity to be fieldable. There are cases where the remote system uses an alphanumeric hash or a compound key to identify entities. For example, many music-related services use MBIDs (alphanumeric hashes used to identify information about published music) to identify tracks and artists. Some other services use multiple IDs to identify a specific entity, like a combination of company ID and department ID. In such cases, you need to define a map between these external IDs and the internal numeric ID.
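To illustrate the idea, here is a minimal sketch of such a map. The class name and its methods are invented for this example, and the map only lives in memory; in an actual module you would presumably persist it in a small database table so that the numeric IDs stay stable between requests.

```php
<?php
/**
 * Maps external identifiers (hashes or compound keys) to numeric IDs.
 *
 * Hypothetical helper, in-memory only for the sake of illustration.
 */
class MyRemoteEntityIdMap {

  protected $toInternal = array();
  protected $toExternal = array();
  protected $nextId = 1;

  /**
   * Builds a single lookup key from one or more external ID parts.
   * JSON encoding avoids collisions that a plain delimiter could cause.
   */
  protected function key(array $parts) {
    return json_encode(array_map('strval', array_values($parts)));
  }

  /**
   * Returns the numeric ID for an external ID, allocating one if needed.
   */
  public function getInternalId(array $parts) {
    $key = $this->key($parts);
    if (!isset($this->toInternal[$key])) {
      $this->toInternal[$key] = $this->nextId;
      $this->toExternal[$this->nextId] = array_values($parts);
      $this->nextId++;
    }
    return $this->toInternal[$key];
  }

  /**
   * Returns the external ID parts for a numeric ID, or FALSE if unknown.
   */
  public function getExternalId($internal_id) {
    return isset($this->toExternal[$internal_id]) ? $this->toExternal[$internal_id] : FALSE;
  }

}
```

A single hash would be passed as a one-element array, for example getInternalId(array($mbid)), while a compound key would be passed as getInternalId(array($company_id, $department_id)). The controller's load() method could then translate the numeric ID back with getExternalId() before calling the web service.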
Entity properties
We have already indicated some of the main entity properties in hook_entity_info(): id and label. We can have many more properties, which can simply be attributes of our entity objects. However, we get a lot of benefits from formally defining those using hook_entity_property_info().
<?php
/**
 * Implements hook_entity_property_info().
 */
function mymodule_entity_property_info() {
  $info = array();
  $info['my_remote_entity_type']['properties'] = array(
    'property_name' => array(
      'label' => t('Property Label'),
      'type' => 'integer',
    ),
    'date_property' => array(
      'label' => t('A date in a different format'),
      // A UNIX timestamp is expected.
      'type' => 'date',
      'getter callback' => '_mymodule_convert_date',
    ),
    // …
  );
  return $info;
}
?>
This simple hook declares which properties are available for entities of our type, what data type they have, and optionally how to set/get them using setter/getter callbacks. The typical example of a module that makes full use of these property definitions is the rules module, but we will see later that they are also very important for us when it comes time to integrate our remote entities with views.
The setter and getter callbacks are particularly interesting when we’re dealing with remote data, as other systems often don’t have the same conventions for storing data types. For example, whereas Drupal uses UNIX timestamps to store dates, many systems use ISO timestamps instead. Setter and getter callbacks are then called automatically when needed.
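As an illustration, the _mymodule_convert_date getter referenced in the property definition could look roughly like the sketch below. It assumes that the remote service delivers the date as an ISO 8601 string stored on the entity object under the property name; the argument list follows the one the entity module passes to getter callbacks.

```php
<?php
/**
 * Getter callback: returns a date property as a UNIX timestamp.
 *
 * Assumption: the remote value is an ISO 8601 string such as
 * "2013-05-01T12:00:00+00:00", stored in $data->$name.
 */
function _mymodule_convert_date($data, array $options, $name, $type, $info) {
  if (empty($data->$name)) {
    // No value delivered by the remote service.
    return NULL;
  }
  $timestamp = strtotime($data->$name);
  // strtotime() returns FALSE when the string cannot be parsed.
  return ($timestamp === FALSE) ? NULL : $timestamp;
}
```

Including the timezone offset in the remote value matters here: without it, strtotime() interprets the string in the site's default timezone.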
Views integration
The views module provides an extensible system in which new custom query plugins can be defined. This system is well-documented and relatively straightforward, so I won’t go into much detail beyond the fact that your custom query plugin needs to populate the $view->result array with objects that have their respective entity ID and all other necessary properties. Things become more complicated when we want to make sure that the data our custom query plugin retrieves from the web service is correctly associated with our custom entity definition. This step is essential, especially if we also want to combine locally stored fields with the remote data.
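The part that populates the result array can be sketched as a plain function. The function name and the record structure ("id" and "name" keys) are assumptions made for this example, and the exact row shape your views setup expects may differ. Inside a query plugin, the returned array would typically be assigned to $view->result from the plugin's execute() method.

```php
<?php
/**
 * Converts raw records from the listing service into result rows.
 *
 * Assumption: each record is an associative array with "id" and "name" keys.
 */
function mymodule_records_to_view_result(array $records) {
  $result = array();
  foreach ($records as $record) {
    // A row without an entity ID cannot be matched to an entity, skip it.
    if (!isset($record['id'])) {
      continue;
    }
    $row = new stdClass();
    // Use the same property names as the 'entity keys' in hook_entity_info().
    $row->my_remote_entity_id = (int) $record['id'];
    $row->my_remote_entity_name = isset($record['name']) ? (string) $record['name'] : '';
    $result[] = $row;
  }
  return $result;
}
```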
After trying various approaches, the best solution I found was hidden in the EFQ_views module. The common point between the typical use case for EFQ_views (entities in the Drupal database and fields in MongoDB) and our situation (remote entities and fields in the Drupal database) is that entity properties and Field API fields are stored in separate systems. This means that two separate steps are needed to gather all the data.
EFQ_views itself does jump through a lot of hoops to break the assumption that everything is stored in the database, but it does so in a clean way by relying on the definition of entities and entity metadata. This is where it all comes back together and we realize that formally defining all our entity properties was worth it.
The main point of interest here is in hook_views_data(). By copying EFQ_views’ implementation, all we had to do was the following:
- Use our own prefix instead of “efq_”, since we don’t want to have a namespace conflict.
- Rather than go through all entity types, we only define views data for our own entity types.
- Set the “query class” attribute to use our own query handler instead of “efq_query”.
- Fix a couple of places that still assume that entities are stored in the database. These resulted in a few warnings and notices, but the code was already functional.
With this system in place, we were able to create views with lists of rendered entities as well as views mixing local and remote fields. The latter respected all field formatters and had display options for entity properties that matched their data type.
The only point where some additional “data massaging” was needed was for some more complex display plugins, such as the leaflet module, which expected raw views results in a structure specific to fields being joined to the entity base table rather than queried separately. Such situations can be dealt with on a case-by-case basis using hook_views_pre_render().
Conclusion
Under Drupal 7, the creation of such remote entities still requires a lot of knowledge of the Entity API and especially of its limitations. Unfortunately, there are still many parts of Drupal that are built on the assumption that basic entity metadata is stored in the database.
If you are interested in what future solutions to this problem will look like under Drupal 8, I would strongly advise keeping up with the progress reports on Karoly Negyesi’s blog. The work being done on decoupling Drupal from a relational database has the very positive side effect of resulting in a much more consistent and stable system, which will also make a cleaner implementation of remote entities possible.
While not exactly painless, this project made it possible to push a little bit more at the boundaries of what can be done with entities. I would like to thank KTP Data for this interesting challenge.