sfPropelLazyHydrationIteratorPlugin — Lazy Hydration, Made Concise

I’ve recently been working on Symfony / Propel projects that deal with particularly large data sets. In such cases, Propel’s documentation recommends a “lazy hydration” approach.

In practice, Propel’s lazy hydration looks like this:

[#!php]
// query all the author entities as a Creole ResultSet
$rs = AuthorPeer::doSelectRS( new Criteria() );

while( $rs->next() )
{
  $author = new Author();
  $author->hydrate( $rs );
  echo "{$author->getLastName()}, {$author->getFirstName()}";
}

The code above does the following:

1. queries the database for all records in the author table and returns them as a Creole ResultSet object;
2. enters a loop that iterates over each result (i.e., table row) in the ResultSet, and for each row:
   1. creates a new, empty Author object;
   2. hydrates the empty object with the data in the ResultSet’s current row;
   3. writes the author’s last and first names to the output buffer.

In this way, only one Author instance is in use at any given time. Each iteration discards the previous instance and creates a new one.
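
To see the one-instance-at-a-time behavior without a database, here is a minimal, self-contained sketch. StubResultSet and this Author class are hypothetical stand-ins (not Creole or Propel classes) that mimic only the next()/hydrate() calls used above; the stub Author counts live instances so the peak can be checked.

```php
<?php
// Hypothetical stand-in for a Creole ResultSet: the cursor starts before the
// first row, next() advances it and returns false once the rows run out.
class StubResultSet
{
    private $rows;
    private $pos = -1;

    public function __construct(array $rows) { $this->rows = $rows; }

    public function next()
    {
        $this->pos++;
        return $this->pos < count($this->rows);
    }

    public function getRow() { return $this->rows[$this->pos]; }
}

// Hypothetical stand-in for a Propel model that counts its live instances.
class Author
{
    public static $live = 0;
    public static $peakLive = 0;
    private $first;
    private $last;

    public function __construct() { self::$live++; }
    public function __destruct() { self::$live--; }

    public function hydrate(StubResultSet $rs)
    {
        list($this->first, $this->last) = $rs->getRow();
        // record how many Author objects exist while a row is being hydrated
        self::$peakLive = max(self::$peakLive, self::$live);
    }

    public function getFirstName() { return $this->first; }
    public function getLastName() { return $this->last; }
}

$rs = new StubResultSet(array(array('Jane', 'Austen'), array('Mark', 'Twain')));
$names = array();
while ($rs->next()) {
    $author = new Author();   // reassigning releases the previous instance
    $author->hydrate($rs);
    $names[] = "{$author->getLastName()}, {$author->getFirstName()}";
}
echo implode("\n", $names), "\n";
```

Because reassigning $author drops the last reference to the previous object, PHP destroys it before the new one is hydrated, so the peak stays at one no matter how many rows there are.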

But I hate it — it’s ugly and unwieldy.

This overt use of the ResultSet object is an awkward practice when using an ORM. The primary design goal of ORMs is to allow the developer to work at a higher level of abstraction than SQL queries and database result sets.

So what should it look like?

Assuming you’re not worried that the query’s results might chew up all of the system’s RAM, the Propel API lets you get the job done like this:

[#!php]
foreach( AuthorPeer::doSelect( new Criteria() ) as $author )
{
  echo "{$author->getLastName()}, {$author->getFirstName()}";
}

For starters, this is a more concise block of code.

Second, this block of code works directly with the classes that are relevant to your application’s business logic.

Unfortunately, for all the theoretical loveliness it espouses, it consumes considerably more RAM, since the data from each record of the author table is loaded into its own Author instance, and all of those instances are held in a single array.

If the database only has ten authors, you won’t have any problems finding the RAM to instantiate all ten Author objects.

If you have tens of thousands of author records, on the other hand, it might be challenging just to allocate enough RAM to store the raw data generated by the database query, let alone to load that data into tens of thousands of Author instances.
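
The difference is easy to demonstrate with a hypothetical stand-in for doSelect(). selectAllEagerly() below is not Propel code, just a sketch that turns every row into an object and returns them all in one array; the stub Author class counts how many instances are alive.

```php
<?php
// Hypothetical model stand-in that tracks how many instances are alive.
class Author
{
    public static $live = 0;
    public $lastName;

    public function __construct($lastName)
    {
        $this->lastName = $lastName;
        self::$live++;
    }

    public function __destruct() { self::$live--; }
}

// Eager, doSelect()-style loading (sketch): every row becomes an object, and
// all of them stay referenced by the returned array until the caller drops it.
function selectAllEagerly(array $rows)
{
    $objects = array();
    foreach ($rows as $lastName) {
        $objects[] = new Author($lastName);
    }
    return $objects;
}

$rows = array();
for ($i = 0; $i < 10000; $i++) {
    $rows[] = "Author #$i";
}

$authors = selectAllEagerly($rows);
echo Author::$live, " Author instances alive at once\n";   // one per row
```

Only when the array itself is released do the instances go away, which is exactly the cost lazy hydration avoids.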

Propel’s “lazy hydration” approach sacrifices a little concision in return for some resource efficiency.

But I was writing many dozens of blocks of code like this, and seeing essentially the same lines of code dutifully repeated all over the place started to wear on me. I was even nesting these loops, winding up with variable scopes that simultaneously contained variables named $authors_rs, $books_rs, and $whatever_else_rs.

I got fed up, so I decided to fix it.

I had started thinking about other Symfony classes that manage to abstract away Propel API quirks, and found inspiration in the sfPropelPager class. I decided to encapsulate the lazy hydration pattern in the sfPropelLazyHydrationIterator class, which implements PHP’s Iterator interface to provide the best of both worlds.

Let’s start simply:

[#!php 1]
// create a sfPropelLazyHydrationIterator
$authors = new sfPropelLazyHydrationIterator('Author', new Criteria());

foreach( $authors as $author )
{
  echo "{$author->getLastName()}, {$author->getFirstName()}";
}

The constructor requires two arguments: the name of the Propel model class and a Criteria object. It also accepts an optional Connection and an optional peer class name, which defaults to the model class name plus “Peer”. Internally, the object queries the database to retrieve a ResultSet. Using the object in a foreach() loop iterates through the database results with the same lazy hydration technique we looked at earlier.
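
For readers without a Propel 1.2 install at hand, the following self-contained sketch shows the same idea against in-memory data. LazyHydrationIteratorSketch, StubResultSet, and the stub Author are hypothetical names, not the plugin’s code. The stub result set mimics a forward cursor that starts before the first row (as the while ($rs->next()) loop earlier implies), plus a hypothetical beforeFirst() so the sketch can be rewound.

```php
<?php
// Hypothetical in-memory stand-in for a Creole ResultSet.
class StubResultSet
{
    private $rows;
    private $pos = -1;

    public function __construct(array $rows) { $this->rows = $rows; }
    public function beforeFirst() { $this->pos = -1; }
    public function next() { $this->pos++; return $this->pos < count($this->rows); }
    public function getRow() { return $this->rows[$this->pos]; }
    public function getCursorPos() { return $this->pos; }
}

// Hypothetical model class with a Propel-like hydrate() method.
class Author
{
    private $first;
    private $last;
    public function hydrate(StubResultSet $rs) { list($this->first, $this->last) = $rs->getRow(); }
    public function getFirstName() { return $this->first; }
    public function getLastName() { return $this->last; }
}

// Minimal lazy-hydration iterator: a model object is created only when
// foreach asks for the current element.
class LazyHydrationIteratorSketch implements Iterator
{
    private $modelClassName;
    private $resultSet;
    private $valid = false;

    public function __construct($modelClassName, StubResultSet $resultSet)
    {
        $this->modelClassName = $modelClassName;
        $this->resultSet = $resultSet;
    }

    public function rewind(): void
    {
        $this->resultSet->beforeFirst();
        $this->valid = $this->resultSet->next();
    }

    public function valid(): bool { return $this->valid; }

    #[\ReturnTypeWillChange]
    public function current()
    {
        $object = new $this->modelClassName();
        $object->hydrate($this->resultSet);
        return $object;
    }

    #[\ReturnTypeWillChange]
    public function key() { return $this->resultSet->getCursorPos(); }

    public function next(): void { $this->valid = $this->resultSet->next(); }
}

$rs = new StubResultSet(array(array('Jane', 'Austen'), array('Mark', 'Twain')));
foreach (new LazyHydrationIteratorSketch('Author', $rs) as $author) {
    echo "{$author->getLastName()}, {$author->getFirstName()}\n";
}
```

Each object is built inside current(), so only the row under the cursor is ever turned into a model instance.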

Let’s take a look at a slightly more complex scenario, which uses a Propel 1.2 Connection for an in-transaction query:

[#!php 1]
// let's add a transaction to the mix
$con = Propel::getConnection();
$con->begin();

try
{
  // do some stuff to author records within the transaction...

  $authors = new sfPropelLazyHydrationIterator( 'Author', new Criteria(), $con );
  foreach( $authors as $author )
  {
    echo sprintf('%s, %s', $author->getFirstName(), $author->getLastName());
    $author->setLastAccessedOn( 'now' );
    $author->save( $con );
  }

  $con->commit();
}
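catch (Exception $e)
{
  // Propel may wrap Creole's SQLException in a PropelException, so catch
  // broadly, undo the transaction, and let the error propagate
  $con->rollback();
  throw $e;
}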
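
The begin/commit/rollback choreography can also be factored out. The sketch below rests on assumptions: withTransaction() is a hypothetical helper, and StubConnection stands in for a Propel 1.2 (Creole) Connection by exposing only begin(), commit(), and rollback(), the three methods used in the example. It rolls back on any exception and rethrows it, so callers still see the failure.

```php
<?php
// Hypothetical stand-in for a Creole Connection that records the calls it receives.
class StubConnection
{
    public $calls = array();
    public function begin()    { $this->calls[] = 'begin'; }
    public function commit()   { $this->calls[] = 'commit'; }
    public function rollback() { $this->calls[] = 'rollback'; }
}

// Hypothetical helper: runs $work inside a transaction and commits on success.
// On any exception it rolls back and rethrows instead of swallowing the error.
function withTransaction($con, callable $work)
{
    $con->begin();
    try {
        $result = $work($con);
        $con->commit();
        return $result;
    } catch (Exception $e) {
        $con->rollback();
        throw $e;
    }
}

$ok = new StubConnection();
withTransaction($ok, function ($con) { /* save authors with $con here */ });
echo implode(',', $ok->calls), "\n";   // begin,commit

$failing = new StubConnection();
try {
    withTransaction($failing, function ($con) { throw new RuntimeException('save failed'); });
} catch (RuntimeException $e) {
    echo implode(',', $failing->calls), ': ', $e->getMessage(), "\n";   // begin,rollback: save failed
}
```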

But how does this all happen? Let’s look at an abbreviated version of this class:

[#!php 1]
class sfPropelLazyHydrationIterator
implements Iterator
{
  private $modelClassName;
  private $modelPeerName;
  private $resultSet;

  /**
   * Constructor.
   *
   * @param string $model_class_name
   * @param Criteria $c
   * @param Connection $con
   * @param string $peer_name
   */
  public function __construct( $model_class_name, $c, $con=null, $peer_name=null )
  {
    $this->modelClassName = $model_class_name;
    $this->modelPeerName  = empty($peer_name) ? $this->modelClassName.'Peer' : $peer_name;
    // pass the connection through so the query can run inside the caller's transaction
    $this->resultSet      = call_user_func(array($this->modelPeerName, 'doSelectRS'), $c, $con);
  }

  /**
   * Implements Iterator::current()
   */
  public function current()
  {
    if (false !== $this->resultSet->getIterator()->current())
    {
      return $this->createAndHydrateCurrent();
    }

    return false;
  }

  /**
   * Implements Iterator::next()
   */
  public function next()
  {
    // Only advance the Creole cursor: foreach ignores next()'s return value,
    // and current() hydrates the row when the loop actually asks for it.
    $this->resultSet->next();
  }

  // ... other Iterator methods

  /**
   * Returns a hydrated instance of the appropriate model class, using
   * the row data at the ResultSet's current pointer position.
   *
   * @return mixed
   */
  protected function createAndHydrateCurrent()
  {
    $object = new $this->modelClassName();
    $object->hydrate( $this->resultSet );
    return $object;
  }

}

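The real work happens in the constructor and the createAndHydrateCurrent() method: the constructor runs the query, and createAndHydrateCurrent() turns the row at the ResultSet’s current position into a model object. The latter is called from the Iterator method implementations included in the excerpt above, such as current().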
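
Why hydrate in current() at all? The self-contained sketch below uses TracingIterator, a hypothetical class written only for this illustration, to record which Iterator methods a foreach loop calls. current() runs exactly once per element, and foreach never uses next()’s return value (Iterator::next() is declared to return void), so any object built inside next() is simply discarded.

```php
<?php
// Records which Iterator methods foreach calls, and in what order.
class TracingIterator implements Iterator
{
    public $log = array();
    private $items;
    private $pos = 0;

    public function __construct(array $items) { $this->items = $items; }

    public function rewind(): void { $this->log[] = 'rewind'; $this->pos = 0; }
    public function valid(): bool  { $this->log[] = 'valid'; return $this->pos < count($this->items); }
    public function next(): void   { $this->log[] = 'next'; $this->pos++; }

    #[\ReturnTypeWillChange]
    public function current() { $this->log[] = 'current'; return $this->items[$this->pos]; }

    #[\ReturnTypeWillChange]
    public function key() { $this->log[] = 'key'; return $this->pos; }
}

$it = new TracingIterator(array('Austen', 'Twain'));
foreach ($it as $key => $name) {
    // loop body intentionally empty
}
echo implode(' ', $it->log), "\n";
// rewind valid current key next valid current key next valid
```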

This class is available in its entirety as part of the sfPropelLazyHydrationIteratorPlugin, which I have already published to the Symfony plugins site. The plugin is not, at the time of this writing, available as a PEAR package, but I intend to sort that out in the coming days.

Please note: the sfPropelLazyHydrationIterator presently only works with Propel 1.2.

Getting Really Lazy

Finally, let’s surrender fully to the laziness within and aim for maximum concision. We’ll add a new method to the AuthorPeer class:

[#!php 1]
class AuthorPeer extends BaseAuthorPeer
{

  /**
   * Returns a sfPropelLazyHydrationIterator to allow iteration over retrieved
   * database results, utilizing "lazy hydration".
   *
   * @param Criteria $c
   * @param Connection $con
   * @return sfPropelLazyHydrationIterator
   */
  public static function doSelectLazy( $c, $con=null )
  {
    return new sfPropelLazyHydrationIterator('Author', $c, $con);
  }

  // ... other peer methods ...
}

Now we can iterate over all author records like so:

[#!php 1]
foreach( AuthorPeer::doSelectLazy(new Criteria()) as $author )
{
  echo "{$author->getLastName()}, {$author->getFirstName()}";
}
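
On newer PHP versions, generators (introduced in PHP 5.5, well after Propel 1.2) offer another concise way to express lazy hydration without writing an Iterator class. The sketch below is an alternative technique, not part of the plugin; lazyHydrate(), StubResultSet, and the stub Author are hypothetical.

```php
<?php
// Hypothetical stand-in for a Creole ResultSet (cursor starts before the first row).
class StubResultSet
{
    private $rows;
    private $pos = -1;
    public function __construct(array $rows) { $this->rows = $rows; }
    public function next() { $this->pos++; return $this->pos < count($this->rows); }
    public function getRow() { return $this->rows[$this->pos]; }
}

// Hypothetical model class with a Propel-like hydrate() method.
class Author
{
    private $first;
    private $last;
    public function hydrate(StubResultSet $rs) { list($this->first, $this->last) = $rs->getRow(); }
    public function getFirstName() { return $this->first; }
    public function getLastName() { return $this->last; }
}

// Lazy hydration as a generator: each row becomes a fresh model object only
// when the consuming loop asks for the next value.
function lazyHydrate($modelClassName, StubResultSet $resultSet)
{
    while ($resultSet->next()) {
        $object = new $modelClassName();
        $object->hydrate($resultSet);
        yield $object;
    }
}

$rs = new StubResultSet(array(array('Jane', 'Austen'), array('Mark', 'Twain')));
foreach (lazyHydrate('Author', $rs) as $author) {
    echo "{$author->getLastName()}, {$author->getFirstName()}\n";
}
```

One trade-off: a generator can be traversed only once, whereas an Iterator class can implement rewind() however it likes.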

Whoever said you couldn’t have your cake and eat it obviously wasn’t terribly determined to be as lazy as possible.