Matt White

Matt White

developer

Matt White

developer

| blog
| categories
| tags
| rss

SQLAlchemy + FactoryBoy: Passing arbitrary sessions to factories

In work projects in the past, my team would try to avoid dealing with the complexity of SQLAlchemy database sessions by making one global session that every module referenced. It made things easy and straightforward and it played nice with FactoryBoy - whose factories seem to work quite well under those conditions.

Don't Look At Me - Patrick Hendry

Going into building lobit.io, I tried to do things the same way, but as the code base and the number of unit tests grew, I kept facing what seemed to be an unquashable avalanche of OperationalErrors griping about “Too many connections”.

I implemented an overhaul of the codebase, wrapping every piece of ORM code in a separate session and creating a DbUtils class to handle sessions consistently without too much boilerplate:

from contextlib import contextmanager
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.session import Session
from config import config


class DbUtils:
    Session = sessionmaker(bind=create_engine(config.db_connection_url))

    @classmethod
    @contextmanager
    def session_scope(cls) -> Session:
        """Provide a transactional scope around a series of operations."""
        session = cls.Session()
        try:
            yield session
            session.commit()
        except:
            session.rollback()
            raise
        finally:
            session.close()

Fixing the test factories

When it came to unit tests, I was at a loss for how to best handle all of the factories I was using. They were all tied into the global session. Furthermore, factories took in the session at the time of class creation, not instantiation, e.g.

import factory
from models.post import Post
from tests.factories.post_thread_factory import PostThreadFactory


class PostFactory(factory.alchemy.SQLAlchemyModelFactory):
    class Meta:
        model = Post
        sqlalchemy_session_persistence = "commit"
        sqlalchemy_session = session

    text = factory.Faker("text", max_nb_chars=100)
    external_id = factory.Faker("text", max_nb_chars=50)
    post_thread = factory.SubFactory(PostThreadFactory)

Wrapping factory class creation in a function did the trick:

import factory
from models.post import Post
from tests.factories.post_thread_factory import PostThreadFactory


def PostFactory(session):
    class _PostFactory(factory.alchemy.SQLAlchemyModelFactory):
        class Meta:
            model = Post
            sqlalchemy_session_persistence = "commit"
            sqlalchemy_session = session

        text = factory.Faker("text", max_nb_chars=100)
        external_id = factory.Faker("text", max_nb_chars=50)
        post_thread = factory.SubFactory(PostThreadFactory(session))

    return _PostFactory

While simple wrapping worked, it made my tests look God-awful. Manually managing sessions had already added boilerplate overhead to every unit test that hit the database.

Before refactoring, I could create a post using PostFactory.create(). Now it was PostFactory(session).create(). Most of the unit tests hit multiple factories several times each, so it was also less efficient to keep creating the same classes over and over.

I took to creating a single instance of each required factory at the top of each session:

    def test_post_stuff(self):
        with DbUtils.session_scope() as session:
            post_factory = PostFactory(session)
            post_thread_factory = PostThreadFactory(session)
            ...

Cleaning it up

A given test could have nearly a dozen factories defined at the top - which looked terrible even if it didn’t trigger a too-many-locals warning from Pylint (which it almost always did).

So I made one more change and hired a manager to keep all the factories running smoothly.

from tests.factories.post_factory import PostFactory
from tests.factories.post_thread_factory import PostThreadFactory
...

class FactoryManager:
    known_factories = [
        PostFactory,
        PostThreadFactory,
        ...
    ]

    def __init__(self, session):
        self.session = session

        for factory_func in self.known_factories:
            setattr(self, factory_func.__name__, factory_func(self.session))

The FactoryManager didn’t require a lot of code, but let me leave my tests largely unchanged. Instead of initializing several objects at the top of each test, I could initialize one.

    def test_post_stuff(self):
        with DbUtils.session_scope() as session:
            fm = FactoryManager(session)

Now I could prefix all factory calls with fm. and get something almost as clean as before. At least, fm.PostFactory.create() looks better to me than PostFactory(session).create() or littering each test with a bunch of locals.

Learn by doing.