Automatic Downcasting of Inherited Models in Django

I’ve been working with non-abstract model inheritance in Django, and one of the problems I’ve run into is that querying the superclass gives you superclass instances, when generally what you want is subclass instances. Of course you can query the subclasses directly, but when you want to operate on several different kinds of subclasses at once, querying the superclass is the obvious choice. If you need an introduction to model inheritance (or just a refresher), check out this article by Charles Leifer.

There are a few solutions out there, such as Carl Meyer’s django-model-utils and django-polymorphic-models. These didn’t quite work the way I wanted. I definitely don’t want to downcast individual instances, causing n additional queries for n objects. And although django-model-utils offers a cast method that does avoid a query per instance, it didn’t feel right to me. For one thing, it returns a list, not a QuerySet, which means you lose lazy evaluation, and that is critical when working with large datasets. There are a few other differences that I will get into later, but for now let’s get back to the task at hand.

For this to work, our superclass needs to be able to find its subclasses automatically. Luckily, non-abstract inheritance is implemented with a OneToOneField from each subclass to the superclass, so the superclass already knows about its subclasses: the OneToOneField adds an attribute to the superclass for each subclass. Let’s see what this looks like.

Say we have these models:

from django.db import models

class Base(models.Model):
    name = models.CharField(max_length=255)

class SubA(Base):
    sub_field_a = models.CharField(max_length=255)

class SubB(Base):
    sub_field_b = models.CharField(max_length=255)

Then in the shell, let’s create some objects and see what’s happening in the database as we access the subclasses. (Django only records connection.queries when DEBUG = True.)

>>> from django.db import connection
>>> SubA.objects.create(name='Sub A Test', sub_field_a="A")
<SubA: SubA object>
>>> SubB.objects.create(name='Sub B Test', sub_field_b="B")
<SubB: SubB object>
>>> connection.queries = [] # clear the queries.
>>> base_objs = Base.objects.all()
>>> base_objs = list(base_objs) #Let's force it to evaluate.
>>> base_objs[0]
<Base: Base object>
>>> base_objs[0].suba
<SubA: SubA object>
>>> base_objs[1].subb
<SubB: SubB object>
>>> connection.queries
[{'sql': 'SELECT "testing_base"."id", "testing_base"."name" FROM "testing_base"',
  'time': '0.001'},
 {'sql': 'SELECT "testing_base"."id", "testing_base"."name", "testing_suba"."base_ptr_id", "testing_suba"."sub_field_a" FROM "testing_suba" INNER JOIN "testing_base" ON ("testing_suba"."base_ptr_id" = "testing_base"."id") WHERE "testing_suba"."base_ptr_id" = 1 ',
  'time': '0.000'},
 {'sql': 'SELECT "testing_base"."id", "testing_base"."name", "testing_subb"."base_ptr_id", "testing_subb"."sub_field_b" FROM "testing_subb" INNER JOIN "testing_base" ON ("testing_subb"."base_ptr_id" = "testing_base"."id") WHERE "testing_subb"."base_ptr_id" = 2 ',
  'time': '0.000'}]

We can see that although we can access the subclass instance directly from the superclass instance, Django has to make another trip to the database to fetch each subclass. All it’s using to do this is a join, which could easily have been done in the first query.
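To make that cost concrete outside of Django, here is a minimal sketch using Python’s built-in sqlite3 module. It assumes the same three tables shown in the SQL above (testing_base, testing_suba, testing_subb, with base_ptr_id as the parent link); CountingDB and downcast_one_by_one are hypothetical helpers, with CountingDB only loosely imitating connection.queries. Because the sketch doesn’t know each row’s type up front, it probes the subclass tables in turn, so the second object costs two lookups.

```python
import sqlite3

# Hypothetical stand-in for the tables Django created for Base, SubA and SubB
# (table and column names taken from the SQL above).
SCHEMA = """
CREATE TABLE testing_base (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE testing_suba (base_ptr_id INTEGER PRIMARY KEY REFERENCES testing_base (id), sub_field_a TEXT);
CREATE TABLE testing_subb (base_ptr_id INTEGER PRIMARY KEY REFERENCES testing_base (id), sub_field_b TEXT);
INSERT INTO testing_base VALUES (1, 'Sub A Test'), (2, 'Sub B Test');
INSERT INTO testing_suba VALUES (1, 'A');
INSERT INTO testing_subb VALUES (2, 'B');
"""

SUBCLASS_TABLES = (("testing_suba", "sub_field_a"), ("testing_subb", "sub_field_b"))


class CountingDB:
    """Records every query it runs, loosely like connection.queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)
        self.queries = []

    def fetch(self, sql, params=()):
        self.queries.append(sql)
        return self.conn.execute(sql, params).fetchall()


def downcast_one_by_one(db):
    # One query for the base rows, then separate primary-key lookups per object.
    results = []
    for pk, name in db.fetch("SELECT id, name FROM testing_base ORDER BY id"):
        for table, column in SUBCLASS_TABLES:
            rows = db.fetch(f"SELECT {column} FROM {table} WHERE base_ptr_id = ?", (pk,))
            if rows:
                results.append((table, name, rows[0][0]))
                break
        else:
            # No subclass row found: keep the base object.
            results.append(("testing_base", name, None))
    return results


db = CountingDB()
print(downcast_one_by_one(db))  # [('testing_suba', 'Sub A Test', 'A'), ('testing_subb', 'Sub B Test', 'B')]
print(len(db.queries))          # 4
```

The query count grows with the number of objects, which is exactly the pattern the join below avoids.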

Enter select_related(), which will tell the queryset to go ahead and get the related information in the first query. Let’s do the same thing, but use select_related first.

>>> connection.queries = [] # clear the queries again.
>>> base_objs = Base.objects.select_related('suba','subb').all()
>>> base_objs = list(base_objs) #Let's force it to evaluate.
>>> base_objs[0].suba
<SubA: SubA object>
>>> base_objs[1].subb
<SubB: SubB object>
>>> connection.queries
[{'sql': 'SELECT "testing_base"."id", "testing_base"."name", "testing_suba"."base_ptr_id", "testing_suba"."sub_field_a", "testing_subb"."base_ptr_id", "testing_subb"."sub_field_b" FROM "testing_base" LEFT OUTER JOIN "testing_suba" ON ("testing_base"."id" = "testing_suba"."base_ptr_id") LEFT OUTER JOIN "testing_subb" ON ("testing_base"."id" = "testing_subb"."base_ptr_id")',
  'time': '0.001'}]

Now we can access each different subclass with only one query. That takes care of the meat of the problem; now we just need to make it more convenient.
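Here is the same idea as a plain sqlite3 sketch rather than Django: one query with two LEFT OUTER JOINs, mirroring the SQL that select_related generated above, returns every base row together with whichever subclass columns exist and NULL elsewhere. Picking the non-NULL side is the whole downcasting decision. The function name and result tuples are illustrative only, and a third base row with no subclass is added to show the fallback.

```python
import sqlite3

SCHEMA = """
CREATE TABLE testing_base (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE testing_suba (base_ptr_id INTEGER PRIMARY KEY REFERENCES testing_base (id), sub_field_a TEXT);
CREATE TABLE testing_subb (base_ptr_id INTEGER PRIMARY KEY REFERENCES testing_base (id), sub_field_b TEXT);
INSERT INTO testing_base VALUES (1, 'Sub A Test'), (2, 'Sub B Test'), (3, 'Plain Base');
INSERT INTO testing_suba VALUES (1, 'A');
INSERT INTO testing_subb VALUES (2, 'B');
"""

# One round trip: each row carries the columns of every subclass table.
JOINED_SQL = """
SELECT b.id, b.name, sa.base_ptr_id, sa.sub_field_a, sb.base_ptr_id, sb.sub_field_b
FROM testing_base AS b
LEFT OUTER JOIN testing_suba AS sa ON b.id = sa.base_ptr_id
LEFT OUTER JOIN testing_subb AS sb ON b.id = sb.base_ptr_id
ORDER BY b.id
"""


def downcast_with_join(conn):
    results = []
    for pk, name, a_ptr, field_a, b_ptr, field_b in conn.execute(JOINED_SQL):
        if a_ptr is not None:
            results.append(("SubA", name, field_a))
        elif b_ptr is not None:
            results.append(("SubB", name, field_b))
        else:
            # Neither join matched: fall back to the base object.
            results.append(("Base", name, None))
    return results


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(downcast_with_join(conn))
# [('SubA', 'Sub A Test', 'A'), ('SubB', 'Sub B Test', 'B'), ('Base', 'Plain Base', None)]
```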

To this end, I have three goals:

  • You should be able to specify which subclasses to downcast automatically, but you should not be required to do so.
  • You should get subclass instances automatically.
  • This mixed QuerySet should be filterable, clonable, and in all respects interchangeable with a standard QuerySet.

To achieve the first goal, we need to introspect the model a bit to find the subclasses. We can find all the OneToOne relationships to the model that could be from subclasses: they will be instances of a SingleRelatedObjectDescriptor. We can then filter through those further to ensure that they relate to a subclass.
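The introspection can be tried without Django. In this sketch ReverseOneToOne, Model and the example classes are hypothetical stand-ins: the descriptor plays the role of SingleRelatedObjectDescriptor (renamed ReverseOneToOneDescriptor in Django 1.9), and its related_model attribute plays the role of the descriptor’s related.model. The issubclass test is what keeps out one-to-one links from unrelated models.

```python
class ReverseOneToOne:
    """Hypothetical stand-in for a reverse one-to-one descriptor."""

    def __init__(self, related_model):
        self.related_model = related_model


class Model:
    """Hypothetical stand-in for django.db.models.Model."""


class Base(Model):
    pass


class SubA(Base):
    pass


class SubB(Base):
    pass


class Profile(Model):
    """Has a one-to-one link to Base but does not inherit from it."""


# Django attaches these when the related models are declared; here we do it by hand.
Base.suba = ReverseOneToOne(SubA)
Base.subb = ReverseOneToOne(SubB)
Base.profile = ReverseOneToOne(Profile)


def find_subclass_accessors(model):
    # Keep attributes that are reverse one-to-one descriptors AND point at a subclass.
    return [name for name in dir(model)
            if isinstance(getattr(model, name), ReverseOneToOne)
            and issubclass(getattr(model, name).related_model, model)]


print(find_subclass_accessors(Base))  # ['suba', 'subb']
```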

For the second goal, we need to override the iterator method on QuerySet. Since each superclass instance will now have a prepopulated attribute for its subclass, we just need to check the subclass attributes in turn and return the one that is non-null instead (falling back, of course, to the superclass instance if no subclass instance exists).
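A minimal sketch of that selection step, with hypothetical plain classes in place of model instances: each base object already carries its prefetched subclass attributes (None when there is no matching row), and the generator yields the first populated one, or the base object itself.

```python
class Base:
    def __init__(self, name, suba=None, subb=None):
        self.name = name
        # What select_related() would have prefetched: a subclass instance or None.
        self.suba = suba
        self.subb = subb

    def __repr__(self):
        return f"<{type(self).__name__}: {self.name}>"


class SubA(Base):
    pass


class SubB(Base):
    pass


def downcast(objects, subclass_names):
    """Yield the first non-None subclass attribute of each object, else the object."""
    for obj in objects:
        for name in subclass_names:
            sub_obj = getattr(obj, name, None)
            if sub_obj is not None:
                yield sub_obj
                break
        else:
            yield obj


rows = [Base("one", suba=SubA("one")), Base("two", subb=SubB("two")), Base("three")]
print(list(downcast(rows, ["suba", "subb"])))  # [<SubA: one>, <SubB: two>, <Base: three>]
print(list(downcast(rows, ["suba"])))          # [<SubA: one>, <Base: two>, <Base: three>]
```

Passing only some accessor names behaves like select_subclasses('suba'): everything else comes back as the base class.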

Finally, for the third goal, we really just need to make sure we haven’t broken anything. Since we’ve subclassed QuerySet and only made very slight modifications, we should have no problems. The only thing we need to do is override _clone to pass along our extra information about subclasses.

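from django.core.exceptions import ObjectDoesNotExist
from django.db.models.fields.related import SingleRelatedObjectDescriptor
from django.db.models.query import QuerySet

class InheritanceQuerySet(QuerySet):
    def select_subclasses(self, *subclasses):
        if not subclasses:
            # Each child model adds a reverse one-to-one descriptor to its parent.
            # Keep only the descriptors whose related model is one of our subclasses.
            subclasses = [o for o in dir(self.model)
                          if isinstance(getattr(self.model, o), SingleRelatedObjectDescriptor)
                          and issubclass(getattr(self.model, o).related.model, self.model)]
        new_qs = self.select_related(*subclasses)
        new_qs.subclasses = subclasses
        return new_qs

    def _clone(self, klass=None, setup=False, **kwargs):
        # Pass the subclass list along so filter(), exclude(), etc. keep it.
        try:
            kwargs.update({'subclasses': self.subclasses})
        except AttributeError:
            pass
        return super(InheritanceQuerySet, self)._clone(klass, setup, **kwargs)

    def _get_subclass(self, obj, name):
        # When a row has no matching subclass row, reading the accessor may return
        # None or raise DoesNotExist, depending on the Django version.
        try:
            return getattr(obj, name)
        except ObjectDoesNotExist:
            return None

    def iterator(self):
        base_iter = super(InheritanceQuerySet, self).iterator()
        if getattr(self, 'subclasses', False):
            for obj in base_iter:
                # Yield the first populated subclass instance, falling back to obj.
                sub_obj = None
                for name in self.subclasses:
                    sub_obj = self._get_subclass(obj, name)
                    if sub_obj is not None:
                        break
                yield sub_obj if sub_obj is not None else obj
        else:
            for obj in base_iter:
                yield obj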

Let’s take a look at how this works.

>>> qs = InheritanceQuerySet(model=Base)
>>> qs
[<Base: Base object>, <Base: Base object>]
>>> qs.select_subclasses()
[<SubA: SubA object>, <SubB: SubB object>]
>>> qs.select_subclasses('suba')
[<SubA: SubA object>, <Base: Base object>]
>>> qs.select_subclasses('subb').exclude(name__icontains="a")
[<SubB: SubB object>]

By default the InheritanceQuerySet works the same as a regular QuerySet, but if you call select_subclasses (the same way you’d call select_related), you get subclass instances. You can specify the subset of subclasses you want downcast automatically; if you don’t specify any, you’ll get all of them. You can filter, exclude, or do any other QuerySet operations before or after calling select_subclasses. This behavior satisfies all three goals.
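Filtering before or after select_subclasses works because every chained call goes through _clone, and the override forwards the subclass list. The toy classes below are hypothetical stand-ins, not Django’s QuerySet API; they only reproduce that clone-and-forward pattern so you can see the extra attribute survive a filter().

```python
class ToyQuerySet:
    """Hypothetical stand-in: chaining methods return clones, like a QuerySet."""

    def __init__(self, rows, filters=()):
        self.rows = rows
        self.filters = tuple(filters)

    def _clone(self, **kwargs):
        c = type(self)(self.rows, self.filters)
        c.__dict__.update(kwargs)  # extra state handed in by subclasses lands on the clone
        return c

    def filter(self, predicate):
        c = self._clone()
        c.filters = self.filters + (predicate,)
        return c

    def __iter__(self):
        for row in self.rows:
            if all(f(row) for f in self.filters):
                yield row


class TaggingQuerySet(ToyQuerySet):
    """Carries one extra attribute, like InheritanceQuerySet.subclasses."""

    def select_subclasses(self, *names):
        c = self._clone()
        c.subclasses = names
        return c

    def _clone(self, **kwargs):
        # Same trick as the _clone override above: forward the extra attribute.
        if hasattr(self, "subclasses"):
            kwargs.setdefault("subclasses", self.subclasses)
        return super()._clone(**kwargs)


qs = TaggingQuerySet([1, 2, 3, 4]).select_subclasses("suba", "subb")
evens = qs.filter(lambda n: n % 2 == 0)
print(evens.subclasses, list(evens))  # ('suba', 'subb') [2, 4]
```

Without the _clone override, evens would silently lose its subclasses attribute and iterate as plain base rows.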

One drawback of this approach is that it will only handle one level of inheritance. Some of the other tools out there do handle longer inheritance chains, and it would certainly be possible to modify this to do that as well, but honestly, I’ve never had occasion to do more than one level of non-abstract model inheritance and I struggle to imagine a case where that would be the optimal approach.

A word about performance

I was curious what the impact on performance would be when joining to perhaps several different tables with lots of rows. I decided to run some simulations to get a feel for the effect. This seemed like a good time to compare with some of the existing tools as well. I selected django-model-utils since it offers two approaches to downcasting: a cast() call on a model instance and a cast() call on a QuerySet. I assume that django-polymorphic-models will perform similarly to the first approach employed by django-model-utils.

So here’s the plan. We’ll reuse our same models from earlier and see how performance looks when trying to fetch the first 100 objects as we increase the total number of objects in the database. I’ve also included as a baseline a superclass query that doesn’t fetch the subclasses at all.


Obviously, the QuerySet cast() call is the clear loser here. That method must fully evaluate the queryset to yield subclass instances, so you lose lazy evaluation. Let’s drop that method from the picture and see what’s happening with the other players:


We can see that the individual-query approach takes a hit immediately, and while its cost does rise with the number of rows in the database, that rise more or less mirrors the increase for the standard query. This makes sense, given that it runs the same base query and then does 100 straight PK lookups. The select_subclasses query performs much better with less data but degrades at a greater rate. I do wonder what could be done, database-tuning-wise, to improve performance for a query like this, but that’s a blog post for another day.

Something else to consider: if you’re working with a very large queryset and only want a small subset of the results, you’ll probably be using pagination or slicing anyway. So let’s see how these methods perform with slicing (which uses a LIMIT clause in the SQL if the QuerySet is unevaluated).
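A rough sqlite3 illustration of why this matters, in plain SQL rather than the Django ORM (the build helper and the 10,000-row table are made up for the sketch, and no timings are claimed): slicing after fetching pulls every joined row across, while adding LIMIT lets the database stop after the first 100 rows.

```python
import sqlite3


def build(n_rows):
    # Hypothetical data set: odd ids are SubA rows, even ids are SubB rows.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE testing_base (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE testing_suba (base_ptr_id INTEGER PRIMARY KEY REFERENCES testing_base (id), sub_field_a TEXT);
        CREATE TABLE testing_subb (base_ptr_id INTEGER PRIMARY KEY REFERENCES testing_base (id), sub_field_b TEXT);
    """)
    conn.executemany("INSERT INTO testing_base VALUES (?, ?)",
                     [(i, f"obj {i}") for i in range(1, n_rows + 1)])
    conn.executemany("INSERT INTO testing_suba VALUES (?, 'A')", [(i,) for i in range(1, n_rows + 1, 2)])
    conn.executemany("INSERT INTO testing_subb VALUES (?, 'B')", [(i,) for i in range(2, n_rows + 1, 2)])
    return conn


JOINED = """
SELECT b.id, b.name, sa.sub_field_a, sb.sub_field_b
FROM testing_base AS b
LEFT OUTER JOIN testing_suba AS sa ON b.id = sa.base_ptr_id
LEFT OUTER JOIN testing_subb AS sb ON b.id = sb.base_ptr_id
ORDER BY b.id
"""

conn = build(10_000)
# Slice in Python: every joined row is transferred before we keep 100 of them.
all_rows = conn.execute(JOINED).fetchall()
first_page_python = all_rows[:100]
# Slice in SQL, as slicing an unevaluated QuerySet does: only 100 rows come back.
first_page_sql = conn.execute(JOINED + " LIMIT 100").fetchall()
print(len(all_rows), len(first_page_sql), first_page_python == first_page_sql)  # 10000 100 True
```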

Letting the database handle limiting the number of rows returned has a huge effect on performance. In this case, individual queries no longer make any sense since all the other methods are now dealing with so many fewer rows. An interesting note is the excellent performance of the QuerySet cast() method. Apparently joins are just that expensive.

Overall, the select_subclasses query seems like a reasonable approach. The performance hit over the standard query is significant but not overwhelming, especially if you don’t have lots and lots of data. Also, while the performance of the QuerySet cast() method is excellent in the slicing case, the fact that you lose lazy evaluation is troubling. It means that the last thing you do with the queryset must be to call cast(). For something like pagination you would have to call cast() AFTER paginating, probably inside your template. This feels wrong to me; I’d rather not have to think about queries involving subclasses any differently than other queries. Another concern I had with django-model-utils is that it automatically adds a field to any model you want to use it with. This means you can’t just bolt it on without database modifications, and it means that it will do two additional queries each time you create an object.
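The laziness point fits in a few lines of plain Python. The functions here are hypothetical: cast_to_list stands in for any downcast that has to build a list (as the QuerySet cast() approach does), while cast_lazily mirrors the overridden iterator. A counting source shows how many rows each one touches when we only want the first three.

```python
import itertools


class CountingSource:
    """Iterable of fake rows that counts how many rows were actually produced."""

    def __init__(self, n):
        self.n = n
        self.produced = 0

    def __iter__(self):
        for i in range(self.n):
            self.produced += 1
            yield i


def downcast_row(row):
    # Placeholder for "pick the right subclass instance for this row".
    return ("SubA" if row % 2 == 0 else "SubB", row)


def cast_to_list(rows):
    # Must consume the whole source before the caller sees anything.
    return [downcast_row(r) for r in rows]


def cast_lazily(rows):
    # Generator: rows are downcast one at a time, only as they are requested.
    for r in rows:
        yield downcast_row(r)


eager_src = CountingSource(1000)
first_eager = cast_to_list(eager_src)[:3]
lazy_src = CountingSource(1000)
first_lazy = list(itertools.islice(cast_lazily(lazy_src), 3))
print(first_eager == first_lazy, eager_src.produced, lazy_src.produced)  # True 1000 3
```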

The performance of individual cast() calls might seem appealing with huge amounts of data, but proper use of slicing and pagination eliminates its advantage.

Obviously, it’s all about your specific problem. If you need killer performance, have lots of data, and don’t mind having to work around making the cast() call as your last step, then the django-model-utils QuerySet.cast() method is an excellent choice. If you rarely need access to subclasses, but you want it to be convenient when you do, then the individual cast() call or django-polymorphic-models is right for you. I believe the select_subclasses approach fills a niche as well: a manageable hit to performance, no database modification required, and a convenient and familiar interface that doesn’t affect behavior unless you need it to.

At the end of the day, the fact is that if you need subclass instances, you’re going to have to request more data from the database which causes a performance hit. To me this is comparable to select_related: by default additional data requires another trip to the database, but if you know you’re going to need it you can get it ahead of time more efficiently. That’s why I modeled select_subclasses after select_related.


26 Responses to “Automatic Downcasting of Inherited Models in Django”


  1. Carl Meyer November 22, 2010 at 7:48 pm

    Hey Jeff, nice snippet! I’ve been meaning to get around to implementing this for django-model-utils as well (see https://github.com/carljm/django-model-utils/blob/master/TODO.rst), just hadn’t gotten around to it yet. When InheritanceCastModel was originally written (pre-1.2), this approach wasn’t possible as the ORM wouldn’t let you select_related on reverse O2O relations.

    Are you planning to package this up anywhere pip-installable? If not, I’d be happy to include it in django-model-utils, with full attribution of course.

  2. Yuji March 21, 2011 at 8:44 pm

    Awesome! Thanks for this :)

  3. Karol Jochelson July 2, 2011 at 12:34 pm

    Thank you for the hard work, this util really comes in handy, I just have one problem. Everything works perfectly when you use a queryset to extract the data, but what happens when you use the get() method on the parent object and want the child instance returned? Maybe I am missing something but I do not want to extract the first record from a queryset each time I want to use the get method. Let me use an example.

    class Base(models.Model):
        name = models.CharField(max_length=255)

    class SubA(Base):
        sub_field_a = models.CharField(max_length=255)

    class SubB(Base):
        sub_field_b = models.CharField(max_length=255)

    Everything is fine if I do Base.objects.all().select_subclasses() because I get the instances of each model.

    But I would like to do Base.objects.get(pk = ?).select_subclasses() so that it returns the instance of the child class.

    Any help would be appreciated. :)

    • Jeff Elmore July 2, 2011 at 4:03 pm

      The key here is to call select_subclasses() first. Basically, select_subclasses returns a clone of the queryset that will return subclass instances instead of base class instances. So if you do Base.objects.select_subclasses().get(pk = ?), it’ll work like you want.

      I’ve added an explicit test of this in my branch of django-model-utils on BitBucket. Thanks for making me think to do this.

    • Karol Jochelson July 2, 2011 at 4:09 pm

      Thank you so much !!! :):):) It works like a charm and completes my entire solution I was working on. Once again thank you!!

  4. ken July 10, 2011 at 9:20 pm

    This is a fantastic utility. My only issue is that when running the shell via manage.py, I’m told that global name ‘SingleRelatedObjectDescriptor’ is not defined.

  5. Eamonn Faherty (@eamonnfaherty) September 5, 2011 at 11:46 am

    Thank you for your work on this. I think it is really useful!!

    I am having an issue with either my app design or my implementation of this.

    I have the following models

    class User(models.Model):
        stream = models.ManyToManyField(Stream)

    class Stream(models.Model):
        activity = models.ForeignKey(Activity)

    class Activity(models.Model):
        pass

    class TextActivity(models.Model):
        text = models.TextField()

    class TwoTextActivity(models.Model):
        text_one = models.TextField()
        text_two = models.TextField()

    I would like to get the activities for a user, as their subtypes. Is this possible using your InheritanceQuerySet?

    • Jeff Elmore September 20, 2011 at 8:25 pm

      First of all, my apologies for the delay in responding.

      So, I’m assuming that TextActivity and TwoTextActivity are meant to inherit from Activity. I’m not sure exactly how you’d do this easily with the explicit foreign key situation you’ve got going, but I have done something very similar directly on many-to-many relationships. Here is some untested code. Please let me know if this doesn’t work or you have any questions. Also, if you’re using this in an actual project, you should really install django-model-utils and use the version in that.


      class ActivityManager(models.Manager):
          use_for_related_fields = True

          def get_query_set(self):
              return InheritanceQuerySet(self.model).select_subclasses()

      class Activity(models.Model):
          # u.activities is built from Activity's default manager, so it goes here.
          objects = ActivityManager()

      class TextActivity(Activity):
          ...

      class TwoTextActivity(Activity):
          ...

      class User(models.Model):
          activities = models.ManyToManyField(Activity, through=Stream)

      You should then get subclasses from this:

      >>> u = User.objects.get(id=x)
      >>> u.activities.all()
      [TextActivity, TwoTextActivity]

  6. Jimmy Henderickx October 17, 2011 at 12:33 pm

    Was looking for this. Thanks for this solution! I love the custom manager function!

  7. Facundo Gaich December 24, 2011 at 12:13 am

    Hi Jeff,

    I was just using the version of InheritanceQuerySet that comes with django-model-utils 1.0.0 and found another drawback: the iterator breaks annotations, because it doesn’t “pull them down”.

    I don’t know if there’s a way to extract the new fields just by looking at the object returned by QuerySet.iterator(). I guess one could pass them explicitly in the select_subclasses() call and then use getattr and setattr to add them manually.

    Well, just something I wanted to add, sorry for not giving a concrete solution. Keep up the good work.

    P.S.: Nice blog!

  8. Lucian January 24, 2012 at 2:14 pm

    Thank you very much, InheritanceManager is just great. Using it, we have 79% fewer queries on one of our pages.

  9. Michael February 23, 2012 at 7:44 am

    Is there any way to use this to kind of fake a ModelAdmin with two models (that both inherit from the same parent)?

    Thinking music here:

    class AbstractWork(models.Model):
        ...

    class Work(AbstractWork):
        ...

    class Movement(AbstractWork):
        ...

    My ultimate goal is to have a ModelAdmin that registers AbstractWork, but shows the form for the subclass when one is chosen in the changelist.

    Is that possible?

    The next trick would then be to have an add button for each of the subclasses.

    Thanks,

    Mike

  10. Michael February 23, 2012 at 8:46 am

    The code below seems to have worked, somehow. Now I need to figure out how to add multiple add buttons based on the different subclasses.

    class AbstractWorkAdmin(admin.ModelAdmin):
        def save_model(self, request, obj, form, change):
            obj.save()
            if not change:
                obj.users.add(request.user)

        def queryset(self, request):
            qs = super(AbstractWorkAdmin, self).queryset(request)
            if request.user.is_superuser:
                return qs.select_subclasses()
            return qs.filter(users=request.user)

        def get_form(self, request, obj=None, **kwargs):
            if request.user.is_superuser:
                kwargs['exclude'] = ['score']
            else:
                kwargs['exclude'] = ['users', 'groups']
            return admin.site._registry[type(obj)].get_form(request, obj, **kwargs)

    admin.site.register(AbstractWork, AbstractWorkAdmin)
    admin.site.register(Work)
    admin.site.register(Movement)

  11. Michael February 24, 2012 at 6:34 am

    Yet another update: it looks like I got it all to work (with some template tweaks). I am sure there is a way to abstract this more, but I am tired and pretty new to Django (about a month). Cheers!

    class AbstractWorkAdmin(admin.ModelAdmin):
        def save_model(self, request, obj, form, change):
            obj.save()
            if not change:
                obj.users.add(request.user)

        def queryset(self, request):
            qs = super(AbstractWorkAdmin, self).queryset(request)
            if request.user.is_superuser:
                return qs.select_subclasses()
            return qs.select_subclasses().filter(users=request.user)

        def get_form(self, request, obj=None, **kwargs):
            if request.user.is_superuser:
                kwargs['exclude'] = ['score']
            else:
                kwargs['exclude'] = ['users', 'groups']

            if request.path == '/admin/thewulfCMS/abstractwork/addwork/':
                obj = Work()
            elif request.path == '/admin/thewulfCMS/abstractwork/addmovement/':
                obj = Movement()
            return admin.site._registry[type(obj)].get_form(request, obj, **kwargs)

        def add_view(self, request, form_url='', extra_context=None):
            if not extra_context:
                extra_context = {}

            extra_context['is_abstract'] = True
            if request.path == '/admin/thewulfCMS/abstractwork/addwork/':
                extra_context['type'] = 'work'
                self.inlines = [MovementsInline]
                for inline_class in self.inlines:
                    inline_instance = inline_class(Work, self.admin_site)
                    self.inline_instances.append(inline_instance)
            elif request.path == '/admin/thewulfCMS/abstractwork/addmovement/':
                extra_context['type'] = 'movement'
            return super(AbstractWorkAdmin, self).add_view(request, form_url, extra_context)

        def changelist_view(self, request, extra_context=None):
            if not extra_context:
                extra_context = {}

            extra_context['is_abstract'] = True
            extra_context['children'] = ['work', 'movement']
            return super(AbstractWorkAdmin, self).changelist_view(request, extra_context)

        def get_urls(self):
            urls = super(AbstractWorkAdmin, self).get_urls()
            my_urls = patterns('',
                (r'^addwork/$', self.admin_site.admin_view(self.add_view)),
                (r'^addmovement/$', self.admin_site.admin_view(self.add_view)),
            )
            return my_urls + urls

  12. Karol Jochelson July 17, 2012 at 11:03 am

    Hi, one more thing would come in really handy: getting the subclass instance of an object when accessing it through a foreign key. Meaning:

    class A(models.Model):
        objects = InheritanceManager()

    class A1(A):
        ...

    class A2(A):
        ...

    class B(models.Model):
        a = models.ForeignKey(A)

    Now when I try to do this:

    for obj in B.objects.all():
        print obj.a  # does not return the subclass of A

    This is causing me a lot of headaches as I always have to pull all queries using the base class and now when I have two different sets of inheritance models my problems only grew :)

    I know I could write a method in my base class that returns the subclass instance but that would require an additional query to the database for each record. So I don't know if that is the right way to go.

    Thanks for your help.

  13. Diederik van der Boor May 20, 2013 at 4:17 pm

    Hi Jeff, nice to see some performance graphs on this subject! For reasons unknown to me, it seems people are missing out on django-polymorphic (https://github.com/chrisglass/django_polymorphic) which also supports automatic downcasting. Would you be able to tell how this performs compared to other solutions?

    • Jeff Elmore May 22, 2013 at 3:36 pm

      Thanks for your comment.

      I’ve taken a quick look at django_polymorphic. It sounds like it’s using an approach not dissimilar from my own, employing inner joins to gather all data in one query rather than re-querying N times for N objects (as some systems do).

      So I suspect it would get comparable performance.

  14. Antonio V October 28, 2013 at 12:54 pm

    I am working with different kinds of users using AbstractBaseUser.

    class BaseUser(AbstractBaseUser):
        email = models.EmailField(max_length=254, unique=True)

    class Service_provider(BaseUser):
        company = models.CharField(max_length=140)

        def __unicode__(self):
            return self.company

    class Customer(BaseUser):
        name = models.CharField(max_length=140)

    But I don’t understand how to filter these kinds of users in the authentication step:

    if user is not None:
        auth.login(request, user)
        if user.name == Customers:
            return HttpResponseRedirect('/loggedincustomer/')
        elif user.company == Service_provider:
            return HttpResponseRedirect('/loggedin/')
    else:
        return HttpResponseRedirect('/invalid/')

    • Jeff Elmore November 14, 2013 at 6:53 pm

      For you to be able to use this code, you’d need to do concrete inheritance rather than abstract inheritance.

      If you did concrete inheritance, you’d be able to check whether a user was a certain type by using isinstance to see which class the returned object is an instance of.

      Not saying this is the best way to do it, as you will pay a penalty for having all those joins.

      You could instead have a “UserType” property on your model and perform different behaviors based on that. Basically, if the users are MOSTLY the same but need different behavior in a few small ways, this is probably a smarter way to do it than using multiple models.

      Hope that helps!
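Following up on Facundo Gaich’s point about annotations, here is a plain-Python sketch of the workaround he suggests: pass the annotation names explicitly and copy their values from the base object onto the subclass instance with getattr and setattr. The function name and the stand-in classes are hypothetical and not part of django-model-utils.

```python
def downcast_keeping_annotations(obj, subclass_names, annotation_names):
    """Return the first populated subclass attribute of obj (or obj itself),
    copying the named annotation values across so they are not lost."""
    target = obj
    for name in subclass_names:
        sub_obj = getattr(obj, name, None)
        if sub_obj is not None:
            target = sub_obj
            break
    for attr in annotation_names:
        if hasattr(obj, attr):
            setattr(target, attr, getattr(obj, attr))
    return target


class Base:
    pass


class SubA(Base):
    pass


base = Base()
base.suba = SubA()
base.num_comments = 7  # pretend this came from an .annotate(...) call
result = downcast_keeping_annotations(base, ["suba", "subb"], ["num_comments"])
print(type(result).__name__, result.num_comments)  # SubA 7
```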

