Django Optimization Guide: Make Your App Run 10X Faster

Django is a popular open-source web framework for building scalable web applications in Python. It is well known for its clean, pragmatic design that encourages rapid development without the need to reinvent the wheel. Django follows the “Don’t Repeat Yourself” (DRY) principle: it encourages developers to write less code by providing sensible defaults and conventions, which helps in maintaining a clean and organized codebase.
This article is a sequel to my Python Optimization Guide.
What is Performance Optimization?
Before we proceed, it is important to know what performance optimization is and what it isn’t.
Performance optimization in Django involves improving the speed and efficiency of your Django web application to ensure that it can handle a large number of requests and provide a responsive user experience. It could include reducing the number of database queries, using efficient query techniques, caching, memory management, code profiling, and so on.
What isn’t Performance Optimization?
One mistake a lot of developers make is optimizing code before identifying actual performance issues. It is essential to profile and benchmark your code to find bottlenecks before investing time in optimization. Focusing on tiny, insignificant optimizations that have minimal impact on overall performance can lead to complex and less maintainable code.
Code performance optimization should be about making strategic improvements that genuinely improve efficiency, rather than micro-optimizations whose impact on your Django application’s overall performance is insignificant.
The purpose of this article is to show you some performance optimization techniques and strategies which you can take advantage of to improve the overall performance of your Django application. Let’s jump right in.
Use Database Indexing
Django is well-known as a web framework for building database-driven applications. This means that you would probably need to interact with your database a lot. When working with a large database, creating database indexes could help you retrieve data from your database faster.
By default, a database uses a sequential scan to answer a query: it scans all the rows in the table. The execution time may not be noticeable for relatively small tables, but for a huge one it can be. Django provides us with a convenient way of creating indexes. Let’s look at an example:
class Article(models.Model):
    title = models.CharField(max_length=10)
    author = models.CharField(max_length=10)
    comments = models.CharField(max_length=10)

The model above defines an Article without any database indexes. Now, I am going to write a test that inserts 100,000 articles into the database, then retrieves 1,000 of them by title, measuring the execution time of each query.
My test.py file is as follows:
from django.test import TestCase
from core.models import Article
import datetime

class ArticleTestCase(TestCase):
    def setUp(self):
        start_time = datetime.datetime.now()
        articles = []
        size = 500  # batch size for bulk_create
        for i in range(100000):
            article = Article()
            article.title = f"title{i}"
            article.author = f"author{i}"
            article.comments = f"comment{i}"
            articles.append(article)
        Article.objects.bulk_create(articles, size)
        end_time = datetime.datetime.now()
        print(f"Create method execution time: {end_time - start_time}")

    def test_lookup(self):
        start_time = datetime.datetime.now()
        for i in range(50000, 51000):
            Article.objects.get(title=f"title{i}")
        end_time = datetime.datetime.now()
        print(f"Get method execution time: {end_time - start_time}")
Run Django migrations and then run the test:
python manage.py migrate
python manage.py test
The result of the test is as follows:
Found 1 test(s).
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
Create method execution time: 0:00:05.564231
Get method execution time: 0:00:14.865735
.
----------------------------------------------------------------------
Ran 1 test in 20.511s
OK
Destroying test database for alias 'default'...
As you can see above, it took Django 5.56 seconds to insert 100,000 articles into the database and 14.86 seconds to retrieve 1000 articles from the database. Altogether, the test took 20.511 seconds to complete.
Now, let’s take advantage of Django’s convenient database indexing option. Modify the Article model as follows:
class Article(models.Model):
    title = models.CharField(max_length=10, db_index=True)
    author = models.CharField(max_length=10)
    comments = models.CharField(max_length=10)
In the new model, I added a db_index=True parameter to the title field. This tells Django to index the title column, so that retrieving articles from the database by title will be much faster.
Run Django migrations again, then run your test. My result is as follows:
Found 1 test(s).
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
Create method execution time: 0:00:05.751827
Get method execution time: 0:00:00.795766
.
----------------------------------------------------------------------
Ran 1 test in 6.659s
OK
Destroying test database for alias 'default'...
As you can see from the result above, it took Django 5.75 seconds to insert 100,000 articles into the database and 0.79 seconds to retrieve 1000 articles from the database. Altogether, the test took 6.659s to complete.
According to Django’s documentation, performing a database lookup using a field that has unique=True or db_index=True (like the id or title field) will be faster than using a non-indexed field like author in our Article model above.
Note that id fields are automatically indexed by Django.
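As an aside, db_index=True is not the only way to declare indexes: Django also supports a Meta.indexes option on the model, which additionally covers multi-column indexes. A sketch reusing the fields of the Article model above (the composite index is my own illustration, not part of the original example):

```python
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=10)
    author = models.CharField(max_length=10)
    comments = models.CharField(max_length=10)

    class Meta:
        indexes = [
            # single-column index, equivalent to db_index=True on title
            models.Index(fields=["title"]),
            # composite index for queries filtering on author and title together
            models.Index(fields=["author", "title"]),
        ]
```

Like db_index=True, this only takes effect after you generate and run a migration.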
Optimize Your Database Queries
Use select_related()
If your model has a ForeignKey field and you need to access a field on the related model, you should use select_related() to make your database queries faster.
select_related() returns a queryset that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries. — Django’s official documentation
To illustrate this, let’s modify our Article model so that its author field is a ForeignKey to a User model:
from django.contrib.auth.models import User

class Article(models.Model):
    title = models.CharField(max_length=10)
    author = models.ForeignKey(User, on_delete=models.CASCADE)
Let’s write a test case that:
- creates an article
- assigns an author to the article
- then gets the username of the author.
The test case contains two functions. The first gets the username without select_related(), and the second uses select_related(). Each query is run 100,000 times (to make the difference in execution time more obvious), and the execution time of both functions is measured.
My test.py file is as follows:
from django.test import TestCase
from django.contrib.auth.models import User
from .models import Article
import datetime

class ArticleUsernameRetrievalTestCase(TestCase):
    def setUp(self):
        start_time = datetime.datetime.now()
        user = User.objects.create(username="name", password="pwd")
        Article.objects.create(title="Article 1", author=user)
        end_time = datetime.datetime.now()
        print(f"Create method execution time: {end_time - start_time}")

    def test_without_select(self):
        start_time = datetime.datetime.now()
        for _ in range(100000):
            article = Article.objects.get(id=1)
            article.author.username  # triggers a second query each iteration
        end_time = datetime.datetime.now()
        print(f"Without select execution time: {end_time - start_time}")

    def test_with_select(self):
        start_time = datetime.datetime.now()
        for _ in range(100000):
            article = Article.objects.select_related("author").get(id=1)
            article.author.username  # no extra query: author is already loaded
        end_time = datetime.datetime.now()
        print(f"With select execution time: {end_time - start_time}")
The result of running the test above is as follows:
Found 2 test(s).
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
Create method execution time: 0:00:00
With select execution time: 0:02:17.049069
.Create method execution time: 0:00:00.000975
Without select execution time: 0:03:25.364106
.
----------------------------------------------------------------------
Ran 2 tests in 342.414s
As you can see above, the query that didn’t use select_related() took 3 minutes 25 seconds to execute, while the query that used select_related() took 2 minutes 17 seconds. The optimized query has a shorter execution time because in the second test, article.author.username doesn’t hit the database again, unlike in the first test: the author foreign key has already been populated by the initial query using select_related().
Use prefetch_related()
While select_related() is used with foreign keys, prefetch_related() is used with many-to-many (and reverse foreign-key) relationships. Assuming our Article model has a ManyToManyField called tags, prefetch_related() would be used like this:
Article.objects.prefetch_related('tags')
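To make the benefit concrete, here is a minimal sketch, assuming a hypothetical Tag model (not part of the original example):

```python
from django.db import models

class Tag(models.Model):  # hypothetical model for illustration
    name = models.CharField(max_length=30)

class Article(models.Model):
    title = models.CharField(max_length=10)
    tags = models.ManyToManyField(Tag)

# Without prefetch_related: one extra query per article for its tags
# (the classic N+1 query problem)
for article in Article.objects.all():
    names = [tag.name for tag in article.tags.all()]

# With prefetch_related: two queries total, regardless of article count
for article in Article.objects.prefetch_related("tags"):
    names = [tag.name for tag in article.tags.all()]
```

The prefetched version issues one query for the articles and one for all related tags, then joins them in Python.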
Don’t retrieve data you don’t need
When you make a database query in Django, you often don’t need every field in the result set, especially if your model contains a lot of fields. You can avoid loading those irrelevant fields using the defer() method, which postpones loading the named fields from the database until they are explicitly accessed. This reduces the amount of data retrieved from the database, improving query performance and cutting the overhead of fetching unnecessary data.
Using our Article model, if you just need the author field, you can skip loading the title and comments fields like this:
Article.objects.defer("title", "comments")
You can also do the opposite using the only() method, which selects only the fields you name. For example, to select only the author field from the Article model, you can write:
Article.objects.only("author")
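One caveat worth noting: accessing a deferred field later triggers an additional query per object, so defer() and only() pay off only when you genuinely skip those fields. A sketch using the Article model above:

```python
articles = Article.objects.only("author")
for article in articles:
    print(article.author)  # loaded by the initial query
    print(article.title)   # deferred field: causes one extra query per article
```

If you find yourself accessing deferred fields in a loop, it is usually faster to drop defer()/only() for that query.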
Count objects the right way
It’s generally more efficient to use the count() method instead of Python’s built-in len() function when you want to determine the number of records in a queryset or database table. When you use len() on a queryset, Django first fetches all the records from the database into memory, and then Python’s len() is applied to the resulting list. This can be extremely slow and memory-intensive for large datasets. With count(), the counting is done in the database, and the rows are never fetched or evaluated.
# Using count() (efficient)
num_articles = Article.objects.all().count()
# Using len() (inefficient)
articles = Article.objects.all()
num_articles = len(articles)
Use contains() to check for object in QuerySet
If you want to find out whether an object is present in a queryset, use the contains() method (available since Django 4.0) instead of Python’s in operator on the full queryset. contains() returns True if the object is in the queryset, using a cheap database query instead of evaluating the whole queryset:
# Use contains()
Article.objects.contains(article)
# Don't use a membership test on the full queryset
if article in Article.objects.all():
    ...
Note: While these query set methods in Django can be powerful tools for optimizing database queries, it's essential to use them judiciously and be aware of their potential downsides. Overusing these methods can lead to unintended consequences and reduced code maintainability.
Don’t order querysets if you don’t care
Adding default ordering to your Django model (using the Meta.ordering attribute) can be convenient, but it’s important to be cautious and consider the potential downsides. Sorting large querysets by default can be inefficient, especially if you don’t always need the results in that order.
In most cases, it’s better to specify ordering explicitly in your queries using the order_by()
method when you need it. This gives you full control over the sorting behavior for each specific query and makes your code more self-explanatory.
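A sketch of the difference, using a pared-down Article model:

```python
from django.db import models

# Avoid: every queryset on this model is sorted, even when order is irrelevant
class Article(models.Model):
    title = models.CharField(max_length=10)

    class Meta:
        ordering = ["title"]

# Prefer: no Meta.ordering on the model; sort explicitly only where needed
latest_articles = Article.objects.order_by("-id")[:10]
all_articles = Article.objects.all()  # no ORDER BY emitted
```

With explicit order_by(), the database only pays the sorting cost for the queries that actually need sorted results.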
Implement Caching
Caching is a technique used to store and reuse frequently accessed data, reducing the need to repeatedly fetch or calculate it from the original source. The main goal of caching is to improve the speed of data retrieval by temporarily storing a copy of the data in a location that can be accessed more quickly than the original source.
Before you implement caching in your Django application, you need to determine which data should be cached. This could include frequently queried database results, rendered templates, API responses, or static assets.
You should also decide whether you’ll use client-side caching, server-side caching, or both. Server-side caching is more common for Django applications. You can use Django’s built-in caching framework or a third-party caching tool like Memcached or Redis.
Using our Article model, the code below is a function that uses Django’s built-in cache framework to retrieve articles:
from django.core.cache import cache
from .models import Article

def get_articles():
    cached_data = cache.get('articles')
    if cached_data is not None:
        return cached_data
    data = Article.objects.all()
    cache.set('articles', data, 3600)  # pickling the queryset evaluates it
    return data
In the code above, I first check whether articles is cached using cache.get(). If it is in the cache, the cached articles are returned. Otherwise, the articles are retrieved from the database, stored in the cache using cache.set(), and then returned. The articles will remain in the cache for one hour (3,600 seconds).
Note: Caching must be used carefully to ensure that users always receive up-to-date data when needed. Unnecessary caching can lead to serving stale content, especially in frequently changing environments. Also, make sure to first set up caching in your project’s settings.py file.
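For reference, a minimal cache configuration in settings.py might look like this (the Redis location shown is an assumption; adjust it to your environment):

```python
# settings.py -- built-in local-memory cache, fine for a single process
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}

# Or, for a shared cache across processes, the Redis backend (Django 4.0+):
# CACHES = {
#     "default": {
#         "BACKEND": "django.core.cache.backends.redis.RedisCache",
#         "LOCATION": "redis://127.0.0.1:6379",
#     }
# }
```

The local-memory backend needs no external service, which makes it handy for development; a shared backend like Redis or Memcached is the usual choice in production.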
Write Asynchronous Views
In Python 3.5, async/await was introduced and it allowed for asynchronous programming in Python. Asynchronous programming is a programming paradigm that allows you to write non-blocking and concurrent code in Python. It is particularly useful for I/O-bound tasks, such as network communication, file operations, and database queries, where you want to maximize the efficiency of your program by not waiting for I/O operations to complete.
Django has support for writing asynchronous views; however, your server needs to be running under ASGI (Asynchronous Server Gateway Interface) instead of WSGI (Web Server Gateway Interface) in order to get the full benefits of asynchronous Django.
New in Django 4.1, all QuerySet methods that cause an SQL query to occur have an a-prefixed asynchronous variant. — Django’s documentation
The statement above means you can write the following asynchronous code in Django:
from django.http import JsonResponse
from .models import Article

async def my_async_view(request):
    article = await Article.objects.aget(id=1)
    title = article.title
    return JsonResponse({'title': title})
Note that in the above code, we didn’t use the traditional get() method. Instead, we used its asynchronous variant aget(), which allows us to make asynchronous database queries.
However, if you have an existing code base that uses synchronous Django views, you can still run it in an asynchronous context by wrapping it with the sync_to_async() function or using it as a decorator:
from asgiref.sync import sync_to_async

@sync_to_async
def my_async_view(request):
    article = Article.objects.get(id=1)
    title = article.title
    return JsonResponse({'title': title})
The code above can now run in an asynchronous context.
For your Django application to take advantage of asynchronous Python, it needs to be running under ASGI. You can use a web server like uvicorn or daphne to run your Django application in ASGI mode, instead of a WSGI server like gunicorn.
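For example, with uvicorn, serving a project under ASGI looks roughly like this (“myproject” is a placeholder for your own project’s package name):

```shell
pip install uvicorn
# "myproject" is an assumed name: point this at your project's asgi.py module
uvicorn myproject.asgi:application
```

Django generates the asgi.py module (with the application object) automatically when you run startproject.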
Profile Your Code
Django code profiling involves analyzing the performance of your Django application to identify bottlenecks, slow operations, or resource-intensive parts of the code. Profiling helps you pinpoint areas that need optimization and improvement. The most commonly used Django code profiling tool is Django Debug Toolbar.
Django Debug Toolbar is a powerful debugging and profiling tool that integrates directly into your Django project. It provides detailed information about the time taken to render views, database queries, cache usage, and more.
To use Django Debug Toolbar, you can install it using pip:
pip install django-debug-toolbar
After installation, configure it in your project’s settings, and add the necessary middleware to your project. Once set up, you can access the toolbar in your web application while in debug mode.
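The full setup is described in the toolbar’s documentation; a minimal development configuration looks roughly like this:

```python
# settings.py
INSTALLED_APPS = [
    # ... your existing apps ...
    "debug_toolbar",
]

MIDDLEWARE = [
    # ... your existing middleware ...
    "debug_toolbar.middleware.DebugToolbarMiddleware",
]

# the toolbar only renders for requests from these client IPs
INTERNAL_IPS = ["127.0.0.1"]

# urls.py
from django.urls import include, path

urlpatterns = [
    # ... your existing URL patterns ...
    path("__debug__/", include("debug_toolbar.urls")),
]
```

With DEBUG = True, the toolbar then appears as a panel on every HTML page served by your project.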
Note that profiling is typically performed in development environments, because it may introduce overhead and security concerns in production.
Conclusion
It is important to note that optimization is not just about shaving off milliseconds or microseconds from your code’s execution time; it’s about writing readable, scalable, and maintainable code.
Throughout this article, we have explored various strategies and techniques for Django code optimization, including the do’s and don’ts. We explored database indexing, optimized our database queries, and used built-in queryset methods to reduce the number of queries made to our database. We also looked at how caching can increase the performance of our application and how profiling can help us identify bottlenecks. Finally, we explored writing asynchronous views in Django.
By implementing the optimization strategies outlined in this article, you can ensure that your Django applications deliver exceptional user experiences, even under heavy loads.
Thanks for reading.
Connect with me on GitHub, Twitter, LinkedIn, and my website.
References
https://docs.djangoproject.com/en/4.2/ref/models/querysets/
https://docs.djangoproject.com/en/4.2/topics/db/optimization/