Using Threads in Django
Backend services are composed of several components, one of which is background tasks. Background tasks allow certain processes to be executed 'behind the scenes', typically because of how long they take to run or how data-intensive they are. There are many use cases for background tasks, but those are outside the scope of this discussion. When implementing background tasks in Python, I usually go with a queueing library like Celery, using RabbitMQ as the message broker. With this approach, I can persist the workers' task results in the database. However, another approach is to explicitly spawn a daemon thread to execute a task. Both approaches have their own advantages and disadvantages.
Let's explore the thread approach. Rather than a piece that explains a technical concept in depth, this post is a follow-along demo of using threads in a simple Django project that allows bulk book data to be uploaded from an Excel or CSV file. You can navigate the post via the Table of Contents. Let's dive right into it.
project setup
prerequisites:
- Basic knowledge of Linux commands
- Have Python 3 installed
- Knowledge of Python
- Some knowledge of Django and Django REST
NB: This piece does not explain Django concepts
creating the project structure
Before getting into some code, let's set up the project's structure and get the basic housekeeping out of the way.
The final project structure would look like this:
proj
├── app
│ ├── __init__.py
│ ├── admin.py
│ ├── apps.py
│ ├── migrations
│ │ ├── __init__.py
│ ├── models.py
│ ├── serializer.py
│ ├── tests.py
│ ├── urls.py
│ └── views.py
├── db.sqlite3
├── manage.py
└── proj
├── __init__.py
├── asgi.py
├── settings.py
├── urls.py
└── wsgi.py
Create a new directory and then cd into that directory.
Now we create a Python virtual environment, in which all installed packages will be kept, using the command: python3 -m venv env
We then activate the virtual environment and install the required packages using the pip command via the terminal like so:
source env/bin/activate
pip install django
pip install djangorestframework
pip install numpy
pip install pandas
pip install openpyxl
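Optionally, these dependencies can also be listed in a requirements.txt file at the project root so the environment is easy to recreate later (this file is not required for the rest of the post):
# requirements.txt
django
djangorestframework
numpy
pandas
openpyxl
They can then be installed in one go with pip install -r requirements.txt.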
We now create a Django project and an app using the commands below. Replace proj and app with your desired project and app names respectively:
django-admin startproject proj
cd proj
django-admin startapp app
Let's add the rest framework we installed and the app we created to INSTALLED_APPS in proj/settings.py like so:
INSTALLED_APPS = [
... ,
'rest_framework',
'app',
]
Now, to confirm that our setup is right, we start the server from the proj directory using the command below:
python manage.py runserver
If the server starts correctly, you're on the right track! You will see a warning in the terminal saying you have a number of unapplied migrations; don't worry about that - we'll get it sorted shortly.
modeling a book
Since this mini project demos a bulk upload of book data from either a CSV or Excel sheet to a database, let's model a book in app/models.py:
# app/models.py
from django.db import models
# Create your models here.
class Book(models.Model):
title = models.CharField(max_length=255)
author = models.CharField(max_length=100)
book_type = models.CharField(
max_length=140,
choices=(
('Sci-tech', 'Sci-tech'),
('Magazine', 'Magazine'),
('Comic', 'Comic'),
('Classic', 'Classic'),
('Horror', 'Horror'),
)
)
ISBN = models.CharField(max_length=255)
year_published = models.CharField(max_length=4)
price = models.DecimalField(decimal_places=2, max_digits=10)
no_of_pages = models.PositiveIntegerField(blank=True, null=True)
quantity = models.PositiveIntegerField()
description = models.TextField(blank=True, null=True)
def __str__(self):
return self.title
To be able to view the model and its data in Django's admin panel, we register the model in app/admin.py like so:
# app/admin.py
from django.contrib import admin
from .models import Book
# Register your models here.
admin.site.register(Book)
creating a serializer
Django REST framework has a concept of serializers, which according to the official docs, “allow complex data such as querysets and model instances to be converted to native Python datatypes that can then be easily rendered into JSON, XML or other content types. Serializers also provide deserialization, allowing parsed data to be converted back into complex types, after first validating the incoming data.” With that explained, let’s create a serializer.py file within the app directory:
# app/serializer.py
from rest_framework import serializers
from .models import Book
class BookSerializer(serializers.ModelSerializer):
class Meta:
model = Book
fields = '__all__' # serialize all fields
That’s it for the serializer.
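To make the serializer concrete, here is a minimal sketch of how it could be exercised from the Django shell (python manage.py shell); the book values below are made-up sample data, and saving would only work once migrations have been applied:
# a quick, hypothetical check of BookSerializer from the Django shell
from app.serializer import BookSerializer

data = {
    "title": "Dune",
    "author": "Frank Herbert",
    "book_type": "Classic",
    "ISBN": "9780441172719",
    "year_published": "1965",
    "price": "9.99",
    "quantity": 3,
}
serializer = BookSerializer(data=data)
print(serializer.is_valid())      # True - the incoming dict passes validation
print(serializer.validated_data)  # native Python types, ready to be saved
# serializer.save() would create a Book row once migrations exist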
creating a simple viewset
The threading will be implemented in app/views.py, but for now, let's implement a simple viewset and then route it to an endpoint to make sure the setup works as expected. So, in app/views.py:
# app/views.py
from django.shortcuts import render
from rest_framework import viewsets, status
from rest_framework.response import Response
# Create your views here.
class UploadViewSet(viewsets.ViewSet):
def upload(self, request):
if request.data.get("message") == "healthcheck":
return Response(
{"reponse": "It works!"},
status=status.HTTP_200_OK
)
setting up the routes
Next, let's set up the routes for proj and app.
In proj/urls.py we add the URL pattern for app to the urlpatterns list as shown below:
# proj/urls.py
from django.contrib import admin
from django.urls import path, include # new import
urlpatterns = [
path('admin/', admin.site.urls),
path('api/v1/', include('app.urls')), # include app's url pattern
]
We then create a urls.py file in the app directory - app/urls.py - where we define and register the route(s) for app using Django REST framework's DefaultRouter.
# app/urls.py
from django.contrib import admin
from django.urls import path, include
from .views import UploadViewSet
from rest_framework import routers
router = routers.DefaultRouter()
router.register('upload', UploadViewSet, basename='Upload')
urlpatterns = [
path('', include(router.urls)),
path('upload', UploadViewSet.as_view({
'post': 'upload',
})),
]
Next, we run the server and test the endpoint.
running the server
While in the proj directory, execute these commands to create and apply the database migrations (including the one for the Book model) and to create a superuser with your desired credentials:
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
Now execute the runserver command as shown below to start the server:
python manage.py runserver
We then test the endpoint to make sure it is up and running. We send a POST request to the endpoint and expect to receive a response:
[Image: testing the endpoint]
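As a quick check (assuming the server is running locally on the default port 8000), the healthcheck request can also be sent from the terminal with curl instead of Postman; it should return the "It works!" response from the viewset:
curl -X POST http://127.0.0.1:8000/api/v1/upload \
  -H "Content-Type: application/json" \
  -d '{"message": "healthcheck"}'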
thread implementation of books upload
Now we implement the upload process using threading in app/views.py, and therefore the file is rewritten as shown below:
# app/views.py
from django.shortcuts import render
from rest_framework import viewsets, status
from rest_framework.response import Response
# new imports
import threading
import numpy as np
import pandas as pd
import logging
logger = logging.getLogger(__name__)
from .serializer import BookSerializer
from django.db import transaction
# Create your views here.
class UploadViewSet(viewsets.ViewSet):
# ensure atomic transaction
@transaction.atomic
def upload(self, request):
"""entry method that handles the upload request"""
try:
logger.debug("Upload request received ...")
request_file = request.data.get("books")
if request_file:
logger.debug("File Detected")
else:
return Response(
{"response": "No file received"},
status=status.HTTP_400_BAD_REQUEST
)
# read request file using the appropriate
# pandas method based on file extension check
if "xls" in str(request_file).split(".")[-1]:
file_obj = pd.read_excel(request_file)
elif "csv" in str(request_file).split(".")[-1]:
file_obj = pd.read_csv(request_file)
else:
return Response(
{"response": "Invalid File Format"},
status=status.HTTP_400_BAD_REQUEST,
)
# convert file obj to a dataframe,
# handle empty rows and
# convert the dataframe to a list of dictionaries
books_df = pd.DataFrame(file_obj)
books_df = books_df.replace(np.nan, "", regex=True)
books = books_df.to_dict(orient="records")
# instantiate and start thread with required params
t = threading.Thread(
target=self.execute_upload,
args=[books],
daemon=True
)
t.start()
return Response(
{"response": "Upload Initiated, Check Book Table"},
status=status.HTTP_201_CREATED,
)
except Exception as e:
logger.error(f"Error Uploading File. Cause: {str(e)}")
return Response(
{
"response": f"""Error Uploading File.
Cause: {str(e)}"""
},
status=status.HTTP_400_BAD_REQUEST,
)
def execute_upload(self, books):
"""
the method that executes the book data upload
:param books - dict
"""
try:
logger.debug("Within the execute_upload() method")
for book in books:
try:
# serialize and save book data
serializer = BookSerializer(data=book)
serializer.is_valid(raise_exception=True)
serializer.save()
logger.debug(
f"Successfully inserted {book['title']}"
)
except Exception as e:
logger.error(f"Error Inserting Book. Cause: {str(e)}")
except Exception as e:
logger.error(f"Error During Upload. Cause: {str(e)}")
Let's talk about the code in app/views.py. First off, the necessary imports are made; then, in the UploadViewSet, the upload method is the entry method that handles the incoming request. It has a decorator - @transaction.atomic - that ensures an atomic transaction. When the request is received, the file extension is checked, the file is read using the appropriate pandas method, and the data read from the file is converted to a list of dictionaries.
A thread is instantiated and started with the target parameter set to the execute_upload method (notice how the method name is passed without parentheses), the args parameter, which holds the argument(s) passed to execute_upload when it is invoked, set to books (a list of book data), and the boolean daemon parameter set to True. The daemon parameter indicates whether the thread is a daemon thread or not. The main thread is not a daemon thread, and threads created from it therefore default to daemon=False. A daemon thread does not block the main thread from exiting; it continues to run in the background while the main thread returns the appropriate response to the request.
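As a standalone illustration of the daemon flag (separate from the project code), consider:
# standalone illustration of thread daemon semantics (not part of the project)
import threading
import time

def work():
    time.sleep(5)  # stand-in for a long-running task

t = threading.Thread(target=work)
print(t.daemon)  # False - inherited from the main, non-daemon thread

d = threading.Thread(target=work, daemon=True)
print(d.daemon)  # True - will not prevent the interpreter from exiting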
When the thread invokes execute_upload, each book's data is serialized and then saved to the database.
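For context, after books_df.to_dict(orient="records") each item in books is a plain dictionary keyed by the spreadsheet's column headers, which must match the serializer's field names. A hypothetical record might look like this (the values are made up; empty cells become empty strings because of the np.nan replacement above):
# hypothetical entry from books_df.to_dict(orient="records")
book = {
    "title": "Watchmen",
    "author": "Alan Moore",
    "book_type": "Comic",
    "ISBN": "9780930289232",
    "year_published": 1987,
    "price": 19.99,
    "no_of_pages": 416,
    "quantity": 5,
    "description": "",  # empty spreadsheet cell replaced with ""
}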
Sample book data in an Excel/CSV file, as shown below, is sent as a POST request to the endpoint, which returns an almost immediate response while the daemon thread populates the database with the book data.
[Image: sample book data in an Excel/CSV file]
An image of the file request via Postman is shown below:
[Image: file request via Postman]
Finally, a Django admin view of the uploaded book data:
[Image: Django admin view of the uploaded data]
conclusion
In conclusion, background tasks are inevitable in backend services, but rather than using a third-party job queue like Celery for every background task - which can introduce additional queueing delays - some tasks can, depending on the specific use case, simply be executed using daemon threads.
The source code for this post is available on GitHub.