Introduction

API (Application Programming Interface) consumption is the new normal, as many software products have shifted to decoupling their Backend and Frontend codebases. Backend Engineers are tasked with writing APIs for their Frontend counterparts to consume, and in many cases Backend Engineers rely on other API services to accomplish their own tasks.

Some services expose enormously large datasets, so making all of the data available in a single API call is impractical. Pagination comes to the rescue: many APIs now return only a fraction (a page) of the data per request. To access the remaining fractions, you have to make additional requests, typically by following a page number or a "next page" link returned with each response.
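For illustration, here is a minimal sketch of that extra work, assuming a hypothetical endpoint that returns JSON with a results list and a next URL (a common pagination convention, used for example by Django REST Framework). The URL and field names are placeholders, not a real service:

python
import requests


def fetch_all_pages(url, collected=None):
    """Recursively follow `next` links until no pages remain."""
    collected = [] if collected is None else collected
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    payload = response.json()
    collected.extend(payload['results'])  # assumed key holding this page's items
    next_url = payload.get('next')  # assumed key holding the next page's URL
    if next_url:
        return fetch_all_pages(next_url, collected)
    return collected


# Hypothetical paginated endpoint
items = fetch_all_pages('https://api.example.com/items/?page=1')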

A recursive solution using Celery with Django

In this article, we will set up some simple Celery background tasks that periodically consume a paginated API and store the results in our model. This storage deliberately overwrites the previous values, since "remembering" previous data is not the subject of discussion; a separate post in this series will address building a data warehouse that keeps track of the data's history. The overwrite behaviour is sketched below.
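Such overwrite-on-refresh storage typically boils down to Django's update_or_create. A minimal sketch with a hypothetical Coin model (the model and fields are illustrative, not part of the project yet):

python
# cryptoapp > models.py (hypothetical)
from django.db import models


class Coin(models.Model):
    symbol = models.CharField(max_length=10, unique=True)
    price = models.DecimalField(max_digits=20, decimal_places=8)


# Elsewhere, e.g. inside a task: match on the unique field and
# overwrite the remaining columns with the freshly fetched values.
coin, created = Coin.objects.update_or_create(
    symbol='BTC',
    defaults={'price': '27341.50'},
)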

Prerequisite

It is assumed you have basic familiarity with Django. If not, go through the official Django tutorial first.

Implementation

Let's create a Django project named api_django_celery in a virtual environment, in any folder on our machine. It is assumed you have installed django and celery==5.5.x in the virtual environment.
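If they are not installed yet, something along these lines should work (the redis extra pulls in the client library for the broker we configure below):

bash
pip install django "celery[redis]==5.5.*"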

bash
┌──(sirneij@sirneij)-[~/Documents/Projects/Django/Django_celery]
└─$ django-admin startproject api_django_celery .

Start an application called cryptoapp.

bash
┌──(sirneij@sirneij)-[~/Documents/Projects/Django/Django_celery]
└─$ python manage.py startapp cryptoapp

Register your application in settings.py.

api_django_celery/settings.py
python
...
# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'cryptoapp.apps.CryptoappConfig', # Our new app
]
...

It is time to set up our application to utilize Celery. To do this, create a file aptly named celery.py in your project's directory and paste the following snippet:

python
# api_django_celery > celery.py

import os

from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'api_django_celery.settings')

app = Celery('api_django_celery')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django apps.
app.autodiscover_tasks()


@app.task(bind=True, ignore_result=True)
def debug_task(self):
    print(f'Request: {self.request!r}')

That was lifted directly from Celery's Django documentation. Ensure you modify this line:

python
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'api_django_celery.settings')

and this line:

python
app = Celery('api_django_celery')

to reflect your project's name. The namespace in this line:

python
app.config_from_object('django.conf:settings', namespace='CELERY')

enables you to prefix all Celery-related configuration keys in your settings.py file with CELERY_, such as CELERY_BROKER_URL.

Next, open your project's __init__.py and append the following:

python
# api_django_celery > __init__.py

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ('celery_app',)
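With this in place, app.autodiscover_tasks() will pick up a tasks.py module in every registered app. As a quick sanity check, a minimal (hypothetical) task in cryptoapp/tasks.py could look like this:

python
# cryptoapp > tasks.py (hypothetical sanity-check task)
from celery import shared_task


@shared_task
def ping():
    """Trivial task to confirm the worker discovers this module."""
    return 'pong'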

To conclude the Celery-related configuration, let's add the following to settings.py:

python
# `config` reads environment variables; here it is assumed to come from
# a library such as python-decouple.
CELERY_BROKER_URL = config("REDIS_URL")
CELERY_RESULT_BACKEND = config("REDIS_URL")
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

We are using Redis as our broker. You can opt for RabbitMQ instead, which Celery supports out of the box.

In the settings above, I am reading CELERY_BROKER_URL from an environment variable named REDIS_URL, which normally looks like redis://127.0.0.1:6379 on a default installation (e.g. on an Ubuntu machine). That means I could have set my CELERY_BROKER_URL and CELERY_RESULT_BACKEND directly as:

python
CELERY_BROKER_URL = "redis://127.0.0.1:6379"
CELERY_RESULT_BACKEND = "redis://127.0.0.1:6379"

Note that CELERY_RESULT_BACKEND is optional, as are CELERY_ACCEPT_CONTENT, CELERY_TASK_SERIALIZER, and CELERY_RESULT_SERIALIZER. However, omitting the last three might result in runtime serialization errors, mostly when dealing with database objects, for example in asynchronous email broadcasting with Celery.
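With Redis running locally, you can confirm everything is wired up by starting a worker from the project root using the standard Celery CLI:

bash
celery -A api_django_celery worker --loglevel=INFO

The worker's startup banner lists the broker URL and the discovered tasks, including debug_task.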

The stage is now set; let's set up our database.