Introduction
API (Application Programming Interface) consumption is the new normal, as many software products have shifted towards decoupling their Backend and Frontend codebases. Backend Engineers are tasked with writing APIs that their Frontend counterparts consume. In some cases, even Backend Engineers rely on other API services to accomplish their tasks.
Some services expose enormously large datasets, so making everything available in a single API call is impractical. Pagination comes to the rescue: many APIs now return only a fraction of the data per request, and accessing the remaining fractions requires extra requests.
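For example, a paginated endpoint typically returns one page of data plus a pointer to the next page. A response might be shaped like this (an illustrative sketch; field names vary from API to API):
# One "fraction" of a paginated API's data
{
    'count': 10000,                                  # total records available
    'next': 'https://api.example.com/items?page=2',  # where the next fraction lives
    'previous': None,
    'results': [...],                                # this page's slice of the data
}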
A recursive solution using Celery with Django
In this article, we will set up some simple Celery background tasks that periodically consume a paginated API and store the results in our model. This storage simply overwrites the previous values, as "remembering" previous data is not the subject of discussion. A separate post in this series will address building a data warehouse that keeps track of the data history.
Prerequisite
Basic familiarity with Django is assumed. If you are new to it, go through the official Django tutorial first.
Implementation
Let's create a Django project named api_django_celery in a virtual environment, in any folder on our machine. It is assumed you have installed django and celery==5.5.x in the virtual environment.
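If they are not installed yet, something like this does it (a sketch; the redis client is included because we will use Redis as the broker later):
┌──(sirneij@sirneij)-[~/Documents/Projects/Django/Django_celery]
└─$ pip install django "celery==5.5.*" redis
With the environment ready, create the project: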
┌──(sirneij@sirneij)-[~/Documents/Projects/Django/Django_celery]
└─$ django-admin startproject api_django_celery .
Start an application called cryptoapp.
┌──(sirneij@sirneij)-[~/Documents/Projects/Django/Django_celery]
└─$ python manage.py startapp cryptoapp
Register your application in settings.py.
...
# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'cryptoapp.apps.CryptoappConfig',  # Our new app
]
...
It is time to set up our application to utilize Celery. To do this, create a file aptly named celery.py in your project's directory and paste the following snippet:
# api_django_celery > celery.py
import os

from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'api_django_celery.settings')

app = Celery('api_django_celery')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django apps.
app.autodiscover_tasks()


@app.task(bind=True, ignore_result=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
That was lifted directly from the Celery Django documentation. Ensure you modify this line:
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'api_django_celery.settings')
and this line:
app = Celery('api_django_celery')
to reflect your project's name. The namespace argument in this line:
app.config_from_object('django.conf:settings', namespace='CELERY')
enables you to prefix all celery-related configurations in your settings.py file with CELERY_, such as CELERY_BROKER_URL.
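Under the hood, each CELERY_-prefixed setting is mapped to the corresponding lowercase Celery option. For instance (hypothetical values):
# settings.py, with namespace='CELERY'
CELERY_BROKER_URL = 'redis://127.0.0.1:6379'  # Celery reads this as broker_url
CELERY_TASK_TIME_LIMIT = 60                   # Celery reads this as task_time_limit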
Next, open your project's __init__.py and append the following:
# api_django_celery > __init__.py
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
__all__ = ('celery_app',)
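Because of app.autodiscover_tasks(), any tasks.py module inside a registered app is now picked up automatically. As a quick illustration (a throwaway sketch, not one of the tasks we will build later), a cryptoapp/tasks.py like this would be discovered without further wiring:
# cryptoapp > tasks.py
from celery import shared_task


@shared_task
def ping():
    # shared_task binds this task to the celery_app exported in __init__.py
    return 'pong'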
To conclude celery-related configurations, let's set the following in settings.py:
from decouple import config  # assumption: `config` comes from python-decouple

CELERY_BROKER_URL = config("REDIS_URL")
CELERY_RESULT_BACKEND = config("REDIS_URL")
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
We are using Redis as our broker. You can opt for RabbitMQ, which is supported out of the box by Celery.
In the settings above, I am linking CELERY_BROKER_URL to an environment variable named REDIS_URL. It normally looks like redis://127.0.0.1:6379 on an Ubuntu machine. That means I could have set my CELERY_BROKER_URL and CELERY_RESULT_BACKEND as:
CELERY_BROKER_URL = "redis://127.0.0.1:6379"
CELERY_RESULT_BACKEND = "redis://127.0.0.1:6379"
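For those URLs to work, a Redis server must be listening on that port. On most Linux machines, it can be started with (assuming Redis is installed):
┌──(sirneij@sirneij)-[~/Documents/Projects/Django/Django_celery]
└─$ redis-server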
Note that CELERY_RESULT_BACKEND is optional, as are CELERY_ACCEPT_CONTENT, CELERY_TASK_SERIALIZER, and CELERY_RESULT_SERIALIZER. However, not setting the last three might result in some runtime errors, mostly when dealing with databases in asynchronous email broadcasting with Celery.
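At this point, you can sanity-check the wiring. Start a worker in another terminal with celery -A api_django_celery worker -l INFO, then run the following from python manage.py shell (a minimal check, assuming Redis is reachable at REDIS_URL):
# Inside `python manage.py shell`
from api_django_celery.celery import debug_task

# Queues the task on the broker; the running worker picks it up and
# prints its request, since debug_task only logs and ignores results.
debug_task.delay()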
The stage is now set; let's set up our database.