API (Application Programming Interface) consumption is the new normal, as many software products have decoupled their Backend and Frontend codebases. Backend Engineers are tasked with writing consumable APIs for their Frontend counterparts, and in many cases Backend Engineers consume other API services themselves to accomplish their tasks.

Some services expose enormously large datasets, so making them accessible in a single API call is impractical. Pagination comes to the rescue: many APIs now return only a fraction of the data per request, and accessing the other fractions requires some extra work.

This article demonstrates how to set up Celery background tasks to consume paginated APIs periodically. We'll explore iterative and recursive approaches for APIs paginated using page parameters and those using next URLs. The fetched data will be stored in a Django model, overwriting previous data. Note that persisting historical data is outside the scope of this article but will be addressed in a future post on building a data warehouse.

Basic familiarity with Django is assumed. Refer to the Django tutorial for an introduction.

GitHub repository: Sirneij/django_excel (Exporting Django model data as excel file (.xlsx) using openpyxl library and Google Spreadsheet API)

Create a Django project named django_excel within a virtual environment. Ensure that django and celery are installed in your environment.

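A minimal set of commands, assuming a Unix-like shell (the virtual environment name and the `redis` client package are my choices):

```bash
# create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# install the dependencies (redis is the Python client for our broker)
pip install django celery redis

# create the project in the current directory
django-admin startproject django_excel .
```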

Create an app named core:

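```bash
python manage.py startapp core
```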

Register your application in settings.py.

django_excel/settings.py
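```python
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'core',  # register the new app
]
```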

It is time to set up our application to utilize Celery. To do this, create a file aptly named celery.py in your project's directory and paste the following snippet:

django_excel/celery.py
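```python
import os

from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_excel.settings')

app = Celery('django_excel')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration
#   keys should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django apps.
app.autodiscover_tasks()
```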

That was lifted directly from the Celery Django documentation. Ensure you modify lines 6 and 8 to reflect your project's name. The namespace argument on line 14 lets you prefix all Celery-related configuration keys in your settings.py file with CELERY_, such as CELERY_BROKER_URL.

Note: Capitalization of Celery-Related Configurations

Because you are literally providing constants, the Celery-related configurations in your settings.py file are capitalized. For instance, Celery's beat_schedule configuration becomes CELERY_BEAT_SCHEDULE in Django.

Next, open your project's __init__.py and append the following:

django_excel/__init__.py
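```python
from .celery import app as celery_app

__all__ = ('celery_app',)
```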

To conclude the Celery-related configuration, let's set the following in settings.py:

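A sketch of the configuration, with the broker URL read from an environment variable (the serializer choices here are common defaults rather than requirements):

```python
import os

CELERY_BROKER_URL = os.environ.get('REDIS_URL')
CELERY_RESULT_BACKEND = os.environ.get('REDIS_URL')
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
```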

We are using Redis as our broker. You can opt for RabbitMQ instead, which Celery supports out of the box.

In the settings above, I am linking CELERY_BROKER_URL to an environment variable named REDIS_URL. It normally looks like redis://127.0.0.1:6379 on a Linux system. That means I could have set my CELERY_BROKER_URL and CELERY_RESULT_BACKEND as:

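```python
CELERY_BROKER_URL = 'redis://127.0.0.1:6379'
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379'
```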

Note that CELERY_RESULT_BACKEND is optional, as are CELERY_ACCEPT_CONTENT, CELERY_TASK_SERIALIZER, and CELERY_RESULT_SERIALIZER. However, leaving the last three unset can cause runtime errors, mostly when dealing with databases, as in asynchronous email broadcasting with Celery.

The stage is now set; let's define our database model. We will be consuming CoinGecko's API and saving some of its data.

Our model will look like this:

core/models.py
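What follows is a sketch: the field names mirror the coin-markets response shown below, while the field types and nullability choices are mine.

```py
from django.db import models


class Coin(models.Model):
    """Mirrors CoinGecko's /coins/markets response."""

    coin_id = models.CharField(max_length=100)
    symbol = models.CharField(max_length=20)
    name = models.CharField(max_length=100)
    image = models.URLField(null=True, blank=True)
    current_price = models.FloatField(null=True, blank=True)
    market_cap = models.FloatField(null=True, blank=True)
    market_cap_rank = models.IntegerField(null=True, blank=True)
    fully_diluted_valuation = models.FloatField(null=True, blank=True)
    total_volume = models.FloatField(null=True, blank=True)
    high_24h = models.FloatField(null=True, blank=True)
    low_24h = models.FloatField(null=True, blank=True)
    price_change_24h = models.FloatField(null=True, blank=True)
    price_change_percentage_24h = models.FloatField(null=True, blank=True)
    market_cap_change_24h = models.FloatField(null=True, blank=True)
    market_cap_change_percentage_24h = models.FloatField(null=True, blank=True)
    circulating_supply = models.FloatField(null=True, blank=True)
    total_supply = models.FloatField(null=True, blank=True)
    max_supply = models.FloatField(null=True, blank=True)
    ath = models.FloatField(null=True, blank=True)
    ath_change_percentage = models.FloatField(null=True, blank=True)
    ath_date = models.DateTimeField(null=True, blank=True)
    atl = models.FloatField(null=True, blank=True)
    atl_change_percentage = models.FloatField(null=True, blank=True)
    atl_date = models.DateTimeField(null=True, blank=True)
    last_updated = models.DateTimeField(null=True, blank=True)

    def __str__(self) -> str:
        return self.name
```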

These are all the fields taken directly from CoinGecko's public API for coin markets:

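An illustrative, truncated sample of one entry (the field names are the real ones; the values are placeholders):

```json
[
  {
    "id": "bitcoin",
    "symbol": "btc",
    "name": "Bitcoin",
    "image": "https://assets.coingecko.com/coins/images/1/large/bitcoin.png",
    "current_price": 47500.0,
    "market_cap": 900000000000,
    "market_cap_rank": 1,
    "fully_diluted_valuation": 995000000000,
    "total_volume": 25000000000,
    "high_24h": 48000.0,
    "low_24h": 46000.0,
    "price_change_24h": 500.0,
    "price_change_percentage_24h": 1.05,
    "market_cap_change_24h": 10000000000,
    "market_cap_change_percentage_24h": 1.1,
    "circulating_supply": 19000000.0,
    "total_supply": 21000000.0,
    "max_supply": 21000000.0,
    "ath": 69045.0,
    "ath_change_percentage": -31.0,
    "ath_date": "2021-11-10T14:24:11.849Z",
    "atl": 67.81,
    "atl_change_percentage": 69900.0,
    "atl_date": "2013-07-06T00:00:00.000Z",
    "roi": null,
    "last_updated": "2022-02-22T22:22:22.222Z"
  }
]
```
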
Note: Commission from referral

If you use CoinGecko's API with my referral code, CGSIRNEIJ, I get some commission. That can be a good way to support me.

null=True makes a column nullable in SQL:

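```sql
-- illustrative DDL for a field declared with null=True
"max_supply" double precision NULL
```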

null=False or leaving it unset makes the column non-nullable:

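```sql
-- illustrative DDL for a field declared with null=False (the default)
"name" varchar(100) NOT NULL
```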

blank=True allows the field to be optional in forms and the admin page.

Speaking of the admin site, let's register the model:

core/admin.py
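A minimal registration; the list_display columns are my choice for illustration:

```python
from django.contrib import admin

from .models import Coin


@admin.register(Coin)
class CoinAdmin(admin.ModelAdmin):
    list_display = ('name', 'symbol', 'current_price', 'market_cap_rank')
```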

With this, you can migrate your database:

Terminal
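```bash
python manage.py makemigrations
python manage.py migrate
```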

Optionally, you can create a superuser:

Terminal
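```bash
python manage.py createsuperuser
```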

Follow the prompts.

Here is the juicy part:

core/tasks.py
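What follows is a sketch reconstructed from the helper names discussed below; the HTTP client (requests), the query parameters, and the overwrite strategy (delete, then bulk_create) are my assumptions:

```python
import requests
from celery import shared_task
from django.conf import settings

from .models import Coin


def build_api_url(page):
    """Build the CoinGecko coin-markets URL for the supplied page number."""
    # vs_currency and per_page are assumed query parameters
    return (
        f'{settings.BASE_API_URL}/coins/markets'
        f'?vs_currency=usd&per_page=250&page={page}'
    )


def fetch_coins_iteratively():
    """Start at page 1 and loop 'infinitely', breaking once a page comes back empty."""
    coins, page = [], 1
    while True:
        response = requests.get(build_api_url(page), timeout=30)
        response.raise_for_status()
        data = response.json()
        if not data:  # an empty page means the dataset is exhausted
            break
        coins.extend(data)
        page += 1
    return coins


def _to_coin(payload):
    """Map one API entry onto the Coin model; field names mirror the API."""
    names = {f.name for f in Coin._meta.fields} - {'id', 'coin_id'}
    return Coin(coin_id=payload.get('id'), **{n: payload.get(n) for n in names})


@shared_task(bind=True, autoretry_for=(Exception,), retry_backoff=True, max_retries=5)
def get_full_coin_data_iteratively_for_page(self):
    """Periodic task: pull every page, then overwrite the previously stored data."""
    coins = fetch_coins_iteratively()
    Coin.objects.all().delete()  # overwrite previous data, per this article's scope
    Coin.objects.bulk_create([_to_coin(c) for c in coins])
```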

The build_api_url helper builds the CoinGecko API URL for the supplied page number. The BASE_API_URL is:

django_excel/settings.py
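```python
# assumed value; adjust if you target a different endpoint or API version
BASE_API_URL = 'https://api.coingecko.com/api/v3'
```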

fetch_coins_iteratively is the core of the program. It starts at the first page and runs an "infinite" loop that breaks only when the API returns no data: the iterative strategy.

Its recursive alternative is:

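A sketch reusing build_api_url above; the function name is my choice:

```python
def fetch_coins_recursively(page=1, coins=None):
    """Fetch one page, then recurse to the next; an empty page is the base case."""
    coins = [] if coins is None else coins
    response = requests.get(build_api_url(page), timeout=30)
    response.raise_for_status()
    data = response.json()
    if not data:
        return coins
    coins.extend(data)
    return fetch_coins_recursively(page + 1, coins)
```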

Then there is get_full_coin_data_iteratively_for_page, which is decorated with shared_task (for task autodiscovery). We supplied some parameters:

  • bind=True to access task instance via self
  • autoretry_for=(Exception,) to auto-retry on exceptions
  • retry_backoff=True for exponential backoff
  • max_retries=5 to limit retries to 5

For this task to be periodic, we must add it to the CELERY_BEAT_SCHEDULE in settings.py:

django_excel/settings.py
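Something like this, assuming the schedule entry name (arbitrary) and that the task lives in core/tasks.py:

```python
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'get_coin_data_every_3_minutes': {
        'task': 'core.tasks.get_full_coin_data_iteratively_for_page',
        'schedule': crontab(minute='*/3'),
    },
}
```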

It schedules this task to run every 3 minutes ('*/3') using crontab.

These implementations were written with performance in mind. However, there is still room for improvement.

Some APIs are paginated not by page number but by a next URL (the default DRF pagination strategy). In such systems, the last batch of data has an empty next, and that's the breaking point:

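A sketch of a next-based fetcher, assuming a DRF-style payload with results and next keys:

```python
import requests


def fetch_coins_from_next_url(url):
    """Follow DRF-style pagination: keep requesting `next` until it is empty."""
    coins = []
    while url:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        payload = response.json()
        coins.extend(payload.get('results', []))
        url = payload.get('next')  # empty/None on the last page: the breaking point
    return coins
```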

That's it! I hope you enjoyed it.

Enjoyed this article? I'm a Software Engineer, Technical Writer, and Technical Support Engineer actively seeking new opportunities, particularly in areas related to web security, finance, healthcare, and education. If you think my expertise aligns with your team's needs, let's chat! You can find me on LinkedIn and X. I am also just an email away.

If you found this article valuable, consider sharing it with your network to help spread the knowledge!