In this article, a detour from the ongoing series on building a financial data analyzer, I focus on rigorously testing the server. Ensuring appropriate and accurate data handling is paramount. While Python doesn't enforce strict type safety, I'll demonstrate how to use tools like mypy, bandit, and later, prospector to maintain basic code quality and standards.

You should read the article on building the preliminary AI service to follow along. This article builds upon the concepts and code established there.

Repository: Sirneij/finance-analyzer — An AI-powered financial behavior analyzer and advisor written in Python (aiohttp) and TypeScript (ExpressJS & SvelteKit with Svelte 5).

The AI service described in the previous article has several areas for improvement:

  1. Testability: The current structure makes automated testing (both integration and unit) difficult.
  2. Model Accuracy: The zero-shot classification model, originally designed for sentiment analysis, isn't optimal for categorizing financial transactions. A more suitable model is needed.
  3. Code Quality: The code requires refactoring, cleanup, and the addition of new features.
  4. Type Consistency: Type annotations need to be consistently applied and enforced throughout the codebase.

To address these, we will adopt this structure:

sh
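A partial sketch of that layout, reconstructed from the files mentioned throughout this article (the root directory name is an assumption):

```sh
finance-analyzer-ai/
├── run.py
├── mypy.ini
├── requirements.dev.txt
├── scripts/
│   └── static_check.sh
├── src/
│   ├── app/
│   │   └── app_instance.py
│   └── utils/
│       ├── base.py
│       ├── summarize.py
│       └── analyzer.py
├── tests/
│   ├── __init__.py
│   ├── app/
│   │   └── websocket_handler/
│   │       ├── test_integration.py
│   │       └── test_ping.py
│   └── utils/
│       └── test_analyzer.py
└── .github/
    └── workflows/
        └── aiohttp.yml
```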

We introduced the src/ directory to house the entire application. The aiohttp server setup was refactored into src/app/app_instance.py, with run.py simply responsible for running the created app instance:

run.py
python

The run.py file initializes and starts the aiohttp application.
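A minimal sketch of what such a `run.py` might look like. The `create_app` factory name and the port are assumptions, not necessarily what the project uses:

```python
"""Sketch of run.py; `create_app` is an assumed factory name in app_instance.py."""


def main() -> None:
    # Imported lazily so this module can be inspected without aiohttp installed.
    from aiohttp import web

    from src.app.app_instance import create_app

    web.run_app(create_app(), port=8000)
```

In the real `run.py`, `main()` would be invoked under the usual `if __name__ == "__main__":` guard.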

The key changes in app_instance.py are highlighted below:

src/app/app_instance.py
diff

We improved type consistency throughout the codebase, using # type: ignore where necessary. We also replaced the global WebSocket connection list with weakref.WeakSet for more robust connection management during shutdown. To maintain persistent connections during long-running processes like zero-shot classification, we implemented a ping/pong mechanism.
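The two ideas can be sketched together in a few lines. The set name and the 10-second interval are assumptions; the point is that a `WeakSet` drops connections automatically once they are garbage-collected, and a background task keeps live ones from being dropped by proxies:

```python
"""Sketch of WeakSet-based connection tracking plus a ping loop (interval is an assumption)."""
import asyncio
import weakref

# Closed/garbage-collected WebSockets vanish from the set on their own,
# so shutdown code never iterates over dead connections.
websockets = weakref.WeakSet()


async def ping_server(interval: float = 10.0) -> None:
    """Periodically ping every live connection so intermediaries keep it open."""
    while True:
        for ws in set(websockets):  # copy: the WeakSet may change size mid-loop
            await ws.ping()
        await asyncio.sleep(interval)
```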

Next, we consolidated common utility functions into a new src/utils/base.py file. This included functions like validate_and_convert_transactions, get_device, detect_anomalies, analyze_spending, predict_trends, calculate_trend, and calculate_percentage_change, previously located in utils/summarize.py and utils/analyze.py. We also introduced new functions to estimate financial health (calculate_financial_health) and detect recurring transactions (analyze_recurring_transactions). The anomaly detection was enhanced to identify single-instance anomalies, and the transaction grouping algorithm now uses difflib for fuzzy matching of descriptions. For example, difflib might consider these descriptions to be similar (approximately 69% match): "Target T-12345 Anytown USA" and "Target 12345 Anytown USA":

py
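As a standalone illustration, `difflib.SequenceMatcher` gives a similarity ratio between the two descriptions; grouping then reduces to comparing that ratio against a threshold (the threshold value below is an assumption, not necessarily the one used in the project):

```python
from difflib import SequenceMatcher

a = "Target T-12345 Anytown USA"
b = "Target 12345 Anytown USA"

ratio = SequenceMatcher(None, a, b).ratio()  # similarity score in [0, 1]

# Descriptions above a chosen threshold can be grouped as the same merchant.
SIMILARITY_THRESHOLD = 0.6  # hypothetical threshold
same_merchant = ratio >= SIMILARITY_THRESHOLD
```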

We also encapsulated sending progress reports in a reusable function, update_progress.
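A minimal sketch of such a helper, assuming the message shape and field names (which are not spelled out in this article):

```python
"""Sketch of a reusable progress reporter; the JSON message shape is an assumption."""
from typing import Any


async def update_progress(ws: Any, task: str, progress: float, message: str = "") -> None:
    """Send a structured progress report over the WebSocket connection."""
    await ws.send_json(
        {
            "type": "progress",
            "task": task,
            "progress": round(progress, 2),
            "message": message,
        }
    )
```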

In src/utils/analyzer.py, the major improvements are:

  1. Improved Model Accuracy: We switched from the yiyanghkust/finbert-tone model to facebook/bart-large-mnli for zero-shot classification. This significantly improves accuracy, although at the cost of speed. For multilingual support, joeddav/xlm-roberta-large-xnli is another option.
  2. Hybrid Classification Approach: We now use a hybrid approach, first attempting to classify transactions using pattern matching. Any remaining unclassified transactions are then processed by the ML model. To improve performance, we process transactions in batches, releasing the event loop after each batch to allow other operations to proceed and to clear memory.
  3. Offloading Calculations: To reduce the load on the classification process, we moved the calculation of anomalies, spending_analysis, spending_trends, recurring_transactions, and financial_health to src/utils/summarize.py, which is significantly faster.
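The hybrid flow in point 2 can be sketched as follows. The pattern table, batch size, and function names are assumptions; the essential moves are the regex fast path, the ML fallback for the remainder, and yielding the event loop between batches:

```python
"""Sketch of pattern-first classification with batched ML fallback (names are assumptions)."""
import asyncio
import re
from typing import Callable, List, Optional

PATTERNS = {  # hypothetical fast-path rules
    "groceries": re.compile(r"walmart|target|aldi", re.I),
    "transport": re.compile(r"uber|lyft|shell", re.I),
}


def classify_by_pattern(description: str) -> Optional[str]:
    for category, pattern in PATTERNS.items():
        if pattern.search(description):
            return category
    return None


async def classify_all(
    descriptions: List[str],
    ml_classify: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    # Fast path: cheap pattern matching first.
    results: List[Optional[str]] = [classify_by_pattern(d) for d in descriptions]
    pending = [i for i, r in enumerate(results) if r is None]
    # Slow path: send only the unclassified remainder to the model, in batches.
    for start in range(0, len(pending), batch_size):
        batch = pending[start : start + batch_size]
        labels = ml_classify([descriptions[i] for i in batch])
        for i, label in zip(batch, labels):
            results[i] = label
        await asyncio.sleep(0)  # release the event loop between batches
    return [r or "other" for r in results]
```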

Our type annotations are currently only decorative. To enforce type safety, ensure code security, and maintain a consistent code style, we'll use the following tools:

  • mypy: A static type checker.
  • bandit: A security linter.
  • black: An uncompromising code formatter.
  • isort: A tool for sorting imports.
Tip: Consider using Prospector

Prospector provides comprehensive static analysis and ensures your code conforms to PEP8 and other style guidelines. It's highly recommended for in-depth code quality checks.

Install these tools and add them to requirements.dev.txt:

sh
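A possible `requirements.dev.txt` listing the tools named above plus `coverage`, which appears later in this article (versions omitted; pin them as your project requires):

```txt
mypy
bandit
black
isort
coverage
```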

Create a mypy.ini file at the root of the project with the following configuration:

mypy.ini
ini
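A plausible starting point, using real mypy options; the exact set of flags the project enables is an assumption:

```ini
[mypy]
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
warn_return_any = True
warn_unused_ignores = True
ignore_missing_imports = True
```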

This configuration enforces various type-checking rules. Each option is generally self-explanatory.

Next, create a bash script (scripts/static_check.sh) to automate the static analysis process:

scripts/static_check.sh
sh
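A defensive sketch of what such a script might contain; the exact flags and targets are assumptions. Each tool runs only if it is installed and the `src` directory exists, and any failure flips the exit status:

```shell
#!/usr/bin/env bash
# Sketch of scripts/static_check.sh; flags and targets are assumptions.
set -u
status=0

check() {
  # Run a tool only when it is installed and the target directory exists.
  if command -v "$1" >/dev/null 2>&1 && [ -d src ]; then
    "$@" || status=1
  fi
}

check mypy src/
check black --check src/ tests/
check isort --check-only src/ tests/
check bandit -r src/

# A nonzero $status should fail the commit hook or CI job.
```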

This script checks the code against the defined standards. To ensure your code passes these checks, run the following commands before committing:

sh

To enforce these rules in a team environment, we'll use a CI/CD pipeline. This pipeline runs these checks, and any failure prevents the pull or merge request from being merged. We will use GitHub Actions for our CI/CD. Create a .github/workflows/aiohttp.yml file:

.github/workflows/aiohttp.yml
yaml

GitHub Actions uses .yaml or .yml files to define workflows, similar to docker-compose.yml. In this case, we're using the latest Ubuntu distribution as the environment. We use version 4 of the actions/checkout action to check out our repository. We also install system dependencies required by some of the Python packages, such as poppler-utils for pdf2image and tesseract-ocr and libtesseract-dev for pytesseract. Since our project doesn't have database interaction, we don't need a services section. The remaining steps are self-explanatory. We then execute our bash script to check the codebase against our defined standards. We also supply environment variables and run the tests (which we'll write later). This CI/CD pipeline runs on every pull request or push to the utility branch.
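A condensed sketch of such a workflow, assembled from the steps described above; the Python version, action versions, and step names are assumptions:

```yaml
name: AI Service CI

on:
  push:
    branches: [utility]
  pull_request:
    branches: [utility]

jobs:
  check-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install system dependencies
        run: sudo apt-get update && sudo apt-get install -y poppler-utils tesseract-ocr libtesseract-dev
      - name: Install Python dependencies
        run: pip install -r requirements.txt -r requirements.dev.txt
      - name: Static analysis
        run: bash scripts/static_check.sh
      - name: Run tests with coverage
        run: coverage run -m unittest discover && coverage report
```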

The last part of our CI/CD is running tests and getting coverage reports. In the Python ecosystem, pytest is an extremely popular testing framework. Though pytest is tempting and may still be adopted later, we will stick with Python's built-in testing library, unittest, and use coverage to measure our program's test coverage. Let's start with the test setup:

tests/__init__.py
python

We simply have classes that provide blueprints for our tests. The Base class makes the create_transaction_dict method available to all its children, simplifying the creation of transaction data for tests. The FakeWebSocket class simulates aiohttp WebSocket behavior, which is essential for unit testing the project's WebSocket utilities. All asynchronous unit tests inherit from BaseAsyncTestClass, while synchronous tests inherit from BaseTestClass. BaseAioHTTPTestCase is used for integration-style tests that involve the aiohttp application; its get_application method is required and must return our app instance.
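A stdlib-only sketch of that scaffolding. The transaction field names and the FakeWebSocket surface are assumptions; the aiohttp-backed base class is omitted here since it would subclass `aiohttp.test_utils.AioHTTPTestCase`:

```python
"""Sketch of the test scaffolding; transaction field names are assumptions."""
import unittest
from typing import Any, Dict, List


class Base:
    @staticmethod
    def create_transaction_dict(**overrides: Any) -> Dict[str, Any]:
        """Build a transaction with sensible defaults, overridable per test."""
        transaction = {"date": "2024-01-01", "description": "Target 12345", "amount": -42.5}
        transaction.update(overrides)
        return transaction


class FakeWebSocket:
    """Captures outgoing messages instead of touching a real socket."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []
        self.closed = False

    async def send_json(self, data: Dict[str, Any]) -> None:
        self.sent.append(data)

    async def close(self) -> None:
        self.closed = True


class BaseTestClass(Base, unittest.TestCase):
    pass


class BaseAsyncTestClass(Base, unittest.IsolatedAsyncioTestCase):
    pass
```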

Note: Unit vs Integration tests

A unit test focuses on a single piece of code (like a function such as analyze_recurring_transactions), whereas integration tests examine how multiple units of code interact within a system (like testing the behavior of sending a request to /ws).

Let's look at an example integration-style test, specifically for our WebSocket, and a unit test for some of the subprocesses to balance things out:

tests/app/websocket_handler/test_integration.py
python

Setting aside the dummy data generators, the __receive_messages helper is crucial for accumulating WebSocket messages. Without it, attempting await ws.receive_json(...) multiple times could lead to timeout errors, resulting in cryptic tracebacks:

sh

The helper also aids in filtering messages of interest. We also created a dummy version of ping_server to properly close it and prevent memory leaks. With the dummy functions in place, we created test cases that interact with our WebSocket endpoint. Using async patches and mocks, we fed predictable responses to the tests. Note that we used our async dummy methods as the side_effect of the AsyncMock. Using return_value instead of side_effect in the mocks prolonged the processes and caused timeout errors.
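The accumulate-until-done pattern can be sketched independently of aiohttp. The terminal message type and timeout below are assumptions; the idea is to drain messages into a list until a terminal message arrives, rather than calling `receive_json()` a fixed number of times:

```python
"""Sketch of a message-accumulating helper; terminal type and timeout are assumptions."""
import asyncio
from typing import Any, Dict, List


async def receive_messages(
    ws: Any, until_type: str = "complete", timeout: float = 30.0
) -> List[Dict[str, Any]]:
    """Collect JSON messages until a terminal message arrives."""
    messages: List[Dict[str, Any]] = []

    async def _drain() -> None:
        while True:
            message = await ws.receive_json()
            messages.append(message)
            if message.get("type") == until_type:
                return

    # One overall deadline instead of one per receive_json call.
    await asyncio.wait_for(_drain(), timeout=timeout)
    return messages
```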

The other test cases handle various scenarios to provide better test coverage.

Warning: The file path in patch

When supplying file paths in patch, use the path where the function is used, not where it was defined. For instance, analyze_transactions was defined in src/utils/analyzer.py, but since it is used in src/app/app_instance.py, we patch src.app.app_instance.analyze_transactions.
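A toy demonstration of this rule, using two in-memory stand-in modules rather than the project's actual files. The module and function names here are hypothetical:

```python
"""Toy demo: patch a name where it is *used*, not where it is defined."""
import sys
import types
from unittest.mock import patch

defs = types.ModuleType("defs")   # plays the role of src/utils/analyzer.py
defs.analyze = lambda: "real result"

user = types.ModuleType("user")   # plays the role of src/app/app_instance.py
user.analyze = defs.analyze       # `from defs import analyze` binds the name here

sys.modules["defs"] = defs
sys.modules["user"] = user

with patch("user.analyze", return_value="mocked"):
    assert user.analyze() == "mocked"       # the *using* module sees the mock
    assert defs.analyze() == "real result"  # the defining module is untouched
```

Patching `defs.analyze` instead would have no effect on `user.analyze`, because the user module holds its own reference to the original function.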

However, the integration testing approach poses some limitations. We can't modify the internals of the aiohttp WebSocket instance. This is where unit testing comes to the rescue, as we can modify internals and mock them as needed to thoroughly test the desired feature. Hence the other test file for our WebSocket, `tests/app/websocket_handler/test_ping.py`.

To wrap up, let's see how we tested src/utils/analyzer.py:

tests/utils/test_analyzer.py
python

This thorough testing allows us to have confidence in the reliability of our code. The repository's tests folder contains other test files that rigorously test our implementations. Currently, we have 100% test coverage on the AI service, and static analysis is enforced.

We will stop here. In the next article, we will return to implementing the dashboard.

Enjoyed this article? I'm a Software Engineer, Technical Writer and Technical Support Engineer actively seeking new opportunities, particularly in areas related to web security, finance, healthcare, and education. If you think my expertise aligns with your team's needs, let's chat! You can find me on LinkedIn and X. I am also an email away.

If you found this article valuable, consider sharing it with your network to help spread the knowledge!