There are dozens of articles about Django optimization. You might have already read about query optimization, caching, and so on. I'm not going to repeat them. Instead, I'll tell you the story of how we optimized the memory consumption of our in-memory autocomplete search at hands.ru, an end-to-end marketplace for home repair, using the Linux copy-on-write mechanism.
What is an autocomplete search? When a user types something into the input field, it suggests the most relevant results. For instance, at hands.ru it's used for service suggestions, helping clients find relevant services without navigating the catalog.
There are many ways to implement autocomplete. Some of you might set up Elasticsearch, some use Sphinx. But with rather small data these can be suboptimal. Moreover, customizing off-the-shelf solutions sometimes takes a lot of time. For instance, we'd like to sort relevant services according to their popularity and the season: people install doors more often in autumn and set up air conditioners in spring, anticipating the hot summer. Also, when handling typos, you may want to give more weight to confusions between letters that sit side by side on the keyboard than between distant ones. The fuzzy matching in Elasticsearch does not support that.
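To make the keyboard-distance idea concrete, here is a minimal sketch of a typo-aware substitution cost. The layout table and the weights are illustrative assumptions, not our production logic:

```python
# Substituting a key for its physical neighbor (a likely typo) should
# cost less than substituting a distant key. Layout and weights below
# are illustrative assumptions.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def key_pos(ch):
    """Return (row, column) of a character on the layout, or None."""
    for row, keys in enumerate(QWERTY_ROWS):
        if ch in keys:
            return row, keys.index(ch)
    return None

def substitution_cost(a, b):
    """Cheaper cost for adjacent keys, full cost otherwise."""
    pa, pb = key_pos(a), key_pos(b)
    if pa is None or pb is None:
        return 1.0
    distance = abs(pa[0] - pb[0]) + abs(pa[1] - pb[1])
    return 0.5 if distance == 1 else 1.0

print(substitution_cost("a", "s"))  # neighbors: cheap
print(substitution_cost("a", "p"))  # far apart: full cost
```

A cost function like this can be plugged into a weighted Levenshtein distance so that plausible fat-finger typos rank higher than random character swaps.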
Raw Python built-in types, frankly speaking, are not optimal for storing huge amounts of data in memory: every string carries about 74 bytes of overhead when the interpreter picks the UCS-2 encoding.
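You can see this overhead with `sys.getsizeof`. The exact numbers below are from a 64-bit CPython build and vary slightly by interpreter version:

```python
import sys

ascii_str = "door"    # compact ASCII: 1 byte per character
ucs2_str = "дверь"    # Cyrillic forces UCS-2: 2 bytes per character

print(sys.getsizeof(ascii_str))  # small fixed overhead + 1 byte/char
print(sys.getsizeof(ucs2_str))   # ~74 bytes of overhead + 2 bytes/char

# Overhead alone, independent of string length:
overhead = sys.getsizeof(ucs2_str) - 2 * len(ucs2_str)
print(overhead)
```

Multiply that per-string overhead by hundreds of thousands of tokens, n-grams, and synonyms, and an in-memory index becomes much heavier than the raw text it was built from.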
But there are only hundreds of repair services at hands.ru. We've calculated that building a full-text search index with synonyms directly in memory takes about 300 MB in the worst case. And we've decided to have some fun.
After deploying the first release, happy with the work done, we were quite surprised when we started receiving monitoring alerts: high memory utilization, swapping, and CPU load spikes.
It was clear that the swap file was kicking in when there was not enough RAM, and that caused the CPU load spikes. After thinking for a while, the reason became clear too: the search index had eaten all the memory. But why? Surely 300 MB was small enough not to push the server into swap.
Our autocomplete search is built as a module inside our back-end Django application, running on a server with uWSGI + Nginx. Nginx serves all static files and proxies other requests to the uWSGI app, which works in multi-process/multi-thread mode to serve concurrent requests. Here is a part of the uWSGI configuration.
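A representative snippet is below. The four worker processes match our setup; the module path, socket path, and thread count are illustrative assumptions:

```ini
[uwsgi]
# Module path is an assumption for illustration
module = project.wsgi:application
master = true
# Four worker processes, as in our setup
processes = 4
# Plus threads inside each worker
threads = 2
enable-threads = true
socket = /run/uwsgi/app.sock
vacuum = true
```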
After a little research, we found that there was no mistake in our index memory calculation, but we had missed one thing: the memory was allocated in each process, and we have four of them in total. So it wasn't 300 MB. It was actually 1.2 GB.
That was a crude mistake. The first idea was to get rid of forks and leave only threads in the uWSGI configuration: once the index is loaded into memory, all threads can share it. But wait, this is Python; remember the GIL. Some of you may argue that our workers probably spend most of their time waiting on I/O, such as database responses. That is partially true, but we still got a slower average response time, which we couldn't afford.
There was a better solution. We recalled the Linux copy-on-write mechanism. In short, it works as follows: when you fork a process, the child process receives its own memory space with a copy of the parent's memory. But it is not physically copied right away; a page is copied only when the child process tries to write to memory inherited from the parent. Reads need no copy at all, which saves real physical memory. In simple terms, if you load 1 GB of data into the main process memory and fork four times, the total memory stays far below 4 GB until the child workers try to change it.
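Here is a minimal demonstration of the pattern on Linux. The dictionary is a stand-in for the real search index:

```python
import os

# Build a large read-only structure in the parent before forking.
# This dict stands in for the real search index.
index = {f"service-{i}": f"description of service {i}"
         for i in range(100_000)}

pid = os.fork()
if pid == 0:
    # Child process: reading inherited pages does not copy them;
    # the kernel duplicates a page only if the child writes to it.
    assert index["service-42"] == "description of service 42"
    os._exit(0)
else:
    _, status = os.waitpid(pid, 0)
    assert os.WEXITSTATUS(status) == 0
```

One CPython caveat worth knowing: reference counting writes to object headers even on "reads", which dirties some pages anyway. `gc.freeze()` (available since Python 3.7), called in the parent before forking, helps keep the garbage collector from touching those inherited objects.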
For us, it should have worked, because our search index is entirely read-only. The one thing we needed to figure out was how to load it in the uWSGI main process before the fork happens.
Django is lazy: you can't pre-load the index when initializing a view, because that code would be invoked in every worker. So you have to go higher, to your wsgi module, directly to the point where the main process bootstraps the application.
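A sketch of what that wsgi module could look like. The module paths and the index builder below are assumptions for illustration, not our actual code:

```python
# myproject/wsgi.py -- a sketch; module paths and the index builder
# are hypothetical, for illustration only.
import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

try:
    from django.core.wsgi import get_wsgi_application
    application = get_wsgi_application()
except ImportError:
    # Fallback so this sketch stays importable outside a real project.
    application = None

def build_search_index():
    # Hypothetical stand-in for the real index builder.
    return {"door installation": ["door", "entrance door"],
            "ac setup": ["air conditioner", "split system"]}

# The important part: build the index at module import time. uWSGI
# imports this module in the master process, before fork(), so every
# worker inherits the ready-made index through copy-on-write.
SEARCH_INDEX = build_search_index()
```

Note that this relies on uWSGI's default preforking behavior: with `lazy-apps` enabled, each worker would import the module itself and the trick would stop working.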
After deploying the second release, we saw the payoff.
And now to the conclusion. Copy-on-write works almost everywhere on Linux, whether you use Django, Flask, or no Python at all. But details matter. The things I wrote here may seem obvious, but people often make mistakes in the simplest stuff. Once you build something, do not forget about the environment it runs in.
In the next story, I'll dive deeper into full-text autocomplete search: what ready-made solutions exist and what's under the hood, and how to build one yourself with typo correction, local-language features, synonyms, and so on. Stay tuned.