Clearing the Matrix Synapse database

Cyril Šebek • 7/9/2024, 12:11:32 PM

Synapse is a Matrix homeserver implementation and one of the most widely used options available. However, it has a notable flaw that, although common, is surprisingly poorly documented and rarely discussed. What’s the problem? The PostgreSQL database, which tracks rooms and messages on your homeserver, can grow to an enormous size for no obvious reason. To make matters worse, there is no built-in solution or comprehensive documentation explaining how to prevent or manage this issue.

Background

I’ve always wanted to have something like “my own private chat server.” When I started with self-hosting, I searched around for some time and tried multiple alternatives, but ended up settling on Matrix. Matrix is an open-source, decentralized communication network that works much like Discord. You can create spaces (like servers on Discord), chats, and rooms within those spaces. Of course, you can have rooms outside spaces, and direct messages are also supported. Unlike Discord, Matrix supports end-to-end encryption (E2EE). The best part is the ability to host your own homeserver. When you have your own server, you can create rooms and spaces that live on it, and all the data from those rooms will be stored only on your server. However, this doesn’t prevent you from communicating with people on other servers; in that case, the room data is replicated across the homeservers involved.

There are a few Matrix homeserver implementations, but the most mature one is called Synapse. It uses a PostgreSQL database to store almost all data - rooms, users, messages, configuration, etc. The problem is that it doesn’t perform much automatic cleaning and sometimes retains data that is no longer needed. For example, when you join a room on the Matrix.org homeserver from your own homeserver, your Synapse homeserver will duplicate some data from that room and also create some records in the state_groups_state table. What is that? Well, it’s complicated, and I won’t pretend to understand it completely. You can learn more from the official Synapse documentation. None of that sounds like a big problem, right?

The problem is that this one table, which is used only for rooms not owned by your homeserver, can get stupidly big. And the worst part? You don’t really need it. It’s a known issue that this table can reach millions or even billions of rows and take up several GB or even tens of GB on your drives. There have been some improvements in the past, but it’s still an issue without any official solution. The closest thing to an official solution is the Rust-written synapse_auto_compressor, which aims to compress related rows to reduce the size. But if you leave the room, all the records about that room should be deleted, right? For some reason, they’re not. The Synapse API doesn’t have a direct command to “clean” a room like that, nor is it done automatically. So, if you were in multiple rooms and left them, all those records would still be on your drive, potentially taking up a lot of space.

I am running my homeserver in a virtual machine on Proxmox. I thought that about 100 GB should be enough, since I mainly use the homeserver to communicate with my family and some friends. And yes, I am part of some other Matrix communities and have joined some rooms, but nothing crazy. A reasonably sane person would guess that there is no way there would be more than 20 GB in messages, images, and user data combined on that server. Yet somehow, my PostgreSQL database managed to grow to such a wild size that my virtualized Ubuntu stopped working properly. How big was it? Almost 80 GB.
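To see where the space actually goes, you can ask PostgreSQL directly. The following is a minimal sketch, not something taken from the Synapse documentation: the database name synapse and the RUN_QUERIES switch are assumptions made for illustration. It relies only on the standard PostgreSQL functions pg_database_size, pg_total_relation_size and pg_size_pretty. By default it only prints the SQL so you can paste it into psql yourself.

```shell
#!/bin/sh
# Sketch: report the total database size and the ten largest tables.
# DB_NAME is an assumption; use the database name from your Synapse config.
set -eu

DB_NAME="${DB_NAME:-synapse}"

# SQL for the total on-disk size of one database.
database_size_sql() {
  printf "SELECT pg_size_pretty(pg_database_size('%s'));\n" "$1"
}

# SQL for the ten largest tables, counting their indexes and TOAST data.
largest_tables_sql() {
  cat <<'SQL'
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
SQL
}

# Run the queries only when RUN_QUERIES=1 (hypothetical switch) and psql
# is installed; otherwise just print the SQL.
if [ "${RUN_QUERIES:-0}" = "1" ] && command -v psql >/dev/null 2>&1; then
  psql -d "$DB_NAME" -c "$(database_size_sql "$DB_NAME")"
  psql -d "$DB_NAME" -c "$(largest_tables_sql)"
else
  database_size_sql "$DB_NAME"
  largest_tables_sql
fi
```

On a typical setup you would run this as a user that is allowed to connect to the database, for example through sudo -u postgres. If state_groups_state shows up at the top of the list, you are most likely looking at the problem described here.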

Searching for a solution

That was the moment when I realized something was wrong, so I started searching the web. I hoped there would be a simple command or automated tool, but I couldn’t find much information on how to solve it. As a developer, I started on GitHub, thinking there might be something useful. However, aside from that one Rust project, there wasn’t much available.

After searching for quite some time, I discovered a tool called matrix-synapse-diskspace-janitor by Forest at sequentialread. He found a way to completely remove those records that otherwise never get deleted. His original script can also be found on his page. The matrix-synapse-diskspace-janitor tool automates this process and makes it accessible to less technical users. It has a reasonably nice web UI and is quite robust. The only thing I found missing was setup documentation. I’m not a Go developer, so how am I supposed to know how to build a Go app?

Figuring it out

I’ll explain how to set up the matrix-synapse-diskspace-janitor, as the information was somewhat hard to track down.

  1. Clone the repo
    Go to the folder where you would like to set up your janitor (for example /tmp for now; you can copy it elsewhere later). Then clone the repo using Git: git clone https://git.cyberia.club/cyberia/matrix-synapse-diskspace-janitor.git

  2. Update the settings
    In the cloned folder there is a config.example.json file. Rename that file to config.json and edit the values inside it.

    • FrontendPort: The port on which the web UI will be accessible.
    • FrontendDomain: The domain or subdomain where your Janitor instance will be accessible.
    • MatrixServerPublicDomain: Your homeserver’s URL.
    • MatrixURL: The address where you can access Synapse’s API. If you are setting up the janitor on the same server as your Synapse and you haven’t changed the port on Synapse, you can leave this as is.
    • AdminMatrixRoomId: A room that specifies who can use this tool - only people in that room can. If you have a room with only admin personnel, you can use its ID. If you are the only admin, you can create a room with just you and use its ID.
    • MatrixAdminToken: An access token from any administrator account. It might be your account if it has administrator privileges on your homeserver. You can obtain this token in any client; I use Element. In Element desktop, open settings by clicking on your profile, then go to All settings > Help & About, and at the bottom under Advanced, you will see Access Token. Click on that to obtain your access token.
    • DatabaseType: This will be PostgreSQL in most cases, as it is recommended for Synapse.
    • DatabaseConnectionString: How the app will connect to your database. If you are running PostgreSQL on the same machine, you can just change the relevant data, which in most cases would be only the password.
    • MediaFolder & PostgresFolder: These will usually stay at their default values.
  3. Install build dependencies (Ubuntu)
    If your system doesn’t already have Go installed, follow the official installation instructions.

  4. Build the app
    This step is surprisingly easy. Go into the janitor’s folder and run: go build -o janitor

  5. (Optional) Create systemd unit
    You may want to run your app only when necessary or have it run continuously to perform regular checks. When you feel like it, you can open the web UI and remove all the “clutter” from your database.
    Create a file called janitor.service inside /etc/systemd/system/. The contents of the file should be something like the code below. After that, reload the units using: sudo systemctl daemon-reload. Then enable and start the unit with: sudo systemctl enable janitor && sudo systemctl start janitor.

[Unit]
Description=Matrix Synapse Diskspace Janitor
After=network.target

[Service]
Type=simple
ExecStart=<path-to-your-janitor-folder>/janitor
WorkingDirectory=<path-to-your-janitor-folder>
User=root
Group=root
Restart=on-failure

[Install]
WantedBy=multi-user.target
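The steps above can be collected into one small script. This is only a sketch under a few assumptions: the install location under /tmp, the DRY_RUN switch and the build/enable arguments are all made up for illustration, and the config file is copied rather than renamed so the example stays around for reference. By default nothing is executed; the commands are just printed.

```shell
#!/bin/sh
# Sketch of steps 1, 2, 4 and 5. Dry run by default: commands are printed,
# not executed. Set DRY_RUN=0 (hypothetical switch) to run them for real.
set -eu

REPO_URL="https://git.cyberia.club/cyberia/matrix-synapse-diskspace-janitor.git"
INSTALL_DIR="${INSTALL_DIR:-/tmp/matrix-synapse-diskspace-janitor}"  # assumed location

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 1, 2 and 4: clone, create config.json from the example, build.
build_janitor() {
  run git clone "$REPO_URL" "$INSTALL_DIR"
  run cd "$INSTALL_DIR"
  run cp config.example.json config.json
  run go build -o janitor
}

# Step 5: register and start the unit. This assumes janitor.service already
# exists in /etc/systemd/system/ and config.json has been edited.
enable_service() {
  run sudo systemctl daemon-reload
  run sudo systemctl enable janitor
  run sudo systemctl start janitor
}

# "build" is the default action; pass "enable" once config and unit are ready.
case "${1:-build}" in
  build)  build_janitor ;;
  enable) enable_service ;;
  *)      echo "usage: $0 [build|enable]" >&2; exit 1 ;;
esac
```

The two halves are kept separate on purpose: config.json still has to be edited by hand between building the app and starting the service.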

It seems straightforward, right? You install the app - sure, there aren’t detailed instructions, but it’s not difficult - so what’s the issue? Generally, there shouldn’t be any problems. You should be able to open the application, wait for it to scan your database, select the rooms you want to clean, and press the big button.

The first issue I encountered, which has since been resolved, was logging into the app. Every time I tried to log in, the app would crash with an error in the console.

http: panic serving 10.69.13.200:43140: runtime error: slice bounds out of range [:10] with capacity 0
goroutine 54 [running]:
net/http.(*conn).serve.func1()
 /usr/local/go/src/net/http/server.go:1850 +0xbf
panic({0x816fa0, 0xc000094078})
 /usr/local/go/src/runtime/panic.go:890 +0x262
main.initFrontend.func1({0x8ed6d0, 0xc000322000}, 0xc000288000, {{0xc0001ce08b, 0x2c}, {0xc000096670, 0x5}, 0x0, 0xc000600010})
 /home/emperor/Downloads/matrix-synapse-diskspace-janitor/frontend.go:170 +0x1118
main.(*FrontendApp).handleWithSession.func1({0x8ed6d0, 0xc000322000}, 0x4c6e53?)
 /home/emperor/Downloads/matrix-synapse-diskspace-janitor/frontend.go:369 +0x130
net/http.HandlerFunc.ServeHTTP(0xc000285af0?, {0x8ed6d0?, 0xc000322000?}, 0x0?)
 /usr/local/go/src/net/http/server.go:2109 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x8ed6d0, 0xc000322000}, 0xc000288000)
 /usr/local/go/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0xc00019e210?}, {0x8ed6d0, 0xc000322000}, 0xc000288000)
 /usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0001aa000, {0x8edbe0, 0xc00019ef60})
 /usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve

The cause turned out to be simple. The app displays the top 10 rooms consuming the most space on the server and always took the first 10 entries of its room list. My server had fewer rooms than that (the panic message reports a capacity of 0), so the slice went out of bounds and the app crashed.

After this issue was fixed, I assumed everything would work smoothly: I would delete the unnecessary data from my database and be done. However, I was mistaken. While the app accurately displayed the size of my database (78 GB), it didn’t present any rooms for deletion.

Screenshot of Janitor showing how big my database is

Fortunately, I was already in contact with Forest himself, who had helped with the initial issue and pushed a commit to fix it. After discussing the state of my database with him, we concluded that removing the data manually might be the best approach, given the unusual behavior.

This took some time, but I successfully cleared most of the database manually. Follow-up queries returned no rows, which suggested that the rooms with huge numbers of entries in the state_groups_state table had been deleted. Strangely, however, the database size on disk remained unchanged.

PostgreSQL cleaning procedures

To resolve this issue, the recommended approach is to export your PostgreSQL database with pg_dump and then drop it completely using DROP DATABASE. Dropping the database removes its files from disk, including the space still occupied by deleted rows (the “tombstones”). Afterward, recreate an empty database following the official Synapse instructions for PostgreSQL setup and import the dump file into it.
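The procedure can be sketched roughly as follows. Treat it as an illustration rather than a tested recipe: the database name synapse, the role synapse_user, the service name matrix-synapse, the dump path and the DRY_RUN switch are all assumptions, so adjust them to your setup. The createdb options are the ones the official Synapse PostgreSQL instructions ask for (UTF8 encoding, C locale, template0). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: dump, drop, recreate and restore the Synapse database.
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 (hypothetical switch) to run them for real.
set -eu

DB_NAME="${DB_NAME:-synapse}"                    # assumed database name
DB_OWNER="${DB_OWNER:-synapse_user}"             # assumed database role
DUMP_FILE="${DUMP_FILE:-/var/tmp/synapse.dump}"  # assumed path; needs free space

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

rebuild_database() {
  # Stop Synapse so nothing writes to the database during the dump.
  # The service name is an assumption (Debian/Ubuntu package default).
  run systemctl stop matrix-synapse

  # 1. Export the database in pg_dump's custom archive format.
  run pg_dump --format=custom --file="$DUMP_FILE" "$DB_NAME"

  # 2. Drop the database; this is what frees the disk space.
  run dropdb "$DB_NAME"

  # 3. Recreate it with the encoding and locale Synapse expects.
  run createdb --encoding=UTF8 --locale=C --template=template0 \
      --owner="$DB_OWNER" "$DB_NAME"

  # 4. Import the dump into the fresh database and start Synapse again.
  run pg_restore --dbname="$DB_NAME" "$DUMP_FILE"
  run systemctl start matrix-synapse
}

rebuild_database
```

The commands have to run as a user with sufficient PostgreSQL privileges, typically postgres. Since the disk is probably almost full at this point, write the dump to a location that still has room, and check that the dump file looks sane before dropping anything.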

Frustratingly, the pg_dump file was only 540 MB in my case. Everything else had already been removed during my manual cleaning and lived on only as tombstones in my database files.

Conclusion and thanks

The Synapse homeserver is widely regarded as one of the best Matrix homeserver implementations available, but like any software, it has its share of bugs and quirks. One of them is inefficient disk usage and an occasional reluctance to delete unnecessary data. Typically, cleaning up shouldn’t be overly complicated, but in my case, the database had some unusual or broken aspects. To compound the problem, it had consumed all available disk space, making the situation more challenging to resolve.

Thanks to Forest and the exceptional support from cyberia.club, I successfully resolved all issues. Now, my Matrix homeserver runs smoothly without unnecessarily occupying excessive disk space. Huge thanks to everyone involved! <3