Birger Schacht: Status update, January 2026

January was a slow month; I only did three uploads to Debian unstable:

  • xdg-desktop-portal-wlr updated to 0.8.1-1
  • swayimg updated to 4.7-1
  • usbguard updated to 1.1.4+ds-2, which closed #1122733

I was very happy to see the new dfsg-new-queue and that there are more hands
now processing the NEW queue. I also finally got one of the packages accepted
that I uploaded after the Trixie release: wayback, which I uploaded last August.
There has been another release since then; I'll try to upload that in the next
few days.

There was a bug report for carl
asking for Windows support. carl used the xdg
crate for looking up the XDG directories, but xdg does not support
Windows systems (and it seems this will not change).
The reporter also provided a PR to replace the dependency with the
directories crate, which is more system-agnostic. I adapted the PR a bit,
merged it and released version 0.6.0 of carl.

At my day job I refactored
django-grouper.
django-grouper is a package we use to find duplicate objects in our data. Our
users often work with datasets of thousands of historical persons, places and
institutions, and in projects that run over years and ingest data from multiple
sources, it happens that entries are created several times.
I wrote the initial app in 2024, but was never really happy with the approach
I used back then. It was based on this blog post
that describes how to group spreadsheet text cells. It uses sklearn's
TfidfVectorizer with a custom analyzer and the library
sparse_dot_topn for computing the similarity
matrix. All in all, the module to calculate the clusters was 80 lines, and with
sparse_dot_topn it pulled in a rather niche Python library. I was pretty sure
that this functionality could also be implemented with basic sklearn
functionality, and it was: we are now using
DictVectorizer,
because in a Django app we are working with objects that can be mapped to dicts
anyway. For clustering the data, the app now uses the
DBSCAN
algorithm (with the Manhattan distance as metric). The module is now only half
the size, and the whole app lost one dependency! I released those changes as
version 0.3.0 of the app.
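To make the DictVectorizer + DBSCAN idea a bit more concrete, here is a minimal sketch. It is not the actual django-grouper code: the records, their field names, the `find_duplicate_groups` helper and the `eps`/`min_samples` values are all hypothetical. It assumes every record has already been mapped to a dict of string values, which DictVectorizer one-hot encodes as `field=value` features.

```python
from collections import defaultdict

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records standing in for Django objects mapped to dicts.
# All values are strings, so DictVectorizer one-hot encodes them as
# "field=value" features (numeric values would be used as-is instead).
records = [
    {"name": "Maria Theresia", "place": "Wien", "born": "1717"},
    {"name": "Maria Theresia", "place": "Vienna", "born": "1717"},
    {"name": "Joseph II", "place": "Wien", "born": "1741"},
    {"name": "Maria Theresia", "place": "Wien", "born": "1717"},
]


def find_duplicate_groups(rows, eps=3.0, min_samples=2):
    """Hypothetical helper: return lists of row indices DBSCAN clusters together."""
    # Each row becomes a sparse 0/1 vector over all seen "field=value" pairs.
    X = DictVectorizer().fit_transform(rows)
    # A field present in both rows with different values adds 2 to the
    # Manhattan distance, so eps=3.0 tolerates exactly one differing field.
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="manhattan").fit_predict(X)
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        if label != -1:  # -1 marks noise, i.e. rows without a near-duplicate
            groups[label].append(index)
    return list(groups.values())


print(find_duplicate_groups(records))  # [[0, 1, 3]]
```

With the assumed `eps=3.0`, "Wien" and "Vienna" end up in the same group because only one field differs, while "Joseph II" differs in two fields and is left out as noise. In a real dataset, `eps` would have to be tuned to how many differing fields still count as a likely duplicate.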

At the end of January, I went to Brussels with friends to attend
FOSDEM. We took the night train, but there were a couple of
broken-down trains, so the ride took 26 hours instead of one night. It is a good
thing we had a one-day buffer and that FOSDEM only started on Saturday. As usual,
there were too many talks to attend, so I'll have to watch some of the
recordings in the next few weeks.

Some examples of talks I found interesting so far: