
π 2025-10-26 01:01 in Computers, Linux
Part #2 was unfortunately much more painful in an unnecessary way due to a poorly made forced API change in exim4

It's been a while since I've been in XKCD 349 land :) Actually it's a good thing because honestly, it's really not fun and I enjoy other hobbies in my life, too :)


The power of Linux is that I never really had to re-install the system I built in 2000 or so, because Debian is just that good. I did move from i386 to amd64, but that was possible thanks to biarch in Debian and a fancy and impressive in-place binary upgrade from ia32/i386 to amd64.

Now, because of this little problem where my amd64 capable server from 2019 was taking way too much power (400W or so), I decided to replace it with an rPi5 which is almost 3 times faster for 20 times less power.


Despite the different binary arch, migrating was not a huge deal, although I still had ancient stuff running python2 that took a while to upgrade. I figured it was time to get rid of python2, which has been gone from Debian for a while (I went to trixie, v13, and it was removed after bullseye, v11, 2 releases ago).
I was almost done with my upgrade and everything being back up, and then came the subject of mailman. Oh, no, mailman!
I used to be a mailman expert in 1999-2000 (yes, really, haha) and knew the code well. I've kept using it to run a few lists, but otherwise haven't touched it in 25 years.

Of course, by now there is mailman3 that uses python3, but installing that on debian installed dozens of python packages, a new database system and god knows what I just didn't want or didn't need. Worse, I remembered that I have a fancy exim4 config that detects the mailman .pck files and auto provisions lists and aliases. Also, I changed the web interface a bit.

As yucky as it is, I'm already 3 days into this full server upgrade and don't want to spend a day or more learning this new mailman3 and migrating to it, simply because it's not worth my time and I'm happy to keep my few lists running as is.

So here is what I had to do:

Installing python2 was not too hard, I just had to bring back an old apt source for bullseye:

magic:/usr/bin# cat /etc/apt/sources.list.d/debian_bullseye_python2.sources
Types: deb
URIs: http://deb.debian.org/debian
Suites: bullseye
Components: main contrib non-free non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.pgp

apt-get install python2.7-minimal
magic:/usr/bin# ln -s python2.7 python2

Amazingly the packages were built well enough that they installed without fuss on trixie, including some dependencies:

moremagic:/etc/apt# apt-get install python2.7-minimal
apt-get defauts to bookworm/stable but system upgraded to trixie/testing 2024/10. May need to use sid if packages will not install
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libpython2.7-minimal
Suggested packages:
  binfmt-support
Recommended packages:
  libpython2.7-stdlib python2.7
The following NEW packages will be installed:
  libpython2.7-minimal python2.7-minimal
0 upgraded, 2 newly installed, 0 to remove and 45 not upgraded.
Need to get 1,593 kB of archives.
After this operation, 6,393 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
moremagic:/etc/apt#

Now, mailman2 is python, so we're good, right? Well, not quite. There were some cgi binaries that hardcoded stuff for safety, and were obviously i386 on my system (~mailman/mail/mailman and ~mailman/cgi-bin/*).
I did have server backups going back to 2002 (not bad, haha, and yes they really still work), so I found the source I used back then. But then I realized that rebuilding the whole thing might take a while, since it's all ancient configure, ancient python, and so forth. Just yesterday I had to rebuild some ancient C whose bundled configure failed: its "is gcc there" test program is so old that a modern gcc rejects it, so configure told me my gcc could not build binaries when in fact the test itself was broken. I just removed the test (the rest actually built).

configure:1004: gcc -o conftest    conftest.c  1>&5
configure:1001:1: error: return type defaults to 'int' [-Wimplicit-int]
 1001 | main(){return(0);}
      | ^~~~
configure: failed program was:

After the source failed to build right away due to missing ancient python stuff, I asked myself "eh, can I maybe just get those i386 binaries to work on arm64 as is?". And the answer is yes:

magic:/var/local/mailman/mail# ./mailman 
bash: ./mailman: cannot execute binary file: Exec format error

# install binary emulator, not fast but more than good enough for my needs:
magic:/lib# apt-get install qemu-user-static
The following additional packages will be installed:
  qemu-user qemu-user-binfmt
The following NEW packages will be installed:
  qemu-user qemu-user-binfmt qemu-user-static
Do you want to continue? [Y/n] y
Get:1 http://deb.debian.org/debian trixie/main arm64 qemu-user arm64 1:10.0.3+ds-0+deb13u1 [64.1 MB]
Get:2 http://deb.debian.org/debian trixie/main arm64 qemu-user-binfmt arm64 1:10.0.3+ds-0+deb13u1 [2,068 B]
Get:3 http://deb.debian.org/debian trixie/main arm64 qemu-user-static arm64 1:10.0.3+ds-0+deb13u1 [55.1 kB]

magic:/var/local/mailman/mail# ./mailman
i386-binfmt-P: Could not open '/lib/ld-linux.so.2': No such file or directory

# copied over libraries from an old system:
magic:/lib/i686# l
-rwxr-xr-x 1 root root  171404 Oct 26 16:38 ld-linux.so.2*
-rwxr-xr-x 1 root root 1993968 Oct 26 16:39 libc.so.6*

magic:/lib# ln -s i686/ld-linux.so.2 .
magic:/var/local/mailman/mail# ./mailman
Usage: ./mailman program [args...]

Success!
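
For anyone wondering what "Exec format error" is complaining about: the kernel reads the ELF header, sees a machine type it cannot run natively, and gives up unless an emulator is registered through binfmt_misc (which is what the qemu-user-binfmt package sets up). Here is a minimal sketch of reading that header field by hand, assuming a little-endian ELF file and a POSIX shell with od. The function name elf_machine is made up for illustration and only knows three machine types:

```shell
# Report the target architecture of an ELF binary by reading e_machine,
# a 16-bit field at byte offset 18 of the ELF header.
# Assumption: the file is little-endian (true for i386, x86_64 and arm64 Linux).
elf_machine() {
    em_magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$em_magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    em_code=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
    case "$em_code" in
        0300) echo "i386" ;;      # EM_386     = 3
        3e00) echo "x86_64" ;;    # EM_X86_64  = 62
        b700) echo "aarch64" ;;   # EM_AARCH64 = 183
        *)    echo "unknown($em_code)" ;;
    esac
}

# Example (path taken from the transcript above):
#   elf_machine /var/local/mailman/mail/mailman
```

The old wrapper should report i386 and anything built on the Pi should report aarch64. The file command gives the same answer with more detail when it is installed.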

Well, now when I connect, I see:

The Mailman CGI wrapper encountered a fatal error. This entry is being stored in your syslog:
Failure to find group name for GID 33.  Mailman
expected the CGI wrapper to be executed as group
"www-data", but the system's web server executed the
wrapper as GID 33 for which the name could not be
found.  Try adding GID 33 to your system as "www-data",
or tweak your web server to run the wrapper as group
"www-data".

Now, this is actually already good: it means the CGI (i386 code) is running on arm64. But there is indeed a library issue, because /etc/group does have "www-data:x:33:". Strace showed it was looking for libnss_files.so.2, which makes sense.
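
The wrapper's complaint comes from a group lookup failing inside the emulated i386 process, not from the group file itself. To rule out the file, it can be checked directly without going through NSS at all. This is a minimal sketch, assuming the standard colon-separated group file format; gid_to_name is a hypothetical helper, not something mailman or Debian ships:

```shell
# Look up a group name by numeric GID straight from a group file,
# bypassing NSS (and therefore libnss_files.so.2) entirely.
# Usage: gid_to_name GID [GROUP_FILE]   (GROUP_FILE defaults to /etc/group)
gid_to_name() {
    awk -F: -v gid="$1" '
        $3 == gid { print $1; found = 1; exit }
        END       { exit !found }          # non-zero status if no match
    ' "${2:-/etc/group}"
}

# Example:
#   gid_to_name 33
```

If this prints www-data while the wrapper still cannot resolve GID 33, the problem is on the library side of the lookup, which matches what strace showed.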

Copied over the lib:
magic:/lib# l /lib/i686/libnss_files.so.2
-rw-r--r-- 1 root root 50812 Oct 26 17:45 /lib/i686/libnss_files.so.2

magic:/var/local/mailman/cgi-bin# su www-data
magic:/var/local/mailman/cgi-bin$ ./listinfo
  File "/var/local/mailman/scripts/driver", line 107
    print 'Status: 405 Method not allowed'
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?

Progress! (Now the wrapper is running the wrong python.) The easy fix is of course to make /usr/bin/python point to python2, but I was trying to resist doing so. However, at this point I decided to stop being a purist, and honestly this python2/python3 stuff has cost me so much time in the past already that I'm fine with python being python2. All python3 code calls /usr/bin/python3 anyway.

By now, things are looking better and https://lists.merlins.org/lists/listinfo is returning

Bug in Mailman version 2.1.14
We're sorry, we hit a bug!
Please inform the webmaster for this site of this problem. Printing of traceback and other system information has been explicitly inhibited, but the webmaster can find this information in the Mailman error logs.

From there, I had to debug some non trivial permission issues which I think were due to qemu not respecting the setgid bit when running i386 code.

magic:~$ /var/local/mailman/mail/mailman post testlist
Group mismatch error.  Mailman expected the mail
wrapper script to be executed as group "mail", but
the system's mail server executed the mail script as
group "www-data".  Try tweaking the mail server to run the
script as group "mail", or re-run configure, 
providing the command line option `--with-mail-gid=www-data'.

This was all because the CGIs had to be SGID mailman and therefore had to be C binaries, because python suid/sgid was considered unsafe at the time. This has been fixed many ways in the last 25 years, but I wanted to keep things as is without getting into new rabbitholes :)
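
A quick way to see which wrappers carry the setgid bit on disk is sketched below. It only checks the file mode, so it cannot tell whether whatever ends up executing the binary actually honors the bit. The function name missing_setgid is made up for illustration; the example paths are the ones used in this post:

```shell
# Print every regular file in the given directories that lacks the setgid bit.
# No output means every file found has it set.
missing_setgid() {
    for ms_dir in "$@"; do
        for ms_file in "$ms_dir"/*; do
            [ -f "$ms_file" ] || continue     # skip subdirectories and empty globs
            [ -g "$ms_file" ] || echo "$ms_file"
        done
    done
}

# Example:
#   missing_setgid /var/local/mailman/cgi-bin /var/local/mailman/mail
```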

Sadly, it went downhill from there, and trying to avoid a 2h rabbithole landed me in another one. But it was cool to see I could run Intel binaries on rpi5/arm64 when needed.
It did however break sgid, which is essential for mailman, and it turned out the reasonable path was rebuilding after all, since I did have the source and even a source tree from 2002 with the right build options still baked in:

magic:/var/local/src/mailman-2.1.7/src# make clean; make; make install
(...)
for f in admindb admin confirm create edithtml listinfo options private rmlist roster subscribe; do     exe=/var/local/mailman/cgi-bin/$f;     /usr/bin/install -c -m 755 $f $exe;     chmod g+s $exe; done
for f in mailman; do     /usr/bin/install -c -m 755 $f /var/local/mailman/mail;     chmod g+s /var/local/mailman/mail/$f; done

Yeah, that took less than 5 minutes and made native binaries. With that, the web pages worked right away, but the email gateway script was still being difficult, and exim4 debugging didn't show its output, making it hard to debug. The debug output does not even make clear what the full command line was (need to go in +dall to see it, barely) or that the command failed.

Works from command line:
magic:~$ id
uid=8(mail) gid=8(mail) groups=8(mail)
magic:~$ ~mailman/mail/mailman post testlist
From: marc@merlins.org
To: testlist@lists.merlins.org
subject: test 7

test

But when sending through exim:
>>>>>>>>>>>>>>>> Exim pid=1720374 (delivery-local) terminating with rc=0 >>>>>>>>>>>>>>>>
mm21_transport transport returned FAIL for testlist@lists.merlins.org
post-process testlist@lists.merlins.org (2)
LOG: MAIN
  ** testlist@lists.merlins.org F=<root@merlins.org> R=mm21_main_director T=mm21_transport: Tainted arg 2 for mm21_transport transport command: 'testlist'

I guess this said what was wrong, but it wasn't clear to me that "Tainted" was an error and not a warning, and that it was what caused the failure. This became another rabbithole I need to solve: exim4 has made tainting a real pain to deal with, especially for the way I'm using exim4's local_part_data. That use is still perfectly safe in my case, but exim4 sadly decided that I cannot be trusted and is forcing an overly strict and, quite frankly, overbearing tainting system on me that just breaks my setup without providing any easy opt-out.
I'm honestly not happy with exim4 on that one, especially the complete lack of useful errors in exim logs and documentation that fails to give easy and actionable steps to get out of this hole.

So now, I'm many hours into trying to figure out how to fix exim4, and I'm really, really not impressed at how they forced that overbearing tainting mechanism with very little info on how to easily fix the things it broke that were working safely.

So, exim4 took much longer to fix than it should have, here's a new page on it: Part #2 was unfortunately much more painful in an unnecessary way due to a poorly made forced API change in exim4

