2026-03-08 hylaeus
With love to Squarepusher.
Please send me links to public repositories containing SuperCollider code. Get in touch if you'd like to share more private source code with me, if you are concerned I might have your source code and you'd like me to remove it, or if you'd like to volunteer to help curate a large collection of SuperCollider source code.
I've been working in a vacuum on Hadron for a few years now while building many of the fundamentals
required for an interpreter, such as a lexer, parser, and "semantic analyzer," a concept borrowed
from Clang, LLVM's C++ compiler.
However, as I'm starting to work on a code formatter, and building up some support code for the language server, it's become readily apparent that I need to build a database of as much extant SuperCollider source code as possible. The more diverse styles and idioms of SuperCollider usage I can capture, the better.
Building a sample source code corpus affords a number of advantages for Hadron, and possibly some benefit for the SuperCollider project in general:
With that in mind, I've started a collection of publicly available SuperCollider source code. I made
a corpus repository in the Hadron organization on Codeberg, but I've made it private to members of
the Hadron organization only. That keeps it from being an obvious target for AI scrapers, and it
leaves room for the possibility that some folks might be willing to share private code with the
corpus, in which case I want those access controls already in place.
I've already gathered around a million and a half lines of code from three sources:
directory.txt file containing repository links for each quark. I wrote a Python script to add each one as a submodule to the corpus.
I still need to comb through the Awesome SuperCollider lists, and I think some folks are moving to Codeberg, so it's worth a search there, too. I also think GitHub could use another pass.
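For anyone curious about the quarks step, a script of that kind could look roughly like the sketch below. It is not the script used for the corpus. It assumes each line of directory.txt has the form `name=url`, optionally followed by an `@ref` suffix, and that the URLs themselves contain no `@`. The function names and the `quarks/` destination folder are made up for illustration.

```python
import subprocess


def parse_directory(text):
    """Yield (name, url) pairs from lines assumed to look like 'name=url[@ref]'."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not a name=url pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, url = line.split("=", 1)
        # Assumption: an optional '@ref' suffix pins a tag or commit; drop it.
        url, _, _ref = url.partition("@")
        yield name.strip(), url.strip()


def submodule_commands(text, dest="quarks"):
    """Build one 'git submodule add' command per quark (dest is hypothetical)."""
    return [
        ["git", "submodule", "add", url, f"{dest}/{name}"]
        for name, url in parse_directory(text)
    ]


def add_all(directory_path, corpus_root):
    """Run the commands inside an existing git checkout of the corpus."""
    with open(directory_path, encoding="utf-8") as handle:
        commands = submodule_commands(handle.read())
    for cmd in commands:
        # check=False: one dead repository link should not abort the whole run.
        subprocess.run(cmd, cwd=corpus_root, check=False)
```

Building the command list separately from running it makes it easy to print and review the commands before touching the repository.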
I had a queasy moment when trawling through GitHub for SuperCollider code. I was getting a little close to the line for my comfort in terms of starting to resemble Large Language Model scraper behaviours. So far, I have taken an opt-out approach to publicly available SuperCollider source code. That is, I have treated the fact that an author has posted their code on GitHub as implied consent for inclusion in the corpus. This seems to me to be much less of a stretch than implied consent for inclusion in the training set of a for-profit generative language model. But it still feels like a stretch.
I'm going to raise awareness about the corpus by circulating this blog post in a few spots known to be popular with SuperCollider users, in the hopes of both generating additional submissions of source code and gathering feedback about the corpus itself.
These are, in some ways, "complicated" ethical times. "Complicated" is a polite way of saying that as AI eats my industry I've seen a bunch of folks I respected seem to lose sight of some of the basics in terms of how harmful this technology is, and how dangerous. It makes me wonder about my own ethical sensibilities, and I find myself worrying and questioning when my plans start to resemble those of the robots.
As always, I'd love to hear your thoughts.