I haven’t figured out yet how I’ll be announcing updates to NusachDB, my database of Jewish liturgical melodies and chants. One thought was to somehow automate the git logs, but those are really private messages and I don’t exactly want to make them public. Another possibility is to figure out which tunes or sections have been added and link to them, but quite a bit of infrastructure would need to be built before that could happen. But I do have this blog, so I might as well post here. I will warn you, however, that postings here will be relatively infrequent compared to the actual updates. I add new tunes nearly every night, sometimes even more than once in a day (though not so often when I’m traveling, which will happen next week, when I’ll be out for about three weeks). I’m not going to make posts like they’re commit messages. That’s what commit messages are for. This will be reserved for bigger and more interesting updates and milestones.
Which leads me to today. Since I started NusachDB, I’ve kept a count in the navigation bar of how many tunes a particular section has, as a way of monitoring progress and gauging completeness. You might see that there are 92 tunes for Shabbat Dinner, for example, and figure out that there are more now than there were yesterday. Now, not all of these are full-fledged tunes, and many are counted multiple times. For example, the entire first half of the Kabbalat Shabbat service uses the same nusach (the same chant), but each passage obviously has different words. Each one is counted separately, and when the start and end of a passage are listed separately, each of those is counted separately as well. There’s really no way around this, I’m pretty sure, for reasons of data integrity. However, it’s also often the case that the same text is repeated in multiple places. Adon Olam, for example, may be sung at weddings, at the start of the daily morning service, at the end of Shabbat Maariv, at the end of Shabbat Musaf, etc. The same tunes are often used for all of these (though not necessarily, which creates another set of problems). This is accomplished by using reference links in the data files. A section to be linked (say, Adon Olam, in the file Common Tunes.xml) is given an id (“Adon Olam”), and when another service wants to borrow it, it can cite the filename and the id. This saves on code duplication, but the entries are duplicated in the database, because each tune needs to be aware of its parent sections and nodes. The same tune will appear multiple times with the same title, images, recordings, etc., but with different paths.
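To make the duplication concrete, here is a minimal sketch of how expanding reference links turns one shared section into several database rows. The element and attribute names (`service`, `section`, `ref`, `tune`, `file`, `id`, `title`), the file `Shabbat.xml`, and the single combined path are all assumptions for illustration; the post does not show the real NusachDB schema or import script.

```python
import xml.etree.ElementTree as ET

# Hypothetical data files, held in memory so the sketch is self-contained.
FILES = {
    "Common Tunes.xml": """
        <tunes>
          <section id="Adon Olam" title="Adon Olam">
            <tune title="Melody 1"/>
            <tune title="Melody 2"/>
          </section>
        </tunes>""",
    "Shabbat.xml": """
        <service title="Shabbat">
          <service title="Maariv">
            <ref file="Common Tunes.xml" id="Adon Olam"/>
          </service>
          <service title="Musaf">
            <ref file="Common Tunes.xml" id="Adon Olam"/>
          </service>
        </service>""",
}

def find_section(filename, refid):
    """Look up the section with the given id in another data file."""
    root = ET.fromstring(FILES[filename])
    for section in root.iter("section"):
        if section.get("id") == refid:
            return section
    raise KeyError(f"{refid!r} not found in {filename!r}")

def walk(node, path, rows):
    """Collect one ' | '-joined path per tune, following references."""
    if node.tag == "ref":
        # Borrow the referenced section in place of the reference itself.
        node = find_section(node.get("file"), node.get("id"))
    here = path + [node.get("title")]
    if node.tag == "tune":
        rows.append(" | ".join(here))
        return
    for child in node:
        walk(child, here, rows)

def import_rows(filename):
    rows = []
    walk(ET.fromstring(FILES[filename]), [], rows)
    return rows

for row in import_rows("Shabbat.xml"):
    print(row)
```

The two Adon Olam melodies are written once in the data files but come out as four rows, two under Maariv and two under Musaf, which is exactly why a naive row count overstates the number of tunes.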
Speaking of paths, one of the great advantages of, say, MongoDB over SQL is the ability to store nested documents/records/whatever you want to call them. But I’m using MySQL because I need the relational part of the paradigm. What to do? Simple: the tree is represented by a string listing the nodes in order, with some arbitrary separator (‘ | ’ in this case). It’s easy to find the direct parent (break off the last bit and find the node with that path) or to check whether a node is an ancestor of another (check whether its path is an initial substring of the other’s). What’s happening here is that a particular tune is at the end of a path of sections and services. For example, one service path might be Shabbat | Shacharit and its section path might be K’riat Sh’ma | Sh’ma | Sh’ma Yisrael | Melody 1, which would denote Melody 1 of Sh’ma Yisrael, which is part of the Sh’ma, the central passage of the K’riat Sh’ma, at Shabbat Shacharit. The tunes are exactly the same as for Shabbat Maariv, so the database entry for the tune at Maariv would have a service path of Shabbat | Maariv instead, with, coincidentally, the same section path, since the two services are similar in this way.
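The two path operations described above can be sketched in a few lines. This is an illustration only, not the actual NusachDB code: the function names are made up, and the ancestor check appends the separator before comparing, an extra precaution so that Shabbat is not mistaken for an ancestor of something under Shabbat Dinner.

```python
SEP = " | "

def parent_path(path):
    """Return the direct parent's path, or None for a top-level node."""
    head, sep, _last = path.rpartition(SEP)
    return head if sep else None

def is_ancestor(ancestor, descendant):
    """True if `ancestor` is a strict ancestor of `descendant`."""
    # Comparing against ancestor + SEP rather than the bare path avoids
    # false matches between nodes whose names merely share a prefix.
    return descendant.startswith(ancestor + SEP)

section = "K'riat Sh'ma | Sh'ma | Sh'ma Yisrael | Melody 1"

print(parent_path(section))                                # K'riat Sh'ma | Sh'ma | Sh'ma Yisrael
print(is_ancestor("K'riat Sh'ma | Sh'ma", section))        # True
print(is_ancestor("Shabbat", "Shabbat Dinner | Melody 1")) # False
```

The trade-off of this design is that every lookup is a string comparison, but it keeps the whole tree in ordinary relational columns.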
The problem comes when we want to count them, because both services refer to the same melody, but there is no way to figure out that they’re the same melody. One solution would be to include unique IDs, but let’s get real: I’m writing these XML files by hand, so I can’t generate and maintain unique IDs for every tune. I can’t go by the title, because almost every section has a Melody 1. The section lineage is what makes the tune unique, but since it can have different section lineages in different services, that’s not very helpful. So I thought that maybe I could figure out what file they’re in. That was nice! I can do that. When I reference another file, I need the name of the file, right? So I can just keep passing that around. Great! Doesn’t solve the problem, because that’s not nearly enough information to uniquely identify the tune…
So I came up with another path: the XML lineage inside the file. THAT is unique to each XML node, which the tunes are. Perfect! Except… there’s no way to easily get a lineage from an arbitrary XML element without holding the entire XML document in memory. There was a solution to that, though: whenever a section reference is encountered, whether a foreign reference or the referent, the path resets to filename | refid for the referent. The refid has to be unique within each file anyway, or the import script will throw an error. Now all that remains is for the counting function to count distinct paths, and hey, we get an accurate count! Shabbat went from having over 770 tunes to 602, simply by virtue of not counting the same tune nodes multiple times. It also means that tune totals don’t add up: Friday Afternoon currently has 48 tunes, all of them from other places, because I haven’t actually collected any tunes for this particular portion of the liturgy yet. Combining them with Kabbalat Shabbat’s 306 tunes would yield… 306 tunes, because all of the ones in Friday Afternoon are also listed under Kabbalat Shabbat. I also added a Total field to gauge overall progress, and…
So close. Maybe tomorrow I’ll break 1000.
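For the curious, the distinct-path counting described above can be sketched roughly as follows. The row layout, the names `count_rows` and `count_tunes`, and the exact shape of the source path (filename | refid | tune) are assumptions made for illustration, not the real NusachDB tables.

```python
SEP = " | "

# Hypothetical rows: (service path, source path). The source path is the
# lineage that resets to "filename | refid" at a reference, so it stays the
# same wherever the tune is borrowed.
ROWS = [
    ("Shabbat | Shacharit", "Common Tunes.xml | Adon Olam | Melody 1"),
    ("Shabbat | Maariv", "Common Tunes.xml | Adon Olam | Melody 1"),
    ("Shabbat | Maariv", "Common Tunes.xml | Adon Olam | Melody 2"),
]

def is_under(service_path, prefix):
    """True if the row belongs to `prefix` or to one of its descendants."""
    return service_path == prefix or service_path.startswith(prefix + SEP)

def count_rows(rows, prefix):
    """Old behaviour: every database row counts as a tune."""
    return sum(1 for service, _source in rows if is_under(service, prefix))

def count_tunes(rows, prefix):
    """New behaviour: count distinct source paths only."""
    return len({source for service, source in rows if is_under(service, prefix)})

print(count_rows(ROWS, "Shabbat"))               # 3
print(count_tunes(ROWS, "Shabbat"))              # 2
print(count_tunes(ROWS, "Shabbat | Shacharit"))  # 1
print(count_tunes(ROWS, "Shabbat | Maariv"))     # 2
```

Shacharit and Maariv report 1 and 2 tunes, yet Shabbat as a whole reports 2 rather than 3, which is the same "totals don’t add up" effect as Friday Afternoon and Kabbalat Shabbat. In MySQL the equivalent would presumably be a `COUNT(DISTINCT ...)` over whatever column holds that path.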