I owe a lot to Mark of the Unicorn’s Professional Composer. Had my dad not encountered this program around 1985 and subsequently adopted it (and its corresponding Mac hardware) for himself two years later, I would not have grown up with Macs, possibly even computers. I certainly wouldn’t have become as familiar with music notation software, let alone music theory, as I am today. My dad tells the origin story thusly:
After Graduate School (1985 or so) I became familiar with the notation program called Professional Composer™. The program was housed in the ISU [Illinois State University] computer lab where it was run on several Macintosh 512 computers. I’ve always been something of a visual learner when it came to things like this and found I could navigate this program rather easily without having to read a manual. I began by doing easy arrangements of trumpet quartets. Early on, these programs ran on 3 1/2″ floppy disks which meant that your files couldn’t be very big before you’d need another disk. This gave way to hard drives but even then you were still limited as to how big your files were or how many files you had on the drive.
I remember asking upper-level administration in District #131 that if I bought the hardware (in this case a Mac Plus), would they buy me the software for this program? They agreed and the rest is pretty much history.
MOTU discontinued Professional Composer (hereafter referred to by the nickname “ProCo,” courtesy of my friend and fellow hacker Josh Juran) sometime after its final 2.3M revision was released in 1990. My dad used this program almost every day for 20 years(!), after which I convinced him to crossgrade to MOTU’s Composer’s Mosaic. The latter offered better MIDI playback and print layout capabilities, and it could import his by then extensive library of ProCo files. However, neither ProCo nor Mosaic files can be fully imported into any modern music notation program. It therefore fell upon me as the family’s computer expert (and continues to even now) to ensure that the hardware powering my dad’s favorite music software continues to run despite all other advancements. As a result, I have an intimate knowledge of the capabilities, requirements, and quirks of these two programs. (I have a lot of sympathy for banks and government institutions tasked with similar mandates.)
By the time ProCo 2.3M came out, hard drives were common and the ProCo application itself had long since outgrown its original home on a 400K (later 800K) boot disk. So 2.3M shipped on an 800K disk offering the option to install or remove itself to or from an attached hard drive, respectively, if launched from the master disk. Installing to the hard drive decrements an “install count” on the master disk, allowing the user to use one master disk to install ProCo to one hard drive at a time. Merely copying the ProCo application to a hard drive isn’t enough; if the application isn’t properly installed by the master disk, launching the program from hard drive prompts the user to insert the master disk if not already present. I remember accidentally wiping out at least one of these hard drive installations from my dad’s Mac as a curious tinkerer in my youth (for which Dad was not pleased), leading Dad to request/beg MOTU for one final backup master disk some time in the mid-90s. It is a testament to the quality of 3.5″ floppies back then along with how well my dad takes care of them that the disks remain usable, some 30+ years later.
Eventually Dad got me my own Mac(s), where I could hack away safely—safe from his expensive software, at least. But as I grew as a hacker and programmer, acquiring and installing various software packages of my own, encountering—and defeating—assorted authentication schemes, going deep down the rabbit hole of the inner workings of Mac OS, and even dipping my toes into the field of software preservation, the music notation program that started it all continued to elude me. Why can’t Disk Copy or DiskDup produce a working substitute for the ProCo master disk? Why is it so difficult to duplicate the master disk using, for example, a KryoFlux? Why can I install ProCo to an emulated HD20 via my Floppy Emu, but not to a mounted Disk Copy disk image? Why does ProCo crash when After Dark kicks in? Why does its installer only appear when launched from the original master disk? And how does ProCo know that it has been properly installed?
I’m determined to finally find out.
ProCo has a minimal About dialog, displaying the name of the software, the version, its copyright years, and credit only to “Mark of the Unicorn, Inc.” I therefore have no real idea who wrote it, despite having asked MOTU via email and Twitter for the source code multiple times over the years. Ultimately my goal here is to develop a conversion utility that brings ProCo files into the 21st century, so I’d even be satisfied with documentation of its file format, but I would be surprised if there is anybody left at MOTU who is even aware of Professional Composer, let alone familiar with a product they haven’t supported in over a quarter century.
So off to the disassembler we go. 😉
A Short Primer on Memory Management and Launching 68K Applications on the Mac
Much of the Mac’s software is split into chunks of data and code called “resources” that can be swapped in and out of RAM as needed. In order to make the most of the scarce memory on the original Macintosh, its designers traded a tiny bit of speed for greater efficiency when building the Memory Manager. When asked to load a resource from disk, the Resource Manager returns a “handle” to the loaded resource, which is a pointer to an OS-controlled “master pointer” pointing to a relocatable block within the heap. The Memory Manager can then be allowed to move or “compact” relocatable blocks in the heap, or even remove (“purge”) such blocks when available RAM is running low. This relieves applications of some of this management burden and is leveraged throughout the Mac System Software. In addition to allocating their own handles and non-relocatable blocks, applications may mark existing handles with various attributes to guide the Memory Manager in its housekeeping; for example, marking a handle “purgeable” allows the Memory Manager to free its associated block, and likewise “locking” a handle prevents it from being moved or freed.
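The double indirection described above can be modeled in a few lines of C. This is a toy sketch only: `ToyHeap`, `toyNewHandle`, and `toyCompact` are hypothetical names invented for illustration, not the real Memory Manager API, and the "heap" holds a single block.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical stand-ins, not the real Memory Manager. */
typedef char   *ToyPtr;
typedef ToyPtr *ToyHandle;      /* a handle is a pointer to a master pointer */

enum { kToyHeapSize = 64 };

typedef struct {
    char   heap[kToyHeapSize];  /* storage that relocatable blocks live in */
    ToyPtr masterPtr;           /* the single "OS-owned" master pointer */
    size_t blockSize;           /* size of the one block in this toy heap */
} ToyHeap;

/* Place a block at the given heap offset and return a handle to it. */
ToyHandle toyNewHandle(ToyHeap *h, size_t offset, const char *data, size_t size)
{
    memcpy(h->heap + offset, data, size);
    h->masterPtr = h->heap + offset;
    h->blockSize = size;
    return &h->masterPtr;
}

/* "Compact" the heap by sliding the block down to offset 0. Only the
   master pointer changes; handles held by the application stay valid. */
void toyCompact(ToyHeap *h)
{
    memmove(h->heap, h->masterPtr, h->blockSize);
    h->masterPtr = h->heap;
}
```

After `toyCompact` runs, dereferencing the same handle yields the block's new address, which is why code that caches the dereferenced pointer instead of the handle has to lock the handle first.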
Each time you double-click on a 68K application to launch it from the Finder, a carefully orchestrated sequence of events takes place:
- Finder calls the _Launch trap with the name of the application.
- The Segment Loader opens the resource fork of the file passed to _Launch and immediately preloads ‘CODE’ resource ID 0. ‘CODE’ 0 is a specially formatted ‘CODE’ resource. It contains the parameters necessary to set up a non-relocatable block of memory near the top of the application’s memory space containing application and QuickDraw globals, any parameters passed to it from the Finder, and the application’s jump table. Following these parameters is the jump table itself: a list of tiny eight-byte routines that each load a ‘CODE’ resource, or “segment,” and jump to an offset within that segment.
- Using the parameters at the start of ‘CODE’ 0, the Segment Loader allocates space for globals pivoting around register A5, which is eventually passed to the application. This is known in Mac programming parlance as the “A5 world” and is unique to each running application.
- The jump table is copied above the Finder’s application parameters in the A5 world and the ‘CODE’ 0 resource is released.
- The first entry in the jump table is executed, and the application takes control.
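To make the eight-byte jump table entries concrete, here is a small decoder sketch. It assumes the classic unloaded-entry layout as I understand it from Inside Macintosh: a 16-bit offset into the segment, then MOVE.W #segment,-(A7) (opcode $3F3C plus the segment ID), then the _LoadSeg trap word ($A9F0). The names `JumpEntry` and `decodeUnloadedEntry` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t offset;     /* routine's offset from the start of its segment */
    uint16_t segmentId;  /* 'CODE' resource ID to be loaded */
} JumpEntry;

/* 68K data is big-endian, so assemble words byte by byte. */
static uint16_t readBE16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode one 8-byte entry in its unloaded state. Returns false if the
   bytes do not match the assumed MOVE.W / _LoadSeg pattern. */
bool decodeUnloadedEntry(const uint8_t raw[8], JumpEntry *out)
{
    if (readBE16(raw + 2) != 0x3F3C) return false;  /* MOVE.W #imm,-(A7) */
    if (readBE16(raw + 6) != 0xA9F0) return false;  /* _LoadSeg */
    out->offset    = readBE16(raw);
    out->segmentId = readBE16(raw + 4);
    return true;
}
```

An entry that fails these checks is either already loaded (the Segment Loader rewrites loaded entries into absolute jumps) or is not a plain jump table entry at all.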
The relative jump instructions of the original 68000 processor are limited to signed 16-bit offsets, so branches or subroutine calls are limited to 32K offsets in either direction from the current program counter. To accommodate programs with more than 32K of code under the memory constraints of the original 128K Macintosh, the Segment Loader was invented to manage applications split into ~32K code “segments.” Code within each segment can make intra-segment jumps (branches or subroutine calls), but once a subroutine is needed outside a particular segment, a call must be made to the jump table, which in turn loads the necessary segment. New segments are returned as handles to relocatable blocks just like any other resource, so as they are loaded the Memory Manager automatically compacts the heap and/or frees purgeable handles to make room in RAM. Recall that the jump table is copied to a known location relative to the A5 register, so applications always have easy access to it. But since new code segments are placed at locations on the application heap unknown at compile time, all code segments must be position-independent, meaning all branches and subroutine calls within them are relative.
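The 32K limit falls straight out of the arithmetic. The sketch below checks whether a target is reachable with a 16-bit displacement, assuming the usual 68000 rule that a branch displacement is measured from the address of the instruction plus two. The function name `fitsWordDisplacement` is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if targetAddr can be reached from a branch at instrAddr using a
   signed 16-bit displacement (measured from instrAddr + 2). */
bool fitsWordDisplacement(uint32_t instrAddr, uint32_t targetAddr)
{
    int64_t disp = (int64_t)targetAddr - ((int64_t)instrAddr + 2);
    return disp >= INT16_MIN && disp <= INT16_MAX;
}
```

A target that fails this test must be reached indirectly, which on the classic Mac means going through the jump table.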
All 68000 processors support branches to absolute addresses utilizing the full usable width of the address bus. The 68020 and later processors support larger relative branch offsets, so segments are not necessarily limited to around 32K. Well-behaved applications check at launch that the host Mac has the necessary capabilities, and exit early if not. But for maximum compatibility, some applications built with compilers such as CodeWarrior are generated with a table of offsets to absolute branch instructions within each code segment. These instructions are compiled as jumps to offsets within the segment relative to zero, which is sure to crash the Mac if executed as stored. But in a small bit of “preflight” code, these absolute branches are fixed up to point within the segment, providing larger branch offsets to all Macs. This is how the ‘rvpr’ 0 resource was compiled for the Pippin.
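The fix-up idea can be sketched generically. The table layout here (a plain array of offsets, each pointing at a 32-bit big-endian operand) is an assumption made for illustration and is not CodeWarrior's actual on-disk format; `relocateSegment` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t readBE32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void writeBE32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Each fixups[i] is the offset, within the segment, of a 32-bit operand
   that was stored as an address relative to the start of the segment.
   Adding the segment's load address turns it into a real absolute
   address. (Hypothetical table layout, for illustration only.) */
void relocateSegment(uint8_t *segment, uint32_t loadAddress,
                     const uint32_t *fixups, size_t fixupCount)
{
    for (size_t i = 0; i < fixupCount; ++i) {
        uint8_t *operand = segment + fixups[i];
        writeBE32(operand, readBE32(operand) + loadAddress);
    }
}
```

For example, a JMP to segment-relative $00000010 in a segment loaded at $00123400 becomes a JMP to $00123410, while the opcode bytes are left untouched.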
The first thing we notice is that ProCo’s jump table contains one valid entry and then… a lot of nonsense. This is a likely sign of an encrypted jump table; Epyx’s Temple of Apshai and Winter Games also use this obfuscation trick to scare off casual hackers. In fact, almost all of ProCo’s ‘CODE’ resources look to be encrypted! If we hope to make any sense out of ProCo’s file format by looking at its code, we’ll need to derive the algorithm that decrypts the rest of this.
MOVE.W #$0029,-(A7) pushes the ID of ‘CODE’ resource 41 onto the stack prior to jumping into it via _LoadSeg. Once there, we start by allocating a couple of memory blocks to use, starting with a 384-byte block of memory that we’ll call the “environment” block. We stash the stack pointer into offset 28 of that block, push a pointer to our environment block onto the stack, then stash the value of the ScrDmpEnb global into offset 32 of our environment block.
ScrDmpEnb is short for “screen dump enable” which originally meant whether the screen shot feature is enabled via Command-Shift-3 on Macs, but grew to include other FKEYs as well. One popular third-party FKEY available to hackers was the “Programmer’s Key,” which drops into the installed debugger when invoking e.g. Command-Shift-7, providing a way to drop to MacsBug without a physical programmer’s switch installed on the side of the machine. But the same functionality could be had by simply writing your own equivalent FKEY, following instructions provided in the official MacsBug manual. MOTU certainly couldn’t have made it easy for most would-be crackers to just conveniently drop into the debugger during the startup process of their precious software, so ScrDmpEnb is set to zero, effectively disabling all FKEYs.
What the FKEY?!
The Apple ][, the Lisa keyboard, the original Macintosh keyboard, and most keyboards that later shipped with 20th-century Macs did not feature what we know as “function keys”: the F1-F12 (and beyond) keys that adorn the top of most keyboards today. These devices more closely mimicked typewriter keyboard layouts that many users were familiar with at the time of their respective introductions. But on the Apple ][, there are a few reserved keyboard shortcuts that are always available to the user: Control-Reset breaks out of the currently running program, and Control-Open Apple-Reset resets the computer, for example.
These shortcuts are hardcoded into the ROM and not easily modifiable by the user. On the original Mac, since localization (in particular, keyboard layouts) fell out of the disk-based System Software’s foundation built on resources, it’s only natural then that shortcuts be handled by the OS in a modular way as well. So Apple made up for the lack of physical function keys by providing several “virtual” function keys bound to Command-Shift-numbers. When invoked, these shortcuts run tiny programs stored as ‘FKEY’ resources in the System file, which is why they are known as “FKEYs.” Programmers quickly discovered that they could write their own tiny FKEY programs and install them into the System file assigned to otherwise unused numbers.
The original set of FKEYs as shipped in 1984 are as follows:
- Command-Shift-1: eject the first/internal floppy disk, if present
- Command-Shift-2: eject the second/external floppy disk, if present
- Command-Shift-3: take a screenshot and save it to disk
- Command-Shift-4: take a screenshot and print it
The first two ejecting FKEYs went away with the introduction of Mac OS X in 2001, as Apple had stopped shipping Macs with floppy drives by then (though macOS continues to support external drives natively). But Command-Shift-3 lives on as the assigned shortcut for saving screenshots to disk—one of the few remaining holdovers from the original 1984 System Software.
We then allocate a handle to a new locked 82-byte “context,” pop the pointer to our environment block into offset 48 of our context, then push our newly-allocated context’s handle onto the stack. Next we pass the pointer to the top of the ‘CODE’ 41 resource we’re executing from to _RecoverHandle to get our ‘CODE’ resource’s handle. We store this handle at offset 0 of our context, then store its master pointer at offset 4. We finally recover our context’s handle from the stack before passing it to the first real “stage.”
Stage 1: Front Line Disassembly
Stage 1 is fairly simple and calls three subroutines before launching into Stage 2, looking roughly like this when decompiled back to pseudo-C code:
initContext(&context);
context.aggregateChecksum = 0;
updateStage2Checksums(&context);
decryptStage2(&context);
jumpTo(context.stagePtrs[1]); // jump to stage 2
So let’s break it down, function by function. We start with initContext, which looks like this:
void initContext(Ptr contextPtr)
{
    for (short i = 0; i < 8; ++i)
    {
        contextPtr->stagePtrs[i] = stageInfoCmd(
            GET_OFFSET,
            nullptr,
            i,
            contextPtr->code41Ptr);
    }
    contextPtr->stageGlobalsPtr = contextPtr->stagePtrs[7];
    contextPtr->code41Size = _GetHandleSize(contextPtr->code41Handle);
    contextPtr->code41End = contextPtr->code41Ptr
        + contextPtr->code41Size - 1;
}
initContext initializes a block of eight pointers to the results of stageInfoCmd, which looks like this:
enum StageInfoCmdSelector
{
    SET_OFFSET = 0,
    GET_OFFSET = 1,
    GET_OFFSETS_ARRAY = 2,
};
Ptr stageInfoCmd(
    StageInfoCmdSelector selector,
    Ptr stagePtr,
    short index,
    Ptr codePtr)
{
    static short offsets[] =
    {
        0x0086, // offset to Stage 1 from top of 'CODE' 41
        0x0388, // offset to Stage 2 from top of 'CODE' 41
        0x0D24, // ?
        0x3528,
        0x3006,
        0x1E66,
        0x49DA,
        0x4C4C,
    };
    Ptr outPtr = nullptr;
    switch (selector)
    {
        case SET_OFFSET:
            offsets[index] = (short)(stagePtr - codePtr);
            break;
        case GET_OFFSET:
            outPtr = codePtr + offsets[index];
            break;
        case GET_OFFSETS_ARRAY:
            outPtr = offsets;
            break;
    }
    return outPtr;
}
So initContext initializes a block of eight pointers in our context to point to eight different areas of 'CODE' resource ID 41. The first of these pointers points to the Stage 1 code we're currently executing, and the second of these pointers points to the encrypted block of 'CODE' 41 immediately following the bits of Stage 1 that are recognizable as executable code. This eventually becomes the Stage 2 code that we jump to later. (It's interesting that stageInfoCmd is only ever called with the GET_OFFSET selector, making the rest of that function dead code.)
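Stripped of the selector machinery, the only path that is actually exercised reduces to a table lookup plus pointer arithmetic. The sketch below is a standalone C restatement using the offsets from the listing; `resolveStagePtrs` and `kStageOffsets` are names I made up, and the real code stores its results in the context block instead of a caller-supplied array.

```c
#include <stdint.h>

enum { kStageCount = 8 };

/* Offsets from the top of 'CODE' 41, copied from the listing above. */
static const uint16_t kStageOffsets[kStageCount] = {
    0x0086, 0x0388, 0x0D24, 0x3528, 0x3006, 0x1E66, 0x49DA, 0x4C4C,
};

/* Fill stagePtrs with codeBase + offset for each of the eight stages. */
void resolveStagePtrs(const uint8_t *codeBase,
                      const uint8_t *stagePtrs[kStageCount])
{
    for (int i = 0; i < kStageCount; ++i) {
        stagePtrs[i] = codeBase + kStageOffsets[i];
    }
}
```

One thing the table makes visible: the distance from the first offset to the second is 0x0388 - 0x0086 = 0x0302 bytes, so the stretch starting at Stage 1's entry point and running up to Stage 2 is only 770 bytes long.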
Next we initialize a field in our context that's used to store an aggregate of all computed checksums. We then call updateStage2Checksums which looks like this:
long updateStage2Checksums(Ptr contextPtr)
{
    contextPtr->latestChecksum = calculateChecksum(
        contextPtr->code41Ptr,
        contextPtr->stagePtrs[1],
        'PACE');
    contextPtr->aggregateChecksum += calculateChecksum(
        contextPtr->stagePtrs[1],
        contextPtr->code41End,
        'PACE') + contextPtr->latestChecksum;
    return contextPtr->latestChecksum;
}
updateStage2Checksums in turn makes a couple of calls to calculateChecksum, passing the first call the block leading up to Stage 2 and the second call the block from the start of Stage 2 through the end of 'CODE' 41.
long calculateChecksum(Ptr startPtr, Ptr endPtr, long seed)
{
    long size = endPtr - startPtr;
    long cksum = seed;
    long sizeInLongs = size / sizeof(long);
    Ptr ptr = startPtr;
    // sum whole longwords first
    if (sizeInLongs != 0)
    {
        size -= sizeInLongs * sizeof(long);
        for (long i = 0; i < sizeInLongs; ++i)
        {
            cksum += *((long*)ptr)++;
        }
    }
    // then add any leftover bytes one at a time
    if (size > 0)
    {
        for (long i = 0; i < size; ++i)
        {
            cksum += *((byte*)ptr)++;
        }
    }
    return cksum;
}
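For anyone wanting to reproduce these checksums on a modern machine, here is a portable C sketch of the same idea. It makes a few assumptions that the pseudo-code does not settle: longwords are read big-endian (as the 68K would), trailing bytes are treated as unsigned, and the block is described by a length instead of an end pointer. `checksumBlock` and `PACE_SEED` are my own names.

```c
#include <stddef.h>
#include <stdint.h>

/* 'PACE' as a big-endian four-character code: 'P' 'A' 'C' 'E'. */
#define PACE_SEED 0x50414345u

/* Sum big-endian 32-bit words, then any trailing bytes, starting from
   seed. Arithmetic wraps modulo 2^32, like a 68K ADD.L. */
uint32_t checksumBlock(const uint8_t *data, size_t size, uint32_t seed)
{
    uint32_t sum = seed;
    size_t longs = size / 4;

    for (size_t i = 0; i < longs; ++i, data += 4) {
        sum += ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16)
             | ((uint32_t)data[2] << 8)  |  (uint32_t)data[3];
    }
    for (size_t i = 0; i < size % 4; ++i) {
        sum += *data++;  /* assumption: trailing bytes are unsigned */
    }
    return sum;
}
```

Because the sum is a plain addition, changing any byte in the block changes the result unless another byte is adjusted to compensate.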
One thing that sticks out immediately to me is the use of the longword 'PACE' as an initial checksum seed. 'PACE' is likely a reference to PACE Anti-Piracy: a company founded in 1985 that's still around today. MOTU adopted PACE's protection code for ProCo starting with version 2.1, released in late 1987. Indeed, pirated versions of ProCo existed as early as 1985; some are still available on the Internet. With 1.0 and 2.0 unencrypted, "sharing" these was much easier than with later versions. My dad acquired his Mac Plus / ProCo combo in the summer of 1987—possibly June of that year—so with 2.1 having a creation date of September 11, 1987 and assuming MOTU shipped the latest version with new orders, it follows that 2.0 is the earliest version my dad owns. MOTU periodically shipped new master disks containing updated versions of ProCo to registered owners as they became available, free of charge—a practice I commend them for. 🙂
Now that we have our checksums, we're ready to head into decryptStage2. The decryption code takes a checksum as the "key": per the listings, this is contextPtr->latestChecksum, computed over the bytes of 'CODE' 41 that precede Stage 2. This makes binary patching the ProCo application an involved process, since changing any checksummed byte changes the key, and the remainder of the code would need to be reencrypted for decryption with its new checksum to succeed. One thing is for sure about this protection: it is designed to be resilient against quick-and-dirty patches.
void decryptStage2(Ptr contextPtr)
{
    decryptStage2Block(
        contextPtr->latestChecksum,
        contextPtr->stagePtrs[1],
        contextPtr->code41End);
}
void decryptStage2Block(long key, Ptr startPtr, Ptr endPtr)
{
    long size = endPtr - startPtr;
    long sizeInLongs = size / sizeof(long);
    Ptr ptr = startPtr;
    if (sizeInLongs-- != 0)
    {
        long longsLeft = sizeInLongs;
        do
        {
            long rotCount = key & 0x0F;
            if (rotCount == 0)
            {
                rotCount = 1;
            }
            key = rotateRight(key, rotCount); // Ror.L in 68K
            *((long*)ptr)++ ^= key;
        }
        while (longsLeft--);
        size -= sizeInLongs * sizeof(long);
    }
    if (--size >= 0)
    {
        long bytesLeft = size;
        do
        {
            long rotCount = key & 0x0F;
            if (rotCount == 0)
            {
                rotCount = 1;
            }
            key = rotateRight(key, rotCount); // Ror.L in 68K
            *((byte*)ptr)++ ^= (byte)key;
        }
        while (bytesLeft--);
    }
}
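The core of this scheme, a key that is rotated right by its own low four bits before each XOR, is easy to restate in portable C. The sketch below is a simplification under stated assumptions: it processes exactly `size` bytes, treats longwords as big-endian, and does not try to reproduce the original's DBRA-style loop counts. `rotateRight32`, `nextKey`, and `xorKeystream` are names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit rotate right, the portable equivalent of ROR.L. */
uint32_t rotateRight32(uint32_t value, unsigned count)
{
    count &= 31;
    return count ? (value >> count) | (value << (32 - count)) : value;
}

/* Advance the key: rotate right by its low nibble, or by 1 if that is 0. */
uint32_t nextKey(uint32_t key)
{
    unsigned rot = key & 0x0F;
    if (rot == 0) {
        rot = 1;
    }
    return rotateRight32(key, rot);
}

/* XOR the buffer in place with the keystream: whole big-endian longwords
   first, then trailing bytes against the key's low byte. */
void xorKeystream(uint8_t *data, size_t size, uint32_t key)
{
    size_t longs = size / 4;

    for (size_t i = 0; i < longs; ++i, data += 4) {
        key = nextKey(key);
        data[0] ^= (uint8_t)(key >> 24);
        data[1] ^= (uint8_t)(key >> 16);
        data[2] ^= (uint8_t)(key >> 8);
        data[3] ^= (uint8_t)key;
    }
    for (size_t i = 0; i < size % 4; ++i) {
        key = nextKey(key);
        *data++ ^= (uint8_t)key;
    }
}
```

Since the keystream depends only on the starting key and not on the data, running `xorKeystream` twice with the same key restores the original bytes. That symmetry means the same routine could be used to re-encrypt a patched block once a new key is known.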
With Stage 2 now fully decrypted, we can jump right in by loading contextPtr->stagePtrs[1] into A0 and JMPing right to it.
That wasn't so hard, was it? 😛
There is still at least another stage of this protection to get through, and we're hardly that much closer to decrypting the remaining 'CODE' resources or the jump table. Knowing PACE's reputation, this is likely a small victory in what will ultimately be a long battle. ProCo is legendary in some Mac circles for its copy protection, so if what I've heard of it is true, then I'm surely in for a ride. PACE even makes a bold claim on their own website:
We know it sounds like an unrealistic boast to say our anti piracy software cannot be cracked. Our goal is to stay ahead of the curve and hacking trends. We avoid giving known hooks or patterns that they recognize, and we pepper our anti piracy solutions with methods that we know are time consuming and difficult, if not impossible, to remove.
Challenge accepted. 😉