Pippin Kickstart 1.1

I closed out 2020 by piecing together a minor update to Pippin Kickstart (then spent most of January writing this blog post 😛 ). Version 1.1 patches the SCSI Manager on Pippins with ROM 1.0 (most models with a white case) so that they too may boot from SCSI devices other than the internal CD-ROM drive, such as an external hard drive. If you have a 1.2 or 1.3 Pippin and are happy with Pippin Kickstart 1.0.x, then 1.1 adds no new functionality other than some cute graphics (see below).

Download it here: pippin-kickstart-1.1.zip. Extract and burn pippin-kickstart-1.1.iso to a CD-R using the software of your choice.

Source code is available here, licensed under the GPLv2: https://bitbucket.org/blitter/pippin-kickstart

The Pippin was Bandai and Apple’s ill-fated collaborative attempt to break into the video game console market by marrying two things I love: Macs and video games. Bandai launched the Pippin in 1996 amid fierce competition from other fifth-generation consoles like the Panasonic 3DO, Sega Saturn, Sony PlayStation, and Nintendo 64. Based on Macintosh technology, the Pippin is capable of running Mac software and vice versa, but Apple built some software-based security into the Pippin’s boot process, making it difficult to use the Pippin just like any other Mac. In part because of its high price and lack of developer support (both internally and externally), the Pippin was considered a commercial failure and Apple subsequently canceled the project in early 1997, with Bandai following shortly after in 1998.

I grew up playing games on Macs through the 90s and early 2000s, so I’ve always had a soft spot for the classic Mac OS. I learned how to program on a Mac and nurtured those coding skills over several years, which I later parlayed into a modest career in video game development. All the while I noticed that while other vintage consoles were getting renewed attention due to burgeoning homebrew developer scenes of their own, the poor Pippin was being left out in the cold. By the late 2010s I figured that since nobody had paid much attention to Apple’s foray into video games, then I may as well, especially given my nostalgia for classic Mac gaming. So I cracked its signing key in May 2019 and shortly thereafter released a boot disc called Pippin Kickstart that made it easier for folks to test their own Pippin CD-ROMs.

When I released Pippin Kickstart 1.0.1 at the beginning of July 2019, I thought it was a done deal. Owners of 1.2 and 1.3 Pippins could boot from any SCSI device they wanted, and 1.0 Pippin owners could boot from any CD-ROM they wanted, but nothing else, owing to ROM 1.0’s lack of support for other SCSI devices. Unsigned booting was finally a reality on the Pippin, and I could rest easy not having to worry about this little project anymore.

That is, until September 2020, when @LuigiThirty on Twitter picked up a 1.0 Pippin and wrote about this obstacle she encountered while working on some homebrew:

Not only was her Pippin refusing to boot from a known-good external SCSI hard drive, but the drive wouldn’t work with her Pippin at all. Hard drives are the earliest and most basic SCSI storage devices available for Macs, but even when unformatted or without drivers installed, formatting utilities should still recognize that a SCSI device is attached and available for use. Perhaps the 1.0 Pippin’s ignorance of external SCSI devices had less to do with driver support in ROM and more to do with artificial limitations. As a hacker, I find that artificial limitations offend my sensibilities, so I revisited Pippin Kickstart with an aim to do something about this.

The Problem

The consumer model of the Bandai Pippin is designed to boot exclusively from its internal CD-ROM drive. In late 1996, a revised ROM, version 1.2, was offered as an upgrade allowing the use of external SCSI devices, most notably the Deltis 230 MO Docking Turbo, which provides a magneto-optical drive “docked” to the underside of the Pippin. A developer dongle could be attached to Pippins equipped with ROM 1.2 to enable booting from external SCSI devices. But the earlier ROM 1.0 completely ignores all SCSI devices (and dongles) other than the internal CD-ROM, precluding altogether the use (not to mention booting) of external SCSI volumes. Signs point to the ROM’s low-level SCSI Manager as the culprit: while mandatory initial booting from CD-ROM is standard across all retail Pippins, only ROM 1.0 refuses to see additional SCSI devices after the Pippin has fully booted.

However, the “GM Flash” ROM supplied with developer and test Pippin kits is nearly identical to the final 1.0 ROM, save for a few minor changes to enable debugging. Indeed, the GM Flash ROM has an Open Firmware timestamp of 1996-01-28, while the 1.0 ROM has a corresponding timestamp of 1996-01-29—just one day apart, suggesting that the two ROM versions were built from the same codebase. Among the differences, the GM Flash ROM can enumerate all SCSI devices at all times and may attempt to boot from any of them.

The Solution

All known Pippin ROMs load their SCSI Manager code from a ‘nitt’ resource located in ROM (SCSI Manager 4.3 was codenamed “Cousin Itt”). Upon closer examination it appears that the ‘nitt’ resources with ID 43 (hereafter referred to as “‘nitt’ 43”) in the GM Flash and 1.0 ROMs, where the SCSI Manager code is stored, are the exact same size and, aside from timestamps, differ by only nine bytes. These nine bytes make up code that checks a SCSI device’s ID to determine whether or not it should be considered. In the GM Flash version, this code verifies that the ID is between 0 and 7 inclusive (all legal SCSI IDs), whereas in ROM 1.0, this code only passes devices with an ID of 3, the internal CD-ROM drive. Given that the GM Flash and 1.0 ROMs are so closely related, it’s reasonable to hypothesize that the 1.0 ROM can use a SCSI Manager from the GM Flash ROM. Replacing ROM 1.0’s ‘nitt’ 43 with the GM Flash version should therefore be the most straightforward fix. Barring that, patching those nine bytes to match those of the GM Flash ROM should be sufficient to make ROM 1.0’s SCSI Manager functionally equivalent to that of the GM Flash ROM.

Finding The Problem

Pippin Kickstart was first conceptualized as a custom SCSI CD-ROM driver. My thinking was, since Macs automatically load device drivers from the first few partitions of an Apple-formatted disk, why would the Pippin behave differently? The longer story is written in my blog post about that, but the short answer is that the Pippin simply ignores patch and driver partitions. The Pippin has its own .AppleCD driver in ROM, which has to be loaded and active before it can boot, well, anything. Perhaps the thinking at Apple was, “since .AppleCD has to be working to boot the Pippin into an OS, there’s no sense in patching it that early. Let the OS do that if need be.” Since the signing process is only applied to the boot volume and no other partitions, maybe it was a conscious effort to block patches and custom drivers from executing their own unsigned code early enough to work around the Pippin’s security. Whatever the reason, I discovered quickly that Pippin Kickstart’s payload couldn’t sneak in through a back door.

Ultimately, I reverse-engineered the signing keys and implemented Pippin Kickstart as a simple bootloader, signed using Apple’s private key so that it launches on any retail Pippin without resorting to any sneaky tricks. My own code implements its own boot candidate search loop, mimicking the loop in the Pippin ROM’s Start Manager (in fact calling some support functions in ROM as necessary) but omitting an authentication and driver check. I implemented my own loop because the ROM’s loop, being read-only, can’t be patched in-place. But since the ROM’s search loop is part of the early startup code, frozen in ROM rather than loaded as a resource, the locations of code in ROM needed by Pippin Kickstart are always at known, fixed addresses. They never change, so I can hardcode them directly into Pippin Kickstart’s logic.

🎵 Call me (call me) on the line / Call me, call me any, anytime 🎵

It’s a different story with the SCSI Manager. The SCSI Manager is a low-level library used to enumerate and wrangle access to any and all SCSI devices attached to a Mac or Mac-based system like the Pippin. If you want to get into the party where the SCSI devices are, the SCSI Manager is both the bouncer and the emcee. In order to get to the point where any user-provided code—including Pippin Kickstart—can be loaded at all, it has to be read from a drive, a process which the ROM starts by first querying the SCSI Manager for where and how it can ask a drive for anything. The SCSI Manager therefore has to be loaded and active before the Start Manager even checks to see if it can boot from anything. Furthermore, as I point out above, the 1.0 ROM’s SCSI Manager appears to reject external devices even after we’re done booting, so the SCSI Manager has to persist as long as it may be in use.

The Pippin’s earliest boot code in ROM—from the time it powers on—executes natively on the PowerPC, configuring some low-level hardware and initializing a 68K emulator. But soon after that it enters the emulator to launch the boot code located in the Toolbox, which is predominantly written in 68K assembly. In fact most of the Pippin’s ROM targets the 68K instruction set architecture (or “ISA”), but some portions target the Pippin’s native PowerPC ISA. To understand why this startup code (and, by extension, Pippin Kickstart) runs in emulation rather than natively, we need to look back in time to when the Pippin’s software was being designed.

In 1994, Apple released their first PowerPC-based Mac. The Power Macintosh 6100/60 sports a PowerPC 601 processor running at 60 MHz and shipped with System 7.1.2. The development of the 6100 is a very interesting tale, with some parallels to how the most recent Apple Silicon-based Macs came to be, particularly when it comes to emulation. The Mac had a relatively rich library of both first-party and third-party software, a relatively mature OS that was going to be ten years old by the Power Macs’ release, and a developer community that was used to working with the Mac and how to flex its muscles. Throwing that all away and starting up again from scratch—especially with Windows 95’s release on the horizon—would not have made the best business sense given the short period of time in which Apple had to make the transition. Thus the decision was made to use emulation to run the Mac’s existing software library—including most of its operating system—on the new PowerPC-based machines, with the idea that modules of the OS could be replaced piece by piece over time rather than all at once. In turn, existing software and development knowhow would continue to retain its value, and developers were under less pressure to produce PowerPC-native versions of their software right away. This strategy paid off in a big way; the new Power Macs were a hit, and soon became the basis for the Pippin platform.

The Power Macintosh 6100/66av, a.k.a. the Pippin prototype prototype

Due mostly to time pressure, only the most-often used portions of the Mac’s System Software were rewritten as native PowerPC code for the initial lineup of Power Macs. Namely, QuickDraw was given the native treatment, while components that still had many 68K dependencies—such as the SCSI Manager and disk drivers that had heretofore been written to work with it (targeting the 68K, mind you)—were left emulated, for the time being anyway. It didn’t take long for Apple to let the SCSI Manager into the native club. Just 15 months after the first Power Macs hit the market, the Power Macintosh 9500 arrived on the scene in June 1995, utilizing the PCI standard as the first of the “second-generation” Power Macs and featuring a native SCSI Manager 4.3 built into its ROM. The new PCI-based Power Macs—particularly the Power Mac 7500—lent many of their features and specifications to what eventually became the final hardware and software design of the Pippin.

The opportunity to transition to a new processor architecture in turn gave Apple the opportunity to learn from the effects of previous design decisions and implement more modern corresponding changes. The original Macintosh operating system was designed in 1983 to run one application at a time on a computer with no built-in storage, 128 kilobytes of RAM, no virtual memory, and a 68000 processor which was limited to relative branches up to a range of 32 kilobytes in either direction. User-provided 68K code must be loaded into RAM before it can be executed; Macintosh engineers invented the Segment Loader as a sort of virtual memory to allow larger applications to be broken into code resources swapped in and out 32K at a time. Enterprising hackers later figured out how to work around this to get much larger segment sizes, but segments still have to be loaded into RAM regardless. Generally, all the code associated with a 68K application has to live in that application. “Dynamic” or “shared” code libraries were not explicitly supported by the operating system, so the best one could hope for were system extensions/patches offering either new official system APIs, or third-party de facto standard interfaces informally agreed upon by multiple applications. This was not the most stable environment in which to develop scalable software.

The original Macintosh, a.k.a. the Pippin’s granddaddy

By contrast, the upcoming Power Macs would feature cooperative multitasking with System 7, paged memory support, an internal hard drive, multiple megabytes of RAM, and a completely different ISA which could support much larger branch sizes. Apple had almost ten years of Macintosh development experience by the time they began designing software for the upcoming Power Macs, and had taken lessons to heart in terms of how to improve their operating system to better scale to modern software development practices. The new PowerPC code that would run natively on the improved operating system would not use the old-style Segment Loader. Instead, code blocks on the hard drive could be mapped and executed directly as pages in memory without having to copy them to physical RAM. Self-contained blocks of PowerPC code, known as “fragments,” are defined in terms of multiple “sections”—both code and data—and could be paged/loaded into memory once and then optionally shared with multiple applications, either as application plugins or standalone libraries in their own right. Each fragment could have its own block of globals and constants loaded into RAM for it to use, with their initial values specified in the fragment’s definition. The standard interface in the Mac OS to fetch these fragments and their entry point(s) at runtime became known as the Code Fragment Manager, or “CFM.”

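There are three ways to load a fragment, and the CFM provides a function for each one:

  • GetSharedLibrary: locate a shared library by name and prepare it.
  • GetDiskFragment: prepare a fragment stored in the data fork of a file on disk.
  • GetMemFragment: prepare a fragment that is already present in memory.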

68K application code lives by and large in that application’s resource fork, leaving the data fork typically empty; one or more ‘CODE’ resources are swapped in and out at runtime by the Segment Loader, using a jump table in ‘CODE’ 0 as an initial reference point. To support “fat” binaries that can run on 68K Macs but also natively on Power Macs, a PowerPC-aware application stores its code in the otherwise unused data fork of an application, using a ‘cfrg’ resource as an initial reference point. When booting from ROM, there is no concept of “resource forks” or “data forks” until Mac filesystem code is loaded, but the Resource Manager doesn’t require resources to live in a file to be located and used. There exists a provision to load resources from a resource map located in ROM instead, and this is how the native SCSI Manager and other modular components are loaded despite the rest of ROM running in emulation. GetMemFragment is therefore the function used by the ROM; the SCSI Manager isn’t a shared library (at least not by the CFM’s definition), and we obviously can’t call GetDiskFragment until we have the ability to read from disk. During startup, the ROM asks the Resource Manager for a handle to the ‘nitt’ 43 resource in ROM, and then that resource is passed to GetMemFragment to prepare the fragment contained in that resource.

Comparing the ‘nitt’ 43 resources of the GM Flash and 1.0 ROMs reveals that, aside from timestamps in the fragments’ respective headers, the two resources differ by only nine bytes in four places. Curiously, three of those nine bytes are replaced by the value 3 in the 1.0 ROM, where they originally have the value 7 in the GM Flash version. The SCSI ID 7 is the upper bound of what IDs may be assigned to devices (7 is always reserved for the host in Apple’s implementation), whereas ID 3 is that of the internal CD-ROM drive. I suspected this might have something to do with why only the CD-ROM drive is recognized by the 1.0 ROM, but without knowing what the other changed bytes correspond to, I couldn’t know for sure. I’d have to run the two versions through a disassembler.

Three of the four significant differences. Left: GM Flash. Right: 1.0
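
A comparison like this is easy to automate. Here is a minimal C sketch, assuming both resources have already been dumped into equal-sized byte buffers; the function name `diff_runs` and its interface are made up for illustration and are not part of Pippin Kickstart. It counts the differing bytes and the contiguous runs ("places") where they occur. A real run over the two ‘nitt’ 43 dumps would also flag the header timestamps, which would need to be ignored by hand.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: compare two equal-sized buffers, print each
 * contiguous run of differing bytes, and report the totals. */
static void diff_runs(const unsigned char *a, const unsigned char *b,
                      size_t len, size_t *bytes, size_t *runs)
{
    size_t i = 0;

    *bytes = 0;
    *runs = 0;
    while (i < len) {
        if (a[i] == b[i]) {
            i++;
            continue;
        }
        size_t start = i;               /* first differing byte of this run */
        while (i < len && a[i] != b[i])
            i++;
        printf("differs at offset 0x%zx for %zu byte(s)\n", start, i - start);
        *bytes += i - start;
        (*runs)++;
    }
}
```

Feeding the reported offsets back into a disassembly listing is then a matter of matching each offset to the instruction that contains it.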

Finding a sufficient PowerPC disassembler was somewhat of an adventure in itself, but I wound up circling right back to something that I had long since installed on my G3 Power Mac: Apple’s own Macintosh Programmer’s Workshop, or “MPW.” I didn’t know this, but MPW comes with a tool called DumpPEF specifically designed to tear apart fragment containers for analysis. The best part: it includes a very good PowerPC disassembler. All I had to do in MPW Shell was pass along the contents of ‘nitt’ 43 as a data-fork-only file and redirect DumpPEF’s output to a text file, like so:

DumpPEF -do All -ldr All -pi u -dialect PPC601 -fmt on -v "Arthur:Development:Projects:Pippin Kickstart:1.1:nitt43" > "Arthur:Development:Projects:Pippin Kickstart:1.1:nitt43dump.txt"
My boot drive is named Blackwood, after the protagonist in the Journeyman Project games, and my working/data drive is named Arthur, after his wise-cracking sidekick.

I know 68K assembly much better than I do PowerPC, but teaching myself just enough PowerPC to understand what goes on in the SCSI Manager was not as bad as I thought it might be. Matching up the offsets where bytes differ to their corresponding locations in the disassembly, I quickly confirmed my suspicions. If Apple’s SCSI Manager 4.3 Reference guide (and the short amount of time it took Apple to build a native version) is any indication, the SCSI Manager itself was originally written in C. According to Apple’s own documentation, then, one of the common structures used by the SCSI Manager is called a “DeviceIdent,” containing among other things the “targetID” of a particular SCSI device.

If we look at the last two differences as one code “site,” so that the differences map to changes at three code sites in total, then in the GM Flash ROM, the SCSI Manager appears to be doing the equivalent of this C code at all three sites:

if (devIdent.targetID > 7)

where target ID 7 is the upper bound of what IDs are legally allowed as mentioned earlier. Translated into English, when a request for a SCSI action comes in, the SCSI Manager looks to see if the intended device’s ID is out of range, and if so it refuses the request.

Contrast that with this C code, equivalent to what the 1.0 ROM does at each code site:

if (devIdent.targetID != 3)

Notice the difference? “If a device’s ID is not 3, refuse the request.” This is clearly why no other SCSI devices are recognized by the 1.0 ROM; the low-level code responsible for routing SCSI requests flat out refuses to do so unless it’s to or from a device with ID 3. I had figured out where the problem is, and fortunately, Apple already showed me how to fix it by way of how the GM Flash ROM behaves. 🙂 The next logical step then was to develop a patch.
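
To make the contrast concrete, here is a small C sketch of the two checks side by side. The `DeviceIdent` layout follows Apple’s SCSI Manager 4.3 documentation as best I can tell; the two function names are invented for illustration, and the real code is PowerPC machine code in ROM rather than anything resembling these helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout as described in the SCSI Manager 4.3 documentation. */
typedef struct {
    uint8_t diReserved;
    uint8_t bus;
    uint8_t targetID;
    uint8_t LUN;
} DeviceIdent;

/* Hypothetical stand-in for the GM Flash check: refuse only IDs that
 * are out of the legal 0..7 range. */
static bool gm_flash_rejects(DeviceIdent devIdent)
{
    return devIdent.targetID > 7;
}

/* Hypothetical stand-in for the ROM 1.0 check: refuse everything
 * except ID 3, the internal CD-ROM drive. */
static bool rom10_rejects(DeviceIdent devIdent)
{
    return devIdent.targetID != 3;
}
```

Running every legal ID through both functions shows the GM Flash version refusing none of them, while the 1.0 version refuses seven out of eight.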

Implementing The Solution

The most obvious way to patch the SCSI Manager would be to burn a patched 1.0 ROM. In theory it’d be easy: the ‘nitt’ 43 resource is the same size in both the GM Flash ROM and the 1.0 ROM. From a content perspective it’d literally just be a copy/paste job, but I’m primarily a software guy, and I’d rather lose just my time debugging software than lose my time and money feebly trying to make and support a reliable physical tool all while out of my wheelhouse. Acquiring, programming, and installing a custom Pippin ROM board can not only be intimidating to a casual collector/homebrewer (including yours truly), but also significantly more expensive (and legally questionable) than burning Pippin Kickstart to a CD and running it on stock hardware. Besides, if I were to burn a new ROM anyway, why would I stick with 1.0 when I could use the much more fully-featured version 1.3 instead? Pippin Kickstart is a free, open, and purely software-only utility, so I think it’s worth trying to patch in software. The fix should only cost the price of a blank CD-R. 🙂

When GetMemFragment is called to prepare the native SCSI Manager fragment in ROM, no code is copied or moved around in memory. The ‘nitt’ 43 resource stays right where it is and the SCSI Manager is executed directly from its home in ROM. How then does one patch this read-only code in software? Is it even possible?

Writing into read-only memory is out of the question for reasons that should be obvious. What about replacing the SCSI Manager with my own implementation? In order to cleanly install my own replacement, I would have to shut down and clean up the existing ROM-based SCSI Manager so as to make sure no remnants remain. Is this possible? I don’t know. The SCSI Manager is designed to remain permanently resident, so while I know a SCSI Manager system extension exists for older 68K Macs, I don’t know how or when it installs itself and furthermore, I couldn’t find a mechanism by which the PowerPC-based Pippin could accomplish the same task at boot time. So that’s out.

Hang on. If, hypothetically, the CFM loads a fragment from ROM that depends on a fragment that’s loaded from RAM or from disk, how does the ROM fragment know how—and where—to call into those dependencies? What about non-ROM fragments that in turn depend on ROM fragments and other non-ROM fragments alike? It would make sense for the CFM to keep track of these inter- (and, as we’ll see, intra-) fragment locations in a unified way.

Each loaded fragment has at least one associated “data” section allocated in RAM. This section may contain globals or other statically-initialized data referenced by the fragment, but the data section also contains a special area called (by IBM) the “Table of Contents” or “TOC,” though that’s a bit of a misnomer. Apple says that the TOC is more like an address book, acting as a lookup table for functions and data living both within a fragment and outside that fragment. Each fragment has its own TOC, so before a routine in another fragment is called, PowerPC register GPR2—otherwise known as the “RTOC”—is saved, then preloaded with the bottom of the destination fragment’s TOC so that the fragment knows how to find its own globals and data. The calling fragment’s RTOC is restored upon return of the called routine.

A fragment’s code must be position-independent; that is, it should be able to be loaded into and run from any address. Therefore, a fragment’s code section does not usually reference hardcoded memory locations. Instead, it looks up each address it needs at runtime, reading that address from its TOC in RAM by using the RTOC register plus a known offset as an index. The CFM is responsible for preparing and maintaining these addresses in the TOC at fragment load time, and at any time the loaded fragment or any of its dependencies have to be relocated in memory.

When pointing to data, a TOC entry is just a pointer to that raw data. But when pointing to a routine, because that routine could be exported from a fragment (someone can call us) or imported from another fragment (we’re calling someone else), it needs to know at minimum where its TOC lives in RAM, so a simple raw pointer to the routine is not enough. Enter the “transition vector.” A transition vector is very simple: it contains at least two pointers, the first being the address of the routine within the fragment, and the second being the address of a fragment’s context. In most cases, a fragment’s TOC provides enough context, so the second pointer is used to prepare RTOC immediately prior to entering the routine. A transition vector may optionally contain other fields at the discretion of the compiler and environment used to build the fragment; the only expectation is that it contains at least the first two.
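
The calling convention described above can be modeled in a few lines of C. This is only a sketch under loud assumptions: a plain global named `g_rtoc` stands in for the GPR2 register, and `TVector`, `FragmentData`, `add_bias`, and `call_via_tvector` are names invented here. On real hardware the save, load, and restore of RTOC are done by compiler-generated glue, not by a helper function.

```c
/* A two-pointer transition vector: where the routine lives, and the
 * context (TOC) it expects to find in RTOC when it runs. */
typedef struct {
    int (*code)(int arg);
    void *toc;
} TVector;

/* Stand-in for PowerPC register GPR2, a.k.a. RTOC (an assumption of
 * this model; C has no way to name the real register). */
static void *g_rtoc;

/* A fragment's globals, reachable only through RTOC. */
typedef struct {
    int bias;
} FragmentData;

/* A routine that finds its own globals via RTOC. */
static int add_bias(int x)
{
    return x + ((FragmentData *)g_rtoc)->bias;
}

/* Cross-fragment call: save the caller's RTOC, load the callee's,
 * run the routine, then restore the caller's RTOC. */
static int call_via_tvector(const TVector *tv, int arg)
{
    void *saved = g_rtoc;
    int result;

    g_rtoc = tv->toc;
    result = tv->code(arg);
    g_rtoc = saved;
    return result;
}
```

The point of the model is that the routine never needs to know where its data lives; whoever calls it supplies that through the second pointer.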

A strange environment

Transition vectors must contain at least two pointers—one to the routine itself and one to its context—but the SCSI Manager’s transition vectors each contain three pointers. The third pointer is unused, but is designated as an “environment” pointer. Early PowerPC Mac development was done on IBM RS/6000 workstations, so this vestigial “environment” pointer may have come from that early toolchain.

If you’ve been paying attention so far, you might be able to figure out where this is going. The SCSI Manager’s transition vectors all point to code in ROM, because ‘nitt’ 43 itself is not loaded into RAM and there’s no problem executing its code directly from its home in ROM. But the transition vectors themselves live in RAM, which means they can be changed. Patching the necessary transition vectors in RAM is tantamount to patching the routines that the SCSI Manager itself exports to be called by the OS. So naturally, the next question is, how do we find those transition vectors?
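
As a rough illustration of the idea (not the actual Pippin Kickstart patch), here is a C sketch in which a three-pointer transition vector sits in writable memory while the routine it points to is treated as untouchable. All names here are hypothetical: `rom_accepts` plays the part of the ROM 1.0 code, and `patched_accepts` plays the part of whatever replacement routine ends up in RAM.

```c
#include <stdbool.h>

/* Three-pointer transition vector, as the SCSI Manager's appear to be. */
typedef struct {
    bool (*code)(unsigned targetID);  /* routine address */
    void *toc;                        /* context for RTOC */
    void *env;                        /* unused "environment" pointer */
} TVector3;

/* Pretend this lives in ROM and cannot be modified. */
static bool rom_accepts(unsigned targetID)
{
    return targetID == 3;
}

/* Pretend this is a corrected copy living in RAM. */
static bool patched_accepts(unsigned targetID)
{
    return targetID <= 7;
}

/* Redirect callers by rewriting only the code pointer; the context
 * and environment pointers are left exactly as the CFM set them. */
static void patch_tvector(TVector3 *tv, bool (*replacement)(unsigned))
{
    tv->code = replacement;
}
```

Every caller that goes through the vector picks up the replacement on its next call, and the original routine is never written to.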

Calling GetSharedLibrary, GetDiskFragment, or GetMemFragment prepares a fragment (if found) and returns a “connection ID” to that fragment. Each time an interface is established to a particular loaded fragment, it’s called a “connection” to that fragment. Connections are reference counted and when all connections to a particular fragment have been closed, that fragment is unloaded. All three of the “GetFragment” APIs create a new connection and each takes a parameter called findFlags that can equal one of three values:

  • kLoadLib: load a fragment if it’s found and not yet loaded. If it is loaded, create a new connection to the already-loaded fragment.
  • kFindLib: find a loaded fragment. If it is loaded, create a new connection to the already-loaded fragment. If not, return fragLibNotFound.
  • kLoadNewCopy: load a fragment if it’s found and not yet loaded. If it is loaded, create a new connection to the already-loaded fragment but also create a new data section specifically for this connection.
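
The documented behavior of the three flags can be captured in a toy model. The sketch below is not the CFM: `ToyFragment`, `toy_get_fragment`, and `toy_close_connection` are invented names, and the model only tracks counts. The flag values (1, 2, 5) and the fragLibNotFound code (-2804) are the ones I believe Apple’s CodeFragments.h and error headers define, so treat them as assumptions too.

```c
/* Flag and error values as I understand Apple's headers to define them. */
enum { kLoadLib = 1, kFindLib = 2, kLoadNewCopy = 5 };
enum { noErr = 0, fragLibNotFound = -2804 };

/* Toy bookkeeping for a single fragment. */
typedef struct {
    int loaded;        /* nonzero while the fragment is prepared */
    int connections;   /* open connections (the reference count) */
    int dataSections;  /* instances of the fragment's data section */
} ToyFragment;

/* Model of the documented findFlags semantics. */
static int toy_get_fragment(ToyFragment *f, int findFlags)
{
    if (!f->loaded) {
        if (findFlags == kFindLib)
            return fragLibNotFound;   /* kFindLib never loads anything */
        f->loaded = 1;
        f->dataSections = 1;
    } else if (findFlags == kLoadNewCopy) {
        f->dataSections++;            /* private data for this connection */
    }
    f->connections++;
    return noErr;
}

/* Closing the last connection unloads the fragment. */
static void toy_close_connection(ToyFragment *f)
{
    if (f->connections > 0 && --f->connections == 0) {
        f->loaded = 0;
        f->dataSections = 0;
    }
}
```

In this model, a fragment whose first connection is never closed can never be unloaded, no matter how many later connections come and go.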

The CFM provides the FindSymbol API for locating a symbol in a fragment by name, given a connection ID. After preparing the SCSI Manager’s native fragment, the ROM calls FindSymbol to find the transition vector for the SCSI Manager’s “InitItt” entry point, then calls it to begin executing the native SCSI Manager’s code.

Hmm. In the SCSI Manager’s case, a connection is created at startup, but it is never closed at any time thereafter, ensuring that the SCSI Manager is never unloaded. Could a new connection be established to the SCSI Manager / ‘nitt’ 43 fragment-as-resource, then a known symbol—perhaps its entry point—be used as a reference to poke around in the rest of the fragment’s data section, including its transition vectors?

This seemed like the most “polite” way of getting at the SCSI Manager’s data section, so I tried this first.

 move.w  #0xFFFF, (RomMapInsert)
 subq.l  #6, %a7        /* make room for GetResource's return handle (4 bytes) */
                        /*  and GetMemFragment's return value (2 bytes) */
 move.l  #nittRsrcType, -(%a7)
 move.w  #43, -(%a7)
 _GetResource           /* leaves the handle to 'nitt' 43 on the stack */
 movea.l (%a7)+, %a4

 move.l  (%a4), -(%a7)  /* Ptr                  memAddr */
 subq.l  #4, %a7        /* make room for SizeRsrc's value in length */
                        /*  (4 bytes) */
 move.l  %a4, -(%a7)
 _SizeRsrc              /* long                 length */
 clr.l  -(%a7)          /* Str63                fragName */
 pea    1               /* kLoadLib */
 pea    0x1A(%a7)       /* ConnectionID*        connID */
 clr.l  -(%a7)          /* Ptr*                 mainAddr */
 clr.l  -(%a7)          /* Str255               errName */
 move.w #3, -(%a7)      /* GetMemFragment */
 _CodeFragmentDispatch  /* we'll assume it succeeds... */
-2817, better known by its other name fragLibConnErr

Would that it were so simple. The use of the findFlags parameter is documented for GetSharedLibrary and GetDiskFragment, but the documentation for GetMemFragment just refers to the documentation for GetDiskFragment for how findFlags is used. Despite Apple’s redirection, passing kLoadLib to GetMemFragment will not create a new connection to an already-loaded fragment at an address previously provided to the CFM. There would be no going in through the front door. Damn. I’d have to sneak in through another way.

The classic Mac OS memory map is split into several areas. There are low-level system globals near the bottom of the address space, a system heap for use by the OS above that, and the remaining RAM is made up of one or more fixed-size application heaps (and stacks) belonging to running applications. Originally, the system heap was fixed in size, with the remainder of usable RAM reserved for the application heap and stack. Double-clicking an application from the Finder would close down the Finder and the newly-launched application would take its place in the application heap. The reverse would occur when the application closed down, relaunching the Finder in its stead. Because this original design was built around running one application at a time, some technical gymnastics were later needed to add multitasking to the system while maintaining backward (and some level of future) compatibility with Mac apps. The original Memory Manager APIs and low-memory globals were therefore left mostly unchanged, including a well-known global containing the base address of the system heap.

As mentioned earlier, the SCSI Manager is designed to remain permanently resident, so it makes sense that its fragment’s data section lives in the system heap. Opening and closing applications has no effect on the existence of the SCSI Manager. Indeed, launching applications often involves the SCSI Manager fetching those very applications from disk; the SCSI Manager is a core component of loading code on the Pippin. Furthermore, since the Pippin needs to know at all times how to load additional code and data from disk, those exported transition vectors need to be at fixed locations in memory in a nonrelocatable block of RAM. The system heap is one of the many low-level Memory Manager structures documented early on by Apple, so we can find the SCSI Manager’s data section in the system heap by doing a brute-force linear search.

Reading the SysZone system global gives us the address of the beginning of the system heap. The system heap “zone” begins with a zone header block for various bookkeeping tasks like keeping track of its size, flags, which blocks are free, and other internal uses. Immediately following the zone header are the contents of the heap itself. Likewise, each block in a classic Mac OS heap starts with a block header, describing among other things its size in memory.

Immediately following the block header are the block’s contents. By adding each block’s size to its respective header’s address, we can step through each block of the system heap, inspecting each block’s contents along the way.

 movea.l SysHeap, %a0
 lea     heapData(%a0), %a1  /* A1 -> allocated block in system heap */
 cmp.l   bkLim(%a0), %a1 /* bkLim(A0) -> system heap trailer block */
 beq.w   SkipPatching    /* if this block is the trailer, we've searched */
                         /*  the entire system heap and couldn't find the */
                         /*  SCSI Manager's pidata section */

 movea.l %a1, %a4
 move.l  blkSize(%a1), %d0   /* D0 == physical size of this block */
 add.l   %d0, %a1            /* A1 -> the next block in case we skip */
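
The stepping logic above can be modeled in a few lines of Python. This is only a sketch over a toy heap, not the real Memory Manager layout: it assumes each block begins with a 4-byte big-endian physical size (header included), and that the caller already knows where the first block and the trailer are. The names walk_blocks and HEADER_SIZE and the sample bytes are made up for illustration.

```python
import struct

HEADER_SIZE = 4  # assumption: toy header holding only the block's physical size


def walk_blocks(heap, first, limit):
    """Yield (offset, size, contents) for each block from `first` up to `limit`."""
    offset = first
    while offset < limit:
        (size,) = struct.unpack_from(">I", heap, offset)
        if size < HEADER_SIZE:
            raise ValueError("corrupt block size at offset %d" % offset)
        yield offset, size, heap[offset + HEADER_SIZE:offset + size]
        offset += size  # header address + physical size -> next block


# Toy heap: an 8-byte block and a 12-byte block; `limit` plays the trailer's role.
heap = struct.pack(">I4s", 8, b"abcd") + struct.pack(">I8s", 12, b"SCSIdata")
blocks = list(walk_blocks(heap, 0, len(heap)))
```

The real loop has the same shape: remember the current block, read its size, and compute the next block's address up front so that skipping a block is a single branch.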

As well as being a handy PowerPC code fragment disassembler, another nifty facility DumpPEF provides is the ability to examine how a fragment’s data section is initialized. The data section is of a fixed size and, as discussed previously, the SCSI Manager’s data section starts with a Table of Contents, followed by a list of transition vectors. These transition vectors are the bytes we’re looking to patch. But in the SCSI Manager’s case, after the transition vectors comes a series of text string constants. These strings are always in the same location relative to the beginning of the data section and as read-only constants, they always have the same predictable values. Therefore I reasoned that in addition to verifying that a block within the system heap is of the same expected size as the SCSI Manager’s data section, checksumming these strings of text within that block should provide a suitable heuristic for identifying a particular block as belonging to the SCSI Manager.

 /* Verify this really is the SCSI Manager's block in the system heap. */
 /* We do that by checksumming the area in the middle of this block where */
 /*  we know the SCSI Manager looks for some read-only strings. Use an */
 /*  algorithm similar to that used to checksum the Toolbox, only we'll */
 /*  walk our pointer backwards so we can use A3 as-is if/when it comes */
 /*  time to check our TVectors. */
 add.l   -(%a2), %d0
 cmpa.l  %a2, %a3
 ble.s   ChecksumLoop
 cmp.l   #scsiStrsCksum, %d0
 bne.s   NextSysBlock
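
As a rough illustration of the heuristic, here is a Python sketch that checks a candidate block's size and then sums 32-bit words backward over a fixed range. The offsets, sizes, and sample bytes are invented for the example, and the plain 32-bit sum is an assumption standing in for whatever the real checksum constant was computed from.

```python
import struct


def sum_longs_backward(block, start, end):
    """Add big-endian 32-bit words from `end` back down to `start`, modulo 2**32."""
    total = 0
    pos = end
    while pos > start:
        pos -= 4  # pre-decrement, in the spirit of -(An) addressing
        total = (total + struct.unpack_from(">I", block, pos)[0]) & 0xFFFFFFFF
    return total


def looks_like_target(block, expected_size, start, end, expected_sum):
    """Cheap size check first; only then checksum the string area."""
    if len(block) != expected_size:
        return False
    return sum_longs_backward(block, start, end) == expected_sum


# Hypothetical 20-byte block with "read-only strings" at offsets 8..16.
block = b"\x00" * 8 + b"SCSIMgr!" + b"\x00" * 4
expected = sum_longs_backward(block, 8, 16)
```

Checking the size before checksumming means most blocks in the heap are rejected without touching their contents at all.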

Once we’ve found our block, we know where the transition vectors live within it, so it’s time to patch them, right? Well, first we need to create the patch itself. At the time Pippin Kickstart runs, there is no application heap yet. Application heaps aren’t set up until the Process Manager starts, which doesn’t happen until after the familiar “Welcome to Pippin” extension parade has completed and our first application is ready to launch. Therefore, the system heap is our active and only heap. What’s more, as of System 7, so long as there is RAM available the system heap can grow dynamically to accommodate allocation requests, with the Process Manager shifting its base accordingly.

Our patch needs to stick around as long as the SCSI Manager exists, so naturally we need to give it a home somewhere where the OS won’t stomp over it later. Since the SCSI Manager’s data section lives in the system heap, and since Pippin Kickstart itself works from the system heap, it follows that we should be able to safely create a small block of nonrelocatable space in the system heap for our patch to live. Our patch really only needs to replace nine bytes in four locations, but we can’t just create a nine-byte block, stick our bytes there, and call it a day. The nine patched bytes belong to different functions in the SCSI Manager, and those functions can be, and are, referenced internally. Specifically, these functions are invoked internal to the SCSI Manager not by referencing transition vectors, but by good old-fashioned relative branching. It makes sense; the SCSI Manager targets the PowerPC and its functions run natively on the PowerPC, so why bother with transition vectors when you know you’re calling other PowerPC functions that are part of the same code fragment? It’s certainly convenient for the Pippin, but makes things slightly more annoying when creating this patch.

We have to ensure that no code that either leads to or leads from our patched locations can lead back to the unpatched versions in ROM. Therefore we have to account for relative branching by including all of that extra code in our patch, even though we don’t change any of it! I wrote a small C++ program to calculate exactly how much code I’d have to copy by essentially “emulating” PowerPC branch instructions, keeping track of the lowest and highest reachable addresses and using the ‘blr‘ instruction as a heuristic for the ends of subroutines. Passing my program a line-by-line disassembly of the SCSI Manager’s code section cut from DumpPEF’s output, I started the “emulation” at each of the three code sites and noted which one reached the largest range of addresses. It turns out that all three sites lie within a mere 35K of code that only calls into itself; the rest of the SCSI Manager’s code section appears to be helper functions or routines unrelated to the SCSI Manager’s “core.”
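
To make the idea concrete, here is a minimal Python sketch of that kind of reachability scan. It is not the C++ program described above: it reads raw big-endian instruction words instead of DumpPEF's text output, follows only the b and bc families, treats every bc as conditional, and ignores indirect branches such as bctr. All function names are hypothetical.

```python
import struct

BLR = 0x4E800020  # "branch to link register": treated as the end of a subroutine


def branch_info(word, addr):
    """Return (target or None, falls_through) for one instruction word."""
    if word == BLR:
        return None, False
    opcode = word >> 26
    if opcode == 18:  # b, bl, ba, bla
        disp = word & 0x03FFFFFC
        if disp & 0x02000000:  # sign-extend the 26-bit displacement
            disp -= 0x04000000
        target = disp if word & 2 else addr + disp
        return target, bool(word & 1)  # only a linking branch comes back
    if opcode == 16:  # bc family, assumed conditional in this sketch
        disp = word & 0xFFFC
        if disp & 0x8000:  # sign-extend the 16-bit displacement
            disp -= 0x10000
        target = disp if word & 2 else addr + disp
        return target, True
    return None, True  # anything else just falls through


def reachable_range(code, starts):
    """Return (lowest, highest + 4) over all addresses reachable from `starts`."""
    count = len(code) // 4
    words = struct.unpack(">%dI" % count, code[:count * 4])
    seen, todo = set(), list(starts)
    while todo:
        addr = todo.pop()
        if addr in seen or addr % 4 or not 0 <= addr < count * 4:
            continue
        seen.add(addr)
        target, falls_through = branch_info(words[addr // 4], addr)
        if target is not None:
            todo.append(target)
        if falls_through:
            todo.append(addr + 4)
    return min(seen), max(seen) + 4
```

Running this once per patch site and taking the union of the ranges gives the span of code that has to be copied so no branch can escape back into the unpatched original.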

The earliest address of our 35K block is offset 0xB854 into the SCSI Manager’s code section. The transition vector in the SCSI Manager’s data section with the earliest offset that should call into our patch is the vector that points to offset 0xBAD4. We know this transition vector’s index into the data section’s list, so by reading its target address and subtracting an offset (0xBAD4 – 0xB854), we can get the starting address in RAM of the code block to copy into our patch area. We also know exactly how much code to copy—35536 bytes—so the procedure becomes rather simple: copy our 35K of code into a nonrelocatable block on the system heap, then patch that. We can also easily determine which transition vectors point within that original 35K of code, so we know exactly which transition vectors to patch. It turns out that the vectors, starting with the one pointing to offset 0xBAD4 through the end of the data section’s list, all need to point into our patched code. By calculating the difference between offset 0xB854 into the SCSI Manager’s code section, and where our 35K block is in RAM, we get an offset value that makes it trivial to patch the transition vectors. We simply add that offset to each of those transition vectors so that they then point into our patched code instead of into ROM.

 /* now let's patch up the TVectors to point to our patched code */
 move.b  #tVectorsSize-1, %d0    /* # of TVectors to patch minus one */
 movea.l (%a4), %a2
 suba.l  %a0, %a2
 move.l  %a2, (%a4)+
 addq.l  #8, %a4
 dbra    %d0, TVectorLoop
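
The address arithmetic is easier to follow with numbers plugged in. The sketch below uses the two code-section offsets from the text; the ROM and RAM addresses are hypothetical, and rebase_vectors is an illustrative name rather than anything in Pippin Kickstart.

```python
FIRST_PATCHED_ENTRY = 0xBAD4  # code-section offset the earliest patched vector targets
COPY_START = 0xB854           # code-section offset where the copied block begins


def rebase_vectors(vectors, patch_base):
    """Given the old code pointers (earliest first) and the RAM address of the
    new copy, return (copy_source, new_vectors)."""
    copy_source = vectors[0] - (FIRST_PATCHED_ENTRY - COPY_START)
    delta = patch_base - copy_source
    return copy_source, [v + delta for v in vectors]


# Hypothetical addresses, chosen only to make the arithmetic visible.
old_vectors = [0x4010BAD4, 0x4010BB00]
source, new_vectors = rebase_vectors(old_vectors, 0x00200000)
```

Here the source of the copy works out to 0x4010B854, and each vector keeps its distance from the start of the block: the first lands 0x280 bytes into the copy, exactly as it sat 0x280 bytes past offset 0xB854 in the original.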

After all of that, we’re still not quite done. All PowerPC processors have some form of a “data cache.” When you make changes to RAM on a PowerPC architecture, those changes aren’t necessarily written to RAM right away. Instead, the address decoder checks first to see whether where you’re reading/writing has been “cached,” or saved in a smaller but faster block of memory within arm’s length of the processor. Cache is to RAM what RAM itself is to System 7’s virtual memory; it is prioritized as a faster alternative to its counterpart, and when there’s no space left its contents are “flushed” to make room. Consequently, writing to a particular address will often write only to the cache instead, anticipating that its contents will be referenced again shortly thereafter.

To further complicate matters, the cache on the PowerPC 603 chip used in the Pippin is split between instructions (code) and data (not code). We’re patching code in RAM, but the Pippin doesn’t know that; it’s all just bytes of data as far as it’s concerned. We want to make sure that our patched area is flushed to RAM so that when the SCSI Manager comes around to execute it next, those patched bytes are waiting in RAM ready for the instruction decoder to pick them up. It’s reasonable to assume that at the time in the startup process when Pippin Kickstart runs, our block of code in the system heap does not have corresponding entries in the instruction cache, but it’s not necessarily safe to assume that our patched code will be automatically flushed to RAM before Pippin Kickstart exits. We certainly wouldn’t want the SCSI Manager to invoke the old unpatched transition vectors, or worse, execute whatever happened to be in RAM before we put our patched block of code there.

There exists an API called MakeDataExecutable that does exactly what we want here. But in order for this API to work for us, we’d have to make a connection to InterfaceLib, call FindSymbol, call MakeDataExecutable with the proper parameters, then close the connection to InterfaceLib. That’s a lot of work for just one call. Fortunately, since Pippin Kickstart runs in the 68K emulator, there’s a faster and easier way, albeit undocumented.

 movea.l %a1, %a0
 move.l  %d6, %d0
 dc.w    0xFE0C      /* undocumented F-line instruction that evicts our */
                     /*  patched area from the PPC data cache into main */
                     /*  memory so it's visible to the instruction decoder */

Apple’s 68K emulator supports the features of a 68LC040 processor with a 68020 exception stack frame. The 68LC040 is like the more powerful 68040 processor powering the Quadra line, but minus the floating-point operations built into the latter. Floating-point operations on the ‘040 are implemented by way of “F-line instructions;” that is, instruction opcodes that begin with the hex digit F. But just because the 68K emulator doesn’t support floating-point operations doesn’t mean that the emulator doesn’t support F-line instructions. 😉 Elliot Nunn helpfully pointed out that one of the F-line instructions used internally by the 68K emulator has the opcode 0xFE0C and it does just what we want: it flushes the PowerPC’s data cache to RAM. This instruction takes two parameters: a pointer in 68K register A0 to an area in memory, and a size in bytes in register D0. Easy peasy, if a little skeezy.

With that, we’re finally done patching the SCSI Manager so that it behaves identically to the version in the Pippin GM Flash ROM.

Bad F-line instructions

System error type 11 often manifests itself as a “bad F-line instruction” bomb dialog in System 7 and later. In System 6 this dialog instead displays the message “coprocessor not installed,” owing to the fact that F-line instructions can map to floating-point operations on an internal or external FPU. Despite suggesting that these errors stem from a missing FPU, very little software for classic Mac OS makes use of—let alone requires—floating-point hardware. For the broadest compatibility, programs requiring floating-point operations either use their own integer math library or call into Apple’s SANE math library instead, requiring no F-line instructions.

Usually these system errors are the result of a buggy program erroneously jumping into an area of data and interpreting it as code, setting off the bomb when carelessly stumbling upon a pair of bytes starting with the hex digit F. 😉
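
What makes an opcode "F-line" is simply its top four bits. A tiny Python sketch of that classification follows; opcode_line is a made-up helper name, and the A-line case is included because Toolbox traps such as _GetPicture (0xA9BC) are encoded the same way with the hex digit A.

```python
def opcode_line(word):
    """Classify a 16-bit 68K opcode word by its top hex digit."""
    top = (word >> 12) & 0xF
    if top == 0xF:
        return "F-line"
    if top == 0xA:
        return "A-line"
    return "ordinary"


kinds = [opcode_line(w) for w in (0xFE0C, 0xA9BC, 0x4E75)]  # 0x4E75 is RTS
```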

Adding Some Fun

Pippin Kickstart runs from RAM after being loaded from the boot blocks, which are the first two 512-byte sectors of an HFS-formatted volume. I was able to squeeze versions 1.0 and 1.0.1 each into the first 512 bytes of this area. Keeping Pippin Kickstart’s footprint limited to the boot blocks makes authoring the Pippin Kickstart disc relatively easy; I merely have to replace the boot blocks with my own, and since the Pippin loads them for me, I don’t have to make any other calls to load any additional code from the CD. Calls to the disk driver’s _Read—which Pippin Kickstart makes to check the first block of boot candidates—can only return 512-byte chunks, so 1.0 and 1.0.1 use the latter half of the boot blocks as scratch space during their respective boot candidate search loops. With the aforementioned SCSI Manager patch going into Pippin Kickstart 1.1, I need more code than will fit in those first 512 bytes, but I still want to limit myself to the boot blocks for convenience. Keeping the code tight is a fun engineering challenge, too. 🙂

If I move the boot candidate search loop and its dependencies from the first 512-byte block into the second block, I leave behind enough room in the first block to patch the SCSI Manager. By the time I’m done patching the SCSI Manager, I don’t need anything from the first block anymore, so I can jump into the search loop in the second block and that first block can be used instead as scratch space. Other than the address of my scratch space, I don’t have to change any of my tested and working search loop logic from 1.0 and 1.0.1. Hooray!

But the SCSI Manager patch doesn’t take up a lot of space, certainly not a whole extra 512 bytes. All that extra unused space felt like a waste to me, but there’s nothing more that Pippin Kickstart needs to do to allow a stock 1.0 Pippin to boot from any capable SCSI devices. My code golf skills had gotten the better of me. How could I make meaningful use of those remaining bytes?

Pippin Kickstart has a very spartan interface, though perhaps it’s a little too spartan: folks have mistaken it for BSD and have asked me what “kernel” it boots into. 😆 I take that as a compliment and a testament to how much utility I’ve packed into such a small space, but I do admit that it could look prettier. My good friend Tommy Yune is an accomplished graphic artist who graciously drew up a Pippin Kickstart “logo” (seen above) around the time I was working on the first versions. His graphics appear in the readme files I include on the disc. But other than the text Pippin Kickstart prints to the screen logging its behavior, the most anybody sees is the Pippin logo left over from when the Pippin gets a fresh start.

Perhaps those extra bytes could translate into some extra polish. 🙂

Normally when the Pippin boots, it first draws the Pippin logo and looks for a bootable CD-ROM. If after a few seconds it can’t find one, it starts looping an animation suggesting that a CD be inserted into the built-in CD-ROM drive.

If the inserted CD is an audio CD, then the Pippin launches into its built-in audio CD player application. If the inserted CD is a data CD, then the screen goes black and the Pippin tries to boot from that disc. Many Pippin titles at this point put up a “StartupScreen” made up of the Bandai Digital Entertainment (the Pippin’s first-party publisher) logo; some unofficial titles like “Tuscon” have a custom StartupScreen file. But if the Pippin cannot boot from a given data CD, then it is ejected and the Pippin reboots, drawing the Pippin logo again and repeating the cycle. Therefore if Pippin Kickstart is inserted during the tray-loading animation, you get the least interesting visual result: the screen goes black and nothing else is shown on the screen other than Pippin Kickstart’s text console.

We’ll fix that in 1.1 by drawing Tommy’s “locked” Pippin logo, then drawing it “unlocked” after we’ve successfully circumvented the Pippin’s security. 🙂

Except for the Pippin logo itself, which we get from ROM, we’ll do all of this using QuickDraw primitives. Drawing the Pippin Kickstart logo programmatically rather than storing it as a bitmap takes up a fraction of the space, which is important given that we’re drawing two versions of it and have less than 512 bytes available to pull it all off. To begin, we create a new “clip region” that excludes the area where the Pippin logo is in the center of the screen. Since this is created in RAM at runtime, we have to clean it up before Pippin Kickstart exits, but it’s needed to later tell QuickDraw that we want to allow drawing anywhere but where the logo is.

 lea     logoRect, %a3

 subq.l  #8, %a7
 _NewRgn                        /* create clipRgn */
 move.l	 (%a7), %d5             /* save clipRgn */
 move.l	 %a3, -(%a7)
 _RectRgn                       /* clipRgn == logoRect */

 _NewRgn                        /* create tempRgn */
 move.l	 (%a7), %d6             /* put tempRgn in D6 because D7 is our ROM index */
 move.l	 %d6, (tempRgn - logoRect)(%a3) /* save tempRgn in RAM since D6 */
                                        /*  is used by the search loop */
 _GetClip                               /* tempRgn == original clip region */

 movem.l %d5-%d6/%a3, -(%a7)
 move.l	 %d5, -(%a7)
 _DiffRgn                       /* clipRgn == original clip region - logoRect */
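
For readers unfamiliar with region arithmetic, the effect of the difference step can be imitated with Python sets of pixels. This is purely a conceptual model (QuickDraw regions are not stored this way), and the 8x6 "screen" and 2x2 "logo" are invented sizes.

```python
def rect_region(top, left, bottom, right):
    """Set of (x, y) pixels inside a rectangle given in QuickDraw's
    top, left, bottom, right order (bottom and right are exclusive)."""
    return {(x, y) for y in range(top, bottom) for x in range(left, right)}


screen = rect_region(0, 0, 6, 8)   # hypothetical original clip region
logo = rect_region(2, 3, 4, 5)     # hypothetical logo rectangle
drawable = screen - logo           # same idea as original clip minus logoRect
```

Anything drawn while `drawable` is the clip region can touch every pixel except the ones under the logo.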

We then set the background color to black (at this point it’s not—black is the foreground color and that’s what QuickDraw uses to initially paint the screen at startup) and erase the area inside the clip region we just created. This has the positive effect of erasing around the Pippin logo if it has already been drawn. We later draw the logo ourselves just in case, but either way this approach avoids some flicker when Pippin Kickstart launches.

 pea     blackColor
 move.l	 %d5, -(%a7)
 _EraseRgn                      /* erase around the logo */

Next we set the foreground color to white and draw a 7-pixel border around the logo area. Since we have to set the foreground color to white for the text anyway, we set it here so we don’t have to worry about it again.

 pea     whiteColor
 _ForeColor                     /* draw white-on-black */
 move.l  #((outlineThickness << 16) + outlineThickness), -(%a7)
 _FrameRect                     /* draw logo outline */
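
The expression `(outlineThickness << 16) + outlineThickness` pushes two 16-bit values as a single 32-bit long. A small Python sketch of that packing, assuming the thickness is the 7 pixels mentioned above (pack_pair and unpack_pair are illustrative names):

```python
def pack_pair(high, low):
    """Pack two 16-bit values into one 32-bit value, high word first."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def unpack_pair(value):
    """Split a 32-bit value back into its two 16-bit halves."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


packed = pack_pair(7, 7)
```

The assembly uses `+` where this sketch uses `|`; the two give the same result as long as both values fit in 16 bits. Pushing one long instead of two words saves an instruction, which matters when every byte of the boot blocks counts.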

The Pippin logo is stored in ROM as 'PICT' resource -20137. It's trivial to find in ROM where the code is to draw the Pippin logo—search for a call to _GetPicture (0xA9BC) and an instruction that passes -20137 (0xB157) to it. The logo-drawing routine is located in the same place in all three known retail ROMs: offset 0xE3C from the beginning of ROM. We call this routine to fill the outline with the Pippin logo if it hasn't yet been drawn. If it has, then drawing over the Pippin logo has no perceptible side effects.

 jsr     logoRoutineOffset(%a4)  /* draw the Pippin logo from ROM */
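
The search described above can be sketched in Python as a scan for the _GetPicture trap word with the resource ID appearing shortly before it. The 16-byte window and the synthetic "ROM" bytes are assumptions made for the example, not details of the real ROM.

```python
import struct

GET_PICTURE = struct.pack(">H", 0xA9BC)  # _GetPicture trap word
PICT_ID = struct.pack(">h", -20137)      # the same two bytes as 0xB157


def find_logo_calls(rom, window=16):
    """Offsets of _GetPicture trap words preceded, within `window` bytes,
    by the PICT resource ID."""
    hits = []
    pos = rom.find(GET_PICTURE)
    while pos != -1:
        # 68K instructions are word-aligned, so ignore odd offsets.
        if pos % 2 == 0 and PICT_ID in rom[max(0, pos - window):pos]:
            hits.append(pos)
        pos = rom.find(GET_PICTURE, pos + 1)
    return hits


# Synthetic bytes: four NOPs, a word push of the ID, the trap, then RTS.
rom = b"\x4e\x71" * 4 + b"\x3f\x3c\xb1\x57" + b"\xa9\xbc" + b"\x4e\x75"
hits = find_logo_calls(rom)
```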

We then have to draw the locked Pippin logo's "shackle." I created a routine for this purpose called, appropriately, DrawShackle and placed it in the second boot block so that we can call it again to draw the "unlocked" logo when it's time to exit the search loop. DrawShackle sets QuickDraw's clip region and then draws a framed rounded rectangle clipped to that region. The net effect is a shackle that appears inside the "locked" Pippin logo.

/* input: A3 -> shackle rect - 8 */
/* trashes: D0-D2, A0-A1 are scratch regs used by QuickDraw */
 move.l  %d5, -(%a7)
 _SetClip               /* clip shackle to logo */

 move.l  #((shackleThickness << 16) + shackleThickness), -(%a7)

 addq.l  #8, %a3        /* a3 -> shackle rect */
 move.l  %a3, -(%a7)
 move.l  #((shackleRadius << 16) + shackleRadius), -(%a7)
Green: drawable region, Red: clipped region

Notice how DrawShackle grabs the clip region handle from register D5. Luckily, none of the external routines called by Pippin Kickstart trash this register, leaving it available for temporary storage. The same is true of register D7, used by Pippin Kickstart as an index corresponding to the Pippin's ROM version so that we can call ROM routines from their proper locations. It is not true however of register D6, which is used by the ROM routines called by the search loop.

Now that the "locked" Pippin logo is up on the screen, we ready a clip region in register D5 that will be used to replace the "locked" shackle with one that dangles off to the side. We use the same clip region both to erase the existing shackle and to clip the "unlocked" shackle during the next and final call to DrawShackle. The _SectRgn API lets me calculate this region easily, finding the intersection of the existing clip region (set during the first call to DrawShackle that allows drawing anywhere except the logo area) and a predefined rectangle enclosing both the intended area of the "unlocked" shackle and the area of the existing "locked" shackle. Even though my predefined rectangle overlaps the forbidden logo area, this isn't a problem because _SectRgn finds the intersection of both drawable regions; that is, it calculates the region common to both. In the final clip region, only the shackle areas outside the logo will be affected.

 addq.l  #8, %a3                /* A3 -> clipRect */
 move.l	 %d5, -(%a7)
 move.l	 %a3, -(%a7)
 _RectRgn                       /* clipRgn == clipRect */

 movea.l GrafGlobals(%a5), %a0  /* A0 -> qdGlobals */
 movea.l thePort(%a0), %a0      /* A0 -> qdGlobals.thePort */
 move.l	 clipRgn(%a0), -(%a7)	/* push existing clip region */
 move.l	 %d5, -(%a7)            /* find intersection with clipRect */
 move.l	 %d5, -(%a7)            /* make it our next clip region */
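
The intersection step can be pictured with Python sets of pixels. This is a conceptual model only, with invented sizes: the current clip is "everything but the logo," and the predefined rectangle deliberately overlaps the logo.

```python
def rect_region(top, left, bottom, right):
    """Set of (x, y) pixels inside a rectangle given in QuickDraw's
    top, left, bottom, right order (bottom and right are exclusive)."""
    return {(x, y) for y in range(top, bottom) for x in range(left, right)}


logo = rect_region(2, 3, 4, 5)                    # hypothetical logo area
current_clip = rect_region(0, 0, 6, 8) - logo     # drawable everywhere but the logo
clip_rect = rect_region(1, 4, 4, 7)               # hypothetical rect overlapping the logo
next_clip = current_clip & clip_rect              # same idea as the _SectRgn step
```

Because the logo pixels were never in `current_clip`, they cannot survive the intersection, so the overlap between the rectangle and the logo is harmless.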

The active clip region at this point is still what DrawShackle uses to draw the "locked" Pippin logo, so everything outside the Pippin logo is still fair game as far as drawing goes. This includes text, so naturally we still get the familiar text console as Pippin Kickstart goes through its expected motions.

When it comes time for Pippin Kickstart to exit and boot from the candidate it finds, the string "Booting..." is printed and the existing shackle is erased, using the clip region that we prepared earlier. Dangling the shackle to the left side of the Pippin logo would overlap our lovely text console, so we call DrawShackle to dangle an "unlocked" shackle off to the right instead.

 /* Erase the "locked" Pippin. */
 move.l	 %d5, -(%a7)

 /* Draw an "unlocked" Pippin. */
 lea     unlockedRect-8, %a3    /* because A4 is our link pointer */
 bsr.s   DrawShackle
Green: drawable region, Red: clipped region

This "unlocked" shackle is missing something, or rather it needs to be missing something. 🙂 In Tommy's logo, the shackle has a "notch" cut out of it, as would a shackle on a real padlock. Cutting a rectangular notch out of our shackle is simple enough; just erase a tiny rectangle where the notch should be.

 pea     notchRect

There's no "notch" primitive in QuickDraw (nor is there a corresponding _DrawNotch API), so drawing the slanted part of the notch requires a little bit of outside-the-box thinking. One option is to set the pen size to the notch width and draw a line between two points. That would work, but there's an even more efficient way, at least in terms of instructions used.

Among its supported primitives, QuickDraw can draw ovals, circles, and rounded rectangles. What do all these shapes have in common? They all involve drawing one or more arcs of a particular width, height, and arc length. Since ovals, circles, and rounded rectangles are themselves at least partially made up of arcs, QuickDraw also exposes the ability to draw just an arc through its _PaintArc API. If we draw an arc at least half the height of the notch we erase with a length extending to the edge of the notch, we get the slanted part we need. There is a tiny bit of overdraw into the shackle area above the notch, but since both the shackle and the arc are drawn with the same white foreground color, it doesn't matter. In the end, using an arc instead of a line segment gets the job done in about half the required instructions, with even less overdraw than drawing a line segment would produce.

 pea     arcRect
 clr.w   -(%a7)
 move.w  #-45, -(%a7)

Finally, we clean up after ourselves and exit. The clip regions we create are allocated dynamically in RAM, so to be a good citizen we dispose of those, but not before restoring the original clip region from prior to launching Pippin Kickstart. The next thing QuickDraw puts on the screen could be an alert, a StartupScreen, or something else entirely; it's up to whomever we hand off the boot process to. The least we can do before we say goodbye is return the Pippin to a state reasonably close to how we found it.

 /* Clean up. */
 move.l	 %d5, -(%a7)    /* push clipRgn */
 move.l	 -(%a3), -(%a7) /* push tempRgn */
 move.l	 (%a3), -(%a7)  /* push tempRgn */

After adding these graphics, I'm back to having zero bytes left in both boot blocks. Waste not, want not. 🙂


Apple Computer was willing to license their Macintosh technology to third parties by the mid-90s, but not at the expense of their own first-party products. While the initial line-up of Power Macs and clones were a success, Apple was still a computer company; it was right there in their name. Licensed Mac derivatives like the Bandai Pippin could not be allowed to cut into Apple's bottom line, so Apple took measures both technical and tactical to help protect against the Pippin cannibalizing Mac sales. The term "Mac" or "Macintosh" was never to be used publicly to describe the Pippin or its software; the Pippin runs "Pippin OS" and is based on "advanced technology by Apple Computer." Pippins with the first revision of the retail ROM have special code that explicitly blocks the use of storage devices other than those built into the system and those officially available at launch. But on top of that, as extra insurance against Pippins being used as "cheap Macs," Apple added a signing check to the startup process that verifies that a particular boot CD has been authorized for use on the Pippin.

Bandai: "It's not a Mac. We swear."

There was nothing particularly novel about this approach in 1996, and in fact it's still in use today by almost all video game console platforms. Ever since the Nintendo Entertainment System's release in 1985, most consoles have some kind of protection against running unlicensed software. Atari's 7800 ProSystem from 1986 was the first video game console (that I know of) to use a boot-time signature check (with similarities to the RSA algorithm the Pippin would use ten years later). Like the Pippin and all major disc-based consoles that came after it, all 7800 titles had to be digitally signed for release. Given the amount of computing power required in those days to crack a digital cryptographic signature, and the amount of computing power typically available to the general public, these strategies were mostly effective for their time to prevent unlicensed software from affecting a platform's brand and public image during its supported lifetime, for better or worse.

Reverse-engineering and circumventing the Pippin's boot security wasn't easy, but with the exception of deducing Apple's private RSA key, Pippin Kickstart could have been developed using tools and documentation available in the late 90s and early 2000s. I am often nostalgic for the Mac games of my youth and feel that in modern times they deserve to be played on a "real" video game system on a big-screen TV. Thus the idea appealed to me of hacking an "Apple" video game system to let me do just that. Given the Pippin's place in history and how little attention it has received compared to homebrew efforts for systems from Atari, Nintendo, Sega, Sony, and Microsoft, I was surprised at the positive reception and interest Pippin Kickstart got when I first released it in June 2019. That folks are beginning to dip their toes into producing homebrew for this relatively obscure platform goes above and beyond my expectations; I'm absolutely delighted to have enabled this and I hope it continues.

It has been 25 years since the introduction of the Bandai Pippin. In that time, the Internet has exploded with a force that hardly anybody could have predicted back then, and Apple itself has gone from a relatively small player in the computer industry to one of the largest consumer electronics companies in the world. (Bandai is still doing OK, too.) The Internet has brought together fans of gaming from all over the world and from all kinds of backgrounds, fostering communities that celebrate video games and their technology from the mainstream to the obscure. The classic systems of yesteryear may have been forgotten by major retail outlets, but that doesn't mean they have been forgotten by fans and enthusiasts, no matter how obscure or commercially unsuccessful. Nostalgia is a powerful drug, and thanks to it I think there will always be an audience for new developments targeting these vintage consoles, by amateurs and professionals alike. These days, "retro" is cool, and I'm happy to contribute to the zeitgeist in my own small way.

If I've done anything to help rekindle some gaming nostalgia among fellow retro gaming fans, then it's all worth the while. 🙂

My Ken Williams story

Not All Fairy Tales Have Happy Endings by Ken Williams

I picked up Ken Williams’s book Not All Fairy Tales Have Happy Endings back in October and read it over my Thanksgiving break. I recommend it, especially if you’re interested in the history of video games or software. Even at 530 pages, it’s a very quick read, and I was able to get from start to finish in just a day. It’s a good complement to the Sierra chapters in Steven Levy’s book Hackers, which I also recommend for tech history buffs.

The house that Ken and Roberta built

For those unfamiliar, Ken Williams founded Sierra On-Line in 1979 with his wife Roberta. After Roberta conceived of the game Mystery House for the Apple ][ (marking the invention of the graphical adventure genre), Ken programmed the game’s “engine” while Roberta fleshed out the game’s world, doing both the writing and illustrations. The pair marketed the game by demonstrating it at local computer stores, where store owners would thereafter offer the game for sale. Sierra grew from a husband-and-wife team into a multimillion-dollar powerhouse by the mid-to-late 1990s.

I’ve never met either of the Williamses. But in 2017, I exchanged a few e-mails with Al Lowe, one of Sierra’s most prolific game designers.

This story is mentioned briefly in Ken’s book (on page 205), but the longer version goes like this: In the lead-up to its January 1983 launch, Apple’s first computer built around a graphical user interface was deep in development. Steve Jobs wanted to call this new computer the “Lisa,” after his daughter, but the legal folks at Apple quickly discovered that Sierra was offering an assembler for the Apple ][ with the same name. Trademark law being what it is, Apple couldn’t easily go ahead and use the “Lisa” name in the computer field without risking infringing on the trademark Sierra already owned.

So one day in 1981 or 1982, Ken Williams got a call from Apple. In exchange for the rights to use the “Lisa” name on Apple’s newest computer, Ken would receive several brand-new Lisa computers (each worth $10,000 at launch in 1983) as well as a few prototypes of a top-secret new machine in the works at Apple (this turned out to be the Macintosh). The Lisas arrived, and a few months later came the promised Mac prototypes, which Sierra used to develop Frogger, among other early Mac games.

Along with four Mac prototypes came a Picasso-style “Macintosh” lamp, usually designated for Mac dealers. Ken was not exactly known for keeping a tidy office (I can relate), so the Macintosh lamp became one of many items scattered about his space. Several months later, Al was in Ken’s office, noticed the lamp on his floor, and inquired about it… only to receive the shock that Ken was just going to throw it away. Al was able to successfully negotiate for the lamp, which found a new home in his personal office, where it lived for almost 33 years.

Al offered up the lamp for sale in 2017, and given its history, I bought it. It was on my desk at work for a while but now lives in my office perched—appropriately—atop my Lisa.

It was always obvious which desk was mine.

I promised Al and myself that the lamp would stay in the games industry. So far I’ve kept my word, and I foresee no reason to break that promise any time soon.

Digitizing Old 8mm Tapes

It’s astounding to think back and consider how much technological progress has occurred in just the past 15 years. Most folks today carry a smartphone in their pocket everywhere they go, and a great many of those smartphones have powerful cameras built in capable of recording multiple hours in high definition. Pair this ability with low-cost video editing software—some of which comes at no cost at all—and far more people today have the tools to practice shooting, editing, compositing, and rendering professional-looking videos on a modest budget.

My personal experience with photography began around age 7 shooting on 110 film using a small “spy” camera I got as a gift. My dad’s Sony CCD-V5 was bulky, heavy, and probably expensive when he bought it around 1987, so he was reluctant to let me or my sister operate it under his supervision, let alone borrow it to make our own films by ourselves. As a consequence, my sister and I kept ourselves entertained by making audio recordings on much cheaper audio cassette hardware and tapes—we produced an episodic “radio show” starring our stuffed animals long before the podcast was invented. Though my sister and I took good care of our audio equipment, Dad stuck to his guns when it came to who got to use the camcorder, but he would sometimes indulge us when we had a full production planned, scripted, and rehearsed. Video8 tapes were expensive, too, and for the most part Dad reserved their use for important events like concerts, school graduations, birthdays, and family holidays.

Sony CCD-V5 camcorder
I remember it being a lot bigger.

I went off to college and spent a lot of time lurking the originaltrilogy.com forums. It was here that not only did I learn a lot about the making and technical background of the Star Wars films (a topic I could blog about ad nauseam), but I also picked up a lot about video editing, codecs, post-production techniques, and preservation. OT.com was and still is home to a community of video hobbyists and professionals, most of whom share a common love for the unreleased “original unaltered” versions of the Star Wars trilogy. As such, many tips were/are shared as to how to produce the best “fan preservations” of Star Wars and other classic films given the materials available, sacrificing the least amount of quality.

I bought my dad a Sony HDR-CX100 camcorder some years ago to supplement his by-then affinity for digital still cameras—he took it to Vienna and Salzburg soon after and has since transitioned to shooting digital video mostly on his iPhone. But the 8mm tapes chronicling my family’s milestones over the first 25 years of my life continued to sit, undisturbed, in my folks’ cool, dry basement. My dad has recordings on them going as far back as 1988 that I’ve found so far. These recordings are over 30 years old, so the tapes must be at least that age.

8mm video tape does not last forever, but making analog copies of video tape incurs generational loss each time a copy is dubbed. On the other hand, a digital file can be copied as many times as one wants without any quality loss. All I need is the right capture hardware, appropriate capture software, enough digital storage, and a way to play back the source tapes, and I can preserve one lossless digital capture of each tape indefinitely. The last 8mm camcorder my dad bought—a Sony CCD-TR917—still has clean, working heads and can route playback of our existing library of tapes through its S-video and stereo RCA outputs. This provides me with the best possible quality given how they were originally shot.

Generally with modern analog-to-digital preservation, you want to losslessly capture the raw source at a reasonably high sample rate with as little processing done to the source material as possible, from the moment it hits the playback heads to the instant it’s written to disk. Any cleanup can be done in post-production software; in fact, as digital restoration technology improves, it is ideal to have a raw, lossless original available to revisit with improved techniques. For this project, I am using my dad’s aforementioned Sony CCD-TR917 camcorder attached directly to the S-video and stereo audio inputs of a Blackmagic Intensity Pro PCIe card. The capturing PC is running Debian Linux and is plugged into the same circuit as the camcorder to avoid possible ground loop noise.

Since my Debian box is headless, I’m not interested in bringing up a full X installation just to grab some videos. Therefore I use the open source, command-line based bmdtools suite—specifically bmdcapture—to do the raw captures from my Intensity Pro card. I do have to pull down the DeckLink SDK in order to build bmdcapture, which does have some minor X-related dependencies, but I have to pull down the DeckLink software anyway for Linux drivers. I invoke the following from a shell before starting playback on the camcorder:

$ ./bmdcapture -C 0 -m 0 -M 4 -A 1 -V 6 -d 0 -n 230000 -f <output>.nut

The options passed to bmdcapture configure the capture as follows:

  • -C 0: Use the one Intensity Pro card I have installed (ID 0)
  • -m 0: Capture using mode 0; that is, 525i59.94 NTSC, or 720×486 pixels at 29.97 FPS
  • -M 4: Set a queue size of up to 4GB. Without this, bmdcapture can run out of memory before the entire tape is captured to disk.
  • -A 1: Use the “Analog (RCA or XLR)” audio input. In my case, stereo RCA.
  • -V 6: Use the “S-Video” video input. The S-video input on the Intensity Pro is provided as an RCA pair for chroma (“B-Y In”) and luma/sync (“Y In”); an adapter cable is necessary to convert to the standard miniDIN-4 connector.
  • -d 0: Fill in dropped frames with a black frame. The Sony CCD-TR917 has a built-in TBC (which I leave enabled since I don’t own a separate TBC), but owing to the age of the tapes, there is an occasional frame drop.
  • -n 230000: Capture 230000 frames. At 29.97 FPS, that’s almost 7675 seconds, which is a little over two hours. Should be enough even for full tapes.
  • -f <output>.nut: Write to <output>.nut in the NUT container format by default, substituting the tape’s label for <output>. The README.md provided with bmdtools suggests sticking with the default, and since FFmpeg has no trouble converting from NUT and I’ve had no trouble capturing to that format, I leave the output file format alone.
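As a quick sanity check on the -n value, the frame count can be converted to running time. This is only a sketch of the arithmetic: it assumes the exact NTSC rate of 30000/1001 frames per second (which "29.97" rounds), and the function name is my own invention rather than anything from bmdtools.

```python
from fractions import Fraction

# Exact NTSC frame rate; "29.97 FPS" is the usual rounding of 30000/1001.
NTSC_FPS = Fraction(30000, 1001)

def capture_duration(frames, fps=NTSC_FPS):
    """Return (total_seconds, 'H:MM:SS') for a capture of `frames` frames."""
    seconds = float(frames / fps)
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return seconds, f"{hours}:{minutes:02d}:{secs:02d}"

seconds, clock = capture_duration(230000)
print(f"{seconds:.1f} s = {clock}")
```

For 230000 frames this works out to a little under 7675 seconds, or just over two hours, which matches the headroom described in the list above.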

Once I have my lossless capture, I compress the .nut file using bzip2, getting the file size down to as little as a quarter of the original size depending on how much of the tape is filled. I then create parity data on the .bz2 archive using the par2 utility, and put my compressed capture and parity files somewhere safe for long-term archival storage. 🙂

My Windows-based Intel NUC is where I do most of my video post-production work. It lacks a PCIe slot, so I can’t capture there, but that’s fine because at this point my workflow is purely digital and I only have to worry about moving files around. My tools of choice here are AviSynth 2.6 and VirtualDub 1.10.4, but since AviSynth/VirtualDub are designed to work with AVI containers, I first convert my capture from the NUT container to the AVI container using FFmpeg:

$ ffmpeg.exe -i <output>.nut -vcodec copy -acodec copy <output>.avi

The options passed to FFmpeg are order-dependent and direct it to do the following:

  • -i <output>.nut: Use <output>.nut as the input file. FFmpeg is smart and will auto-detect its file format when opened.
  • -vcodec copy: Copy the video stream from the input file’s container to the output file’s container; do not re-encode.
  • -acodec copy: Likewise for the audio stream, copy from the input file’s container to the output file; do not re-encode.
  • <output>.avi: Write to <output>.avi, again substituting my tape’s label for <output> in both the input and output filenames.

A note about video containers vs. video formats

Pop quiz! Given a file with the .mov extension, do you know for sure whether it will play in your media player?

Files ending with .mov, .avi, .mkv, and even the .nut format mentioned above are “container” files. When you save a digital video as a QuickTime .mov file, the .mov file is just a wrapper around your media, which must be encoded using one or more “codecs.” Codecs are small programs that can encode and/or decode audio or video. These codecs must be specified at the same time as when you save your movie. QuickTime files can wrap any of a great many codecs: Motion JPEG, MPEG, H.264, and Cinepak just to name a few. They’re a bit like Zip files, except that instead of files inside you have audio and/or video tracks, and there’s no compression other than what’s already done by the tracks’ codecs. Though Apple provides support in QuickTime for a number of modern codecs, older formats have been dropped over time and so any particular .mov file may or may not play… even using Apple’s own QuickTime software! Asking for a “QuickTime movie” is terribly vague—a QuickTime .mov file may not play properly on a given piece of hardware if support for a contained codec is missing.

AVI, MKV, and MP4 are containers, too—MP4 is in fact based on Apple’s own QuickTime format. But these are still just containers, and a movie file is nothing without some media inside that can be decoded. Put another way, when I buy a book I’m often offered the option of PDF, hardcover, or paperback form. But if the words contained therein are in Klingon, I still won’t be able to read it. When asked to provide a movie in QuickTime or AVI “format,” get the specifics—what codecs should be inside?
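To make the container/codec distinction concrete, here is a minimal sketch that guesses a file's container from its first few bytes. The function name is hypothetical, and the signature checks are deliberately incomplete (older QuickTime files, for instance, can begin with other atoms than "ftyp"). The point is that even a correct answer here says nothing about which codecs are inside.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container from a file's leading bytes (not the codecs inside)."""
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "Matroska/WebM (EBML)"
    if header[4:8] == b"ftyp":
        # MP4 and modern QuickTime files share this atom-based layout.
        return "QuickTime/MP4 family"
    if header.startswith(b"nut/multimedia container\x00"):
        return "NUT"
    return "unknown"

# Synthetic headers stand in for real files in this sketch.
print(sniff_container(b"RIFF\x00\x00\x00\x00AVI LIST"))
print(sniff_container(b"\x00\x00\x00\x18ftypmp42"))
```

With a real file, something like `sniff_container(open(path, "rb").read(32))` would do the same job. Finding out which codecs are inside still requires parsing the container's track headers, which is what tools like FFmpeg do.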

Now that I have an AVI source file, I can open it in VirtualDub. True to its name, VirtualDub’s interface is reminiscent of a dual cassette deck ready to “dub” from one container to another. It isn’t as user-friendly as, say, Premiere or Resolve when it comes to editing and compositing, but what it lacks in usability it makes up for in flexibility. In particular, VirtualDub is designed to run a designated range of source video through one or more “filters,” encoding to one of several output codecs available at the user’s discretion via Video for Windows and/or DirectShow. If no filters are applied, VirtualDub can trim a video (and its audio) without re-encoding—great for preparing source footage clips for later editing or other processing.

This slam dunk in 1988 marked the peak of my basketball career.

Though the Sony CCD-TR917 has a built-in video noise reduction feature, I explicitly turn it off before capturing, because one of the filters I have for VirtualDub is “Neat Video” by ABSoft. It’s the temporal version of their “Neat Image” Photoshop filter for still images, which I used most recently to prepare a number of stills for Richard Moss’s The Secret History of Mac Gaming. It’s a very intelligent program that has a lot of knobs and dials to really tune in the noise profile you want to filter out, so I was equally delighted to find that ABSoft’s magic works on videos too. Luckily they offer a plugin built to work with VirtualDub, so I didn’t hesitate to buy it as a sure improvement over the mid-90s noise reduction technology built in to the camcorder.

Most of the aforementioned features can be done in high-end NLE applications such as Resolve—indeed I have used Resolve to edit several video projects of my own. What makes VirtualDub the “killer app” for me is its use of Windows’s built-in video playback library, and therefore its ability to work with AviSynth scripts. AviSynth is a library that can be installed on Windows PCs that grants the ability to interpret AviSynth “script” files (with the .avs extension) as AVI files anywhere Windows is prompted to play one using its built-in facilities. The basic AviSynth scripting language is procedural, without loops or conditionals, but it does retain the ability to work with multiple variables at runtime and organize frequently-called sequences into subroutines. Its most common use is to form a filter chain starting with one or more source clips, ending with a final output clip. When “played back,” the filter chain is evaluated for each frame, but this is transparent to Windows, which instead just sees a complete movie as though it’s already rendered to an AVI container.

Combined with VirtualDub, AviSynth allows me to write tiny scripts to do trims and conversions with frame-accurate precision, then render these edits to a final output video. Though AviSynth should be able to invoke VirtualDub plugins from its scripting language, I couldn’t figure out how to get it to work with Neat Video, so I did the next best thing: I created a pair of AviSynth scripts; one to feed to Neat Video, and one to process the output from Neat Video. The first script looks like this:

AviSource("1988 Christmas.avi")
# Invoke Neat Video
ConvertToRGB(matrix="Rec601", interlaced=true)

Absent an explicit input argument, each AviSynth instruction receives the output of the previous instruction as its input. The Neat Video plugin for VirtualDub expects its input to be encoded as 8-bit RGB. VirtualDub will automatically convert the source video to what Neat Video expects if not already in the proper format. Since I’m not sure exactly how VirtualDub does its automatic conversion, I want to retain control over the process so I do the conversion myself from YUV to RGB using the Rec.601 matrix. I know that my source video is from an interlaced analog NTSC source; VirtualDub doesn’t know that unless I explicitly say so.

I render this intermediate video to an AVI container using the Huffyuv codec. Huffyuv is a lossless codec, meaning it can compress the video without any generational loss. Despite its name, Huffyuv is perfectly capable of keeping my video encoded as RGB. I can’t do further AviSynth processing on the result from Neat Video until I load it into my second AviSynth script, so I’m happy that its output can be unchanged from one script to the next.

Looks like I picked the wrong week to quit huffing YUV.

Color encodings and chroma subsampling

Colors reproduced by mixing photons can be broken down into three “primary colors.” We all learned about these in grade school: red, green, and blue. Red and blue make purple, blue and green make turquoise, all three make white, and so on.

On TV screens, things are a bit more complicated. Way back when, probably before you were born, TV signals in the United States only came in black and white, and TVs only had one electron gun responsible for generating the entire picture. The picture signal mostly consisted of a varying voltage level for each of 525 lines indicating how bright or dark the picture should be at that point in that particular line. The history of the NTSC standard used to transmit analog television in the United States is well-documented elsewhere on the Internet, but the important fact here is that in 1953, color information was added to the TV signal broadcast to televisions conforming to the NTSC standard.

One of the challenges in adding color to what was heretofore a strictly monochrome-only signal was that millions of black and white TVs were already in active use in the United States. A TV set was extremely expensive even by the early 1950s, so rendering all the active sets obsolete by introducing a new color standard would have proven quite unpopular. The solution—similar to how FM stereo radio was later standardized in 1961—was to add color as a completely optional, but still integral, signal of monochrome TV. The original black and white signal—now known as “luma”—would continue to be used to determine the brightness or “luminance” of the picture at any particular point on the screen, while the new color stream—known as “chroma”—would only transmit the color or “chrominance” information for that point. Existing black and white TVs would only know about the original “luma” signal, and so would continue to interpret it as a monochrome picture, whereas new color TVs would be aware of and overlay the new “chroma” stream on top of the original “luma” stream to produce a rich, vibrant, color picture. All of this information still had to fit into a relatively limited bandwidth signal, designed in the early 1940s to be transmitted through the air with graceful degradation in poor conditions.

The developers of early color computer monitors, by contrast, did not need to worry about maintaining backward compatibility with black and white American television, nor did they need to concern themselves with adopting a signal format that was almost 50 years old by that point. It should be of little surprise, then, that computer monitors generate color closer to how we learned in grade school. Computer monitors in particular translate a signal describing separate intensities of red, green, and blue to a screen made up of triads of red, green, and blue dots of light. This signal describing what’s known as “RGB” color (for Red, Green, and Blue) comes both from that aforementioned color theory of mixing primaries and, historically, from those individual color signals more or less directly driving the respective voltages of three electron guns inside a color CRT. Despite both color TVs and computer monitors having three electron guns for mixing red, green, and blue primary colors, the way that color information is encoded before entering the TV is a main differentiator.

Whereas RGB is the encoding scheme by which discrete Red, Green, and Blue values are represented, color TV uses something more akin to what’s known as “YUV.” YUV doesn’t really stand for anything—the “Y” component represents Luma, and the “UV” represents a coordinate into a 2D color plane, where (1, 1) is magenta, (-1, -1) is green, and (0, 0) is gray (the “default” value for when only the “Y” component is present, such as on black and white TVs). In NTSC, quadrature amplitude modulation is used to convey both of the UV components on top of the Y component’s frequency—I don’t know exactly what quadrature amplitude modulation is either, but suffice it to say it’s a fancy way of conveying two streams of information over one signal. 🙂
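For the curious, here is a rough sketch of how an RGB triple maps onto Y, U, and V. It assumes the Rec.601 luma weights and the classic analog scaling factors for the two color-difference axes; real capture hardware and codecs add offsets and scaling of their own, and the function name is mine. Note that with this scaling the U and V values span a smaller range than the idealized ±1 plane described above.

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB in [0, 1] to analog-style YUV using Rec.601 luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # brightness ("luma")
    u = 0.492 * (b - y)                    # blue-difference axis
    v = 0.877 * (r - y)                    # red-difference axis
    return y, u, v

for name, rgb in [("gray", (0.5, 0.5, 0.5)),
                  ("magenta", (1.0, 0.0, 1.0)),
                  ("green", (0.0, 1.0, 0.0))]:
    y, u, v = rgb_to_yuv(*rgb)
    print(f"{name:8s} Y={y:.3f} U={u:+.3f} V={v:+.3f}")
```

Gray lands at the origin of the UV plane, magenta comes out positive on both axes, and green comes out negative on both, in line with the corner colors mentioned above.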

An interesting quirk of how the human visual system works is that we have evolved to be much more sensitive to changes in brightness than in color. Some very smart people have sciencey explanations as to why this is, but ultimately we can thank our early ancestors for this trait—being able to detect the subtlest of movements even in low light made us expert hunters. Indeed, when the lights are mostly off, most of us can still do a pretty good job of navigating our surroundings (read: hunting for the fridge) despite there being limited dynamic range in the brightness of what we can see.

Note that having a higher sensitivity to brightness vs. color does not mean humans are better at seeing in black and white. It merely means that we notice the difference between something bright and something dark better than we can tell if something is one shade of red vs. a different shade of red. In addition, humans are more sensitive to orange/blue than we are to purple/green. These facts actually came in very handy when trying to figure out how to fit a good-enough-looking color TV signal into the bandwidth already reserved (and used) for American television. Because we are not as sensitive to color, the designers of NTSC color TV could get away with transmitting less color information than what’s in the luma signal. By reducing the amount of bandwidth for the purple/green range, color in NTSC can still be satisfactorily reproduced, though the designers of NTSC adopted a variant of YUV to accomplish this called “YIQ.” In YIQ, the Y component is still Luma, but the “IQ” represents a new coordinate into the same 2D color plane as YUV, just rotated slightly so that the purple/green spectrum falls on the axis with a smaller range. Nowadays with the higher bandwidth digital TV provides, we no longer need to encode using YIQ, but due again to the way our vision system responds to color and the technical benefits it provides, TV/video is still encoded using YUV, albeit with a fuller chroma representation.
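The "rotated slightly" part can be sketched in a few lines. This assumes the commonly quoted 33-degree rotation between the UV and IQ axes; the function name and the sample values are illustrative only.

```python
import math

ROTATION = math.radians(33)  # commonly quoted angle between the UV and IQ axes

def uv_to_iq(u, v):
    """Rotate a (U, V) chroma coordinate into NTSC's (I, Q) axes."""
    i = -u * math.sin(ROTATION) + v * math.cos(ROTATION)
    q = u * math.cos(ROTATION) + v * math.sin(ROTATION)
    return i, q

# A saturated magenta expressed as (U, V) with analog-style scaling.
i, q = uv_to_iq(0.289, 0.515)
print(f"I={i:+.3f} Q={q:+.3f}")
```

Because this is a pure rotation, no color information is gained or lost. The purple/green content simply ends up concentrated on the Q axis, which is the one NTSC transmits with less bandwidth.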

What does all of this have to do with digitizing old 8mm tapes?

Each pixel on a modern computer screen is represented by at least three discrete values for Red, Green, and Blue. Though NTSC defines 525 lines per frame (~480 visible), being an analog standard means there really isn’t such a thing as “pixels” horizontally. However, most capture cards are configured to sample 720 points along each line of NTSC video, forming what we would call 720 “pixels” per line. But two important details must be noted:

  • Though 720 samples are enough to effectively capture the entire line, only 704 of them are typically visible and furthermore, NTSC TV is designed for a 4:3 aspect ratio. That is, if the picture is 480 square dots vertically, then it must be (480 * 4) / 3 == 640 square dots horizontally, or the picture will appear squished and everything will look “fat.” A captured NTSC frame at 720×480 will need horizontal scaling plus cropping to 640×480 to be displayed with the correct aspect ratio on a computer screen with square dots.
  • 720 samples are enough to capture each line of the luma component. The chroma component is a whole other story.
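The aspect ratio arithmetic from the first point can be worked through as follows. This is a sketch with names of my own choosing; it only restates the 4:3 math above and derives the resulting horizontal scale factor for the 704 visible samples.

```python
from fractions import Fraction

def square_pixel_width(height, aspect=Fraction(4, 3)):
    """Width in square pixels needed to show `height` lines at `aspect`."""
    return int(height * aspect)

visible_samples = 704                      # visible samples per captured line
display_width = square_pixel_width(480)    # square-pixel width for 4:3
pixel_aspect = Fraction(display_width, visible_samples)

print(display_width, pixel_aspect)
```

In other words, each captured sample is a bit narrower than it is tall, so the visible 704 samples have to be squeezed horizontally by a factor of 10/11 to look right on a square-pixel display.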

Remember how the luma and chroma components are encoded separately, but that some of the chroma information can be discarded to save space and we’re not likely to notice? Turns out, computers can use that technique too to reduce bandwidth usage and save disk space. RGB is just how your computer talks to its display, but there’s no rule that says computer video files need to be encoded as RGB. We can encode them as YUV, too, and this is where the term chroma subsampling comes in.

While we always want to sample all 704 visible “pixels” of luma information, we can often get away with capturing 50% or even as little as 25% of the chroma information for a given line of picture. The ratio of sampled chroma data to luma data is called the “chroma subsampling” ratio and is indicated by the notation Y:a:b, where:

  • Y: the number of luma samples per line in the conceptual two-line block used by a and b to reference. This is almost always 4.
  • a: the number of chroma samples mapped over those Y dots in the first line. This is at most Y.
  • b: the number of times those a chroma samples change in the second line. This is at most a.
Trans rights!

A chroma subsampling ratio of 4:4:4 means that every luma sample has its own distinct chroma sample; this ratio captures the most data and therefore provides the highest resolution. The value of a is directly proportional to the horizontal chroma resolution, and the value of b is directly proportional to the vertical chroma resolution. NTSC is somewhere between 4:1:1 and 4:2:1 (consumer tape formats even less), whereas the PAL standard in Europe is closer to 4:2:0 (half the vertical chrominance resolution as NTSC). As the values of a and b shrink, the overall chroma resolution decreases, and depending on your source picture it may be more or less noticeable.
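Here is a small sketch of what the notation implies for storage. It assumes the conceptual block is Y samples wide and two lines tall, that each chroma sample carries two components (U and V), and that every component is 8 bits; the function name is hypothetical.

```python
def chroma_stats(y, a, b, bit_depth=8):
    """Return (chroma fraction kept, average bits per pixel) for a Y:a:b ratio."""
    luma_samples = 2 * y        # Y luma samples on each of the two lines
    chroma_samples = a + b      # chroma samples in the first line plus changes in the second
    fraction = chroma_samples / luma_samples
    bits = bit_depth * (luma_samples + 2 * chroma_samples) / luma_samples
    return fraction, bits

for ratio in [(4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)]:
    fraction, bits = chroma_stats(*ratio)
    print("{}:{}:{}".format(*ratio), f"{fraction:.0%} chroma, {bits:g} bits/pixel")
```

4:2:0 and 4:1:1 both keep a quarter of the chroma samples. They differ only in whether the loss is split between the horizontal and vertical directions or taken entirely horizontally.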

Annie is always a willing test subject.

As the chroma resolution on this photo of Annie is diminished, artifacts become apparent which should remind you of heavily-compressed JPEG images. This is no coincidence—the popular JPEG image codec and all variants of the popular MPEG video codec use YUV encoding and discard chroma data as a form of compression. This is one reason why encoding as YUV is popular with real-world images and video—an RGB representation can be reconstructed by decoding the often-smaller YUV information, much like a color CRT has to do to get a YUV-encoded picture on its screen through its RGB electron guns. A ratio of 4:2:0 is popular with the common H.26x video codecs (though H.26x can go up to a full 4:4:4), and several professional codecs indicate their maximum chroma subsampling ratios directly in the name of the codec. DNxHR 444 and ProRes 422 come to mind. What these numbers represent is how much chroma information can be preserved from the original, uncompressed image, but looking at it another way, they represent how much chroma information is discarded during compression.

Breaking it down further, we can see how various chroma subsampling ratios affect the chroma information saved (or not) in this close-up of Annie’s toys. Notice the boundary between the pink and green halves of her ball, and the rightmost edge of her red and white rolling cage.

With my analog NTSC 8mm tapes having chroma resolution of at most 50% of the luma resolution, I need to capture them with at least a 4:2:0 chroma subsampling ratio. That 50% could be in the horizontal direction, the vertical direction, or both, but fortunately bmdcapture grabs YUV video at a 4:2:2 chroma subsampling ratio from my Blackmagic Intensity Pro card by default, which is enough to sample all the chroma information in my NTSC source. This is fine for my 8mm tapes, but if I want to capture a richer video signal in the future (like, say, a digital source over HDMI), I would probably want to investigate capturing at a full 4:4:4 ratio.

I tell Neat Video to clean up my interlaced video with a temporal radius of 5 (the maximum) and a noise profile I generated in advance for the Sony CCD-TR917 I’m using. With these parameters, it takes about eight hours to clean up two hours of raw footage on my PC, so I usually start it in the morning and have it running in the background while I work during the day. Once it’s finished, I’m ready to feed its intermediate result to my second and final AviSynth script to get my final output:

AviSource("1988 Christmas NV.avi")
QTGMC(Preset="Placebo", MatchPreset="Placebo", MatchPreset2="Placebo", NoiseProcess=0, SourceMatch=3, Lossless=2)
# Trim the fat
Trim(329, end=222768)
# Crop to 704x480
Crop(4, 0, -12, -6)

For some reason, Neat Video produces two identical frames for every input frame I provide in the first script, so the first step in the second script tells AviSynth to disregard those duplicate frames and assume we’re dealing with a 59.94 Hz interlaced NTSC video. I then convert Neat Video’s intermediate RGB result back to its original YUV encoding, once again using the Rec.601 matrix corresponding to my video’s NTSC format.


YUY2 is a digital representation of YUV-encoded video, using 16 bits per pixel. When capturing with a chroma subsampling ratio of 4:2:2, each chroma sample is shared with two luma samples. YUY2 encodes this by reserving 8 bits for each of four luma samples (“Y”), then 16 bits for each of the two chroma samples (8 bits for each of the two components of “UV”). When you add it all up, 8 bits x 4 luma samples + 16 bits x 2 chroma samples == 64 bits. Divide that by the four luma samples and you get 16 bits per pixel.
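A toy packer shows the layout in practice. This is a sketch, not how AviSynth or any capture driver is implemented: the function name is made up, and the `average` flag merely contrasts averaging the two pixels' chroma against point sampling the left pixel's chroma.

```python
def pack_yuy2(pixels, average=True):
    """Pack (y, u, v) 8-bit tuples into YUY2 bytes: Y0 U Y1 V per pixel pair."""
    if len(pixels) % 2:
        raise ValueError("YUY2 stores pixels in pairs")
    out = bytearray()
    for (y0, u0, v0), (y1, u1, v1) in zip(pixels[0::2], pixels[1::2]):
        if average:
            u, v = (u0 + u1) // 2, (v0 + v1) // 2   # blend the pair's chroma
        else:
            u, v = u0, v0                           # point sample the left pixel
        out += bytes((y0, u, y1, v))
    return bytes(out)

line = [(16, 128, 128), (235, 128, 128), (81, 90, 240), (145, 54, 34)]
packed = pack_yuy2(line)
print(list(packed), len(packed) * 8 // len(line), "bits per pixel")
```

Four pixels become eight bytes, which is the 16 bits per pixel worked out above. Each pair of luma samples shares one U byte and one V byte.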

Before calling ConvertToRGB in the first script, our original raw capture was encoded as uncompressed YUV 4:2:2. ConvertToRGB does a bit of math to map the YUV values into RGB space given the conversion matrix specified. If you are converting back to YUY2 from an RGB clip created by AviSynth, using ConvertBackToYUY2 can be more effective because it is allowed to make assumptions about the algorithm initially used to convert to RGB, given that both functions belong to AviSynth. In particular, ConvertBackToYUY2 does a point resampling of the chroma information instead of averaging chroma values as ConvertToYUY2 would, resulting in a chroma value closer to that in the original pre-RGB source clip.

After cleaning up the noise with Neat Video, to prepare these videos for playback on a progressive scan display such as a PC, mobile device, or modern TV, I still need to apply a deinterlacing filter. Each NTSC frame is split into two “fields”—the first field carries every other line, and the second field carries the lines in between. NTSC only has enough bandwidth to transmit 29.97 frames per second, so interlacing is a trick used to send half-sized “fields” at twice the rate. CRT TVs have a slower response time than modern digital TVs, so the net effect of interlacing on CRTs is that one field “blurs” into the next, producing the illusion of a higher rate of motion. Nowadays when TVs receive an analog NTSC signal they’re smart enough to recognize that it’s interlaced and apply a simple deinterlacing filter. But such filters are designed to work in real time, and without one, an interlaced source will appear to stutter when compared to its deinterlaced 60 Hz counterpart. A free third-party AviSynth script called QTGMC can do a much higher-quality job of deinterlacing ahead of time, with the caveat that I need to apply this script in advance and offline, that is, not in real time.
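To illustrate what fields are, here is a toy model that treats a frame as a list of scan lines. The "bob" function below is the crudest possible deinterlacer (line doubling) and bears no resemblance to the motion-compensated work QTGMC does; all names are my own.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into its top and bottom fields."""
    return frame[0::2], frame[1::2]

def weave(top, bottom):
    """Interleave two fields back into a single full-height frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame += [top_line, bottom_line]
    return frame

def bob(field):
    """Stretch one field to full height by repeating each line."""
    return [line for line in field for _ in range(2)]

frame = [f"line {n}" for n in range(6)]
top, bottom = split_fields(frame)
print(top, bottom, bob(top), sep="\n")
```

Bobbing each field in turn yields twice as many full-height pictures per second, which is why deinterlaced NTSC output ends up at roughly 60 frames per second.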

Here’s something to know about me. When it comes to filtering and rendering media offline, it doesn’t matter to me all that much how long it takes. The only deadline I have with this stuff is the eventual deterioration of the source media (and my own mortality, I suppose). For the moment, I have time and computing cycles to burn. As with the eight-hour Neat Video process, I have no qualms about backgrounding a rendering queue and letting it run overnight or even well into the next day. So when QTGMC and other tools advertise “Placebo” presets, I use them. I sleep better at night knowing there’s nothing more I could do to get a better result out of my tools, however minuscule. 🙂

I finally trim the start and end of my raw capture, leaving only the actual content behind. The last step crops some “garbage” lines from the bottom (VBI data I don’t need to save, but my Blackmagic card captures anyway), along with some black bars on the left and right sides of the image. Not surprisingly, the final dimensions of the video come out to 704×480: the visible part of a digitally-sampled NTSC frame. I render the whole thing out as a lossless Huffyuv-encoded AVI once again, saving the final lossy encoding step for FFmpeg.
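The crop arithmetic can be double-checked with a small sketch that mimics how AviSynth's Crop sizes its output when the last two arguments are zero or negative (in which case they are treated as offsets from the right and bottom edges). The function is a stand-in of my own, not AviSynth code, and it assumes the 720×486 frame size of the capture mode used earlier.

```python
def crop_size(width, height, left, top, width_arg, height_arg):
    """Compute the output size of Crop(left, top, width_arg, height_arg)."""
    # Zero or negative values trim from the right/bottom edge instead of
    # specifying an absolute size.
    new_width = width - left + width_arg if width_arg <= 0 else width_arg
    new_height = height - top + height_arg if height_arg <= 0 else height_arg
    return new_width, new_height

print(crop_size(720, 486, 4, 0, -12, -6))
```

Removing 4 samples from the left, 12 from the right, and 6 lines from the bottom of a 720×486 capture leaves the 704×480 picture described above.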

At this point, I have a couple of directions I can take. I can prepare my final video for export to DVD, or I can split my video into one or more video files suitable for uploading to my Plex server. I can do both if I want. In either case, I need to prepare chapter markers so FFmpeg knows how to split the final, monolithic video into its constituent clips. I use Google Sheets for this; I have one sheet with columns for “Start time,” “Duration,” “End time,” and “Label.” I put the “Start time” column in the HH:mm:ss.SSS format that FFmpeg expects for its timecodes, and I express the “Duration” column in seconds with decimals. When I’ve marked all the chapters (relying on VirtualDub’s interface for precise frame numbers and timing), I pull the “Start time” and “Duration” columns into a new sheet and export to a CSV file.
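To give a flavor of what happens with that CSV, here is a minimal sketch that turns "start time, duration" rows into FFmpeg command lines. It is not one of my batch scripts: the function name, the output file naming, the absence of a header row, and the encoder choices (libx264, AAC, 4:2:0 pixel format) are all assumptions made for illustration.

```python
import csv
import io

def build_split_commands(csv_text, source, ffmpeg="ffmpeg"):
    """Build one FFmpeg argument list per 'start,duration' row of CSV text."""
    commands = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        start, duration = row[0].strip(), row[1].strip()
        output = f"chapter{len(commands) + 1:02d}.mp4"  # hypothetical naming
        commands.append([
            ffmpeg, "-ss", start, "-t", duration, "-i", source,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
            output,
        ])
    return commands

chapters = "00:00:00.000,754.32\n00:12:34.320,1210.5\n"
for command in build_split_commands(chapters, "1988 Christmas final.avi"):
    print(" ".join(command))
```

Each argument list could then be handed to `subprocess.run`. Placing -ss and -t before -i makes them apply to the input, so every chapter is cut from the monolithic source render.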

I wrote a handful of Windows batch scripts that take this CSV file, a path to FFmpeg, and a path to my final video, and split my final render into a collection of H.264-encoded MP4 files, or a series of MPEG files encoded for NTSC DVD. In the DVD version, I also generate an XML file suitable for import into dvdauthor. I found that neither my iPhone nor my Android devices like H.264 files encoded with a chroma subsampling ratio higher than 4:2:0, so I wrote a script specifically to encode videos destined for mobile devices. These scripts are longer than make sense to post inline here, so here are links to each one if you’re curious and would like to use them:

Capturing and digitizing my family’s movie memories will help preserve these moments so they can be enjoyed in the years to come without wearing out the original tapes. The biggest cost to this digitization process is time, both in capturing from tape (which is always done in real time) and the offline cleanup and encoding to digital files. I still have more than a couple dozen tapes to sort through, so this blog post doesn’t signify the end of this project, but now that I have a reliable process that results in what I consider a marginal improvement over the source material, I like to think I can get through the remaining tapes on a sort of “autopilot.”

I hope this post will inspire someone else to preserve their own memories.

Bonus postscript: field order

As I mentioned above, an interlaced NTSC frame is split into two “fields,” with each field alternating between even and odd lines of the full frame. But which field is sent first? The even field, or the odd field?

The answer, as one might expect, is “it depends.” For my purposes here, the fields in a 480i NTSC signal are always sent bottom field first. But a 1080i ATSC signal—as is used with modern digital TV broadcasts—sends interlaced frames top field first. If you crop the source video before deinterlacing, you have to be careful to crop in vertical increments of two lines, or you run the risk of throwing off the deinterlacer’s expected field order. For example, if line 12 is the first visible line of the first field of my picture, it’s reasonable to expect it to be the bottom field of the frame. After cropping 13 lines from the top of the frame, though, the deinterlacer will assume that line 13 is the bottom field, when it really is the top field. The deinterlacer will then assume that the previous frame is the current frame and will get confused, resulting in motion that appears jerky.

If you must crop an interlaced source by an odd number of lines, AviSynth fortunately has functions to assume a specific field order. AssumeTFF, as its name implies, tells AviSynth that an interlaced clip is Top Field First. Its complement, AssumeBFF, tells AviSynth that a clip is Bottom Field First, though for raw analog NTSC captures this function is largely unnecessary.
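The parity rule boils down to a few lines. This sketch only models the bookkeeping: it takes the field order before cropping and the number of lines removed from the top, and reports the order the deinterlacer should be told to assume afterwards. The function name is hypothetical.

```python
def field_order_after_crop(order, lines_cropped_from_top):
    """Return 'TFF' or 'BFF' after cropping lines from the top of a frame."""
    if order not in ("TFF", "BFF"):
        raise ValueError("order must be 'TFF' or 'BFF'")
    if lines_cropped_from_top % 2 == 0:
        return order  # an even crop keeps each field on its original rows
    # An odd crop shifts every line by one row, swapping the two fields.
    return "BFF" if order == "TFF" else "TFF"

print(field_order_after_crop("BFF", 12))
print(field_order_after_crop("BFF", 13))
```

So a bottom-field-first capture cropped by 13 lines from the top would need AssumeTFF before deinterlacing, while a 12-line crop needs no correction.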

Not sure which order is correct for a particular clip? Try deinterlacing in one order, and if the motion looks jerky, try the other. If the motion is smooth, the order is correct.

Alternatively, avoid capturing from interlaced sources altogether. 😛

Rooting the “Pippin Classic”

It was a rhetorical question.

OK, I’ll admit that the Retroquest Super Retro Castle isn’t really a “Pippin Classic,” but it sure looks the part, and when I learned of its existence back in April, I figured it had to have at least as much power as a stock Pippin. Its advertised specs reminded me of the Raspberry Pi, leading me to speculate that maybe the Super Retro Castle is nothing more than a Raspberry Pi in a fancy Pippin-inspired case:

  • Two USB controller ports
  • HDMI output
  • microSD slot
  • DC power jack

Odds seemed good to me that it runs either a variant of Android or mainline Linux. Therefore, couldn’t I turn it into a sort of miniature Pippin console?

With a working knowledge of Linux and a little bit of hardware hacking, maybe I can. 🙂

Pippin and "Pippin Classic"
Don’t talk to me or my son ever again. 😛

Coaxing the Super Retro Castle into running user-provided code is certainly easier than cracking its inspiration. Booting it quickly reveals that it’s running a mostly-stock build of RetroArch on Linux, and booting it with the bundled microSD card inserted reveals that it’s configured to search the card for supplemental configurations. A-ha, a vector! Without making any modifications to the console itself, it is trivial to add new ROMs and even new libretro modules for emulators not built in to the Super Retro Castle. Just make sure the desired modules are built for the “armhf” architecture and the respective .lpl playlist file is configured with the correct path to the .so file under the microSD card’s /storagesd mountpoint, and you’re off to the races.

Bonus points: it runs RetroArch as root.

This is great if you don’t mind the Super Retro Castle’s built-in software and interface and are content to use it as a cheap emulator box. Don’t get me wrong, it’s great at what it does, but 1) I can’t read Japanese and 2) I’d like to be able to replace its built-in software with something newer, or something different entirely. To do that, I’d have to break out of its RetroArch jail and get access to its filesystem, ideally via a shell.

Rooting the Super Retro Castle, though similarly easy, requires a bit more elbow grease. Step one is getting the case apart, which is pretty straightforward: Just remove the four Phillips-head screws located underneath the rubber pads on the underside of the case:

Underside of the Super Retro Castle
Opened Super Retro Castle

Right away we notice that most of the console’s weight is concentrated in the metal heatsink affixed to the bottom of the case. But the other thing we see is that the Super Retro Castle is not a Raspberry Pi or derivative at all; it appears to be a custom logic board built around the Amlogic S905X SoC.

Overview of the Super Retro Castle logic board
And now I see with eye serene
The very pulse of the machine

Wikipedia tells me quite a bit about the S905X: It sports a quad-core ARM-based CPU clocked at 1.2 GHz, a Mali-450 GPU, and support for decoding H.264-encoded video at up to 1080p60 in hardware. Quite the chip. When implemented on the Super Retro Castle board, the S905X is supported by two Samsung K4B4G1646E-BCMA chips for a total of 1 GiB of RAM, and one Samsung KLMBG2JETD-B041 eMMC chip for 32 GiB of onboard storage. The latter definitely explains the wealth of preloaded ROMs available for selection right after bootup. 😉

A handful of pads next to the S905X chip suggest a JTAG interface, but what piques my interest is this small set of vias next to the microSD card slot:

UART pins on the Super Retro Castle logic board
A literal backdoor

GND, TX, RX, and VCC? That sure looks like a UART serial interface to me, and if the Linux image used is only slightly tweaked from defaults, I should expect to see a console on this interface at 115200 baud. But first, let’s solder on a header:

Header soldered to UART vias

Next, route a short cable out through the rear vent holes…

Jumper cable through rear of the Super Retro Castle

… reassemble it…

Reassembled Super Retro Castle with UART cable through back

… and now it almost vaguely resembles a Pippin dev/test kit. 😛

Side note about UARTs

It is tempting to think of a UART interface such as the one found on the Super Retro Castle as the same interface used by oldschool PC serial ports (RS-232 / RS-422). But beware: they are not the same. RS-232 signals swing between -15 and +15 V and RS-422 signals between -6 and +6 V, whereas UARTs like this one are TTL devices (transistor-transistor logic) that expect 0V for logic low and either 3.3V or 5V for logic high. Attempting to drive a UART by naively wiring it to a USB RS-232 serial adapter (as I initially did) therefore runs the risk of frying the UART, which in the case of the Super Retro Castle is built in to the S905X SoC. Fortunately the Super Retro Castle has some protective circuitry somewhere, so while my initial efforts produced garbage data regardless of baud rate, I was lucky in that I could try again with a proper USB UART adapter. I picked up a μART from Crowd Supply for this purpose and I really like it so far. 🙂

UART adapter wired to the Super Retro Castle

Getting root at this point just required wiring the GND, TX, RX, and VCC pins to my μART adapter, connecting the adapter to my PC, and opening up a PuTTY instance over the COM port it provides. At 115200 baud, I get a full boot log confirming four CPU cores each clocked at 1.2 GHz, U-Boot as the bootloader, and a build of Lakka running from the internal eMMC chip mounted read-only, followed by a root shell prompt. The initial syslog also seems to indicate the presence of a network interface—this follows from the description of the S905X chip on Wikipedia, but I don’t see any unused pads on the main board suggesting the relevant pins are routed from the SoC. Next steps for me will be backing up an image of the internal filesystem before attempting to remount it read-write so I can augment it with my own provided software.

Maybe I can get Advanced Mac Substitute to run on it and truly turn it into a real “Pippin Classic.” 🙂

Decomposing Professional Composer

Professional Composer

I owe a lot to Mark of the Unicorn’s Professional Composer. Had my dad not encountered this program around 1985 and subsequently adopted it (and its corresponding Mac hardware) for himself two years later, I would not have grown up with Macs, possibly even computers. I certainly wouldn’t have become as familiar with music notation software, let alone music theory, as I am today. My dad tells the origin story thusly:

After Graduate School (1985 or so) I became familiar with the notation program called Professional Composer™. The program was housed in the ISU [Illinois State University] computer lab where it was run on several Macintosh 512 computers. I’ve always been something of a visual learner when it came to things like this and found I could navigate this program rather easily without having to read a manual. I began by doing easy arrangements of trumpet quartets. Early on, these programs ran on 3 1/2″ floppy disks which meant that your files couldn’t be very big before you’d need another disk. This gave way to hard drives but even then you were still limited as to how big your files were or how many files you had on the drive.

I remember asking upper-level administration in District #131 that if I bought the hardware (in this case a Mac Plus), would they buy me the software for this program? They agreed and the rest is pretty much history.

MOTU discontinued Professional Composer (hereafter referred to by the nickname “ProCo,” courtesy of my friend and fellow hacker Josh Juran) sometime after its final 2.3M revision was released in 1990. My dad used this program almost every day for 20 years(!), after which I had convinced him to crossgrade to MOTU’s Composer’s Mosaic. The latter offered better MIDI playback and print layout capabilities, plus could import his by this time extensive library of ProCo files. However, neither ProCo nor Mosaic files can be fully imported into any modern music notation program. It therefore fell upon me as the family’s computer expert—and continues to even now—to ensure that the hardware powering my dad’s favorite music software continues to run despite all other advancements. As a matter of course, I have an intimate knowledge of the capabilities, requirements, and quirks of these two programs. (I have a lot of sympathy for banks and government institutions tasked with similar mandates.)

By the time ProCo 2.3M came out, hard drives were common and the ProCo application itself had long since outgrown its original home on a 400K (later 800K) boot disk. So 2.3M shipped on an 800K disk offering the option to install or remove itself to or from an attached hard drive, respectively, if launched from the master disk. Installing to the hard drive decrements an “install count” on the master disk, allowing the user to use one master disk to install ProCo to one hard drive at a time. Merely copying the ProCo application to a hard drive isn’t enough; if the application isn’t properly installed by the master disk, launching the program from the hard drive prompts the user to insert the master disk if not already present. I remember accidentally wiping out at least one of these hard drive installations from my dad’s Mac as a curious tinkerer in my youth (about which Dad was not pleased), leading Dad to request/beg MOTU for one final backup master disk some time in the mid-90s. It is a testament to the quality of 3.5″ floppies back then along with how well my dad takes care of them that the disks remain usable, some 30+ years later.

Hell hath no fury like my dad scorned.

Eventually Dad got me my own Mac(s), where I could hack away safely—safe from his expensive software, at least. But as I grew as a hacker and programmer, acquiring and installing various software packages of my own, encountering—and defeating—assorted authentication schemes, going deep down the rabbit hole of the inner workings of Mac OS, and even dipping my toes into the field of software preservation, the music notation program that started it all continued to elude me. Why can’t Disk Copy or DiskDup produce a working substitute for the ProCo master disk? Why is it so difficult to duplicate the master disk using, for example, a KryoFlux? Why can I install ProCo to an emulated HD20 via my Floppy Emu, but not to a mounted Disk Copy disk image? Why does ProCo crash when After Dark kicks in? Why does its installer only appear when launched from the original master disk? And how does ProCo know that it has been properly installed?

I’m determined to finally find out.

Twenty years is much too long to be greeted by this.

ProCo has a minimal About dialog, displaying the name of the software, the version, its copyright years, and credit only to “Mark of the Unicorn, Inc.” I therefore have no real idea who wrote it, despite having asked MOTU via email and Twitter for the source code multiple times over the years. Ultimately my goal here is to develop a conversion utility that brings ProCo files into the 21st century, so I’d even be satisfied with documentation of its file format, but I would be surprised if there is anybody left at MOTU who is even aware of Professional Composer, let alone familiar with a product they haven’t supported in over a quarter century.

So off to the disassembler we go. 😉

“Much more advanced?” We’ll see about that.

A Short Primer on Memory Management and Launching 68K Applications on the Mac

Much of the Mac’s software is split into chunks of data and code called “resources” that can be swapped in and out of RAM as needed. In order to maximize the use of the scarce memory on the original Macintosh, its designers traded a tiny bit of speed for greater efficiency when building the Memory Manager. When asked to load a resource from disk, the Resource Manager returns a “handle” to the loaded resource, which is a pointer to an OS-controlled “master pointer” pointing to a relocatable block within the heap. The Memory Manager can then be allowed to move or “compact” relocatable blocks in the heap, or even remove (“purge”) such blocks when available RAM is running low. This relieves applications of some of this management burden and is leveraged throughout the Mac System Software. In addition to allocating their own handles and non-relocatable blocks, applications may mark existing handles with various attributes to guide the Memory Manager in its housekeeping; for example, marking a handle “purgeable” allows the Memory Manager to free its associated block, and likewise “locking” a handle prevents it from being moved or freed.
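
To make the double indirection concrete, here’s a toy model in C. None of this is the real Memory Manager API: ToyHeap and the toy_* functions are hypothetical, the heap is a small fixed array, and freeing is reduced to clearing a master pointer. The only point is that a handle keeps working after its block moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef char *Ptr;     /* points directly at a block's bytes */
typedef Ptr  *Handle;  /* points at a master pointer         */

enum { HEAP_SIZE = 256, MAX_BLOCKS = 8 };

typedef struct {
    char   bytes[HEAP_SIZE];     /* where relocatable blocks live       */
    Ptr    masters[MAX_BLOCKS];  /* master pointers; these never move   */
    size_t sizes[MAX_BLOCKS];
    int    locked[MAX_BLOCKS];
    size_t used;                 /* offset of the first never-used byte */
    int    count;                /* master pointers handed out so far   */
} ToyHeap;

/* Allocate a relocatable block at the end of the heap. */
Handle toy_new_handle(ToyHeap *heap, size_t size)
{
    if (heap->count == MAX_BLOCKS || heap->used + size > HEAP_SIZE)
        return NULL;
    int i = heap->count++;
    heap->masters[i] = heap->bytes + heap->used;
    heap->sizes[i]   = size;
    heap->locked[i]  = 0;
    heap->used      += size;
    return &heap->masters[i];
}

/* Free a block: the master pointer stays put but points nowhere. */
void toy_dispose_handle(Handle h)
{
    assert(h != NULL);
    *h = NULL;
}

/* Pin a block so that compaction leaves it where it is. */
void toy_lock(ToyHeap *heap, Handle h)
{
    heap->locked[h - heap->masters] = 1;
}

/* Slide unlocked blocks toward the bottom of the heap, closing gaps. */
void toy_compact(ToyHeap *heap)
{
    char *dst = heap->bytes;
    for (int i = 0; i < heap->count; ++i) {
        Ptr src = heap->masters[i];
        if (src == NULL)
            continue;                     /* freed block: reclaim its space  */
        if (heap->locked[i]) {
            dst = src + heap->sizes[i];   /* can't move it; carry on past it */
            continue;
        }
        if (src != dst) {
            memmove(dst, src, heap->sizes[i]);
            heap->masters[i] = dst;       /* every handle sees the new address */
        }
        dst += heap->sizes[i];
    }
    heap->used = (size_t)(dst - heap->bytes);
}
```

A raw Ptr copied out of a handle before compaction goes stale afterward, which is the reason for locking a handle before holding on to a dereferenced pointer.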

Each time you double-click on a 68K application to launch it from the Finder, a carefully orchestrated sequence of events takes place:

  1. Finder calls the _Launch trap with the name of the application.
  2. The Segment Loader opens the resource fork of the file passed to _Launch and immediately preloads ‘CODE’ resource ID 0. ‘CODE’ 0 is a specially formatted ‘CODE’ resource. It contains the parameters necessary to set up a non-relocatable block of memory near the top of the application’s memory space containing application and QuickDraw globals, any parameters passed to it from the Finder, and the application’s jump table. Following these parameters is the jump table itself: a list of tiny eight-byte routines that each load a ‘CODE’ resource, or “segment,” and jump to an offset within that segment.
  3. Using the parameters at the start of ‘CODE’ 0, the Segment Loader allocates space for globals pivoting around register A5, which is eventually passed to the application. This is known in Mac programming parlance as the “A5 world” and is unique to each running application.
  4. The jump table is copied above the Finder’s application parameters in the A5 world and the ‘CODE’ 0 resource is released.
  5. The first entry in the jump table is executed, and the application takes control.
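
For reference, here’s roughly how I’d decode ‘CODE’ 0 in C. The layout (four 32-bit header fields followed by eight-byte entries made of an offset, a MOVE.W #segment,-(SP), and a _LoadSeg trap) is the classic Segment Loader format as I understand it from Inside Macintosh. The struct, field, and function names are my own, and this sketch only recognizes entries for segments that haven’t been loaded yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    CODE0_HEADER_SIZE = 16,
    JT_ENTRY_SIZE     = 8,
    OP_PUSH_WORD      = 0x3F3C,  /* MOVE.W #imm,-(SP) */
    TRAP_LOADSEG      = 0xA9F0   /* _LoadSeg          */
};

typedef struct {
    uint32_t aboveA5Size;      /* bytes from A5 up to the top of the A5 world */
    uint32_t belowA5Size;      /* bytes of globals below A5                   */
    uint32_t jumpTableSize;    /* length of the jump table in bytes           */
    uint32_t jumpTableOffset;  /* where the table sits relative to A5         */
} Code0Header;

typedef struct {
    uint16_t routineOffset;    /* entry point, relative to the segment start */
    uint16_t segmentID;        /* 'CODE' resource ID handed to _LoadSeg      */
} JumpTableEntry;

/* 68K data is big-endian no matter what the host is. */
static uint16_t code0_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t code0_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 on success, 0 if the resource is too short. */
int parse_code0_header(const uint8_t *data, size_t len, Code0Header *out)
{
    assert(data != NULL && out != NULL);
    if (len < CODE0_HEADER_SIZE)
        return 0;
    out->aboveA5Size     = code0_be32(data);
    out->belowA5Size     = code0_be32(data + 4);
    out->jumpTableSize   = code0_be32(data + 8);
    out->jumpTableOffset = code0_be32(data + 12);
    return 1;
}

/* Returns 1 if entry `index` looks like a valid unloaded-segment stub. */
int parse_jump_table_entry(const uint8_t *data, size_t len, size_t index,
                           JumpTableEntry *out)
{
    assert(data != NULL && out != NULL);
    size_t start = CODE0_HEADER_SIZE + index * JT_ENTRY_SIZE;
    if (start + JT_ENTRY_SIZE > len)
        return 0;
    const uint8_t *p = data + start;
    if (code0_be16(p + 2) != OP_PUSH_WORD || code0_be16(p + 6) != TRAP_LOADSEG)
        return 0;              /* not a recognizable stub (e.g. encrypted) */
    out->routineOffset = code0_be16(p);
    out->segmentID     = code0_be16(p + 4);
    return 1;
}
```

An entry whose middle words aren’t the expected MOVE.W and _LoadSeg pair gets rejected, which makes a parser like this a quick way to spot a jump table that has been tampered with or obfuscated.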

The relative jump instructions of the original 68000 processor are limited to signed 16-bit offsets, so branches or subroutine calls are limited to 32K offsets in either direction from the current program counter. In order to accommodate programs with more than 32K of code under the memory constraints of the original 128K Macintosh, Apple invented the Segment Loader, which manages applications split into ~32K code “segments.” Code within each segment can make intra-segment jumps (branches or subroutine calls), but once a subroutine is needed outside a particular segment, a call must be made to the jump table which in turn loads the necessary segment. New segments are returned as handles to relocatable blocks just like any other resource, so as they are loaded the Memory Manager automatically compacts the heap and/or frees purgeable handles to make room in RAM. Recall that the jump table is copied to a known location relative to the A5 register, so applications always have easy access to it. But since new code segments are created at locations on the application heap unknown at compile time, this also means that all code segments are invoked assuming position-independent code, meaning all branches and subroutine calls are relative.

All 68000 processors support branches to absolute addresses utilizing the full usable width of the address bus. The 68020 and later processors support larger relative branch offsets, so segments are not necessarily limited to around 32K. Well-behaved applications check at launch that the host Mac has the necessary capabilities, and exit early if not. But for maximum compatibility, some applications built with compilers such as CodeWarrior are generated with a table of offsets to absolute branch instructions within each code segment. These instructions are compiled as jumps to offsets within the segment relative to zero, which is sure to crash the Mac if executed as stored. But in a small bit of “preflight” code, these absolute branches are fixed up to point within the segment, providing larger branch offsets to all Macs. This is how the ‘rvpr’ 0 resource was compiled for the Pippin.
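
Here’s a sketch of what such a “preflight” pass could look like. The table format is an assumption of mine (a list of 16-bit offsets, each locating a 32-bit absolute operand inside the segment), not CodeWarrior’s actual on-disk format, and apply_fixups is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 68K operands are stored big-endian. */
static uint32_t fix_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void fix_store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical fixup pass. Each table entry is the offset, within the
 * segment, of a 32-bit absolute address that was compiled relative to
 * zero. Adding the address the segment was loaded at makes the operand
 * point inside the segment. Returns the number of fixups applied, or -1
 * (without patching anything) if an entry falls outside the segment.
 */
int apply_fixups(uint8_t *segment, size_t segmentLen, uint32_t loadAddress,
                 const uint16_t *fixups, size_t fixupCount)
{
    assert(segment != NULL && (fixups != NULL || fixupCount == 0));
    for (size_t i = 0; i < fixupCount; ++i)
        if ((size_t)fixups[i] + 4 > segmentLen)
            return -1;
    for (size_t i = 0; i < fixupCount; ++i) {
        uint8_t *operand = segment + fixups[i];
        fix_store_be32(operand, fix_load_be32(operand) + loadAddress);
    }
    return (int)fixupCount;
}
```

With a JMP to absolute address $00000006 stored in a segment that ends up loaded at $00200000, the pass rewrites the operand to $00200006 and leaves the opcode alone.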

The first thing we notice is that ProCo’s jump table contains one valid entry and then… a lot of nonsense. This is a likely sign of an encrypted jump table; Epyx’s Temple of Apshai and Winter Games also use this obfuscation trick to scare off casual hackers. In fact, almost all of ProCo’s ‘CODE’ resources look to be encrypted! If I hope to make any sense out of ProCo’s file format by looking at its code, I’ll need to derive the algorithm that decrypts the rest of this.

MOVE.W #$0029,-(A7) pushes the ID of ‘CODE’ resource 41 onto the stack prior to jumping into it via _LoadSeg. Once there, we start by allocating a couple of memory blocks to use, starting with a 384-byte block of memory that we’ll call the “environment” block. We stash the stack pointer into offset 28 of that block, push a pointer to our environment block onto the stack, then stash the value of the ScrDmpEnb global into offset 32 of our environment block.

ScrDmpEnb is short for “screen dump enable.” It originally indicated whether the screenshot feature invoked via Command-Shift-3 is enabled on Macs, but grew to cover other FKEYs as well. One popular third-party FKEY available to hackers was the “Programmer’s Key,” which drops into the installed debugger when invoking e.g. Command-Shift-7, providing a way to drop to MacsBug without a physical programmer’s switch installed on the side of the machine. But the same functionality could be had by simply writing your own equivalent FKEY, following instructions provided in the official MacsBug manual. MOTU certainly couldn’t make it that easy for would-be crackers to conveniently drop into the debugger during the startup process of their precious software, so ScrDmpEnb is set to zero, effectively disabling all FKEYs.

What the FKEY?!

The Apple ][, Lisa keyboard, original Macintosh keyboard, and most keyboards that later shipped with 20th-century Macs, did not feature what we know as “function keys”: the F1-F12 (and beyond) keys that adorn the top of most keyboards today. These devices more closely mimicked typewriter keyboard layouts that many users were familiar with at the time of their respective introductions. But on the Apple ][, there are a few reserved keyboard shortcuts that are always available to the user: Control-Reset breaks out of the currently running program, and Control-Open Apple-Reset resets the computer, for example.

These shortcuts are hardcoded into the ROM and not easily modifiable by the user. On the original Mac, since localization (in particular, keyboard layouts) fell out of the disk-based System Software’s foundation built on resources, it’s only natural then that shortcuts be handled by the OS in a modular way as well. So Apple made up for the lack of physical function keys by providing several “virtual” function keys bound to Command-Shift-numbers. When invoked, these shortcuts run tiny programs stored as ‘FKEY’ resources in the System file, which is why they are known as “FKEYs.” Programmers quickly discovered that they could write their own tiny FKEY programs and install them into the System file assigned to otherwise unused numbers.

The original set of FKEYs as shipped in 1984 are as follows:

  • Command-Shift-1: eject the first/internal floppy disk, if present
  • Command-Shift-2: eject the second/external floppy disk, if present
  • Command-Shift-3: take a screenshot and save it to disk
  • Command-Shift-4: take a screenshot and print it

The first two ejecting FKEYs went away with the introduction of Mac OS X in 2001, as Apple had stopped shipping Macs with floppy drives by then (though macOS continues to support external drives natively). But Command-Shift-3 lives on as the assigned shortcut for saving screenshots to disk—one of the few remaining holdovers from the original 1984 System Software.

We then allocate a handle to a new locked 82-byte “context,” pop the pointer to our environment block into offset 48 of our context, then push our newly-allocated context’s handle onto the stack. Next we pass the pointer to the top of the ‘CODE’ 41 resource we’re executing from to _RecoverHandle to get our ‘CODE’ resource’s handle. We store this handle at offset 0 of our context, then store its master pointer at offset 4. We finally recover our context’s handle from the stack before passing it to the first real “stage.”
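
The offsets mentioned so far can be pinned down as a pair of packed structs. This is a sketch for note-taking purposes only: the field names are mine, the width of the saved ScrDmpEnb value is a guess, and everything labeled unknown is simply space I haven’t identified yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Ptr68K;      /* pointers and handles are 32 bits on the 68K */

#pragma pack(push, 1)         /* keep the host compiler from adding padding */

typedef struct {
    uint8_t unknown0[28];
    Ptr68K  savedStackPtr;    /* offset 28: stack pointer stashed at entry   */
    uint8_t savedScrDmpEnb;   /* offset 32: ScrDmpEnb as found (width guess) */
    uint8_t unknown1[351];
} Environment;

typedef struct {
    Ptr68K  code41Handle;     /* offset 0: handle from _RecoverHandle    */
    Ptr68K  code41Ptr;        /* offset 4: that handle's master pointer  */
    uint8_t unknown0[40];
    Ptr68K  environmentPtr;   /* offset 48: the 384-byte block above     */
    uint8_t unknown1[30];
} Context;

#pragma pack(pop)

/* Fail the build if the structs drift from the sizes described above. */
static_assert(sizeof(Environment) == 384, "environment block is 384 bytes");
static_assert(sizeof(Context) == 82, "context block is 82 bytes");
```

Pointers are modeled as 32-bit integers so the layout stays the same on a 64-bit host.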

Stage 1: Front Line Disassembly

Stage 1 is fairly simple and calls three subroutines before launching into Stage 2, looking roughly like this when decompiled back to pseudo-C code:


initContext(&context);
context.aggregateChecksum = 0;
updateStage2Checksums(&context);
decryptStage2(&context);
jumpTo(context.stagePtrs[1]);   // jump to stage 2

So let’s break it down, function by function. We start with initContext, which looks like this:

void initContext(Ptr contextPtr)
{
    for (short i = 0; i < 8; ++i)
        contextPtr->stagePtrs[i] = stageInfoCmd(
            GET_OFFSET, nullptr, i, contextPtr->code41Ptr);

    contextPtr->stageGlobalsPtr = contextPtr->stagePtrs[7];

    contextPtr->code41Size = _GetHandleSize(contextPtr->code41Handle);
    contextPtr->code41End = contextPtr->code41Ptr
        + contextPtr->code41Size - 1;
}

initContext initializes a block of eight pointers to the results of stageInfoCmd, which looks like this:

enum StageInfoCmdSelector
{
    SET_OFFSET = 0,
    GET_OFFSET = 1,
};

Ptr stageInfoCmd(
    StageInfoCmdSelector selector,
    Ptr stagePtr,
    short index,
    Ptr codePtr)
{
    static short offsets[] =
    {
        0x0086,     // offset to Stage 1 from top of 'CODE' 41
        0x0388,     // offset to Stage 2 from top of 'CODE' 41
        0x0D24,     // ?
        // ... five more entries, one for each remaining stage pointer
    };

    Ptr outPtr = nullptr;

    switch (selector)
    {
    case SET_OFFSET:
        offsets[index] = (short)(stagePtr - codePtr);
        break;
    case GET_OFFSET:
        outPtr = codePtr + offsets[index];
        break;
    default:
        outPtr = (Ptr)offsets;
        break;
    }

    return outPtr;
}

So initContext initializes a block of eight pointers in our context to point to eight different areas of 'CODE' resource ID 41. The first of these pointers points to the Stage 1 code we're currently executing, and the second of these pointers points to the encrypted block of 'CODE' 41 immediately following the bits of Stage 1 that are recognizable as executable code. This eventually becomes the Stage 2 code that we jump to later. (It's interesting that stageInfoCmd is only ever called with the GET_OFFSET selector, making the rest of that function dead code.)

Next we initialize a field in our context that's used to store an aggregate of all computed checksums. We then call updateStage2Checksums which looks like this:

long updateStage2Checksums(Ptr contextPtr)
{
    contextPtr->latestChecksum = calculateChecksum(

    contextPtr->aggregateChecksum += calculateChecksum(
        'PACE') + contextPtr->latestChecksum;

    return contextPtr->latestChecksum;
}

updateStage2Checksums in turn makes a couple of calls to calculateChecksum, passing each call the blocks leading up to, and following Stage 2, respectively.

long calculateChecksum(Ptr startPtr, Ptr endPtr, long seed)
{
    long size = endPtr - startPtr;
    long cksum = seed;
    long sizeInLongs = size / sizeof(long);

    Ptr ptr = startPtr;
    if (sizeInLongs != 0)
    {
        size -= sizeInLongs * sizeof(long);
        for (long i = 0; i < sizeInLongs; ++i)
            cksum += *((long*)ptr)++;
    }

    if (size > 0)
    {
        for (long i = 0; i < size; ++i)
            cksum += *((byte*)ptr)++;
    }

    return cksum;
}

One thing that sticks out immediately to me is the use of the longword 'PACE' as an initial checksum seed. 'PACE' is likely a reference to PACE Anti-Piracy: a company founded in 1985 that's still around today. MOTU adopted PACE's protection code for ProCo starting with version 2.1, released in late 1987. Indeed, pirated versions of ProCo existed as early as 1985; some are still available on the Internet. With 1.0 and 2.0 unencrypted, "sharing" these was much easier than with later versions. My dad acquired his Mac Plus / ProCo combo in the summer of 1987—possibly June of that year—so with 2.1 having a creation date of September 11, 1987 and assuming MOTU shipped the latest version with new orders, it follows that 2.0 is the earliest version my dad owns. MOTU periodically shipped new master disks containing updated versions of ProCo to registered owners as they became available, free of charge—a practice I commend them for. 🙂
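
If you want to run the checksum on a modern machine, the pseudo-C needs two adjustments: longs must be read big-endian (the 68K's byte order) and the arithmetic must wrap at 32 bits. Here's a portable sketch under those assumptions. I'm also assuming the trailing bytes are added as unsigned values, which the disassembly-derived pseudo-code above doesn't settle, and the names calculate_checksum and SEED_PACE are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEED_PACE 0x50414345u   /* the four characters 'PACE' as a longword */

/* Read a 32-bit big-endian value, as the 68K would. */
static uint32_t cks_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Sums [start, end) into seed: whole longwords first, then any
 * remaining bytes one at a time. Overflow wraps modulo 2^32.
 */
uint32_t calculate_checksum(const uint8_t *start, const uint8_t *end,
                            uint32_t seed)
{
    assert(start != NULL && end >= start);
    size_t size = (size_t)(end - start);
    uint32_t cksum = seed;
    const uint8_t *p = start;

    for (size_t n = size / 4; n > 0; --n, p += 4)
        cksum += cks_load_be32(p);

    for (size_t n = size % 4; n > 0; --n, ++p)
        cksum += *p;

    return cksum;
}
```

Nine bytes holding the longwords 1 and 2 followed by a single byte 3 sum to 6 with a zero seed, or 0x5041434B when seeded with 'PACE'.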

Gotta catch 'em all!

Now that we have our checksums, we're ready to head into decryptStage2. The decryption code takes the checksum of the code to be decrypted as the "key." This makes binary patching the ProCo application an involved process, since the remainder of the code would need to be reencrypted for decryption with its new checksum to succeed. One thing is for sure about this protection: it is designed to be resilient against quick-and-dirty patches.

void decryptStage2(Ptr contextPtr)

void decryptStage2Block(long key, Ptr startPtr, Ptr endPtr)
{
    long size = endPtr - startPtr;
    long sizeInLongs = size / sizeof(long);

    Ptr ptr = startPtr;
    if (sizeInLongs-- != 0)
    {
        long longsLeft = sizeInLongs;
        do
        {
            long rotCount = key & 0x0F;
            if (rotCount == 0)
                rotCount = 1;
            key = rotateRight(key, rotCount);   // Ror.L in 68K
            *((long*)ptr)++ ^= key;
        }
        while (longsLeft--);

        size -= sizeInLongs * sizeof(long);
    }

    if (--size >= 0)
    {
        long bytesLeft = size;
        do
        {
            long rotCount = key & 0x0F;
            if (rotCount == 0)
                rotCount = 1;
            key = rotateRight(key, rotCount);   // Ror.L in 68K
            *((byte*)ptr)++ ^= (byte)key;
        }
        while (bytesLeft--);
    }
}

With Stage 2 now fully decrypted, we can jump right in by loading contextPtr->stagePtrs[1] into A0 and JMPing right to it.
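
Here's a portable take on the same rotating-XOR scheme, handy for experimenting off the Mac. It's a cleaned-up sketch rather than a byte-exact port: I'm assuming the intent is "every whole longword, then every leftover byte," with big-endian longs, and the names next_key and xor_block are mine. Because the key stream depends only on the starting key and never on the data, running the routine a second time with the same key undoes it, so the one function both decrypts and re-encrypts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t xor_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void xor_store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Advance the key: rotate right by its low nibble, or by 1 if that is 0. */
static uint32_t next_key(uint32_t key)
{
    unsigned rot = key & 0x0F;
    if (rot == 0)
        rot = 1;
    return (key >> rot) | (key << (32 - rot));   /* rot is 1..15 here */
}

/* XOR [start, end) with the key stream: longwords first, then bytes. */
void xor_block(uint32_t key, uint8_t *start, uint8_t *end)
{
    assert(start != NULL && end >= start);
    size_t size = (size_t)(end - start);
    uint8_t *p = start;

    for (size_t n = size / 4; n > 0; --n, p += 4) {
        key = next_key(key);
        xor_store_be32(p, xor_load_be32(p) ^ key);
    }

    for (size_t n = size % 4; n > 0; --n, ++p) {
        key = next_key(key);
        *p ^= (uint8_t)key;       /* only the low byte of the key is used */
    }
}
```

With a key of 1, the first rotation produces $80000000, so four zero bytes come out as 80 00 00 00; a second pass with the same key restores the original buffer.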

That wasn't so hard, was it? 😛

There is still at least another stage of this protection to get through, and we're hardly that much closer to decrypting the remaining 'CODE' resources or the jump table. Knowing PACE's reputation, this is likely a small victory in what will ultimately be a long battle. ProCo is legendary in some Mac circles for its copy protection, so if what I've heard of it is true, then I'm surely in for a ride. PACE even makes a bold claim on their own website:

We know it sounds like an unrealistic boast to say our anti piracy software cannot be cracked. Our goal is to stay ahead of the curve and hacking trends. We avoid giving known hooks or patterns that they recognize, and we pepper our anti piracy solutions with methods that we know are time consuming and difficult, if not impossible, to remove.

Challenge accepted. 😉

Exploring the Pippin ROM(s), part 9: Kickstart

I’ve been busy. The Pippinizer is going to take me longer than I expected to put together into a releasable form, so I wrote a small utility that should tide folks over until that’s ready.

Introducing Pippin Kickstart. This is a small, carefully-crafted boot disc for the Pippin that circumvents the console’s built-in security and instead offers the choice to boot from an unsigned volume. It works on 1.0, 1.2, and 1.3 Pippins (so, every known retail Pippin ROM out there as of the time of this writing) without any modification.

Pippin Kickstart booting
It’s basically like Swap Magic, but for Pippin.

To use it, simply download the Pippin Kickstart disc image available here, burn it to CD, and use that disc to boot the Pippin. Pippin Kickstart will identify what ROM and RAM it detects, eject itself, and then immediately begin searching for a bootable volume candidate. The Pippin will boot from CD-ROM using only its internal drive, but other types of removable media may work as well assuming that they can boot a regular Mac without special drivers. It also has been tested working using an external hard drive.

“But Keith, I thought 1.3 Pippins don’t do the authentication check at startup. Why would I use Pippin Kickstart with a 1.3 Pippin?” While it is true that ROM 1.3 does away with the signing check, it is still hardcoded to boot only using the Pippin’s internal CD-ROM drive. Pippin Kickstart offers owners of 1.3 Pippins the ability to boot from other media sources such as a hard drive, providing itself as a sort of “launch pad.”

The Pippin Kickstart disc is a hybrid HFS/ISO image containing the source code, a short README, and—just for fun—a few extra “goodies” that I found useful during its development:

  • MacRelix ROM Copier by Josh Juran, used to dump the ROM of my own 1.2 Pippin
  • tbxi by Elliot Nunn, a project which evolved from an early tool Elliot wrote that I used to extract the ‘rvpr’ resource kicking off this whole mess
  • FDisasm and FindCode by Paul Pratt, indispensable tools used to locate and examine code within the Pippin’s ROM

All of these extras are licensed according to their respective authors.

I’ve licensed the Pippin Kickstart bootloader under the GPLv2. Source code is available on my Bitbucket: https://bitbucket.org/blitter/pippin-kickstart

Have fun.

UPDATE (20210209): I’ve updated Pippin Kickstart to version 1.1, which patches the SCSI Manager on units equipped with ROM 1.0 so that they too may boot from external SCSI volumes. It is available here (I’ve updated the rest of this post as well) and a detailed explanation is available here.

Exploring the Pippin ROM(s), part 8: Cracking open the Pippin (without the boys)

The Bandai Pippin has finally been cracked.

They say a picture is worth a thousand words. So I put together a fun proof-of-concept demo, and made a video to summarize these last few months.

For an in-depth technical explanation of what’s happening here, here’s some further reading:

Exploring the Pippin ROM(s), part 7: A lot to digest

Apple’s public key for verifying the authentication data on a Pippin boot volume is:
E0 E0 27 5C AB 60 C8 86 A3 FA C2 98 21 79 54 A8 9F D1 B9 DC 8A BA 84 EF B1 E7 C9 E2 1B F7 DD D7 DC F0 E4 4A BB 79 51 0E 7C EB 80 B1 1D

How did I find this? Strap in and let’s go for a ride. 🙂

The What

Quick recap: I want to unlock homebrew on the Pippin. Every time a Pippin tries to boot from an HFS volume on CD-ROM, it loads the ‘rvpr’ resource with ID 0 from ROM and executes it as code, passing as arguments that volume’s ID and two blocks of data found elsewhere in ROM. ‘rvpr’ 0 locates and reads an RSA-signed “PippinAuthenticationFile” from the volume. It contains a 128-bit digest for each 128K block in the volume, along with a 45-byte signature at the end of the file. If the signature cannot be verified or any of the digests don’t match what ‘rvpr’ 0 calculates from the volume in its main loop, the Pippin (r)ejects the disc.
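
As a quick back-of-the-envelope check, the numbers above give a lower bound on how big a PippinAuthenticationFile has to be. This sketch counts only the digests and the signature described here; it assumes a partial final block still gets its own digest and ignores any other fields the file may carry. The function names are mine.

```c
#include <assert.h>
#include <stdint.h>

enum {
    AUTH_BLOCK_SIZE     = 128 * 1024,  /* bytes of volume covered per digest */
    AUTH_DIGEST_SIZE    = 16,          /* 128 bits                           */
    AUTH_SIGNATURE_SIZE = 45           /* bytes, at the end of the file      */
};

/* Number of digests needed to cover a volume, rounding up. */
uint64_t auth_digest_count(uint64_t volumeBytes)
{
    assert(volumeBytes <= UINT64_MAX - AUTH_BLOCK_SIZE);   /* no overflow */
    return (volumeBytes + AUTH_BLOCK_SIZE - 1) / AUTH_BLOCK_SIZE;
}

/* Smallest possible size of the digests plus the trailing signature. */
uint64_t auth_min_file_size(uint64_t volumeBytes)
{
    return auth_digest_count(volumeBytes) * AUTH_DIGEST_SIZE
         + AUTH_SIGNATURE_SIZE;
}
```

Under those assumptions a full 650 MiB disc works out to 5,200 digests and at least 83,245 bytes of authentication data.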

“Feed me, Seymour!”

Last time I looked at ‘rvpr’ 0, I examined and broke down what the main loop does at a high level. Outside of main, there are ten non-library function symbols in the resource, and I’ve identified the usage of six of them. Reading through these, I determined the format of the PippinAuthenticationFile and how its data is fed to the rest of the main loop. The remaining four functions (VerifyDigestInfo, VerifySignature, CreateDigest, and InitRSAAlgorithmChooser) form what I conjectured to be the “meat” of the authentication process. I elected to pore over these at a later time.

‘rvpr’ 0 is over 35K in size, with almost 34K of that comprised of 68K assembly code. Unfortunately, what I looked at in part 6 only touches about 3K of that—not even 10% of the whole. What’s worse is that the remaining 31K/90+% of code is almost completely lacking symbols, save for an occasional call to T_malloc, T_memset, T_memcpy, or T_free. Without human-readable symbols to guide what the remaining memory locations, values, and functions mean in the context of ‘rvpr’ 0’s greater purpose, I would be “flying blind” without a safety net. If I was to attempt to grok this code to the same degree as I currently understand main and its (named, mind you) auxiliary functions, I would have a long road ahead of me especially if I used the same static analysis technique of stepping through the code offline on paper.

The Why

I decided that the best way to figure out the rest of this code was to use dynamic analysis; that is, to examine it while it’s running. There were just too many subroutines and too much data being pushed around to keep it all straight in my head. I needed a computer to help. I don’t have any hardware debuggers for the Pippin that would allow me to step the CPU and examine the contents of RAM, and no working software emulators exist for the Pippin (yet). What I found does exist is EASy68K, a suite of 68K assembly tools: a code editor, a binary editor, and, crucially, a simulator. If I wanted to look at ‘rvpr’ 0 in something that even remotely resembled a debugger, I’d have to build a working version of ‘rvpr’ 0 that could run outside of a Pippin, without first understanding how the code works.

EASy68K’s simulator provides a hypothetical computer system featuring a 68K CPU, 16MB of RAM, and rudimentary I/O facilities. Luckily, ‘rvpr’ 0 is pretty self-contained, which allowed me to quickly “port” it to EASy68K. I was correct in that this technique significantly accelerated my understanding of the digesting and verification process, but as I hope to elucidate later in this post, that pursuit required very little actual parsing of code. 🙂

The How

The first step to building an ‘rvpr’ 0 replica in EASy68K was to adopt the syntax EASy68K likes. I prefer FDisasm for disassembly because it’s part of the Mini vMac project and as such, it’s at least a tiny bit aware of the classic Mac OS API, known as the Toolbox. FDisasm can replace raw A-traps (two-byte 68K instructions beginning with the hex digit $A, which typically map to commonly-used subroutines) with their corresponding human-readable names according to official Mac documentation, which is a nice time-saver especially in large blocks of 68K Mac code. I also like FDisasm’s output formatting, which is the basis of how I list 68K assembly in these blog posts.
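To make "beginning with the hex digit $A" concrete, here is a tiny Python sketch of the test a disassembler could use to spot an A-trap word. The helper name is_a_trap is my own invention for illustration; it is not something FDisasm exposes.

```python
def is_a_trap(opcode: int) -> bool:
    """Return True if a 16-bit 68K opcode word has $A as its top hex digit."""
    return (opcode & 0xF000) == 0xA000


# $A002 is the trap word documented for _Read; $4E71 is an ordinary Nop.
print(is_a_trap(0xA002), is_a_trap(0x4E71))
```

Everything after that top digit selects which routine gets called, which is how a tool can map the word back to a name.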

I admit it ain’t neat, but it’s handy.

Cloning ‘rvpr’ 0 in EASy68K serves two purposes. First, I can step through it using a real signed Pippin CD, observe what its code does, and document it. After I reverse-engineer the authentication process though, this functional copy will serve a second purpose: to verify that my own authentication files are crafted properly. Since we know from part 2 and part 6 that the main loop will return zero in register D0 if the verification process succeeds, we should be able to observe that in the simulator. Using our own ‘rvpr’ 0 binary that’s as close as possible to what’s in ROM on an actual Pippin should assuage doubt as to whether a proof-of-concept will pass the console’s tests or not. Plus, since it’s all simulated in software, it saves me from having to burn a ton of test CDs. 😛

Converting my (annotated) disassembly from FDisasm’s syntax to EASy68K’s was easy—regular expressions to the rescue. Assembling the result produces code identical to what’s in the original resource—yay, the assembler works, and we have a byte-for-byte clone of what’s in the Pippin ROM. But making this new replica functional required a little bit of creativity.

On a real Pippin, ‘rvpr’ 0 is loaded from the ROM’s resource map into an area on the system heap in RAM. The relocation code at the beginning of ‘rvpr’ 0 patches each subroutine jump by offsetting them relative to where the code resides in RAM (discussed in part 6). It keeps track of whether this is done by storing this offset in a global when relocation is complete. Recall from part 3 that this global has an initial value of zero when ‘rvpr’ 0 is first run. If this code is executed a second time, it subtracts this global from its base address in RAM and, if the result is zero, it doesn’t need to do relocation again since the jumps already have valid destination addresses.

The simulator comes with no bootloader at all but starts up fully initialized, so in essence the contents of ‘rvpr’ 0 form the “ROM” of our virtual computer. We thus boot directly to ‘rvpr’ 0’s entry point, at the start of the simulator’s memory space. But since ‘rvpr’ 0 now always starts at address 0, the difference between that initial global and our base address… is zero. So the relocation code never runs in the simulator; it doesn’t have to because those unpatched jumps are already relative to zero. 😉
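The relocation check boils down to a single subtraction. Here is a minimal Python sketch of it; the function name and the Pippin heap address are made up for illustration, and only the "base minus stored offset equals zero" rule comes from the description above.

```python
def needs_relocation(base_address: int, stored_offset: int) -> bool:
    """Relocate unless (base address - stored offset) comes out to zero."""
    return (base_address - stored_offset) != 0


PIPPIN_BASE = 0x0012A000  # hypothetical system heap address, illustration only

first_run = needs_relocation(PIPPIN_BASE, 0)             # global still zero
second_run = needs_relocation(PIPPIN_BASE, PIPPIN_BASE)  # global now holds the offset
simulator = needs_relocation(0, 0)                       # code loaded at address 0

print(first_run, second_run, simulator)
```

The simulator case lands in the same branch as "already relocated" purely because both numbers happen to be zero.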

By the time ‘rvpr’ 0 executes in a real Pippin’s boot process, many subsystems on the console have been readied: the QuickDraw API for graphics, the Device Manager for device I/O, and the HFS filesystem package to name a few. These APIs, having been designed and built by Apple for the Mac, only exist on a Mac-based system and therefore naturally aren’t present in the fantasy system we get from EASy68K. We are in luck though in that ‘rvpr’ 0 only makes calls to a grand total of nine Toolbox APIs. Four of these calls are used in the “prologue” code discussed earlier that relocates all the jumps before main is even called. Since that relocation code doesn’t run in our simulator, that leaves five Toolbox APIs essential to the main loop: _Read, _Random, _NewPtr, _DisposePtr, and _BlockMoveData. We need equivalents to these routines if we are to expect ‘rvpr’ 0 to work properly.

Oh yes I would. And I did.

_BlockMoveData is an easy one. It copies D0 bytes from (A0) to (A1):

8DCA   12D8    Move.B    (A0)+, (A1)+
8DCC   5380    SubQ.L    #1, D0
8DCE   6CFA    BGE.B     _myBlockMoveData

I took a shortcut with _Random: my implementation simply returns a constant value. I did this partially because I’m lazy but also because _Random is only called once, albeit in a loop: to determine which 128K chunks to digest. By controlling the values returned, I can selectively and deterministically test chunk hashing.

There’s an XKCD for everything, even cracking Pippins.

I took similar liberties with _NewPtr and _DisposePtr: I keep a global pointing to the next unused block, and _NewPtr simply returns the value of that global and then advances it by the requested size. _DisposePtr is implemented as a no-op. Why did I do this? Well, again, part of it is because I’m lazy and didn’t want to write a proper heap allocator for this, but also because it affords me the ability to inspect memory blocks used even after they’ve been “freed.” I don’t care about memory leaks in this case—in fact, here they’re a feature! 🙂 Since ‘rvpr’ 0 is roughly 36K, I set aside the first 64K of memory for it (and any additional supporting code/data I add, like these replacement Toolbox routines). With register A7 initially pointing to the top of memory for use by the stack, the rest of RAM starting at $10000 I designate for my “heap.”
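In higher-level terms this is a "bump" allocator. Below is a rough Python model of the idea, assuming only what's described above (a single next-free global, a heap starting at $10000, and a do-nothing dispose). The class and method names are mine, not anything from the actual assembly.

```python
class BumpHeap:
    """Toy model of the replacement _NewPtr/_DisposePtr pair."""

    def __init__(self, start: int = 0x10000):
        self.next_free = start  # the single global the routines share

    def new_ptr(self, size: int) -> int:
        """Hand out the next unused address, then advance past the block."""
        ptr = self.next_free
        self.next_free += size
        return ptr

    def dispose_ptr(self, ptr: int) -> None:
        """Deliberate no-op: 'freed' blocks stay put for later inspection."""


heap = BumpHeap()
first = heap.new_ptr(0x20000)   # e.g. a 128K chunk buffer
heap.dispose_ptr(first)         # nothing is reclaimed
second = heap.new_ptr(16)       # lands right after the first block
print(hex(first), hex(second))
```

Because nothing is ever reused, every allocation has a unique, predictable address, which is exactly what makes before-and-after memory comparisons easy.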

Finally we come to _Read. EASy68K may be pretty bare-bones, but it does come with some niceties allowing for basic interactions with its host PC. In this case, I needed a way for my “virtual Pippin” to do random-access reads from a virtual CD-ROM drive. Fortunately, EASy68K provides this in the form of the Trap #15 instruction. My version of _Read only does the bare minimum of what ‘rvpr’ 0’s main loop requires: it opens an HFS disk image on the host PC, seeks to the offset specified in the ParamBlockRec that register A0 points to, reads the requested number of bytes into the specified buffer, then closes the file.

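8DF0=  6361 6E64 ...     DC.B      'candidate.hfs', 0   ; name of the disk image on the host PC

8DFE   43F9 0000 8DF0    Lea.L     _myFilename, A1      ; A1 -> filename
8E04   303C 0033         Move      #51, D0              ; task 51: open the file
8E08   4E4F              Trap      #15

8E0A   2428 002E         Move.L    $2E(A0), D2          ; D2 = offset to seek to (46 bytes into the ParamBlockRec)
8E0E   303C 0037         Move      #55, D0              ; task 55: seek
8E12   4E4F              Trap      #15

8E14   2268 0020         Move.L    $20(A0), A1          ; A1 -> destination buffer (32 bytes in)
8E18   2428 0024         Move.L    $24(A0), D2          ; D2 = number of bytes to read (36 bytes in)
8E1C   303C 0035         Move      #53, D0              ; task 53: read
8E20   4E4F              Trap      #15

8E22   303C 0032         Move      #50, D0              ; task 50: close
8E26   4E4F              Trap      #15

8E28   303C 0000         Move      #0, D0               ; return 0 (no error) in D0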
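For readers who don't speak EASy68K, the same open/seek/read/close sequence looks roughly like this in Python. It's a sketch of the behavior, not a translation of the assembly: read_from_image is a hypothetical name, and the throwaway temp file stands in for a real HFS image.

```python
import os
import tempfile


def read_from_image(image_path: str, offset: int, count: int) -> bytes:
    """Open the image, seek to offset, read count bytes, then close."""
    with open(image_path, "rb") as image:
        image.seek(offset)
        return image.read(count)


# Build a tiny stand-in "disk image" so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False, suffix=".hfs") as tmp:
    tmp.write(bytes(range(256)))
    demo_path = tmp.name

demo_bytes = read_from_image(demo_path, offset=16, count=4)
os.remove(demo_path)
print(demo_bytes.hex())
```

Opening and closing the file on every call is wasteful, but it means the routine keeps no state between calls, which is one less thing to get wrong in a test harness.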

Now that we’ve got functional replacements for the necessary Toolbox routines, how do we refit the rest of the code so that our versions are called instead of Apple’s, which don’t exist? I already had the Toolbox API names substituted in my listing, thanks to FDisasm, so I could simply create macros with those names that execute a tiny bit of code in place of those calls. The easiest way, and the method I tried first, is to invoke each replacement with a Jsr instruction, which is short for “Jump to SubRoutine.” This was really straightforward to do and assembled without issue, but upon loading and running in the simulator, I quickly discovered why this approach wouldn’t work. Jsr is a four-byte, sometimes six-byte, instruction, whereas each original A-trap it was to replace uses only two bytes. Since these larger instructions are inserted in the main loop near the beginning of the code, they throw off hardcoded addresses used later. Needless to say, when I ran in the simulator, a hardcoded Jsr landed instead in an unexpected area of code and I crashed almost instantly.

However I was going to invoke my faux-Toolbox calls, they had to be done in only two bytes. I thought for a second about how I could write an A-trap exception handler and leave Apple’s original A-trap instructions as-is, but I didn’t do that either because 1) laziness and 2) I thought of an easier way. Remember the Trap #15 instruction I used to implement _Read?

On a 68K processor, the two-byte Trap instruction provides a way to jump to any of 16 different addresses stored in a predefined area of memory, known as “vectors.” These 32-bit vectors are all stored consecutively in a block of memory that always starts at address $80. ‘rvpr’ 0 normally executes code at address $80, but that’s part of the address relocation done only on a real Pippin, not in our simulator. It is therefore safe for us to replace that block of code with the addresses of our replacement Toolbox routines, starting with Trap #0 and ending with Trap #3. Recall that I’ve implemented _DisposePtr as a no-op—which is the two-byte opcode $4E71—so I don’t need to set aside a trap vector for it. EASy68K only sets aside trap 15 for itself, leaving traps 0-14 for us to use however we wish. The code we do care about executing in the simulator doesn’t start until after address $100, so our entire trap table easily fits inside this block of unused code. How lucky can you get? 🙂
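The arithmetic behind the trap table is simple enough to sanity-check in a few lines of Python. The only facts used are the ones above: 16 vectors, 4 bytes each, starting at $80. The helper name is hypothetical.

```python
TRAP_VECTOR_BASE = 0x80  # address of the Trap #0 vector
VECTOR_SIZE = 4          # each vector is a 32-bit address


def trap_vector_address(trap_number: int) -> int:
    """Return the address where the vector for Trap #n is stored."""
    if not 0 <= trap_number <= 15:
        raise ValueError("68K Trap numbers run from 0 to 15")
    return TRAP_VECTOR_BASE + VECTOR_SIZE * trap_number


table_end = trap_vector_address(15) + VECTOR_SIZE
for n in (0, 3, 15):
    print(n, hex(trap_vector_address(n)))
print(hex(table_end))
```

The whole table ends before $100, so it sits entirely inside the stretch of relocation code that never runs in the simulator.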

My very own Pippin “emulator”

With my cobbled-together Pippin “emulator” now up and running, finally I could take a look at the before and after of the remaining functions called by the main loop. I decided to start with CreateDigest, as I had already figured out the inner workings of CompareDigests, so this seemed like a simple starting point. CreateDigest starts by creating a context object for itself, and initializes one of its buffers with a curious but predictable pattern of 16 bytes: 67 45 23 01 EF CD AB 89 98 BA DC FE 10 32 54 76. Along the way it checks to see if any of this setup fails, and if so, the entire authentication check is written off as a failure. But assuming everything is fine, it enters a loop which digests our 128K input chunk up to 16K at a time, by passing each 16K “window” to an unnamed subroutine. Each time we iterate through this loop, the aforementioned buffer which was initially filled with a predictable pattern now instead contains a brand new seemingly random jumble of numbers. I suspected that this 16-byte buffer was used as a sort of working space for whatever Apple chose as a hashing algorithm, since its size matched that of the digests in the PippinAuthenticationFile.

764   B883              Cmp.L     D3, D4               ; chunk size > 16K?
766   6C02              BGE.B     dontResetD3          ; if so, use 16K for window size
768   2604              Move.L    D4, D3               ; otherwise we have <16K left, so use what's left as window size instead
76A   4878 0000         Pea.L     ($0)                 ; push zero
76E   2F03              Move.L    D3, -(A7)            ; push window size (typically 16K until the end)
770   2F0A              Move.L    A2, -(A7)            ; push working chunk buffer ptr
772   2F2E FFFC         Move.L    -$4(A6), -(A7)       ; push ptr to our hash buffer
776   4EB9 0000 6516    Jsr       Anon217              ; patched, creates hash? digest?
77C   2A00              Move.L    D0, D5
77E   4A85              Tst.L     D5
780   4FEF 0010         Lea.L     $10(A7), A7          ; cleanup stack
784   6628              BNE.B     createDigestCleanup  ; if Anon217 failed, bail
786   9883              Sub.L     D3, D4               ; subtract window size from chunk size
788   D5C3              AddA.L    D3, A2               ; advance working chunk buffer to next 16K-ish window
78A   4A84              Tst.L     D4                   ; still more data to hash/digest?
78C   6ED6              BGT.B     createDigestLoop     ; hash/digest it

78E   4878 0000         Pea.L     ($0)                 ; push zero
792   4878 0010         Pea.L     ($10)                ; push 16
796   486E FFF8         Pea.L     -$8(A6)              ; push address of local longword
79A   2F2E 0010         Move.L    $10(A6), -(A7)       ; push out buffer ptr
79E   2F2E FFFC         Move.L    -$4(A6), -(A7)       ; push ptr to our hash buffer
7A2   4EB9 0000 654C    Jsr       Anon218              ; patched, copies hash out?
7A8   2A00              Move.L    D0, D5
7AA   4FEF 0014         Lea.L     $14(A7), A7          ; cleanup stack

Finally, CreateDigest makes one more call to an unnamed subroutine, passing it among other things the number 16 (presumably the digest size in bytes), a pointer to its context object, and a pointer to a 16-byte area on the stack filled with $FF bytes. After this call, the working buffer is once again reset to its initial pattern, and the area on the stack is filled with what looks like it could be a digest.

Wait a minute, upon closer examination...


... the output hash matches what's in the PippinAuthenticationFile! This makes sense, because all CreateDigest does after this is tear down and dispose of its context object. It then returns to the main loop, where the computed digest is passed along for CompareDigests to, well, compare. So clearly those two unnamed subroutines play a vital role in computing the digest, however that's done.

I dove right into the routine called in the loop. It starts by doing several integrity checks of the data structures it's about to use, then goes right into a confusing routine that appears to add the number of bits in our up-to-16K input window to a counter of some sort, optionally increasing the next byte if the counter rolls over. I suspected this was used for a 40-bit counter of bits hashed, its purpose not obvious yet. It then enters a loop of its own, dividing our input window into yet another sliding window, this time of 64 bytes in size. Each iteration of this loop, it passes these 64 bytes and CreateDigest's 16-byte working buffer to an unnamed subroutine with some very interesting behavior.

7186   2612              Move.L    (A2), D3
7188   282A 0004         Move.L    $4(A2), D4
718C   2A2A 0008         Move.L    $8(A2), D5
7190   2C2A 000C         Move.L    $C(A2), D6
7194   4878 0040         Pea.L     ($40)
7198   2F2E 000C         Move.L    $C(A6), -(A7)
719C   486E FFC0         Pea.L     -$40(A6)
71A0   4EB9 0000 7C58    Jsr       ($7C58).L
71A6   2004              Move.L    D4, D0
71A8   4680              Not.L     D0
71AA   C086              And.L     D6, D0
71AC   2204              Move.L    D4, D1
71AE   C285              And.L     D5, D1
71B0   8280              Or.L      D0, D1
71B2   D2AE FFC0         Add.L     -$40(A6), D1
71B6   0681 D76A A478    AddI.L    #$D76AA478, D1
71BC   D681              Add.L     D1, D3
71BE   2003              Move.L    D3, D0
71C0   7219              MoveQ.L   #$19, D1
71C2   E2A8              LsR.L     D1, D0
71C4   2203              Move.L    D3, D1
71C6   EF89              LsL.L     #$7, D1
71C8   8280              Or.L      D0, D1
71CA   2601              Move.L    D1, D3
71CC   D684              Add.L     D4, D3

Looking at this new subroutine, I was fairly convinced that it does the actual hashing. It is an unrolled loop that does a number of bitwise operations before adding a longword (32-bit quantity) from our input window along with what appear to be magic numbers to one of four 32-bit registers. At the end of this function, the contents of these registers are concatenated and added to the existing contents of CreateDigest's 16-byte working buffer. In order to maybe recognize a pattern to what this hash function was doing and perhaps identify the algorithm, I converted this assembly back to C code, and then verified that my C version produced identical output. Unfortunately, the algorithm didn't look familiar to me at all—I assumed it was something Apple invented specifically for the Pippin. I feared the initial "salt" might not be constant and could change depending on where the input chunk exists in the volume. Perhaps I merely found one hash function, but the Pippin could switch between different hash functions depending on some heuristic? It would require more disassembly and careful analysis to verify whether or not this was the case and why. 🙁
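If Python is easier on the eyes than 68K, here is a rough transcription of the arithmetic in the listing above, from the Move.L D4, D0 at $71A6 through the Add.L D4, D3 at $71CC. The names are my own: a, b, c, and d stand in for registers D3 through D6, and x0 for the longword read from -$40(A6). Treat it as a reading aid under those assumptions, not as the C code mentioned in the post.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit value left, as the LsR/LsL/Or trio does."""
    return ((value << count) | (value >> (32 - count))) & MASK32


def first_step(a: int, b: int, c: int, d: int, x0: int) -> int:
    """One unrolled step: mix b, c, d, add input and a constant, rotate, add b."""
    mix = ((b & c) | (~b & d)) & MASK32      # Not/And/And/Or
    a = (a + mix + x0 + 0xD76AA478) & MASK32  # Add/AddI/Add
    return (rotl32(a, 7) + b) & MASK32        # rotate left by 7, then Add D4


print(hex(rotl32(0x80000001, 7)))
```

The shift right by $19 (25) paired with a shift left by 7 is what gives away the rotate: 25 + 7 = 32.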

A Fun Side Story

Last Friday was 4/26, known informally among fans as Alien Day. I’m a big fan of the Alien universe, and this year happens to be the 40th anniversary of the 1979 Ridley Scott classic. So on Friday I had a number of friends over to watch both Alien films. 😉 There was pizza, chips, my homemade queso (half a box of Velveeta + a can of Rotel chiles—nuke it in the microwave for five minutes, stirring occasionally), and everybody had a good time.

One of the folks who dropped by was my friend Allison, who wanted to leave me with a disc of early Xbox demos her wife Erica found for me. I’m interested in investigating the contents of this disc in case there’s anything of historical value on it, but Xbox discs cannot be mounted or copied using a run-of-the-mill DVD-ROM drive. I remember years ago burning one or two (homebrew, ahem) Xbox DVDs with my PC, so I know writing Xbox discs is possible, but I was curious why reading them posed such an obstacle.

All roads lead back to Presto.

After some Googling, I found that the Xbox employs its own scheme to verify and “unlock” a boot disc candidate (described by none other than Multimedia Mike—an intrepid hacker whose blog I recommend). As I read, I learned that the Xbox’s disc verification involves the host (in most cases, an Xbox) answering an encrypted series of challenges at the drive level. This process, which is unique to each Xbox disc, uses SHA-1 hashes and RC4 encryption. This is a pretty cool and fascinating way to hide Xbox game data from non-Xboxes—it’s definitely worth checking out the details.

As one does on a Friday evening, I found myself clicking through to the Wikipedia entry on SHA-1. Not much time later, I was deep in the Wikipedia rabbit hole, ultimately landing on the page describing the MD5 message digest algorithm. Those of you reading this who have at least a passing familiarity with cryptography might recognize where this is going based on my description of CreateDigest's behavior above. I did not. 😛


According to Wikipedia, MD5 was designed in 1991 by Ronald Rivest, one of the inventors of the RSA cryptosigning algorithm used by the Pippin. MD5 was designed to replace an earlier algorithm, MD4, which had given up some security in exchange for speed. At a basic level, MD5 takes a bitstring of arbitrary length (the "message") and generates a 128-bit string that uniquely identifies this input, called a "digest." The input string is padded to a multiple of 512 bits by adding a 1 bit, a number of zero bits, and finally the size of the original message in bits, stored as a little-endian 64-bit value. The padded message is finally split into chunks of 16 longwords, and these 64-byte chunks are then passed into MD5's core hash function to be added to a final 16-byte digest. If that sounds confusing, here's a summary: MD5 takes an arbitrary-sized message and turns it into a unique fixed-size message. The same input will result in the same output, but no two distinct inputs will result in the same output (this isn't 100% true, but for the purpose of this discussion we'll pretend it is).
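The padding rule is easier to see in code than in prose. Here's a minimal Python sketch for byte-aligned messages, where the single 1 bit followed by zeros becomes a 0x80 byte followed by zero bytes. The function name md5_pad is mine; real MD5 libraries do this internally.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes (512 bits), MD5-style."""
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"                          # the lone 1 bit
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # zeros up to 56 mod 64
    padded += struct.pack("<Q", bit_length)             # little-endian 64-bit size
    return padded


for sample in (b"", b"abc", b"a" * 56):
    print(len(sample), len(md5_pad(sample)))
```

Note that a 56-byte message spills into a second 64-byte block: there must always be room for the 1 bit and the 8-byte length.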

On the surface, MD5 sounded like it might be what Apple adopted as the Pippin's digest algorithm, but I knew I'd hit paydirt when I instantly recognized the "magic numbers" used in its reference implementation. What's more, the Transform function looked almost exactly the same as the C code I derived from that unnamed subroutine in 'rvpr' 0, and the MD5Update function likewise performs the same steps as the 68K routine that calls into Transform. I am confident that Apple licensed this particular implementation for use in the Pippin. It follows the MD5 specification to the letter, even going so far as endian-swapping the input longwords from the Pippin's native big-endianness.

See? They match!

Armed with the knowledge that MD5 is the message digest algorithm used in the Pippin authentication process, it is clear that the digests computed in CreateDigest, and the digests read from the PippinAuthenticationFile used in CompareDigests, are themselves not signed with RSA. In fact, RSA is not involved with verifying chunks of the disc at all. This tells me that the only thing RSA is used for in the authentication process is for verifying the signature at the end of the PippinAuthenticationFile.
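With MD5 identified, the per-chunk check can be mimicked with Python's standard hashlib. This is a sketch of the behavior as I've described it (128K chunks, fed to the hash 16K at a time), not a port of Apple's code; the function names and the idea of holding the whole volume in a bytes object are simplifications of mine.

```python
import hashlib

CHUNK_SIZE = 128 * 1024   # one digest per 128K block of the volume
WINDOW_SIZE = 16 * 1024   # CreateDigest feeds the hash up to 16K at a time


def digest_chunk(chunk: bytes) -> bytes:
    """Digest a chunk in 16K windows, like the CreateDigest loop."""
    context = hashlib.md5()
    for start in range(0, len(chunk), WINDOW_SIZE):
        context.update(chunk[start:start + WINDOW_SIZE])
    return context.digest()


def chunk_matches(volume: bytes, index: int, expected: bytes) -> bool:
    """Compare chunk number `index` of a volume against a stored digest."""
    chunk = volume[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]
    return digest_chunk(chunk) == expected


fake_volume = bytes(range(256)) * 1024  # 256K of filler, i.e. two chunks
stored = digest_chunk(fake_volume[:CHUNK_SIZE])
print(chunk_matches(fake_volume, 0, stored))
```

Feeding the data in windows gives the same digest as hashing the chunk in one go; the windowing only limits how much is processed per call.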

The signature, to recap, is 45 bytes long and lives at the end of the PippinAuthenticationFile. Before entering the main loop, 'rvpr' 0 makes a call to VerifyDigestInfo, which in turn makes a call to VerifySignature. VerifySignature calls upon MD5 to digest the "message" portion of the PippinAuthenticationFile (everything but the signature). It then must use RSA to decrypt the signature and compare the result against that MD5 digest. If the two match, we know the chunk hashes therein can be trusted, so RSA is no longer needed. Otherwise, we know the PippinAuthenticationFile has been tampered with in some way.

Diagram credit: Tommy Yune
This diagram took way too long to make.

Let's say for illustration's sake that the PippinAuthenticationFile is 64K and the last 1K is the signature. When the signature is decrypted, it should contain a digest of that first 63K. If we digest that first 63K ourselves and the two match, we're verified. The whole process is... pretty modern, actually, when you consider this was 1995. 🙂
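Here's the "digest everything but the signature" half of that process as a small Python sketch, using the real 45-byte signature size rather than the 1K from the illustration. It assumes the signature is simply the last 45 bytes of the file, as recapped above, and ignores any other structure inside the file. The function names are hypothetical.

```python
import hashlib

SIGNATURE_SIZE = 45  # bytes at the end of the authentication file


def split_auth_file(auth_file: bytes):
    """Split an authentication file into (message, signature)."""
    if len(auth_file) <= SIGNATURE_SIZE:
        raise ValueError("file too small to hold a signature")
    return auth_file[:-SIGNATURE_SIZE], auth_file[-SIGNATURE_SIZE:]


def message_digest(auth_file: bytes) -> bytes:
    """MD5 of the message portion, to compare with the decrypted signature."""
    message, _signature = split_auth_file(auth_file)
    return hashlib.md5(message).digest()


fake_file = bytes(range(100))  # stand-in data, not a real auth file
message, signature = split_auth_file(fake_file)
print(len(message), len(signature), message_digest(fake_file).hex())
```

The other half, decrypting the signature to get the digest it carries, is where RSA comes in.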

Using the "Macintosh on Pippin" CD (a.k.a. "Tuscon") as a test case, I stepped through VerifySignature to obtain the MD5 digest of the authentication file's "message": AE 1A EC AE A4 C5 11 68 2E 38 7D D1 48 F0 55 C2. With this in mind, I set out to test my hypothesis and hopefully find our computed MD5 digest of the message portion somewhere in memory. If I could find this, I could work backwards and reveal how the signature is decrypted. Apple indicated in a Pippin technote that RSA was licensed for their authentication software library. Whether this means Apple used the library as-is, or licensed the code to augment for their own needs, I wanted to verify this one way or the other. VerifySignature makes two unnamed calls before cleaning up and returning a result:

668   B883              Cmp.L     D3, D4          ; is the remaining bytes to hash greater than 16K
66A   6C02              BGE.B     (* + $4)        ; then hash another 16K
66C   2604              Move.L    D4, D3          ; else hash the remaining bytes (which will be < 16K)
66E   4878 0000         Pea.L     ($0)
672   2F03              Move.L    D3, -(A7)       ; push size as bytes to hash (typically 16K until the last chunk)
674   2F0A              Move.L    A2, -(A7)       ; push ptr to start of chunk to hash
676   2F2E FFFC         Move.L    -$4(A6), -(A7)  ; -$4(A6) -> hash object
67A   4EB9 0000 816C    Jsr       ($816C).L       ; create digest of 16K chunk in hash object
680   2A00              Move.L    D0, D5          ; (don't know what is returned in D0, I think a size?)
682   9883              Sub.L     D3, D4          ; remaining bytes -= how many bytes we just hashed (typically 16K)
684   D5C3              AddA.L    D3, A2          ; A2 -> next chunk to hash
686   4FEF 0010         Lea.L     $10(A7), A7     ; cleanup stack
68A   4A84              Tst.L     D4              ; are there any remaining bytes left?
68C   6EDA              BGT.B     (* + -$24)      ; keep hashing
68E   4878 0000         Pea.L     ($0)
692   4878 0000         Pea.L     ($0)
696   2F2E 001C         Move.L    $1C(A6), -(A7)  ; $1C(A6) == $2D (size of signature?)
69A   2F2E 0018         Move.L    $18(A6), -(A7)  ; $18(A6) -> second longword in data block after hashes in auth file (start of signature?)
69E   2F2E FFFC         Move.L    -$4(A6), -(A7)  ; -$4(A6) -> hash object
6A2   4EB9 0000 81A2    Jsr       ($81A2).L       ; probably decrypt the signature?
6A8   2A00              Move.L    D0, D5
6AA   4FEF 0014         Lea.L     $14(A7), A7     ; cleanup stack

I knew that the first call at $816C computes the MD5 digest of the PippinAuthenticationFile. I reasoned that whatever decides whether verification succeeds must therefore happen in the second call at $81A2. That means the public key must exist in memory at some point during $81A2's execution. In addition, the signature bytes must exist in memory at the same time. If I drill down into $81A2 until I find the signature bytes in RAM, I should find clues as to what data is used to decrypt it based on the proximity of what's changed to what hasn't, thanks to how I implemented my "heap."

$81A2 eventually makes its way to a subroutine at $1B0E, wherein I found the following:

1B58   2F0B              Move.L    A3, -(A7)    ; push nullptr?
1B5A   4878 0000         Pea.L     ($0)         ; push 0
1B5E   2F04              Move.L    D4, -(A7)    ; push size of signature?
1B60   2F05              Move.L    D5, -(A7)    ; push address of signature?
1B62   42A7              Clr.L     -(A7)        ; push 0
1B64   486E FF78         Pea.L     -$88(A6)     ; push ptr to area on stack
1B68   4878 0000         Pea.L     ($0)         ; push 0
1B6C   486A 0028         Pea.L     $28(A2)      ; push ptr to $10F0 in hash object?
1B70   4EB9 0000 12E0    Jsr       ($12E0).L    ; jump somewhere that copies the signature to a working buffer
1B76   2600              Move.L    D0, D3       ; D0 == result code
1B78   4FEF 0020         Lea.L     $20(A7), A7
1B7C   6600 00AE         BNE       (* + $B0)    ; if it's nonzero, bail
1B80   2F0B              Move.L    A3, -(A7)    ; push nullptr?
1B82   4878 0000         Pea.L     ($0)         ; push 0
1B86   4878 0040         Pea.L     ($40)        ; push 64
1B8A   486E FF98         Pea.L     -$68(A6)     ; push ptr to somewhere on stack
1B8E   486E FFA8         Pea.L     -$58(A6)     ; push ptr to somewhere else on stack
1B92   486A 0028         Pea.L     $28(A2)      ; push ptr to $10F0 in hash object?
1B96   4EB9 0000 13D8    Jsr       ($13D8).L    ; jump somewhere, get processed data we care about on stack

$12E0 copies our signature from the PippinAuthenticationFile into a working buffer shortly after a copy of part of the data block in ROM passed to 'rvpr' 0 upon initial invocation. Could the "processed data" coming from the subroutine at $13D8 be our decrypted signature? I took a look at the memory before and after the call, and...

Do you see it?

Look at memory location $20451.

When I saw it, I gasped. There it is. There's our decrypted digest.

I wasn't as lucky with the RSA code as I was with MD5: neither version 1.0 nor version 2.0 of RSA's reference implementation has portions that appear in this code, but they do answer the question of the signature format. The bytes appearing before our decrypted digest are a header consisting mostly of magic numbers and some $FF padding bytes, but with a lonely "05" byte at address $2044C, or offset 24 into our decrypted signature. This byte's value indicates that the digest is an MD5 digest, just like the reference implementation specifies.
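To show how little parsing the decrypted block needs, here's a Python sketch that checks the "05" byte at offset 24 and peels off the trailing 16-byte digest. The sample block is built from the standard PKCS #1 v1.5 encoding of an MD5 digest, which happens to fit both the 45-byte size and the offsets described above. Whether the Pippin's decrypted block matches it byte for byte is an assumption on my part, and the function name is made up.

```python
BLOCK_SIZE = 45        # same size as the signature
DIGEST_SIZE = 16       # MD5 digest length in bytes
ALGORITHM_OFFSET = 24  # where the lonely 05 byte lives


def split_decrypted_signature(block: bytes):
    """Return (header, digest) after checking the digest-algorithm byte."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected a 45-byte block")
    if block[ALGORITHM_OFFSET] != 0x05:
        raise ValueError("digest algorithm byte is not 05 (MD5)")
    return block[:-DIGEST_SIZE], block[-DIGEST_SIZE:]


# Digest of the "Macintosh on Pippin" message portion, from the post.
digest = bytes.fromhex("AE1AECAEA4C511682E387DD148F055C2")

# ASSUMPTION: standard PKCS #1 v1.5 framing, not bytes dumped from a Pippin.
sample_block = bytes.fromhex(
    "0001" + "FF" * 8 + "00"                # block type and $FF padding
    + "3020300C06082A864886F70D020505000410"  # MD5 DigestInfo prefix
) + digest

header, found = split_decrypted_signature(sample_block)
print(len(header), found.hex().upper())
```

If the computed MD5 of the message equals the digest pulled out this way, the file's signature checks out.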

That completed my understanding of the format of the PippinAuthenticationFile, leaving only one final piece of the puzzle: what and where the public key is. The public key must come from somewhere, but at this point I hadn't yet determined the purpose of the data passed in from ROM to 'rvpr' 0...

RSA (in which I dust off my math minor)

The RSA algorithm for cryptosecurity, invented in 1977 by Ronald Rivest, Adi Shamir, and Leonard Adleman, is built upon the notion that factoring large semiprime numbers is a hard problem. Not impossible, but very hard. Finding these primes can take a computer or cluster of computers a significant amount of time, and that time grows steeply with the size of the semiprime to factor. Mathematically, it involves a few steps but can be implemented using basic algebraic concepts.

First, find two numbers P and Q such that they are both prime, meaning that both P and Q can only be divided neatly by themselves and one. Let's use P = 19 and Q = 17 as examples.

Next, compute P \cdot Q. 19 \cdot 17 = 323 in our example case. Call this N.

Now we need to calculate \lambda. We do this by computing (P - 1) \cdot (Q - 1). (19 - 1) \cdot (17 - 1) = 18 \cdot 16 = 288.

We need to choose a value for e such that e and \lambda are coprime; that is, they share no common factor other than one. The smallest value greater than one that works here is 5 (2, 3, and 4 each share a factor with 288), so we'll use that in our example case, but typically a larger value is used with a small number of 1 bits in its binary representation.
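A quick way to double-check the choice of e is to let Python's math.gcd do the work: two numbers are coprime exactly when their greatest common divisor is one. The variable names below are just for this example.

```python
from math import gcd

P, Q = 19, 17
lam = (P - 1) * (Q - 1)

# Smallest candidate greater than one that shares no factor with lambda.
e = next(c for c in range(2, lam) if gcd(c, lam) == 1)
print(lam, e)

# Not dividing lambda evenly isn't enough on its own: 10 doesn't divide
# 288, yet the two still share the factor 2, so 10 would not work as e.
print(lam % 10 != 0, gcd(10, lam))
```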

Finally we need to find the value of D. D can be found by solving the equation D \cdot e \equiv 1 \pmod \lambda. We do this by using the extended Euclidean algorithm.

All we do here is integer divide \lambda by e and also take \lambda \mod e; that is, we divide \lambda by e with remainder. Then repeat this process with the results until the remainder equals one: divide the divisor by the remainder from the previous calculation. For example, start with \lambda \div e:

\begin{aligned}  288 \div 5 &= 57\text{ remainder }3 \\  288 &= (57 \cdot 5) + 3  \end{aligned}

Take the divisor from that and divide by the remainder:

\begin{aligned}  5 \div 3 &= 1\text{ remainder }2 \\  5 &= (1 \cdot 3) + 2  \end{aligned}

One more time:

\begin{aligned}  3 \div 2 &= 1\text{ remainder }1 \\  3 &= (1 \cdot 2) + 1  \end{aligned}

Once we have a remainder of one, we've found the greatest common divisor and so we need to build back up to find our value for D. Do so by substituting our results until we have a linear combination of \lambda and e (288 and 5, respectively):

\begin{aligned}  1 &= 3 - (1 \cdot 2) \\  &= 3 - 1 \cdot (5 - (1 \cdot 3)) \\  &= 3 - 5 + 3 \\  &= 2 \cdot 3 - 1 \cdot 5 \\  &= 2 \cdot (288 - (57 \cdot 5)) - 1 \cdot 5 \\  &= 2 \cdot 288 - 114 \cdot 5 - 5 \\  &= 2 \cdot 288 - 115 \cdot 5  \end{aligned}

Since we need to satisfy D \cdot e \equiv 1 \pmod \lambda, we ignore \lambda's coefficient here, leaving e with a coefficient of -115. \lambda equals 288 (as we calculated earlier), so -115 \mod 288 is 173, giving us our value for D.
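The back-substitution above is exactly what the iterative form of the extended Euclidean algorithm automates. Here's a short Python version that tracks both coefficients as it divides; the function name is mine, and on Python 3.8+ pow(5, -1, 288) would give D directly.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


lam, e = 288, 5
g, lam_coeff, e_coeff = extended_gcd(lam, e)
d = e_coeff % lam  # fold the negative coefficient into the range 0..lam-1
print(g, lam_coeff, e_coeff, d)
```

The coefficients it finds are the same 2 and -115 from the hand calculation, and reducing -115 modulo 288 gives D.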

We now have everything we need to sign and verify messages. Our "private key" is our values for D and N—messages signed with the private key can only be verified by someone with our "public key". Our "public key" is our values for e and N—only our public key can verify messages signed by our private key, assuring our recipient that the signed message indeed comes from us and can be trusted.

Let's say we want to send someone the answer to the ultimate question, but we want to sign it first in case our message gets intercepted by Vogons. 😛 Call the original message M with the value 42, and here we'll calculate the signature S. We do so using our values for D and N:

\begin{aligned}  S &= M^{D} \mod N \\  S &= 42^{173} \mod 323 \\  &= 111  \end{aligned}

When our recipient receives our message, they will need to verify our signature in order to be sure it can be trusted and that it has not been tampered with by any poetic aliens. 😉 They do so using our values for e and N:

\begin{aligned}  M &= S^{e} \mod N \\  M &= 111^{5} \mod 323 \\  &= 42  \end{aligned}

If the signature matches the original message (as it does here in our example), the message arrived safely intact.
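The same sign-and-verify round trip can be sketched in a few lines of Python with the built-in three-argument `pow`. The helper names `sign` and `verify` are illustrative only, and this is textbook RSA on a toy key with no padding, so it is nothing you'd want to use for real.

```python
N, E, D = 323, 5, 173   # toy key from the worked example above


def sign(message, d=D, n=N):
    """Textbook RSA signature: S = M^D mod N."""
    return pow(message, d, n)


def verify(message, signature, e=E, n=N):
    """Recover M = S^e mod N and compare it with the message we were given."""
    return pow(signature, e, n) == message


signature = sign(42)
print(signature)              # 111
print(verify(42, signature))  # True
```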

Notice that the values for P, Q, e, \lambda, N, and D are all relatively small; the largest of these, N, only needs nine bits for its binary representation. You could therefore say that we used a 9-bit key in our above example. Furthermore, notice that our message 42 and its signature 111 also fit inside of nine bits. This is a property of RSA: a key of bit length X can only operate on a message with a maximum bit length also of X (strictly speaking, the message must be numerically smaller than N).

The signature size as defined in the PippinAuthenticationFile is 45 bytes, suggesting that the Pippin's public RSA key is at least 360 bits long (45 * 8 bits per byte). Recall from earlier that although the RSA public key is still unknown, it must come from somewhere in the ROM, and it is still at this point unclear what purpose the blocks of data passed to 'rvpr' 0 serve.

I found part of one of the blocks in RAM near where the decrypted signature is: 45 bytes, same as the signature. I also found a nearby value of 0x10001, or 65537, which seems to be a popular choice for the value of e. Hmm. Interesting.

I found another block of memory also nearby containing the same data, but reversed by 16-bit words. Hmm. Interesting.

Wonder what the odds are... 😉

It was reasonable to hypothesize that one of these blocks contained the public key. There are plenty of web pages out there that explain RSA using examples, some with implementations in JavaScript allowing someone to plug in their own keys, messages, and signatures. I tried the nearest 45-byte "key" in one such website with the raw 45-byte signature from the PippinAuthenticationFile and...

It didn't work. The "decrypted" signature didn't match at all. Garbage in, garbage out. Cue sigh of disappointment.

I had one data block left, and with little hope remaining...


I'd found it.

Apple's public key for verifying the signature on a PippinAuthenticationFile is:
E0 E0 27 5C AB 60 C8 86 A3 FA C2 98 21 79 54 A8 9F D1 B9 DC 8A BA 84 EF B1 E7 C9 E2 1B F7 DD D7 DC F0 E4 4A BB 79 51 0E 7C EB 80 B1 1D

... and I didn't even have to look at very much code. 😀

Cracking RSA

I just have to crack RSA now, right?

Fortunately, available tools make that a much less daunting prospect than popular media contemporary with the Pippin suggested. There was even an ongoing RSA Factoring Challenge for a while until 2007. Back then though, it was a different story. The Open Source Initiative had yet to be founded. Prime factoring was done mostly in isolation by dedicated teams with access to massive amounts of computing power (for the time). A 364-bit number (110 decimal digits) took a two-person team and a supercomputer about a month to factor in 1992.

But this isn't 1992 anymore. The computers on our desks and in our pockets have more than enough number-crunching power to factor the Pippin's public key. Today, with some freely available open source software and a typical desktop PC, a 360-bit key can be factored in a matter of hours. And thanks to the efforts of several open source projects within recent years, we have a little tool to help us called msieve. 😀

msieve is very user-friendly. 😉 You pass the number you want to factor as its only command line argument and it just goes. It even saves its progress to disk, just in case it's a Really Big Number and something terrible happens like a power outage or something.
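The key above is a hex dump, while a factoring tool needs the number itself. A small Python sketch, assuming a plain decimal integer is the form you want to hand over, converts the 45 big-endian bytes; `hex_to_int` is a hypothetical helper name.

```python
PUBLIC_KEY_HEX = (
    "E0 E0 27 5C AB 60 C8 86 A3 FA C2 98 21 79 54 A8 9F D1 B9 DC 8A BA 84 EF "
    "B1 E7 C9 E2 1B F7 DD D7 DC F0 E4 4A BB 79 51 0E 7C EB 80 B1 1D"
)


def hex_to_int(hex_bytes):
    """Interpret space-separated hex bytes as one big-endian unsigned integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "big")


n = hex_to_int(PUBLIC_KEY_HEX)
print(n.bit_length())  # 360, matching the 45-byte signature size
print(n)               # the decimal form of N
```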

msieve took 18 hours, 34 minutes, and 4 seconds on my i7 Intel NUC to find the two prime factors P and Q of the Pippin's public key.

Hard part's over. 😀

Let's plug these into our RSA formulas from above and find Apple's private key, shall we?

P = 0F 2D 25 BF 3C 5B 70 28 72 6E 49 75 3F D5 62 67 11 37 38 94 51 EF D7
Q = 0E D1 47 5D E1 92 41 28 59 2C 4B 3E 47 4E 5F C1 23 1F 1B AF A0 D8 2B
e = 0x10001
N = P \cdot Q = E0 E0 27 5C AB 60 C8 86 A3 FA C2 98 21 79 54 A8 9F D1 B9 DC 8A BA 84 EF B1 E7 C9 E2 1B F7 DD D7 DC F0 E4 4A BB 79 51 0E 7C EB 80 B1 1D (this is the public key—we know this from stepping through 'rvpr' 0 and examining memory)
\lambda = (P - 1) \cdot (Q - 1) = E0 E0 27 5C AB 60 C8 86 A3 FA C2 98 21 79 54 A8 9F D1 B9 DC 8A BA 66 F1 44 CA AB F4 6A A7 12 3D 48 3D 5D 26 F9 51 1C B8 28 A7 8D E9 1C
D \equiv e^{-1} \pmod \lambda = 01 1C D3 AD E7 99 86 67 D6 E9 E2 17 11 DB EC 33 07 B6 0E 4D 6D 03 26 20 77 5D DB 9B 3B 64 CF 22 B2 0E 4A F3 2F 07 40 EE B0 6F 85 F2 A0 1D

That last value for D should be Apple's private signing key. Now let's verify, using the same JavaScript RSA calculator I found on the Web. As a test case, let's again use the PippinAuthenticationFile from the "Macintosh on Pippin" CD (a.k.a. "Tuscon").

Its first four bytes indicate a message size of $FD4F, or 64847 bytes. The MD5 digest of those first 64847 bytes is AE 1A EC AE A4 C5 11 68 2E 38 7D D1 48 F0 55 C2.
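A minimal sketch of that step in Python, assuming the size field is a big-endian unsigned longword (as you'd expect on a 68K/PowerPC platform) and using a made-up function name:

```python
import hashlib
import struct


def signed_message_digest(auth_file):
    """MD5 of the signed portion of an authentication file.

    Assumption: the first four bytes are a big-endian message size, and the
    digest covers exactly that many bytes from the start of the file.
    """
    (message_size,) = struct.unpack_from(">I", auth_file, 0)
    if message_size > len(auth_file):
        raise ValueError("message size field exceeds file length")
    return hashlib.md5(auth_file[:message_size]).digest()
```

Reading a real file into `auth_file` with `open(path, "rb").read()` and printing `signed_message_digest(auth_file).hex()` should give a value to compare against the digest quoted above.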

It has the following signature:
5A 90 36 69 DD 06 F5 15 EF 7A A2 04 5D 24 C2 CA 3C DD 2E C3 85 7D BB B8 9C 53 78 24 65 CC F0 0A 52 09 20 76 E1 9D F7 CC B3 C6 6D 7E AF

When decrypted, we get:
00 01 FF FF FF FF FF FF FF FF 00 30 20 30 0C 06 08 2A 86 48 86 F7 0D 02 05 05 00 04 10 AE 1A EC AE A4 C5 11 68 2E 38 7D D1 48 F0 55 C2

Notice that the last 16 bytes match our computed MD5 digest.
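The decrypted block has the shape of a PKCS #1 v1.5 "block type 1" signature: a 00 01 header, FF padding, a 00 separator, then an ASN.1 DigestInfo naming MD5, followed by the digest. That reading of the bytes is an interpretation, not something spelled out here, and `extract_digest` is a hypothetical helper. With that assumption, the check can be sketched as:

```python
# ASN.1 DigestInfo header for MD5 (OID 1.2.840.113549.2.5), 18 bytes
MD5_DIGESTINFO_PREFIX = bytes.fromhex(
    "30 20 30 0C 06 08 2A 86 48 86 F7 0D 02 05 05 00 04 10"
)


def extract_digest(block):
    """Return the 16-byte digest from a decrypted signature block."""
    if block[:2] != b"\x00\x01":
        raise ValueError("missing 00 01 header")
    separator = block.index(b"\x00", 2)          # first zero byte after the padding
    if any(byte != 0xFF for byte in block[2:separator]):
        raise ValueError("padding is not all FF")
    info = block[separator + 1:]
    if not info.startswith(MD5_DIGESTINFO_PREFIX):
        raise ValueError("unexpected DigestInfo header")
    digest = info[len(MD5_DIGESTINFO_PREFIX):]
    if len(digest) != 16:
        raise ValueError("digest is not 16 bytes")
    return digest


decrypted = bytes.fromhex(
    "00 01 FF FF FF FF FF FF FF FF 00 30 20 30 0C 06 08 2A 86 48 86 F7 0D 02 "
    "05 05 00 04 10 AE 1A EC AE A4 C5 11 68 2E 38 7D D1 48 F0 55 C2"
)
print(extract_digest(decrypted).hex(" ").upper())
```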

Finally, if we take the decrypted signature and re-sign it using what we think is Apple's private key, we get:
5A 90 36 69 DD 06 F5 15 EF 7A A2 04 5D 24 C2 CA 3C DD 2E C3 85 7D BB B8 9C 53 78 24 65 CC F0 0A 52 09 20 76 E1 9D F7 CC B3 C6 6D 7E AF

... which matches the original signature found in the PippinAuthenticationFile.

Mr. Hammond, I think we're in business. 😀

The RSA keys used in the signing and verification of a PippinAuthenticationFile are 360 bits long.

Apple’s public key for verifying the PippinAuthenticationFile is:
E0 E0 27 5C AB 60 C8 86 A3 FA C2 98 21 79 54 A8 9F D1 B9 DC 8A BA 84 EF B1 E7 C9 E2 1B F7 DD D7 DC F0 E4 4A BB 79 51 0E 7C EB 80 B1 1D

Apple’s private key for signing a PippinAuthenticationFile is:
01 1C D3 AD E7 99 86 67 D6 E9 E2 17 11 DB EC 33 07 B6 0E 4D 6D 03 26 20 77 5D DB 9B 3B 64 CF 22 B2 0E 4A F3 2F 07 40 EE B0 6F 85 F2 A0 1D

There you go, Internet. We now have all the information we need to sign and boot our own Pippin media.

Exploring the Pippin ROM(s), part 6: Back in the ‘rvpr’

(How’s that for a punny subtitle?)

It’s been a while. I haven’t lost interest, I’ve just been busy with work and other things. Life has a funny way of sneaking up on you. 😉

The wait is worth it, though. Buckle up; this one’s a doozy.

Apple hasn’t documented the Pippin’s authentication process beyond what developers needed to know. There exists a technote that was distributed via the SDK(s) that gives an overview of what developers were expected to do to get their discs signed before final mastering and duplication. The Pippin’s authenticated boot process hinges upon the presence of a specially-crafted, RSA-signed file unique to each disc called the “PippinAuthenticationFile.” Since the Pippin platform was abandoned and subsequently cancelled in 1998, Apple no longer signs Pippin discs nor have they made available the means for third parties to do so. To my knowledge, most of the specifics of how the PippinAuthenticationFile plays a role in the Pippin’s boot process have never been documented outside of Apple.

That changes today.

This post is pretty dense, so I highly recommend (re-)reading parts 1 through 4 for some background before getting too deep. Otherwise, here’s a quick recap: during every boot, a retail Pippin console locates a potential boot volume on CD, loads an ‘rvpr’ 0 resource from ROM, then calls the code therein in order to verify that the target volume passes an authentication check allowing it to boot the system. (An aside: Previously, I asserted that while I found identical copies of ‘rvpr’ 0 in the 1.2 and 1.3 ROMs, I couldn’t find an entry for it in the resource map, therefore it must either be dead code or called some other way. This conclusion turned out to be incorrect: the resource map is not contiguous in ROMs 1.2 and 1.3, which made manually searching it more difficult, but it does indeed contain an entry for ‘rvpr’ 0. The authentication process is therefore identical between ROM 1.0 and 1.2.) When I last looked at ‘rvpr’ 0, I was stymied by a routine called upon entry which, absent any symbols to help point me toward its purpose, I conjectured used a complex block of data at the end of the resource to “decrypt” the code therein. After taking a closer look a few days ago, I was delighted to find that its purpose is much simpler: it exists to patch the absolute memory locations in the code so they are relative to the buffer where ‘rvpr’ 0 is loaded. Without these patches, the code would crash the Pippin on boot practically every time!

The way this routine accomplishes this is kind of elegant. We initialize a cursor pointer to the beginning of our buffer where ‘rvpr’ 0 is loaded. The offset table starting at offset $8A47 from the start of ‘rvpr’ 0 begins with a 32-bit longword defining the size of the table. Then, the table itself is compressed: a command byte with bit 7 set holds a sign-extended 7-bit offset relative to our cursor position; a byte with bit 7 clear and bit 6 set combines with the next byte to form a sign-extended 14-bit offset relative to our cursor position; and if both bits 6 and 7 are clear, the byte combines with the next three bytes to form a 30-bit absolute cursor position. Multiply these offsets by two before applying them (because 68K opcodes are always at least two bytes), add the address of our ‘rvpr’ 0 buffer to the 32-bit longword pointed to by our cursor, then repeat the process until we’ve exhausted the offset table. Easy peasy.

5E   1218         Move.B   (A0)+, D1   ; grab the next byte into D1, we'll call it the command byte
60   1001         Move.B   D1, D0
62   0240 0080    AndI     #$80, D0    ; is bit 7 set?
66   670C         BEQ.B    @checkBit6  ; no? then handle the bit 6 case

; command byte bit 7 is set, so
; D2 += signExtend(D1 * 2) as a byte (* 2 because alignment)
68   D201         Add.B    D1, D1
6A   1001         Move.B   D1, D0
6C   4880         Ext      D0
6E   48C0         Ext.L    D0
70   D480         Add.L    D0, D2
72   6028         Bra.B    @gotOffset

; else, command byte bit 7 not set...
74   1E81         Move.B   D1, (A7)         ; put D1 into the highest byte of temp
76   1F58 0001    Move.B   (A0)+, $1(A7)    ; grab the next byte into the 2nd byte of temp
7A   1001         Move.B   D1, D0
7C   0240 0040    AndI     #$40, D0         ; is bit 6 of D1 set?
80   670C         BEQ.B    @get32BitOffset  ; no? then read a full 32-bit offset

; command byte bit 6 is set, so
; our address offset is only 14 bits
82   3017         Move     (A7), D0    ; grab the new temp into D0
84   E548         LsL      #2, D0      ; D0 <<= 2
86   E240         AsR      #1, D0      ; D0 /= 2
88   48C0         Ext.L    D0          ; sign extend it
8A   D480         Add.L    D0, D2      ; D0 is the found offset * 2 (because alignment), add to our current offset
8C   600E         Bra.B    @gotOffset  ; apply it

; bit 6 not set...
8E   1F58 0002    Move.B   (A0)+, $2(A7)  ; grab the next byte into the 3rd byte of temp
92   1F58 0003    Move.B   (A0)+, $3(A7)  ; grab the next byte into the 4th byte of temp
96   2417         Move.L   (A7), D2       ; D2 is a brand new offset!
98   E58A         LsL.L    #2, D2         ; D2 <<= 2
9A   E282         AsR.L    #1, D2         ; D2 /= 2

; D2 == the offset we want to apply to argument 2
; D6 == the offset we want to apply to the longword found there (typically @start)
9C   DDB1 2800    Add.L    D6, $0(A1,D2.L)  ; add D6 to the longword at (@start + D2)
A0   5385         SubQ.L   #1, D5
A2   4A85         Tst.L    D5               ; are we out of longs to patch?
A4   6EB8         BGT.B    @loopBody

Now that we've "unpacked" the code of 'rvpr' 0, let's dig into it. 🙂

main starts off initializing a number of globals, first by calling InitRSAAlgorithmChooser and then a handful of other subroutines. It then initializes some local variables on the stack: previous A4, values related to the PippinAuthenticationFile, and a ParamBlockRec for calls to _Read. In addition, among those locals is a 16-byte temporary buffer for digests created during the main loop.

Recall from part 2 that the Pippin ROM passes as input to 'rvpr' 0 the following arguments: two pointers to some as-of-yet-unknown data in ROM shortly preceding the callsite, the ID of the boot volume candidate, and the refNum of the candidate's disk driver. After we've initialized our variables, we hit the ground running by calling GetVolAuthFileInfo to fetch the offset to and size of the PippinAuthenticationFile. Note that if at any point during 'rvpr' 0 one of its internal subroutines fails, the entire process is reported as having failed the authentication check.

24C   41EE FFAA         Lea.L     -$56(A6), A0    ; A0 -> temp var for created digests
250   2E08              Move.L    A0, D7          ; D7 == ptr to temp digest
252   486E FFCA         Pea.L     -$36(A6)        ; pass size out address
256   486E FFBE         Pea.L     -$42(A6)        ; pass offset out address
25A   3F05              Move      D5, -(A7)       ; $10(A6) is dqDrive passed in from ROM
25C   3F2E 0012         Move      $12(A6), -(A7)  ; $12(A6) is dqRefNum passed in from ROM
260   4EB9 0000 03E6    Jsr       GetVolAuthFileInfo
266   3600              Move      D0, D3
268   4A43              Tst       D3
26A   4FEF 000C         Lea.L     $C(A7), A7
26E   6600 0142         BNE       @mainCleanup    ; if GetVolAuthFileInfo returns nonzero, fail
272   202E FFBE         Move.L    -$42(A6), D0    ; D0 = offset from start of HFS volume to PippinAuthenticationFile in 512-byte device blocks
276   7209              MoveQ.L   #9, D1
278   E3A8              LsL.L     D1, D0          ; D0 = offset from start of HFS volume to PippinAuthenticationFile in bytes
27A   2D40 FFC6         Move.L    D0, -$3A(A6)    ; save the offset into -$3A(A6)
27E   202E FFCA         Move.L    -$36(A6), D0    ; D0 = size of the PippinAuthenticationFile in bytes
282   A11E              _NewPtr
284   2648              MoveA.L   A0, A3
286   200B              Move.L    A3, D0
288   4A80              Tst.L     D0
28A   660E              BNE.B     @gotAuthBuffer  ; if _NewPtr returns null, clean up the stack, and fail
28C   554F              SubQ      #2, A7
28E   3EB8 0220         Move      (MemErr), (A7)
292   301F              Move      (A7)+, D0
294   3600              Move      D0, D3
296   6000 011A         Bra       @mainCleanup

29A   3D6E 0012 FFE6    Move      $12(A6), ioRefNum(A6)
2A0   3D45 FFE4         Move      D5, ioVRefNum(A6)         ; dqDrive
2A4   2D4B FFEE         Move.L    A3, ioBuffer(A6)
2A8   2D6E FFCA FFF2    Move.L    -$36(A6), ioReqCount(A6)  ; size of the PippinAuthenticationFile
2AE   3D7C 0001 FFFA    Move      #fsFromStart, ioPosMode(A6)
2B4   202E FFBE         Move.L    -$42(A6), D0              ; offset from start of HFS volume to PippinAuthenticationFile in device blocks
2B8   7209              MoveQ.L   #9, D1
2BA   E3A8              LsL.L     D1, D0                    ; get the offset in bytes by multiplying by device block size (512 bytes)
2BC   2D40 FFFC         Move.L    D0, ioPosOffset(A6)
2C0   41EE FFCE         Lea.L     -$32(A6), A0
2C4   A002              _Read
2C6   3600              Move      D0, D3
2C8   4A43              Tst       D3
2CA   6600 00E6         BNE       @mainCleanup              ; if _Read returns something other than noErr, fail

Following this call, we allocate enough space for the file by calling _NewPtr. We then call _Read with our local ParamBlockRec filled with the disk driver refNum, the volume refNum, the pointer to the buffer we just allocated, our buffer's size, and the byte offset to the authentication file on our candidate. Armed with the contents of the PippinAuthenticationFile, we then pass them to VerifyDigestInfo to verify the signature contained therein. If that succeeds, we're clear to start verifying the candidate's contents against this file, so we allocate temp space large enough to load a single "chunk" of data from the candidate to be hashed and verified.

Every Pippin except the KMP 2000 shipped with a built-in 4x speed CD-ROM drive. A 1x CD-ROM drive can read data at a rate of 150KB/sec, which is the speed necessary for smooth playback of audio CDs. A 2x drive doubles that rate to 300KB/sec, a 4x drive quadruples it to 600KB/sec, and so on. At 600KB/sec, it would take a Pippin almost a full minute to read just over 35MB, and nearly 20 minutes to read the entire contents of a 700MB CD-ROM. Even the KMP 2000 with its 8x drive would take almost 10 minutes to do the same. Hashing the entire contents of a CD during every boot would be unacceptable at this speed, and since the Pippin only takes a couple seconds to verify a disc at startup, it's clearly not verifying the whole thing. So what does the Pippin do?

320   4A86               Tst.L     D6
322   6604               BNE.B     @pickRandomChunk
324   7800               MoveQ.L   #0, D4
326   6016               Bra.B     @readChunk

328    202B 004C         Move.L    $4C(A3), D0  ; longword after the 128K size field at $48, appears to be number of entries in table
32C    5380              SubQ.L    #1, D0
32E    2F00              Move.L    D0, -(A7)    ; upper bound == total number of chunks - 1
330    4878 0001         Pea.L     ($1)         ; lower bound == 1
334    4EB9 0000 0814    Jsr       RangedRand   ; patched
33A    2800              Move.L    D0, D4       ; D4 == pseudorandom integer between [1, <total number of chunks> - 1]?
33C    504F              AddQ      #8, A7       ; clean up stack

33E    2004              Move.L    D4, D0               ; D0 == pseudorandom integer in the lowword, probably
340    2205              Move.L    D5, D1               ; D1 == 128K? $00020000
342    4EB9 0000 0116    Jsr       _D0timesD1           ; patched, does some weird multiplication, returns in D0
348    2D40 FFFC         Move.L    D0, ioPosOffset(A6)  ; D0 == the pseudorandom integer * 128K? D0 == offset to random 128K chunk in disc?
34C    41EE FFCE         Lea.L     -$32(A6), A0
350    A002              _Read
352    3600              Move      D0, D3
354    4A43              Tst       D3
356    665A              BNE.B     @mainCleanup

Put simply, the Pippin randomly spot-checks the candidate volume's contents every boot. The PippinAuthenticationFile isn't just a key, and it isn't just a single hash: it is in fact a collection of hashes, one for each of the 128K chunks of data that make up the boot volume. main enters a loop that iterates six times. On the first check, it loads the first 128K of the volume (containing important metadata about the HFS filesystem) into our temporary buffer, then verifies that data against its corresponding digest previously loaded from the PippinAuthenticationFile. On the remaining five checks, it does the same, but on randomly selected other 128K chunks of the volume. This way, the Pippin only has to load and verify 768K, a process that takes less than a couple of seconds on its 4x CD-ROM drive. But because this loop selects five of those six input chunks at random on each run-through, the PippinAuthenticationFile still needs digests of the entire volume: it's not known ahead of time which five chunks will be verified, and they will rarely be the same five twice.
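The chunk selection can be sketched in Python as follows. This assumes RangedRand returns a value between its two bounds inclusive, which the listing doesn't confirm, and the function name is invented for illustration.

```python
import random

CHUNK_SIZE = 0x20000      # 128K, the chunk size stored in the file header
CHECKS_PER_BOOT = 6


def pick_chunk_offsets(chunk_count, rng=random):
    """Return byte offsets of the six chunks to verify on this boot.

    The first check is always chunk 0; the other five are drawn from
    [1, chunk_count - 1], mirroring the bounds pushed before RangedRand.
    """
    indices = [0]
    for _ in range(CHECKS_PER_BOOT - 1):
        indices.append(rng.randint(1, chunk_count - 1))
    return [index * CHUNK_SIZE for index in indices]
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is handy when comparing against a debugger trace.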

Examining several PippinAuthenticationFile examples with this code in mind quickly reveals how this file is structured. Both the chunk size and the total number of chunks in the volume are stored in a common header. This loop uses that information to determine the upper bound of which chunks to select at random and how large. Following these two fields is a table of digested 128-bit hashes corresponding, in sequential order, to the chunks in the volume. Finally, there is a signature near the end of the file, which gets verified in the call to VerifyDigestInfo before entering the loop. The process by which a PippinAuthenticationFile is created, therefore, is essentially as follows:

  1. Get the size of the target volume.
  2. Integer divide this size into 128K chunks. Call the total number of chunks N.
  3. Allocate 80 bytes for a file header.
  4. Multiply N chunks by the 16-byte size of each digest (N * 16). Call this table size T. Allocate T bytes for digests of each 128K chunk.
  5. Allocate 16 bytes for the signature size S.
  6. Pad the signature size until it is a multiple of 16 bytes (((S + 15) / 16) * 16, using integer division). Call this padded size P. Allocate P bytes for the signature itself.
  7. Pad additional bytes until the total file size is the next multiple of the device block size (512 bytes). The total file size therefore should be ((80 + T + 16 + P + 511) / 512) * 512, again using integer division.
  8. Preallocate a blank version of this file on the target volume.
  9. Starting at offset 80, compute and store a 16-byte (128-bit) digest for each of the N sequential 128K chunks. Note that to compute the digests for the entire finalized volume correctly, this file must already exist in the filesystem. It is therefore necessary to compute the size of this file in advance, preallocate a "dummy" version of it on the target volume, then compute the digests and overwrite the file in-place.
  10. At offset 80 + T + 15, store the signature size S as a byte (only one byte at the end of this space is actually used, the rest are zeroes).
  11. At offset 80 + T + 16 + 3, store the signature itself. The signature always seems to be 45 bytes long, placed such that it ends on a 16-byte boundary, explaining the extra 3-byte offset.
  12. Fill in the file header at the beginning of the file:
    • offset 0 (4 bytes): offset to signature size byte (80 + T + 15)
    • offset 4 (4 bytes): longword equal to zero (version?)
    • offset 8 (64 bytes): copyright notice (60 bytes, zero-padded right)
    • offset 72 (4 bytes): chunk size longword equal to 128K, or $20000
    • offset 76 (4 bytes): chunk count longword equal to N
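The size and offset arithmetic in the steps above can be sketched in Python. This only computes the layout and packs the 80-byte header; it doesn't hash or sign anything. The function names and dictionary keys are invented for this sketch, and the big-endian field order follows the header description above.

```python
import struct

CHUNK_SIZE = 0x20000   # 128K
HEADER_SIZE = 80
DIGEST_SIZE = 16
BLOCK_SIZE = 512


def round_up(value, multiple):
    """Round value up to the next multiple (value itself if already aligned)."""
    return (value + multiple - 1) // multiple * multiple


def auth_file_layout(volume_size, signature_size=45):
    """Compute the sizes and offsets described in steps 1 through 11."""
    chunk_count = volume_size // CHUNK_SIZE               # N
    table_size = chunk_count * DIGEST_SIZE                # T
    padded_signature = round_up(signature_size, 16)       # P
    body = HEADER_SIZE + table_size + 16 + padded_signature
    return {
        "chunk_count": chunk_count,
        "table_size": table_size,
        "sig_size_offset": HEADER_SIZE + table_size + 15,
        # the signature is placed so that it ends on a 16-byte boundary
        "sig_offset": HEADER_SIZE + table_size + 16
                      + (padded_signature - signature_size),
        "file_size": round_up(body, BLOCK_SIZE),
    }


def build_header(layout, copyright_notice=b""):
    """Pack the 80-byte header; '64s' zero-pads the notice on the right."""
    return struct.pack(">II64sII", layout["sig_size_offset"], 0,
                       copyright_notice, CHUNK_SIZE, layout["chunk_count"])
```

As a sanity check on the arithmetic, a volume of 4047 chunks gives a `sig_size_offset` of 64847, the same $FD4F value found in the first four bytes of the "Tuscon" disc's file.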

Apple probably provided a tool that automated this process for stamping houses. Said tool presumably would have named the aforementioned file "PippinAuthenticationFile" with type/creator 'PpnV'/'PpnA' and saved it to the filesystem root. I imagine that this same tool likely would have filled the file's contents in-place with the signed version received from Apple. However, I have never seen such a tool in the wild so this is pure speculation on my part.

Incidentally, the name, placement within the folder hierarchy, and type/creator codes of the authentication file itself are inconsequential. The Pippin makes no HFS calls to locate the PippinAuthenticationFile; it could technically be buried within a nest of folders or named "FoobarAuthenticationFile." The verification code does not care. Instead, it fetches the Master Directory Block: 512 bytes located at byte offset 1024 from the start of the boot volume. The "logical" MDB is a data structure 161 bytes in size and found immediately at the start of this "physical" MDB. However, that leaves 351 bytes unaccounted for. For Pippin CD-ROMs, Apple chose to set aside two 32-bit longwords at the end of the physical MDB for the purpose of locating the PippinAuthenticationFile at the block level. The first of these longwords defines the offset, in 512-byte blocks from the start of the volume, to the contents of the authentication file. The second of these longwords defines the authentication file's size in bytes.

(As an aside, this mechanism is one reason why deleting the PippinAuthenticationFile and naively replacing it with a new version at the filesystem level is not likely to work. The new file would likely reside starting at a different allocation block in the volume; the offset in the MDB would still point to where the deleted file was/is, and HFS wouldn't know to patch it up—why should it?)
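Locating the file from a raw volume image might look like this in Python. The sketch assumes "at the end of the physical MDB" means the final eight bytes of the 512-byte block, stored big-endian; that placement is a guess from the description, and `locate_auth_file` is a made-up name.

```python
import struct

MDB_OFFSET = 1024   # byte offset of the MDB from the start of the volume
MDB_SIZE = 512
BLOCK_SIZE = 512


def locate_auth_file(volume):
    """Return (byte_offset, size_in_bytes) of the authentication file.

    Assumption: the two longwords occupy the last 8 bytes of the physical MDB.
    """
    mdb = volume[MDB_OFFSET:MDB_OFFSET + MDB_SIZE]
    if len(mdb) < MDB_SIZE:
        raise ValueError("volume image too short to contain an MDB")
    block_offset, size = struct.unpack_from(">II", mdb, MDB_SIZE - 8)
    return block_offset * BLOCK_SIZE, size   # same as shifting left by 9
```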

One important component of creating the authentication file, and verifying against it, is the concept of chunk "cleansing." Once the loop selects and loads a chunk, it passes it to CleanseInputChunk to optionally "cleanse" it. What does that mean in this context?

358   2F2E FFCA         Move.L    -$36(A6), -(A7)         ; size of the PippinAuthenticationFile in bytes
35C   2F2E FFC6         Move.L    -$3A(A6), -(A7)         ; offset from start of HFS volume to PippinAuthenticationFile in bytes
360   2F05              Move.L    D5, -(A7)               ; 128K
362   2F2E FFFC         Move.L    ioPosOffset(A6), -(A7)  ; the computed offset
366   2F0A              Move.L    A2, -(A7)               ; working chunk buffer
368   4EB9 0000 049A    Jsr       CleanseInputChunk       ; patched, remove the auth file from this chunk if we happened to land on it

The digested hashes contained within the authentication file do not include hashing the authentication file itself, for obvious reasons. Similarly, certain fields in the MDB should not be hashed because they change upon writing the final signed authentication file to the volume. "Cleansing" solves this problem by zeroing out these areas before hashing and digesting them. While creating an authentication file, the blocks of the volume containing the file itself should be zeroed out when hashing, and likewise most of the MDB. Upon loading a chunk, the verification loop checks its offset to see whether the chunk overlaps the MDB or the authentication file. If so, it zeroes out that data so that the digested hash matches the corresponding one stored in the authentication file.

I was going to recreate this diagram in Illustrator, but then I remembered I suck at Illustrator.
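The authentication-file half of cleansing boils down to zeroing whatever part of the chunk overlaps the file's byte range. Here is a minimal Python sketch under that reading; the MDB half is left out because the exact fields that get zeroed aren't listed here, and the function name is hypothetical.

```python
def cleanse_auth_file(chunk, chunk_offset, auth_offset, auth_size):
    """Zero the bytes of `chunk` that overlap the authentication file.

    `chunk` is a bytearray holding volume bytes starting at `chunk_offset`;
    `auth_offset` and `auth_size` give the file's byte range on the volume.
    """
    start = max(chunk_offset, auth_offset)
    end = min(chunk_offset + len(chunk), auth_offset + auth_size)
    if start < end:                                   # the ranges overlap
        chunk[start - chunk_offset:end - chunk_offset] = bytes(end - start)
    return chunk
```

The same function works for both creating and verifying, since each side needs to hash identical zeroed data.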

Finally, the loop passes the chunk to CreateDigest, and compares its result byte-for-byte with the digest in the authentication file by calling CompareDigests. If all six digested chunks match what's found in the authentication file, we pass the check (and can boot from this volume! Yay!). Otherwise, we return -1 to indicate failure.

36E   2F07              Move.L    D7, -(A7)       ; D7 -> out buffer for temp digest
370   2F05              Move.L    D5, -(A7)       ; chunk size (128K)
372   2F0A              Move.L    A2, -(A7)       ; cleansed working chunk buffer
374   4EB9 0000 06EE    Jsr       CreateDigest    ; patched
37A   3600              Move      D0, D3
37C   4A43              Tst       D3
37E   4FEF 0020         Lea.L     $20(A7), A7     ; clean up stack
382   662E              BNE.B     @mainCleanup
384   2F07              Move.L    D7, -(A7)       ; D7 -> the digest we just created
386   2004              Move.L    D4, D0          ; D4 == which chunk this is
388   E988              LsL.L     #4, D0          ; D0 == chunk * 16 bytes (128 bit hash per chunk)
38A   206E FFC2         MoveA.L   -$3E(A6), A0    ; A0 -> chunk hashes
38E   D1C0              AddA.L    D0, A0          ; A0 -> hash of this chunk
390   4850              Pea.L     (A0)
392   4EB9 0000 07D4    Jsr       CompareDigests  ; patched, returns zero in D0 if digests match
398   7200              MoveQ.L   #0, D1
39A   1200              Move.B    D0, D1
39C   3601              Move      D1, D3
39E   4A43              Tst       D3
3A0   504F              AddQ      #8, A7          ; clean up stack
3A2   6704              BEQ.B     @nextMainForLoopIteration
3A4   76FF              MoveQ.L   #-1, D3         ; D3 == -1, fail
3A6   600A              Bra.B     @mainCleanup

3A8    5286              AddQ.L    #1, D6

3AA    7006              MoveQ.L   #6, D0
3AC    BC80              Cmp.L     D0, D6
3AE    6D00 FF70         BLT       @topOfMainForLoop
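Stripped of register shuffling, the compare step of the loop reduces to something like the Python sketch below. It assumes the per-chunk digests are MD5, which the 128-bit size suggests but this listing doesn't prove, so the hash is passed in as a parameter. It also skips cleansing, and `verify_chunks` is an invented name.

```python
import hashlib

DIGEST_SIZE = 16   # 128-bit digest per chunk


def md5_digest(data):
    """Assumed digest function; swap in another if CreateDigest differs."""
    return hashlib.md5(data).digest()


def verify_chunks(read_chunk, digest_table, chunk_indices, digest=md5_digest):
    """Return 0 if every selected chunk matches its stored digest, else -1.

    `read_chunk(index)` returns the (already cleansed) bytes of one chunk;
    `digest_table` is the run of 16-byte digests from the authentication file.
    """
    for index in chunk_indices:
        expected = digest_table[index * DIGEST_SIZE:(index + 1) * DIGEST_SIZE]
        if digest(read_chunk(index)) != expected:
            return -1            # same failure code the listing loads into D3
    return 0
```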

So what's left?

Of the named functions I found in part 2, only four of them I have yet to step through and understand:

Function                   Status
main                       done
GetVolAuthFileInfo         done
CleanseInputChunk          done
VerifyDigestInfo           in progress
VerifySignature            to do
CreateDigest               in progress
CompareDigests             done
RangedRand                 done
CleanseVCB                 done
GetVAFileInfoGivenMDB      done
InitRSAAlgorithmChooser    in progress

Astute readers may notice that I have provided sparse details about the functions related to dealing with digests and signatures so far. Those are next on my list but will also be the hardest to grok because I'll be more or less "flying solo" without any symbols whatsoever to guide me. Fortunately I have a passing familiarity with the RSA algorithm so I have a vague idea of what logic to look for. I have already found functions for 32-bit multiply and 32-bit modulo, both of which are essential for RSA.

In part 4 I explored hacking the Apple Partition Map to load custom disk drivers before the authentication check takes effect. I was not successful with that experiment and decided not to pursue it further, but my discoveries here reveal a possible alternate avenue to explore: additional partitions. The authentication check is performed on the boot volume, and only the boot volume (emphasis on each word). The partition map is not included in the check, nor are any other partitions. It should be totally possible to take an existing signed Pippin disc with unpartitioned free space available (for example, the "Tuscon" disc) and graft an additional HFS partition containing whatever apps or documents one might want. The OS should mount the other partition as it would normally, without performing any checks beyond what would be done on a real Mac.

I leave that as an exercise for the reader. I'm going after the big fish: authoring an entire homebrew disc from scratch.

Exploring the Pippin ROM(s), part 5: Open Firmware

According to the NetBSD/macppc FAQ, Open Firmware “is part of the boot ROMs in most PowerPC-based Macintosh systems, and we use it to load the kernel from disk or network.”

Turns out, “most PowerPC-based Macintosh systems” happens to include the Pippin. If you have the rare keyboard/tablet (or an ADB keyboard via the AppleJack dongle) attached and hold down Command-Option-O-F at startup, the Pippin boots to an Open Firmware prompt. However, you won’t see anything on screen because it outputs to a serial console by default; specifically, all console I/O is handled through the GeoPort. My Mac Plus happens to sit next to my Pippin, so tonight I temporarily switched my ImageWriter II’s cable over, booted both machines, and fired up ZTerm.

The following is what I discovered.

Open Firmware, PipPCI.
To continue booting the MacOS type:
To continue booting from the default boot device type:
0 > dev / ls
FF829230: /PowerPC,603@0
FF829B28: /chosen@0
FF829C58: /memory@0
FF829DA0: /openprom@0
FF829E60: /AAPL,ROM@FFC00000
FF82A088: /options@0
FF82A528: /aliases@0
FF82A6F0: /packages@0
FF82A778:   /deblocker@0,0
FF82AF78:   /disk-label@0,0
FF82B4B8:   /obp-tftp@0,0
FF82D8F8:   /mac-files@0,0
FF82E0F0:   /mac-parts@0,0
FF82E850:   /aix-boot@0,0
FF82ECC8:   /fat-files@0,0
FF830298:   /iso-9660-files@0,0
FF830BE0:   /xcoff-loader@0,0
FF8315A0:   /terminal-emulator@0,0
FF831638: /aspen@F2000000
FF832900:   /gc@10
FF832D38:     /scc@13000
FF832E90:       /ch-a@13020
FF833540:       /ch-b@13000
FF833BF0:     /awacs@14000
FF833CD8:     /swim3@15000
FF834DE0:     /via-cuda@16000
FF835970:       /adb@0,0
FF835A60:         /keyboard@0,0
FF8361B0:         /mouse@1,0
FF836260:       /pram@0,0
FF836310:       /rtc@0,0
FF8367D8:       /power-mgt@0,0
FF836898:     /mesh@18000
FF838418:       /sd@0,0
FF839048:       /st@0,0
FF839CC8:     /nvram@1D000
FF839DA0: /taos@F0800000
FF839EC8: /aspenmemory@F8000000
0 > dev /openprom  ok
0 > .properties
name                    openprom
model                   Open Firmware, PipPCI.

0 > printenv auto-boot?

auto-boot?          true                true
0 > printenv use-nvramrc?

use-nvramrc?        false               false
0 > printenv real-base

real-base           -1                  -1
0 > printenv load-base

load-base           4000                4000
0 > printenv boot-device

boot-device         /AAPL,ROM           /AAPL,ROM
0 > printenv boot-file

0 > printenv input-device

input-device        ttya                ttya
0 > printenv output-device

output-device       ttya                ttya
0 > printenv nvramrc

0 > printenv boot-command

boot-command        boot                boot
0 > bye

This dump was generated on my @WORLD Pippin with ROM 1.2. Some observations:

  • OF doesn't report a version number, instead reporting "PipPCI" in its place. Searching the ROM for strings reveals "June 28, 1996" as the latest date I could find, so whatever Apple was using in its Power Macs at that time I imagine is what is running here.
  • The ROM is located at 0xFFC00000, which follows what I've seen from hardcoded addresses I've found.
  • "taos" is the video hardware starting at 0xF0800000. I'm not sure offhand if that address is the base of video memory, but I do know from the Pegasus Prime code that taos does allow for writing directly to VRAM.
  • There is a TFTP package(!)—wonder how it works?
  • The Pippin has a SWIM III chip onboard. There is an official floppy drive expansion dock and an unofficial floppy drive expansion board, both of which appear to be "dumb" hardware that merely connect a drive directly to pins of the Pippin's X-PCI connector on the underside of the system. The drive itself is powered and controlled entirely by hardware already built in to the Pippin. However, as far as I know, the SWIM II and later floppy controllers (including the SWIM III) lack the low-level access necessary for HD20 support, so large drives emulated by hardware such as the Floppy Emu will not work.