Friday, October 30, 2020

Sentimental Shooting Graphics Files: Code Update

So, the disassembly thing is basically not going to happen.  Or at least, not by me.  As it turns out, people want tons of money for this kind of software and the free alternatives that I'm willing to run on my computer are lacking in documentation and the features I need to get my foot in the door.  Some are written in Python, which is annoying to deal with because everyone has their own pet version of Python that they never upgrade from and require you to install for all their stuff to work.

However, I can make my existing code better, and figure out more of the files.

I've already made the code better, by translating it to C#.  It runs faster and is far simpler now.  It's simpler because my original PowerShell scripts were designed to work with paths containing wildcards and the PowerShell pipeline, so there had to be a bit of overhead in there to transition from one file to the next cleanly.  Stuff that an end-user probably wouldn't think of, but that would absolutely annoy them if I didn't do it.  Namely, closing file handles properly so files don't stay marked as "in use" and become impossible to move, rename, or delete.

But also in my typical fashion there was code in the PowerShell script for the UME files, for a use case I didn't need: some of the UME files are sort of grafted at the hip to System.ume, and being that the script was designed to be able to process an arbitrary number of input files, I put in code that would try to use System.ume from different directories, but if it's still in the same directory it'll keep the current System.ume open, and just seek back to the beginning of it.  It's kinda dumb, but I wrote it off as being forward-thinking.  After all, this code I'm working on could very well be the basis for modding the game, so maybe you have files for a few different mods you want to convert and they all have their own custom System.ume...  While ostensibly there to avoid closing and reopening the same file repeatedly, it does have this other use case.

Yeah, like I said, it's a use case that I didn't actually need.  It's technically preserved in the C# code, but the C# code also only handles one file at a time.  If you want to process a whole directory, you get to call the method once per file, specifying the path to your desired System.ume with each call.  Yeah, there's some overhead to that, constantly closing and re-opening the same System.ume, but the code's cleaner and easier to follow.  I also got to re-examine what I was doing and remove some of the stupider bits, so the new code is just... more better.

Both the PCG code and the UME code now exist as C# static methods, along with a couple static methods I created to see what was going on with my transformation of the input path to the output path, that I decided to leave in there because why not.  They could be useful.  Yeah, the methods don't take an output path as a parameter, they just transform the input path and then bitch if that file already exists.  That's kinda how I roll I guess.  I did put in a lot of work transforming the file names to be useful, it's not just extension swapping.  For example, Akr1b.ume will get transformed to Akira Stage 1 Background.bmp.
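To make the path transformation concrete, here is a minimal Python sketch of the idea (not the C# code described above).  Only the Akr1b.ume example comes from the post; the lookup tables, the regular expression, the fallback to a plain extension swap, and the function names are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Only these two pairs are confirmed by the Akr1b.ume example in the post.
# Any further entries would have to be filled in from the real file list.
CHARACTERS = {"Akr": "Akira"}
KINDS = {"b": "Background"}

# Assumed naming pattern: character code, stage number, one-letter kind.
_PATTERN = re.compile(r"^([A-Za-z]+?)(\d+)([A-Za-z])$")


def output_path(input_path: str) -> Path:
    """Derive a descriptive .bmp path from an input path."""
    src = Path(input_path)
    match = _PATTERN.match(src.stem)
    if match and match.group(1) in CHARACTERS and match.group(3) in KINDS:
        name = "{} Stage {} {}".format(
            CHARACTERS[match.group(1)], match.group(2), KINDS[match.group(3)]
        )
    else:
        # Assumed fallback: keep the stem and only swap the extension.
        name = src.stem
    return src.with_name(name + ".bmp")


def checked_output_path(input_path: str) -> Path:
    """Same as output_path, but refuse to clobber an existing file."""
    dst = output_path(input_path)
    if dst.exists():
        raise FileExistsError(dst)
    return dst
```

Keeping the transformation in its own function, separate from the conversion itself, is what makes it easy to poke at on its own.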

As for figuring out more of the files, honestly, without the ability to understand what's going on in the game's code, I can't do very much more.  The interesting stuff, the PRT, ROT, SPR, and DMO files, will all take more work than I can realistically do right now.  Running some quick commands to get file extension stats shows there's a couple other oddballs in there as well, Enemy.inf and Enemy.pnt.  I've looked at those and have no clue where to start.  This leaves two pieces of extremely low-hanging fruit: Option.dat and Destroy.sco.  Option.dat holds the options, of which the game only has a handful, and aside from the difficulty setting they're all toggles.  Destroy.sco holds your high scores.  They're such low-hanging fruit that I already have classes for both to allow creating, loading, and saving them.

Destroy.sco was interesting because it contains the same structure repeated 12 times.  I chose to represent this in two ways: named properties on my class to allow accessing a specific stage's data by character name, and a custom indexer that puts the stages in the order they appear on the stage select screen from right to left.  That's actually significant because the score data in Destroy.sco is not in the same order as the stage select screen.  Figuring out the order was easy enough since it was simple to identify which values in the file were the scores.  I just set each stage's score to a number from 1 to 12, so when I looked at it in-game I could see which score showed up where.  Because I implemented an indexer, it didn't feel right without a Count property and an IEnumerable<T> implementation, so those are present as well.  I didn't implement ICollection<T> because it's not actually a collection and doesn't need management methods.
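The indexer-over-file-order idea can be sketched in Python roughly like this (it is an illustration, not the C# class described above).  The record fields are reduced to the score and percentage the post mentions, and the display order is passed in as a parameter because the real permutation isn't given here; the reversed order used in the usage note is purely a placeholder.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence

STAGE_COUNT = 12  # the same structure is repeated 12 times in the file


@dataclass
class StageRecord:
    score: int = 0
    percent: int = 0


class ScoreTable:
    """Holds records in file order, but indexes them in display order."""

    def __init__(self, display_to_file: Sequence[int]) -> None:
        # display_to_file[i] is the file position of the i-th stage on
        # the stage select screen; it must be a permutation of 0..11.
        if sorted(display_to_file) != list(range(STAGE_COUNT)):
            raise ValueError("display_to_file must be a permutation of 0..11")
        self._order: List[int] = list(display_to_file)
        self.records: List[StageRecord] = [StageRecord() for _ in range(STAGE_COUNT)]

    def __len__(self) -> int:  # plays the role of the Count property
        return STAGE_COUNT

    def __getitem__(self, display_index: int) -> StageRecord:  # the indexer
        return self.records[self._order[display_index]]

    def __iter__(self) -> Iterator[StageRecord]:  # the enumerator
        return (self.records[i] for i in self._order)


def number_scores_for_probing(table: ScoreTable) -> None:
    """Write 1..12 into the scores in file order, so that the values seen
    in-game reveal which file slot belongs to which stage."""
    for n, record in enumerate(table.records, start=1):
        record.score = n
```

For example, with the placeholder order `list(reversed(range(12)))`, `table[0]` refers to the last record in the file, and iterating the table walks the records from the last file slot back to the first.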

Also, the scores are signed values, which is interesting because if you give it a negative score, it reads an incorrect graphics tile from System.ume when it tries to display the minus sign, despite the fact that there is a minus sign available.  Also, if it has to display more than 7 characters for the score (7 digits, or 6 and a minus sign), it won't fit properly in the box on the stage select screen.  My code range checks and sanitizes both the scores and the percentages to avoid causing display bugs.
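A minimal Python sketch of that kind of sanitizing, assuming out-of-range values are clamped rather than rejected (the post doesn't say which).  The score bounds follow from the two display problems described above; the 0 to 100 range for percentages is an assumption.

```python
MAX_SCORE = 9_999_999  # largest value that fits in 7 digits
MAX_PERCENT = 100      # assumed upper bound for a percentage


def sanitize_score(value: int) -> int:
    """Clamp a score so it is non-negative (no minus sign to draw)
    and at most 7 digits long (fits the stage select box)."""
    return max(0, min(MAX_SCORE, value))


def sanitize_percent(value: int) -> int:
    """Clamp a percentage into the assumed 0..100 range."""
    return max(0, min(MAX_PERCENT, value))
```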

The structure of Option.dat suggests to me that the difficulty setting was probably added later on in development.  It appears at the very end of the file even though it shows up at the top of the options screen (everything else is in the order in which it appears on the options screen).  The values aren't in difficulty order, either: Normal has a value of 0, Easy has a value of 1, and Hard has a value of 2.
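The difficulty values map naturally onto an enumeration.  Here is a small Python sketch; the three values are the ones listed above, while the helper name and the choice to fall back to Normal for unknown values are assumptions.

```python
from enum import IntEnum


class Difficulty(IntEnum):
    NORMAL = 0
    EASY = 1
    HARD = 2


def parse_difficulty(raw: int) -> Difficulty:
    """Convert a raw value to a Difficulty, treating anything
    unrecognized as Normal (an assumed policy)."""
    try:
        return Difficulty(raw)
    except ValueError:
        return Difficulty.NORMAL
```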

Anyway, I've basically done all I can do in terms of figuring out file formats.  I suppose that my next endeavor will be trying to tweak the game's graphics files to avoid the palette corruption.  This will be annoying given that I can't get GIMP to load the palette from the files in its palette editor.  Worst case scenario, I have to roll my own code to change a palette entry and update palette indices to point to the new color, because I'm not writing an image editor.

Friday, October 2, 2020

Sentimental Shooting Graphics Files

So at the end of my previous post, I left a juicy little nugget of information about me poking around the game's graphics files.  I've "completed" a major milestone in doing so, and while there's still stuff left to do, I have enough to actually post about.

The game's graphics are in the PCG and UME files.  Judging by file extension alone, you wouldn't expect them to be any kind of standard file format that's documented outside of a dark corner of the internet, but... you'd be wrong.  Both are standard Windows BMP files, changed ever so slightly so that they don't appear to be proper files if you change their extensions and open them.

The PCG files are the omake pictures, and simply have 0x70 added to the first 0x03E8 bytes of the file.  That's it.
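In Python, undoing that looks roughly like the sketch below (an illustration, not the PowerShell script itself).  The offset and length are the ones given above; the function names are made up, and the wrap-around modulo 256 is an assumption about what happens when a byte overflows.

```python
PCG_OFFSET = 0x70    # value added to each affected byte
PCG_LENGTH = 0x03E8  # number of leading bytes affected


def pcg_to_bmp(data: bytes) -> bytes:
    """Subtract the offset from the leading bytes to recover the BMP."""
    head = bytes((b - PCG_OFFSET) & 0xFF for b in data[:PCG_LENGTH])
    return head + data[PCG_LENGTH:]


def bmp_to_pcg(data: bytes) -> bytes:
    """Add the offset to the leading bytes, the inverse of pcg_to_bmp."""
    head = bytes((b + PCG_OFFSET) & 0xFF for b in data[:PCG_LENGTH])
    return head + data[PCG_LENGTH:]
```

A quick sanity check is that the first two bytes of a decoded file should be the "BM" signature.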

The UME files are the stage backgrounds, clothing fragments, enemy and boss sprites, menus, and everything else.  System.ume is actually a BMP that's just renamed, but the rest are very close to being proper BMP files.  Some have extra bytes at the end that any viewer or editor can easily ignore, but the real kicker is that they have improper data for some of the header fields.  You can make extremely minimal changes to allow an image editor to open them, but... for the stage backgrounds at least, we have the ability to make the game output the file directly.  Making these minimal changes does not result in a file that matches the one generated by the game.
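To see which header fields are off in a given file, it helps to dump them.  The Python sketch below reads the standard 14-byte BMP file header and the 40-byte BITMAPINFOHEADER; the field names are just descriptive labels, and it assumes the UME files use that 40-byte info header, which the post doesn't state.

```python
import struct
from typing import NamedTuple


class BmpHeader(NamedTuple):
    signature: bytes        # b"BM" in a proper file
    file_size: int
    reserved1: int
    reserved2: int
    pixel_offset: int       # where the pixel data starts
    info_size: int          # 40 for BITMAPINFOHEADER
    width: int
    height: int             # negative means top-down rows
    planes: int
    bits_per_pixel: int
    compression: int
    image_size: int
    x_pixels_per_meter: int
    y_pixels_per_meter: int
    colors_used: int
    colors_important: int


def read_bmp_header(data: bytes) -> BmpHeader:
    """Parse the first 54 bytes of a BMP-like file (little-endian)."""
    if len(data) < 54:
        raise ValueError("need at least 54 bytes of header")
    file_part = struct.unpack_from("<2sIHHI", data, 0)
    info_part = struct.unpack_from("<IiiHHIIiiII", data, 14)
    return BmpHeader(*file_part, *info_part)
```

Comparing the parsed fields of a UME file against the matching snapshot written by the game is one way to spot which values differ.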

Thus began my journey for matching output.  I discovered that the stage backgrounds saved by the game in the SNAPSHOT directory have various values pulled in from System.ume, including the color palette.  Most of the UME files for stage backgrounds actually have a zeroed out section of their palette where the palette from System.ume should go, which made it easy to spot.  Something about how the game handles the data zeroes out the padding in each row by the time it ends up in the SNAPSHOT directory, and any extra bytes at the end of the file get removed.  Towards the end I identified two groups of files that needed different behavior, but couldn't pinpoint how to detect this from the file contents alone, so I gave in and implemented dirty hacks that check hardcoded filename lists.  It's finally over: I have matching output for all 24 stage backgrounds, and no visible corruption in any of the other files, but... it doesn't feel right with the dirty hacks still in there.  This is why I put the word "completed" in quotes in the first paragraph: It works, but I don't like it.
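Two of those steps, zeroing the row padding and dropping the trailing bytes, are simple enough to sketch in Python.  This is only an illustration of the standard BMP row layout (rows padded to a multiple of 4 bytes), not the actual script; the function names are made up, and it is meant for 8-bit images where every pixel is a whole byte.

```python
def row_stride(width: int, bits_per_pixel: int) -> int:
    """Bytes per stored row: BMP rows are padded to 4-byte multiples."""
    return ((width * bits_per_pixel + 31) // 32) * 4


def normalize_pixels(pixels: bytes, width: int, height: int,
                     bits_per_pixel: int = 8) -> bytes:
    """Return the pixel data with every row's padding zeroed and
    anything past the last row cut off."""
    stride = row_stride(width, bits_per_pixel)
    used = (width * bits_per_pixel + 7) // 8  # bytes holding real pixels
    rows = abs(height)                        # height may be negative
    if len(pixels) < stride * rows:
        raise ValueError("pixel data is shorter than width/height imply")
    out = bytearray()
    for row in range(rows):
        start = row * stride
        out += pixels[start:start + used]  # keep the real pixel bytes
        out += bytes(stride - used)        # replace padding with zeros
    return bytes(out)
```

For a 5-pixel-wide 8-bit image, each row is stored as 8 bytes: 5 pixels followed by 3 bytes of padding.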

One issue is that some files exhibit... interesting changes if the System.ume palette is imported, even though they have the space for it.  Some of these changes are easily identified to be incorrect, but others I'm not so sure.  I have an idea how to proceed here, but it's low priority in the long run.

A major sticking point is that I have no idea if my code that generates matching output for the stage backgrounds also does so for any of the other files.  I have no way to make the game output these files, and thus no way to see what needs to be done.  Or, do I?

I have reached a point where all of my remaining logical steps involve disassembling the game to see how it loads the UME files, handles the color palettes, and outputs snapshots.  This will be difficult for me for three reasons:
  1. I don't know x86 assembly
  2. x86 assembly is far more complex than the assembly languages I've learned so far (Z80 and 65816)
  3. I don't trust the NSA so I'm not installing Ghidra (so there goes its much-touted decompilation feature)
I can still think logically enough to know how to proceed after getting a non-Ghidra disassembler installed, though.  I've already looked at SGSTG.EXE in HxD, and I can clearly see the strings the game uses.  Windows executables include the names of the various API functions called by the executable, and I can see the various file names as well; so labelling all the strings I'm interested in and then searching the code for references to them seems like a cromulent first step.  From looking around at different disassemblers, "identification of Windows API calls" is a feature that pops up a lot, which would be handy.  Figuring out how it builds the palette from reading the files shouldn't be that difficult once I have my foot in the door, it's just a matter of copying data around and that's generally pretty easy to follow in any assembly language.
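Listing the strings along with their file offsets doesn't need a disassembler at all.  Here is a minimal Python sketch that finds runs of printable ASCII in a binary; the function name and the minimum length of 4 are arbitrary choices, and it only catches plain ASCII strings, so anything stored as UTF-16 or Shift-JIS would be missed.

```python
import re
from typing import List, Tuple


def find_ascii_strings(data: bytes, min_length: int = 4) -> List[Tuple[int, str]]:
    """Return (offset, text) for each run of printable ASCII characters
    that is at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_length)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def find_file_names(data: bytes, extension: str = ".ume") -> List[Tuple[int, str]]:
    """Narrow the list down to strings ending in a given extension."""
    ext = extension.lower()
    return [(off, s) for off, s in find_ascii_strings(data) if s.lower().endswith(ext)]
```

Reading SGSTG.EXE with `open(path, "rb").read()` and passing the result to `find_file_names` would give the offsets of the UME file names.  Note that these are offsets within the file, which a disassembler will show as different addresses once the executable is mapped into memory.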

Still though, I now have PowerShell scripts that work for both the PCG and UME files, both of which produce output that matches what the game outputs on its own for every file I can possibly verify it with.

It looks possible to fix the palette corruption issues (the ones that the Windows compatibility modes mitigate) by tweaking the corresponding UME files.  Logo.ume defines a full 256-color grayscale palette but only uses a small portion of it.  Enemy2.ume, which contains the sprites in which I'd noticed some palette corruption, has plenty of unused palette entries.  I just need to figure out which palette entries get corrupted, and move them to areas of the palette that were previously unused.  The disassembly can potentially help with that, if I can catch where the palette corruption happens and discover its true extent.
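Finding unused palette entries and moving a color into one of them can be sketched in a few lines of Python.  This assumes an 8-bit image whose palette is stored as 4-byte entries (the usual BMP layout) and whose pixel data has already been separated from the header; the function names are made up for illustration.

```python
from typing import List


def unused_palette_indices(pixels: bytes, palette_size: int = 256) -> List[int]:
    """List palette indices that no pixel refers to.  Row padding bytes,
    if still present in pixels, count as used values."""
    used = set(pixels)
    return [i for i in range(palette_size) if i not in used]


def move_palette_entry(palette: bytearray, pixels: bytearray,
                       src: int, dst: int) -> None:
    """Copy the color at index src to index dst, then repoint every
    pixel that used src so it uses dst instead."""
    palette[dst * 4:dst * 4 + 4] = palette[src * 4:src * 4 + 4]
    # Translation table: identity everywhere except src, which maps to dst.
    table = bytes(dst if i == src else i for i in range(256))
    pixels[:] = pixels.translate(table)
```

The old entry at `src` is left as it was, so the image looks the same even if that slot later gets corrupted, since nothing points at it anymore.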

The pipe dream would be to fix whatever code is causing the corruption in the first place and just patch SGSTG.EXE.

I think I'll leave it there for now.  I'm at the point where I can open a BMP file in HxD and identify all the fields without having to consult an external reference.  If I don't post anything on this subject for a while, I've probably drowned myself in x86 assembly just to tweak a silly hentai game, or given up.