File Formats used by MP3 File Manager v2 & upward

Introduction

MP3FileManager v2 uses a simpler obfuscation algorithm than v1.x, but
uses more complex structures to describe the device contents. As with
v1.x, the files are still obfuscated MP3 files. Note that I have
indications that this does not work for all v2 devices, and that some
devices now do not obfuscate the file in any meaningful way beyond
changing the header from ID3 to EA3.

File Mechanism and Layout

Please note that this describes v2.0 and later versions of the
MP3FileManager application, and reverse-engineering of this is not
complete. For v1.x, see FILE_FORMAT.txt.

V2.0 uses more files and a more complex layout, but a simpler
obfuscation algorithm. The files on the device are as follows:

* A file called DvID.dat is created in the same directory as the
  MP3FileManager executable. There is a correlation between this file
  and the scrambling key, which k Kryss has figured out: the
  significant bytes are the 11th to 14th inclusive, which are then
  used in a keying function along with the track ID to generate the
  scrambling code.

* All other files go into a folder on the device called OMGAUDIO.

[device root]
|
+-OMGAUDIO
  |
  +- 10F00    ** 1XXXXXXXX.OMA files go here
  |
  +- 00gtrlst.dat
  +- 01treeXX.dat (XX = 01 to 04 at minimum)
  +- 02treinf.dat
  +- 03ginfXX.dat (XX = 01 to 04 at minimum)
  +- 04cntinf.dat
  +- 05cidlst.dat

Specifics of File Formats

General

As with v1.x, data is big-endian and text strings are UTF-16BE. I have
not as yet put a lot of effort into understanding the non-music files
on the device, but all appear to be constructed similarly:

[16-byte file header, including object count]
[block of 16-byte object pointers]
[objects]

The file header consists of 4 bytes of magic to describe the file,
what appears to be a constant 4 bytes, 1 byte indicating the number of
object pointers, and padding out to 16 bytes.

Object pointers consist of 4 bytes of magic to describe the object
type, a 32-bit file offset to the start of the objects (which are
stored contiguously), a 32-bit data length, and padding to 16 bytes.

Objects start with 4 bytes of magic which match the pointer for the
object, a 16-bit record count, a 16-bit record size, and padding to 16
bytes.

Further investigation on the actual meaning of the various objects and
records is required.
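
As a rough sketch of the layout described above (not code taken from
MP3FileManager or any existing tool), the three 16-byte structures
could be read like this. The struct and function names are made up for
illustration, and the field offsets are simply my reading of the
description: count at byte 8 of the file header, offset and length at
bytes 4 and 8 of each pointer, record count and size at bytes 4 and 6
of each object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Big-endian readers; all multi-byte fields in these files are big-endian. */
static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical structures; the names are not from any real tool. */
struct obj_pointer {
    char     magic[5];      /* 4-byte object type, NUL-terminated here */
    uint32_t offset;        /* file offset of the object */
    uint32_t length;        /* data length */
};

struct obj_header {
    char     magic[5];      /* should match the pointer's magic */
    uint16_t record_count;
    uint16_t record_size;
};

/* Copies the file magic and returns the object-pointer count, or -1. */
static int read_file_header(const unsigned char *buf, size_t len, char magic[5])
{
    assert(buf != NULL && magic != NULL);
    if (len < 16)
        return -1;
    memcpy(magic, buf, 4);
    magic[4] = '\0';
    return buf[8];          /* bytes 4-7 look constant; byte 8 is the count */
}

/* Reads the index-th 16-byte pointer that follows the file header. */
static int read_obj_pointer(const unsigned char *buf, size_t len,
                            unsigned index, struct obj_pointer *out)
{
    size_t pos = 16 + (size_t)index * 16;

    if (len < pos + 16)
        return -1;
    memcpy(out->magic, buf + pos, 4);
    out->magic[4] = '\0';
    out->offset = be32(buf + pos + 4);
    out->length = be32(buf + pos + 8);
    return 0;
}

/* Reads the 16-byte object header that a pointer refers to. */
static int read_obj_header(const unsigned char *buf, size_t len,
                           const struct obj_pointer *ptr,
                           struct obj_header *out)
{
    const unsigned char *p;

    if (ptr->offset > len || len - ptr->offset < 16)
        return -1;
    p = buf + ptr->offset;
    if (memcmp(p, ptr->magic, 4) != 0)
        return -1;          /* object magic must match its pointer */
    memcpy(out->magic, p, 4);
    out->magic[4] = '\0';
    out->record_count = be16(p + 4);
    out->record_size = be16(p + 6);
    return 0;
}
```

The records themselves would then start 16 bytes after the object
offset, record_count of them, each record_size bytes long.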

00GTRLST.DAT

This file contains two object types: SYSB and GTLB. SYSB seems to
contain little to no useful data; I've only one sample file where it's
not entirely filled with NULLs. The GTLB sections contain ID3 tag
names (TPE1, TALB, and TCON) in various combinations. This may
indicate which files on the device have which tags available. Each
record looks something like this:
2-byte record number
2-byte record group?
pad to 16 bytes
2-byte # tags
2-byte zeroes
4-byte*#tags tags
pad to record size with zeros

If "record group" is non-zero, there appears to be a 03GINFXX.DAT file
corresponding to the record number (the field may be a bitfield). The
record numbers aren't necessarily in sequence.

01TREEXX.DAT

magic: TREE; GPLB and TPLB objects. GPLB records are 8 bytes each; all
samples look like
2 bytes track #
2 bytes 0100
2 bytes track #
2 bytes 0000
TPLB records are 2 bytes each and are simply the track number.

02TREINF.DAT

The file magic is GTIF. Contains GTFB objects. It is unclear whether
records correspond to GTR or TREE entries. Sample non-null record:
[zero padding][0a 5a9c 0001 0080] <= 16 bytes
TIT2[0x0002]ATRAC AD1 Database: New Tracks[zero padding] <= remainder
0x0002 is the ID3 encoding used, i.e. UTF-16BE

03GINFXX.DAT

magic: GPIF; contains GPFB objects, which appear to hold ID3 tags
describing the source albums (i.e. you may have put the music into a
folder named "New Folder", but this records the original album
tagging). Objects start with a 16-byte header:
4 bytes 00000000
4 bytes 00000000
4 bytes # of mp3 frames
2 bytes # tag records
2 bytes tag record size

Tag records consist of raw ID3v2(.4?) tags - 4 bytes indicating the
tag type, 2 bytes to indicate the encoding, and the encoded data up to
the size of the tag record. I don't know if the record size increases
to cope with longer tags, or if the tags just get truncated. Current
default appears to be 128-byte records, giving a 61-character maximum
size.
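
The 61-character figure follows from the record layout: 128 bytes
minus 4 bytes of tag type and 2 bytes of encoding leaves 122 bytes, or
61 two-byte UTF-16 characters. A minimal sketch of reading one tag
record is below. The names are made up, the 64-unit text buffer is an
arbitrary limit, and treating a zero code unit as the end of the text
is my assumption based on the zero padding seen in samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_TEXT_MAX 64     /* arbitrary buffer limit for this sketch */

struct tag_record {         /* hypothetical name */
    char     id[5];         /* e.g. "TIT2", NUL-terminated here */
    uint16_t encoding;      /* 0x0002 (UTF-16BE) in the samples */
    uint16_t text[TAG_TEXT_MAX];    /* UTF-16 code units */
    size_t   text_len;      /* number of code units stored */
};

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Number of 2-byte characters that fit after the 6 bytes of tag + encoding. */
static size_t tag_max_chars(size_t record_size)
{
    return record_size < 6 ? 0 : (record_size - 6) / 2;
}

/* Parses one tag record; returns 0 on success, -1 if it is too short. */
static int parse_tag_record(const unsigned char *rec, size_t record_size,
                            struct tag_record *out)
{
    size_t max = tag_max_chars(record_size);
    size_t i;

    assert(rec != NULL && out != NULL);
    if (record_size < 6)
        return -1;
    if (max > TAG_TEXT_MAX)
        max = TAG_TEXT_MAX;
    memcpy(out->id, rec, 4);
    out->id[4] = '\0';
    out->encoding = be16(rec + 4);
    for (i = 0; i < max; i++) {
        uint16_t unit = be16(rec + 6 + 2 * i);
        if (unit == 0)      /* assumed: zero padding ends the text */
            break;
        out->text[i] = unit;
    }
    out->text_len = i;
    return 0;
}
```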

04CNTINF.DAT

magic: CNIF; contains CNFB objects. Each object starts with a 16-byte header:
4 bytes [0000 fffe] (file type? would be MPEG-1 Layer 1, which is wrong)
4 bytes track size in ms
4 bytes # of mp3 frames
2 bytes number of tag records
2 bytes size of each tag record (0x80)

Tag records consist of raw ID3v2(.4?) tags - 4 bytes indicating the
tag type, 2 bytes to indicate the encoding, and the encoded data up to
the size of the tag record. I don't know if the record size increases
to cope with longer tags, or if the tags just get truncated. Current
default appears to be 128-byte records, giving a 61-character maximum
size.

05CIDLST.DAT

magic: CIDL; contains CILB records, one per track. Each record is 48
bytes. Record format:
six longwords which appear as-is in the corresponding track:
0000 0000 [constant]
010f 5000 [constant]
0005 0004 [constant]
00a0 8ddd [varies with track]          003f 2d76
ca19 9809 [varies with track]          278e f5f0
44b4 730b [varies with track]          befc 1727
zero-padding to 48 bytes

OMA files

These are the actual MP3 files. The file starts with a tag "ea3";
replacing the "ea" with "ID" turns this into an ID3 block, complete
with size tag, which can be read with a standard ID3 library. So far
all sample files have had 3072 bytes of ID3 data on the device,
regardless of the amount in the input files. After the ID3 tag there
is a second block starting with either "ea3" or "EA3" (not sure why
there's a case difference, nor whether it changes between
versions). The next byte, 0x02, is probably part of this
signature. The next 16-bit word is the size of the header including
the 4-byte signature. Immediately after the header is the mp3 data,
obfuscated in 8-byte blocks. If the MP3 data does not fit exactly into
8-byte blocks, the left-over bytes are copied "in the clear". The
obfuscation is a simple XOR with a 4-byte pattern, applied twice to
cover each 8-byte block. The 4-byte pattern can be recovered from
already-written files by brute force, but that is no help when writing
new files. With access to the DvID file, you can use the following
algorithm to generate the scramble key:

  dvid = bytes 11 - 14 of DvID.dat
  n    = track id
  key = ( 0x2465 + n * 0x5296E435 ) ^ dvid
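
As a sketch in C, the key generation and the XOR pass might look like
the following. Several things here are my assumptions rather than
confirmed facts: that "bytes 11 - 14" means the 11th to 14th bytes
counting from one (offsets 10 to 13), that they are combined
big-endian like the rest of the data, that the arithmetic is 32-bit
with overflow discarded (the key is only 4 bytes), and that the most
significant key byte is applied to the first data byte. The function
names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads the 11th-14th bytes of DvID.dat as one big-endian value (assumed). */
static uint32_t dvid_from_file(const unsigned char *dvid_dat, size_t len)
{
    assert(dvid_dat != NULL && len >= 14);
    return ((uint32_t)dvid_dat[10] << 24) | ((uint32_t)dvid_dat[11] << 16) |
           ((uint32_t)dvid_dat[12] << 8) | (uint32_t)dvid_dat[13];
}

/* key = (0x2465 + n * 0x5296E435) ^ dvid, keeping only the low 32 bits. */
static uint32_t oma_key(uint32_t dvid, uint32_t track_id)
{
    uint32_t k = UINT32_C(0x2465) + track_id * UINT32_C(0x5296E435);
    return k ^ dvid;
}

/*
 * XORs every complete 8-byte block with the 4-byte key applied twice.
 * Trailing bytes that do not fill a block are left "in the clear".
 * XOR is its own inverse, so the same call scrambles and descrambles.
 */
static void oma_xor(unsigned char *data, size_t len, uint32_t key)
{
    const unsigned char k[4] = {
        (unsigned char)(key >> 24), (unsigned char)(key >> 16),
        (unsigned char)(key >> 8), (unsigned char)key
    };
    size_t whole = len - (len % 8);
    size_t i;

    for (i = 0; i < whole; i++)
        data[i] ^= k[i % 4];
}
```

For example, with dvid = 0 and track id 1 the formula gives
0x2465 + 0x5296E435 = 0x5297089A.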

The key can also be retrieved by sending a command sequence directly
over USB to the device. The sequence is as follows:

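/* Thanks to Ricardo Ferreira (from the GYM project) for this byte sequence */
/* 31 bytes; starts with "USBC", so this looks like a USB mass-storage
   command block wrapper asking for an 18-byte (0x12) reply.
   Declared unsigned because several values exceed 0x7F. */
const unsigned char g_query_device_dvid[] =
{
  0x55, 0x53, 0x42, 0x43, 0xB0, 0x0B, 0x9E, 0x81,
  0x12, 0x00, 0x00, 0x00, 0x80, 0x00, 0x0C, 0xA4,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xBC, 0x00,
  0x12, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00
};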

You can use usb_bulk_write to write it to the device, usb_bulk_read to
get the response (should be 18 bytes), and the bytes you're looking
for are at offset 12 in the response (bytes 12, 13, 14, 15).
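
Pulling the four bytes out of the response could be done as below.
This sketch deliberately leaves out the USB transfer itself, since the
endpoints depend on the device. The function name is made up, and
combining the bytes big-endian is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DVID_RESPONSE_LEN 18    /* expected response size */
#define DVID_RESPONSE_OFF 12    /* bytes 12, 13, 14, 15 */

/* Extracts the 4 bytes of interest; returns 0 on success, -1 if too short. */
static int dvid_from_usb_response(const unsigned char *resp, size_t len,
                                  uint32_t *dvid)
{
    const unsigned char *p;

    assert(dvid != NULL);
    if (resp == NULL || len < DVID_RESPONSE_LEN)
        return -1;
    p = resp + DVID_RESPONSE_OFF;
    /* Assumed big-endian, like the on-device data files. */
    *dvid = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
            ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return 0;
}
```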

Credits:
Waider <waider@waider.ie> (staring at hex dumps for many hours)
Grégory 'GaLi' Cavelier / Romain 'Mayem' Tavenard (sample files, suggestions)
Joe (jmk)        (sample files)
tor              (sample files)
k Kryss          (DvID -> scramble key algorithm)
Ricardo Ferreira (USB commands to fetch key)
Jul Otias        (some further work on index files)