Retrieving information from Picasa is not easy: the software is quite limited and offers hardly any data export functions.
What I would like to do: extract the raw face recognition information from the Picasa database. I need the person names, the image filename and the rectangle associated with each face.
On Windows 7, the Picasa database is located in C:\Users\USERNAME\AppData\Local\Google\Picasa2. This folder mainly contains pmp and db files. The pmp files store tabular data: each PMP file contains one column of a table in the database, and the file is named table_column.pmp. The Picasa database contains 3 tables:
albumdata, which contains information on the albums (folders and face album)
catdata, categories data (almost empty on my computer)
imagedata, images data (includes rectangles and references albums)
How to read PMP files?
PMP files are binary files in little-endian format. The header is described by the following table:
Size | Description |
---|---|
4 bytes | magic constant : 0x3fcccccd |
2 bytes | field type (unsigned short) |
2 bytes | constant: 0x1332 |
4 bytes | constant: 0x00000002 |
2 bytes | field type (unsigned short) |
2 bytes | constant: 0x1332 |
4 bytes | number of entries (unsigned int) |
The field-type values are:
Value | Description |
---|---|
0x0 | null-terminated strings |
0x1 | unsigned int, 4 bytes |
0x2 | Dates, Microsoft Variant Time format, 8 bytes |
0x3 | byte field, 1 byte |
0x4 | unsigned long, 8 bytes |
0x5 | unsigned short, 2 bytes |
0x6 | null-terminated strings |
0x7 | unsigned int, 4 bytes |
See http://sbktech.blogspot.fr/2011/12/picasa-pmp-format.html for more information on pmp files.
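The header and field-type tables above can be turned into a small reader. The following is only a minimal sketch, not the code of the published tool: the class name `PmpColumn` and its methods are hypothetical, strings are assumed to be UTF-8, and dates are read as raw 8-byte doubles on the assumption that Variant Time is stored that way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: holds one column read from a table_column.pmp file.
class PmpColumn {
    final int fieldType;
    final List<Object> values = new ArrayList<>();

    PmpColumn(int fieldType) {
        this.fieldType = fieldType;
    }

    static PmpColumn read(String path) throws IOException {
        return parse(Files.readAllBytes(Paths.get(path)));
    }

    static PmpColumn parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // 20-byte header, in the order given by the header table.
        expect(buf.getInt(), 0x3fcccccd);
        int type = buf.getShort() & 0xFFFF;
        expect(buf.getShort() & 0xFFFF, 0x1332);
        expect(buf.getInt(), 0x00000002);
        expect(buf.getShort() & 0xFFFF, type); // the field type is repeated
        expect(buf.getShort() & 0xFFFF, 0x1332);
        long count = buf.getInt() & 0xFFFFFFFFL;

        PmpColumn column = new PmpColumn(type);
        for (long i = 0; i < count; i++) {
            column.values.add(readValue(buf, type));
        }
        return column;
    }

    private static void expect(int actual, int expected) {
        if (actual != expected) {
            throw new IllegalArgumentException(
                "unexpected header value 0x" + Integer.toHexString(actual));
        }
    }

    private static Object readValue(ByteBuffer buf, int type) {
        switch (type) {
            case 0x0:
            case 0x6:
                return readString(buf);
            case 0x1:
            case 0x7:
                return buf.getInt() & 0xFFFFFFFFL; // unsigned int kept in a long
            case 0x2:
                return buf.getDouble();            // assumption: Variant Time as a double
            case 0x3:
                return buf.get() & 0xFF;
            case 0x4:
                return buf.getLong();              // Java has no unsigned long; bits are preserved
            case 0x5:
                return buf.getShort() & 0xFFFF;
            default:
                throw new IllegalArgumentException("unknown field type " + type);
        }
    }

    private static String readString(ByteBuffer buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b = buf.get(); b != 0; b = buf.get()) {
            out.write(b);
        }
        // Assumption: strings are UTF-8 encoded.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With such a helper, something like `PmpColumn.read("imagedata_facerect.pmp")` would return the list of rectangle values, one per image, assuming the file naming scheme table_column.pmp described earlier.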
The interesting values for faces are facerect (rectangle coordinates) and personalbumid (album reference) in the table imagedata, and the values token (album reference) and name (person name) in the table albumdata.
Example (keeping only 4 columns):
The rectangle is stored as a value in the rectangle64 format: a 64-bit number that can be split into four 16-bit numbers. Once divided by 2^16-1 (the maximum value), these four numbers give the relative coordinates of the top-left corner and the bottom-right corner. The absolute coordinates are obtained by multiplying the x values by the width of the picture and the y values by its height.
Example:
Original number (64 bits) | 0x67873bec9e1e933d | | | |
Split into four 16-bit numbers | 0x6787 | 0x3bec | 0x9e1e | 0x933d |
Convert to decimal | 26503 | 15340 | 40478 | 37693 |
Divide by 2^16-1 (65535) | 0.4044 | 0.2341 | 0.6177 | 0.5752 |
Multiply by the width (3264) and the height (2448), truncated to whole pixels | x1=1319 | y1=573 | x2=2016 | y2=1407 |
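The same arithmetic can be written in a few lines of Java. This is an illustrative sketch only: the class `Rectangle64` and its method names are hypothetical, and the pixel coordinates are truncated to integers, which is an assumption chosen because it reproduces the pixel values of the example above.

```java
// Hypothetical helper for decoding a rectangle64 value.
class Rectangle64 {

    // Splits the 64-bit value into four 16-bit numbers, most significant first.
    static int[] split(long rect) {
        return new int[] {
            (int) ((rect >>> 48) & 0xFFFF),
            (int) ((rect >>> 32) & 0xFFFF),
            (int) ((rect >>> 16) & 0xFFFF),
            (int) (rect & 0xFFFF)
        };
    }

    // Returns {x1, y1, x2, y2} in pixels for a picture of the given size.
    static int[] toPixels(long rect, int width, int height) {
        int[] p = split(rect);
        return new int[] {
            (int) ((long) p[0] * width / 65535),   // left
            (int) ((long) p[1] * height / 65535),  // top
            (int) ((long) p[2] * width / 65535),   // right
            (int) ((long) p[3] * height / 65535)   // bottom
        };
    }

    public static void main(String[] args) {
        int[] r = toPixels(0x67873bec9e1e933dL, 3264, 2448);
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
        // prints: 1319 573 2016 1407
    }
}
```

Multiplying before dividing keeps the computation in integer arithmetic and avoids rounding differences between intermediate decimal values.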
With this information, we know that image x has a rectangle corresponding to a specific person, but we don’t yet know the file name of the picture.
This information is held in the file thumbindex.db. This file contains the whole list of folders and files indexed in the Picasa database. Line x in the thumbindex.db file corresponds to the same image as line x in the table imagedata.
How to read thumbindex.db?
Header:
Size | Description |
---|---|
4 bytes | magic constant: 0x40466666 |
4 bytes | number of entries (unsigned int) |
And each line follows this schema:
Size | Description |
---|---|
until null character | null-terminated strings |
26 bytes | useless content |
4 bytes | index |
A line will be either a folder with its complete path and a specific index value (4294967295), or an image with its filename and an index value pointing to the parent folder.
In this example the image 266 is in the folder 5.
See http://projects.mindtunnel.com/picasa3meta/docs/picasa3meta.thumbindex.ThumbIndex-class.html for more information on the thumbindex.db file.
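The layout described above can be read with a short parser. The sketch below is a hedged illustration, not the published tool's code: the class `ThumbIndex` and its members are hypothetical names, and it assumes that thumbindex.db is little-endian like the PMP files and that names are UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading the entries of thumbindex.db.
class ThumbIndex {
    static final long FOLDER_MARKER = 4294967295L; // 0xFFFFFFFF marks a folder

    static class Entry {
        final String name;  // folder path or image filename
        final long index;   // FOLDER_MARKER, or the line number of the parent folder

        Entry(String name, long index) {
            this.name = name;
            this.index = index;
        }

        boolean isFolder() {
            return index == FOLDER_MARKER;
        }
    }

    static List<Entry> parse(byte[] data) {
        // Assumption: same byte order as the PMP files.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != 0x40466666) {
            throw new IllegalArgumentException("not a thumbindex.db file");
        }
        long count = buf.getInt() & 0xFFFFFFFFL;

        List<Entry> entries = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            ByteArrayOutputStream name = new ByteArrayOutputStream();
            for (byte b = buf.get(); b != 0; b = buf.get()) {
                name.write(b);
            }
            buf.position(buf.position() + 26);        // skip the 26 unused bytes
            long index = buf.getInt() & 0xFFFFFFFFL;
            entries.add(new Entry(
                new String(name.toByteArray(), StandardCharsets.UTF_8), index));
        }
        return entries;
    }

    // Returns the folder path of the image at the given line.
    static String folderOf(List<Entry> entries, int line) {
        Entry e = entries.get(line);
        return e.isFolder() ? e.name : entries.get((int) e.index).name;
    }
}
```

Because the list keeps the original line order, the position of an entry in the returned list can be used directly as the line number shared with the imagedata table.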
If we merge the imagedata table with the data from thumbindex.db, we have, for a specific image, the filename and the face rectangle, but no album reference to associate the face with a person!
Picasa actually adds a virtual image to store this information. In the previous example, the virtual picture 268 (which has no filename) is linked to the image 266 and contains the information on the face of one person (one virtual image per person). The rectangle of the image 266 encloses all the face rectangles present in the image; when only one face has been identified, it is the same as the rectangle of that single person. So we just need to read the album reference of the image 268 and associate it with the image file 266.
I have written a program that parses all this information and stores it in csv files: one file per pmp table and one file for the faces. If imagemagick is installed (and the convert application is in the path), the program can also create thumbshots of all the faces.
How to use the program to parse the Picasa database?
There are in fact 2 programs: one called PMPDB, which converts the pmp tables into csv files, and one called PicasaFaces, which creates a human-readable csv with all the face information and the face thumbshots.
Usage:
java -classpath ".:bin/:commons-cli-1.2.jar" PMPDB -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder
java -classpath ".:bin/:commons-cli-1.2.jar:commons-io-2.4.jar" PicasaFaces -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder -replaceRegex C: -replacement /media/HardDrive -convert /path/to/convert(.exe)
If the command line contains the argument -convert, imagemagick will create all the face thumbshots (in the output folder, with one folder per person). A string replacement can be applied to the original image paths if the pictures are no longer at the location recorded in the database (in the example, “C:” is replaced by “/media/HardDrive”).
The source code is available on GitHub: https://github.com/skisoo/PicasaDBReader
It works on Windows and Linux.