How to Read Picasa 3.9 Database and extract faces data

Retrieving information from Picasa is not easy: the software is quite limited and offers hardly any data export function.

What I would like to do: extract the raw face recognition information from the Picasa database. I need the person names, the image filename and the rectangle associated with each face.

On Windows 7, the Picasa database is located in C:\Users\USERNAME\AppData\Local\Google\Picasa2. This folder (more precisely its db3 subfolder, the one passed to the programs in the usage section) mainly contains pmp and db files. The pmp files store tabular data: each PMP file holds one column of one table of the database and is named table_column.pmp. The Picasa database contains 3 tables:

albumdata, which contains information on the albums (folders and face album)
catdata, categories data (almost empty on my computer)
imagedata, images data (includes rectangles and references albums)
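
The table_column.pmp naming rule can be turned into a quick inventory of the database. The following is a minimal sketch, not the code of the program presented later: the class and method names are made up for illustration, and it assumes that table names never contain an underscore (true for the three tables listed above), so the file name is split at the first underscore.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups the pmp files of a folder by table name.
final class PmpFileNames {

    /** Splits "table_column.pmp" into {table, column}; returns null for any other name. */
    static String[] split(String fileName) {
        if (!fileName.endsWith(".pmp")) {
            return null;
        }
        String base = fileName.substring(0, fileName.length() - 4);
        int sep = base.indexOf('_'); // assumption: the table name has no underscore
        if (sep <= 0 || sep == base.length() - 1) {
            return null;
        }
        return new String[] { base.substring(0, sep), base.substring(sep + 1) };
    }

    /** Returns table name -> column names for every pmp file found in the folder. */
    static Map<String, List<String>> columnsByTable(Path folder) throws IOException {
        Map<String, List<String>> tables = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.pmp")) {
            for (Path file : files) {
                String[] parts = split(file.getFileName().toString());
                if (parts != null) {
                    tables.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
                }
            }
        }
        return tables;
    }
}
```

Calling columnsByTable on the database folder should list facerect and personalbumid under imagedata, and token and name under albumdata, if the database is laid out as described here.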

How to read PMP files?

PMP files are binary files in little-endian format. The header is laid out as follows:

Size      Description
4 bytes   magic constant: 0x3fcccccd
2 bytes   field type (unsigned short)
2 bytes   constant: 0x1332
4 bytes   constant: 0x00000002
2 bytes   field type (unsigned short)
2 bytes   constant: 0x1332
4 bytes   number of entries (unsigned int)
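
As an illustration, the 20-byte header above can be read with a little-endian ByteBuffer. This is only a sketch under the layout given in the table: the class name PmpHeader is made up, and the check that both field-type slots hold the same value is an assumption, not something the table states.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for the two useful header values of a pmp file.
final class PmpHeader {
    static final int SIZE = 20; // 4 + 2 + 2 + 4 + 2 + 2 + 4 bytes

    final int fieldType;   // one of the field-type codes
    final long entryCount; // unsigned 32-bit count, widened to long

    PmpHeader(int fieldType, long entryCount) {
        this.fieldType = fieldType;
        this.entryCount = entryCount;
    }

    /** Parses the first 20 bytes of a pmp file. */
    static PmpHeader parse(byte[] bytes) {
        if (bytes.length < SIZE) {
            throw new IllegalArgumentException("a pmp header needs " + SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        expect(buf.getInt(), 0x3fcccccd, "magic constant");
        int firstType = buf.getShort() & 0xFFFF;
        expect(buf.getShort() & 0xFFFF, 0x1332, "constant after first field type");
        expect(buf.getInt(), 0x00000002, "4-byte constant");
        int secondType = buf.getShort() & 0xFFFF;
        expect(buf.getShort() & 0xFFFF, 0x1332, "constant after second field type");
        long count = buf.getInt() & 0xFFFFFFFFL;
        // Assumption: the two field-type slots always agree.
        if (firstType != secondType) {
            throw new IllegalArgumentException("field types differ: " + firstType + " vs " + secondType);
        }
        return new PmpHeader(firstType, count);
    }

    private static void expect(int actual, int expected, String what) {
        if (actual != expected) {
            throw new IllegalArgumentException("unexpected " + what + ": 0x" + Integer.toHexString(actual));
        }
    }
}
```

The entries of the column start right after these 20 bytes.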

The field-type values are:

Value   Description
0x0     null-terminated strings
0x1     unsigned int, 4 bytes
0x2     dates, Microsoft Variant Time format, 8 bytes
0x3     byte field, 1 byte
0x4     unsigned long, 8 bytes
0x5     unsigned short, 2 bytes
0x6     null-terminated strings
0x7     unsigned int, 4 bytes
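
The field types above translate into a small decoder. The sketch below is an illustration with made-up names, not the code of the program presented later. It assumes the strings are UTF-8 (the encoding is not stated here) and treats a Variant Time value as a double counting days since 1899-12-30, which is the usual definition of that format. The buffer passed in must already be set to little-endian.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical decoder for one entry of a pmp column.
final class PmpValues {

    /** Reads one entry of the given field type at the buffer's current position. */
    static Object read(ByteBuffer buf, int fieldType) {
        switch (fieldType) {
            case 0x0:
            case 0x6:
                return readString(buf);
            case 0x1:
            case 0x7:
                return buf.getInt() & 0xFFFFFFFFL;       // unsigned int as long
            case 0x2:
                return fromVariantTime(buf.getDouble()); // 8-byte date
            case 0x3:
                return buf.get() & 0xFF;                 // single byte as int
            case 0x4:
                return buf.getLong();                    // unsigned 64 bits kept in a signed long
            case 0x5:
                return buf.getShort() & 0xFFFF;          // unsigned short as int
            default:
                throw new IllegalArgumentException("unknown field type 0x" + Integer.toHexString(fieldType));
        }
    }

    /** Reads bytes up to the terminating null; UTF-8 is an assumption. */
    static String readString(ByteBuffer buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b = buf.get(); b != 0; b = buf.get()) {
            out.write(b);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Converts days since 1899-12-30 (fractional part = time of day) to a date-time. */
    static LocalDateTime fromVariantTime(double days) {
        long millis = Math.round(days * 86_400_000d);
        return LocalDateTime.of(1899, 12, 30, 0, 0).plus(millis, ChronoUnit.MILLIS);
    }
}
```

Keeping the 8-byte values in a signed long is enough for the face rectangles, since they are only ever taken apart bit by bit.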

See http://sbktech.blogspot.fr/2011/12/picasa-pmp-format.html for more information on pmp files.

The interesting values for faces are facerect (rectangle coordinates) and personalbumid (album reference) in the table imagedata, and the values token (album reference) and name (person name) in the table albumdata.

Example (keeping only 4 columns):

Extract from the table imagedata from Picasa 3.9

The rectangle is stored as a value in the rectangle64 format: a 64-bit number that splits into four 16-bit numbers. Each of the four numbers, once divided by 2^16-1 (65535, the maximum value), is a relative coordinate; together they give the top-left corner and the bottom-right corner. The absolute values are obtained by multiplying the x values by the width of the picture and the y values by its height.

Example:

Original number (64 bits)                            0x67873bec9e1e933d
Split into four 16-bit numbers                       0x6787 0x3bec 0x9e1e 0x933d
Convert to decimal                                   26503 15340 40478 37693
Divide by 2^16-1 (65535)                             0.4044 0.2341 0.6176 0.5751
Multiply by the width (3264) and the height (2448)   x1=1319 y1=573 x2=2016 y2=1407
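
The same computation can be written in a few lines of Java. This is a minimal sketch with a made-up class name; it uses integer division, which truncates the result and reproduces the pixel values of the example.

```java
// Hypothetical helper converting a rectangle64 value to pixel coordinates.
final class Rect64 {

    /** Returns {x1, y1, x2, y2} in pixels for a picture of the given size. */
    static int[] toPixels(long rect64, int width, int height) {
        // The four 16-bit parts, from the most significant to the least significant.
        long left = (rect64 >>> 48) & 0xFFFF;
        long top = (rect64 >>> 32) & 0xFFFF;
        long right = (rect64 >>> 16) & 0xFFFF;
        long bottom = rect64 & 0xFFFF;
        // Scale by the picture size, then divide by the maximum value 65535.
        return new int[] {
            (int) (left * width / 65535),
            (int) (top * height / 65535),
            (int) (right * width / 65535),
            (int) (bottom * height / 65535)
        };
    }

    public static void main(String[] args) {
        int[] r = toPixels(0x67873bec9e1e933dL, 3264, 2448);
        // prints x1=1319 y1=573 x2=2016 y2=1407
        System.out.println("x1=" + r[0] + " y1=" + r[1] + " x2=" + r[2] + " y2=" + r[3]);
    }
}
```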

With this information, we know that image x has a rectangle corresponding to a specific person, but we do not yet know the file name of the picture.

This information is held in the file thumbindex.db, which lists all the folders and files indexed in the Picasa database. Line x of thumbindex.db corresponds to the same image as line x of the table imagedata.

How to read thumbindex.db?

Header:

Size      Description
4 bytes   magic constant: 0x40466666
4 bytes   number of entries (unsigned int)

Each line then follows this schema:

Size                   Description
until null character   null-terminated string
26 bytes               content that is not needed here
4 bytes                index (unsigned int)

A line is either a folder, with its complete path and the special index value 4294967295 (0xFFFFFFFF), or an image, with its filename and an index value pointing to its parent folder.
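
A reader for this layout fits in a short class. The sketch below is illustrative only (the names are made up) and relies on two assumptions that are not stated above: that thumbindex.db is little-endian like the pmp files, and that the names are UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory view of thumbindex.db: one name and one index per line.
final class ThumbIndex {
    /** Index value that marks a line as a folder. */
    static final long FOLDER = 4294967295L;

    final List<String> names = new ArrayList<>();
    final List<Long> parents = new ArrayList<>();

    /** Parses the whole content of thumbindex.db. */
    static ThumbIndex parse(byte[] data) {
        // Assumption: same byte order as the pmp files.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != 0x40466666) {
            throw new IllegalArgumentException("not a thumbindex.db file");
        }
        long count = buf.getInt() & 0xFFFFFFFFL;
        ThumbIndex index = new ThumbIndex();
        for (long i = 0; i < count; i++) {
            int start = buf.position();
            while (buf.get() != 0) {
                // advance to the terminating null character
            }
            int length = buf.position() - 1 - start;
            index.names.add(new String(data, start, length, StandardCharsets.UTF_8));
            buf.position(buf.position() + 26); // skip the 26 bytes that are not needed
            index.parents.add(buf.getInt() & 0xFFFFFFFFL);
        }
        return index;
    }
}
```

The file content can be loaded with java.nio.file.Files.readAllBytes before calling parse.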

Extract from the file thumbindex.db from Picasa 3.9

In this example, image 266 is in folder 5.
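
Resolving the full path of an image is then a lookup of its parent folder. The following sketch uses made-up names and takes the names and indexes as two parallel lists. Whether the stored folder paths end with a separator is not stated here, so the helper adds a backslash only when one is missing, which is an assumption.

```java
import java.util.List;

// Hypothetical helper resolving a thumbindex line to a complete file path.
final class ThumbPaths {
    /** Index value that marks a line as a folder. */
    static final long FOLDER = 4294967295L;

    /** Returns the full path of line i, or null when the line has no filename. */
    static String fullPath(List<String> names, List<Long> parents, int i) {
        String name = names.get(i);
        long parent = parents.get(i);
        if (parent == FOLDER) {
            return name; // a folder already carries its complete path
        }
        if (name.isEmpty()) {
            return null; // no filename: nothing to resolve
        }
        String folder = names.get((int) parent);
        // Assumption: add a Windows separator only if the folder path lacks one.
        boolean hasSeparator = folder.endsWith("\\") || folder.endsWith("/");
        return hasSeparator ? folder + name : folder + "\\" + name;
    }
}
```

With the example above, line 266 would resolve to the path stored at line 5 followed by its own filename.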

See http://projects.mindtunnel.com/picasa3meta/docs/picasa3meta.thumbindex.ThumbIndex-class.html for more information on the thumbindex.db file.

If we merge the table imagedata with the data from thumbindex.db, then for a given image we have the filename and the face rectangle, but no album reference to associate the face with a person!

Picasa actually adds a virtual image to store this information. In the previous example, the virtual picture 268 (which has no filename) is linked to image 266 and contains the information on the face of one person (one virtual image per person). The rectangle of image 266 itself covers all the face rectangles present in the image when more than one face has been identified; otherwise it is the same as the rectangle of the single person. So we just need to read the album reference of image 268 and associate it with the image file 266.
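
The association can be sketched as a simple join over the merged lines. Everything in this example is hypothetical: the class, the field names and in particular the linkedImage list, which stands for "the real image a virtual line belongs to". How that link is stored is not described above, so the sketch takes it as an already resolved input. Lines that cannot be resolved are skipped rather than treated as errors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical join between virtual face lines and the real image files.
final class FaceJoin {

    /** One output row: which person appears in which file, with the raw rectangle. */
    static final class Face {
        final String file;
        final String person;
        final long rect64;

        Face(String file, String person, long rect64) {
            this.file = file;
            this.person = person;
            this.rect64 = rect64;
        }
    }

    /**
     * paths.get(i)        full path of line i, or null when the line has no filename
     * linkedImage.get(i)  for a virtual line, the line number of its real image (assumed input)
     * personAlbum.get(i)  album reference (personalbumid) of line i
     * faceRect.get(i)     rectangle (facerect) of line i
     * personNames         album reference -> person name, built from albumdata
     */
    static List<Face> join(List<String> paths, List<Integer> linkedImage,
                           List<Long> personAlbum, List<Long> faceRect,
                           Map<Long, String> personNames) {
        List<Face> faces = new ArrayList<>();
        for (int i = 0; i < paths.size(); i++) {
            if (paths.get(i) != null) {
                continue; // a folder or a real image, not a virtual face line
            }
            String person = personNames.get(personAlbum.get(i));
            Integer owner = linkedImage.get(i);
            if (person == null || owner == null || owner < 0 || owner >= paths.size()
                    || paths.get(owner) == null) {
                continue; // unresolved line: skip it
            }
            faces.add(new Face(paths.get(owner), person, faceRect.get(i)));
        }
        return faces;
    }
}
```

Each resulting row holds the three pieces of information wanted at the start: the person name, the image filename and the rectangle.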

I have written a program that parses all this information and stores it in csv files: one file per pmp table and one file for the faces. If ImageMagick is installed (and the convert application is in the path), the program can also create thumbshots of all the faces.

How to use the program to parse the Picasa database?

There are in fact 2 programs: PMPDB, which converts the pmp tables into csv files, and PicasaFaces, which creates a human-readable csv with all the face information, plus the face thumbshots.

Usage:

java -classpath ".:bin/:commons-cli-1.2.jar" PMPDB -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder

java -classpath ".:bin/:commons-cli-1.2.jar:commons-io-2.4.jar" PicasaFaces -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder -replaceRegex C: -replacement /media/HardDrive -convert /path/to/convert(.exe)

If the command line contains the argument -convert, ImageMagick creates all the face thumbshots in the output folder, with one folder per person. The original image paths can be rewritten with a string replacement if the pictures are no longer where the database recorded them (in the example, “C:” is replaced by “/media/HardDrive”). On Windows, separate the classpath entries with semicolons instead of colons.
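
To make these two options concrete, here is a rough sketch of what the path replacement and the thumbshot creation could look like. It is not taken from the program: the class and method names are invented, and cropping with ImageMagick's -crop WIDTHxHEIGHT+X+Y geometry followed by +repage is one plausible way to cut out a face, assumed for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;

// Hypothetical helpers for relocating image paths and cropping faces.
final class FaceThumbs {

    /** Applies the -replaceRegex / -replacement pair to a path stored in the database. */
    static String relocate(String path, String regex, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement text literal.
        return path.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    /** Builds an ImageMagick command line that crops the face rectangle out of the image. */
    static List<String> cropCommand(String convert, String image,
                                    int x1, int y1, int x2, int y2, String output) {
        String geometry = (x2 - x1) + "x" + (y2 - y1) + "+" + x1 + "+" + y1;
        return Arrays.asList(convert, image, "-crop", geometry, "+repage", output);
    }
}
```

The returned list can be handed to new ProcessBuilder(command).inheritIO().start() to run convert. Note that relocate only rewrites the matched part of the path; backslashes in the rest of a Windows path are left as they are.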

Result from the face data extraction

The source is available on GitHub: https://github.com/skisoo/PicasaDBReader

It works on Windows and Linux.

25 thoughts on “How to Read Picasa 3.9 Database and extract faces data”

  1. It is simpler if you enable ‘Store name tags in photo’ in Picasa Preferences under ‘Name tags’ tab. Then use exiftool to print out tags and look for tags named ‘Region’.

  2. Is there a procedure to download the entire content of the
    property file of Picasa images: tags, names, dates, coordinates, etc.?

    • The little program I wrote can extract the data from the pmp files. Names, dates and tags are included; coordinates I haven’t checked, but they should be there as well. The result is a set of CSV files that you can open with Microsoft Excel or LibreOffice Calc.

  3. Hi!
    Thanks a lot for this!
    Unfortunately it doesn’t really solve my Problem. What i want to achieve is: Extract all the Album Information. (My wife creates a lot of Albums, linking only the really nice Pictures together) I would then like to create Slideshow-Files for my Twonky-Media Server.
    However, the key is to get the information of the Albums out of Picasa. It seems to me, that the Album-Information is stored in the albums_0.db.
    Have you also done an extract for the albums_0.db?
    Would be great! Thanks in advance!
    zap

    • Zap,

      you can read the file “.picasa.ini” in the folders that hold the pictures of an album. Those files contain all the data you need… The one thing you must do is merge the data from the different photo folders (the different .picasa.ini files).

      If at least one picture in a folder belongs to an album, the .picasa.ini in that folder contains an .album section

      example:

      [.album:cb1c79e3ed1108367efc8f034fe2386d]
      name=Album name
      token=cb1c79e3ed1108367efc8f034fe2386d
      date=2012-04-30T17:49:15+02:00

      if picture is in album (check album token) then you can find:

      [DSC01936.jpg]
      albums=cb1c79e3ed1108367efc8f034fe2386d

      I hope it helps. :)

      bye

      • Thanks for pointing me to picasa.ini. It can be so easily parsed.

        I regret Picasa doesn’t retrieve album information from these files :(

  4. I am not a java developer and do not know how to compile the sources correctly. Would you add instructions to GitHub and/or this blog entry about how to compile the utility?

    Thank you for putting together such a useful utility. I can’t wait to start using it.

    Kevin Hall

    • Hi Kevin,
      I have added compilation instructions in the Readme in the git repository on GitHub. Another option is to create a new Java project in Eclipse, import the source and add the 2 libraries used. Let me know if you manage to run the program successfully.

      • I finally got it to compile. What threw me for a loop was that on my system, I had to use a semi-colon to separate classpath directories whereas your instructions use colons.

        Now, when I run ‘Faces’, I get a runtime exception:

        C:\picasadbreader>java -classpath ".;bin/;commons-cli-1.2.jar;commons-io-2.4.jar" PicasaFaces -output ./OutputFolder
        nb entries: 74008
        Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 73985, Size: 73985
        at java.util.ArrayList.rangeCheck(Unknown Source)
        at java.util.ArrayList.get(Unknown Source)
        at PicasaFaces.gatherImages(PicasaFaces.java:142)
        at PicasaFaces.main(PicasaFaces.java:117)

        I have a pretty big database. Might that be a problem?

          • I don’t think the size matters much; my own database is 1.7GB (for 110GB of photos).

            Maybe Picasa updated their software and slightly changed the database structure, or maybe you use Picasa differently than I do (for example I don’t use Albums, but I don’t know if that has an impact on my program)…

            Anyway, in the file PicasaFaces.java at line 135, try replacing "i<nb" with "i<nb-1" or "i<nb-2"; that will prevent the program from reading the last lines. Let me know if that works for you.

        • I just tried PMPDB, and that worked like a charm. I’m currently looking to see what data is saved there.

          Thanks for everything!

      • Thank you for your help. I did get things to work eventually. Skipping the last 1 or 2 entries still didn’t work. However, skipping 100 entries did. I whittled it down to something occurring in the 18th-to-last entry.

        Anyway, the solution I ended up going with was to wrap the following line:

        personsId.get(db.imagedata.get("personalbumid").get(i));

        in a try block. In the new corresponding catch block (that catches all exceptions), I just continued the for loop with a ‘continue’ statement.

        I also tried a return statement. It didn’t affect my results. There were only 5 indices that threw exceptions in the last 18 indices, but apparently none of those last non-throwing entries contained any data.

        With this, I was able to retrieve the 20000+ known faces in my library. Thank you again for this tool!

        If you have any questions about what I did, please let me know.

        Kevin

        (Sorry, I couldn’t reply to your last message. Perhaps the reply nesting was too deep?)

  5. Great src. Thanks for posting it and thanks for all of the help. I just have one issue that I can’t seem to tackle. I need to identify which photos are starred. The purpose of this is to use Google Chromecast to present a “starred photos only” slideshow on a homemade digital picture frame. Any ideas? Thanks again.

    • Hi Jason,
      I’m sorry, I haven’t really looked into this topic for a few months. I don’t know if the database shows the starred pictures, but if I remember correctly, the database shows the tags. So you can tag all your starred pictures with a special tag and extract them.

  6. Hi !
    Thx for all the detail. It helped me a lot!
    You did awesome work!

    I’m trying to write my own software to extract all the pictures of a person. I’ve succeeded in reading the pmp files and thumbindex.db.
    But I have a problem: I don’t understand the link between the imagedata files and thumbindex.
    You say that line x in the thumbindex.db file corresponds to the same image as line x in the table imagedata. But I have 163757 entries in thumbindex and only 32513 in imagedata!
    How can I make a link between them? I’m disappointed…

    I’ve already looked at the link that you gave, and I did not find any kind of answer… Can you explain it to me?

    Thx !

    • Hum.. Picasa might have changed the content of their files since I wrote the program, or maybe you use the software differently. I do not use Albums so I don’t really know how the albums are stored; maybe they are polluting your database.

    • I’m a piece of porridge…
      It’s because each pmp file has its own number of entries, and I had used only the number of entries of the first pmp file…
      After the correction, I have the same number of lines.
      But I don’t understand why, in this case with a 1-1 cardinality, they made two files instead of one with all the data…

      Thx for all.

  7. Hi!

    All starred files are listed in the “starlist.txt” file.

    Alas, you can neither add nor remove the star just by editing this file.

    Many thanks for the source code!
