Retrieving information from Picasa is not easy: the software is quite limited and offers hardly any data export function.
What I would like to do: extract the raw face recognition information from the Picasa database. I need the person names, the image filenames and the rectangle associated with each face.
On Windows 7, the Picasa database is located in C:\Users\USERNAME\AppData\Local\Google\Picasa2. In this folder, we mainly find pmp and db files. The pmp files store tabular data: each PMP file contains one column of a table in the database, and the file is named table_column.pmp. The Picasa database contains 3 tables:
albumdata, which contains information on the albums (folders and face album)
catdata, categories data (almost empty on my computer)
imagedata, images data (includes rectangles and references albums)
How to read PMP files?
PMP files are binary files in little-endian format. The header is described by the following table:
Size | Description |
---|---|
4 bytes | magic constant: 0x3fcccccd |
2 bytes | field type (unsigned short) |
2 bytes | constant: 0x1332 |
4 bytes | constant: 0x00000002 |
2 bytes | field type (unsigned short) |
2 bytes | constant: 0x1332 |
4 bytes | number of entries (unsigned int) |
The field-type values are:
Value | Description |
---|---|
0x0 | null-terminated strings |
0x1 | Unsigned int, 4 bytes |
0x2 | Dates, Microsoft Variant Time format, 8 bytes |
0x3 | byte field, 1 byte |
0x4 | unsigned long, 8 bytes |
0x5 | unsigned short, 2 bytes |
0x6 | null-terminated strings |
0x7 | unsigned int, 4 bytes |
See http://sbktech.blogspot.fr/2011/12/picasa-pmp-format.html for more information on pmp files.
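As a rough illustration of the layout above, here is a minimal Java sketch that checks the 20-byte header and reads a column of 4-byte unsigned integers (field types 0x1 and 0x7). The class and method names (`PmpColumn`, `readUnsignedInts`) are invented for this example and are not taken from PicasaDBReader; only the byte layout comes from the two tables.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of PicasaDBReader): one PMP column held in memory. */
final class PmpColumn {
    static final int MAGIC = 0x3fcccccd;
    static final int HEADER_SIZE = 20;

    final int fieldType;
    final long entryCount;
    private final ByteBuffer body;

    PmpColumn(byte[] fileContent) {
        ByteBuffer buf = ByteBuffer.wrap(fileContent).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.remaining() < HEADER_SIZE || buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a PMP file");
        }
        int type1 = buf.getShort() & 0xFFFF;   // field type
        int const1 = buf.getShort() & 0xFFFF;  // expected 0x1332
        int const2 = buf.getInt();             // expected 0x00000002
        int type2 = buf.getShort() & 0xFFFF;   // field type, repeated
        int const3 = buf.getShort() & 0xFFFF;  // expected 0x1332
        if (const1 != 0x1332 || const2 != 2 || const3 != 0x1332 || type1 != type2) {
            throw new IllegalArgumentException("unexpected PMP header");
        }
        fieldType = type1;
        entryCount = buf.getInt() & 0xFFFFFFFFL; // unsigned int
        // slice() resets the byte order, so it has to be set again.
        body = buf.slice().order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Reads the whole column when it holds 4-byte unsigned integers. */
    long[] readUnsignedInts() {
        if (fieldType != 0x1 && fieldType != 0x7) {
            throw new IllegalStateException("column is not a 4-byte integer column");
        }
        ByteBuffer b = body.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        long[] values = new long[(int) entryCount];
        for (int i = 0; i < values.length; i++) {
            values[i] = b.getInt() & 0xFFFFFFFFL;
        }
        return values;
    }
}
```

The bytes of a table_column.pmp file can be loaded with `java.nio.file.Files.readAllBytes` and passed to the constructor. The other field types would need their own readers (strings up to the null byte, 8-byte values for types 0x2 and 0x4, and so on).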
The interesting values for faces are facerect (rectangle coordinates) and personalbumid (album reference) in the table imagedata, and the values token (album reference) and name (person name) in the table albumdata.
Example (keeping only 4 columns):
The rectangle is stored in the rectangle64 format: a 64-bit number that splits into four 16-bit numbers. Once divided by 2^16-1 (65535, the maximum value), these four numbers are the relative coordinates of the top-left corner and the bottom-right corner. The absolute values are obtained by multiplying them by the width and the height of the picture.
Example:
Step | Value 1 | Value 2 | Value 3 | Value 4 |
---|---|---|---|---|
Original number (64 bits) | 0x67873bec9e1e933d | | | |
Break into four 16-bit numbers | 0x6787 | 0x3bec | 0x9e1e | 0x933d |
Convert to decimal | 26503 | 15340 | 40478 | 37693 |
Divide by 2^16-1 (65535) | 0.4044 | 0.2341 | 0.6176 | 0.5751 |
Multiply by the width (3264) or the height (2448) | x1=1319 | y1=573 | x2=2016 | y2=1407 |
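The conversion can be sketched in a few lines of Java. The class name `Rect64` is made up for this example, and truncating the results toward zero is an assumption chosen because it reproduces the numbers of the worked example.

```java
/** Hypothetical helper: decodes a Picasa rectangle64 value into pixel coordinates. */
final class Rect64 {
    /** Returns {x1, y1, x2, y2} in pixels for an image of the given size. */
    static int[] toPixels(long rect64, int width, int height) {
        int[] parts = new int[4];
        for (int i = 0; i < 4; i++) {
            // Most significant 16 bits first: left, top, right, bottom.
            parts[i] = (int) ((rect64 >>> (48 - 16 * i)) & 0xFFFF);
        }
        // Integer arithmetic on long values; the division truncates.
        return new int[] {
            (int) ((long) parts[0] * width / 65535),
            (int) ((long) parts[1] * height / 65535),
            (int) ((long) parts[2] * width / 65535),
            (int) ((long) parts[3] * height / 65535)
        };
    }
}
```

For instance, `Rect64.toPixels(0x67873bec9e1e933dL, 3264, 2448)` gives the four values of the example: 1319, 573, 2016 and 1407.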
With this information, we know that image x has a rectangle corresponding to a specific person, but we don't yet know the file name of the picture.
This information is held in the file thumbindex.db, which contains the whole list of folders and files indexed in the Picasa database. Line x in thumbindex.db corresponds to the same image as line x in the table imagedata.
How to read thumbindex.db?
Header:
Size | Description |
---|---|
4 bytes | magic constant: 0x40466666 |
4 bytes | number of entries (unsigned int) |
And each line follows this schema:
Size | Description |
---|---|
variable (up to the null character) | null-terminated string (folder path or file name) |
26 bytes | useless content |
4 bytes | index |
A line is either a folder, with its complete path and the special index value 4294967295 (0xFFFFFFFF), or an image, with its filename and an index value pointing to the parent folder.
In this example the image 266 is in the folder 5.
See http://projects.mindtunnel.com/picasa3meta/docs/picasa3meta.thumbindex.ThumbIndex-class.html for more information on the thumbindex.db file.
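Here is a minimal Java sketch of a reader for this layout. It is not the code of PicasaDBReader: the class name `ThumbIndex` and its members are invented for the example, and it assumes that the integers are little-endian (as in the PMP files), that the names are UTF-8 encoded, and that folder paths end with a path separator.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for thumbindex.db, following the two tables above. */
final class ThumbIndex {
    static final int MAGIC = 0x40466666;
    static final long FOLDER = 4294967295L; // index value that marks a folder

    final List<String> names = new ArrayList<>();
    final List<Long> parents = new ArrayList<>();

    ThumbIndex(byte[] content) {
        // Assumption: little-endian integers, as in the PMP files.
        ByteBuffer buf = ByteBuffer.wrap(content).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a thumbindex.db file");
        }
        long count = buf.getInt() & 0xFFFFFFFFL;
        for (long i = 0; i < count; i++) {
            int start = buf.position();
            while (buf.get() != 0) {
                // advance to the null terminator
            }
            int length = buf.position() - 1 - start;
            // Assumption: names are UTF-8 encoded.
            names.add(new String(content, start, length, StandardCharsets.UTF_8));
            buf.position(buf.position() + 26);       // skip the 26 unused bytes
            parents.add(buf.getInt() & 0xFFFFFFFFL); // parent index, or FOLDER
        }
    }

    boolean isFolder(int i) {
        return parents.get(i) == FOLDER;
    }

    /** Folder path + file name for an image entry; the path itself for a folder. */
    String fullPath(int i) {
        long parent = parents.get(i);
        return parent == FOLDER ? names.get(i) : names.get((int) parent) + names.get(i);
    }
}
```

With the file loaded through `java.nio.file.Files.readAllBytes`, `fullPath(x)` gives the location of the picture described by line x of the imagedata table.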
If we merge the table imagedata with the data from thumbindex.db, then for a specific image we have the filename and the face rectangle, but no album reference to associate the face with a person!
Picasa actually adds a virtual image to store this information. In the previous example, the virtual picture 268 (which has no filename) is linked to the image 266 and contains the face information for one person (there is one virtual image per person). The rectangle stored for image 266 itself encloses all the face rectangles present in the image (when only one face has been identified, it is the same rectangle as that person's). So we just need to read the album reference of image 268 and associate it with the image file 266.
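The association step can be sketched as follows. This is only an illustration under stated assumptions, not the code of PicasaFaces: the class `FaceJoin` and all parameter names are hypothetical, the columns are taken as already decoded into lists with one element per line, and the index field of a virtual entry in thumbindex.db is assumed to point at the real image it belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical join of thumbindex.db, imagedata and albumdata into face records. */
final class FaceJoin {
    /**
     * names/parents: one element per thumbindex.db line.
     * personAlbumIds/faceRects: the imagedata columns personalbumid and facerect.
     * personByToken: albumdata token -> albumdata name.
     * Returns {file name, person name, rectangle} triples.
     */
    static List<String[]> faces(List<String> names, List<Long> parents,
                                List<String> personAlbumIds, List<String> faceRects,
                                Map<String, String> personByToken) {
        List<String[]> result = new ArrayList<>();
        int rows = Math.min(names.size(),
                Math.min(personAlbumIds.size(), faceRects.size()));
        for (int i = 0; i < rows; i++) {
            if (!names.get(i).isEmpty()) {
                continue; // a real file or folder, not a virtual face entry
            }
            String person = personByToken.get(personAlbumIds.get(i));
            long parent = parents.get(i); // assumed to reference the real image
            if (person == null || parent < 0 || parent >= names.size()) {
                continue; // unknown album or dangling reference
            }
            result.add(new String[] {names.get((int) parent), person, faceRects.get(i)});
        }
        return result;
    }
}
```

Skipping rows with an unknown album or an out-of-range reference, instead of failing, is a design choice made here so that one inconsistent line does not abort the whole export.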
I have written a program that parses all this information and stores it in csv files: one file per pmp table and one file for the faces. If imagemagick is installed (and the convert application is in the path), the program can also create thumbshots of all the faces.
How to use the program to parse the Picasa database?
There are in fact 2 programs: PMPDB, which converts the pmp tables into csv files, and PicasaFaces, which creates a human-readable csv with all the face information and the face thumbshots.
Usage:
java -classpath ".:bin/:commons-cli-1.2.jar" PMPDB -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder
java -classpath ".:bin/:commons-cli-1.2.jar:commons-io-2.4.jar" PicasaFaces -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder -replaceRegex C: -replacement /media/HardDrive -convert /path/to/convert(.exe)
If the command line contains the argument -convert, then imagemagick will create all the face thumbshots (in the output folder, with one folder per person). A string replacement can be applied to the original image paths if the pictures are no longer at the location recorded in the database (in the example, “C:” is replaced by “/media/HardDrive”).
The source is available on GitHub: https://github.com/skisoo/PicasaDBReader
It works on Windows and Linux.
It is simpler if you enable ‘Store name tags in photo’ in Picasa Preferences under ‘Name tags’ tab. Then use exiftool to print out tags and look for tags named ‘Region’.
Is there a procedure to download the entire content of the property file of Picasa images: tags, names, dates, coordinates, etc.?
The little program I wrote can extract the data from the pmp files. Names, dates and tags are included. I haven't checked the coordinates, but they should be there as well. The result is a set of CSV files that you can open with Microsoft Excel or LibreOffice Calc.
Hi!
Thanks a lot for this!
Unfortunately it doesn’t really solve my problem. What I want to achieve is to extract all the album information. (My wife creates a lot of albums, linking only the really nice pictures together.) I would then like to create slideshow files for my Twonky media server.
However, the key is to get the album information out of Picasa. It seems to me that the album information is stored in albums_0.db.
Have you also done an extract for the albums_0.db?
Would be great! Thanks in advance!
zap
Unfortunately I don’t have any solution for your problem, I don’t really use albums myself.
Zap,
you can read the “.picasa.ini” file in the folders where the album's pictures are stored. All the data that you need is in those files… One thing you must do is merge the data from the different photo folders (different .picasa.ini files).
If at least one file from a picture folder is in some album, then the .picasa.ini in that folder contains an .album section.
Example:
[.album:cb1c79e3ed1108367efc8f034fe2386d]
name=Album name
token=cb1c79e3ed1108367efc8f034fe2386d
date=2012-04-30T17:49:15+02:00
If a picture is in an album (check the album token), then you can find:
[DSC01936.jpg]
albums=cb1c79e3ed1108367efc8f034fe2386d
I hope it helps. 🙂
bye
Thanks for pointing me to picasa.ini. It can be so easily parsed.
I regret Picasa doesn’t retrieve album information from these files 🙁
Hi, but you can run into trouble when you have done something with your image, for example applied a filter, or when the photo is also marked with a star. My example:
…
[IMG_9324.JPG]
filters=unsharp2=1,1.105263;
backuphash=31820
star=yes
albums=35abd060008019fe2a910e152c2ff463
[IMG_9329.JPG]
….
Then it is not so easily parsed 😉
It isn’t so bad to parse. You can use the system function GetPrivateProfileString() to read ini files. Any .picasa.ini that contains an image linked to an album will have a section that starts with [.album: …]. You can retrieve all sections and look for that prefix. Then you can call it again for each section and get all the keys. If one of the keys is “albums”, you know the section (an image file) is linked to an album.
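For readers who are not on Windows, the same idea can be sketched in plain Java without GetPrivateProfileString(). The class `PicasaIni` is hypothetical, and treating the value of the albums key as a comma-separated list of tokens is an assumption, since the examples in this thread only show a single token.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser that keeps only the album information of a .picasa.ini file. */
final class PicasaIni {
    final Map<String, String> albumNames = new LinkedHashMap<>();        // token -> name
    final Map<String, List<String>> imageAlbums = new LinkedHashMap<>(); // file -> tokens

    PicasaIni(List<String> lines) {
        String section = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line.substring(1, line.length() - 1);
                continue;
            }
            int eq = line.indexOf('=');
            if (section == null || eq < 0) {
                continue; // not a key=value line
            }
            String key = line.substring(0, eq);
            String value = line.substring(eq + 1);
            if (section.startsWith(".album:") && key.equals("name")) {
                albumNames.put(section.substring(".album:".length()), value);
            } else if (!section.startsWith(".") && key.equals("albums")) {
                // Assumption: several tokens would be separated by commas.
                imageAlbums.put(section, Arrays.asList(value.split(",")));
            }
        }
    }
}
```

The lines can be read with `java.nio.file.Files.readAllLines`. Other keys such as filters, backuphash or star are simply ignored, so sections with extra entries do not get in the way.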
“albums_0.db” contains thumbnails for the album. No useful cross-reference info. To recreate album contents, you need to use a combination of thumbindex.db, the albumdata_ files, and the .picasa.ini files.
I am not a java developer and do not know how to compile the sources correctly. Would you add instructions to GitHub and/or this blog entry about how to compile the utility?
Thank you for putting together such a useful utility. I can’t wait to start using it.
Kevin Hall
Hi Kevin,
I have added compilation instructions in the Readme in the git repository on GitHub. Another option is to create a new Java project in Eclipse, import the source and add the 2 libraries used. Let me know if you manage to run the program successfully.
I finally got it to compile. What threw me for a loop was that on my system, I had to use a semi-colon to separate classpath directories whereas your instructions use colons.
Now, when I run ‘Faces’, I get a runtime exception:
C:\picasadbreader>java -classpath ".;bin/;commons-cli-1.2.jar;commons-io-2.4.jar" PicasaFaces -output ./OutputFolder
nb entries: 74008
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 73985, Size: 73985
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at PicasaFaces.gatherImages(PicasaFaces.java:142)
at PicasaFaces.main(PicasaFaces.java:117)
I have a pretty big database. Might that be a problem?
I meant, I have an awful lot of photos (and thus probably a big database).
I don’t think the size matters much; my own database is 1.7GB (for 110GB of photos).
Maybe Picasa updated their software and slightly changed the database structure, or maybe you use Picasa differently than I do (for example, I don’t use albums, but I don’t know if that has an impact on my program)…
Anyway, in the file PicasaFaces.java at line 135, try to replace “i<nb” by “i<nb-1” or “i<nb-2”; that will prevent the program from reading the last lines. Let me know if that works for you.
I just tried PMPDB, and that worked like a charm. I’m currently looking to see what data is saved there.
Thanks for everything!
Thank you for your help. I did get things to work eventually. Skipping the last 1 or 2 entries still didn’t work. However, skipping 100 entries did. I whittled it down to something occurring in the 18th-to-last entry.
Anyway, the solution I ended up going with was to wrap the following line:
personsId.get(db.imagedata.get("personalbumid").get(i));
in a try block. In the new corresponding catch block (that catches all exceptions), I just continued the for loop with a ‘continue’ statement.
I also tried a return statement. It didn’t affect my results. Only 5 of the last 18 indices threw exceptions, but apparently none of the non-throwing entries among them contained any data.
With this, I was able to retrieve the 20000+ known faces in my library. Thank you again for this tool!
If you have any questions about what I did, please let me know.
Kevin
(Sorry, I couldn’t reply to your last message. Perhaps the reply nesting was too deep?)
Great src. Thanks for posting it and thanks for all of the help. I just have one issue that I can’t seem to tackle. I need to identify which photos are starred. The purpose of this is to use Google Chromecast to present a “starred photos only” slideshow on a homemade digital picture frame. Any ideas? Thanks again.
Hi Jason,
I’m sorry, I haven’t really looked into this topic for a few months. I don’t know if the database shows the starred pictures, but if I remember correctly, the database shows the tags. So you can tag all your starred pictures with a special tag and extract them.
Hi !
Thanks for all the details. It helps me a lot!
You did awesome work!
I’m trying to make my own software to extract all the pictures of a person. I’ve succeeded in reading the pmp files and thumbindex.db.
But I have a problem. I don’t understand the link between the imagedata files and thumbindex.
You say that line x in the thumbindex.db file corresponds to the same image as line x in the table imagedata. But I have 163757 entries in thumbindex and only 32513 in imagedata!
How can I make a link between them? I’m disappointed…
I’ve already looked at the link that you give, and I haven’t found any kind of answer… Can you explain it to me?
Thx!
Hum... Picasa might have changed the content of their files since I wrote the program, or maybe you use the software differently. I do not use albums, so I don’t really know how the albums are stored; maybe they are polluting your database.
I’m a piece of porridge…
It’s because each pmp file has its own number of entries, and I was using only the number of entries of the first pmp file…
After the correction, I have the same number of lines.
But I don’t understand why, in this case with a 1-1 cardinality, they made two files instead of one with all the data…
Thx for all.
You’re welcome 😉
Hi!
All starred files are listed in the “starlist.txt” file.
Alas, you can neither add nor remove the star just by editing this file.
Many thanks for the source code!
Whether a picture is starred is also stored in the .picasa.ini file in the same folder as the image itself, which I suspect is the master copy of the information.
Hi!
Is it possible to get faces from this database without adding names of these people to picasa ?
I don’t know.
Hi Dan,
Did you find out how to retrieve face suggestions without adding names to picasa?
Thanks,
Saumya
You have to use Picasa in order to be able to extract the data. Else you can use other software just for face recognition.
I stumbled upon your project and it did what I needed: recover face recognition data from raw images with face recognition within Picasa.
To manipulate the data, I however needed to choose a different CSV field separator – in my case the pipe symbol “|” did the trick better than the semicolon “;”. First recommendation is to allow selecting an output field separator for the CSV (even the horizontal tab ‘\t’ would work).
It would be great if your software could also generate / update XMP sidecar files from the Picasa database.
Hi skisoo,
I used your code and it worked perfectly. I have one more requirement and I don’t know how to meet it: I also want to read the face suggestions for an image from Picasa. Is this possible?
Thanks in advance!
Thanks a lot for your work. I modified the tool in order to blur faces from pictures using Picasa.
https://github.com/AddisMap/PicasaFaceBlur
Thanks for your work.
I have one question. Is there a way to identify the photo (including the region) that Picasa uses as the thumbnail for a person’s album?
I just want to export this thumbnail, not all the identified pictures.
Thanks.
Alex
Hi,
You talk about finding out the association of images to filenames/paths and the faces that they contain.
I am also interested in the association between images and the albums that they are contained in.
Do you have any information in this regard?
Thanks,
Unfortunately I don’t have any solution for your problem, I don’t really use albums myself.
Hi.
I still have a problem with virtual albums.
They aren’t in thumbindex.db. The pictures and their physical albums are in there, but not the link to the virtual albums.
Where can I find them?
There are 4 types of “thumbs” (bigthumbs, previews, thumbs and thumbs2).
They all work the same way: each one consists of 2 files (_0.db and _index.db).
_0.db contains the images and _index.db describes how to retrieve the images from the corresponding _0.db.
The _index.db file:
4 bytes: magic constant
4 bytes: not used
After that there are 3 series of data:
First section (for Picasa internal use)
4 bytes: number of entries in the first section
each entry is 4 bytes long.
Second section (the start position of the image in the corresponding _0.db)
4 bytes: number of entries in the second section (same as the first section)
each entry is 4 bytes long.
Third section (length of the image)
4 bytes: number of entries in the third section (same as the first section)
each entry is 4 bytes long.
With the second and third sections you can reconstruct all the ‘thumbs’:
for each index, seek to the position in the ‘_0.db’ (second series), read the corresponding number of bytes (third series)
and convert them to an image.
You can read the corresponding index in “thumbindex.db”.
If the “path” section is not empty, the thumb is the image itself.
Otherwise it’s a person: read the “reference” to retrieve the image in which the person appears.
After that, read “imagedata_personalbumid” to find the index of the person and look in “albumdata_name” to retrieve the person’s name.
The “index” part of Albums and FaceTemplatesV2 is similar to “thumbs”, but so far I have not been able to decode their _0.db.
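The _index.db layout described in this comment could be read along these lines. This is an untested sketch of the description, not code from PicasaDBReader: the class `ThumbStore` is hypothetical, the integers are assumed to be little-endian, and the format of the extracted bytes is not verified.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical reader for a thumbs "_index.db" file and its matching "_0.db". */
final class ThumbStore {
    final long[] offsets; // second section: start of each image in _0.db
    final long[] lengths; // third section: length of each image in bytes

    ThumbStore(byte[] indexContent) {
        // Assumption: little-endian integers, as in the PMP files.
        ByteBuffer buf = ByteBuffer.wrap(indexContent).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(8);     // skip the magic constant and the unused field
        readSection(buf);    // first section: Picasa internal use, ignored
        offsets = readSection(buf);
        lengths = readSection(buf);
    }

    /** Reads one section: a 4-byte entry count followed by 4-byte entries. */
    private static long[] readSection(ByteBuffer buf) {
        int count = buf.getInt();
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getInt() & 0xFFFFFFFFL;
        }
        return values;
    }

    /** Copies the raw bytes of thumbnail i out of the _0.db content. */
    byte[] extract(byte[] dataContent, int i) {
        int from = (int) offsets[i];
        return Arrays.copyOfRange(dataContent, from, from + (int) lengths[i]);
    }
}
```

A real _0.db file can be large, so seeking with `java.io.RandomAccessFile` is probably preferable to loading the whole file in memory. The extracted bytes can then be written to disk or passed to `javax.imageio.ImageIO.read`, assuming they are in a standard image format.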
Do you know how Picasa distinguishes between “Ignored people” and “Unnamed people”?
Hello,
Thanks for this useful tool!
I’m not a Java developer, and I would like to know whether there is a tool that can generate a CSV export of the thumbindex.db file.
Regards
Thanks for the tool. With the current Picasa DB, your routine for building the image path does not work. In case somebody is interested, add this getPath method to the class indexes:
public String getPath(int index) {
    // 4294967295 (0xFFFFFFFF) marks a folder entry, which has no parent
    final long folderIndex = 4294967295L;
    // unbox to a primitive so that == compares values, not Long references
    long parentIndex = originalIndexes.get(index);
    if (parentIndex == folderIndex)
        return "";
    // prepend the parent folder path to the file name
    return names.get((int) parentIndex) + names.get(index);
}
In PicasaFaces.gatherImages, change line 137 to:
//String path = /*db.indexes.names.get(new Long(db.indexes.indexes.get(i)).intValue()) + */db.indexes.names.get(i);
String path = db.indexes.getPath(i);
Also wrap the body of the for loop in a try/catch to handle the IndexOutOfBoundsException.
Hello,
Nice work!
How do I get all the thumbs as .jpg files? excluding face thumbs. Your tool only extracts faces thumbs, I want exact opposite – need to extract all the image thumbs.
Thanks!
I’m 10+ years behind on this, sorry! I noticed that the lat and long were coming through as date strings “12/30/1899 12:00:00 AM” and realized that both files were presenting type=2, which calls for reading an 8-byte number. However, PicasaDBReader takes the further step of converting it to a date by subtracting 25569 to shift it to the Unix epoch, and then returns it as a string.
I’m not sure there’s a generic way to fix this, but I’m going to basically ask it to return a double when the file is lat or long, and allow it to continue to return a date for the tagdate file.
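A possible shape for that workaround, as a sketch only: the class `VariantTime`, the method names and the column names "latitude" and "longitude" are placeholders invented here, not identifiers taken from PicasaDBReader, and interpreting the date as UTC is an assumption because Variant Time carries no time zone.

```java
import java.time.Instant;

/** Hypothetical decoding of a type-2 (8-byte double) PMP value. */
final class VariantTime {
    /** Days between 1899-12-30 (Variant Time zero) and 1970-01-01. */
    static final double DAYS_TO_UNIX_EPOCH = 25569.0;

    /** Converts a Variant Time (days, with a fractional part) to an Instant, read as UTC. */
    static Instant toInstant(double variantTime) {
        long millis = Math.round((variantTime - DAYS_TO_UNIX_EPOCH) * 86_400_000.0);
        return Instant.ofEpochMilli(millis);
    }

    /** Returns the raw double for coordinate columns and a date for every other column. */
    static Object decode(String columnName, double raw) {
        // Placeholder column names: adjust to the real file names.
        if (columnName.equals("latitude") || columnName.equals("longitude")) {
            return Double.valueOf(raw);
        }
        return toInstant(raw);
    }
}
```

Deciding by column name is admittedly not generic, since the type field alone cannot tell a date from any other 8-byte floating-point value.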