Techniques

Several techniques are used to extract all data and metadata hidden in digital images. They are briefly described in this chapter.

MIME information

Multipurpose Internet Mail Extensions (MIME) is a standard for describing the content type of a file. The MIME type of an image is detected from the magic number stored inside it. Magic numbers are constant byte sequences, usually at the start of a file, that identify the format of the data that follows; they are a form of in-band signaling to any program that reads the file at run time. Many file formats begin with such constants, so checking for them is a simple and effective way to distinguish between formats, and it can yield further run-time information. The image MIME type is detected so that you know what kind of image you are dealing with, in both short form (for example, image/jpeg) and extended, human-readable form.
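As an illustration of magic-number detection, the sketch below checks the leading bytes of a file against a small hand-written signature table. The table, the function names `detect_mime` and `detect_mime_file`, and the description strings are assumptions for illustration only; a real tool would more likely rely on libmagic, whose database is far larger and whose descriptions differ from these simplified ones.

```python
from typing import List, Tuple

# Each entry: (offset, magic bytes) checks that must all match,
# the short MIME type, and a simplified human-readable description (hypothetical wording).
SIGNATURES: List[Tuple[List[Tuple[int, bytes]], str, str]] = [
    ([(0, b"\xff\xd8\xff")], "image/jpeg", "JPEG image data"),
    ([(0, b"\x89PNG\r\n\x1a\n")], "image/png", "PNG image data"),
    ([(0, b"GIF87a")], "image/gif", "GIF image data, version 87a"),
    ([(0, b"GIF89a")], "image/gif", "GIF image data, version 89a"),
    ([(0, b"II*\x00")], "image/tiff", "TIFF image data, little-endian"),
    ([(0, b"MM\x00*")], "image/tiff", "TIFF image data, big-endian"),
    ([(0, b"RIFF"), (8, b"WEBP")], "image/webp", "WebP image data"),
    ([(0, b"BM")], "image/bmp", "PC bitmap"),
]


def detect_mime(header: bytes) -> Tuple[str, str]:
    """Return (short MIME type, extended description) for the leading bytes of a file."""
    for checks, mime, description in SIGNATURES:
        if all(header[off:off + len(magic)] == magic for off, magic in checks):
            return mime, description
    # Nothing matched: fall back to a generic binary type.
    return "application/octet-stream", "data"


def detect_mime_file(path: str) -> Tuple[str, str]:
    """Read only the first 16 bytes, which is enough for every signature above."""
    with open(path, "rb") as fh:
        return detect_mime(fh.read(16))
```

Only the header is read, which keeps detection cheap even for very large images; the trade-off is that formats identified by structures deeper in the file are not recognized.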

Metadata information

Metadata may be written into a digital photo file to identify who owns it, copyright and contact information, which camera created the file, exposure information, and descriptive information such as keywords that make the file searchable on a computer or the Internet. Some metadata is written by the camera, and some is added by the photographer or by software after the file is downloaded to a computer. Metadata is divided into several categories depending on the standard it comes from. The following categories are extracted and analyzed:

EXIF metadata extraction

Standard Exif tags

Canon MakerNote tags

Fujifilm MakerNote tags

Minolta MakerNote tags

Nikon MakerNote tags

Olympus MakerNote tags

Panasonic MakerNote tags

Pentax MakerNote tags

Samsung MakerNote tags

Sigma/Foveon MakerNote tags

Sony MakerNote tags

IPTC metadata extraction

IPTC datasets

XMP metadata extraction

Dublin Core schema (dc)

XMP Basic schema (xmp)

XMP Rights Management schema (xmpRights)

XMP Media Management schema (xmpMM)

XMP Basic Job Ticket schema (xmpBJ)

XMP Paged-Text schema (xmpTPg)

XMP Dynamic Media schema (xmpDM)

Adobe PDF schema (pdf)

Photoshop schema (photoshop)

Camera Raw schema (crs)

Exif schema for TIFF Properties (tiff)

Exif schema for Exif-specific Properties (exif)

Exif schema for Additional Exif Properties (aux)

IPTC Core schema (Iptc4xmpCore)

IPTC Extension schema (Iptc4xmpExt)

PLUS License Data Format schema (plus)

digiKam Photo Management schema (digiKam)

KDE Image Program Interface schema (kipi)

Microsoft Photo schema (MicrosoftPhoto)

iView Media Pro schema (mediapro)

Microsoft Expression Media schema (expressionmedia)

Microsoft Photo 1.2 schema (MP)

Microsoft Photo RegionInfo schema (MPRI)

Microsoft Photo Region schema (MPReg)

Metadata Working Group Regions schema (mwg-rs)
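To show how the XMP schemas listed above can be recognized, the sketch below locates an XMP packet in raw file bytes, parses it with the standard library XML parser, and maps the namespace URIs it uses to the usual schema prefixes. It assumes the packet is wrapped in an `x:xmpmeta` element (some files carry a bare `rdf:RDF`), and the function name `xmp_schemas` is hypothetical; only a subset of the schemas is mapped, and unknown namespaces are returned as raw URIs.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI -> conventional prefix for a subset of the schemas listed above.
KNOWN_SCHEMAS = {
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://ns.adobe.com/xap/1.0/": "xmp",
    "http://ns.adobe.com/xap/1.0/rights/": "xmpRights",
    "http://ns.adobe.com/xap/1.0/mm/": "xmpMM",
    "http://ns.adobe.com/pdf/1.3/": "pdf",
    "http://ns.adobe.com/photoshop/1.0/": "photoshop",
    "http://ns.adobe.com/camera-raw-settings/1.0/": "crs",
    "http://ns.adobe.com/tiff/1.0/": "tiff",
    "http://ns.adobe.com/exif/1.0/": "exif",
    "http://ns.adobe.com/exif/1.0/aux/": "aux",
    "http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/": "Iptc4xmpCore",
}

# Structural namespaces that wrap the metadata but are not schemas themselves.
CONTAINER_NAMESPACES = {
    "adobe:ns:meta/",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/XML/1998/namespace",
}


def xmp_schemas(data: bytes) -> List[str]:
    """Return the sorted schema prefixes (or raw URIs) used by the XMP packet in data."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return []
    end = data.find(b"</x:xmpmeta>", start)
    if end == -1:
        return []
    root = ET.fromstring(data[start:end + len(b"</x:xmpmeta>")])
    uris = set()
    # Properties may appear as child elements or as attributes of rdf:Description.
    for element in root.iter():
        for name in [element.tag, *element.attrib]:
            if name.startswith("{"):
                uris.add(name[1:name.index("}")])
    return sorted(KNOWN_SCHEMAS.get(uri, uri) for uri in uris - CONTAINER_NAMESPACES)
```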

Preview thumbnail extraction

Most digital cameras and phones write a preview image, called a thumbnail, into the image metadata. Thumbnails and their related data are extracted from the image metadata and stored for review.
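For JPEG files, the EXIF thumbnail is usually referenced from the second image file directory (IFD1), whose tags 0x0201 (JPEGInterchangeFormat) and 0x0202 (JPEGInterchangeFormatLength) give the offset and length of an embedded JPEG. The sketch below follows that layout with the standard library only; it is an illustration under that assumption, not how any particular tool does it, and it ignores previews stored elsewhere (for example inside MakerNotes). The function names are hypothetical.

```python
import struct
from typing import Optional


def find_exif_tiff(jpeg: bytes) -> Optional[bytes]:
    """Return the TIFF structure stored in the JPEG's APP1 "Exif" segment, if any."""
    if not jpeg.startswith(b"\xff\xd8"):  # SOI marker
        return None
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, no more header segments
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])  # includes the 2 length bytes
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        pos += 2 + length
    return None


def extract_exif_thumbnail(jpeg: bytes) -> Optional[bytes]:
    """Return the JPEG thumbnail referenced by IFD1, or None if absent or malformed."""
    tiff = find_exif_tiff(jpeg)
    if tiff is None or len(tiff) < 8:
        return None
    endian = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # Intel / Motorola byte order
    if endian is None:
        return None
    try:
        (ifd0,) = struct.unpack(endian + "I", tiff[4:8])
        # IFD layout: 2-byte entry count, 12 bytes per entry, 4-byte offset of the next IFD.
        (count,) = struct.unpack(endian + "H", tiff[ifd0:ifd0 + 2])
        next_ptr = ifd0 + 2 + 12 * count
        (ifd1,) = struct.unpack(endian + "I", tiff[next_ptr:next_ptr + 4])
        if ifd1 == 0:
            return None  # no IFD1, hence no thumbnail
        (count,) = struct.unpack(endian + "H", tiff[ifd1:ifd1 + 2])
        offset = length = None
        for i in range(count):
            start = ifd1 + 2 + 12 * i
            tag, _type, _count, value = struct.unpack(endian + "HHII", tiff[start:start + 12])
            if tag == 0x0201:  # JPEGInterchangeFormat: thumbnail offset (LONG)
                offset = value
            elif tag == 0x0202:  # JPEGInterchangeFormatLength: thumbnail size (LONG)
                length = value
    except struct.error:
        return None  # truncated or corrupt EXIF data
    if offset is None or length is None:
        return None
    thumbnail = tiff[offset:offset + length]
    # Offsets are relative to the start of the TIFF header; sanity-check the result.
    return thumbnail if thumbnail.startswith(b"\xff\xd8") else None
```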

Preview thumbnail consistency

When a photo is edited with software that does not update the embedded preview, the main image is modified but the thumbnail is not. A simple comparison between the main image and its thumbnail can therefore reveal that the image was edited.
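One way to make that comparison concrete is to shrink the main image to the thumbnail's size and measure the mean absolute pixel difference. The sketch below assumes both images have already been decoded to grayscale pixel grids (lists of rows of 0 to 255 values, for example with an imaging library); the block-averaging downscale, the function names, and the threshold of 10 are hypothetical choices, not calibrated values.

```python
from typing import List, Tuple

Grid = List[List[float]]


def downscale(pixels: Grid, out_w: int, out_h: int) -> Grid:
    """Shrink a grayscale grid by averaging the block of source pixels behind each output pixel."""
    in_h, in_w = len(pixels), len(pixels[0])
    out: Grid = []
    for ty in range(out_h):
        y0 = ty * in_h // out_h
        y1 = max(y0 + 1, (ty + 1) * in_h // out_h)
        row = []
        for tx in range(out_w):
            x0 = tx * in_w // out_w
            x1 = max(x0 + 1, (tx + 1) * in_w // out_w)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def thumbnail_mismatch(original: Grid, thumbnail: Grid, threshold: float = 10.0) -> Tuple[float, bool]:
    """Return (mean absolute difference, whether it exceeds the hypothetical threshold)."""
    th_h, th_w = len(thumbnail), len(thumbnail[0])
    scaled = downscale(original, th_w, th_h)
    total = sum(abs(a - b) for row_a, row_b in zip(scaled, thumbnail) for a, b in zip(row_a, row_b))
    mean_diff = total / (th_w * th_h)
    return mean_diff, mean_diff > threshold
```

A low score does not prove the image is untouched: software that regenerates the thumbnail on save leaves the two consistent. Cameras that pad thumbnails to a fixed aspect ratio would also need cropping before comparison, which this sketch does not handle.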

GPS Localization

Image metadata sometimes embeds a geotag: GPS data giving the latitude and longitude of where the photo was taken. Geotagging happens when a device such as an iPhone, an Android smartphone, or a digital camera stores its location, such as GPS coordinates, within a photo. A geotagged photograph is a photograph associated with a geographical location in this way. Geotags help people find location-specific information; for example, entering latitude and longitude coordinates into a suitable image search engine returns images taken near that location. The geotag is read from the image metadata, and the position where the photo was taken is displayed on a map.
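EXIF stores each coordinate as three rational numbers (degrees, minutes, seconds) plus a reference letter (GPSLatitudeRef N/S, GPSLongitudeRef E/W). Before plotting, these are converted to signed decimal degrees as decimal = D + M/60 + S/3600, negated for S and W. The sketch below performs that conversion with exact fractions; representing rationals as `(numerator, denominator)` tuples is an assumption about how the values were read, and the OpenStreetMap marker link is just one possible way to show the result on a map.

```python
from fractions import Fraction
from typing import Sequence, Tuple, Union

Rational = Union[Tuple[int, int], int, float]


def _to_fraction(value: Rational) -> Fraction:
    """Accept an EXIF rational as a (numerator, denominator) tuple or a plain number."""
    return Fraction(*value) if isinstance(value, tuple) else Fraction(value)


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus an N/S/E/W reference to signed decimal degrees."""
    degrees, minutes, seconds = (_to_fraction(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    if ref.strip().upper() in ("S", "W"):
        value = -value
    return float(value)


def osm_link(lat: float, lon: float, zoom: int = 16) -> str:
    """Build an OpenStreetMap URL that drops a marker at the given position."""
    return f"https://www.openstreetmap.org/?mlat={lat:.6f}&mlon={lon:.6f}#map={zoom}/{lat:.6f}/{lon:.6f}"
```

For example, `dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")` gives about 40.446111. Using `Fraction` avoids rounding error until the final `float` conversion.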

ELA (Error Level Analysis)

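Error Level Analysis (ELA) is a technique that aims to detect whether an image has been edited. It is intended for compressed images, mainly lossy JPEG files (a PNG is compressed losslessly, so ELA on it is informative only if the picture was previously saved as JPEG). The main idea is that an image in its original form has a uniform level of compression error across the whole picture. The analyzed image is resaved at a known quality and the difference between the two versions is computed for each pixel; regions whose error level differs markedly from the rest of the image are likely to have been edited. Ghiro calculates these error levels and detects differences between them.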
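The resave-and-compare step can be sketched with the Pillow imaging library, which is assumed to be installed. The function name `error_level_analysis` and the default quality of 95 are illustrative assumptions; the sketch returns a brightness-stretched difference image for visual inspection and the raw maximum difference, and leaves the judgement of which regions stand out to the analyst.

```python
import io
from typing import IO, Tuple, Union

from PIL import Image, ImageChops, ImageEnhance


def error_level_analysis(source: Union[str, IO[bytes]], quality: int = 95) -> Tuple[Image.Image, int]:
    """Resave the image as JPEG and return (amplified difference image, maximum pixel difference)."""
    original = Image.open(source).convert("RGB")

    # Recompress in memory at a known quality.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")

    # Per-pixel absolute difference between the original and the resaved copy.
    diff = ImageChops.difference(original, resaved)

    # getextrema() returns one (min, max) pair per band for an RGB image.
    max_diff = max(band_max for _, band_max in diff.getextrema())

    # Stretch the (usually faint) differences so they span the full 0-255 range.
    scale = 255.0 / max_diff if max_diff else 1.0
    ela_image = ImageEnhance.Brightness(diff).enhance(scale)
    return ela_image, max_diff
```

In the resulting image, areas that were pasted in or retouched after the last save often appear noticeably brighter or darker than their surroundings, though high-contrast edges and textures can also light up, so the output needs interpretation.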

Hash digest generation

The most common hashes are calculated for the image to create a unique identifier for it. The calculated hashes are:

CRC32

MD5

SHA1

SHA224

SHA256

SHA384

SHA512
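All of the digests listed above can be computed in a single pass over the file with Python's `hashlib` and `zlib` modules, as in the sketch below. The function name `image_digests` and the 64 KiB chunk size are illustrative choices; CRC32 is included for completeness but, unlike the SHA family, it is a checksum and offers no protection against deliberate collisions.

```python
import hashlib
import zlib
from typing import Dict

HASH_NAMES = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def image_digests(path: str, chunk_size: int = 65536) -> Dict[str, str]:
    """Compute CRC32 and the hashlib digests above in one streaming read of the file."""
    hashers = {name: hashlib.new(name) for name in HASH_NAMES}
    crc = 0
    with open(path, "rb") as fh:
        # Read in chunks so large images never have to fit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            for hasher in hashers.values():
                hasher.update(chunk)
    digests = {"crc32": format(crc & 0xFFFFFFFF, "08x")}
    digests.update({name: hasher.hexdigest() for name, hasher in hashers.items()})
    return digests
```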

Hash list matching

Suppose you are searching for an image and have only its hash. You can provide a list of hashes, and every analyzed image that matches one of them is reported.
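A minimal version of hash list matching loads the wanted hashes into a set, normalized to lowercase, and checks each image's digests against it, so every lookup takes constant time regardless of list size. The one-hash-per-line list format with `#` comments, the choice of MD5, SHA1 and SHA256, and the function names are assumptions made for this sketch.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set, Tuple

ALGORITHMS = ("md5", "sha1", "sha256")


def load_hash_list(text: str) -> Set[str]:
    """Parse one hex digest per line, ignoring blank lines and '#' comments."""
    hashes = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hashes.add(line.lower())
    return hashes


def match_hash_list(paths: Iterable[str], wanted: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (path, algorithm, digest) for every image whose digest is in the wanted set."""
    matches = []
    for path in paths:
        data = Path(path).read_bytes()
        for algo in ALGORITHMS:
            digest = hashlib.new(algo, data).hexdigest()
            if digest in wanted:
                matches.append((str(path), algo, digest))
    return matches
```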

Strings extraction

All text strings contained in the analyzed image are extracted, as the Unix strings tool does. The most interesting ones (for example, URLs) are highlighted.
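The behaviour of `strings` can be approximated with a regular expression that finds runs of printable ASCII characters (space through tilde) of a minimum length, four by default as in `strings`; a second pattern then picks out URLs. The function names and the URL pattern are illustrative assumptions, and unlike some `strings` implementations this sketch ignores UTF-16 text and tab characters.

```python
import re
from typing import List

# Rough URL matcher: http(s):// followed by characters that are not whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(data: bytes, min_length: int = 4) -> List[str]:
    """Return runs of at least min_length printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def highlight_interesting(strings: List[str]) -> List[str]:
    """Keep only the strings that contain a URL."""
    return [s for s in strings if URL_PATTERN.search(s)]
```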

Signature engine

Signatures provide evidence about the most critical data, highlighting focal points and common exposures. The signature engine checks each image against over 120 signatures.
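Conceptually, a signature is a named rule evaluated against the results of the analyses above. The sketch below models a signature as a predicate over a flat analysis report; the report keys (such as `"gps"`, `"thumbnail_mismatch"` and the exiv2-style `"Exif.Image.Software"`), the severity labels and the three example rules are hypothetical and are not taken from Ghiro's actual signature set.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Report = Dict[str, Any]


@dataclass
class Signature:
    name: str
    severity: str  # hypothetical labels, e.g. "medium" or "high"
    description: str
    check: Callable[[Report], bool]


EDITORS = ("Photoshop", "GIMP")  # illustrative list of editing tools

SIGNATURES: List[Signature] = [
    Signature("gps_present", "medium", "Image contains GPS coordinates",
              lambda r: "gps" in r),
    Signature("editing_software", "high", "Software tag names an image editor",
              lambda r: any(tool in str(r.get("Exif.Image.Software", "")) for tool in EDITORS)),
    Signature("thumbnail_mismatch", "high", "Thumbnail differs from the main image",
              lambda r: bool(r.get("thumbnail_mismatch", False))),
]


def run_signatures(report: Report, signatures: List[Signature] = SIGNATURES) -> List[Signature]:
    """Return the signatures whose rule matches the report, in definition order."""
    return [sig for sig in signatures if sig.check(report)]
```

Keeping each rule as a small predicate means new signatures can be added without touching the engine loop; the cost is that rules depend on the report using consistent key names.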