OSINT from the metadata of images hosted on websites

TL;DR: Images hosted on websites contain numerous metadata fields, depending on their file type (JPG, PNG…). These fields can include information useful for reconnaissance, such as names, telephone numbers, email addresses or URLs. Website editors often do not strip this metadata from the images they publish, which makes information leaks possible.

Introduction

Imagine you are part of a red team and your task is to get inside a company’s perimeter. First, you want to gather intelligence. That company has a website presenting its activity, services, team, etc. The team page happens to have a photo of every member with an alternative text.

You may then wonder:

  • Who was hired when?
  • What is the email scheme of the company? Is it john.doe@company.com? Or jdoe@company.com?
  • Is there a way to get more information on the team members than their names and faces?

Now imagine these pictures were supplied by each team member when they were hired and put on the website “as-is”. One could retrieve them all and have a look at the metadata contained in the image files.

The results could give insight into each member, ranging from the approximate date they were hired to their full name, what software they used, their email address or phone number.

Metadata of interest

Several image file formats (e.g. JPG, PNG…) include fields of interest for an attacker doing reconnaissance, including:

  • First and last names: for targeting people via phishing or social engineering, building a dictionary, or simply compiling a business directory
  • Email addresses: for phishing
  • Phone numbers: for social engineering
  • Postal addresses: for intelligence or identity theft
  • Locations: for intelligence
  • Software and platforms: for targeting exploits in later phases
  • Dates and times: for intelligence

Additional fields may also contain either different information or context-specific data:

  • Comments
  • Descriptions
  • Copyrights

Exiftool

According to Exiftool’s website:

ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. ExifTool supports many different metadata formats including EXIF, GPS, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, Lyrics3, as well as the maker notes of many digital cameras by Canon, Casio, DJI, FLIR, FujiFilm, GE, GoPro, HP, JVC/Victor, Kodak, Leaf, Minolta/Konica-Minolta, Motorola, Nikon, Nintendo, Olympus/Epson, Panasonic/Leica, Pentax/Asahi, Phase One, Reconyx, Ricoh, Samsung, Sanyo, Sigma/Foveon and Sony.

It is the de facto standard tool for extracting such information from image files. Its field naming conventions are used in the rest of this article.
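As a minimal sketch of how these group-prefixed field names can be obtained programmatically, the snippet below calls the exiftool command-line tool with `-j` (JSON output) and `-G` (prefix each tag with its group) and parses the result. It assumes `exiftool` is installed and on the PATH; the helper names `build_command`, `parse_output` and `read_metadata` are hypothetical and chosen for illustration only.

```python
import json
import subprocess


def build_command(paths):
    """Build an exiftool command line producing JSON with group-prefixed tags."""
    # -j: JSON output, -G: prefix every tag with its group (e.g. "EXIF:Artist")
    return ["exiftool", "-j", "-G", *paths]


def parse_output(raw_json):
    """Turn exiftool's JSON array into a {file path: {tag: value}} mapping."""
    if not raw_json.strip():
        return {}
    result = {}
    for entry in json.loads(raw_json):
        entry = dict(entry)
        # exiftool stores the file path under the "SourceFile" key
        source = entry.pop("SourceFile")
        result[source] = entry
    return result


def read_metadata(paths):
    """Run exiftool on the given files (assumes exiftool is on the PATH)."""
    completed = subprocess.run(
        build_command(paths), capture_output=True, text=True, check=False
    )
    return parse_output(completed.stdout)
```

Calling `read_metadata(["photo.jpg"])` (a hypothetical file name) would return a dictionary keyed by file path, whose values map names such as `EXIF:Artist` to their content.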

Names

First and last names are usually stored either in dedicated fields such as:

XMP:Author
XMP:Creator
EXIF:Artist
XMP:Source
IPTC:Source
IPTC:Writer-Editor
IPTC:By-line

Or in copyright fields:

EXIF:Copyright
IPTC:CopyrightNotice
IPTC:Credit
XMP:Credit

Example from an image taken from the website of a French newspaper, Le Monde (www.lemonde.fr):

[Image: names]

This “Cyril” is likely a photographer for the newspaper.

Emails

Emails have dedicated fields:

XMP:CreatorWorkEmail

They can also appear in the same copyright fields as names. Going back to our newspaper image:

[Image: email]

We get Cyril’s email address, but the useful information could just as well have been the email convention (firstname.lastname@gmail.com) or the email domain name in use.
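Since addresses can hide in copyright or comment fields as well as in the dedicated one, a simple approach is to scan every field value for email-like strings. The sketch below assumes the metadata is available as a dictionary mapping field names to values; the regular expression is deliberately simple rather than a full address validator, and the function names are hypothetical.

```python
import re
from collections import Counter

# Simplified pattern: good enough for spotting addresses, not for validating them
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(metadata):
    """Return the set of email-like strings found in any field value."""
    found = set()
    for value in metadata.values():
        # Multi-valued fields may come back as lists; handle both shapes
        items = value if isinstance(value, list) else [value]
        for item in items:
            found.update(m.lower() for m in EMAIL_RE.findall(str(item)))
    return found


def domain_counts(emails):
    """Count how often each domain appears, to spot the dominant one."""
    return Counter(email.rsplit("@", 1)[1] for email in emails)
```

Running `domain_counts` over the addresses gathered from a whole site gives a quick view of which email domain the organization actually uses.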

Phone numbers

Phone number fields exist and are occasionally filled in, but most images do not include them:

XMP:CreatorWorkTelephone

Example:

[Image: phone]

Postal addresses

At least two postal addresses are of interest in image metadata:

  • The creator address
XMP:CreatorAddress
XMP:CreatorCity
XMP:CreatorCountry
XMP:CreatorPostalCode
XMP:CreatorRegion
  • The address where the picture was taken
IPTC:City
IPTC:Province-State
IPTC:Country-PrimaryLocationName
IPTC:Country-PrimaryLocationCode
IPTC:Sub-location

Example:

[Image: address]

Cyril likely took the picture in Paris, France.

Locations

Metadata can include GPS coordinates in addition to postal addresses:

EXIF:GPSLatitude
EXIF:GPSLongitude
EXIF:GPSAltitude
EXIF:GPSLatitudeRef
EXIF:GPSLongitudeRef
EXIF:GPSAltitudeRef

Unfortunately, Cyril’s photo does not have location information.
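When these fields are present, the latitude and longitude have to be combined with their reference fields to obtain usable coordinates. The sketch below assumes the values were read in ExifTool's numeric mode (the `-n` option), in which case the EXIF latitude and longitude are expected to be unsigned decimal degrees and the reference fields single letters (N, S, E, W). The function name `signed_coordinates` is hypothetical.

```python
def signed_coordinates(metadata):
    """Return (latitude, longitude) in signed decimal degrees, or None."""
    try:
        lat = float(metadata["EXIF:GPSLatitude"])
        lon = float(metadata["EXIF:GPSLongitude"])
    except (KeyError, TypeError, ValueError):
        # Missing fields or non-numeric values (e.g. not read with -n)
        return None
    # By convention, south and west are negative
    if str(metadata.get("EXIF:GPSLatitudeRef", "N")).upper().startswith("S"):
        lat = -abs(lat)
    if str(metadata.get("EXIF:GPSLongitudeRef", "E")).upper().startswith("W"):
        lon = -abs(lon)
    return lat, lon
```

The resulting pair can be pasted directly into most mapping services.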

Software and platforms

This is the information most commonly found in images, as it is filled in automatically by the software that produced or edited them:

XMP:HistorySoftwareAgent
XMP:Platform
XMP:CreatorTool
EXIF:Software
IPTC:OriginatingProgram

It contains useful information when looking for exploits against the machines that were used to upload the image files:

[Image: software]

Cyril is using Photoshop version 10.4 on a Macintosh.

Dates

An image carries multiple dates and timestamps; which one is relevant depends on what information you are trying to obtain. The creation date may not interest you, but the modification date could, for instance.

File:FileModifyDate
File:FileAccessDate
File:FileInodeChangeDate
ICC_Profile:ProfileDateTime
PNG:Datecreate
PNG:Datemodify
PNG:ModifyDate
XMP:MetadataDate
XMP:ModifyDate
XMP:CreateDate
IPTC:DateSent
IPTC:TimeSent
IPTC:DateCreated
IPTC:TimeCreated

Example:

The article in which Cyril’s photo serves as an illustration is pretty recent.
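To compare these timestamps, they first need to be parsed. The sketch below assumes the colon-separated layout ExifTool typically prints (`YYYY:MM:DD HH:MM:SS`, optionally followed by a UTC offset, or a date alone) and returns the earliest and latest values found in any field whose name contains "date". Values with sub-seconds or other layouts are simply skipped, and the function names are hypothetical.

```python
from datetime import datetime

# Layouts tried in order: with UTC offset, without, and date only
FORMATS = ("%Y:%m:%d %H:%M:%S%z", "%Y:%m:%d %H:%M:%S", "%Y:%m:%d")


def parse_exif_date(value):
    """Parse an ExifTool-style date string; return None if it does not match."""
    text = str(value).strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def date_range(metadata):
    """Return (earliest, latest) over all fields whose name mentions a date."""
    dates = []
    for key, value in metadata.items():
        if "date" not in key.lower():
            continue
        parsed = parse_exif_date(value)
        if parsed is not None:
            # Drop the UTC offset so all values are comparable (an approximation)
            dates.append(parsed.replace(tzinfo=None))
    return (min(dates), max(dates)) if dates else None
```

On a team page, the earliest date of each portrait can hint at when the corresponding person joined, keeping in mind that file system dates reflect the download rather than the original file.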

Others

Several other fields may contain information belonging to the categories above, or extra information:

EXIF:ImageDescription
XMP:Description
EXIF:UserComment
IPTC:ObjectName
IPTC:Keywords
IPTC:Headline
IPTC:Caption-Abstract
IPTC:SubjectReference
IPTC:SupplementalCategories
IPTC:SpecialInstructions

For example, in our newspaper image, they describe the scene and name the people involved in it:

[Image: misc]

Here the picture was taken at a governmental location (Matignon) and several politicians were present.

Note: I am growing the above field list as I test websites; there may be fields of interest that are not in it yet.
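One way to keep such a growing list manageable is to store it as data and filter each image's metadata against it. The sketch below is an illustration only: `FIELDS_OF_INTEREST` holds a subset of the fields listed in this article, grouped under hypothetical category names, and `select_fields` is a hypothetical helper that expects a dictionary mapping field names to values.

```python
# Subset of the fields listed in this article, grouped by category
FIELDS_OF_INTEREST = {
    "names": ["XMP:Author", "XMP:Creator", "EXIF:Artist", "IPTC:By-line"],
    "copyright": ["EXIF:Copyright", "IPTC:CopyrightNotice",
                  "IPTC:Credit", "XMP:Credit"],
    "emails": ["XMP:CreatorWorkEmail"],
    "phones": ["XMP:CreatorWorkTelephone"],
    "software": ["XMP:CreatorTool", "EXIF:Software",
                 "IPTC:OriginatingProgram"],
    "misc": ["EXIF:ImageDescription", "XMP:Description",
             "EXIF:UserComment", "IPTC:Keywords"],
}


def select_fields(metadata, categories=FIELDS_OF_INTEREST):
    """Keep only the fields of interest, grouped by category."""
    report = {}
    for category, fields in categories.items():
        # Ignore fields that are absent or empty
        hits = {f: metadata[f] for f in fields
                if metadata.get(f) not in (None, "")}
        if hits:
            report[category] = hits
    return report
```

Adding a newly discovered field then only requires a new entry in the dictionary, not a change to the filtering logic.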

Automation

Getting all images from a web page and analyzing them is a tedious task that can be automated. I made a small script to extract and analyze images from a website, which can be found here: https://github.com/jeffbencteux/webimginfo

It relies heavily on open-source libraries (beautifulsoup4 and PyExifTool) and basically consists of gluing them together in a loop. It takes a URL, tries to get all the <img> tags and associated images from it, and parses them using exiftool. It then displays only the fields of interest described previously.
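The first step of that loop, collecting image URLs from a page, can be sketched as follows. This is not the code of the webimginfo script: to stay dependency-free, it swaps beautifulsoup4 for the standard library's `html.parser`, and the names `ImgCollector` and `image_urls` are hypothetical. Downloading the images and passing them to exiftool is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Skip missing sources and inline data: URIs, which are not files
        if src and not src.startswith("data:"):
            url = urljoin(self.base_url, src)
            if url not in self.urls:
                self.urls.append(url)


def image_urls(html, base_url):
    """Return the de-duplicated, absolute image URLs referenced by the HTML."""
    collector = ImgCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Relative paths are resolved against the page URL, so the returned list can be handed directly to an HTTP client.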

Enjoy and strip your images.

References

Thanks to Cyril for supplying good examples if he ever reads these lines.