Enumerating Metadata: Part3 odf files

anestisb — Thu, 12 May 2011 15:44:04 +0000

In the third part of the Enumerating Metadata series, we will talk about the Open Document Format (ODF), which is supported by popular document software suites (OpenOffice, LibreOffice, Microsoft Office 2007 and more). ODF is a family of XML-based file formats used to represent electronic documents (spreadsheets, presentations, word processing documents etc). A standard ODF file is a ZIP compressed archive containing the appropriate files and directories. The document metadata is stored in a separate XML file named meta.xml. The metadata contained in the file can comprise pre-defined metadata, user-defined metadata, as well as custom metadata (like ODF version, Title, Description and more).

The most common filename extensions used for OpenDocument documents are:

.odt for word processing (text) documents
.ods for spreadsheets
.odp for presentations
.odb for databases
.odg for graphics
.odf for formulae, mathematical equations

A packaged ODF file will contain, at a minimum, six files and two directories archived into a modified ZIP file. The structure of the basic package is as follows:

|-- META-INF
|   `-- manifest.xml
|-- Thumbnails
|   `-- thumbnail.png
|-- content.xml
|-- meta.xml
|-- mimetype
|-- settings.xml
`-- styles.xml

Important! If you encrypt your document with a protection password, the meta.xml file is not encrypted and is readable by anyone, without knowing the document password. So be careful: password protection does not solve the metadata problem.
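
Because the package is an ordinary ZIP archive, the metadata can be inspected from the command line without opening the document. A minimal sketch, assuming the unzip tool is installed; show_odf_meta and report.odt are illustrative names, not part of any existing tool:

```shell
# Print the meta.xml of an ODF document to standard output.
# unzip -p writes the named archive member to stdout without creating files.
show_odf_meta() {
  unzip -p "$1" meta.xml
}

# Example (report.odt is a placeholder file name):
# show_odf_meta report.odt
```

The output is raw XML, so fields such as the author or the generating software appear as plain text inside their tags.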

We can see that ODF metadata types contain a large amount of usable information that profiles editors and their software tools. An attacker can gather this kind of information and use it as a starting point for his exploitation attacks. So it is important for document users to control the information leakage that stems from hidden metadata.

Document software suites, such as OpenOffice and LibreOffice, offer editing options (usually under File->Properties) for the metadata types. You can use this feature to edit or clean the desired fields. The problem is that this method works on one file at a time, so if you have a large collection of ODF files to handle, you obviously need an automated tool or script. Because ODF files are zip containers, the solution is pretty easy. You can delete or update the meta.xml of many documents at once using the zip tool and its delete/update options. Be careful if you delete the meta.xml from a document: the next time the document is saved by the relevant software, the XML is recreated with the software’s predefined values for the metadata fields.

I usually do not want any metadata leakage from my documents, so I delete the meta.xml file from the document container. I wrote a simple bash script which deletes all meta.xml files from ODF documents under a user-specified directory:

#!/bin/bash
 
#============================================================#
# Author     : Anestis Bechtsoudis                           #
# Date       : 12 May 2011                                   #
# Description: Bash script that removes metadata (meta.xml)  #
# from ODF (Open Document Format) files used by OpenOffice   #
#============================================================#
 
if [ $# -ne 1 ] ; then
  echo "Usage: $0 [dir]"
  echo -e "\t[dir]: Directory containing ODF Files"
  exit 1
fi
 
#===============================================#
# Open Document format supports:                #
#    .odt for word processing (text) documents  #
#    .ods for spreadsheets                      #
#    .odp for presentations                     #
#    .odb for databases                         #
#    .odg for graphics                          #
#    .odf for formulae, mathematical equations  #
#                                               #
# Remove unwanted filetypes                     #
#===============================================#
# Anchored to the end of the path so only real extensions match
FILETYPES='\.(odt|ods|odp|odb|odg|odf)$'
 
# Temp file for search results (mktemp avoids a clash when $0 contains a path)
TMPFILE=$(mktemp /tmp/odf-meta.XXXXXX)
 
find "$1" -type f | grep -E "$FILETYPES" > "$TMPFILE"
 
# IFS= and -r keep spaces and backslashes in file names intact
while IFS= read -r line
do
  zip -d "$line" meta.xml
done < "$TMPFILE"
 
rm "$TMPFILE"

If you do not want to completely remove the meta.xml files, you can write a basic meta.xml template and alter the above script in order to update (instead of delete) the meta.xml of all ODF documents. The update can be done using the -f argument of the zip tool.
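
The update variant could look roughly like the sketch below. It is an illustration under a few assumptions rather than a tested replacement for the delete script: update_odf_meta is a hypothetical function name, the first argument is a directory that holds a template file named exactly meta.xml, and the template must have a newer modification time than the copy stored in each document (run touch on it first), because zip -f only freshens entries that are older than the file on disk.

```shell
# Replace meta.xml in every ODF document under a directory with a template.
# $1: directory containing the template meta.xml (hypothetical layout)
# $2: directory to search for ODF documents
update_odf_meta() {
  template_dir=$(cd "$1" && pwd) || return 1
  search_dir=$(cd "$2" && pwd) || return 1
  find "$search_dir" -type f \( -name '*.odt' -o -name '*.ods' -o -name '*.odp' \
    -o -name '*.odb' -o -name '*.odg' -o -name '*.odf' \) |
  while IFS= read -r doc
  do
    # zip matches entries by the path given on the command line, so run it
    # from the template directory to address the entry as plain "meta.xml".
    # -f replaces an entry only if it already exists in the archive.
    (cd "$template_dir" && zip -f "$doc" meta.xml) < /dev/null
  done
}

# Example (both paths are placeholders):
# update_odf_meta ~/odf-template ~/documents
```

Both directories are resolved to absolute paths first, so the document paths stay valid after the change into the template directory.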

The above approach can be adopted under a Windows OS by writing the relevant batch files.

A. Bechtsoudis

Enumerating Metadata: Part2 pdf files

anestisb — Tue, 03 May 2011 00:26:57 +0000

In my article Gathering & Analyzing Metadata Information I emphasized the security risk of hidden metadata in publicly shared documents and how this information can be gathered in bulk with certain tools. So I began writing a series of articles to analyze the different types of file metadata and the tools one can use to view and edit or remove them. In the first part, I analyzed the case of exif jpeg metadata, and in this article I will continue with the famous Portable Document Format (PDF) file, presenting the appropriate tools to handle the metadata information.

We all use PDF files for professional or personal document sharing. PDF metadata is usually populated by PDF converting applications and might expose undesirable information to third parties. Especially since the adoption of XMP in PDF metadata (XMP metadata streams were introduced with PDF 1.4), there has been an increase in the available hidden information fields. Adobe Acrobat Pro offers an extended editor for the metadata fields, but Adobe Reader and many other editors and converters do not. Some of the metadata information fields are:

AdHocReviewCycleID
Appligent
Author
AuthorEmail
AuthorEmailDisplayName
Company
CreationDate
Creator
EmailSubject
Keywords
ModDate
PreviousAdHocReviewCycleID
Producer
PTEX.Fullbanner
SourceModified
Subject
Title

There are a lot of tools that can extract, edit or remove PDF metadata information, but I prefer to use open source tools. So I will analyze the use of the PDF Toolkit (pdftk) under a Linux environment. pdftk does not require Acrobat and can run under Windows, Linux, Mac OS X, FreeBSD and Solaris systems. The PDF Toolkit has many features, but in this article I will cover only the ones that we need for metadata manipulation.

Initially you will have to install pdftk using your distribution’s package manager or by compiling the sources.

In order to extract metadata information from a pdf file you can use the dump_data option as follows:

$pdftk file.pdf dump_data
InfoKey: Creator
InfoValue: PScript5.dll Version 5.2.2
InfoKey: Title
InfoValue: Microsoft Word - Ergastiriaki_Askisi_2011.doc
InfoKey: Author
InfoValue: Administrator
InfoKey: Producer
InfoValue: GPL Ghostscript 8.15
InfoKey: ModDate
InfoValue: D:20110406122119
InfoKey: CreationDate
InfoValue: D:20110406122119
PdfID0: bb8f9ac70cc66e8cabecb4144806f
PdfID1: bb8f9ac70cc66e8cabecb4144806f
NumberOfPages: 3

In order to edit metadata fields you have to extract metadata into a file, edit the desired values in the file and then update the pdf by importing the edited metadata file.

To extract the metadata to a file use the output option:

$pdftk file.pdf dump_data output pdf-metadata

Using your preferred text editor, you can edit the pdf-metadata InfoValues (I prefer to leave every field blank). Then you can update the initial file using the edited metadata file.

$pdftk file.pdf update_info pdf-metadata output no-metadata.pdf
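
The manual editing step can also be scripted. The following is a small sketch, not part of pdftk itself: blank_pdf_info is a hypothetical helper that empties every InfoValue line of a dump_data listing read from standard input and leaves all other lines untouched.

```shell
# Blank every InfoValue line of a pdftk dump_data listing read from stdin.
blank_pdf_info() {
  sed 's/^InfoValue:.*$/InfoValue:/'
}

# Example (file names follow the commands above):
# pdftk file.pdf dump_data | blank_pdf_info > pdf-metadata
```

The resulting file can then be passed to update_info exactly like a hand-edited one.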

In order to automate the above steps, I wrote a simple script that works on a whole directory containing pdf files.

#!/bin/bash
 
if [ $# -ne 2 ] ; then
        echo "Usage: $0 [search_dir] [metafile]"
        echo -e "\t[search_dir]"
        echo -e "\t\tDirectory with pdf files"
        echo -e "\t[metafile]"
        echo -e "\t\tFile containing desired metadata"
        exit 1
fi
 
PDFTK="/usr/bin/pdftk"
SOURCEDIR="$1"
METAFILE="$2"
# Private working directory instead of a fixed, shared /tmp/temp.pdf
WORKDIR=$(mktemp -d)
PDFTMPFILE="$WORKDIR/temp.pdf"
 
# Read one path per line so file names containing spaces stay intact
find "$SOURCEDIR" -type f -name "*.pdf" | while IFS= read -r i; do
  # pdftk reads from the temporary copy so the original path can be the output
  cp "$i" "$PDFTMPFILE"
  "$PDFTK" "$PDFTMPFILE" update_info "$METAFILE" output "$i"
  rm "$PDFTMPFILE"
done
 
rmdir "$WORKDIR"

And here is a clean metadata file that you can use:

InfoKey: Author
InfoValue:
InfoKey: Company
InfoValue:
InfoKey: CreationDate
InfoValue:
InfoKey: Creator
InfoValue:
InfoKey: ModDate
InfoValue:
InfoKey: Producer
InfoValue:
InfoKey: SourceModified
InfoValue:
InfoKey: Title
InfoValue:

A. Bechtsoudis

Gathering & Analyzing Metadata Information

anestisb — Sun, 01 May 2011 23:14:55 +0000

Any organization or individual who sends or receives files (documents, spreadsheets, images etc) electronically needs to be aware of the dangers of hidden metadata. Metadata information includes user names, path and system information (like directories on your hard drive or network share), software versions, and more. This data can be used for a brute-force attack, social engineering, or finding pockets of critical data once inside a compromised network. Thwarting an attacker’s attempts to exploit the metadata easily found on your company’s or personal website, in digital documents, and in search-engine caches is hard, if not nearly impossible.

Mass metadata information gathering can be accomplished pretty easily using search engines and their caching features. In this article I will present the use of MetaGoofil & FOCA, two free metadata information gathering & analyzing tools. Using these kinds of gathering & analysis tools, an attacker can collect large amounts of crucial information about a possible target organization or individual. On the other side, an IT Security Administrator can use these tools to locate the organization's metadata leakage and prevent it or reduce it to a safe level.

FOCA

FOCA (Fingerprinting an Organization with Collected Archives), developed by Informatica64, is one of the most popular pen-testing tools for automated gathering and extraction of file metadata information. FOCA supports all the common document extensions (doc, docx, ppt, pptx, pdf, xls, xlsx, ppsx, etc). FOCA runs on Windows and a free version is available for download. There is also a commercial version available.

FOCA is a pretty powerful tool with a lot of different options, although in this article I want to show how someone would use its basic feature set to search a domain for documents containing metadata. In order to do this you will first need to download and install FOCA and create a new project from the File menu. This project will need to be centered on a particular target domain. Once the project is created FOCA will use a list of search engines to search the domain for particular file types known to contain usable metadata.

Here are some screenshots of FOCA in action under a Windows 7 machine.

MetaGoofil

Metagoofil is an information gathering tool that can extract metadata from public documents (pdf, doc, xls, ppt, odp, ods) available on targeted websites. It can download all the public documents published on the target website and create an html report page which includes all the extracted metadata. The end of the report lists all the potential usernames and disclosed paths recorded in the gathered metadata. Using the list of potential usernames, an attacker can prepare a brute-force attack on running services (ftp, ssh, pop3, vpn etc), and using the disclosed paths he can make guesses about the OS, network names, shared resources etc.

Metagoofil uses the Google search engine to find documents that are published on the target website. For example, site:example.com filetype:pdf. After locating the file URLs, it downloads the files to a local directory and extracts the hidden metadata using libextractor. Metagoofil is written in Python and can be run on any OS that fulfills the libextractor dependency. Depending on your OS, you must edit the running script and provide the correct path of the extract binary.
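
To make the query pattern concrete, here is a small sketch that prints one search query per file type for a domain. It only illustrates the query format quoted above and is not taken from the Metagoofil source; build_queries is a hypothetical name.

```shell
# Print one search query per file type for a target domain.
# $1: domain, remaining arguments: file extensions
build_queries() {
  domain=$1
  shift
  for ext in "$@"
  do
    printf 'site:%s filetype:%s\n' "$domain" "$ext"
  done
}

# Example:
# build_queries example.com pdf doc xls
```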

You can download metagoofil from the official site, although Google has changed the format of its search queries and the 1.4b version needs some alterations. For more information take a look at the unofficial fix.

Let’s see Metagoofil in action under a Linux OS.

It is pretty obvious that the metadata gathering & extraction is easily accomplished. Recognizing the high security risk of hidden metadata leakage, I began writing a series of articles about metadata information included in different file types. I recently published the first part about exif jpeg metadata and I will continue with details and tools for others too.

DISCLAIMER: I’m not responsible for what you do with this info. This information is for educational purposes only.

A. Bechtsoudis

Enumerating Metadata: Part1 jpeg files

anestisb — Sat, 30 Apr 2011 18:26:46 +0000

During the information gathering and reconnaissance phases, potential intruders spend a great deal of time learning everything they can about their targets before they launch an attack. The gathered information is often crucial for finding a weakness in a system or network and in the users participating in them. Hackers can gather useful information by examining the metadata (data about data) of files that are used by the victim user, system or network. In an attempt to enumerate all the possible metadata leakage cases, I will write a series of articles covering different filetypes & file groups. In the first part I examine exif jpeg metadata and how to handle it.

When you take a picture with your cell phone, digital camera or other related devices, a lot more information than just the picture is stored in the file. Depending on the device you use, the stored metadata may include:

Time and date picture was taken
Camera make and model
Integral low-res Exif thumbnail
Shutter speed
Camera F-stop number
Flash used (yes/no)
Distance camera was focused at
Focal length and calculated 35 mm equivalent focal length
Image resolution
GPS info (if device is not configured properly)
IPTC header
XMP data

Spying users and hackers may take advantage of this kind of information in order to create a starting point for their malicious activities. For example, using the GPS info and the type of camera you have, the attacker may create a targeted promotional malicious email in order to compromise your system. Or third parties can gather information about you (vacation places & times, place of residence etc) through the images you have published on social networks & blogs.

There are a lot of tools capable of reading, altering & removing jpeg metadata information. In this article I will use a cross-platform tool named jhead and run it under a Linux environment.

Download the jhead source code (or binary) from the official site and compile it using the makefile (skip compiling if you download a binary). Now let's see an example output:

$jhead /tmp/IMG_0247.JPG
File name    : /tmp/IMG_0247.JPG
File size    : 866859 bytes
File date    : 2008:03:10 22:48:36
Camera make  : Canon
Camera model : Canon PowerShot A460
Date/Time    : 2008:03:10 16:53:09
Resolution   : 2048 x 1536
Flash used   : No (auto)
Focal length :  5.4mm  (35mm equivalent: 39mm)
CCD width    : 4.93mm
Exposure time: 0.013 s  (1/80)
Aperture     : f/2.8
ISO equiv.   : 80
Whitebalance : Auto
Metering Mode: pattern

You can see that metadata information can be gathered pretty easily. In the worst case scenario you may have a GPS location leakage as seen in the screenshots at the end of the post.

Now let’s see how we can prevent this kind of leakage by removing metadata information. In the simplest scenario we can remove all metadata information using jhead. This can be done with the -purejpg argument:

$jhead -purejpg /tmp/IMG_0247.JPG

Resulting in:

$jhead /tmp/IMG_0247.JPG
File name    : /tmp/IMG_0247.JPG
File size    : 857529 bytes
File date    : 2011:04:30 20:58:25
Resolution   : 2048 x 1536

Jhead offers a lot more options for making more advanced changes to jpeg headers, including comment insertion, date & time changes, thumbnail changes, copying the exif header from another image and more. Run jhead -h to see the available options or read the official documentation. Additionally, jhead can operate on a directory too, making mass changes to all jpeg files in that directory.
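
For images spread over nested directories, the same cleanup can be driven by find. A minimal sketch, assuming jhead is on the PATH and that files ending in .jpg or .jpeg (in any letter case) are the ones to clean; strip_jpeg_dir is a hypothetical function name.

```shell
# Strip all non-essential sections from every JPEG under a directory.
# $1: directory to search recursively
strip_jpeg_dir() {
  find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) |
  while IFS= read -r img
  do
    # -purejpg is the same argument used in the single-file example
    jhead -purejpg "$img"
  done
}

# Example (the path is a placeholder):
# strip_jpeg_dir ~/pictures
```

The files are modified in place, so it is worth running this on a copy of the directory first.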

And some execution screenshots:

To be continued…..

A. Bechtsoudis