Gathering & Analyzing Metadata Information

Any organization or individual who sends or receives files (documents, spreadsheets, images, etc.) electronically needs to be aware of the dangers of hidden metadata. Metadata can include user names, path and system information (such as directories on your hard drive or network share), software versions, and more. This data can be used for a brute-force attack, social engineering, or finding pockets of critical data once inside a compromised network. Thwarting an attacker’s attempts to exploit the metadata easily found on your company’s or personal website, in digital documents, and in search-engine caches is difficult, if not nearly impossible.

Mass metadata gathering can be accomplished fairly easily using search engines and their caching features. In this article I will present MetaGoofil and FOCA, two free tools for gathering and analyzing metadata. Using these kinds of tools, an attacker can gather large amounts of crucial information about a possible target organization or individual. On the other side, an IT security administrator can use the same tools to locate the organization's metadata leakage and prevent it or reduce it to a safe level.

 

FOCA

FOCA (Fingerprinting an Organization with Collected Archives), developed by Informatica64, is one of the most popular pen-testing tools for automated gathering and extraction of file metadata. FOCA supports all the common document extensions (doc, docx, ppt, pptx, pdf, xls, xlsx, ppsx, etc.). It runs on Windows, and a free version is available for download. There is also a commercial version.

FOCA is a pretty powerful tool with a lot of different options, but in this article I only want to show how someone would use its basic feature set to search a domain for documents containing metadata. To do this, first download and install FOCA, then create a new project from the File menu. The project must be centered on a particular target domain. Once the project is created, FOCA uses a list of search engines to search the domain for particular file types known to contain usable metadata.
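To make "usable metadata" concrete, here is a minimal Python sketch that reads the author fields from an Office Open XML file (docx, xlsx, pptx). It is not how FOCA works internally. It only relies on the fact that these files are ZIP packages that normally carry a `docProps/core.xml` part. The function name `read_core_properties` and the returned key names are my own choices for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative selection of fields; core.xml can hold more than these.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
}


def read_core_properties(office_file):
    """Return author-related core properties of a docx/xlsx/pptx file.

    office_file may be a path or a binary file-like object.
    Returns an empty dict if the package has no core-properties part.
    """
    with zipfile.ZipFile(office_file) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}
    root = ET.fromstring(xml_bytes)
    found = {}
    for key, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text
    return found
```

Calling `read_core_properties("report.docx")` on a typical document (the file name is hypothetical) returns a small dictionary with the creator and the last editor, which is exactly the kind of user-name leak discussed above.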

 

Here are some screenshots of FOCA in action on a Windows 7 machine.

 

 

MetaGoofil

Metagoofil is an information gathering tool that extracts metadata from public documents (pdf, doc, xls, ppt, odp, ods) available on targeted websites. It can download all the public documents published on the target website and create an HTML report page that includes all the extracted metadata. The end of the report lists all the potential usernames and disclosed paths found in the gathered metadata. Using the list of potential usernames, an attacker can prepare a brute-force attack on running services (ftp, ssh, pop3, vpn, etc.), and using the disclosed paths can make guesses about the OS, network names, shared resources, etc.
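The analysis step described above can be sketched in a few lines of Python. This is not Metagoofil's code. It assumes the metadata has already been extracted into dictionaries with optional `author` and `path` keys (a hypothetical shape), and the platform labels are simple heuristics of my own.

```python
import re


def guess_platform(path):
    """Guess the originating platform from the shape of a disclosed path."""
    if path.startswith("\\\\"):
        return "windows-network-share"  # UNC path such as \\server\share
    if re.match(r"^[A-Za-z]:[\\/]", path):
        return "windows"                # drive letter such as C:\
    if path.startswith("/"):
        return "unix-like"
    return "unknown"


def summarize_leaks(records):
    """Collect unique usernames and disclosed paths from metadata records.

    records: iterable of dicts with optional "author" and "path" keys.
    Returns (sorted list of usernames, dict mapping path -> platform guess).
    """
    usernames = set()
    paths = {}
    for record in records:
        author = (record.get("author") or "").strip()
        if author:
            usernames.add(author)
        path = (record.get("path") or "").strip()
        if path:
            paths[path] = guess_platform(path)
    return sorted(usernames), paths
```

An administrator can run the same summary over the organization's own published documents to see which account names and internal paths are exposed.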

Metagoofil uses the Google search engine to find documents published on the target website, with queries such as site:example.com filetype:pdf. After locating the file URLs, it downloads the files to a local directory and extracts the hidden metadata using libextractor. Metagoofil is written in Python and can run on any OS that fulfills the libextractor dependency. Depending on your OS, you must edit the script and provide the correct path to the extract binary.
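As a rough illustration of the query format above, the following Python sketch builds one search query per file type and URL-encodes it. It is not taken from Metagoofil. The function names are hypothetical, the file-type list simply mirrors the formats mentioned earlier, and no search-engine URL or request is included.

```python
from urllib.parse import quote_plus

# File types mentioned in this article as supported by Metagoofil.
DEFAULT_FILETYPES = ("pdf", "doc", "xls", "ppt", "odp", "ods")


def build_queries(domain, filetypes=DEFAULT_FILETYPES):
    """Return one 'site:... filetype:...' query string per file type."""
    return [f"site:{domain} filetype:{ext}" for ext in filetypes]


def encode_query(query):
    """URL-encode a query so it can be placed in a search URL parameter."""
    return quote_plus(query)


if __name__ == "__main__":
    for query in build_queries("example.com"):
        print(query, "->", encode_query(query))
```

The first line printed is `site:example.com filetype:pdf -> site%3Aexample.com+filetype%3Apdf`.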

You can download Metagoofil from the official site, although Google has changed the format of its search queries and version 1.4b needs some alterations. For more information, take a look at the unofficial fix.

Let’s see Metagoofil in action on a Linux system.

 

 

It is pretty obvious that metadata gathering and extraction is easily accomplished. Recognizing the high security risk of hidden metadata leakage, I began writing a series of articles about the metadata included in different file types. I recently published the first part, about EXIF metadata in JPEG files, and I will continue with details and tools for other file types too.

 

DISCLAIMER: I’m not responsible for what you do with this info. This information is for educational purposes only.


A. Bechtsoudis
