Enumerating Metadata Part 3: ODF files

In the third part of the Enumerating Metadata series, we will talk about the Open Document Format (ODF), which is supported by popular office suites (OpenOffice, LibreOffice, Microsoft Office 2007 and more). ODF is a family of XML-based file formats used to represent electronic documents (spreadsheets, presentations, word processing documents, etc.). A standard ODF file is a ZIP-compressed archive containing the appropriate files and directories. The document metadata is stored in a separate XML file named meta.xml. The metadata in this file can include pre-defined, user-defined and custom fields (for example the ODF version, title, description and more).

The most common filename extensions used for OpenDocument documents are:

  • .odt for word processing (text) documents
  • .ods for spreadsheets
  • .odp for presentations
  • .odb for databases
  • .odg for graphics
  • .odf for formulae, mathematical equations

A typical packaged ODF file contains seven files and two directories archived into a ZIP file. The structure of such a basic package is as follows:

|-- META-INF
|   `-- manifest.xml
|-- Thumbnails
|   `-- thumbnail.png
|-- content.xml
|-- meta.xml
|-- mimetype
|-- settings.xml
`-- styles.xml

Important! If you protect your document with a password, the meta.xml file is not encrypted and can be read by anyone without knowing the document password. So be careful: password protection does not solve the metadata problem.

ODF metadata can contain a large amount of usable information that profiles document authors and their software tools. An attacker can gather this kind of information and use it as a starting point for further attacks. It is therefore important for document users to control the information leaking through hidden metadata.

Office suites such as OpenOffice and LibreOffice offer options for editing the metadata fields (usually under File->Properties). You can use this feature to edit or clear the desired fields. The problem is that this method works one file at a time, so if you have a large collection of ODF files to handle, you obviously need an automated tool or script. Because ODF files are ZIP containers, the solution is pretty easy: you can bulk delete or update the meta.xml files of all documents using the zip tool and its delete/update options. If you delete meta.xml from a document, be aware that the next time the document is saved by the office software, the file is recreated with the software's default values for the metadata fields.

I usually do not want any metadata leakage from my documents, so I delete the meta.xml file from the document container. I wrote a simple bash script that deletes the meta.xml file from every ODF document under a user-specified directory:

#!/bin/bash

#============================================================#
# Author     : Anestis Bechtsoudis                           #
# Date       : 12 May 2011                                   #
# Description: Bash script that removes metadata (meta.xml)  #
# from ODF (Open Document Format) files used by OpenOffice   #
#============================================================#

if [ $# -ne 1 ] || [ ! -d "$1" ] ; then
  echo "Usage: $0 [dir]"
  echo -e "\t[dir]: Directory containing ODF Files"
  exit 1
fi

#===============================================#
# Open Document format supports:                #
#    .odt for word processing (text) documents  #
#    .ods for spreadsheets                      #
#    .odp for presentations                     #
#    .odb for databases                         #
#    .odg for graphics                          #
#    .odf for formulae, mathematical equations  #
#                                               #
# Remove unwanted filetypes                     #
#===============================================#
# Anchor the match to the file extension so that paths such as
# "odt-archive/notes.txt" are not selected
FILETYPES='\.(odt|ods|odp|odb|odg|odf)$'

# Temp file for search results; mktemp gives a unique name even when
# $0 contains slashes, and the trap removes it on exit
TMPFILE=$(mktemp) || exit 1
trap 'rm -f "$TMPFILE"' EXIT

# One path per line (file names containing newlines are not supported)
find "$1" -type f | grep -E "$FILETYPES" > "$TMPFILE"

# IFS= and -r preserve spaces and backslashes in file names
while IFS= read -r line
do
  # Delete the meta.xml entry from the ZIP container in place
  zip -d "$line" meta.xml
done < "$TMPFILE"

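If you do not want to remove the meta.xml files completely, you can write a basic meta.xml template and alter the above script to update (instead of delete) the meta.xml of each ODF document. The update can be done using the -f (freshen) option of the zip tool. Since zip stores entry paths relative to the current directory, run it from the directory that holds the template, which must itself be named meta.xml.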

The same approach can be adapted to Windows by writing equivalent batch files.
A. Bechtsoudis