Enumerating Metadata: Part 3 ODF Files
In the third part of the Enumerating Metadata sequence, we will talk about the Open Document Format (ODF), which is supported by popular document software suites (OpenOffice, LibreOffice, Microsoft Office 2007 and more). ODF is a family of XML-based file formats used to represent electronic documents (spreadsheets, presentations, word processing documents, etc.). A standard ODF file is a ZIP-compressed archive containing the appropriate files and directories. The document metadata is stored in a separate XML file named meta.xml. The metadata contained in this file can comprise pre-defined metadata, user-defined metadata, and custom metadata (such as ODF version, Title, Description and more).
The most common filename extensions used for OpenDocument documents are:
.odt for word processing (text) documents
.ods for spreadsheets
.odp for presentations
.odb for databases
.odg for graphics
.odf for formulae, mathematical equations
A packaged ODF file will contain, at a minimum, seven files and two directories archived into a modified ZIP file. The structure of the basic package is as follows:
    |-- META-INF
    |   `-- manifest.xml
    |-- Thumbnails
    |   `-- thumbnail.png
    |-- content.xml
    |-- meta.xml
    |-- mimetype
    |-- settings.xml
    `-- styles.xml
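Because the package is an ordinary ZIP archive, its layout and its metadata can be inspected with standard tools. The following is a minimal bash sketch, assuming Info-ZIP's `unzip` is installed; the function names `odf_list` and `odf_meta` and the file name `report.odt` are illustrative only.

```shell
# Illustrative helpers; the names are not part of any tool or standard.

# List the entries of an ODF package, one name per line.
odf_list() {
    unzip -Z1 "$1"
}

# Print the raw metadata XML of an ODF package to stdout.
odf_meta() {
    unzip -p "$1" meta.xml
}

# Example usage (hypothetical file name):
#   odf_list report.odt
#   odf_meta report.odt
```

Neither function modifies the document; both only read from the archive.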
Important! If you protect your document with a password, the meta.xml file is not encrypted and can be read by anyone without knowing the document password. So be careful: password protection does not solve the metadata problem.
We can see that ODF metadata fields contain a large amount of usable information profiling editors and their software tools. An attacker can gather this kind of information and use it as a starting point for exploitation attacks. It is therefore important for document users to control the information leaked through hidden metadata.
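To get a feel for what such profiling looks like, the sketch below pulls a few fields out of meta.xml content supplied on standard input. It assumes the ODF element names `meta:generator`, `meta:initial-creator`, `dc:creator`, `meta:creation-date` and `dc:date`; the function name `meta_fields` is made up for this example, and the matching is plain text processing rather than real XML parsing.

```shell
# Print selected profiling fields from meta.xml content read on stdin.
# Plain text matching, not an XML parser: good enough for a quick audit.
meta_fields() {
    local xml tag value
    xml=$(cat)
    for tag in meta:generator meta:initial-creator dc:creator \
               meta:creation-date dc:date; do
        # Grab the text between <tag> and </tag>, if the element is present.
        value=$(printf '%s' "$xml" | tr -d '\n' |
                sed -n "s|.*<$tag>\([^<]*\)</$tag>.*|\1|p")
        if [ -n "$value" ]; then
            printf '%s: %s\n' "$tag" "$value"
        fi
    done
    return 0
}

# Example usage (hypothetical file name):
#   unzip -p report.odt meta.xml | meta_fields
```

Fields that are absent from the document are simply skipped, so the output shows only what the file actually leaks.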
Document software suites, such as OpenOffice and LibreOffice, offer editing options (usually under File->Properties) for the metadata fields. You can use this feature to edit or clear the desired fields. The problem is that this method works per file, so if you have a large collection of ODF files to handle, you obviously need an automated tool or script. Because ODF files are ZIP containers, the solution is pretty easy: you can delete or update the meta.xml files in bulk using the zip tool and its delete/update options. Be careful if you delete meta.xml from a document: the next time the document is saved by the relevant software, the XML is recreated with the software's predefined values for the metadata fields.
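For a single document, the delete option looks roughly like this. This is a sketch that assumes Info-ZIP's `zip` is available; the function name `strip_meta` and the `.bak` backup suffix are arbitrary choices for illustration.

```shell
# Remove meta.xml from one ODF package, keeping a backup copy first.
# strip_meta is an illustrative name; the .bak suffix is an arbitrary choice.
strip_meta() {
    local file=$1
    cp -p -- "$file" "$file.bak" || return 1
    # zip -d deletes the named entry from the archive in place.
    zip -q -d "$file" meta.xml
}

# Example usage (hypothetical file name):
#   strip_meta report.odt
```

Keeping the backup makes it easy to compare the document before and after, or to roll back if the office suite complains about the modified package.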
I usually do not want any metadata leakage from my documents, so I delete the meta.xml file from the document container. I wrote a simple bash script which deletes all meta.xml files from ODF documents under a user-specified directory:
    #!/bin/bash
    #============================================================#
    # Author     : Anestis Bechtsoudis                           #
    # Date       : 12 May 2011                                   #
    # Description: Bash script that removes metadata (meta.xml)  #
    # from ODF (Open Document Format) files used by OpenOffice   #
    #============================================================#

    if [ $# -ne 1 ] ; then
        echo "Usage: $0 [dir]"
        echo -e "\t[dir]: Directory containing ODF Files"
        exit 1
    fi

    #===============================================#
    # Open Document format supports:                #
    # .odt for word processing (text) documents     #
    # .ods for spreadsheets                         #
    # .odp for presentations                        #
    # .odb for databases                            #
    # .odg for graphics                             #
    # .odf for formulae, mathematical equations     #
    #                                               #
    # Remove unwanted filetypes                     #
    #===============================================#
    # Match the file extension only, not any part of the path
    FILETYPES='\.(odt|ods|odp|odb|odg|odf)$'

    # Temp file for search results
    TMPFILE=$(mktemp) || exit 1

    find "$1" -type f | grep -E "$FILETYPES" > "$TMPFILE"

    # IFS= and -r keep spaces and backslashes in file names intact
    while IFS= read -r line
    do
        zip -d "$line" meta.xml
    done < "$TMPFILE"

    rm -f "$TMPFILE"
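Before or after running the script, it can be useful to check which documents still carry a meta.xml entry. The following audit helper is a sketch and not part of the script above; it assumes Info-ZIP's `unzip`, and the function name `odf_with_meta` is made up for this example.

```shell
# Report ODF files under a directory that still contain a meta.xml entry.
# odf_with_meta is an illustrative name, not part of the script above.
odf_with_meta() {
    local dir=$1 file
    find "$dir" -type f \( -name '*.odt' -o -name '*.ods' -o -name '*.odp' \
        -o -name '*.odb' -o -name '*.odg' -o -name '*.odf' \) -print0 |
    while IFS= read -r -d '' file; do
        # unzip -Z1 prints one archive entry name per line.
        if unzip -Z1 "$file" 2>/dev/null | grep -qx 'meta.xml'; then
            printf '%s\n' "$file"
        fi
    done
}

# Example usage (hypothetical directory):
#   odf_with_meta ~/Documents
```

An empty result means no matching document under the directory contains a meta.xml entry. The helper only reads the archives, so it is safe to run as a dry run.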
If you do not want to remove the meta.xml files completely, you can write a basic meta.xml template and alter the above script to update (instead of delete) the meta.xml in each ODF document. The update can be done with the -f (freshen) argument of the zip tool, which replaces an existing entry only when the file on disk is newer than the copy stored in the archive.
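A rough sketch of the update variant for a single document follows. It assumes Info-ZIP's `zip`; the function name `replace_meta` and the template file are hypothetical, and the template's contents are up to you.

```shell
# Replace meta.xml inside an ODF package with a template file.
# replace_meta is an illustrative name; the caller supplies the template.
replace_meta() {
    local dir archive template workdir status
    dir=$(cd "$(dirname -- "$1")" && pwd) || return 1
    archive=$dir/$(basename -- "$1")
    template=$2
    workdir=$(mktemp -d) || return 1
    # zip matches entries by name, so the template must be called meta.xml.
    cp -- "$template" "$workdir/meta.xml" || return 1
    # -f (freshen) replaces the entry only when the file on disk is newer
    # than the archived copy; the fresh copy gets the current timestamp.
    (cd "$workdir" && zip -q -f "$archive" meta.xml)
    status=$?
    rm -rf -- "$workdir"
    return "$status"
}

# Example usage (hypothetical file names):
#   replace_meta report.odt clean-meta.xml
```

Since freshen never adds new entries, a document whose meta.xml was already deleted is left without one.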
The same approach can be used on Windows by writing equivalent batch files.
A. Bechtsoudis