docx file returning as application/zip files

shawnzam
User
Posts:2
Joined:Fri Sep 28, 2012 2:01 pm

Post by shawnzam » Mon Nov 19, 2012 6:52 pm

I am seeing some strange behavior with Word files. .doc files return a mime-type of 'application/octet-stream' and download fine in all browsers, but .docx files return a mime-type of 'application/zip' and download fine in all browsers except IE.

Any ideas?

S

James Heinrich
getID3() v1 developer
Posts:1477
Joined:Fri May 04, 2001 4:00 pm
Location:Northern Ontario, Canada

Re: docx file returning as application/zip files

Post by James Heinrich » Mon Nov 19, 2012 11:10 pm

That's not unexpected. The newer (Office 2007+) "file format" is really a zip file with a bunch of XML files inside. If you rename a .docx (or .xlsx, or .pptx) file to .zip you can open it with your favourite compression program and see the file structure.
http://en.wikipedia.org/wiki/Office_Open_XML
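
If you would rather see this from code than by renaming files, the following is a minimal sketch that lists the entries of such a package. It assumes PHP's zip extension (ZipArchive) is available; listPackageEntries() is a made-up helper name, not part of getID3, and the two-entry archive is a stand-in built on the fly rather than a real Word document.

```php
<?php
// Hypothetical helper (not part of getID3): return the entry names of a zip-based package.
function listPackageEntries(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return []; // not a readable zip archive
    }
    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();
    return $names;
}

// Build a tiny stand-in package so the example runs without a real .docx file.
$tmp = tempnam(sys_get_temp_dir(), 'ooxml');
$zip = new ZipArchive();
$zip->open($tmp, ZipArchive::CREATE | ZipArchive::OVERWRITE);
$zip->addFromString('[Content_Types].xml', '<Types/>');
$zip->addFromString('word/document.xml', '<document/>');
$zip->close();

$entries = listPackageEntries($tmp);
print_r($entries);
unlink($tmp);
```

Pointing listPackageEntries() at a real .docx should show the same kind of layout, with many more entries.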

Re: docx file returning as application/zip files

Post by shawnzam » Mon Nov 19, 2012 11:27 pm

Thanks, James. The only problem with this expected behavior is that the file, once downloaded, is treated as if it were a zip file. WinZip and the like then try to open the file, rather than MS Word.
Any ideas? Or should my fix exist outside of this library?

Re: docx file returning as application/zip files

Post by James Heinrich » Tue Nov 20, 2012 12:01 am

I'm not sure getID3 should even return anything other than application/zip for these files, since that's fundamentally what they are. However, if they can be reasonably identified as a more specific "subspecies" of zip file, then that is probably even better. I'll move this thread to Bug Reports and ponder that question around the time of the next release. Most likely I will alter module.archive.zip.php along the lines of the code below, so that it detects Office Open XML files and returns better values for mime_type.

In the meantime, I'd suggest you handle it on your end, perhaps something like this:

$ThisFileInfo = $getID3->analyze($filename);
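// check the key explicitly instead of suppressing the notice with @
if ((isset($ThisFileInfo['fileformat']) && ($ThisFileInfo['fileformat'] == 'zip'))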
	&& !empty($ThisFileInfo['zip']['files']['[Content_Types].xml'])
	&& !empty($ThisFileInfo['zip']['files']['_rels']['.rels'])
	&& !empty($ThisFileInfo['zip']['files']['docProps']['app.xml'])
	&& !empty($ThisFileInfo['zip']['files']['docProps']['core.xml'])
) {
	// http://technet.microsoft.com/en-us/library/cc179224.aspx
	if (!empty($ThisFileInfo['zip']['files']['ppt'])) {
		$ThisFileInfo['mime_type'] = 'application/vnd.openxmlformats-officedocument.presentationml.presentation';
	} elseif (!empty($ThisFileInfo['zip']['files']['xl'])) {
		$ThisFileInfo['mime_type'] = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';
	} elseif (!empty($ThisFileInfo['zip']['files']['word'])) {
		$ThisFileInfo['mime_type'] = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
	}
}
This skims over the intricacies of the many possible MIME types (http://technet.microsoft.com/en-us/libr ... 79224.aspx) but should suffice.
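
Once mime_type holds the more specific value, the download script still has to send it to the browser. A rough sketch of that step follows, on the assumption that the browser picks the handler from the Content-Type header. buildDownloadHeaders() is a hypothetical helper, not something getID3 provides, and the filename and size are made-up example values.

```php
<?php
// Hypothetical helper (not part of getID3): build the response headers for a file download.
function buildDownloadHeaders(string $filename, string $mimeType, int $size): array
{
    // Drop characters that would break the quoted filename parameter.
    $safeName = str_replace(['"', "\r", "\n"], '', basename($filename));
    return [
        'Content-Type: ' . $mimeType,
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Length: ' . $size,
    ];
}

// Example values; in practice the MIME type would come from $ThisFileInfo['mime_type'].
$headers = buildDownloadHeaders(
    'report.docx',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    12345
);

foreach ($headers as $line) {
    echo $line, "\n";
    // In a real web request you would call header($line) here instead,
    // and then stream the file with readfile($path).
}
```

Keeping the header construction in a plain function makes it easy to check from the command line before wiring it into the actual download script.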

Re: docx file returning as application/zip files

Post by James Heinrich » Wed Feb 20, 2013 5:38 pm

Included as part of v1.9.5
