
WebArchive Folderizer: Great Way To “Unpack” WebArchived Pages

Published February 13th, 2006

WebArchive Folderizer: A drop-launch utility to extract the content from Safari webarchives.

WebArchive Folderizer (Freeware). Downloaded 2/13/06. If this does what it says, it’s a utility that I definitely will be keeping! Webarchives are cool, but there’s nothing like being able to get at all the individual pieces of a web page.

Update 6/11/06

Yes, indeedy! This is a must-have utility for anyone who wants or needs to view and save all the bits and pieces that make up today’s complicated web pages. Safari’s .webarchive format is an amazing critter… much better, in my limited testing on the Mac, than the Windows-centric .MHTML (.MHT) format. MHTML is designed primarily for transmitting a web page through email, so it seems to capture only the visual parts of a page. The .webarchive format saves everything… including all the JavaScript and CSS files. It also preserves the directory structure of the page, which I suspect the .MHTML format does not.

What really amazed me in my testing tonight was when I .webarchived the Musings from Mars home page… opening it up in Safari as a static .webarchive file still let me work the page as usual… all the JavaScript functions worked, all the Ajax loaded as expected… Everything! Opera can save .MHT files on the Mac, but when I opened the .MHT version of Musings, it left quite a bit to be desired.

If I have time, I’ll report more fully on both of these formats, since there seems to be a dearth of information about Safari’s format on the web. I found an article on .MHT in Wikipedia, but nothing about .webarchive.
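For the curious, here is a minimal Python sketch of how one might peek inside a .webarchive to see which JavaScript, CSS, and image files it really contains. It assumes the archive is a property list whose top-level keys are WebMainResource and WebSubresources, with each resource carrying WebResourceURL, WebResourceMIMEType, and WebResourceData, which is how Safari's files are commonly described. The function names are purely illustrative, and the sketch ignores nested frame archives.

```python
import plistlib
from collections import Counter


def list_webarchive_resources(path):
    """Return (url, mime_type, size_in_bytes) for each resource in a .webarchive.

    Assumes the Safari layout: one WebMainResource plus a WebSubresources list.
    """
    with open(path, "rb") as f:
        archive = plistlib.load(f)  # reads both binary and XML property lists

    resources = []
    main = archive.get("WebMainResource")
    if main:
        resources.append(main)
    resources.extend(archive.get("WebSubresources", []))

    return [
        (
            res.get("WebResourceURL", ""),
            res.get("WebResourceMIMEType", ""),
            len(res.get("WebResourceData", b"")),
        )
        for res in resources
    ]


def count_by_mime_type(path):
    """Tally resources by MIME type, e.g. how many text/css files were saved."""
    return Counter(mime for _, mime, _ in list_webarchive_resources(path))
```

Calling `count_by_mime_type("page.webarchive")` on a saved page (the file name is a placeholder) gives a quick tally of what the archive holds before anything is unpacked.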

Needless to say, the .webarchive format is a great tool for Mac users, and this handy freeware utility lets you unarchive the files quickly and easily. Just drag the .webarchive file onto the Folderizer, and it creates a folder right next to the original file. The folder contains the folder structure and all the graphics, HTML, CSS, JavaScript, and any other files that were included with the page. In some cases, the file naming may be somewhat different from the original, but only if the original files were dynamically generated. For example, I .webarchived a page from Fluxiom, and amazingly, Safari preserved even the state of the dynamic content I was viewing. Since Fluxiom delivers all of its JavaScript and CSS files dynamically (to control caching on the client), Folderizer renamed them. But what really amazed me was how well it handled my home page! The webarchive had multiple layers of directories, given the nature of WordPress sites, and Folderizer preserved them all precisely… putting plugin files in their proper folder in the directory tree (see screenshot).
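To give a feel for what this kind of unpacking involves, here is a rough Python sketch that writes each resource of a .webarchive into a folder next to the original file, rebuilding the directory tree from each resource's URL. This is not Folderizer's actual code. It assumes the commonly described Safari layout (a property list with WebMainResource and WebSubresources, each resource holding WebResourceURL and WebResourceData), and the names `folderize` and `target_path` are made up for illustration.

```python
import plistlib
from pathlib import Path
from urllib.parse import unquote, urlsplit


def target_path(url, out_dir):
    """Map a resource URL to a file path under out_dir, keeping its directories."""
    parts = urlsplit(url)
    rel = unquote(parts.path).lstrip("/")
    if not rel or rel.endswith("/"):
        rel += "index.html"  # a directory-style URL needs some file name
    # Drop empty, "." and ".." segments so nothing is written outside out_dir.
    safe = [seg for seg in rel.split("/") if seg not in ("", ".", "..")]
    return Path(out_dir, parts.netloc or "local", *safe)


def folderize(archive_path):
    """Unpack a .webarchive into a sibling folder and return the paths written."""
    archive_path = Path(archive_path)
    out_dir = archive_path.with_suffix("")  # page.webarchive -> page/

    with open(archive_path, "rb") as f:
        archive = plistlib.load(f)  # reads both binary and XML property lists

    resources = [archive["WebMainResource"]] + archive.get("WebSubresources", [])
    written = []
    for res in resources:
        dest = target_path(res["WebResourceURL"], out_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(res["WebResourceData"])
        written.append(dest)
    return written
```

The sketch throws away the query string, so two dynamically generated files that differ only after the `?` would overwrite each other, and a URL that is both a file and a directory prefix (`/blog` and `/blog/style.css`) would fail. A real tool has to rename files in those cases, which is presumably why dynamically served files come out with different names.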

Pod Util Software

WebArchive Folderizer has the barest of bare-bones user interfaces. It doesn’t have a pretty icon (or any icon!), and it has no built-in help page. But even without those things, this software now has a nice home in my Utilities folder. Incidentally, the shareware File Juicer can also handle .webarchive files, but in my quick tests tonight, it failed to extract the CSS and JavaScript files, and didn’t preserve the archive’s directory structure. (File Juicer did a better job with the .MHT files saved from Opera.)

    