Pitfalls of HTML Encryption
Michael observed the encrypted contents of the webpage with an immediate curiosity.
"Ars you can see Michael, we use HTML Code Guard to protect ur’ downloads page and to prevent spidurs’ and other automated programs from dawnloadin’ all ur’ products."
Michael tried to imagine how the developer across the phone looked. His voice was high-pitched, with a thick distinctive Irish accent.
The idea behind HTML Encryption is to prevent source code theft and to limit information leakage regarding the application's layout and structure.
HTML Encryption applications are designed to protect and secure HTML code. This post gives a brief introduction to some of the weaknesses and pitfalls associated with HTML Encryption.
A snippet of encrypted HTML code (1 link) can be seen below:
<meta name="generator" content="HTML Code Guard" /><meta http-equiv="expires" content="0" /><script language="JavaScript">
<!--
var d="b=~6,98a|6**.doo15;6=92:='p/,7|`*9+*bo=`";var fcrc="5DC10B9C";function dc(e){var ds="";e=e.toUpperCase();for(i=0;i<e.length;i+=2){ds+=unescape("%"
72657475726e206e6e3b7d66756e6374696f6e20686578286e756d297b7661722048657843686172732
03d202230313233343536373839414243444546223b76617220486578537472203d2022223b6e756d3d
61626e286e756d293b6966286e756d3d3d30292072657475726e20223030223b7768696c65286e756d3
e30297b486578537472203d2048657843686172732e636861724174286e756d25313629202b20486578
5374723b6e756d203d204d6174682e666c6f6f72286e756d2f3136293b7d72657475726e204865785374
723b7d66756e6374696f6e2043616c63435243333228737472297b766172206c696d69743d2d33303636
37343931322c206372632c204352435461626c653d6e657720417272617928293b666f7228693d303b693
c3d3235353b692b2b297b6372633d693b666f72286a3d303b6a3c3d373b6a2b2b297b6966286372632026
2031297b637263203d202828286372632026202d3229202f2032292026203231343734383336343729205
e206c696d69743b7d656c73657b637263203d2028286372632026202d3229202f203229202620323134373
438333634373b7d207d4352435461626c655b695d3d6372633b7d766172206372633d2d313b666f7228693
d303b693c7374722e6c656e6774682d313b692b2b297b637263203d202828286372632026202d32353629
202f2032353629202620313637373732313529205e20284352435461626c655b286372632026203235352
9205e207374722e63686172436f646541742869295d293b7d637263203d20637263205e202d313b6372633
d68657828637263293b6372633d6372632e746f55707065724361736528293b72657475726e2
As seen above, it's like finding a needle in a haystack. The encryption program above uses what I call the JavaScript chaos approach (as many do): the original code is pushed, obfuscated, pushed some more and so on to prevent reverse engineering. When it is complete we are left with what you see above, a mess. Although this is not readable to humans, it is obviously readable by the browser; it has to be, or the browser could never render it. However, this particular encryption application must have been designed and tested around Internet Explorer, as Firefox dies and enters an infinite loop when its JavaScript engine attempts to run this. This is our first pitfall when using these programs: the large amount of JavaScript obfuscation may cause unexpected results in various browsers.
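To give a feel for how thin this layer is, here is a minimal sketch that decodes the start of the hex run from the snippet. It assumes the payload is nothing more than pairs of hex digits, which is what the visible part of `dc` (stepping two characters at a time and calling `unescape("%"...)`) suggests. `hexDecode` is a name made up for this example, and it uses `parseInt` with `String.fromCharCode` rather than whatever the tool does internally.

```javascript
// Hypothetical helper: turn a string of hex pairs back into text.
function hexDecode(hex) {
  var out = '';
  // Step two characters at a time; a trailing odd digit
  // (as on a truncated line) is simply ignored.
  for (var i = 0; i + 1 < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

// First bytes of the hex payload shown in the snippet.
console.log(hexDecode('72657475726e206e6e3b7d66756e6374696f6e20686578286e756d29'));
// prints: return nn;}function hex(num)
```

In other words, the "mess" is ordinary JavaScript source with each character written as two hex digits.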
Second, without attempting to reverse engineer the code (to make it normal HTML once again), we can simply use the browser DOM (Document Object Model) to pry open the website's contents. It is trivial to retrieve images, links, CSS, forms, tables etc. For example, to retrieve the first link on the page, we could simply run this from the address bar:
javascript:var x=document.getElementsByTagName('A');alert(x[0].href);
The outcome:

This was me playing around for 10 minutes (literally). In summary... erm... need I say more?
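The one-liner above only grabs the first link. As a rough sketch of how a spider could pull every link out of the rendered page, assuming nothing beyond the standard `getElementsByTagName` DOM call (`collectLinks` is a name made up for this example, not part of any tool mentioned here):

```javascript
// Hypothetical helper: collect the href of every anchor in a document.
// Takes the document as a parameter so it can be tried outside a browser too.
function collectLinks(doc) {
  var anchors = doc.getElementsByTagName('a');
  var hrefs = [];
  for (var i = 0; i < anchors.length; i++) {
    // Skip anchors without a target (e.g. named anchors).
    if (anchors[i].href) {
      hrefs.push(anchors[i].href);
    }
  }
  return hrefs;
}
```

In a browser console this would be called as `collectLinks(document)` once the page has finished decoding itself. The obfuscation never comes into it, because the browser has already turned the page back into a normal DOM.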

Unless the site encrypts its code using a password entered by the user. Even then it's still not optimal, but better.
I guess there are two aspects here: first, the end-user security perspective (Nick, I think this is what you are referring to); second, the encryption of the actual site content to prevent source code theft and information leakage.
There are advanced HTML encryptors on the Internet. We (Cromer High School) purchased http://www.htmlblock.co.uk and used their pre-made software on our web site, which works very well. It works even better on our intranet, where it keeps teachers and students from accessing each other's files. I think HTML Encryption can be a very powerful tool for any webmaster.
Sarah, as I mentioned in my post (see my proof of concept picture above), it is still possible to get around HTML Encryption in many cases; it all depends on how it's implemented. If it's used alongside actual encryption with keys (or passwords) then chances are it's a lot more secure. If it's simple obfuscation (hiding) then it's easily reversible.
Hmmm htmlblock:
//
I cannot find an answer to one simple question: how does HTML encryption affect contextual advertising, i.e. Google AdSense and Google PR?