New Statistical De-minifier and De-obfuscator for JavaScript (jsnice.org)
173 points by mvechev on June 2, 2014 | 33 comments



I took the code presented and ran it through packer (http://dean.edwards.name/packer/), and the "nice" output was not very helpful: it still looked obfuscated. Maybe I'm misunderstanding something.

So the input to JSNice was the packed generateSeries function:

  eval(function(p,a,c,k,e,r){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--)r[e(c)]=k[c]||e(c);k=[function(e){return r[e]}];e=function(){return'\\w+'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}('4 3(e){2 t=[];2 n=f+e;2 r=6+e;k(i=1;i<=6;i++){2 s=5.y(5.a()*(r-n+1)+n);t.b([i,s]);n++;r++}c t}$(d).B(4(){2 e=3(0);2 t=3(g);$.h($("#j"),[{7:"l",8:e},{7:"o",8:t}],{p:{q:{u:["#v","#w"]}},x:{9:z},A:{9:m}})})',38,38,'||var|generateSeries|function|Math|200|label|data|ticks|random|push|return|document||100|300|plot||flotcontainer|for|data1|10||data2|grid|backgroundColor||||colors|D1D1D1|7A7A7A|xaxis|floor|20|yaxis|ready'.split('|'),0,{}))


The various JS beautifiers special-case eval() obfuscation, packer in particular.

I think the scope of this tool is to annotate and de-uglify JS code without changing its logic, so what you get is a more readable version of the routine that generates the eval()'d code. That sounds right to me.
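For comparison, here is roughly what a readable version of that decoder routine could look like. This is a sketch with names I made up (`unpack`, `encodeIndex`), assuming the word encoding visible in the packed snippet above (0-9 and a-z for digits below 36, then A-Z via `String.fromCharCode(digit + 29)`); unlike packer, it rebuilds the source string without calling eval.

```javascript
// Hypothetical readable rewrite of the packer decoder (not packer's own source).
// Encodes a word index in base `radix`: 0-9 and a-z for digits below 36,
// and A-Z (char code + 29) for digits 36 and above.
function encodeIndex(index, radix) {
  const prefix = index < radix ? "" : encodeIndex(Math.floor(index / radix), radix);
  const digit = index % radix;
  const suffix = digit > 35 ? String.fromCharCode(digit + 29) : digit.toString(36);
  return prefix + suffix;
}

// Replaces every encoded token in `payload` with its dictionary word.
// An empty dictionary entry means "keep the token as-is".
function unpack(payload, radix, count, words) {
  const dictionary = {};
  for (let i = 0; i < count; i++) {
    const token = encodeIndex(i, radix);
    dictionary[token] = words[i] || token;
  }
  return payload.replace(/\b\w+\b/g, (token) =>
    Object.prototype.hasOwnProperty.call(dictionary, token) ? dictionary[token] : token
  );
}
```

For example, `unpack("0 1(2){3 2*2}", 10, 4, ["function", "double", "x", "return"])` returns `"function double(x){return x*x}"`. One deliberate difference from the packed original: a token missing from the dictionary is left alone instead of becoming `"undefined"`.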

EDIT: btw, can you please put the code in a code box? It messes with the layout. Mods: this is happening frequently, is it a bug?


Mods could also just change CSS a bit by adding:

  span.comment {
    display: block;
    width: 800px;
  }
Of course, it should be checked on more pages, but it works here... (using FF)


Yes, it is a bug and on our list to fix. In the meantime, we fix it manually when we see it. It would be helpful to fire a note to hn@ycombinator.com when you notice it.


Thanks! Will do.


Noticed that too, and was kind of happy about it. For most projects I use uglify, Google Closure Compiler and the like, which it seems to handle well. But when I really want code obfuscated, to make it as difficult as possible for someone to figure out what's going on behind the scenes, I use packer, and I'm glad this tool couldn't figure out what to do with it.


http://jsbeautifier.org/ has no problem deobfuscating that.


It's easy to unpack even packer's "Base62 encode" output: just replace the eval call with a call to console.log, for example, and you will get your original code back.
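A minimal sketch of that trick, assuming the packed script is a single top-level `eval(...)` call like the one above. Instead of editing the text by hand, it runs the script with a parameter named `eval` that captures the decoded string. Note that this still executes the decoder function itself, so only use it on code you are willing to run; the decoded payload is never evaluated. `depack` is a hypothetical name.

```javascript
// Hypothetical helper: runs a packer-style script with `eval` shadowed by a
// capturing function, so the decoded source is returned instead of executed.
// Caution: the decoder part of the script still runs.
function depack(packedSource) {
  let decoded = null;
  // Function-constructor bodies are non-strict unless they opt in, so a
  // parameter named `eval` is allowed and shadows the global eval inside it.
  const runWithFakeEval = new Function("eval", packedSource);
  runWithFakeEval((source) => {
    decoded = String(source);
  });
  return decoded;
}

console.log(depack('eval(function(p){return p.split("").reverse().join("")}("olleh"))')); // prints hello
```

This is the same idea as swapping in console.log, just returning the string so it can be fed to a beautifier. A packed script that starts with a "use strict" directive would be rejected by this approach, since strict code cannot bind the name `eval`.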

JSNice doesn't execute eval, since it does static analysis, i.e. it never runs the program.
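A static tool can still recognize packer output without running anything, because the decoder has a very recognizable shape. A rough sketch, assuming the `function(p,a,c,k,e,r)` signature seen in the sample above and accepting any name for the last parameter in case other variants rename it; `looksPacked` is a hypothetical name, and this only detects, it does not decode.

```javascript
// Hypothetical static check: inspects the text only, never executes it.
// Matches eval(function(p,a,c,k,e,<any identifier>) at the start of the source.
const PACKER_SIGNATURE = /^eval\(function\(p,a,c,k,e,[A-Za-z_$][\w$]*\)/;

function looksPacked(source) {
  // Whitespace does not matter for the signature, so strip it first.
  const compact = source.replace(/\s+/g, "");
  return PACKER_SIGNATURE.test(compact);
}
```

Something like this is enough to route input to an unpacker first, but decoding still requires running (or emulating) the decoder.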


About: http://www.srl.inf.ethz.ch/jsnice.php

Apparently this is a machine learning research project.

It seems that this would be quite useful for looking at the code of closed-source webapps. Would love to see it open-sourced or available as a service (via an API perhaps). Think auto-deminifying browser extension.


> On average, more than 60% of the identifiers are recovered to the same name as before the minification process.

I wonder how much of that 60% comes from common libraries like jQuery or Underscore. Still, a neat project.
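For reference, "recovered to the same name" reads as exact-match accuracy over identifiers. A small sketch of that computation, assuming you have the original and predicted name for each minified identifier; the data shape, the sample values and `recoveryRate` are made up, and this is not necessarily how JSNice computes its figure.

```javascript
// Hypothetical evaluation helper: fraction of identifiers whose predicted
// name exactly matches the name they had before minification.
function recoveryRate(pairs) {
  if (pairs.length === 0) return 0;
  const exact = pairs.filter(({ original, predicted }) => original === predicted).length;
  return exact / pairs.length;
}

// Made-up sample, not real JSNice output.
const sample = [
  { original: "width", predicted: "width" },
  { original: "series", predicted: "t" },
  { original: "index", predicted: "index" },
  { original: "data", predicted: "data" },
  { original: "label", predicted: "name" },
];
console.log(recoveryRate(sample)); // prints 0.6
```

Splitting the pairs by source file or library before calling it would be one way to check how much of the score comes from common libraries.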


We tried to evaluate on data that is as independent as possible from the training data, so we evaluate on projects outside of GitHub.


So, I saw this and wanted to see if it could decode the output of something I saw earlier: http://patriciopalladino.com/files/hieroglyphy/

I took the source of a sample pasted here and ran it through the hieroglyphy generator:

  var expect = function(val) { return "string" == typeof val; };

which produced something along the lines of (truncated):

  [][(![]+[])[!+[]+!![]+!![]]+([]+{})[+!![]]+(!![]+[])[+!![]]+(!![]+[])[+[]]][([]+{})[!+[]+!![]+!![]+!![]+!![]]+([]+{})...

But JSNice was unable to deobfuscate this code. Any ideas why?


This involves program optimizations (constant folding, for instance), and JSNice doesn't perform them: it won't turn `var x = 1+1` into `var x = 2`.
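To illustrate the kind of evaluation that would be needed: hieroglyphy-style output is built from coercions such as `+[]` (0), `!![]` (true) and `(![]+[])[+[]]` ("f"), and the first bracketed expression in the sample above folds to the string "sort". Below is a toy constant folder over a tiny hand-built AST; the node shapes and helper names are hypothetical (not JSNice's or any real parser's format), and it reuses JavaScript's own operators so the coercions come out right.

```javascript
// Toy constant folder. Hypothetical node shapes:
//   { type: "lit", value }   { type: "array" } ([])   { type: "object" } ({})
//   { type: "unary", op: "!" | "+", arg }
//   { type: "binary", op: "+", left, right }
//   { type: "index", object, property }
// Every subtree is constant, so folding reduces it to a single literal.
function fold(node) {
  switch (node.type) {
    case "lit":
      return node;
    case "array":
      return { type: "lit", value: [] };
    case "object":
      return { type: "lit", value: {} };
    case "unary": {
      const v = fold(node.arg).value;
      return { type: "lit", value: node.op === "!" ? !v : +v };
    }
    case "binary":
      return { type: "lit", value: fold(node.left).value + fold(node.right).value };
    case "index":
      return { type: "lit", value: fold(node.object).value[fold(node.property).value] };
    default:
      throw new Error(`unknown node type: ${node.type}`);
  }
}

// Small builders for readability.
const lit = (value) => ({ type: "lit", value });
const arr = { type: "array" };
const obj = { type: "object" };
const not = (arg) => ({ type: "unary", op: "!", arg });
const plus = (arg) => ({ type: "unary", op: "+", arg });
const add = (left, right) => ({ type: "binary", op: "+", left, right });
const at = (object, property) => ({ type: "index", object, property });

console.log(fold(add(lit(1), lit(1))).value); // prints 2
console.log(fold(at(add(not(arr), arr), plus(arr))).value); // prints f
```

Folding alone only recovers strings and numbers; encoders like this usually finish by calling the code they have spelled out, so fully decoding them means running or modelling that last step too.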


Naturalize ( http://groups.inf.ed.ac.uk/naturalize/ ) is another related machine learning based tool that suggests appropriate identifier names for Java. Details on the implementation and an evaluation may be found here http://arxiv.org/abs/1402.4182


Whoa, it even infers local variable names automatically.


Some rather spectacular failures, though of course these things happen with statistical methods:

  /**
   * @param {string} val
   * @return {?}
   */
  var expect = function(val) {
    return "string" == typeof val;
  };
  /**
   * @param {boolean} deepDataAndEvents
   * @return {?}
   */
  var clone = function(deepDataAndEvents) {
    return "boolean" == typeof deepDataAndEvents;
  };
  /**
   * @param {(boolean|number|string)} obj
   * @return {?}
   */
  var isString = function(obj) {
    return "number" == typeof obj;
  };
Still, neat idea. Seems like there's a lot of room to train it, probably a lot of fun to try to improve things :)
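One way to see how failures like these arise: a statistical namer picks whatever name co-occurred most often with a similar context in its training data, so a sparse or skewed context can pull in an odd name. Below is a deliberately simple frequency model (context feature to most common name); it is a toy for illustration, not JSNice's actual model, and the feature strings and training pairs are made up.

```javascript
// Toy name predictor: counts how often each name appeared with a context
// feature during training, then predicts the most frequent one.
class FrequencyNamer {
  constructor() {
    this.counts = new Map(); // feature -> Map(name -> count)
  }

  train(feature, name) {
    if (!this.counts.has(feature)) this.counts.set(feature, new Map());
    const names = this.counts.get(feature);
    names.set(name, (names.get(name) || 0) + 1);
  }

  predict(feature, fallback) {
    const names = this.counts.get(feature);
    if (!names) return fallback;
    let best = fallback;
    let bestCount = 0;
    for (const [name, count] of names) {
      if (count > bestCount) {
        best = name;
        bestCount = count;
      }
    }
    return best;
  }
}

const namer = new FrequencyNamer();
// Made-up training pairs: context feature -> identifier name seen there.
namer.train('typeof x == "string"', "val");
namer.train('typeof x == "string"', "str");
namer.train('typeof x == "string"', "val");
namer.train('typeof x == "boolean"', "deepDataAndEvents");

console.log(namer.predict('typeof x == "boolean"', "x")); // prints deepDataAndEvents
console.log(namer.predict('typeof x == "number"', "x")); // prints x
```

A single unlucky training example is enough to produce a name like `deepDataAndEvents`, which is why more (and more varied) training data tends to help.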


Why was it unable to infer the type of the return value for generateSeries? The function only has one return statement, and it already knows the type of the return variable is {Array}.


It never shows return types unless the return value is undefined. The reason is that the method may be overridden by a method with a different return type.
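A small illustration of that reasoning: in JavaScript a method can be replaced at runtime, so the return type visible in its one definition is not guaranteed at every call site. The object and method names here are made up.

```javascript
// A method whose only definition returns an Array...
const series = {
  generate(count) {
    const points = [];
    for (let i = 1; i <= count; i++) points.push([i, i * 2]);
    return points;
  },
};

console.log(Array.isArray(series.generate(3))); // prints true

// ...can be replaced later (here in the same file, but it could be another
// script), after which callers receive a different type entirely.
series.generate = (count) => `${count} points`;

console.log(typeof series.generate(3)); // prints string
```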


Can this approach be extended to for example generating "matching" tests for the code? Like "I see this function processes dates, here are some popular test cases learned from 1000s other projects"?

Could someone point me to good resources about mining code? Most data mining and machine learning articles deal with points in a multidimensional space, not objects with complex internal structure like programs...


The complex internal structure of programs could be reduced to graphs, and machine learning has been working with graphs for decades.
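As a rough sketch of the "programs as graphs" idea: the snippet below turns a piece of JavaScript into an undirected graph whose nodes are identifiers and whose edges connect identifiers used in the same statement. It assumes a naive regex tokenizer, splits statements on `;`, and treats property names like any other identifier; a real system would use a parser and richer relations. The function name and edge encoding are hypothetical.

```javascript
// Naive identifier co-occurrence graph. Assumptions: statements are split on
// ";", identifiers are matched by regex, and keywords are filtered by a small
// hand-written list. Good enough for illustration, not for real code.
const KEYWORDS = new Set(["var", "let", "const", "function", "return", "for", "if", "else", "typeof", "new"]);

function identifierGraph(source) {
  const edges = new Set();
  for (const statement of source.split(";")) {
    const ids = [...new Set(statement.match(/[A-Za-z_$][\w$]*/g) || [])].filter(
      (id) => !KEYWORDS.has(id)
    );
    for (let i = 0; i < ids.length; i++) {
      for (let j = i + 1; j < ids.length; j++) {
        // Sort the pair so that a-b and b-a are stored as the same edge.
        const [a, b] = [ids[i], ids[j]].sort();
        edges.add(`${a} -- ${b}`);
      }
    }
  }
  return [...edges].sort();
}

console.log(identifierGraph("var n = 100 + e; var s = Math.floor(Math.random() * n); t.push(s)"));
```

Once code is in this form, standard graph features (neighbours, degrees, paths) can serve as the context a learner conditions on.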


Very interesting project. Would be great to get some more info on the actual algorithms being used. The about page offers relatively limited info.


I wonder how many people, like me, went right off the bat and pasted minified jQuery into this thing. Looks like an invaluable tool for reverse engineering when unminified code is unavailable.


didn't do too well on http://js1k.com/2014-dragons/details/1903 :)

but processed jQuery relatively fast.


I put in some real world JS found on Hulu, and got a slew of errors like this one:

Line 1: Parse error. missing ; before statement


Josh,

Thanks for trying it out. I tried a few large samples from Hulu and they seemed to work fine, e.g.:

http://static.huluim.com/huluguru/i18n/en-us/translations-9d...

But indeed, sometimes there could be issues if the code does not compile with the compiler of choice.


Thank you all for the comments... keep them coming.

We will definitely provide more details soon on how the overall system works.


> var width = $container.height();

Hehe


Yes, it is statistical. A browser extension would be cool!


Handy tool to have. Works surprisingly well.


What is this sorcery?!?!?

Infinitely helpful in reverse engineering google stuff like www.googletagservices.com/tag/js/gpt.js

Thanks!!!


Is it doing something beyond pretty print in Chrome?


Yes.


Nice tool!



