There are some great videos by the author on the same subject at: http://videolectures.net/course_information_theory_pattern_r...
They cover a lot of the same ground but in a gentler way so they're good for building intuition before working through the book fully.
Other formats are available along with some other information.
David's course and drafts of this book were my introduction to machine learning. It starts out very accessible (for those with maths at undergraduate science-subject level, or from a very good high school), and also contains denser, more advanced material later on. This is a book with hidden gems throughout that you can return to many times.
I read a chapter of the book, and it criticizes non-Bayesian statistics too much, for example when discussing p-values. The p-value method is for getting sound results over many experiments, not for interpreting the particular value obtained in one experiment. For example, a 5% threshold is for being right 19 times out of 20; the numbers obtained in any one experiment don't change this.
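To make the "19 times out of 20" reading concrete, here is a minimal simulation sketch. It assumes a setup that is not from the book or the comment above: Gaussian data with known standard deviation, a two-sided z-test, and a null hypothesis that is actually true in every trial. The function names are made up for illustration. Under those assumptions, roughly 5% of experiments should reject the null at the 5% level, whatever individual p-values turn up.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == 0, assuming known sigma."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(trials=20000, n=30, alpha=0.05, seed=0):
    """Fraction of experiments that reject H0 when H0 is in fact true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Every sample is drawn with true mean 0, so any rejection is an error.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(f"fraction of true nulls rejected at alpha=0.05: {rate:.3f}")
```

Note that this only illustrates the long-run error rate when the null is true; it says nothing about how probable a hypothesis is given one observed p-value.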
The author is trying to present a probabilistic framework for reasoning about the world. The issue with p-values is that they have a narrow band of utility, and can't really contribute much to the process of inference as a whole; you can say no to a nearly infinite number of absurd propositions and still not be able to say anything insightful about a system. On top of that, p-values are clearly not intuitive, as demonstrated by their widespread misuse.
Honestly, I tried reading through the initial chapters, and found them to be way too much for me. Any pointers to a gentler introduction to the subject matter? Thank you!
Elements of Information Theory by Cover and Thomas
I had the same response as you did, and thought to myself "I never got round to reading Cover and Thomas, but this pdf is two or three times the speed, I really need to read Cover and Thomas first."