I heard about the bestseller The Life-Changing Magic of Tidying Up at a SciPy talk about decluttering your data science project. The speakers admitted they hadn't read it - they were simply trying to point out that tidying up your space and tidying up your software project are similar.
In his book, Neural Networks - A Comprehensive Foundation, Simon Haykin has an entire section (3.10) dedicated to how perceptrons and Bayesian classifiers are closely related when operating in a Gaussian environment. However, it is not until the end of the section that Haykin mentions that the relation is limited to linearity: both classifiers are linear. What is interesting about this is that a perceptron can produce the same classification "model" as a Bayesian classifier, provided that the underlying data is drawn from a Gaussian distribution. This post is an experimental verification of that.
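To make the "limited to linearity" remark concrete, here is a minimal sketch of why the Bayes classifier is linear in one simple Gaussian setting. It assumes two classes with equal priors and the same isotropic covariance (sigma squared times the identity), which is narrower than the general case Haykin treats. The function names `log_gaussian` and `bayes_linear_boundary` and the example means are made up for illustration. Under those assumptions the log-likelihood ratio of the two classes is exactly a weighted sum of the inputs plus a bias, the same functional form a perceptron learns.

```python
import math

def log_gaussian(x, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I) at point x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -sq_dist / (2 * sigma ** 2) - d * math.log(sigma * math.sqrt(2 * math.pi))

def bayes_linear_boundary(mean_pos, mean_neg, sigma):
    """Weights and bias of the Bayes decision function (equal priors assumed)."""
    w = [(p - n) / sigma ** 2 for p, n in zip(mean_pos, mean_neg)]
    b = (sum(n * n for n in mean_neg) - sum(p * p for p in mean_pos)) / (2 * sigma ** 2)
    return w, b

# Hypothetical class means and spread, chosen only for illustration.
mean_pos, mean_neg, sigma = (2.0, 1.0), (-1.0, 0.0), 1.5
w, b = bayes_linear_boundary(mean_pos, mean_neg, sigma)

for x in [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]:
    # Log-likelihood ratio computed directly from the two densities...
    llr = log_gaussian(x, mean_pos, sigma) - log_gaussian(x, mean_neg, sigma)
    # ...and the same quantity computed as a linear function of x.
    linear = sum(wi * xi for wi, xi in zip(w, x)) + b
    print(x, round(llr, 6), round(linear, 6))
```

The Bayes rule assigns a point to the positive class when this quantity is positive, so its decision boundary is a hyperplane, just like the perceptron's.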
The last time I studied neural networks in detail was five years ago in college. I did touch upon backpropagation when Andrew Ng's machine learning MOOC was offered on Coursera for the first time, but beyond that I've only dabbled with them through keras. Then recently, when I read about Coursera's imminent decision to pull down much of their freely available material (you can read a rant about it here), I went on a downloading spree (many thanks to the wonderful coursera-dl). Of all the courses I downloaded, the one that caught my eye was Geoffrey Hinton's course on Neural Networks for Machine Learning. Because of that and the fact that there were some computer vision projects going on at work, I decided to dive right in.
Hinton's course is wonderful. He is funny and, unsurprisingly, very insightful about the core concepts in neural networks. One sign of this is that the course is not cluttered with too much mathematics, and can be traversed by someone with only a working knowledge of calculus. One of his most insightful moments in the course is when he describes the perceptron learning rule as simply as follows:
- If the perceptron makes no mistake, leave it alone.
- If it predicts a false negative, add the input vector to the weight vector
- If it predicts a false positive, subtract the input vector from the weight vector
This is so simple that a literal implementation of it can train a perceptron reasonably well (as we shall see). There are, of course, numerous heuristics required when applying it in production, but the training algorithm is just this simple. Now, the perceptron is popular because it is guaranteed to converge on linearly separable problems, i.e. if a binary classification problem is linearly separable in the feature space, the perceptron will always eventually learn to correctly classify the input samples. An algebraic or analytical proof of this can be found anywhere, but it almost always relies on the Cauchy-Schwarz inequality. I thought that since the learning rule is so simple, there must be a way to understand the convergence theorem using nothing more than the learning rule itself and some simple data visualization. I think I've found a reasonable explanation, which is what this post is broadly about. But first, let's see a simple demonstration of training a perceptron.
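The three-case rule translates almost literally into code. What follows is a minimal sketch rather than the implementation used in this post: the names `predict` and `train_perceptron`, the AND-gate toy data, the zero initial weights, and the convention of appending a constant 1 to every input so that the last weight acts as the bias are all assumptions made for illustration.

```python
def predict(weights, x):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def train_perceptron(samples, labels, max_epochs=100):
    """Apply the three-case learning rule until an epoch has no mistakes."""
    weights = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            y_hat = predict(weights, x)
            if y_hat == y:
                continue  # no mistake: leave the weights alone
            if y == 1:
                # false negative: add the input vector to the weight vector
                weights = [w + xi for w, xi in zip(weights, x)]
            else:
                # false positive: subtract the input vector from the weight vector
                weights = [w - xi for w, xi in zip(weights, x)]
            mistakes += 1
        if mistakes == 0:
            break  # every sample is classified correctly
    return weights

# Toy, linearly separable data: logical AND, with a trailing 1 for the bias.
samples = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
labels = [0, 0, 0, 1]

weights = train_perceptron(samples, labels)
predictions = [predict(weights, x) for x in samples]
print(weights, predictions)
```

Because the AND problem is linearly separable, the loop stops after a handful of epochs. On data that is not separable it would simply run until `max_epochs` is exhausted.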
Allen Downey makes a very good case for learning advanced mathematics through programming (check the first section of the preface of Think Bayes, titled "My theory, which is mine"). But before the reader can hit paydirt by using Bayes' theorem in programming, Downey makes you go through some elementary problems in probability, which have to be solved by hand first if you expect to have a clear enough understanding of the concept. I can vouch for this way of learning complex concepts. The way I learnt the backpropagation algorithm (and its derivation) was with a pen, paper and a calculator.
One of the most dangerous things that can affect any FOSS community is the tendency toward evangelism for the sake of evangelism. Promoting the Python stack, expanding the userbase, etc., should come only as a consequence of the content we produce as developers. If evangelism even remotely becomes one of your goals, your quality is sure to suffer. And it's not just the empirical evidence that prompts me to say this. It even makes logical sense. If we want to "promote" Python and related tech, our best market would be the young and inexperienced (and therefore non-opinionated) minds. But note that such audiences are also very fickle. They may not return for the next conference. And since they don't, we have to count on more fresh entries each year. And in the conference itself, since we're all acutely aware of the demographic, we spend too many talks pandering to this part of the audience.
Actually, I'll even go so far as to say that expanding the user base of a language is not the purpose of a PyCon. For that we've got activities going on all year. Come on! It's a three day event that happens once a year! If you're so concerned about evangelism, focus on the local chapter meetups. PyCon isn't the place to do it. It's a place for people to get together and exchange ideas. Teaching basics gets in the way of that like nothing else.
So, in short, focus on the quality. Users will follow.
I have recently moved from Pune to Delhi. I had spent only a year in Pune, having moved there from Mumbai, where I lived for three years. Whenever I move, the bulk of my luggage consists of books and clothes. My stay in Pune was just a transition, so I never bothered to unpack and store all my stuff too carefully. Thus, a corner of my bedroom in my Pune apartment always looked like this:
from IPython.display import Image
Image('images/before.jpg', height=300, width=400)
The actual volume of books that I carried from Pune to Delhi is about twice what is seen in the picture. My house in Delhi has six bookshelves, three in the living room and three in another room that I've made into a study. Naturally, I wanted to arrange my nearly 200 books in a manner that would be convenient to access, and such that the arrangement made some thematic sense. For example, I should not have to go to the living room to get one of my books on data analysis, and my partner shouldn't have to keep coming into the study to get one of her novels. Also, books that deal with similar subjects should naturally lie together. Obviously, arranging books semantically and such that they look good on the shelves isn't a data analysis problem as such, and even a haphazard, arbitrary arrangement of books would still work well for most practical purposes, but I still thought making it data driven would be a fun exercise.
My implementation of the Hilbert-Huang transform (PyHHT) is quite close to a beta release. After nearly three years of inactivity, I've found some time to develop the PyHHT library in the last few weeks. In all this time many people have written to me about a lot of things - from the inability to use the module because of the lack of documentation, to comparisons between the results of HHT and conventional time series analysis techniques. And now the time has come when I can no longer avoid writing the documentation. But as I write the documentation, I've found that I need to re-learn the concepts which I learned back when I started writing this module. It is out of this learning cycle that my last blog post came.
This blog post is the second in a series of posts in which I will discuss HHT in detail. (After all I've always thought that blogging is a lot more fun than writing documentation. This really doesn't bode well for my software development habits, but here I am.)
It has been a little over three years since I started working on a Python implementation of the Hilbert-Huang transform. When I first presented it at SciPy India 2011 (video) it was just a collection of small scripts, without packaging, testing or even docstrings. Over time PyHHT has garnered some interest, and over the last few weeks I have found the time to work on it regularly. I have been able to come up with a decent implementation of empirical mode decomposition, pretty much in line with everything described in the original paper by Huang et al. The other parts of HHT, like the Hilbert spectrum and its time-frequency representations, are right around the corner.