Logistic regression is a technique for classifying a data point x (a vector) into one of K classes. It works by (essentially) projecting the data points onto a set of (pre-specified) features (which are simply vectors formed of functions of the data points’ components), and then finding linear separating (hyper-)planes in…

# Hadoop Part IV: Online Parameter Estimation

For the final (for now) part of this series, I am going to extend the particle filter to perform online parameter estimation, using online Expectation-Maximization (EM) to estimate the autoregression parameter at each stage of the particle filter. There are many options for online parameter estimation, including…

# Hadoop Part III: Multiple Output in Hadoop

The output of the Hadoop MapReduce particle filter from the previous post is simply a list of doubles giving the state of each particle after resampling. This is not ideal, because the post-resampling particle collection is a cruder representation of the post-observation state posterior than the pre-resampling, weighted collection. Obviously…

# Hadoop Part II: Particle Filters in Hadoop MapReduce

In this article, I’m going to implement a basic particle filter in Hadoop MapReduce. This is really just a personal interest project to get me started learning Hadoop, based on an algorithm that I am familiar with and that suits MapReduce (to some extent), but this might have…