Search results

Saturday, 5 September 2015

Simplify Machine Learning on Spark with Databricks 

 

As many data scientists and engineers can attest, the majority of their time is spent not on the models themselves but on the supporting infrastructure.  Key issues include the ability to easily visualize, share, deploy, and schedule jobs.  More disconcerting is the need for data engineers to re-implement the models developed by data scientists for production.  With Databricks, data scientists and engineers can simplify these....[More]

Scalable Collaborative Filtering with Spark MLlib 

 

Recommendation systems are among the most popular applications of machine learning. The idea is to predict whether a customer would like a certain item: a product, a movie, or a song. Scale is a key concern for recommendation systems, since computational complexity increases with the size of a company’s customer base. In this blog post, we discuss how Spark MLlib enables building recommendation...[More]
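
The post covers the details; as a rough, illustrative sketch of what such a pipeline can look like with MLlib's ALS (alternating least squares) implementation — the input path, rank, iteration count, and regularization below are placeholder values, not taken from the post:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object ALSExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ALSExample"))

    // Parse "userId::movieId::rating" lines into MLlib Rating objects
    // (the path and delimiter are assumptions for the sketch)
    val ratings = sc.textFile("hdfs:///data/ratings.dat").map { line =>
      val fields = line.split("::")
      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
    }

    // Train a matrix-factorization model: rank 10, 10 iterations, lambda 0.01
    val model = ALS.train(ratings, 10, 10, 0.01)

    // Recommend the top 5 movies for user 1
    model.recommendProducts(1, 5).foreach(println)

    sc.stop()
  }
}

Because the factorization is computed as a distributed Spark job, the same code scales from a laptop-sized sample to a full customer base simply by running on a larger cluster.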

Spark MLlib - Use Case


In this chapter, we will use MLlib to make personalized movie recommendations tailored for you. We will work with 10 million ratings from 72,000 users on 10,000 movies, collected..[More]
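
The chapter walks through the full exercise; as a hypothetical sketch of the kind of evaluation it builds up to, one can hold out part of the ratings and measure RMSE on the held-out set (the file path and model parameters below are placeholders, not the chapter's actual settings):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object MovieLensRMSE {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MovieLensRMSE"))

    // MovieLens-style lines: "userId::movieId::rating::timestamp"
    val ratings = sc.textFile("hdfs:///data/movielens/ratings.dat").map { line =>
      val f = line.split("::")
      Rating(f(0).toInt, f(1).toInt, f(2).toDouble)
    }

    // Hold out 20% of the ratings to measure prediction error
    val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = ALS.train(training, 10, 10, 0.01)

    // Predict the held-out ratings and compare against the true values
    val predictions = model
      .predict(test.map(r => (r.user, r.product)))
      .map(p => ((p.user, p.product), p.rating))
    val actualAndPredicted = test
      .map(r => ((r.user, r.product), r.rating))
      .join(predictions)
      .values

    val rmse = math.sqrt(actualAndPredicted.map { case (a, p) => (a - p) * (a - p) }.mean())
    println(s"Test RMSE = $rmse")

    sc.stop()
  }
}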

Apache Spark - MLlib Introduction

 

In one of our earlier posts, we mentioned that we use Scalding (among others) for writing MR jobs. Scala/Scalding simplifies the implementation of many MR patterns and makes it easy to implement quite complex jobs, such as machine learning algorithms. MapReduce is a mature and widely used framework, and it is a good choice for processing large amounts of data – but not as good a fit if you'd like to use it for fast, iterative algorithms/processing. This is a use case...[More]
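
Spark's in-memory caching is what makes it a better fit for iterative workloads. As a minimal, hypothetical illustration (not code from the post), a gradient-descent-style loop can reuse a cached RDD across iterations instead of rereading the data as a chain of MapReduce jobs would; the input path and step size are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object IterativeExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IterativeExample"))

    // Assumes each line is "x1,x2,label"; cache the parsed points once
    // so every iteration reuses them from memory
    val points = sc.textFile("hdfs:///data/points.txt")
      .map(_.split(",").map(_.toDouble))
      .map(a => (a.init, a.last))   // (features, label)
      .cache()

    var w = Array.fill(2)(0.0)      // weights for the two features
    for (_ <- 1 to 20) {
      // One gradient step per iteration, computed as a distributed map/reduce
      val gradient = points.map { case (x, y) =>
        val pred = x.zip(w).map { case (xi, wi) => xi * wi }.sum
        x.map(_ * (pred - y))
      }.reduce((g1, g2) => g1.zip(g2).map { case (a, b) => a + b })

      w = w.zip(gradient).map { case (wi, gi) => wi - 0.01 * gi }
    }

    println(w.mkString(", "))
    sc.stop()
  }
}

With MapReduce, each of those 20 iterations would typically be a separate job that rereads its input from disk; here the cached RDD keeps the data in memory across the whole loop.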