Saturday, 19 September 2015
Apache Spark 1.5 presented by Databricks co-founder Patrick Wendell
Spark 1.5 ships Spark's Project Tungsten initiative, a cross-cutting performance update that uses binary memory management and code generation to dramatically reduce the latency of most Spark jobs. This release also includes several updates to Spark's DataFrame API and SQL optimizer, along with new machine learning algorithms and feature transformers, and several new features in Spark's native streaming engine.
Spark DataFrames: Simple and Fast Analysis of Structured Data
This session will provide a technical overview of Spark’s DataFrame API. First, we’ll review the DataFrame API and show how to create DataFrames from a variety of data sources such as Hive, relational databases, or structured file formats like Avro. We’ll then give example user programs that operate on DataFrames and point out...[More]
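The select/filter pattern that excerpt describes can be illustrated with a minimal pure-Python sketch. This is an analogy, not actual Spark code (real programs would use pyspark.sql.SparkSession and DataFrame.filter/select); the rows, column names, and threshold below are hypothetical.

```python
# Pure-Python sketch of the DataFrame select/filter idiom.
# In real Spark, these would be DataFrame.filter and DataFrame.select.
rows = [
    {"name": "ads", "events": 120},
    {"name": "search", "events": 45},
    {"name": "feed", "events": 300},
]

def df_filter(rows, predicate):
    """Analogue of DataFrame.filter: keep rows matching a predicate."""
    return [r for r in rows if predicate(r)]

def df_select(rows, *cols):
    """Analogue of DataFrame.select: project a subset of columns."""
    return [{c: r[c] for c in cols} for r in rows]

busy = df_select(df_filter(rows, lambda r: r["events"] > 100), "name")
print(busy)  # [{'name': 'ads'}, {'name': 'feed'}]
```

The same chaining style carries over directly to the real API, where the optimizer can additionally push filters down into the data source.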
Saturday, 5 September 2015
New Features in Machine Learning Pipelines in Spark 1.4
Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark 1.4, significantly extends the ML library. In this post, we highlight several new features in the....[More]
ML Pipelines: A New High-Level API for MLlib
MLlib’s goal is to make practical machine learning (ML) scalable and easy. Besides new algorithms and performance improvements that we have seen in each release, a great deal of time and effort has been spent on making MLlib easy. Similar to Spark Core, MLlib provides APIs in three languages: Python, Java, and Scala, along with user guide and example code, to ease the learning curve for users...[More]
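The pipeline abstraction the two posts above describe is a chain of stages, each exposing fit/transform, applied in order. Here is a hedged pure-Python sketch of that pattern; the stage classes and data are hypothetical, and real code would use pyspark.ml.Pipeline with MLlib transformers and estimators.

```python
# Sketch of the ML Pipelines pattern: stages chained via fit/transform.
class Scaler:
    """Toy estimator: learns the max, then scales values into [0, 1]."""
    def fit(self, data):
        self.max = max(data) or 1.0
        return self
    def transform(self, data):
        return [x / self.max for x in data]

class Shifter:
    """Toy transformer: shifts values down by a constant."""
    def fit(self, data):
        return self
    def transform(self, data):
        return [x - 0.5 for x in data]

class Pipeline:
    """Runs each stage's fit then transform, feeding output to the next."""
    def __init__(self, stages):
        self.stages = stages
    def fit_transform(self, data):
        for stage in self.stages:
            data = stage.fit(data).transform(data)
        return data

out = Pipeline([Scaler(), Shifter()]).fit_transform([2.0, 4.0])
print(out)  # [0.0, 0.5]
```

Because every stage shares the same interface, stages can be swapped or reordered without changing the surrounding code, which is what makes pipeline tuning and inspection practical.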
Simplify Machine Learning on Spark with Databricks
As many data scientists and engineers can attest, the majority of their time is spent not on the models themselves but on the supporting infrastructure. Key issues include the ability to easily visualize, share, deploy, and schedule jobs. More disconcerting is the need for data engineers to re-implement the models developed by data scientists for production. With Databricks, data scientists and engineers can simplify these...[More]
Scalable Collaborative Filtering with Spark MLlib
Recommendation systems are among the most popular applications of machine learning. The idea is to predict whether a customer would like a certain item: a product, a movie, or a song. Scale is a key concern for recommendation systems, since computational complexity increases with the size of a company’s customer base. In this blog post, we discuss how Spark MLlib enables building recommendation .....[More]
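The latent-factor idea behind MLlib's recommender (ALS) can be sketched in a few lines: each user and item gets a small factor vector, and the predicted rating is their dot product. The factors below are hand-picked for illustration rather than learned, and all names are hypothetical; real code would use pyspark.mllib.recommendation.ALS to fit the factors from the ratings matrix.

```python
# Minimal latent-factor sketch of the collaborative-filtering model
# ALS fits: predicted rating = dot(user_factors, item_factors).
user_factors = {"alice": [1.0, 0.2], "bob": [0.1, 1.0]}
item_factors = {"matrix": [0.9, 0.1], "titanic": [0.2, 0.8]}

def predict(user, item):
    """Predicted rating as the dot product of the two factor vectors."""
    u, v = user_factors[user], item_factors[item]
    return sum(a * b for a, b in zip(u, v))

def recommend(user):
    """Rank all items for a user by predicted rating, best first."""
    return sorted(item_factors, key=lambda i: predict(user, i), reverse=True)

print(recommend("alice"))  # ['matrix', 'titanic']
```

Scale enters through the factorization itself: instead of scoring every user-item pair directly, ALS alternates least-squares solves over the (much smaller) factor matrices, which is what makes the approach tractable on a large customer base.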
Spark MLlib - Use Case
In this chapter, we will use MLlib to make personalized movie recommendations tailored for you. We will work with 10 million ratings from 72,000 users on 10,000 movies, collected..[More]
Apache Spark - MLlib Introduction
In one of our earlier posts we have mentioned that we use Scalding (among others) for writing MR jobs. Scala/Scalding simplifies the implementation of many MR patterns and makes it easy to implement quite complex jobs like machine learning algorithms. Map Reduce is a mature and widely used framework and it is a good choice for processing large amounts of data – but not as great if you’d like to use it for fast iterative algorithms/processing. This is a use case...[More]
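The MapReduce pattern the post contrasts with iterative processing can be sketched in pure Python: a map phase emitting (key, value) pairs, a shuffle grouping by key, and a reduce phase aggregating each group. The input text is hypothetical; this is the classic word-count shape, not Scalding or Spark code.

```python
# Pure-Python sketch of the MapReduce word-count pattern:
# map -> shuffle (group by key) -> reduce.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["spark spark scala", "scala"])))
print(counts)  # {'spark': 2, 'scala': 2}
```

The pain point the post raises is that an iterative algorithm must run this whole map-shuffle-reduce cycle (including disk I/O between jobs) once per iteration, which is exactly the overhead Spark's in-memory model avoids.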
Friday, 4 September 2015
Demo: Apache Spark on MapR with MLlib
Editor's Note: In this demo we are using Spark and PySpark to process and analyze the data set, calculate aggregate statistics about the user base in a PySpark script, persist all of that back into MapR-DB for use in Spark and Tableau, and finally use MLlib to build ...[more]
Big Data Ecosystem – Spark and Tableau
In this article we give you the big picture of how Big Data fits in your actual BI architecture and how to connect Tableau to Spark to enrich your current BI reports and dashboards with data ...Read More....
Realtime Advanced Analytics: Spark Streaming+Kafka, MLlib/GraphX, SQL/DataFrames
We'll present a real-world, open source, advanced analytics and machine learning pipeline using all 15 open source technologies listed below. This Meetup is based on my recent Top-5 Hadoop Summit/Data Science talk called "Spark After Dark". Spark After Dark is a mock online dating site...[Read More]
Tips to Create Proposal
If you're in the services or consulting business, you know all about RFPs: Requests for Proposal are how many professional agencies win new work. NMC receives a lot of them from organizations around the world wanting either to upgrade their existing web presence or start from scratch with a new one. Some of them are clear, detailed, and provide the right kind of information to help us quickly write a great proposal. Others, not so much! Keeping up with web technologies that change daily is a full-time job, which is probably why they're looking for.....
Spark Streaming
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, Kinesis, or plain old TCP sockets, and be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning algorithms, ......
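The windowed computation mentioned above can be sketched without Spark: treat the stream as a sequence of micro-batches and count events over a sliding window of the most recent batches. Batch contents and the window length are hypothetical; real code would use DStream.window() or countByWindow() on a StreamingContext.

```python
# Pure-Python sketch of a sliding-window count over micro-batches,
# the pattern Spark Streaming's window operations provide.
from collections import deque

def windowed_counts(batches, window_len):
    """Yield the total event count over the last `window_len` batches."""
    window = deque(maxlen=window_len)  # old batches fall off automatically
    for batch in batches:
        window.append(len(batch))
        yield sum(window)

batches = [["a", "b"], ["c"], ["d", "e", "f"]]
print(list(windowed_counts(batches, window_len=2)))  # [2, 3, 4]
```

In Spark the same idea is expressed declaratively (a window duration and slide interval over a DStream), and the engine handles batching, fault tolerance, and distribution.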