Find me on GitHub.



“They aren’t quite pandas. Manatees, the pandas of the sea.”

Manatee is a wrapper class around PySpark DataFrames. It adds some much-needed user-friendliness by providing helper methods on top of the pyspark.sql.dataframe.DataFrame object. It can also pair the DataFrame with a pyspark.mllib classification or regression model, neatly keeping everything in one place.
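To give a rough idea of the wrapper pattern described above, here is a minimal sketch. It is not Manatee's actual code or API: `FrameWrapper`, `n_rows`, and `FakeFrame` are hypothetical names made up for illustration, and `FakeFrame` is only a stand-in so the example runs without a Spark installation. The only DataFrame methods it imitates are `count()` and `limit()`, which do exist on PySpark DataFrames.

```python
class FrameWrapper:
    """Hypothetical wrapper: holds a DataFrame-like object and an optional model."""

    def __init__(self, df, model=None):
        self.df = df          # the wrapped DataFrame (or a stand-in)
        self.model = model    # optional model kept alongside the data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so the wrapper's
        # own attributes and helper methods always take precedence.
        df = self.__dict__.get("df")
        if df is None:
            raise AttributeError(name)
        attr = getattr(df, name)
        if not callable(attr):
            return attr

        def forwarded(*args, **kwargs):
            result = attr(*args, **kwargs)
            # Re-wrap results of the same type so chained calls keep the model.
            if isinstance(result, type(df)):
                return FrameWrapper(result, self.model)
            return result

        return forwarded

    def n_rows(self):
        """Hypothetical helper method layered on top of the wrapped frame."""
        return self.df.count()


class FakeFrame:
    """Stand-in for a PySpark DataFrame, used only to make the sketch runnable."""

    def __init__(self, rows):
        self.rows = list(rows)

    def count(self):
        return len(self.rows)

    def limit(self, n):
        return FakeFrame(self.rows[:n])


wrapped = FrameWrapper(FakeFrame([1, 2, 3]), model="placeholder-model")
first_two = wrapped.limit(2)   # forwarded to the frame, result is re-wrapped
```

With a real Spark session, the same idea would apply by passing a PySpark DataFrame instead of `FakeFrame`. The point of delegating through `__getattr__` is that every existing DataFrame method stays available while helpers and the paired model live on the wrapper.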

This project is in pre-alpha. Check out the documentation.


This project was last updated on 2016-06-17.


Many thanks to Alexey Svyatkovskiy for his help getting me started with Spark and for his support with my numerous questions!