Fast on Machines
Dask is lightweight and runs your raw code on your machines without getting in the way. No virtualization or compilers.
As the Python stack matures, your code matures with it. Today Dask is 50% faster than Spark on standard benchmarks.
Dask DataFrames use pandas under the hood, so your current code likely just works. It’s faster than Spark and easier too.
Parallelize your Python code, no matter how complex. Dask is flexible and supports arbitrary dependencies and fine-grained task scheduling.
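One way to express arbitrary dependencies is `dask.delayed`, sketched below. The functions `inc` and `add` are hypothetical stand-ins for your own code; the point is only that Dask builds a task graph from ordinary function calls and runs independent tasks in parallel.

```python
import dask


@dask.delayed
def inc(x):
    # Stand-in for any independent unit of work.
    return x + 1


@dask.delayed
def add(x, y):
    # Depends on the results of two earlier tasks.
    return x + y


# Calling the functions only records tasks; no work happens yet.
a = inc(1)
b = inc(2)
total = add(a, b)

# compute() executes the graph; inc(1) and inc(2) can run in parallel.
value = total.compute()
print(value)
```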
Use Dask and NumPy/Xarray to churn through terabytes of multi-dimensional array data in formats like HDF, NetCDF, TIFF, or Zarr.
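The chunked-array idea can be sketched as follows. The array here is generated in memory and its shape and chunk size are arbitrary choices for the example; in practice the data would be loaded lazily from a store such as Zarr or HDF.

```python
import dask.array as da

# A 1000 x 1000 array of ones, split into 16 chunks of 250 x 250.
# Each chunk is a plain NumPy array.
x = da.ones((1000, 1000), chunks=(250, 250))

# NumPy-style expressions build a lazy task graph over the chunks.
y = x + x.T

# compute() runs the chunk-wise work and returns a concrete result.
total = y.sum().compute()
print(x.numblocks, total)
```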
Use Dask with common machine learning libraries to train or predict on large datasets, increasing model accuracy by using all of your data.
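One possible pattern, shown here as a sketch rather than a recommended recipe, is to fit a model on a sample that fits in memory and then predict block by block over a larger Dask array. scikit-learn's `LinearRegression`, the synthetic data, and the chunk sizes are all assumptions made for the example.

```python
import numpy as np
import dask.array as da
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Fit an ordinary scikit-learn model on a small in-memory sample (synthetic data).
X_train = rng.random((100, 3))
y_train = X_train @ np.array([1.0, 2.0, 3.0])
model = LinearRegression().fit(X_train, y_train)

# A larger feature matrix, split into row-wise chunks of 2,500 rows.
X_big_np = rng.random((10_000, 3))
X_big = da.from_array(X_big_np, chunks=(2_500, 3))

# Apply model.predict to each chunk; drop_axis=1 because each
# (rows, 3) block maps to a 1-D vector of predictions.
preds = X_big.map_blocks(model.predict, dtype="float64", drop_axis=1)
result = preds.compute()
print(result.shape)
```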
Computers are cheap. Humans are expensive.
Fortunately, humans already know how to use Dask.
It’s just Python. It’s just pandas. It’s just NumPy.
Dask’s dashboard guides you towards efficiency, quickly teaching you to become a distributed computing expert.
Fast humans + Fast machines = Cheap Computing
Run Dask on your laptop (it’s trivial) or deploy it on any resource manager like Kubernetes, an HPC job scheduler, a cloud SaaS service, or even a legacy Hadoop/Spark cluster.
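For the laptop case, a minimal sketch with `dask.distributed` might look like this. The worker count, the thread-based workers (`processes=False`), the randomly chosen dashboard port, and the helper names `square` and `run_on_local_cluster` are assumptions made for the example.

```python
from dask.distributed import Client, LocalCluster


def square(x):
    # Stand-in for real work.
    return x ** 2


def run_on_local_cluster():
    # Start a scheduler and two thread-based workers inside this process.
    # dashboard_address=":0" asks for any free port for the dashboard.
    with LocalCluster(
        n_workers=2,
        threads_per_worker=1,
        processes=False,
        dashboard_address=":0",
    ) as cluster:
        with Client(cluster) as client:
            # Submit one task per input and collect the results in order.
            futures = client.map(square, range(5))
            return client.gather(futures)


if __name__ == "__main__":
    print(run_on_local_cluster())
```

The same `Client` code is meant to carry over when the cluster object comes from a different deployment backend, which is what makes moving off the laptop straightforward.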
Run Dask in the cloud with open source Kubernetes, or with an easy SaaS solution. Coiled is free for individuals with modest use and easy for anyone with a cloud account.