Fast on Machines
Dask is lightweight, and runs your raw code on your machines without getting in the way. No virtualization or compilers.
As the Python stack matures, your code matures with it. Today Dask is 50% faster than Spark on standard benchmarks.
Use Dask with pandas to intuitively process terabytes of tabular data.
It’s faster than Spark and easier too.
Dask dataframes use pandas under the hood, so your current code likely just works.
Use Dask to parallelize your own Python functions and scripts, no matter how complex.
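One common way to parallelize arbitrary Python functions is `dask.delayed`, which turns function calls into tasks in a graph. The sketch below uses three hypothetical functions, `load`, `process`, and `combine`, standing in for your own code. Their bodies contain nothing Dask-specific.

```python
from dask import delayed

# Hypothetical user functions; plain Python with no Dask imports needed inside.
def load(i):
    return list(range(i))

def process(xs):
    return sum(x * x for x in xs)

def combine(results):
    return sum(results)

# Wrapping calls in delayed() records them as tasks instead of running them now.
parts = [delayed(process)(delayed(load)(i)) for i in range(1, 5)]

# delayed() also resolves a list of delayed objects into a list of their results.
total = delayed(combine)(parts)

# compute() executes the graph; the four load/process chains are independent
# and can run in parallel.
result = total.compute()
print(result)  # 20
```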
Use Dask and NumPy/Xarray to churn through terabytes of multi-dimensional array data in formats like HDF, NetCDF, TIFF, or Zarr.
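For array data, `dask.array` mirrors the NumPy API over a grid of chunks. This minimal sketch uses an in-memory array of ones so that it runs anywhere. With real data you would typically load from a file format instead, for example `da.from_zarr` (which requires the `zarr` package). The array size and chunk shape are arbitrary choices for illustration.

```python
import numpy as np
import dask.array as da

# A 4,000 x 4,000 array split into 1,000 x 1,000 chunks (a 4 x 4 grid of blocks).
x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))

# NumPy-style expressions build a lazy graph with tasks per chunk.
y = (x + x.T).mean(axis=0)

# compute() evaluates chunk by chunk and returns an ordinary NumPy array.
result = y.compute()
print(result.shape, result[:3])
```

Because each chunk is processed independently, the full array never needs to fit in memory at once. That property lets the same code scale to larger-than-memory datasets.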
Run regular jobs on a schedule. Know that they’ll finish. Dask backs modern workflow managers like Prefect and Dagster.
Use Dask with common machine learning libraries to train or predict on large datasets, increasing model accuracy by using all of your data.
Computers are expensive. Humans are really expensive.
Fortunately, humans already know how to use Dask.
It’s just Python. It’s just pandas. It’s just NumPy.
Dask’s dashboard also guides you toward efficient computation, helping people quickly become experts in distributed computing.
Fast humans + Fast machines = Cheap Computing
Run Dask on your laptop (it’s trivial), or deploy it on any resource manager, from HPC job schedulers to Kubernetes to cloud SaaS services.
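As a sketch of the laptop case, the `dask.distributed` package can start a local cluster in a few lines. The worker counts below are arbitrary. `processes=False` is an assumption chosen here to keep the example lightweight: workers then run as threads in the current process. Other deployments (HPC, Kubernetes, cloud) swap in a different cluster object but connect a `Client` the same way.

```python
import dask.array as da
from dask.distributed import Client, LocalCluster

# Start a small local cluster; processes=False keeps workers in this process.
with LocalCluster(n_workers=2, threads_per_worker=2, processes=False) as cluster, \
        Client(cluster) as client:
    # Once a Client exists, Dask collections compute on the cluster's workers.
    x = da.arange(1_000_000, chunks=100_000)
    total = x.sum().compute()
    print(total)  # 499999500000
```

While the client is open, `client.dashboard_link` gives the URL of the dashboard described above (serving it requires the `bokeh` package).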
Run Dask in the cloud with an easy SaaS solution. Coiled is free for individuals with modest use and easy for anyone with a cloud account.