Scale the Python tools you love

Scale PyData libraries

Dask makes it easy to scale the Python libraries that you know and love like NumPy, pandas, and scikit-learn.
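As a rough illustration of what "the libraries you know" means in practice, here is a minimal sketch using `dask.array`, which mirrors much of the NumPy interface. The array size and chunk size are arbitrary values chosen for the example, not recommendations.

```python
import numpy as np
import dask.array as da

# A NumPy array, and the same data wrapped as a Dask array in 4 chunks.
x_np = np.arange(1_000_000, dtype="float64")
x_da = da.from_array(x_np, chunks=250_000)

# The expression reads like NumPy; Dask only builds a task graph here.
lazy_mean = (x_da ** 2).mean()

# .compute() executes the graph chunk by chunk and returns a concrete value.
dask_result = lazy_mean.compute()
numpy_result = (x_np ** 2).mean()
```

The two results should agree up to floating-point rounding; the difference is that the Dask version can work through the chunks in parallel.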

Scale any Python code

Parallelize any Python code with Dask Futures, which let you scale any function or for loop and give you fine-grained control over how work runs.
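A minimal sketch of the Futures interface might look like the following. The `square` and `run_squares` helpers are made up for this example, and `processes=False` is an assumption that keeps the scheduler and workers in the current process so the snippet runs without any cluster.

```python
from dask.distributed import Client


def square(n):
    """Toy function standing in for real work."""
    return n * n


def run_squares(values):
    # Start an in-process scheduler and thread-based workers for the demo.
    # dashboard_address=None turns off the diagnostics web server.
    with Client(processes=False, dashboard_address=None) as client:
        # client.map replaces a for loop: one future per input value.
        futures = client.map(square, values)
        # Futures can be passed straight into further tasks.
        total = client.submit(sum, futures)
        return client.gather(futures), total.result()


squares, total = run_squares(range(5))
```

Here `squares` holds the gathered per-item results and `total` is computed on the workers from the same futures, without pulling intermediate values back first.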

Deploy anywhere

Start on a laptop, but scale to a cluster, no matter what infrastructure you use. Dask deploys on Kubernetes, cloud, or HPC, and Dask libraries make it easy to use as much or as little compute as you need.
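The laptop end of that spectrum can be sketched with `LocalCluster`. The `count_workers` helper is hypothetical, and the worker counts are arbitrary. On other infrastructure a different cluster class would stand in for `LocalCluster`; that class typically comes from a separate deployment package, which is not shown here.

```python
from dask.distributed import Client, LocalCluster


def count_workers(n_workers=2):
    # LocalCluster starts a scheduler and workers on this machine.
    # processes=False keeps them in-process (an assumption for a quick demo).
    with LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        processes=False,
        dashboard_address=None,
    ) as cluster:
        # The Client only needs a cluster object (or scheduler address);
        # the code that submits work does not depend on where workers run.
        with Client(cluster) as client:
            client.wait_for_workers(n_workers)
            # nthreads() maps each worker address to its thread count.
            return len(client.nthreads())


workers_seen = count_workers(2)
```

The point of the sketch is that only the cluster construction changes between environments, while the `Client` usage stays the same.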

Powered by Dask

Dask is used throughout the PyData ecosystem and is integrated into many libraries, such as Xarray, Prefect, RAPIDS, and XGBoost.

What Dask Users Are Saying

"At Capital One, early implementations of Dask have reduced model training times by 91% within a few months of development effort."

Ryan McEntee, Capital One

“Dask also makes it easy to deploy distributed work locally using multiple Python processes in a way that is nearly identical to how full production load is distributed”

Hugues Demers, Grubhub

"To further accelerate our users’ ability to scale easily on the cloud, we expanded this by setting up pre-configured Horovod and Dask clusters."

Meenakshi Sharma, Wayfair

Why Choose Dask?

Python has grown to become the dominant language both in data analytics and general programming. This growth has been fueled by computational libraries like NumPy, pandas, and scikit-learn. However, these packages weren’t designed to scale beyond a single machine. Dask was developed to natively scale these packages and the surrounding ecosystem to multi-core machines and distributed clusters when datasets exceed memory.

Data professionals have many reasons to choose Dask.

Has a familiar Python API

Integrates natively with Python code to ensure consistency and minimize friction

Scales out to clusters

Scales to thousand-node clusters in-house, in the cloud, or on HPC systems, with no need for code rewrites

Scales down to single computers

Allows users to manipulate 100GB+ datasets on a laptop or 1TB+ datasets on a workstation

Supports complex applications

Enables advanced algorithms for statistics or machine learning, time series or local operations, or bespoke parallelism applications
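For workloads that do not fit an array or DataFrame shape, one option is `dask.delayed`, which turns ordinary function calls into a task graph. The `load`, `process`, and `combine` functions below are placeholders invented for this sketch.

```python
from dask import delayed


@delayed
def load(i):
    # Placeholder for reading one piece of input.
    return list(range(i))


@delayed
def process(chunk):
    # Placeholder for per-piece work.
    return sum(chunk)


@delayed
def combine(parts):
    # Placeholder for a final reduction over all pieces.
    return sum(parts)


# Calling delayed functions records tasks instead of running them.
graph = combine([process(load(i)) for i in range(1, 5)])

# Execute the graph on a local thread pool; independent branches run in parallel.
result = graph.compute(scheduler="threads")
```

Each `load`/`process` pair is independent of the others, so the scheduler is free to run them concurrently before the final `combine` step.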

Provides real-time visibility

Empowers users to quickly identify and resolve bugs and performance issues via an interactive dashboard

Supports GPU acceleration

Integrates with other frameworks for GPU-accelerated data analytics and machine learning

The Advantages of Using Dask

It’s easy to use

Requires no configuration or setup

It’s secure

Supports encryption and authentication using TLS/SSL certificates

It’s resilient

Handles worker node failures gracefully

It’s elastic

Takes advantage of new nodes added on the fly

It’s efficient

Operates with low overhead and low latency for high performance

It’s customizable

Can be used to build your own parallel computing system with custom business logic

It’s proven

Has been used by thousands of data professionals across the globe

Ecosystem

Many software projects integrate with Dask or use Dask to power components of their infrastructure. Dask scales or works alongside pandas, NumPy, scikit-learn, PyTorch, XGBoost, Xarray, Prefect, and RAPIDS, among many others.

The Dask Blog

Design Principles of Distributed Systems

Events

Future Event Details Coming Soon

The Dask Blog

High Level Query Optimization in Dask

Upstream testing in Dask

Do you need consistent environments between the client, scheduler and workers?

Deep Dive into creating a Dask DataFrame Collection with from_map
