Python API (advanced)

In some rare cases, experts may want to create Scheduler, Worker, and Nanny objects explicitly in Python. This is often necessary when making tools to automatically deploy Dask in custom settings.

It is more common to create a Local cluster with Client() on a single machine or use the Command Line Interface (CLI). New readers are recommended to start there.

If you do want to start Scheduler and Worker objects yourself, you should be somewhat familiar with async/await style Python syntax. These objects are awaitable and are commonly used within async with context managers. The examples below show a few ways to start and shut things down.

Full Example

Scheduler([center, loop, delete_interval, …]) Dynamic distributed task scheduler
Worker([scheduler_ip, scheduler_port, …]) Worker node in a Dask distributed cluster
Client([address, loop, timeout, …]) Connect to and drive computation on a distributed Dask cluster

We first start with a comprehensive example of setting up a Scheduler, two Workers, and one Client in the same event loop, running a simple computation, and then cleaning everything up.

import asyncio
from dask.distributed import Scheduler, Worker, Client

async def f():
    async with Scheduler() as s:
        async with Worker(s.address) as w1, Worker(s.address) as w2:
            async with Client(s.address, asynchronous=True) as client:
                future = client.submit(lambda x: x + 1, 10)
                result = await future
                print(result)

asyncio.get_event_loop().run_until_complete(f())

Now we look at simpler examples that build up to this case.

Scheduler

Scheduler([center, loop, delete_interval, …]) Dynamic distributed task scheduler

We create a scheduler by creating a Scheduler() object, and then await that object to wait for it to start up. We can then await the .finished() method to wait until it closes. In the meantime the scheduler will be active, managing the cluster.

import asyncio
from dask.distributed import Scheduler, Worker

async def f():
    s = Scheduler()        # scheduler created, but not yet running
    s = await s            # the scheduler is running
    await s.finished()     # wait until the scheduler closes

asyncio.get_event_loop().run_until_complete(f())

This program will run forever, or until some external process connects to the scheduler and tells it to stop. If you want to close things yourself you can close any Scheduler, Worker, Nanny, or Client object by awaiting its .close() method:

await s.close()
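
For example, here is a minimal sketch that starts a scheduler, lets it run briefly, and then closes it from the same coroutine (the one-second pause is purely illustrative):

import asyncio
from dask.distributed import Scheduler

async def f():
    s = await Scheduler()    # start the scheduler
    await asyncio.sleep(1)   # do other work while the scheduler runs
    await s.close()          # shut the scheduler down cleanly

asyncio.get_event_loop().run_until_complete(f())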

Worker

Worker([scheduler_ip, scheduler_port, …]) Worker node in a Dask distributed cluster

The worker follows the same API. The only difference is that the worker needs to know the address of the scheduler.

import asyncio
from dask.distributed import Scheduler, Worker

async def f(scheduler_address):
    w = await Worker(scheduler_address)
    await w.finished()

asyncio.get_event_loop().run_until_complete(f("tcp://127.0.0.1:8786"))

Start many in one event loop

Scheduler([center, loop, delete_interval, …]) Dynamic distributed task scheduler
Worker([scheduler_ip, scheduler_port, …]) Worker node in a Dask distributed cluster

We can run as many of these objects as we like in the same event loop.

import asyncio
from dask.distributed import Scheduler, Worker

async def f():
    s = await Scheduler()
    w = await Worker(s.address)
    await w.finished()
    await s.finished()

asyncio.get_event_loop().run_until_complete(f())
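
If you want several workers you can also start them concurrently. Here is a hedged sketch that relies on asyncio.gather to await the worker objects together:

import asyncio
from dask.distributed import Scheduler, Worker

async def f(n=3):
    s = await Scheduler()
    # Each Worker object is awaitable, so we can start n of them concurrently
    workers = await asyncio.gather(*[Worker(s.address) for _ in range(n)])
    await asyncio.gather(*[w.finished() for w in workers])
    await s.finished()

asyncio.get_event_loop().run_until_complete(f())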

Use Context Managers

We can also use async with context managers to make sure that we clean up properly. Here is the same example as from above:

import asyncio
from dask.distributed import Scheduler, Worker

async def f():
    async with Scheduler() as s:
        async with Worker(s.address) as w:
            await w.finished()
            await s.finished()

asyncio.get_event_loop().run_until_complete(f())

Alternatively, in the example below we also include a Client, run a small computation, and then let everything clean up after that computation.

import asyncio
from dask.distributed import Scheduler, Worker, Client

async def f():
    async with Scheduler() as s:
        async with Worker(s.address) as w1, Worker(s.address) as w2:
            async with Client(s.address, asynchronous=True) as client:
                future = client.submit(lambda x: x + 1, 10)
                result = await future
                print(result)

asyncio.get_event_loop().run_until_complete(f())

This is equivalent to creating and awaiting each server, and then calling .close on each as we leave the context. In this example we don’t wait on s.finished(), so this will terminate relatively quickly. You could have called await s.finished() though if you wanted this to run forever.

Nanny

Nanny([scheduler_ip, scheduler_port, …]) A process to manage worker processes

Alternatively, we can replace Worker with Nanny if we want our workers to be managed in a separate process. The Nanny constructor follows the same API. This allows workers to restart themselves in case of failure. It also provides some additional monitoring, and is useful when coordinating many workers that should live in different processes in order to avoid the GIL.

# w = await Worker(s.address)
w = await Nanny(s.address)
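
For example, here is a hedged variant of the comprehensive example from above with the workers replaced by a Nanny; the computation itself is only illustrative:

import asyncio
from dask.distributed import Scheduler, Nanny, Client

async def f():
    async with Scheduler() as s:
        async with Nanny(s.address) as n:   # worker runs in a separate process
            async with Client(s.address, asynchronous=True) as client:
                future = client.submit(lambda x: x + 1, 10)
                result = await future
                print(result)

asyncio.get_event_loop().run_until_complete(f())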

API

These classes have a variety of keyword arguments that you can use to control their behavior. See the API documentation below for more information.

Scheduler

class distributed.Scheduler(center=None, loop=None, delete_interval='500ms', synchronize_worker_interval='60s', services=None, allowed_failures=3, extensions=None, validate=False, scheduler_file=None, security=None, worker_ttl=None, **kwargs)

Dynamic distributed task scheduler

The scheduler tracks the current state of workers, data, and computations. The scheduler listens for events and responds by controlling workers appropriately. It continuously tries to use the workers to execute an ever-growing dask graph.

All events are handled quickly, in linear time with respect to their input (which is often of constant size) and generally within a millisecond. To accomplish this the scheduler tracks a lot of state. Every operation maintains the consistency of this state.

The scheduler communicates with the outside world through Comm objects. It maintains a consistent and valid view of the world even when listening to several clients at once.

A Scheduler is typically started either with the dask-scheduler executable:

$ dask-scheduler
Scheduler started at 127.0.0.1:8786

Or it is started automatically as part of a LocalCluster when a Client is created without connection information:

>>> c = Client()  
>>> c.cluster.scheduler  
Scheduler(...)

Users typically do not interact with the scheduler directly but rather through the Client object.

State

The scheduler contains the following state variables. Each variable is listed along with what it stores and a brief description; a short sketch of inspecting a few of them follows the list.

  • tasks: {task key: TaskState}
    Tasks currently known to the scheduler
  • unrunnable: {TaskState}
    Tasks in the “no-worker” state
  • workers: {worker key: WorkerState}
    Workers currently connected to the scheduler
  • idle: {WorkerState}:
    Set of workers that are not fully utilized
  • saturated: {WorkerState}:
    Set of workers that are over-utilized
  • host_info: {hostname: dict}:
    Information about each worker host
  • clients: {client key: ClientState}
    Clients currently connected to the scheduler
  • services: {str: port}:
    Other services running on this scheduler, like Bokeh
  • loop: IOLoop:
    The running Tornado IOLoop
  • client_comms: {client key: Comm}
    For each client, a Comm object used to receive task requests and report task status updates.
  • stream_comms: {worker key: Comm}
    For each worker, a Comm object from which we both accept stimuli and report results
  • task_duration: {key-prefix: time}
    Time we expect certain functions to take, e.g. {'sum': 0.25}
  • coroutines: [Futures]:
    A list of active futures that control operation
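
As a small, hedged sketch (assuming a running Scheduler object s created as in the examples above), a few of these collections can be inspected directly:

def describe(s):
    # s.workers maps worker address -> WorkerState
    print(len(s.workers), "workers connected")
    # s.tasks maps task key -> TaskState
    print(len(s.tasks), "tasks known to the scheduler")
    for address, ws in s.workers.items():
        print(address, "idle" if ws in s.idle else "busy")
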
add_client(comm, client=None)

Add client to network

We listen to all future messages from this Comm.

add_keys(comm=None, worker=None, keys=())

Learn that a worker has certain keys

This should not be used in practice and is mostly here for legacy reasons. However, it is sent by workers from time to time.

add_plugin(plugin=None, idempotent=False, **kwargs)

Add external plugin to scheduler

See https://distributed.readthedocs.io/en/latest/plugins.html
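
As a hedged sketch of how a plugin might be registered (CountTransitions here is a hypothetical example class, not part of the library; s is an already-running Scheduler as in the earlier examples):

from distributed.diagnostics.plugin import SchedulerPlugin

class CountTransitions(SchedulerPlugin):
    """Hypothetical plugin that counts task state transitions"""
    def __init__(self):
        self.count = 0

    def transition(self, key, start, finish, *args, **kwargs):
        self.count += 1

s.add_plugin(CountTransitions())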

add_worker(comm=None, address=None, keys=(), ncores=None, name=None, resolve_address=True, nbytes=None, now=None, resources=None, host_info=None, memory_limit=None, metrics=None, pid=0, services=None, local_directory=None)

Add a new worker to the cluster

broadcast(comm=None, msg=None, workers=None, hosts=None, nanny=False, serializers=None)

Broadcast message to workers, return all results

cancel_key(key, client, retries=5, force=False)

Cancel a particular key and all dependents

client_heartbeat(client=None)

Handle heartbeats from Client

client_releases_keys(keys=None, client=None)

Remove keys from client desired list

close(comm=None, fast=False)

Send cleanup signal to all coroutines then wait until finished

See also

Scheduler.cleanup

close_worker(stream=None, worker=None, safe=None)

Remove a worker from the cluster

This both removes the worker from our local state and also sends a signal to the worker to shut down. This works regardless of whether or not the worker has a nanny process restarting it

coerce_address(addr, resolve=True)

Coerce possible input addresses to canonical form. resolve can be disabled for testing with fake hostnames.

Handles strings, tuples, or aliases.

coerce_hostname(host)

Coerce the hostname of a worker.

decide_worker(ts)

Decide on a worker for task ts. Return a WorkerState.

feed(comm, function=None, setup=None, teardown=None, interval='1s', **kwargs)

Provides a data Comm to external requester

Caution: this runs arbitrary Python code on the scheduler. This should eventually be phased out. It is mostly used by diagnostics.

finished()

Wait until all coroutines have ceased

gather(comm=None, keys=None, serializers=None)

Collect data in from workers

get_comm_cost(ts, ws)

Get the estimated communication cost (in s.) to compute the task on the given worker.

get_task_duration(ts, default=0.5)

Get the estimated computation cost of the given task (not including any communication cost).

get_worker_service_addr(worker, service_name, protocol=False)

Get the (host, port) address of the named service on the worker. Returns None if the service doesn’t exist.

Parameters:

worker : address

service_name : str

Common services include ‘bokeh’ and ‘nanny’

protocol : boolean

Whether or not to include a full address with protocol (True) or just a (host, port) pair

handle_long_running(key=None, worker=None, compute_duration=None)

A task has seceded from the thread pool

We stop the task from being stolen in the future, and change task duration accounting as if the task has stopped.

handle_worker(comm=None, worker=None)

Listen to responses from a single worker

This is the main loop for scheduler-worker interaction

See also

Scheduler.handle_client
Equivalent coroutine for clients

identity(comm=None)

Basic information about ourselves and our cluster

proxy(comm=None, msg=None, worker=None, serializers=None)

Proxy a communication through the scheduler to some other worker

rebalance(comm=None, keys=None, workers=None)

Rebalance keys so that each worker stores roughly equal bytes

Policy

This orders the workers by what fraction of bytes of the existing keys they have. It walks down this list from most-to-least. At each worker it finds the largest results it can and sends them to the least occupied worker, until either the sender or the recipient is at the average expected load.
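
In practice this is usually triggered through the client rather than by calling the scheduler method directly. A hedged usage sketch, assuming an asynchronous Client named client as in the earlier examples:

# Rebalance all data currently held in memory across the workers
await client.rebalance()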

reevaluate_occupancy(worker_index=0)

Periodically reassess task duration time

The expected duration of a task can change over time. Unfortunately we don’t have a good constant-time way to propagate the effects of these changes out to the summaries that they affect, like the total expected runtime of each of the workers, or what tasks are stealable.

In this coroutine we walk through all of the workers and re-align their estimates with the current state of tasks. We do this periodically rather than at every transition, and we only do it if the scheduler process isn’t under load (using psutil.Process.cpu_percent()). This lets us avoid this fringe optimization when we have better things to think about.

register_worker_callbacks(comm, setup=None)

Register a setup function and call it on every worker

remove_client(client=None)

Remove client from network

remove_plugin(plugin)

Remove external plugin from scheduler

remove_worker(comm=None, address=None, safe=False, close=True)

Remove worker from cluster

We do this when a worker reports that it plans to leave or when it appears to be unresponsive. This may send its tasks back to a released state.

replicate(comm=None, keys=None, n=None, workers=None, branching_factor=2, delete=True)

Replicate data throughout cluster

This performs a tree copy of the data throughout the network individually on each piece of data.

Parameters:

keys: Iterable

list of keys to replicate

n: int

Number of replications we expect to see within the cluster

branching_factor: int, optional

The number of workers that can copy data in each generation. The larger the branching factor, the more data we copy in a single step, but the more a given worker risks being swamped by data requests.
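
A hedged usage sketch through the client (futures here stands in for whatever futures you want copied):

# Keep two copies of each piece of data somewhere on the cluster
await client.replicate(futures, n=2)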

report(msg, ts=None, client=None)

Publish updates to all listening Queues and Comms

If the message contains a key then we only send the message to those comms that care about the key.

reschedule(key=None, worker=None)

Reschedule a task

Things may have shifted and this task may now be better suited to run elsewhere

restart(client=None, timeout=3)

Restart all workers. Reset local state.

retire_workers(comm=None, workers=None, remove=True, close_workers=False, **kwargs)

Gracefully retire workers from cluster

Parameters:

workers: list (optional)

List of worker IDs to retire. If not provided we call workers_to_close which finds a good set

remove: bool (defaults to True)

Whether or not to remove the worker metadata immediately or else wait for the worker to contact us

close_workers: bool (defaults to False)

Whether or not to actually close the worker explicitly from here. Otherwise we expect some external job scheduler to finish off the worker.

**kwargs: dict

Extra options to pass to workers_to_close to determine which workers we should drop

Returns:

Dictionary mapping worker ID/address to a dictionary of information about that worker, for each retired worker.
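
A hedged usage sketch (the worker addresses are illustrative):

# Retire two specific workers and close them explicitly
info = await s.retire_workers(
    workers=["tcp://192.168.0.1:1234", "tcp://192.168.0.2:1234"],
    close_workers=True,
)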

run_function(stream, function, args=(), kwargs={})

Run a function within this process

scatter(comm=None, data=None, workers=None, client=None, broadcast=False, timeout=2)

Send data out to workers

send_task_to_worker(worker, key)

Send a single computational task to a worker

start(addr_or_port=8786, start_queues=True)

Clear out old state and restart all running coroutines

start_ipython(comm=None)

Start an IPython kernel

Returns Jupyter connection info dictionary.

stimulus_cancel(comm, keys=None, client=None, force=False)

Stop execution on a list of keys

stimulus_missing_data(cause=None, key=None, worker=None, ensure=True, **kwargs)

Mark that certain keys have gone missing. Recover.

stimulus_task_erred(key=None, worker=None, exception=None, traceback=None, **kwargs)

Mark that a task has erred on a particular worker

stimulus_task_finished(key=None, worker=None, **kwargs)

Mark that a task has finished execution on a particular worker

story(*keys)

Get all transitions that touch one of the input keys

transition(key, finish, *args, **kwargs)

Transition a key from its current state to the finish state

Returns: Dictionary of recommendations for future transitions

See also

Scheduler.transitions
transitive version of this function

Examples

>>> self.transition('x', 'waiting')
{'x': 'processing'}

transition_story(*keys)

Get all transitions that touch one of the input keys

transitions(recommendations)

Process transitions until none are left

This includes feedback from previous transitions and continues until we reach a steady state

update_data(comm=None, who_has=None, nbytes=None, client=None, serializers=None)

Learn that new data has entered the network from an external source

See also

Scheduler.mark_key_in_memory

update_graph(client=None, tasks=None, keys=None, dependencies=None, restrictions=None, priority=None, loose_restrictions=None, resources=None, submitting_task=None, retries=None, user_priority=0, actors=None, fifo_timeout=0)

Add new computations to the internal dask graph

This happens whenever the Client calls submit, map, get, or compute.

valid_workers(ts)

Return set of currently valid workers for key

If all workers are valid then this returns True. This checks the following state:

  • worker_restrictions
  • host_restrictions
  • resource_restrictions

worker_objective(ts, ws)

Objective function to determine which worker should get the task

Minimize expected start time. Break ties by the amount of data stored on each worker.

worker_send(worker, msg)

Send message to worker

This also handles connection failures by adding a callback to remove the worker on the next cycle.

workers_list(workers)

List of qualifying workers

Takes a list of worker addresses or hostnames. Returns a list of all worker addresses that match

workers_to_close(memory_ratio=None, n=None, key=None, minimum=None)

Find workers that we can close with low cost

This returns a list of workers that are good candidates to retire. These workers are not running anything and are storing relatively little data compared to their peers. If all workers are idle then we still keep enough workers to hold all of our data in RAM, with a comfortable buffer.

This is for use with systems like distributed.deploy.adaptive.

Parameters:

memory_ratio: Number

Amount of extra space we want to have for our stored data. Defaults to 2, meaning we want to have twice as much memory as we currently have data.

n: int

Number of workers to close

minimum: int

Minimum number of workers to keep around

key: Callable(WorkerState)

An optional callable mapping a WorkerState object to a group affiliation. Groups will be closed together. This is useful when closing workers must be done collectively, such as by hostname.

Returns:

to_close: list of worker addresses that are OK to close

Examples

>>> scheduler.workers_to_close()
['tcp://192.168.0.1:1234', 'tcp://192.168.0.2:1234']

Group workers by hostname prior to closing

>>> scheduler.workers_to_close(key=lambda ws: ws.host)
['tcp://192.168.0.1:1234', 'tcp://192.168.0.1:4567']

Remove two workers

>>> scheduler.workers_to_close(n=2)

Keep enough workers to have twice as much memory as we need.

>>> scheduler.workers_to_close(memory_ratio=2)

Worker

class distributed.Worker(scheduler_ip=None, scheduler_port=None, scheduler_file=None, ncores=None, loop=None, local_dir='dask-worker-space', services=None, service_ports=None, name=None, reconnect=True, memory_limit='auto', executor=None, resources=None, silence_logs=None, death_timeout=None, preload=None, preload_argv=None, security=None, contact_address=None, memory_monitor_interval='200ms', extensions=None, metrics=None, **kwargs)

Worker node in a Dask distributed cluster

Workers perform two functions:

  1. Serve data from a local dictionary
  2. Perform computation on that data and on data from peers

Workers keep the scheduler informed of their data and use that scheduler to gather data from other workers when necessary to perform a computation.

You can start a worker with the dask-worker command line application:

$ dask-worker scheduler-ip:port

Use the --help flag to see more options

$ dask-worker --help

The rest of this docstring is about the internal state that the worker uses to manage and track ongoing computations.

State

Informational State

These attributes don’t change significantly during execution.

  • ncores: int:
    Number of cores used by this worker process
  • executor: concurrent.futures.ThreadPoolExecutor:
    Executor used to perform computation
  • local_dir: path:
    Path on local machine to store temporary files
  • scheduler: rpc:
    Location of scheduler. See .ip/.port attributes.
  • name: string:
    Alias
  • services: {str: Server}:
    Auxiliary web servers running on this worker
  • service_ports: {str: port}:
  • total_out_connections: int
    The maximum number of concurrent outgoing requests for data
  • total_in_connections: int
    The maximum number of concurrent incoming requests for data
  • total_comm_nbytes: int
  • batched_stream: BatchedSend
    A batched stream along which we communicate to the scheduler
  • log: [(message)]
    A structured and queryable log. See Worker.story

Volatile State

These attributes track the progress of tasks that this worker is trying to complete. In the descriptions below a key is the name of a task that we want to compute and dep is the name of a piece of dependent data that we want to collect from others.

  • data: {key: object}:
    Dictionary mapping keys to actual values
  • task_state: {key: string}:
    The state of all tasks that the scheduler has asked us to compute. Valid states include waiting, constrained, executing, memory, erred
  • tasks: {key: dict}
    The function, args, kwargs of a task. We run this when appropriate
  • dependencies: {key: {deps}}
    The data needed by this key to run
  • dependents: {dep: {keys}}
    The keys that use this dependency
  • data_needed: deque(keys)
    The keys whose data we still lack, arranged in a deque
  • waiting_for_data: {key: {deps}}
    A dynamic version of dependencies: all dependencies that we still don't have for a particular key.
  • ready: [keys]
    Keys that are ready to run. Stored in a LIFO stack
  • constrained: [keys]
    Keys for which we have the data to run, but are waiting on abstract resources like GPUs. Stored in a FIFO deque
  • executing: {keys}
    Keys that are currently executing
  • executed_count: int
    The number of tasks that this worker has run in its lifetime
  • long_running: {keys}
    A set of keys of tasks that are running and have started their own long-running clients.
  • dep_state: {dep: string}:
    The state of all dependencies required by our tasks. Valid states include waiting, flight, and memory
  • who_has: {dep: {worker}}
    Workers that we believe have this data
  • has_what: {worker: {deps}}
    The data that we care about that we think a worker has
  • pending_data_per_worker: {worker: [dep]}
    The data on each worker that we still want, prioritized as a deque
  • in_flight_tasks: {task: worker}
    All dependencies that are coming to us in current peer-to-peer connections and the workers from which they are coming.
  • in_flight_workers: {worker: {task}}
    The workers from which we are currently gathering data and the dependencies we expect from those connections
  • comm_bytes: int
    The total number of bytes in flight
  • suspicious_deps: {dep: int}
    The number of times a dependency has not been where we expected it
  • nbytes: {key: int}
    The size of a particular piece of data
  • types: {key: type}
    The type of a particular piece of data
  • threads: {key: int}
    The ID of the thread on which the task ran
  • active_threads: {int: key}
    The keys currently running on active threads
  • exceptions: {key: exception}
    The exception caused by running a task if it erred
  • tracebacks: {key: traceback}
    The traceback from a task that erred
  • startstops: {key: [(str, float, float)]}
    Log of transfer, load, and compute times for a task
  • priorities: {key: tuple}
    The priority of a key given by the scheduler. Determines run order.
  • durations: {key: float}
    Expected duration of a task
  • resource_restrictions: {key: {str: number}}
    Abstract resources required to run a task

Parameters:

scheduler_ip: str

scheduler_port: int

ip: str, optional

ncores: int, optional

loop: tornado.ioloop.IOLoop

local_dir: str, optional

Directory where we place local resources

name: str, optional

memory_limit: int, float, string

Number of bytes of memory that this worker should use. Set to zero for no limit. Set to ‘auto’ for 60% of memory use. Use strings or numbers like 5GB or 5e9

memory_target_fraction: float

Fraction of memory to try to stay beneath

memory_spill_fraction: float

Fraction of memory at which we start spilling to disk

memory_pause_fraction: float

Fraction of memory at which we stop running new tasks

executor: concurrent.futures.Executor

resources: dict

Resources that this worker has, like {'GPU': 2}
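
A hedged sketch combining a few of these keyword arguments; the specific values are illustrative assumptions, not recommendations:

w = await Worker(
    s.address,               # scheduler address, as in the earlier examples
    ncores=2,                # threads used for computation
    memory_limit="4GB",      # bytes of memory this worker may use
    resources={"GPU": 1},    # abstract resources for task restrictions
    name="worker-0",         # alias shown in diagnostics
)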

See also

distributed.scheduler.Scheduler, distributed.nanny.Nanny

Examples

Use the command line to start a worker:

$ dask-scheduler
Start scheduler at 127.0.0.1:8786

$ dask-worker 127.0.0.1:8786
Start worker at:               127.0.0.1:1234
Registered with scheduler at:  127.0.0.1:8786

executor_submit(key, function, args=(), kwargs=None, executor=None)

Safely run function in thread pool executor

We’ve run into issues running concurrent.futures futures within Tornado. Apparently it’s advantageous to use timeouts and periodic callbacks to ensure things run smoothly. This can get tricky, so we pull it off into a separate method.

get_current_task()

Get the key of the task we are currently running

This only makes sense to run within a task

See also

get_worker

Examples

>>> from dask.distributed import get_worker
>>> def f():
...     return get_worker().get_current_task()
>>> future = client.submit(f)  # doctest: +SKIP
>>> future.result()  # doctest: +SKIP
'f-1234'

memory_monitor()

Track this process’s memory usage and act accordingly

If we rise above 70% memory use, start dumping data to disk.

If we rise above 80% memory use, stop execution of new tasks

start_ipython(comm)

Start an IPython kernel

Returns Jupyter connection info dictionary.

trigger_profile()

Get a frame from all actively computing threads

Merge these frames into existing profile counts

worker_address

For API compatibility with Nanny

Nanny

class distributed.Nanny(scheduler_ip=None, scheduler_port=None, scheduler_file=None, worker_port=0, ncores=None, loop=None, local_dir='dask-worker-space', services=None, name=None, memory_limit='auto', reconnect=True, validate=False, quiet=False, resources=None, silence_logs=None, death_timeout=None, preload=(), preload_argv=[], security=None, contact_address=None, listen_address=None, worker_class=None, **kwargs)

A process to manage worker processes

The nanny spins up Worker processes, watches them, and kills or restarts them as necessary.

instantiate(comm=None)

Start a local worker process

Blocks until the process is up and the scheduler is properly informed

kill(comm=None, timeout=2)

Kill the local worker process

Blocks until both the process is down and the scheduler is properly informed

memory_monitor()

Track worker’s memory. Restart if it goes above terminate fraction
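
As a hedged sketch (assuming an asynchronous Nanny object n started as in the Nanny example earlier), the managed worker process can be restarted by hand:

await n.kill()           # stop the worker process
await n.instantiate()    # start a fresh worker process and re-register it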