Running a PET analysis using the Python client
Last updated on 2024-10-03 | Edit this page
Overview
Questions
In the context of the Python client:
- How to connect to the vantage6 server?
- How to explore the
Client
object? - How to check details of a collaborations?
- How to start a compute task?
- How to collect the results of a finished computation?
Objectives
After completing this episode, you will be able to:
- Understand the basic concepts of the vantage6 Python client.
use the Python client to …
- Connect to the vantage6 server.
- Use the Python client to get details of a collaboration.
- Create a task using the Python client.
- Collect the results of a finished computation using the Python client.
Prerequisite
Make sure you completed the Setup Episode before starting this episode. Some basic knowledge of Python is also required to complete the exercises in this episode.
The Python client
The vantage6 Python client is a library designed to facilitate interaction with the vantage6 server, to perform various tasks such as creating computation tasks, managing organizations, collaborations, users, and collecting results. It is a versatile alternative to the web-based user interface we have used in previous lessons.
Data scientists and administrators may use it to manage resources programatically. For example, to automate actions or integrating them on other applications. The Python client communicates with the REST API (wikipedia.org) of the vantage6 server, handling encryption and decryption where applicable.
Besides the Python client, there is also an R client (github.com) available. This client is more focused on starting federated analysis and does not provide tools to manage the server. Important to note that this client is poorly maintained and lags behind in terms of features. Therefore we do not recommend using it. You can find more information in the documentation (docs.vantage6.ai).
If your organization uses a different programming language, you can always create a client in that language by following the API documentation (docs.vantage6.ai).
In this workshop, we will only use on the Python client.
Connect & authenticate
Creating an instance of the vantage6 Python client is relatively straightforward. The user defines server connection details: server address, login credentials, and the organization’s private key in case encryption (docs.vantage6.ai) is used in the collaboration. In case the server has two-factor authentication (docs.vantage6.ai) (2FA) enabled, you should also enter the corresponding time-based 6-digit code accordingly.
To avoid leaking your username and/or password by accident, they can
be defined in a separate Python file (e.g., config.py
),
which is then imported into the main script. This way, the main script
does not contain any sensitive information.
PYTHON
server_url = "https://<vantage6-server-address>"
server_port = 443
server_api = "/api"
username = "MY USERNAME"
password = "MY PASSWORD"
# Path to the private key, if encryption is enabled. Can be None if
# encryption is not used. Note that this key is the organization's
# private key. In case of this workshop we do not use encryption, so
# this can be None.
organization_key = None
Once you have created the Python module with the configuration settings, you can import it and create the client instance as follows:
PYTHON
from vantage6.client import Client
# It is assumed here that the `config.py` you just created is in the current
# directory. If it is not, then you need to make sure it can be found on
# your PYTHONPATH
import config
# Initialize the client object, and authenticate
client = Client(config.server_url, config.server_port, config.server_api,
log_level='info')
client.authenticate(config.username, config.password)
# In the case of 2FA, you should also include the 6-digit code:
# client.authenticate(config.username, config.password, '123456')
# In case encryption is used, this line can be used to set the organizations private
# key.
# client.setup_encryption(config.organization_key)
1. Connect 🌍!
Connect to the vantage6 server using the Python client!
- Create the
config.py
with the your credentials and connection details. - Create a cell with the
client
script with the code above. - Run the
client
cell as defined above to create the client instance. - Make sure to use the correct user / password
- Check the output to see if there are any errors
Make sure you have created the
config.py
file with your credentials and theclient
cell with the code above.-
Run the
client
cell. If the connection is successful, you should see the message--> Succesfully authenticated
:Welcome to _ __ | | / / __ ____ _ _ __ | |_ __ _ __ _ ___ / /_ \ \ / / _` | '_ \| __/ _` |/ _` |/ _ \ '_ \ \ V / (_| | | | | || (_| | (_| | __/ (_) | \_/ \__,_|_| |_|\__\__,_|\__, |\___|\___/ __/ | |___/ --> Join us on Discord! https://discord.gg/rwRvwyK --> Docs: https://docs.vantage6.ai --> Blog: https://vantage6.ai ------------------------------------------------------------ Cite us! If you publish your findings obtained using vantage6, please cite the proper sources as mentioned in: https://vantage6.ai/vantage6/references ------------------------------------------------------------ ... --> Succesfully authenticated
Using the client
The Client
instance offers a set of attributes that
correspond to the vantage6
server resources (Episode 3) also described in the official
documentation (docs.vantage6.ai). The available attributes are:
Resource | Description |
---|---|
client.user |
Manage users including your own user details |
client.organization |
Manage organizations or the organization that you are part of |
client.rule |
View all available permission rules |
client.role |
Manage roles (are collections of rules) |
client.collaboration |
Manage collaborations |
client.task |
Create new tasks and view their run data |
client.result |
Obtain results from the tasks |
client.util |
Provides utility functions for the vantage6 Python client. For example to reset your password |
client.node |
Manage nodes |
client.store |
Manage algorithm stores |
client.algorithm |
Manage algorithms that can be used for the computations |
Method and parameter documentation
There are many methods available in each of the resources and each
method has its own set of parameters. To know which parameters are
available for a specific method, you can use the help()
function in Python. For example, to get the documentation of the
client.organization.list()
method, you can use the
following command:
OUTPUT
list(self, name: 'str' = None, country: 'int' = None, collaboration: 'int' = None, study: 'int' = None, page: 'int' = None, per_page: 'int' = None) -> 'list[dict]'
List organizations
Parameters
----------
name: str, optional
Filter by name (with LIKE operator)
country: str, optional
Filter by country
collaboration: int, optional
Filter by collaboration id. If client.setup_collaboration() was called,
the previously setup collaboration is used. Default value is None
This shows you that you can filter the list of organizations (among
others) by name, country, and collaboration. It is also possible to
request documentation of a higher level method, for example
help(client.organization)
or even
help(client)
.:
To view all Client
functions and their arguments without
using help()
you can use the official
documentation (docs.vantage6.ai). Which is the same as the Python
client’s docstring.
Make sure you are viewing the documentation of the version of the client you are using. You can find the version of the client by one of the following commands:
or by running the following command in the terminal:
Permissions
Note that the authenticated user may not be allowed to perform all operations or view all resources. For example, a user may not be allowed to create a new organization, but may be allowed to list the organizations within all collaboration its organization participates in. The server will only allow the user to perform operations on the resources that the user has permission to perform.
The 5 basic operations
Almost all of the resources provide a get, list, create, update and delete operation.
When using the get
and list
methods a
dictionary is returned with the requested information. In the case of
the create
and update
methods typically the
created resource is returned. Finally in the case of delete
nothing is returned but a message is printed to confirm the
deletion.
2. Collect collaboration details
Before starting a task, you need to know the details of the collaboration you are working with. Use the Python client to get the details of the collaboration you have access to. Write down the name and ID of each collaboration.
Use help(client.collaboration.list)
to see which
arugments you can use to filter the collaborations.
OUTPUT
[
{'id': 168, 'name': 'Lychee Labs', ...},
{'id': 158, 'name': 'Pineapple Paradigm', ...},
{'id': 155, 'name': 'Huckleberry Hub', ...},
{'id': 140, 'name': 'Mango Matrix', ...},
{'id': 128, 'name': 'Apple Innovations', ...},
{'id': 170, 'name': 'eScience center', ...},
{'id': 165, 'name': 'Grapefruit Group', ...},
{'id': 145, 'name': 'Raspberry Revolution', ...},
{'id': 136, 'name': 'Ivy Berry Solutions', ...},
{'id': 166, 'name': 'Huckleberry Holdings', ...}
]
Additional operations
Some resources do not provide all five operations and some resource provide additional operations. For example:
- it is not possible to create new rules. In other words:
client.rule.create
does not exist. - the
client.task
has aclient.task.kill
method which is able to stop a task that currently is running.
Top level methods
Up untill now we only discussed Client
methods that are
bound to an resource. There are also some methods that are not bound to
a specific resource. Examples are:
-
client.authenticate()
to authenticate the user. You have already used this method when you authenticated to the server. -
client.setup_encryption()
to setup the encryption. -
client.wait_for_results()
to wait for the results of a task.
3. Find documentation
Find the documentation on how to reset your password 🔑 in case you
forgot it. You can use the help()
to explore the client
functions.
Have a look at the client table, and see if you can find a resource group that might contain the method you are looking for.
Use the help()
function to find the documentation of the
client.util
resource.
Identifiers are key
It is important to note that the Python client use identifiers rather than names to select resources. For example, to filter the organizations that belong to a a given collaboration, you need to know the collaboration’s identifier first.
In a previous challenge, you were asked to get the details of the collaborations you have access to. This is common practice when working with the vantage6 Python client.
It is also possible to obtain these identifiers through the UI. However, when working with the UI, identifiers are not as important as the names can be used to identify the resources. But also for the UI, it is important to know the identifiers of the resources as names are not always unique.
The reason for this is that the UI is designed to more be user-friendly, and it is expected that users will interact with the resources using their names. Typically they will select the resources from a list or dropdown. Since names do not always have to be unique (e.g. task names) it is impossible to identify a resource uniquely by its name.
Now that we understand the basic concepts of the vantage6 Python
client, let us get some more details about our collaborations. First, as
before, we collect the details of the collaborations we have access to.
We do so by specifying an additional parameter fields
to
the list()
method. This parameter allows us to specify
which fields we want to see in the output. This makes it more readable
and easier to find the information we are looking for.
OUTPUT
[
{'id': 12, 'name': 'Birch Brotherhood'},
{'id': 9, 'name': 'Pine Partners'},
{'id': 11, 'name': 'Cedar Coalition'},
{'id': 10, 'name': 'Maple Consortium'},
{'id': 15, 'name': 'demo'},
{'id': 8, 'name': 'Oak Alliance'},
{'id': 14, 'name': 'Redwood Union'},
{'id': 13, 'name': 'Willow Network'}
]
Then, we are interested in all the organizations that participate in
one of the collaborations. Lets assume that the collaboration ID is 1.
We then can get the organizations that are part of this collaboration by
using the client.organization.list()
method with the
collaboration
parameter set to 1.
OUTPUT
[
{'id': 171, 'name': 'IKNL'},
{'id': 172, 'name': 'new_organization'}
]
Write down the ID of each organization and collaboration. You will need them in the next challenges.
Creating a new task
Before starting an analysis we need several details about the collaboration and the analysis to be performed. This includes the organization and collaboration identifiers, we have just collected.
Before we start the analysis, let us check if everything is in place:
Checklist
Network
- ✔ Connect to the vantage6 server using the Python client.
- ✔ Use the Python client to get the details of the collaboration and its organizations you have access to.
- ⚠ Check the status of the nodes
Average Algorithm
- ✔ published at:
harbor2.vantage6.ai/demo/average
. - ✔ We are going to use the
partial_average()
function. - ✔ The function requires a
column_name
parameter, we are setting this to ‘age’.
Node status
As the checklist above indicates, we have not checked the status of the nodes. You can start an analys when nodes are offline, they will start the analysis once they are online. In case a node is offline, you might need to inquire with the node owner to get it back online.
4. Check the status of the nodes
Use the Python client to check the status of the nodes that are part of the collaboration you are interested in.
To check the status of the nodes, you can use the
client.node
resource.
See help(client.node.list)
to see how to get the details
of the nodes of a given collaboration. You can also use the
fields
parameter to get only the information you are
interested in.
You can obtain node details by using the
client.node.list()
method. To filter the nodes you are
interested in, you can use the collaboration
parameter in
the client.node.list()
method:
You can also specify the fields
parameter to get only
the information you are interested in:
You can also use the is_online
parameter to filter the
nodes that are online. But for now it would be good to see both online
and offline nodes.
OUTPUT
[
{
'id': 155,
'name': 'IKNL demo node',
'status': None
}
]
One of the nodes in our collaboration is offline. In the real world,
you would need to contact the node owner to get the node back online.
But for the purpose of this workshop we have defined a study
that contains only online nodes. Which has a name that ends with
Subset. You can find the study ID by using the
client.study.list()
method. Write down the ID of the
study.
Task definition
A task in vantage6 is a request to execute an algorithm on a given organization. When creating a task, you need to specify the following:
- The collaboration[, study] and organization identifiers.
- The algorithm to be executed.
- The input parameters for the algorithm.
The average algorithm we are going to use is the same as in Episode 3.
This algorithm has two functions: partial_average()
and
central_average()
. If you do not know the difference
between partial and central function, you should read
How
algorithms run in vantage6 (Episode 2) again.
We can use the client.task.create()
method to create a
new task to be executed by the nodes.
OUTPUT
create(organizations: 'list', name: 'str', image: 'str', description: 'str', input_: 'dict', collaboration: 'int' = None, study: 'int' = None, store: 'int' = None, databases: 'list[dict]' = None) -> 'dict' method of vantage6.client.Task instance
Create a new task
Parameters
----------
organizations : list
Organization ids (within the collaboration) which need
to execute this task
name : str
Human readable name
image : str
Docker image name which contains the algorithm
description : str
Human readable description
input_ : dict
Algorithm input
collaboration : int, optional
ID of the collaboration to which this task belongs. Should be set if
the study is not set
study : int, optional
ID of the study to which this task belongs. Should be set if the
collaboration is not set
store : int, optional
ID of the algorithm store to retrieve the algorithm from
databases: list[dict], optional
Databases to be used at the node. Each dict should contain
at least a 'label' key. Additional keys are 'query' (if using
SQL/SPARQL databases), 'sheet_name' (if using Excel databases),
and 'preprocessing' information.
Returns
-------
dict
A dictionairy containing data on the created task, or a message
from the server if the task could not be created
Lets start by defining the input for the task. The
partial_average()
function requires a
column_name
parameter. We can define the input as
follows:
Basically we are defining the method to be executed, the arguments
and keyword arguments for the method. In other words, we just created a
function call in Python that would look like this:
partial_average(column_name=age)
. If you are not familiar
with Python’s args and kwargs, you can read more about them at W3C
Schools: args
(w3schools.com) and kwargs
(w3schools.com).
Now that you have defined the task input, you can create and start it by also specifying (using the IDs we collected earlier) which organizations and for which collaboration, it will be executed:
Database parameter
We have not yet explained the databases
parameter. This
parameter is used to specify the database that the nodes will use to
execute the algorithm. Each node can have multiple databases, and you
can specify them in the databases
parameter. We will go
into more detail about this in Seting
up a vantage6 node (Episode 6).
Obtaining results
A client’s task execution request is asynchronous. This means that
once the client.task.create()
method is invoked, the task
will begin running in the background, returning the control to the
Python program immediately (i.e., without waiting for the task to
complete).
This means that in case you want to use the task result in the remainder of your code, your program needs to wait until the task is completed, so you can get access to the results (or to the error details, if something goes wrong).
You can use the client.wait_for_result()
method to make
the program execution wait until the task is completed. For that, you
need the ID of the task you just created, which was included in the
dictionary returned by the client.task.create()
method. For
the task execution request of the code snippet above, this will look
like:
Aggregate results
The results contain the output of the algorithm. In the case of the
partial_average()
function, the output is not yet
aggregated. This means that the output of each node is returned
separately. In the case of the central_average()
function,
the output is aggregated and only the aggregated result is returned.
For now we can aggregate the results ourselfs:
Create a central task
In the previous section you created a task to run the
partial_average()
function. Now, create a task to run the
central_average()
function.
5. Run central method
In section Creating a new task it
is explained how to create a task to run the
partial_average()
function. Now, create a task to run the
central_average()
function. ⚠ Make sure to
only send the task to a single organization.
PYTHON
input_ = {
'method': 'central_average',
'args': [],
'kwargs': {'column_name': 'age'}
}
average_task = client.task.create(
organizations=[1],
study=1,
name="name_for_the_task",
image="harbor2.vantage6.ai/demo/average",
description='',
input_=input_,
databases=[
{'label': 'default'}
]
)
Then you can obtain the results by using the
client.wait_for_results()
method:
Inspecting log files
Each task consists of several runs. Each node included in the task execution will at least have one run. But in case of multi-step algorithms or iterative algorithms, a node can have multiple runs. Each run has a log file that contains information about the execution of the algorithm on the node.
6. Inspect log files
- Retrieve the log files from the central method from previous challenge.
- Rerun the central method, but this time use a column name that does
not exist in the dataset (e.g.
abc123
). Retrieve the log files from this task as well.
To retrieve the log files from any task, you can use the
client.run.from_task()
method:
In the second case you should be able to find an exception in the log file. Is the error message clear enough to understand what went wrong?