Introduction
A Mantle dataset allows you to link raw data files, values, and/or metadata together.
Datasets can be created, updated, accessed, and queried via the Mantle SDK.
Creating a dataset
A dataset must have at minimum a name.
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.create(
    name="example_minimal_dataset",
    local=False
)
The local keyword controls whether the SDK asks Mantle to create the new dataset in the database via an API request.
In this version of the SDK, local defaults to True for consistency with earlier versions.
For most purposes, however, setting it to False is appropriate: when local is False, the dataset is automatically pushed to Mantle.
Creating a dataset with properties
Properties can be added to a dataset upon creation using a dictionary.
The valid types for properties are:
- string
- integer
- double (float)
- boolean
- file (S3 files)
Create a new dataset with properties
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.create(
    name="4XP1",
    local=False,
    properties={
        "description": "X-ray structure of Drosophila dopamine transporter bound to neurotransmitter dopamine",
        "resolution": 2.5,
        "r_value": 0.2,
        "pdb": {"file_upload": {"filename": "4xp1.pdb"}}
    }
)
To test this out yourself, download the PDB file here.
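The property types listed above can be checked locally before calling create. The following is a minimal sketch, not part of the Mantle SDK: classify_property and check_properties are hypothetical helper names, and the sketch assumes that a file property always has the {"file_upload": {"filename": ...}} shape used in the example above.

```python
def classify_property(value):
    """Return the documented property type name that `value` matches."""
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    # Assumption: file properties follow the shape shown in the example above.
    if isinstance(value, dict):
        upload = value.get("file_upload")
        if isinstance(upload, dict) and "filename" in upload:
            return "file"
    raise TypeError(f"unsupported property value: {value!r}")


def check_properties(properties):
    """Map each property key to its type name, raising on unsupported values."""
    return {key: classify_property(value) for key, value in properties.items()}


example = {
    "description": "X-ray structure of Drosophila dopamine transporter",
    "resolution": 2.5,
    "r_value": 0.2,
    "pdb": {"file_upload": {"filename": "4xp1.pdb"}},
}
print(check_properties(example))
```

Running a check like this first surfaces an unsupported value (a list, for example) as a local TypeError rather than as a failed request.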
Creating a dataset with a specified data type
The data type of a dataset can be specified on creation.
The data type must already exist, and the dataset must include all of that data type's required properties when it is created.
Create a new dataset and specify the data type
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.create(
    name="palmer_penguins_create_dataset_example",
    dataset_type="mantle_tabular_penguins",
    local=False,
    properties={
        "penguin_data_csv": {"file_upload": {"filename": "palmer_penguins.csv"}}
    }
)
To test this out yourself, download the CSV file here.
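Because a typed dataset must carry every required property at creation, it can help to compare the properties dictionary against the required keys beforehand. This is a sketch under stated assumptions: missing_required is a hypothetical helper, and the list of required keys is supplied by hand here rather than read from Mantle.

```python
def missing_required(properties, required_keys):
    """Return the required keys that are absent from `properties`, sorted."""
    return sorted(key for key in required_keys if key not in properties)


# Assumption: this required-key list is illustrative, written by hand for the
# example; the real requirements are defined by the data type in Mantle.
required = ["penguin_data_csv"]

properties = {
    "penguin_data_csv": {"file_upload": {"filename": "palmer_penguins.csv"}}
}

missing = missing_required(properties, required)
if missing:
    raise ValueError(f"missing required properties: {missing}")
```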
Interacting with a single dataset
Getting dataset by unique ID
Get a dataset by its unique ID
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.get("E000001")
Getting dataset properties
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.get("E000001")
dataset_properties = dataset.properties
Download S3 file properties
Download files stored as S3 file properties using the download_s3 method, which takes as arguments the key of the property and the local path to which the file will be downloaded.
Download files from properties
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.get("E000001")
dataset.download_s3("penguin_data_csv", "local_penguins.csv")
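When the local path includes folders that may not exist yet, a small wrapper can create them before downloading. This is a sketch only: download_property is a hypothetical helper, it relies solely on the download_s3(key, path) call shown above, and whether download_s3 creates missing folders on its own is not stated here.

```python
from pathlib import Path


def download_property(dataset, key, destination):
    """Create the destination folder if needed, then download the file property."""
    target = Path(destination)
    # Create parent folders up front so the download has somewhere to write.
    target.parent.mkdir(parents=True, exist_ok=True)
    dataset.download_s3(key, str(target))
    return target


# Usage with a dataset returned by mantle.dataset.get("E000001"):
# path = download_property(dataset, "penguin_data_csv", "downloads/local_penguins.csv")
```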
Updating a dataset with additional properties
To add a file to an existing dataset, use the upload_s3 method, which takes as arguments the key of the property and the path to the file to be uploaded.
This method uploads your file to AWS S3 and attaches the S3 path as a file property on the dataset.
Non-file properties, such as strings and Booleans, can be added using the set_property method, which takes as arguments the key of the property and the value to be set.
Create a new dataset and update it with properties
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.create(
    name="palmer_penguins_add_properties_example",
    local=False
)
dataset.upload_s3("mantle_example_file", "palmer_penguins.csv")
dataset.set_property("mantle_example_continent", "Antarctica")
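When several properties are added at once, the two methods can be wrapped so that every file path is checked before anything is sent. This is a minimal sketch: apply_properties is a hypothetical helper that uses only the upload_s3 and set_property calls described above, and the up-front path check is a design choice of the sketch, not documented SDK behavior.

```python
import os


def apply_properties(dataset, values, files):
    """Set non-file properties, then upload file properties.

    `values` maps property keys to plain values (strings, numbers, Booleans).
    `files` maps property keys to local file paths.
    """
    # Check every path first so a missing file does not leave the dataset
    # partially updated.
    for path in files.values():
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    for key, value in values.items():
        dataset.set_property(key, value)
    for key, path in files.items():
        dataset.upload_s3(key, path)


# Usage with a dataset returned by mantle.dataset.create(...):
# apply_properties(
#     dataset,
#     values={"mantle_example_continent": "Antarctica"},
#     files={"mantle_example_file": "palmer_penguins.csv"},
# )
```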
Querying for datasets and returning a DataFrame
To get a Pandas DataFrame where datasets are represented as rows,
you can query for a group of datasets and turn them into a DataFrame.
Querying by data type
Query for datasets by data type
import mantlebio
mantle = mantlebio.MantleClient()
dataset_list = mantle.dataset.build_query().where(
    "data_type_unique_id=mantle_penguin_records"
).execute()
Querying by property
Query for datasets by property
import mantlebio
mantle = mantlebio.MantleClient()
dataset_list = mantle.dataset.build_query().where(
    "props.{species}.string.eq=Adelie"
).execute()
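The filter strings in the two query examples above follow a visible pattern, so they can be assembled from variables rather than typed by hand. This is a sketch under stated assumptions: data_type_filter and string_property_filter are hypothetical helpers, the braces around the property key are copied from the example and assumed to be a literal part of the syntax, and only the string equality form shown above is covered.

```python
def data_type_filter(data_type_unique_id):
    """Build the data type filter string used in the data type query example."""
    return f"data_type_unique_id={data_type_unique_id}"


def string_property_filter(key, value):
    """Build a string-equality property filter in the form shown above.

    Assumption: the braces around the key are literal characters, as in
    "props.{species}.string.eq=Adelie".
    """
    # Doubled braces in an f-string produce literal { and } characters.
    return f"props.{{{key}}}.string.eq={value}"


print(data_type_filter("mantle_penguin_records"))
print(string_property_filter("species", "Adelie"))
```

The resulting strings can be passed to where() exactly as in the examples above.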
Creating a Pandas DataFrame of datasets
Convert list of datasets to Pandas DataFrame
import mantlebio
mantle = mantlebio.MantleClient()
dataset_list = mantle.dataset.build_query().where(
    "data_type_unique_id=mantle_penguin_records"
).execute()
df = dataset_list.to_dataframe()