Module alvadesc
Wrapper for alvaDesc command line application (alvaDescCLI).
Requirements
The alvaDescCLIWrapper package requires:
- Python 3.5 or higher
- A licensed copy of alvaDesc installed on the same computer
- Minimum alvaDesc version: 1.0.14
A few examples of use
1: Calculate two descriptors for three molecules on Windows:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc(r'C:\Program Files\Alvascience\alvaDesc\alvaDescCLI.exe') # Windows default alvaDescCLI.exe location (raw string, so '\a' is not read as an escape)
aDesc.set_input_SMILES(['C#N', 'CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
if not aDesc.calculate_descriptors(['MW', 'AMW']):
    print('Error: ' + aDesc.get_error())
else:
    print(aDesc.get_output_descriptors())
    print(aDesc.get_output())
The output is the list of descriptor names, followed by a list of lists of floats holding the requested descriptors (one inner list per molecule):
['MW', 'AMW']
[[27.03, 9.01], [58.14, 4.1529], [180.17, 8.5795]]
2: Calculate all descriptors for an input file on Linux:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc('/usr/bin/alvaDescCLI') # Linux default alvaDescCLI location
aDesc.set_input_file('./myfile.sdf', 'MDL')
if not aDesc.calculate_descriptors('ALL'): # with alvaDesc v2.0.0 you can also use the ALL2D keyword
    print('Error: ' + aDesc.get_error())
else:
    print(aDesc.get_output())
3: Calculate the ECFP fingerprint with size 1024 saving the result to a text file on macOS:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc('/Applications/alvaDesc.app/Contents/MacOS/alvaDescCLI') # macOS default alvaDescCLI location
aDesc.set_input_file('./myfile.sdf', 'MDL')
aDesc.set_output_file('./test.txt')
if not aDesc.calculate_fingerprint('ECFP', 1024):
    print('Error: ' + aDesc.get_error())
# the result is written to the output file, so get_output() is not used here
# else:
#     print(aDesc.get_output())
Notes on set_output_file:
- When set_output_file is used, the results are saved to the specified file and are not available through get_output.
- set_output_file writes the output in alvaDesc's standard format (which can be influenced by the alvaDesc settings). Do not use this function if you need a specific output file format.
4: Convert descriptors output to NumPy / Pandas:
import numpy as np
import pandas as pd
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc() # Windows is the default
aDesc.set_input_SMILES(['C#N', 'CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
if not aDesc.calculate_descriptors(['AMW', 'MW', 'nBT']):
    print('Error: ' + aDesc.get_error())
else:
    res_out = aDesc.get_output()
    # get molecule names according to alvaDescCLI standard
    res_mol_names = aDesc.get_output_molecule_names()
    res_desc_names = aDesc.get_output_descriptors()
    # NumPy array of arrays and NumPy matrix
    numpy_array_of_array = np.array([np.array(xs) for xs in res_out])
    numpy_matrix = np.matrix(res_out) # NumPy matrix
    print('NumPy matrix')
    print(numpy_matrix)
    # Pandas dataframe
    pandas_df = pd.DataFrame(res_out)
    pandas_df.columns = res_desc_names
    pandas_df.insert(loc=0, column='NAME', value=res_mol_names)
    print('Pandas dataframe')
    print(pandas_df)
The result is:
NumPy matrix
[[ 4.1529 58.14 13. ]
[ 8.5795 180.17 21. ]]
Pandas dataframe
        NAME     AMW      MW   nBT
0  Molecule1  4.1529   58.14  13.0
1  Molecule2  8.5795  180.17  21.0
5: Calculate the MACCS 166 fingerprint for the molecules contained in an MDL file on Linux:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc('/usr/bin/alvaDescCLI') # Linux default alvaDescCLI location
aDesc.set_input_file('./myfile.sdf', 'MDL')
if not aDesc.calculate_fingerprint('MACCSFP'):
    print('Error: ' + aDesc.get_error())
else:
    print(aDesc.get_output())
The result is a simple list of strings containing the required fingerprint:
['0000000000000000000000000000000000010000000000000000000000000000100000000000000110101111011...']
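Since each fingerprint comes back as a string of '0' and '1' characters, it usually has to be converted before numerical use. The following is a minimal sketch of one way to do that; fingerprint_to_bits and on_bit_positions are hypothetical helper names that are not part of the wrapper, and the short string used below is illustrative rather than real alvaDesc output.

```python
def fingerprint_to_bits(fp):
    """Convert a '0'/'1' fingerprint string into a list of ints."""
    # Reject anything that is not a plain bit string (e.g. a truncated
    # string ending in '...').
    if set(fp) - {'0', '1'}:
        raise ValueError('fingerprint must contain only 0 and 1 characters')
    return [int(ch) for ch in fp]


def on_bit_positions(fp):
    """Return the zero-based positions of the bits that are set."""
    return [i for i, ch in enumerate(fp) if ch == '1']


# Illustrative 8-character string, NOT real alvaDesc output.
example_fp = '00100101'
bits = fingerprint_to_bits(example_fp)
positions = on_bit_positions(example_fp)
```

With the wrapper, the same helpers could be applied to every string returned by get_output() after a successful calculate_fingerprint call.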
6: Run a script file created with alvaDescGUI on Windows:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc() # Windows is the default
# set_ functions are ignored when using run_script
# aDesc.set_input_file(...)
# aDesc.set_output_file(...)
if not aDesc.run_script('./myscript.adscr'):
    # this can also happen if the script does not write its output to stdout
    print('Error: ' + aDesc.get_error())
else:
    print(aDesc.get_output())
Classes
class AlvaDesc (exePath='C:/Program Files/Alvascience/alvaDesc/alvaDescCLI.exe')
-
Initialize the alvaDesc command line wrapper.
Args
exePath : str
- path to the alvaDescCLI executable (by default, the Windows install location)
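Because the default exePath only suits Windows, cross-platform scripts need to pick the path themselves. Below is a minimal sketch that maps the operating system name to the default locations quoted in the examples above; default_cli_path and DEFAULT_CLI_PATHS are hypothetical names, not part of the wrapper, and a custom installation may of course live elsewhere.

```python
import platform

# Default install locations as quoted in the usage examples.
# Keys are the names returned by platform.system().
DEFAULT_CLI_PATHS = {
    'Windows': 'C:/Program Files/Alvascience/alvaDesc/alvaDescCLI.exe',
    'Linux': '/usr/bin/alvaDescCLI',
    'Darwin': '/Applications/alvaDesc.app/Contents/MacOS/alvaDescCLI',  # macOS
}


def default_cli_path(system=None):
    """Return the documented default alvaDescCLI path for an OS name.

    When `system` is omitted, the current platform is used.
    Hypothetical helper, not part of alvaDescCLIWrapper.
    """
    system = system or platform.system()
    try:
        return DEFAULT_CLI_PATHS[system]
    except KeyError:
        raise ValueError('No documented default alvaDescCLI path for ' + repr(system))
```

The returned string could then be passed as exePath, for example AlvaDesc(default_cli_path()).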
Methods
def calculate_descriptors(self, descriptors)
-
Calculate the requested descriptors.
Args
descriptors : str or list[str]
- list of descriptors (e.g., ['MW', 'AMW']) or a single descriptor or keyword (e.g., 'MW' or 'ALL')
Returns
bool
- True on success, False otherwise.
def calculate_fingerprint(self, fingerprint_type, fingerprint_size=1024)
-
Calculate the requested fingerprint.
Args
fingerprint_type : str
- type of fingerprint: 'ECFP', 'PFP' or 'MACCSFP'
fingerprint_size : int
- size of the fingerprint (default: 1024); ignored for MACCS
Returns
bool
- True on success, False otherwise.
def get_descriptors(self)
-
Get the list of all available descriptors.
Returns
list[str]
- the list of all available descriptors.
def get_error(self)
-
Return the error message of the previous execution.
Returns
str
- the error message of the previous execution.
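The usage examples check the boolean result and then print get_error(). In larger programs it can be more convenient to turn that pattern into an exception. The sketch below shows one possible approach; AlvaDescError and descriptors_or_raise are hypothetical names, and _StubAlvaDesc is only a stand-in so the snippet runs without alvaDesc installed.

```python
class AlvaDescError(RuntimeError):
    """Hypothetical exception carrying the message from get_error()."""


def descriptors_or_raise(adesc, descriptors):
    """Run calculate_descriptors and return get_output(), raising on failure."""
    if not adesc.calculate_descriptors(descriptors):
        raise AlvaDescError(adesc.get_error())
    return adesc.get_output()


class _StubAlvaDesc:
    """Stand-in exposing the three methods used above (not the real wrapper)."""

    def __init__(self, ok, output=None, error=''):
        self._ok = ok
        self._output = output
        self._error = error

    def calculate_descriptors(self, descriptors):
        return self._ok

    def get_output(self):
        return self._output

    def get_error(self):
        return self._error


# Success path: the output is returned unchanged.
good = descriptors_or_raise(_StubAlvaDesc(True, output=[[27.03, 9.01]]), ['MW', 'AMW'])

# Failure path: the error message is carried by the exception.
try:
    descriptors_or_raise(_StubAlvaDesc(False, error='license not found'), ['MW'])
    message = None
except AlvaDescError as exc:
    message = str(exc)
```

In real use, an AlvaDesc instance would be passed in place of the stub.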
def get_output(self)
-
Return the output of the previous execution.
Returns
list[list[float]] or list[str]
- the output of the previous execution.
Note
For descriptor calculations, the result is a list of lists that can be seen as a matrix where the number of rows is equal to the number of molecules and the number of columns is equal to the number of requested descriptors. For fingerprint calculations, the result is a list of strings.
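To make the row and column layout concrete, here is a small sketch that pulls one descriptor's values for all molecules out of such a matrix. descriptor_column is a hypothetical helper, not part of the wrapper; the sample values are the ones shown in example 1.

```python
def descriptor_column(names, rows, descriptor):
    """Return one descriptor's values, one per molecule.

    `names` is the list from get_output_descriptors() and `rows` the
    list of lists from get_output().
    """
    # Every row must have one value per descriptor name.
    for i, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError('row %d has %d values, expected %d'
                             % (i, len(row), len(names)))
    col = names.index(descriptor)  # raises ValueError for an unknown name
    return [row[col] for row in rows]


# Values taken from example 1 above.
names = ['MW', 'AMW']
rows = [[27.03, 9.01], [58.14, 4.1529], [180.17, 8.5795]]
amw = descriptor_column(names, rows, 'AMW')
```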
def get_output_descriptors(self)
-
Return the names of the descriptors calculated in the previous execution.
Returns
list[str]
- the names of the descriptors calculated in the previous execution.
Note
The names are in the same order as the columns of the get_output result.
def get_output_molecule_names(self)
-
Return the molecule names of the previous execution.
Returns
list[str]
- the names of the molecules of the previous execution.
Note
The names are in the same order as the rows of the get_output result.
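Because molecule names, descriptor names and output rows share a consistent ordering, they can be combined without NumPy or Pandas. The following is a minimal sketch; table_by_molecule is a hypothetical helper, and the sample values are the ones printed in example 4.

```python
def table_by_molecule(mol_names, desc_names, rows):
    """Build {molecule name: {descriptor name: value}} from the three getters."""
    if len(mol_names) != len(rows):
        raise ValueError('expected one molecule name per output row')
    return {mol: dict(zip(desc_names, row))
            for mol, row in zip(mol_names, rows)}


# Values taken from the output shown in example 4.
mol_names = ['Molecule1', 'Molecule2']
desc_names = ['AMW', 'MW', 'nBT']
rows = [[4.1529, 58.14, 13.0], [8.5795, 180.17, 21.0]]
table = table_by_molecule(mol_names, desc_names, rows)
```

Note that a dictionary keeps only one entry per key, so this layout assumes the molecule names are unique.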
def get_wrapper_version(self)
-
Return the version of the Python alvaDescCLIWrapper.
Returns
str
- the version of the Python alvaDescCLIWrapper.
def run_script(self, file_path)
-
Run alvaDesc with a script file.
Args
file_path
:str
- script file path
Returns
bool
- True only if the script writes its output to stdout, False otherwise.
Notes
- When run_script is used, all other 'set_' functions except set_threads are ignored.
- NaN values are identified only if Missing_String is set to 'na' in the script (i.e., <Missing_String value="na"/>)
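Once missing values are recognized, downstream code typically needs to find the affected molecules. The sketch below assumes that a missing value appears in the get_output() matrix as a float NaN; that representation is an assumption, not something this page states. rows_with_missing is a hypothetical helper.

```python
import math


def rows_with_missing(rows):
    """Return the indices of rows that contain at least one NaN value.

    Assumes missing values are represented as float('nan').
    """
    return [i for i, row in enumerate(rows)
            if any(isinstance(v, float) and math.isnan(v) for v in row)]


# Illustrative data, NOT real alvaDesc output.
sample = [[58.14, 4.1529], [float('nan'), 8.5795], [27.03, 9.01]]
missing = rows_with_missing(sample)
```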
def set_input_SMILES(self, SMILES)
-
Set the input molecules using the SMILES format.
Args
SMILES : str or list[str]
- list of SMILES (e.g., ['CC', 'CCC']) or a single molecule (e.g., 'CC')
Note
This is an alternative to set_input_file.
def set_input_file(self, file_path, file_type)
-
Set the input file path.
Args
file_path : str
- input file path
file_type : str
- input file type: 'SMILES', 'MDL', 'SYBYL' or 'HYPERCHEM'
Note
This is an alternative to set_input_SMILES.
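Since file_type must be given explicitly, a script that handles mixed inputs may want to derive it from the file name. The sketch below is one way to do so; guess_file_type is a hypothetical helper, and the extension mapping is an assumption based on common file-naming conventions, not something defined by the wrapper.

```python
import os

# Assumed mapping from common file extensions to the file_type values
# listed above. Adjust it to match your own files.
EXTENSION_TO_FILE_TYPE = {
    '.smi': 'SMILES',
    '.smiles': 'SMILES',
    '.sdf': 'MDL',
    '.mol': 'MDL',
    '.mol2': 'SYBYL',
    '.hin': 'HYPERCHEM',
}


def guess_file_type(file_path):
    """Guess the file_type argument from the file extension."""
    ext = os.path.splitext(file_path)[1].lower()
    try:
        return EXTENSION_TO_FILE_TYPE[ext]
    except KeyError:
        raise ValueError('Cannot guess file type for extension ' + repr(ext))
```

The result could be passed straight through, for example aDesc.set_input_file(path, guess_file_type(path)).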
def set_output_file(self, file_path)
-
Set the output file path.
Args
file_path
:str
- output file path
Note
If file_path is not an empty string, the results won't be available through get_output.
def set_threads(self, num_threads)
-
Set the number of threads to be used during the calculation.
Args
num_threads : int
- number of threads; use 0 to let alvaDescCLI automatically determine the appropriate number of threads.