Module alvadesc

Wrapper for alvaDesc command line application (alvaDescCLI).

Requirements

The alvaDescCLIWrapper package requires:

  • Python 3.5 or higher
  • A licensed copy of alvaDesc installed on the same computer
  • Minimum alvaDesc version: 1.0.14

A few examples of use

1: Calculate two descriptors for three molecules on Windows:

    from alvadesccliwrapper.alvadesc import AlvaDesc

    aDesc = AlvaDesc('C:/Program Files/Alvascience/alvaDesc/alvaDescCLI.exe') # Windows default alvaDescCLI.exe location
    aDesc.set_input_SMILES(['C#N', 'CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
    if not aDesc.calculate_descriptors(['MW', 'AMW']):
      print('Error: ' + aDesc.get_error())
    else:
      print(aDesc.get_output_descriptors())
      print(aDesc.get_output())

The first printed line is the list of descriptor names; the second is a list of lists of floats (one inner list per molecule) containing the requested descriptors:

    ['MW', 'AMW']
    [[27.03, 9.01], [58.14, 4.1529], [180.17, 8.5795]]

2: Calculate all descriptors for an input file on Linux:

    from alvadesccliwrapper.alvadesc import AlvaDesc

    aDesc = AlvaDesc('/usr/bin/alvaDescCLI') # Linux default alvaDescCLI location
    aDesc.set_input_file('./myfile.sdf', 'MDL')
    if not aDesc.calculate_descriptors('ALL'): # with alvaDesc v2.0.0 you can also use ALL2D keyword
      print('Error: ' + aDesc.get_error())
    else:
      print(aDesc.get_output())

3: Calculate the ECFP fingerprint with size 1024 saving the result to a text file on macOS:

    from alvadesccliwrapper.alvadesc import AlvaDesc

    aDesc = AlvaDesc('/Applications/alvaDesc.app/Contents/MacOS/alvaDescCLI') # macOS default alvaDescCLI location
    aDesc.set_input_file('./myfile.sdf', 'MDL')
    aDesc.set_output_file('./test.txt')
    if not aDesc.calculate_fingerprint('ECFP', 1024):
      print('Error: ' + aDesc.get_error())
    # the result is in the output file
    #else:
    #  print(aDesc.get_output())

Notes on set_output_file:

  • When set_output_file is used, the results are saved to the specified file and are not available through the get_output function.
  • set_output_file writes the output in the alvaDesc standard format (which can be influenced by alvaDesc settings). Do not use this function if you need a specific output file format.

4: Convert descriptors output to NumPy / Pandas:

    import numpy as np
    import pandas as pd
    from alvadesccliwrapper.alvadesc import AlvaDesc

    aDesc = AlvaDesc() # Windows is the default
    aDesc.set_input_SMILES(['C#N', 'CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
    if not aDesc.calculate_descriptors(['AMW', 'MW', 'nBT']):
      print('Error: ' + aDesc.get_error())
    else:
      res_out = aDesc.get_output()
      # get molecule names according to alvaDescCLI standard
      res_mol_names = aDesc.get_output_molecule_names()
      res_desc_names = aDesc.get_output_descriptors()

      # NumPy array of array and matrix
      numpy_array_of_array = np.array([np.array(xs) for xs in res_out])
      numpy_matrix = np.matrix(res_out) # NumPy matrix
      print('NumPy matrix')
      print(numpy_matrix)

      # Pandas dataframe
      pandas_df = pd.DataFrame(res_out)
      pandas_df.columns = res_desc_names
      pandas_df.insert(loc=0, column='NAME', value=res_mol_names)
      print('Pandas dataframe')
      print(pandas_df)

The result is:

      NumPy matrix
      [[  4.1529  58.14    13.    ]
      [  8.5795 180.17    21.    ]]

      Pandas dataframe
              NAME     AMW      MW   nBT
      0  Molecule1  4.1529   58.14  13.0
      1  Molecule2  8.5795  180.17  21.0

5: Calculate the MACCS 166 fingerprint for the molecules contained in an MDL file on Linux:

    from alvadesccliwrapper.alvadesc import AlvaDesc

    aDesc = AlvaDesc('/usr/bin/alvaDescCLI') # Linux default alvaDescCLI location
    aDesc.set_input_file('./myfile.sdf', 'MDL')
    if not aDesc.calculate_fingerprint('MACCSFP'):
      print('Error: ' + aDesc.get_error())
    else:
      print(aDesc.get_output())

The result is a simple list of strings containing the required fingerprint:

    ['0000000000000000000000000000000000010000000000000000000000000000100000000000000110101111011...']
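
Each string holds one character per bit, so the positions of the set bits can be recovered with plain Python. The following is a minimal sketch; the helper name on_bits and the shortened sample string are made up for illustration and are not part of the wrapper.

```python
def on_bits(fingerprint):
    """Return the zero-based positions of the '1' characters in a bit string."""
    return [i for i, bit in enumerate(fingerprint) if bit == '1']


# Hypothetical, shortened fingerprint; real fingerprint strings are longer.
sample = '0001000110'
print(on_bits(sample))     # [3, 7, 8]
print(sample.count('1'))   # 3
```

The same function can be applied to every element of the list returned by get_output after a fingerprint calculation.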

6: Run a script file created with alvaDescGUI on Windows:

    from alvadesccliwrapper.alvadesc import AlvaDesc

    aDesc = AlvaDesc() # Windows is the default

    # set_ functions are ignored when using run_script
    # aDesc.set_input_file(...)
    # aDesc.set_output_file(...)

    if not aDesc.run_script('./myscript.adscr'):
      # this branch is also reached if the script does not write its output to stdout
      print('Error: ' + aDesc.get_error())
    else:
      print(aDesc.get_output())

Classes

class AlvaDesc (exePath='C:/Program Files/Alvascience/alvaDesc/alvaDescCLI.exe')

Initialize the alvaDesc command line wrapper.

Args

exePath : str
alvaDescCLI executable file path (by default the path is set for the Windows version)
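
Because the default only suits Windows, scripts meant to run on several platforms need to pass exePath explicitly. One possible way to choose it is sketched below, using the default install locations quoted in the examples above. default_cli_path is a hypothetical helper, and an installation in a non-default location would need its own path.

```python
import platform

# Default alvaDescCLI locations as listed in the examples of this document.
_DEFAULT_PATHS = {
    'Windows': 'C:/Program Files/Alvascience/alvaDesc/alvaDescCLI.exe',
    'Linux': '/usr/bin/alvaDescCLI',
    'Darwin': '/Applications/alvaDesc.app/Contents/MacOS/alvaDescCLI',  # macOS
}


def default_cli_path(system=None):
    """Return the default alvaDescCLI path for the given (or current) OS name."""
    system = system or platform.system()
    try:
        return _DEFAULT_PATHS[system]
    except KeyError:
        raise ValueError('No known default alvaDescCLI path for ' + repr(system))


print(default_cli_path('Linux'))   # /usr/bin/alvaDescCLI
# Typical use (requires alvaDesc): aDesc = AlvaDesc(default_cli_path())
```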

Methods

def calculate_descriptors(self, descriptors)

Calculate the requested descriptors.

Args

descriptors : str or list[str]
list of descriptors (e.g., ['MW', 'AMW']) or a single descriptor (e.g., 'ALL' or 'MW')

Returns

bool
The return value. True for success, False otherwise.

def calculate_fingerprint(self, fingerprint_type, fingerprint_size=1024)

Calculate the requested fingerprint.

Args

fingerprint_type : str
type of fingerprint: 'ECFP' or 'PFP' or 'MACCSFP'
fingerprint_size : int
size of the fingerprint; ignored for 'MACCSFP'; defaults to 1024

Returns

bool
The return value. True for success, False otherwise.

def get_descriptors(self)

Get the list of all available descriptors.

Returns

list[str]
the list of all available descriptors.

def get_error(self)

Return the error message of the previous execution.

Returns

str
the error message of the previous execution.

def get_output(self)

Return the output of the previous execution.

Returns

list[list[float]] or list[str]
the output of the previous execution: a list of lists of floats for descriptors, or a list of strings for fingerprints.

Note

The result is a list of lists that can be seen as a matrix where the number of rows is equal to the number of molecules and the number of columns is equal to the number of requested descriptors.
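
To make the row and column orientation concrete, the sketch below uses the values printed in example 1 as a stand-in for get_output() and extracts one descriptor column. The helper name column is hypothetical.

```python
# Values copied from example 1: three molecules (rows), two descriptors (columns).
output = [[27.03, 9.01], [58.14, 4.1529], [180.17, 8.5795]]
descriptors = ['MW', 'AMW']   # what get_output_descriptors() returned there


def column(matrix, names, name):
    """Return the values of one descriptor across all molecules."""
    idx = names.index(name)
    return [row[idx] for row in matrix]


print(len(output), len(output[0]))         # 3 2
print(column(output, descriptors, 'MW'))   # [27.03, 58.14, 180.17]
```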

def get_output_descriptors(self)

Return the names of the descriptors calculated in the previous execution.

Returns

list[str]
the names of the descriptors of the previous execution.

Note

The result is a list with the same order as the result of get_output.

def get_output_molecule_names(self)

Return the molecule names of the previous execution.

Returns

list[str]
the names of the molecules of the previous execution.

Note

The result is a list with the same order as the result of get_output.
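
Since the molecule names and the descriptor names both follow the order of get_output, rows and columns can be labeled with zip. The following is a minimal sketch using the data printed in example 4; label_output is a hypothetical helper, not a wrapper function.

```python
def label_output(mol_names, desc_names, output):
    """Build {molecule name: {descriptor name: value}} from the three lists."""
    return {
        mol: dict(zip(desc_names, row))
        for mol, row in zip(mol_names, output)
    }


# Data as printed in example 4 (stand-ins for the get_output_* results).
mol_names = ['Molecule1', 'Molecule2']
desc_names = ['AMW', 'MW', 'nBT']
output = [[4.1529, 58.14, 13.0], [8.5795, 180.17, 21.0]]

table = label_output(mol_names, desc_names, output)
print(table['Molecule2']['MW'])   # 180.17
```

A dictionary keyed by molecule name keeps only the last entry when two molecules share a name, so a list of (name, values) pairs is the safer choice when names may repeat.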

def get_wrapper_version(self)

Return the version of the Python alvaDescCLIWrapper.

Returns

str
the version of the Python alvaDescCLIWrapper.

def run_script(self, file_path)

Run alvaDesc with a script file.

Args

file_path : str
script file path

Returns

bool
The return value. True only if the script writes its output to stdout, False otherwise.

Notes

  • When run_script is used, the other 'set_' functions, except for set_threads, are ignored.
  • It identifies NaN values only if the Missing_String is set to 'na' in the script (i.e., <Missing_String value="na"/>).

def set_input_SMILES(self, SMILES)

Set the input molecules using the SMILES format.

Args

SMILES : str or list[str]
list of SMILES (e.g., ['CC', 'CCC']) or a single molecule (e.g., 'CC')

Note

It is an alternative to set_input_file.

def set_input_file(self, file_path, file_type)

Set the input file path.

Args

file_path : str
input file path
file_type : str
input file type (SMILES or MDL or SYBYL or HYPERCHEM)

Note

It is an alternative to set_input_SMILES.
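
The file_type always has to be given explicitly. When input files carry conventional extensions, it could be guessed with a small lookup, as sketched below. The extension mapping reflects common file-naming conventions and is an assumption, not something the wrapper defines; guess_file_type is a hypothetical helper.

```python
import os

# Assumed mapping from common file extensions to the file_type values above.
_EXT_TO_TYPE = {
    '.smi': 'SMILES',
    '.sdf': 'MDL',
    '.mol': 'MDL',
    '.mol2': 'SYBYL',
    '.hin': 'HYPERCHEM',
}


def guess_file_type(file_path):
    """Return a file_type string for set_input_file, based on the extension."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext not in _EXT_TO_TYPE:
        raise ValueError('Cannot guess file type for ' + repr(file_path))
    return _EXT_TO_TYPE[ext]


print(guess_file_type('./myfile.sdf'))   # MDL
# Typical use (requires alvaDesc):
# aDesc.set_input_file('./myfile.sdf', guess_file_type('./myfile.sdf'))
```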

def set_output_file(self, file_path)

Set the output file path.

Args

file_path : str
output file path

Note

If file_path is not an empty string, the results won't be available through get_output.

def set_threads(self, num_threads)

Set the number of threads to be used during the calculation.

Args

num_threads : int
use 0 to let alvaDescCLI automatically determine the appropriate number of threads.