Peptide Generation script using PyPept

Dhruv_J · October 26, 2023, 2:02pm

Hello,
I am working on a peptide generation script using pyPept and RDKit,
and I have a few questions about it.
Code:

def getOptions():
    userOptions = {}

    userOptions['Format'] = {}
    userOptions['Format']['label'] = 'Peptide Format'
    userOptions['Format']['type'] = 'stringList'
    userOptions['Format']['values'] = ["biln","helm", "fasta"]
    userOptions['Format']['default'] = "biln"

    userOptions['Sequence'] = {}
    userOptions['Sequence']['label'] = 'Peptide Sequence'
    userOptions['Sequence']['type'] = 'string'
    userOptions['Sequence']['default'] = "ac-D-T-H-F-E-I-A-am"

    userOptions['secondary_structure'] = {}
    userOptions['secondary_structure']['label'] = 'Secondary Structure'
    userOptions['secondary_structure']['type'] = 'stringList'
    userOptions['secondary_structure']['values'] = ['B (beta bridge)', 'H (alpha helix)','E (beta strand)','S (bend)','T (turn)','G (3/10 helix)' ]
    userOptions['secondary_structure']['default'] = 'B (beta bridge)'

    opts = {'userOptions': userOptions}

    return opts


def peptide_generation(opts):
    format = opts['Format']
    sequence = opts['Sequence']
    secondary_structure = opts['secondary_structure']
    seq = Sequence(sequence)

    # Correct atom names in the sequence object
    seq = correct_pdb_atoms(seq)

    if format == "helm":
        # Convert HELM to BILN
        b = Converter(helm=sequence)
        biln = b.get_biln()
        seq = Sequence(biln)
        seq = correct_pdb_atoms(seq)

    # Generate the RDKit object
    mol = Molecule(seq)
    romol = mol.get_molecule(fmt='ROMol')

    # Create the peptide conformer with corrected atom names and secondary structure
    # fasta = Conformer.get_peptide(sequence)
    romol = Conformer.generate_conformer(romol, secondary_structure, generate_pdb=True)

What if the user provides input in FASTA format? The docs only describe BILN and HELM inputs.
How should inputs be provided to the script?
Is this correct?

 echo '{"format":"biln", "Sequence":"ac-D-T-H-F-E-I-A-am","secondary_structure":"B (beta bridge)"}'| python peptide_generation.py --run-command
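A minimal sketch of how the options could be read, assuming the script parses stdin with json.loads as the other command scripts in this thread do. read_options is a hypothetical helper, not part of pyPept or Avogadro. Dictionary keys are case-sensitive, so a payload keyed "format" does not satisfy a lookup of opts['Format']:

```python
import json


def read_options(stdin_text):
    """Parse the JSON options string that arrives on stdin."""
    return json.loads(stdin_text)


# Keys spelled exactly as the script looks them up
payload = ('{"Format": "biln", "Sequence": "ac-D-T-H-F-E-I-A-am", '
           '"secondary_structure": "B (beta bridge)"}')
opts = read_options(payload)
fmt = opts["Format"]

# A lowercase key is a different key, so opts["Format"] would raise KeyError
lower = read_options('{"format": "biln"}')
has_format = "Format" in lower
```

Using opts.get('Format', 'biln') instead of opts['Format'] would fall back to a default rather than raising, which may or may not be the behavior you want.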

Also I am facing this error:

Traceback (most recent call last):
  File "C:\Users\dhruv\Desktop\OpenChemistry\avogadro-commands\peptide_generation.py", line 11, in <module>
    from pyPept.sequence import Sequence
  File "<frozen zipimport>", line 259, in load_module
  File "C:\Users\dhruv\anaconda3\envs\pypept\lib\site-packages\pypept-1.0-py3.9.egg\pyPept\sequence.py", line 33, in <module>
  File "C:\Users\dhruv\anaconda3\envs\pypept\lib\site-packages\rdkit\Chem\__init__.py", line 23, in <module>
    from rdkit.Chem.rdmolfiles import *
ImportError: DLL load failed while importing rdmolfiles: The specified procedure could not be found.

I have tried reinstalling RDKit in the environment, but the problem persists.
Could you also confirm whether the procedure for generating the PDB file is correct?
Sorry for the inconvenience, and thank you in advance.

Dhruv_J · October 26, 2023, 3:53pm

I also checked Stack Overflow for the above issue. Following the advice there, I tried installing an earlier version of Python, but when I then tried to reinstall RDKit I encountered this error:

Solving environment: \ *** picosat: out of memory in 'new'

Is there an alternative package or resource, or some other way to proceed with the script?

ghutchis · October 26, 2023, 4:56pm

Thanks for looking into it.

I’ll take a look at PyPept in a bit - I have a bunch of meetings today and am working to get v1.98 released.

Dhruv_J · October 27, 2023, 7:18pm

Hello,
The code now works for BILN and HELM inputs, i.e. the PDB file is generated, but I have two questions:

How should I proceed with FASTA input?
I created separate environments for pyPept and RDKit, but the conda base Python executable seems to support both packages. I don’t know why, so could you please help?

(Sorry, I asked a somewhat silly question about HELM input earlier. I have since fixed the problem in my code, so it works for HELM input as well.)

ghutchis · October 28, 2023, 4:14pm

I’m not sure I understand… are you saying the base environment does not support both packages? If so, can you tell me what version of Python and what version of rdkit you have installed?

Dhruv_J · October 28, 2023, 4:26pm

No, I am saying that I installed separate environments for pyPept and RDKit in \anaconda3\envs,
but the base environment supports both of them. So my question was: does the base environment support all packages that have been installed in \anaconda3\envs?

Python 3.11.5
Rdkit 2022.09.5

Thanks

ghutchis · October 28, 2023, 4:40pm

No. The point of environments is that sometimes packages … don’t get along together.

For example, some packages took a long time to update to python3 syntax. So you create an environment for old packages.

The base environment is just the default if you don’t create / specify environments. If you want a tool like pypept that also requires rdkit you need to have them both installed in the same environment.
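One way to see which environment a script is actually running in is to ask the interpreter itself. This is a rough sketch using only the standard library. missing_packages is a hypothetical helper, and the import names "rdkit" and "pyPept" are taken from the traceback earlier in this thread:

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names that the running interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Path of the Python executable that is running this script
print(sys.executable)

# An empty list means both packages are visible to this interpreter
print(missing_packages(["rdkit", "pyPept"]))
```

Running it once with the base interpreter and once inside the pypept environment should show whether both packages really live in the same place.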

Hope that helps!

The script looks really useful.

ghutchis · October 28, 2023, 4:41pm

I looked at their script:

        residues=list(args.fasta)
        biln = "-".join(residues)

So it seems easy to go from fasta to biln.
(It would have been nice to see a note in the README / docs, but at least you can look at their code.)
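Wrapped as a function, that conversion might look like the sketch below. fasta_to_biln is a hypothetical name, and it assumes the input is a bare string of one-letter codes with no ">" header line and no modified residues:

```python
def fasta_to_biln(fasta):
    """Join one-letter residue codes with '-' to form a BILN string."""
    # Drop spaces and newlines so pasted, wrapped sequences still work
    residues = list("".join(fasta.split()))
    return "-".join(residues)


print(fasta_to_biln("DTHFEIA"))  # D-T-H-F-E-I-A
```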

Dhruv_J · October 28, 2023, 4:42pm

Oh, I hadn’t found that. Okay, then we are almost there. I’ll send a PR soon.

ghutchis · October 28, 2023, 4:44pm

This one seems like it might be better in a separate repository, so we can add a requirements.txt to install pypept and rdkit properly.

Another thought would be to fork the repository and add the plugin script and requirements.txt that includes rdkit, etc.

Dhruv_J · October 28, 2023, 4:56pm

So, should I create a separate folder in the avogadro-commands repo, or a completely different repo?
Also, please have a look at the final code:

def getOptions():
    userOptions = {}

    userOptions['Format'] = {}
    userOptions['Format']['label'] = 'Peptide Format'
    userOptions['Format']['type'] = 'stringList'
    userOptions['Format']['values'] = ["biln","helm", "fasta"]
    userOptions['Format']['default'] = "biln"

    userOptions['Sequence'] = {}
    userOptions['Sequence']['label'] = 'Peptide Sequence'
    userOptions['Sequence']['type'] = 'string'
    userOptions['Sequence']['default'] = "ac-D-T-H-F-E-I-A-am"

    userOptions['secondary_structure'] = {}
    userOptions['secondary_structure']['label'] = 'Secondary Structure'
    userOptions['secondary_structure']['type'] = 'stringList'
    userOptions['secondary_structure']['values'] = ['B', 'H','E','S','T','G' ]
    userOptions['secondary_structure']['default'] = 'B'

    opts = {'userOptions': userOptions}

    return opts


def peptide_generation(opts):
    format = opts['Format']
    sequence = opts['Sequence']
    secondary_structure = opts['secondary_structure']

    if format == "helm":
        b = Converter(helm=sequence)
        sequence = b.get_biln()

    if format == "fasta":
        sequence = "-".join(sequence)

    seq = Sequence(sequence)
    seq = correct_pdb_atoms(seq)
    mol = Molecule(seq)
    romol = mol.get_molecule(fmt='ROMol')
    romol = Conformer.generate_conformer(romol, secondary_structure, generate_pdb=True)

Do let me know if any changes are required

ghutchis · October 28, 2023, 5:06pm

Yes, I think it should get a separate repository, maybe as a fork of pypept since that’s not yet available through PyPI or conda.

One suggestion I should have made earlier … you can save some typing with the userOptions, e.g.:

userOptions['Format'] = {
    'label': 'Peptide Format',
    'type': 'stringList',
    'values': ["biln", "helm", "fasta"],
    'default': 'biln'
}
userOptions['Sequence'] = {
    'label': 'Peptide Sequence',
    'type': 'string',
    'default': "ac-D-T-H-F-E-I-A-am"
}
# etc.

It’s not required, but it might be easier to set up.
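For reference, here is a sketch of the whole getOptions function in that style, with the option values copied from the final script posted earlier in the thread. Printing the result with json.dumps is an assumption about how the script hands its options back to Avogadro:

```python
import json


def getOptions():
    """Describe the user-facing options as nested dictionary literals."""
    userOptions = {
        'Format': {
            'label': 'Peptide Format',
            'type': 'stringList',
            'values': ["biln", "helm", "fasta"],
            'default': 'biln',
        },
        'Sequence': {
            'label': 'Peptide Sequence',
            'type': 'string',
            'default': "ac-D-T-H-F-E-I-A-am",
        },
        'secondary_structure': {
            'label': 'Secondary Structure',
            'type': 'stringList',
            'values': ['B', 'H', 'E', 'S', 'T', 'G'],
            'default': 'B',
        },
    }
    return {'userOptions': userOptions}


# Assumed: the options are emitted as a JSON string on stdout
print(json.dumps(getOptions()))
```

The trailing comma after each last entry is optional in Python, but it makes it harder to forget a comma when another key is added later.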

Dhruv_J · October 28, 2023, 5:08pm

So, after forking pyPept, I just have to add the plugin and the peptide generation script, and generate a requirements.txt file for the whole repo, right?

Also, should I use this dictionary style in all the scripts? Just asking so that the scripts’ code stays uniform.

ghutchis · October 28, 2023, 5:33pm

Yes, that would be my suggestion.

I don’t think it’s necessary to change the other scripts. I’m switching the documentation to the dictionary style since I think it’s easier to type, but both syntaxes work.

Dhruv_J · October 28, 2023, 6:13pm

Please have a look at the script.
Sorry this took quite a long time.
How can I send it to the Avogadro repo?

If you have any feedback, please let me know. I’ll be happy to work on it.

ghutchis · October 29, 2023, 7:03pm

Submit a pull request to the repository list

One concern is that the requirements.txt seems like it’s from your conda environment. I think something like this should be enough.

rdkit >= v2022.03.1
numpy >= 1.22.2
pandas >= 1.4.1
requests >= 2.27.1
igraph >= 0.9.10
biopython >= 1.79

Dhruv_J · October 30, 2023, 11:40am

Yes, I have made the required changes. Kindly have a look at the PR, and if everything looks fine, please merge it.

If the required scripts are done,
please let me know whether there are any issues I can solve. Or should I pick an open issue on GitHub and try to find a solution for it?

Thank you very much

Dhruv_J · November 2, 2023, 6:03pm

Hello
Can you please look into this as I believe it’s ready for review. Your feedback is highly valued. If you have any comments or need any additional information, please let me know.

If there are other GitHub issues I can tackle or any way I can assist, please do direct me.

ghutchis · November 2, 2023, 7:34pm

Please forgive my delay - I’ve been busy this week and haven’t had time to verify it myself.

I guess another package that might be interesting is for sugars:

You would have the same thing - a string input … and then:

from glyles import convert

# get input from the user options
label, smiles = convert(input)

# return smiles with format "SMILES" to Avogadro

Dhruv_J · November 4, 2023, 10:55am

Thank you for your message, and please don’t feel the need to apologize… I completely understand that you’ve been busy this week, and I truly appreciate your effort in taking the time to verify it. Your feedback is highly valuable.

I am ready with the code and am requesting your feedback on this:

import json
import sys

from glyles import convert


def getOptions():
    userOptions = {}

    userOptions['glycan'] = {}
    userOptions['glycan']['label'] = 'convert glycan to smiles'
    userOptions['glycan']['type'] = 'string'
    userOptions['glycan']['default'] = "Gal"

    opts = {'userOptions': userOptions}

    return opts


def convert_to_smiles(opts):
    glycan = opts['glycan']
    smiles = convert(glycan)
    return smiles


def runCommand():
    # Read options from stdin
    stdinStr = sys.stdin.read()

    # Parse the JSON strings
    opts = json.loads(stdinStr)

    # Prepare the result
    result = convert_to_smiles(opts)
    return result

Output:

PS C:\Users\dhruv\Desktop\OpenChemistry\avogadro-commands> echo '{
>>   "glycan": "Man(a1-2)Man"}'| python glycans_to_smiles.py --run-command

[["Man(a1-2)Man", "O1C(O)C@@H C@@H C@H[C@H]1CO"]]
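The output above suggests that convert returns a list of [label, SMILES] pairs rather than a single string. Assuming that shape, which is inferred from this one run and not checked against the glyles documentation, a small helper could pull out the SMILES before it is handed back to Avogadro. first_smiles is a hypothetical name and the data below are placeholders, not real glyles output:

```python
def first_smiles(pairs):
    """Return the SMILES string from the first (label, smiles) pair."""
    if not pairs:
        raise ValueError("conversion returned no results")
    _label, smiles = pairs[0]
    return smiles


# Placeholder data shaped like the output shown above
example = [["Gal", "<SMILES for Gal>"]]
smiles = first_smiles(example)
```

Raising on an empty list makes a failed conversion visible instead of passing an empty string along.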

So I was wondering: can I send the PR directly to the avogadro-commands repo instead of creating a separate repo?
We could add a note to the README telling users to run

pip install glyles

for the plugin. Is that okay?

Thank you very much