PanACoTA.corepers_module package
corepers module of PanACoTA
persistent_functions submodule
Functions to generate a persistent genome from a pangenome.
@author gem April 2017
- PanACoTA.corepers_module.persistent_functions.get_pers(fam_by_strain, fam_all_members, nb_strains, tol=1, multi=False, mixed=False, floor=False)
From the list of families, get the persistent genome families, i.e. the families with members in at least a fraction tol of the nb_strains genomes.
- Parameters:
- fam_by_strain : dict
{fam_num: {genome1: [members], genome2: [members]}, fam_num2: {genome1: [members]}}
- fam_all_members : dict
{fam_num: [all members]}
- nb_strains : int
total number of strains/genomes in the dataset
- tol : float
minimum fraction of genomes (between 0 and 1) that must be present in a family. Example: with tol=0.5 and 8 genomes, a family containing 3 genomes is not persistent, while a family containing 7 genomes can be persistent (depending on the multi and mixed parameters).
- multi : bool
True if multiple genes from the same genome/strain are tolerated in a family: a family is then considered multi-persistent if it has members from at least a fraction tol of the genomes. False otherwise.
- mixed : bool
True if mixed families are allowed. A mixed family has exactly 1 member per genome in at least a fraction tol of the genomes; the remaining (1 - tol) fraction of the genomes may have 0 or several members.
- floor : bool
If True, the minimum number of genomes that must contain a member for the family to be considered persistent is floor(nb_strains*tol); if False, it is ceil(nb_strains*tol).
- Returns:
- dict
{fam_num: [list of members]} for persistent families
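The selection rule above can be sketched as follows. This is a hypothetical, simplified re-implementation for illustration, not PanACoTA's own code. It assumes that `mixed` takes precedence over `multi` when both are True, that the strict mode (both False) requires at most one member in every genome of the family, and it inlines simplified versions of `uniq_members` and `mixed_family`. The genome and protein names in the toy pangenome are made up.

```python
import math


def uniq_members(family, num=1):
    """Simplified: True if no genome has more than `num` members."""
    return all(len(members) <= num for members in family.values())


def mixed_family(family, thres):
    """Simplified: True if at least `thres` genomes have exactly 1 member."""
    return sum(1 for members in family.values() if len(members) == 1) >= thres


def get_pers(fam_by_strain, fam_all_members, nb_strains, tol=1,
             multi=False, mixed=False, floor=False):
    """Hypothetical sketch of the persistent-family selection."""
    # Minimum number of genomes a family must cover to be persistent
    if floor:
        min_genomes = math.floor(nb_strains * tol)
    else:
        min_genomes = math.ceil(nb_strains * tol)

    pers = {}
    for fam_num, family in fam_by_strain.items():
        # Number of genomes with at least one member in this family
        nb_genomes = sum(1 for members in family.values() if members)
        if mixed:
            # Assumption: mixed takes precedence over multi.
            # >= min_genomes genomes with exactly 1 member, others free
            keep = mixed_family(family, min_genomes)
        elif multi:
            # Any number of members per genome is tolerated
            keep = nb_genomes >= min_genomes
        else:
            # Strict: enough genomes, none with more than 1 member
            keep = nb_genomes >= min_genomes and uniq_members(family)
        if keep:
            pers[fam_num] = fam_all_members[fam_num]
    return pers


# Toy pangenome of 4 genomes (hypothetical names)
fam_by_strain = {
    1: {"G1": ["G1_1"], "G2": ["G2_1"], "G3": ["G3_1"], "G4": ["G4_1"]},
    2: {"G1": ["G1_2", "G1_3"], "G2": ["G2_2"], "G3": ["G3_2"]},
    3: {"G1": ["G1_4"], "G2": ["G2_4"]},
}
fam_all_members = {
    fam: [m for members in fam_dict.values() for m in members]
    for fam, fam_dict in fam_by_strain.items()
}
print(sorted(get_pers(fam_by_strain, fam_all_members, 4)))  # [1]
print(sorted(get_pers(fam_by_strain, fam_all_members, 4, tol=0.75, multi=True)))  # [1, 2]
```

The floor parameter matters when nb_strains*tol is not an integer: with 4 genomes and tol=0.6, the product is 2.4, so floor=True lowers the requirement to 2 genomes (family 3 becomes persistent) while floor=False raises it to 3.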
- PanACoTA.corepers_module.persistent_functions.get_subset_genomes(fam_by_strain, fam_all_members, list_file)
If the user gives a list of genomes that is a subset of all genomes in the pangenome, keep only those genomes in the fam_by_strain and fam_all_members dicts. This gives the pangenome of those genomes, which is then handled by the get_pers function to get the corresponding core/persistent genome.
- Parameters:
- fam_by_strain : dict
{fam_num: {genome1: [members], genome2: [members]}, fam_num2: {genome1: [members]}}
- fam_all_members : dict
{fam_num: [all members]}
- list_file : str
name of the file containing the names of the genomes to keep
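A minimal sketch of this filtering step. It assumes that `list_file` holds one genome name per line (only the first whitespace-separated field is read) and that the function returns the two filtered dicts; neither the file format nor the return value is specified above. Dropping families that have no member left in the selected genomes is also an assumption of this sketch.

```python
def get_subset_genomes(fam_by_strain, fam_all_members, list_file):
    """Hypothetical sketch: restrict both family dicts to listed genomes."""
    # Assumed file format: one genome name per line, first field used
    with open(list_file) as handle:
        keep = {line.split()[0] for line in handle if line.strip()}

    sub_by_strain = {}
    sub_all_members = {}
    for fam_num, family in fam_by_strain.items():
        sub_fam = {genome: members for genome, members in family.items()
                   if genome in keep}
        if not sub_fam:
            # Assumption: families absent from every kept genome are dropped
            continue
        sub_by_strain[fam_num] = sub_fam
        # Keep the original member order, restricted to the kept genomes
        retained = {m for members in sub_fam.values() for m in members}
        sub_all_members[fam_num] = [m for m in fam_all_members[fam_num]
                                    if m in retained]
    return sub_by_strain, sub_all_members
```

The filtered dicts can then be passed to get_pers, presumably with nb_strains set to the number of selected genomes.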
- PanACoTA.corepers_module.persistent_functions.is_in_subset(member, list_genomes)
Tell whether a member belongs to one of the genomes in the given list. Used to keep, from a list of members, only those in the given list of genomes.
- Parameters:
- member : str
protein name
- list_genomes : str
filename containing the list of genomes
- PanACoTA.corepers_module.persistent_functions.mixed_family(family, thres)
A family spans several genomes (genome = strain), each containing x members. Returns True if at least ‘thres’ genomes of the family have exactly 1 member.
- Parameters:
- family : dict
{strain1: [members in strain1]}
- thres : float
minimum number of genomes which must have exactly 1 member
- Returns:
- bool
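The test above fits in a few lines. This is an illustrative sketch, not necessarily PanACoTA's implementation; `thres` is compared directly against the count of single-member genomes, so a float such as 2.0 behaves like the int 2. The family contents are hypothetical.

```python
def mixed_family(family, thres):
    """Sketch: True if at least `thres` genomes have exactly one member."""
    # Count the genomes contributing a single member to the family
    nb_single = sum(1 for members in family.values() if len(members) == 1)
    return nb_single >= thres


family = {"G1": ["G1_1"], "G2": ["G2_1", "G2_2"], "G3": ["G3_1"]}
print(mixed_family(family, 2))  # True: G1 and G3 have exactly 1 member
print(mixed_family(family, 3))  # False: G2 has 2 members
```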
- PanACoTA.corepers_module.persistent_functions.uniq_members(family, num=1)
Returns True if, in the family, each genome has no more than ‘num’ member(s), False otherwise (multigenic family).
- Parameters:
- family : dict
{strain1: [members in strain1], strain2: [members in strain2]}
- num : int
maximum number of members allowed in each genome to return True
- Returns:
- bool
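A minimal sketch of this check (illustrative only, with hypothetical member names): a family passes as long as no genome exceeds `num` members, so genomes with a single member, or none, are accepted.

```python
def uniq_members(family, num=1):
    """Sketch: True if no genome has more than `num` members."""
    # all() stops at the first genome exceeding the limit
    return all(len(members) <= num for members in family.values())


family = {"G1": ["G1_1"], "G2": ["G2_1", "G2_2"]}
print(uniq_members(family))         # False: G2 is multigenic
print(uniq_members(family, num=2))  # True
```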
- PanACoTA.corepers_module.persistent_functions.write_persistent(fams, outfile)
Write the persistent families to the output file.
- Parameters:
- fams : dict
{num_fam: [members]}
- outfile : str
output file in which all families are written
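A sketch of the writing step. The output layout used here (one family per line: the family number followed by its members, separated by spaces, in dict order) is an assumption for illustration; the documentation above does not describe PanACoTA's actual file format.

```python
def write_persistent(fams, outfile):
    """Hypothetical sketch: one line per family, '<num_fam> <members...>'."""
    # Assumed layout; families are written in the dict's insertion order
    with open(outfile, "w") as out:
        for num_fam, members in fams.items():
            out.write(" ".join([str(num_fam), *members]) + "\n")
```

For example, `write_persistent({1: ["G1_1", "G2_1"]}, "pers.lst")` would write the single line `1 G1_1 G2_1` under this assumed layout.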