PanACoTA.corepers_module package

corepers module of PanACoTA

persistent_functions submodule

Functions to generate a persistent genome from a pangenome.

@author gem April 2017

PanACoTA.corepers_module.persistent_functions.get_pers(fam_by_strain, fam_all_members, nb_strains, tol=1, multi=False, mixed=False, floor=False)

From the list of families, get the persistent genome families, i.e. the families that have members in at least tol% of the ‘nb_strains’ genomes.

Parameters:
fam_by_strain : dict

{fam_num: {genome1: [members], genome2: [members]}, fam_num2: {genome1: [members]}}

fam_all_members : dict

{fam_num: [all members]}

nb_strains : int

total number of strains/genomes in dataset

tol : float

Minimum percentage of the genomes that must be present in a family. Example: if tol is 50% and there are 8 genomes, a family containing 3 genomes is not persistent, while a family containing 7 genomes can be persistent (depending on the multi and mixed parameters).

multi : bool

True if multiple genes from the same genome/strain are tolerated in a family: a family is then considered multi-persistent if it has members from at least ‘tol%’ of the genomes. False otherwise.

mixed : bool

True if mixed families are allowed (mixed family = exactly 1 member per genome for at least tol% of the genomes, 0 or several members allowed for other (1-tol)% genomes)

floor : bool

Sets the minimum number of genomes that must contain a gene for the family to be considered persistent: floor(nb_strains*tol) genomes if True, ceil(nb_strains*tol) genomes if False.

Returns:
dict

{fam_num: [list of members]} for persistent families
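
The interaction of tol, floor, multi and mixed can be illustrated with a minimal sketch. It is not the PanACoTA implementation: the names `min_genomes` and `is_persistent` are hypothetical, and it assumes tol is passed as a fraction between 0 and 1 (as the default tol=1 and the expression floor(nb_strains*tol) suggest) and that, when neither multi nor mixed is set, every genome of the family must have exactly one member.

```python
import math


def min_genomes(nb_strains, tol=1, floor=False):
    """Hypothetical helper: minimum number of genomes a family must cover."""
    value = nb_strains * tol
    return math.floor(value) if floor else math.ceil(value)


def is_persistent(family, nb_strains, tol=1, multi=False, mixed=False, floor=False):
    """Apply the rule described above to one family ({genome: [members]}).

    Assumption: tol is a fraction in [0, 1], not a value between 0 and 100.
    """
    threshold = min_genomes(nb_strains, tol, floor)
    if len(family) < threshold:
        return False  # too few genomes, whatever the mode
    if multi:
        return True  # several members per genome are tolerated
    # Number of genomes with exactly one member in this family
    single = sum(1 for members in family.values() if len(members) == 1)
    if mixed:
        return single >= threshold
    # Strict mode (assumed): no genome may have several members
    return single == len(family)


# 7 genomes out of 8; one of them carries two members
family = {"g%d" % i: ["g%d_1" % i] for i in range(6)}
family["g6"] = ["g6_1", "g6_2"]
for mode in ({}, {"mixed": True}, {"multi": True}):
    print(mode, is_persistent(family, nb_strains=8, tol=0.5, **mode))
```

With 8 genomes and tol=0.5 the threshold is 4 genomes, so the family above is rejected in strict mode (one genome has two members) but accepted in mixed and multi modes.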

PanACoTA.corepers_module.persistent_functions.get_subset_genomes(fam_by_strain, fam_all_members, list_file)

If the user gives a list of genomes which is a subset of all genomes in the pangenome, keep only those genomes in the fam_by_strain and fam_all_members dicts. This gives the pangenome of those genomes. They will then be handled by the get_pers function to get the corresponding core/persistent genome.

Parameters:
fam_by_strain : dict

{fam_num: {genome1: [members], genome2: [members]}, fam_num2: {genome1: [members]}}

fam_all_members : dict

{fam_num: [all members]}

list_file : str

name of file containing all genome names
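
The filtering step described above can be sketched as follows. This is an illustration only, not the PanACoTA code: `subset_families` is a hypothetical name, the genome names are assumed to have already been read from the list file, and families left without any member are assumed to be dropped.

```python
def subset_families(fam_by_strain, genomes):
    """Keep only the given genomes in each family (hypothetical helper).

    fam_by_strain: {fam_num: {genome: [members]}}
    genomes: iterable of genome names to keep
    Returns the filtered (fam_by_strain, fam_all_members) pair.
    """
    keep = set(genomes)
    sub_by_strain = {}
    sub_all_members = {}
    for fam_num, by_genome in fam_by_strain.items():
        kept = {g: members for g, members in by_genome.items() if g in keep}
        if kept:  # assumption: families with no remaining member are dropped
            sub_by_strain[fam_num] = kept
            sub_all_members[fam_num] = [m for members in kept.values() for m in members]
    return sub_by_strain, sub_all_members


fam_by_strain = {1: {"A": ["A_1"], "B": ["B_1", "B_2"]}, 2: {"C": ["C_1"]}}
sub_by_strain, sub_all_members = subset_families(fam_by_strain, ["A", "B"])
print(sub_by_strain)
print(sub_all_members)
```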

PanACoTA.corepers_module.persistent_functions.is_in_subset(member, list_genomes)

From a list of members, keep only those in the given list of genomes

Parameters:
member : str

protein name

list_genomes : str

filename containing list of genomes

PanACoTA.corepers_module.persistent_functions.mixed_family(family, thres)

1 family = several genomes (genome = strain), each containing x members. Returns True if at least ‘thres’ genomes of the family have exactly 1 member.

Parameters:
family : dict

{strain1: [members in strain1]}

thres : float

minimum number of genomes which must have exactly 1 member

Returns:
bool
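
As a minimal sketch of the check described above (the name `mixed_family_sketch` is hypothetical and this is not the library's code), count the genomes with exactly one member and compare that count with thres:

```python
def mixed_family_sketch(family, thres):
    """True if at least `thres` genomes of `family` have exactly 1 member."""
    nb_single = sum(1 for members in family.values() if len(members) == 1)
    return nb_single >= thres


family = {"strain1": ["s1_1"], "strain2": ["s2_1", "s2_2"], "strain3": ["s3_1"]}
print(mixed_family_sketch(family, 2))
print(mixed_family_sketch(family, 3))
```

Here two of the three strains have exactly one member, so the check passes for thres=2 and fails for thres=3.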

PanACoTA.corepers_module.persistent_functions.uniq_members(family, num=1)

Returns True if, in the family, each genome has no more than ‘num’ member(s), False otherwise (multigenic family)

Parameters:
family : dict

{strain1: [members in strain1], strain2: [members in strain2]}

num : int

max number of members allowed in each genome to return True

Returns:
bool
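
The rule above amounts to a bound on the size of each genome's member list. A minimal sketch, with the hypothetical name `uniq_members_sketch` (not the library's code):

```python
def uniq_members_sketch(family, num=1):
    """True if no genome of `family` has more than `num` members."""
    return all(len(members) <= num for members in family.values())


family = {"strain1": ["s1_1"], "strain2": ["s2_1", "s2_2"]}
print(uniq_members_sketch(family))         # strain2 has 2 members
print(uniq_members_sketch(family, num=2))  # up to 2 members allowed
```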

PanACoTA.corepers_module.persistent_functions.write_persistent(fams, outfile)

Write persistent families into output file

Parameters:
fams : dict

{num_fam: [members]}

outfile : str

output file to write all families
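
The output layout is not specified here. As an illustration only, the sketch below assumes one family per line, with the family number followed by its members separated by spaces, and families written in increasing order of their number; `write_families_sketch` is a hypothetical name, not the library function.

```python
def write_families_sketch(fams, outfile):
    """Write {num_fam: [members]} to `outfile`, one family per line.

    Assumed layout (not confirmed by the documentation):
    <num_fam> <member1> <member2> ...
    """
    with open(outfile, "w") as out:
        for num_fam in sorted(fams):
            fields = [str(num_fam)] + list(fams[num_fam])
            out.write(" ".join(fields) + "\n")
```

A call such as `write_families_sketch({1: ["A_1", "B_1"]}, "persistent.lst")` would then produce a one-line file; the file name is only an example.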