plast module
This module implements the core classes and methods for PLAST (Plasmid Search Tool).
- class plast.plast.PLAST(data, model=None, logger=None)
Bases:
object
Class representing a plasmid query for pLAST.
- Parameters:
data (PLASTData) – Reference PLASTData object.
model (str or None) – Model name to use.
logger (Any, optional) – Optional logger instance.
- to_dict()
Return the query as a dictionary.
- Returns:
Dictionary representation of the PLAST object.
- Return type:
dict
- static from_dict(data, plast_data, logger=None)
Load the query from a dictionary.
- to_json(filename=None)
Return the query as a JSON object or write to file.
- Parameters:
filename (str or None) – Optional filename to write JSON.
- Returns:
JSON string if filename is None, else None.
- Return type:
str or None
- Raises:
Exception – On file write error.
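The return contract above (a JSON string when filename is None, otherwise a file write and None) can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: dump_query and the sample payload are hypothetical stand-ins, and the real dictionary layout produced by to_dict() is not documented here.

```python
import json
import tempfile
from pathlib import Path


def dump_query(payload, filename=None):
    """Hypothetical stand-in mirroring the documented to_json() contract."""
    text = json.dumps(payload, indent=2)
    if filename is None:
        # No filename: hand the JSON string back to the caller.
        return text
    # Filename given: write the file and return None.
    Path(filename).write_text(text, encoding="utf-8")
    return None


# Illustrative payload only; the field names are assumptions.
payload = {"name": "example_plasmid", "length": 4321}

as_string = dump_query(payload)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "query.json"
    result = dump_query(payload, filename=target)
    round_trip = json.loads(target.read_text(encoding="utf-8"))
```

Callers that need the string and the file at once would call the function twice, since the two return modes are mutually exclusive.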
- static from_json(json_data, data)
Load the query from a JSON string or Path.
- log(message, level='info')
Log a message via self.logger if available, else stdout.
- Parameters:
message (str) – Message to log.
level (str) – Logging level.
- Return type:
None
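The fallback described above (use the logger when one is set, otherwise write to stdout) might look roughly like the sketch below. It is written as a free function for illustration; the "[level] message" stdout format and the logger name plast_demo are assumptions, not documented behavior.

```python
import io
import logging
from contextlib import redirect_stdout


def log(message, level="info", logger=None):
    """Hypothetical free-function version of the documented fallback."""
    if logger is not None:
        # Resolve "info", "error", "debug", ... to the matching logger method,
        # falling back to logger.info for unknown level names.
        getattr(logger, level, logger.info)(message)
    else:
        # Assumed stdout format; the real one is not documented.
        print(f"[{level}] {message}")


# Without a logger the message goes to stdout.
buffer = io.StringIO()
with redirect_stdout(buffer):
    log("loading query", level="debug")
stdout_text = buffer.getvalue()

# With a logger the message is routed through it instead.
stream = io.StringIO()
demo_logger = logging.getLogger("plast_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False
log("query loaded", level="info", logger=demo_logger)
logger_text = stream.getvalue()
```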
- error(message)
Log an error message.
- Parameters:
message (str) – Error message.
- Return type:
None
- debug(message)
Log a debug message.
- Parameters:
message (str) – Debug message.
- Return type:
None
- tmp_dir()
- Return type:
Path
- static get_by_accession(plast_id, data, model)
Get a PLAST object by its accession ID.
- Parameters:
plast_id (str) – Accession ID.
data (PLASTData) – Reference PLASTData object.
model (str) – Model name.
- Returns:
PLAST object for the accession.
- Return type:
PLAST
- Raises:
NotFoundError – If accession not found.
- load_gbff(path_or_stream)
Load a GBFF file or stream and populate parsed, name, and length.
- Parameters:
path_or_stream (pathlib.Path or file-like) – Path to GBFF file or stream.
- Returns:
Self.
- Return type:
PLAST
- Raises:
OSError – On file read error.
ValueError – On parsing error.
- load_nt_fasta(input_fasta)
Load a nucleotide FASTA, run Prodigal to predict ORFs, and store results in self.parsed.
- Parameters:
input_fasta (str or file-like) – FASTA file path or string.
- Returns:
Self.
- Return type:
PLAST
- Raises:
OSError – On file read error.
ValueError – On parsing error.
UnicodeDecodeError – On encoding error.
- assign_mmseqs_clusters(threads=1, use_gpu=True)
Assign clusters using MMseqs2.
- Parameters:
threads (int) – Number of threads to use.
use_gpu (bool) – Whether to use GPU if available.
- Returns:
Self.
- Return type:
PLAST
- Raises:
Exception – On MMseqs2 failure.
- assign_eggnog_annot(processes=4)
Assign HMMscan annotations to the query plasmid by running multiple hmmscan processes in parallel (each using one CPU core) on subsets of the CDS translations.
- Parameters:
processes (int) – Number of parallel processes.
- Returns:
Self.
- Return type:
PLAST
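The description of assign_eggnog_annot mentions running one process per subset of the CDS translations. How the library partitions the translations is not documented; the sketch below shows one plausible scheme (contiguous, near-equal subsets). The helper split_into_subsets and the placeholder identifiers are hypothetical.

```python
def split_into_subsets(items, processes):
    """Split items into at most `processes` contiguous, near-equal subsets."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    # Never create more subsets than there are items.
    n_subsets = min(processes, len(items))
    subsets = []
    start = 0
    for i in range(n_subsets):
        # Spread the remainder over the first few subsets.
        extra = 1 if i < len(items) % n_subsets else 0
        size = len(items) // n_subsets + extra
        subsets.append(items[start:start + size])
        start += size
    return subsets


translations = [f"cds_{i}" for i in range(10)]  # placeholder CDS identifiers
subsets = split_into_subsets(translations, processes=4)
```

Each subset could then be handed to its own worker process, so that the number of workers matches the processes argument.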
- encode(normalize=True, transform=True, inplace=True, return_mask=False)
Encode the plasmid query into an embedding vector.
- Parameters:
normalize (bool) – Whether to normalize the embedding vector.
transform (bool) – Whether to apply weighting transformation.
inplace (bool) – If True, store the embedding in the object and return self.
return_mask (bool) – If True, also return a mask of missing vectors.
- Returns:
If inplace is True, returns self. Otherwise, returns the embedding vector (and mask if return_mask is True).
- Return type:
PLAST or numpy.ndarray or tuple[numpy.ndarray, numpy.ndarray]
- Raises:
ValueError – If return_mask is True and inplace is True.
KeyError – On encoding failure.
AttributeError – On encoding failure.
ValueError – On encoding failure.
TypeError – On encoding failure.
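The interaction between inplace and return_mask can be illustrated with a toy class. Only the argument contract follows the documentation above; ToyQuery, its vectors attribute, and the use of mean pooling are assumptions made for this sketch, and the transform weighting is omitted because its definition is not documented here.

```python
import math


class ToyQuery:
    """Hypothetical stand-in; only the argument contract mirrors encode()."""

    def __init__(self, vectors):
        self.vectors = vectors  # one vector per gene, None where missing
        self.embedding = None

    def encode(self, normalize=True, inplace=True, return_mask=False):
        if inplace and return_mask:
            raise ValueError("return_mask requires inplace=False")
        # True marks a position whose vector is missing.
        mask = [v is None for v in self.vectors]
        present = [v for v in self.vectors if v is not None]
        if not present:
            raise ValueError("no vectors available to encode")
        # Mean pooling is an assumption made for this sketch.
        dim = len(present[0])
        embedding = [sum(v[i] for v in present) / len(present) for i in range(dim)]
        if normalize:
            norm = math.sqrt(sum(x * x for x in embedding))
            if norm > 0:
                embedding = [x / norm for x in embedding]
        if inplace:
            self.embedding = embedding
            return self
        return (embedding, mask) if return_mask else embedding


query = ToyQuery([[3.0, 0.0], None, [3.0, 8.0]])
chained = query.encode()  # in-place mode returns the object itself
vector, mask = query.encode(inplace=False, return_mask=True)
```

Returning self in the in-place mode is what allows calls to be chained, while the mask is only available when the vector itself is returned.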
- get_most_similar(maxret=10, metric='cosine')
Compute most similar plasmids to the current embedding.
- Parameters:
maxret (int) – Maximum number of results to return.
metric (str) – Similarity metric (‘cosine’ or ‘euclidean’).
- Returns:
Dictionary of most similar plasmids and their metadata.
- Return type:
dict
- Raises:
ValueError – If unknown metric is provided.
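The choice of metric can change which neighbors are returned. The following pure-Python sketch ranks a few toy reference vectors under both metrics; rank_neighbors, the reference names, and the list-of-pairs return value are hypothetical (the documented method returns a dictionary), and all vectors are assumed to be non-zero.

```python
import math


def rank_neighbors(query, references, maxret=10, metric="cosine"):
    """Return up to maxret (name, distance) pairs, closest first."""
    if metric == "cosine":
        def distance(a, b):
            # Cosine distance = 1 - cosine similarity (non-zero vectors assumed).
            dot = sum(x * y for x, y in zip(a, b))
            norm_a = math.sqrt(sum(x * x for x in a))
            norm_b = math.sqrt(sum(y * y for y in b))
            return 1.0 - dot / (norm_a * norm_b)
    elif metric == "euclidean":
        def distance(a, b):
            return math.dist(a, b)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    scored = [(name, distance(query, vec)) for name, vec in references.items()]
    return sorted(scored, key=lambda pair: pair[1])[:maxret]


# Toy embeddings, chosen so that the two metrics disagree on second place.
references = {
    "ref_a": [1.0, 0.0],
    "ref_b": [0.0, 1.0],
    "ref_c": [3.0, 3.0],
}
query = [1.0, 0.2]

by_cosine = rank_neighbors(query, references, maxret=2, metric="cosine")
by_euclid = rank_neighbors(query, references, maxret=2, metric="euclidean")
```

Cosine distance ignores vector length, so ref_c ranks second there, whereas Euclidean distance penalizes its larger magnitude and ranks ref_b second.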
- draw_network(closest_num=3)
Compute network coordinates for the query and closest_num results.
- Parameters:
closest_num (int) – Number of closest results to include.
- Returns:
Dictionary with network coordinates and labels.
- Return type:
dict