plast module

This module implements the core classes and methods of PLAST (Plasmid Search Tool).

class plast.plast.PLAST(data, model=None, logger=None)

Bases: object

Class representing a plasmid query for PLAST.

Parameters:
  • data (PLASTData) – Reference PLASTData object.

  • model (str or None) – Model name to use.

  • logger (Any, optional) – Optional logger instance.

copy()

Create a copy of the PLAST object.

Returns:

A copy of the PLAST object.

Return type:

PLAST

to_dict()

Return the query as a dictionary.

Returns:

Dictionary representation of the PLAST object.

Return type:

dict

static from_dict(data, plast_data, logger=None)

Load the query from a dictionary.

Parameters:
  • data (dict) – Dictionary containing PLAST data.

  • plast_data (PLASTData) – Reference PLASTData object.

  • logger (Any, optional) – Optional logger instance.

Returns:

Loaded PLAST object.

Return type:

PLAST

to_json(filename=None)

Return the query as a JSON string, or write it to a file.

Parameters:

filename (str or None) – Optional filename to write JSON.

Returns:

JSON string if filename is None, else None.

Return type:

str or None

Raises:

Exception – On file write error.
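The to_json contract above (return a string when filename is None, otherwise write the file and return None) can be sketched as follows. This is a minimal illustration, not the PLAST implementation: QuerySketch is a hypothetical stand-in class that only holds a plain dict of fields, and UTF-8 encoding is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


class QuerySketch:
    """Hypothetical stand-in for PLAST that only holds a dict of fields."""

    def __init__(self, fields: dict) -> None:
        self.fields = fields

    def to_dict(self) -> dict:
        # Shallow copy so callers cannot mutate the internal state.
        return dict(self.fields)

    def to_json(self, filename: Optional[str] = None) -> Optional[str]:
        text = json.dumps(self.to_dict())
        if filename is None:
            return text
        # Assumption: UTF-8 output; an OSError here surfaces as the write error.
        Path(filename).write_text(text, encoding="utf-8")
        return None
```

For example, QuerySketch({"name": "pX", "length": 5000}).to_json() returns the JSON text, while passing a filename writes that text to disk and returns None.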

static from_json(json_data, data)

Load the query from a JSON string or Path.

Parameters:
  • json_data (str or pathlib.Path) – JSON string or Path to JSON file.

  • data (PLASTData) – Reference PLASTData object.

Returns:

Loaded PLAST object.

Return type:

PLAST

Raises:

ValueError – If input is not valid.
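Because from_json accepts either JSON text or a pathlib.Path, the input has to be told apart before decoding. The sketch below shows one way to do that; parse_query_json is a hypothetical helper, it treats every str as JSON text (not as a path), and building the PLAST object from the decoded dict (the job of from_dict) is not shown.

```python
import json
from pathlib import Path
from typing import Union


def parse_query_json(json_data: Union[str, Path]) -> dict:
    """Decode a JSON string, or the contents of a Path, into a dict.

    Hypothetical helper: only the input handling is sketched here.
    """
    if isinstance(json_data, Path):
        text = json_data.read_text(encoding="utf-8")
    elif isinstance(json_data, str):
        # Assumption: a plain str is JSON text, never a file path.
        text = json_data
    else:
        raise ValueError(f"expected str or Path, got {type(json_data).__name__}")
    try:
        decoded = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(decoded, dict):
        raise ValueError("top-level JSON value must be an object")
    return decoded
```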

log(message, level='info')

Log a message through self.logger if one is set; otherwise, print it to stdout.

Parameters:
  • message (str) – Message to log.

  • level (str) – Logging level.

Return type:

None
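The fallback described for log, together with the error and debug shortcuts below it, might look like this. It is a sketch only: LogMixinSketch is hypothetical, it assumes the logger exposes methods named after the level (as logging.Logger does), and the "[LEVEL] message" stdout format is invented for illustration.

```python
from typing import Any, Optional


class LogMixinSketch:
    """Hypothetical sketch of the log/error/debug methods."""

    def __init__(self, logger: Optional[Any] = None) -> None:
        self.logger = logger

    def log(self, message: str, level: str = "info") -> None:
        if self.logger is not None:
            # Dispatch by name: level="error" calls logger.error(message), etc.
            getattr(self.logger, level)(message)
        else:
            # No logger configured: fall back to stdout (format is an assumption).
            print(f"[{level.upper()}] {message}")

    def error(self, message: str) -> None:
        self.log(message, level="error")

    def debug(self, message: str) -> None:
        self.log(message, level="debug")
```

Passing logging.getLogger("plast") as the logger routes messages through the standard logging module; omitting it prints to stdout.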

error(message)

Log an error message.

Parameters:

message (str) – Error message.

Return type:

None

debug(message)

Log a debug message.

Parameters:

message (str) – Debug message.

Return type:

None

tmp_dir()

Return type:

Path

static get_by_accession(plast_id, data, model)

Get a PLAST object by its accession ID.

Parameters:
  • plast_id (str) – Accession ID.

  • data (PLASTData) – Reference PLASTData object.

  • model (str) – Model name.

Returns:

PLAST object for the accession.

Return type:

PLAST

Raises:

NotFoundError – If accession not found.

load_gbff(path_or_stream)

Load a GBFF file or stream and populate the parsed, name, and length attributes.

Parameters:

path_or_stream (pathlib.Path or file-like) – Path to GBFF file or stream.

Returns:

Self.

Return type:

PLAST

Raises:
  • OSError – On file read error.

  • ValueError – On parsing error.

load_nt_fasta(input_fasta)

Load a nucleotide FASTA, run Prodigal to predict ORFs, and store results in self.parsed.

Parameters:

input_fasta (str or file-like) – FASTA file path or string.

Returns:

Self.

Return type:

PLAST

Raises:
  • OSError – On file read error.

  • ValueError – On parsing error.

  • UnicodeDecodeError – On encoding error.
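The FASTA-reading half of load_nt_fasta can be sketched with the standard library alone; the Prodigal ORF-prediction step is not shown. read_nt_fasta is a hypothetical helper, and the rule that a str starting with ">" is FASTA text while any other str is a file path is an assumption about how the "path or string" input might be disambiguated.

```python
from pathlib import Path
from typing import IO, Dict, List, Union


def read_nt_fasta(input_fasta: Union[str, IO[str]]) -> Dict[str, str]:
    """Parse nucleotide FASTA into {record_id: sequence}.

    Assumption: a str beginning with '>' is FASTA text; any other str is a
    file path. ORF prediction with Prodigal is not part of this sketch.
    """
    if hasattr(input_fasta, "read"):
        text = input_fasta.read()
    elif input_fasta.lstrip().startswith(">"):
        text = input_fasta
    else:
        # May raise OSError or UnicodeDecodeError, as documented above.
        text = Path(input_fasta).read_text(encoding="utf-8")
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without an identifier")
            current = header[0]  # record ID = first token of the header
            if current in chunks:
                raise ValueError(f"duplicate record ID: {current}")
            chunks[current] = []
        elif current is None:
            raise ValueError("sequence data before the first FASTA header")
        else:
            chunks[current].append(line.upper())
    if not chunks:
        raise ValueError("no FASTA records found")
    return {rid: "".join(parts) for rid, parts in chunks.items()}
```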

assign_mmseqs_clusters(threads=1, use_gpu=True)

Assign clusters using MMseqs2.

Parameters:
  • threads (int) – Number of threads to use.

  • use_gpu (bool) – Whether to use GPU if available.

Returns:

Self.

Return type:

PLAST

Raises:

Exception – On MMseqs2 failure.

assign_eggnog_annot(processes=4)

Assign eggNOG annotations to the query plasmid with hmmscan. The CDS translations are split into subsets, and several hmmscan processes, each using one CPU core, scan the subsets in parallel.

Parameters:

processes (int) – Number of parallel processes.

Returns:

Self.

Return type:

PLAST
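The split-and-scan strategy described for assign_eggnog_annot can be sketched as below. This is an illustration under assumptions, not the PLAST code: the round-robin split, the chunk file names, the use of --tblout for output, and the hmm_db argument (an hmmpress-ed eggNOG HMM database) are all hypothetical choices. hmmscan's --cpu and --tblout options are standard HMMER 3 flags.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def split_round_robin(proteins: Dict[str, str], processes: int) -> List[Dict[str, str]]:
    """Distribute CDS translations over at most `processes` non-empty subsets."""
    if processes < 1:
        raise ValueError("processes must be >= 1")
    chunks: List[Dict[str, str]] = [{} for _ in range(min(processes, len(proteins)))]
    for i, (cds_id, seq) in enumerate(proteins.items()):
        # Round-robin keeps subset sizes within one sequence of each other.
        chunks[i % len(chunks)][cds_id] = seq
    return chunks


def hmmscan_command(hmm_db: str, fasta_path: str, tblout: str) -> List[str]:
    # --cpu 1 requests a single worker thread per hmmscan process.
    return ["hmmscan", "--cpu", "1", "--tblout", tblout, hmm_db, fasta_path]


def run_parallel(proteins: Dict[str, str], hmm_db: str, workdir: str, processes: int = 4) -> List[str]:
    """Write one FASTA per subset, run hmmscan on each concurrently, return tblout paths."""
    commands, tblouts = [], []
    for i, chunk in enumerate(split_round_robin(proteins, processes)):
        fasta = Path(workdir) / f"chunk_{i}.faa"
        fasta.write_text("".join(f">{cid}\n{seq}\n" for cid, seq in chunk.items()))
        tblout = str(Path(workdir) / f"chunk_{i}.tbl")
        commands.append(hmmscan_command(hmm_db, str(fasta), tblout))
        tblouts.append(tblout)
    if not commands:
        return []
    # Threads suffice: the CPU-bound work runs in the external hmmscan processes.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL), commands))
    return tblouts
```

Parsing the per-chunk tblout files and merging the hits back onto the CDS features is left out.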

encode(normalize=True, transform=True, inplace=True, return_mask=False)

Encode the plasmid query into an embedding vector.

Parameters:
  • normalize (bool) – Whether to normalize the embedding vector.

  • transform (bool) – Whether to apply weighting transformation.

  • inplace (bool) – If True, store the embedding in the object and return self.

  • return_mask (bool) – If True, also return a mask of missing vectors.

Returns:

If inplace is True, returns self. Otherwise, returns the embedding vector, or a tuple of (embedding vector, mask) if return_mask is True.

Return type:

PLAST or numpy.ndarray or tuple[numpy.ndarray, numpy.ndarray]

Raises:
  • ValueError – If return_mask and inplace are both True, or on encoding failure.

  • KeyError – On encoding failure.

  • AttributeError – On encoding failure.

  • TypeError – On encoding failure.
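The documentation does not say how encode builds the embedding. One plausible scheme, shown here purely as a hypothetical sketch, is weighted mean pooling of per-gene cluster vectors with a mask marking genes whose cluster has no vector, followed by optional L2 normalization. The cluster_vectors lookup table and the weights mapping (standing in for the "weighting transformation") are assumptions, not PLAST attributes.

```python
from typing import Dict, Optional, Sequence, Tuple

import numpy as np


def encode_sketch(
    cluster_ids: Sequence[str],
    cluster_vectors: Dict[str, np.ndarray],
    dim: int,
    weights: Optional[Dict[str, float]] = None,
    normalize: bool = True,
) -> Tuple[np.ndarray, np.ndarray]:
    """Pool per-gene cluster vectors into one plasmid embedding.

    Returns (embedding, mask); mask[i] is True when cluster_ids[i] has no vector.
    `cluster_vectors` and `weights` are hypothetical inputs; weights are
    assumed positive.
    """
    mask = np.array([cid not in cluster_vectors for cid in cluster_ids], dtype=bool)
    present = [cid for cid in cluster_ids if cid in cluster_vectors]
    if not present:
        # Nothing to pool: return a zero vector and the all-True mask.
        return np.zeros(dim), mask
    vecs = np.stack([cluster_vectors[cid] for cid in present])
    w = np.array([1.0 if weights is None else weights.get(cid, 1.0) for cid in present])
    emb = (w[:, None] * vecs).sum(axis=0) / w.sum()  # weighted mean
    if normalize:
        norm = np.linalg.norm(emb)
        if norm > 0:
            emb = emb / norm  # unit L2 norm
    return emb, mask
```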

get_most_similar(maxret=10, metric='cosine')

Find the plasmids most similar to the current embedding.

Parameters:
  • maxret (int) – Maximum number of results to return.

  • metric (str) – Similarity metric (‘cosine’ or ‘euclidean’).

Returns:

Dictionary of most similar plasmids and their metadata.

Return type:

dict

Raises:

ValueError – If unknown metric is provided.
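The two metrics accepted by get_most_similar rank results in opposite directions: higher cosine similarity means closer, while lower Euclidean distance means closer. The sketch below shows that ranking over a matrix of reference embeddings. most_similar is a hypothetical helper that returns (row index, score) pairs; the metadata lookup that the real method adds to its dictionary is not shown, and non-zero vectors are assumed for the cosine case.

```python
from typing import List, Tuple

import numpy as np


def most_similar(
    query: np.ndarray, refs: np.ndarray, maxret: int = 10, metric: str = "cosine"
) -> List[Tuple[int, float]]:
    """Rank rows of `refs` by closeness to `query`; return (row_index, score) pairs."""
    if metric == "cosine":
        # Assumes no zero vectors (a zero norm would produce NaN).
        q = query / np.linalg.norm(query)
        r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
        scores = r @ q
        order = np.argsort(-scores)  # higher similarity first
    elif metric == "euclidean":
        scores = np.linalg.norm(refs - query, axis=1)
        order = np.argsort(scores)  # smaller distance first
    else:
        raise ValueError(f"Unknown metric: {metric!r}")
    return [(int(i), float(scores[i])) for i in order[:maxret]]
```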

draw_network(closest_num=3)

Compute network layout coordinates for the query and its closest_num most similar results.

Parameters:

closest_num (int) – Number of closest results to include.

Returns:

Dictionary with network coordinates and labels.

Return type:

dict
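The layout algorithm behind draw_network is not documented. Classical multidimensional scaling (MDS) is one standard way to turn pairwise embedding distances into 2-D coordinates, and the sketch below uses it as a swapped-in technique, not as the PLAST method. network_coords and its "labels"/"coords" keys are hypothetical.

```python
from typing import Dict, List

import numpy as np


def classical_mds_2d(dist: np.ndarray) -> np.ndarray:
    """Embed points in 2-D from a symmetric pairwise distance matrix (classical MDS)."""
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:2]  # two largest eigenvalues
    # Clip small negative eigenvalues caused by rounding or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def network_coords(query: np.ndarray, neighbours: np.ndarray, labels: List[str]) -> Dict[str, list]:
    """Hypothetical helper: lay out the query plus its nearest neighbours in 2-D."""
    points = np.vstack([query, neighbours])
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return {"labels": ["query", *labels], "coords": classical_mds_2d(dist).tolist()}
```

Classical MDS reproduces the original distances exactly when the embeddings already lie in a plane; otherwise it gives the best 2-D approximation in the least-squares sense on the double-centered matrix, so on-screen distances are only indicative.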