doubletdetection.BoostClassifier#

class doubletdetection.BoostClassifier(boost_rate=0.25, n_components=30, n_top_var_genes=10000, replace=False, clustering_algorithm='phenograph', clustering_kwargs=None, n_iters=10, normalizer=None, pseudocount=0.1, random_state=0, verbose=False, standard_scaling=False, n_jobs=1)[source]#

Classifier for doublets in single-cell RNA-seq data.

Parameters:

boost_rate (float (default: 0.25)) – Proportion of cell population size to produce as synthetic doublets.
n_components (int (default: 30)) – Number of principal components used for clustering.
n_top_var_genes (int (default: 10000)) – Number of highest variance genes to use. Other genes are discarded. Will use all genes when zero.
replace (bool (default: False)) – If False, a cell will be selected as a synthetic doublet’s parent no more than once.
clustering_algorithm (str (default: 'phenograph')) – One of “louvain”, “leiden”, or “phenograph”. “louvain” and “leiden” refer to the scanpy implementations.
clustering_kwargs (Optional[dict] (default: None)) – Keyword args to pass directly to clustering algorithm. Note that PhenoGraph ‘prune’ default is changed to True. For Louvain and Leiden clustering, we set directed=False and resolution=4. Include these params explicitly to change them. Do not override random_state and key_added for Louvain/Leiden.
n_iters (int (default: 10)) – Number of fit operations from which to collect p-values.
normalizer (Optional[Callable] (default: None)) – Method to normalize raw_counts. Defaults to normalize_counts from this package. To use normalize_counts with a different pseudocount value, use: lambda counts: doubletdetection.normalize_counts(counts, pseudocount=new_value)
pseudocount (float (default: 0.1)) – Pseudocount used in normalize_counts. Using 1 with standard_scaling=False makes the classifier more memory efficient but may detect fewer doublets.
random_state (int (default: 0)) – Passed to PCA and doublet parent creation. Note: PhenoGraph does not support random seeds, so identical results aren’t guaranteed across runs.
verbose (bool (default: False)) – Whether to print informational messages. Set to False to silence them.
standard_scaling (bool (default: False)) – Enable standard scaling of the normalized count matrix prior to PCA. Recommended when not using PhenoGraph.
n_jobs (int (default: 1)) – Number of jobs to use. Speeds up neighbor computation.
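As a rough illustration of how the parameters above combine, the sketch below collects one possible set of keyword arguments for a Leiden-based run and passes them to the constructor. The names `LEIDEN_SETTINGS` and `build_classifier` are hypothetical helpers, and the chosen values (for example `resolution=2.0`) are illustrative rather than recommended. The import is deferred so the snippet loads even where the package is not installed.

```python
# Hypothetical settings for a Leiden-based run; values are illustrative only.
LEIDEN_SETTINGS = {
    "clustering_algorithm": "leiden",
    # Explicitly override the documented resolution=4 default.
    # random_state and key_added are deliberately NOT set here.
    "clustering_kwargs": {"resolution": 2.0},
    "standard_scaling": True,  # documented as recommended when not using PhenoGraph
    "n_iters": 10,
    "random_state": 0,
    "verbose": True,
}


def build_classifier(settings=None):
    """Construct a BoostClassifier from a settings dict (hypothetical helper)."""
    import doubletdetection  # deferred import; requires the package at call time

    return doubletdetection.BoostClassifier(**(settings or LEIDEN_SETTINGS))
```

Keeping the settings in a plain dict makes it easy to log or compare configurations between runs before any fitting happens.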

all_log_p_values_#: Hypergeometric test natural log p-value per cell for cluster enrichment of synthetic doublets. Use for thresholding. Shape (n_iters, num_cells).

all_scores_#: The fraction of a cell’s cluster that is synthetic doublets. Shape (n_iters, num_cells).
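To make the two per-cluster quantities above concrete, here is a minimal standard-library sketch of a cluster score and a hypergeometric enrichment log p-value. It assumes the p-value is the upper tail P(X >= k) for k synthetic doublets in a cluster; the library's exact tail convention and parameterization may differ, and the function name `cluster_enrichment` is hypothetical.

```python
import math


def cluster_enrichment(k_synth, cluster_size, n_synth, n_total):
    """Return (score, natural-log p-value) for one cluster.

    k_synth      synthetic doublets in the cluster
    cluster_size all cells (original + synthetic) in the cluster
    n_synth      synthetic doublets overall
    n_total      all cells (original + synthetic) overall
    """
    # Score: fraction of the cluster that is synthetic doublets.
    score = k_synth / cluster_size

    # Upper-tail hypergeometric probability P(X >= k_synth) (assumed convention),
    # computed exactly with integer binomial coefficients.
    upper = min(n_synth, cluster_size)
    numer = sum(
        math.comb(n_synth, i) * math.comb(n_total - n_synth, cluster_size - i)
        for i in range(k_synth, upper + 1)
    )
    if numer == 0:
        return score, float("-inf")
    log_p = math.log(numer) - math.log(math.comb(n_total, cluster_size))
    return score, log_p
```

Every cell in a cluster would share that cluster's score and log p-value, which is how a per-cluster test yields per-cell arrays.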

communities_#: Cluster ID for corresponding cell. Shape (n_iters, num_cells).

labels_#: 0 for singlet, 1 for detected doublet.

parents_#: Parent cells’ indexes for each synthetic doublet. A list wrapping the results from each run.

suggested_score_cutoff_#: Cutoff used to classify cells when n_iters == 1 (scores >= cutoff). Not produced when n_iters > 1.

synth_communities_#: Cluster ID for corresponding synthetic doublet. Shape (n_iters, num_cells * boost_rate).
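The relationship between boost_rate, replace, and parents_ can be sketched as follows. This is an assumption-laden illustration, not the library's code: it assumes a synthetic doublet is the element-wise sum of two parent cells' counts, and `make_synthetic_doublets` is a hypothetical name. With replace=False, each cell is used as a parent at most once, so the sketch needs 2 * num_synths <= num_cells.

```python
import random


def make_synthetic_doublets(raw_counts, boost_rate=0.25, replace=False, seed=0):
    """Return (synthetic_rows, parents) for a cells-by-genes list of lists."""
    rng = random.Random(seed)
    num_cells = len(raw_counts)
    num_synths = int(boost_rate * num_cells)

    # Draw 2 parent indices per synthetic doublet.
    if replace:
        flat = [rng.randrange(num_cells) for _ in range(2 * num_synths)]
    else:
        # Sampling without replacement: no cell is a parent more than once.
        flat = rng.sample(range(num_cells), 2 * num_synths)
    parents = [(flat[2 * i], flat[2 * i + 1]) for i in range(num_synths)]

    # Assumed construction: sum the two parents' counts gene by gene.
    synthetic = [
        [a + b for a, b in zip(raw_counts[p], raw_counts[q])] for p, q in parents
    ]
    return synthetic, parents
```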

top_var_genes_#: Indices of the n_top_var_genes used. Not generated if n_top_var_genes <= 0.

voting_average_#: Fraction of iterations each cell is called a doublet.
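One way to picture how voting_average_ and labels_ relate to the p_thresh and voter_thresh arguments of predict is the plain-Python sketch below. The rule used here (an iteration votes "doublet" when its log p-value is at or below log(p_thresh), and a cell is labeled 1 when its voting average reaches voter_thresh) is an assumption for illustration. The helper name `vote` is hypothetical, and the n_iters == 1 case based on suggested_score_cutoff_ is not covered.

```python
import math


def vote(all_log_p_values, p_thresh=1e-7, voter_thresh=0.9):
    """all_log_p_values: list of n_iters rows, each with one log p-value per cell."""
    log_thresh = math.log(p_thresh)
    n_iters = len(all_log_p_values)

    # Fraction of iterations in which each cell passes the p-value threshold.
    voting_average = [
        sum(lp <= log_thresh for lp in per_cell) / n_iters
        for per_cell in zip(*all_log_p_values)
    ]
    # 1 for detected doublet, 0 for singlet.
    labels = [1 if v >= voter_thresh else 0 for v in voting_average]
    return voting_average, labels
```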

Methods table#

`doublet_score`()	Produce doublet scores
`fit`(raw_counts)	Fits the classifier on raw_counts.
`predict`([p_thresh, voter_thresh])	Produce doublet calls from fitted classifier

Methods#

doublet_score#

BoostClassifier.doublet_score()[source]#

Produce doublet scores

The doublet score is the negative log p-value of doublet enrichment, averaged over the iterations. Higher means more likely to be a doublet.

Returns: Average negative log p-value over iterations
Return type: scores (ndarray, ndims=1)
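The averaging described for this method can be illustrated with a short plain-Python sketch. It takes a nested list shaped like all_log_p_values_ (n_iters rows, one value per cell) and ignores any special handling of missing or non-finite values that the library may apply; `mean_negative_log_p` is a hypothetical name.

```python
def mean_negative_log_p(all_log_p_values):
    """Average -log(p) per cell across iterations (rows)."""
    n_iters = len(all_log_p_values)
    # zip(*rows) walks the matrix column by column, i.e. cell by cell.
    return [-sum(per_cell) / n_iters for per_cell in zip(*all_log_p_values)]
```

For example, a cell with log p-values of -2.0 and -4.0 over two iterations gets a score of 3.0.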

fit#

BoostClassifier.fit(raw_counts)[source]#

Fits the classifier on raw_counts.

Parameters: raw_counts (ndarray | csr_matrix) – Count matrix, oriented cells by genes.

Sets: all_scores_, all_log_p_values_, communities_, top_var_genes_, parents_, synth_communities_

Return type: BoostClassifier
Returns: The fitted classifier.
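Because fit returns the fitted classifier, calls can be chained. The sketch below wraps a typical fit, predict, doublet_score sequence in a hypothetical helper, `detect_doublets`. It assumes predict returns the per-cell labels and uses the default thresholds from the predict signature; the import is deferred so the snippet loads without the package installed.

```python
def detect_doublets(raw_counts, p_thresh=1e-7, voter_thresh=0.9, n_iters=10):
    """Fit on a cells-by-genes count matrix and return (labels, scores).

    raw_counts: ndarray or csr_matrix of raw counts, cells by genes.
    """
    import doubletdetection  # deferred import; requires the package at call time

    clf = doubletdetection.BoostClassifier(n_iters=n_iters)
    # fit() returns the classifier itself, so predict() can be chained.
    labels = clf.fit(raw_counts).predict(p_thresh=p_thresh, voter_thresh=voter_thresh)
    scores = clf.doublet_score()
    return labels, scores
```

Since PhenoGraph does not support random seeds, repeated calls with the default clustering algorithm may not give identical labels.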

predict#

BoostClassifier.predict(p_thresh=1e-07, voter_thresh=0.9)[source]#

Produce doublet calls from fitted classifier