doubletdetection.BoostClassifier
- class doubletdetection.BoostClassifier(boost_rate=0.25, n_components=30, n_top_var_genes=10000, replace=False, clustering_algorithm='phenograph', clustering_kwargs=None, n_iters=10, normalizer=None, pseudocount=0.1, random_state=0, verbose=False, standard_scaling=False, n_jobs=1)
Classifier for doublets in single-cell RNA-seq data.
- Parameters:
boost_rate (float, default: 0.25) – Proportion of the cell population size to produce as synthetic doublets.
n_components (int, default: 30) – Number of principal components used for clustering.
n_top_var_genes (int, default: 10000) – Number of highest-variance genes to use; all other genes are discarded. Set to zero to use all genes.
replace (bool, default: False) – If False, a cell is selected as a synthetic doublet’s parent no more than once.
clustering_algorithm (str, default: 'phenograph') – One of “louvain”, “leiden”, or “phenograph”. “louvain” and “leiden” refer to the scanpy implementations.
clustering_kwargs (Optional[dict], default: None) – Keyword arguments passed directly to the clustering algorithm. Note that the PhenoGraph ‘prune’ default is changed to True. For Louvain and Leiden clustering, directed=False and resolution=4 are set; include these parameters explicitly to change them. Do not override random_state and key_added for Louvain/Leiden.
n_iters (int, default: 10) – Number of fit operations from which to collect p-values.
normalizer (Optional[Callable], default: None) – Method to normalize raw_counts. Defaults to normalize_counts from this package. To use normalize_counts with a different pseudocount value, use: lambda counts: doubletdetection.normalize_counts(counts, pseudocount=new_value)
pseudocount (float, default: 0.1) – Pseudocount used in normalize_counts. Using 1 with standard_scaling=False makes the classifier more memory efficient but may detect fewer doublets.
random_state (int, default: 0) – Passed to PCA and doublet parent creation. Note: PhenoGraph does not support random seeds, so identical results are not guaranteed across runs.
verbose (bool, default: False) – Set to False to silence informational messages.
standard_scaling (bool, default: False) – Enable standard scaling of the normalized count matrix prior to PCA. Recommended when not using PhenoGraph.
n_jobs (int, default: 1) – Number of jobs to use. Speeds up neighbor computation.
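To make boost_rate and replace concrete, the following is a minimal NumPy sketch of how synthetic doublets could be built by summing the raw counts of two randomly chosen parent cells. It is an illustration under assumptions, not the package's implementation: the helper name make_synthetic_doublets is hypothetical, and the library's actual sampling scheme may differ.

```python
import numpy as np


def make_synthetic_doublets(raw_counts, boost_rate=0.25, replace=False, random_state=0):
    """Hypothetical helper: sum pairs of parent cells into synthetic doublets."""
    rng = np.random.default_rng(random_state)
    num_cells = raw_counts.shape[0]
    # boost_rate is a proportion of the original population size.
    num_synths = int(boost_rate * num_cells)
    # Draw two parents per synthetic doublet. With replace=False, no cell
    # index is drawn more than once across all parents.
    parents = rng.choice(num_cells, size=(num_synths, 2), replace=replace)
    synthetic = raw_counts[parents[:, 0]] + raw_counts[parents[:, 1]]
    return synthetic, parents


# Toy count matrix, oriented cells by genes (values are made up).
raw_counts = np.random.default_rng(1).poisson(1.0, size=(100, 20))
synthetic, parents = make_synthetic_doublets(raw_counts)
```

In this sketch, replace=False only works while 2 * num_synths does not exceed the number of cells, because every parent index must be distinct.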
- all_log_p_values_
Hypergeometric test natural log p-value per cell for cluster enrichment of synthetic doublets. Use for thresholding. Shape (n_iters, num_cells).
- all_scores_
The fraction of a cell’s cluster that is synthetic doublets. Shape (n_iters, num_cells).
- communities_
Cluster ID for corresponding cell. Shape (n_iters, num_cells).
- labels_
0 for singlet, 1 for detected doublet.
- parents_
Parent cells’ indexes for each synthetic doublet. A list wrapping the results from each run.
- suggested_score_cutoff_
Cutoff used to classify cells when n_iters == 1 (scores >= cutoff). Not produced when n_iters > 1.
- synth_communities_
Cluster ID for corresponding synthetic doublet. Shape (n_iters, num_cells * boost_rate).
- top_var_genes_
Indices of the n_top_var_genes used. Not generated if n_top_var_genes <= 0.
- voting_average_
Fraction of iterations each cell is called a doublet.
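The relationship between all_scores_ and all_log_p_values_ can be sketched for a single cluster using only the standard library. The function name cluster_enrichment and the example counts are hypothetical, and the upper-tail convention P(X >= k) is an assumption; the package's exact tail definition may differ.

```python
import math


def cluster_enrichment(k_synth, cluster_size, total_synth, total_cells):
    """Return (score, natural-log p-value) for one cluster.

    total_cells counts original cells plus synthetic doublets.
    """
    # Score: fraction of the cluster made up of synthetic doublets.
    score = k_synth / cluster_size
    # Upper tail P(X >= k_synth) of a hypergeometric draw of cluster_size
    # cells from total_cells, of which total_synth are synthetic.
    upper = min(cluster_size, total_synth)
    tail = sum(
        math.comb(total_synth, i) * math.comb(total_cells - total_synth, cluster_size - i)
        for i in range(k_synth, upper + 1)
    )
    p_value = tail / math.comb(total_cells, cluster_size)
    return score, math.log(p_value)


# 100 original cells plus 25 synthetic doublets; a cluster of 10 holding 8 synthetics.
score, log_p = cluster_enrichment(k_synth=8, cluster_size=10, total_synth=25, total_cells=125)
```

Every cell in the cluster would share this score and log p-value for that iteration; a more negative log p-value indicates stronger enrichment of synthetic doublets.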
Methods
doublet_score
- BoostClassifier.doublet_score()
Produce doublet scores.
The doublet score is the negative log p-value of doublet enrichment, averaged over the iterations. A higher score means the cell is more likely to be a doublet.
- Returns:
Average negative log p-value over iterations
- Return type:
scores (ndarray, ndims=1)
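As a rough illustration of the description above, the score can be reproduced from an array shaped like all_log_p_values_. The values below are made up, and this sketch ignores any special handling of missing values that the library may apply.

```python
import numpy as np

# Hypothetical natural-log p-values, shape (n_iters, num_cells) = (2, 3).
all_log_p_values = np.log(
    np.array(
        [
            [1e-8, 0.5, 1e-3],
            [1e-9, 0.4, 1e-2],
        ]
    )
)

# Average over iterations (axis 0), then negate so that higher means
# more likely to be a doublet.
scores = -np.mean(all_log_p_values, axis=0)
```

Here the first cell receives the highest score because its p-values are the smallest in both iterations.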
fit
- BoostClassifier.fit(raw_counts)
Fits the classifier on raw_counts.
- Parameters:
raw_counts (ndarray | csr_matrix) – Count matrix, oriented cells by genes.
- Sets:
all_scores_, all_log_p_values_, communities_, top_var_genes_, parents_, synth_communities_
- Returns:
The fitted classifier.
- Return type:
BoostClassifier
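A minimal usage sketch for fit follows. The count matrix is random stand-in data, the wrapper name fit_classifier is hypothetical, and the parameter choices (Louvain clustering with standard scaling, as recommended above when not using PhenoGraph) are only one possible configuration. Running the wrapper assumes the doubletdetection package and its Louvain backend are installed.

```python
import numpy as np

# Synthetic stand-in for a real count matrix, oriented cells by genes.
rng = np.random.default_rng(0)
raw_counts = rng.poisson(1.0, size=(500, 200))


def fit_classifier(raw_counts):
    """Fit a BoostClassifier on raw_counts and return the fitted classifier."""
    # Imported lazily so the data setup above runs without the package.
    import doubletdetection

    clf = doubletdetection.BoostClassifier(
        n_iters=10,
        clustering_algorithm="louvain",
        standard_scaling=True,
        pseudocount=0.1,
        n_jobs=1,
    )
    # fit returns the fitted classifier, so the result can be used directly.
    return clf.fit(raw_counts)
```

After calling fit_classifier(raw_counts), the attributes listed under Sets should be available on the returned object.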
predict
- BoostClassifier.predict(p_thresh=1e-07, voter_thresh=0.9)
Produce doublet calls from the fitted classifier.
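- Parameters:
p_thresh (float, default: 1e-07) – Hypergeometric test p-value threshold that determines per-iteration doublet calls.
voter_thresh (float, default: 0.9) – Fraction of iterations in which a cell must be called a doublet.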
- Sets:
labels_ and voting_average_ if n_iters > 1. labels_ and suggested_score_cutoff_ if n_iters == 1.
- Returns:
0 for singlet, 1 for detected doublet
- Return type:
labels_ (ndarray, ndims=1)
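The voting rule for n_iters > 1 can be sketched as follows. This is a hypothetical re-implementation for illustration, not the library's code: the function name vote is invented, and the comparison directions (log p-value <= log(p_thresh), voting average >= voter_thresh) are assumptions.

```python
import numpy as np


def vote(all_log_p_values, p_thresh=1e-7, voter_thresh=0.9):
    """Hypothetical voting rule over an (n_iters, num_cells) array of log p-values."""
    # Per-iteration doublet call: natural-log p-value at or below log(p_thresh).
    calls = all_log_p_values <= np.log(p_thresh)
    # Fraction of iterations in which each cell was called a doublet.
    voting_average = calls.mean(axis=0)
    # Final label: 1 when enough iterations agree, otherwise 0.
    labels = (voting_average >= voter_thresh).astype(int)
    return labels, voting_average


# Made-up values for 10 iterations and 3 cells.
log_p = np.full((10, 3), np.log(0.5))
log_p[:, 0] = np.log(1e-9)   # cell 0 passes the threshold in every iteration
log_p[:5, 1] = np.log(1e-9)  # cell 1 passes in half of the iterations
labels, voting_average = vote(log_p)
```

In this example only the first cell is labeled a doublet, since the second is called in half of the iterations, below the 0.9 voter threshold.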