Advanced / Optional Dependencies

The dependencies described here are not required to run Snekmer. They are optional and improve performance for specific use cases.

Blazing Signature Filter (BSF): Faster Clustering

The Blazing Signature Filter is an optional dependency used by snekmer cluster to compute pairwise Jaccard distance matrices more efficiently. It applies to the following clustering methods (set via the cluster.method config parameter):

  • density-jaccard

  • hdensity-jaccard

  • agglomerative-jaccard (default)

If BSF is not installed, Snekmer automatically falls back to scipy.spatial.distance.pdist for Jaccard distance computation. The clustering results are identical either way; BSF only makes the distance computation faster on large datasets.
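To give a sense of what the fallback path computes, here is a minimal sketch of a pairwise Jaccard distance matrix built with scipy.spatial.distance.pdist. The toy presence/absence matrix and the variable names are made up for illustration; this is not Snekmer's internal code, only the same SciPy call applied to a small example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy input: 3 sequences x 4 k-mers, True = k-mer present.
kmer_presence = np.array(
    [
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 1, 1],
    ],
    dtype=bool,
)

# pdist returns the condensed (upper-triangle) distance vector.
condensed = pdist(kmer_presence, metric="jaccard")

# squareform expands it into a symmetric n x n matrix with a zero diagonal.
distances = squareform(condensed)

print(distances)
```

Each entry is 1 minus the ratio of shared k-mers to k-mers present in either sequence, so two sequences with no k-mers in common have a distance of 1.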

BSF is not compatible with Apple silicon (M1/M2/M3) systems. See the known Apple silicon issues for details.

Install GCC (required before installing BSF)

BSF requires GCC 4.9 or later. Install it for your operating system:

macOS

brew install gcc llvm libomp

After installing llvm, Homebrew may print a “Caveats” message listing additional paths and flags that need to be set. Follow those instructions so that the compiler and its libraries are resolved correctly. A typical caveats message looks like:

If you need to have llvm first in your PATH, run:
  echo 'export PATH="/usr/local/opt/llvm/bin:$PATH"' >> ~/.zshrc

For compilers to find llvm you may need to set:
  export LDFLAGS="-L/usr/local/opt/llvm/lib"
  export CPPFLAGS="-I/usr/local/opt/llvm/include"

Windows / Linux / Unix

See the BSF documentation for platform-specific GCC installation instructions.
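Before installing BSF, it may help to confirm that the compiler on your PATH meets the GCC 4.9 minimum. The sketch below is one way to do that from Python. The helper names (parse_version, meets_minimum, installed_gcc_version) are hypothetical and not part of Snekmer or BSF. It assumes the compiler is invoked as gcc and supports the -dumpversion flag. On macOS, gcc may resolve to Apple clang, in which case the reported number is a clang version rather than a GCC version.

```python
import re
import shutil
import subprocess

MIN_GCC = (4, 9)  # minimum version stated in this guide


def parse_version(text):
    """Extract (major, minor) from a version string such as '4.9.2' or '13'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(text, minimum=MIN_GCC):
    """Return True if the version string is at least the required minimum."""
    return parse_version(text) >= minimum


def installed_gcc_version(compiler="gcc"):
    """Return the output of '<compiler> -dumpversion', or None if unavailable."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run(
        [compiler, "-dumpversion"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc was not found on PATH")
    else:
        try:
            status = "OK" if meets_minimum(version) else "older than 4.9"
        except ValueError:
            status = "could not parse version"
        print(f"gcc reports version {version}: {status}")
```

If Homebrew installed GCC under a versioned name, pass that name instead, for example installed_gcc_version("gcc-13"), where the exact suffix depends on the version you installed.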

Install BSF

With GCC installed and the Snekmer virtual environment active:

pip install git+https://github.com/PNNL-CompBio/bsf-jaccard-py#egg=bsf
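To check that the installation is visible from the Snekmer environment, you can test whether the package can be located by Python. This sketch assumes the importable module is named bsf, which is inferred from the egg name in the command above and is not confirmed here; adjust the name if your installation differs.

```python
import importlib.util


def bsf_available(module_name="bsf"):
    """Return True if the named module can be found in the active environment.

    The default name 'bsf' is an assumption based on the egg name
    in the pip command above.
    """
    return importlib.util.find_spec(module_name) is not None


if bsf_available():
    print("BSF found: clustering can use it for Jaccard distances.")
else:
    print("BSF not found: Snekmer will fall back to scipy.spatial.distance.pdist.")
```

A "not found" result is not an error. As described above, Snekmer falls back to SciPy in that case and the clustering output is unchanged.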