{ "cells": [ { "cell_type": "markdown", "id": "a1b2c3d4", "metadata": {}, "source": [ "# Snekmer `easy` Tutorial\n", "\n", "**`snekmer easy`** is a streamlined front-end for the Learn/Apply pipeline. \n", "It takes your training sequences, query sequences, and annotation info and handles the rest.\n", "\n", "This tutorial uses the demo data included in the Snekmer repository and runs entirely from the command line.\n", "\n", "---\n", "\n", "**When to use `easy` vs. `snekmer learn` / `snekmer apply` directly:**\n", "\n", "| Situation | Recommendation |\n", "|---|---|\n", "| First time running, want results fast | `easy` |\n", "| Existing `config.yaml` and directory setup | `snekmer learn` then `snekmer apply` |\n", "| Adding new training data to an existing model | `snekmer learn` then `snekmer apply` |\n", "| Exploring parameters interactively | `easy` wizard |" ] }, { "cell_type": "markdown", "id": "b2c3d4e5", "metadata": {}, "source": [ "## Setup\n", "\n", "Install Snekmer and activate your environment before running this notebook. \n", "See the [installation guide](https://snekmer.readthedocs.io/en/latest/getting_started/install.html) for details.\n", "\n", "To use Snekmer inside a Jupyter notebook kernel:\n", "\n", "```bash\n", "source ~/snekmer_env/bin/activate\n", "pip install ipykernel\n", "python -m ipykernel install --user --name=snekmer\n", "jupyter notebook\n", "```\n", "\n", "> **Note:** This notebook assumes you are running it from the `docs/source/tutorial/` directory." 
] }, { "cell_type": "code", "execution_count": 1, "id": "c3d4e5f6", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# Paths to demo data (relative to docs/source/tutorial/)\n", "DEMO_ROOT = Path(\"../../../resources/demo_sequences/learn_apply_inputs\")\n", "train_dir = DEMO_ROOT / \"learn\"\n", "query_file = DEMO_ROOT / \"apply\" / \"test_sequences_1.fasta\"\n", "ann_file = DEMO_ROOT / \"annotations\" / \"TIGRFAMs_annotation.ann\"\n", "output_dir = Path(\"easy_output\")\n", "results_path = output_dir / \"apply\" / \"snekmer_results.csv\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "verify-version", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.4.1\n" ] } ], "source": [ "!snekmer --version" ] }, { "cell_type": "markdown", "id": "d4e5f6a7", "metadata": {}, "source": [ "## What `easy` needs\n", "\n", "`easy` requires three inputs: training sequences, query sequences, and annotations. Annotations are supplied either as a file (`--ann`) or generated from the training FASTA headers (`--create-ann`):\n", "\n", "| Input | Flag | Description |\n", "|---|---|---|\n", "| Training sequences | `--train` | FASTA file **or** directory of FASTA files with known annotations |\n", "| Query sequences | `--query` | FASTA file **or** directory of FASTA files to annotate |\n", "| Annotations | `--ann` | Path to a `.ann` file (TSV: `id` \\t `family`) |\n", "| *(or)* | `--create-ann` | Auto-generate annotations from training FASTA headers (see below) |\n", "\n", "### Annotation file format (`.ann`)\n", "\n", "A tab-separated file with a header row and two columns, `id` and `family`:\n", "\n", "```\n", "id\tfamily\n", "A0A2D0MWR0\tTIGR04183\n", "A0A2D0MY79\tTIGR04131\n", "A0A1Y4R5C6\tTIGR00722\n", "```\n", "\n", "The `id` must match the accession in your FASTA headers. 
\n", "For UniProt-style headers (`>db|ACCESSION|name ...`), Snekmer extracts the field **between the first pair of `|` characters**.\n", "\n", "### Auto-generating annotations with `--create-ann`\n", "\n", "If your training FASTA headers encode the family label between pipes, you can skip the `.ann` file:\n", "\n", "```\n", ">db|FAMILY_LABEL|seqid description\n", "```\n", "\n", "Use `--create-ann` and Snekmer will parse the headers and build the annotation file for you." ] }, { "cell_type": "markdown", "id": "e5f6a7b8", "metadata": {}, "source": [ "## Demo data\n", "\n", "The demo data is included in the Snekmer repository under `resources/demo_sequences/learn_apply_inputs/`:\n", "\n", "```\n", "resources/demo_sequences/learn_apply_inputs/\n", "\u251c\u2500\u2500 learn/ \u2190 10 training FASTA files (5,000 annotated proteins, 200 TIGRFAM families)\n", "\u2502 \u251c\u2500\u2500 training_sequences_1.fasta\n", "\u2502 \u251c\u2500\u2500 ...\n", "\u2502 \u2514\u2500\u2500 training_sequences_10.fasta\n", "\u251c\u2500\u2500 apply/ \u2190 1 query FASTA (3,000 proteins: in-family, other families, unannotated)\n", "\u2502 \u2514\u2500\u2500 test_sequences_1.fasta\n", "\u2514\u2500\u2500 annotations/\n", " \u2514\u2500\u2500 TIGRFAMs_annotation.ann \u2190 id/family TSV\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "id": "f6a7b8c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Available: ../../../resources/demo_sequences/learn_apply_inputs/learn\n", "Available: ../../../resources/demo_sequences/learn_apply_inputs/apply/test_sequences_1.fasta\n", "Available: ../../../resources/demo_sequences/learn_apply_inputs/annotations/TIGRFAMs_annotation.ann\n" ] } ], "source": [ "%%bash\n", "DEMO=../../../resources/demo_sequences/learn_apply_inputs\n", "for f in \"$DEMO/learn\" \\\n", " \"$DEMO/apply/test_sequences_1.fasta\" \\\n", " \"$DEMO/annotations/TIGRFAMs_annotation.ann\"; do\n", " [ -e \"$f\" ] && echo \"Available: $f\" || 
echo \"MISSING: $f\"\n", "done" ] }, { "cell_type": "markdown", "id": "a7b8c9d0", "metadata": {}, "source": [ "## Running `easy`\n", "\n", "The command below runs the full pipeline non-interactively. All required inputs are provided as flags, so no prompts will appear. Replace each `<...>` placeholder with your own path.\n", "\n", "```bash\n", "snekmer easy \\\n", " --train <train_fasta_or_dir> \\\n", " --query <query_fasta_or_dir> \\\n", " --ann <annotations.ann> \\\n", " --output <output_dir>\n", "```\n", "\n", "All other parameters (k-mer length, alphabet, etc.) have sensible defaults and do not need to be specified for most analyses." ] }, { "cell_type": "code", "execution_count": 4, "id": "b8c9d0e1", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Assuming unrestricted shared filesystem usage.\n", "host: WE47199\n", "Building DAG of jobs...\n", "Using shell: /bin/bash\n", "Provided cores: 10\n", "Rules claiming more threads will be scaled down.\n", "Job stats:\n", "job count\n", "------------------------- -------\n", "all 1\n", "copy_results_for_apply 1\n", "eval_apply_reverse_seqs 10\n", "eval_apply_sequences 10\n", "evaluate 1\n", "learn 10\n", "learn_report 1\n", "merge 1\n", "reverse_decoy_evaluations 1\n", "total 36\n", "\n", "Select jobs to execute...\n", "Execute 10 jobs...\n", "\n", "[Fri May 1 13:29:41 2026]\n", "Job 16: Building kmer-association matrix from output/vector/vector/training_sequences_2.npz. Output written to output/learn/kmer_counts_training_sequences_2.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 20: Building kmer-association matrix from output/vector/vector/training_sequences_9.npz. Output written to output/learn/kmer_counts_training_sequences_9.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 15: Building kmer-association matrix from output/vector/vector/training_sequences_7.npz. 
Output written to output/learn/kmer_counts_training_sequences_7.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 14: Building kmer-association matrix from output/vector/vector/training_sequences_10.npz. Output written to output/learn/kmer_counts_training_sequences_10.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 19: Building kmer-association matrix from output/vector/vector/training_sequences_8.npz. Output written to output/learn/kmer_counts_training_sequences_8.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 11: Building kmer-association matrix from output/vector/vector/training_sequences_3.npz. Output written to output/learn/kmer_counts_training_sequences_3.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 13: Building kmer-association matrix from output/vector/vector/training_sequences_5.npz. Output written to output/learn/kmer_counts_training_sequences_5.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 18: Building kmer-association matrix from output/vector/vector/training_sequences_4.npz. Output written to output/learn/kmer_counts_training_sequences_4.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 12: Building kmer-association matrix from output/vector/vector/training_sequences_1.npz. Output written to output/learn/kmer_counts_training_sequences_1.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:41 2026]\n", "Job 17: Building kmer-association matrix from output/vector/vector/training_sequences_6.npz. 
Output written to output/learn/kmer_counts_training_sequences_6.csv.\n", "Reason: Code has changed since last execution\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 20 (Rule: learn)\n", "1 of 36 steps (3%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 14 (Rule: learn)\n", "2 of 36 steps (6%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 11 (Rule: learn)\n", "3 of 36 steps (8%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 15 (Rule: learn)\n", "4 of 36 steps (11%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 18 (Rule: learn)\n", "5 of 36 steps (14%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 16 (Rule: learn)\n", "6 of 36 steps (17%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 19 (Rule: learn)\n", "7 of 36 steps (19%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 17 (Rule: learn)\n", "8 of 36 steps (22%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 12 (Rule: learn)\n", "9 of 36 steps (25%) done\n", "[Fri May 1 13:29:50 2026]\n", "Finished jobid: 13 (Rule: learn)\n", "10 of 36 steps (28%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:29:50 2026]\n", "Job 21: Merging individual k-mer association matrix files into consolidated output/learn/kmer_counts_total.csv.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_training_sequences_7.csv, output/learn/kmer_counts_training_sequences_2.csv, output/learn/kmer_counts_training_sequences_5.csv, output/learn/kmer_counts_training_sequences_4.csv, output/learn/kmer_counts_training_sequences_3.csv, output/learn/kmer_counts_training_sequences_8.csv, output/learn/kmer_counts_training_sequences_10.csv, output/learn/kmer_counts_training_sequences_1.csv, output/learn/kmer_counts_training_sequences_6.csv, output/learn/kmer_counts_training_sequences_9.csv\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Dataframes merged: 1 out of 10\n", "Dataframes merged: 2 out 
of 10\n", "Dataframes merged: 3 out of 10\n", "Dataframes merged: 4 out of 10\n", "Dataframes merged: 5 out of 10\n", "Dataframes merged: 6 out of 10\n", "Dataframes merged: 7 out of 10\n", "Dataframes merged: 8 out of 10\n", "Dataframes merged: 9 out of 10\n", "Dataframes merged: 10 out of 10\n", "\n", "Checking for base file to merge with.\n", "\n", "No file type detected. Please use a .csv file in input/base directory.\n", "\n", "\n", "Database Merged. Not merged with base file.\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Fri May 1 13:29:57 2026]\n", "Finished jobid: 21 (Rule: merge)\n", "11 of 36 steps (31%) done\n", "Select jobs to execute...\n", "Execute 20 jobs...\n", "\n", "[Fri May 1 13:29:57 2026]\n", "Job 38: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_6.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_6.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 41: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_9.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_9.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 30: Using Apply to test normal sequences in output/vector/vector/training_sequences_8.npz. Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_8.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 24: Using Apply to test normal sequences in output/vector/vector/training_sequences_5.npz. 
Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_5.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 27: Using Apply to test normal sequences in output/vector/vector/training_sequences_2.npz. Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_2.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 33: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_1.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_1.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 36: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_7.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_7.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 39: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_4.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_4.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 32: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_3.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_3.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 25: Using Apply to test normal sequences in output/vector/vector/training_sequences_10.npz. 
Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_10.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 31: Using Apply to test normal sequences in output/vector/vector/training_sequences_9.npz. Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_9.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 22: Using Apply to test normal sequences in output/vector/vector/training_sequences_3.npz. Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_3.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 28: Using Apply to test normal sequences in output/vector/vector/training_sequences_6.npz. Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_6.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 34: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_5.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_5.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 37: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_2.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_2.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 40: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_8.npz. 
Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_8.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 23: Using Apply to test normal sequences in output/vector/vector/training_sequences_1.npz. Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_1.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 29: Using Apply to test normal sequences in output/vector/vector/training_sequences_4.npz. Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_4.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 35: Using Apply to test reversed (decoy) sequences in output/vector/vector/training_sequences_10.npz. Output written to output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_10.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:29:57 2026]\n", "Job 26: Using Apply to test normal sequences in output/vector/vector/training_sequences_7.npz. 
Output written to output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_7.csv.gz.\n", "Reason: Input files updated by another job: output/learn/kmer_counts_total.csv\n", "[Fri May 1 13:30:07 2026]\n", "Finished jobid: 24 (Rule: eval_apply_sequences)\n", "12 of 36 steps (33%) done\n", "[Fri May 1 13:30:07 2026]\n", "Finished jobid: 31 (Rule: eval_apply_sequences)\n", "13 of 36 steps (36%) done\n", "[Fri May 1 13:30:07 2026]\n", "Finished jobid: 27 (Rule: eval_apply_sequences)\n", "14 of 36 steps (39%) done\n", "[Fri May 1 13:30:07 2026]\n", "Finished jobid: 25 (Rule: eval_apply_sequences)\n", "15 of 36 steps (42%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 30 (Rule: eval_apply_sequences)\n", "16 of 36 steps (44%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 22 (Rule: eval_apply_sequences)\n", "17 of 36 steps (47%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 28 (Rule: eval_apply_sequences)\n", "18 of 36 steps (50%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 37 (Rule: eval_apply_reverse_seqs)\n", "19 of 36 steps (53%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 34 (Rule: eval_apply_reverse_seqs)\n", "20 of 36 steps (56%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 39 (Rule: eval_apply_reverse_seqs)\n", "21 of 36 steps (58%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 36 (Rule: eval_apply_reverse_seqs)\n", "22 of 36 steps (61%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 38 (Rule: eval_apply_reverse_seqs)\n", "23 of 36 steps (64%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 32 (Rule: eval_apply_reverse_seqs)\n", "24 of 36 steps (67%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 41 (Rule: eval_apply_reverse_seqs)\n", "25 of 36 steps (69%) done\n", "[Fri May 1 13:30:08 2026]\n", "Finished jobid: 33 (Rule: eval_apply_reverse_seqs)\n", "26 of 36 steps (72%) done\n", "[Fri May 1 13:30:14 2026]\n", "Finished jobid: 23 
(Rule: eval_apply_sequences)\n", "27 of 36 steps (75%) done\n", "[Fri May 1 13:30:14 2026]\n", "Finished jobid: 29 (Rule: eval_apply_sequences)\n", "28 of 36 steps (78%) done\n", "[Fri May 1 13:30:14 2026]\n", "Finished jobid: 26 (Rule: eval_apply_sequences)\n", "29 of 36 steps (81%) done\n", "[Fri May 1 13:30:15 2026]\n", "Finished jobid: 35 (Rule: eval_apply_reverse_seqs)\n", "30 of 36 steps (83%) done\n", "[Fri May 1 13:30:15 2026]\n", "Finished jobid: 40 (Rule: eval_apply_reverse_seqs)\n", "31 of 36 steps (86%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:30:15 2026]\n", "Job 42: Evaluating reverse decoy sequences and writing family stats to output/eval_conf/family_summary_stats.csv.\n", "Reason: Input files updated by another job: output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_3.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_7.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_5.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_8.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_4.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_9.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_6.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_1.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_10.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_2.csv.gz\n", "[Fri May 1 13:30:20 2026]\n", "Finished jobid: 42 (Rule: reverse_decoy_evaluations)\n", "32 of 36 steps (89%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:30:20 2026]\n", "Job 43: Calculating global confidence scores based on Apply results. 
Output written to output/eval_conf/global_confidence_scores.csv.\n", "Reason: Input files updated by another job: output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_5.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_8.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_10.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_2.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_7.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_6.csv.gz, output/eval_conf/family_summary_stats.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_3.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_4.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_9.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_1.csv.gz\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Base confidence file not found or multiple files present. 
Only one file is allowed in baseConfidence.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Fri May 1 13:30:25 2026]\n", "Finished jobid: 43 (Rule: evaluate)\n", "33 of 36 steps (92%) done\n", "Select jobs to execute...\n", "Execute 2 jobs...\n", "\n", "[Fri May 1 13:30:25 2026]\n", "Job 45: Generating full Snekmer Learn Report at output/Snekmer_Learn_Report.html\n", "Reason: Input files updated by another job: output/learn/kmer_counts_training_sequences_7.csv, output/learn/kmer_counts_training_sequences_2.csv, output/learn/kmer_counts_training_sequences_5.csv, output/learn/kmer_counts_training_sequences_4.csv, output/learn/kmer_counts_training_sequences_3.csv, output/eval_conf/global_confidence_scores.csv, output/learn/kmer_counts_training_sequences_8.csv, output/learn/kmer_counts_training_sequences_10.csv, output/eval_conf/family_summary_stats.csv, output/learn/kmer_counts_training_sequences_1.csv, output/learn/kmer_counts_training_sequences_6.csv, output/eval_conf/family_stats_checkpoint.csv, output/learn/kmer_counts_total.csv, output/learn/kmer_counts_training_sequences_9.csv\n", "[Fri May 1 13:30:25 2026]\n", "Job 44: Copying files needed for downstream apply workflow to local apply_inputs directory.\n", "Reason: Input files updated by another job: output/eval_conf/global_confidence_scores.csv, output/learn/kmer_counts_total.csv, output/eval_conf/family_summary_stats.csv\n", "[Fri May 1 13:30:29 2026]\n", "Finished jobid: 45 (Rule: learn_report)\n", "34 of 36 steps (94%) done\n", "[Fri May 1 13:30:31 2026]\n", "Finished jobid: 44 (Rule: copy_results_for_apply)\n", "35 of 36 steps (97%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:30:31 2026]\n", "localrule all:\n", " input: output/vector/vector/training_sequences_3.npz, output/vector/vector/training_sequences_1.npz, output/vector/vector/training_sequences_5.npz, output/vector/vector/training_sequences_10.npz, output/vector/vector/training_sequences_7.npz, 
output/vector/vector/training_sequences_2.npz, output/vector/vector/training_sequences_6.npz, output/vector/vector/training_sequences_4.npz, output/vector/vector/training_sequences_8.npz, output/vector/vector/training_sequences_9.npz, output/learn/kmer_counts_training_sequences_3.csv, output/learn/kmer_counts_training_sequences_1.csv, output/learn/kmer_counts_training_sequences_5.csv, output/learn/kmer_counts_training_sequences_10.csv, output/learn/kmer_counts_training_sequences_7.csv, output/learn/kmer_counts_training_sequences_2.csv, output/learn/kmer_counts_training_sequences_6.csv, output/learn/kmer_counts_training_sequences_4.csv, output/learn/kmer_counts_training_sequences_8.csv, output/learn/kmer_counts_training_sequences_9.csv, output/learn/kmer_counts_total.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_3.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_1.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_5.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_10.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_7.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_2.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_6.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_4.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_8.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_9.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_3.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_1.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_5.csv.gz, 
output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_10.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_7.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_2.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_6.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_4.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_8.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_9.csv.gz, output/eval_conf/family_summary_stats.csv, output/eval_conf/global_confidence_scores.csv, apply_inputs/counts/kmer_counts_total.csv, apply_inputs/stats/family_summary_stats.csv, apply_inputs/confidence/global_confidence_scores.csv, output/Snekmer_Learn_Report.html\n", " jobid: 0\n", " reason: Input files updated by another job: output/learn/kmer_counts_training_sequences_7.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_8.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_2.csv.gz, apply_inputs/confidence/global_confidence_scores.csv, output/learn/kmer_counts_training_sequences_3.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_7.csv.gz, output/Snekmer_Learn_Report.html, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_6.csv.gz, output/learn/kmer_counts_training_sequences_8.csv, output/learn/kmer_counts_training_sequences_10.csv, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_9.csv.gz, apply_inputs/counts/kmer_counts_total.csv, output/learn/kmer_counts_training_sequences_1.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_3.csv.gz, output/eval_conf/global_confidence_scores.csv, output/learn/kmer_counts_training_sequences_6.csv, output/eval_conf/family_summary_stats.csv, 
output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_8.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_1.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_10.csv.gz, output/learn/kmer_counts_training_sequences_9.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_5.csv.gz, output/learn/kmer_counts_training_sequences_2.csv, output/learn/kmer_counts_training_sequences_5.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_10.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_3.csv.gz, output/learn/kmer_counts_training_sequences_4.csv, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_5.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_7.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_4.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_6.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_4.csv.gz, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_9.csv.gz, output/evaluate/eval_apply_reversed/seq_annotation_scores_training_sequences_2.csv.gz, apply_inputs/stats/family_summary_stats.csv, output/evaluate/eval_apply_sequences/seq_annotation_scores_training_sequences_1.csv.gz, output/learn/kmer_counts_total.csv\n", " resources: tmpdir=/var/folders/wt/_yr7rg_13t76sq_q5cw57hxw0000gn/T\n", "[Fri May 1 13:30:31 2026]\n", "Finished jobid: 0 (Rule: all)\n", "36 of 36 steps (100%) done\n", "Complete log(s): /Users/jaco059/OneDrive - PNNL/Desktop/Snekmer_New_laptop_rename/PRE_PAPER_PRS/Snekmer/docs/source/tutorial/easy_learn_apply_output/learn/.snakemake/log/2026-05-01T132936.958491.snakemake.log\n", "Assuming unrestricted shared filesystem usage.\n", "host: WE47199\n", "Building DAG of jobs...\n", 
"Using shell: /bin/bash\n", "Provided cores: 10\n", "Rules claiming more threads will be scaled down.\n", "Job stats:\n", "job count\n", "------------------- -------\n", "all 1\n", "apply 1\n", "apply_report 1\n", "concat_kmer_summary 1\n", "vectorize 1\n", "total 5\n", "\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:30:36 2026]\n", "Job 3: Kmerizing and re-encoding Amino acids in input/test_sequences_1.fasta. Output written to output/vector/test_sequences_1.npz.\n", "Reason: Updated input files: input/test_sequences_1.fasta\n", "[Fri May 1 13:30:49 2026]\n", "Finished jobid: 3 (Rule: vectorize)\n", "1 of 5 steps (20%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:30:49 2026]\n", "Job 2: Running Snekmer Apply on output/vector/test_sequences_1.npz. Output written to output/apply/kmer_summary_test_sequences_1.csv.\n", "Reason: Input files updated by another job: output/vector/test_sequences_1.npz\n", "[Fri May 1 13:31:00 2026]\n", "Finished jobid: 2 (Rule: apply)\n", "2 of 5 steps (40%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:31:00 2026]\n", "Job 1: Writing consolidated k-mer summary to snekmer_results.csv\n", "Reason: Input files updated by another job: output/apply/kmer_summary_test_sequences_1.csv\n", "[Fri May 1 13:31:00 2026]\n", "Finished jobid: 1 (Rule: concat_kmer_summary)\n", "3 of 5 steps (60%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:31:00 2026]\n", "Job 4: Generating full Snekmer Apply Report at output/Snekmer_Apply_Report.html\n", "Reason: Input files updated by another job: output/apply/kmer_summary_test_sequences_1.csv, snekmer_results.csv\n", "[Fri May 1 13:31:06 2026]\n", "Finished jobid: 4 (Rule: apply_report)\n", "4 of 5 steps (80%) done\n", "Select jobs to execute...\n", "Execute 1 jobs...\n", "\n", "[Fri May 1 13:31:06 2026]\n", "localrule all:\n", " input: snekmer_results.csv, 
output/apply/kmer_summary_test_sequences_1.csv, output/Snekmer_Apply_Report.html\n", " jobid: 0\n", " reason: Input files updated by another job: output/Snekmer_Apply_Report.html, output/apply/kmer_summary_test_sequences_1.csv, snekmer_results.csv\n", " resources: tmpdir=/var/folders/wt/_yr7rg_13t76sq_q5cw57hxw0000gn/T\n", "[Fri May 1 13:31:06 2026]\n", "Finished jobid: 0 (Rule: all)\n", "5 of 5 steps (100%) done\n", "Complete log(s): /Users/jaco059/OneDrive - PNNL/Desktop/Snekmer_New_laptop_rename/PRE_PAPER_PRS/Snekmer/docs/source/tutorial/easy_learn_apply_output/apply/.snakemake/log/2026-05-01T133033.584336.snakemake.log\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Running snekmer learn ---\n", "\n", "Copying learn outputs to apply input directories...\n", "\n", "--- Running snekmer apply ---\n", "\n", "=== Complete ===\n", "Results: /Users/jaco059/OneDrive - PNNL/Desktop/Snekmer_New_laptop_rename/PRE_PAPER_PRS/Snekmer/docs/source/tutorial/easy_learn_apply_output/apply/snekmer_results.csv\n" ] } ], "source": [ "%%bash\n", "export MPLBACKEND=agg\n", "# This is the command line call:\n", "snekmer easy \\\n", " --train ../../../resources/demo_sequences/learn_apply_inputs/learn \\\n", " --query ../../../resources/demo_sequences/learn_apply_inputs/apply/test_sequences_1.fasta \\\n", " --ann ../../../resources/demo_sequences/learn_apply_inputs/annotations/TIGRFAMs_annotation.ann \\\n", " --output easy_output" ] }, { "cell_type": "markdown", "id": "c9d0e1f2", "metadata": {}, "source": "## Output structure\n\nAfter running, the output directory contains two sub-workspaces:\n\n```\neasy_output/\n\u251c\u2500\u2500 learn/ \u2190 Learn pipeline workspace\n\u2502 \u251c\u2500\u2500 input/ \u2190 symlinks to training FASTA files\n\u2502 \u251c\u2500\u2500 annotations/ \u2190 your .ann file\n\u2502 \u251c\u2500\u2500 config.yaml \u2190 generated config\n\u2502 \u251c\u2500\u2500 apply_inputs/ \u2190 handoff files for apply\n\u2502 \u2502 
\u251c\u2500\u2500 counts/kmer_counts_total.csv\n\u2502 \u2502 \u251c\u2500\u2500 stats/family_summary_stats.csv\n\u2502 \u2502 \u2514\u2500\u2500 confidence/global_confidence_scores.csv\n\u2502 \u2514\u2500\u2500 output/\n\u2502 \u251c\u2500\u2500 learn/ \u2190 per-file and total kmer count matrices\n\u2502 \u2514\u2500\u2500 eval_conf/ \u2190 confidence scores and family stats\n\u2502\n\u2514\u2500\u2500 apply/ \u2190 Apply pipeline workspace\n \u251c\u2500\u2500 input/ \u2190 symlinks to query FASTA files\n \u251c\u2500\u2500 counts/ \u2190 kmer_counts_total.csv (copied from learn)\n \u251c\u2500\u2500 confidence/ \u2190 global_confidence_scores.csv (copied from learn)\n \u251c\u2500\u2500 stats/ \u2190 family_summary_stats.csv (copied from learn)\n \u251c\u2500\u2500 config.yaml \u2190 generated config\n \u251c\u2500\u2500 snekmer_results.csv \u2190 main results file\n \u2514\u2500\u2500 output/\n \u2514\u2500\u2500 apply/ \u2190 per-file kmer_summary CSVs\n```\n\n**The main results file** is `apply/snekmer_results.csv`." }, { "cell_type": "markdown", "id": "d0e1f2a3", "metadata": {}, "source": [ "## Reading the results\n", "\n", "The results file contains one row per query sequence with five columns:\n", "\n", "| Column | Description |\n", "|---|---|\n", "| `Sequence` | Sequence identifier from the FASTA header |\n", "| `Prediction` | Predicted family (highest cosine similarity) |\n", "| `Score` | Cosine similarity between the sequence and the predicted family |\n", "| `delta` | Gap between top and second-best similarity scores |\n", "| `Confidence` | Calibrated probability the prediction is correct (0\u20131) |" ] }, { "cell_type": "code", "execution_count": 5, "id": "e1f2a3b4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total sequences: 3000\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Sequence</th><th>Prediction</th><th>Score</th><th>delta</th><th>Confidence</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>tr|A0A2S8EUS7|A0A2S8EUS7_9RHOB</td><td>TIGR01783</td><td>0.199116</td><td>0.00</td><td>0.383333</td></tr>\n",
"    <tr><th>1</th><td>tr|A0A401ZGP4|A0A401ZGP4_9CHLR</td><td>TIGR00757</td><td>0.315537</td><td>0.02</td><td>0.921569</td></tr>\n",
"    <tr><th>2</th><td>tr|A0A427BXE3|A0A427BXE3_9GAMM</td><td>TIGR01023</td><td>0.198284</td><td>0.08</td><td>1.000000</td></tr>\n",
"    <tr><th>3</th><td>tr|J2LWC9|J2LWC9_9BURK</td><td>TIGR00797</td><td>0.210697</td><td>0.00</td><td>0.383333</td></tr>\n",
"    <tr><th>4</th><td>tr|A0A2Z5TEJ7|A0A2Z5TEJ7_9GAMM</td><td>TIGR00229</td><td>0.225655</td><td>0.00</td><td>0.383333</td></tr>\n",
"    <tr><th>5</th><td>tr|A0A6G7WF57|A0A6G7WF57_9LACT</td><td>TIGR03534</td><td>0.297106</td><td>0.08</td><td>1.000000</td></tr>\n",
"    <tr><th>6</th><td>tr|A0A1I1XLP5|A0A1I1XLP5_9FIRM</td><td>TIGR00229</td><td>0.265441</td><td>0.02</td><td>0.921569</td></tr>\n",
"    <tr><th>7</th><td>tr|A0A0K8J9U8|A0A0K8J9U8_9FIRM</td><td>TIGR00231</td><td>0.225392</td><td>0.01</td><td>0.654545</td></tr>\n",
"    <tr><th>8</th><td>tr|A0A0D8ZMS0|A0A0D8ZMS0_9CYAN</td><td>TIGR00496</td><td>0.169543</td><td>0.01</td><td>0.654545</td></tr>\n",
"    <tr><th>9</th><td>tr|R7F3B1|R7F3B1_9BACT</td><td>TIGR01733</td><td>0.229176</td><td>0.00</td><td>0.383333</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Sequence Prediction Score delta Confidence\n", "0 tr|A0A2S8EUS7|A0A2S8EUS7_9RHOB TIGR01783 0.199116 0.00 0.383333\n", "1 tr|A0A401ZGP4|A0A401ZGP4_9CHLR TIGR00757 0.315537 0.02 0.921569\n", "2 tr|A0A427BXE3|A0A427BXE3_9GAMM TIGR01023 0.198284 0.08 1.000000\n", "3 tr|J2LWC9|J2LWC9_9BURK TIGR00797 0.210697 0.00 0.383333\n", "4 tr|A0A2Z5TEJ7|A0A2Z5TEJ7_9GAMM TIGR00229 0.225655 0.00 0.383333\n", "5 tr|A0A6G7WF57|A0A6G7WF57_9LACT TIGR03534 0.297106 0.08 1.000000\n", "6 tr|A0A1I1XLP5|A0A1I1XLP5_9FIRM TIGR00229 0.265441 0.02 0.921569\n", "7 tr|A0A0K8J9U8|A0A0K8J9U8_9FIRM TIGR00231 0.225392 0.01 0.654545\n", "8 tr|A0A0D8ZMS0|A0A0D8ZMS0_9CYAN TIGR00496 0.169543 0.01 0.654545\n", "9 tr|R7F3B1|R7F3B1_9BACT TIGR01733 0.229176 0.00 0.383333" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(results_path)\n", "print(f\"Total sequences: {len(df)}\")\n", "df.head(10)" ] }, { "cell_type": "markdown", "id": "f2a3b4c5", "metadata": {}, "source": [ "## Filtering by confidence\n", "\n", "A **Confidence \u2265 0.95** threshold is a reasonable starting point for high-reliability annotations. \n", "For exploratory work you may lower this; for publication-quality calls you may want to raise it.\n", "\n", "> **Note:** All sequences receive a prediction. Sequences with `Score = 0.0` have no overlapping k-mers with any training family; these predictions are not meaningful and should be excluded." ] }, { "cell_type": "code", "execution_count": 6, "id": "a3b4c5d6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "High-confidence predictions (\u22650.95): 995 / 3000\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Sequence</th><th>Prediction</th><th>Score</th><th>delta</th><th>Confidence</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2</th><td>tr|A0A427BXE3|A0A427BXE3_9GAMM</td><td>TIGR01023</td><td>0.198284</td><td>0.08</td><td>1.000000</td></tr>\n",
"    <tr><th>5</th><td>tr|A0A6G7WF57|A0A6G7WF57_9LACT</td><td>TIGR03534</td><td>0.297106</td><td>0.08</td><td>1.000000</td></tr>\n",
"    <tr><th>10</th><td>tr|C5RAN2|C5RAN2_WEIPA</td><td>TIGR01017</td><td>0.290899</td><td>0.11</td><td>1.000000</td></tr>\n",
"    <tr><th>11</th><td>tr|A0A2P1NPL9|A0A2P1NPL9_9BURK</td><td>TIGR00593</td><td>0.477110</td><td>0.14</td><td>1.000000</td></tr>\n",
"    <tr><th>14</th><td>tr|A0A1Y0FXC9|A0A1Y0FXC9_9GAMM</td><td>TIGR00350</td><td>0.166781</td><td>0.03</td><td>0.956522</td></tr>\n",
"    <tr><th>15</th><td>tr|A0A6P1ZCD6|A0A6P1ZCD6_9DELT</td><td>TIGR02937</td><td>0.207365</td><td>0.03</td><td>0.956522</td></tr>\n",
"    <tr><th>17</th><td>tr|A0A3A9ZRN3|A0A3A9ZRN3_9ACTN</td><td>TIGR00594</td><td>0.405080</td><td>0.06</td><td>0.990991</td></tr>\n",
"    <tr><th>20</th><td>tr|A8FCF0|A8FCF0_BACP2</td><td>TIGR00254</td><td>0.366038</td><td>0.05</td><td>1.000000</td></tr>\n",
"    <tr><th>22</th><td>tr|A0A0N0KF67|A0A0N0KF67_9SPHN</td><td>TIGR00674</td><td>0.351895</td><td>0.15</td><td>1.000000</td></tr>\n",
"    <tr><th>29</th><td>tr|A0A2S1LEL5|A0A2S1LEL5_9FLAO</td><td>TIGR00435</td><td>0.310315</td><td>0.05</td><td>1.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Sequence Prediction Score delta Confidence\n", "2 tr|A0A427BXE3|A0A427BXE3_9GAMM TIGR01023 0.198284 0.08 1.000000\n", "5 tr|A0A6G7WF57|A0A6G7WF57_9LACT TIGR03534 0.297106 0.08 1.000000\n", "10 tr|C5RAN2|C5RAN2_WEIPA TIGR01017 0.290899 0.11 1.000000\n", "11 tr|A0A2P1NPL9|A0A2P1NPL9_9BURK TIGR00593 0.477110 0.14 1.000000\n", "14 tr|A0A1Y0FXC9|A0A1Y0FXC9_9GAMM TIGR00350 0.166781 0.03 0.956522\n", "15 tr|A0A6P1ZCD6|A0A6P1ZCD6_9DELT TIGR02937 0.207365 0.03 0.956522\n", "17 tr|A0A3A9ZRN3|A0A3A9ZRN3_9ACTN TIGR00594 0.405080 0.06 0.990991\n", "20 tr|A8FCF0|A8FCF0_BACP2 TIGR00254 0.366038 0.05 1.000000\n", "22 tr|A0A0N0KF67|A0A0N0KF67_9SPHN TIGR00674 0.351895 0.15 1.000000\n", "29 tr|A0A2S1LEL5|A0A2S1LEL5_9FLAO TIGR00435 0.310315 0.05 1.000000" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CONF_THRESHOLD = 0.95\n", "\n", "high_conf = df[(df[\"Confidence\"] >= CONF_THRESHOLD) & (df[\"Score\"] > 0)].copy()\n", "print(f\"High-confidence predictions (\\u2265{CONF_THRESHOLD}): {len(high_conf)} / {len(df)}\")\n", "high_conf.head(10)" ] }, { "cell_type": "code", "execution_count": 7, "id": "b4c5d6e7", "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAxIAAAGGCAYAAADvrLe3AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAUoJJREFUeJzt3QmclXP///HvtO8l7ZuUrYgS0SqJyhKVyNJiqzuSdKfFFoXIlkjcqOykqNsWCaWIlKjctKvQIu3Rouv/eH9//+s8zjmdaeaaZuZc3zOv5+NxaK45M/M51/me67o+3+VzpXme5xkAAAAACCBfkCcDAAAAAIkEAAAAgCxhRAIAAABAYCQSAAAAAAIjkQAAAAAQGIkEAAAAgMBIJAAAAAAERiIBAAAAIDASCQAAAACBkUgAjqhZs6bp0aNHln/2wgsvNC56+eWXzQknnGAKFixoypQpY7e1bNnSPjLy+eefm7S0NPt/HJ7Vq1fbfTlhwoTItnvuucduyy68X6lv3rx5pkmTJqZ48eK27SxcuDB0bfhwjrVAXkMiASSBTmQ6eX377bcJv6+L5JNOOsnkdT/99JM9odeuXds899xz5j//+U+yQ0I2ePrpp2Mu5vKy3377zV7M5uYFdbxvvvnG3HjjjaZhw4Y2YT9UcqjvJXo8+OCDGf6dffv2mc6dO5s///zTPP7447aT4KijjsrmVwMgNxXI1b8GIMt+/vlnky9f3sr91UN94MAB88QTT5hjjjkmsv3jjz9Oalz4P3feeacZPHhwlhKJcuXKHdTr26JFC/PXX3+ZQoUK5alE4t5777W94PXr109KDB988IF5/vnnzcknn2xq1aplli5desjnn3vuuaZbt24x2xo0aJDh31mxYoX55ZdfbKfA9ddfb3Kbkha1LyVLh5IXj7VAVpFIAI4oXLiwyWs2btxo/+9PafLlpQvNw+V5nvn7779N0aJFs/13FyhQwD6yiy7eihQpkm2/D5nTu3dvM2jQINtG+vTpk2Eicdxxx5mrr7462z7PuUUjJ5lpX3nxWAtkFSk34IhE83Z/+OEHc9ZZZ9kLgGrVqpn77rvPjB8/3p4wNR843uzZs02jRo3syVQ9jy+99FKm//6vv/5qrrvuOlOlShV7oj366KPtBcjevXsjz1m5cqWdulC2bFlTrFgxc+aZZ5r3338/4Tz4iRMnmvvvv9/GrXjOOeccs3z58pjXO3ToUPvv8uXL25/RFJD01kisW7fOXHLJJXbudYUKFcytt95q9uzZk/C1fP3116Zt27amdOnSNk7twzlz5sQ8x587rZi033Xxo+dfc801Zvfu3Qf9zldeecXuW/2+I444wvaux4+cfPjhh6Z58+Y2xpIlS5oLLrjALFmyJNNT4WbNmmV69epljjzySFOqVCnbK7xly5aE62E++ugjc9ppp9m28eyzz9rvbd261fTr189Ur17dvoca5XnooYfsqE80PU+vWa9Xr7t79+52W7z01kgcal8oPr3mmTNnRqbF+O9lemsk3nrrLTvtRq9FIxm6iFV7jKZ4S5QoYberHejfajcDBgww//zzT8xz33jjDfv79B5oP9arV8+OeuU2vc7TTz/d/lvtyt8f0dO+grx2ff7atGlj25c+p8OGDbOJZEYqVqwYONFUz74S1MxSjPqciY4R0e+7jmP6vo5JOhZUqlTJXHvttWbz5s0J25sSHe0HtU+9x3fddZd9nWvXrjUXX3yxfU/1Ox599NEM10hk9lib2c9OWNoWkFsYkQCSaNu2beaPP/5IOJc4I7qYOPvss+2JcciQIfbiQdMT0utN0wXxpZdeapMBXRiOGzfOnix10jvxxBMznH6hC0OdTHv27GkXP+vvT5o0yV5Ua4Rgw4YNdhGlvu7bt6+92H3xxRdN+/bt7fM6dOgQ8zs1p1o90LrQ034YOXKkueqqq+xFvowaNcomOu+8844ZO3asvVDS1Iv0LmqUiKxZs8b+bV1Eaf71p59+etB
zta1du3b2dStRUQxKvlq1amW++OIL+zqjXXbZZTZpGjFihFmwYIHdx0pUdBHh09QUXeTo9eviTftDr0N/67zzzrPPUTza77rQ089qP+l1NWvWzHz33Xf24iUj6i3Whb3+lqZf6Oc1VcS/APfpe1dccYVNOm644QZz/PHH27+nCzm9b9peo0YN8+WXX9q28/vvv9v9Lbog08WYks5//etfpk6dOvY9UOyZkdG+0N+5+eab7ft5xx13RC5k06OLPl1k64Jb74HamS7MlPhpv0X3bith0P4944wzzCOPPGI++eQTezGpNTZKemX69Ol236i9+O/h//73P/v7brnlFpObtG+1j+6++277uVKSKdp3WXntSo6VvOuzNG3aNNu+9+/fb/9GdlJcmp6mtqLXoCluV1555SF/Rm2uatWq5oEHHrCfUb0m/33Xe6IkSK9VCYASTa2H0v/nzp17ULJ6+eWX27+rY4g6KtSBos4LJcz6HOt9ffXVV+2xRX9HiezhyOxnJ0xtC8g1HoBcN378eHUTHvJx4oknxvzMUUcd5XXv3j3y9c033+ylpaV53333XWTb5s2bvbJly9qfX7VqVczPatusWbMi2zZu3OgVLlzY+/e//51hvN26dfPy5cvnzZs376DvHThwwP6/X79+9m988cUXke/t2LHDO/roo72aNWt6//zzj9322Wef2efVqVPH27NnT+S5TzzxhN2+aNGiyLahQ4fabZs2bYr5m2eddZZ9+EaNGmWfN3HixMi2Xbt2ecccc4zdrr/px3rsscd6bdq0icQtu3fvtnGee+65B/3ta6+9NuZvd+jQwTvyyCMjXy9btszuG233X2P8vtF+KFOmjHfDDTfEfH/9+vVe6dKlD9qeXntp2LCht3fv3sj2kSNH2u1Tp0496L2eNm1azO8YPny4V7x4cW/p0qUx2wcPHuzlz5/fW7Nmjf16ypQp9uf1u3379+/3mjdvbrcrlvh9FGRfiNp29Pvn89uG/37ptVaoUME76aSTvL/++ivyvPfee88+7+67745s02dD24YNGxbzOxs0aGD3m++WW27xSpUqZV9TGOgzFb9fs/radUyI3t8XXHCBV6hQoYM+P4dy0003xbyn8Zo0aWI/b2pzY8eOtfHp+U8//XSGv9t/f996662Y7fr8xXv99dcPOmb57a1nz56RbXofq1WrZo+FDz74YGT7li1bvKJFi8YcM3VMzKgNJzrWZvazE7a2BeQGpjYBSTRmzBjbixX/SK/nPZp6HBs3bhyzQFO9curVT6Ru3bqRHk/RlAD1VKsn8FA0dD9lyhRz0UUX2aky8fzeQi3YVG++eth96nVWT6umFPz4448xP6fex+i1Dn5sGcWTiP525cqV7YiLT9Nq9LejqTLOsmXLbO+ppk1oNEiPXbt22V5ETR2Kn6qgXvloilM/u337dvu19o1+Rr3K8Qs0/X2j91SjOeqt9P+mHvnz57e955999lmmXqdeT/RCUfWya42CXn80jaCoZz6apsgodk01io6hdevWtjdbr93fl/qdfg++KE6NImQkM/siCFU107x6VRSKntuuKWEaFYufNpfe+xXdptSLr/db70mYZeW1a8Qqen/ra0091MhMdvF71zXSqH09f/58W2Hu9ttvtyODWRE9rUrTpdQuNbIiGgWMF71QW21TxyWNjmi0Nfp9zszxLTMy+9lxpW0B2YmpTUAS6cI70cW5f8I6FE1pUSIRL7q6UTQNxyf6O/4ce50QN23aFPN9JSb6vi6aMypHq3h0URxPUxD870f/jvh4FIvEz/nPDP1uve74i1VdSERTEiGHmqajaVZ+LBnFqTnQqkSji2Ylaunx/66mXSSi35MZxx57bMzXStSUQMWvh1EikSgGzUVXAnmohbDal/qd+t2H2peJZGZfBKFY0vvbupjW9KtouuCOf33RbVx0Ya71OZrepqk2mm6l6WuaFnQo+mzEr7XILMWkC96cfO3a71pjEL8oWhKtl8o
u6gxQwuInFdEdCZmlcrCaEqf1BX47jP48xov/TGqthN57rSGJ3x6/ziIrMvvZyWrbAlxGIgHkEeldyPiLMbVQMf4CVD3lfiKQ2/HkBH+04eGHH0631Gb8BXR2xOn/Xa2T0BzweNlZ+UgSLZxVDCrbOXDgwIQ/4190uiwzF+ta36KRKS1G1+J3PbRGRgvXtaYnPZpr71/cB7Vq1apMrYFxlRYg+wlBVuhiW2sObrvtNvu51GdQ7VUX4PEjhOm9zzl5PMnsZyerbQtwGYkE4CjVRI+ucuRLtC0zdIEbPyR/yimn2F499ZgvXrw4w3i0yDfRTeX87+cU/W7Fp4uG+EXH0bToVvR6NC0hO+h36kJDU7fSS078v6sLjcP5u+oZ1QJ7386dO+1iz/PPPz9Tcer5Gf197csZM2bY50YnVYne26zsiyDTnPw2o78dP5qjbVltU+pF11Q9PRSvepK1UFfVf9Ib0dPi3axO3UmUPGa0L4K+dr0OTeOJTgj9Mq45ncT404fS67E/FI0Wqb1pREJT4uJH8cIgs5+drLYtwGWskQAcpTnwX331VcwdcdUjqAuerNDUAJ0oox+aFqIpEyqn+e677ya8E7ff46eLWd0hVzH5NF9Y1Vd0IZNd010S0d9WZSlVh4qutBJ/J2xVatJFgSr66MIgXvzUrszQvtE+UmWc+N5Tf9/ovVLyooo1iSpyZfbv6vVE/7yqNqkqj6ZSZKbXV++Nekvjaf2Gfo+/L/Vv/W6fpvQ8+eST2bIvRBXGEpWTjadpf0q+nnnmmZhSvurpVTUcrRcIKn6qi+L11ySlVy5YmjZtetDnI7OPQ927QPtC4vdHVl77U089FbO/9bXW1Gj9T3ZI1E537NhhqxZpWpE+X0H5IwnxIwd+JaQwyOxnJ6ttC3AZIxKAozTMrnr9GnLXQli//KvmDyuhyMri1vToAlj3AVAJRC341XQn9YRrEaLmamuRoe5w/Prrr9uLWpV31PoKDedrWsfkyZNz9E6xKnGqiyZNIdA8bc3x1zQiLbiOphi0jxSjSt5qwbfmMquso6Zx6WJfCVMQ6mVUGdPhw4fbBZkdO3a0JXjnzZtny9CqbKd+ry7Mu3btak499VTTpUsX23urcrVaNKuL1OiLwPRo4awuCnVho15pleDUnHQtfM2Ipo3897//tfeY8Mv+KtFbtGiRTcA0j14Xg+pJVTx6P7VNCeDbb7+dcK56VvaF6G9rf6hsp35GF8yJ1o/oIlhlNPU+qe1psbpfAlXJqe4VEpQW6urzob+ne5houpKSJI2g5NQ0vkNRYqvPjxIG3XtAn2OtNdI0wyCvXcmKCjBo/Y9+XgmH2pYWQWc0UqB9oM+L+J0Fem9EIx9qt35xCL/wgo4zOgaojLTasX4+KzeK1GdD5VlVslZJsj6POtbouBEWmf3shK1tAbkiV2pDAUhYzjNROVVRacyMyr+KSr+qLKfKuKoE4ogRI7zRo0fb363SotE/q1KQif5OojKcifzyyy+2DGz58uXt36tVq5YtFRldwnXFihXepZdeakudFilSxGvUqJEtV5mZEpCHKs2YUflXP7727dt7xYoV88qVK2dLMaoEanQ50ej91rFjR1vGVa9F++eyyy7zZsyYkeHf9t+76PK6Mm7cOFtqVL/viCOOsPFNnz79oNeu0rMq+ar9U7t2ba9Hjx7et99+e8h97//NmTNn2tKX+v0lSpTwrrrqKlvyN1p677VfhnbIkCG2LK7Kgmo/qZznI488ElNWVr+za9eutpSlYtW/tc8yUzozM/tCbVMxlixZ0v68/17Gl3/1vfnmm5Hfp/LGet3r1q2LeY4+GyrRGS8+xkmTJnnnnXeeLa2qfVCjRg2vV69e3u+//+4li0qp1q1b1ytQoMBB+zjIa9fnT69Nn4GKFSva1x5fhjcRf78nekR/zj7++GNbIrlSpUpewYIF7edcfy/6c5OZvxP/2dfrUclg/T61t86dO3u//fabfa5eQ0afyfT
e+/jjaFbLv2b2sxPGtgXktDT9J3dSFgC5QXdf1ZxcTd0JWikG4eTfmEw9+4mqfCFvUy+5esYTTdcDgJzEGgnAYfGLPzVHV1MMNN2FJAIAAOQk1kgADtN9JFq2bGnn32r+9AsvvGDv+aAKIQAAADmJRAJwmCrsaEqDqvlocbUW8iqZ0OJFAACAnMQaCQAAAACBsUYCAAAAQGAkEgAAAAACS/k1Erq7qu54qxv9ZOcNugAAAAAX6e4PujO9bhZ6ODeMTflEQklE9erVkx0GAAAAECpr1661d2LPqpRPJDQS4e+oUqVKJTscAAAAIKlUKl4d7f51clalfCLhT2dSEkEiAQAAAPyfw532z2JrAAAAAIGRSAAAAAAIjEQCAAAAQGAkEgAAAAACI5EAAAAAEBiJBAAAAIDASCQAAAAABEYiAQAAACAwEgkAAAAAgaX8na2zolavyTn2u1c+28m4GntuxA8AAAA3MCIBAAAAIDASCQAAAACBkUgAAAAACIxEAgAAAEBgJBIAAAAAAiORAAAAABAYiQQAAACAwEgkAAAAAARGIgEAAAAgMBIJAAAAAIGRSAAAAAAIjEQCAAAAQGAkEgAAAAACI5EAAAAAEBiJBAAAAIDASCQAAAAABEYiAQAAAMCtRGLEiBHm9NNPNyVLljQVKlQwl1xyifn5559jnvP333+bm266yRx55JGmRIkSplOnTmbDhg1JixkAAABAkhOJmTNn2iRh7ty5Zvr06Wbfvn3mvPPOM7t27Yo859ZbbzXvvvuueeutt+zzf/vtN9OxY0feOwAAACCJCiTzj0+bNi3m6wkTJtiRifnz55sWLVqYbdu2mRdeeMG89tprplWrVvY548ePN3Xq1LHJx5lnnpmkyAEAAIC8LVRrJJQ4SNmyZe3/lVBolKJ169aR55xwwgmmRo0a5quvvkr4O/bs2WO2b98e8wAAAACQoonEgQMHTL9+/UzTpk3NSSedZLetX7/eFCpUyJQpUybmuRUrVrTfS2/dRenSpSOP6tWr50r8AAAAQF4SmkRCayUWL15s3njjjcP6PUOGDLEjG/5j7dq12RYjAAAAgBCskfD16dPHvPfee2bWrFmmWrVqke2VKlUye/fuNVu3bo0ZlVDVJn0vkcKFC9sHAAAAgBQdkfA8zyYR77zzjvn000/N0UcfHfP9hg0bmoIFC5oZM2ZEtqk87Jo1a0zjxo2TEDEAAACApI9IaDqTKjJNnTrV3kvCX/egtQ1Fixa1/7/uuutM//797QLsUqVKmZtvvtkmEVRsAgAAAPJoIjF27Fj7/5YtW8ZsV4nXHj162H8//vjjJl++fPZGdKrI1KZNG/P0008nJV4AAAAAIUgkNLUpI0WKFDFjxoyxDwAAAADhEJqqTQAAAADcQSIBAAAAIDASCQAAAACBkUgAAAAACIxEAgAAAEBgJBIAAAAAAiORAAAAABAYiQQAAACAwEgkAAAAAARGIgEAAAAgMBIJAAAAAIGRSAAAAAAIjEQCAAAAQGAkEgAAAAACI5EAAAAAEFiB4D8C5JxavSbn2O9e+Wwn42rsuRE/AABAEIxIAAAAAAiMRAIAAABAYCQSAAAAAAIjkQAAAAAQGIkEAAAAgMBIJAAAAAAERiIBAAAAIDASCQAAAACBkUgAAAAACIxEAgAAAEBgBYL/CIBUVKvX5Bz73Suf7WRcjT034gcAwEWMSAAAAAAIjEQCAAAAQGAkEgAAAAACI5EAAAAAEBiJBAAAAIDASCQAAAAABEYiAQAAACAwEgkAAAAAgZFIAAAAAAiMRAIAAABAYAWC/wgAILvU6jU5R3fmymc75ejvBwDkXYxIAAAAAAiMRAIAAABAYCQSAAAAAAIjkQAAAAAQGIkEAAAAgMBIJAAAAAC4lUjMmjXLXHTRRaZKlSomLS3NTJkyJeb7PXr0sNujH23btk1avAAAAABCkEjs2rXLnHLKKWbMmDHpPkeJw++//x55vP7667kaIwAAAICQ3ZC
uXbt29nEohQsXNpUqVcq1mAAAAACkwBqJzz//3FSoUMEcf/zxpnfv3mbz5s2HfP6ePXvM9u3bYx4AAAAA8lAioWlNL730kpkxY4Z56KGHzMyZM+0Ixj///JPuz4wYMcKULl068qhevXquxgwAAADkBUmd2pSRLl26RP5dr149c/LJJ5vatWvbUYpzzjkn4c8MGTLE9O/fP/K1RiRIJgAAAIA8NCIRr1atWqZcuXJm+fLlh1xTUapUqZgHAAAAgBAkEgsWLDCLFi2KfD116lRzySWXmNtvv93s3bvX5JR169bZNRKVK1fOsb8BAAAAIIcSiV69epmlS5faf69cudJOQSpWrJh56623zMCBAzP9e3bu3GkWLlxoH7Jq1Sr77zVr1tjv3XbbbWbu3Llm9erVdp3ExRdfbI455hjTpk0b3lsAAADAtURCSUT9+vXtv5U8tGjRwrz22mtmwoQJZvLkyZn+Pd9++61p0KCBfYjWNujfd999t8mfP7/54YcfTPv27c1xxx1nrrvuOtOwYUPzxRdf2OlLAAAAABxbbO15njlw4ID99yeffGIuvPBC+28tav7jjz8y/Xtatmxpf1d6Pvroo6yEBwAAACCMIxKnnXaaue+++8zLL79sS7JecMEFkalJFStWzO4YAQAAAKRCIjFq1Ci74LpPnz7mjjvusOsWZNKkSaZJkybZHSMAAACAVJjapPs5RFdt8j388MN2bQMAAACA1Jbl+0hs3brVPP/88/YGcH/++afd9uOPP5qNGzdmZ3wAAAAAUmVEQtWUdGfpMmXK2NKsN9xwgylbtqx5++23benWl156KfsjBQCETq1ema/UlxUrn+2UY7/b5dgBwNkRCZVpveaaa8yyZctMkSJFItvPP/98M2vWrOyMDwAAAECqJBLz5s2zN6WLV7VqVbN+/frsiAsAAABAqiUSuiHc9u3bE96ornz58tkRFwAAAIBUSyR0t+lhw4aZffv22a/T0tLs2ohBgwaZTp2YEwoAAACkuiwlEo8++qjZuXOnqVChgvnrr7/MWWedZe8lUbJkSXP//fdnf5QAAAAA3K/aVLp0aTN9+nQzZ84c8/3339uk4tRTTzWtW7fO/ggBAAAApEYi4WvatKl9AAAAAMhbsjS1qW/fvmb06NEHbX/qqadMv379siMuAAAAAKmWSEyePDnhSESTJk3MpEmTsiMuAAAAAKmWSGzevNmuk4hXqlQp88cff2RHXAAAAABSbY2EKjRNmzbN9OnTJ2b7hx9+aGrVqpVdsQEAgHTU6jU5R/fNymcp5w4gBxKJ/v372yRi06ZNplWrVnbbjBkzbFnYUaNGZeVXAgAAAEj1ROLaa681e/bssfeMGD58uN1Ws2ZNM3bsWNOtW7fsjhEAAABAqpR/7d27t31oVKJo0aKmRIkS2RsZAAAAgNS8j4SUL18+eyIBAAAAkNpVmzZs2GC6du1qqlSpYgoUKGDy588f8wAAAACQ2rI0ItGjRw+zZs0ac9ddd5nKlSubtLS07I8MAAAAQGolErNnzzZffPGFqV+/fvZHBAAAACA1pzZVr17deJ6X/dEAAAAASN1EQveKGDx4sFm9enX2RwQAAAAgNac2XX755Wb37t2mdu3aplixYqZgwYIx3//zzz+zKz4AAAAAqZJIcPdqAACQVbV6Tc7Rnbfy2U7Oxp/TsQNJTyS6d++erUEAAAAAyANrJGTFihXmzjvvNFdccYXZuHGj3fbhhx+aJUuWZGd8AAAAAFIlkZg5c6apV6+e+frrr83bb79tdu7cabd///33ZujQodkdIwAAAIBUSCRUsem+++4z06dPN4UKFYpsb9WqlZk7d252xgcAAAAgVRKJRYsWmQ4dOhy0vUKFCuaPP/7IjrgAAAAApFoiUaZMGfP7778ftP27774zVatWzY64AAAAAKRaItGlSxczaNAgs379epOWlmYOHDhg5syZYwYMGGC6deuW/VECAAAAcD+ReOCBB8wJJ5xgqle
vbhda161b17Ro0cI0adLEVnICAAAAkNqydB8JLbB+7rnnzF133WUWL15sk4kGDRqYY489NvsjBAAAAJAaiYSvRo0a9gEAAAAgb8lSInHttdce8vvjxo3LajwAAADIIbV6Tc7Rfbvy2U7Oxp/TsaeiLCUSW7Zsifl63759dorT1q1b7b0kAAAAAKS2LCUS77zzzkHbVLmpd+/epnbt2tkRFwAAAIBUq9qU8Bfly2f69+9vHn/88ez6lQAAAABSPZGQFStWmP3792fnrwQAAACQKlObNPIQzfM8e6fr999/33Tv3j27YgMAAACQSonEd999d9C0pvLly5tHH300w4pOAAAAAPJoIvHZZ59lyx+fNWuWefjhh838+fPtiIYWcV9yySUxIx1Dhw61N79TRaimTZuasWPHcuM7AAAAIJXWSAS1a9cuc8opp5gxY8Yk/P7IkSPN6NGjzTPPPGO+/vprU7x4cdOmTRvz999/53qsAAAAAA5zRKJBgwYmLS0tU89dsGBBut9r166dfSSi0YhRo0aZO++801x88cV220svvWQqVqxopkyZYrp06ZKV0AEAAAAkK5Fo27atefrpp03dunVN48aN7ba5c+eaJUuW2HtJFC1a9LADW7VqlVm/fr1p3bp1ZFvp0qXNGWecYb766qt0E4k9e/bYh2/79u2HHQsAAACAbEgkNm3aZPr27WuGDx8es13rGdauXWvGjRtnDpeSCNEIRDR97X8vkREjRph77733sP8+AAAAkF1q9Zqcoztz5bOdjBNrJN566y3TrVu3g7ZfffXVZvLknN1JGRkyZIjZtm1b5KHEBgAAAEAIEglNXZozZ85B27WtSJEi2RGXqVSpkv3/hg0bYrbra/97iRQuXNiUKlUq5gEAAAAgBFOb+vXrZ9dCaCF1o0aN7DZVVdKUprvuuitbAjv66KNtwjBjxgxTv379yHoH/R39bQAAAACOJRKDBw82tWrVMk888YR55ZVX7LY6deqY8ePHm8suuyzTv2fnzp1m+fLlMQusFy5caMqWLWtq1KhhE5b77rvP3jdCiYWSlCpVqsTcawIAAACAI4mEKGEIkjQk8u2335qzzz478nX//v3t/7t3724mTJhgBg4caO810bNnT3tDumbNmplp06Zl2/QpAAAAALmcSOjCftKkSWblypVmwIABdhRBU51UValq1aqZ+h0tW7a094tIj+5VMWzYMPsAAAAA4Hgi8cMPP9j7O+i+DqtXrzbXX3+9TSTefvtts2bNGnvjOAAAAACpK0tVmzQFqUePHmbZsmUx04zOP/98M2vWrOyMDwAAAECqJBLz5s0zvXr1Omi7pjQd6mZxAAAAAPJwIqF7NagUa7ylS5ea8uXLZ0dcAAAAAFItkWjfvr1dAL1v377IomitjRg0aJDp1Cn3b88NAAAAwIFE4tFHH7X3gKhQoYL566+/zFlnnWVq165tSpQoYe6///7sjxIAAACA+1WbVK1p+vTpZvbs2baCk5KKhg0bmnPOOSf7IwQAAADg9ojEV199Zd57773I17pBXPHixc3TTz9trrjiCnvjuD179uREnAAAAABcTSS0LmLJkiWRrxctWmRuuOEGc+6555rBgwebd99914wYMSIn4gQAAADgaiKxcOHCmOlLb7zxhmnUqJF57rnn7L0lRo8ebSZOnJgTcQIAAABwNZHYsmWLqVixYuTrmTNnmnbt2kW+Pv30083atWuzN0IAAAAAbicSSiJWrVpl/713716zYMECc+aZZ0a+v2PHDlOwYMHsjxIAAACAu4nE+eefb9dCfPHFF2bIkCGmWLFipnnz5pHvq4KTysACAAAASG2Byr8OHz7cdOzY0d43QveMePHFF02hQoUi3x83bpw577zzciJOAAAAAK4mEuXKlTOzZs0y27Zts4lE/vz5Y77/1ltv2e0AAAAAUluWb0iXSNmyZQ83HgAAAACptkYCAAAAAEgkAAAAAGQJIxIAAAAAAiORAAAAABA
YiQQAAACAwEgkAAAAAARGIgEAAAAgMBIJAAAAAIGRSAAAAAAIjEQCAAAAQGAkEgAAAAACI5EAAAAAEBiJBAAAAIDASCQAAAAABEYiAQAAACAwEgkAAAAAgZFIAAAAAAiMRAIAAABAYCQSAAAAAAIjkQAAAAAQGIkEAAAAgMBIJAAAAAAERiIBAAAAIDASCQAAAACBkUgAAAAACIxEAgAAAEBgJBIAAAAAAiORAAAAAJBaicQ999xj0tLSYh4nnHBCssMCAAAA8rwCYd8DJ554ovnkk08iXxcoEPqQAQAAgJQX+qtyJQ6VKlVKdhgAAAAAXJnaJMuWLTNVqlQxtWrVMldddZVZs2bNIZ+/Z88es3379pgHAAAAgDyUSJxxxhlmwoQJZtq0aWbs2LFm1apVpnnz5mbHjh3p/syIESNM6dKlI4/q1avnaswAAABAXhDqRKJdu3amc+fO5uSTTzZt2rQxH3zwgdm6dauZOHFiuj8zZMgQs23btshj7dq1uRozAAAAkBeEfo1EtDJlypjjjjvOLF++PN3nFC5c2D4AAAAA5NERiXg7d+40K1asMJUrV052KAAAAECeFupEYsCAAWbmzJlm9erV5ssvvzQdOnQw+fPnN1dccUWyQwMAAADytFBPbVq3bp1NGjZv3mzKly9vmjVrZubOnWv/DQAAACB5Qp1IvPHGG8kOAQAAAIBrU5sAAAAAhBOJBAAAAIDASCQAAAAABEYiAQAAACAwEgkAAAAAgZFIAAAAAAiMRAIAAABAYCQSAAAAAAIjkQAAAAAQGIkEAAAAgMBIJAAAAAAERiIBAAAAgEQCAAAAQM5jRAIAAABAYCQSAAAAAAIjkQAAAAAQGIkEAAAAgMBIJAAAAAAERiIBAAAAIDASCQAAAACBkUgAAAAACIxEAgAAAEBgJBIAAAAAAiORAAAAABAYiQQAAACAwEgkAAAAAARGIgEAAAAgMBIJAAAAAIGRSAAAAAAIjEQCAAAAQGAkEgAAAAACI5EAAAAAEBiJBAAAAIDASCQAAAAABEYiAQAAACAwEgkAAAAAgZFIAAAAAAiMRAIAAABAYCQSAAAAAAIjkQAAAAAQGIkEAAAAgMBIJAAAAAAERiIBAAAAIDASCQAAAACpmUiMGTPG1KxZ0xQpUsScccYZ5ptvvkl2SAAAAECeFvpE4s033zT9+/c3Q4cONQsWLDCnnHKKadOmjdm4cWOyQwMAAADyrNAnEo899pi54YYbzDXXXGPq1q1rnnnmGVOsWDEzbty4ZIcGAAAA5FkFTIjt3bvXzJ8/3wwZMiSyLV++fKZ169bmq6++Svgze/bssQ/ftm3b7P+3b9+e6b97YO9uk1OCxJEVORm76/G7HLvr8bsce07H73Lsrsfvcuyux+9y7MLxkn0ftnYTtN37z/U8zxyONO9wf0MO+u2330zVqlXNl19+aRo3bhzZPnDgQDNz5kzz9ddfH/Qz99xzj7n33ntzOVIAAADALWvXrjXVqlVLzRGJrNDohdZU+A4cOGD+/PNPc+SRR5q0tLRs/3vK6KpXr27fiFKlShmXuBy76/ETO/s+r7Ub1+N3OXbX43c5dtfjJ/bU3fee55kdO3aYKlWqHNbvCXUiUa5cOZM/f36zYcOGmO36ulKlSgl/pnDhwvYRrUyZMian6U127QCRCrG7Hj+xs+/zWrtxPX6XY3c9fpdjdz1+Yk/NfV+6dOnUXmxdqFAh07BhQzNjxoyYEQZ9HT3VCQAAAEDuCvWIhGiaUvfu3c1pp51mGjVqZEaNGmV27dplqzgBAAAASI7QJxKXX3652bRpk7n77rvN+vXrTf369c20adNMxYoVTRhoGpXucRE/ncoFLsfuevzEzr7Pa+3G9fhdjt31+F2O3fX4iZ1973TVJgAAAADhFOo1EgAAAADCiUQCAAAAQGAkEgAAAAACI5EAAAAAEBiJBIBQ2Lp1q9m7d2+yw8iT/vjjD7N7927jIpdjl/379yc7hDzL5X3
vertH6pxrSSRCRrcrX7dundmyZYv5559/kh1OnqKbHbpcxEzxu+p///ufufTSS83ixYvt1669D3/99Zf9zLq679u2bWs++OAD5/a9y7H78d9zzz1m5cqVzsXuOpf3vevt3j9euniN4/J5NqfOtSQSIbJkyRLTvn1707p1a9OqVSvz3HPPOfVBW7Nmjfn000+d7FVeunSp+de//mXatWtnb4C4YsUK45Iff/zRxn3BBReYq666yp5gXOmt+v77783pp59u2864cePstrS0NOMKHZDVblq0aGGaNm1qhg0bZjZv3mxc2fe62eeCBQvMxIkTndr3LscuixYtsu1lw4YNJn/+/E7FvnbtWrNs2TLjKpf3vevtXueqzp07m7PPPtu0bNnSfPTRR8YVLp9nc/JcSyIRoiRCFyInn3yyGT16tKlbt64ZO3aszdxd8PPPP5vjjz/edOnSxcycOdOpIWNdCDZv3tweEE466SQze/Zse7BwxU8//WRPigULFjTnnnuu+fXXX02vXr3MnXfeabZt22bCfmBr3Lix6du3r/nPf/5j284PP/xgXLFq1Spz1llnmTp16pg77rjDfm7ff/99c9FFF9kbaLqw72+77TbzySef2Hb/8ccfGxe4HLvoJqtdu3Y111xzje0wOuqoo+xnVY99+/aZsB9vatasac4//3zbAeMal/e96+1e51pd51SvXt0MHjzY9u7rRn0ujKi4fJ7N8XOtbkiH5Pr999+9U045xfv3v/8d2bZq1SrvvPPO85YsWWK/v3v3bi+s/vzzT+/888/3evTo4Z177rlexYoVvWnTpnn79u3zwk77tmHDht6tt94a83rKli3rvfHGG17Y7dmzx+vSpYt30003xWyvX7++V7RoUfuebN261QujefPmeSVLlvRuv/12+/W3337rlS5d2hszZoz9+sCBA17YjR8/3mvVqpW3f//+yLYPPvjAa968uVevXj1v/fr1XhgtWLDAK1asWGTfL1u2zKtVq5Z31113hX7fuxy7b/ny5V6zZs28P/74w36Gu3Xr5jVu3Ni2me7du4e23WzatMlr3bq116lTJ69Ro0be8ccf7/3000+eS1zd9663+3Xr1nknnniid9ttt0W26Zh/0UUXeb/++qv3119/eXv37vXCyOXzbG6ca0kkQuCbb77x7rnnHm/FihWRbTo4lChRwqtRo4Z36qmnehdffLG3efNmL4yWLl3q9e/f3/vkk0/s10oqXEkm3n77bXtS0clFdCDT47TTTvPGjRvnueCcc87xHnroIfvvHTt22P/fcsstNhE944wzvGeeecYLm3/++cdr0KCBd/PNN8dsHzBggHf00Ud7a9as8Vyg/V6hQgXv77//jtn+2WefeS1atPA6dOgQeU/CQicNnbwHDhwYs/3hhx/2SpUq5f3vf//zwsrl2KN9+OGHXvny5b1du3bZCxR9VtVxMWzYMO+ss86ynRu60A3juapnz57ep59+ajtczjzzTO+EE05wKplwcd+nQrvXflcSoYTCN2jQIO/II4+0x3y1JXXobd++3QsjF8+zuXWuJZEIAfVmrly5MvL1o48+6qWlpXkvv/yyHZF48cUXbTLx1FNPeWFtqD///LP9vy9RMqHXqcw+THRA8A8O4vcs6+Dw2GOPxTw3bD0+ikfx6gB82WWXRbarR00HiEmTJnmXXnqp/X4YqQcqft9+9NFHXs2aNb2pU6far6N7+sPEb+tffPGFHU3UhUh00qy4n3/+edvLqZ7EsIlOfPzXsnjxYvtadPwJ8753OXa/nW/cuNFr0qSJd99999lR3EWLFsUkofrM/uc///HCRp0sSiZ8uuBWrBqZiL6Y1fsS1k4kjaq4uO9dbvd+29H1jO+RRx6x1znPPvusN3fuXO/uu+/2Tj75ZDuiGybapy6fZ3PjXEsikWTRF9++r7/+2psxY0bMNg0J9u3b1wt7/NGJQrt27WwyoQa7c+dOe6DQyEuyL8g1gqKh1HjRcWn
43h8yFo1OTJ8+3Quj2bNn29Er9YpoiFX/vu666+z3lOCp9009hsne76KRn4wOWDq560QfRv7J3O+RUrtWvJoaoeHj+M+GetvUa+iKrl27escdd5wXNv5FafTxJb49hzV20dRUxev3turE3rlzZ69y5cpe9erVY6bT6Hn6LIfleJ/eccM/9muk3E8mdJzR5/uBBx4I7YiuS/s+EZfafXrUdjQbIPo6R69Lo7u6RggDtevofT1nzhxnzrO5fa5lsXUS/P7775FFLvnyxb4FSu4aNWpkqzaJFiPt2rXLHHvssXYhsP+cZNev/uWXXxLGX6hQochCa1U0aNiwobn22mttubH777/fdOrUKakVJrTgSIvCp0yZctD3FJdfJUuvo1SpUvbfd911l7nuuutMjRo1TLKtXr3avPjiizamefPm2XLBWgCmRXeKTwvBHnzwQfP8889HSr0dccQRpkKFCkmv7KF9r3Y8YcKEhN/39/2AAQPsZyTRe5RM2pdqy1pcrf9/8cUXpnjx4uaNN96w1V/69OljZs2aFfN66tWrZypVqmSSTVXIRo0aZSuTab9qoWCikoZ33323LTrw5JNPmjAVcrjlllvs4l7Fp8IU4rfnMMfuV3q58sor7ef06quvtp/VIkWK2AWPlStXtuW+VUHFb/96Xcccc4ypVq1askO3lZl0rLn88sttvKrW5POP/WXLlrXHeh1ndHxXNRsVHjjzzDNN2Nq9Kgu6su+1kH3kyJHmhhtuMG+++WakXKd/DRD2dp9e/IpdbadDhw6R6xzt/z///NPUr18/cp2T7HbfoEEDM3DgwMh+btKkSeQ8q+uDsJ5nk3KuPexUBIFo+LdKlSp2vmN8D6YvPqNVz7iG0KKnPyXLjz/+6B111FF20ZEWhKcnOhNWr6weCxcu9JJJf1+L1TQvMz3+Yq8LL7zQDrk++OCD9me0OCnZfvjhB9uDprn3GpIsV66cN2HChEP+jOZBajHwtm3bvGT67rvvMtz3PvUQaoj7mmuu8cJC0x+OOOII78Ybb7TzfPX51fxqf2RCUzxOOukk7/TTT7dFE9555x07f1Y/46+/SWbslSpVsm1aiwOPOeYYb+jQobaXP/5Yo3aiaYkXXHCBFwZq8yp8cO2113pXX32117RpU++OO+6wPZphj92feqI2oB7uO++802vfvr3tzfSLZ2idgdZjadHs9ddfb6fIaS6zfka9nMne9+plVTvXCK3WQmi6niTqdVXhCi081bE+DNP50mv3/r7fsmWL/byGcd/7xxu1eU2z1To+7X9/HaL2v/8ehLHdZyb++N5yXedoVOuXX37xwlBEQ2tP2rZt6w0ZMiThzJEwnmeTda4lkchFv/32m/1A6eCl4bGrrroqZr5pPM3X1OIjfSDVOJJN04FUqUPTrHQhqwuqQyUTGkLu3bu3V6hQIXtCTSYNOebPn98uqBMdGDQX8/HHH7dTluIv9lSVpEiRIvbEmF7Cl5u0n3XC00WUP99RFydKMBNVulB70fdVqSHZCZz2fYECBex0B9EJRGsLXnrpJdsuNmzYEHmuf8DWeiBdBOjAnOyhYu17XYRo3/u0fubKK6+0F+Oac+1fFKrogIaKdULU/5P9udVJWbHoZOifuJUcV61a1cYbzd/PkydPthfvyS7uoI4THWei97uOh1rsq1j9aU76t99uwhK7P+1NF3j9+vWLmReuqShqN3671wWtTvpqLzq2nn322Un/zOp4qE4L7Xu/XWh+uNp3NP+CVsegf/3rX/aYmexjfZB2r+OLnhOmfa+YdFE6ePDgyDado3RxmC9fvsic9rC2+8zG79PCfS0i14V7so+Xvv/+97+2Peizq+u16NcS1vNsMs+1JBK5SHPstG5AvTX64KuBppdM6MCsxWCXXHJJzGKwZFED04W3ej20xkC99RpZUTKR3kiJFhSqtN6hkqXcoA+Mv4BdHyrRCUOL1LSGQxfoSvC01sD/8Ol16vnRi8OSRRcdI0aMsBe
u6vn2T4zqNdOF1urVq2Oer++/9tpr3uWXX+59//33XjKpHauNaF/6vXzqodJBWgdfzVNWr5V6P+Mv3qOreyT7pKIeG3UE+PSa1MOmNqQeZRVEiC4ooLavC8lkUixjx46188G1L/12owtXtfn0EmSNsij+ZH9mNdqmnmK1ef8Ep1EeXfSpQ0MLHN9//3273f9+GGL3aT/XqVMnZuGuLpi0Tb2A6iVXQQ3Re6PPuX4m2aW+FYeOl7169bIlLf19qwRO1QNbtmxpR93846Xos62e/zB0umS23fvrbvze8TDse7+XuG7duva4E02joBqN1jqCL7/8MuZ7YWr3QeLXsVKdA6qWFX8OSCbFovOn1jQpidAaIF2c6xz8yiuvRNpMWM6zyT7Xkkjk8kLN6CkyWu3vJxNaYO3zD3DqeQ5TbWKNSMycOTPytcqdJUom/BNPdK9hsuliRHGqR+TYY4+1Iw5+z5kWg+sgpxOh36Oj/a6EKSy0r5VMRNOFrWpBR7ed6INKWMqO6sCl6RzqMdNBrWPHjvZzoBg1hKwLwz59+tivkz36kIh6alSz3afF0xqpUg1uJRDq0dRol5+khsnrr7/ujRo1KmabLpg0ZcW/CA8rlSWMTuTvvfde26up/a/eZS10rFatWih6wNNrN7rw1slcF1VqJ2o3L7zwgr0YUU139eCHsd3oMxvdw6ppJ4pVU4N04afjpTpf/NKRSvzCcrxxvd3rWKNpkppG5idB2qbRZx0vdTHujwxlNOUmGXQtkNn4/euisJXb1XWLOit0ka0OIS0AV6ejrh+ik2Vdq+0IUbtP1rmWRCKHxb9Z/gc/ehqBejSjRyY0lzb6gj2ZdJIeOXJkut+PHpnwpzlp2sf8+fO9sNGQtk7mmlYWn5XroFemTBl7L48wXsxG8+NT0qODc/SIlapghPEeDLog1M2edPERX3N++PDhtpcqDMPyGe1zfX51IRVdwUu9ObrfS1hKRuozm2go3n8NOomox1BTCqI7NZI9LzwR/3ip/2uufvRFoE7oajfxUyWSve+jq3S99dZbNplQJ4VGD/2RK78jQNNvRo8e7YWR3150v4U2bdrE7GclRrpY+eqrr7ywcLndx7cbJcmaWqYROE2/LV68eOQ+AErmVFY6zDSCm5n4w5gIKYlQ8qBEQrNIRKMOqtKkkUQlFWGMO5nn2gKHt1QbGa38Hz16tK1GoEoQ995770FVjjp27Gj/rwoAeq6qL0ydOtVWyUi2RYsW2dvZKybdEl4VFXyqZKDX0rNnT/u1XpuoCsa7775r5s+fb5Jp1apVtprOzp07Tc2aNW3lCFVV6N+/v93nqtzkVy/Inz+/Oeqoo2wVD1X0CEPVhfXr19tKEHv27DG1atUyxx13XMx+F8Wt6hGq1CSDBw82r732mpk7d25SY1dlFFW3UJyqftKsWTNTt25dM2jQILNx40b7eqL3vT4b5cuXNwUKhONwpLbz3nvv2X1drlw5W4XGbxN6TY899pj9tzpitF0PVWaqXr16KD6z2t+q5qXPbrt27SLf81+D9nnhwoUj+3vIkCG2+oiqgCW73cyZM8f+W9VPzjnnHLu//Sovr7/+esxnoEyZMqZq1aq27YRB9PFS1WhOPfVUW63uggsusMchVX2pWLFi5PklS5Y0pUuXNiVKlDBhOt7Url3bVnzxq9gVK1bMfh7UXvzPrNq69r1eQxi43O6j283ZZ59tKx0qLlXU0Xn0m2++sVWwFK+ozYRlv8cf76tUqWJfiyph/fvf/84w/vjroWTGrjatymo6p+qh17Ft2zZzzTXX2Ap977//vpk+fbo9x6ot3XnnnUmNPVTn2mxNSxChXmINo2poSfPqNKQdPTUleqGUTJw40S5K1lSVMCw40rC2hrLVC6h5pU8++aTdHh1z9L81J1Vz8xR/sit2aL6iFg+p4oKGUTViopETX6Layv48zTDcVVOjJVrcq5EqtRv1Bia6SY/mxKqN6fWql0fPTfZ6FL/Si0Z
91POhXiktwvQl6slRT5UWparnMwyfW1XDUtvRXHCNUmm9jD6Tfuzx7UcLUjWMnOjeJMn4zGpoW0PY/l1w4/e5huL1vqiCinqowlBQQO1GUwd091hNi9B8XlVqil4YG/86NLqoG3VGLyAM2/HS7w1XL6fWx2mxteJVG9LIsyrgxa9xCuPxJn6UVm1L904Jwyiiy+0+vXbjU7uJrwSk3n5dU4RhKmii4/0NN9wQE3/89OywxJ8odlXl82NSJSZd02i7PyVd0+N0rg1DBc0fQnSuJZHIATr5aSGdf0DTHEA10Pg57v6brQ+UX3YuDPN9lQhoKFInatHrULWL9G7ipvj96lLJXpysYWp9oPwhbi38UsnF+AO0T1OZtHBQsYdhsZfmkmpfq4qLEgUNrWqOtfZvPJ3EdQGri97ChQsnvUStkrCGDRvatqwLJe3bV1991V6M6z2IPyGqPaltaYpEGNq9PrcNGjSI+dzOmjXLJvi6cY8/zO1TW1dVjzBUVdNUQg29a969aDqBLlASrfPR2itdNGroXs9JdrtR1SstQPb3u9q9pgDpJK4LjvipejqJ6ySv/R6GSimZPV5quxboa42W2pM6O5Ld6RLkeCN6TTq2at+HYYGpy+0+yHnWLx2v440668JwvDzU8V5T+fzjvX9BG6b4MzpX6div84GqTvrJZvQ0y2TbHrJzLYlEDl3Mag5g9Bum6kWqFKSDtCqRRGe0uiDRxUqye5P9+d7KcrXmwff555/b3hItYEv0QVIFhjDEr4VPWkik9SbRi7x1MaIKL+op0d21/fh1gNB7oR7QZF8Iig5eOjD4vTXRoz266FBvSHzddpXMC8M9OkTxaQ7ptGnTYrbPnTvX9phonqlPFV/0nmiOeBj2vegzqUTCT4bVnjSPXdt08mvevHmkJ0f3U9ECVPWAJvuCSheAugBRQhxdvlPHIL8MYPRnVvN/NVdc7SbZsfsXVDopRl9A+fer0eJqjepGP1efcT0/DG0+M8fL6M+yTvZaNK51ZDr5u3S8UVEHVX0Jy/HS5XYf9DyrZFtrsHQsCsO+D3q8V4demOLPKHaVOg6zLSE715JI5AD1oKmHWBetGk7V4hxdaKvXR2X1tDBTQ/jRwjClxk+C3n333YO2KwFSCbT0hGGIW3Ryju451iiQeja1eE29ISqDFn2QUBKU7CkpPrUVXZyq5rP4Q6x6PWoz8XX//Wk1ye7diY5f04Lip/D590TRvn/ooYci39P0iTDcfCi6d1a9m9ELYpVcaIqTpkOoN0efZZ96Pf17SCSTpsokqvyjBXfq6fenYkVPI1Dt8PiFeMmiDgglxB9++GFkm3ovVYteNwnTe6KLW596kpVEu3S89CvxhUlWjjcff/yxt3btWi8MXG73WTnP6lgTlvNsVo73YYo/o9h1zImOPWx2hOxcSyKRA3QAUyUOXcBq2omSCK2BiO41URmx6HmoyZ7rmB7/YKzeEp1col9H2KlnWeseovezMnj1ciYqmRoG0Sc5v0dKF7M6MUaf2MNSFSu61K8oeda8dV1wRD9HF1Iq+ach7zCVy4um0QbVytfcb11gqa0refDnnaojQHOZE62xCRM/PiXVKo+qeflhPs6o40XrUFT3XwmDn7Sp4oton0f3OodZZo+XYXkfMnu8SfYUrFRs96lwntXxXqMMhzreh2HtW1Zj131Fwtp27s5E/Ll1rk3ukvkUsWHDBvPdd9+ZL7/8MrI6/uabbza//vqrGTlypK12pGoMsn//fvP333+bE044IabaSDIrBfnxq2KK4o+m1yJ16tSx8U6bNs2EiSq4+Ps1nqoXTJw4MaaCh18FqUaNGiYM4ve9X00qujqTKkeoAoNil6FDh5qrr77aVgNTZ0CyqMrL/fffbyul+O33wgsvNGXLljVjx461lS5E31OVCFV5UUWkZFfqSK/tqDqNqns1b97cVh154IEHzI033mhfi6iqij7TYajqdah2739m9Xk97bTTzGeffRb5Xhh
i37Jli1mxYoVZvHix/VrVUnr37m3bhdp2r1697PswatQo+31t//nnn01YZMfxMlnvQ1aPN6pcluzjjR+nq+3e5fNsouscueSSSzI83rscu1+Zz9X48+XWuTZX0pUUprmWtWvXtvX8VXVEPTmqU+1PVVJFDs31jb7Lo+bIapFsGIbn4+P362xH35XX76nSNAMtUou/q2ayaEqPpohp/qVE9xTH95T7tOBIVUnCcKO/zOx7f/RB83q1fdiwYXaEK5kLBbVP1cukRaMaddPCzOhep/fee8+uHVAvs+6TIuol0fM0QheGHqr4thM9R1ztSO0j+k7W+gxofrimxyW7h+pQ7T6eejj1HiWaRpGsqlhqG2o7mg/eoUOHyPd0UypNV9GIrU9rnbRGQsfMMHD5eOnq8SYV2r3L7eZQ1zmi/2s6VliP9y7H7kr8JBKHQQc0vcGqGKHFgVrIojt+ahHV008/becD6o3U4l8tUNNQkxbBaG5bGBYcZRR//GI7rSVQY9Y88WRP79Dwuy5EdLLQ4rqMTi5a3KbXqUWzYajOFGTf67Wq0ogWhoehOpNPCzVV1ULz2xVbdHKmedZai6IqEhp+9UuphqHdp9d2/GQiPlHQgVylOhV/squSBWn3/p3ldaLRGqHoZCkZtO5BF6hK5nWhp7Uoahv+DS/jizgoodAUM01zCsO8dpePl64fb1xu9y63m0PFr8+y4hetVdG1TdiO9y7H7lL8JBKHQRcYqr8dv9hVF1eqJjFhwoTI/FKdLJUhau6aGoQr8eugHH1hpfrbuiBIJs1bVM9wp06d7J1jda8IlVVM7+SiSgaa+64TURgODpnd9/6FlarT6ASqUoFhmKvsx6V64KqOogOZetBU4lieeeYZO79aI266YFRlkieeeCIUd5AN2nb80s06sSe77QSN3ad7qCQqh5mbNEKr0Qe1b58u8LT+IVGFFI1O3HTTTfazEIY27/Lx0vXjjcvt3vV2k1H8ut+Lf52jRfhhO967HLtL8ZNIHAZdQCkz9HvLooeRdDMl9aTFVwQKQw3izMavkRM//rBVHRk/frwtpShaOH2ok4u2acF1sm/8lNV9v2rVKnuxFYZeWfFPeDqp+4tgtUBWi9iVrOnAF6Z9fThtRzTSohEtF2MPC1VsUfnlV155JeY4qIpBzZo1i9xPJ74XOkxVvVw+Xrp8vHG53bvebrJ6nRMWLsfuUvwkEodJN57TnPvoHszo7/l3eQxTAhEkfvXkh0l0r43/b+1bzSeNP7moJ0slPcMqyL4Pw1zN+P2vCliqZe1fALZu3drLnz+/7XkO4wkxK20n+j1xLfYwXVzpZmDRIzr+a1CPmm4SFr0tzFw7XqbS8cbFdp8K7SZI/GH8DLscuyvxh6N8iiO2b99uH6o64lNVpqVLl9pKI1K4cGGzd+9e++969eqZnTt32n+HoVJNVuJXRZ4w8GPfunVrZJsqFOzbt8/u28aNG5sHH3zQVKhQwbRs2dKsXbvWDBw40Fx55ZVm165dxvV9r4pCYYg9uoKFqu1UrFjRFCxY0Fx//fW2itPo0aPNp59+arp3726rk7ncdlSpJlF1GFdiT/b+92NXtZ8iRYrY6nWifeq3I1Xh8fextt16660xVdaSKRWOly4eb1Kl3bvYbg43/mRXOHI5dqfjT1oK4xgt0NXdVNUDq4UsWgTo3x3zySef9I477jiva9euMT+jRdaaR64ekmRnuy7HHx+76idHLwCM7oH66quv7GIjzfHVTWXCcL+IVN735557rl0MpmoS/r0tVClF8zejqx4li8ttJ9Vi153kfX6bVqUR3XdEhgwZYufl67UkWyp9Zl2KPdXavev73qX4XY7d9fhJJDJB83RVYUE3SFJpy8cee8wOqbZr187ODZcXXnjB3khGZd1U7UKLCDVnPAx3HXY5/vRiV7mz6JK6/tQxDfupqoHmDia7wk4q7/t33nnHPkdVUVThJf4GefElJZPB5baTF2KXN998014Iqiq
WyoyG4UaLqfiZdSH2VG33ru97F+J3OfZUiJ9EIhNefvllO483uidE9Xt1cNNJcNasWZE7tPbq1cu74oor7EKYMLzBrsd/qNhbtWrlffTRR5Hteo6qY6lkYbIr7KT6vlfs6v1TNR7FHkYut528Evu4ceNsb7LKMoehzGiqf2bDHHsqt3vX933Y43c59lSIn0QiE3SDmGrVqh1UiWb27Nl2gemVV15pb6QULUyLq12OP6PYdZOw6NhVFSYsZfNSfd/rYKYynWHlcttJ9dj9BbIagWjevLm9UV1YpPJnNsyxp3q7d33fhzl+l2NPhfhJJDJBb6aGTl977TX7dXTWqPng6hH5+OOPY34m2fPtUiX+rMQeJux7t/Z9WOSV2FUdKAx3mc/Ln9mwxJ6X2n2q7PuwxO9y7KkQP4lEJumW4yVLlows5op+ozVHXHdrDTOX43c5dtfjdzl21+NP5dh146QwS+V9H+bYXY/f5dhdj9/l2F2Pv0Dy6kWF08qVK82rr75q/vjjD1tirkuXLnb7sGHDzLp160ybNm3MO++8Y0vOyT///GPLcVWqVMmEgcvxuxy76/G7HLvr8efF2CtXrmzCIC/u+zDE7nr8Lsfuevwux54K8SeU7EwmTBYuXOhVqVLFLupq0KCBXQSo2437Nm7c6HXr1s0rUqSIzQ4ffvhhr3///nahYBhuqe5y/C7H7nr8LsfuevzEzr7Pa+3G9fhdjt31+F2OPRXiTw+JxP+ner0qpaVa5ror78qVK70mTZp4lSpV8latWhWz00aNGuWdd955tt6v7jgYhooRLsfvcuyux+9y7K7HT+zs+7zWblyP3+XYXY/f5dhTIf5DIZHwPG/Tpk32zYy+DblceOGFtrbv2rVrD1oQuG3bNm/Pnj2hqJfvcvwux+56/C7H7nr8xM6+z2vtxvX4XY7d9fhdjj0V4s8IiYTn2cxQN9YqV65c5EZbI0aM8PLnz2/vunrxxRd7p5xyih1imjJlivfnn3+GasW8y/G7HLvr8bscu+vxEzv7Pq+1G9fjdzl21+N3OfZUiD8jJBL/34oVK7ybbrrJzkW7/PLLbfaoN3THjh22TvXUqVO9Zs2aeVWrVrVz23bv3u2Ficvxuxy76/G7HLvr8RM7+z6vtRvX43c5dtfjdzn2VIj/UPJsIqEbIn3++ef2hjY+3QxEtygvWLCgd8cdd0S2+5mh3nDNZVN2mWwux+9y7K7H73LsrsdP7Oz7vNZuXI/f5dhdj9/l2FMh/iDyZCKxZMkSezfVrl27er179475nlbG33zzzTZrfPvttyN3EIyu6ZtsLsfvcuyux+9y7K7HT+zs+7zWblyP3+XYXY/f5dhTIf6g8lwisWjRIq9MmTLenXfeGbNS/rPPPov8e/ny5d6NN97olSpVyg49SVjmq7kcv8uxux6/y7G7Hj+xs+/zWrtxPX6XY3c9fpdjT4X4syJPJRIaatKClj59+sRsHzlypK3n2759+5g3um/fvna7blEeBi7H73Lsrsfvcuyux0/s7Pu81m5cj9/l2F2P3+XYUyH+rMpTicS0adO8008/3fvhhx8i25577jlb2/ehhx6yNwqJfqOXLl3q3Xbbbd5PP/3khYHL8bscu+vxuxy76/ETO/s+r7Ub1+N3OXbX43c59lSIP6vyVCIxfPhwuyLeXw2/d+9e76WXXvLmzp1rv549e7ZXsWJFr23btpGf0Y1DwsLl+F2O3fX4XY7d9fiJnX2f19qN6/G7HLvr8bsceyrEn1X5TB5SsGBBs2/fPpOWlhb5+qqrrjJnnHGG/bpp06amZ8+eZuPGjWb79u12W4ECBUxYuBy/y7G7Hr/LsbseP7Gz7/Nau3E9fpdjdz1+l2NPhfizKqUTiaVLl5rJkydHvm7Xrp3ZtWuX6d+/f2Sb3vRoeoMbNWpkihYtapLN5fhdjt31+F2O3fX4iZ19n9fajevxuxy76/G7HHsqxJ9tvBSlUloPP/ywXcjy+uuv223bt2+
3dxfULckHDRoU8/xdu3Z5t99+ux12CsN8NZfjdzl21+N3OXbX4yd29n1eazeux+9y7K7H73LsqRB/dkrJRGLx4sXesGHDvL/++ssbMmSIvfmHf1MQ3RCkY8eOXokSJbxWrVrZ+Wua19alSxd7+/L58+cnO3yn43c5dtfjdzl21+MndvZ9Xms3rsfvcuyux+9y7KkQf3ZLuURi4cKFNkN84IEHItuUGeqNfvnll+3Xv/76q/f44497J598sle5cmWvbt26NovUbcqTzeX4XY7d9fhdjt31+ImdfZ/X2o3r8bscu+vxuxx7KsSfE1IqkdDdBIsWLeoNHTr0oO/Fv9H+0NSKFSvskNOePXu8ZHM5fpdjdz1+l2N3PX5iZ9/ntXbjevwux+56/C7Hngrx55SUSSR0N0ENG9WpUyeyTaW3Er3Rr776qhc2Lsfvcuyux+9y7K7HT+zs+7zWblyP3+XYXY/f5dhTIf6clBKJhIaadMOPli1b2ht+6G6B0Rlh/But544bN84LC5fjdzl21+N3OXbX4yd29n1eazeux+9y7K7H73LsqRB/TnM+kZg3b57NAO+55x77hj777LM2azzUG33TTTd5FSpU8LZt2+Ylm8vxuxy76/G7HLvr8RM7+z6vtRvX43c5dtfjdzn2VIg/NzifSMycOTPmDd26dWum3ugNGzZ4YeBy/C7H7nr8LsfuevzEzr7Pa+3G9fhdjt31+F2OPRXizw3OJxLRDhw4YP+vLDDRGx32W5G7HL/Lsbsev8uxux4/sbPv81q7cT1+l2N3PX6XY0+F+HNKSiUS0aLf6FtvvdVzjcvxuxy76/G7HLvr8RM7+z6vtRvX43c5dtfjdzn2VIg/O6VsIuG/0c8995yt+Tt48GDPNS7H73Lsrsfvcuyux0/s7Pu81m5cj9/l2F2P3+XYUyH+7FLApLBSpUqZzp07m4IFC5rGjRsb17gcv8uxux6/y7G7Hj+xs+/zWrtxPX6XY3c9fpdjT4X4s0uasgmT4vQS09LSjKtcjt/l2F2P3+XYXY+f2Nn3ea3duB6/y7G7Hr/LsadC/IcrTyQSAAAAALJXvmz+fQAAAADyABIJAAAAAIGRSAAAAAAIjEQCAAAAQGAkEgAAAAACI5EAAAAAEBiJBAAAAIDASCQAAAAABEYiAQAIhZo1a5pRo0ZFvtbdYqdMmZLUmAAA6SORAAAk1KNHD3sxH/9Yvnx5juyxefPmmZ49e/JuAIAjCiQ7AABAeLVt29aMHz8+Zlv58uVz5G/l1O8FAOQMRiQAAOkqXLiwqVSpUszjiSeeMPXq1TPFixc31atXNzfeeKPZuXNn5GcmTJhgypQpY9577z1z/PHHm2LFiplLL73U7N6927z44ot2CtMRRxxh+vbta/755590pzZFa9WqlenTp0/Mtk2bNplChQqZGTNm8A4CQBKQSAAAgp048uUzo0ePNkuWLLGJwaeffmoGDhwY8xwlDXrOG2+8YaZNm2Y+//xz06FDB/PBBx/Yx8svv2yeffZZM2nSpEz9zeuvv9689tprZs+ePZFtr7zyiqlatapNMgAAuY9EAgCQLo0qlChRIvLo3Lmz6devnzn77LPtCIIu4u+77z4zceLEmJ/bt2+fGTt2rGnQoIFp0aKFHZGYPXu2eeGFF0zdunXNhRdeaH/HZ599lqm937FjR/v/qVOnxox8+Os4AAC5jzUSAIB06WJfCYFP05k++eQTM2LECPPTTz+Z7du3m/3795u///7bjkJoGpPo/7Vr1478XMWKFW3ioWQketvGjRsztfeLFCliunbtasaNG2cuu+wys2DBArN48WLz3//+l3cPAJKEEQkAQLqUOBxzzDGRh6YWaTTh5JNPNpMnTzbz5883Y8aMsc/du3dv5OcKFiwY83s0apBo24EDBzK99zW9afr06WbdunV2AbhGQ4466ijePQBIEkYkAACZpsRBF/+PPvqoXSsh8dOacooWeJ922mnmueees+slnnr
qqVz5uwCAxBiRAABkmkYltP7hySefNCtXrrSLpp955plc24MalXjwwQeN53l28TYAIHlIJAAAmXbKKaeYxx57zDz00EPmpJNOMq+++qpdL5FbrrjiClOgQAH7f62bAAAkT5qnbh0AABywevVqu4hbd8E+9dRTkx0OAORpJBIAgNDTdKrNmzebAQMGmFWrVpk5c+YkOyQAyPOY2gQACD0lDpUrV7YjEbm5JgMAkD5GJAAAAAAExogEAAAAgMBIJAAAAAAERiIBAAAAIDASCQAAAACBkUgAAAAACIxEAgAAAEBgJBIAAAAAAiORAAAAABAYiQQAAAAAE9T/A3WR4RzZxZ++AAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "top_families = high_conf[\"Prediction\"].value_counts().head(15)\n", "\n", "fig, ax = plt.subplots(figsize=(8, 4))\n", "top_families.plot(kind=\"bar\", ax=ax, color=\"#2166ac\")\n", "ax.set_title(\"High-confidence predictions \\u2014 top 15 families\")\n", "ax.set_xlabel(\"Family\")\n", "ax.set_ylabel(\"Sequences\")\n", "ax.tick_params(axis=\"x\", rotation=45)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "efe77d99", "metadata": {}, "source": [ "## Post-hoc evaluation\n", "\n", "When ground-truth annotations are available (as they are for this demo dataset), we can assess\n", "prediction quality by comparing Snekmer's output against the known family labels.\n", "\n", "The demo test set contains 3,000 proteins split across three groups:\n", "- **In-family**: proteins from TIGRFAM families present in the training set\n", "- **Other annotated**: proteins from families *not* in the training set\n", "- **Unannotated**: proteins with no known family assignment\n", "\n", "Because this demo uses a small training set (200 families, 5,000 sequences), accuracy here\n", "is best read as a conservative baseline. Performance typically improves substantially with larger training sets.\n", "\n", "> **Note:** All sequences receive a prediction regardless of confidence. Adjust `conf_cutoff`,\n", "> `score_cutoff`, and `delta_cutoff` below to explore the precision/recall tradeoff. Raising a cutoff\n", "> keeps fewer predictions, which typically raises precision and lowers recall. The cell reports\n", "> precision as TP / (TP + FP) and recall as TP / (TP + FK), where FK counts sequences with a\n", "> ground-truth label whose prediction was filtered out by the cutoffs." 
] }, { "cell_type": "code", "execution_count": 8, "id": "58b5c65b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Kept: 995/3000 | TP:773 FP:87 FK:1140 | Precision:0.899 Recall:0.404\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhwAAAEiCAYAAACyZgs8AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAO5NJREFUeJzt3Qn8THX///+Xfc0e0aKoLF2WoqQshSgipZIUSnS5kkplabGXK1FaucqFpEVKKSKiiOyyZCnChcq+ZcnW/G7P9/9/5jszn/1jTp/PZ+Zxv92Ojzlz5pwzZ2bOeZ3Xe8sWCAQCBgAA4KPsfq4cAACAgAMAAPwtyHAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAMe66665zUzzasmWLZcuWzcaOHRuc169fPzcvWr799lu3Pv2Nd/H8XUPKCDgQF9588013UahVq5ZlRs8//7x99tln6X792rVr3YVUF1j49x0KDVyQst9++819L1esWMHhAgEH4sN7771nF154oS1evNg2btxosRhw9O/fP9GAY8aMGW7C/+eZZ56xY8eORS3gqFevnluf/iJhwKHvJQEHhAwHYt7mzZvt+++/t5deesnOPvtsF3zEk9y5c7spK9GYkukJClIjZ86cljdv3qitL3v27G59+hvLxw04Uxn/CwF8pgCjaNGi1qxZM7v99tsTDTi8sv6hQ4faW2+9ZeXLl7c8efLYlVdeaUuWLAlbtkOHDlawYEH79ddfrWXLlu7/CmSeeOIJO336dNiyR44csccff9zOP/98t74KFSq4bYQO0qztarl33nnH/V+TtiH/+9//7F//+pd7Xb58+ax48eJ2xx13hGUydNeteXL99dcH1+HVKUisXH3Xrl3WsWNHK1WqlLtYVqtWzW0/vcckMdovvX7u3Ln24IMPun0vVKiQtWvXzvbv3x+2rLJPN998s3311VdWs2ZN917/85//uOcOHDhgjz76aPAYXnzxxfbCCy/YX3/9FbYOLafjVrhwYStSpIi1b9/ezYuUVB2O8ePH21VXXWX58+d33xdlLLzMkPZvzZo1NmfOnODx9Y5pUnU4Jk6caDVq1HDvpUSJEnbPPfe470x6v0uJicZx+/DDD91+nnXWWe7zqVKlir3yyispHi/v802qGE/HQ98Vue+++4LHzcsSbdiwwVq1amXnnHOO+w6ed955dtddd9nBgwdTfN/ImnJm9A4AflOAcdttt7m7/DZt2tiIESPcBdM7GYZ6//337Y8//nAXSJ0chwwZ4l67adMmy5UrV3A5XQyaNGni6oTogvz111/bsGHD3EW5S5cubhkFFS1atLBvvvnGXdyrV6/uLgxPPvmku8C8/PLLbrl3333XHnjgAXex69y5s5un9Yj2U9kZnYh1QtbJXfuvi52KUXRx1IWxW7du9uqrr9pTTz1llSpVcq/1/kbSHbBer6Klrl272kUXXeQujrr46SL1yCOPpOuYJEXbUACgC9dPP/3k9l+BlHeh9ug5fT7aTqdOnVyQdfToUatfv747Xpp/wQUXuOPRu3dv+/3332348OHBY33LLbfYvHnz7J///Kd7759++qkLOlJDaX/t3zXXXGMDBgxw35VFixbZ7NmzrXHjxm47Dz/8sAsInn76afcaBWtJ0UVVF1l9xwYPHmw7d+50F/H58+fbDz/84I5HWr5LyTmT4zZz5kz32oYNG7pgRNatW+f2M/J7
kFb6DHQs+/Tp477XdevWdfN1jE+cOOHe8/Hjx91xVdChfZ0yZYr7DipoRAwKADFs6dKlSiUEZs6c6R7/9ddfgfPOOy/wyCOPhC23efNmt1zx4sUD+/btC86fPHmym//FF18E57Vv397NGzBgQNg6Lr/88kCNGjWCjz/77DO33KBBg8KWu/322wPZsmULbNy4MTivQIECbr2Rjh49mmDeggUL3HrHjRsXnDdx4kQ375tvvkmwfP369d3kGT58uFt2/PjxwXknTpwI1K5dO1CwYMHAoUOH0nxMEjNmzBi3nI6J1u8ZMmSIm6/1eMqWLevmTZ8+PWwdAwcOdMfm559/Dpvfq1evQI4cOQJbt24NO9Zat+fUqVOBunXruvnaF0/fvn3dPM+GDRsC2bNnD9x6662B06dPh21H3xfPZZddFnYcPTrmocde77VkyZKBf/zjH4Fjx44Fl5syZYpbrk+fPmn+LiXlTI+bfgeFChVyxyopkccr8vPV9ySp79qSJUsSHH/54Ycf3Hx9bxE/KFJBzGc3dCeqogbRHXXr1q1dGjmxlLWeUzrd492V6W4+ku6kQ2nZ0OW+/PJLy5Ejh8s+hFIRi+7Ip02bluL+K0XuOXnypO3du9elxnWHvHz5cksP7ZfuKHVn61GmQvt5+PBhV2yQ3mOSGN3dhmZCdNeuehTaj1DKtOiuN5QyL9qetr9nz57g1KhRI/f5qbjGe09aZ2hGQMded88pUWVdFTPoTjyyHkZ6ms8uXbrUFVmpKCy0roiK9CpWrGhTp05N83cpOWdy3PQ9UnGeMh1/Jy+DoYyfsjGIDwQciFk6sSqwULChiqMqQtCk1LVS3LNmzUrwGqWeQ3kX2sg6B7qQqKw9ctnQ5VRsUKZMGVc2Hsor6tDzKVHxhy6EXjm86gJou0o7p7esW9u95JJLElxck9qv1B6TpGhboVQsUbp06QRl/7pwRlI5//Tp0917Dp104RRd2L191jq17lAqXkjJL7/84o5F5cqVLRq845fYthVwRB7f1HyXknMmx01B0aWXXmo33XSTK7K7//773ev8pn3u3r27jRo1yn2nFTC98cYb1N+IcdThQMxS+bvKqxV0aEos+6Hy+VC6K05MaCXP5JaLNt2hjxkzxlX+q127trsz1F236nREVv7zS2qPyZkKzeZ49B5vuOEG69GjR6Kv0cUyqzvT79KZHLeSJUu6JqvKNCjjpknfN1Xs9SoRJ5XlSU2l1uSonorqDU2ePNlVzlWGTfVdFi5c6IIfxB4CDsQsBRQ6oerOKdKkSZNcpcKRI0cmesKOhrJly7oKgKpwGZrlWL9+ffB5T1In9Y8//thVfNTJ2fPnn38maH2RltS/trtq1Sp3UQrNciS2X9Ggu22vSEtUbKNAsGnTpim+VhUntbx3Z54U7bMyVlo2NMuhCpWp2YaOhSrhqmJvUlJ7jL3jp203aNAg7DnNi/bxPZPjJqog27x5czfpOCjroZYuzz77rCu+8zJa+s6FVnZNTYYupWOmFjGa1DeKKrVee+217jc5aNCgVL1PZC0UqSAmqShCQYWaDKopbOSklhMKBD7//HPf9kEXVN0Fvv7662Hz1TpFJ2KlsT0FChRItAmn7n4jMwmvvfZagrtLvV4SW0di+7Vjxw6bMGFCcN6pU6fcenWxVuuGaFKTWtU/8aiVirYX+v6Tcuedd9qCBQvcHXgkvVetx3tP+r/W7dEx0ntKiZqjKvBSi4rIrFHosU/qM4qk5qkKdHXhVCsMj7IHagGiuhx+S+1xU52gUDoOVatWdf/39t1rMeXV+xCvGXdKkvpeHjp0KLgPHgUe2n7oMUNsIcOBmKRAQgGFmqUm5uqrrw52AqZKkX7QHaPu7NWMUvUV1NeFUsdKIauIxDuRi/pBUDZEnZOp3ofKuFXXRAGTms2qKEV1DHQR0XLq0yKU7swVnKhpo+p2qL6H7q514UusEqfuYJXOXrZsmevLQZkUNYVUc8nIOidnSk0g1exS
F0Hd4avHzjp16iT52YRSE2J9ljoO2l8dJ13sVq9e7fZZx1V1AHSsdXfcq1cvN0/HSgFnauq56C5en9HAgQNdRUs1+dXxU5NkfRZK84u2rYBGd996jY5tZAZDVEFWn4OaxSp4U+Vcr1msjvVjjz1mfkvtcVNz7H379rn3oWIMZS0UpOn75NXpUbGj6vGoabfWq+/Z6NGj3e9n69atye6HvuPKiij40vdKAYi+1ytXrnRBv/qPUfGOgg99z7Vu9c2BGJXRzWQAPzRv3jyQN2/ewJEjR5JcpkOHDoFcuXIF9uzZE2wC+uKLLyZYTvPVNDC0KaOaHKam+eAff/wReOyxxwJlypRx27rkkkvcNkKbW8r69esD9erVC+TLl8+tw2siu3///sB9990XKFGihGuy2qRJE7esmkNGNqN9++23A+XKlXPNHkObaUY2VZSdO3cG15s7d+5AlSpVEjRdTMsxSYzXbHLOnDmBzp07B4oWLereQ9u2bQN79+4NW1bvp1mzZomuR8ewd+/egYsvvtjtq/b5mmuuCQwdOjSsua3Wee+997pmnoULF3b/95pfJtcs1jN69GjXHDVPnjxuX3XMvObUsmPHDrePZ511lnu9d0wjm8V6JkyYEFxfsWLF3Pvevn172DJp+S4l5kyP28cffxxo3Lixa8arZS644ILAgw8+GPj999/D1rVs2bJArVq1gsu89NJLqWoWK2r+XLly5UDOnDmDn8WmTZsC999/f6B8+fLud6rjc/311we+/vrrFN8zsq5s+iejgx4Ascfr/EqZAhUzAIhv1OEAAAC+I+AAAAC+I+AAAAC+ow4HAADwHRkOAADgOwIOAADgOzr+SgX1Pvjbb7+5jmvSM3okAACxSD1rqJNFdZIXOSBkJAKOVFCwodE6AQBAQtu2bUtx0D0CjlTwunrWAS1UqFBqXgIAQMw7dOiQuyFPzZAIBByp4BWjKNgg4AAAIFxqqhtQaRQAAPiOgAMAAPiOIhUgg2t4a2ju06dP8zkAyLRy5cplOXLkyLoBx+DBg23SpEm2fv16y5cvn11zzTX2wgsvWIUKFYLL/Pnnn/b444/bhx9+aMePH7cmTZrYm2++aaVKlQous3XrVuvSpYt98803VrBgQWvfvr1bd86c//f2vv32W+vevbutWbPGVXB55plnrEOHDn/7ewY8J06csN9//92OHj3KQQGQ6etoqBWKrrFZMuCYM2eOPfTQQ3bllVe6u7ynnnrKGjdubGvXrrUCBQq4ZR577DGbOnWqTZw40QoXLmxdu3a12267zebPn++e151hs2bN7JxzzrHvv//encDbtWvnorHnn3/eLbN582a3zD//+U977733bNasWfbAAw9Y6dKlXQADZETfLvpe6o5B7ddz585NHy8AMm0mdvfu3bZ9+3a75JJL0p/pCGQiu3btCmiX5syZ4x4fOHAgkCtXrsDEiRODy6xbt84ts2DBAvf4yy+/DGTPnj2wY8eO4DIjRowIFCpUKHD8+HH3uEePHoHLLrssbFutW7cONGnSJFX7dfDgQbdN/QWi4dixY4G1a9cGjhw5wgEFkOkdPXrUnbN07krv9TFTVRo9ePCg+1usWDH3d9myZXby5Elr1KhRcJmKFSvaBRdcYAsWLHCP9bdKlSphRSzKWqhtsIpPvGVC1+Et460DyCgp9cwHAJlBNHrZzp6ZUsyPPvqoXXvttfaPf/zDzduxY4dLNRcpUiRsWQUXes5bJjTY8J73nktuGQUlx44dS7Avqiui50InINZVr17dTZUrV3YpU+9x69atfdme6lCde+65bhu6kXjwwQfdDUZ6NG3a1H766Sf3/7Fjx7p6YZ7PP//cFc0CyFiZppWK6nL8+OOPNm/evIzeFVfhtH///r5vp0ePHr5vA5mTeuVr2LChqywdWrn56pmv+7K9hTd0TXGZKVOmBHvUvfHGG4OPRWW3orpWofubnJS6OZYnn3zS3Wiocnj9+vVt5MiR9vDDD1ta
ffnll8H/K+DQTYqCGGnRooWb4kHjHlN9We+MIc1StdyFF15oefLkcd9rVYrWeV3TmdB14eabb7YtW7a4YSYUAH/33XfJvmb48OF21113ubp9afXEE0+4ipH9+vVL9Pk77rjDNUCoXbu2C5oVMOs77NHrDhw44PYhOWPHjrXPPvvMTZmF3vsVV1xhd999ty/rzxQZDlUE1clNrUxCT1L6suhLqw8v1M6dO4NfJP3V48jnveeSW0a9huqHEal3796ueMebdAIG4pVOrKqArZO+MgUqigytbK1sgpYJbRGmit01atSwq666yv2uU5I3b14XcChLoYrgCkSU6dSkAETnARk1apTLwOgkr6LURYsWBS90K1ascM8vXbrU7aeWUSCiE3vLli3dcjfccIN9/PHHYft6+eWXu/9rAKpOnTq5fa5atap17tw5uF2k3oQJE9xnMW3aNNcQYNWqVQmy2ZrSQxWsUwo2RBd7L8MdTYsXL7Z9+/aFfd9jSY8ePVzA5Fcz/ewZXfNVwcann35qs2fPtosuuijseZ2w1NpErUo8OiGpGaz3gevv6tWrbdeuXcFlZs6c6YIJnZi8ZULX4S2T1JdGEbrXjTndmQNm+/fvty+++MJee+21ZA/H//73P3v55ZftnXfecXWw3n//fXe3pGLKlNY/ffp095t/6623bMmSJe71unD98ssvbp2iJvL6LWv+8uXL7bLLLgtbj1qf1axZ0y2vZVTUEuq+++5zAYhnzJgxdv/99wfXXbduXXdRWblypbsovvLKK3z86VS2bFnXxcHPP//sLmKtWrVygaqCSLUm/Oqrr6xOnTqJBqZaXq0h9Jy6RPAoyxFaxK7gV+uoVq2aCxInT55sAwYMCGZCFHTqe6Ciul69erntaN6dd97pvnOifdF+6Xqhun5eNi8x//nPf9J096/3of1o3ry5W3+DBg1cwBJJ+6vWmqNHjw4G0H369HHXKF0XBw0aFFx248aNbj/1fvVevAyJfjcKkkUtPVXnYsaMGe6xjommlNZdsmRJK1++fPB1MRVwKNU2fvx4d1JSilkRqSavXoWawXbs2NGlr/Rl1AlIJwwdqKuvvtoto2a0+iDvvfded5LQl1h9bGjdChxEzWE3bdrkojfdjakfj48++ohyXSCVlEZOTaUxZQx0Ubj99tvdyVB/VTFWNwmJefHFF91yKl7SskpRf/311+6vfr8qvlHWQTcIouX0W1cgoGbFae0T4NZbb7WFCxe6i8zhw4ddZtW7gOjE7e2Psh66k9bJHemjG0GdbxUMeMHBuHHj3MVQAaguxspARQamXjcImq9slb5PidGFW5krFYHr3K/AQgGjLqbKhHiZFn2e+lzV1YKCSc1TdkzXCenWrZsLRLRfCpQjb04jv9+1atVK03FQFk5B7tq1a90FXUFL5HFS5u25554LBr+izL6OmYJv7f+vv/7q5rdt29b9HpU50nHSNVKBvoIQ/XZCb6hDH4c2nEhq3UndoMdEHY4RI0a4v9ddd13YfN11eJ1y6U5FJyxFx6Edf3lUuU0nDXX8pQOlL5U6/vKiOVEUpy+x0qw6UanYRqlX+uAAUsfrF8f7zYWmXEOzF8pa6qT/+uuvp6kOR3JCA51PPvnEXYh04lf2QndnKqtPLRWh6mT97rvv2tlnn+3uOIsXLx7cd63/0ksvTfX6kJDu6HWc8+fP7+7YlakQfV5e5X1lsxTM1atXL/g6LzDVxU4ZCG+gTFUmTqxuny6YyqDo++a93mvhGEnBpIrH9fmKisp0py/a3tChQ93/VYk5ufo+yn6ENkBIKggPna/6UN53rPb/n5H3qCWltqf98wIzjxcIlyhRwsqVK+cCbB0TZfa8fqh0bJXhUXB8zz33uHm6uVagoUBMWTsF1gp2FFQlt269d68KgpaPuYBDP/DUlO2+8cYbbkoudRdaaSwxCmp++OGHdO0ngP+jZum6I9q7d687kaq3YI/qYaj8fN26dcGAQ3eVoSe7lOhO
THfCOinqIqKbA2UyVWFVd7sqMtG0Z88et+7IgEMnZa+JfWKUJdVNie42lWb36G5ZPR3rDlSZFaXc9R4vvvhiPv40UGZBWYVIodkonft1V6/Mxt/RHFPbU3Ggvkdnsj0FUarg7FHQqu9IKH0vvYu3dw0LDdZPnToVfKxMjAJ2VSmIDDiSe11S+6vfjurObNiwwf0WvSBagU5oZe/k1q33l1jdxpipNAog69AdkIopVS6tu7PQMnVlE3Vi14VcJ9BKlSqlWFs/ksqhVVNeky5cuhNVFkRZFaWcVQdA85XpUHFrYq9XJVev0mgkBT86yeoOO/QCpGyqTrR6ncrHVXyTVDofZ0bZZd2Fh1YoVfDoXTRVVKBKvLpgqm5CYjQUhi6sXiVS1bnx6kdEBp0KJvX5esMI6K/XT5O259WdUFGbmlEnRd8Lr/m19z60r9529XrVI1EwlRpFixZ1xR3KcIRm5ZOiqgf6XagUQPQdVvbHyxTpvaiIxAvwlcHr27dvgn6okqObhcjgJ+aaxQJIXfNVv2msIe9kLIl1kPfII4+4yRPaz4VS3JpSKlIJrbwZSsGAUtxemjvU3LlzE31NaGCg1jSaQkWOm6SmlondgasoKCtLbfPVjKaskbIbKi7RxV9FHKo3o3kqelHwoQurAoebbropyYu1Ghyo2EDBibJhAwcOdIGw6mWo7o8yEvqe9ezZ02USVP/Cywhoniodq5hd3w/VBVRmQhfppKiekeoJehdwBaXa1vXXX+/Wq0l1MZSBS62zzjrLFTGpfpGKGBUwJEfDcyjg13dV21MGUFlHb39ULOXtnwIf/Y40PzUU4KmIKTTzF03Z1N2oL2uOIer4SxVYFTF75YrRQD8c8cvrh0Mp1dT2a5HVpKYOB5CVqD6EMisKwkPrNcWK6dOnu4YcmiKpqEV1PZTFDC2SScv1kSIVAABSQVkwFc3owhuLDh48aEOGDPFt/bF5awUAgA9SWzyRFbX2aRgDDxkOAADgOwIOAADgOwIOAADgOwIOAADgOyqNApnI0WfK+rLe/IP+l6rl1CNh7ty5g83e1NFRcv0CqJtwjeWg7pvPlPpLUN8eananwbbUt4B6/fT6GEgLjaehbq817oS6QVeTPm8fUzvEeVakHiv9kFzfFEkNTy/qj0Kdw2ksLLXu0GehTtw0nonG8xg5cqRvfT6EYsj5K3wbcj4tCDgAhNFYRZGjsP5d1IGSN/qlgg91KOaNf5EWob026iKni5sXcKR2iHNEr2vzxMYn0Wfy73//O10Bh7rijlb/NfEw5HydOnVckK1O9TISRSoAkqXeHNV7oy7Y6grcG7k10gcffODuhNXds3o61CiZou6nmzVr5obfVsYktb15aj1eN9IabE2v1aR1eaNbauRXDWGuC5y6PPcGhFTPkepSXXfSuotW74xaRoFI6BDn6hWya9euYR07aQCw3bt3u8fqpVHdRKvXS71/jcqJtGeu1LV4JPWWqR5C9bl4PXNqtHAN3KZjHjqiq5c9Ue+gek5j4TDk/IBMMeR8WhBwAAjzr3/9y13sNWkgKA18qPEl1Avhf//732A30ZHUrbSCDnX9rNcpS6LxT9q0aWPDhg1zQ2ErQNDYGPp/cvQ6jVGhYELdkKvLZ61TY2+op8cHHnjALacRMZUuV2Ch5SIHctPFTBc2Fa1oGZ2UQ7Vr184++uij4PvRNpVl0aBc6mZbAY96ldQInVqHjg2SpztpHXdNClaTokBQPe7qc9Ew9KJA4qGHHnJZBw22qfn6TDwaKE2BrAJIhpyfmSmGnE8LilQAJFuksnLlSnv44YfdwFRKY+vEtm3btgSjqF577bWufF4nQV20daelYa41LktoIKC7Ws1XxiOSyvq9dLyyCgpUFMQou+CNwKmLvu7sFJRoOwp0lEVRdkWp47SOG6MxPBRQqT6K7sYV3IiKdnTyVtAj2h7SXqSS1Jg5kY4cOeIuijt37gzLOIUOlqbMlTcWCkPOr80UQ86nBQEH
gGTpjrN3796uKENUdJFYhkOZC2UgdJelzIHK55UWVxGF7mLTWocjKaHDcSvAueWWW9zIo0899ZTbNwVMaaERaDX6pgILjb7p1fXQMFN63xp9Fv7zhvVSFix0rI7khrhnyPmcGT7kfFpQpAIgxfEVlAmQSZMmhQ377dHJTXUjNKy1ijAUnCgtrpYiGtDJG05bdFH3hvNObRCi4hy1LvFS8epeWidV3f0qk6KRQRVw6GIVKXKo8kiqX6BMhopn7rnnnmBlRM3Xtrx9VZ0BpfkRHfpcjh075kaK9YIJfdYKVD36zLdv357k58aQ8xk/5HxakOEAkKz+/fu7IEIXCNWf8NK0oVTcoLoUKm7RBbt48eKuDoT+P2XKFJeJ0MVByynlq+dSS1kLlUl7mQcFP2+//bb7vyqgqimomvIqAFERTCQN+61Kp0rz33bbbS77EkrNOFVRUZkRnZg9qrOhOgO6CHpBlbIhKoLBmVPmS5+FKgIr2FB9DdXN6N69u/vMlcnSiKxqGp3YyMMMOd8wUww5nxYMT58KDE+PaGN4eiBziOch59OC4ekBADgDDDn/96FIBQAQ1xhy/u9BpVEAAOA7Ag4AAOA7Ag4AAOA7Ag4AAOA7Ag4AAOA7WqkAmcirr77qy3q7deuWquU0yJM60fK6SVanTDfccIN9//331q9fP9dXgf5qgDb13qm2/er63G/qVEzNF7XtxGgcFHUYpf3XMuqATKPFijr90qijU6dOdR1KZQSND6IRd3UcozWsOpDV8M0HkOzgbaJh6RPrEE/LpifgUK+d0brwqgt1dT+uYCOxAE69kX777beuC/SMUqpUKde51Lhx41xvpUA8okgFQLI0fHvHjh0TzNfAZuqlUcPYN23a1M3btWuXdenSxW6++WY3cJsyC54LL7zQdUetES41DLnGJlF3y3qsbsfVvfj+/fvdshqZVuutXLmyG302qfE0RF1feyNlRnbJPnr0aJs7d24w2NDIpVpfmzZt3P7VrFnTNm3aFHyNulBXsKXn1LW5NwaLunP3xnLRfip4EA1ip27c9TeldWu+9hWIVwQcAMJo+Hdd7DVNmzYtyaOjwc5UzKHilS+//NLNe+yxx9z4GBo/RQOdaXyMiRMnBl+jsUkWLVrkxszQxV1dSStDodFkQwMUFQEpENGQ2u+8844bCyIpyl7UqlUrbJ6Kej788EP3nIbmDqWB2p5//nlbvXq1CxBeeOEFN1/vVQHK/Pnz3XPaN2/8CXUMpRFp//rrL1u5cqULRJThmTdvnhtlVuOxJLdu0XIaTVevA+IRRSoAki1SUYYjNY4ePeou1nv27HGPc+XK5TIgGtHV06FDh+Dw8hqGXhfuTz75xD3WqKHKgogCjKFDhwazCy1atEhyu8p+qMgilLILuugr8NEIsKFU9OJlPPR/DXEuCihat25tRYoUcY+VqVHdEFHwoOd1XDTqpranYEZ1WkJ7qUxq3aIipKJFi7pMiQbCA+INAQeAqNColDJ58mRX6TSxET6VEQldXhfkxOqHRPKClMTkz5/fDSwVqmLFim50WgUKykqEjhDrVYgVjTCr+iQpbVPrURGSV8SjgEMBiAKOESNGpHrd2s98+fKl+H6BWESRCoB0UfCgC6gyE6IiCNVtUIbEo7v5pOpftGzZ0gUFyoyI/q5Zs8b9Xxd1FW949Tk+//zzJPdDLWlCsyieSpUquUyJAoUxY8ak+H60TWVzvCIP1bfwgqEyZcpY4cKFbeTIkW45DVmv7MmWLVvsiiuusNS2VFEQc/7556dqeSDWkOEAMpHUNl/NDFQ80KpVK3dRVpZB9TjUKmTAgAGumEHNaxWE6MKdWLZDFUhV2VL1L7xsguap2OKVV15xxS/KKKhIpUGDBknux+233+7qkSgQiKRMx+zZs93+nD59OtmWMTfddJP9+OOPrigke/bsLpAJDZ60fgUZ5cqVc49VN+Tyyy93y6Z2mPBbb7011csDsSZbwMuDIkm649Hdjcqb
o1n22qNHD456nDrrrLPcRVB3zrHaL0NiQYYfVE9EmRUVbyjAyazq1q1rb731lsu8AFmNspmbN292dZRCiw7Tcn0k1AaQ5Yt2VDSjk2FmpeIUVUIl2EA8i81bKwBxJbSlSGakSqaJ9RUCxBMyHAAAwHcEHEAGUNUpbwKAzC4a5yqKVIAMoCagx44dc4OMqaKV+myINZF9YwDIusHG7t27XWsydeiXXgQcQAZQZ1Tfffed685bzStjsamkAioAsSFbtmyu5dmZ3BwRcAAZeEHWOCIah0N3Dcn1ppkVPfnkkxm9CwCiROeoM83EEnAAGUydX2mKNaFt9QEg9vK4AAAg08nQgGPu3LnWvHlz19ui0skaPTKUN7Jk6HTjjTeGLbNv3z5r27atq3inUR47duzoeh4MpSGh1cuf7rg0jsGQIUP+lvcHAAAyQcBx5MgRN9TzG2+8keQyCjA0eJM3ffDBB2HPK9jQgE8zZ8504xwoiOncuXNYt6sa66Fs2bK2bNkye/HFF61fv36ui2EAAPD3yNA6HBosSVNyVKFOtfgTs27dOjcg0pIlS6xmzZpunoa7btq0qQ0dOtRlTt577z03mqVGntRgUhoYasWKFfbSSy+FBSYAACCO63B8++23VrJkSatQoYIbi2Dv3r3B5zRYk4pRvGDDG9FRTQwXLVoUXKZevXou2PA0adLEDWe9f//+RLepCnzKjIROAAAgRgMOFaeMGzfOZs2aZS+88ILNmTPHZUQ0zLTs2LHDBSOhNPJmsWLF3HPeMhrHIJT32Fsm0uDBg93od96keh8AACBGm8Xeddddwf+rg6SqVata+fLlXdbDz8Gaevfubd27dw8+VoaDoAMAgBjNcEQqV66clShRwjZu3Ogeq27Hrl27wpY5deqUa7ni1fvQXw0NHcp7nFTdENUbUauX0AkAAMRJwLF9+3ZXh6N06dLuce3atd1YFGp94pk9e7brNrpWrVrBZdRy5eTJk8Fl1KJFdUKKFi2aAe8CAID4k6EBh/rLUIsRTbJ582b3/61bt7rn1DXywoULbcuWLa4exy233GIXX3yxq/QplSpVcvU8OnXq5LqInj9/vnXt2tUVxaiFitx9992uwqj651Dz2QkTJtgrr7wSVmQCAABiOOBYunSpXX755W4SBQH6f58+fVyf7eqwq0WLFnbppZe6gKFGjRpuwCsVeXjU7LVixYquToeaw9apUyesjw1V+pwxY4YLZvT6xx9/3K2fJrEAAMRJpdHrrrvODXublK+++irFdahFyvvvv5/sMqpsqkAFAABkjCxVhwMAAGRNBBwAAMB3BBwAAMB3BBwAAMB3BBwAAMB3BBwAACBzBhzLly+31atXBx9PnjzZWrZsaU899ZQbCh4AAOCMA44HH3zQfv75Z/f/TZs2uZ498+fPbxMnTrQePXqkZ5UAACCGpSvgULBRvXp1938FGfXq1XOdb40dO9Y++eSTaO8jAACIx4BDvYNqgDT5+uuvXZfioiHc9+zZE909BAAA8Rlw1KxZ0wYNGmTvvvuuzZkzx5o1a+bma7ySUqVKRXsfAQBAPAYcw4cPdxVHNTLr008/7UZwlY8//tiuueaaaO8jAACIx8HbNBhaaCsVz4svvuhGeQUAAIhKPxwHDhywUaNGWe/evW3fvn1u3tq1a23Xrl3pXSUAAIhR6cpwrFq1yho2bGhFihSxLVu2WKdOndww8ZMmTbKtW7fauHHjor+nAAAgvjIc3bt3t/vuu882bNhgefPmDc5Xa5W5c+dGc/8AAEC8BhxLlixxnX9FOvfcc23Hjh3R2C8AABDvAUeePHns0KFDiXYIdvbZZ0djvwAAQLwHHC1atLABAwbYyZMn3eNs2bK5uhs9e/a0Vq1aRXsfAQBAPAYcw4YNs8OHD1vJkiXt2LFjVr9+fdcXx1lnnWXPPfdc9PcSAADEXyuVwoUL28yZM23+/Pm2cuVKF3xcccUV1qhRo+jvIQAAiM+Aw3Pttde6CQAA
IOpFKt26dbNXX301wfzXX3/dHn300fSsEgAAxLB0BRwagj6xzIbGUdF4KgAAAGcccOzdu9fV44hUqFAhhqcHAADRCTjUImX69OkJ5k+bNs3KlSuXnlUCAIAYljO9XZtraPrdu3dbgwYN3LxZs2a55rIauh4AAOCMA47777/fjh8/7vrcGDhwoJt34YUX2ogRI6xdu3bpWSUAAIhh6W4W26VLFzcpy5EvXz4rWLBgdPcMAADEjDPqh0MYOwUAAPhSaXTnzp127733WpkyZSxnzpyWI0eOsAkAAOCMMxwdOnRwg7U9++yzVrp0aTd4GwAAQFQDjnnz5tl3331n1atXT8/LAQBAnElXkcr5559vgUAg+nsDAABiUroCDvW10atXL9uyZUv09wgAAMScdBWptG7d2o4ePWrly5e3/PnzW65cucKe37dvX7T2DwAAxGvAQW+iAADA94Cjffv26XkZAACIU+mqwyG//PKLPfPMM9amTRvbtWtXcPC2NWvWRHP/AABAvAYcc+bMsSpVqtiiRYts0qRJdvjwYTd/5cqV1rdv32jvIwAAiMeAQy1UBg0aZDNnzrTcuXMH52vk2IULF0Zz/wAAQLwGHKtXr7Zbb701wfySJUvanj17orFfAAAg3gOOIkWK2O+//55g/g8//GDnnntuqtczd+5ca968uRuTRd2jf/bZZ2HPq3OxPn36uO7TNSJto0aNbMOGDQma4LZt29YKFSrk9qtjx47BIh7PqlWrrG7dupY3b17XadmQIUPS/J4BAMDfHHDcdddd1rNnT9uxY4cLFP766y+bP3++PfHEE9auXbtUr+fIkSNWrVo1e+ONNxJ9XoHBq6++aiNHjnT1RQoUKGBNmjSxP//8M7iMgg1VVFXxzpQpU1wQ07lz5+Dzhw4dssaNG1vZsmVt2bJl9uKLL1q/fv3srbfeSs9bBwAA6ZAtkI4+yk+cOGEPPfSQjR071k6fPu1GjNXfu+++281Lz4ixClw+/fRTa9mypXus3VLm4/HHH3eBjBw8eNBKlSrltqGgZ926dVa5cmVbsmSJ1axZ0y0zffp0a9q0qW3fvt29fsSIEfb000+74Mirb6I6KMqmrF+/PlX7pqClcOHCbvvKpERLjx49orYuILMhkwjEvkNpuD6mK8OhC/fbb7/tmsYqqzB+/Hh38X733XejNjz95s2bXZCgYhSP3lStWrVswYIF7rH+qhjFCzZEy2fPnt1lRLxl6tWrF1a5VVmSn376yfbv3x+VfQUAAD50/OW54IIL3OQHBRuijEYoPfae019VVA2lbEuxYsXClrnooosSrMN7rmjRogm2ffz4cTeFRnAAAOBvDjjuv//+ZJ8fPXq0ZWWDBw+2/v37Z/RuAMgAs2fP5rgjZjVo0CDDtp2uIhUVRYRO6mlUP1J1AnbgwIGo7Ng555zj/u7cuTNsvh57z+mv18up59SpU67lSugyia0jdBuRevfu7cqjvGnbtm1ReU8AAMSrdGU4VLkzklqqdOnSxY0gGw0qBlFAMGvWLKtevXqwaEN1M7QdqV27tgtw1PqkRo0abp4CH+2L6np4y6jS6MmTJ4Oj2qpFS4UKFRItTpE8efK4CQAAZPBYKglWlD27de/e3V5++eVUv0b9ZaxYscJNXkVR/X/r1q2u1cqjjz7qejT9/PPPXWdjanKrlideS5ZKlSrZjTfeaJ06dbLFixe7prldu3Z1LVi0nKjljCqMqn8ONZ+dMGGCvfLKK25fAQBAFqg0GkmtVlSkkVpLly6166+/PvjYCwI0Gq2avqrZqPrqUL8aymTUqVPHNXtVB16e9957zwUZDRs2dEFPq1atXN8doS1bZsyY4ZrxKgtSokQJ15lYaF8dAAAgEwYckdkB9ZmhnkenTp2apqHrr7vuOvfapCjLMWDAADclRS1S3n///WS3U7VqVfvuu+9SvV8AACATBBzqwjyUMgtnn322DRs2LMUWLAAAIP6kK+D45ptvor8nAAAgZkWt
0igAAEBUMxyXX365q1+RGsuXL0/PJgAAQLwHHGqK+uabb7qB09TPhSxcuNA1O1UfGRpKHgAA4IwCjt27d1u3bt1s4MCBYfP79u3reuXM6l2bAwCATFCHY+LEia4Trkj33HOPffLJJ9HYLwAAEO8Bh4pM1KtnJM0L7ZQLAAAg3UUq6nJcdTVUIfSqq65y8zTGiYpSnn32WY4sAAA484CjV69eVq5cOTcmyfjx44PjmowZM8buvPPO9KwyLj2wa1hG7wLgoyEcXQBnPpaKAguCCwAA4GvHXxpMbdSoUfbUU0/Zvn373DwVsfz666/pXSUAAIhR6cpwrFq1yho1auRGYt2yZYs98MADbhC1SZMmuaHlx40bF/09BQAA8ZXh0GixHTp0sA0bNoS1SmnatKnNnTs3mvsHAADiNeBYsmSJPfjggwnmn3vuubZjx45o7BcAAIj3gCNPnjx26NChBPN//vlnN0w9AADAGQccLVq0sAEDBtjJkyfdYw3kprobPXv2tFatWqVnlQAAIIalK+AYNmyYHT582EqWLGnHjh2z+vXrW/ny5a1gwYL23HPPRX8vAQBA/LVSUeuUmTNn2rx581yLFQUfNWrUsIYNG0Z/D2NYg/pPZvQuAL7ZzrEFkN4Mx4IFC2zKlCnBx3Xq1LECBQq4oerbtGljnTt3tuPHj6dllQAAIA6kKeBQvY01a9YEH69evdo6depkN9xwg+vu/IsvvrDBgwf7sZ8AACBeAo4VK1aEFZt8+OGHbvC2t99+2/XN8eqrr9pHH33kx34CAIB4CTj2799vpUqVCj6eM2eO3XTTTcHHV155pW3bti26ewgAAOIr4FCwsXnzZvf/EydOuLFTrr766uDzf/zxh+XKlSv6ewkAAOIn4FDX5aqr8d1331nv3r0tf/78Vrdu3eDzarGi5rEAAADpbhY7cOBAu+2221y/G+pz45133rHcuXMHnx89erQ1btw4LasEAABxIE0BR4kSJdzgbAcPHnQBR44cOcKenzhxopsPAAAQlY6/EqMh6gEAAKLStTkAAEBaEHAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAADfEXAAAID4Djj69etn2bJlC5sqVqwYfP7PP/+0hx56yIoXL24FCxa0Vq1a2c6dO8PWsXXrVmvWrJnlz5/fSpYsaU8++aSdOnUqA94NAADxK6dlcpdddpl9/fXXwcc5c/7fLj/22GM2depUmzhxohUuXNi6du1qt912m82fP989f/r0aRdsnHPOOfb999/b77//bu3atbNcuXLZ888/nyHvBwCAeJTpAw4FGAoYIh08eND++9//2vvvv28NGjRw88aMGWOVKlWyhQsX2tVXX20zZsywtWvXuoClVKlSVr16dRs4cKD17NnTZU9y586dAe8IAID4k6mLVGTDhg1WpkwZK1eunLVt29YVkciyZcvs5MmT1qhRo+CyKm654IILbMGCBe6x/lapUsUFG54mTZrYoUOHbM2aNRnwbgAAiE+ZOsNRq1YtGzt2rFWoUMEVh/Tv39/q1q1rP/74o+3YscNlKIoUKRL2GgUXek70NzTY8J73nkvK8ePH3eRRgAIAAGI04LjpppuC/69ataoLQMqWLWsfffSR5cuXz7ftDh482AU3AAAgTopUQimbcemll9rGjRtdvY4TJ07YgQMHwpZRKxWvzof+RrZa8R4nVi/E07t3b1dHxJu2bdvmy/sBACBeZKmA4/Dhw/bLL79Y6dKlrUaNGq61yaxZs4LP//TTT66OR+3atd1j/V29erXt2rUruMzMmTOtUKFCVrly5SS3kydPHrdM6AQAAGK0SOWJJ56w5s2bu2KU3377zfr27Ws5cuSwNm3auGawHTt2tO7du1uxYsVcUPDwww+7IEMtVKRx48YusLj33nttyJAhrt7GM8884/ruUFABAAD+Hpk64Ni+
fbsLLvbu3Wtnn3221alTxzV51f/l5ZdftuzZs7sOv1TJUy1Q3nzzzeDrFZxMmTLFunTp4gKRAgUKWPv27W3AgAEZ+K4AAIg/2QKBQCCjdyKzUysVZVRUnyOaxSvnjekVtXUBmc32+/5tWdHs2bMzehcA33j9VmXE9TFL1eEAAABZEwEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwHQEHAADwXU7/NwEAWce/px/L6F0AfNOggWUYMhwAAMB3BBwAAMB3BBwAAMB3BBwAAMB3BBwAAMB3BBwAAMB3BBwAAMB3BBwAAMB3BBwAAMB3cRVwvPHGG3bhhRda3rx5rVatWrZ48eKM3iUAAOJC3AQcEyZMsO7du1vfvn1t+fLlVq1aNWvSpInt2rUro3cNAICYFzcBx0svvWSdOnWy++67zypXrmwjR460/Pnz2+jRozN61wAAiHlxMXjbiRMnbNmyZda7d+/gvOzZs1ujRo1swYIFCZY/fvy4mzwHDx50fw8dOhTV/frr2P9tA4g10f69/F1OHT+a0bsAZJnfpbe+QCCQ4rJxEXDs2bPHTp8+baVKlQqbr8fr169PsPzgwYOtf//+Ceaff/75vu4nEEsKPzQ8o3cBQITCr5ov/vjjDytcuHCyy8RFwJFWyoSovofnr7/+sn379lnx4sUtW7ZsGbpvSH8UroBx27ZtVqhQIQ4jkAnwu8z6lNlQsFGmTJkUl42LgKNEiRKWI0cO27lzZ9h8PT7nnHMSLJ8nTx43hSpSpIjv+wn/Kdgg4AAyF36XWVtKmY24qjSaO3duq1Gjhs2aNSssa6HHtWvXztB9AwAgHsRFhkNURNK+fXurWbOmXXXVVTZ8+HA7cuSIa7UCAAD8FTcBR+vWrW337t3Wp08f27Fjh1WvXt2mT5+eoCIpYpOKyNQHS2RRGYCMw+8yvmQLpKYtCwAAwBmIizocAAAgYxFwAAAA3xFwAAAA3xFwAACiQqNxqwUgkBgCDmQ6HTp0sJYtW4bN+/jjjy1v3rw2bNiwDNsvIFZdd9119uijjyaYP3bsWDo9RNTETbNYZF2jRo2yhx56yI3wS78pAJA1keFApjZkyBB7+OGH7cMPPwwGG7ob69atm/Xo0cOKFSvmuqfv169f2Ou2bt1qt9xyixUsWNB1m3znnXcGu7bX6L/q6n7p0qXBXme1nquvvjr4+vHjxwcH69uyZYsbQ2fSpEl2/fXXW/78+a1atWqJjjQMxHrmcejQoVa6dGk3tpRuBE6ePJnszYKGhfB6eea3G98IOJBp9ezZ0wYOHGhTpkyxW2+9Ney5d955xwoUKGCLFi1yQcmAAQNs5syZwQBCwYYG3JszZ46bv2nTJtf5m9fvvzp++/bbb93j1atXu4Dihx9+sMOHD7t5el39+vXDtvn000/bE088YStWrLBLL73U2rRpY6dOnfqbjgaQ8b755hv75Zdf3F/9BlXkoikx+l326tXLZsyYYQ0bNgzO57cbx9TxF5CZtG/fPpA7d251SBeYNWtWgufr168fqFOnTti8K6+8MtCzZ0/3/xkzZgRy5MgR2Lp1a/D5NWvWuPUtXrzYPe7evXugWbNm7v/Dhw8PtG7dOlCtWrXAtGnT3LyLL7448NZbb7n/b9682b121KhRCda3bt06X44B8HfSb+qRRx5JMH/MmDGBwoULB3+XZcuWDZw6dSr4/B133OF+Ox49//LLLwd69OgRKF26dODHH39MsB1+u/GLDAcypapVq7oa7+qO3Ms6RD4fSineXbt2uf+vW7fOFYd4
RSJSuXJll9rVc6Lsxbx58+z06dMum6FUryZlPX777TfbuHGje5zUNrU98bYJxIPLLrvMFUcm9rvzqGL322+/7X5fWj4Sv934RcCBTOncc891F/9ff/3VbrzxRvvjjz/Cns+VK1fYYxWJqCglterVq+fWuXz5cps7d25YwKEApEyZMnbJJZckuU1tT9KyTSCzUj0n1W2KdODAgbChx1Pzu6tbt64L5D/66KNEt8VvN34RcCDTKlu2rLv4a7C9xIKOpFSqVMm2bdvmJs/atWvdyVOZDlG2Q3dar7/+ujsBVqxY0QUhqsehOiOR9TeAWFahQgUXfEfSPNVXSguNxj1t2jR7/vnnXQXTtOC3G9sIOJCpqVhEWQelbZs0aWKHDh1K8TWNGjWyKlWqWNu2bd0Jc/HixdauXTsXRNSsWTO4nDIa7733XjC4UEsVnfAmTJhAwIG40qVLF/v5559d669Vq1bZTz/9ZC+99JJ98MEH9vjjj6d5fddcc419+eWX1r9//zR1BMZvN7YRcCDTO++881zQsWfPnlQFHUrzTp482YoWLeqyFjqJlStXzgUSoRRoKPUbWldD/4+cB8Q6/T5UtLh+/Xr3e6lVq5YrEpk4caLLLqZHnTp1bOrUqfbMM8/Ya6+9lqrX8NuNbQxPDwAAfEeGAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAA+I6AAwAAmN/+Hz+Gj2NEsFB/AAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "conf_cutoff, score_cutoff, delta_cutoff = CONF_THRESHOLD, None, None\n", "\n", "edf = pd.read_csv(results_path)\n", "edf.columns = edf.columns.str.strip().str.capitalize()\n", "edf[\"Accession\"] = edf[\"Sequence\"].str.split(\"|\").str[1].fillna(edf[\"Sequence\"])\n", "ann_gt = pd.read_csv(\n", "    DEMO_ROOT / \"annotations\" / \"TIGRFAMs_annotation.ann\",\n", "    sep=\"\\t\",\n", ").rename(columns={\"id\": \"Accession\", \"family\": \"Truefamily\"})\n", "edf = edf.merge(ann_gt, on=\"Accession\", how=\"left\")\n", "\n", "kept = pd.Series(True, index=edf.index)\n", "for col, cut in [(\"Confidence\", conf_cutoff), (\"Score\", score_cutoff), (\"Delta\", delta_cutoff)]:\n", "    if cut is not None:\n", "        kept &= edf[col] >= cut\n", "\n", "known = edf[\"Truefamily\"].notna()\n", "corr = known & (edf[\"Prediction\"] == edf[\"Truefamily\"])\n", "counts = {\n", "    \"True Positive\": int((kept & corr).sum()),\n", "    \"False Positive\": int((kept & known & ~corr).sum()),\n", "    \"Filtered (Known)\": int((~kept & known).sum()),\n", "    \"Predicted (Unknown)\": int((~known & kept).sum()),\n", "    \"Filtered (Unknown)\": int((~known & ~kept).sum()),\n", "}\n", "TP, FP, FK = counts[\"True Positive\"], counts[\"False Positive\"], counts[\"Filtered (Known)\"]\n", "prec = TP / (TP + FP) if TP + FP else float(\"nan\")\n", "rec = TP / (TP + FK) if TP + FK else float(\"nan\")\n", "print(f\"Kept: {kept.sum()}/{len(edf)} | TP:{TP} FP:{FP} FK:{FK} | Precision:{prec:.3f} Recall:{rec:.3f}\")\n", "\n", "colors = {\n", "    \"True Positive\": \"#1b9e77\",\n", "    \"False Positive\": \"#d95f02\",\n", "    \"Filtered (Known)\": \"#757575\",\n", "    \"Predicted (Unknown)\": \"#4575b4\",\n", "    \"Filtered (Unknown)\": \"#bdbdbd\",\n", "}\n", "fig, ax = plt.subplots(figsize=(5.5, 3))\n", "for group, keys in [(\"Known\", [\"True Positive\", \"False Positive\", \"Filtered (Known)\"]),\n", "                    (\"Unknown\", [\"Predicted (Unknown)\", \"Filtered 
(Unknown)\"])]:\n", "    bot = 0\n", "    for k in keys:\n", "        ax.bar(group, counts[k], bottom=bot, color=colors[k], label=k)\n", "        bot += counts[k]\n", "ax.set(title=\"Annotation prediction results\", ylabel=\"Sequences\")\n", "handles, labels = ax.get_legend_handles_labels()\n", "by_label = dict(zip(labels, handles))  # one legend entry per label\n", "ax.legend(by_label.values(), by_label.keys(), ncol=2, fontsize=8)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "c5d6e7f8", "metadata": {}, "source": "## Interactive mode (wizard)\n\nIf you omit `--train`, `--query`, or the annotation flag, `easy` enters an interactive wizard that prompts for each missing input:\n\n```\n$ snekmer easy\n\n=== Snekmer easy ===\n\nStep 1 Training sequences (file or directory path): /path/to/train/\n\nStep 2 Query sequences (file or directory path): /path/to/query.fasta\n Found 10 training file(s), 1 query file(s).\n\nStep 3 How are your training sequences annotated?\n\n [1] Family labels are embedded in FASTA headers (between | | characters)\n Example: >db|TIGR04183|seqid Description text\n ^^^^^^^^\n this field becomes the family label\n (equivalent to passing --create-ann)\n\n [2] I have a separate annotation file (.ann)\n Format: tab-separated with columns: id family\n (equivalent to passing --ann )\n\n Choice [1]: 2\n\n Path to annotation file (.ann): /path/to/annotations.ann\n\nStep 4 Output directory [snekmer_easy_output]: my_results\n```\n\nYou can mix flags and the wizard: for example, if you provide `--train` and `--ann` but omit `--query`, only the query prompt will appear." }, { "cell_type": "markdown", "id": "d6e7f8a9", "metadata": {}, "source": [ "## Key parameters\n", "\n", "Run `snekmer easy --help` to see all options. 
The most commonly adjusted options are:\n", "\n", "| Flag | Default | Description |\n", "|---|---|---|\n", "| `--k` | `8` | K-mer length |\n", "| `--alphabet` | `2` (solvacc) | Amino acid reduction alphabet (0\u20135, alphabet name, or `None`; see `--help`) |\n", "| `--selection` | `top_hit` | Annotation selection method: `top_hit`, `greatest_distance`, `combined_distance` |\n", "| `--threshold` | `Median` | Family-specific score threshold used to filter predictions: `Median`, `Mean`, `90th Percentile`, `None` |\n", "| `--apply-output` | `snekmer_results.csv` | Output filename for the results CSV |\n", "| `--cores` | all CPUs | Number of CPU cores to use |\n", "| `--dry-run` | \u2014 | Show the pipeline steps without running them |\n", "\n", "For advanced use (adding to an existing model, fragmentation, etc.), use `snekmer learn` and `snekmer apply` directly with a `config.yaml`." ] }, { "cell_type": "code", "execution_count": 9, "id": "e7f8a9b0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: snekmer easy-learn-apply [options]\n", "\n", "Guided front-end that runs learn then apply end-to-end.\n", "\n", "Prompts for training sequences, query sequences, and annotation style,\n", "then builds a self-contained workspace and runs both pipeline steps.\n", "All prompts can be skipped by supplying the corresponding flags.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", "\n", "Input / output:\n", " --train PATH Path to training sequences (FASTA file or directory of\n", " FASTA files). If omitted, the wizard will prompt for\n", " it. (default: None)\n", " --query PATH Path to query sequences to annotate (FASTA file or\n", " directory). If omitted, the wizard will prompt for it.\n", " (default: None)\n", " --output DIR Output directory for the workspace. If omitted, the\n", " wizard will prompt. (default: None)\n", "\n", "Annotation (choose one):\n", " --ann PATH Path to an existing annotation file (.ann). 
Format:\n", " tab-separated with columns 'id' and 'family'.\n", " (default: None)\n", " --create-ann Generate annotations from training FASTA headers.\n", " Requires headers in the format: >db|FAMILY_LABEL|seqid\n", " description (the field between the first pair of | |\n", " becomes the family label). (default: False)\n", "\n", "K-mer parameters:\n", " --k N K-mer length. (default: 8)\n", " --alphabet Reduced alphabet encoding (0\u20135, alphabet name, or\n", " 'None'). 2 = solvacc (3-letter). See alphabets list\n", " below. (default: 2)\n", "\n", "Learn / apply options:\n", " --selection Annotation selection method {top_hit,\n", " greatest_distance, combined_distance}. (default:\n", " top_hit)\n", " --threshold Family-specific score threshold for prediction\n", " filtering. Options: 'Median', 'Mean', '90th\n", " Percentile', 'None'. (default: Median)\n", " --apply-output FILENAME\n", " Output filename for apply results. (default:\n", " snekmer_results.csv)\n", "\n", "Snakemake options:\n", " --cores N, -c N CPU cores to use. (default: 10)\n", " --dry-run, -n Show what would be done without executing. (default:\n", " False)\n", " --verbose Show additional Snakemake debug output. (default:\n", " False)\n", " --quiet [{progress,rules,all} ...], -q [{progress,rules,all} ...]\n", " Reduce Snakemake output. (default: None)\n", "\n", "Miscellaneous:\n", " --copy-files Copy input files into the workspace instead of\n", " symlinking them (useful when the workspace will be\n", " moved or shared). 
(default: False)\n", "\n", "Alphabets (k-mer recoding):\n", " 0: hydro (size 2) \u2014 2-value hydrophobicity alphabet\n", " 1: standard (size 7) \u2014 \u201cStandard\u201d reduction alphabet\n", " 2: solvacc (size 3) \u2014 Solvent accessibility alphabet\n", " 3: hydrocharge (size 3) \u2014 2-value hydrophobicity with charged residues as a third category\n", " 4: hydrostruct (size 3) \u2014 2-value hydrophobicity with structural-breakers as a third category\n", " 5: miqs (size 10) \u2014 MIQS alphabet3\n", " None: None (size 20) \u2014 No reduced alphabet\n", "\n", "You may pass either an integer (0\u20135) or the alphabet name (e.g. 'hydro'), or 'None'.\n" ] } ], "source": [ "!snekmer easy --help" ] } ], "metadata": { "kernelspec": { "display_name": "snekmer", "language": "python", "name": "snekmer" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" } }, "nbformat": 4, "nbformat_minor": 5 }