Developing a feature-rich Natural Language Processor
- Using just jq, Nix & Bash

I've always been drawn to voice commands. Maybe it's because of the extended use cases that come with being blind - can you imagine fumbling around for the TV remote when you can't see? That sucks! Now I can simply tell the TV to start. Or, if I prefer, I can ask where the remote is and the Nvidia Shield Remote plays a sound. Maybe it's just the appeal of being able to multitask with a third hand that doesn't exist. Either way, the idea of speaking to my computer and having it do things has always fascinated me. Voice as an interface feels natural, powerful... and deeply underutilized.

# 🦆 says ▶ I use 20x magnification when I code and debug. I use emoji to simplify logs for myself. If you can't handle my code style you can disable most of it on this website by toggling the button in the navbar. Shall duck continue?

Let's rewind a bit. I've always had a soft spot for doing things the stupid way. Not "stupid" as in broken, but "stupid" as in... unorthodox, unnecessary, and definitely not optimized for scale. The kind of stupid that makes you ask: "Wait, you did what in Bash?!"

The Genesis: Bash Meets Nix

Early in this project, no one - not even I - could give a good reason for doing it in Bash. There were no benchmarks to beat, no ecosystem gaps to fill, and definitely no best practices to follow. Maybe it's still a stupid idea. But I kept coming back to one thing: Bash has flexibility.
You can bend it in ways most people can't imagine.

# 🦆 says ▶ There is no spoon...

Meanwhile, tools like Home Assistant's Assist were starting to bore me. They worked, sure - but I constantly felt locked in, boxed in by high-level abstractions and limited customization. I didn't want a voice assistant that just reacts - I wanted one that obeys.
While I was an active contributor in the community, I repeatedly got, or saw other people get, responses from the devs like "That's not possible" and "No, you can't do that." That was my breaking point: I left and went my own way.

That's where Nix came in. Going the Nix way - what could be more fun?
Using Nix's declarative configuration model, I realized I could define a CLI interface that held all my custom scripts and logic - the glue that tied intent to action. I could describe sentences declaratively and map them directly to Bash scripts.

No overhead. No dependencies. No noise. Just me, a microphone, and a shell.

I Did Not Mean for It to Get This Complex

What started as a small utility to let me run shell commands by typing natural language sentences quickly spiraled into a full-blown NLP system. With Bash. And Nix. And thousands of dynamically generated regular expressions.

This blog post walks you through the actual code behind a voice-command-esque CLI script framework I built using Nix. It organizes, parses, caches, and tests structured sentence intents, then builds and executes shell commands from them.

🦆🏠 HOME via 🐍 v3.12.10
01:41:26 ❯ yo do "please stop test auto 5 yes wild card testing here"
┌─(yo-demo)
│🦆
└─▶ --action stop
└─▶ --target test
└─▶ --mode auto
└─▶ --duration 5
└─▶ --confirm yes
└─▶ --wild wild card testing here
[🦆📜] [01:41:31] ⁉️DEBUG⁉️ ▶ +0.007 s Script Started!
[🦆📜] [01:41:31] ⁉️DEBUG⁉️ ▶ +0.013 s SCript Executed!
[🦆📜] [01:41:31] ⁉️DEBUG⁉️ ▶ +0.020 s Script finished!

Nix Configuration: Declaring Intents

Nix Configuration
  yo = {
    scripts = {
      demo = {
        parameters = [
          { name = "action"; description = "Action to perform"; default = "run"; }
          { name = "target"; description = "Target of the action"; }
          { name = "intensity"; description = "Level of intensity"; type = "int"; optional = true; }
          { name = "duration"; description = "Duration in seconds"; type = "int"; optional = true; }
          { name = "mode"; description = "Mode of operation"; optional = true; }
          { name = "confirm"; description = "Whether to confirm the action"; optional = true; }
          { name = "wild"; description = "Wildcard testing"; optional = true; }     
        ];
        code = ''
          echo "Script executed!"
        '';
      };
    };
    do = {
      intents = {
        demo = {
          data = [{
            sentences = [
              "please {action} {target} {mode} {duration} {confirm} {wild}"
            ];
            lists = {
              action.values = [
                { "in" = "[run|execute|start]"; out = "RUN"; }
                { "in" = "[stop|terminate|halt]"; out = "STOP"; }
              ];
              target.values = [
                { "in" = "[process|program|sequence]"; out = "PROCESS"; }
                { "in" = "[test|check|validation]"; out = "TEST"; }
              ];
              mode.values = [
                { "in" = "[auto|automatic]"; out = "AUTO"; }
                { "in" = "[manual|manually]"; out = "MANUAL"; }
              ];
              confirm.values = [
                { "in" = "[yes|confirm|sure]"; out = "YES"; }
                { "in" = "[no|cancel|decline]"; out = "NO"; }
              ];
              intensity.values = builtins.genList (i: {
                "in" = toString (i + 1);
                out = toString (i + 1);
              }) 10;
              duration.values = builtins.genList (i: {
                "in" = toString ((i + 1) * 10);
                out = toString ((i + 1) * 10);
              }) 6;
              wild.wildcard = true;
            };
          }];
        };
      };
    };
  };

Early on, I realized Nix could give me a ton of structure and organization. Since I have been running NixOS system-wide for some years now, I have grown comfortable writing Nix expressions. Nix lets users define software configurations declaratively; I'm abusing it for voice logic. I wanted to declaratively define scripts and bind natural language intents to each of them - and Nix is great at producing structured derivations.

Each script gets a name, a description, a category, and as many parameters as you like, plus a set of intent sentences.
These sentences are eventually transformed into regular expressions, cached, and tested. Lists can optionally be defined to map "in" words to "out" words, or a list can be marked as a wildcard when the parameter should accept any input.
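To make the in/out mapping concrete, here's a rough plain-Bash sketch of what a resolved list boils down to at runtime. The pattern matcher further down does resolve list values with a `case` statement, but its branches come from `makeEntityResolver`, which isn't shown, so the exact branch layout and the names `resolve_action` and `resolve_wild` are illustrative assumptions rather than the module's code.

```bash
#!/usr/bin/env bash
# Hypothetical resolver for the demo intent's "action" list: every "in"
# alternative collapses to one canonical "out" value, and anything that
# isn't in the list passes through unchanged.
resolve_action() {
  local word="${1,,}"   # lowercase first (Bash 4+), like the matcher does
  case "$word" in
    run|execute|start)   echo "RUN" ;;
    stop|terminate|halt) echo "STOP" ;;
    *)                   echo "$word" ;;
  esac
}

# Hypothetical wildcard parameter: no list lookup, the raw text is kept.
resolve_wild() {
  echo "$1"
}

resolve_action "Terminate"               # prints STOP
resolve_wild "wild card testing here"    # prints wild card testing here
```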

From Sentences to Shell Scripts

The core idea was: take a sentence from the user, figure out what script it maps to, and run it with the correct arguments. Sounds simple? I wouldn't really call it easy, but okay.

# 🦆 says ▶ Logical first steps

Integrating defined intents (Nix)

let
  scripts = config.yo.scripts; # 🦆 says ▶ import all dem scripts
  scriptNames = builtins.attrNames scripts; # 🦆 says ▶ names can be useful too
  scriptNamesWithIntents = builtins.filter (scriptName:
    builtins.hasAttr scriptName config.yo.do.intents
  ) scriptNames; # 🦆 says ▶ scripts with no sentences - skippin' dem yo
  

Intent Parsing + Entity Resolution

To match user input, I needed to preprocess the input sentence, resolve any custom entities, and match it against a giant list of potential phrases. Oh yeah, and some of the words in those phrases were dynamic parameters.
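Stripped of the Nix code generation, the core trick is small: every `{param}` becomes a capture group, and `BASH_REMATCH` hands the groups back in order. Here's a minimal plain-Bash sketch of that idea. `template_to_regex` and `template_params` are hypothetical helpers, and unlike the generated matcher this version doesn't escape regex metacharacters or handle wildcards.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "please {action} {target}" into an anchored
# ERE with one ([^ ]+) capture group per {param}. Assumes the template
# contains only plain words, so literal text is not regex-escaped.
template_to_regex() {
  local -a words
  local word regex=""
  read -ra words <<< "$1"
  for word in "${words[@]}"; do
    if [[ "$word" == \{*\} ]]; then
      regex+="([^ ]+) "
    else
      regex+="$word "
    fi
  done
  echo "^${regex% }\$"   # drop the trailing space, anchor both ends
}

# Hypothetical helper: print the {param} names in template order.
template_params() {
  grep -o '{[^}]*}' <<< "$1" | tr -d '{}'
}

template="please {action} {target}"
regex=$(template_to_regex "$template")
mapfile -t names < <(template_params "$template")

if [[ "please stop test" =~ $regex ]]; then
  for i in "${!names[@]}"; do
    # BASH_REMATCH[0] is the whole match, so capture groups start at 1
    printf '%s\n' "--${names[i]} ${BASH_REMATCH[i+1]}"
  done
fi
```

Running it prints `--action stop` and `--target test`, the same shape as the `cmd_args` the generated `match_*` functions build.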

Bash + Nix (Pattern Matching)

  # 🦆 says ▶ where da magic dynamic regex iz at
  makePatternMatcher = scriptName: let
    dataList = config.yo.do.intents.${scriptName}.data;
  in '' # 🦆 says ▶ diz iz how i pick da script u want
    match_${scriptName}() { # 🦆 says ▶ shushin' da caps – lowercase life 4 cleaner dyn regex zen ✨
      local input="$(echo "$1" | tr '[:upper:]' '[:lower:]')"
      # 🦆 says ▶ always show input in debug mode
      # 🦆 says ▶ watch the fancy stuff live in action
      dt_debug "Trying to match for script: ${scriptName}" >&2
      dt_debug "Input: $input" >&2
      # 🦆 says ▶ declare da param map ONCE so no param gets wiped mid-loop
      declare -A params=()
      # 🦆 says ▶ duck presentin' - da madnezz
      ${lib.concatMapStrings (data:
        lib.concatMapStrings (sentence:
          lib.concatMapStrings (sentenceText: let
            # 🦆 says ▶ now sentenceText is one of the expanded variants!
            parts = lib.splitString "{" sentenceText; # 🦆 says ▶ diggin' out da goodies from curly nests! Gimme dem {param} nuggets!
            firstPart = lib.escapeRegex (lib.elemAt parts 0); # 🦆 says ▶ gotta escape them weird chars
            restParts = lib.drop 1 parts;  # 🦆 says ▶ now we in the variable zone quack?
            # 🦆 says ▶ process each part to build regex and params
            regexParts = lib.imap (i: part:
              let
                split = lib.splitString "}" part; # 🦆 says ▶ yeah yeah curly close that syntax shell
                param = lib.elemAt split 0; # 🦆 says ▶ name of the param in da curly – ex: {user}
                after = lib.concatStrings (lib.tail split); # 🦆 says ▶ anything after the param in this chunk
                # 🦆 says ▶ Wildcard mode! anything goes - duck catches ALL the worms! (.*)
                isWildcard = data.lists.${param}.wildcard or false;
                regexGroup = if isWildcard then "(.*)" else "\\b([^ ]+)\\b";
                # 🦆 says ▶ ^ da regex that gon match actual input text
              in {
                regex = regexGroup + lib.escapeRegex after;
                param = param;
              }
            ) restParts;

            fullRegex = let
              clean = lib.strings.trim (firstPart + lib.concatStrings (map (v: v.regex) regexParts));
            in "^${clean}$"; # 🦆 says ▶ mash all regex bits 2gether (anchors included!)
            paramList = map (v: v.param) regexParts; # 🦆 says ▶ the squad of parameters
          in ''
            # 🦆 says ▶ fullRegex already got ^ and $ – escapeShellArg keeps any quotes safe
            local regex=${lib.escapeShellArg fullRegex}
            dt_debug "REGEX: $regex"
            if [[ "$input" =~ $regex ]]; then  # 🦆 says ▶ DANG DANG – regex match engaged
              ${lib.concatImapStrings (i: paramName: ''
                # 🦆 says ▶ extract match group #i (concatImapStrings counts from 1) – param value, come here plz
                param_value="''${BASH_REMATCH[${toString i}]}"
                # 🦆 says ▶ if param got synonym, apply the duckfilter
                if [[ -n "''${param_value:-}" && -v substitutions["$param_value"] ]]; then
                  subbed="''${substitutions["$param_value"]}"
                  if [[ -n "$subbed" ]]; then
                    param_value="$subbed"
                  fi
                fi
                ${lib.optionalString (
                  data.lists ? ${paramName} && !(data.lists.${paramName}.wildcard or false)
                ) ''
                  # 🦆 says ▶ apply substitutions before case matchin'
                  if [[ -v substitutions["$param_value"] ]]; then
                    param_value="''${substitutions["$param_value"]}"
                  fi
                  case "$param_value" in
                    ${makeEntityResolver data paramName}
                    *) ;;
                  esac
                ''} # 🦆 says ▶ declare global param – duck want it everywhere! (for bash access)
                declare -g "_param_${paramName}"="$param_value"
                params["${paramName}"]="$param_value"
                matched_params+=("${paramName}")
              '') paramList} # 🦆 says ▶ set dat param as a GLOBAL VAR yo! every duck gotta know
              # 🦆 says ▶ build cmd args: --param value
              cmd_args=()
              ${lib.concatImapStrings (i: paramName: ''
                value="''${BASH_REMATCH[${toString i}]}"
                cmd_args+=(--${paramName} "$value")
              '') paramList}
              dt_debug "REMATCH 1: ''${BASH_REMATCH[1]}"
              dt_debug "REMATCH 2: ''${BASH_REMATCH[2]}"
              dt_debug "MATCHED SCRIPT: ${scriptName}"
              dt_debug "ARGS: ''${cmd_args[@]}"
              return 0
            fi
          '') (expandOptionalWords sentence)
        ) data.sentences
      ) dataList}
      return 1
    }
  ''; # 🦆 says ▶ dat was fun! let'z do it again some time

# 🦆 says ▶ Funny side note, before this project, regular expressions was diz duck's absolute worst nightmare. These days diz duck quacktually don't mind it dat much.

The Performance Bomb

As I implemented more features, the runtime exploded. I wanted my sentence definitions to support [optional|words] and (required|one|of|these|words) patterns, and that led to a combinatorial explosion.
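To see how fast this grows, here's a rough plain-Bash sketch of the same expansion rules: `(a|b)` means pick exactly one, `[a|b]` means pick one or nothing. `expand_sentence` is a hypothetical name; like the Nix version it splits on spaces, so a group can't contain spaces, and it needs Bash 4.2+ for the negative substring length.

```bash
#!/usr/bin/env bash
# Hypothetical expander: prints every concrete variant of a template.
expand_sentence() {
  local -a tokens variants=("") alts next
  local token v alt
  read -ra tokens <<< "$1"          # split on spaces without globbing [a]
  for token in "${tokens[@]}"; do
    if [[ "$token" == \(*\) ]]; then
      IFS='|' read -ra alts <<< "${token:1:-1}"   # required: exactly one
    elif [[ "$token" == \[*\] ]]; then
      IFS='|' read -ra alts <<< "${token:1:-1}"   # optional: one ...
      alts+=("")                                   # ... or nothing
    else
      alts=("$token")                              # plain word
    fi
    next=()
    for v in "${variants[@]}"; do                  # cartesian product step
      for alt in "${alts[@]}"; do
        next+=("$v $alt")
      done
    done
    variants=("${next[@]}")
  done
  # squeeze the gaps left by skipped optionals, trim, and de-duplicate
  printf '%s\n' "${variants[@]}" | tr -s ' ' | sed -e 's/^ //' -e 's/ $//' | sort -u
}

expand_sentence "(create|set|start|launch) [a] timer [for] {minutes} (minute|minutes)"
```

That single template already prints 32 variants (4 × 2 × 2 × 2) before a single `{minutes}` value has been plugged in.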

Nix (Cartesian Product)

  cartesianProductOfLists = lists:
    # 🦆 says ▶ if da listz iz empty ..
    if lists == [] then
      [ [] ] # 🦆 says ▶ .. i gib u empty listz of listz yo got it?
    else # 🦆 says ▶ ELSE WAT?!
      let # 🦆 says ▶ sorry.. i gib u first list here u go yo
        head = builtins.head lists;
        # 🦆 says ▶ remaining listz for u here u go bro!
        tail = builtins.tail lists;
        # 🦆 says ▶ calculate combinations for my tail - yo calc wher u at?!
        tailProduct = cartesianProductOfLists tail;
      in # 🦆 says ▶ for everyy x in da listz ..
        lib.concatMap (x:
          # 🦆 says ▶ .. letz combinez wit every tail combinationz ..
          map (y: [x] ++ y) tailProduct
        ) head; # 🦆 says ▶ dang! datz a DUCK COMBO alright!

  # 🦆 says ▶ here i duckie help yo out! makin' yo life eazy sleazy' wen declarative sentence yo typin'
  expandOptionalWords = sentence: # 🦆 says ▶ quick & simple sentences we quacky & hacky expandin'
    let # 🦆 says ▶ CHOP CHOP! Rest in lil' Pieceez bigg sentence!!1
      tokens = lib.splitString " " sentence;
      # 🦆 says ▶ definin' dem wordz in da (braces) taggin' dem' wordz az (ALTERNATIVES) lettin' u choose one of dem wen triggerin'
      isRequiredGroup = t: lib.hasPrefix "(" t && lib.hasSuffix ")" t;
      # 🦆 says ▶ puttin' sentence wordz in da [bracket] makin' em' [OPTIONAL] when doin' u don't have to be pickin' woooho
      isOptionalGroup = t: lib.hasPrefix "[" t && lib.hasSuffix "]" t;
      expandToken = token: # 🦆 says ▶ dis gets all da real wordz out of one token (yo!)
        if isRequiredGroup token then
          let # 🦆 says ▶ thnx 4 lettin' ducklin' be cleanin' - i'll be removin' dem "()"
            clean = lib.removePrefix "(" (lib.removeSuffix ")" token);
            alternatives = lib.splitString "|" clean; # 🦆 says ▶ use "|" to split (alternative|wordz) yo
          in  # 🦆 says ▶ dat's dat 4 dem alternativez
            alternatives
        else if isOptionalGroup token then
          let # 🦆 says ▶ here we be goin' again - u dirty and i'll be cleanin' dem "[]"
            clean = lib.removePrefix "[" (lib.removeSuffix "]" token);
            alternatives = lib.splitString "|" clean; # 🦆 says ▶ i'll be stealin' dat "|" from u
          in # 🦆 says ▶ u know wat? optional means we include blank too!
            alternatives ++ [ "" ]
        else # 🦆 says ▶ else i be returnin' raw token for yo
          [ token ];
      # 🦆 says ▶ now i gib u generatin' all dem combinationz yo
      expanded = cartesianProductOfLists (map expandToken tokens);
      # 🦆 says ▶ clean up if too much space, smush back into stringz for ya
      trimmedVariants = map (tokenList:
        let # 🦆 says ▶ toss da blank tokenz from skipped [optionals] ..
          nonBlank = lib.filter (t: t != "") tokenList;
        in # 🦆 says ▶ .. den join wit single spacez – no double spacez survive, now they be shinin'
          lib.concatStringsSep " " nonBlank
      ) expanded; # 🦆 says ▶ and they be multiplyyin'!
      # 🦆 says ▶ throwin' out da empty and cursed ones yo
      nonEmpty = lib.filter (s: s != "") trimmedVariants;
      hasFixedText = v: builtins.match ".*[^\\{].*" v != null; # 🦆 says ▶ no no no, no nullin'
      validVariants = lib.filter hasFixedText nonEmpty;
    in # 🦆 says ▶ returnin' all unique variantz of da sentences – holy duck dat'z fresh
      lib.unique validVariants;

This is where the bomb went off. The module went from being instant to extremely slow.
Even with a detailed debugging system in place, tracking down the problematic line takes time when you're generating this much code. Eventually I found that it was caused by a single intent definition: my timer intent...

I was defining this intent in the most genius way, something similar to:

Timer intent definition gone wrong

sentences = [
  "(create|set|start|launch) [a] timer [for] {hours} (hour|hours) {minutes} (minute|minutes) {seconds} (second|seconds)"
  "(create|set|start|launch) [a] timer [for] {minutes} (minute|minutes) [and] {seconds} (second|seconds)"
  "(create|set|start|launch) [a] timer [for] {minutes} (minute|minutes)"
  "(create|set|start|launch) [a] timer [for] {seconds} seconds"
];
lists = {
  hours.values = lib.genList (n: {
    "in" = lib.concatStringsSep "|" [
      (toString n)
      ("kl " + toString n)
      (toString n + "h")
      (builtins.elemAt numberWords n)
    ];
    out = toString n;
  }) 24;
  
  minutes.values = lib.genList (n: {
    "in" = lib.concatStringsSep "|" [
      (toString n)
      (toString n + "m")
      ("minut " + toString n)
      (builtins.elemAt numberWords n)
    ];
    out = toString n;
  }) 60;

  seconds.values = lib.genList (n: {
    "in" = lib.concatStringsSep "|" [
      (toString n)
      (toString n + "s")
      ("sekund " + toString n)
      (builtins.elemAt numberWords n)
    ];
    out = toString n;
  }) 60;

To better paint the picture of why this was a bad idea, I'll do the math.

Combinatorial explosion:
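Here's one way the count works out for the four sentences above ($S_1$ to $S_4$, in order), assuming each list entry contributes exactly the four "in" alternatives shown (24 × 4 = 96 hour strings, 60 × 4 = 240 strings each for minutes and seconds), that every `(a|b)` group multiplies the variants by its number of alternatives, and that every `[optional]` word doubles them. The last sentence ends in a fixed "seconds", so it gets no unit group factor:

```latex
\begin{aligned}
S_1 &= \underbrace{4 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2}_{128 \text{ templates}} \cdot 96 \cdot 240 \cdot 240 = 707{,}788{,}800 \\
S_2 &= \underbrace{4 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2}_{128} \cdot 240 \cdot 240 = 7{,}372{,}800 \\
S_3 &= \underbrace{4 \cdot 2 \cdot 2 \cdot 2}_{32} \cdot 240 = 7{,}680 \\
S_4 &= \underbrace{4 \cdot 2 \cdot 2}_{16} \cdot 240 = 3{,}840 \\
S_1 + S_2 + S_3 + S_4 &= 715{,}173{,}120
\end{aligned}
```

Almost all of it comes from the first sentence, where three number lists are multiplied together.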

Yes, over 715 million possible combinations.
Insane? Definitely.
Useful? Ehh.. I don't think so?

But I still think this example shows how powerful this can be if you decide to go that crazy route.
This is where I decided to define intents with caution, and also where I implemented the next performance feature: the priority system.
It's basic functionality that lets the user define the order in which intents are processed during regex matching.

Priority System for pattern matching

  # 🦆 says ▶ priority system 4 runtime optimization
  scriptRecordsWithIntents =
    let # 🦆 says ▶ calculate priority
      calculatePriority = scriptName:
        config.yo.do.intents.${scriptName}.priority or 3; # 🦆 says ▶ default medium
      # 🦆 says ▶ create script records metadata
      makeRecord = scriptName: rec {
        name = scriptName;
        priority = calculatePriority scriptName;
        hasComplexPatterns =
          let
            intent = config.yo.do.intents.${scriptName};
            patterns = lib.concatMap (d: d.sentences) intent.data;
          in builtins.any (p: lib.hasInfix "{" p || lib.hasInfix "[" p) patterns;
      };
    in lib.sort (a: b:
        # 🦆 says ▶ primary sort: lower number = higher priority
        a.priority < b.priority
        # 🦆 says ▶ secondary sort: simple patterns before complex ones
        || (a.priority == b.priority && !a.hasComplexPatterns && b.hasComplexPatterns)
        # 🦆 says ▶ third sort: alphabetical for determinism
        || (a.priority == b.priority && a.hasComplexPatterns == b.hasComplexPatterns && a.name < b.name)
      ) (map makeRecord scriptNamesWithIntents);
  # 🦆 says ▶ generate optimized processing order
  processingOrder = map (r: r.name) scriptRecordsWithIntents;
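The comparator above is a plain three-key sort. For intuition, here's the same ordering expressed with coreutils `sort` in Bash, assuming each script is written as a `priority complex name` line (`complex` being 0 or 1, mirroring `hasComplexPatterns`). The script names are made up and `order_scripts` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "priority complex name" lines on stdin and
# prints the names in processing order.
order_scripts() {
  # key 1: priority, numeric, lower runs first
  # key 2: 0 (simple) before 1 (complex)
  # key 3: name, byte-wise (LC_ALL=C) for a deterministic tie-break
  LC_ALL=C sort -k1,1n -k2,2n -k3,3 | awk '{ print $3 }'
}

printf '%s\n' "3 1 timer" "1 1 lights" "3 0 help" "3 1 alarm" | order_scripts
```

This prints `lights`, `help`, `alarm`, `timer`: priority decides first, then simple before complex, then the name.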

Fuzzy Matching

Exact regex matches were cool... until they weren't.
Since execution times kept getting longer, I wanted something more than naive fuzzing here.
I decided to go with a trigram cache solution, falling back to the Levenshtein distance algorithm.
I think this quickly paid off performance-wise.
To be honest, the tricky part was deciding how to use these functions.
I started with the obvious approach: exact matching, falling back to fuzzy matching. That doubled the runtime whenever a command had no exact match (at this point about 50 seconds in total), which is way too high for my use case.
I wanted to run the fuzzy matching asynchronously alongside the exact matching, holding off on executing the fuzzy candidate until exact matching had failed.
Running multiple jobs in the background in Bash is fine, but it pretty much makes you lose control of the actual process and makes you wonder who is actually steering this boat.

Bash (Fuzzy Matching)


trigram_similarity() {
  local str1="$1"
  local str2="$2"
  declare -a tri1 tri2
  # 🦆 says ▶ creates 3 char substring from str1
  for ((i=0; i<''${#str1}-2; i++)); do
    tri1+=( "''${str1:i:3}" )
  done
  # 🦆 says ▶ creates 3 char substring from str2
  for ((i=0; i<''${#str2}-2; i++)); do
    tri2+=( "''${str2:i:3}" )
  done
  local matches=0
  # 🦆 says ▶ count how many trigrams from str1 appear in str2
  for t in "''${tri1[@]}"; do
    # 🦆 says ▶ pre-increment so (( )) never returns 1 and trips set -e
    [[ " ''${tri2[*]} " == *" $t "* ]] && (( ++matches ))
  done
  # 🦆 says ▶ calculate total number of trigrams
  local total=$(( ''${#tri1[@]} + ''${#tri2[@]} ))
  # 🦆 says ▶ no trigrams?
  (( total == 0 )) && echo 0 && return
  # 🦆 says ▶ return dice’s coefficient similarity × 100
  echo $(( 100 * 2 * matches / total ))  # 🦆 says ▶ 0-100 scale
}

levenshtein_similarity() {
  local a="$1" b="$2"
  local len_a=''${#a} len_b=''${#b}
  local max_len=$(( len_a > len_b ? len_a : len_b ))
  (( max_len == 0 )) && echo 100 && return
  local dist=$(levenshtein "$a" "$b")
  local score=$(( 100 - (dist * 100 / max_len) ))
  # 🦆 says ▶ boostz da score for same startin' charizard yo
  [[ "''${a:0:1}" == "''${b:0:1}" ]] && score=$(( score + 10 ))
  echo $(( score > 100 ? 100 : score ))
}
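`levenshtein_similarity` relies on a `levenshtein` helper that isn't shown above. If you want to try these functions on their own, here's a minimal sketch of a compatible helper using the classic two-row dynamic-programming edit distance (insertions, deletions, and substitutions all cost 1). It's an assumption of what such a helper could look like, not the module's actual code, and it's written as plain Bash rather than inside a Nix string.

```bash
#!/usr/bin/env bash
# Hypothetical levenshtein helper: prints the edit distance between two
# strings. Keeps only two rows of the DP table, so memory is O(len(b)).
levenshtein() {
  local a="$1" b="$2"
  local la=${#a} lb=${#b}
  local -a prev curr
  local i j cost del ins sub best

  for (( j = 0; j <= lb; j++ )); do prev[j]=$j; done   # row 0: j insertions

  for (( i = 1; i <= la; i++ )); do
    curr[0]=$i                                          # column 0: i deletions
    for (( j = 1; j <= lb; j++ )); do
      [[ "${a:i-1:1}" == "${b:j-1:1}" ]] && cost=0 || cost=1
      del=$(( prev[j] + 1 ))
      ins=$(( curr[j-1] + 1 ))
      sub=$(( prev[j-1] + cost ))
      best=$del
      (( ins < best )) && best=$ins
      (( sub < best )) && best=$sub
      curr[j]=$best
    done
    prev=("${curr[@]}")                                 # roll the rows
  done
  echo "${prev[lb]}"
}
```

`levenshtein kitten sitting` prints 3. Fed through `levenshtein_similarity`, that becomes 100 - 300 / 7 = 58 with Bash integer division, with no first-letter boost since "k" and "s" differ.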

Bash script workflow

- Take --input "some natural language command" from the user.
- Load intent data from a JSON file (includes regex patterns, substitutions, etc.).
- Load the fuzzy index for fallback fuzzy matching.
- For each defined script, load its regex substitutions (entities) and store them in an associative array.

Exact Match Phase (runs in the background):
- Iterate through all defined scripts in order of priority.
- Apply regex substitutions to the input text.
- Try to match the input via the corresponding match_<script> function.
- If matched:
  - Apply substitutions.
  - Prepare arguments.
  - Execute yo-<script> with those arguments.
  - Signal the fuzzy handler to stop.

Fuzzy Match Phase (runs concurrently):
- Normalize the input text.
- Compute trigram + Levenshtein similarity scores against the defined sentences.
- Select the best match above the threshold.
- Apply the same substitution logic.
- Match using the match_fuzzy_<script> function.
- If matched:
  - Wait for the exact matcher to finish.
  - If the exact matcher didn't match, execute yo-<script> with the resolved args.
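The handoff between the two phases hinges on one rule: the exact matcher must always leave a verdict behind, even when nothing matched, or the fuzzy side ends up waiting forever. Here's a stripped-down, runnable sketch of that handshake. `exact_stub`, `dispatch`, the hard-coded phrase, and the fake fuzzy result are hypothetical stand-ins; only the `exact` / `exact_finished` flag values mirror the ones the module's script checks for.

```bash
#!/usr/bin/env bash
# Hypothetical exact matcher: always writes a verdict to the flag file.
exact_stub() {
  local input="$1" flag="$2"
  if [[ "$input" == "turn on the lights" ]]; then
    echo "exact" > "$flag"            # matched (the real one would exec here)
  else
    echo "exact_finished" > "$flag"   # done, no match
  fi
}

# Hypothetical dispatcher: exact runs in the background, fuzzy in front.
dispatch() {
  local input="$1" flag verdict
  flag=$(mktemp)
  exact_stub "$input" "$flag" &

  local fuzzy_guess="lights"          # stand-in for the fuzzy search result

  # Poll the flag instead of using `wait`, because a real exact match
  # would exec a long-running script inside that background job.
  while [[ ! -s "$flag" ]]; do sleep 0.05; done
  verdict=$(<"$flag")
  rm -f "$flag"

  if [[ "$verdict" == "exact" ]]; then
    echo "exact"
  else
    echo "fuzzy:$fuzzy_guess"
  fi
}

dispatch "turn on the lights"   # prints exact
dispatch "lights on please"     # prints fuzzy:lights
```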

Bash + Nix (NLP module)

in { # 🦆 says ▶ YOOOOOOOOOOOOOOOOOO
  yo.scripts = { # 🦆 says ▶ quack quack quack quack quack.... qwack
    do = {
      description = "Natural language to Shell script translator with dynamic regex matching and automatic parameter resolution";
      aliases = ["b"];
      category = "⚙️ Configuration"; # 🦆 says ▶ duckgorize iz zmart wen u hab many scriptz i'd say!
      logLevel = "WARNING";
      autoStart = false;
      parameters = [
        { name = "input"; description = "Text to parse into a yo command"; optional = false; }
        { name = "fuzzyThreshold"; description = "Minimum percentage for considering fuzzy matching successful. (1-100)"; default = "15"; }
      ];
      # 🦆 says ▶ run yo do --help to display all defined voice commands
      helpFooter = ''
        WIDTH=$(tput cols) # 🦆 duck say ▶ Auto detect width
        cat < 0' "$intent_data_file" 2>/dev/null || echo false)
          if [[ "$has_lists" != "true" ]]; then
            echo -n "$text"
            echo "|declare -A substitutions=()"  # 🦆 says ▶ empty substitutions
            return
          fi
          # 🦆 says ▶ dis is our quacktionary yo
          replacements=$(jq -r '.["'"$script"'"].substitutions[] | "\(.pattern)|\(.value)"' "$intent_data_file")
          while IFS="|" read -r pattern out; do
            if [[ -n "$pattern" && "$text" =~ $pattern ]]; then
              original="''${BASH_REMATCH[0]}"
              [[ -z "''$original" ]] && continue # 🦆 says ▶ duck no like empty string
              substitutions["''$original"]="$out"
              substitution_applied=true # 🦆 says ▶ track if any substitution was applied
              text=$(echo "$text" | sed -E "s/\\b$pattern\\b/$out/g") # 🦆 says ▶ swap the word, flip the script
            fi
          done <<< "$replacements"
          echo -n "$text"
          echo "|$(declare -p substitutions)" # 🦆 says ▶ returning da remixed sentence + da whole
        }
        for f in "$MATCHER_DIR"/*.sh; do [[ -f "$f" ]] && source "$f"; done
        scripts_ordered_by_priority=( ${lib.concatMapStringsSep "\n" (name: "  \"${name}\"") processingOrder} )
        dt_info "''${scripts_ordered_by_priority[*]}" # 🦆 says ▶ show da whole order, not just da first duck
        find_best_fuzzy_match() {
          local input="$1"
          local best_score=0
          local best_match=""
          local normalized=$(echo "$input" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
          local candidates
          mapfile -t candidates < <(jq -r '.[] | .[] | "\(.script):\(.sentence)"' "$YO_FUZZY_INDEX")
          dt_debug "Found ''${#candidates[@]} candidates for fuzzy matching"
          for candidate in "''${candidates[@]}"; do
            IFS=':' read -r script sentence <<< "$candidate"
            local norm_sentence=$(echo "$sentence" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
            local tri_score=$(trigram_similarity "$normalized" "$norm_sentence")
            (( tri_score < 30 )) && continue
            local score=$(levenshtein_similarity "$normalized" "$norm_sentence")
            if (( score > best_score )); then
              best_score=$score
              best_match="$script:$sentence"
              dt_info "New best match: $best_match ($score%)"
            fi
          done
          if [[ -n "$best_match" ]]; then
            echo "$best_match|$best_score"
          else
            echo ""
          fi
        }

        # 🦆 says ▶ insert matchers, build da regex empire. yo
        ${lib.concatMapStrings (name: makePatternMatcher name) scriptNamesWithIntents}
        # 🦆 says ▶ for dem scripts u defined intents for ..
        exact_match_handler() {
          for script in "''${scripts_ordered_by_priority[@]}"; do
            # 🦆 says ▶ .. we insert wat YOU sayz & resolve entities wit dat yo
            resolved_output=$(resolve_entities "$script" "$text")
            resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
            dt_debug "Tried: match_''${script} '$resolved_text'"
            # 🦆 says ▶ we declare som substitutionz from listz we have - duckz knowz why
            subs_decl=$(echo "$resolved_output" | cut -d'|' -f2-)
            declare -gA substitutions || true
            eval "$subs_decl" >/dev/null 2>&1 || true
            # 🦆 says ▶ we hab a match quacky quacky diz sure iz hacky!
            if match_$script "$resolved_text"; then
              if [[ "$(declare -p substitutions 2>/dev/null)" =~ "declare -A" ]]; then
                for original in "''${!substitutions[@]}"; do
                  dt_debug "Substitution: $original >''${substitutions[$original]}";
                  [[ -n "$original" ]] && dt_info "$original > ''${substitutions[$original]}" # 🦆 says ▶ see wat duck did there?
                done # 🦆 says ▶ i hop duck pick dem right - right?
              fi
              args=() # 🦆 says ▶ duck gettin' ready 2 build argumentz 4 u script
              for arg in "''${cmd_args[@]}"; do
                dt_debug "ADDING PARAMETER: $arg"
                args+=("$arg")  # 🦆 says ▶ collecting them shell spell ingredients
              done

              # 🦆 says ▶ final product - hope u like say duck!
              paramz="''${args[@]}" && echo
              echo "exact" > "$match_result_flag" # 🦆 says ▶ tellz fuzzy handler we done
              dt_debug "Executing: yo $script $paramz"
              # 🦆 says ▶ EXECUTEEEEEEEAAA  – HERE WE QUAAAAACKAAAOAA
              exec "yo-$script" "''${args[@]}"
              return 0
            fi
          done
          # 🦆 says ▶ no exact match – ALWAYS leave a verdict so fuzzy don't wait forever
          echo "exact_finished" > "$match_result_flag"
          return 1
        }

        ${lib.concatMapStrings (name: makeFuzzyPatternMatcher name) scriptNamesWithIntents}
        # 🦆 SCREAMS ▶ FUZZY WOOOO TO THE MOON
        fuzzy_match_handler() {
          resolved_output=$(resolve_entities "dummy" "$text") # We'll resolve properly after matching
          resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
          fuzzy_result=$(find_best_fuzzy_match "$resolved_text")
          [[ -z "$fuzzy_result" ]] && return 1

          IFS='|' read -r combined match_score <<< "$fuzzy_result"
          IFS=':' read -r matched_script matched_sentence <<< "$combined"
          dt_debug "Best fuzzy script: $matched_script" >&2

          # 🦆 says ▶ resolve entities agein, diz time for matched script yo
          resolved_output=$(resolve_entities "$matched_script" "$text")
          resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
          subs_decl=$(echo "$resolved_output" | cut -d'|' -f2-)
          declare -gA substitutions || true
          eval "$subs_decl" >/dev/null 2>&1 || true

          # if (( best_score >= $FUZZY_THHRESHOLD )); then
          # 🦆 says ▶ we hab a match quacky quacky diz sure iz hacky!
          if match_fuzzy_$matched_script "$resolved_text" "$matched_sentence"; then
            if [[ "$(declare -p substitutions 2>/dev/null)" =~ "declare -A" ]]; then
              for original in "''${!substitutions[@]}"; do
                dt_debug "Substitution: $original >''${substitutions[$original]}";
                [[ -n "$original" ]] && dt_info "$original > ''${substitutions[$original]}" # 🦆 says ▶ see wat duck did there?
              done # 🦆 says ▶ i hop duck pick dem right - right?
            fi
            args=() # 🦆 says ▶ duck gettin' ready 2 build argumentz 4 u script
            for arg in "''${cmd_args[@]}"; do
              dt_debug "ADDING PARAMETER: $arg"
              args+=("$arg")  # 🦆 says ▶ collecting them shell spell ingredients
            done
            # 🦆 says ▶ wait till exact matcher writes its verdict ("exact" or "exact_finished")
            while [[ ! -s "$match_result_flag" ]]; do
              sleep 0.05
            done
            # 🦆 says ▶ checkz if exact match succeeded yo
            if [[ $(cat "$match_result_flag") == "exact" ]]; then
              dt_debug "Exact match already handled execution. Fuzzy exiting."
              exit 0
            fi

            # 🦆 says ▶ final product - hope u like say duck!
            paramz="''${args[@]}" && echo
            dt_info "Executing: yo $matched_script $paramz"
            # 🦆 says ▶ EXECUTEEEEEEEAAA  – HERE WE QUAAAAACKAAAOAA
            exec "yo-$matched_script" "''${args[@]}"
            return 0
          fi
          return 1 # 🦆 says ▶ fuzzy no match – say so out loud
        }

        # 🦆 says ▶ if exact match winz, no need for fuzz! but fuzz ready to quack when regex chokes
        exact_match_handler &
        pid1=$!
        # 🦆 says ▶ fuzzy found nuthin'? den wait for da exact job instead of quittin' under its feet
        fuzzy_match_handler || wait "$pid1"
        exit
      '';
    };    
   

Automated Sentence Testing

As things grew and my bin directory got thicker, the system became unmaintainable without proper testing, and I was certainly not going to test it manually. (Remember the combinatorial explosion?)
So I decided to go the automated sentence testing route.
I wrote a comprehensive automated test harness as one of the CLI commands itself.

Testing Approach:


    # 🦆 says ▶ automatic doin' sentencin' testin'
    tests = { # 🦆 says ▶ just run yo tests to do an extensive automated test based on your defined sentence data
      description = "Extensive automated sentence testing for the NLP"; 
      category = "βš™οΈ Configuration";
      autoStart = false;
      logLevel = "INFO";
      parameters = [{ name = "input"; description = "Text to test as a single sentence"; optional = true; }];
      code = ''    
        set +u  
        ${cmdHelpers}
        intent_data_file="${intentDataFile}" # 🦆 says ▶ cache dat JSON wisdom, duck hates slowridez
        intent_base_path="${intentBasePath}" # 🦆 says ▶ use da prebuilt path yo
        passed_positive=0
        total_positive=0
        passed_negative=0
        total_negative=0
        passed_boundary=0
        failures=()     
        resolve_sentence() {
          local script="$1"
          config_json=$(nix eval "$intent_base_path.$script" --json 2>/dev/null)
          [ -z "$config_json" ] && config_json="{}"          
          local sentence="$2"    
          local parameters # 🦆 says ▶ first replace parameters to avoid conflictz wit regex processin' yo
          parameters=($(grep -oP '{\K[^}]+' <<< "$sentence"))          
          for param in "''${parameters[@]}"; do
            is_wildcard=$(jq -r --arg param "$param" '.data[0].lists[$param].wildcard // "false"' <<< "$config_json" 2>/dev/null)
            local replacement=""
            if [[ "$is_wildcard" == "true" ]]; then
              # 🦆 says ▶ use da context valuez
              if [[ "$param" =~ hour|minute|second ]]; then
                replacement="1"  # 🦆 says ▶ use numbers for time parameters
              elif [[ "$param" =~ room|device ]]; then
                replacement="livingroom" # 🦆 says ▶ use realistic room names
              else
                replacement="test" # 🦆 says ▶ generic test value
              fi
            else
              mapfile -t outs < <(jq -r --arg param "$param" '.data[0].lists[$param].values[].out' <<< "$config_json" 2>/dev/null)
              if [[ ''${#outs[@]} -gt 0 ]]; then
                replacement="''${outs[0]}"
              else
                replacement="unknown"
              fi
            fi
            sentence="''${sentence//\{$param\}/$replacement}"
          done # 🦆 says ▶ process regex patterns after parameter replacement
          # 🦆 says ▶ handle alternatives - (word1|word2) == pick first alternative
          sentence=$(echo "$sentence" | sed -E 's/\(([^|)]+)(\|[^)]+)?\)/\1/g')
          # 🦆 says ▶ handle optional wordz - [word] == include da word
          sentence=$(echo "$sentence" | sed -E 's/\[([^]]+)\]/ \1 /g')
          # 🦆 says ▶ handle vertical bars in alternatives - word1|word2 == word1
          sentence=$(echo "$sentence" | sed -E 's/(^|\s)\|(\s|$)/ /g')  # 🦆 says ▶ remove standalone vertical bars
          sentence=$(echo "$sentence" | sed -E 's/([^ ]+)\|([^ ]+)/\1/g')  # 🦆 says ▶ pick first alternative in groups
          # 🦆 says ▶ clean up spaces
          sentence=$(echo "$sentence" | tr -s ' ' | sed -e 's/^ //' -e 's/ $//')
          echo "$sentence"
        }
        if [[ -n "$input" ]]; then
            echo "[🦆📜] Testing single input: '$input'"
            FUZZY_THRESHOLD=15
            YO_FUZZY_INDEX="${fuzzyIndexFile}"
            priorityList="${toString (lib.concatStringsSep " " processingOrder)}"
            scripts_ordered_by_priority=($priorityList)
            ${lib.concatMapStrings (name: makePatternMatcher name) scriptNamesWithIntents}
            ${lib.concatMapStrings (name: makeFuzzyPatternMatcher name) scriptNamesWithIntents}
            for f in "$MATCHER_DIR"/*.sh; do [[ -f "$f" ]] && source "$f"; done
            find_best_fuzzy_match() {
              local input="$1"
              local best_score=0
              local best_match=""
              local normalized=$(echo "$input" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
              local candidates
              mapfile -t candidates < <(jq -r '.[] | .[] | "\(.script):\(.sentence)"' "$YO_FUZZY_INDEX")
              dt_debug "Found ''${#candidates[@]} candidates for fuzzy matching"
              for candidate in "''${candidates[@]}"; do
                IFS=':' read -r script sentence <<< "$candidate"
                local norm_sentence=$(echo "$sentence" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
                local tri_score=$(trigram_similarity "$normalized" "$norm_sentence")
                (( tri_score < 30 )) && continue
                local score=$(levenshtein_similarity "$normalized" "$norm_sentence")  
                if (( score > best_score )); then
                  best_score=$score
                  best_match="$script:$sentence"
                  dt_info "New best match: $best_match ($score%)"
                fi
              done
              if [[ -n "$best_match" ]]; then
                echo "$best_match|$best_score"
              else
                echo ""
              fi
            }
            test_single_input() {
                local input="$1"
                dt_info "Testing input: '$input'"
                for script in "''${scripts_ordered_by_priority[@]}"; do
                    resolved_output=$(resolve_entities "$script" "$input")
                    resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
                    dt_debug "Trying exact match: $script '$resolved_text'" 
                    if match_$script "$resolved_text"; then
                        dt_info "✅ EXACT MATCH: $script"
                        dt_info "Parameters:"
                        for arg in "''${cmd_args[@]}"; do
                            dt_info "  - $arg"
                        done
                        return 0
                    fi
                done
                dt_info "No exact match found. Attempting fuzzy match..."
                fuzzy_result=$(find_best_fuzzy_match "$input")
                if [[ -z "$fuzzy_result" ]]; then
                    dt_info "❌ No fuzzy candidates found"
                    return 1
                fi  
                IFS='|' read -r combined match_score <<< "$fuzzy_result"
                IFS=':' read -r matched_script matched_sentence <<< "$combined"
                dt_info "Best fuzzy candidate: $matched_script (score: $match_score%)"
                dt_info "Matched sentence: '$matched_sentence'"
                resolved_output=$(resolve_entities "$matched_script" "$input")
                resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
                if match_fuzzy_$matched_script "$resolved_text" "$matched_sentence"; then
                    dt_info "✅ FUZZY MATCH ACCEPTED: $matched_script"
                    dt_info "Parameters:"
                    for arg in "''${cmd_args[@]}"; do
                        dt_info "  - $arg"
                    done
                    return 0
                else
                    dt_info "❌ Fuzzy match rejected (parameter resolution failed)"
                    return 1
                fi
            }
            test_single_input "$input"
            exit $?
        fi
    
        # 🦆 says ▶ insert matchers
        ${lib.concatMapStrings (name: makePatternMatcher name) scriptNamesWithIntents}  
        test_positive_cases() {
          for script in ${toString scriptNamesWithIntents}; do
            echo "[🦆📜] Testing script: $script"
            config_json=$(nix eval "$intent_base_path.$script" --json 2>/dev/null || echo "{}")
            mapfile -t raw_sentences < <(jq -r '.data[].sentences[]' <<< "$config_json" 2>/dev/null)    
            for template in "''${raw_sentences[@]}"; do
              test_sentence=$(resolve_sentence "$script" "$template")
              echo " Testing: $test_sentence"
              resolved_output=$(resolve_entities "$script" "$test_sentence")
              resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
              subs_decl=$(echo "$resolved_output" | cut -d'|' -f2-)
              declare -gA substitutions || true
              eval "$subs_decl" >/dev/null 2>&1 || true
              if match_$script "$resolved_text"; then
                say_duck "yay ✅ PASS: $resolved_text"
                ((passed_positive++))
              else
                say_duck "fuck ❌ FAIL: $resolved_text"
                failures+=("POSITIVE: $script | $resolved_text")
              fi
              ((total_positive++))
            done
          done
        }
        test_negative_cases() {
          echo "[🦆🚫] Testing Negative Cases"
          negative_cases=(
            "make me a sandwich"
            "launch the nuclear torpedos!"
            "gör mig en macka"      # Swedish: "make me a sandwich"
            "avfyra kärnvapnen!"    # Swedish: "fire the nuclear weapons!"
            "ducks sure are the best dont you agree"
          )        
          for neg_case in "''${negative_cases[@]}"; do
            echo " Testing: $neg_case"
            matched=false
            for script in ${toString scriptNamesWithIntents}; do
              resolved_output=$(resolve_entities "$script" "$neg_case")
              resolved_neg=$(echo "$resolved_output" | cut -d'|' -f1)     
              if match_$script "$resolved_neg"; then
                say_duck "fuck ❌ FALSE POSITIVE: $resolved_neg (matched by $script)"
                failures+=("NEGATIVE: $script | $resolved_neg")
                matched=true
                break
              fi
            done       
            if ! $matched; then
              say_duck "yay ✅ [NEG] PASS: $neg_case"
              ((passed_negative++))
            fi
            ((total_negative++))
          done
        }
        test_boundary_cases() {
          echo "[🦆🔲] Testing Boundary Cases"
          boundary_cases=("" "   " "." "!@#$%^&*()")  
          for bcase in "''${boundary_cases[@]}"; do
            printf " Testing: '%s'\n" "$bcase"
            matched=false   
            for script in ${toString scriptNamesWithIntents}; do
              if match_$script "$bcase"; then
                say_duck "fuck ❌ BOUNDARY FAIL: '$bcase' (matched by $script)"
                failures+=("BOUNDARY: $script | '$bcase'")
                matched=true
                break
              fi
            done       
            if ! $matched; then
              say_duck "yay ✅ [BND] PASS: '$bcase'"
              ((passed_boundary++))
            fi
          done
          total_boundary=''${#boundary_cases[@]}
        }  
        test_positive_cases
        test_negative_cases
        test_boundary_cases
        
        # 🦆 says ▶ calculate
        total_tests=$((total_positive + total_negative + total_boundary))
        passed_tests=$((passed_positive + passed_negative + passed_boundary))
        percent=$(( 100 * passed_tests / total_tests ))
        
        # 🦆 says ▶ colorize based on percentage
        if [ "$percent" -ge 80 ]; then 
            color="$GREEN" && duck_report="⭐"
        elif [ "$percent" -ge 60 ]; then 
            color="$YELLOW" && duck_report="🟢"
        else 
            color="$RED" && duck_report="😭"
        fi
        
        # 🦆 says ▶ display failed tests report
        if [ "$passed_tests" -ne "$total_tests" ]; then 
            if [ ''${#failures[@]} -gt 0 ]; then
                echo "" && echo -e "''${RED}## ────── FAILURES ──────##''${RESET}"
                for failure in "''${failures[@]}"; do
                    echo -e "''${RED}## ❌ $failure"
                done
                echo -e "''${RED}## ────── FAILURES ──────##''${RESET}"
            fi
        fi
        
        # 🦆 says ▶ display final report
        echo "" && echo -e "''${color}## ──────⋆⋅☆⋅⋆────── ##''${RESET}"
        bold "Testing completed!" 
        say_duck "Positive: $passed_positive/$total_positive"
        say_duck "Negative: $passed_negative/$total_negative"
        say_duck "Boundary: $passed_boundary/$total_boundary"
        say_duck "TOTAL: $passed_tests/$total_tests (''${color}''${percent}%''${GRAY})"
        echo -e "''${RESET}" && echo -e "''${color}## ──────⋆⋅☆⋅⋆────── ##''${RESET}"
        say_duck "$duck_report"
        # 🦆 says ▶ only exit 0 when every single test passed
        [ "$passed_tests" -eq "$total_tests" ] || exit 1
        exit 0
      ''; # 🦆 says ▶ thnx for quackin' along til da end!
    };# 🦆 says ▶ the duck be stateless, the regex be law, and da shell... is my pond.
  };}# 🦆 says ▶ nobody beat diz nlp nao says sir quack a lot NOBODY I SAY!
# 🦆 says ▶ QuackHack-McBLindy out!
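The template rules in `resolve_sentence` are easier to see on a single sentence. Below is a standalone sketch in plain Bash (outside Nix, so `${...}` needs no `''` escaping) that applies the same `sed` rules: `{param}` placeholders are filled first, `(a|b)` collapses to its first alternative, and `[optional]` words are kept. The hypothetical `sample_value` helper stands in for the `nix eval` and `jq` lookups and only mimics the wildcard fallbacks (`1` for time parameters, `livingroom` for rooms and devices, `test` otherwise).

```shell
#!/usr/bin/env bash
# Standalone sketch of resolve_sentence's template expansion.
# sample_value is a hypothetical stand-in for the nix eval + jq lookups;
# it only mimics the wildcard fallbacks, not the list "out" values.

sample_value() {
  case "$1" in
    *hour*|*minute*|*second*) echo "1" ;;
    *room*|*device*)          echo "livingroom" ;;
    *)                        echo "test" ;;
  esac
}

expand_template() {
  local sentence="$1" param re='\{([^}]+)}'
  # 1. Fill every {param} first so its braces never reach the sed rules.
  while [[ $sentence =~ $re ]]; do
    param="${BASH_REMATCH[1]}"
    sentence="${sentence//\{$param\}/$(sample_value "$param")}"
  done
  # 2. (a|b) alternatives collapse to the first alternative.
  sentence=$(sed -E 's/\(([^|)]+)(\|[^)]+)?\)/\1/g' <<< "$sentence")
  # 3. [optional] words are kept, padded with spaces.
  sentence=$(sed -E 's/\[([^]]+)\]/ \1 /g' <<< "$sentence")
  # 4. Squeeze repeated spaces and trim both ends.
  tr -s ' ' <<< "$sentence" | sed -e 's/^ //' -e 's/ $//'
}

expand_template "(turn|switch) on [the] lights in [the] {room}"
# prints: turn on the lights in the livingroom
```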

Where I Landed

Dynamic Pattern Matching

Unlimited automatic parameter resolution and entity substitution, through regex patterns generated dynamically from the declarative sentence definitions.

Shell command construction and dispatch.
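Going the other direction, the same template syntax can be compiled into a regex with one capture group per parameter, which illustrates the idea behind dynamically generated patterns. A minimal sketch, assuming single-word alternatives like `(turn|switch)`, optional words in `[brackets]`, `{param}` placeholders, and literal words without regex metacharacters. `compile_template` and `match_template` are hypothetical helpers, not the generated `match_<script>` functions, which also deal with entity substitutions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compile a sentence template into a POSIX ERE with one
# capture group per {param}, then read the parameters back from BASH_REMATCH.

compile_template() {  # line 1: the regex; then one "group:param" line per param
  local template="$1" regex="" groups=0 word
  local re_param='^\{([^}]+)}$' re_opt='^\[([^]]+)]$' re_alt='^\((.+)\)$'
  local -a words=() params=()
  read -r -a words <<< "$template"  # read -a does no globbing, so [the] stays intact
  for word in "${words[@]}"; do
    if [[ $word =~ $re_param ]]; then
      groups=$((groups + 1))
      params+=("$groups:${BASH_REMATCH[1]}")
      regex+="(.+) "
    elif [[ $word =~ $re_opt ]]; then
      groups=$((groups + 1))            # optional words open a group too
      regex+="(${BASH_REMATCH[1]} )?"
    elif [[ $word =~ $re_alt ]]; then
      groups=$((groups + 1))
      regex+="(${BASH_REMATCH[1]}) "
    else
      regex+="$word "
    fi
  done
  printf '^%s$\n' "$regex"              # every piece ends in a space
  if (( ${#params[@]} > 0 )); then printf '%s\n' "${params[@]}"; fi
}

match_template() {  # prints param=value lines; returns 1 when the sentence does not match
  local sentence="$2" regex entry
  local -a lines
  mapfile -t lines < <(compile_template "$1")
  regex="${lines[0]}"
  [[ "$sentence " =~ $regex ]] || return 1  # trailing space to match the compiled form
  for entry in "${lines[@]:1}"; do
    printf '%s=%s\n' "${entry#*:}" "${BASH_REMATCH[${entry%%:*}]}"
  done
}

match_template "(turn|switch) on [the] lights in {room}" "switch on the lights in livingroom"
# prints: room=livingroom
```

Bash's `=~` uses POSIX extended regexes, which have no non-capturing groups, so every optional word and alternative opens a numbered group too; that is why the compiler counts groups and records which index belongs to which parameter. When an optional word sits right before a placeholder (`[the] {room}`), more than one split of the sentence can match, so those templates deserve test sentences of their own.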

Hybrid Matching System

Exact pattern matching, with a fuzzy-matching fallback running concurrently.
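The fuzzy fallback relies on `trigram_similarity` and `levenshtein_similarity`, whose bodies are not shown in this post. The sketch below gives one plausible, hypothetical version of each, both printing an integer percentage: a Jaccard overlap of padded character trigrams as the cheap prefilter, and `100 * (1 - distance / longer length)` as the final score, with the same 30% cutoff that `find_best_fuzzy_match` uses. Like that function, it expects input that is already lowercased with punctuation stripped.

```shell
#!/usr/bin/env bash
# Hypothetical versions of trigram_similarity / levenshtein_similarity; the
# real helpers are not shown in this post. Both print an integer 0..100.

trigram_similarity() {  # Jaccard overlap of padded character-trigram sets
  local a="  $1 " b="  $2 " i t shared=0 union
  local -A ta=() tb=()
  for ((i = 0; i + 3 <= ${#a}; i++)); do ta["${a:i:3}"]=1; done
  for ((i = 0; i + 3 <= ${#b}; i++)); do tb["${b:i:3}"]=1; done
  union=${#ta[@]}  # padding guarantees at least one trigram, so union >= 1
  for t in "${!tb[@]}"; do
    if [[ -n "${ta[$t]:-}" ]]; then
      shared=$((shared + 1))
    else
      union=$((union + 1))
    fi
  done
  echo $(( 100 * shared / union ))
}

levenshtein_similarity() {  # 100 * (1 - edit_distance / longer_length)
  local a="$1" b="$2" i j cost best
  local la=${#a} lb=${#b}
  local -a prev cur
  if (( la == 0 && lb == 0 )); then echo 100; return; fi
  for ((j = 0; j <= lb; j++)); do prev[j]=$j; done
  for ((i = 1; i <= la; i++)); do
    cur=("$i")
    for ((j = 1; j <= lb; j++)); do
      if [[ "${a:i-1:1}" == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
      best=$(( prev[j] + 1 ))                                         # deletion
      (( cur[j-1] + 1 < best )) && best=$(( cur[j-1] + 1 ))           # insertion
      (( prev[j-1] + cost < best )) && best=$(( prev[j-1] + cost ))   # substitution
      cur[j]=$best
    done
    prev=("${cur[@]}")
  done
  local longest=$(( la > lb ? la : lb ))
  echo $(( 100 * (longest - prev[lb]) / longest ))
}

fuzzy_score() {  # cheap prefilter first, same 30% cutoff as find_best_fuzzy_match
  local tri
  tri=$(trigram_similarity "$1" "$2")
  if (( tri < 30 )); then echo 0; return; fi
  levenshtein_similarity "$1" "$2"
}

fuzzy_score "turn of the lights" "turn off the lights"   # prints: 94
```

The prefilter matters because the Levenshtein loop is quadratic in the sentence lengths, which adds up quickly in pure Bash; skipping obviously unrelated candidates keeps the expensive comparison for the few that might win.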

Comprehensive Testing

An automated test harness that checks every defined sentence, plus negative and boundary cases, and prints a detailed report.

The result is a voice-command ready system that feels like magic when it works and teaches you humility when it doesn't. It's unorthodox, occasionally infuriating, but ultimately empowering - letting me command my computer in a way that feels truly natural and powerful.
Even with plenty of bad Whisper transcriptions, the failure rate for called intents sits at around 2%.
I could be really drunk and speaking Japanese, and it would still match properly and extract all the parameters.

# 🦆 says ▶ "What's the stupidest way YOU could solve your next problem?"

Please go stupid!
Look how well this turned out.
After more than five years of experimenting with building that voice command empire,
I can safely say that this module is by far the most accurate intent handler I have tried.
I do wish it scaled a little better in terms of speed, but that might be a project for another day.
Well, I hope you enjoyed the funky ride and learned something along the way.
If people like this one, maybe I'll write up more, because I have a ton of strange modules like this lying around.

Peace and code! QuackHack-McBLindy out, yo!

View NLP source code on GitHub
View testing framework source code on GitHub
