Developing a feature-rich Natural Language Processor
- Using just jq, Nix & Bash

I've always been drawn to voice commands. Maybe it's because of the extra use cases that come with being blind - can you imagine fumbling around for the TV remote when you can't see it? That sucks! Now I can simply tell the TV to start. Or, if I prefer, I can ask where the remote is and the Nvidia Shield remote plays a sound. Maybe it's just the appeal of being able to multitask with a third hand that doesn't exist. Either way, the idea of speaking to my computer and having it do things has always fascinated me. Voice as an interface feels natural, powerful... and deeply underutilized.

# 🦆 says ⮞ I use 20x magnification when I code and debug. I use emoji to simplify logs for myself. If you can't handle my code style, you can disable most of it on this website by toggling the button in the navbar. Shall duck continue?

Let's rewind a bit. I've always had a soft spot for doing things the stupid way. Not "stupid" as in broken, but "stupid" as in... unorthodox, unnecessary, and definitely not optimized for scale. The kind of stupid that makes you ask: "Wait, you did what in Bash?!"

The Genesis: Bash Meets Nix

Early in this project, no one - not even I - could give a good reason for doing it in Bash. There were no benchmarks to beat, no ecosystem gaps to fill, and definitely no best practices to follow. Maybe it's still a stupid idea. But I kept coming back to one thing: Bash has flexibility.
You can bend it in ways most people can't imagine.

# 🦆 says ⮞ There is no spoon...

Meanwhile, tools like Home Assistant's Assist were starting to bore me. They worked, sure - but I constantly felt locked in, boxed in by high-level abstractions and limited customization. I didn't want a voice assistant that just reacts - I wanted one that obeys.
As an active contributor in that community, I repeatedly got, or saw other people get, responses from the devs like "That's not possible." and "No, you can't do that." That was my breaking point: I left and went my own way.

That's where Nix came in. Going the Nix way - what could be more fun?
Using Nix's declarative configuration model, I realized I could define a CLI interface that held all my custom scripts and logic - the glue that tied intent to action. I could describe sentences declaratively and map them directly to Bash scripts.

No overhead. No dependencies. No noise. Just me, a microphone, and a shell.

I Did Not Mean for It to Get This Complex

What started as a small utility to let me run shell commands by typing natural language sentences quickly spiraled into a full-blown NLP system. With Bash. And Nix. And thousands of dynamically generated regular expressions.

This blog post walks you through the actual code behind a voice-command-esque CLI script framework I built using Nix to organize, parse, cache, and test structured sentence intents to build and execute Shell commands.

🦆🏠 HOME via 🐍 v3.12.10
01:41:26 ❯ yo do "please stop test auto 5 yes wild card testing here"
┌─(yo-demo)
│🦆
└─⮞ --action stop
└─⮞ --target test
└─⮞ --mode auto
└─⮞ --duration 5
└─⮞ --confirm yes
└─⮞ --wild wild card testing here
[🦆📜] [01:41:31] ⁉️DEBUG⁉️ ⮞ +0.007 s Script Started!
[🦆📜] [01:41:31] ⁉️DEBUG⁉️ ⮞ +0.013 s SCript Executed!
[🦆📜] [01:41:31] ⁉️DEBUG⁉️ ⮞ +0.020 s Script finished!

Nix Configuration: Declaring Intents

Nix Configuration
  yo = {
    scripts = {
      demo = {
        parameters = [
          { name = "action"; description = "Action to perform"; default = "run"; }
          { name = "target"; description = "Target of the action"; }
          { name = "intensity"; description = "Level of intensity"; type = "int"; optional = true; }
          { name = "duration"; description = "Duration in seconds"; type = "int"; optional = true; }
          { name = "mode"; description = "Mode of operation"; optional = true; }
          { name = "confirm"; description = "Whether to confirm the action"; optional = true; }
          { name = "wild"; description = "Wildcard testing"; optional = true; }     
        ];
        code = ''
          echo "Script executed!"
        '';
      };
    };
    do = {
      intents = {
        demo = {
          data = [{
            sentences = [
              "please {action} {target} {mode} {duration} {confirm} {wild}"
            ];
            lists = {
              action.values = [
                { "in" = "[run|execute|start]"; out = "RUN"; }
                { "in" = "[stop|terminate|halt]"; out = "STOP"; }
              ];
              target.values = [
                { "in" = "[process|program|sequence]"; out = "PROCESS"; }
                { "in" = "[test|check|validation]"; out = "TEST"; }
              ];
              mode.values = [
                { "in" = "[auto|automatic]"; out = "AUTO"; }
                { "in" = "[manual|manually]"; out = "MANUAL"; }
              ];
              confirm.values = [
                { "in" = "[yes|confirm|sure]"; out = "YES"; }
                { "in" = "[no|cancel|decline]"; out = "NO"; }
              ];
              intensity.values = builtins.genList (i: {
                "in" = toString (i + 1);
                out = toString (i + 1);
                }) 10;
              duration.values = builtins.genList (i: {
                "in" = toString ((i + 1) * 10);
                out = toString ((i + 1) * 10);
              }) 6;
              wild.wildcard = true;
            };
          }];
        };  
      };
    };      
  };
  
};

Early on, I realized Nix could give me a ton of structure and organization. Since I have been running NixOS system-wide for some years now, I have grown comfortable writing Nix expressions. Nix lets users define software configurations declaratively; I'm abusing it for voice logic. I wanted to declaratively define scripts and bind natural language intents to each of them - and Nix is great at producing structured derivations.

Each script gets a name, description, category, and as many parameters as you like. Add a set of intent sentences on top of that.
These are eventually transformed into regular expressions, cached, and tested. Lists can optionally map "in" words to "out" words, or a list can be marked as a wildcard if the parameter should accept any input.
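To make the sentence-to-regex idea concrete, here is a minimal Bash sketch of the two pieces described above: turning a "{param}" template into an anchored regex, and mapping an "in" word to its "out" value. The names template_to_regex, tpl_regex, tpl_params and action_out are hypothetical and exist only for this illustration; the sketch ignores wildcards and does not escape regex characters in the fixed text.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of yo): build an anchored regex with one
# capture group per {param} and remember the parameter names in order.
template_to_regex() {
  local rest="$1"
  local re='^([^{]*)[{]([^}]+)[}](.*)$'
  tpl_regex="^"     # result: the regex
  tpl_params=()     # result: parameter names, in capture-group order
  while [[ "$rest" =~ $re ]]; do
    tpl_regex+="${BASH_REMATCH[1]}([^ ]+)"
    tpl_params+=("${BASH_REMATCH[2]}")
    rest="${BASH_REMATCH[3]}"
  done
  tpl_regex+="$rest\$"
}

# "in" words mapped to "out" values, mirroring the action list above
declare -A action_out=( [run]=RUN [execute]=RUN [start]=RUN
                        [stop]=STOP [terminate]=STOP [halt]=STOP )

template_to_regex "please {action} {target}"
input="please halt test"
if [[ "$input" =~ $tpl_regex ]]; then
  action="${BASH_REMATCH[1]}"
  # fall back to the raw word when the list has no mapping for it
  echo "${tpl_params[0]}=${action_out[$action]:-$action} ${tpl_params[1]}=${BASH_REMATCH[2]}"
fi
```

With the input above this prints "action=STOP target=test": the spoken word "halt" is captured by the first group and resolved to its canonical value.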

From Sentences to Shell Scripts

The core idea was: take a sentence from the user, figure out what script it maps to, and run it with the correct arguments. Sounds simple? I wouldn't really call it easy, but okay.

# 🦆 says ⮞ Logical first steps

Integrating defined intents (Nix)

let
  scripts = config.yo.scripts; # 🦆 says ⮞ import all dem scripts
  scriptNames = builtins.attrNames scripts; # 🦆 says ⮞ names can be useful too
  scriptNamesWithIntents = builtins.filter (scriptName:
    builtins.hasAttr scriptName config.yo.do.intents
  ) scriptNames; # 🦆 says ⮞ scripts with no sentences - skippin' dem yo
  

Intent Parsing + Entity Resolution

To match user input, I needed to preprocess the input sentence, resolve any custom entities, and match it against a giant list of potential phrases. Oh yeah, and some of the words in those phrases were dynamic parameters.

Bash + Nix (Pattern Matching)

  # 🦆 says ⮞ where da magic dynamic regex iz at
  makePatternMatcher = scriptName: let
    dataList = config.yo.do.intents.${scriptName}.data;
  in '' # 🦆 says ⮞ diz iz how i pick da script u want
    match_${scriptName}() { # 🦆 says ⮞ shushin' da caps – lowercase life 4 cleaner dyn regex zen ✨
      local input="$(echo "$1" | tr '[:upper:]' '[:lower:]')"
      # 🦆 says ⮞ always show input in debug mode
      # 🦆 says ⮞ watch the fancy stuff live in action
      dt_debug "Trying to match for script: ${scriptName}" >&2
      dt_debug "Input: $input" >&2
      # 🦆 says ⮞ duck presentin' - da madnezz
      ${lib.concatMapStrings (data:
        lib.concatMapStrings (sentence:
          lib.concatMapStrings (sentenceText: let
            # 🦆 says ⮞ now sentenceText is one of the expanded variants!
            parts = lib.splitString "{" sentenceText; # 🦆 says ⮞ diggin' out da goodies from curly nests! Gimme dem {param} nuggets!
            firstPart = lib.escapeRegex (lib.elemAt parts 0); # 🦆 says ⮞ gotta escape them weird chars
            restParts = lib.drop 1 parts;  # 🦆 says ⮞ now we in the variable zone quack?
            # 🦆 says ⮞ process each part to build regex and params
            regexParts = lib.imap (i: part:
              let
                split = lib.splitString "}" part; # 🦆 says ⮞ yeah yeah curly close that syntax shell
                param = lib.elemAt split 0; # 🦆 says ⮞ name of the param in da curly – ex: {user}
                after = lib.concatStrings (lib.tail split); # 🦆 says ⮞ anything after the param in this chunk
                # 🦆 says ⮞ Wildcard mode! anything goes - duck catches ALL the worms! (.*)
                isWildcard = data.lists.${param}.wildcard or false;
                regexGroup = if isWildcard then "(.*)" else "\\b([^ ]+)\\b";
                # 🦆 says ⮞ ^ da regex that gon match actual input text
              in {
                regex = regexGroup + lib.escapeRegex after;
                param = param;
              }
            ) restParts;

            fullRegex = let
              clean = lib.strings.trim (firstPart + lib.concatStrings (map (v: v.regex) regexParts));
            in "^${clean}$"; # 🦆 says ⮞ mash all regex bits 2gether
            paramList = map (v: v.param) regexParts; # 🦆 says ⮞ the squad of parameters
          in ''
            # 🦆 says ⮞ fullRegex is already anchored wit ^ and $ – no need 2 anchor twice
            local regex='${fullRegex}'
            dt_debug "REGEX: $regex"
            if [[ "$input" =~ $regex ]]; then  # 🦆 says ⮞ DANG DANG – regex match engaged
              ${lib.concatImapStrings (i: paramName: ''
                # 🦆 says ⮞ extract match group #i (concatImapStrings counts from 1) – param value, come here plz
                param_value="''${BASH_REMATCH[${toString i}]}"
                # 🦆 says ⮞ if param got synonym, apply the duckfilter
                if [[ -n "''${param_value:-}" && -v substitutions["$param_value"] ]]; then
                  subbed="''${substitutions["$param_value"]}"
                  if [[ -n "$subbed" ]]; then
                    param_value="$subbed"
                  fi
                fi
                ${lib.optionalString (
                  data.lists ? ${paramName} && !(data.lists.${paramName}.wildcard or false)
                ) ''
                  # 🦆 says ⮞ apply substitutions before case matchin'
                  if [[ -v substitutions["$param_value"] ]]; then
                    param_value="''${substitutions["$param_value"]}"
                  fi
                  case "$param_value" in
                    ${makeEntityResolver data paramName}
                    *) ;;
                  esac
                ''} # 🦆 says ⮞ declare global param – duck want it everywhere! (for bash access)
                declare -g "_param_${paramName}"="$param_value"
                declare -A params=()
                params["${paramName}"]="$param_value"
                matched_params+=("$paramName")
              '') paramList} # 🦆 says ⮞ set dat param as a GLOBAL VAR yo! every duck gotta know
              # 🦆 says ⮞ build cmd args: --param valu
              cmd_args=()
              ${lib.concatImapStrings (i: paramName: ''
                value="''${BASH_REMATCH[${toString i}]}"
                cmd_args+=(--${paramName} "$value")
              '') paramList}
              dt_debug "REMATCH 1: ''${BASH_REMATCH[1]}"
              dt_debug "REMATCH 2: ''${BASH_REMATCH[2]}"
              dt_debug "MATCHED SCRIPT: ${scriptName}"
              dt_debug "ARGS: ''${cmd_args[@]}"
              return 0
            fi
          '') (expandOptionalWords sentence)
        ) data.sentences
      ) dataList}
      return 1
    }
  ''; # 🦆 says ⮞ dat was fun! let'z do it again some time

# 🦆 says ⮞ Funny side note, before this project, regular expressions was diz duck's absolute worst nightmare. These days diz duck quacktually don't mind it dat much.
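Generated code can be hard to picture, so here is a hand-written approximation of what one emitted matcher might look like for the demo sentence "please {action} {target} {mode} {duration} {confirm} {wild}". It is a simplified sketch, not actual generator output: the names array is an illustration device, dt_debug is stubbed, substitutions are left out, and the "\b" word boundaries are dropped.

```bash
#!/usr/bin/env bash
dt_debug() { :; }   # stub for the real debug logger (assumption)

match_demo() {
  local input
  input="$(echo "$1" | tr '[:upper:]' '[:lower:]')"
  # one ([^ ]+) group per {param}, and (.*) for the wildcard parameter
  local regex='^please ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.*)$'
  local names=(action target mode duration confirm wild)
  dt_debug "REGEX: $regex"
  if [[ "$input" =~ $regex ]]; then
    cmd_args=()
    local i
    for i in "${!names[@]}"; do
      # capture groups are 1-based, the names array is 0-based
      cmd_args+=("--${names[i]}" "${BASH_REMATCH[i+1]}")
    done
    return 0
  fi
  return 1
}

if match_demo "Please stop test auto 5 yes wild card testing here"; then
  printf '%s\n' "${cmd_args[*]}"
fi
```

The call prints the same argument list as the demo run near the top of the post, with the wildcard swallowing everything after "yes".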

The Performance Bomb

As I implemented more features, performance blew up. I wanted my sentence definitions to support [optional|words] and (required|one|of|these|words) patterns, and that led to a combinatorial explosion.

Nix (Cartesian Product)

  cartesianProductOfLists = lists:
    # 🦆 says ⮞ if da listz iz empty ..
    if lists == [] then
      [ [] ] # 🦆 says ⮞ .. i gib u empty listz of listz yo got it?
    else # 🦆 says ⮞ ELSE WAT?!
      let # 🦆 says ⮞ sorry.. i gib u first list here u go yo
        head = builtins.head lists;
        # 🦆 says ⮞ remaining listz for u here u go bro!
        tail = builtins.tail lists;
        # 🦆 says ⮞ calculate combinations for my tail - yo calc wher u at?!
        tailProduct = cartesianProductOfLists tail;
      in # 🦆 says ⮞ for everyy x in da listz ..
        lib.concatMap (x:
          # 🦆 says ⮞ .. letz combinez wit every tail combinationz ..
          map (y: [x] ++ y) tailProduct
        ) head; # 🦆 says ⮞ dang! datz a DUCK COMBO alright!

  # 🦆 says ⮞ here i duckie help yo out! makin' yo life eazy sleazy' wen declarative sentence yo typin'
  expandOptionalWords = sentence: # 🦆 says ⮞ quick & simple sentences we quacky & hacky expandin'
    let # 🦆 says ⮞ CHOP CHOP! Rest in lil' Pieceez bigg sentence!!1
      tokens = lib.splitString " " sentence;
      # 🦆 says ⮞ definin' dem wordz in da (braces) taggin' dem' wordz az (ALTERNATIVES) lettin' u choose one of dem wen triggerin'
      isRequiredGroup = t: lib.hasPrefix "(" t && lib.hasSuffix ")" t;
      # 🦆 says ⮞ puttin' sentence wordz in da [bracket] makin' em' [OPTIONAL] when doin' u don't have to be pickin' woooho
      isOptionalGroup = t: lib.hasPrefix "[" t && lib.hasSuffix "]" t;
      expandToken = token: # 🦆 says ⮞ dis gets all da real wordz out of one token (yo!)
        if isRequiredGroup token then
          let # 🦆 says ⮞ thnx 4 lettin' ducklin' be cleanin' - i'll be removin' dem "()"
            clean = lib.removePrefix "(" (lib.removeSuffix ")" token);
            alternatives = lib.splitString "|" clean; # 🦆 says ⮞ use "|" to split (alternative|wordz) yo
          in  # 🦆 says ⮞ dat's dat 4 dem alternativez
            alternatives
        else if isOptionalGroup token then
          let # 🦆 says ⮞ here we be goin' again - u dirty and i'll be cleanin' dem "[]"
            clean = lib.removePrefix "[" (lib.removeSuffix "]" token);
            alternatives = lib.splitString "|" clean; # 🦆 says ⮞ i'll be stealin' dat "|" from u
          in # 🦆 says ⮞ u know wat? optional means we include blank too!
            alternatives ++ [ "" ]
        else # 🦆 says ⮞ else i be returnin' raw token for yo
          [ token ];
      # 🦆 says ⮞ now i gib u generatin' all dem combinationz yo
      expanded = cartesianProductOfLists (map expandToken tokens);
      # 🦆 says ⮞ clean up if too much space, smush back into stringz for ya
      trimmedVariants = map (tokenList:
        let # 🦆 says ⮞ drop da empty tokens first, then join with spaces and trim them suckers
          raw = lib.concatStringsSep " " (lib.filter (t: t != "") tokenList);
          # 🦆 says ⮞ remove ALL extra spaces
          cleaned = lib.replaceStrings ["  "] [" "] (lib.strings.trim raw);
        in # 🦆 says ⮞ wow now they be shinin'
          cleaned
      ) expanded; # 🦆 says ⮞ and they be multiplyyin'!
      # 🦆 says ⮞ throwin' out da empty and cursed ones yo
      nonEmpty = lib.filter (s: s != "") trimmedVariants;
      hasFixedText = v: builtins.match ".*[^\\{].*" v != null; # 🦆 says ⮞ no no no, no nullin'
      validVariants = lib.filter hasFixedText nonEmpty;
    in # 🦆 says ⮞ returnin' all unique variantz of da sentences – holy duck dat'z fresh
      lib.unique validVariants;
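The same expansion can be sketched in plain Bash, which makes it easy to watch the variant count multiply. The function name expand_sentence is made up for this example; it mirrors the logic of the Nix code above (alternatives for "(a|b)", alternatives plus blank for "[a]") but it is not the module's implementation and it does not de-duplicate.

```bash
#!/usr/bin/env bash
# Hypothetical Bash counterpart of expandOptionalWords (illustration only).
expand_sentence() {
  local -a tokens variants=("") next alts
  local token alt v
  read -ra tokens <<< "$1"
  for token in "${tokens[@]}"; do
    case "$token" in
      "("*")") IFS='|' read -ra alts <<< "${token:1:${#token}-2}" ;;
      "["*"]") IFS='|' read -ra alts <<< "${token:1:${#token}-2}"; alts+=("") ;;
      *)       alts=("$token") ;;
    esac
    next=()
    for v in "${variants[@]}"; do
      for alt in "${alts[@]}"; do
        # skip the separating space when either side is empty
        if [[ -z "$v" ]]; then next+=("$alt")
        elif [[ -z "$alt" ]]; then next+=("$v")
        else next+=("$v $alt"); fi
      done
    done
    variants=("${next[@]}")   # every token multiplies the variant list
  done
  printf '%s\n' "${variants[@]}"
}

expand_sentence "(set|start) [a] timer"
```

Two alternatives times an optional word gives four variants: "set a timer", "set timer", "start a timer" and "start timer". Every extra group in a sentence multiplies that list again.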

This is where the bomb went off. The module went from being instant to extremely slow.
Even with a very detailed debugging system in place, finding the problematic line can take time when you are generating this much code. Later I found that it was caused by a single intent definition: my timer intent.

I was defining this intent in the most genius way, something similar to:

Timer intent definition gone wrong

sentences = [
  "(create|set|start|launch) [a] timer [for] {hours} (hour|hours) {minutes} (minute|minutes) {seconds} (second|seconds)"
  "(create|set|start|launch) [a] timer [for] {minutes} (minute|minutes) [and] {seconds} (second|seconds)"
  "(create|set|start|launch) [a] timer [for] {minutes} (minute|minutes)"
  "(create|set|start|launch) [a] timer [for] {seconds} seconds"
];
lists = {
  hours.values = lib.genList (n: {
    "in" = lib.concatStringsSep "|" [
      (toString n)
      ("kl " + toString n)
      (toString n + "h")
      (builtins.elemAt numberWords n)
    ];
    out = toString n;
  }) 24;
  
  minutes.values = lib.genList (n: {
    "in" = lib.concatStringsSep "|" [
      (toString n)
      (toString n + "m")
      ("minut " + toString n)
      (builtins.elemAt numberWords n)
    ];
    out = toString n;
  }) 60;

  seconds.values = lib.genList (n: {
    "in" = lib.concatStringsSep "|" [
      (toString n)
      (toString n + "s")
      ("sekund " + toString n)
      (builtins.elemAt numberWords n)
    ];
    out = toString n;
  }) 60;
};

To paint a better picture of why this was a bad idea, I'll do the math.

Combinatorial explosion:
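A back-of-the-envelope way to arrive at a figure of this size from the definition above, sketched as Bash arithmetic. It rests on two assumptions: every "[optional]" word and every two-way "(a|b)" group doubles the number of sentence variants, and each of the four spellings of each value counts as a separate input.

```bash
#!/usr/bin/env bash
# Accepted input forms per parameter: number of values x 4 spellings each
hours=$((   24 * 4 ))   # 96
minutes=$(( 60 * 4 ))   # 240
seconds=$(( 60 * 4 ))   # 240

# Sentence variants: 4 verbs, then one factor of 2 per bracketed group
s1=$(( 4 * 2*2*2*2*2 ))  # [a] [for] (hour|hours) (minute|minutes) (second|seconds)
s2=$(( 4 * 2*2*2*2*2 ))  # [a] [for] (minute|minutes) [and] (second|seconds)
s3=$(( 4 * 2*2*2 ))      # [a] [for] (minute|minutes)
s4=$(( 4 * 2*2 ))        # [a] [for]

total=$(( s1*hours*minutes*seconds + s2*minutes*seconds + s3*minutes + s4*seconds ))
echo "$total"   # 715173120
```

Almost all of it comes from the first sentence: 128 variants times 96 hour forms times 240 minute forms times 240 second forms.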

Yes, over 715 million possible combinations.
Insane? Definitely.
Useful? Ehh.. I don't think so?

But I still think this example shows how powerful this can be if you decide to go down that crazy route.
This is where I decided to define intents with caution, and also where I implemented the next performance feature: the priority system.
It's basic functionality that lets the user define the order in which intents are processed for regex matching.

Priority System for pattern matching

  # 🦆 says ⮞ priority system 4 runtime optimization
  scriptRecordsWithIntents =
    let # 🦆 says ⮞ calculate priority
      calculatePriority = scriptName:
        config.yo.do.intents.${scriptName}.priority or 3; # 🦆 says ⮞ default medium
      # 🦆 says ⮞ create script records metadata
      makeRecord = scriptName: rec {
        name = scriptName;
        priority = calculatePriority scriptName;
        hasComplexPatterns =
          let
            intent = config.yo.do.intents.${scriptName};
            patterns = lib.concatMap (d: d.sentences) intent.data;
          in builtins.any (p: lib.hasInfix "{" p || lib.hasInfix "[" p) patterns;
      };
    in lib.sort (a: b:
        # 🦆 says ⮞ primary sort: lower number = higher priority
        a.priority < b.priority
        # 🦆 says ⮞ secondary sort: simple patterns before complex ones
        || (a.priority == b.priority && !a.hasComplexPatterns && b.hasComplexPatterns)
        # 🦆 says ⮞ third sort: alphabetical for determinism
        || (a.priority == b.priority && a.hasComplexPatterns == b.hasComplexPatterns && a.name < b.name)
      ) (map makeRecord scriptNamesWithIntents);
  # 🦆 says ⮞ generate optimized processing order
  processingOrder = map (r: r.name) scriptRecordsWithIntents;
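The three-level ordering is easier to see with concrete records. The sketch below reproduces the same rules (priority, then simple before complex, then name) with the standard sort utility. The script names and the "priority:complex:name" record format are invented for the example; the real ordering is computed in Nix at build time, as shown above.

```bash
#!/usr/bin/env bash
# Records are "priority:complex:name"; complex is 0 (simple) or 1 (has { or [)
records=(
  "3:1:timer"
  "1:1:lights"
  "3:0:joke"
  "1:0:stop"
  "3:0:alarm"
)

mapfile -t processing_order < <(
  printf '%s\n' "${records[@]}" |
    LC_ALL=C sort -t: -k1,1n -k2,2n -k3,3 |   # priority, complexity, name
    cut -d: -f3
)

printf '%s\n' "${processing_order[@]}"
```

This prints stop, lights, alarm, joke, timer: both priority-1 scripts come first, and within each priority the simple patterns are tried before the complex ones.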

Fuzzy Matching

Exact regex matches were cool... until they weren't.
Since runtime kept getting longer and longer, I wanted something more than just fuzz here.
I decided to go with a trigram cache, falling back to the Levenshtein distance algorithm.
I think this quickly turned out in my favor, both performance- and speed-wise.
To be honest, the tricky part was deciding how to use the functions.
I started with the obvious: exact matching, falling back to fuzzy matching. That doubled the runtime whenever a command had no exact match (at this point about 50 seconds in total), which is way too high for my use case.
I wanted to run the fuzzy matching logic asynchronously alongside the exact matching, holding off on executing the potential command until exact matching had failed.
Running multiple jobs in the background in Bash is fine, but it pretty much makes you lose control of the actual process and makes you wonder: who is actually steering this boat?
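One way to keep a hand on the wheel is to let the exit status of the background job decide who gets to act. The sketch below shows the pattern with stand-in matchers; exact_match, fuzzy_match and dispatch are hypothetical names, and it coordinates through the wait builtin rather than the flag file used by the module further down.

```bash
#!/usr/bin/env bash
# Stand-ins for the two phases (assumptions, not the real matchers)
exact_match() { [[ "$1" == "turn on the lights" ]]; }
fuzzy_match() { [[ "$1" == *light* ]] && echo "lights"; }

dispatch() {
  local input="$1" fuzzy_candidate
  exact_match "$input" &                 # exact phase runs in the background
  local exact_pid=$!
  # fuzzy phase computes its candidate in the meantime
  fuzzy_candidate=$(fuzzy_match "$input") || true
  if wait "$exact_pid"; then             # exact phase is done and it matched
    echo "exact"
  elif [[ -n "$fuzzy_candidate" ]]; then
    echo "fuzzy:$fuzzy_candidate"        # acted on only after exact has failed
  else
    echo "none"
  fi
}

dispatch "turn on the lights"   # exact
dispatch "turn on da lights"    # fuzzy:lights
dispatch "play some music"      # none
```

The fuzzy result is computed concurrently but held back until wait reports how the exact phase ended, so the two phases never both execute a command.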

Bash (Fuzzy Matching)


trigram_similarity() {
  local str1="$1"
  local str2="$2"
  declare -a tri1 tri2
  # 🦆 says ⮞ creates 3 char substring from str1
  for ((i=0; i<''${#str1}-2; i++)); do
    tri1+=( "''${str1:i:3}" )
  done
  # 🦆 says ⮞ creates 3 char substring from str2
  for ((i=0; i<''${#str2}-2; i++)); do
    tri2+=( "''${str2:i:3}" )
  done
  local matches=0
  # 🦆 says ⮞ count how many trigrams from str1 appear in str2
  for t in "''${tri1[@]}"; do
    [[ " ''${tri2[*]} " == *" $t "* ]] && ((matches++))
  done
  # 🦆 says ⮞ calculate total number of trigrams
  local total=$(( ''${#tri1[@]} + ''${#tri2[@]} ))
  # 🦆 says ⮞ no trigrams?
  (( total == 0 )) && echo 0 && return
  # 🦆 says ⮞ return Dice's coefficient similarity × 100
  echo $(( 100 * 2 * matches / total ))  # 🦆 says ⮞ 0-100 scale
}
        
levenshtein_similarity() {
  local a="$1" b="$2"
  local len_a=''${#a} len_b=''${#b}
  local max_len=$(( len_a > len_b ? len_a : len_b ))   
  (( max_len == 0 )) && echo 100 && return     
  local dist=$(levenshtein "$a" "$b")
  local score=$(( 100 - (dist * 100 / max_len) ))         
  # 🦆 says ⮞ boostz da score for same startin' charizard yo
  [[ "''${a:0:1}" == "''${b:0:1}" ]] && score=$(( score + 10 ))
  echo $(( score > 100 ? 100 : score ))
}
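levenshtein_similarity calls a levenshtein helper that isn't shown in this post. A minimal sketch of what such a helper could look like, using the classic two-row dynamic programming formulation, is given below. It is an assumption about the helper's shape, not the module's actual code, and it is written as plain Bash, so it would need the usual ''${ escaping if embedded in a Nix string.

```bash
#!/usr/bin/env bash
# Edit distance between two strings (insert, delete, substitute = cost 1 each)
levenshtein() {
  local a="$1" b="$2"
  local len_a=${#a} len_b=${#b}
  local -a prev curr
  local i j cost del ins sub min
  # distance from the empty prefix of a to every prefix of b
  for (( j = 0; j <= len_b; j++ )); do prev[j]=$j; done
  for (( i = 1; i <= len_a; i++ )); do
    curr=("$i")
    for (( j = 1; j <= len_b; j++ )); do
      [[ "${a:i-1:1}" == "${b:j-1:1}" ]] && cost=0 || cost=1
      del=$(( prev[j] + 1 ))
      ins=$(( curr[j-1] + 1 ))
      sub=$(( prev[j-1] + cost ))
      min=$(( del < ins ? del : ins ))
      curr[j]=$(( min < sub ? min : sub ))
    done
    prev=("${curr[@]}")
  done
  echo "${prev[len_b]}"
}

levenshtein "kitten" "sitting"   # 3
```

The running time grows with the product of the two string lengths, which is one reason a cheap trigram check in front of it pays off.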

Bash script workflow

Takes --input "some natural language command" from the user.

Load intent data from a JSON file (includes regex patterns, substitutions, etc.).
Load fuzzy index for fallback fuzzy matching.
For each defined script: Load regex substitutions (entities) and store them in an associative array.

Exact Match Phase (Runs in Background):
Iterate through all defined scripts (in order of priority).
Apply regex substitutions to input text.
Try to match input via corresponding match_script functions.
If matched:
  Apply substitutions.
  Prepare arguments.
  Execute yo-script with those arguments.
  Signal fuzzy handler to stop.

Fuzzy Match Phase (Runs Concurrently):
Normalize input text.
Compute trigram + Levenshtein similarity scores against defined sentences.
Select best match above the threshold.
Apply the same substitution logic.
Match using match_fuzzy_script function.
If matched:
  Wait for exact matcher to finish.
  If exact didn't match, execute yo-script with resolved args.
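The substitution step that both phases share can be sketched without jq or sed. In the example below, entity_map, resolve_words, resolved_text and applied are hypothetical names, and the word-by-word lookup is a simplification of the regex-based substitutions the real module loads from its JSON file.

```bash
#!/usr/bin/env bash
# Hypothetical substitution table: spoken form -> canonical value
declare -A entity_map=( [automatic]=auto [manually]=manual [sure]=yes )

resolve_words() {
  local -a words out=()
  local w
  applied=()                          # global: "original>replacement" pairs
  read -ra words <<< "$1"
  for w in "${words[@]}"; do
    if [[ -n "${entity_map[$w]+x}" ]]; then   # is there an entry for this word?
      applied+=("$w>${entity_map[$w]}")
      out+=("${entity_map[$w]}")
    else
      out+=("$w")
    fi
  done
  resolved_text="${out[*]}"           # global: the rewritten sentence
}

resolve_words "please start test automatic sure"
echo "$resolved_text"                 # please start test auto yes
printf '%s\n' "${applied[@]}"
```

Keeping the list of applied substitutions next to the rewritten text is what lets the later steps log which spoken word was mapped to which value.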

Bash script for the NLP module

in { # 🦆 says ⮞ YOOOOOOOOOOOOOOOOOO
  yo.scripts = { # 🦆 says ⮞ quack quack quack quack quack.... qwack
    do = {
      description = "Natural language to Shell script translator with dynamic regex matching and automatic parameter resolution";
      aliases = ["b"];
      category = "⚙️ Configuration"; # 🦆 says ⮞ duckgorize iz zmart wen u hab many scriptz i'd say!
      logLevel = "WARNING";
      autoStart = false;
      parameters = [
        { name = "input"; description = "Text to parse into a yo command"; optional = false; }
        { name = "fuzzyThreshold"; description = "Minimum percentage for considering fuzzy matching successful. (1-100)"; default = "15"; }
      ];
      # 🦆 says ⮞ run yo do --help to display all defined voice commands
      helpFooter = ''
        WIDTH=$(tput cols) # 🦆 duck say ⮞ Auto detect width
        cat < 0' "$intent_data_file" 2>/dev/null || echo false)
          if [[ "$has_lists" != "true" ]]; then
            echo -n "$text"
            echo "|declare -A substitutions=()"  # 🦆 says ⮞ empty substitutions
            return
          fi
          # 🦆 says ⮞ dis is our quacktionary yo
          replacements=$(jq -r '.["'"$script"'"].substitutions[] | "\(.pattern)|\(.value)"' "$intent_data_file")
          while IFS="|" read -r pattern out; do
            if [[ -n "$pattern" && "$text" =~ $pattern ]]; then
              original="''${BASH_REMATCH[0]}"
              [[ -z "''$original" ]] && continue # 🦆 says ⮞ duck no like empty string
              substitutions["''$original"]="$out"
              substitution_applied=true # 🦆 says ⮞ track if any substitution was applied
              text=$(echo "$text" | sed -E "s/\\b$pattern\\b/$out/g") # 🦆 says ⮞ swap the word, flip the script
            fi
          done <<< "$replacements"
          echo -n "$text"
          echo "|$(declare -p substitutions)" # 🦆 says ⮞ returning da remixed sentence + da whole
        }        
        for f in "$MATCHER_DIR"/*.sh; do [[ -f "$f" ]] && source "$f"; done
        scripts_ordered_by_priority=( ${lib.concatMapStringsSep "\n" (name: "  \"${name}\"") processingOrder} )
        dt_info "$scripts_ordered_by_priority"
        find_best_fuzzy_match() {
          local input="$1"
          local best_score=0
          local best_match=""
          local normalized=$(echo "$input" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
          local candidates
          mapfile -t candidates < <(jq -r '.[] | .[] | "\(.script):\(.sentence)"' "$YO_FUZZY_INDEX")
          dt_debug "Found ''${#candidates[@]} candidates for fuzzy matching"
          for candidate in "''${candidates[@]}"; do
            IFS=':' read -r script sentence <<< "$candidate"
            local norm_sentence=$(echo "$sentence" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
            local tri_score=$(trigram_similarity "$normalized" "$norm_sentence")
            (( tri_score < 30 )) && continue
            local score=$(levenshtein_similarity "$normalized" "$norm_sentence")  
            if (( score > best_score )); then
              best_score=$score
              best_match="$script:$sentence"
              dt_info "New best match: $best_match ($score%)"
            fi
          done
          if [[ -n "$best_match" ]]; then
            echo "$best_match|$best_score"
          else
            echo ""
          fi
        }
           
        # 🦆 says ⮞ insert matchers, build da regex empire. yo
        ${lib.concatMapStrings (name: makePatternMatcher name) scriptNamesWithIntents}
        # 🦆 says ⮞ for dem scripts u defined intents for ..
        exact_match_handler() {
          for script in "''${scripts_ordered_by_priority[@]}"; do
            # 🦆 says ⮞ .. we insert wat YOU sayz & resolve entities wit dat yo
            resolved_output=$(resolve_entities "$script" "$text")
            resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
            dt_debug "Tried: match_''${script} '$resolved_text'"
            # 🦆 says ⮞ we declare som substitutionz from listz we have - duckz knowz why
            subs_decl=$(echo "$resolved_output" | cut -d'|' -f2-)
            declare -gA substitutions || true
            eval "$subs_decl" >/dev/null 2>&1 || true
            # 🦆 says ⮞ we hab a match quacky quacky diz sure iz hacky!
            if match_$script "$resolved_text"; then
              if [[ "$(declare -p substitutions 2>/dev/null)" =~ "declare -A" ]]; then
                for original in "''${!substitutions[@]}"; do
                  dt_debug "Substitution: $original >''${substitutions[$original]}";
                  [[ -n "$original" ]] && dt_info "$original > ''${substitutions[$original]}" # 🦆 says ⮞ see wat duck did there?
                done # 🦆 says ⮞ i hop duck pick dem right - right?
              fi
              args=() # 🦆 says ⮞ duck gettin' ready 2 build argumentz 4 u script
              for arg in "''${cmd_args[@]}"; do
                dt_debug "ADDING PARAMETER: $arg"
                args+=("$arg")  # 🦆 says ⮞ collecting them shell spell ingredients
              done

              # 🦆 says ⮞ final product - hope u like say duck!
              paramz="''${args[@]}" && echo
              echo "exact" > "$match_result_flag" # 🦆 says ⮞ tellz fuzzy handler we done
              dt_debug "Executing: yo $script $paramz"
              # 🦆 says ⮞ EXECUTEEEEEEEAAA  – HERE WE QUAAAAACKAAAOAA
              exec "yo-$script" "''${args[@]}"
              return 0
            fi
          done
        }

        ${lib.concatMapStrings (name: makeFuzzyPatternMatcher name) scriptNamesWithIntents}  
        # 🦆 SCREAMS ⮞ FUZZY WOOOO TO THE MOON
        fuzzy_match_handler() {
          resolved_output=$(resolve_entities "dummy" "$text") # We'll resolve properly after matching
          resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
          fuzzy_result=$(find_best_fuzzy_match "$resolved_text")
          [[ -z "$fuzzy_result" ]] && return 1

          IFS='|' read -r combined match_score <<< "$fuzzy_result"
          IFS=':' read -r matched_script matched_sentence <<< "$combined"
          dt_debug "Best fuzzy script: $matched_script" >&2

          # 🦆 says ⮞ resolve entities agein, diz time for matched script yo
          resolved_output=$(resolve_entities "$matched_script" "$text")
          resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
          subs_decl=$(echo "$resolved_output" | cut -d'|' -f2-)
          declare -gA substitutions || true
          eval "$subs_decl" >/dev/null 2>&1 || true

          # if (( best_score >= $FUZZY_THHRESHOLD )); then
          # 🦆 says ⮞ we hab a match quacky quacky diz sure iz hacky!
          if match_fuzzy_$matched_script "$resolved_text" "$matched_sentence"; then
            if [[ "$(declare -p substitutions 2>/dev/null)" =~ "declare -A" ]]; then
              for original in "''${!substitutions[@]}"; do
                dt_debug "Substitution: $original >''${substitutions[$original]}";
                [[ -n "$original" ]] && dt_info "$original > ''${substitutions[$original]}" # 🦆 says ⮞ see wat duck did there?
              done # 🦆 says ⮞ i hop duck pick dem right - right?
            fi
            args=() # 🦆 says ⮞ duck gettin' ready 2 build argumentz 4 u script
            for arg in "''${cmd_args[@]}"; do
              dt_debug "ADDING PARAMETER: $arg"
              args+=("$arg")  # 🦆 says ⮞ collecting them shell spell ingredients
            done
            # 🦆 says ⮞ wait for exact match to finish
            # while kill -0 "$pid1" 2>/dev/null; do
            while [[ ! -f "$match_result_flag" || $(cat "$match_result_flag") != "exact_finished" ]]; do
              sleep 0.05
            done
            # 🦆 says ⮞ checkz if exact match succeeded yo
            if [[ $(cat "$match_result_flag") == "exact" ]]; then
              dt_debug "Exact match already handled execution. Fuzzy exiting."
              exit 0
            fi

            # 🦆 says ⮞ final product - hope u like say duck!
            paramz="''${args[@]}" && echo
            dt_info "Executing: yo $matched_script $paramz"
            # 🦆 says ⮞ EXECUTEEEEEEEAAA  – HERE WE QUAAAAACKAAAOAA
            exec "yo-$matched_script" "''${args[@]}"
            return 0
          fi
        }

        # 🦆 says ⮞ if exact match winz, no need for fuzz! but fuzz ready to quack when regex chokes
        exact_match_handler &
        pid1=$!
        fuzzy_match_handler
        exit
      '';
    };    
   

Automated Sentence Testing

As things grew and my bin directory got thicker, the system became unmaintainable without proper testing, and I for sure was not going to do this manually. (Remember the combinatorial explosion?)
So I decided to go the automated sentence testing route.
I wrote a comprehensive automated test harness as one of the CLI commands itself.

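Testing Approach:

- Positive cases: every sentence template defined for a script is expanded into a concrete sentence (first alternative, optional words included, sample parameter values) and must match that script.
- Negative cases: unrelated sentences such as "make me a sandwich" must not match any script.
- Boundary cases: empty, whitespace-only and punctuation-only input must not match any script.

The Bash script for automated testing: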

    # 🦆 says ⮞ automatic doin' sentencin' testin'
    tests = { # 🦆 says ⮞ just run yo tests to do an extensive automated test based on your defined sentence data
      description = "Extensive automated sentence testing for the NLP";
      category = "⚙️ Configuration";
      autoStart = false;
      logLevel = "INFO";
      parameters = [{ name = "input"; description = "Text to test as a single  sentence test"; optional = true; }];       
      code = ''    
        set +u  
        ${cmdHelpers}
        intent_data_file="${intentDataFile}" # 🦆 says ⮞ cache dat JSON wisdom, duck hates slowridez
        intent_base_path="${intentBasePath}" # 🦆 says ⮞ use da prebuilt path yo
        config_json="{}" # no script is selected yet; resolve_sentence loads the real config per script
        passed_positive=0
        total_positive=0
        passed_negative=0
        total_negative=0
        passed_boundary=0
        failures=()     
        resolve_sentence() {
          local script="$1"
          config_json=$(nix eval "$intent_base_path.$script" --json 2>/dev/null)
          [ -z "$config_json" ] && config_json="{}"          
          local sentence="$2"    
          local parameters # 🦆 says ⮞ first replace parameters to avoid conflictz wit regex processin' yo
          mapfile -t parameters < <(grep -oP '{\K[^}]+' <<< "$sentence")
          for param in "''${parameters[@]}"; do
            is_wildcard=$(jq -r --arg param "$param" '.data[0].lists[$param].wildcard // "false"' <<< "$config_json" 2>/dev/null)
            local replacement=""
            if [[ "$is_wildcard" == "true" ]]; then
              # 🦆 says ⮞ use da context valuez
              if [[ "$param" =~ hour|minute|second ]]; then
                replacement="1"  # 🦆 says ⮞ use numbers for time parameters
              elif [[ "$param" =~ room|device ]]; then
                replacement="livingroom" # 🦆 says ⮞ use realistic room names
              else
                replacement="test" # 🦆 says ⮞ generic test value
              fi
            else
              mapfile -t outs < <(jq -r --arg param "$param" '.data[0].lists[$param].values[].out' <<< "$config_json" 2>/dev/null)
              if [[ ''${#outs[@]} -gt 0 ]]; then
                replacement="''${outs[0]}"
              else
                replacement="unknown"
              fi
            fi
            sentence="''${sentence//\{$param\}/$replacement}"
          done # 🦆 says ⮞ process regex patterns after parameter replacement
          # 🦆 says ⮞ handle alternatives - (word1|word2) == pick first alternative
          sentence=$(echo "$sentence" | sed -E 's/\(([^|)]+)(\|[^)]+)?\)/\1/g')
          # 🦆 says ⮞ handle optional wordz - [word] == include da word
          sentence=$(echo "$sentence" | sed -E 's/\[([^]]+)\]/ \1 /g')
          # 🦆 says ⮞ handle vertical bars in alternatives - word1|word2 == word1
          sentence=$(echo "$sentence" | sed -E 's/(^|\s)\|(\s|$)/ /g')  # 🦆 says ⮞ remove standalone vertical bars
          sentence=$(echo "$sentence" | sed -E 's/([^ ]+)\|([^ ]+)/\1/g')  # 🦆 says ⮞ pick first alternative in groups
          # 🦆 says ⮞ clean up spaces
          sentence=$(echo "$sentence" | tr -s ' ' | sed -e 's/^ //' -e 's/ $//')
          echo "$sentence"
        }
        if [[ -n "$input" ]]; then
            echo "[🦆📜] Testing single input: '$input'"
            FUZZY_THRESHOLD=15
            YO_FUZZY_INDEX="${fuzzyIndexFile}"
            priorityList="${toString (lib.concatStringsSep " " processingOrder)}"
            scripts_ordered_by_priority=($priorityList)
            ${lib.concatMapStrings (name: makePatternMatcher name) scriptNamesWithIntents}
            ${lib.concatMapStrings (name: makeFuzzyPatternMatcher name) scriptNamesWithIntents}
            for f in "$MATCHER_DIR"/*.sh; do [[ -f "$f" ]] && source "$f"; done
            find_best_fuzzy_match() {
              local input="$1"
              local best_score=0
              local best_match=""
              local normalized=$(echo "$input" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
              local candidates
              mapfile -t candidates < <(jq -r '.[] | .[] | "\(.script):\(.sentence)"' "$YO_FUZZY_INDEX")
              dt_debug "Found ''${#candidates[@]} candidates for fuzzy matching"
              for candidate in "''${candidates[@]}"; do
                IFS=':' read -r script sentence <<< "$candidate"
                local norm_sentence=$(echo "$sentence" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
                local tri_score=$(trigram_similarity "$normalized" "$norm_sentence")
                (( tri_score < 30 )) && continue
                local score=$(levenshtein_similarity "$normalized" "$norm_sentence")  
                if (( score > best_score )); then
                  best_score=$score
                  best_match="$script:$sentence"
                  dt_info "New best match: $best_match ($score%)"
                fi
              done
              if [[ -n "$best_match" ]]; then
                echo "$best_match|$best_score"
              else
                echo ""
              fi
            }
            test_single_input() {
                local input="$1"
                dt_info "Testing input: '$input'"
                for script in "''${scripts_ordered_by_priority[@]}"; do
                    resolved_output=$(resolve_entities "$script" "$input")
                    resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
                    dt_debug "Trying exact match: $script '$resolved_text'" 
                    if match_$script "$resolved_text"; then
                        dt_info "✅ EXACT MATCH: $script"
                        dt_info "Parameters:"
                        for arg in "''${cmd_args[@]}"; do
                            dt_info "  - $arg"
                        done
                        return 0
                    fi
                done
                dt_info "No exact match found. Attempting fuzzy match..."
                fuzzy_result=$(find_best_fuzzy_match "$input")
                if [[ -z "$fuzzy_result" ]]; then
                    dt_info "โŒ No fuzzy candidates found"
                    return 1
                fi  
                IFS='|' read -r combined match_score <<< "$fuzzy_result"
                IFS=':' read -r matched_script matched_sentence <<< "$combined"
                dt_info "Best fuzzy candidate: $matched_script (score: $match_score%)"
                dt_info "Matched sentence: '$matched_sentence'"
                resolved_output=$(resolve_entities "$matched_script" "$input")
                resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
                if match_fuzzy_$matched_script "$resolved_text" "$matched_sentence"; then
                    dt_info "✅ FUZZY MATCH ACCEPTED: $matched_script"
                    dt_info "Parameters:"
                    for arg in "''${cmd_args[@]}"; do
                        dt_info "  - $arg"
                    done
                    return 0
                else
                    dt_info "โŒ Fuzzy match rejected (parameter resolution failed)"
                    return 1
                fi
            }
            test_single_input "$input"
            exit $?
        fi
    
        # 🦆 says ⮞ insert matchers
        ${lib.concatMapStrings (name: makePatternMatcher name) scriptNamesWithIntents}  
        test_positive_cases() {
          for script in ${toString scriptNamesWithIntents}; do
            echo "[๐Ÿฆ†๐Ÿ“œ] Testing script: $script"    
            config_json=$(nix eval "$intent_base_path.$script" --json 2>/dev/null || echo "{}")
            mapfile -t raw_sentences < <(jq -r '.data[].sentences[]' <<< "$config_json" 2>/dev/null)    
            for template in "''${raw_sentences[@]}"; do
              test_sentence=$(resolve_sentence "$script" "$template")
              echo " Testing: $test_sentence"
              resolved_output=$(resolve_entities "$script" "$test_sentence")
              resolved_text=$(echo "$resolved_output" | cut -d'|' -f1)
              subs_decl=$(echo "$resolved_output" | cut -d'|' -f2-)
              declare -gA substitutions || true
              eval "$subs_decl" >/dev/null 2>&1 || true
              if match_$script "$resolved_text"; then
                say_duck "yay ✅ PASS: $resolved_text"
                ((passed_positive++))
              else
                say_duck "fuck ❌ FAIL: $resolved_text"
                failures+=("POSITIVE: $script | $resolved_text")
              fi
              ((total_positive++))
            done
          done
        }
        test_negative_cases() {
          echo "[🦆🚫] Testing Negative Cases"
          negative_cases=(
            "make me a sandwich"
            "launch the nuclear torpedos!"
            "gör mig en macka"
            "avfyra kärnvapnen!"
            "ducks sure are the best dont you agree"
          )        
          for neg_case in "''${negative_cases[@]}"; do
            echo " Testing: $neg_case"
            matched=false
            for script in ${toString scriptNamesWithIntents}; do
              resolved_output=$(resolve_entities "$script" "$neg_case")
              resolved_neg=$(echo "$resolved_output" | cut -d'|' -f1)     
              if match_$script "$resolved_neg"; then
                say_duck "fuck ❌ FALSE POSITIVE: $resolved_neg (matched by $script)"
                failures+=("NEGATIVE: $script | $resolved_neg")
                matched=true
                break
              fi
            done       
            if ! $matched; then
              say_duck "yay ✅ [NEG] PASS: $neg_case"
              ((passed_negative++))
            fi
            ((total_negative++))
          done
        }
        test_boundary_cases() {
          echo "[🦆🔲] Testing Boundary Cases"
          boundary_cases=("" "   " "." "!@#$%^&*()")  
          for bcase in "''${boundary_cases[@]}"; do
            printf " Testing: '%s'\n" "$bcase"
            matched=false   
            for script in ${toString scriptNamesWithIntents}; do
              if match_$script "$bcase"; then
                say_duck "fuck ❌ BOUNDARY FAIL: '$bcase' (matched by $script)"
                failures+=("BOUNDARY: $script | '$bcase'")
                matched=true
                break
              fi
            done       
            if ! $matched; then
              say_duck "yay ✅ [BND] PASS: '$bcase'"
              ((passed_boundary++))
            fi
          done
          total_boundary=''${#boundary_cases[@]}
        }  
        test_positive_cases
        test_negative_cases
        test_boundary_cases
        
        # 🦆 says ⮞ calculate
        total_tests=$((total_positive + total_negative + total_boundary))
        passed_tests=$((passed_positive + passed_negative + passed_boundary))
        percent=$(( 100 * passed_tests / total_tests ))
        
        # 🦆 says ⮞ colorize based on percentage
        if [ "$percent" -ge 80 ]; then
            color="$GREEN" && duck_report="⭐"
        elif [ "$percent" -ge 60 ]; then
            color="$YELLOW" && duck_report="🟢"
        else
            color="$RED" && duck_report="😭"
        fi
        
        # 🦆 says ⮞ display failed tests report
        if [ "$passed_tests" -ne "$total_tests" ]; then
            if [ ''${#failures[@]} -gt 0 ]; then
                echo "" && echo -e "''${RED}## ────── FAILURES ──────##''${RESET}"
                for failure in "''${failures[@]}"; do
                    echo -e "''${RED}## ❌ $failure''${RESET}"
                done
                echo -e "''${RED}## ────── FAILURES ──────##''${RESET}"
            fi
        fi
        
        # 🦆 says ⮞ display final report
        echo "" && echo -e "''${color}## ──────⋆⋅☆⋅⋆────── ##''${RESET}"
        bold "Testing completed!" 
        say_duck "Positive: $passed_positive/$total_positive"
        say_duck "Negative: $passed_negative/$total_negative"
        say_duck "Boundary: $passed_boundary/$total_boundary"
        say_duck "TOTAL: $passed_tests/$total_tests (''${color}''${percent}%''${GRAY})"
        echo "''${RESET}" && echo -e "''${color}## ──────⋆⋅☆⋅⋆────── ##''${RESET}"
        say_duck "$duck_report"
        # exit 0 only when every test passed, so callers can trust the exit status
        (( passed_tests == total_tests )) && exit 0
        exit 1
      ''; # 🦆 says ⮞ thnx for quackin' along til da end!
    };# 🦆 says ⮞ the duck be stateless, the regex be law, and da shell... is my pond.
  };}# 🦆 say ⮞ nobody beat diz nlp nao says sir quack a lot NOBODY I SAY!
# 🦆 says ⮞ QuackHack-McBLindy out!

Where I Landed

Dynamic Pattern Matching

Unlimited automatic parameter resolution & entity substitutions through dynamically generated regex patterns matching against declarative sentence definitions.

Shell command construction & dispatcher.
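
To make "dynamically generated regex patterns" concrete, here is a minimal sketch of the idea in plain Bash. It is not the generator used by the module: `template_to_regex` and `match_template` are hypothetical helpers, and the sketch only assumes the template syntax seen in the test harness, where `(a|b)` is an alternative, `[word]` is optional and `{name}` is a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a sentence template into an anchored ERE.
template_to_regex() {
  local t="$1"
  t=$(sed -E 's/ \[([^]]+)\]/( \1)?/g' <<< "$t")   # " [the]"   -> "( the)?"
  t=$(sed -E 's/\{[^}]+\}/(.+)/g' <<< "$t")        # "{device}" -> "(.+)"
  printf '^%s$\n' "$t"
}

# Hypothetical helper: match input against a template and print the captures.
# Group 1 is the (on|off) alternative, group 3 is the {device} parameter.
match_template() {
  local regex
  regex=$(template_to_regex "$1")
  if [[ "$2" =~ $regex ]]; then
    echo "state=${BASH_REMATCH[1]} device=${BASH_REMATCH[3]}"
  else
    echo "no match"
  fi
}

template_to_regex "turn (on|off) [the] {device}"
match_template "turn (on|off) [the] {device}" "turn off lamp"
match_template "turn (on|off) [the] {device}" "make me a sandwich"
```

The captured groups are what would become the arguments handed to the target script, which is the "command construction" half of the dispatcher.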

Hybrid Matching System

Exact pattern matching with an asynchronous fuzzy matching fallback.
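
A minimal sketch of the exact-then-fuzzy idea, assuming a plain Levenshtein similarity as the fuzzy score. The sentence list, the `resolve` helper and the 80% cut-off are hypothetical and chosen only for illustration; the module itself pre-filters candidates with a trigram score and uses its own threshold.

```shell
#!/usr/bin/env bash
# Levenshtein distance with a rolling row (pure Bash).
levenshtein() {
  local a="$1" b="$2" i j cost best
  local -a prev cur
  for (( j = 0; j <= ${#b}; j++ )); do prev[j]=$j; done
  for (( i = 1; i <= ${#a}; i++ )); do
    cur=("$i")
    for (( j = 1; j <= ${#b}; j++ )); do
      cost=1
      if [[ "${a:i-1:1}" == "${b:j-1:1}" ]]; then cost=0; fi
      best=$(( prev[j] + 1 ))                                                    # deletion
      if (( cur[j-1] + 1 < best )); then best=$(( cur[j-1] + 1 )); fi            # insertion
      if (( prev[j-1] + cost < best )); then best=$(( prev[j-1] + cost )); fi    # substitution
      cur[j]=$best
    done
    prev=("${cur[@]}")
  done
  echo "${prev[${#b}]}"
}

# Similarity as an integer percentage of the longer string's length.
similarity() {
  local a="$1" b="$2" max=${#1}
  if (( ${#b} > max )); then max=${#b}; fi
  if (( max == 0 )); then echo 100; return; fi
  echo $(( 100 - 100 * $(levenshtein "$a" "$b") / max ))
}

sentences=("turn on the tv" "play some music" "what time is it")  # hypothetical
THRESHOLD=80                                                      # hypothetical cut-off

resolve() {
  local input="$1" s score best="" best_score=0
  for s in "${sentences[@]}"; do          # 1) exact pass
    if [[ "$input" == "$s" ]]; then echo "exact: $s"; return 0; fi
  done
  for s in "${sentences[@]}"; do          # 2) fuzzy fallback
    score=$(similarity "$input" "$s")
    if (( score > best_score )); then best_score=$score; best="$s"; fi
  done
  if (( best_score >= THRESHOLD )); then
    echo "fuzzy: $best ($best_score%)"
  else
    echo "no match"
    return 1
  fi
}

resolve "turn on the tv"
resolve "turn on teh tv"
resolve "make me a sandwich" || true
```

The quadratic inner loop is the reason a cheap pre-filter in front of the Levenshtein pass pays off once the sentence list grows.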

Comprehensive Testing

Automated test harness covering positive, negative and boundary cases, with detailed reporting for every defined sentence.
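
The core of the harness fits in a few lines once the details are stripped away. The sketch below is a simplified stand-in, not the real command: the `sample` values, the single hard-coded `matcher` and the `check` helper are hypothetical, whereas the real harness reads its values from the intent data and runs the generated matchers.

```shell
#!/usr/bin/env bash
# Hypothetical sample values for parameters.
declare -A sample=([device]="lamp" [room]="livingroom")

# Expand a template into one concrete test sentence:
# parameters get a sample value, (a|b) picks a, [word] keeps the word.
expand_template() {
  local s="$1" name value re='\{([^}]+)\}'
  while [[ "$s" =~ $re ]]; do
    name="${BASH_REMATCH[1]}"
    value="${sample[$name]:-test}"
    s="${s//\{$name\}/$value}"
  done
  sed -E 's/\(([^|)]+)(\|[^)]*)?\)/\1/g; s/\[([^]]+)\]/\1/g' <<< "$s"
}

# Hypothetical matcher standing in for a generated match_<script> function.
matcher() {
  local re='^turn (on|off)( the)? .+$'
  [[ "$1" =~ $re ]]
}

passed=0
total=0
check() {   # usage: check <match|nomatch> <sentence>
  local expect="$1" sentence="$2" got="nomatch"
  if matcher "$sentence"; then got="match"; fi
  total=$(( total + 1 ))
  if [[ "$got" == "$expect" ]]; then
    passed=$(( passed + 1 ))
    echo "PASS: '$sentence'"
  else
    echo "FAIL: '$sentence'"
  fi
}

check match   "$(expand_template "turn (on|off) [the] {device}")"  # positive
check nomatch "make me a sandwich"                                 # negative
check nomatch ""                                                   # boundary
echo "TOTAL: $passed/$total ($(( 100 * passed / total ))%)"
```

A real run would finish by exiting non-zero whenever `passed` and `total` differ, so the result can gate a rebuild.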

The result is a voice-command ready system that feels like magic when it works and teaches you humility when it doesn't. It's unorthodox, occasionally infuriating, but ultimately empowering - letting me command my computer in a way that feels truly natural and powerful.
Even with a lot of bad Whisper transcriptions we are looking at a low 2% failure rate for called intents.
I can be really drunk and speak Japanese and it would still properly match and extract all parameters.

# 🦆 says ⮞ "What's the stupidest way YOU could solve your next problem?"

Please go stupid!
Look how well this turned out.
After over five years of experimenting with building that voice command empire,
I can safely say that this module is by far the most accurate intent handler I have tried so far.
I wish it handled speed a little better when scaling, but that might be a project for another day.
Well, I hope you enjoyed the funky ride and hopefully learned something.
Maybe I'll throw up some more texts if this was liked, because I do have a ton of strange modules like this lying around.

Peace and code QuackHack-McBLindy out yo!

View source code on GitHub
