So, for the past couple of days I've been downloading small models to test them on my small APU-based machine with Llamafile.
Llamafile is a one-click solution for running local Large Language Models (LLMs). Officially, Llamafile comes in various distributions containing one model each, but you can also download the application and models separately; in that case, the model must be passed to Llamafile via the -m MODELNAME terminal argument. Llamafile accepts the *.gguf model format.
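For example, on Linux, running a separately downloaded model looks roughly like this (the model filename here is just a placeholder):
chmod +x llamafile-0.9.3                      # make the downloaded Llamafile binary executable
./llamafile-0.9.3 -m some-model.Q5_K_M.gguf   # point it at any GGUF model file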
On my machine, any model bigger than 4 billion parameters would be very slow, so 8B was out of the question (even with quantization, though I'd like to test a model in Q2 format). I therefore downloaded models of 3B and smaller. Of course, I never expected them to be any good compared to the models available online.
In this article, I'll talk about how I organize my workflow. I collected multiple models (mostly abliterated ones, as I'm less likely to be able to run them online), and I put them in a sub-folder to avoid clutter. Here's my folder structure:
AI_STUFF_FOLDER
├── bin
│ ├── FluentlyQwen3-1.7B.i1-Q5_K_M.gguf
│ ├── gemma-3-1b-it-abliterated-GRPO.Q6_K.gguf
│ ├── ggml-model-i2_s.gguf
│ ├── huihui-ai.Qwen3-0.6B-abliterated.Q6_K.gguf
│ ├── llamafile-0.9.3
│ └── Phi-3.5-mini-instruct_Uncensored-Q5_K_L.gguf
└── llamafile.sh
I also didn't want to copy/paste the file names every time, so I used the Venice Coding Model to create a script that runs Llamafile for me. I created both a .sh version for Linux and a .bat version for Windows.
The main way to run Llamafile is to open a terminal (or PowerShell on Windows) and type ./llamafile-0.9.3 -m MODELNAME (replace 0.9.3 with your version number). Since I want to be able to use any version of Llamafile and any model, the script keeps the Llamafile version as a configurable value at the top and lets me choose which model to run. Here's the script in its current state:
#!/bin/bash

##############################################
# CONFIGURABLE DEFAULTS - CHANGE THESE VALUES #
##############################################
LLAMAFILE_VERSION="llamafile-0.9.3"
DEFAULT_HOST="127.0.0.1"
DEFAULT_PORT="8080"
DEFAULT_MODEL_DIR="bin"
##############################################

# Check if -m or --model flag exists in arguments
has_model_flag=0
has_server_flag=0
for arg in "$@"; do
    if [ "$arg" = "-m" ] || [[ "$arg" == --model* ]]; then
        has_model_flag=1
    elif [ "$arg" = "--server" ] || [ "$arg" = "-s" ]; then
        has_server_flag=1
    fi
done

if [ $has_model_flag -eq 1 ]; then
    # Direct execution with all arguments
    exec "./bin/$LLAMAFILE_VERSION" "$@"
else
    # Find all GGUF models in model directory
    models=()
    while IFS= read -r -d $'\0' model; do
        models+=("$model")
    done < <(find "$DEFAULT_MODEL_DIR" -maxdepth 1 -type f -name '*.gguf' -print0 2>/dev/null)

    # Handle no models found
    if [ ${#models[@]} -eq 0 ]; then
        echo "ERROR: No .gguf models found in $DEFAULT_MODEL_DIR folder" >&2
        exit 1
    fi

    # Auto-select if only one model exists
    if [ ${#models[@]} -eq 1 ]; then
        selected_model="${models[0]}"
    else
        # Prepare model selection menu
        basenames=()
        for model in "${models[@]}"; do
            basenames+=("$(basename "$model")")
        done

        while true; do
            clear
            echo "=== AVAILABLE MODELS ==="
            for i in "${!basenames[@]}"; do
                printf "%3d) %s\n" $((i+1)) "${basenames[i]}"
            done
            echo -ne "\nSelect model by number or partial name: "
            read choice

            # Process selection
            if [[ "$choice" =~ ^[0-9]+$ ]]; then
                idx=$((choice-1))
                if (( idx >= 0 && idx < ${#models[@]} )); then
                    selected_model="${models[idx]}"
                    break
                else
                    echo -e "\nInvalid number. Please enter a number between 1 and ${#models[@]}."
                    echo "Press any key to continue..."
                    read -rsn1
                fi
            else
                # Fuzzy search by name (case-insensitive)
                matches=()
                choice_lower=$(echo "$choice" | tr '[:upper:]' '[:lower:]')
                for i in "${!basenames[@]}"; do
                    base_lower=$(echo "${basenames[i]}" | tr '[:upper:]' '[:lower:]')
                    if [[ "$base_lower" == *"$choice_lower"* ]]; then
                        matches+=("$i")
                    fi
                done

                if [ ${#matches[@]} -eq 1 ]; then
                    selected_model="${models[${matches[0]}]}"
                    break
                elif [ ${#matches[@]} -eq 0 ]; then
                    echo -e "\nNo matches found for '$choice'."
                else
                    echo -e "\nMultiple matches found (${#matches[@]}):"
                    for i in "${matches[@]}"; do
                        printf " %3d) %s\n" $((i+1)) "${basenames[i]}"
                    done
                fi
                echo "Press any key to continue..."
                read -rsn1
            fi
        done
    fi

    # Build final command with server defaults if needed
    if [ $has_server_flag -eq 1 ]; then
        has_host=0
        has_port=0
        # Check if host/port already specified in arguments
        for ((i=1; i<=$#; i++)); do
            arg="${!i}"
            if [[ "$arg" == "--host"* || "$arg" == "-h"* ]]; then
                has_host=1
            elif [[ "$arg" == "--port"* || "$arg" == "-p"* ]]; then
                has_port=1
            fi
        done

        # Add default host if not specified
        if [ $has_host -eq 0 ]; then
            set -- "$@" "--host" "$DEFAULT_HOST"
        fi
        # Add default port if not specified
        if [ $has_port -eq 0 ]; then
            set -- "$@" "--port" "$DEFAULT_PORT"
        fi
    fi

    # Execute with all arguments
    exec "./bin/$LLAMAFILE_VERSION" -m "$selected_model" "$@"
fi
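For reference, this is how I call it from the AI_STUFF_FOLDER directory (the behaviour noted in the comments follows from the script above):
chmod +x llamafile.sh
./llamafile.sh                     # no arguments: shows the model selection menu
./llamafile.sh --server            # pick a model, then serve on 127.0.0.1:8080 by default
./llamafile.sh -m bin/huihui-ai.Qwen3-0.6B-abliterated.Q6_K.gguf   # -m skips the menu entirely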
There are more features I want to add. For example, arguments that choose the smallest model, the largest one, or even a random one; a feature that passes different values for top_k and temperature depending on the model; and a model-specific config file... But all of these ideas may be way too complex, so I'll refrain for now.
What do you think?
In the future, I'd like to test these models and find the best use-cases for them. I'd love to buy a stronger machine that could run 25B (or even 70B) models, but that's currently out of my price range.
I also downloaded a couple of MoE (Mixture of Experts) models, but they didn't work for some reason... They may be incompatible with Llamafile, or they may be broken... Both models are Qwen-based and created by the same person, so I'd like to test a MoE model created by a different account in the future.
Posted Using INLEO