Using LLaMA.cpp
You can find the full llama.cpp documentation in the llama.cpp GitHub repository (https://github.com/ggerganov/llama.cpp).
Step 1 - Clone the repo
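As a minimal sketch (assuming you have git installed and want the upstream repository), cloning looks like this:

```bash
# Clone the llama.cpp repository and move into it
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```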
Step 2 - Download the model
For example, we will use the OpenChat 3.5 model, which is what the demo instance uses. There are many other models to choose from.
Navigate to TheBloke/openchat_3.5-GGUF and download one of the files, such as openchat_3.5.Q5_K_M.gguf. Place this file inside the ./models directory.
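You can download the file through the Hugging Face web UI, or fetch it from the command line. This sketch assumes the standard Hugging Face download URL pattern for the file named above:

```bash
# Download the quantized GGUF file into the ./models directory
wget -P models https://huggingface.co/TheBloke/openchat_3.5-GGUF/resolve/main/openchat_3.5.Q5_K_M.gguf
```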
Step 3 - Build the server
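The exact build command depends on your llama.cpp version and platform. At the time this guide's `./server` binary was current, a plain Makefile build was the common path; this is a sketch that does not cover GPU-specific flags:

```bash
# Build the server target (or run plain `make` to build all binaries)
make server
```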
Step 4 - Run the server
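A minimal invocation, assuming the model file downloaded in Step 2 and the default host/port, might look like the following; flags such as `-c` (context size) and `--port` are optional:

```bash
# Start the server with the downloaded model
./server -m models/openchat_3.5.Q5_K_M.gguf -c 4096 --host 0.0.0.0 --port 8080
```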
Read the llama.cpp documentation for more information on the server options, or run ./server --help.
Step 5 - Enable the server in the client
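How you enable the server depends on your client; typically you point it at the server's URL (for example http://localhost:8080). Before doing so, you can sanity-check that the server is responding. This sketch assumes the default port and llama.cpp's /completion endpoint:

```bash
# Quick check that the server answers completion requests
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "n_predict": 16}'
```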