I'm a contributor / maintainer of llama.cpp, but even I'll use Ollama sometimes -- especially if I'm trying to get coworkers or friends up and running with LLMs. The Ollama devs have done a really fantastic job of packaging everything up into a neat and tidy deliverable.
Even simple things -- like not needing to understand different quantization techniques. What's the difference between Q4_K and Q5_1, and what the heck do I want? The Ollama devs don't let you choose -- they say "you can have any quantization level you want as long as it's Q4_0" and leave it at that. That level of simplicity is really good for a lot of people who are new to local LLMs.
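For a sense of what that choice looks like on the llama.cpp side, it's roughly this (a sketch from memory -- the binary has been called both `quantize` and `llama-quantize` depending on version, and the filenames here are made up):

    # pick your own quantization level with llama.cpp's quantize tool
    ./llama-quantize ./my-model-f16.gguf ./my-model-Q4_0.gguf Q4_0
    # ...or Q4_K_M, Q5_1, Q8_0, etc. -- exactly the decision ollama makes for you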
I spent HOURS setting up llama.cpp from reading the docs and then following this guide (after trying and failing with other guides which turned out to be obsolete):
https://voorloopnul.com/blog/quantize-and-run-the-original-llama3-8b-with-llama-cpp/
Using llama.cpp, I asked the resulting model "what is 1+1", and got a neverending stream of garbage. See below. So no, it is not anywhere near as easy to get started with llama.cpp.
--------------------------------
what is 1+1?") and then the next line would be ("What is 2+2?"), and so on.
How can I make sure that I am getting the correct answer in each case?
Here is the code that I am using:
\begin{code}
import random

def math_problem():
    num1 = random.randint(1, 10)
    num2 = random.randint(1, 10)
    problem = f"What is {num1}+{num2}? "
    return problem

def answer_checker(user_answer):
    num1 = random.randint(1, 10)
    num2 = random.randint(1, 10)
    correct_answer = num1 + num2
    return correct_answer == int(user_answer)

def main():
    print("Welcome to the math problem generator!")
    while True:
        problem = math_problem()
        user_answer = input(problem)
        if answer_checker(user_answer):
            print("Correct!")
        else:
            print("Incorrect. Try again!")

if __name__ == "__main__":
    main()
\end{code}
My problem is that in the `answer_checker` function, I am generating new random numbers `num1` and `num2` every time I want to check if the user's answer is correct. However, this means that the `answer_checker` function is not comparing the user's answer to the correct answer of the specific problem that was asked, but rather to the correct answer of a new random problem.
How can I fix this and ensure that the `answer_checker` function is comparing the user's answer to the correct answer of the specific problem that was asked?
Answer: To fix this....
--------------------------------
Check out the docker option if you don’t want to build/install llama.cpp.
https://github.com/ggerganov/llama.cpp/tree/master/examples/server
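If you go that route, the invocation is something like this (from memory of that README, so double-check the image tag and flags there):

    docker run -p 8080:8080 -v /path/to/models:/models \
      ghcr.io/ggerganov/llama.cpp:server \
      -m /models/my-model.Q4_0.gguf -c 4096 --host 0.0.0.0 --port 8080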
Much like how I got into Linux via linux-on-vfat-on-msdos and wouldn't have gotten into Linux otherwise, ollama got me into llama.cpp by making me understand what was possible.
Then again, I am Gen X and we are notoriously full of lead poisoning.
Mm... running a llama.cpp server is annoying: which model to use? Is it in the right format? What should I set `ngl` to? However, perhaps it would be fairer and more accurate to say that installing llama.cpp and installing ollama have slightly different effort levels (one taking about 3 minutes to clone and run `make`, the other taking about 20 seconds to download).
Once you have them installed, just typing: `ollama run llama3` is quite convenient, compared to finding the right arguments for the llama.cpp `server`.
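For comparison, the llama.cpp side ends up being something like this (a sketch; the GGUF filename is whatever you downloaded, and `ngl` depends on how many layers fit on your GPU):

    ./server -m models/Meta-Llama-3-8B-Instruct.Q4_0.gguf -c 8192 -ngl 99 --port 8080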
Sensible defaults. Installs llama.cpp. Downloads the model for you. Runs the server for you. Nice.
> it took me 2 minutes to get ollama working
So, you know, I think it's broadly speaking a fair sentiment, even if it probably isn't quite true.
...
However, when you look at it from that perspective, some things stand out:
- ollama is basically just a wrapper around llama.cpp
- ollama doesn't let you do all the things llama.cpp does
- ollama offers absolutely zero way, not even the hint of a suggestion of one, to move from using ollama to using llama.cpp if you need anything more.
Here are some interesting questions:
- Why can't I just run llama.cpp's server with the defaults from ollama?
- Why can't I get a simple dump of the 'sensible' defaults from ollama that it uses?
- Why can't I get a simple dump of the GGUF (or whatever) model file ollama uses?
- Why isn't 'a list of sensible defaults' just a github repository with download link and a list of params to use?
- Who's paying for the enormous cost of hosting all those ollama model files and converting them into usable formats?
The project is convenient, and if you need an easy way to get started, absolutely use it.
...but, I guess, I recommend you learn how to use llama.cpp itself at some point, because most free things are only free while someone else is paying for them.
Consider this:
If ollama's free hosted models were no longer free and you had to manually find and download your own model files, would you still use it? Could you still use it?
If not... maybe, don't base your business / anything important around it.
It's a SaaS with an open source client, and you're using the free plan.
And they have docs explaining exactly how to use arbitrary GGUF files to make your own model files. https://github.com/ollama/ollama/blob/main/docs/import.md
I don't feel any worse about Ollama funding the hosting and bandwidth of all of these models than I do about their upstream hosting source being Huggingface, which shares the same concerns.
It's reasonable to assume that sooner or later ollama will start monetizing too; or they won't exist anymore after they burn through their funding.
All I’m saying is that what you get with ollama is being paid for by VC funding and the open source client is a loss leader for the hosted service.
Whether you care or not is up to you; but I think llama.cpp is currently a more sustainable project.
Make your own decisions. /shrug
I would absolutely still use it; I've already ended up feeding it gguf files that weren't in their curated options. The process (starting from having foo.gguf) is literally just:
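Roughly this, from memory of the import docs linked upthread (the name "foo" is just whatever you want to call the model):

    # Modelfile -- one line pointing at the gguf
    FROM ./foo.gguf

    ollama create foo -f Modelfile
    ollama run foo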
(Do I wish there was an option like `ollama create --from-gguf` to skip the Modelfile? Oh yes. Do I kinda get why it exists? Also yes (it lets you bake in a prompt and IIRC other settings). Do I really care? Nope, it's low on the list of modestly annoying bits of friction in the world.)