I'm a contributor / maintainer of llama.cpp, but even I'll use Ollama sometimes -- especially if I'm trying to get coworkers or friends up and running with LLMs. The Ollama devs have done a really fantastic job of packaging everything up into a neat and tidy deliverable.
Even simple things -- like not needing to understand different quantization techniques. What's the difference between Q4_K and Q5_1, and what the heck do I want? The Ollama devs don't let you choose -- they say "you can have any quantization level you want as long as it's Q4_0" and leave it at that. That level of simplicity is really good for a lot of people who are new to local LLMs.
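For a sense of what that choice looks like on the llama.cpp side, it's roughly this (a sketch from memory -- the binary has been called both `quantize` and `llama-quantize` depending on version, and the filenames here are made up):

    # pick your own quantization level with llama.cpp's quantize tool
    ./llama-quantize ./my-model-f16.gguf ./my-model-Q4_0.gguf Q4_0
    # ...or Q4_K_M, Q5_1, Q8_0, etc. -- exactly the decision ollama makes for you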
I spent HOURS setting up llama.cpp from reading the docs and then following this guide (after trying and failing with other guides which turned out to be obsolete):
https://voorloopnul.com/blog/quantize-and-run-the-original-llama3-8b-with-llama-cpp/
Using llama.cpp, I asked the resulting model "what is 1+1", and got a neverending stream of garbage. See below. So no, it is not anywhere near as easy to get started with llama.cpp.
--------------------------------
what is 1+1?") and then the next line would be ("What is 2+2?"), and so on.
How can I make sure that I am getting the correct answer in each case?
Here is the code that I am using:
\begin{code}
import random

def math_problem():
    num1 = random.randint(1, 10)
    num2 = random.randint(1, 10)
    problem = f"What is {num1}+{num2}? "
    return problem

def answer_checker(user_answer):
    num1 = random.randint(1, 10)
    num2 = random.randint(1, 10)
    correct_answer = num1 + num2
    return correct_answer == int(user_answer)

def main():
    print("Welcome to the math problem generator!")
    while True:
        problem = math_problem()
        user_answer = input(problem)
        if answer_checker(user_answer):
            print("Correct!")
        else:
            print("Incorrect. Try again!")

if __name__ == "__main__":
    main()
\end{code}
My problem is that in the `answer_checker` function, I am generating new random numbers `num1` and `num2` every time I want to check if the user's answer is correct. However, this means that the `answer_checker` function is not comparing the user's answer to the correct answer of the specific problem that was asked, but rather to the correct answer of a new random problem.
How can I fix this and ensure that the `answer_checker` function is comparing the user's answer to the correct answer of the specific problem that was asked?
Answer: To fix this....
--------------------------------
Check out the docker option if you don’t want to build/install llama.cpp.
https://github.com/ggerganov/llama.cpp/tree/master/examples/server
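If you go that route, the invocation is something like this (from memory of that README, so double-check the image tag and flags there):

    docker run -p 8080:8080 -v /path/to/models:/models \
      ghcr.io/ggerganov/llama.cpp:server \
      -m /models/my-model.Q4_0.gguf -c 4096 --host 0.0.0.0 --port 8080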
Much like how I got into Linux via linux-on-vfat-on-msdos and wouldn't have gotten into Linux otherwise, ollama got me into llama.cpp by making me understand what was possible.
Then again, I am Gen X and we are notoriously full of lead poisoning.
Mm... running a llama.cpp server is annoying: which model to use? Is it in the right format? What should I set `ngl` to? However, perhaps it would be fairer and more accurate to say that installing llama.cpp and installing ollama have slightly different effort levels (one taking about 3 minutes to clone and run `make`, the other taking about 20 seconds to download).
Once you have them installed, just typing: `ollama run llama3` is quite convenient, compared to finding the right arguments for the llama.cpp `server`.
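For comparison, the llama.cpp side ends up being something like this (a sketch; the GGUF filename is whatever you downloaded, and `ngl` depends on how many layers fit on your GPU):

    ./server -m models/Meta-Llama-3-8B-Instruct.Q4_0.gguf -c 8192 -ngl 99 --port 8080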
Sensible defaults. Installs llama.cpp. Downloads the model for you. Runs the server for you. Nice.
> it took me 2 minutes to get ollama working
So, you know, I think it's broadly speaking a fair sentiment, even if it probably isn't quite true.
...
However, when you look at it from that perspective, some things stand out:
- ollama is basically just a wrapper around llama.cpp
- ollama doesn't let you do all the things llama.cpp does
- ollama offers absolutely zero way, not even the hint of a suggestion of one, to move from using ollama to using llama.cpp if you need anything more.
Here are some interesting questions:
- Why can't I just run llama.cpp's server with the defaults from ollama?
- Why can't I get a simple dump of the 'sensible' defaults from ollama that it uses?
- Why can't I get a simple dump of the GGUF (or whatever) model file ollama uses?
- Why isn't 'a list of sensible defaults' just a github repository with download link and a list of params to use?
- Who's paying for the enormous cost of hosting all those ollama model files and converting them into usable formats?
The project is convenient, and if you need an easy way to get started, absolutely use it.
...but, I guess, I recommend you learn how to use llama.cpp itself at some point, because most free things are only free while someone else is paying for them.
Consider this:
If ollama's free hosted models were no longer free and you had to manually find and download your own model files, would you still use it? Could you still use it?
If not... maybe, don't base your business / anything important around it.
It's a SaaS with an open source client, and you're using the free plan.
And they have docs explaining exactly how to use arbitrary GGUF files to make your own model files. https://github.com/ollama/ollama/blob/main/docs/import.md
I don't feel any worse about Ollama funding the hosting and bandwidth of all of these models than I do about their upstream hosting source being Huggingface, which shares the same concerns.
It's reasonable to assume that sooner or later ollama will start monetizing too; or they won't exist anymore after they burn through their funding.
All I’m saying is that what you get with ollama is being paid for by VC funding and the open source client is a loss leader for the hosted service.
Whether you care or not is up to you; but I think llama.cpp is currently a more sustainable project.
Make your own decisions. /shrug
I would absolutely still use it; I've already ended up feeding it gguf files that weren't in their curated options. The process (starting from having foo.gguf) is literally just:
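Roughly this, from memory of the import docs linked upthread (the name "foo" is just whatever you want to call the model):

    # Modelfile -- one line pointing at the gguf
    FROM ./foo.gguf

    ollama create foo -f Modelfile
    ollama run foo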
(Do I wish there was an option like `ollama create --from-gguf` to skip the Modelfile? Oh yes. Do I kinda get why it exists? Also yes (it lets you bake in a prompt and IIRC other settings). Do I really care? Nope, it's low on the list of modestly annoying bits of friction in the world.)