To respond to some of the questions or those parts I personally find interesting:
The custom TUI library exists so that I can write a plugin model around a C ABI. The popular TUI frameworks I found usually didn't map well to plain C, and others were just too large. The arena allocator exists primarily because building trees in Rust is quite annoying otherwise. It doesn't use bumpalo, because I took quite a liking to "scratch arenas" (https://nullprogram.com/blog/2023/09/27/) and it's really not that difficult to write such an allocator.
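For readers unfamiliar with the scratch-arena idea from the linked post, here is a minimal sketch in safe Rust. It is not the editor's allocator: `Arena`, `alloc`, and `scratch` are hypothetical names, and handing out byte ranges instead of references is a simplification that keeps the example free of `unsafe`.

```rust
use std::ops::Range;

/// A toy bump arena over a fixed-size byte buffer (hypothetical, not the editor's allocator).
struct Arena {
    buf: Vec<u8>,
    used: usize,
}

impl Arena {
    fn with_capacity(cap: usize) -> Self {
        Arena { buf: vec![0; cap], used: 0 }
    }

    /// Bump-allocates `len` bytes and returns their range, or None if the arena is full.
    fn alloc(&mut self, len: usize) -> Option<Range<usize>> {
        let end = self.used.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        let range = self.used..end;
        self.used = end;
        Some(range)
    }

    /// Runs `f`, then frees everything it allocated by restoring the saved offset.
    fn scratch<R>(&mut self, f: impl FnOnce(&mut Arena) -> R) -> R {
        let saved = self.used;
        let result = f(&mut *self);
        self.used = saved;
        result
    }
}

fn main() {
    let mut arena = Arena::with_capacity(64);
    let keep = arena.alloc(8).unwrap();
    let sum = arena.scratch(|a| {
        // Temporary allocation that only lives for the duration of the closure.
        let tmp = a.alloc(16).unwrap();
        a.buf[tmp.clone()].fill(2);
        a.buf[tmp].iter().map(|&b| b as u32).sum::<u32>()
    });
    println!("kept {:?}, scratch sum {}, bytes in use {}", keep, sum, arena.used);
}
```

The point of the pattern is that "freeing" is a single integer assignment, no matter how many temporary allocations happened inside the scope.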
Regarding the choice of Rust, I actually wrote the prototype in C, C++, Zig, and Rust! Out of these 4 I personally liked Zig the most, followed by C, Rust, and C++ in that order. Since Zig is not internally supported at Microsoft just yet (chain of trust, etc.), I continued writing it in C, but after a while I became quite annoyed by the lack of features that I came to like about Zig. So, I ported it to Rust over a few days, as it is internally supported and really not all that bad either. The reason I didn't like Rust so much is because of the rather weak allocator support and how difficult building trees was. I also found the lack of cursors for linked lists in stable Rust rather irritating if I'm honest. But I would say that I enjoyed it overall.
We decided against nano, kilo, micro, yori, and others for various reasons. What we wanted was a small binary so we can ship it with all variants of Windows without extra justifications for the added binary size. It also needed to have decent Unicode support. It should've also been one built around VT output as opposed to Console APIs to allow for seamless integration with SSH. Lastly, first class support for Windows was obviously also quite important. I think out of the listed editors, micro was probably the one we wanted to use the most, but... it's just too large. I proposed building our own editor and while it took me roughly twice as long as I had planned, it was still only about 4 months (and a bit for prototyping last year).
As GuinansEyebrows put it, it's definitely quite a bit of "NIH" in the project, but I also spent all of my weekends on it and I think all of Christmas, simply because I had fun working on it. So, why not have fun learning something new, writing most things myself? I definitely learned tons working on this, which I can now use in other projects as well.
If you have any questions, let me know!
2. How did you ensure your Zig/C memory was freed properly?
3. What do you not like about Rust?
It's been quite a while now, but:
- Great allocator support
- Comptime is better than macros
- Better interop with C
- In the context of the editor, raw byte slices work way better than validated strings (i.e. `str` in Rust) even for things I know are valid UTF8
- Constructing structs with .{} is neat
- Try/catch is kind of neat (try blocks in Rust will make this roughly equivalent I think, but that's unstable so it doesn't count)
- Despite being less complete, somehow the utility functions in Zig just "clicked" better with me - it somehow just felt nice reading the code
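To illustrate the byte-slice point from the list above, here is a small sketch of how cutting text at a byte offset differs between `str` and `[u8]` in Rust. The helper names `prefix_str` and `prefix_bytes` are made up for this example and are not from the editor.

```rust
/// Returns the first `n` bytes of `text` as a `&str`, or None if the cut would
/// split a multi-byte UTF-8 sequence or run past the end.
fn prefix_str(text: &str, n: usize) -> Option<&str> {
    text.get(..n)
}

/// Returns the first `n` bytes of `text`, clamped to its length. This never
/// fails, but the result may end in the middle of a code point.
fn prefix_bytes(text: &[u8], n: usize) -> &[u8] {
    &text[..n.min(text.len())]
}

fn main() {
    let line = "héllo"; // 'é' occupies bytes 1..3
    println!("{:?}", prefix_str(line, 2));
    println!("{:?}", prefix_bytes(line.as_bytes(), 2));
}
```

With `str`, every offset has to be checked against character boundaries; with raw bytes the caller takes on that responsibility, which can be more convenient when the offsets already come from code that tracks boundaries itself.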
There's probably more. But overall, Zig feels like a good fit for writing low-level code, which is something I personally simply enjoy. Rust sometimes feels like the opposite, particularly due to the lack of allocator support in most of its types, and because of the many barriers in place to writing performant code safely. Example: The `Read` trait doesn't work on `MaybeUninit<u8>` yet and some people online suggest just zero-initializing the read buffer because the cost is lower than the syscall. Well, they aren't entirely wrong, yet this isn't an attitude I often encounter in the Zig community.
> How did you ensure your Zig/C memory was freed properly?
Most allocations happened either in the text buffer (= one huge linear allocator) or in arenas (also linear allocators) so freeing was a matter of resetting the allocator in a few strategic places (i.e. once per render frame). This is actually very similar to the current Rust code which performs no heap allocations in a steady state either. Even though my Zig/C code had bugs, I don't remember having memory issues in particular.
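As a rough illustration of "reset once per render frame", here is a sketch that uses a plain `Vec` as a stand-in for a linear allocator. `FrameScratch` and `render` are hypothetical names; the editor's real arenas are not shown here.

```rust
/// Hypothetical per-frame scratch storage: cleared every frame, never shrunk.
struct FrameScratch {
    cells: Vec<u32>,
}

impl FrameScratch {
    fn new() -> Self {
        FrameScratch { cells: Vec::new() }
    }

    /// Builds one frame's worth of temporary data and returns a checksum of it.
    fn render(&mut self, width: u32) -> u32 {
        self.cells.clear(); // the "reset": length goes to 0, capacity is kept
        self.cells.extend(0..width);
        self.cells.iter().sum()
    }
}

fn main() {
    let mut scratch = FrameScratch::new();
    scratch.render(80); // the first frame grows the buffer
    let capacity = scratch.cells.capacity();
    for _ in 0..100 {
        scratch.render(80);
    }
    // Later frames reuse the same memory, so the capacity has not changed.
    assert_eq!(capacity, scratch.cells.capacity());
}
```

Once the buffer has grown to its working size, later frames only move a length field around, which is the "no heap allocations in a steady state" property described above.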
> What do you not like about Rust?
I don't yet understand the value of forbidding multiple mutable aliases, particularly at a compiler level. My understanding was that the difference is only a few percent in benchmarks. Is that correct? There are huge risks you run into when writing unsafe Rust: If you accidentally create aliasing mutable pointers, you can break your code quite badly. I thought the language's goal is to be safe. Is the assumption that no one should need to write unsafe code outside of the stdlib and a few others? I understand if that's the case, but then the language isn't a perfect fit for me, because I like writing performant code and that often requires writing unsafe code, yet I don't want to write actual literal unsafe code. If what I said is correct, I think I'd personally rather have an unsafe attribute to mark certain references as `noalias` explicitly.
Another thing is the difficulty of using uninitialized data in Rust. I do understand that this involves an attribute in clang which can then perform quite drastic optimizations based on it, but this makes my life as a programmer kind of difficult at times. When it comes to `MaybeUninit`, or the previous `mem::uninit()`, I feel like the complexity of compiler engineering is leaking into the programming language itself and I'd like to be shielded from that if possible. At the end of the day, what I'd love to do is declare an array in Rust, assign it no value, `read()` into it, and magically reading from said array is safe. That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.
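To make the complaint concrete, here is a sketch of the two options on stable Rust as I understand them: zero-initialize and call `Read::read`, or manage a `MaybeUninit` buffer by hand. The function names are invented for this example, and the second function copies from a slice because `Read::read` only accepts `&mut [u8]`.

```rust
use std::io::Read;
use std::mem::MaybeUninit;

/// Zero-initializes the buffer first, then reads into it: simple and safe.
fn read_zeroed(mut src: impl Read) -> std::io::Result<Vec<u8>> {
    let mut buf = [0u8; 16];
    let n = src.read(&mut buf)?;
    Ok(buf[..n].to_vec())
}

/// Fills an uninitialized buffer by hand and exposes only the written prefix.
fn copy_uninit(src: &[u8]) -> Vec<u8> {
    let mut buf = [MaybeUninit::<u8>::uninit(); 16];
    let n = src.len().min(buf.len());
    for (slot, &byte) in buf.iter_mut().zip(src) {
        slot.write(byte);
    }
    // SAFETY: the loop above initialized exactly the first `n` elements.
    let init = unsafe { std::slice::from_raw_parts(buf.as_ptr() as *const u8, n) };
    init.to_vec()
}

fn main() {
    println!("{:?}", read_zeroed(&b"hello"[..]).unwrap());
    println!("{:?}", copy_uninit(b"hello"));
}
```

The second version avoids the zeroing but needs an `unsafe` block and a manual invariant, which is the bookkeeping the comment above would like the language to hide.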
Also, as I mentioned, `split_off` and `remove` from `LinkedList` use numeric indices and are O(n), right? `linked_list_cursors` is still marked as unstable. That's kind of irritating if I'm honest, even if it's kind of silly to complain about this in particular.
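For context, this is roughly what removing an element from the middle of a `LinkedList` looks like without cursors, using only `split_off`, `pop_front`, and `append`. `remove_at` is a hypothetical helper written for this example; the walk to the index inside `split_off` is what makes it O(n).

```rust
use std::collections::LinkedList;

/// Removes and returns the element at `at`, or None if `at` is out of bounds.
fn remove_at<T>(list: &mut LinkedList<T>, at: usize) -> Option<T> {
    if at >= list.len() {
        return None;
    }
    // Walks the list to position `at`, which is the O(n) part.
    let mut tail = list.split_off(at);
    let removed = tail.pop_front();
    // Re-attaching the tail is O(1).
    list.append(&mut tail);
    removed
}

fn main() {
    let mut list: LinkedList<i32> = vec![1, 2, 3, 4].into_iter().collect();
    let removed = remove_at(&mut list, 1);
    println!("removed {:?}, remaining {:?}", removed, list);
}
```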
In all fairness, what bothers me the most when it comes to Zig is that the language itself often feels like it's being obtuse for no reason. Loops for instance read vastly different to most other modern languages and it's unclear to me why that's useful. Files-as-structs is also quite confusing. I'm not a big fan of this "quirkiness" and I'd rather use a language that's more similar to the average.
At the end of the day, both Zig and Rust do a fine job in their own right.
The most basic reason why you can't have unrestricted mutable aliasing is because then the following code, which contains a use-after-free bug, would be legal:
    let mut val = Some("Hello".to_owned());
    let outer_mut = &mut val;
    let inner_mut = val.as_mut().unwrap();
    *outer_mut = None;
    println!("{}", inner_mut);
If, as is sometimes the case, you need some kind of mutable aliasing in your program, the intended solution is to use an interior-mutability API (which under the hood causes LLVM's noalias attribute to be omitted). Which one to use depends on the precise details of your use case; some (e.g., RefCell) carry performance costs, while others (e.g., Cell) are zero-cost but work only for certain types or access patterns. Having to figure this out is annoying, but such is the price of memory safety without runtime garbage collection. In the worst-case scenario you can use UnsafeCell, which as the name suggests is unsafe, but works with any type with no performance cost. UnsafeCell is also a little bit heavy on boilerplate/syntactic salt, which people used to C sometimes find annoying; there isn't that much drive to fix this because, as per above, it's supposed to be rarely used.

The "few percent in benchmarks" thing sounds like it's referring to the rule that it's UB to use unsafe code to make aliased &mut references even if you don't actually use those references in a problematic way. Lifting that rule would preclude certain compiler optimizations, and as per above would not fix the real problem; you still couldn't have unrestricted mutable aliasing. It would only alleviate the verbosity cost, and that could be done in a different way without the performance cost (like by adding special concise syntax for UnsafeCell) if it were deemed important enough.
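A short sketch of the two safe interior-mutability options named above may help. The function names are made up for illustration: `Cell` allows mutation through several shared references with no runtime check, while `RefCell` tracks borrows at runtime and refuses a second mutable borrow.

```rust
use std::cell::{Cell, RefCell};

/// Bumps a counter through two shared references to the same `Cell`.
fn bump_twice(counter: &Cell<u32>) -> u32 {
    let a = counter;
    let b = counter;
    a.set(a.get() + 1);
    b.set(b.get() + 1);
    counter.get()
}

/// Returns true if a second mutable borrow is refused while the first is alive.
fn second_borrow_refused(value: &RefCell<Option<String>>) -> bool {
    let _outer = value.borrow_mut();
    value.try_borrow_mut().is_err()
}

fn main() {
    println!("{}", bump_twice(&Cell::new(0)));
    let value = RefCell::new(Some("Hello".to_owned()));
    println!("{}", second_borrow_refused(&value));
}
```

With `RefCell`, the use-after-free from the earlier snippet turns into a runtime borrow error instead of undefined behavior, at the cost of the borrow-flag check.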
The uninitialized-memory situation is pretty widely agreed to be unsatisfactory. Unfortunately it is hard to fix. Ideally the compiler would do control-flow analysis so that you can read from memory that was uninitialized only if it has definitely been written to since then. However, this would be a big, complicated, difficult-to-implement type system feature, and the need to make it unwind-safe (analogous to the concept of exception safety in C++) adds much, much more complication and difficulty on top of that. You could imagine an intermediate solution, wherein reading from uninitialized memory gets you a valid-but-unspecified value of the applicable type instead of UB, but that also has some difficulties, such as unsoundness in conjunction with MADV_FREE; see https://internals.rust-lang.org/t/freeze-maybeuninit-t-maybeuninit-t-for-masked-reads/17188 if you're curious for more details.
Again, the point here is not "the current design is optimal", it's "improving on the current design is a difficult engineering problem that no one has solved yet".
I think people who need cursors over linked lists use a third-party library from crates.io for this, but it's quite reasonable to think that the standard library should have this. Most of the time when a smallish feature like that remains unstable it's because nobody has cared enough about it to shepherd it through the stabilization process (perhaps because it's not a hard blocker if you can use the third-party library instead). Possibly that process is too slow and heavyweight, but of course enacting a big process change in a massively multiplayer engineering project that's governed by consensus is an even harder problem.
i will say the struct files (files which are not a namespace struct.. that is, they have field values) are a bit weird, but at least the syntax is consistent with an implicit surrounding bracket.
It definitely helped me with my development speed, because I had a much larger breadth of APIs available to me all at once. Now that the project is released, I'll probably stay with the nightly version for another few months until after `let_chains` is out in stable, because I genuinely love that quality-of-life feature so much and just don't want to live without it anymore. Afterward, I'll make sure it builds in stable Rust. There's not really any genuine reason it needs nightly, except for... time.
Apropos custom helpers, I think it may be worth optimizing `Vec::splice`. I wrote myself a custom splice function to reduce the binary size: https://github.com/microsoft/edit/blob/e8d40f6e7a95a6e19765ff19621cf0d39708f7b0/src/helpers.rs#L199-L253
The differences can be quite significant: https://godbolt.org/z/GeoEnf5M7
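For readers who don't want to follow the links, here is a sketch of the general idea of a hand-rolled, non-generic splice for `Vec<u8>`. This is not the function from the linked `helpers.rs`; `replace_range` is a hypothetical name, and the sketch makes no claim about binary size.

```rust
use std::ops::Range;

/// Replaces `v[range]` with the bytes in `src`, shifting the tail as needed.
fn replace_range(v: &mut Vec<u8>, range: Range<usize>, src: &[u8]) {
    let old_len = v.len();
    assert!(range.start <= range.end && range.end <= old_len);
    let removed = range.end - range.start;

    if src.len() > removed {
        // Grow first, then move the tail to the right to make room.
        let grow = src.len() - removed;
        v.resize(old_len + grow, 0);
        v.copy_within(range.end..old_len, range.end + grow);
    } else {
        // Move the tail to the left, then cut off the leftover bytes.
        let shrink = removed - src.len();
        v.copy_within(range.end..old_len, range.end - shrink);
        v.truncate(old_len - shrink);
    }

    v[range.start..range.start + src.len()].copy_from_slice(src);
}

fn main() {
    let mut text = b"hello world".to_vec();
    replace_range(&mut text, 0..5, b"goodbye");
    println!("{}", String::from_utf8_lossy(&text));
}
```

Because it only deals with byte slices, this version boils down to a resize and two memory moves, whereas `Vec::splice` has to handle arbitrary iterators.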
> I know it's silly,
Nah, what's silly is LinkedList.
> I think it may be worth optimizing `Vec::splice`.
If this is upstream-able, you should try! Generally upstream is interested in optimizations. sort and float parsing are two things I remember changing significantly over the years. I didn't check to see what the differences are and how easy that actually would be...
Generally speaking, the requirement on my end is that whatever we use is as minimal as it gets: Minimal binary size overhead and minimal performance overhead. It also needs to be cross-platform of course. This for instance precludes the widely used WinRT ABI that's being used nowadays on Windows.
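Purely as an illustration of what a minimal plugin interface over a plain C ABI can look like from the Rust side, here is a sketch. Everything in it is hypothetical (`PluginVTable`, `plugin_entry`, the callbacks); no plugin API exists yet in the source, so this shows the general technique only.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical table of callbacks a plugin would hand to the host.
#[repr(C)]
pub struct PluginVTable {
    pub abi_version: u32,
    pub name: extern "C" fn() -> *const c_char,
    pub on_key: extern "C" fn(key: u32) -> bool,
}

extern "C" fn demo_name() -> *const c_char {
    // NUL-terminated static string, valid for the whole program.
    b"demo\0".as_ptr() as *const c_char
}

extern "C" fn demo_on_key(key: u32) -> bool {
    key == 27 // pretend the plugin handles Escape
}

/// What a plugin would export; a real dynamic library would also mark this `#[no_mangle]`.
pub extern "C" fn plugin_entry() -> PluginVTable {
    PluginVTable { abi_version: 1, name: demo_name, on_key: demo_on_key }
}

fn main() {
    let vt = plugin_entry();
    // SAFETY: `demo_name` returns a pointer to a NUL-terminated static string.
    let name = unsafe { CStr::from_ptr((vt.name)()) };
    println!("{} v{} handles Esc: {}", name.to_string_lossy(), vt.abi_version, (vt.on_key)(27));
}
```

A `#[repr(C)]` struct of function pointers needs no runtime or code generator on either side, which fits the "minimal binary size and performance overhead" requirement stated above.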
WebAssembly is the binary spec for the web, but now everyone is using it because it's portable and lightweight.
The idea is that you can create a plugin using any language that compiles to WebAssembly: C, Rust, Pascal, Go, C++. Compile once and it should work on Windows, Linux, and Mac. No need to compile for multiple architectures.
Performance should be near native, but I guess there's going to be a problem with the added WebAssembly runtime size. Here is a runtime with estimated sizes: https://github.com/bytecodealliance/wasm-micro-runtime
And it's sandboxed too, so should be secure.
But plugins are still a long way off. Until then, I updated the issue to include your suggestion (thank you for it!): https://github.com/microsoft/edit/issues/17
I've maintained a project in zig since either 0.4 or 0.5 and i dont think this is the case at all. supporting 0.12 -> 0.13 was no lines of code, iirc, and 0.13->0.14 was just making sure my zig parser could handle the new case feature (that lets you write a duff's device).
zig may seem quirky but it's highly internally consistent, and not far off from C. every difference with c was made for good reasons (e.g. `var x:u8` vs `char x` gives you context free parsing)
i would say my gripes are:
1. losing async
2. not making functions const declarations like javascript (but i get why they want the sugar)
https://bootstrappable.org/ https://lwn.net/Articles/983340/ https://github.com/fosslinux/live-bootstrap/blob/master/parts.rst https://stagex.tools/
Because presumably you should have been doing it mostly for the benefit of Windows users, and wasting time because it’s a fun personal learning exercise means those users would suffer getting an underpowered app
Maybe just some residual instinct left over from times past when more people like Ballmer were still prominent, and they were not as user-enabling as today in some ways?
But maybe if you didn't misrepresent the situation so much you wouldn't lose your heart. This is not some tiny personal open source project where fun can be the only valid reason, but "will ship as part of Windows 11!", so millions of devices in a professional OS. Are your expectations so poorly calibrated that you have none in both cases? Why are they higher re. a forum comment?
Why did you make up a point about a person deserving respect and pass it as my thought? Could you not come up with a more coherent difference between those two situations yourself?
Why are you asking a question about the motivation if you don't even understand "whatever" it is I disliked?
Why did you make up the implication that rejects having fun?
Why are you making it personal in the first place?
How can criticism be neutral when it's... critical?
What kind of well argued thing do you expect in a... single sentence to even ask such a question?
Again, why is there such a huge mismatch in your expectations re. a comment and a professional app?
> > So, why not have fun learning something new, writing most things myself? I definitely learned tons working on this, which I can now use in other projects as well.
> > Because presumably you should have been doing it mostly for the benefit of Windows users, and wasting time because it’s a fun personal learning exercise means those users would suffer getting an underpowered app
Would you care to elaborate on what intention you understood the author to have, which aspect of the author's work you deemed a waste of time, and why you think the resulting app is underpowered for Windows 11 users?
I would also be interested in what mismatch you saw in my expectations as regards your original comment and a professional app.
I realise that we got off on a bad foot, but, if you care to, we can try and restart the conversation.
Of course the app doesn't check all boxes, plenty of features other editors have had years to add simply couldn't be added even if you work on Christmas (by the way, is also entirely driven by the NIH decision to write from scratch and not than that doing a 3 language mock)
And I'm not shaming the fun, I'm saying that that's not a good justification for shipping a worse app to the millions in a professional setting
What you expressed is a sentiment I've seen in quite a few places now. I think people would be shocked to learn how much time I spent on just the editing model (= cursor movement and similar behavior that's unique to this editor = a small part of the app) VS everything else. It's really not all that difficult to write a few FFI abstractions, or a UI framework, compared to that. "Pressure to ship" is definitely true, but it's not like Microsoft is holding a gun to my chest, telling me to complete the editor in 2 months flat. I also consider it important to not neglect one's own progress at becoming more experienced. How would one do that if not by challenging oneself with learning new things, right? Basically, I think I struck a balance that was... "alright".
Is there something about Zig in particular that makes this the case, or is it just an internal politics thing?
I already expressed my appreciation on the repo, but was promptly shushed by your colleague for Incitement Of A Language War, hehe.
I'm impressed by the architecture and implementation choices, especially the gap buffer and cursor movement. It seems we've independently arrived at the same kinds of conclusions on how to min-max a text editor: minimal concepts with maximal functionality.
Others have asked about Zig. I would love to hear more about the work you did in C. Did you start in C? What are some reasons why you didn't continue with C? If you had continued in C, with hindsight, what would have been most annoying? What was clearly better in C? Again, with hindsight, what would have been the best parts of following through with C? I see that you are C-cultured as well (Chris Wellons' blog) and some of the upsides of Zig that you mention I would have guessed you could have elegantly solved in C using Chris' insights. I'm very curious how, with such expert advice available, you still looked elsewhere and preferred it. Looking forward to hearing about the C side of the story.
Good luck with the project, and see you on the repo :)
In the Zig version, did you use my zigwin32 project or did you go with something else? Also, how did you like the Zig build system vs. Rust's?
Overall, I liked the build system. What I found annoying is that I had to manually search for the Windows SDK path in build.zig just so I can addIncludePath it. I needed that so I can add ICU as a dependency.
The only thing that bothered me apart from that was that producing LTO'd, stripped release builds while retaining debug symbols in a separate file was seemingly impossible. This was extra bad for Windows, where conventionally debug information is always kept in a separate file (a PDB). That just didn't work and it'd be great if that has been fixed since then (or gets fixed in the near term).
Can you confirm this? Is it something you intend to add? Curious to know why, if the answer is no.