
Oh, so you’re talking about text representation in an editor or something along those lines? That’s kind of a separate problem, isn’t it?
At the lowest level, though, I suppose you still need to decide whether to use null-terminated segments. I think I’d still go with length + data, though I wouldn’t bother packing the length into a compact encoding the way serialization formats do. Your code will need to track string lengths closely and manage dynamic memory allocation all over the place, so it’s good to have those lengths quickly accessible at all times.
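To make the length + data idea concrete, here is a rough sketch in C of what I have in mind. The names (`Str`, `str_append`, `str_free`) and the starting capacity of 16 bytes are made up for illustration; the point is only that the length is a plain `size_t` you can read in O(1), with no packing and no terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length + data segment. The length is stored unpacked so it is
 * always one load away; the data is NOT null-terminated and may contain
 * embedded zero bytes. */
typedef struct {
    size_t len;   /* bytes currently in use */
    size_t cap;   /* bytes allocated for data */
    char  *data;  /* heap buffer, NULL while cap == 0 */
} Str;

/* Append n bytes, growing the buffer geometrically when needed.
 * Returns 0 on success, -1 on overflow or allocation failure
 * (the string is left unchanged in that case). */
static int str_append(Str *s, const char *bytes, size_t n)
{
    if (n == 0) return 0;
    if (n > SIZE_MAX - s->len) return -1;      /* len + n would overflow */

    size_t need = s->len + n;
    if (need > s->cap) {
        size_t cap = s->cap ? s->cap : 16;     /* arbitrary starting size */
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2;
        }
        char *p = realloc(s->data, cap);
        if (!p) return -1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, bytes, n);
    s->len = need;
    return 0;
}

/* Release the buffer and reset the struct to the empty state. */
static void str_free(Str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}
```

Start from `Str s = {0};` and call `str_append(&s, bytes, n)`. Because nothing ever scans for a terminator, appending data with a zero byte in the middle works the same as any other append.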
This reminds me of when I had to roll my own dynamic memory allocator for an obscure platform. (Something I never want to do again!) I stuck metadata in the negative space just before the returned pointer, as you describe. In my case, it was complicated by the fact that you had to worry about the alignment of the returned pointer to make sure it worked with SIMD and all that. Ugh. But I guess with strings (or at least 8-bit-encoded strings), alignment should not be an issue.
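For anyone curious what the "metadata just before the pointer" trick looks like once alignment gets involved, here is a minimal sketch in C. It is not the allocator I wrote back then; it just wraps `malloc`, and the names (`Header`, `hdr_alloc`, `hdr_size`, `hdr_free`) are hypothetical. It also assumes a platform where rounding a pointer through `uintptr_t` behaves the usual way, and a C11 compiler for `_Alignof`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before the pointer handed out. */
typedef struct {
    void  *base;  /* what malloc actually returned; needed to free it */
    size_t size;  /* usable bytes the caller asked for */
} Header;

/* Allocate size bytes aligned to align (must be a power of two).
 * Returns NULL on bad alignment, overflow, or allocation failure. */
static void *hdr_alloc(size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0) return NULL;
    /* The header sits right below the user pointer, so the user pointer
     * must be at least as aligned as the header itself. */
    if (align < _Alignof(Header)) align = _Alignof(Header);

    /* Over-allocate: header + worst-case padding (align - 1) + payload. */
    size_t extra = sizeof(Header) + (align - 1);
    if (size > SIZE_MAX - extra) return NULL;

    unsigned char *base = malloc(size + extra);
    if (!base) return NULL;

    /* Leave room for the header, then round up to the next multiple of align. */
    uintptr_t raw  = (uintptr_t)base + sizeof(Header);
    uintptr_t user = (raw + (align - 1)) & ~(uintptr_t)(align - 1);

    Header *h = (Header *)user - 1;   /* the "negative space" */
    h->base = base;
    h->size = size;
    return (void *)user;
}

/* Read the stored size back from just below the user pointer. */
static size_t hdr_size(const void *p)
{
    return ((const Header *)p - 1)->size;
}

/* Free a pointer obtained from hdr_alloc; NULL is ignored. */
static void hdr_free(void *p)
{
    if (p) free(((Header *)p - 1)->base);
}
```

The annoying part is exactly the padding: the gap between what `malloc` returns and what you hand out varies per allocation, so the header has to remember the original base pointer. With byte strings you can skip all of that and put the length directly in front of the data.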