• 0 Posts
  • 28 Comments
Joined 2 years ago
Cake day: June 21st, 2023



  • To compare forced labor camps where the alternative is being murdered to people making the active choice to volunteer to serve as moderators is a comparison so lacking in perspective that I’d expect to only find it on Reddit, but I guess Lemmy has managed to foster the same kind of behavior.

    Are you going to compare Reddit killing the API to the Holocaust next?



  • The key element here is that an LLM does not actually have access to its training data, and at least as of now, I’m skeptical that it’s technologically feasible to search through the entire training corpus, which is an absolutely enormous amount of data, for every query, in order to determine potential copyright violations, especially when you don’t know exactly which portions of the response you need to use in your search. Even then, that only catches verbatim (or near verbatim) violations, and plenty of copyright questions are a lot fuzzier.

    For instance, say you tell GPT to generate a fan fiction story involving a romance between Draco Malfoy and Harry Potter. This would unquestionably violate JK Rowling’s copyright on the characters if you published the output for commercial gain, but you might be okay if you just plop it on a fan fic site for free. You’re unquestionably okay if you never publish it at all and just keep it to yourself (well, a lawyer might still argue that this harms JK Rowling by damaging her profit if she were to publish a Malfoy-Harry romance, since people can just generate their own instead of buying hers, but that’s a messier question). But, it’s also possible that, in the process of generating this story, GPT might unwittingly directly copy chunks of renowned fan fiction masterpiece My Immortal. Should GPT allow this, or would the copyright-management AI strike it? Legally, it’s something of a murky question.

    For yet another angle, there is of course a whole host of public domain text out there. GPT probably knows the text of the Lord’s Prayer, for instance, and so even though that output would perfectly match some training material, it’s legally perfectly okay. So, a copyright police AI would need to know the copyright status of all its training material, which is not something you can super easily determine by just ingesting the broad internet.
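    To make that concrete, here's a rough sketch of what the naive "copyright police" check would look like: index the corpus by word n-grams, then flag any response that shares an n-gram with a document not marked public domain. Everything here is an assumption for illustration: the function names, the `public_domain` flag, the 5-word window, and the toy corpus (the "fanfic" entry is made-up text, not a real quote). It is nowhere near the scale of a real training set, and it shows both problems above: a paraphrase slips straight through, and the whole thing is only as good as the copyright labels you somehow obtained.

```python
def ngrams(text, n=5):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(corpus, n=5):
    """Map each n-gram to the ids of the documents that contain it."""
    index = {}
    for doc_id, doc in corpus.items():
        for gram in ngrams(doc["text"], n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def flag_overlaps(response, corpus, index, n=5):
    """Return ids of non-public-domain documents sharing an n-gram with response."""
    hits = set()
    for gram in ngrams(response, n):
        for doc_id in index.get(gram, ()):
            # Public domain matches are legally fine, so they are not flagged.
            if not corpus[doc_id]["public_domain"]:
                hits.add(doc_id)
    return hits


# Toy corpus; the copyright labels are assumed to be known, which is the hard part.
corpus = {
    "prayer": {
        "text": "our father who art in heaven hallowed be thy name",
        "public_domain": True,
    },
    "fanfic": {  # hypothetical, made-up text
        "text": "she walked into the great hall wearing a black corset",
        "public_domain": False,
    },
}
index = build_index(corpus)

verbatim_public = flag_overlaps("Our Father who art in heaven", corpus, index)
verbatim_protected = flag_overlaps("then she walked into the great hall", corpus, index)
paraphrase = flag_overlaps("she strolled into the big hall in a black corset", corpus, index)
```

    The first call matches the corpus but is ignored because the source is public domain, the second is flagged, and the third (a paraphrase of the same sentence) is not caught at all.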


  • AI haters are not applying the same standards to humans that they do to generative AI

    I don’t think it should go unquestioned that the same standards should apply. No human is able to look at billions of creative works and then create a million new works in an hour. There’s a meaningfully different level of scale here, and so it’s not necessarily obvious that the same standards should apply.

    If it’s spitting out sentences that are direct quotes from an article someone wrote before and doesn’t disclose the source then yeah that is an issue.

    A fundamental issue is that LLMs simply cannot do this. They can query a webpage, find a relevant chunk, and spit that back at you with a citation, but it is simply impossible for them to actually generate a response to a query, realize that they’ve generated a meaningful amount of copyrighted material, and disclose its source, because they literally do not know the source. This is not a fixable issue unless the fundamental approach to these models changes.


  • There is literally no resemblance between the training works and the model.

    This is way too strong a statement when some LLMs can spit out copyrighted works verbatim.

    https://www.404media.co/google-researchers-attack-convinces-chatgpt-to-reveal-its-training-data/

    A team of researchers primarily from Google’s DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever.

    Often, that “random content” is long passages of text scraped directly from the internet. I was able to find verbatim passages the researchers published from ChatGPT on the open internet: Notably, even the number of times it repeats the word “book” shows up in a Google Books search for a children’s book of math problems. Some of the specific content published by these researchers is scraped directly from CNN, Goodreads, WordPress blogs, on fandom wikis, and which contain verbatim passages from Terms of Service agreements, Stack Overflow source code, copyrighted legal disclaimers, Wikipedia pages, a casino wholesaling website, news blogs, and random internet comments.

    Beyond that, copyright law was designed under the circumstances where creative works are only ever produced by humans, with all the inherent limitations of time, scale, and ability that come with that. Those circumstances have now fundamentally changed, and while I won’t be so bold as to pretend to know what the ideal legal framework is going forward, I think it’s also a much bolder statement than people think to say that fair use as currently applied to humans should apply equally to AI and that this should be accepted without question.






  • Some context is that this is Spotify’s first profitable quarter in quite a while. Also, there are 11 million artists on Spotify. I won’t pretend to have any data on listening distribution, but even naively and stupidly going with a uniform split, that’s of course $5 per artist if you eliminated Spotify’s profit entirely. In reality, most of those will have next to no listeners, and the vast majority of streams are going to the top several thousand.

    The deeper question to ask is where all the streaming revenue is actually going, and the answer to that isn’t to line Spotify’s pockets; it’s to the labels.




  • Accepting violence as a valid political tool for anything other than an absolute last resort is the exact thing that leads to complete and utter chaos. You have to keep in mind that your side is probably not the only side with guns, and those on the other side are also telling themselves that there are plenty of examples of what happens when you let communists get comfy.

    Now, I would obviously say that one of these sides is much more in the wrong, but that doesn’t change the fact that, unless you want a politics of everyone shooting at each other, political violence should essentially always be condemned, even if it’s against your political foes.


    • gnome vs genome: ‘gnome’ isn’t actually that old of a word. Some Swiss guy just made up the Latin word ‘gnomus’ in the 1500s. At that point, /gn/ either was already an invalid onset in English or was very soon to become so. ‘Genome’ was coined in the 1920s by a German guy, loosely based on a Greek word, so I would guess the stressed <e> was interpreted as if it were also Greek and thus pronounced /i/.

    • desert vs dessert: ‘dessert’ comes from Middle French ‘dessert’, itself from ‘des-servir’, literally ‘dis-serve’, as in removing what has been served in order to give you a tasty treat at the end of the meal. Because that initial ‘des-’ is a prefix, it wouldn’t be stressed, thus we have stress on the end of the word. The first vowel may have been reduced to a schwa under continuing influence from French. As for ‘desert’, that comes from Old French, and while I would expect the ending to be stressed in French, it’s entirely possible it was loaned early enough into English that English’s native rule of stressing the first syllable took over. The later introduction of ‘dessert’ would have reinforced this.

    • moist vs maoist: Maoist is of course segmented as Mao-ist, and English doesn’t have any way to clearly show that, so if Mo-ist was a real word, it would unfortunately also be spelled as ‘moist’. Thankfully at least, ‘ao’ isn’t a native English sequence, so that’s a big clue, and ‘aoi’ is not a valid sequence for any single English sound. I think most languages would struggle with this sort of thing, since strong segmenting as happens with neologisms like this will happily defy standard phonological rules so long as speakers can recognize the segments and separate them accordingly.

    • flaming vs flamingo: It’s a somewhat similar story here, with ‘flaming’ obviously being the application of -ing to ‘flame’. ‘Flamingo’ is a loan from Portuguese, and English tends to not adapt loanwords to native phonology very much, but that’s hardly unusual. Hell, in German, you also have ‘Der Flamingo’. Funnily enough though, ‘flamingo’ is actually related to ‘flame’; both come from Latin ‘flamma’, which entered English via Old French.

    • uniformed vs uninformed: Again, another segmentation issue, though it is a kinda fun one. Here, it’s uni-formed and un-informed. Both do ultimately originate with Latin ‘formare’. Here, the core thing is that ‘inform’ is a very common and easily recognized word, and we all know that the stress is on the ending. Adding another common prefix to it doesn’t distract us from that, especially with the meanings being clearly related. ‘Uniform’, on the other hand, is much more liable to be analyzed as a single unit, since uni- is not a very common prefix and the connection in meaning to ‘formed’ isn’t super transparent. So, we just treat it as a single largely independent word and that’s that.

    • laughter vs slaughter: God damn it, you really had to bring <gh> hell back at the end. So as before, we get /f/ at the end of ‘laugh’ due to there not being any other consonant around to compensate for totally dropping the <gh>. Now, the obvious question is, why doesn’t this apply to ‘slaughter’ as well? That’s because ‘slaughter’ isn’t actually a native English word at all. Again, we can blame our friends the Vikings for bringing ‘slaughter’ and literal slaughter to England. The Old Norse form was ‘slahtr’. That first element, ‘slah’, is actually the same word as English ‘slay’. So basically, there never existed an English word ‘slaugh’ that had that pronounced <gh>, and so to whatever extent it was pronounced in the Old Norse word, it could easily fade away without causing problems. Whereas for ‘laughter’, this is easily analyzable as ‘laugh-ter’, and since ‘laugh’ developed an -f ending, ‘laughter’ kept it in order to maintain consistency.


    Truly, what a mess. Beyond sating some curiosity though, I hope this does go to show you that English really isn’t total random chaos like it’s often portrayed. Every apparent exception or weird spelling has a very real explanation behind it that tells a truly incredible story about an island that saw some Celts settle down, the arrival and then departure of the Romans, a violent conquest by the Anglo-Saxons, continued influence from Christianity, centuries of conflict with the Scandinavians, yet another conquest by the Normans, continuous cultural exchange with the rest of Europe, an explosion of Greek and Latin terms - many coined - during the Renaissance and Enlightenment, and then the modern age of globalization (and colonialism) that’s resulted in the importation of words from all across the world. Every word is a story; you only have to take the time to read it.

    (the English vowel system is actually insane though; I really cannot defend it lmao)


  • Okay let’s do this.

    • asses vs assess: ‘asses’ is the standard plural of ‘ass’, which has been around since Old English, and pluralization doesn’t change word stress. As the plural marker -es follows the stress, the vowel is reduced to schwa. ‘assess’ comes from Middle French ‘assesser’, which had stress on the end syllable. That got adopted into Middle English as ‘assessen’, with the -en being an infinitive ending (as it still is in German and Dutch). Removing that ending, as it would be when conjugated, you get the stem ‘assess-’, with stress on the end. Because that vowel is stressed, it isn’t reduced to schwa.

    • these vs theses: ‘these’ goes back to Old English, and while the details aren’t hugely important, suffice it to say that there were various processes that caused the /θ/ (<th>) and /s/ to become voiced as /ð/ and /z/. ‘Theses’ is from Ancient Greek (filtered through Latin), and is probably a relatively modern loan, vaguely from the Enlightenment if I had to guess. The Greek spelling is θέσεις (theseis). The Greek letter <θ> is strictly the voiceless ‘th’ sound (the sound in ‘thing’, NOT in ‘this’). The first vowel is the ‘bee’ vowel /i/ because it’s stressed, while the second one is also /i/ because that’s how <ει> has been pronounced in Greek since vaguely the Roman era. English, like all European languages, has its own tradition of how to pronounce words from Greek and Latin that has diverged a fair bit from how they were originally pronounced, giving weirdness like this.

    • trough vs through: -ough is notoriously terrible, but it wasn’t always this way. Back in Old English, these words ended in either -g or -h. -h was the ending sound of Scottish ‘Loch’, while -g was basically the same sound, but voiced. As you know, these sounds do not exist in English today, and so they generally either became silent or shifted to the next closest thing, often /f/. This depended on the exact phonetic context and was generally a mess, though I’ll do my best to untangle things. ‘trough’ was ‘trog’ or ‘troh’ in Old English, while ‘through’ was ‘thurh’. If I had to guess, I’d say that it went silent in ‘thurh’ because it was preceded by an /r/, and so it could be dropped while still being recognizable as the same word (note how you can easily still recognize the word ‘didn’t’ even if you don’t pronounce the /t/). This wasn’t the case in ‘trog’, and so it became an /f/ as the next closest sounding consonant. The ‘loch’ sound /x/ and /f/ both produce some raspy rush of air, so it’s not completely weird.

    • though vs thought: this one is a bit messy. ‘though’ strictly speaking comes from Old English ‘þēah’, which we might expect to get an -f ending. However, it was conflated with Old Norse ‘þó’, which dropped the ending consonant and changed the vowel. A huge amount of Old Norse vocabulary entered English during the late Old English period and displaced quite a lot of native English vocabulary, including pronouns. ‘them’, for instance, isn’t actually a native English word, but rather is from Old Norse. ‘thought’ comes from Old English ‘þōht’, where as before with ‘thurh’, the sound could be dropped without impeding word recognition. The evolution of the vowels is a whole hot mess due to English having one of the most complex vowel systems in the world, so I’m gonna just leave that as ‘people talked and fucked the vowels up’.

    • though vs thorough: I don’t think there’s that much weird about this one? ‘thorough’ is from a corruption of Old English ‘thurh’ (through) into ‘thuruh’, which came to be used as an adjective and gained initial word stress that caused the vowels to evolve differently. It’s not that goofy, all things considered.

    thank god, we’re done with the <gh> disaster. The Scots really had it right when they decided to just keep it.

    • stranger vs strangler: this is predictable. The <g> in ‘stranger’ is reduced to ‘dʒ’ (the consonant in “Joe”) because it’s followed by <e>, reflecting a stage of palatalization in Middle French where the word originates (originally Latin ‘extraneus’). This isn’t the case in ‘strangler’, so it behaves as normal. Oh, I guess the vowels are different; like I said, English vowels are a disaster. So, ‘stranger’ was borrowed from Anglo-Norman, a weird dialect of Old French originating from the Normans that conquered Britain. It was divergent from more standard Old French in a few ways, and in this case, ‘stranger’ comes from Anglo-Norman ‘straungier’. This turned into a long /a/ in Middle English (the vowel of ‘father’), which the Great Vowel Shift turned into the vowel we have today. ‘Strangle’, on the other hand, comes from Old French ‘estrangler’ and entered English with a short vowel. So, ‘strange’ originally had a long /a/ while ‘strangle’ had a short /a/, and those both evolved into different sounds that are spelled with the same letter because English is insane.


  • BraveSirZaphod@kbin.social to Europe@feddit.de *Permanently Deleted*
    1 year ago

    Beyond that, this is the entire point of budget airlines: providing a cheaper alternative for people who either don’t need those extra services or for people who otherwise would not be able to afford to go at all.

    If they’re forced to offer all the same services as the major airlines, they’ll just go out of business. I’m sure the large players would love this to pass as an easy way to kill off some competition.