Friday, December 20, 2024

Copyright Mishaps, Copybots and Word Traps


In the news, we see the use of a clever old trick to catch lying, cheating thieves - like hiding an AirTag in your purse, writers and all kinds of content creators are hiding novel nonsense in their work. Mapmakers did this by using fake street names - nobody would notice them except the person who put them there. In the early days of data collection, when a list of subscribers to American Baby magazine was worth its weight in gold, the list owners would secretly add employees from their own company to the list, with their own home addresses; so if an employee got solicited by any company that didn't pay for the list, they would know who stole their data. Surely there are examples that reach back further into history, but for now:

“Copyright traps” could tell writers if an AI has scraped their work
Jul 2024, MIT Tech Review

To create the traps, the team used a word generator to create thousands of synthetic sentences full of gibberish, which are then injected into text in multiple ways—for example, as white text on a white background, or embedded in the article’s source code. This sentence had to be repeated in the text 100 to 1,000 times. 

But, not so fast:
“A lot of companies do deduplication, [meaning] they clean up the data, and a bunch of this kind of stuff will probably get thrown out”

via Imperial College London and presented at the 2024 International Conference on Machine Learning
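
For the curious, here is a minimal sketch of the idea in that news item, and of why deduplication undoes it. This is not the researchers' code: the word list, the function names, the white-on-white span and the line-level deduplication are all my own assumptions, chosen only to make the mechanism visible.

```python
import random

# Hypothetical vocabulary; the study used a word generator whose details are not given here.
WORDS = ["lantern", "quorum", "velvet", "axle", "murmur",
         "ochre", "sprocket", "tundra", "fable", "gasket"]


def make_trap_sentence(rng, n_words=12):
    """Build one gibberish sentence from randomly chosen words."""
    words = [rng.choice(WORDS) for _ in range(n_words)]
    return " ".join(words).capitalize() + "."


def inject_hidden(html_body, trap, repeats):
    """Append the trap to a page as white-on-white text, one copy per line."""
    hidden = '<span style="color:#fff;background:#fff">' + trap + "</span>"
    return html_body + "\n" + "\n".join([hidden] * repeats)


def dedupe_lines(text):
    """Naive exact-match deduplication: keep only the first copy of each line."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


rng = random.Random(20240701)  # fixed seed so the trap is reproducible
trap = make_trap_sentence(rng)
page = inject_hidden("<p>An ordinary post.</p>", trap, repeats=100)
cleaned = dedupe_lines(page)

# Copies of the trap before and after cleanup
print(page.count(trap), cleaned.count(trap))  # prints "100 1"
```

One hundred copies go in and one comes out, which is the whole objection in a single line: the trap depends on repetition, and repetition is exactly what a cleanup pass removes.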


AI Art - Detailed Mechanical Parts Brimming with Energy 2 - 2024

The above news item is only an introduction to the result of a very long experiment started almost 15 years ago - this weblog.

At some point it became clear that Network Address could be used to run simple experiments on search engines. It was becoming full of text that didn't exist anywhere else on the internet, so it was a kind of control group. Every so often, I would run a search for some weird phrase or word I had used, to see if it "existed" yet. 

There was a point in the development of the internet when, if you ran a search and there was only one single result, and that result was your weblog, then you owned it! A word is born, and it's all yours. Fast forward to today when, well, it appears that the Great God of Search Engines has been letting their product deteriorate since about 2020, once they decided that they owned so much of the internet already that they could do whatever the hell they wanted and still maintain their position as the best search engine in the world.

One uneventful day in 2021, I felt a tingling on my arm, kind of like if you've ever felt a sprinkle of water on your skin, but it's not really there. A "thermodynamic hallucination" I thought to myself. Is there such a word? A name for this phenomenon? I ran a search, and to my surprise, that rare moment - no results. A nonsense phrase that can now belong to me. Fully aware of my duty, I made a post about it, for no other reason than to get something out there that contained this phrase, and where the body of the post said something like, "This phrase didn't exist in your search results until just now". Wait a week for the cache to catch up, run a search again, and would you look at that, one search result, the only search result. One more piece of real estate on the dematerialized surface of the noosphere, add that to your portfolio.

Fast forward to 2024, and the power of parallel computer processors and yottaboxes of data has put artificial text generators in full Turing mode. We're in awe of the outputs; so human! We're enmeshed in the scandals; how could they steal my work? All I'm thinking is, I have an entire blog's worth of copyright traps to feed this thing.

And so one early summer night in July of 2024, I ask a generative agent: "What about thermodynamic hallucinations, anything on the web about it?"

The term "thermodynamic hallucinations" appears to be a niche concept without widespread recognition in mainstream scientific literature. However, there are related discussions in the context of stochastic thermodynamics and information theory, where complex interactions and interpretations of thermodynamic systems might be loosely termed as such. For instance, a study published in 2018 explored the connections between stochastic thermodynamics and information ... abstract nature (APS link) [the paper linked here is basically totally unrelated].

Additionally, the idea has been mentioned in more speculative or interdisciplinary contexts. [Plural?] A blog discusses "thermodynamic hallucinations" in relation to AI, memetics, and sociothermodynamics, reflecting more on the cultural and philosophical implications rather than strict scientific phenomena (Network Address)

Thus, while "thermodynamic hallucinations" is not a standard scientific term, it is used in various [various?] speculative or metaphorical discussions related to the intersection of thermodynamics with information theory and complex systems. -ChatGPT

Or how about this, trying a little harder: "I thought the blog was referring to thermodynamic hallucinations as a new term?"

The blog that references "thermodynamic hallucinations" does indeed seem to present it as a novel term. [Good] The blog discusses this concept in relation to a variety of topics including artificial intelligence, human futures, memetics, and complex systems. [Definitely not] The term appears to be used metaphorically to describe complex, emergent behaviors or phenomena that arise from the interplay of thermal and information dynamics in advanced computational or sociocultural contexts (Network Address). [It was literally used as a copyright trap, although I didn't know it at the time; I thought I was making a kind of placeholder.]

This indicates that "thermodynamic hallucinations" might be a speculative or interdisciplinary concept proposed to explore how principles of thermodynamics could influence or explain certain emergent behaviors in AI, cultural evolution, or network science. The term is not widely recognized in mainstream scientific literature, but it seems to serve as a thought-provoking metaphor within certain theoretical and experimental frameworks. For more detailed information, you can visit the blog directly: ...

There's something interesting about the preoccupation with the words "stochastic" and "hallucinations", since these two words, minus the "thermodynamics", have a high co-occurrence with talk of AI generative processes in general, which has at this point consumed the entire internet, and so they might be over-represented in the training data.

But more importantly, where did all the talk come from about memetics, cultural evolution and network science? Blogs use keywords, or tags; they're bits of metadata added to the post to help categorize it. But the blog itself lists all the keywords used on all the posts in the sidebar. Because I don't take the tags seriously, and use them as an extra layer of confusion or expression or whatever you want to call it, there is an accumulation of many hundreds of these tags listed on the side of every post, words like: network science, mind control, entropy, semiotics, Big Data, Big Datty, meatbody, topology, algomania, algohaulic, algorithmic crack, and bio-abacus. Hundreds.

I don't know exactly how these large language models work, but to simplify, they ingest text and output amalgamated approximations of that text, based on the prompt. This prompt sent the agent to a webpage that had all these crazy tags running down the side. They aren't in the body of the text of the post; they're metadata listed on the side. Little did I know they'd give a passable summary of the blog itself, and this is what came out, above: a very convincing hallucination - that is, if you had no idea of the truth.
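
To make that guess concrete, here is a small sketch of how sidebar tags could end up mixed into the "content" of a post. I have no idea how the agent actually fetched or parsed the page; the page markup, the class names post-body and sidebar, and the TextBySection class are all hypothetical, and the tags are a few of the real ones from the list above.

```python
from html.parser import HTMLParser

# Hypothetical page: one post body plus a sidebar listing blog-wide tags.
PAGE = """
<div class="post-body"><p>This phrase didn't exist in your search results until just now.</p></div>
<div class="sidebar"><a>network science</a> <a>entropy</a> <a>semiotics</a></div>
"""


class TextBySection(HTMLParser):
    """Collect visible text, grouped by the class of the enclosing div."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class names of the divs currently open
        self.text = {"post-body": [], "sidebar": []}

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append(dict(attrs).get("class", ""))

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        chunk = data.strip()
        if chunk and self.stack and self.stack[-1] in self.text:
            self.text[self.stack[-1]].append(chunk)


parser = TextBySection()
parser.feed(PAGE)
parser.close()

# What a careful reader sees versus what a grab-everything scraper sees
body_only = " ".join(parser.text["post-body"])
whole_page = " ".join(parser.text["post-body"] + parser.text["sidebar"])

print("network science" in body_only)   # prints False
print("network science" in whole_page)  # prints True
```

If the text handed to the model looks like whole_page rather than body_only, then a post about a tingle on my arm arrives bundled with network science, entropy and semiotics, and a summary that stitches them together stops looking so mysterious.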

Post Script:
Today, in a final observation, I run one more search for "thermodynamic hallucinations" using Google, then Bing, then DuckDuckGo (Bing). Network Address doesn't show up at all (interesting*), but instead we get 1 result, in all three engines - a Substack post from May 2024 by a most likely real person, with a section heading titled Thermodynamic Hallucinations, which is about the "temperature" weights on large language models and the "hallucinations" they output; there's a later heading titled Social Hallucinations. And here we see the moment has passed for this experiment; the phrase is now tainted, it's been touched by the real world. Also, the internet as we know it is now broken.

Post Post Script:
There are actually three results, but all from the same domain.
