Friday, February 14, 2025

The Entelechy of Information

 

Things that aren't alive behave as if they are, making us question what life is.  Darwin's theory of evolution started with a Tree of Life for language, not for Life. This is a great piece, and she's the one who did the 'black traffic stop' study btw, very clever.


Study traces an infectious language epidemic
May 2024, phys.org

Note for context: More than 692,000 preventable hospitalizations were reported among unvaccinated patients from November to December 2021 alone, at a minimum cost of $13.8 billion.

They used GPT-4 to find posts opposing COVID prevention in banned subreddits. They were looking for something called "gists" - Reyna has shown that individuals learn and recall information better when it is expressed as a cause-and-effect relationship, and not just as rote information. This holds true even if the information is inaccurate or the implied connection is weak. Reyna calls this cause-and-effect construction a "gist."

The results show that, indeed, social media posts that linked a cause, such as "I got the COVID vaccine," with an effect, such as "I've felt like death ever since," quickly showed up in people's beliefs and affected their offline health decisions. In fact, the total and new daily COVID-19 cases in the U.S. could be significantly predicted by the volume of gists on banned subreddit groups.
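To make "could be predicted by the volume of gists" a little more concrete, here is a minimal sketch of one way to check whether one daily series leads another: a lagged correlation. This is not the study's method (the summary above doesn't say which statistical test they used), and the numbers are made up for illustration. The names `gist_counts`, `new_cases`, and `lagged_correlation` are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(gists, cases, lag):
    """Correlate gist volume on day t with case counts on day t + lag."""
    if lag < 0 or lag >= len(gists):
        raise ValueError("lag must be between 0 and len(gists) - 1")
    if lag == 0:
        return pearson(gists, cases)
    return pearson(gists[:-lag], cases[lag:])


# Made-up toy data, NOT from the study: daily counts of gist-bearing posts.
gist_counts = [3, 5, 9, 14, 10, 6, 4, 8, 13, 7]
# Toy assumption: cases echo gist volume two days later, scaled by 50.
new_cases = [100, 100] + [50 * g for g in gist_counts[:-2]]

for lag in range(4):
    r = lagged_correlation(gist_counts, new_cases, lag)
    print(f"lag {lag} days: r = {r:.2f}")
```

In the toy data the correlation is strongest at a two-day lag because that's how the data was built. With real data a high lagged correlation would only be a hint, not proof that the posts drove the cases.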

Rho's work is grounded in a social science framework called Fuzzy Trace Theory that was pioneered by Valerie Reyna, a Cornell University professor of psychology and a collaborator on this Virginia Tech project.

via Eugenia Rho and Society + AI & Language Lab in the Department of Computing at the College of Engineering at Virginia Tech: Xiaohan Ding et al, Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language, arXiv (2024). DOI: 10.48550/arxiv.2403.00994

And this, an important note to stand by itself:
Many other social media platforms have barred outside researchers from using their data.

(And this is why they used Reddit, which I think as of today no longer allows ^this. Regardless, consider for a moment that digital social-mediated misinformation is a thing, and it's a real risk, and yet we can't even study it anymore because it's all totally proprietary. But don't consider the fact that data - all the data - is created by the users, yet it's not owned by the users to the extent that they, or anyone, can study it; that's for another post.)

Image credit: Plant root via Fiber Optic 2x - Dr Adolfo Ruiz De Segovia Nikon Small World - 2024


Back to language as an epidemic, behold Legalese:
Study explains why laws are written in an incomprehensible style
Aug 2024, phys.org

A 2022 study found legal documents frequently have long definitions inserted in the middle of sentences - a feature known as "center-embedding" which makes text more difficult to understand, even for lawyers. "Legalese somehow has developed this tendency to put structures inside other structures, in a way which is not typical of human languages." 

"Lawyers don't like it, laypeople don't like it, so the point of this current paper was to try and figure out why they write documents this way."

Hypotheses:
"Copy and edit hypothesis" - legal documents begin with a simple premise, and then additional information and definitions are inserted into already existing sentences, creating complex center-embedded clauses.

"Magic spell hypothesis" - the convoluted style of legal language signals a kind of authority, "if you want to write something that's a magic spell, people know that the way to do that is you put a lot of old-fashioned rhymes in there."

Test:
They ran two experiments. In the first, people were split into two groups: one wrote laws and then added to them, the other wrote laws all at once. People wrote center-embeddings regardless of whether they had to copy and edit.

The other experiment gave people laws from another country and asked them to rewrite them, both as law itself and as a description of the law. They found that people put embeddings in the law but not the description. 

They're now investigating the origins of center-embedding in legal documents, back to the Hammurabi Code of 1750 BC.

via MIT, University of Melbourne, University of Chicago Law School: Eric Martínez et al, Even laypeople use legalese, Proceedings of the National Academy of Sciences (2024). DOI: 10.1073/pnas.2405564121


But in case you still weren't convinced that contagion is a function of information systems, and not of living systems ... or perhaps you'd rather believe that language is in fact a form of life:
How new words arise in social media
Sep 2024, phys.org

They analyzed 650 million tweets written in French between 2012 and 2014 to identify 400 words that newly appeared on the social media network then called Twitter. They then tracked the diffusion of these words over the following five years, and looked at the position and connectivity of users who adopted the words.

"lexical innovations" - new words

On average, the words that eventually persisted were used by people who were more central to their community, and remained in circulation at low levels for a longer period before entering a growth phase (18.5 months in circulation compared to 6.5 months for buzzes).

Words that became only temporary buzzes were used by people with less central positions within a social network and had a more rapid rise in use - followed by a rapid decline.
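The two trajectories described above (a long, quiet lead-in before growth versus a fast spike and collapse) can be sketched in a few lines. This is only an illustration and not the paper's method: the thresholds, the `keep_fraction` rule, and the monthly counts are all made-up assumptions.

```python
def months_before_growth(monthly_counts, growth_threshold):
    """Number of months a word circulates before usage first reaches
    growth_threshold (a hypothetical cutoff for 'entering a growth phase')."""
    for month, count in enumerate(monthly_counts):
        if count >= growth_threshold:
            return month
    return len(monthly_counts)  # never reached the threshold


def classify(monthly_counts, keep_fraction=0.5):
    """Label a word 'persistent' if its final usage is still at least
    keep_fraction of its peak, otherwise 'buzz'. The rule is an assumption."""
    peak = max(monthly_counts)
    if monthly_counts[-1] >= keep_fraction * peak:
        return "persistent"
    return "buzz"


# Made-up monthly usage counts for two imaginary words.
slow_word = [1, 1, 2, 2, 3, 3, 5, 9, 20, 40, 45, 44]
buzz_word = [1, 30, 60, 20, 5, 1, 0, 0, 0, 0, 0, 0]

for name, counts in [("slow_word", slow_word), ("buzz_word", buzz_word)]:
    print(name, classify(counts), months_before_growth(counts, growth_threshold=20))
```

The study also looked at where adopters sit in the network, which this sketch ignores entirely. It only captures the shape of the usage curve.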

via École Normale Supérieure in France: Tarrade L, Chevrot J-P, Magué J-P, How position in the network determines the fate of lexical innovations on Twitter, PLOS Complex Systems (2024). DOI: 10.1371/journal.pcsy.0000005
