Friday, August 18, 2017

Unzipf


Zipf's law is a top-ten favorite thing here on Network Address. New theory ---

Unzipping Zipf's Law: Solution to a century-old linguistic problem
Aug 2017, phys.org

Sander Lestrade, a linguist at Radboud University in The Netherlands, proposes a new solution to this notorious problem in PLOS ONE.

...shows that Zipf's law can be explained by the interaction between the structure of sentences (syntax) and the meaning of words (semantics) in a text.

"In the English language, but also in Dutch, there are only three articles, and tens of thousands of nouns," Lestrade explains. "Since you use an article before almost every noun, articles occur way more often than nouns." But that is not enough to explain Zipf's law. "Within the nouns, you also find big differences. The word 'thing', for example, is much more common than 'submarine', and thus can be used more frequently. But in order to actually occur frequently, a word should not be too general either. If you multiply the differences in meaning within word classes, with the need for every word class, you find a magnificent Zipfian distribution. And this distribution only differs a little from the Zipfian ideal, just like natural language does."
-phys.org

WHAT'S ZIPF

The most frequent word in a language, or in a book, or whatever, will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.

(straight from Wikipedia, I mean it's all numbers anyway, right?)

For example, in the Brown University Standard Corpus of Present-Day American English, the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's Law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). Only 135 vocabulary items are needed to account for half the Brown Corpus.

The same relationship occurs in many other rankings unrelated to language, such as the population ranks of cities in various countries, corporation sizes, income rankings, and so on.
http://en.wikipedia.org/wiki/Zipf's_law
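
To see how close those Brown Corpus numbers come to the ideal, here is a rough Python sketch. It assumes the simplest form of Zipf's law (expected count = top count divided by rank, an exponent of exactly 1) and uses only the three counts quoted above. The function name zipf_expected is my own invention for illustration.

```python
# Observed counts quoted above from the Brown Corpus.
observed = [("the", 69971), ("of", 36411), ("and", 28852)]


def zipf_expected(top_count, rank):
    """Idealized Zipf count: the top count divided by the rank.

    Assumes an exponent of exactly 1, the simplest textbook form.
    """
    return top_count / rank


top_count = observed[0][1]
rows = []
for rank, (word, count) in enumerate(observed, start=1):
    expected = zipf_expected(top_count, rank)
    # A ratio of 1.0 means the word sits exactly on the ideal Zipf curve.
    rows.append((rank, word, count, expected, count / expected))

for rank, word, count, expected, ratio in rows:
    print(f"{rank} {word:>4} observed={count:>6} "
          f"expected={expected:>9.1f} ratio={ratio:.2f}")
```

By this measure "of" lands about 4% above the ideal and "and" about 24% above it, so the law is a rule of thumb and not an exact fit, which is also what the Lestrade quote says about natural language.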

*Zipf's law is referenced in science fiction author Robert J. Sawyer's WWW: Wake, when the main character is searching for intelligent life on the web.
http://en.wikipedia.org/wiki/Wake_(Robert_J._Sawyer_novel)

META

There are some other laws meta-physical, like Benford's Law:

In this distribution, the digit 1 occurs as the first digit about 30% of the time, and each larger digit occurs in that position less often: 9 is the first digit less than 5% of the time. This distribution of first digits matches the widths of gridlines on a logarithmic scale.
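
Those percentages come from the standard Benford formula, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The formula is not spelled out in the passage above, so treat this Python snippet as a small sketch of it; the name benford_probability is made up for illustration.

```python
import math


def benford_probability(digit):
    """Probability that `digit` (1-9) is the leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)


# Probability for every possible leading digit.
probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}: {p:.1%}")
```

The values fall steadily from about 30% for a leading 1 to under 5% for a leading 9, and the nine probabilities add up to 1. Each one is also the gap between d and d+1 on a log scale, which is where the gridline comparison comes from.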


POST SCRIPT
other meta-phys laws etc.

Bursts
Network Address, 2012

Laws Meta-Physical
Network Address, 2013

Physicists eye neural fly data, find formula for Zipf's law
August 2014, phys.org

...mathematical models, which demonstrate how Zipf's law naturally arises when a sufficient number of units react to a hidden variable in a system.

"If a system has some hidden variable, and many units, such as 40 or 50 neurons, are adapted and responding to the variable, then Zipf's law will kick in."

"We showed mathematically that the system becomes Zipfian when you're recording the activity of many units, such as neurons, and all of the units are responding to the same variable."

Ilya Nemenman, biophysicist at Emory University and co-author
-phys.org
