Friday, August 18, 2017


Zipf's law, a top-ten favorite thing on Network Address. New theory:

Unzipping Zipf's Law: Solution to a century-old linguistic problem
Aug 2017,

Sander Lestrade, a linguist at Radboud University in The Netherlands, proposes a new solution to this notorious problem in PLOS ONE.

...shows that Zipf's law can be explained by the interaction between the structure of sentences (syntax) and the meaning of words (semantics) in a text.

"In the English language, but also in Dutch, there are only three articles, and tens of thousands of nouns," Lestrade explains. "Since you use an article before almost every noun, articles occur way more often than nouns." But that is not enough to explain Zipf's law. "Within the nouns, you also find big differences. The word 'thing', for example, is much more common than 'submarine', and thus can be used more frequently. But in order to actually occur frequently, a word should not be too general either. If you multiply the differences in meaning within word classes, with the need for every word class, you find a magnificent Zipfian distribution. And this distribution only differs a little from the Zipfian ideal, just like natural language does.


The most frequent word in a language, or in a book, or whatever, will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.

(straight from Wikipedia, I mean it's all numbers anyway, right?)

For example, in the Brown University Standard Corpus of Present-Day American English, the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's Law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). Only 135 vocabulary items are needed to account for half the Brown Corpus.
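A quick sanity check on those Brown Corpus numbers, as a rough sketch only: under the idealized Zipf's law, the word at rank r should show up about 1/r as often as the top word. The counts are the ones quoted above; the function name `zipf_prediction` is made up for illustration.

```python
# Brown Corpus counts quoted above for the three most frequent words.
observed = {"the": 69971, "of": 36411, "and": 28852}

def zipf_prediction(top_count, rank):
    """Idealized Zipf's law: frequency is proportional to 1 / rank."""
    return top_count / rank

top_count = observed["the"]
rows = []
for rank, (word, count) in enumerate(observed.items(), start=1):
    predicted = zipf_prediction(top_count, rank)
    ratio = count / predicted  # 1.00 would be a perfect Zipfian fit
    rows.append((rank, word, count, predicted, ratio))
    print(f"{rank}  {word:<4} observed={count:>6}  predicted={predicted:>8.0f}  ratio={ratio:.2f}")
```

"of" lands within about 4% of the prediction, while "and" runs roughly 24% over it, which is the "approximately" in the law doing its job.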

The same relationship occurs in many other rankings unrelated to language, such as the population ranks of cities in various countries, corporation sizes, income rankings, and so on.

*Zipf's law is referenced in science fiction author Robert J. Sawyer's WWW: Wake, when the main character is searching for intelligent life on the web.


There are some other laws meta-physical, like Benford's Law:

In this distribution, the number 1 occurs as the first digit about 30% of the time, while larger numbers occur in that position less frequently, with larger numbers occurring less often: 9 as the first digit less than 5% of the time. This distribution of first digits is the same as the widths of gridlines on a logarithmic scale.
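The percentages in that quote follow from the standard Benford formula, P(d) = log10(1 + 1/d), which is also the width of the gap between d and d + 1 on a logarithmic scale. A minimal sketch (the function name `benford_probability` is just for illustration):

```python
import math

def benford_probability(digit):
    """Benford's law: chance that `digit` (1-9) is the leading digit."""
    return math.log10(1 + 1 / digit)

probabilities = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probabilities.items():
    print(f"{d}: {p:.1%}")
```

The digit 1 comes out at roughly 30% and 9 at a bit under 5%, matching the quote, and the nine probabilities add up to 1.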

other meta-phys laws etc.

Network Address, 2012

Laws Meta-Physical
Network Address, 2013

Physicists eye neural fly data, find formula for Zipf's law
August 2014,

mathematical models, which demonstrate how Zipf's law naturally arises when a sufficient number of units react to a hidden variable in a system.

"If a system has some hidden variable, and many units, such as 40 or 50 neurons, are adapted and responding to the variable, then Zipf's law will kick in."

"We showed mathematically that the system becomes Zipfian when you're recording the activity of many units, such as neurons, and all of the units are responding to the same variable".

Ilya Nemenman, biophysicist at Emory University and co-author

Tuesday, August 15, 2017

Eyes on the Street

Computer 'anthropologists' study global fashion
Aug 2017,

What is the world wearing?

These scientists are using a deep learning object recognition program to discover visual patterns in clothing and fashion across millions of images of people worldwide and over a period of many years. They detected attributes like color, sleeve length, presence of glasses or hats, etc. (They end up filtering for only waist-up photos.) They ask questions such as, "How is the frequency of scarf use in the US changing over time?" or "For a given city, such as Los Angeles, what styles are most characteristic of that city?"

The objective of this research is ultimately to "provide a look into cultural, social and economic factors that shape societies and provides insights into civilization."

Dashed lines mark Labor Day. Who said Americans don't like conformity?

via Cornell University: StreetStyle: Exploring world-wide clothing styles from millions of photos. arXiv.

I imagined that stuff like this is already happening all over the place, in all kinds of other fields, and being integrated into global policy decisions and bottom-line business calls alike. But this is not the case; this is still just the beginning. One thing I caught from this, some digital-era common sense: Google Trends results for "scarves" peak right before they do on Instagram, because, presumably, people are searching for the thing, then they buy it, then they take pictures of themselves wearing it.

Post Script
These are the real people, not the algorithms, that analyze and predict the world of fashion:
Color Conspirators, Network Address

Monday, August 7, 2017

Believability Likability Fallibility

Why humans find faulty robots more likeable
Aug 2017,

If you've never watched the robots from Boston Dynamics get pushed over while they try to stand, you really should (just search robot fail videos). If you've never thought, aw man, I feel really bad for that guy, then you should definitely watch it. Because you know, one day when a real robot-looking robot is taking care of your feeble parents, or you, you're gonna want to like that robot. And as it turns out, watching something struggle, whether it's a robot or a bug, or ^this kid trying to eat cereal, makes you like them more.

Says science:

"...participants took a significantly stronger liking to the faulty robot than the robot that interacted flawlessly." ... This finding confirms the Pratfall Effect, which states that people's attractiveness increases when they make a mistake," says Nicole Mirnig, PhD candidate at the Center for Human-Computer Interaction, University of Salzburg, Austria.

Source document:
Nicole Mirnig et al, To Err Is Robot: How Humans Assess and Act toward an Erroneous Social Robot, Frontiers in Robotics and AI (2017). DOI: 10.3389/frobt.2017.00021

Deanonymity Reanonymity

It is easy to expose users' secret web habits, say researchers
July 2017, BBC News

"Two German researchers say they have exposed the porn-browsing habits of a judge, a cyber-crime investigation and the drug preferences of a politician." -BBC

This isn't news. (So why am I writing about it?)

Despite what you might think, there is really no such thing as anonymous data, that is, when you have enough data.

Four data points is all it takes to identify or de-anonymize anonymous data, and this goes back to 2006. In other words, if I were to take a bunch of people and assign them serial numbers instead of their names and track every website they went to, all I would need is four websites from one particular serial number, and I would be able to identify who that individual is.
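To get a feel for why a handful of data points narrows things down so fast, here is a toy simulation. Everything in it is an assumption for illustration: the histories are synthetic, sites are picked uniformly at random (real browsing is far more skewed), and the names `make_histories` and `matching_users` are hypothetical. It is not the method from any of the studies mentioned in this post.

```python
import random

def make_histories(num_users, num_sites, visits_per_user, seed=0):
    """Synthetic browsing histories: each serial number gets a random set of sites."""
    rng = random.Random(seed)
    return {
        user_id: frozenset(rng.sample(range(num_sites), visits_per_user))
        for user_id in range(num_users)
    }

def matching_users(histories, known_sites):
    """Serial numbers whose history contains every site the observer already knows."""
    known = set(known_sites)
    return [uid for uid, sites in histories.items() if known <= sites]

histories = make_histories(num_users=10_000, num_sites=1_000, visits_per_user=50)
target = 42  # the serial number we pretend to know a few sites for
target_sites = sorted(histories[target])

for k in range(1, 5):
    candidates = matching_users(histories, target_sites[:k])
    print(f"{k} known site(s): {len(candidates)} matching serial number(s)")
```

Each extra known site can only shrink the list of matching serial numbers, and in this toy setup the list collapses quickly. With real data the popular sites tell you less and the obscure ones tell you more.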

We forget so easily, but over ten years ago, AOL released a bunch of search data, and then took it back down the same day. They realized that you could pretty easily, no, very easily identify, or re-identify the people behind the search data. Then there was a competition to prove it, done on Netflix users, then Twitter users. Now, ten years later, we have already forgotten. Or perhaps a tech writer at the BBC is just looking for clicks. Or maybe he's just trying to remind us.

There is no privacy on the internet.

On a positive note, your mom was right, you are special and unique and there's nobody else in the world exactly like you (and that's why it's so easy to re-identify your anonymized self).

AOL subscribers sue over data leak
Ars Technica, 2006

AOL Proudly Releases Massive Amounts of Private Data
Tech Crunch, 2006

How hard is it to 'de-anonymize' cellphone data?
MIT News, 2013

Unique in the Crowd: The privacy bounds of human mobility.
Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen & Vincent D. Blondel. Scientific Reports 3, Article number: 1376 (2013). doi:10.1038/srep01376

The official paper:
Paul Ohm. Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. UCLA Law Review, Vol. 57, p. 1701, 2010
U of Colorado Law Legal Studies Research Paper No. 9-12.
