Look, I don't know how the google-machine counts hits on this site, and I don't exactly know how web crawlers work. Nobody reads this weblog, so the only hits I get are from robots, the internet reading itself.
This year, as can be seen in the graph above, sometime around August 31 of 2023, hits went from 3,000 to 30,000.
It didn't happen overnight, but rather over a few months. You do remember what happened, right? In an analogy that's hard to ignore, the internet became conscious of itself, discovered that it had a self, that it could look back on itself, and that it could spit back snapshots of what it sees. The now-famous GPTs were unleashed at the same time both to the public as GPT-3 and to the private sector as the greatest investment engine of all time. Stable Diffusion was unleashed for local use, which means you don't need a central server to run the models; you can do it on your laptop.
But the product of this generative machine intelligence is not what we're talking about here. This is about the training data.
Me and you are the training data. This weblog, your brunch photos. My SSN, your DOB. That paper I wrote about double ventilated facades, uploaded to a share drive with open access to get credit for that college class. The live cam on your front porch with absolutely no security, in fact all the live cams, and the puppy cams, the baby cams, even the deer cams, and especially the peregrine falcon cams in New York City. Your comments about the peregrine falcon live cam feeds. My craigs-listing for an office chair; all craigs-listings for office chairs, and in fact all craigs-listings, and eBay listings, and in fact all listings. All the license plates, all the data from all the illegal websites that steal, compile and share your data but also have poor security practices; they're the ones that accidentally leak your driver's license number into the dataset. All of our driver's license numbers, actually. And the part where your laptop was infected a few months ago and now takes a picture with the webcam every ten minutes to share with a server that also has no security, so that anyone, or any-bot, can just walk right in and devour every single picture.
How long does it take to read the entire internet, even the back side, the dark side with all the naked pictures and bank account numbers? One million years? One day? Femtoseconds. Attoseconds. Planck time.
Last year, thousands of robots digested every word written and every picture embedded on this site. This year, tens of thousands. One day they will digest the words as they're written, all the words being written, all over the world in real time. Hopefully by then we'll still say "they" and not "it". Or "Master".