Showing posts with label data. Show all posts

Sunday, June 23, 2019

Happy birthday, Turing!

Today is the 107th anniversary of the birth of Alan Turing. (Of course, at birth he was probably very bad at abstract reasoning and proofs, like most babies, but he overcame these difficulties and grew up to be truly excellent at math.)

Just in case you haven't seen this yet (HT: I saw this on Twitter several times, then on slatestarcodex), it is amusing and recursive and cultural and involves computers:

Humans often post on the website reddit, which hosts many, many different message boards and oodles of subcultures and conversations on specific topics. Each specific message board is called a subreddit and has its own adherents, community standards, topic(s) of conversation, style, level of activity, etc.

There is a subreddit called r/totallynotrobots where the posts claim to be written by humans, but are written in all-caps and a style suggesting that they are actually written by robots. Redditors writing these posts are humans, so these are humans writing as if they are robots who are unconvincingly trying to pass as human.

There is a recent and extremely impressive system called GPT-2 which unsupervised-ly learns English and performs some really impressive computational linguistic feats, including writing mediocre high-school-style essays and writing very interesting and totally feasible poetry.

There is a subreddit called r/SubSimulatorGPT2 which trains GPT-2 on subreddits and automatically writes "coherent and realistic simulated content" for each subreddit. Of course, this subreddit is just going through other subreddits, training GPT-2 on each, and writing new (automated, simulated) posts in that subreddit's style.

Now the subreddit-simulating robot has trained on r/totallynotrobots, which means that there are posts on the internet which are written by a robot imitating a human writing in a style pretending to be a robot who is unconvincingly trying to pass as a human. (Or, as slatestarcodex put it, "a robot pretending to be a human pretending to be a robot pretending to be a human.") You can see those posts here.

It's turtles all the way down, and every. single. turtle. is a Turing machine!


This post's theme word is anastrophe, "the inversion of the usual order of words or clauses." Silly grammar mistakes and anastrophe are used to denote unfamiliarity with human language.

Friday, August 25, 2017

"C" is for creepy Cortana

Cortana is the Microsoft version of Alexa, who is the Amazon version of Siri, who is Apple's version of an embryonic Skynet, as fantasized by data-driven marketers who prefer all subservient voices to be feminized. She's just as creepy, intrusive, and frightening as the others. I can't figure out how to turn her off. After disabling her once, I am now on a screen, quoth:
Hey, look, it's the "me" part of set-up! Can I have permission to use the info I need to do my best work?
To let Cortana provide personalized experiences and relevant suggestions including when your device is locked, Microsoft collects and uses information including your location and location history, contacts, voice input, speech patterns, searching history, relationships, calendar details, email, content and communication history from text messages, instant messages and apps, and other information on your device. In Microsoft Edge, Cortana uses your browsing history.
FUCK NO. This paragraph makes my skin crawl; it makes me want to incite a lawsuit; it could be the summary voice-over for the beginning of a movie about stalkers and abusive relationships.

Later, in a more detailed explanation, they say,
When your location is used by a location-aware app or service, your location information and recent location history is stored on your device and sent to Microsoft in a de-identified format to improve location services.
The red flag here is de-identified, which basically means "schoolchildren in 2025 will be able to link this data to your name and fingerprints as part of routine homework assignments." De-identified means "your privacy is not really protected against anyone clever, or educated, or with enough data" --- and that almost certainly includes the reams of data that are being collected by this very procedure. In fact, a little later in the same document,
As you use Windows, we collect diagnostics data... This data is transmitted to Microsoft and stored with one or more unique identifiers that can help us recognize an individual user on an individual device...
This documentation also says that some of the data collected may "unintentionally include"... as if anything about this human-authored operating system were unintentional.

The entire new-Windows-device setup --- with the four separate times I have now disabled my own device from eavesdropping and blurting out perky spoken questions --- was very slimy. It feels like a modern update of Clippy, that universally-reviled and -mocked piece of computing history. Basically, it has convinced me that I definitely do not own my device, the data on it, or any analytics about how it is used. (There is no way to turn off automatic updates, either.) So I'll only be using it for the one program I want, instead of as a multipurpose computing device.


This post's theme word is deterge, "to wash, wipe, or cleanse." I require a psychic emollient to deterge the scummy Big Brother sense of using a Windows device.

Monday, March 14, 2016

One year with a food scale and a spreadsheet

Let's take a brief jaunt into one of my most active spreadsheets: the one that tracks my macronutrients (consumed), exercise (performed), and weight (mass * gravity). I now have about a year of data, so perhaps we can see some trends.

The motive for the spreadsheet --- and the food scale which enabled me to precisely measure my food, for cooking, eating, and tracking purposes --- was mostly curiosity, an enjoyment of data points, and the interest to see if there were any long-term changes that were too gradual to notice on a daily basis.

Here's the chart, minus all labels because I don't have infinite time to wrestle with chart software to make something nice-looking, and also I don't have the expertise for what features a good chart should have.

X-axis: days in the past year (some data incomplete on some days)
Y-axis blue: kilocalories consumed (centerline is daily recommendation)
Y-axis green: exercise (goes from slothlike 0 at bottom to outrageously exhausting LAC day at top)
Y-axis red: mass ("normal" BMI cutoff is bottom 1/8 of scale; the rest is "overweight"; total span is ~6 kg)
Obviously these clusters of blue/green/red dots are not super-easy to read. I have made your chart-reading life more difficult by stripping off all labels on the axes, for my own private reasons. It might help to fit some trendlines. Here are linear best-fit lines.
With linear trendlines, it looks like mass tends towards 0.
At various intermediate degrees between 2 and 10, the best-fit polynomials have weird corners or predict extreme blowup/decreases outside the range of the data. This seems like a bad fit because my weight will probably not plummet to 0 in the next few years, and my exercise did not start at 8x my current effort just before the data began.
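For the curious, here is roughly what the spreadsheet does when it draws a linear trendline and why extending that line eventually hits 0. This is a minimal sketch in Python, not my actual spreadsheet: the numbers are made up (a hypothetical 70 kg starting mass, about 6 kg lost over a year, plus a small wobble), and `fit_line` is just an illustrative name for an ordinary least-squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data, NOT the real measurements: a 6 kg drop over one year,
# plus a small deterministic day-to-day wobble.
days = list(range(365))
mass_kg = [70.0 - 6.0 * d / 364 + 0.4 * ((d * 7) % 5 - 2) for d in days]

slope, intercept = fit_line(days, mass_kg)

# Extending the line far past the data: the day on which it crosses 0 kg.
zero_day = -intercept / slope

print(f"slope: {slope:.4f} kg/day")
print(f"the line reaches 0 kg around day {zero_day:.0f}")
```

The fit is perfectly reasonable inside the year of data. The silliness only shows up when the line is extended years beyond it, which is exactly what a trendline invites you to do.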

Here are some 10th-degree polynomial fits (below). I picked 10 because that was the highest available, and I have no idea what sort of trendline I should be picking to get a "meaningful" trend (visually interesting, useful, predictive). Notice that these polynomials predict crazy extremes --- my weight before the data was enormous, and my future weight is smoothly tapering down. I have fun watching how the best-fit polynomial changes when I add a new data point, as the relative flatness of the data means that it sometimes wiggles in an aesthetically pleasing way to accommodate the new point.
With degree 10 polynomial trendlines, the downtick in mass echoes the uptick in exercise.

This is not groundbreaking data analysis. Clearly. But I do enjoy playing with a spreadsheet.

Some very plain observations:
  • kCals: I eat approximately the daily recommended kCals, with some reasonable variation. A few of the really low outlier days I had a bad cold or food poisoning. The high outlier days I was just hungrier, so I ate more. Some of the really high outlier days are missing, as I definitely ate more on vacation in BBQ-feasting Texas, but I didn't reliably measure those days and didn't worry about it.
  • exercise: I'm using Fitocracy to turn my various workouts into a single number. Sometimes the number seems much too low/high compared to how much effort I felt the workout required. But at least it's a standardized measure.
    The trendline is helpful here because in any given week, hard workouts are mixed in with easy ones, so probably the average is more enlightening than the actual individual data points. (The chart has all that empty space at the top because high-outlier workout days occur at regular intervals, once or twice a week, and I wanted to visually include them in the chart.)
    The recent uptick in exercise reflects the fact that I have been going climbing once a week, regularly replacing a low-scoring easy workout with a high-scoring hard one. It's nice that the trendline shows this.
  • weight: I lost some, but if you look at the data points you'll see that my daily weight varies. The trendline is useful here for seeing, well, a trend. Much more interesting would be my density measurement, but of course I don't have this historical data. Based on how my clothing fits, I have swapped some undense fat for some dense muscle, but the single-number mass measurement doesn't reflect this change in volume.
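The point about averages being more enlightening than individual workout days can be sketched as a 7-day rolling mean. This is an illustration in Python under made-up assumptions: the scores below are invented (they are not my Fitocracy numbers), and `rolling_mean` is a hypothetical helper, not something the spreadsheet provides.

```python
def rolling_mean(values, window=7):
    """Mean of each full run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Invented daily workout scores for illustration only.
easy_week = [100, 0, 150, 0, 800, 0, 0]      # one hard day per week
climbing_week = [100, 0, 800, 0, 800, 0, 0]  # an easy day swapped for a hard one

scores = easy_week * 4 + climbing_week * 4
weekly_trend = rolling_mean(scores)

print(f"first 7-day average: {weekly_trend[0]:.1f}")
print(f"last 7-day average:  {weekly_trend[-1]:.1f}")
```

The individual days jump between 0 and 800, but the weekly average sits flat until the swap and then climbs, which is the uptick the trendline picks up.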
Your advice for what I should do with this data is welcome. What would be interesting? I should probably just take a few classes of the Coursera data science sequence and figure it out myself. I also have the breakdown of kCal into fat/carbohydrates/protein for each day, if you can think of something interesting to do with that. (Mostly it shows that the decrease in kCal came from eating fewer carbohydrates, but keeping protein the same, which was a result of conscious intent on my part.)
A time-travelling version of myself from the 1920s. (Illustration from La Culture Physique de la Femme Elégante, as posted here.)
Some non-empirical observations about this time period and data set: I did not feel particularly hungry or like I needed food during this data period, even though I observably consumed fewer total calories and expended more. I found that I felt slightly more comfortable in my body overall: warmer in winter, cooler in summer, and generally more flexible (a bodily sensation I enjoy). (Other possible factors there: different clothing, different climate, different locations, the weird hyperbole/discounting of memories of past physical sensations.) Flipping through my logged workouts, my incremental increases in strength and endurance continue. One big difference between the logged data period and, e.g., graduate school, is that I don't really nap anymore. But there are enough other factors at play here --- my postdoc work schedule, French cultural conformity enforcing a standard and synchronized pattern of wake up-commute-work-lunch-commute-dinner, etc. --- that I have no idea if my food intake has had a causal effect on my decreased napping, or if other confounding factors have combined, or if perhaps I just aged into an adult sort of schedule which my body finds comfortable.

May your green trend ever upwards!


This post's theme word is overmorrow, "the day after tomorrow." Can your model predict how much I shall exercise on the overmorrow?

Sunday, August 23, 2015

Tracking

I am as much a datavore as the next internet-inhabiting member of my socio-economic-educational cohort. Right now, with a few clicks, I can bring up a history of my workouts, personal mass, and grams of macronutrients eaten, going back months or years (depending on the quality of data desired), as well as how long I've worn each pair of contacts I've ever used, and the length of every menstrual cycle. No, this data is neither open nor freely available (at least until I get a good publication out of it).

I have thought about getting a fitbit, but it seems extraneous. At my current level, steps per day are a tiny fraction of my overall activity, so increasing them would barely register. Plus I'm not crazy about automatically uploading my data to some company's website. I want to control the data I generate, and I think this is reasonable.

But this article makes me want to run from the fitbit, for many steps. In fact, I think that I could probably be discouraged from most of my present activities by an article describing them in this light: as an obsessive, addictive, cult-ish fad, in which basic humanity (competition, socializing, merely walking) is suborned in order to commodify community and "brand engagement."

Eugh.


This post's theme word is thrasonical, "bragging or boastful." Linked fitbit accounts are thrasonical interpersonal spam of an unpleasant sort.