Now that it is possible to do so, I have requested my personal data from a variety of companies (in some cases, as with my Toyota driving data, going to great lengths to get it). Many people who saw the NYT article told me they had no idea how much driving data their vehicles or insurance tracking apps were collecting, and I suspected that, similarly, most folks wouldn't realize their televisions were collecting a lot of data too.
I am a big fan of LCD TVs, LG televisions in particular, and own several of them. I usually opt out of data collection and use where I can, though sometimes I leave collection on to see whether it enhances my experience. I did so with my main TV at home for a few months, then went to LG's website to go through the data request process. They ask you to submit the request from the same home Wi-Fi network your TV is on, and they return data based on the IP address you submit from. When I didn't hear back after some time, I DM'd the company, which put me in touch with someone who could help (I'm very grateful to him and his team for being so open and helpful about all of this). But the data I got was definitely not "mine".
I've written and spoken before about the limitations of IP addresses, and my team at Google built various IP privacy and VPN features (open sourcing some of them as well). A residential IP can stay assigned to you for as long as 9 months, or for far less if a power outage disrupts your connection or you reboot your router. Because the IPv4 address space is limited, your ISP will reassign your address to someone else, so if a data request is keyed on IP alone, the IP you submit from may map to another household's past viewing data.
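To make the IP-reuse problem concrete, here is a minimal sketch with made-up records (the record layout, the `203.0.113.7` address from the documentation range, and the lease dates are all hypothetical, not how LG actually stores or looks up data). A lookup keyed on IP alone returns everything ever logged from that address; scoping it to the period the requester actually held the address filters out the previous household.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ViewRecord:
    # Hypothetical record layout, for illustration only.
    ip: str
    seen_at: datetime
    show: str


def records_by_ip(records, ip):
    """Naive lookup: every record logged from this IP, whoever held it at the time."""
    return [r for r in records if r.ip == ip]


def records_in_lease(records, ip, lease_start, lease_end):
    """Narrower lookup: only records logged while the requester held the IP."""
    return [r for r in records if r.ip == ip and lease_start <= r.seen_at < lease_end]


# Made-up data: a previous household held the address in early 2023,
# and the requester has held it since July 2023.
records = [
    ViewRecord("203.0.113.7", datetime(2023, 3, 1, 20), "show watched by previous household"),
    ViewRecord("203.0.113.7", datetime(2023, 9, 5, 21), "show watched by requester"),
]

naive = records_by_ip(records, "203.0.113.7")
scoped = records_in_lease(records, "203.0.113.7", datetime(2023, 7, 1), datetime(2024, 1, 1))
print(len(naive), "records by IP alone;", len(scoped), "within the requester's lease")
```

The catch is that the scoped version needs the lease history, which the ISP has but a device maker typically does not, so an IP-keyed request process has no reliable way to draw that boundary.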
I received ZIP files containing CSVs, two for every month: one for "ad views" and the other for "pg views". I'll link to the files below; the "pg views" file indicates which TV show the household was watching. It contains multiple TV identifiers, so one IP could map to multiple televisions. The content recognition also appears to identify shows coming from other sources, such as devices on other HDMI ports. Most of the shows are fairly generic, but in addition to the ZIP code in the file, there are other clues that this IP address may have been in Ohio (I live in Texas).
The other monthly file, "ad views", seems to log every advertisement that appeared on the device while it was on. There are a lot of line items, so I filtered for just one instance of Shark Tank being watched in an hour-long slot, which came with about 13 minutes of ads. Each ad is coded with a brand ID and a content ID, but the file doesn't resolve those IDs to names. I'm guessing that "1022", which appears often, might map to ABC promoting its other shows, as is typical of network television.
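For anyone who wants to do the same kind of filtering on their own export, here is a minimal sketch using Python's standard `csv` module. The column names (`tv_id`, `start_time`, `duration_sec`, `brand_id`, `content_id`), the timestamp format, and the sample rows are all hypothetical stand-ins, not LG's actual schema or my actual data; you would adapt them to whatever your files contain. It keeps the ads shown on one TV during one time slot, then totals ad seconds per brand ID so frequently repeated IDs stand out.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up sample rows; the column names and values are assumptions for illustration.
SAMPLE_AD_VIEWS = """tv_id,start_time,duration_sec,brand_id,content_id
tv-1,2023-09-05 20:07:00,30,1111,5001
tv-1,2023-09-05 20:15:30,15,2222,5002
tv-1,2023-09-05 20:31:00,30,1111,5003
tv-1,2023-09-05 22:10:00,30,3333,5004
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def ads_in_slot(rows, tv_id, slot_start, slot_end):
    """Keep ads shown on one TV with a start time in [slot_start, slot_end)."""
    return [
        row for row in rows
        if row["tv_id"] == tv_id
        and slot_start <= datetime.strptime(row["start_time"], TIME_FORMAT) < slot_end
    ]


def seconds_by_brand(rows):
    """Total ad seconds per brand ID, largest total first."""
    totals = Counter()
    for row in rows:
        totals[row["brand_id"]] += int(row["duration_sec"])
    return totals.most_common()


rows = list(csv.DictReader(io.StringIO(SAMPLE_AD_VIEWS)))
slot = ads_in_slot(rows, "tv-1", datetime(2023, 9, 5, 20), datetime(2023, 9, 5, 21))
print(sum(int(r["duration_sec"]) for r in slot) / 60, "minutes of ads in the slot")
print(seconds_by_brand(slot))
```

With a real export you would replace `io.StringIO(SAMPLE_AD_VIEWS)` with an opened file.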
Please let me know your thoughts and feedback; I would love to share your views in a follow-up post some time. I won't offer much analysis or many conclusions for now, apart from the fairly obvious point that we generate a LOT of data every day across the many devices we interact with. And even if this were my viewing data, I'm not sure HOW sensitive I would find it… but it does make me think more about what kinds of data I'm creating across all the internet-connected devices I use and own, and what that data might say about me and my family.
Here are two of the many files, converted from CSV to XLSX, for September 2023: ad views and page views. Below is the data schema that was sent to me separately from the files. First, for "ad views":
Then for “pg views”: