Gaël Varoquaux

Fri 11 November 2016


Data science instrumenting social media for advertising is responsible for today's politics

To my friends developing data science for the social media, marketing, and advertising industries,

It is time to accept that we have our share of responsibility in the outcome of the US elections and the vote on Brexit. We are not creating the society that we would like. Facebook, Twitter, targeted advertising, and customer profiling are harmful to truth and have helped bring about Brexit and elect Trump. Journalism has been replaced by social media and commercial content tailored to influence the reader: your own personal distorted reality.

There are many deep reasons why Trump won the election. Here, as a data scientist, I want to talk about the factors created by data science.

Rumor replaces truth: the way we, data-miners, aggregate and recommend content is based on its popularity, on readership statistics. In no way is it based on the truthfulness of the content. As a result, Facebook, Twitter, Medium, and the like amplify rumors and sensational news, with no reality check [1].
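To make the mechanism concrete, here is a minimal sketch of popularity-only ranking. It is an illustration, not the code of any real platform: the `Story` class, the `rank_by_popularity` function, and the share counts are all made up for the example. The point is only that a truthfulness label can sit right next to the readership statistic and never enter the ordering.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    shares: int    # readership statistic: the only signal the ranker uses
    is_true: bool  # hypothetical ground-truth label; the ranker never reads it


def rank_by_popularity(stories):
    """Order stories by share count, most shared first."""
    return sorted(stories, key=lambda s: s.shares, reverse=True)


# Invented numbers, for illustration only.
feed = [
    Story("Carefully sourced report", shares=1200, is_true=True),
    Story("Sensational rumor", shares=45000, is_true=False),
    Story("Fact-check of the rumor", shares=800, is_true=True),
]

ranked = rank_by_popularity(feed)
for story in ranked:
    print(story.shares, story.title)
```

With these made-up counts the rumor lands at the top of the feed and its fact-check at the bottom, simply because `is_true` plays no part in the sort key.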

This is nothing new: clickbait and tabloids build upon it. However, social networking and active recommendation make things significantly worse. Indeed, birds of a feather flock together, reinforcing their own biases. We receive filtered information: have you noticed that every single argument you heard was overwhelmingly against (or in favor of) Brexit? To make matters even worse, our brains love it: to resolve cognitive dissonance we avoid information that contradicts our biases [2].


We all believe information more readily when it confirms our biases

Gossip, rumors, and propaganda have always made sane decisions difficult. The filter bubble, the algorithmically tuned rose-colored glasses of Facebook, escalates this problem into a major dysfunction of our society. It amplifies messy and false information better than anything before. Soviet-style propaganda builds on carefully crafted lies; post-truth politics builds on a flood of information that does not even pretend to be credible in the long run.

Active distortion of reality: amplifying biases to the point that they drown truth is bad. Social networks actually do worse: they give tools for active manipulation of our perception of the world. Indeed, the revenue of today’s Internet information engines comes from advertising. For this purpose they are designed to learn as much as possible about the reader. Then they sell this information bundled with a slot where the buyer can insert the optimal message to influence the reader.

The Trump campaign used targeted Facebook ads to show unenthusiastic Democrats information about Clinton tuned to discourage them from voting, for instance portraying her as racist to black voters.

Information manipulation works. The Trump campaign has been a smear campaign aimed at suppressing votes for his opponent. The release of negative information on Clinton did affect her supporters' allegiance.

Tech created the perfect mind-control tool, with an eye on sales revenue. Someone used it for politics.

The tech industry is mostly socially liberal and highly educated, wishing the best for society. But it must accept its share of the blame. My friends improving machine learning for customer profiling and ad placement, you are helping to shape a world of lies and deception. I will not blame you for accepting this money: if it were not for you, others would do it. But we should all be thinking about how we improve this system. How do we use data science to build a world based on objectivity, transparency, and truth, rather than Internet-based marketing?

Digression: other social issues of data science

  • The tech industry is increasing inequalities, making the rich richer and leaving the poor behind. Data science, with its ability to automate actions and wield large sources of information, is a major contributor to these inequalities.
  • Internet-based marketing is building a huge spying machine that infers as much as possible about the user. The Trump campaign was able to target a specific population, black voters leaning towards the Democrats. What if this data were used for direct executive action? This could come quicker than we think, given how intelligence agencies tap into social media.

I preferred to focus this post on how data science can help distort truth. Indeed, it is a problem too often ignored by data scientists, who like to think that they are empowering users.

In memory of Aaron Swartz, who fought centralized power on the Internet.

[1] Facebook was until recently using human curators, but fired them, leading to a loss of control over veracity.
[2] It is a well-known and well-studied cognitive bias that individuals strive to reduce cognitive dissonance and actively avoid situations and information likely to increase it.