How do we make the Internet real again?

April 5, 2019

By: Martin Zizi

Data collected in 2017 from GlobalDots’ global network indicates that nearly half of online traffic is non-human. To an extent, bot-generated traffic is unavoidable because bots make up the backbone of the Internet and promote development. As an example, web crawlers are considered ‘good bots’: they are programmed to index web pages to build out the search engine results page (SERP), a tool that vastly improves the user experience. However, there also exist bad bots that are programmed to infiltrate legitimate activity on the Internet and even mimic human behavior online, costing businesses billions. According to a study conducted by Imperva Incapsula, seven billion dollars is lost annually to bot-driven ad fraud. With escalating false metrics, fake social media profiles, and the spreading of misinformation, it’s time we ask ourselves: How do we make the Internet real again?

What Role do Bad Bots Play Online?
For more than a decade, malicious attacks typically involved a bad actor controlling a device or computer by infecting it with malware. For instance, an attacker could infect multiple devices and then instruct them to overwhelm a target’s web page (often an enterprise’s) with the goal of interrupting regular activity for legitimate users; this is referred to as a Distributed Denial of Service (DDoS) attack. Exploited computers or devices could go unnoticed by their owners for long periods and carry out multiple attacks. As objects like thermostats, doorbells, and home assistants become connected to the Internet and the Cloud, the threat of DDoS attacks skyrockets. Recently, TechCrunch published an article reporting on security researchers who discovered that Internet-connected industrial refrigerators could be accessed by using unchanged default passwords that were documented on the company’s website. Cases like these show that IoT devices are often not furnished with the same security measures that are standard for computers and smartphones, which makes them an easy target for abuse.

In today’s climate, bots have effectively gone through a transitional phase and are now closely related to what we see on social media. In many ways, social media has created an entirely new playground for bots to run rampant. Three forces at work here are click fraud, fake social media profiles, and amplification bots.

Click Fraud
Bot traffic has drastically affected online advertising. In 2017, the Association of National Advertisers reported that only one-quarter of digital ad spend actually reaches legitimate users. This revelation has been a tough blow for companies using the pay-per-click (PPC) model, which ties pricing to the performance of an advertisement. In theory, the PPC model is based on the concept that clicks will turn into conversions. In practice, however, clicks can represent false metrics due to malicious bot traffic.
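To make the effect of false metrics concrete, here is a minimal arithmetic sketch. It assumes, purely for illustration, that the share of ad spend reaching legitimate users equals the share of clicks that come from humans; the function name and the sample price are hypothetical and do not come from the studies cited above.

```python
def effective_cost_per_human_click(nominal_cpc: float, human_share: float) -> float:
    """Return what an advertiser really pays for each click made by a person.

    nominal_cpc: the price billed per click, human or not (hypothetical figure).
    human_share: fraction of clicks assumed to come from legitimate users.
    """
    if not 0 < human_share <= 1:
        raise ValueError("human_share must be in the range (0, 1]")
    return nominal_cpc / human_share


# If only one-quarter of clicks are human, a click billed at 0.50
# effectively costs four times as much per real visitor.
real_cost = effective_cost_per_human_click(nominal_cpc=0.50, human_share=0.25)
```

Under that assumption, the billed price understates the real cost of reaching a person by a factor of four.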

In turn, advertising companies and websites offering ad placements can leverage misleading figures to inflate their prices for customers. You may ask: who commits click fraud? Advertising companies may profit big, but among the most common culprits are competitor companies, dissatisfied customers, and more shockingly, fraud rings. Botnets can be purchased and used to carry out click fraud attacks.

Fake Social Media Followers
Social media has become a new channel for companies to reach consumers through strategic partnerships with celebrities or influencers who have amassed large followings on popular sites like Twitter and Facebook. A survey conducted in 2018 by the Association of National Advertisers found that 75 percent of brands use influencers for promotional purposes. As influencer marketing quickly became a lucrative market, some celebrities and influencers purchased fake followers to inflate their audience size in the hope of profiting big from social media deals. Just last year, Devumi, a platform used to build an individual’s social media presence, was famously exposed by the New York Times for selling fake followers that recycled real people’s pictures and information. Fake followers distort the cost of advertising to an audience, much like the false clicks in the PPC model mentioned above. To address this problem, Twitter took down tens of millions of suspicious accounts last July, accounting for 6 percent of the platform’s total follower count. There are some red flags to be aware of to avoid interacting with a bot profile on social media: 1) the account has a suspicious bio, 2) the account is posting every few minutes, 3) the account is endorsing polarizing or fake content, and 4) the account has gained a large follower count in a short amount of time.
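The four red flags above can be sketched as a simple checklist in code. This is only an illustration: the Account fields, the list of suspicious bio terms, and every threshold are hypothetical assumptions, not values used by any platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    """Hypothetical snapshot of a public profile; all field names are illustrative."""
    bio: str
    posts_last_hour: int
    shares_flagged_content: bool
    followers_gained_last_week: int
    followers_total: int


# Illustrative values only, not taken from any platform or study.
SUSPICIOUS_BIO_TERMS = ("follow back", "free followers", "dm for promo")
MAX_POSTS_PER_HOUR = 12          # roughly one post every five minutes
MAX_WEEKLY_GROWTH_RATIO = 0.5    # half of all followers gained in one week


def red_flags(account: Account) -> List[str]:
    """Return the red flags, in the order listed above, that this account raises."""
    flags = []
    bio = account.bio.lower()
    if any(term in bio for term in SUSPICIOUS_BIO_TERMS):
        flags.append("suspicious bio")
    if account.posts_last_hour > MAX_POSTS_PER_HOUR:
        flags.append("posting every few minutes")
    if account.shares_flagged_content:
        flags.append("endorsing polarizing or fake content")
    if account.followers_total > 0:
        growth = account.followers_gained_last_week / account.followers_total
        if growth > MAX_WEEKLY_GROWTH_RATIO:
            flags.append("rapid follower growth")
    return flags
```

A single flag proves little on its own, since plenty of real people post often or have odd bios; the more flags an account raises, the more caution it deserves.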

Amplification Bots
Amplification bots are a specific type of fake social media profile. They are strategically used to boost the visibility of certain posts, especially ones containing misinformation, through likes and reposts. In 2016, we discovered the truth about Russian bots controlling social media accounts to amplify polarizing political material. A major consequence of this phenomenon was the dissemination of fake news articles that were perceived as credible because they received high engagement (although, let’s not forget that this is an illusion). In 2018, the Pew Research Center reported that 68 percent of American adults get their news on social media, meaning amplification bots are still a danger to the integrity of the Internet.

Are CAPTCHA, Two-Factor Authentication, and Human Behavior Analysis Solutions of the Past?
Bots maneuver in a similar way to legitimate online users, making them hard to detect. For now, CAPTCHA – an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart – is one of the tools used to filter traffic on web pages. CAPTCHA works by generating a text- or image-based puzzle meant to stump bots while being relatively easy for humans to complete. CAPTCHA systems have their limitations. They often create more friction for the user, and researchers using Google’s own speech-to-text service have tricked reCAPTCHA. The evolution of CAPTCHA has always been a game of cat and mouse – the industry recycles the human-generated responses into AI, which in turn makes AI faster and better than humans at answering – and such a self-defeating loop will likely continue for years to come.

Another tool used to deter bots is two-factor text authentication. This is a popular technique among banking applications and social media platforms to ensure a human is present on the other end of a profile. The idea is that humans will have a verifiable mobile number, though providing your phone number to services can leave room for misuse. For example, people can find your Facebook profile using the same phone number you provide for two-factor authentication. Two-factor text authentication therefore both hurts the user experience and requires sensitive information to be disclosed.

An up-and-coming approach to bot mitigation is tracking trackpad and mouse movements that indicate human behavior. Operating entirely in the background, machine learning algorithms construct profiles based on legitimate site interactions, so that when non-human behavior is detected, the offending visitor is immediately ejected.
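As a rough illustration of the idea, the sketch below swaps the machine learning model for two hand-written features: how straight a pointer path is and how constant its speed is. A path that is perfectly straight and perfectly steady is treated as scripted. The function names and thresholds are hypothetical assumptions, and a real service would learn far richer profiles.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, timestamp in seconds)


def straightness(path: List[Point]) -> float:
    """Straight-line distance divided by distance actually travelled (1.0 = perfectly straight)."""
    travelled = sum(math.dist(a[:2], b[:2]) for a, b in zip(path, path[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(path[0][:2], path[-1][:2]) / travelled


def speed_variation(path: List[Point]) -> float:
    """Coefficient of variation of the segment speeds (0.0 = constant speed)."""
    speeds = [
        math.dist(a[:2], b[:2]) / (b[2] - a[2])
        for a, b in zip(path, path[1:])
        if b[2] > a[2]  # skip samples that share a timestamp
    ]
    if len(speeds) < 2:
        return 0.0
    mean = sum(speeds) / len(speeds)
    if mean == 0:
        return 0.0
    variance = sum((s - mean) ** 2 for s in speeds) / len(speeds)
    return math.sqrt(variance) / mean


def looks_scripted(path: List[Point],
                   min_straightness: float = 0.999,
                   max_speed_variation: float = 0.01) -> bool:
    """Flag paths that are both nearly straight and nearly constant in speed.

    The thresholds are illustrative assumptions, not tuned values.
    """
    if len(path) < 3:
        return False  # too few samples to judge
    return (straightness(path) >= min_straightness
            and speed_variation(path) <= max_speed_variation)
```

Human pointer paths tend to wobble and to speed up and slow down, so they fail both tests; a bot that adds noise to its movements would defeat this simple check, which is why such defenses keep growing more elaborate.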

To stay ahead of the ever-growing capabilities of artificial intelligence (AI), developers will continue investigating new ways to overcome bots, each version more elaborate than the last. However, no matter how subtle and well-coded a bot is, there is one test it will never pass: a physiologic test.

The Final Frontier
Physiologic biometrics are data measurements that come from the live function of our body, such as the voice, heartbeat, or brain patterns. Physiologic signals are never the same twice, and they represent a complex and dynamic way to authenticate. One might argue that a well-understood bio-signal could be simulated to generate a fake human-like voice, heartbeat, or neural pattern. This assumption holds some truth because, at the moment, most AI learning is based on imitating patterns. As an example, because face recognition is trained by pattern recognition, a generic human face can be synthesized that matches a certain percentage of individual profiles with some success. If we break free from pattern imitation, there will be nothing left to imitate. In fact, if we were to train AI not by analogy but by learning from differences, it would make AI-based fakes very hard to program. While the “Imitation Game” dear to Alan Turing is powerful in faking, the game of differences will be powerful in extinguishing fakes. For this reason, if we extract more than just patterns from physiological readings, we can use their complexity to conceive the final frontier between human and bot.