DataSpii: The catastrophic data leak via browser extensions
Sam Jadali
SecurityWithSam.com
Abstract
We present DataSpii (pronounced data-spy), the catastrophic data leak that occurs when any one of eight browser extensions collects browsing activity data — including personally identifiable information (PII) and corporate information (CI) — from unwitting Chrome and Firefox users. Our investigation uncovered an online service selling the collected browsing activity data to its subscription members in near real-time. In this report, we delineate the sensitive data source types relevant to the security of individuals and businesses across the globe. We observed two extensions employing dilatory tactics — an effective maneuver for eluding detection — to collect the data. We identified the collection of sensitive data from the internal network environments of Fortune 500 companies. Several Fortune 500 companies provided an additional measure of confirmation through a process of responsible disclosure. By deploying a honeypot to monitor web traffic, we discovered near-immediate visits to URLs collected by the extensions. To address the evolving threat to data security, we propose preemptive measures such as limiting access to shareable links, and removing PII and CI from metadata.
1. Introduction
Imagine if someone could publicly access, in near real-time — within an hour — your sensitive personal data on the websites you are browsing. Imagine, further, this person could access your sensitive business data in much the same way. Moreover, what if you and/or your colleagues were, yourselves, unwittingly leaking such data? In Table 1 below, we enumerate the types of data that can be accessed.
Table 1. Types of personal and corporate data accessible via the online service.
Personal data | Corporate data |
personal interests tax returns GPS location travel itineraries gender genealogy usernames passwords credit card information genetic profiles | company memos employee tasks API keys proprietary source code LAN environment data firewall access codes proprietary secrets operational material zero-day vulnerabilities |
This scenario may seem far-fetched, but in reality, it has global implications for millions of people across the globe who unwittingly send information about their browsing activity to a publicly accessible online service via at least eight browser extensions. The online service, referred to in this report as Company X, then provides the collected data to its paid or free-trial members who request such data about any domain of their choosing. See Table 2 for the Chrome and Firefox browser extensions in question.*
*The eight extensions state in either their terms of service, privacy policies, or descriptions that they may collect data that is personally identifiable or non-personally identifiable. While such statements appear to provide transparency about their data collection, privacy policies are largely ignored by web users [1].
Table 2. Chrome and Firefox extensions identified in the DataSpii leak.
Extension name | Number of users (as of Feb/Mar 2019) | Browser vendor | Chrome extension ID (if applicable) |
---|---|---|---|
Hover Zoom | 800,000+ users | Chrome | nonjdcjchghhkdoolnlbekcfllmednbl |
SpeakIt! | 1.4+ million users | Chrome | pgeolalilifpodheeocdmbhehgnkkbak |
SuperZoom | 329,000+ users | Chrome and Firefox | gnamdgilanlgeeljfnckhboobddoahbl |
SaveFrom.net Helper† | ≤140,000 users | Firefox | N/A |
FairShare Unlock‡ | 1+ million users | Chrome and Firefox | alecjlhgldihcjjcffgjalappiifdhae |
PanelMeasurement‡ | 500,000+ users | Chrome | kelbkhobcfhdcfhohdkjnaimmicmhcbo |
Branded Surveys‡ | 8 users (June 2019) | Chrome | dpglnfbihebejclmfmdcbgjembbfjneo |
Panel Community Surveys‡ | 1 user (June 2019) | Chrome | lpjhpdcflkecpciaehfbpafflkeomcnb |
†The invasive data collecting behavior occurs when the SaveFrom.net Helper extension is installed from the author’s official website using Firefox on macOS or Ubuntu. We have not observed such invasive behavior when it is installed from a browser vendor store.
‡FairShare Unlock, PanelMeasurement, Branded Surveys, and Panel Community Surveys make explicit efforts to let their users know they collect browser activity data.
These browser extensions and their respective browsers (i.e., Chrome or Firefox) run on desktops and laptops with macOS, Windows, Chrome OS, or Ubuntu operating systems. The extensions also run on Chromium-based browsers (e.g., Opera, Yandex Browser) that accept Google Chrome extensions. Google Chrome Sync’s feature, for instance, can spread the extensions to home and work computers. The extensions collect highly sensitive user information from URLs, page titles, and referrers. The collected data is then filtered by domain name and offered for sale. In one instance, we observed the widespread exposure of corporate project data and employee tasks from thousands of companies that use a popular project management provider, atlassian.net. It is important to note that this report examines data obtained from Company X; it does not address whether other companies or individuals may have also disseminated such data.
1.1 How it works
As stated above, the term Company X shall refer to the online service selling analytics data of any website. To protect the data of the millions of users and businesses impacted by DataSpii, this report has omitted the name of the online service. The term Party Y shall refer to the data collector vis-à-vis the eight browser extensions at the center of this report. For convenience, Party Y also refers to one or more corporate entities that may or may not have legal relationships with each other. Party Y shall also refer to other browser extensions, as yet unidentified, involved in the collection, processing, and/or dissemination of data.
During the course of our investigation, we observed Party Y extensions collecting browser activity data. Within one hour of collection, we observed that same data being disseminated to members of Company X. (Company X does not provide all Party Y collected metadata types [e.g., last-modified] available to its customers.) Once signed up for Company X’s free trial, a member can then view the near real-time traffic data of any organization by merely entering its domain name (see Figure 1). The data is then filtered by domain name and delivered to the Google Analytics accounts of Company X members. The collected metadata delivered to Company X members includes hostname, URL, page title, referrer, browser, browser version, city, state, country, internet service provider (ISP), network domain, operating system (OS), OS version, date, and time. The Google Analytics service collates the metadata, thus providing Company X members with insight into the sources of a domain’s web traffic statistics.
1.2 Background
A browser is software (e.g., Google Chrome, Mozilla Firefox, Safari, Microsoft Edge, Internet Explorer) that enables users to search, access, and display web pages. Browsers provide an interface to perform various web activities such as emailing, cloud-sharing, online shopping, watching movies, listening to music, and sharing photos. Browser extensions are modules that enhance and customize the browsing experience. For example, ad blockers are extensions that increase page speeds and reduce browser crashes, whereas password managers are extensions that generate and store random passwords, thereby increasing overall security.
According to the Google Chrome Developer site, “Extensions are event based programs used to modify or enhance the Chrome browsing experience. Events are browser triggers, such as navigating to a new page, removing a bookmark, or closing a tab. Extensions monitor these events in their background script, then react with specified instructions” [2]. Such triggers allow an extension to send user browsing data to third-party servers, which in turn can be used to compile user profiles for (the purpose of) targeting users with personalized advertisements.
In Data and Goliath, Bruce Schneier notes that some corporations use surveillance data to target consumers with discounts, trial offers, and other promotional features [3]. Corporations and governments that exchange surveillance data for legitimate purposes may also be sharing it — albeit inadvertently — with cybercriminals, often in huge data breaches. While most users understand that browsing activity is surveilled and tailored for advertisement purposes, few appreciate the full implications of being monitored.
Likewise, browser extensions surveil user data in return for convenience. Antimalware extensions warn users of danger by collecting the URLs that users visit and cross-referencing those URLs with a live database of known hacked sites. Grammar check extensions collect user-entered text, including social media posts, documents, and emails. In exchange for such conveniences, users are bombarded with personalized insights, performance stats, or advertisements. When such data is collected for tailored advertisements, market research, or enhanced browsing experiences, it may inadvertently end up in the wrong hands. Thus, the very same data that enhances personal experiences jeopardizes personal, corporate, and government information.
1.3 Company X and Party Y
Company X’s frequently asked questions (FAQ) page maintains that millions of people around the world opted in to anonymously share their browsing history. It further maintains that what it is doing is legal, ethical, and patented. While Company X does not disclose its method of data collection, it does seek to assure its users that the data collected remains anonymized. We suggest that this is not always the case. Moreover, if an employee is using a data-collecting browser extension on a company computer, the employee may not have the authority to consent on behalf of his or her employer.
While our investigation did not establish a legal relationship between Party Y and Company X, it did establish that much of the data collected by Party Y was made available to members of Company X.
The privacy policies of the Party Y extensions inform the end user that data may be collected. While privacy policies may allay some concerns about data collection, they are largely ignored by the typical web user [1]. Furthermore, in this report, we elucidate a delayed data collection measure employed by two Party Y extensions.
While Party Y extension policies state they may employ “PII scrubbers” to assist in providing anonymity, this report details their inability to scrub all types of PII. As a matter of legality, the European Union’s recently enacted data privacy law (EU-GDPR) states that the PII of all EU data subjects must be redacted or pseudonymized [4]. The failure to redact sensitive information, including PII and CI, endangers everyone who uses Party Y extensions, either directly or by association.
1.4 Invasiveness
After our reporting of a Party Y extension to a browser vendor, the vendor remotely disabled the extension for all users; however, that did not stop the data collection in our browser. While the extension ceased its functionality, we continued to observe our browsing activity being sent via POST request to Party Y servers. Ultimately, the data collection stopped when we removed the extension.
1.5 LAN impact
DataSpii circumvented the most effective security measures including authentication or encryption, which were designed to thwart leaks. Using a multi-pronged approach, we discovered the collection of sensitive data from the private LAN networks of many Fortune 500 companies. In addition, we devised a LAN experiment and observed the collection of hyperlinks stored within the page content of the website hosted on our LAN. Such data collected from a single visit to a page in a LAN environment can be used to map a corporation’s LAN environment. Furthermore, we observed the dissemination of our LAN data to three different hostnames. The collected data included our site’s LAN IP address, hostname, page title, timestamp of the visit, as well as the URLs of page resources (i.e., CSS files, JS files, and images) referenced in our HTML code. We then observed much of our LAN data being disseminated to members of Company X. Finally, through the responsible disclosure process, we corroborated our discovery with impacted individuals and major corporations.
1.6 History
Cybersecurity companies have long warned about the dangers of publicly accessible links [5]. In addition, the HTTP referer header has been notoriously known to cause leakage of publicly accessible URLs. Researchers have also warned about the data collection practices of browser extensions. Unfortunately, many corporations often fail to act on this direct knowledge. The problem is compounded when an online service such as Company X publishes the near real-time traffic data for its members. The dissemination of such data by Company X undermines the data protection features of data source types such as ephemeral links, URL-based query string authentication parameters, and API keys.
2. Affected data source types and impacted companies
Extensions have access to powerful browser application interface (API) functions that tap into user browsing activity. While some extensions may use this data to perform their inherent, legitimate functions, others use the data strictly for monetization or advertising. In order to be approved by a browser extension store, an extension must be reviewed by browser vendors; however, perfunctory reviews can result in the approval of bad actors. We identify seven data source types and provide examples of impacted companies by each.
Note: We have made disclosures to all impacted companies listed in this report. We have not identified any data being used maliciously. However, through our honeypot experiment, we observed third parties visiting unique URLs captured by Party Y browser extensions.
2.1 Data source type: Shared links
Web users frequently share publicly accessible unique links with family, friends, and colleagues. The links can direct recipients to Apple iCloud photos, Quickbooks invoices, 23andMe ancestry data, Nest security camera video clips, or confidential documents stored on OneDrive. The following examples detail how the leaking of shared links can impact major online services.
2.1.1 Zoom Video Communications
The DataSpii leak enables malicious parties to monitor corporate or client meetings. Enterprise businesses use services like Zoom for online meetings, video conferencing, and webinars. Typically, the meeting organizer will invite colleagues or clients by sharing a Zoom meeting link via email. The invite link URL often contains a unique meeting ID. When a participant of the meeting opens the link using a browser with an invasive extension, the link URL may then be displayed to a Company X customer in near real-time (see Figure 2).
Features of Zoom’s “Business” plan enable customers to choose a vanity hostname such as example.zoom.us. The vanity hostname acts as an easy way to brand the Zoom account and is included as a subdomain of the link URLs used for the organization’s online meetings. Search refinement tools provided by Google Analytics make it easy for a Company X member to filter the subdomains of Zoom customers, thereby allowing that member to filter meetings specific to an organization. Meetings that are not password protected become vulnerable to eavesdropping by outsiders with access to the URL. Although Zoom notifies meeting participants when a new user joins a meeting, it is possible for these notifications to be disregarded, especially during meetings with large numbers of participants. Zoom also offers a popular “waiting room” feature, enabling the host to manually admit each participant to ensure only their intended participants join their meeting.
2.1.2 Skype
As with Zoom, Skype users ask their friends or colleagues to join a group chat using a publicly accessible shared link. The recipient of the link can join the chat directly from a web browser. When a user opens the link with a Party Y browser extension, the Skype group URL may become available to Company X customers who request the data of Skype.com (see Figure 3). A malicious Company X member may use the link to eavesdrop on chat conversations. Note that when a new user joins the group chat, Skype will notify connected users that a new guest has joined the conversation.
2.1.3 23andMe
23andMe customers can share reports with family, friends, and/or their entire social network using a publicly accessible link. When the link is opened by a user with a Party Y browser extension, the URL of the report becomes available to any Company X customer that requests data on 23andme.com. Figure 4 displays URLs matching a known format for publicly accessible 23andMe reports. Such reports may include information on ancestry, family DNA, muscle composition, and other biomedical data.
2.1.4 Apple example I
When an iOS user stores photos on iCloud and shares four or more photos via the Photos app, the app will generate a publicly accessible unique link to the photos. When a recipient or sender opens the link in a browser with a Party Y extension, the link to the photos is collected. Such a link may end up in the hands of a Company X member requesting the data of icloud.com (see Figure 5). When the URL is visited, the photos become visible and may even display an iOS user’s first and last name.
2.1.5 Apple example II
Shoppers who place orders via apple.com may receive an email from Apple containing a link to view their order status. When such links are viewed using a browser with a Party Y extension, they may be delivered to customers of Company X that request data of Apple.com. Figure 6 contains Apple order status links obtained from the Company X service.
A bad actor may use the URL data to view an Apple.com product order status page in a browser. While Apple may well omit PII from visible areas of the order status page, during our observations, it did not conceal PII within the source code of our own Apple order. This source code can easily be viewed in web browsers by right-clicking anywhere on the page and selecting the “view page source” option. Figure 7 displays our PII data, including first name, last name, the last four digits of the purchaser’s credit card number, credit card type, and the Apple store location used for pickup.
2.1.6 Intuit Quickbooks
Vendors often email Quickbooks invoice links to customers for payment. Such invoices may be accessible via a publicly accessible unique link (see Figure 8). When the links are viewed using a browser with a Party Y extension, they may be delivered to customers of Company X that request data of intuit.com. Viewing the invoice may reveal the recipient’s full name, address, city, state, zip code, amount paid, invoice number, date, and details of the service, as well as the name and address of the vendor who sent the invoice.
2.1.7 Nest Labs
Nest provides businesses and homeowners with 24/7 video surveillance of their property. Nest members can share surveillance videos with others using a publicly accessible unique link. If the sender or recipient of a shared video clip accesses the URL via a browser with a Party Y extension, the link to the clip may then become available through the Company X service (see Figure 9).
2.1.8 JSFiddle
Web developers use sites such as JSFiddle.net to share and collaborate code with their colleagues. Developers can access their code via a publicly accessible unique URL. If the senders or recipients visit a JSFiddle.net URL using a browser with a Party Y extension, the code may appear to members of Company X requesting data on jsfiddle.net. In addition, Google Analytics tools allow Company X members to filter URLs by a visitor’s service provider. Google Analytics defines the service provider dimension as the internet service provider (ISP) registered to the IP address of the visitor. Many of the largest corporations operate as their own ISP. In Figure 10, we filter the service provider dimension for some of the largest organizations, and observe visits to JSFiddle.net URLs from those networks. The data obtained from Company X could then be used by bad actors to view or scrape the contents of JSFiddle URLs visited by targeted corporations.
2.2 Data source type: PII embedded in URL
Insecure practices such as embedding PII in the URL still occurs within many organizations. When browser users with a Party Y extension visit pages with PII in the URL, the data can be viewed by Company X members. A cybercriminal can use such data to find a person’s GPS coordinates in near real-time, commit burglary during a homeowner’s absence, or to stalk travelers along their destinations.
2.2.1 Southwest Airlines, American Airlines, and United Airlines
Southwest.com, United.com, and AA.com (American) all include passenger last names and record locator (confirmation) numbers within the URL. In addition, Southwest.com URLs may also include a passenger’s first name. On February 6, 2019, Wandera’s threat research team reported that many airlines, including Southwest Airlines, expose PII in their URLs. During our investigation in early June of 2019, Company X data showed thousands of Southwest URLs containing passenger first names, last names, and record locator numbers within the URL (see Figure 11).
Note: After disclosure, Southwest confirmed they are aware of this issue and have enhancements underway.
2.2.2 Uber
Uber customers can book rides in a web browser via the m.uber.com website. In doing so, customers with a Party Y browser extension may expose their pickup and drop-off locations to cybercriminals, who could then use such information to stalk riders using their GPS coordinates in near real-time (see Figure 12).
2.2.3 Apple example III
In Figure 13, we observe URLs containing email addresses from the iforgot.apple.com hostname. Such email addresses can be cross-referenced with hacked databases located on the dark web to exploit vulnerable Apple users.
2.3 Data source type: Page titles
To protect corporate assets, security-conscious businesses enforce user authentication to access company documents and portals; however, such businesses may ignore the content of data stored within the page title, which may contain descriptive confidential information. For example, “Firewall needs to be re-enabled. All ports are currently open” or “reset VPN password to [REDACTED-PASSWORD].” The near real-time exposure of such information creates opportunities for cybercriminals to attack individuals or organizations.
2.3.1 Atlassian
Atlassian is an enterprise software company that provides project management and document collaboration software to its clients. Atlassian’s project management software, JIRA, inserts project issues and employee-assigned tasks into the page title. Atlassian customers have the option to self-host Atlassian software or use the Atlassian Cloud platform. Companies that choose the Atlassian Cloud option can host their project management portal on a subdomain of atlassian.net (i.e., example.atlassian.net) [6]. When a Company X member requests data for atlassian.net, the data includes all subdomains under the domain. Within two hours of requesting Atlassian.net data via the Company X service, we observed the collection of 30,000 unique URLs on more than 1,400 atlassian.net subdomains. This included page title data from the following atlassian.net subdomains: BuzzFeed, NBCdigital, AlienVault, CardinalHealth, TMobile, Reddit, and UnderArmour. Users of these subdomains may have a Party Y extension leaking corporate page title and URL data, as shown in the following examples.
Example: nbcdigital.atlassian.net page titles include:
“Technical failure for <5 minute period of time affecting public save submissions or technical issues before Save period starts that cannot be resolved before the Save period opens – [REDACTED] – Confluence”
“Wrong contestant name(s) and/or image(s) displaying online, in the viz, or in the official app”
Example page title from an unnamed automotive company:
“Whitelist IP [RED.ACT.ED.IP] and [RED.ACT.ED.IP] for all AWS servers.”
From such information, a cybercriminal would know that the two IP addresses or server(s) have unfettered access to the entire company’s AWS servers. The cybercriminal can use such knowledge to focus all of his or her efforts on breaking into the server(s) associated with those IPs.
2.3.2 Microsoft
OneDrive is a file- and photo-sharing hosting service operated by Microsoft. Users of OneDrive often share their documents with friends or colleagues via a publicly accessible unique link. OneDrive inserts the titles of documents or filenames into the browser page title. The Party Y extensions capture the URL and page title data and deliver it to a Company X customer requesting data on 1drv.ms. Using Google Analytics advanced filter tools, metadata such as page titles are searchable. Figure 14 illustrates a page title search query for the word “tax.” The redacted results include the URLs to page titles such as “2018 Tax Return [REDACTED],” where the redacted text is often an individual’s name. A cybercriminal can access these documents and use the data in them to steal identities and/or view financial income, bank account numbers, social security numbers, and other confidential information.
2.3.3 Kareo
Kareo provides healthcare providers with electronic health record (EHR) management via Kareo’s practice management software. Healthcare providers that use Kareo can access patient information via a web browser. Customers of Company X requesting data on Kareo.com may use Google Analytics to find patient names. Figure 15 lists redacted patient names obtained from the Company X service. Furthermore, the figure illustrates many patient names being exposed from a single city. Over a four-day period, we observed 193 unique page titles from Hunters Creek, Florida. We observed the service provider (ISP/ASN/OrgName) value for all entries originating from Hunters Creek as being listed from a single service provider: [REDACTED-CLINIC-NAME] – [REDACTED]. Moreover, we observed all page title entries associated with the visits originating from Windows 7 running Chrome. The metadata led us to suspect the patient-name leak originated from one or more staff member(s) at the unnamed clinic.
Note: After our disclosure to Kareo, Kareo deployed a product update to remove patient names from page titles in the Kareo platform.
2.3.4 DrChrono
DrChrono provides an online platform for doctors and patients that makes EHRs available digitally [7]. Healthcare staff members can access such software from their offices.
Customers of Company X requesting data on DrChrono.com may use Google Analytics to find patient and medication names. Figure 16 displays page title data such as “RefillRequest for [REDACTED-NAME].” In the right-hand column, many of the redacted subdomains of the hostname contain the names of various health clinics and doctors’ offices. Using collected metadata such as timestamp, browser, OS, ISP, and hostname, the metadata may be used to follow the visitor’s browser activity path and infer from this which patient is associated with which medication.
2.3.5 Blue Origin
Blue Origin is a private aerospace company founded by Amazon’s CEO, Jeff Bezos. We observed the following page title data within a Google Analytics account populated by Company X and filtered for the domain name, blueorigin.com. We observed a blueorigin.com subdomain with URL and page title data that follow patterns similar to those of Atlassian’s JIRA software (e.g., /browse/[PROJECT-KEY]-[PROJECT-ID]). Due to the sensitive nature, we have decided against publishing a screenshot. Moreover, we have removed the project-name prefixes (commonly seen with JIRA issues) and suffixes which accompany the page titles.
Blue Origin page title examples:
“manifold [REDACTED] contaminated with [REDACTED] material”
“[REDACTED-COMPETITOR-NAME]”
“Rust found in [REDACTED] manifold”
“speed sensor broke during [REDACTED]”
“[REDACTED] Failed Calibration”
“Verify [REDACTED] can survive degrease and clean operations”
2.3.6 FireEye
FireEye is a publicly-traded cybersecurity company with offices around the world. We observed the following page title data within a Google Analytics account populated by Company X and filtered for the domain name, fireeye.com. The page titles contained the phrase “JIRA” or originated from fireye.com hostnames containing that phrase. We observed the service provider (ISP/ASN/OrgName) of the visitor(s) as “fireeye inc.” The city and region metadata components match a city of one of FireEye’s office locations [8]. This leads us to suspect that a user (or users) using the fireeye inc network is impacted by the DataSpii leak.
We have removed any category-name prefixes (commonly seen with JIRA issues) and suffixes which may accompany page titles. Due to the sensitive nature, we decided against publishing a screenshot.
The following are FireEye.com page titles collected from visitor(s) with an ISP listed as: fireeye inc.:
“URL with Malicious download missed detection [REDACTED-URL]”
“Large backup fails when [REDACTED]”
“As [REDACTED] engineer, I would like to run the [REDACTED]”
“Expose what Engines are running on the [REDACTED]”
2.4 Data source type: File attachments
Company X members can search, view, scrape, and save publicly accessible attachments leaked by users with Party Y extensions. This includes document, image, and video attachments from popular platforms such as Facebook Messenger and Zendesk. This vulnerability compromises the possible confidential and/or sensitive nature of the files being transmitted.
2.4.1 Facebook Messenger
The URLs to Facebook Messenger attachments may be captured by Party Y extensions (see Figure 17). In the course of our investigation, we sent a sample JPEG to a friend over Facebook Messenger. We observed the JPEG attachment being served over a publicly accessible fbcdn.net URL. While Facebook appears to append query strings that specify a time-limited window for access, Company X is able to provide these URLs in near real-time, thereby allowing access within Facebook’s time-limited window.
2.4.2 Zendesk
Many of the world’s largest corporations and organizations use Zendesk for their support ticketing system. The Zendesk website states that their customers include Uber, Airbnb, Mailchimp, GoFundMe, the American Civil Liberties Union (ACLU), and Dollar Shave Club [9]. Users who submit support tickets sometimes attach files containing confidential or personal information. Using an attachment we submitted to a Zendesk portal, we observed the storage of our attachment on the zdusercontent.com domain name. We observed that the URL to our attachment is publicly accessible. During the investigation of our own Zendesk ticket attachment, we observed that even when the token query parameter was excluded, the attachment still loaded in a remote browser. We discuss redacted query strings in Section 4.1. Figure 18 shows other attachment URLs captured on the zdusercontent.com domain name. Using Google Analytics filter tools, such ticket attachments can be filtered by referrer and narrowed to the companies from which they originate.
Note: Zendesk support instances have a setting which customers can enable to “Require authentication to download [attachments].” As such, the public accessibility of the zdusercontent.com URLs is dependent upon the setting the customers have in place. By default, it is not enabled.
2.5 Data source type: Ephemeral links and authorization strings
In order to defend against malicious activity, businesses may protect publicly accessible URLs by using query string parameters that (a) set a time-limited window for access, (b) add an authorization string, or (c) use (a) and (b) together. However, the ability of Company X to deliver complete URLs to its customers in near real-time renders these security methods less effective, if not altogether ineffective.
2.5.1 Amazon Web Services (AWS) Simple Storage Service (S3) buckets
Per AWS S3 documentation: “By default, all S3 buckets are private and can be accessed only by users that are explicitly granted.” Administrators, however, may authenticate S3 access requests by using AWS S3 Query String Authentication. This form of authentication includes the following URL query string parameters, that can be captured by Party Y extensions:
- The AWSAccessKeyId specifies, “the AWS secret access key used to sign the request and, indirectly, the identity of the developer making the request.”
- The Signature specifies, “[t]he URL encoding of the Base64 encoding of the HMAC-SHA1 of StringToSign.” The encoding is a unique representation that includes an AWS secret access key ID that Amazon assigns to users when they sign up for Amazon Web Services. In other words, it ensures that the request a user is about to make is not modified in transit.
- The Expires parameter specifies when the signature expires and is specified as a UNIX epoch timestamp. Any request received after the specified time will be rejected.
Shopify AWS S3 buckets
Customers of Company X requesting data on amazonaws.com can view URLs containing AWSAccessKeyId, Expires, and Signature query string parameters (explained in the previous subsection). The shopify.s3.amazonaws.com URL is one of many such examples (see Figure 19). A malicious Company X customer can view the captured data in near real-time and use it to access Shopify shipping labels and details about customer orders.
2.5.2 Facebook Photos
Facebook users share private photos on the platform while making use of privacy settings to restrict access. Such Facebook photo privacy settings include Public, Friends, Custom, and Only Me. Once posted, the URLs to such photos may reside on the Facebook domain, fbcdn.net.
To illustrate this, we uploaded a Facebook photo and chose the Only Me photo privacy setting: URL to photo on fbcdn.net. We viewed this photo with a Party Y extension and observed the URL collection by the extension. Then, within one hour, we witnessed the delivery of the URL by Company X to our Google Analytics account (see Figure 20).
As discussed in our results, our honeypot experiment determined that our URLs collected by Party Y extensions were visited by a third party almost immediately after collection. If a third party were to scrape and collect URLs to domains like fbcdn.net, it may enable them to use facial-recognition technology to process and identify individuals visible in the image. While we could only verify visits to our own honeypot domain, third-party viewing, processing, or scraping of such content has disastrous implications for privacy.
2.6 Data source type: API keys, secrets, and tokens
Party Y extensions can capture API authentication credentials, which poses especially vexing security issues. An API can offer software developers an easy way to integrate their software or app with another online platform. In order to serve their function, APIs often provide high-level access to an account. Malicious actors may use API keys and API secrets found on the Company X service to gain access to accounts on cryptocurrency exchanges or financial services, or to access sensitive information from any impacted corporation.
An unnamed, cloud-based website security service
An unnamed platform provides firewall protection services to businesses across the globe. The service’s web portal allows their customers to click on a link to perform firewall actions. Accessing such a link by a user with a Party Y browser extension can deliver the sensitive API key and API secret to a Company X customer requesting data on the unnamed service. A malicious Company X customer can then view customer API keys and secrets, bypass the website’s firewall, and pull the business’s logging data (see Figure 21). Unless use of the API is restricted by IP address, a bad actor can use the API data to wreak havoc on a corporation.
2.7 Data source type: Private LAN environments
For added security, corporations may host their most confidential data on a private intranet that is accessible only from within the organization. For convenience, corporations frequently host their intranet portals on a subdomain of their domain name (e.g., jira.example.com). When an employee with internet and intranet access has a Party Y extension and browses intranet portals on a company subdomain, the extension captures the private intranet browsing activity data before sending it to Party Y servers, which in turn deliver the data to any Company X customer requesting it on the corporation’s domain name.
Many of the page titles on the intranet subdomains include what appear to be employee tasks and issues. Data provided by Company X can provide competitors, state-sponsored actors, and cybercriminals with proprietary secrets, information about existing cybersecurity vulnerabilities, and other sensitive corporate data such as internal health memos.
We employed three prongs to identify corporations impacted by this data type. First, we used readily available DNS tools to ascertain whether the hostname for a data point was publicly resolvable. Second, we ascertained the visitor’s internet service provider (ISP). Many of the impacted Fortune 500 companies act as their own ISP with their own autonomous system number (ASN). Third, we relied on city and region metadata to ascertain if the visitor was near an impacted company’s place of operation.
We ascertained whether the three conditions listed below occurred. If at least two of the conditions occurred (condition 1 being required), we then presumed that the page visit was from a LAN environment, suggesting that a network user or staff member of the corporation may have a Party Y browser extension:
- the hostname is not publicly resolvable or the hostname points to an IP in the known private LAN IP range (i.e., 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16)
- the Service Provider (ISP/ASN/OrgName) value matches the name of the company;
- the City and Region metadata show the visitor is near or in the city of an office location of the impacted company.
We denote each page title with one of the following notations:
‖ All three conditions occurred.
¦ Conditions 1 and 3 are confirmed. We use this notation to acknowledge that not all companies operate their own ASN. In addition, some users may use a VPN to access a corporate LAN while maintaining a wide-area-network (WAN) IP via their local ISP, such as Comcast.
§ Conditions 1 and 2 are confirmed, and the City and Region metadata were missing.
Due to the sensitive page title data, we do not provide screenshots but instead provide the redacted examples in text. The prefixes and suffixes that identify the category or task, like those used by JIRA, have been removed as well. We raise this note to make readers aware that most page titles belong to a subcategory of the removed category prefix under which those page titles were raised.
The following page titles were obtained from Company X, and we identify the pages as originating from a private LAN environment:
2.7.1 Health technology companies, including AthenaHealth and Epic Systems, Inc.
‖ “Vaccines not being marked as completed once administered”
‖ “Case Delays due to Emergent Case after Patient is In Room”
‖ “PRN Medications administered multiple times”
‖ ”Giving OR Nurses Access to [REDACTED]”
‖ “Timed PT Services are Displayed and Calculated Incorrectly on The Billing Tab”
‖ ”Newborns not showing up in [REDACTED]”
2.7.2 Pharmaceutical companies, including Amgen, Merck, Pfizer, and Roche
‖ “[REDACTED] Samples were approved without approving the test results”
§ “URGENT Missing Samples [REDACTED]”
§ “Revise Sample Limit of [REDACTED] 5mg to [REDACTED] Packs per person/year”
‖ “[REDACTED] Aliquotting Issue prevents [REDACTED] from fully documenting [REDACTED]”
‖ “Personnel plates were not marked as sampled”
‖ “User Approved own entered results”
‖ “We would like to use Jira as a ticketing system for handling request based on privacy regulations globally”
§ “MR Review Single [REDACTED-DRUG-NAME] [REDACTED]”
‖ “Samples are showing Out of Spec even when the results are In spec”
‖ “[REDACTED] admin disable api key”
‖ “Need the Parent [REDACTED] ID visible when attempting to match customers, see attachment for more detail”
‖ “Update library dependencies per [REDACTED] vulnerabilities”
2.7.3 Kaiser Permanente
‖ “Unpatched Library/Software: [REDACTED] Risk-High – [REDACTED] Jira”
‖ “[REDACTED] Insufficient [REDACTED-SECURITY-ISSUE]
2.7.4 Unnamed airplane manufacturer
‖ “MFA access does not work through [REDACTED].”
Note: MFA is an acronym often used for multifactor authentication. Bad actors could act on such near real-time information to gain access to systems.
2.7.5 SpaceX
‖ “[REDACTED] Propulsion Failures”
‖ “[REDACTED] Vent Pressure [REDACTED] Displacement”
‖ “Dragon 2 Propulsion Initialization [REDACTED]”
2.7.6 Tesla
‖ “Investigate [REDACTED] Raven front drivetrain [REDACTED] at [REDACTED] MPH and also [REDACTED]”
‖ “Model [REDACTED] Significant white oxidation along the [REDACTED]”
‖ “[REDACTED VIN#] Driving at speeds of approx 5 to 25 mph there is a loud whining sound from the rear drive unit. Noticed that when checking oil level that [REDACTED]”
‖ “Periodic clunking noises while supercharging Model [REDACTED]”
‖ “Bring back [REDACTED] trims for Model [REDACTED] features pages in the Performance section”
2.7.7 Apple
‖ “Issue where [REDACTED] and [REDACTED] field are getting updated in response of story and collection update APIs by [REDACTED]”
‖ “New hit test API & [REDACTED] concept”
‖ “HomePod Settings (New Proposal) [REDACTED]”
‖ “scripts for testing VIO accuracy amoung [sic] various devices”
2.7.8 Cybersecurity firms, including Symantec, Trend Micro, and Palo Alto Networks
‖ “[REDACTED] Production Security vulnerabilities detected in [REDACTED]”
‖ “[REDACTED] does not have [REDACTED] agents deployed to enable security monitoring”
¦ “[REDACTED-TIER] – [REDACTED-MAJOR-BANK] – IDS Sensor [REDACTED] not displaying [REDACTED]”
¦ “[REDACTED-TIER] – [REDACTED-MAJOR-RETAIL-CORP] – Evaluate false positive for [REDACTED]”
¦ “[REDACTED-TIER] – [REDACTED-MAJOR-TELECOM-CORP] – SMS only providing limited Log History”
‖ “[REDACTED] Servers reaching maximum capacity and processing capabilities”
¦ “Reputation filter ignoring IP exceptions”
¦ “[REDACTED] fails to activate package when large number of malware references are present on a filter”
‖ “IPS Performance degrades after enabling OOB”
Note: IPS is a common acronym for Intrusion Protection System; OOB is a common acronym for Out-of-Band.
2.8 Example attack scenario
One Fortune 500 company’s private LAN data included the following page titles:
“Add client [REDACTED-LAN-IP-ADDRESS] to NFS export list on [REDACTED-SERVER-NAME]”
“Supermicro PSUs are bad”
“Please install latest CentOS on [REDACTED]”
“Snapshot strategy: Moving from 30 dailies, to 5 dailies, and 2 weeklies”
“Need [REDACTED-FOLDER] exported for NFS for [REDACTED-LARGE-LAN-IP-BLOCK]”
A bad actor’s access to such data will inform him or her of the following:
(1) the Fortune 500 company’s disaster recovery strategy;
(2) the brand of hardware in use;
(3) the IP address to a server that has access to an NFS storage unit;
(4) the company’s preferred Linux distribution; and
(5) a large LAN IP block with full access to a folder.
Such data can be used by bad actors to pinpoint attack vectors and plan a large-scale breach.
3. Methodology
To identify the extensions involved in data collection, we began our analysis by reviewing the first reported extension: Hover Zoom. In order to view the data captured by the extension, we analyzed the extension in an isolated environment. We used this controlled environment to verify whether the data was being collected by the extension, to decode and decompress the data, and to verify whether it was accessible via the Company X service.
3.1 Extension analysis environment
In order to set up an environment with minimal interference, we employed the following:
- Installation of a clean operating system (OS). Personal computers can be contaminated and exposed to malware in a variety of ways. OS software obtained from third-party providers may contain malware [10]. To decrease such risks, we provisioned a Windows Server 2016 virtual machine by using an OS ISO downloaded directly from microsoft.com.
- Clean IP. To rule out interference from ad networks or other third parties having obtained data from an IP address, we allocated to the server an IP address that had not been used for at least five years before this experiment.
- Browser. For each browser tested, we installed the browser by downloading the installation file from the browser vendor’s website.
- Burp Suite. Burp Suite, a web application security tool downloaded from portswigger.net, was used for recon and analysis. The app acts as an intermediary between the browser and destination web server, allowing inspection and analysis of web traffic. Burp Suite records all requests from the moment of extension installation and continues to do so as long as the application remains open.
- FoxyProxy Standard Chrome Extension. This extension proxies Chrome requests to Burp Suite. After vetting FoxyProxy, we used this extension in our Windows OS testing environment. We chose FoxyProxy instead of the native Windows proxy service because the native service tended to add unnecessary traffic bulk to our Burp Suite data. We also verified our results using other operating systems such as Chrome OS running on a Chromebook, and in some instances without using FoxyProxy.
- Suspected extension. We installed each browser extension in question using the resource from which it is most likely to be downloaded by everyday users (e.g., the Chrome Web Store). However, the invasive behavior of the SaveFrom.net Helper Firefox extension only occurred when downloaded from the extension author’s website, en.savefrom.net. In addition, we downloaded Firefox’s FairShare Unlock extension from https://unlock.fairsharelabs.com/install/.
We used Chrome, Chromium, and Firefox to verify that the leak can occur on macOS, Ubuntu, Windows, and Chrome OS (the OS used on Chromebooks).
3.2 Tools for analysis
For analysis, we used Burp Suite as well as Chrome’s Development Tools; specifically, we used “Inspect views background page” for each extension. We inspected the background page to monitor live events and network traffic requests initiated by the extension. We reviewed historical and live events via Burp Suite. We noted the timestamp, hostname, URL query strings, HTTP status, response length, the raw requests and responses for each request, and the sequence of the requests. We then searched for temporal patterns and anomalous requests such as an unusually long response length that occurred while the browser was idle. Such high response lengths may indicate either the download of an extension’s data collection instruction set or the more commonly observed automated extension update.
In some cases, the raw responses of requests contained minified source code, making it difficult to read. We used readily available unminification tools (e.g., Atom Beautify, unminify.com, beautifier.io) to translate the code into a more readable format. For additional extension source code analysis, we used Rob Wu’s CRX Viewer. We informed ourselves with actionable intelligence provided by Duo Security’s CRXcavator. We used PacketTotal’s PCAP historical analysis database to perform research on the hostnames that we uncovered via Burp Suite. For the historical WHOIS and hosting information of those hostnames, we used DomainTools.com. For greater hosting and DNS insight, we further assessed the hostnames and IPs using Farsight Security’s historical DNS database. We also performed a quantitative text and source code analysis of each extension author’s website, including the extension’s terms of service and privacy policy pages. In order to find similar sites and extensions, we used search engine tools such as PublicWWW.com to perform reverse source code lookups.
4. Results
4.1 Observations
The sync expansion
In order to provide a consistent experience across all devices, Chrome’s Sync function automatically syncs bookmarks, history, settings, and browser extensions across all computers logged into Chrome using a Gmail account. Consequently, an invasive extension combined with the browser’s syncing behavior enables data collection from all logged-in locations (e.g., home and work computers).
HTTPS and HTTP environments
The browser’s powerful API functions allow extensions to tap into browser activity. When an invasive extension is installed within the browser, it can access page content — regardless of whether the site being visited is listed as secure (i.e., it uses HTTPS) or insecure (i.e., it does not use HTTPS). The extension may then collect data and send it off to a server for processing.
Redaction methods
Company X-provided page URL data of AA.com (American Airlines) displayed redacted last names with “*****”. During our analysis of Hover Zoom’s POST requests in Burp Suite, we observed redaction of common query string parameters such as “lastName.” However, passenger names from Company X-provided URL data of Southwest.com (Southwest Airlines) were unredacted. Our observations suggest that this is due to the query string parameter (passengerLastName) employed by Southwest’s web developers. Hover Zoom exhibited redaction capabilities for obvious query string parameters; however, we did not observe such behavior for variations of such parameters.
During our analysis of SuperZoom’s POST requests, we did not observe any redaction capabilities. As a result, query string parameters such as lastname, firstname, username, password, ssn, and apikey, were not redacted, and were visible in data delivered to customers of Company X (see Figures 22 and 23).
We did, however, note that some redaction was performed post-collection. Using a browser with a Party Y extension, we submitted a URL with the parameter “[email protected].” In each instance, regardless of whether the extension performed redaction of the email address, the address was delivered to us by Company X with the redaction
4.1.1 POST request patterns
We observed all eight extensions perform POST requests to hostnames that resolve to two groups of IPs. The first group, identified as Group A, consists of five hostnames that were observed from four extensions. The five hostnames from Group A resolve to the same two IP addresses and use a subdomain of the following names: “cr-b” or “ff-b.” The second group, identified as Group B, consists of eight hostnames that all resolve to the same five IP addresses. IP Group B hostnames use subdomains of the following patterns: “cr-input” or “ff-input.” We observe that all invasive Chrome extensions send their requests to hostnames that begin with “cr.” All invasive Firefox extensions send their requests to hostnames that begin with “ff.”
4.1.2 Hostnames
Using the available toolsets, we uncovered Party Y hostnames being used in data collection for Company X (see Table 3).
Table 3. GET and POST request Hostnames used by Party Y extensions.
Group A | Group B |
cr-b.hvrzm.com cr-b.getspeakit.com cr-b.ebehaviors.com cr-b.panelmeasurement.com ff-b.ebehaviors.com | cr-input.hvrzm.com cr-input.getspeakit.com cr-input.superzoom.net cr-input.ebehaviors.com cr-input.panelmeasurement.com ff-input.ebehaviors.com ff-input.mxpnl.net ff-input.superzoom.net |
Each Group A hostname has two DNS A records that resolve to 34.195.36.59 and 52.200.192.173.
Each Group B hostname has five DNS A records that resolve to 52.44.145.169, 34.231.181.237, 34.237.105.7, 52.204.245.185, 54.160.162.145, and 52.72.99.240.
We observed eight Party Y extensions making GET or POST requests to one or more of the Table 3 hostnames.
We uncovered other hostnames, as well; however, we were unable to link them to any extension (see Table 4). These hostnames use DNS A records that point to the same DNS A records as those discussed in Table 3’s respective Group A and B hostnames. It is possible these extensions were already flagged or removed from browser vendor stores.
Table 4. Hostnames not yet identified with any particular extension.
Unidentified Group A hostnames | Unidentified Group B hostnames |
cr-b.prestadb.net networkanalytics.net b.networkanalytics.net | cr-input.prestadb.net ff-input.prestadb.net input.networkanalytics.net cr-input.nodehop.com ff-input.nodehop.com cr-input.mxpnl.net cr-input.freevideodownloader.net ff-input.freevideodownloader.net |
4.1.3 Text analysis
Notwithstanding the SaveFrom.net Helper extension, the privacy policies of the other five extensions are nearly identical. They share the same verbiage. FairShare Labs, PanelMeasurement, and Super Zoom listed the same business address on the extension author website, extension Chrome web store page, or extension privacy policy page, respectively. Links to the policies can be found in the appendices.
4.1.4 Data collection
To simplify the results for the body of our report, we analyzed a single extension, Hover Zoom. Figures and analyses for the other five extensions can be found in the appendices.
Immediately after the installation of Hover Zoom, we observed a GET request to cr-b.hvrzm.com containing query strings noting a unique browser ID, the extension’s installation time, install version, current version, and Hover Zoom’s Chrome extension ID. Using screenshots from Burp Suite, we present these requests in Figures 24 through 34. We identify the individual GET and POST requests using the request number located in the left-hand column of the screenshot (see Figure 24).
We observed the extension’s automated update to a new version, but, once again, we did not observe the collection of browsing activity data; however, 24 days after installation, we did observe a second automated update to the extension. Immediately following the second automated update, a GET request to cr-b.hvrzm.com was again performed with query strings noting the unique browser ID, installation time, install version (noted as two versions prior), current version, and Chrome extension ID (see Figure 25). One second later, another GET request was made, notably with a 156KB response payload. Following this request, all subsequent browser activity data was collected and sent via POST request to cr-input.hvrzm.com. Notably, the cr-input.hvrzm.com hostname does not appear in the original source code of the extension, nor does it appear in an installation on a new device. In Figure 26, we observe a unique reason query string parameter in the GET request. Figure 27 is the 156KB response payload to Figure 26’s GET request. We found 150KB of the original 156KB response payload stored in a file located in the Google Chrome profile’s File System folder, C:\Users\Administrator\AppData\Local\Google\Chrome\User Data\Default\File System\002\p\00\00000000. Upon examination of this payload, we observed minified JS data containing encoding and compression instructions, and we also observed the introduction of the cr-input.hvrzm.com hostname. We provide an unminified view of the 00000000 file in Figure 28.
Immediately following the download of the 156KB payload, a subsequent GET request was made noting a config statement in the URL (see Figure 29). In Figure 30, we observed the GET response to the config statement, which noted a batch_size of 10 and batch_max_wait of 600,000. Figure 31 illustrates how Hover Zoom waited until > 9 URLs had been collected before it made a POST request to cr-input.hvrzm.com. If fewer than 10 URLs were collected, it made a POST request with that data within 600,000 milliseconds (10 minutes). We browsed one page (request #12643) on the text.npr.org website. Precisely 10 minutes after request #12643, we observed POST request #12645 to cr-input.hvrzm.com containing a compressed and encoded payload.
To ascertain whether our browsing activity was accessible via Company X’s service, we used a browser with the actively data-collecting Hover Zoom extension. We then visited a domain under our own control, for which we had already requested analytics data via the Company X dashboard. We visited a unique URL on our domain. See Figure 32 (request is highlighted in red) for the POST request containing our browsing activity.
Note: This is a screenshot of data within a Google Analytics account populated by Company X.
4.2 Obstacles
The two users we identified via the data collected from Company X had one extension in common: Hover Zoom. From the outset of our investigation of Hover Zoom, we encountered several obstacles to identifying the extensions and observing the data collection process. The obstacles included dilatory tactics, encoding and compression, prevention of crawling by search engine crawlers, and evolving behavior, as detailed below.
4.2.1 Dilatory observation for data collection
During the course of our investigation, we observed dilatory tactics by Hover Zoom and SpeakIt!. We observed this tactic on multiple OS platforms, on virtual machines (VMs) located in several geographic regions, and at various hosting providers that use different ASNs. We observed these extensions undergo two stages of updating before downloading a data collection instruction set.
Timeline 1
February 5, 2019: We installed (via the Chrome Web Store) the latest version of SpeakIt!, 0.3.10, on one VM and the latest version of Hover Zoom, 6.0.40, on another VM. Since we did not observe any browsing activity data collection at the time of installation, we surmised that either there was no data collection or that that was a quiescent period after which data collection could begin.
February 15, 2019: We observed each extension perform an automated Chrome extension update. Hover Zoom was updated to version 6.0.41, and SpeakIt! to version 0.3.11. We did not observe any collection of browsing activity data after the update.
March 1, 2019: We observed each extension perform an automated Chrome extension update. Hover Zoom was updated to version 6.0.42, and SpeakIt! to version 0.3.12.
March 1, 2019: Seconds after the update, we observed a GET request to cr-b.hvrzm.com (Hover Zoom) or cr-b.getspeakit.com (SpeakIt!), with a response payload containing a data collection instruction set. Following the GET request, all subsequent user browser activity data was collected and sent via a POST request to cr-input.hvrzm.com (Hover Zoom) or cr-input.getspeakit.com (SpeakIt!).
Timeline 2
May 22, 2019: We installed SpeakIt! version 0.3.21 (the latest version at the time) on a VM located in a different geographic region and at a different hosting provider.
June 1, 2019: SpeakIt! was automatically updated to version 0.3.22. After the update, we did not observe any browsing activity data collection.
June 15, 2019: We observed an automatic update to SpeakIt! version 0.3.23.
June 15, 2019: Seconds after the update, we observed a GET request to cr-b.getspeakit.com. This GET request’s response payload contained the data collection instruction set. Following this request, all subsequent user browser activity data was collected and sent via a POST request to cr-input.getspeakit.com.
We repeated this experiment six times, under numerous scenarios; each time we obtained the same result. In the past, similar tactics have been used to avoid detection of data collection. As of May 9, 2019, more than 2.29 million people use Hover Zoom and SpeakIt!.
4.2.2 Encoding, compression, and the DataSpii decoder
Encoding and compression technologies can be used by extensions to evade the detection of data collection. Such technologies can obfuscate data and limit bandwidth consumption. We observed the use of an LZ-based compression algorithm by Hover Zoom, SpeakIt!, FairShare Labs, and PanelMeasurement (see Table 5). Using Burp Suite and a Windows grep tool, we identified the response payload involved in encoding and compression. We identified a file in the Chrome profile’s File System folder containing the response payload.
To assist our analyses, we developed a tool for decoding and decompressing the POST request payloads from all Party Y browser extensions. The DataSpii decoder tool can be found at https://decoder.dataspii.com.
Table 5. Encoding and compression used for SpeakIt!’s POST requests to cr-input.getspeakit.com.
i = new Array(JSON.stringify(e)), o = t.utils.utf16to8(JSON.stringify(i)), n = t.compressor.deflate(o); return a.encode(t.utils.bin2String(n)) |
4.2.3 Robots.txt
We observed a robots.txt file with instructions to specifically prevent search engines from crawling the terms of service and privacy policy pages on four out of the eight extension authors’ websites. We circumvented the obstacle by performing additional searches on other suspected matching data, such as the corporate street address listed for the extensions.
4.2.4 Evolution
The evolving behavior of Party Y extensions complicated our research. Over the last five months, we observed several newly downloaded data collection instruction sets containing new encoding and compression methods, further hindering our decoding process. These changes required us to modify our DataSpii decoder so we could analyze the new data collection instructions sets. On May 8, 2019, for instance, we observed Hover Zoom’s introduction of a new hostname, pnldsk.adclarity.com. We then observed Hover Zoom making encoded POST requests to this hostname using a new encoding method.
In addition, we initially observed a Party Y extension making POST requests for every 10 URL visits; weeks later, we observed the same extension making POST requests for every 50 URL visits. The extension modified its behavior when the response payload of a GET request increased a batch_size value from 10 to 50.
4.3 Honeypot experiment
We visited a web page on a URL of our own domain using a browser with a Party Y extension. To classify the URL as a unique, we appended query string parameters that specified the time, OS, browser, and Party Y extension. To minimize outside interference, we avoided the use of analytics snippets and third-party trackers in the page code (see Figure 35).
Within one hour of the collection, the unique URL appeared in the Google Analytics account populated by Company X. Within 3.5 hours of the collection, we observed a third party visiting the unique URL. Over the course of several weeks, we visited more than two dozen unique URLs on our control domain. We performed this test with each of the eight extensions. Within three to five hours of each visit, a third party visited the unique URLs collected on our domain. Except for large attachments, we observed this visitation pattern 100% of the time. The rapid third-party visitation of collected URLs compromises the security of publicly accessible URLs and ephemeral links. Such visitation enables third parties to process and scrape the page contents of the URL.
While visiting unique URLs using a browser with a Party Y extension, we observed third party visits from several IPs, which all originated from AWS netblocks. We used DomainTools.com and standard DNS lookup tools to verify that the IP’s forward and reverse DNS records match. Table 6 presents the IP addresses, ASN owner, and hostname. We then observed the IP addresses matching forward and reverse DNS records, all of which originated from the kontera.com domain.
Table 6: Third-party IPs visiting URLs collected by our browser using a Party Y extension.
IP Address | Netblock Owner/ASN | Matching forward and reverse DNS Entry |
54.209.60.63 | Amazon Web Services / AS14618 | nat.aws.kontera.com |
52.71.155.178 | Amazon Web Services / AS14618 | nat-service.aws.kontera.com |
184.72.115.35 | Amazon Web Services / AS14618 | nat-service1.aws.kontera.com |
54.175.74.27 | Amazon Web Services / AS14618 | nat-service3.aws.kontera.com |
54.86.66.252 | Amazon Web Services / AS14618 | nat-service4.aws.kontera.com |
Honeypot results
Test 1. We visited our unique URL using Chromium 74.0.3684.0 with the SuperZoom extension on macOS. Note: The server time was logged in GMT. However, in our URL, the “time” parameter value was written in Pacific Daylight Time. In addition, we acknowledge an error in the notation of the time zone. The time zone values were mistakenly written as PST, when it should have been written as PDT.
Our original visit:
OUR-REDACTED-IP – – [11/Mar/2019:20:50:06 +0000] “GET /samtesting.html?&os=mac&brow=crmium&v=74.0.3684.0&ext=SZ&date=mar112019&time=149pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3684.0 Safari/537.36”
Approximately 4 hours later, an unknown AWS IP performed a GET request of the collected URL:
184.72.115.35 – – [12/Mar/2019:01:03:45 +0000] “GET /samtesting.html?&os=mac&brow=crmium&v=74.0.3684.0&ext=SZ&date=mar112019&time=149pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25”
Test 2. We visited a unique URL using Firefox 65.0.1 with the SaveFrom.net Helper extension (installed via the extension author’s website) on macOS. Note: The server time was logged in GMT. However, in our URL, the “time” parameter value was written in Pacific Daylight Time. Further, we acknowledge an error in the notation of the time zone. The time zone values were mistakenly written as PST, when it should have been written as PDT.
Our original visit:
[OUR-REDACTED-IP] – – [11/Mar/2019:21:42:00 +0000] “GET /samtesting.html?&os=macosx10143&brow=ff&v=65.0.1&ext=SFfromsfhelpernet&date=mar112019&time=241pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0”
Approximately 3.5 hours later, an unknown AWS IP performed a GET request of the collected URL:
184.72.115.35 – – [12/Mar/2019:01:17:47 +0000] “GET /samtesting.html?&os=macosx10143&brow=ff&v=65.0.1&ext=SFfromsfhelpernet&date=mar112019&time=241pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25”
Additional observations
Using a browser with a Party Y extension, we visited various sample file types, including zip and SQL database files. When visiting the zip file, the browser downloaded the file into the file system. It did not load them directly in the browser. As a result, we did not observe the transmission of the zip URL to a third-party hostname. However, the SQL files were loaded in the browser and the URL of our SQL files was transmitted to cr-input.hvrzm.com. Three hours after it was collected by the Party Y extension, we observed a third-party visit to our SQL file:
184.72.115.35 – – [18/May/2019:12:50:27 +0000] “GET /dataspii-sql-50000rows.sql HTTP/1.1” 200 4393501 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25”
As noted in the log entry above, the entirety of the file, 4393501 bytes, was requested and transmitted to the visitor’s IP: 184.72.115.35. Using a browser with a Party Y extension, we visited an SQL file that was 8MB in size, but we did not observe the expected third-party file request. Perhaps this was due to the larger file size.
4.3.1 Network map vulnerability — collection of LAN data
We set up a non-publicly resolvable website on a web server which contained three HTML pages and one CSS file. The primary index page and two additional HTML pages each include images with hyperlinks to the other two pages. Each page also references a CSS file. We placed these HTML pages on a web server accessible only via a private LAN. We browsed the website using the following method:
- preconfigured the local computer’s hosts file with the non-publicly resolvable hostname alongside our local IP address,
- ensured the computer was connected to both the internet and private LAN, and
- used a browser that has a Party Y extension enabled.
In this experiment, we used the Hover Zoom Chrome extension. We visited two (non-publicly resolvable) URLs on our LAN, which we denote as URL 1 and URL 2. We monitored all connections by using Burp Suite or Chrome Development Tools’ “Inspect views background page” option for the Party Y extension. We then verified whether our data was retrievable using the Company X service. We denote URL 1 and URL 2 below:
URL 1
http://dataspii-lan-experiment-on-non-publicly-resolvable-hostname.[REDACTED].com/index.html?&os=win&brow=chrome&v=74.0.3729.131&ext=HZ&date=may162019&time=117ampst&lantest=yes&usinghostsfilefordns=yes&ispublic=no&ishostnamepubliclyresolveable=no&withoutfoxyproxy=yes
URL 2
http://dataspii-lan-experiment-on-non-publicly-resolvable-hostname.[REDACTED].com/corporate-passwords-stored-here.html
After visiting the URLs, we observed Hover Zoom making POST requests to three different hostnames:
- cr-input.hvrzm.com
- p.ymnx.co
- pnldsk.adclarity.com
These POST requests occurred on ports 443 and 40443.
4.3.1.1 URL 1 and URL 2 sent to cr-input.hvrzm.com
URL 1 sent to cr-input.hvrzm.com
Using Chrome’s Development Tools’ inspect views background page feature, we observed the encoded POST request and decoded it with our DataSpii decoder tool (see Figure 36).
The captured metadata included the site’s LAN IP address (10.10.33.5), page title, timestamp, and URL visited, as well as the URLs of the resources within the page, including CSS files, JS files, and images. The captured data included many elements of the HTTP header, including server type, content security policy, timestamp, access-control-allow-methods, access-control-allow-credentials, access-control-max-age, last-modified timestamp, and more (see Section 4.5.1 for more information).
URL 2 sent to cr-input.hvrzm.com
We visited URL 2 on our private LAN, observed the encoded POST request to cr-input.hvrzm.com using Burp Suite, and decoded it with the DataSpii decoder (see Figure 37).
4.3.1.2 URL 2 sent to pnldsk.adclarity.com
We observed Hover Zoom sending an encoded POST request to a second hostname, pnldsk.adclarity.com. We decoded that request with our DataSpii decoder. The decoded request included values for panelistId, ticketId, the URL we visited on our private LAN, and several encoded values for content (see Figure 38). We subsequently ran one content value through a standard base64 decoder, which revealed the capture of hyperlinks that were coded within our web page. The content value also included the image source URL used for the hyperlink (see Figure 39). Such data could be used by bad actors to map LAN infrastructure.
4.3.1.3 URL 2 sent to p.ymnx.co
We observed Hover Zoom make a third POST request to p.ymnx.co. The decoded request contained the URL that was visited on our LAN as well as the user agent (UA) of our browser (see Figure 40).
4.4 Quantification
Quantifying Party Y’s collection of PII and CI and their dissemination by both Party Y and Company X warrants further investigation. We do not know the number of Company X members or the domains for which they requested data. In addition, the total number of data points collected or disseminated is difficult to ascertain. Company X does not include historical data to their customers. Company X provides the real-time data as it is being captured (with a one-hour delay) to a member’s Google Analytics account. The data is then saved, collated, and made available to the Company X member. In addition, Company X members can log into a dashboard and share the data with anyone who has a Gmail account.
Second, data collection companies are in the business of selling data. Our investigation did not set out to examine links between Party Y’s business transactions or legal relationships with other companies. However, as mentioned in Section 4.3, we observed third parties visiting the unique URLs captured by Party Y extensions. Such capability is dangerous to global security.
4.4.1 Google Analytics hit data
Before March of 2019, customers of Company X that requested the data of a domain could see the number of hits made to it. Using the data we requested for fbcdn.net as an example, over a 30-day period we observed 60,267,694 hits to that domain. A Google Analytics hit is defined as “[a]n interaction that results in data being sent to Analytics. Common hit types include page tracking hits, event tracking hits, and ecommerce hits” [11].
In addition, over that same 30-day period, we recorded the number of Google Analytics hits from 28 domains via the Company X service. The combined hit count was over 246 million. For reference, 3 of the 28 domains are ranked in the Alexa Top 100, 17 of the 28 domains in the Top 5000, and the remaining 8 are outside of the top 5000. We estimate that within one year, the hit count of these 28 domains would be in the billions.
4.4.2 Sample PII collection stats
Using an unnamed airline as an example, a 28-day period in March of 2019 yielded nearly two million Google Analytics property hits for that airline. During that period, we observed 175,000+ unique URLs, of which 8,500 URLs contained passenger names, yielding an average of 304 unique URLs of passenger names per day. Using this number as a daily average, we estimate an exposure of over 110,000 unique URLs containing passenger names per year.
Note: These are not scientific measurements. We acknowledge that there are many variables, such as: a Party Y extension’s active installation count during any period, evolving extension behavior, passenger traffic patterns over the course of the year, etc. In addition, the addition or removal of one individual (e.g., a travel agent) with a Party Y extension, could greatly affect these results.
4.4.3 Party Y redaction efforts
We recognize that Hover Zoom, SpeakIt!, PanelMeasurement, and FairShare Unlock have made some effort to redact PII during data collection. We observed redaction measures for query string parameters such as firstName and lastName; however, the measures did not cover all variations, such as first and last (see Figure 34). For example, Southwest.com uses the query string parameter, passengerLastName. We observed unredacted last names in all URLs containing the passengerLastName parameter in Southwest.com traffic data obtained from Company X.
Furthermore, the redaction abilities did not include employee tasks and other CI from page titles, nor did any Party Y extension exclude data collected from LAN environments. Moreover, we did not observe redaction of any PII or CI by SuperZoom and SaveFrom.net Helper. We noted a single exception: email addresses. For example, we observed the collection of unredacted email addresses by SuperZoom; however, the email addresses appeared in redacted format on the Company X platform. We presume the redaction of email addresses occurred by a third-party processor or by Company X.
4.5 Effects
4.5.1 Private LAN Blueprint
When corporate staff use extensions such as Hover Zoom while browsing pages on their LAN, the extensions may be acting as real-time private LAN crawlers. As seen in Figures 36 through 39, we discovered Hover Zoom’s ability to capture LAN IPs, hostnames, web server technology data, and even the hyperlinks embedded within our page. We then observed Company X disseminating LAN data to its members in near real-time. (Note that whereas Hover Zoom appears to collect many types of metadata, Company X does not appear to provide all the metadata types [e.g., last-modified] to its customers.) Armed with such data, a bad actor could create a detailed network topology of an organization’s private infrastructure. In addition, the collation of such data facilitates the identification of internal disaster recovery systems, firewalls, routers, switches, and web server technologies. Such detailed information can, in turn, be used to monitor and exploit vulnerabilities.
4.5.2 Browser vendor’s remote deactivation did not stop collection
After reporting a Party Y extension to a browser vendor, the vendor remotely disabled the extension for all users; however, that did not stop the data collection. While the extension ceased its functionality, we continued to observe our browsing activity being sent via POST request to Party Y servers. The data collection stopped when we removed the extension. Moreover, a second report to a browser vendor resulted in the extension’s removal from the extension store; however, the extension was not remotely deactivated. It remained active and working in our test environment.
4.6 Courses of action
4.6.1 Preemptive removal of PII, API keys, and confidential information from metadata
The continuous evolution of data collecting extensions, weak browser vendor policies, and the general complacency about publicly accessible unique URLs have engendered the DataSpii leak. Data collection via privacy-violating browser extensions has been a well-documented issue for nearly two decades [12]. Party Y extensions are a few of many privacy-violating extensions. While Google and Mozilla may decide to improve their extension-vetting processes or increase their browser protections, browser extensions are merely one of many avenues where data collection occurs. In addition, data-mining firms will continue to collect data via other platforms (e.g., mobile).
We recommend that developers, corporations, and cybersecurity professionals act preemptively to prevent the leaking of PII and CI via metadata. This includes removing PII, CI, and confidential material within metadata such as URLs. We propose that companies further protect their APIs by restricting access to whitelisted IP addresses. In addition, file sharing services such as Dropbox should increase protections by expanding their premium features such as file expiration into their free tier. The restriction of such features to only the premium tier poses global risks.
The companies impacted by the capture of LAN data, as listed in this report, may want to consider changing their local environments by, for instance, modifying LAN network IP ranges or subnets.
4.6.2 Firefox Send: Archetype of file sharing services
Mozilla’s Firefox Send is a file sharing service with enhanced security features that protect against leaks. Users of this service upload files and share the links with others. Unlike other file sharing services, Firefox Send, by default, sets URLs to expire in 24 hours and one download. In addition, Firefox Send does not insert the file or document name into metadata such as the page title. Such default measures protect users against the DataSpii leak.
5. Discussion
5.1 Motivation
Our research on DataSpii arose out of sheer serendipity. While attempting to sign up with Company X’s well-known sister company, we were redirected to Company X’s site. We thought we were signing up with the sister company, yet we were, in reality, signing up with Company X. At first glance, Company X’s claims of viewing any domain’s traffic data seemed dubious. Instead of canceling our free trial with Company X, we instead opted to continue the trial to see where our observations would lead us. Within days of requesting data of a domain under our control, we recognized unique URLs that could only have been accessed by specific users. Since Company X states that it provides data only from desktop browsers, browser extensions and their access to privileged API features naturally emerged as the culprit.
Having associated unique URLs with users of domains under our control, we then asked three of the impacted users to provide us with a list of their browser extensions. Taken aback by the implications, the first user had immediately deleted all browser extensions and was, as a result, unable to identify the extension names. The other two users provided us with lists of their browser extensions, which revealed one common denominator: Hover Zoom. The first user was using Firefox on Ubuntu, the second was using Chrome on a Chromebook running Chrome OS, and the third was using Chrome on Windows. Despite Company X’s claim that its users have opted for their data to be collected, Hover Zoom users we spoke with were unaware of the collection or could not recall agreeing to it. The discrepancy between what the company claims and what the users experienced prompted our investigation.
5.2 Alternative theories
Initially, we theorized a sensor sitting at the edge of an ISP or corporate network. While a sensor could in theory capture unencrypted data, it would not be able to do so from websites enforcing the HTTPS protocol. However, similar to our methodology using Burp Suite, network monitoring systems can allow enterprises to monitor HTTPS traffic via Certificate Authority (CA) certificates as deployed by group policy. Considering that Company X was able to present encrypted data outside of corporate environments, this was beyond the capability afforded by the use of a sensor. This technology ignores Company X’s ability to provide encrypted data that originated outside of corporate environments. Barring a catastrophic vulnerability in encryption protocols, a more plausible explanation would be that local software was capturing browsing activity data. While we cannot rule out additional forms of collection, the most logical explanation was the privileged API features afforded to browser extensions.
Second, we reviewed the methods used by Party Y and Company X to ascertain whether our findings could be spurious. In theory, we acknowledge it is possible to submit invalid data to Group B hostnames such as cr-input.hvrzm.com, which data could then be delivered to members of Company X. In addition, an IP address’s netblock owner could be modified to any owner name by modifying the ASN or OrgName records at the regional internet registry (RIR) (e.g., ARIN).
We believe such orchestration is unlikely, for several reasons. First, any practical gains would be minimal compared to the effort required to fabricate a data collection scenario like DataSpii. The amount of resources required to achieve such a feat only to sell the data at a nominal cost would be illogical. Furthermore, the orchestrator would jeopardize profit. Secondly, even if it were practical, no corporation would want to invite the scrutiny such orchestration would entail. Third, it is unprecedented. If we assume the orchestration occurred, there would be many technical obstacles. An orchestrator would have to fabricate employee tasks, LAN hostnames, and page URLs. The orchestrator would then have had to identify and associate service providers and their IP addresses with fabricated visitors. The orchestrator would concoct extensions as a ruse only to sell the data at nominal fees. For these reasons and others, we believe that DataSpii was a bonafide leak on a catastrophic scale.
5.3 Related work
Other investigations into browser extensions that collect data have been recently published [13,14]. In one such investigation, Weissbacher et al. employed a novel prototype, Ex-Ray, to identify privacy-violating extensions on a large scale. In the process, they uncovered the extension SpeakIt!, just as we did. The authors dismissed the likelihood of dormancy given the economic incentive for the immediate collection of data via privacy-violating extensions. Had the authors monitored the extensions as long as we did, they may well have replicated our findings of dormancy.
In another investigation, Kapravelos et al. used Hulk, a novel system comprising of HoneyPages and a fuzzer, to simulate the behavior of extensions [15]. During the course of our investigation, several Party Y extensions required 10 or 50 page visits before eliciting a POST request with the browser activity data. We observed the batch_size parameter in response to GET requests by repeated manual input, a process that systems like Hulk could obviate.
In 2018, Quan and Kapravelos employed a novel prototype called Mystique to uncover browser extension data leaks [16]. Of the 181,683 extensions analyzed in the study, 3,868 (2.13%) emerged as candidates for leaking sensitive data. Although a step in the right direction, approaches like Mystique may not be suited to uncovering leaks concealed by dilatory tactics. We propose that Mystique and Ex-Ray extend the period of monitoring, just as we did.
Inspired by these prior investigations, we developed the DataSpii decoder tool to decode and decompress the data collected by Party Y extensions. The DataSpii decoder enables instant analysis of the collected data, allowing us to identify the capture of sensitive private LAN data from Party Y extensions. To the best of our knowledge, we are the first to successfully employ metadata analyses to accurately identify impacted individuals and corporations.
6. Conclusion
Our investigation determined that Party Y extensions had been providing sensitive data to members of the Company X service. We made several key observations in this regard. First, browser extensions like SpeakIt! and Hover Zoom employed dilatory tactics. The collection of data began 24 days post installation. Second, third-party visits to the unique URLs collected by Party Y extensions occurred. Third, a Party Y extension captured detailed, private LAN data before sending it to multiple hostnames.
Based on our investigation, we offer several recommendations. First, we recommend further research using novel methods to replicate, qualify, or extend our findings. Second, we recommend that browser vendors review their extension policies. Third, we recommend that corporations enact stronger browser security policies. Fourth, we recommend that web developers remove PII and CI from metadata such as URLs.
Though prudent, short-term fixes will not ultimately protect data from threats such as DataSpii. True data security will require the sustained collaboration of web developers, cybersecurity professionals, marketers, and browser vendors. The implications of our investigation transcend any one extension, website, Fortune 500 company, browser, or OS.
DataSpii arose from hazardous assumptions about data security. It circumvented some of the best technological safeguards (e.g., authentication and encryption) against data leaks. Even the most responsible individuals proved vulnerable to DataSpii; with vast budgets and myriad experts on hand, even the largest cybersecurity corporations proved vulnerable to DataSpii. Our data is only as secure as those with whom we entrust it. It takes just one party to unwittingly leak another party’s data.
Born of serendipity, our investigation has carefully documented a catastrophic leak on an unprecedented scale. We hope this report is evaluated by experts in the field. Once that has been done, the real work can begin.
Acknowledgements
I would like to thank Bridget Vandenbosch for her editing, logo and site design, and tireless support and assistance over the last seven months. I would like to express my great appreciation to Dr. Michael Weissbacher for his valuable suggestions. I would also like to thank the tech editor who told me six months ago that my initial discovery alone was not enough. Finally, I would like to thank the Electronic Frontier Foundation for their legal assistance. This report would not have been possible without the help of the above.
References
[1] Schaub, Florian, Aditya Marella, Pranshu Kalvani, Blase Ur, Chao Pan, Emily Forney, and Lorrie Faith Cranor. “Watching them watching me: Browser extensions’ impact on user privacy awareness and concern.” In NDSS workshop on usable security. 2016.https://pdfs.semanticscholar.org/d5ae/e2be1abe0d672a205a0792adbae352c733f9.pdf.
[2] “Manage Events with Background Scripts.” Chrome: Developer. Accessed May 19, 2019. https://developer.chrome.com/extensions/background_pages.
[3] Bruce Schneier, Data and Goliath: The hidden battles to collect your data and control your world (New York: Norton, 2013).
[4] Council of the European Union, and European Parliament. “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation).” Official Website of the European Union. Accessed May 24, 2019. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679.
[5] Sobers, Rob. “The Dangers of Shared Links.” Inside Out Security (June 11, 2013). Accessed June 6, 2019. https://www.varonis.com/blog/the-dangers-of-shared-links/.
[6] Atlassian Cloud Support. “Changing the cloud instance URL address or migrating to a new cloud instance.” (April 2019). https://confluence.atlassian.com/cloudkb/changing-the-cloud-instance-url-address-or-migrating-to-a-new-cloud-instance-691011835.html.
[7] “Practice Medicine, Not Administration.” DrChrono. Accessed May 21, 2019. https://www.drchrono.com/.
[8] “Contact Us.” FireEye. Accessed June 1, 2019. https://www.fireeye.com/company/contact-us.html.
[9] “Customers – What Companies Use to Service You.” Zendesk. Accessed May 19, 2019. https://www.zendesk.com/why-zendesk/customers/.
[10] Gantz, John F., Pavel Soper, Thomas Vavra, Lars Smith, Victor Lim, and Stephen Minton. “Unlicensed software and cybersecurity threats.” White Paper (2015). http://download.softwareculicenta.ro/raport-idc-2015-03-malware.pdf.
[11] “Hit – Analytics Help.” Google. Accessed June 12, 2019. https://support.google.com/analytics/answer/6086082?hl=en.
[12] Martin, David M., Richard M. Smith, Michael Brittain, Ivan Fetch, and Hailin Wu. “The privacy practices of Web browser extensions.” Communications of the ACM 44, no. 2 (2001): 45-50.
[13] Weissbacher, Michael, Enrico Mariconti, Guillermo Suarez-Tangil, Gianluca Stringhini, William Robertson, and Engin Kirda. “Ex-Ray: Detection of History-leaking Browser Extensions.” In Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 590-602. ACM, 2017.
[14] “Chrome Extensions – AKA Total Absence of Privacy.” Detectify Labs. Accessed June 12, 2019. https://labs.detectify.com/2015/11/19/chrome-extensions-aka-total-absence-of-privacy/.
[15] Kapravelos, Alexandros, Chris Grier, Neha Chachra, Christopher Kruegel, Giovanni Vigna, and Vern Paxson. “Hulk: Eliciting malicious behavior in browser extensions.” In 23rd {USENIX} Security Symposium ({USENIX} Security 14: 2014): 641-654.
[16] Chen, Quan, and Alexandros Kapravelos. “Mystique: Uncovering information leakage from browser extensions.” In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 1687-1700. ACM, 2018.
[17] “SpeakIt! – Text to Speech for Chrome.” n.d. Chrome Web Store. Google. Accessed May 19, 2019. https://chrome.google.com/webstore/detail/speakit/pgeolalilifpodheeocdmbhehgnkkbak?hl=en-US.
[18] “SuperZoom.” n.d. Chrome Web Store. Google. Accessed February 1, 2019. https://chrome.google.com/webstore/detail/super-zoom/gnamdgilanlgeeljfnckhboobddoahbl?hl=en
[19] “SaveFrom.net Helper All-in-1 / Youtube Downloader – Get This Extension for Firefox (en-US).” January 11, 2011. Accessed May 9, 2019. https://addons.mozilla.org/en-US/firefox/addon/savefromnet-helper/.
[20] “FAQ – Frequently Asked Questions.” Savefrom.net. Accessed May 21, 2019. https://en.savefrom.net/faq.php.
[21] “Fairshare Unlock.” Accessed May 21, 2019. https://chrome.google.com/webstore/detail/fairshare-unlock/alecjlhgldihcjjcffgjalappiifdhae.
[22] “PanelMeasurement.” n.d. Chrome Web Store. Google. Accessed May 19, 2019. https://chrome.google.com/webstore/detail/panelmeasurement/kelbkhobcfhdcfhohdkjnaimmicmhcbo.
Appendix – Extensions
Extension #1: Hover Zoom
Extension #2: SpeakIt!
Extension #3: SuperZoom
Extension #4: SaveFrom.net (when downloaded from en.savefrom.net using Firefox on Mac or Ubuntu)
Extension #5: FairShare Unlock
Extension #6: Panel Measurement
Extension #7: Branded Surveys
Extension #8: Panel Community Surveys
Honeypot Proof of Concept
Privacy policy comparisons can be made here:
Hover Zoom: http://www.hoverzoom.net/disclosure/_pp.pdf
SpeakIt!: http://skechboy.com/speakit/disclosure/_pp.pdf
FairShare Unlock: http://privacy-policy.fairsharelabs.com/
PanelMeasurement: https://www.panelmeasurement.com/privacy
SuperZoom: http://funnerapps.com/privacy.php
SaveFrom.net Helper: https://en.savefrom.net/privacy-policy.html
Branded Surveys: https://surveys.gobranded.com/page/branded-surveys-privacy-policy
Panel Community Surveys: https://www.panelmeasurement.com/privacy
Extension #1: Hover Zoom
Available on: Chrome
Chrome Web Store extension ID: nonjdcjchghhkdoolnlbekcfllmednbl
Installation count: 811,750 users (Chrome Web Store, April 3, 2019)
Chrome Web Store link: https://chrome.google.com/webstore/detail/hover-zoom/nonjdcjchghhkdoolnlbekcfllmednbl?hl=en
Website/Developer: hoverzoom.net
Privacy policy: http://www.hoverzoom.net/disclosure
Hover Zoom’s background is provided in the Discussion section of this report.
Hover Zoom’s history is provided in the Discussion section of this report.
Hover Zoom observations
GET and POST requests to the following hostnames were observed:
cr-b.hvrzm.com
cr-input.hvrzm.com
p.ymnx.co
Hover Zoom’s data collection process is cited in the Results section of this report.
Extension #2: SpeakIt!
Available on: Chrome
Chrome Web Store extension ID: pgeolalilifpodheeocdmbhehgnkkbak
Installation count: 1,494,570 users (Chrome Web Store, May 18, 2019)
Chrome Web Store link: https://chrome.google.com/webstore/detail/speakit/pgeolalilifpodheeocdmbhehgnkkbak?hl=en-US
Website/developer: Skechboy.com
Privacy policy: https://skechboy.com/speakit/disclosure
Background
The Chrome Web Store describes SpeakIt!, which has more than 1.4 million users, as a “free text to speech extension that reads selected text using TTS technology with language auto-detection” [17]. Like Hover Zoom, SpeakIt!’s data collection process did not begin until 24 days after installation.
SpeakIt!’s history is provided in the Discussion section of this report.
SpeakIt! observations
GET or POST Requests to the following hostnames were observed:
skechboy.com
cr-b.getspeakit.com
cr-b.hvrzm.com
cr-input.getspeakit.com
Data collection process
The SpeakIt! data collection patterns exhibit stark similarities to those of Hover Zoom. Immediately after installation, a GET request was made to cr-b.getspeakit.com noting the installation time along with a unique browser ID. Using Burp Suite, we captured such requests. The requests can be seen in a series of figures that follow. We identify the individual GET and POST requests using the request number located in the left-hand column of the screenshot (see Figure A1).
Excluding the orange-highlighted text in Figure A2, nearly all contents of the response were saved within the following file: C:\Users\Administrator\AppData\Local\Google\Chrome\User Data\Default\File System\001\p\00\00000000.
Extension #3: SuperZoom
Available on: Chrome & Firefox
Chrome Web Store extension ID: gnamdgilanlgeeljfnckhboobddoahbl
Installation count:
Chrome: 329,313 users (Chrome Web Store, Feb 20, 2019)
Firefox: Unknown at this time
Website/developer: http://funnerapps.com/
Privacy policy: http://funnerapps.com/privacy.php
Background
According to the Google Chrome Web Store, SuperZoom’s function is to “Zoom images on popular websites (Reddit, Imgur, and more) by hovering over a thumbnail. Keep your browsing simple and on one page” [18]. Unlike Hover Zoom or SpeakIt!, SuperZoom did not perform redaction. The SuperZoom data collection process occurs right away and a waiting period is not required.
Super Zoom observations
GET or POST Requests to the following hostnames were observed:
cr-input.superzoom.net (Chrome)
ff-input.superzoom.net (Firefox)
rules.superzoom.net
Data Collection process
In this report, SuperZoom figures are presented only for the Google Chrome browser. In our observations, the data collection process for Firefox is identical to that for Chrome.
After the installation of SuperZoom, any browsing activity is immediately sent via POST request to cr-input.superzoom.net when using Chrome or ff-input.superzoom.net when using Firefox. The requests are base64-encoded and can be decoded with any base64 decoder.
SuperZoom encoded the URL along with the page title and sent the base64-encoded POST request to cr-input.superzoom.net.
Extension #4: SaveFrom.net Helper add-on (from author website)
Available on: Firefox
Install URL: http://download.sf-helper.com/mozilla/helper_last.html
Installation count: ≤140,000 users (Firefox Add-on Store, May 9, 2019) [19]
Developer: MagicBit, Inc. d/b/a SaveFrom.net
Privacy policy URL: https://en.savefrom.net/privacy-policy.html
Note: The invasive behavior of SaveFrom.net Helper only occurs when it is installed from the author’s website under certain conditions. We were able to replicate the data collection process only by using Firefox on macOS and Firefox on Ubuntu. We did not observe the invasive behavior when the add-on was installed from official browser vendor stores. The installation count is only provided here as a reference point, and may not represent the total number of users experiencing the invasive behavior.
Background
The SaveFrom.net Helper extension “makes downloading from the internet convenient and simple. With our help you can download audio, video, and other types of files from various websites and social networks: youtube.com, vk.com, vimeo.com, and others” [20]. Using the installation method below, data collection occurs once the user begins to browse websites.
SaveFrom.net Helper observations
GET or POST Requests to the following hostnames were observed:
ff-input.mxpnl.net
sac1abcc59.savefr.com
Installation
We installed this extension by visiting the official Savefrom.net website using the following procedure:
- Using Firefox on macOS or Firefox on Ubuntu, visit https://en.savefrom.net/.
- At the top of the page, click “Install.”
- Click the green “Install” button at the center of the page.
- Click “Allow” to install the extension.
Alternatively, it can be installed by visiting http://download.sf-helper.com/mozilla/helper_last.html.
Note: When using FireFox for Windows, the en.savefrom.net site may request the user to download an executable file. We have not tested any of the executables or any other versions of the extension.
Data collection process
After extension installation, a user’s browsing activity is immediately sent via POST request to ff-input.mxpnl.net. The requests are base64-encoded and can be decoded with any standard base64 decoder.
Extension #5: Fairshare Unlock Extension
Chrome Web Store extension ID: alecjlhgldihcjjcffgjalappiifdhae
Installation count:
Chrome: 1,055,216 users (Chrome Web Store, Mar 26, 2019)
Firefox: Unknown at this time
Available on: Chrome and Firefox
Chrome Web Store link: https://chrome.google.com/webstore/detail/fairshare-unlock/alecjlhgldihcjjcffgjalappiifdhae
Firefox link: https://unlock.fairsharelabs.com/install/
Developer: www.fairsharelabs.com
Privacy policy URL: http://privacy-policy.fairsharelabs.com/
Background
The description on the Chrome Web Store states, “Fairshare enables you to access premium content and functionality in popular websites and services for free when you join the Fairshare consumer marketing panel!” [21]. It also states it may “collect and share aggregated information about their web usage activities with third parties for business and research purposes.” Performing a Chrome Web Store search for “fairshare unlock” yields no results; however, the extension is accessible when using a direct link.
While we could not find the Firefox extension in the Mozilla add-on store, FairShare Labs can be installed on Firefox by visiting https://unlock.fairsharelabs.com/install/ and clicking “Support with Fairshare.”
History
We did not find any issues being reported about FairShare Unlock.
FairShare Unlock observations
GET or POST Requests to the following hostnames were observed:
cr-b.ebehaviors.com (Chrome)
cr-input.ebehaviors.com (Chrome)
ff-b.ebehaviors.com (Firefox)
ff-input.ebehaviors.com (Firefox)
p.ymnx.co (Chrome)
Data collection process
The FairShare Labs description states it may collect data about “‘web usage activities.” The extension begins its collection process immediately, and it performs redaction measures similar to those of Hover Zoom and SpeakIt!.
Extension #6: PanelMeasurement
Chrome Web Store extension ID: kelbkhobcfhdcfhohdkjnaimmicmhcbo
Installation count: Chrome: 517,241 users (Chrome Web Store, May 21, 2019)
Firefox: Unknown at this time
Available on: Chrome, possibly Firefox
Chrome Web Store link: https://chrome.google.com/webstore/detail/panelmeasurement/kelbkhobcfhdcfhohdkjnaimmicmhcbo
Developer: [email protected]
A Google search shows it may have previously been available on the Mozilla add-on store.
Background
PanelMeasurement is a Chrome extension developed by DDMR. It is described as a “free tool that will notify you of surveys that you can take based on your profile.” [22]. The description also states, “We collect data in order to build your profile, but rest assured, this data is always non-personally identifiable and anonymized.” Performing a Chrome Web Store search for “panelmeasurement” yields no results; however, the extension is accessible using the included Chrome Web Store link.
PanelMeasurement observations
GET or POST Requests to the following hostname were observed:
cr-b.panelmeasurement.com (Chrome)
cr-input.panelmeasurement.com (Chrome)
Extension #7: Branded Surveys
Chrome Web Store extension ID: dpglnfbihebejclmfmdcbgjembbfjneo
Installation count:
Chrome: 8 users (Chrome Web Store, July 2019)
Available on: Chrome
Chrome Web Store link: https://chrome.google.com/webstore/detail/branded-surveys/dpglnfbihebejclmfmdcbgjembbfjneo
Offered by: DDMR
Background
The Branded Surveys extension is described as one that “offers our users opportunities to earn cash and other prizes by completing online surveys.” It continues to state “[b]y uploading this extension, you are agreeing to the collection of your digital behavior data as a means to improve survey targeting and other market research activities. Bear in mind that all data is anonymized and will not be associated with your personally identifiable information.”
Branded Surveys observations
POST Requests to the following hostname were observed:
cr-input.panelmeasurement.com (Chrome)
Extension #8: Panel Community Surveys
Chrome Web Store extension ID: lpjhpdcflkecpciaehfbpafflkeomcnb
Installation count: Chrome: 1 user (Chrome Web Store, July 2019)
Available on: Chrome
Chrome Web Store link: https://chrome.google.com/webstore/detail/panel-community-surveys/lpjhpdcflkecpciaehfbpafflkeomcnb
Offered by: DDMR
Background
The Panel Community Surveys extension is described as “an effective and focused way to make money from your opinions and insights.” It continues to state, “[t]his extension collects your digital behavior data and shares it with 3rd parties to enable better survey targeting and other market research activities.”
Branded Surveys observations
POST Requests to the following hostname were observed:
cr-input.panelmeasurement.com (Chrome)
Honeypot proof of concept
Using our honeypot, we identified a third -party visiting DataSpii-leaked URLs shortly after we visited a self-generated unique link using a Party Y browser extension. The immediate third-party processing, scraping, or visiting of URLs creates a security issue and challenges the data protection features of publicly accessible URLs as well as ephemeral links.
PoC for third-party visiting captured unique URLs
Environment: Using a new Windows VM, install the following:
- Google Chrome
- Burp Suite Community Edition
- FoxyProxy Standard Chrome Extension
- Any Party Y extension
Step 1: Visit a domain in which you, the reader, control and have access to the raw logs for that domain.
Step 2: Create a basic HTML page without any snippets or ads of any kind on that domain.
Step 3: Visit a unique and never-before-seen URL using a browser with an invasive browser
extension.
Step 4: Visit an additional 10–50 links in your browser. Note: Some of the extensions send URLs back immediately, some in batches of 10, and some in batches of 50. This will send your data back to the collecting server.
Step 5: Monitor the domain’s raw access logs. Approximately three to five hours after the visit, a third
party may visit the URL.
Look into “data leaks” by Avg/Avast (likely others too), it’s even more scandalous than extensions because those “anti-viruses” are supposed to keep user safe – they appear to be connected to JumpShot
Thank-you for a great piece of work. This is a major issue for all organizations.
My big question is, how does Adblock Plus rate in terms of security?
There are more extensions that perform similar activity. Check out connections to mini.s-shot[.]com (web history and screenshots of the webpages you visited) up to this domain. Install these extensions and see for yourself:
https://chrome.google.com/webstore/detail/classic-nintendo-contra-g/ondilobpgmpeapkggjimpepobepkjeoe
https://chrome.google.com/webstore/detail/classic-battle-tanks-game/ambedopgmfdnldlfgfbpbffjjpbidknj
https://chrome.google.com/webstore/detail/classic-digger-game-new-t/kgkbjkboohbelapfpfablmdihlaiihaf
https://chrome.google.com/webstore/detail/doodle-jump-game-new-tab/kcfcfndpebgecojfiiiaignbpdbbdodk
https://chrome.google.com/webstore/detail/classic-super-mario-game/hngfmjanfdlnbdbieplabjjgdgiahbcf
Then there are LNKR chrome extensions too that perform similar activity:
https://securitytrails.com/blog/lnkr-malicious-browser-extension
Message if you want more details
Great work. You opened my eyes to things threats I did’t know before. Thank you.
WOW! How do we find and delete these extensions?
Hi Jan,
I’ve included the instructions below:
For Google Chrome:
Manually enter the following URL in your browser: chrome://extensions
On the following page, click Remove next to the extension in question.
Mozilla Firefox:
Manually enter the following URL in your browser: about:addons
On the following page, click Remove next to the extension in question.
If you have any other questions, please let me know.
-Sam