Somewhere in a server rack in Virginia, a database contains a record with a name, home address, estimated income, marital status, vehicle registration, and a list of websites visited in the last month. The person described by this record didn't create it. Nobody asked permission to build it. The company that maintains it has no direct relationship with the person it describes. The record was assembled by purchasing fragments of data from dozens of sources, merging them using an email address as a join key, and packaging the result as a consumer profile ready for sale. This happens millions of times a day, across hundreds of companies, and almost nobody whose data is being traded knows the extent of it.
The data-broker industry operates in a space between legal and invisible. Most of what brokers do is technically lawful under current regulations in most jurisdictions. That doesn't make it ethical, and it doesn't mean individuals have no options. But understanding the machinery is the first step toward disrupting it.
Where Data Brokers Get Their Material
Public records are the foundation. Property deeds, voter registration rolls, court filings, business registrations, and marriage certificates are all public under most Western legal systems. A data broker doesn't need to hack anything to get a home address, the rough value of a property, or the date a person registered to vote. This information is public by design, originally intended for transparency and civic participation, now harvested at scale for commercial profiling.
Commercial data sources add depth. Credit card transaction records (aggregated and supposedly anonymised), loyalty programme participation, warranty registrations, magazine subscriptions, and online purchase histories all flow into broker databases. A person who fills out a warranty card for a kitchen appliance is providing their name, address, email, and the fact that they just spent a specific amount on a specific product. That single data point, combined with hundreds of others, builds a detailed consumption profile.
Online tracking is the third pillar. Browser cookies, device fingerprints, IP-based geolocation, and app-permission data (GPS coordinates, contact lists, installed apps) feed into broker profiles. The tracking pixel on a news website doesn't just serve an advert. It logs the visit against a browser fingerprint that can be matched to an identity through probabilistic linkage. The person reading an article about mortgage refinancing at 2am is now flagged as a "likely refinancer" in a profile that gets sold to mortgage lenders within 48 hours.
Social media is a goldmine for brokers, though most people don't realise it. Public posts, profile information (employment history, education, relationship status), friend connections, and engagement patterns all contribute to profile building. The privacy settings on a social media platform control what other users see. They don't necessarily control what data the platform shares through its advertising API or its partnership agreements with third-party data enrichment services.
How Profiles Are Assembled
The raw material arrives as disconnected fragments. A voter registration record has a name and address but no email. A website visit log has an IP address and browser fingerprint but no name. A loyalty programme record has an email and purchase history but no browsing data. The broker's core competency is linking these fragments into a unified profile.
The simplest linkage key is email address. An email address that appears in a warranty registration, a loyalty programme, a voter roll, and a website's login cookies creates a deterministic match across all four data sources. More sophisticated linkage uses probabilistic matching: a combination of name, date of birth, and postal code that's unique enough to match records even when the email doesn't appear in both.
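The linkage step above can be sketched in a few lines. This is a toy illustration, not a real broker's pipeline: the records, field names, and fallback rule are all hypothetical, and real probabilistic matchers use fuzzy scoring rather than exact quasi-identifier equality.

```python
# Toy sketch of fragment linkage: deterministic matching on a shared
# email address, falling back to a quasi-identifier built from name,
# date of birth, and postcode. All records here are invented.

def link_key(record):
    """Prefer the email as a deterministic join key; otherwise fall
    back to a (name, dob, postcode) quasi-identifier."""
    if record.get("email"):
        return ("email", record["email"].lower())
    return ("quasi", (record.get("name", "").lower(),
                      record.get("dob"),
                      record.get("postcode")))

def merge_fragments(fragments):
    """Group fragments by link key and union their fields into profiles."""
    profiles = {}
    for frag in fragments:
        profile = profiles.setdefault(link_key(frag), {"sources": []})
        profile["sources"].append(frag.get("source"))
        profile.update({k: v for k, v in frag.items() if k != "source"})
    return profiles

fragments = [
    {"source": "warranty", "email": "a@example.com", "name": "A. Smith",
     "purchase": "kitchen appliance"},
    {"source": "loyalty", "email": "a@example.com",
     "purchases": ["groceries"]},
    {"source": "voter_roll", "name": "A. Smith", "dob": "1980-01-01",
     "postcode": "20101", "address": "1 Main St"},
]

profiles = merge_fragments(fragments)
# The two emailed fragments merge into one profile; the voter-roll
# record has no email, so it sits under its quasi-identifier until
# some future emailed record carries the same name/dob/postcode.
```

The design point is the asymmetry: a single shared email collapses every fragment it touches into one profile, which is exactly why the opt-out section later recommends breaking that key.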
The assembled profile typically contains demographic data (age, gender, income bracket, education level, marital status), geographic data (home address, workplace proximity, commute patterns), behavioural data (purchase history, website visits, app usage), and inferred attributes (political affiliation, health conditions, financial stress level). The inferred attributes are particularly concerning because they're derived from patterns rather than stated facts, and they can be wrong in ways that have real consequences for the people they describe.
A person who searches for diabetes symptoms, visits a health-food website, and purchases a glucose monitor isn't necessarily diabetic. They might be researching for a family member, writing an article, or buying a gift. But the profile now contains an inferred health condition that affects the advertisements they see, the insurance quotes they receive, and the credit offers that reach their inbox. The inference engine doesn't know the context. It only sees the pattern.
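A toy version of that inference makes the blindness concrete. The signal names, rule, and threshold below are illustrative assumptions, not any broker's actual model; the point is that context has no representation in the computation at all.

```python
# Illustrative pattern-based inference: flag an attribute when enough
# signals co-occur. Nothing in the model can express "researching for
# a relative" or "buying a gift" -- there is no field for context.

DIABETES_SIGNALS = {
    "searched:diabetes symptoms",
    "visited:health-food site",
    "purchased:glucose monitor",
}

def infer_attributes(events, signals=DIABETES_SIGNALS, threshold=2):
    """Set the inferred flag when at least `threshold` signals match."""
    hits = signals & set(events)
    return {"inferred:likely_diabetic": len(hits) >= threshold}

events = ["searched:diabetes symptoms", "purchased:glucose monitor"]
profile_update = infer_attributes(events)
# Two of three signals matched, so the flag is set -- regardless of
# whether the person is diabetic, a journalist, or a gift-buyer.
```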
Who Buys Broker Data and What They Do With It
Advertising is the most visible use case. Brands purchase audience segments from brokers to target ads. A car manufacturer buying the "likely new-car purchaser in the next 90 days" segment is using broker data to reach people whose browsing, search, and financial profiles suggest they're shopping for a vehicle. This is standard practice in digital advertising, and it's the revenue engine that funds most of the data-broker industry.
Financial services use broker data for risk assessment. A lender evaluating a loan application might purchase supplementary data about the applicant's estimated income, property ownership, and spending patterns. This data doesn't appear on the credit report, but it influences the lending decision. The applicant has no visibility into what data was consulted, and no mechanism to correct errors in a broker's profile.
Insurance companies use broker data for pricing and underwriting. Health insurers in jurisdictions that permit it (notably the United States) can factor lifestyle data into premium calculations. A profile that shows frequent purchases of alcohol, tobacco products, or fast food might result in higher premiums, even if the individual's actual health metrics are excellent. The profile is a statistical proxy for risk, and proxies are often wrong at the individual level.
Employers and landlords represent a more controversial buyer category. Background-check services pull from broker databases to compile reports on prospective tenants and job applicants. A person with an eviction record from a decade ago, a bankruptcy filing from their twenties, or a court appearance that was ultimately dismissed might find these entries appearing in background checks long after the events themselves are resolved.
Opting Out: What Works and What Doesn't
Most major data brokers offer some form of opt-out mechanism, typically a web form that requests removal of a specific record. The process varies widely in friction and effectiveness. Some brokers process opt-out requests within days. Others require postal mail, copies of government ID, or notarised affidavits. A few acknowledge the request and then don't follow through, relying on the fact that most people won't check whether their data was actually removed.
The fundamental problem with opt-out is that it's a whack-a-mole exercise. There are an estimated 4,000 data-broker companies operating globally. Opting out of ten of them still leaves 3,990. And even the ones that honour the opt-out will re-acquire the data from other sources within months, because the person's data continues to flow from public records, commercial transactions, and online tracking. Removal is temporary unless the data generation itself is disrupted.
Legislation is slowly addressing the imbalance. The California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA), gives California residents the right to know what data brokers hold about them and to request deletion. Vermont requires data brokers to register with the state, creating a public list of companies that trade in personal data. The EU's GDPR provides stronger protections, but enforcement against data brokers has been inconsistent, partly because many brokers operate from jurisdictions outside the EU's direct enforcement reach.
The Economics of a Consumer Profile
A single consumer profile sells for surprisingly little on the open market. Basic demographic data might fetch a fraction of a penny per record. A profile enriched with purchase history, browsing behaviour, and inferred attributes might sell for a few pence. The economics work because of volume: a broker with 200 million profiles selling to hundreds of buyers generates significant revenue even at per-record prices that seem trivial.
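The volume arithmetic is worth making explicit. Every number below is an illustrative assumption chosen within the ranges just described (per-record prices, a 200-million-profile database, a few hundred buyers), not a figure from any real broker's accounts.

```python
# Back-of-the-envelope revenue model for the volume economics above.
# All prices and counts are illustrative assumptions.

profiles = 200_000_000      # records in the broker's database
buyers = 300                # distinct buyers licensing the data
price_basic = 0.001         # a tenth of a penny per basic record
price_enriched = 0.03       # a few pence per enriched record
enriched_share = 0.25       # assumed fraction of enriched profiles

revenue_per_full_sale = (profiles * (1 - enriched_share) * price_basic
                         + profiles * enriched_share * price_enriched)
annual_revenue = revenue_per_full_sale * buyers
# At these assumptions: roughly 1.65m per full sale of the database,
# and on the order of 495m across 300 buyers -- from per-record
# prices that individually look like rounding errors.
```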
The price varies by specificity. A generic "adults aged 25-34" segment is cheap. A segment like "adults aged 25-34 who searched for luxury SUVs in the past 30 days and live within 10 miles of a BMW dealership" commands a premium because it's immediately actionable for a specific advertiser. The more attributes a profile contains, and the more recently those attributes were updated, the more it's worth.
This pricing structure creates a perverse incentive. Brokers are rewarded for collecting more data, more frequently, on more people. There's no economic penalty for collecting data that's wrong, outdated, or inferred from insufficient evidence. The buyer wants a segment, the broker delivers a segment, and whether the individuals in that segment actually match the criteria is someone else's problem. Nobody audits the accuracy because nobody in the transaction chain has an incentive to.
Children and Vulnerable Populations
Data brokers have profiles on children, despite regulations like COPPA (in the US) and the Age Appropriate Design Code (in the UK) that restrict the collection of children's data. The profiles are often assembled indirectly: a parent's purchase of children's products, a school's use of educational technology that shares data with advertising partners, or a child's activity on platforms that claim to be age-gated but don't effectively enforce age verification.
Elderly individuals are disproportionately targeted using broker data. "Senior consumer" segments are sold to financial-services companies, health-product marketers, and, less scrupulously, to operators of scams that specifically target older adults. The FTC has documented cases where data brokers sold lists of elderly consumers with specific vulnerabilities (recent bereavement, cognitive decline indicators, isolation markers) to companies that used those lists for predatory marketing.
Domestic violence survivors present a particularly dangerous case. A person who has fled an abusive partner and changed their address has their new location exposed if a data broker assembles it from a utility registration, a change-of-address filing, or a vehicle registration at the new location. Opt-out mechanisms are useless if the data is assembled and sold before the survivor even knows the broker has it.
Disrupting the Pipeline at the Source
If opt-out is a reactive measure (removing data after it's already been collected), disruption is a proactive one: making the incoming data less useful. The principle is that broker profiles depend on accurate, linkable data fragments. If the fragments are inaccurate, the profiles degrade.
Using different email addresses for different purposes breaks the join key that links fragments across sources. A dedicated email for shopping, a separate one for social media, and a third for government correspondence means that a broker seeing the shopping email can't automatically link it to the social-media profile or the voter registration. Email aliasing services make this practical without requiring multiple accounts.
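The effect on linkage can be shown directly. The records and alias addresses below are hypothetical; the grouping logic is the same deterministic join a broker would run.

```python
# Why per-purpose aliases break deterministic linkage: grouping
# hypothetical records by email yields one linked profile when a
# single address is reused everywhere, and separate unlinked
# fragments when each context gets its own alias.

from collections import defaultdict

def group_by_email(records):
    """Group record sources under their email join key."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["email"]].append(rec["source"])
    return dict(groups)

reused = [
    {"source": "shopping",   "email": "me@example.com"},
    {"source": "social",     "email": "me@example.com"},
    {"source": "voter_roll", "email": "me@example.com"},
]
aliased = [
    {"source": "shopping",   "email": "shop.x7@alias.example"},
    {"source": "social",     "email": "soc.k2@alias.example"},
    {"source": "voter_roll", "email": "gov.q9@alias.example"},
]

linked = group_by_email(reused)    # one profile spanning all three
unlinked = group_by_email(aliased)  # three fragments, no join key
```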
Synthetic identity information serves a similar purpose. Services like Surfshark Alternative ID, Firefox Relay, and Another.IO allow individuals to interact with websites and services without providing their real name, address, or phone number. The data that flows into the broker pipeline is fictional. It looks real enough to pass validation, but it doesn't correspond to an actual person. The broker assembles a profile of someone who doesn't exist, which dilutes the accuracy of their database.
VPNs and browser anti-fingerprinting tools disrupt the tracking-data pipeline. A VPN hides the real IP address, making geolocation-based profiling unreliable. Browser extensions that randomise fingerprint attributes (canvas hash, WebGL renderer, installed fonts) prevent cross-site tracking that relies on fingerprint consistency. Privacy-focused browsers like Brave and Firefox with Enhanced Tracking Protection block third-party trackers by default.
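A simplified model shows why fingerprint consistency is what makes the tracking work, and why randomising even one attribute defeats it. The attribute names mirror those mentioned above, but the values and hashing scheme are invented for illustration; real fingerprinters combine dozens of signals.

```python
# Toy fingerprint model: hash a set of browser attributes into a
# tracking identifier. Stable attributes produce the same identifier
# on every site, so visits can be joined; per-session canvas noise,
# as anti-fingerprinting extensions inject, breaks the join.

import hashlib
import random

def fingerprint(attrs):
    """Hash the attribute set into a short, stable identifier."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

stable = {"canvas": "hash-a91f", "webgl": "ExampleGPU",
          "fonts": "Arial,Calibri"}

site_a = fingerprint(stable)  # identifier seen by a news site
site_b = fingerprint(stable)  # identical identifier on a shop site

def randomised(attrs):
    """Simulate per-session canvas noise before fingerprinting."""
    noisy = dict(attrs)
    noisy["canvas"] = f"hash-{random.getrandbits(32):08x}"
    return fingerprint(noisy)

# With randomised canvas output, the two sessions hash to different
# identifiers and can no longer be joined into one browsing history.
```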
The data-broker industry's business model rests on a simple economic principle: more data equals more accurate profiles equals higher prices. Anything that reduces the quality of incoming data reduces the value of the output. Synthetic identities introduce noise into a system that depends on signal, and noise at scale degrades the product that brokers sell. The individual benefit is privacy. The collective benefit, if the practice becomes widespread, is a less accurate and consequently less profitable surveillance-advertising infrastructure.