Reconnaissance and OSINT in Penetration Testing

The Myth

Hacking doesn't start with a port scan.

The popular image of a cyber attack begins with someone launching Nmap against a target network — green text scrolling down a black screen, open ports appearing one by one. It makes for good cinema. It's also almost entirely wrong.

By the time a skilled attacker sends a single packet to your network, they already know what they're looking for. They know your technology stack, your email address format, the names of your IT administrators, which VPN appliance you run, which cloud provider you use, and possibly the passwords of several of your employees. They gathered all of this without touching your systems — without generating a single log entry, without tripping a single alert.

This is reconnaissance. It's the phase that determines whether an attack succeeds or fails, and it's the phase that most organisations never see, never test, and never think about when evaluating their security posture.

In a well-conducted penetration test, reconnaissance isn't a warm-up exercise before the "real" testing begins. It is the testing — or at least, it's the intelligence-gathering phase that shapes every decision the tester makes from that point forward. Understanding what an attacker can learn about you without ever touching your systems is one of the most valuable things a pen test can deliver.

The Uncomfortable Truth

Everything described in this article can be done by anyone with an internet connection, a browser, and freely available tools. There is no barrier to entry. The information is public, legal to access, and sitting in plain sight. The only question is whether you know what's out there before an attacker finds it.

Defined

What reconnaissance and OSINT actually mean.

Reconnaissance is the systematic gathering of information about a target before launching an attack. It's the military term applied to cyber operations: understanding the terrain before committing forces.

OSINT — Open Source Intelligence — is a subset of reconnaissance that uses only publicly available sources. No hacking, no exploitation, no illegal access. Just careful, methodical collection and analysis of information that the target (or third parties) have made publicly available, intentionally or otherwise.

Reconnaissance divides into two categories, and the distinction matters because they carry very different detection profiles:

	Passive Reconnaissance	Active Reconnaissance
Definition	Gathering information without directly interacting with the target's systems. Nothing is sent to the target's infrastructure and no connections are made to it.	Gathering information by directly probing the target's systems: port scanning, service enumeration, vulnerability scanning.
Detectability	Virtually undetectable. The target generates no logs, no alerts, no evidence that reconnaissance is occurring.	Detectable in principle — port scans, connection attempts, and enumeration generate log entries and may trigger IDS/IPS alerts.
Examples	DNS lookups via public resolvers, certificate transparency logs, Shodan, LinkedIn, Google dorking, breached credential databases, GitHub searches, job adverts	Nmap port scanning, service fingerprinting, directory brute-forcing, DNS zone transfer attempts, SMTP user enumeration
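Legal status	Generally legal where it is limited to viewing publicly available information, though obtaining or using breached credentials can raise separate legal and data protection questions	Legally complex: active scanning without authorisation may violate the Computer Misuse Act 1990, depending on the nature and intent of the interaction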
When attackers use it	First — always. Passive recon is risk-free for the attacker and information-rich. Sophisticated adversaries spend days or weeks here before ever touching the target.	Second — once passive recon is exhausted and the attacker needs to confirm specific details (open ports, service versions, application behaviour).

A penetration tester simulates both phases. But the passive phase is where the strategic advantage is built — and it's the phase that most directly mirrors how real adversaries operate.

The Sources

Where attackers find your information.

The volume of information available about any organisation through public sources is staggering — and it grows every time someone posts a job advert, pushes code to GitHub, registers a certificate, or creates a LinkedIn profile. Here's what a skilled OSINT practitioner can extract, source by source.

Source	What It Reveals	Why Attackers Care
DNS records	Subdomains, mail servers, SPF/DKIM/DMARC configuration, hosting providers, cloud services, internal naming conventions. A full DNS enumeration can reveal dozens of systems the organisation doesn't realise are discoverable.	Subdomains often point to forgotten services — staging environments, legacy portals, test instances. MX records reveal the email platform. TXT records may leak internal project names or cloud account identifiers.
Certificate transparency logs	Publicly trusted CAs record the SSL/TLS certificates they issue in publicly searchable transparency logs. This reveals nearly every hostname the organisation has requested a public certificate for, including internal names, development systems, and pre-launch projects.	CT logs are one of the most powerful subdomain enumeration tools available. They reveal systems before they're live, systems after they've been "decommissioned" (the logs are append-only, so entries persist even after a certificate expires or is revoked), and naming patterns that predict future hostnames.
Shodan / Censys	Internet-wide scanning databases that index devices, services, and banners visible on the public internet. Search by organisation, IP range, technology, or vulnerability.	An attacker can find the internet-facing services you operate, including ones you've forgotten, without sending a single packet to your network. Shodan queries for your organisation name or IP range reveal VPN portals, web servers, IoT devices, and management interfaces.
LinkedIn	Employee names, job titles, reporting structures, team sizes, technology skills listed on profiles, and — crucially — job adverts that reveal the exact technology stack in use.	LinkedIn is the single richest source of targeting intelligence for social engineering. An attacker can identify the IT administrator by name, learn they use Azure AD and CrowdStrike, find three recent hires who haven't completed security training, and craft a phishing email referencing a real internal project.
GitHub / GitLab / Bitbucket	Public repositories, committed credentials, API keys, internal URLs, configuration files, infrastructure-as-code templates, CI/CD pipeline definitions, developer comments referencing internal systems.	Developers accidentally commit secrets to public repositories with alarming frequency. API keys, database connection strings, internal hostnames, and even passwords appear in code, configuration files, and commit history, sometimes in repositories that were only briefly public before being made private (by which time the commits may already have been cloned, forked, or archived elsewhere).
Breached credential databases	Email addresses and passwords from historical data breaches involving the organisation's domain. Sources include HaveIBeenPwned (legitimate), and various underground marketplaces (used by actual attackers).	If employees reuse passwords — and they do — breached credentials from a third-party service may work against the organisation's own systems. VPN portals, webmail, and cloud services are the primary targets for credential stuffing.
Google dorking	Advanced search operators that find indexed documents, exposed directories, error messages revealing technology versions, login portals, configuration files, and sensitive documents inadvertently made public.	Queries like `site:acme.co.uk filetype:pdf` or `intitle:"index of" site:acme.co.uk` reveal documents and directory listings that were never intended to be public. We regularly find internal policies, network diagrams, old presentations, and password lists through Google alone.
Social media	Office photos (showing screens, whiteboards, badge designs), check-ins revealing office locations, conference attendance, technology preferences, personal details useful for password guessing and security question answering.	A photo of a team celebration that includes a whiteboard with network diagrams. An Instagram post geotagged at the data centre. A tweet complaining about the new VPN client by name. Social media is a gold mine of incidental intelligence.
Company filings and public documents	Companies House filings, annual reports, supplier lists, partnership announcements, regulatory submissions, procurement notices, FOI responses.	Financial information reveals the organisation's value as a target. Supplier relationships reveal potential supply chain attack vectors. Procurement notices reveal technology purchases. FOI responses sometimes contain more technical detail than the organisation intended to disclose.
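
To make the certificate transparency row concrete, here is a minimal Python sketch of turning CT search results into a deduplicated hostname list. It assumes the results have already been exported as JSON in the shape crt.sh commonly returns (a list of objects with a newline-separated `name_value` field); that shape, the function name, and the sample data are our assumptions for illustration, and fetching the data is deliberately left out.

```python
import json


def hostnames_from_ct(json_text: str, domain: str) -> list[str]:
    """Return sorted, unique hostnames under `domain` from CT-search JSON.

    Assumes each entry has a `name_value` field holding one or more
    names separated by newlines (an assumption about the export format).
    """
    names = set()
    for entry in json.loads(json_text):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # treat a wildcard as its base name
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hypothetical sample data standing in for a real export.
sample = json.dumps([
    {"name_value": "dev-api.acmeconsulting.co.uk\nwww.acmeconsulting.co.uk"},
    {"name_value": "*.acmeconsulting.co.uk"},
    {"name_value": "unrelated.example.org"},
])
ct_hosts = hostnames_from_ct(sample, "acmeconsulting.co.uk")
print(ct_hosts)
```

Filtering on the domain suffix keeps unrelated names that share a certificate out of the result, which matters when a hosting provider bundles several customers onto one certificate.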

Worked Example

What we found in 90 minutes without touching a system.

To illustrate the power of passive reconnaissance, here's a composite of what we typically uncover during the OSINT phase of an engagement — all gathered before a single active probe is launched. The target is a fictional but representative mid-size UK professional services firm.

Passive Reconnaissance — 90 Minutes
# DNS Enumeration
amass --domain acmeconsulting.co.uk    # 47 subdomains discovered
  notable: staging.acmeconsulting.co.uk    # Staging environment: live, no WAF
  notable: vpn.acmeconsulting.co.uk    # Fortinet SSL VPN portal
  notable: owa.acmeconsulting.co.uk    # Outlook Web Access, on-prem Exchange
  notable: jira.acmeconsulting.co.uk    # Jira instance, public login page
  notable: legacy-portal.acmeconsulting.co.uk    # Old client portal, still resolving

# Certificate Transparency
crt.sh --domain acmeconsulting.co.uk    # 12 additional hostnames from CT logs
  notable: dev-api.acmeconsulting.co.uk    # Certificate issued 3 weeks ago; new project?

# Shodan
shodan --org 'Acme Consulting'    # 23 internet-facing services indexed
  notable: FortiOS 6.4.8 on vpn.acmeconsulting.co.uk    # Vulnerable to CVE-2024-21762
  notable: Exchange 2016 CU23 on owa.acmeconsulting.co.uk    # End of support Oct 2025

# LinkedIn
linkedin --company 'Acme Consulting'    # 210 employees profiled
  IT team: 4 staff. IT Director: James Morton.
  job_ad: 'Azure AD, Intune, CrowdStrike, Fortinet'    # Full stack revealed
  new_hire: 3 starters in last 30 days (finance, legal, HR)    # Phishing targets

# Breached Credentials
hibp --domain acmeconsulting.co.uk    # 14 accounts in breach databases
  notable: j.morton@acmeconsulting.co.uk    # IT Director; LinkedIn breach 2021

# GitHub
gitrob --org acme-consulting    # 3 public repos found
  notable: config.yml contains SMTP relay credentials    # Committed 8 months ago
  notable: .env file with staging DB connection string    # Includes password

# Google Dorking
google site:acmeconsulting.co.uk filetype:pdf    # 34 indexed PDFs
  notable: IT-Security-Policy-2023.pdf    # Detailed AV and firewall product names
  notable: Network-Diagram-Draft-v2.pdf    # Internal IP ranges and VLAN structure
Ninety minutes. No packets sent to the target. No alerts generated. And the attacker now knows: the VPN appliance model and version (vulnerable), the email platform (Exchange 2016, approaching end of support), the full technology stack (from a job advert), the IT Director's name and a breached password, database credentials committed to GitHub, internal IP ranges from an indexed PDF, three new starters who are prime phishing targets, and a forgotten staging environment with no WAF.

This isn't exceptional. This is normal. We find intelligence of this quality or better in the majority of engagements.

The 90-Minute Rule

If a pen tester can build a comprehensive attack plan in 90 minutes of passive OSINT, so can an attacker — and the attacker isn't constrained by a testing window, a scope document, or a code of ethics. Everything we find, they find. The difference is what happens next.

From Intelligence to Attack

How recon shapes every subsequent phase.

Reconnaissance isn't a standalone exercise — it directly determines the effectiveness of every phase that follows. The quality of the intelligence gathered in recon is the single largest predictor of whether an attack (or a pen test) succeeds.

Subsequent Phase	Without Good Recon	With Good Recon
Social engineering	Generic phishing email — "Dear user, please verify your account." Low click rate. Easily spotted by security-aware staff.	Targeted spear-phish to three new starters in finance, referencing their real manager's name, a genuine internal project, and the firm's actual email template. Click rate: significantly higher.
Exploitation	Blind scanning of the entire IP range looking for anything exploitable. Noisy, slow, and likely to trigger IDS alerts before finding anything useful.	Targeted exploitation of the specific Fortinet CVE identified through Shodan. One precise attack against a known-vulnerable service. Minimal noise, maximum impact.
Credential attacks	Dictionary-based brute-force against the login portal. Thousands of attempts. Account lockouts triggered. Detectable.	Credential stuffing using known-breached passwords for identified employees. Small number of highly targeted attempts. If the IT Director reused his LinkedIn password on the VPN, access is achieved in a single request.
Internal movement	Aimless exploration of the internal network looking for interesting systems. Slow. Generates anomalous traffic. Likely detected.	Targeted movement toward specific systems identified during recon — the finance file share, the client document repository, the domain controller at the IP address found in the indexed network diagram.
Reporting	"We found an open port on this IP." Decontextualised. Technical. No business narrative.	"Using publicly discoverable information, we identified a vulnerable VPN appliance, a breached credential for the IT Director, and database credentials in a public GitHub repository. Combined, these allowed us to access the internal network, escalate to Domain Admin, and reach the client document store — all beginning from information available to any internet user."

What Defenders Miss

The blind spots in your external posture.

Most organisations have some awareness of their external attack surface — they know their main website, their VPN portal, their email server. What they consistently underestimate is the shadow attack surface: the systems, data, and information that exist publicly but aren't on anyone's radar.

Forgotten Infrastructure

Staging servers spun up for a project two years ago. A marketing microsite on a subdomain nobody remembers creating. A legacy portal that was "decommissioned" in a meeting but never actually switched off. These systems are unpatched, unmonitored, and often contain credentials that work on production systems.
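
A first triage pass over a list of discovered hostnames can be as simple as the sketch below, which flags names whose first label suggests a non-production or legacy system. The keyword list and function name are illustrative assumptions rather than a fixed methodology, and a real review would still look at every hostname by hand.

```python
import re

# Assumed hints that a host is non-production or legacy; tune per organisation.
RISKY_TOKENS = {"staging", "dev", "test", "uat", "demo", "legacy", "old"}


def flag_forgotten(hostnames):
    """Return hostnames whose first DNS label contains a risky token."""
    flagged = []
    for host in hostnames:
        first_label = host.split(".")[0].lower()
        # Split on hyphens/underscores so "dev-api" and "legacy-portal" match.
        tokens = re.split(r"[-_]", first_label)
        if any(token in RISKY_TOKENS for token in tokens):
            flagged.append(host)
    return flagged


hosts = [
    "www.acmeconsulting.co.uk",
    "staging.acmeconsulting.co.uk",
    "legacy-portal.acmeconsulting.co.uk",
    "dev-api.acmeconsulting.co.uk",
    "vpn.acmeconsulting.co.uk",
]
flagged = flag_forgotten(hosts)
print(flagged)
```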

Developer Leakage

Credentials in GitHub commits. API keys in public repositories. Internal URLs in JavaScript source maps. .env files in exposed directory listings. Developers operate at pace, and security hygiene in code repositories is consistently one of the weakest areas we encounter.
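
The kind of pattern matching that secret scanners perform can be sketched in a few lines of Python. This is a deliberately small illustration, not a substitute for a dedicated tool: the three patterns are assumptions chosen for the example, and the sample key is the placeholder value used in AWS documentation, not a real credential.

```python
import re

# Illustrative patterns only; real scanners ship hundreds, plus entropy checks.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "password_assignment": re.compile(
        r"(?i)(?:password|passwd|pwd)\w*\s*[:=]\s*\S+"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Hypothetical .env content of the sort described above.
sample_env = (
    "DB_HOST=staging-db.internal\n"
    "DB_PASSWORD=hunter2\n"
    "AWS_KEY=AKIAIOSFODNN7EXAMPLE\n"
)
findings = scan_text(sample_env)
print(findings)
```

Run against every commit rather than just the current tree, the same idea catches secrets that were "deleted" in a later commit but remain in history.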

Recruitment Intelligence

Job adverts are threat intelligence in reverse. "Experience with FortiGate, Azure AD, CrowdStrike Falcon, and Dynamics 365 required" tells an attacker exactly which products to research exploits for, which management consoles to look for, and which default configurations to target.
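
The same extraction an attacker performs by eye can be turned into a pre-publication check. The sketch below scans advert text for named products; the product list is a hypothetical watchlist you would maintain yourself, and the function name is ours.

```python
import re

# Hypothetical watchlist of product names you would rather not advertise.
KNOWN_PRODUCTS = [
    "FortiGate",
    "Azure AD",
    "CrowdStrike Falcon",
    "Dynamics 365",
    "Intune",
]


def products_in_advert(text, products=KNOWN_PRODUCTS):
    """Return the watchlisted product names that appear in the advert text."""
    found = []
    for product in products:
        # Lookarounds stop "Intune" matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(product) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(product)
    return found


advert = (
    "Experience with FortiGate, Azure AD, CrowdStrike Falcon, "
    "and Dynamics 365 required"
)
disclosed = products_in_advert(advert)
print(disclosed)
```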

People as Intelligence

LinkedIn profiles reveal org charts, reporting lines, team sizes, and individual technology skills. Combined with breached credential databases, an attacker can identify specific individuals with privileged access, determine whether their credentials have been exposed, and craft social engineering attacks tailored to their role and recent activity.

Accidental Publication

Documents that were never intended to be public but were indexed by Google: internal policies naming specific security products, network diagrams with IP ranges, meeting minutes discussing infrastructure projects, presentations containing screenshots of internal systems. Once indexed, deleting the file doesn't immediately remove it from search results, and copies may persist in web archives.
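
Checking for this kind of exposure is easy to make routine. The sketch below generates a small set of search queries for a domain using standard search operators (`site:`, `filetype:`, `intitle:`, `inurl:`); the template list is an illustrative starting point we have assumed, not an exhaustive set, and the queries are meant to be run by hand against your own domain.

```python
# Illustrative query templates; extend with file types relevant to you.
DORK_TEMPLATES = [
    'site:{domain} filetype:pdf',
    'intitle:"index of" site:{domain}',
    'site:{domain} filetype:xlsx',
    'site:{domain} inurl:login',
]


def build_dorks(domain: str) -> list[str]:
    """Fill each template with the given domain."""
    return [template.format(domain=domain) for template in DORK_TEMPLATES]


dorks = build_dorks("acmeconsulting.co.uk")
for query in dorks:
    print(query)
```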

Email Configuration

Missing or misconfigured SPF, DKIM, and DMARC records don't just enable email spoofing — they tell an attacker that the organisation may not have mature email security controls, making phishing more likely to succeed. A permissive DMARC policy (p=none) is a signal that spoofed emails won't be rejected.
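
Reading a DMARC policy the way an attacker would takes very little code. The sketch below parses the text of a DMARC TXT record (assumed to have been retrieved already, for example from `_dmarc.<domain>`) and summarises the enforcement level. The function names and posture labels are ours, and the parsing is simplified relative to the full specification.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC record into a tag -> value dictionary."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_posture(record: str) -> str:
    """Summarise how strictly a domain asks receivers to treat spoofed mail."""
    tags = parse_dmarc(record)
    if tags.get("v", "").upper() != "DMARC1":
        return "no valid DMARC record"
    policy = tags.get("p", "").lower()
    if policy == "reject":
        return "enforced (reject)"
    if policy == "quarantine":
        return "partially enforced (quarantine)"
    if policy == "none":
        return "monitoring only (none)"
    return "invalid policy"


# Hypothetical record for the fictional firm.
posture = dmarc_posture("v=DMARC1; p=none; rua=mailto:dmarc@acmeconsulting.co.uk")
print(posture)
```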

Why Scanners Aren't Enough

The limits of automated discovery.

Vulnerability scanners and automated attack surface monitoring tools are valuable, but they operate in the active reconnaissance domain — they probe systems that are already known. They don't replicate the creative, cross-referencing intelligence work that characterises skilled passive OSINT.

A Scanner Can...	A Scanner Cannot...
Enumerate open ports and services on known IP ranges	Discover IP ranges you didn't know you owned — or that were assigned to you by a third party
Identify known CVEs in detected service banners	Find credentials committed to a public GitHub repository eight months ago
Check SSL/TLS configuration against best-practice standards	Read a job advert and determine that you use FortiGate, Azure AD, and CrowdStrike
Detect missing HTTP security headers	Identify that your IT Director's password was exposed in the LinkedIn breach and may still be in use
Crawl a web application for common vulnerability patterns	Find an internal network diagram indexed by Google from a PDF that was accidentally placed in a public directory
Monitor for new subdomains and certificate issuance	Cross-reference a new hire announcement on LinkedIn with a phishing scenario targeting someone who hasn't yet completed security awareness training

Automated tools find what's technically exposed. Human OSINT finds what's strategically exposed — the intelligence that transforms a scattered collection of technical observations into a coherent attack plan. Both are necessary. Neither is sufficient alone.

For Defenders

Managing your OSINT exposure.

You can't eliminate your OSINT footprint — your organisation exists publicly and some exposure is inherent to doing business. But you can manage it: reducing unnecessary exposure, monitoring for leaked intelligence, and ensuring that the information available to attackers doesn't give them an easy path in.

Action	What It Addresses	How to Do It
External attack surface monitoring	Forgotten subdomains, expired certificates, exposed services, new assets appearing without authorisation	Deploy continuous attack surface monitoring (Shodan Monitor, Censys, or commercial ASM tools). Review weekly. Decommission anything that shouldn't be exposed.
Credential monitoring	Employee credentials appearing in breach databases, paste sites, or underground marketplaces	Subscribe to HaveIBeenPwned domain monitoring. Implement breached password detection in Azure AD / on-premises AD. Force password resets for any exposed accounts.
Code repository auditing	Credentials, API keys, internal URLs, and configuration files in public repositories	Run automated secret scanning on all repositories (GitHub Secret Scanning, GitLeaks, TruffleHog). Implement pre-commit hooks that block secrets from being committed. Rotate any credentials found retroactively.
Job advert review	Technology stack disclosure in recruitment materials	Review job adverts before publication. Remove specific product names where possible — "experience with enterprise firewall management" rather than "FortiGate 600E experience required." The difference seems trivial; to an attacker, it's enormous.
Document hygiene	Internal documents indexed by search engines, metadata in published PDFs revealing author names, internal paths, and software versions	Audit publicly accessible directories. Strip metadata from published documents (ExifTool). Put non-public areas behind authentication and use noindex directives; don't rely on robots.txt alone, since it is itself public and lists the paths you want hidden. Check Google for indexed sensitive documents regularly.
Email authentication	Spoofability of your domain, DMARC policy enforcement, SPF alignment	Implement SPF, DKIM, and DMARC with a policy of `p=reject`. Monitor DMARC reports for abuse. This doesn't just prevent spoofing — it signals to attackers that your email security is mature, making phishing less attractive as an entry vector.
Commission an OSINT assessment	Understanding your full external exposure from an attacker's perspective	As a standalone service or as the first phase of a pen test engagement. The output is a comprehensive picture of what an attacker can learn about your organisation — and specific recommendations for reducing that exposure.
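
For the credential monitoring row, breached password detection is often built on a k-anonymity lookup, where only the first five characters of a password's SHA-1 hash leave your network. The sketch below shows the local half of that exchange in the style of the Pwned Passwords range API: computing the prefix and suffix, then matching the suffix against a response body of `SUFFIX:COUNT` lines. The network call is omitted, and the response body here is fabricated for illustration.

```python
import hashlib


def sha1_prefix_suffix(password: str) -> tuple[str, str]:
    """Return the 5-character prefix and 35-character suffix of the SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(range_body: str, suffix: str) -> int:
    """Look up `suffix` in a body of 'SUFFIX:COUNT' lines; 0 if absent."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, suffix = sha1_prefix_suffix("password")
# Only `prefix` would be sent to the service; the body below is made up.
fake_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n{suffix}:3"
print(prefix, count_in_range(fake_body, suffix))
```

Because the full hash never leaves the client, the check can run at password-change time without disclosing the candidate password to a third party.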

In the Engagement

How recon fits into our testing process.

Reconnaissance isn't a separate deliverable we bolt onto an engagement — it's integrated into the first phase of every external and social engineering assessment we conduct. Here's how it fits into the workflow.

Phase	Duration	What Happens
Passive OSINT	Day 1 (typically 4–6 hours)	Full passive reconnaissance: DNS, CT logs, Shodan, LinkedIn, GitHub, Google dorking, breached credentials, social media, public documents. All findings documented. Attack plan drafted.
Active enumeration	Day 1–2	Port scanning, service fingerprinting, web application discovery, directory enumeration. Active probing of targets identified during passive recon. Findings cross-referenced with OSINT intelligence.
Analysis and planning	Day 2 (1–2 hours)	All intelligence consolidated. Attack paths mapped. Priorities established: which entry vectors are most likely to succeed? Which paths lead to the crown jewels? Scenario refined based on what we've found.
Exploitation	Day 2 onwards	Targeted exploitation of the most promising paths identified during recon. Every attack is informed by intelligence — not guesswork. Findings reported in the context of the intelligence that led to them.

The recon phase typically consumes 15–20% of the total engagement time. It's the highest-leverage time in the entire test — the intelligence gathered here determines the efficiency and effectiveness of everything that follows. Skipping it, or reducing it to a quick Nmap scan, is like navigating without a map.

The Deliverable

What you receive from the OSINT phase.

The OSINT findings don't just inform our testing — they're reported to you as a standalone section of the engagement report. This is some of the most immediately actionable intelligence in the entire deliverable, because many of the exposures can be remediated without waiting for the full pen test to complete.

Attack Surface Map

A complete inventory of your externally discoverable assets — subdomains, services, cloud instances, exposed management interfaces. Compared against your known asset inventory to identify shadow IT and forgotten infrastructure.
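
The comparison itself is a set difference. As a minimal sketch, assuming both lists are plain hostnames (the function name and sample data are ours):

```python
def shadow_assets(discovered, inventory):
    """Return discovered hostnames that are missing from the known inventory."""

    def normalise(host):
        # Ignore case, surrounding whitespace, and a trailing DNS root dot.
        return host.strip().lower().rstrip(".")

    known = {normalise(host) for host in inventory}
    return sorted({normalise(host) for host in discovered} - known)


# Hypothetical inputs: OSINT results versus the client's asset register.
discovered = [
    "www.acmeconsulting.co.uk",
    "VPN.acmeconsulting.co.uk",
    "staging.acmeconsulting.co.uk",
    "legacy-portal.acmeconsulting.co.uk.",
]
inventory = ["www.acmeconsulting.co.uk", "vpn.acmeconsulting.co.uk"]
unknown = shadow_assets(discovered, inventory)
print(unknown)
```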

Credential Exposure Report

Every employee email address found in breached credential databases, with the breach source and date. Specific accounts where password reuse is likely. Recommendations for immediate forced resets and MFA enforcement.
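
One way to order such a report is to cross-reference the staff list with the breached addresses and put privileged roles first. The sketch below is illustrative only: the title hints are an assumption about what "privileged" looks like in a job title, and every record other than the fictional IT Director from the worked example is a made-up placeholder.

```python
# Assumed substrings that suggest privileged access; adjust per organisation.
PRIVILEGED_HINTS = ("it director", "administrator", "sysadmin", "infrastructure")


def is_privileged(title: str) -> bool:
    return any(hint in title.lower() for hint in PRIVILEGED_HINTS)


def prioritise_exposed(staff, breached_emails):
    """Return staff whose email appears in breach data, privileged roles first."""
    breached = {email.lower() for email in breached_emails}
    exposed = [person for person in staff if person["email"].lower() in breached]
    return sorted(
        exposed,
        key=lambda person: (not is_privileged(person["title"]), person["email"]),
    )


staff = [
    {"name": "B. Sample", "title": "Finance Analyst",
     "email": "b.sample@acmeconsulting.co.uk"},
    {"name": "James Morton", "title": "IT Director",
     "email": "j.morton@acmeconsulting.co.uk"},
    {"name": "A. Example", "title": "Paralegal",
     "email": "a.example@acmeconsulting.co.uk"},
]
breached = ["J.Morton@acmeconsulting.co.uk", "b.sample@acmeconsulting.co.uk"]
ranked = prioritise_exposed(staff, breached)
print([person["name"] for person in ranked])
```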

Code Repository Findings

Any secrets, credentials, or sensitive information found in public code repositories — with exact file paths, commit hashes, and remediation steps (rotate credentials, make repo private, purge Git history).

Social Engineering Intelligence

A profile of what an attacker would know about your people: key individuals, new starters, technology skills, organisational structure. An assessment of how this intelligence could be used to craft targeted phishing or vishing campaigns.

Information Leakage Inventory

Every document, directory listing, and indexed page that reveals sensitive information — network diagrams, policies, internal presentations, metadata from published files. Specific URLs and remediation actions for each.

Summary

The bottom line.

Reconnaissance is the phase that determines everything. Real attackers spend more time gathering intelligence than they do exploiting vulnerabilities — because good intelligence makes exploitation faster, quieter, and more effective.

A penetration test that skips or rushes the reconnaissance phase is testing blind. It will find vulnerabilities — scanners always do — but it won't find the attack paths: the specific chains of intelligence and weakness that a real adversary would follow from public information to your crown jewels.

The most important thing you can do today is see yourself the way an attacker sees you. The information is already out there. The only question is whether you find it first.

See What They See

Discover your OSINT exposure before an attacker does.

Our OSINT assessments reveal exactly what an adversary can learn about your organisation from public sources — and provide specific, prioritised actions to reduce that exposure.
