Open-Source Intelligence

Data is money and it is everywhere

Open source intelligence (OSINT) is actionable information that has been gathered from freely and publicly available sources. The type of information that can be considered OSINT is not something that an organization or other entity can reasonably expect to keep private. Anyone, regardless of affiliation or authorization, can obtain this information without running afoul of any laws or regulations. This makes OSINT valuable to the preliminary phases of a pen test, where discretion is desired. After all, the pen test process is meant to mirror that of the real-world attack process; skilled attackers will attempt to gather as much information as they can while taking as few risks as necessary.

OSINT gathering means using any available tool to collect and analyze publicly accessible data. Sources can be divided into six categories:

Media
newspapers, magazines, television, radio
Internet
blogs, user created content, social media websites, and message boards
Public Government Data
court documents, land deeds, census, press conferences, websites, and speeches
Professional and Academic Publications
journals, academic papers, symposia, and dissertations
Commercial Data
satellite imagery, financial and industrial assessments
Grey Literature
business documents, newsletters, technical reports, and patents

There are many potential sources of OSINT, and most are connected to the Internet. Some examples include:

Registration information from Whois databases.
The target organization’s public website.
Any additional websites that may be related to the target organization.
The social media profiles of a target organization.
The social media profiles of individuals associated with the target organization.
Job postings on job boards.
Google search results.
Online blogs, news articles, etc.
Information gathered from querying public DNS servers.
Mail server records gathered from public DNS servers.
Information gathered from website SSL/TLS certificates.

Record as much information about your target as you can.

Use Google Earth and Google Maps for physical recon.

Start a technical map of the target’s systems, technologies, and methodologies
e.g., from Facebook, Twitter, LinkedIn, Google+, Instagram

LinkedIn gives real names to Twitter accounts!
Search social networks, public sites, and visit the company websites. See if they leak information on what systems they are using. Search job boards for tech stacks.

Look for current projects, trips to conferences, phone numbers, and email addresses. Craft phishing and impersonation attacks.

Search for company contracts with the US Government or other public entities.

Run whois and get

Owner name
Street addresses
Email addresses
Technical contacts

Double check target’s sites

Products
Services
Technologies
Company culture
Board members, profiles
Current/future projects/products
Suppliers
Job Vacancies, etc

Look at target’s email pattern and record points of contact

name.surname@company.com
surname.name@company.com
[first letter of name]surname@company.com
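
To test these patterns against names you have collected, a small helper can generate the candidate addresses. This is a minimal sketch covering only the three formats listed above; the function name and the sample person are made up for illustration.

```python
def candidate_emails(first: str, last: str, domain: str) -> list[str]:
    """Return candidate addresses for one person in the three formats listed above."""
    first = first.strip().lower()
    last = last.strip().lower()
    return [
        f"{first}.{last}@{domain}",    # name.surname
        f"{last}.{first}@{domain}",    # surname.name
        f"{first[0]}{last}@{domain}",  # first letter of name + surname
    ]


if __name__ == "__main__":
    # Hypothetical employee name, for illustration only.
    for address in candidate_emails("Jane", "Doe", "company.com"):
        print(address)
```

Once one real address confirms which format the organization uses, you can drop the other two candidates for everyone else on your list.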

Send fake advertisement email for testing email formats

Google dork for leaked documents

Additional Research

Although the sources listed previously and discussed in further detail in this topic are of great value to OSINT gathering, you may also find it useful to research public information using various industry standards. For example, the following industry-recognized threat and vulnerability intelligence sources are maintained by the MITRE Corporation, which receives funding from the U.S. Department of Homeland Security:

Common Vulnerabilities and Exposures (CVE), a dictionary of vulnerabilities.
Common Weakness Enumeration (CWE), a catalog of common software and hardware weakness types that can lead to vulnerabilities.
Common Attack Pattern Enumeration and Classification (CAPEC), a database that classifies specific attack patterns.

Another potential source of research is one or more prominent computer emergency response teams (CERTs), such as the CERT Coordination Center (CERT/CC), United States Computer Emergency Readiness Team (US-CERT), and the Japan Computer Emergency Response Team Coordination Center (JPCERT/CC). These CERTs often issue public security advisories that contain useful information on a wide range of vulnerabilities.

Standards organizations like the International Organization for Standardization (ISO) and the National Institute of Standards and Technology (NIST) can also be valuable sources of public information. For example, NIST, a U.S. government agency, publishes many documents that detail known security issues and guidance to organizations for how to mitigate them.

Finally, you should be on the lookout for instances of full disclosure. Full disclosure is the process of publishing an analysis of vulnerabilities without restrictions as to who can access this analysis. The intent is to ensure that as many users and organizations as possible are aware of the vulnerabilities so that they can take action to protect themselves. However, the side effect of full disclosure is that attackers are also privy to this information and can act on it. As a pen tester, an instance of full disclosure might provide you with valuable insight into a vulnerable piece of technology used by the target organization.

Whois

Whois is a protocol that supports querying of data related to entities that register public domains and other Internet resources. Information about such entities is available to anyone who queries databases using Whois. A Whois query can be executed using a command-line utility, but there are also web apps available that enable users to run queries. A typical query will be conducted on a public domain like comptia.org in order to reveal information about that domain, and in turn, the organization that owns it.

Whois queries can retrieve information such as:

The name of the domain’s registrant.
The name of the registrant organization.
The mailing address of the registrant.
The phone number of the registrant.
The email address of the registrant.
The same details for the domain’s administrative and technical contacts.
Identifying information about the domain’s registrar.
The status of the domain, including client and server codes that concern renewal, deletion, transfer, and related information.
The name servers the domain uses.

Whois queries are a great tool for OSINT because they can tell you a lot about the target organization and how its domain is configured. You can use this information to take more targeted actions against the domain’s contacts, as well as the underlying architecture of the domain.
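
The protocol itself is simple enough to sketch with a plain socket: the client connects to TCP port 43, sends the query followed by a line break, and reads until the server closes the connection. The code below is a rough illustration, not a replacement for the whois utility. The function names are made up, the sample response is invented, and the "Key: value" layout that the parser expects is an assumption that holds for many registries but not all of them.

```python
import socket


def whois_query(domain: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one Whois query: the domain plus CRLF, then read until the server closes."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines; keys can repeat (e.g. several name servers)."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines, and footer lines that many servers add.
        if not line or line.startswith(("%", "#", ">>>")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip(), []).append(value)
    return fields


# Invented response text, for illustration only.
SAMPLE = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
"""

if __name__ == "__main__":
    for key, values in parse_whois(SAMPLE).items():
        print(key, "->", values)
```

Against a live server you would pass the output of whois_query() to parse_whois(). Which server to ask depends on the top-level domain, so treat the server argument as something you look up for each target.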

Whois and Privacy Issues

As you might expect, attackers, especially spammers, use Whois data to target their operations. Likewise, Whois data raises issues of privacy, as queried data can reveal personally identifiable information (PII), not to mention information about the organization that an attacker can leverage. The rise of data privacy regulations like the General Data Protection Regulation (GDPR) has led to increased scrutiny of the Whois protocol. The Internet Corporation for Assigned Names and Numbers (ICANN) has stated that they aim to “reinvent” Whois to be more in line with recent privacy concerns. This may mean that data that was once publicly available through Whois no longer will be; however, the exact details of the proposed changes are not known at this time.

Target Organization’s Website

The organization’s public website, usually used for marketing purposes, is a potential resource for OSINT. Most sites have an “About” page that can reveal more about the purpose, goals, and nature of an organization. Even if no overt “About” page exists, most public marketing sites still use other methods to inform the reader about the organization’s products and services. In doing so, the organization may also reveal key information that could support your pen test.

Marketing websites commonly provide the following information that may be of use to a pen tester:

A list of C-suite, upper-management, or other high-profile personnel in the organization.
Upcoming events hosted by or attended by the organization.
Forms to fill out to receive more information on products and services the organization offers.
User forums and other community-driven content.
Additional contact information beyond what you’d find in a Whois query.
Links to the organization’s social media profiles.

An organization’s main public website will not necessarily be a standard marketing site. For example, Amazon’s most public-facing domain is an online storefront. Government organizations and educational institutions may host purely informational sites. What you’ll glean from an organization’s public site depends on what organization you’re targeting, and you should never expect to learn everything possible from this one site alone.

An organization’s primary website for public consumption is not the only website that might help you gather background information about the organization. The following are other potential sites that might reveal actionable information:

Secondary sites, like those meant for use by employees or specific customers in a business-to-business sales scenario.
Subdomains of primary sites that aren’t directly linked or easily visible from the primary site, like administrative portals.
Websites owned and/or operated by partner organizations, like a supplier that a retail vendor often contracts with.
Websites of the target organization’s subsidiaries; or, conversely, the target’s parent organization.
Social media profiles that are used as another (or perhaps, primary) marketing outlet for the organization.

While a related website might not provide you with the same level of OSINT as the primary site, it may still provide you with extra details that you wouldn’t otherwise have obtained. A partner site might reveal more about the partner’s relationship with the target organization, possibly enough for you to attempt to use the partner as a vector (assuming this is within scope). For example, the Target breach of 2013 was made possible because the attacker(s) stole network credentials from the retailer’s third-party HVAC provider.

Social Media

Most organizations that provide products and services to the public—and even those that don’t—have at least some presence on social media. These social media profiles are primarily used as a marketing channel to reach certain audiences that may not be exposed to traditional marketing. In fact, many potential customers may never even see the organization’s primary website, so an organization may put a lot of effort into their social media presence. Therefore, you may be able to gather a great deal of information concerning how the target does business.

Beyond the organization’s profile, social media is also a rich resource for extracting data about individuals. Everyone from the C-suite to rank-and-file employees may have a presence on a number of social media sites. These individual profiles are often linked from the company’s main profile, making it easier to perform reconnaissance on an organization’s personnel structure. Likewise, an individual may have more than one profile in order to separate their professional life from their personal life. In either case, individual profiles may reveal much about an employee’s interests, habits, behavior, relationships, and other PII.

Examples of common social media sites that may provide actionable intelligence include:

Twitter, which is used by many organizations to promote their products and services in short statements, as well as to provide casual customer service and to bolster brand loyalty and recognition.
Facebook, which is used for more in-depth marketing and may be more likely to include images, videos, and event scheduling.
LinkedIn, which is used primarily for networking opportunities and job searching.
YouTube, which is used to publish videos that market an organization’s products, services, and/or brand.
Instagram, which is used to publish images that market an organization’s products, services, and/or brand.
Reddit, which is often used to target marketing efforts toward specific communities.

Job Boards

Organizations looking to hire will often post on public job boards. These job postings may reveal information about the organization’s personnel structure, technical environments, networking architecture, and other computing infrastructure. This is because the employer needs to both entice prospective employees and give them enough information to determine whether or not they should apply. The amount and type of information on these job postings is highly dependent on the organization’s industry and the actual job they are hiring for. A network administrator position for a tech company might include more information about the technical side of the organization’s operations than a sales associate position at a retail business.

Some information you might be able to glean from job boards includes:

The personnel makeup of specific departments and teams.
The lack of qualified personnel in crucial positions.
The level of technical sophistication that the organization has.
The software architecture of the organization’s technical services, like web server technology and cloud technologies.
The language that in-house software is programmed in.
The types and quantities of hardware that the organization employs.
The network and security systems that the organization employs.
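
If you save the text of several postings, a quick keyword count can show which technologies come up repeatedly. The sketch below assumes you have already copied the posting text into strings. The keyword list and the function name are placeholders to adapt, not a standard vocabulary.

```python
import re
from collections import Counter

# Placeholder watch list; extend it with whatever matters for the engagement.
TECH_KEYWORDS = ("Apache", "nginx", "IIS", "AWS", "Azure", "Python", "Java", "Cisco", "Palo Alto")


def tech_mentions(postings, keywords=TECH_KEYWORDS) -> Counter:
    """Count how many postings mention each keyword (case-insensitive, whole words)."""
    counts: Counter = Counter()
    for text in postings:
        for keyword in keywords:
            pattern = rf"\b{re.escape(keyword)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[keyword] += 1
    return counts


if __name__ == "__main__":
    # Invented posting excerpts, for illustration only.
    sample_postings = [
        "Network admin: Cisco switches, Palo Alto firewalls, Python scripting.",
        "Web developer: JavaScript, nginx on AWS, Python.",
    ]
    for keyword, hits in tech_mentions(sample_postings).most_common():
        print(keyword, hits)
```

Whole-word matching keeps "Java" from matching "JavaScript", but short or ambiguous keywords will still need a manual check.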

Some common public job boards include:

CareerBuilder
Monster
ZipRecruiter
Indeed
Glassdoor
LinkedIn

Google Hacking aka Dorking

Google hacking is the process of using the Google search engine to identify potential security weaknesses in publicly available sources, like an organization’s website. Although not necessarily “hacking” in a direct sense, Google’s search engine enables you to extract more information than you would be able to from a typical, everyday search.

Google hacking queries almost always include a special search operator in order to cut down on irrelevant results and focus on very specific types of desired information. The following table lists some common special search operators that are often used in Google hacking.

Operator	Searches	Example
site	A specific site.	site:technoherder.com report to search Techno Herder’s website only for results including the text “report”.
link	For pages that link to the specified page.	link:technoherder.com report to search for any pages that link to Techno Herder’s website and have the text “report” anywhere on the page.
filetype	For specific file types.	filetype:pdf report to search for PDFs including the text “report”.
intitle	For page titles.	intitle:Certification report to search for any pages whose titles include the text “Certification” and have the text “report” anywhere on the page.
inurl	For URLs.	inurl:Certification report to search for any pages whose URLs include the text “Certification” and have the text “report” anywhere on the page.
inanchor	For anchor text.	inanchor:Certification report to search for any pages whose anchor text includes the text “Certification” and have the text “report” anywhere on the page.
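
Operators can be combined in a single query. As a small sketch, the helper below joins operator and value pairs with free-text terms and produces a search URL; the function names are made up, and the URL format is simply the ordinary Google search address with the query in its q parameter.

```python
from urllib.parse import urlencode


def build_dork(terms: str = "", **operators: str) -> str:
    """Join operator:value pairs and free-text terms into one query string."""
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query: str) -> str:
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_dork("report", site="technoherder.com", filetype="pdf")
    print(query)
    print(search_url(query))
```

This example narrows the search to PDFs on one site that contain the text “report”, which is a common starting point when looking for leaked documents.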

Online Articles and News

Online articles and other news items can provide insight into the business operations of a target organization. Larger businesses will often be featured by mainstream media outlets. For example, an online news service may report on major new services offered by a business, whereas publications that provide financial news may focus on a company’s fiscal performance. News outlets may also report on an organization’s impropriety or other negative traits that surround its business.

Articles are not just reserved for larger businesses; even smaller businesses issue press releases for public consumption. These press releases may be published on the company’s own website, but often they are published by one or more sites that specialize in publishing press releases. These articles are usually written in marketing speak, but they can still reveal how a business may be changing and how this change affects day-to-day operations. For example, an organization might issue a press release detailing their acquisition of another company and what this means for the parent company’s people, products, and technology.

You should essentially treat online news and articles as another resource that won’t necessarily reveal something significant on its own but, used in conjunction with other OSINT, may help you construct an accurate account of the target organization.

DNS Querying

Querying DNS servers for name resolution information can enable you to view more about the structure of an organization’s network. Standard queries will simply use DNS servers to identify the IP address behind a particular domain or resource name. This IP address might be useful as an entry point into the network, or possibly as a vector for performing more reconnaissance.

Advanced queries can retrieve more information than just an IP address. You can identify the individual DNS records for a particular domain, like MX records, NS records, TXT records, and more. These records can reveal additional targets that you may not have enumerated using other OSINT methods. For example, you may be able to identify that the organization is using specific services, like VoIP, by enumerating an SRV record.

There are several tools that can help you perform DNS querying, including several web apps. One common command-line tool is nslookup, which you can use to query a domain and specify the record types that you’re looking for. The tool dig has similar functionality and is more widely used on Linux systems, and can perform reverse lookups to match an IP address to a domain name.
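
Basic forward and reverse lookups can also be scripted. The following sketch uses only the Python standard library, which goes through the system resolver and cannot request specific record types such as MX, TXT, or SRV; for those you would still use nslookup, dig, or a dedicated DNS library. The function names are made up for illustration.

```python
import ipaddress
import socket


def forward_lookup(hostname: str) -> list[str]:
    """Resolve a host name to its unique IP addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def reverse_name(ip: str) -> str:
    """Return the name that a reverse (PTR) lookup for this address queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> str:
    """Ask the system resolver which host name an IP address maps back to."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; no network traffic is sent here.
    print(reverse_name("192.0.2.10"))
```

A reverse lookup is just an ordinary query for a specially formed name, which is why reverse_name() can show what will be asked without contacting any server.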

Aside from identifying DNS records, you may also be able to use DNS querying to initiate a zone transfer. In a properly configured environment, a DNS server’s zone data is transferred only to other authorized DNS servers for the same domain, for redundancy. However, improperly configured servers may hand this data to any host that asks, including yours. A successful transfer not only reveals the zone’s DNS records but can also show which hosts are directly accessible from the Internet.
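
At the protocol level, a zone transfer request is an ordinary DNS question with the record type AXFR (type code 252), sent over TCP. The sketch below builds that request by hand and reads only the response code from the first reply, which is enough to tell a refusal from a server that is willing to answer. It does not parse the transferred records; in practice dig or a DNS library does that for you. All function names are made up for illustration.

```python
import socket
import struct

QTYPE_AXFR = 252  # record type code for a full zone transfer
QCLASS_IN = 1     # Internet class


def encode_name(domain: str) -> bytes:
    """Encode a domain as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_axfr_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build an AXFR question with the 2-byte length prefix used for DNS over TCP."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)  # one question, no flags
    question = encode_name(domain) + struct.pack("!HH", QTYPE_AXFR, QCLASS_IN)
    message = header + question
    return struct.pack("!H", len(message)) + message


def response_code(tcp_reply: bytes) -> int:
    """Return the RCODE from a TCP reply (0 = no error, 5 = refused)."""
    if len(tcp_reply) < 6:
        raise ValueError("reply too short to contain a DNS header")
    flags = struct.unpack("!H", tcp_reply[4:6])[0]  # skip length prefix and ID
    return flags & 0x000F


def try_zone_transfer(domain: str, name_server: str, timeout: float = 10.0) -> int:
    """Send the request to one name server and return the RCODE of its first reply."""
    with socket.create_connection((name_server, 53), timeout=timeout) as conn:
        conn.sendall(build_axfr_query(domain))
        return response_code(conn.recv(4096))
```

You would call try_zone_transfer() once for each name server found in the domain’s NS records, since a single misconfigured secondary is enough to leak the zone.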

One of the most useful elements of contact information you can gather is an email address. Email is the main point of internal and external contact for many individuals in many organizations. It is also a common vector for soliciting information about people and organizations, as well as more intrusive social engineering attacks. Email addresses are also commonly used in place of user names in systems that manage user accounts. This can make it easier for you to focus your online password cracking attacks or other techniques for gaining unauthorized access.

In addition to email addresses themselves, you should also consider enumerating email-based DNS records. An MX record will tell you which server handles mail sent to that domain. If you can successfully compromise mail servers, you can effectively compromise the lines of communication within the domain. Another DNS record of note is the Sender Policy Framework (SPF) record, which is published as a TXT record. SPF lets a receiving mail server check that mail claiming to come from a domain was sent from an IP address that the domain has authorized. This is an effort to mitigate email spoofing used in spam, phishing, and other email-based attacks. For the pen test, identifying the presence of an SPF record may encourage you not to waste your time on spoofing messages; alternatively, you might focus your efforts on targeting the host with the trusted IP address identified in the record.
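
Once you have the text of an SPF record, pulling out the trusted address ranges is straightforward. The sketch below handles only the most common parts of a record (ip4, ip6, include, and the closing all policy) and ignores the rest. The function name and the sample record are invented for illustration.

```python
def parse_spf(record: str) -> dict[str, list[str]]:
    """Split an SPF record into its address ranges, includes, and 'all' policy."""
    result: dict[str, list[str]] = {"ip4": [], "ip6": [], "include": [], "all": []}
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    for term in terms[1:]:
        # A leading +, -, ~ or ? is the qualifier; no qualifier means "+" (pass).
        qualifier = term[0] if term[0] in "+-~?" else "+"
        mechanism = term.lstrip("+-~?")
        name, _, value = mechanism.partition(":")
        name = name.lower()
        if name == "all":
            result["all"].append(qualifier + "all")
        elif name in ("ip4", "ip6", "include"):
            result[name].append(value)
    return result


if __name__ == "__main__":
    # Invented record using documentation address ranges.
    sample = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 include:_spf.example.net -all"
    print(parse_spf(sample))
```

The ip4 and ip6 entries are the hosts the domain trusts to send its mail, and each include points at another domain whose SPF record is worth retrieving as well.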

SSL/TLS Certificates

Digital certificates used in SSL/TLS communications are another public resource that can inform your pen test actions. One of the most useful fields in a digital certificate from a reconnaissance perspective is the subject alternative name (SAN). SANs usually identify specific subdomains that the certificate applies to, but can also identify other domains, IP addresses, and email addresses. Organizations use SANs in their certificates so that they don’t need to purchase and use different certificates for each individual resource. The resources identified in a SAN may reveal new targets for you to focus on. Note that some certificates simply use a wildcard (*) character to denote that all subdomains of the parent domain are covered by the certificate. In this case, you might not be able to identify any specific resources.
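
Reading the SAN field does not require special tooling. As a minimal sketch, Python’s standard ssl module can complete a handshake and return the validated certificate as a dictionary, from which the SAN entries can be filtered. The function names and the sample certificate data are made up for illustration.

```python
import socket
import ssl


def fetch_certificate(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Complete a TLS handshake and return the server certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def san_entries(cert: dict, kind: str = "DNS") -> list[str]:
    """Return the SAN values of one kind, such as 'DNS' or 'IP Address'."""
    return [value for entry_kind, value in cert.get("subjectAltName", ()) if entry_kind == kind]


def specific_hosts(cert: dict) -> list[str]:
    """Return DNS SAN entries that name a concrete host rather than a wildcard."""
    return [name for name in san_entries(cert) if not name.startswith("*.")]


if __name__ == "__main__":
    # Invented certificate data in the shape getpeercert() returns.
    sample = {
        "subjectAltName": (
            ("DNS", "example.com"),
            ("DNS", "*.example.com"),
            ("DNS", "vpn.example.com"),
            ("IP Address", "192.0.2.10"),
        )
    }
    print(specific_hosts(sample))
```

For a live target, pass the result of fetch_certificate() to specific_hosts(); wildcard entries are filtered out because they do not point at any one resource.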

In addition to SANs, under the Certificate Transparency (CT) framework, logs of public certificate authorities (CAs) are published for anyone to access. These logs contain information about the domains and subdomains that a CA’s issued certificates apply to. This can enable you to discover subdomains that may be no longer covered by the certificate but still exist. For example, an organization might have used a specific SAN in the past, but later moved to a wildcard. That past domain might be listed in the CT logs for the issuing CA.
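
CT search services typically let you export their results, and the export can be reduced to a unique list of host names. The sketch below assumes a JSON export in which each entry carries a name_value field holding one or more names separated by line breaks. That field name is an assumption about the search service you use, so check it against the actual output. The sample data is invented.

```python
import json


def names_from_ct_json(payload: str) -> list[str]:
    """Return the unique, lower-cased host names found in a CT search export."""
    names = set()
    for entry in json.loads(payload):
        # 'name_value' is an assumed field name; adjust it to match your source.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Invented export with three certificate entries.
    sample = json.dumps([
        {"name_value": "www.example.com\nexample.com"},
        {"name_value": "VPN.example.com"},
        {"name_value": "example.com"},
    ])
    for name in names_from_ct_json(sample):
        print(name)
```

Names that appear here but no longer appear in the SANs of the current certificate are good candidates for forgotten or legacy hosts.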

Shodan

https://www.shodan.io/

Shodan is an online search engine that enables anyone to find public or improperly secured devices that are reachable through the Internet. For example, a zoo might set up an IP camera in one of its enclosures for anyone in the world to watch the animals through just their browser. Shodan would index this device and enable anyone to search for it. It does this by grabbing the service banners that a device sends to a client over certain ports.

More commonly, however, manufacturers and users of devices exercise poor security practices and unwittingly expose their device to the wider world. For example, someone might purchase an IP camera to use as surveillance at their home or office, and they may fail to change the default user name and password from “admin” and “admin123”. Using Shodan, anyone can find and watch the live feed of this camera if it is Internet-connected.

Devices indexed by Shodan include more than just cameras, however. Everything from traffic lights to industrial control systems (ICSs) may have Internet connectivity as part of the Internet of Things (IoT)—and IoT devices are notoriously lax when it comes to security. Some systems may even allow a user full remote control of a device.

Shodan can be useful to the pen test reconnaissance phase in a number of ways. If you manage to view the feed of a security camera outside the target organization’s office, you can get a better picture of the premises and its defenses if you plan on conducting a physical test there. If the organization employs control systems for HVAC or industrial equipment, you may be able to control these remotely as part of your attack phase.
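
Shodan searches are narrowed with filter:value pairs such as org, net, port, and hostname. The helper below only builds the query string, quoting values that contain spaces. It is a small sketch with a made-up function name and made-up sample values, and it does not contact Shodan.

```python
def shodan_query(**filters) -> str:
    """Build a Shodan search string from filter=value pairs."""
    parts = []
    for name, value in filters.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'  # values with spaces must be quoted
        parts.append(f"{name}:{text}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical organization name and address block, for illustration only.
    print(shodan_query(org="Example Corp", port=554))
    print(shodan_query(net="192.0.2.0/24"))
```

The resulting string can be pasted into the Shodan search box or handed to whichever API client you use. Searching by net with the address blocks you found through Whois and DNS keeps the results tied to the target.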

theHarvester

https://github.com/laramies/theHarvester

theHarvester is an open source OSINT tool that gathers the following information about a public resource:

Subdomain names
Employee names
Email addresses
PGP key entries
Open ports and service banners

For some of its data, like employee names and PGP keys, theHarvester uses general search engines like Google and Bing to gather information. It also searches certificate information directly from Comodo’s certificate search engine. For additional information, theHarvester searches social media sites like Twitter and LinkedIn. Its banner grabbing functionality relies on Shodan.

The tool is relatively simple to use, yet can help you automate many information gathering tasks.

Maltego

https://www.maltego.com/

Maltego is another OSINT tool that can gather a wide variety of information on public resources. Unlike theHarvester and Recon-ng, Maltego has a full GUI to help users visualize the gathered information and compare it to other sets of information. It features an extensive library of “transforms,” which automate the querying of public sources of data.

Some types of OSINT that these transforms enumerate include:

People’s names.
People’s and companies’ phone numbers.
People’s and companies’ physical addresses.
Network address blocks.
Email addresses.
External links.
DNS records.
Subdomains.
Downloadable files.
Social media profiles.
And many more.

The results of querying are placed in node graphs where links are established between each node. This enables the user to analyze how two or more data points may be connected. For example, if you run a transform on a domain, Maltego can place that domain at the top of a tree hierarchy with several branching links to other resources under that domain, like subdomains enumerated through DNS. Under these subdomains might be IP addresses and address ranges. In a more people-oriented search, the resources that branch off the domain might include personnel phone numbers, email addresses, etc. Maltego provides more than just hierarchical layouts; you can also show objects in a circular layout, block layout, organic layout (minimal distance between entities), and more.

Note that Maltego is proprietary software and comes in several editions. Maltego CE is the free edition, but requires you to register with a Maltego Community account to take advantage of a limited set of available transforms.

FOCA

https://github.com/ElevenPaths/FOCA

Fingerprinting Organizations with Collected Archives (FOCA) is a GUI OSINT tool that is designed primarily to discover useful metadata that may be hidden within documents, typically those downloaded from the web. FOCA can work with a variety of document types, including Microsoft Office (.docx, .xlsx, etc.) and the OpenDocument format (.odt, .ods, etc.). It can also analyze PDFs and graphical design file types like the XML-based Scalable Vector Graphics (SVG) format.

Like with many of the tools mentioned previously, FOCA scans general search engines like Google, Bing, and DuckDuckGo to find downloadable files. You can also provide local files for FOCA to analyze. Some of the useful metadata FOCA can extract includes user and people names, software version information, operating system version information, printer information, plaintext passwords, and more.
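
To see where this kind of metadata lives, note that a modern Office file (.docx, .xlsx, and so on) is a ZIP archive, and its author fields sit in a small XML part named docProps/core.xml. The sketch below reads a few of those fields with the Python standard library. It illustrates the idea only, is not how FOCA itself is implemented, and the function name is made up.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office document.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output label -> element to look for.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def office_metadata(source) -> dict[str, str]:
    """Read author-related metadata from an Office file (path or binary file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling office_metadata("report.docx") on a downloaded file (the file name here is hypothetical) returns a dictionary of whichever fields are present. The creator and last-modified-by values often follow the organization’s user name format.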

FOCA’s functionality has expanded over the years to the point where it doesn’t just do metadata extraction, but can also function as a general OSINT tool. It can gather DNS and IP address information, and its plugin architecture enables developers to extend its functionality even further. Note that, unlike theHarvester, Recon-ng, and Maltego, FOCA is a Windows-only tool. It also requires a running Microsoft SQL Server instance to store its data in a database.

Guidelines for Gathering Background Information

When gathering background information:

Understand that not all gathered information will be useful to your pen test.
Understand the difference between OSINT and closed-source intelligence.
Recognize which sources provide OSINT and which provide closed-source intelligence.
Perform Whois queries to obtain domain registration information.
Examine the organization’s main website to identify more about its personnel and its business operations.
Look for any related sites, like a partner site, to corroborate or add to the information that you’ve gathered.
Examine the organization’s or its employees’ social media profiles to find any revealing information.
Search job boards to identify personnel issues or the technology that the organization uses.
Conduct Google hacking to run advanced search queries on website data.
Examine news articles and press releases for information on current or upcoming business operations.
Query DNS to obtain domain, subdomain, and additional information about the organization’s network structure.
Attempt to leverage poorly configured servers for zone transfers to discover more DNS information.
Identify email addresses and how they may be used as account names.
Identify the domain’s use of MX and SPF records to influence your email-based tests.
Identify SANs in SSL/TLS certificates to discover subdomains.
Search through CT logs to find past certificate issuances to resources.
Use Shodan to discover Internet-connected IoT and other non-traditional computing devices.
Use theHarvester and Recon-ng to perform many of the previous techniques from the command line.
Use Maltego to visualize connections between OSINT objects.
Use FOCA to extract metadata from files available for download.

Guidelines for Preparing Background Findings for Next Steps

Clearly determine what “next steps” actually means for your pen test.
Analyze findings to determine how to weaponize them in future phases.
Consider findings within a bigger picture, not in a vacuum.
Discard irrelevant findings and focus on findings that are actionable and relevant.
Determine how public IP addresses map to resources like web servers that you can later target.
Consider how you may use public IP addresses as entry points into the private network.
Determine which subdomains may be worth targeting due to how they’re named.
Leverage information from third-party sites to learn more about an organization and its people.
Consider how the people information you gather can help shape your later testing.
Leverage people information in conducting social engineering tests.
Use gathered technology information to identify potential vulnerabilities.
Consider that the presence of certain technology might imply the target organization relies on a specific vendor in other areas.
Record your findings and next steps in a document for easy reference.