Table of Contents
OSINT Tools: Applications and Risks
Demonstrating theHarvester Tool
Comparison of the 5 selected tools
Introduction to OSINT
In today’s digital world, people and businesses create a huge amount of data every day. With the internet and smartphones, billions of users share personal and work-related information online. However, not all of this information is meant to be public. Through mistakes, cyberattacks, or lack of awareness, sensitive data can become exposed, leading to security risks [1].
Open-source intelligence (OSINT) is the practice of gathering and analyzing publicly available information in a legal and ethical way. The goal is to collect data from sources like websites, social media, public documents, and news articles while respecting privacy and following the rules. OSINT can be used for many things, such as helping governments understand political situations, assisting law enforcement, and giving businesses insights into other companies [1].
Some of the data used in OSINT comes from social media, online scanners, and public records, which can be about people, businesses, or countries. It’s important that OSINT doesn’t involve illegal actions, like cracking passwords or impersonating someone. It helps organizations or governments learn about potential threats, such as criminal groups or terrorist organizations [1].
OSINT is also used in cybersecurity, especially for tasks like penetration testing, where the goal is to find weaknesses in systems. Hackers also use OSINT to gather personal details from social media and use them in attacks like spear phishing, where they trick people into revealing sensitive information [1].
Another important part of OSINT is checking the accuracy of the information. Analysts need to verify what they find to avoid spreading fake news. By analyzing sources carefully and cross-checking information, OSINT helps ensure that the data is trustworthy.
In the end, OSINT allows people and organizations to make informed decisions based on information that’s out in the open, while always staying ethical and legal.
OSINT draws on sources such as social media, public databases, government records, and company websites. It is commonly used by intelligence agencies, businesses, law enforcement, ethical hackers, and even cybercriminals, so it can strengthen security but can also be dangerous when used for harmful purposes [2].
Figure 1: Principles for OSINT Professionals
In this research, I will explore five OSINT tools, demonstrating how they work and comparing their effectiveness. These tools will be evaluated based on their features, accuracy, and usability, allowing for a comprehensive comparison. The goal is to provide insights into the strengths and weaknesses of each tool and highlight how they can be used to gather publicly available information for various purposes, including security, research, and risk assessment.
The five tools I will explore are:
- Maltego – A powerful tool for gathering and analyzing data from a wide range of online sources, including social media, DNS records, and websites.
- Shodan – A search engine for finding Internet-connected devices, offering valuable insights into the security of systems and networks.
- theHarvester – A tool for gathering email addresses, subdomains, and other publicly available information related to a target.
- SpiderFoot – An automated OSINT tool that helps users gather and analyze data from multiple sources, including IP addresses, domains, and even historical data.
- OSINT Framework – A collection of open-source tools organized in categories, which help gather specific types of data such as social media accounts, domains, and more.
By analyzing and comparing these tools, this paper will help organizations better understand the range of OSINT tools available and how they can leverage them to improve security and make informed decisions.
OSINT Tools: Applications and Risks
1. OSINT Framework
What the Tool Does
OSINT Framework is a categorized collection of open-source tools and resources for gathering publicly available information from online sources such as social media platforms, websites, forums, and public databases. By pointing analysts to a suitable resource for each type of data, it supports the collection, analysis, and organization of that data into useful insights. It helps cybersecurity professionals, law enforcement agencies, and businesses monitor threats, conduct research, and assess risks efficiently [4].
Requirements for the Tool to Work
For OSINT Framework and the tools it links to to work effectively, the following are required:
- Internet Access: To retrieve real-time data from public sources.
- Data Sources: Integration with APIs, web scraping capabilities, and access to open databases.
- Processing Power: Adequate computing resources for data analysis and filtering.
- Legal Compliance: Adherence to ethical guidelines and local laws governing data collection.
- User Knowledge: Operators should have expertise in querying, data validation, and cybersecurity awareness to interpret results accurately.
Potential Malicious Use
While OSINT tools are valuable for legitimate purposes, they can also be exploited by malicious actors in various ways:
- Social Engineering Attacks: Attackers can gather personal details to craft convincing phishing emails or impersonate individuals.
- Reconnaissance for Cyber Attacks: Hackers may use OSINT to map an organization’s digital footprint, identifying vulnerabilities before launching attacks.
- Doxxing: Personal information can be exposed and shared publicly to intimidate or harm individuals.
- Corporate Espionage: Competitors may use OSINT to gather intelligence on business strategies, financial data, and employee information.
To mitigate risks, organizations should implement cybersecurity awareness programs, restrict the exposure of sensitive information, and monitor their digital presence actively.
2. theHarvester
What is theHarvester?
theHarvester (purposely spelled with a lowercase ‘t’ at the beginning) is a Python-based command-line OSINT tool developed by Edge-Security. It is designed for gathering publicly available information about a target, typically in the early stages of a cybersecurity assessment. Penetration testers, security analysts, and ethical hackers commonly use it to assess a company’s external threat landscape on the internet [5].
Functionality of theHarvester
theHarvester automates the process of collecting open-source intelligence from various online sources. It retrieves data such as:
- Email addresses associated with a domain
- Subdomains and hostnames
- Employee names (from sources like LinkedIn)
- IP addresses
- Open ports (if using active scans)
- Publicly indexed documents containing metadata
This tool supports both passive and active reconnaissance techniques. Passive reconnaissance involves collecting data without directly interacting with the target’s infrastructure, while active reconnaissance may involve probing a target to obtain additional information.
Potential Misuse by Malicious Actors
While theHarvester is a powerful tool for cybersecurity professionals, it can also be exploited by cybercriminals for malicious purposes, such as:
- Targeted Phishing Attacks – Attackers can use harvested email addresses to send phishing emails, impersonating legitimate sources.
- Reconnaissance for Cyber Attacks – Malicious actors can gather subdomains, IP addresses, and employee details to identify vulnerabilities in an organization’s infrastructure.
- Social Engineering Attacks – By gathering employee names and publicly available information, attackers can craft convincing social engineering schemes to manipulate individuals.
- Credential Stuffing & Password Attacks – If usernames are obtained, attackers may attempt to brute-force or guess login credentials.
3. Shodan
Overview of Shodan
Shodan is a specialized search engine designed to identify and index Internet-facing devices, including industrial control systems (ICS), web servers, security cameras, and other networked hardware. Unlike traditional search engines that index websites, Shodan scans IP addresses and records service banners, metadata, and open ports of connected devices. It provides a powerful interface for querying devices based on parameters such as geographic location, software version, and operating system [6].
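To make the idea of parameter-based queries concrete, the sketch below assembles a Shodan-style search string from a keyword and a few filters. The filter names used (country, port, os, product) are among Shodan's search filters, but the helper function build_shodan_query is hypothetical and written only for this illustration. It builds the text of a query and does not contact Shodan.

```python
def build_shodan_query(keyword="", **filters):
    """Join a free-text keyword and filter:value pairs into one query string."""
    parts = [keyword] if keyword else []
    for name, value in filters.items():
        text = str(value)
        # Quote values that contain spaces so they stay a single token.
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{name}:{text}")
    return " ".join(parts)


# Example: web servers matching a keyword, narrowed by location, port, and OS.
query = build_shodan_query("apache", country="DE", port=443, os="Linux")
print(query)
```

The resulting string can be pasted into the Shodan search box. Some filters may only be available to registered accounts.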
Requirements for Shodan to Function
For Shodan to operate effectively, several key conditions must be met:
- Publicly Accessible Devices: Shodan can only index devices that have routable IP addresses and are directly accessible from the Internet.
- Open Ports and Services: Devices must have active services running on common ports such as FTP (21), SSH (22), Telnet (23), HTTP (80), or other industry-specific ports.
- Service Banner Information: Devices must respond to Shodan’s automated queries, revealing details such as device type, firmware version, and software configuration.
- Continuous Internet Scanning: Shodan uses automated bots to continuously scan the Internet, probing for accessible services and devices.
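As an illustration of how much a service banner can reveal, the sketch below parses an SSH identification string, which follows the form SSH-protoversion-softwareversion with optional comments (RFC 4253). The sample banner is made up for this example, and the function name parse_ssh_banner is hypothetical. This is not Shodan's own parsing code.

```python
import re

# SSH identification string: "SSH-protoversion-softwareversion[ SP comments]"
SSH_BANNER = re.compile(
    r"^SSH-(?P<proto>[^-\s]+)-(?P<software>\S+)(?: (?P<comments>.*))?$"
)


def parse_ssh_banner(banner):
    """Split an SSH banner into protocol version, software, and comments."""
    match = SSH_BANNER.match(banner.strip())
    if match is None:
        return None
    return match.groupdict()


# Made-up sample banner, not taken from a real host.
sample = "SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.1"
print(parse_ssh_banner(sample))
```

Even this single line exposes the server software, its version, and a hint about the operating system, which is why service banner obfuscation is often recommended as a mitigation.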
Potential Malicious Uses of Shodan
While Shodan is a valuable tool for security researchers and network administrators, it also presents significant risks when misused by malicious actors [6]:
- Reconnaissance for Cyber Attacks: Hackers can use Shodan to identify vulnerable industrial control systems, routers, and IoT devices, making them potential targets for exploitation.
- Exploitation of Weak Credentials: Many indexed devices use default or weak authentication settings, allowing attackers to gain unauthorized access.
- Targeting Critical Infrastructure: Shodan has revealed sensitive assets such as power grids, water treatment plants, and gas pipelines, raising concerns about potential cyber-physical attacks.
- Launching Botnet Attacks: Compromised devices can be leveraged for large-scale Distributed Denial-of-Service (DDoS) attacks or malware distribution.
In conclusion, while Shodan is a powerful tool for security research and network monitoring, organizations must take proactive measures such as network segmentation, service banner obfuscation, and strict access control policies to mitigate the risks associated with exposed Internet-facing devices.
4. Maltego
Maltego is a tool designed for gathering and analyzing information from different online sources. It is used for Open-Source Intelligence (OSINT) and helps in cybersecurity investigations, penetration testing, and data analysis. Maltego can map relationships between domains, IP addresses, email addresses, social media accounts, and more. The tool presents information in a visual graph format, making it easier to understand connections between different entities [7].
How Does Maltego Work?
Maltego uses “transforms” to pull data from various databases, search engines, and online services. Users input an entity, such as a domain name or an email address, and Maltego retrieves associated data. It organizes the results in a network-style graph, allowing users to see relationships clearly [7].
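The transform idea can be sketched in a few lines of Python. This is a toy model and not Maltego's API: the transform functions, the entity tuples, and the lookup tables are all hypothetical, and the data is canned so that nothing is queried online. Each transform takes one entity and returns related entities, and the results are stored as graph edges.

```python
from collections import deque

# Hypothetical offline data standing in for the online sources a real transform queries.
FAKE_DNS = {"example.com": ["192.0.2.10"]}
FAKE_MAIL = {"example.com": ["admin@example.com", "sales@example.com"]}


def domain_to_ip(entity):
    kind, value = entity
    if kind != "domain":
        return []
    return [("ip", ip) for ip in FAKE_DNS.get(value, [])]


def domain_to_email(entity):
    kind, value = entity
    if kind != "domain":
        return []
    return [("email", addr) for addr in FAKE_MAIL.get(value, [])]


TRANSFORMS = [domain_to_ip, domain_to_email]


def expand(seed):
    """Breadth-first expansion: apply every transform to every newly found entity."""
    edges = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        entity = queue.popleft()
        for transform in TRANSFORMS:
            for found in transform(entity):
                edges.append((entity, transform.__name__, found))
                if found not in seen:
                    seen.add(found)
                    queue.append(found)
    return edges


for source, via, target in expand(("domain", "example.com")):
    print(f"{source[1]} --{via}--> {target[1]}")
```

The edge list is what a graph view would draw: nodes are entities, and each edge records which transform linked them.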
Maltego is available in two versions:
- Community Edition (CE): Free but with limitations, such as fewer daily transforms and no bulk data retrieval.
- Commercial Edition: Paid version with full functionality, allowing extensive data extraction and analysis.
Requirements for Maltego to Work
To use Maltego effectively, the following requirements must be met:
- Internet Connection: Maltego relies on online sources to gather data.
- User Registration: Even the free version requires users to create an account.
- Access to External APIs: Some transforms require API keys to extract information from third-party sources.
- Compatible System: Maltego runs on Windows, macOS, and Linux and requires Java to function.
- Correct Configuration: Users may need to configure settings, such as the discovery server, to ensure proper operation.
How Malicious Actors Can Use Maltego
While Maltego is a powerful tool for ethical hackers and security researchers, cybercriminals can also misuse it for illegal purposes, such as:
- Reconnaissance for Attacks: Hackers can gather details about a company’s digital footprint, identifying weak points before launching cyberattacks.
- Phishing Campaigns: By collecting email addresses and social media profiles, attackers can create targeted phishing emails to deceive users.
- Doxxing: Malicious users can extract personal information to expose individuals online, leading to privacy violations.
5. SpiderFoot
SpiderFoot is an open-source tool designed for gathering intelligence from the internet. It was created by Steve Micallef and is written in Python. This tool automates the process of Open-Source Intelligence (OSINT) collection by querying over 100 public data sources. It collects information about domain names, email addresses, IP addresses, DNS servers, and other online identifiers. SpiderFoot is useful for cybersecurity professionals, ethical hackers, and researchers who want to analyze security risks and data exposure [8].
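The following sketch illustrates the general pattern of automated multi-source collection: several independent modules are run against one target and their findings are merged by data type. It is not SpiderFoot's code. The module functions, the data type labels, and the canned findings are hypothetical and exist only for illustration.

```python
from collections import defaultdict


# Hypothetical modules with canned findings; real modules would query online sources.
def module_dns(target):
    return [("IP_ADDRESS", "192.0.2.10"), ("SUBDOMAIN", f"www.{target}")]


def module_certs(target):
    return [("SUBDOMAIN", f"www.{target}"), ("SUBDOMAIN", f"mail.{target}")]


def module_pages(target):
    return [("EMAIL", f"info@{target}")]


MODULES = [module_dns, module_certs, module_pages]


def run_scan(target):
    """Run every module and merge findings by data type, dropping duplicates."""
    findings = defaultdict(set)
    for module in MODULES:
        for data_type, value in module(target):
            findings[data_type].add(value)
    return {data_type: sorted(values) for data_type, values in findings.items()}


report = run_scan("example.com")
for data_type in sorted(report):
    print(data_type, report[data_type])
```

Merging by type means a subdomain reported by two modules appears once in the report, which keeps the output manageable as more sources are added.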
Requirements for SpiderFoot to Work
- Operating System: SpiderFoot can run on both Windows and Linux.
- Python Environment: Since it is written in Python, a Python interpreter must be installed.
- Internet Connection: The tool queries online databases and public services, so an active internet connection is necessary.
- Installation: It can be installed from its GitHub repository, with its Python dependencies installed through pip.
- API Keys (Optional): Some external data sources require API keys to access more detailed information.
- Web Interface (Optional): While it can be used in the command-line interface (CLI), a web-based interface is also available for easier use.
How Malicious Actors Could Misuse SpiderFoot
Although SpiderFoot is designed for cybersecurity and research purposes, it can also be used by hackers and cybercriminals. Here are some ways it can be misused:
- Reconnaissance for Cyber Attacks: Hackers can use it to gather details about a target, such as employee email addresses or exposed servers, before launching an attack.
- Identifying Vulnerabilities: By scanning a website or network, malicious users can find weaknesses that could be exploited.
- Tracking Individuals: Cybercriminals could use the tool to collect personal information, leading to identity theft or social engineering attacks.
- Data Harvesting: Attackers may scrape and compile publicly available data for phishing campaigns or other fraudulent activities.
SpiderFoot is widely used by security professionals to identify vulnerabilities and secure networks. However, like many cybersecurity tools, it can also be misused for unethical purposes. Proper awareness and responsible usage are crucial to prevent potential threats. Organizations should monitor their digital footprint and take proactive security measures to protect sensitive information [9].
Demonstrating theHarvester Tool
I will demonstrate how to use theHarvester to gather open-source intelligence (OSINT) on a domain. The tool is useful for collecting data like email addresses, subdomains, virtual hosts, open ports, and other publicly available information. I run this demo from a controlled, virtualized environment and limit it to collecting publicly available information.
Hypothetical Scenario:
For the purposes of this demonstration, I am conducting a mock security assessment of the domain kali.org. This is a real, public website, used here only as a real-world example. Please do not use these techniques to attack or disrupt the website; doing so is strictly prohibited under applicable laws. My goal is to gather useful information about the domain for a mock penetration testing exercise [10]. Specifically, I will search for:
- Email addresses related to kali.org.
- Subdomains associated with the domain.
- Any DNS records or virtual hosts.
Setup:
- I am working in a virtualized Kali Linux environment [10].
- theHarvester is installed on the system, which allows me to search through various public data sources such as search engines and PGP key servers [10].
Step-by-Step Demo:
- Install theHarvester:
First, I ensure that theHarvester is installed on my system. On Kali Linux it is packaged as theharvester and can be installed with sudo apt install theharvester [10].

- Search for Email Addresses:
Now, I will use theHarvester to search for any email addresses linked to kali.org, with DuckDuckGo as the search engine. This helps me discover any publicly available email addresses related to the domain.
I run the command theHarvester -d kali.org -l 500 -b duckduckgo, where:
- -d kali.org: Specifies the target domain.
- -l 500: Limits the search results to 500.
- -b duckduckgo: Uses DuckDuckGo as the search engine.

The search did not return any email addresses, but it listed 14 hosts.
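When results like these are saved as text, the findings can be pulled out and de-duplicated with a short script. The sketch below is one possible approach and not part of theHarvester: the function extract_findings is hypothetical, and the sample text is made up for this example (it uses example.com and is not real tool output).

```python
import re


def extract_findings(text, domain):
    """Return sorted, de-duplicated emails and hostnames under `domain` found in `text`."""
    escaped = re.escape(domain)
    # An address whose domain part is `domain` or one of its subdomains.
    email_re = re.compile(
        rf"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*{escaped}(?![\w-]|\.\w)"
    )
    # A hostname with at least one label in front of `domain`; the lookbehind
    # skips the domain part of email addresses.
    host_re = re.compile(
        rf"(?<![@\w.-])(?:[A-Za-z0-9-]+\.)+{escaped}(?![\w-]|\.\w)"
    )
    emails = sorted({m.lower() for m in email_re.findall(text)})
    hosts = sorted({m.lower() for m in host_re.findall(text)})
    return emails, hosts


# Made-up sample text for illustration only.
sample = (
    "Hosts: www.example.com:192.0.2.10, mail.example.com, WWW.example.com\n"
    "Contact: info@example.com\n"
)
emails, hosts = extract_findings(sample, "example.com")
print("emails:", emails)
print("hosts:", hosts)
```

Lower-casing before de-duplication matters because hostnames are case-insensitive, so the same host can otherwise be counted twice.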
- Searching for Subdomains using Google
This demo will search for subdomains of kali.org using Google.

- Subdomain Takeovers
The command theHarvester -d kali.org -b bing -t gathers information about the subdomains of kali.org by searching with Bing. The -t flag checks whether any of the discovered subdomains are vulnerable to takeover.
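How theHarvester performs this check internally is not covered here. A common general approach, sketched below under that caveat, is to look for a subdomain whose CNAME record points at a third-party hosting service while the page returned shows that service's "unclaimed" message. The fingerprint table is a small illustrative assumption and not a complete or authoritative list, and the function name is hypothetical. The inputs are passed in directly, so the sketch performs no DNS or HTTP requests.

```python
# Illustrative fingerprints: the CNAME suffix of a hosting service and text its
# default "unclaimed" page is known to show. Assumed for this example only.
FINGERPRINTS = {
    "github.io": "There isn't a GitHub Pages site here.",
    "herokuapp.com": "No such app",
}


def looks_takeover_prone(cname_target, page_body):
    """Return the matching service suffix if the CNAME points at a listed service
    and that service's 'unclaimed' message appears in the page body, else None."""
    target = cname_target.rstrip(".").lower()
    for suffix, marker in FINGERPRINTS.items():
        points_at_service = target == suffix or target.endswith("." + suffix)
        if points_at_service and marker in page_body:
            return suffix
    return None


# Made-up inputs for illustration.
print(looks_takeover_prone("old-docs.github.io.", "<p>There isn't a GitHub Pages site here.</p>"))
print(looks_takeover_prone("shop.herokuapp.com", "Welcome to our shop"))
```

A match only indicates that a subdomain deserves a closer look. Confirming a real takeover risk requires checking with the owner of the domain and the hosting service.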

Comparison of the 5 selected tools
Tool | What It Does | Requirements | Potential Malicious Use |
OSINT Framework | Gathers publicly available information from online sources like social media, websites, and forums. It helps with threat monitoring, research, and risk assessment. | Internet access, data sources (APIs, web scraping), processing power, legal compliance, user knowledge in data validation. | Social engineering attacks, reconnaissance for cyberattacks, doxxing, corporate espionage. |
theHarvester | A command-line tool for collecting information like email addresses, subdomains, IP addresses, and employee names. It helps with early cybersecurity assessments. | Internet access, Python environment, command-line knowledge. | Phishing attacks, reconnaissance for cyberattacks, social engineering, credential stuffing attacks. |
Shodan | A search engine that indexes Internet-connected devices like servers, cameras, and industrial control systems. It helps with security research by identifying exposed devices. | Publicly accessible devices, open ports/services, and continuous internet scanning. | Reconnaissance for cyberattacks, exploitation of weak credentials, targeting critical infrastructure, launching botnet attacks. |
Maltego | Gathers and analyzes data from various online sources and maps relationships between entities like domains, IPs, and social media accounts. It presents data in a visual graph. | Internet access, user registration, compatible systems (Windows, macOS, Linux), and external API keys (for some features). | Reconnaissance for attacks, phishing campaigns, doxxing. |
SpiderFoot | An open-source tool that automates the collection of OSINT from over 100 public data sources. It helps in cybersecurity and research by identifying risks and data exposure. | Python environment, internet access, operating system (Windows or Linux), and optional API keys. | Reconnaissance for cyberattacks, identifying vulnerabilities, tracking individuals, data harvesting. |
References
cybervieadmin. “What Is the Harvester Tool | Kali Linux | Linux | CYBERVIE.” CYBERVIE, 18 Feb. 2019, cybervie.com/blog/what-is-the-harvester/.
Szymoniak, Sabina, and Kacper Foks. “Open Source Intelligence Opportunities and Challenges: A Review.” Advances in Sciences and Technology/Postępy Nauki I Techniki, vol. 18, no. 3, 1 June 2024, pp. 123–139, https://doi.org/10.12913/22998624/186036.
AlKilani, Hamzeh, and Abdallah Qusef. “OSINT Techniques Integration with Risk Assessment ISO/IEC 27001.” International Conference on Data Science, E-Learning and Information Systems 2021, 5 Apr. 2021, https://doi.org/10.1145/3460620.3460736.
Borges, Esteban. “Top 15 Free OSINT Tools to Collect Data from Open Sources.” Www.recordedfuture.com, 29 Apr. 2024, www.recordedfuture.com/threat-intelligence-101/tools-and-technologies/osint-tools.
Gill, Ritu. “What Is OSINT (Open-Source Intelligence?) | SANS Institute.” Www.sans.org, 23 Feb. 2023, www.sans.org/blog/what-is-open-source-intelligence/.
Borges, Esteban. “TheHarvester: A Classic Open Source Intelligence Tool.” Securitytrails.com, SecurityTrails, 28 Apr. 2021, securitytrails.com/blog/theharvester-tool.
Bodenheim, Roland, et al. “Evaluation of the Ability of the Shodan Search Engine to Identify Internet-Facing Industrial Control Devices.” International Journal of Critical Infrastructure Protection, vol. 7, no. 2, June 2014, pp. 114–123, https://doi.org/10.1016/j.ijcip.2014.03.001. Accessed 5 Dec. 2019.
“Maltego – an Overview | ScienceDirect Topics.” Www.sciencedirect.com, www.sciencedirect.com/topics/computer-science/maltego.
SecurityTrails Team. “Spiderfoot, an Open Source Intelligence Automation Tool.” Securitytrails.com, SecurityTrails, 29 May 2018, securitytrails.com/blog/spiderfoot-osint-automation-tool.
Jelen, Sara. “OSINT Is Maturing: Our Interview with Steve Micallef from SpiderFoot.” Securitytrails.com, SecurityTrails, 19 Mar. 2019, securitytrails.com/blog/steve-micallef-spiderfoot. Accessed 11 Feb. 2025.
“Theharvester | Kali Linux Tools.” Kali Linux, www.kali.org/tools/theharvester/.