E-mails, subdomains and names Harvester - OSINT
Peter McAlpine 7e131a0b1d Fix EXTRAword.com matching when word is 'word.com'
This change makes myparser.emails(...) stricter so that, given a word
(eg: 'word.com') it does not match domains which end with that word
(i.e. some-other-word.com). Subdomains continue to be matched (eg:
'subdomain.word.com').

Also add unit test for emails(...)
2015-03-05 14:04:50 -05:00
discovery Removed debug line 2014-12-16 23:54:19 +00:00
lib 2.5 2014-12-16 23:25:12 +00:00
.gitignore Fix EXTRAword.com matching when word is 'word.com' 2015-03-05 14:04:50 -05:00
changelog.txt Cleaning Readme 2014-12-16 23:33:13 +00:00
COPYING Initial commit for version 2.0 2011-05-04 16:07:06 +01:00
htmlExport.py 2.5 2014-12-16 23:25:12 +00:00
LICENSES 2.5 2014-12-16 23:37:44 +00:00
myparser.py Fix EXTRAword.com matching when word is 'word.com' 2015-03-05 14:04:50 -05:00
myparser_test.py Fix EXTRAword.com matching when word is 'word.com' 2015-03-05 14:04:50 -05:00
processor.py 2.5 2014-12-16 23:25:12 +00:00
README Added dependency 2014-12-17 12:24:32 +00:00
theHarvester.py UI help corrections 2015-01-03 04:17:47 +00:00
TODO Test 2014-12-14 16:43:36 +00:00

*******************************************************************
*                                                                 *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __| '_ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* TheHarvester Ver. 2.5                                           *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* cmartorella@edge-security.com                                   *
*******************************************************************

What is this?
-------------

theHarvester is a tool for gathering e-mail accounts, subdomain names, virtual
hosts, open ports/banners, and employee names from different public sources
(search engines, PGP key servers).

It is a really simple tool, but very effective in the early stages of a
penetration test, or just to see how visible your company is on the Internet.

The sources are:

Passive:
--------
-google: Google search engine - www.google.com

-googleCSE: Google Custom Search Engine

-google-profiles: Google search engine, specific search for Google profiles

-bing: Microsoft search engine - www.bing.com

-bingapi: Microsoft search engine, through the API (you need to add your key in
          the discovery/bingsearch.py file)

-pgp: PGP key server - pgp.rediris.es

-linkedin: Google search engine, specific search for LinkedIn users

-vhost: Bing virtual hosts search

-twitter: Twitter accounts related to a specific domain (uses Google search)

-googleplus: users that work in the target company (uses Google search)

-shodan: Shodan computer search engine; searches for ports and banners of the
         discovered hosts (http://www.shodanhq.com/)
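
Whatever the passive source, the e-mail gathering step comes down to pulling
addresses for the target domain out of the text that comes back. The snippet
below is only a minimal sketch of that idea, not theHarvester's own parser:
the function name extract_emails and the regular expression are assumptions
made for illustration. It accepts the domain itself and its subdomains, and
rejects other domains that merely end with the same string.

```python
import re


def extract_emails(text, domain):
    """Return sorted, de-duplicated addresses at `domain` or its subdomains.

    Hypothetical helper for illustration; not part of theHarvester.
    """
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@"              # local part and '@'
        r"(?:[A-Za-z0-9-]+\.)*"            # zero or more subdomain labels
        + re.escape(domain)                # the exact target domain
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",  # domain must end here
        re.IGNORECASE,
    )
    return sorted({match.lower() for match in pattern.findall(text)})


if __name__ == "__main__":
    sample = "bob@word.com, alice@sub.word.com, eve@some-other-word.com"
    print(extract_emails(sample, "word.com"))
```

Because the subdomain labels must each end in a dot, 'some-other-word.com' is
not accepted as a match for 'word.com', while 'sub.word.com' is.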


Active:
-------
-DNS brute force: this plugin will run a dictionary brute force enumeration
-DNS reverse lookup: reverse lookup of the discovered IPs in order to find hostnames
-DNS TLD expansion: TLD dictionary brute force enumeration
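
The three active techniques above can be sketched with the standard library
alone. This is an illustration under stated assumptions, not the code of the
plugins themselves: the function names (subdomain_candidates, tld_candidates,
resolve_all, reverse_lookup) are hypothetical, and the word lists are supplied
by the caller. The resolver is passed in as a parameter so the logic can be
exercised without touching the network.

```python
import socket


def subdomain_candidates(domain, words):
    """Dictionary brute force: build word.domain for each non-empty word."""
    return ["{0}.{1}".format(w.strip(), domain) for w in words if w.strip()]


def tld_candidates(name, tlds):
    """TLD expansion: build name.tld for each non-empty TLD in the list."""
    return ["{0}.{1}".format(name, t.strip().lstrip("."))
            for t in tlds if t.strip()]


def resolve_all(hostnames, resolve=socket.gethostbyname):
    """Return {hostname: ip} for the candidates that resolve."""
    found = {}
    for host in hostnames:
        try:
            found[host] = resolve(host)
        except (socket.gaierror, socket.herror):
            continue  # lookup failed, so drop this candidate
    return found


def reverse_lookup(ips, lookup=socket.gethostbyaddr):
    """Return {ip: hostname} for the addresses that have a reverse record."""
    names = {}
    for ip in ips:
        try:
            names[ip] = lookup(ip)[0]  # first tuple element is the hostname
        except (socket.gaierror, socket.herror):
            continue
    return names


# Example (performs real DNS queries, so results depend on the network):
# hosts = resolve_all(subdomain_candidates("example.com", ["www", "mail"]))
# print(reverse_lookup(hosts.values()))
```

Only run active lookups like these against domains you are authorised to test.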


Modules that need API keys to work:
-----------------------------------
-googleCSE: You need to create a Google Custom Search Engine (CSE), and add your
 Google API key and CSE ID in the plugin (discovery/googleCSE.py)
-shodan: You need to provide your API key in discovery/shodansearch.py


Dependencies:
-------------
-Requests library (http://docs.python-requests.org/en/latest/)

Changelog in 2.5:
-----------------
-Replaced httplib with the Requests HTTP library (for Google-related searches)
-Fixed Google searches


Comments? Bugs? Requests?
-------------------------
cmartorella@edge-security.com

Updates:
--------
https://github.com/laramies/theHarvester

Thanks:
-------
John Matherly - SHODAN project
Lee Baird for suggestions and bug reports