theHarvester/discovery/crtsh.py

import requests
import myparser
import time
from discovery.constants import getDelay, getUserAgent

class search_crtsh:

    def __init__(self, word):
        self.word = word.replace(' ', '%20')
        self.results = ""
        self.totalresults = ""
        self.server = "https://crt.sh/?q="
        self.quantity = "100"
        self.counter = 0

    def do_search(self):
        """Fetch the crt.sh result page for the query, then crawl every certificate link it lists."""
        urly = self.server + self.word
        try:
            headers = {'User-Agent': getUserAgent()}
            r = requests.get(urly, headers=headers)
        except Exception as e:
            # Without the base response there are no links to crawl.
            print(e)
            return
        links = self.get_info(r.text)
        for link in links:
            headers = {'User-Agent': getUserAgent()}
            r = requests.get(link, headers=headers)
            time.sleep(getDelay())  # pause between requests
            self.results = r.text
            self.totalresults += self.results

    def get_info(self, text):
        """
        Go through the text of the base request and parse it for links.
        @param text: text of the requests response
        @return: list of links
        """
        # Keep only the lines that carry a certificate id.
        lines = []
        for line in str(text).splitlines():
            line = line.strip()
            if 'id=' in line:
                lines.append(line)
        links = []
        # Because of how the HTML is formatted, only every other matching line is needed.
        for current in lines[::2]:
            # 43 is not arbitrary: in the stripped line, the id always starts at index 43.
            current = current[43:]
            # The id runs up to the closing double quote.
            link = current.split('"', 1)[0]
            links.append('https://crt.sh?id=' + link)
        return links

    def get_hostnames(self):
        rawres = myparser.parser(self.totalresults, self.word)
        return rawres.hostnames()

    def process(self):
        self.do_search()
        print("\tSearching CRT.sh results..")