Earlier this month I introduced Element Finder, a command-line tool that lets you search through multiple HTML files and find matches for a CSS selector. This works well if there is an HTML template for every page on your site. But what if the pages are generated dynamically, for example by WordPress? Or what if you want to search a remote site?
In these cases you can use `wget` to crawl a website and download all of the available HTML files. When they have been saved, you can search through them with Element Finder.
If you’re using Linux, `wget` should be pre-installed. If you’re on a Mac, the easiest way to install it is via a package manager like Homebrew: `brew install wget`. There are Windows options available too.
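Before crawling, it can help to confirm that `wget` is actually on your `PATH`. The following is a minimal Bash sketch; the function name `check_wget` is hypothetical, and the install hints only restate the options mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether wget is available and, if not,
# print a platform-specific install hint. Returns 0 if found, 1 if not.
check_wget() {
  if command -v wget >/dev/null 2>&1; then
    echo "wget found at $(command -v wget)"
    return 0
  fi
  case "$(uname -s)" in
    Darwin) echo "wget not found; on a Mac, try: brew install wget" ;;
    Linux)  echo "wget not found; install it with your distribution's package manager" ;;
    *)      echo "wget not found; look for a wget build for your platform" ;;
  esac
  return 1
}
```

In a script you might call it as `check_wget || exit 1` so the crawl never starts without `wget`.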
For example, let's say we want to search through this WordPress website, http://keegan.st, and see which pages contain elements matching the following selector, which comes from the default Twenty Eleven WordPress theme stylesheet:
.entry-content h1, .entry-content h2, .comment-content h1, .comment-content h2
First, we would use `wget` to download all HTML files from the website:
wget --mirror --adjust-extension --convert-links --accept html http://keegan.st
Or, for short:
wget -m -E -k -A html http://keegan.st
Following is an explanation of the wget options used in the command above:
`--mirror`, `-m`: Recursively download every page reachable from the starting URL, with no depth limit.
`--adjust-extension`, `-E`: Add a `.html` filename extension to saved HTML pages whose URLs don't already end in one.
`--convert-links`, `-k`: After downloading, rewrite the links in the saved files so they work when opened from the local file system. Links to files that were not downloaded, such as stylesheets and images, are changed to point at the remote server.
`--accept html`, `-A html`: Only keep HTML files. Don't save images, videos, stylesheets, etc.
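Once `wget` finishes, you may want to check which pages it actually saved before running a search. This is a minimal sketch; `count_html_files` is a hypothetical helper, and you pass it the mirror directory (for example `keegan.st`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every .html file saved under a mirror directory
# and print the total. With --accept html and --adjust-extension, the saved
# pages should end in .html; other files (e.g. robots.txt) are ignored.
count_html_files() {
  local dir="${1:?usage: count_html_files DIR}"
  find "$dir" -type f -name '*.html' | sort
  echo "Total: $(find "$dir" -type f -name '*.html' | wc -l | tr -d ' ') HTML files"
}
```

Running `count_html_files keegan.st` from the download directory would print each saved page followed by the total.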
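Next, we `cd` into the directory where the files were downloaded and run Element Finder with the CSS selector we want to search for. `wget` saves a recursive crawl into a directory named after the host, here `keegan.st`; the path below assumes `wget` was run from `~/Downloads`: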
cd ~/Downloads/keegan.st
elfinder -s ".entry-content h1, .entry-content h2, .comment-content h1, .comment-content h2"
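If you repeat this for several sites or selectors, the two steps can be wrapped in a small script. This is a convenience sketch, not part of Element Finder: `site_dir` and `crawl_and_search` are hypothetical names, the script relies on `wget` saving a recursive crawl into a directory named after the host, it passes the selector to `elfinder` using only the `-s` option shown above, and it assumes URLs without a port number or user info.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the crawl and search steps.

# Derive the directory wget creates for a crawl: the host part of the URL.
# Assumes a URL like http://host/path with no port or user info.
site_dir() {
  local url="${1#*://}"   # drop the scheme, e.g. "http://"
  echo "${url%%/*}"       # keep everything before the first "/"
}

# Mirror the HTML pages of a site, then search them for a CSS selector.
# The subshell keeps the caller's working directory unchanged. The search
# runs even if wget reports errors (e.g. a few 404s during the crawl).
crawl_and_search() {
  local url="${1:?usage: crawl_and_search URL SELECTOR}"
  local selector="${2:?usage: crawl_and_search URL SELECTOR}"
  (
    wget --mirror --adjust-extension --convert-links --accept html "$url"
    cd "$(site_dir "$url")" && elfinder -s "$selector"
  )
}
```

For the example above, that would be `crawl_and_search http://keegan.st ".entry-content h1, .entry-content h2, .comment-content h1, .comment-content h2"`, run from the directory where you want the mirror saved.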
We can use the matches Element Finder reports to see which pages would be affected if we were to modify that CSS rule.