Kompx.com or Compmiscellanea.com

Lynx. Web data extraction

Operating systems : Linux

Aside from browsing / displaying web pages, Lynx can dump the formatted text of the content of a web document or its HTML source to standard output. And that then may be processed by means of some tools present in Linux, like gawk, Perl, sed, grep, etc. Some examples:

Dealing with external links

Count number of external links

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http:", sends the result further again to grep that picks lines not starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (external links of the web page) out of it, wc counts the number of links extracted and displays it:

lynx -dump -listonly "elinks.htm" | grep -o "http:.*" | grep -E -v "http://compmiscellanea.com|http://www.compmiscellanea.com" | wc -l

Find external links and save them to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http:", sends the result further again to grep that picks lines not starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (external links of the web page) out of it and saves them to a file:

lynx -dump -listonly "elinks.htm" | grep -o "http:.*" | grep -E -v "http://compmiscellanea.com|http://www.compmiscellanea.com" > file.txt

Find external links, omit duplicate entries and save the output to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http:", sends the result further again to grep that picks lines not starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (external links of the web page) out of it, sort sorts them and uniq deletes duplicate entries. The output is saved to a file:

lynx -dump -listonly "elinks.htm" | grep -o "http:.*" | grep -E -v "http://compmiscellanea.com|http://www.compmiscellanea.com" | sort | uniq > file.txt

Dealing with internal links

Count number of internal links

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (internal links), wc counts the number of links extracted and displays it:

lynx -dump -listonly "elinks.htm" | grep -E -o "http://compmiscellanea.com.*|http://www.compmiscellanea.com.*" | wc -l

Find internal links and save them to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (internal links) and saves them to a file:

lynx -dump -listonly "elinks.htm" | grep -E -o "http://compmiscellanea.com.*|http://www.compmiscellanea.com.*" > file.txt

Find internal links, omit duplicate entries and save the output to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (internal links), sort sorts them and uniq deletes duplicate entries. The output is saved to a file:

lynx -dump -listonly "elinks.htm" | grep -E -o "http://compmiscellanea.com.*|http://www.compmiscellanea.com.*" | sort | uniq > file.txt

The reason behind using "lynx -dump -listonly" instead of just "lynx -dump" is that there may be web pages with plain text strings looking like links (containing "http://" for instance) in the text of the content, as it is the case with http://www.kompx.com/en/elinks.htm page. "Lynx -dump" would send to output formatted text where real links and plain text links like strings would look just the same and grep would not be able to discern one from another. "Lynx -dump -listonly" gives only a list of links, so that there is no confusion with plain text links looking strings.


Aliosque subditos et thema

 

Internet Explorer 3. Screenshots 1

 

Internet Explorer 3 running under Windows 7 (32-bit). Screenshots 1. Internet Explorer 3: windows.microsoft.com/en-US/windows/upgrade-your-browser Internet Explorer 3: w3schools.com/browsers/browsers_stats.asp Internet Explorer 3: en.wikipedia.org/wiki/Internet_Explorer_3 Internet Explorer 3: ebay.com Internet Explorer 3: kompx.com/en/netscape-3-screenshots-1.htm Internet Explorer 3: twitter.com Download Internet Explorer 3: a pack, containing 3.0, 4.01, 5.01, 5.5, 6.0 versions of Internet Explorer. During installation process it is possible to select which one version only to install or install several or install all. It is very much probable that all versions, except Internet Explorer 3, will not work even under 32-bit Windows 7. So if you would like to try all older Internet Explorers you should have Windows XP or earlier installed. Internet Explorer 3 functions under 32-bit Windows 7 pretty well though.

Mobile-friendly HTML table

 

If an HTML table is too wide, having too much data, it may not shrink anymore, it gets wider than the available space and breaks page layout. An horizontal scroll added to the table fixes it up. Example: 12345678910 Table_data_1 Table_data_2 Table_data_3 Table_data_4 Table_data_5 Table_data_6 Table_data_7 Table_data_8 Table_data_9 Table_data_10 HTML / XHTML. Code: <table> <tr> <th>1</th> <th>2</th> <th>3</th> <th>4</th> <th>5</th> <th>6</th> <th>7</th> <th>8</th> <th>9</th> <th>10</th> </tr> <tr> <td>Table_data_1</td> <td>Table_data_2</td> <td>Table_data_3</td> <td>Table_data_4</td> <td>Table_data_5</td> <td>Table_data_6</td> <td>Table_data_7</td> <td>Table_data_8</td> <td>Table_data_9</td> <td>Table_data_10</td> </tr> </table> CSS. Code: table {display: block; overflow-x: auto;} /* Extra CSS, just styling the look: */ table {border-collapse: collapse;} table td,th {padding: 10px; border: 1px #000 solid;} Note: the CSS property of display: block makes the table to occupy only as much space horizontally as it is needed to contain the data without shrinking. Not more, not making itself to stretch from the leftmost to the rightmost sides of the available space - even if width: 100% is added to CSS. Example: 123 Table_data_1 Table_data_2 Table_data_3 [ 1 ] As well as Netscape 9.0. [ 2 ] As well as Netscape 9.0.