Kompx.com or Compmiscellanea.com

Lynx. Web data extraction

Operating systems : Linux

Aside from browsing / displaying web pages, Lynx can dump the formatted text of the content of a web document or its HTML source to standard output. And that then may be processed by means of some tools present in Linux, like gawk, Perl, sed, grep, etc. Some examples:

Dealing with external links

Count number of external links

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http:", sends the result further again to grep that picks lines not starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (external links of the web page) out of it, wc counts the number of links extracted and displays it:

lynx -dump -listonly "elinks.htm" | grep -o "http:.*" | grep -E -v "http://compmiscellanea.com|http://www.compmiscellanea.com" | wc -l

Find external links and save them to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http:", sends the result further again to grep that picks lines not starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (external links of the web page) out of it and saves them to a file:

lynx -dump -listonly "elinks.htm" | grep -o "http:.*" | grep -E -v "http://compmiscellanea.com|http://www.compmiscellanea.com" > file.txt

Find external links, omit duplicate entries and save the output to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http:", sends the result further again to grep that picks lines not starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (external links of the web page) out of it, sort sorts them and uniq deletes duplicate entries. The output is saved to a file:

lynx -dump -listonly "elinks.htm" | grep -o "http:.*" | grep -E -v "http://compmiscellanea.com|http://www.compmiscellanea.com" | sort | uniq > file.txt

Dealing with internal links

Count number of internal links

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (internal links), wc counts the number of links extracted and displays it:

lynx -dump -listonly "elinks.htm" | grep -E -o "http://compmiscellanea.com.*|http://www.compmiscellanea.com.*" | wc -l

Find internal links and save them to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (internal links) and saves them to a file:

lynx -dump -listonly "elinks.htm" | grep -E -o "http://compmiscellanea.com.*|http://www.compmiscellanea.com.*" > file.txt

Find internal links, omit duplicate entries and save the output to a file

Lynx sends list of links from the content of a web page to standard output. Grep looks only for lines starting with "http://compmiscellanea.com" and "http://www.compmiscellanea.com" (internal links), sort sorts them and uniq deletes duplicate entries. The output is saved to a file:

lynx -dump -listonly "elinks.htm" | grep -E -o "http://compmiscellanea.com.*|http://www.compmiscellanea.com.*" | sort | uniq > file.txt

The reason behind using "lynx -dump -listonly" instead of just "lynx -dump" is that there may be web pages with plain text strings looking like links (containing "http://" for instance) in the text of the content, as it is the case with http://www.kompx.com/en/elinks.htm page. "Lynx -dump" would send to output formatted text where real links and plain text links like strings would look just the same and grep would not be able to discern one from another. "Lynx -dump -listonly" gives only a list of links, so that there is no confusion with plain text links looking strings.


Aliosque subditos et thema

 

Windows console applications. Text editors

 

FTE : JED : MinEd : Nano : MS-DOS Editor Initially, all text editors did not have a graphical interface. And work with text almost from the outset was one of the main types of user activity on computer. With the invention and spread of low-level and especially high-level programming languages, text editor has become an important working tool of professionals. Then, other users had to use text editors for their daily tasks. So by the time the programs with GUI started to be wide spread, the concept of text editor was already well developed, there were mature, well-designed and implemented specimens of applications for text editing without graphical user interface. Why the text-based versions coexisted with GUI-based ones for very long and still graphical user interface programs have not replaced the console / text-based applications. While the average user is not aware of their existence, he / she does not know the power of vim or emacs, often even MS-DOS Editor, built in all the 32-bit versions of Windows is unknown, none the less, console text editors continue to exist and be developed. As it is the case with the text web browsers, the main line of text-based text editors development is in Linux and other *nix systems world. But under Windows as well, there are several interesting applications. FTE - / home page / Console text editor. Version for Linux, some other *nix systems, DOS, Windows, OS/2. Syntax highlighting support for: C, C++, Java, Perl, Sh, Pascal, SQL, Assembly, PHP, Python, REXX, Ada, Fortran, IDL, LinuxDoc, TeX, TeXInfo, HTML, etc. ASCII table. Various facilities for coding and errors handling. Copying words, characters or text blocks is in the same mode and by the same keyboard shortcuts (except Ctrl+A) as in major Windows text editors with graphical user interface - plus, there may be other variations. FTE 0.49.13: Open file FTE 0.49.13: A submenu FTE 0.49.13: Settings FTE 0.49.13: Opened .php file FTE 0.49.13: Opened .htm file FTE 0.49.13: Opened C code JED - / home page / Console text editor. Version for Linux, some other *nix systems, QNX, OS/2, BeOS, OpenVMS, DOS, Windows. Syntax highlighting support for: C, C++, FORTRAN, TeX, HTML, SH, python, IDL, DCL, NROFF, etc. JED can emulate Emacs, EDT, Wordstar, Borland, Brief. C-like S-Lang language for extra settings possibilities and extensions.

Netscape 3. Screenshots 1

 

Netscape 3 running under Windows 7 (32-bit). Screenshots 1. Netscape 3: netscape.aol.com Netscape 3: w3schools.com/browsers/browsers_stats.asp Netscape 3: en.wikipedia.org/wiki/Netscape_Navigator Netscape 3: ebay.com Netscape 3: kompx.com/en/internet-explorer-3-screenshots-1.htm Netscape 3: twitter.com Download Netscape 3. It may happen to be impossible either to install Netscape 3 or to run it under Windows 7 (32-bit). Try installing Netscape 3 as Administrator then. When installed in the proper way, Netscape 3 can run under Windows 7 (32-bit) quite well.