KOMPX.COM or COMPMISCELLANEA.COM   

Remove duplicate lines in a file (Windows Command Prompt)

Removing duplicate lines from a file under Windows with a batch script run in CMD (Windows Command Prompt). The comparison is case-insensitive, and the first occurrence of each line is kept. Two variants:

  1. Variant #1
  2. Variant #2

Removing duplicate lines #1

Code for .BAT file:


@ECHO OFF
SETLOCAL EnableDelayedExpansion

:: Setting the file to deduplicate
SET "FILE=file.txt"

:: Setting and creating (or emptying) the output file
SET "OUTFILE=dedup.txt"
TYPE NUL > "%OUTFILE%"

:: Removing duplicate lines
FOR /F "USEBACKQ DELIMS=" %%G IN ("%FILE%") DO (
	REM Storing each line in a variable
	SET "LINE=%%G"
	REM Clearing the flag that tracks whether the line already exists
	SET "FOUND="
	REM Comparing the current input line to each line already in the output file and setting the flag to 1 if a match is found
	FOR /F "USEBACKQ DELIMS=" %%H IN ("%OUTFILE%") DO (
		IF /I "!LINE!"=="%%H" SET "FOUND=1"
	)
	REM Appending the line, if found to be unique, to the output file
	IF NOT DEFINED FOUND (
		>> "%OUTFILE%" ECHO(!LINE!
	)
)

Notes

  1. The /I switch of the IF command, which makes the string comparison case-insensitive, may not work for characters beyond plain ASCII, such as Cyrillic. Lines in those characters are then deduplicated case-sensitively.
  2. The script does not work properly for lines containing exclamation marks (!): with delayed expansion enabled, they are consumed when %%G and %%H are expanded. In addition, FOR /F skips empty lines and, with the default EOL character, lines starting with a semicolon (;).
  3. On the whole, CMD is poorly suited to full-scale text processing. PowerShell was introduced in large part to address this.
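
The exclamation-mark limitation in note 2 comes from delayed expansion, and variant #1 can be written without it. The following is a minimal sketch rather than a hardened replacement: it assumes that the FOR variables %%G and %%H can be compared and echoed directly, and it relies on IF NOT DEFINED being evaluated at run time. The file name sample.txt and its contents are hypothetical, created only so that the sketch runs on its own.

```batch
@ECHO OFF
REM Delayed expansion stays off, so "!" in the data is left alone.
SETLOCAL DisableDelayedExpansion

REM Hypothetical sample input, created only to make the sketch self-contained.
SET "FILE=sample.txt"
> "%FILE%" (
    ECHO Hello!
    ECHO hello!
    ECHO World
    ECHO HELLO!
)

REM Setting and emptying the output file
SET "OUTFILE=dedup.txt"
TYPE NUL > "%OUTFILE%"

FOR /F "USEBACKQ DELIMS=" %%G IN ("%FILE%") DO (
    REM Clearing the flag for the current input line
    SET "FOUND="
    REM Comparing the FOR variables directly, without a !LINE! copy
    FOR /F "USEBACKQ DELIMS=" %%H IN ("%OUTFILE%") DO (
        IF /I "%%G"=="%%H" SET "FOUND=1"
    )
    REM Appending the line only if no match was found
    IF NOT DEFINED FOUND (
        >> "%OUTFILE%" ECHO(%%G
    )
)

TYPE "%OUTFILE%"
```

With the sample input above, dedup.txt should end up holding Hello! and World. The other limitations still apply: FOR /F skips empty lines and lines starting with a semicolon, and the comparison may be case-sensitive beyond plain ASCII.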

Removing duplicate lines #2

Code for .BAT file:


@ECHO OFF
SETLOCAL EnableDelayedExpansion

:: Setting the file to deduplicate
SET "FILE=file.txt"

:: Setting and creating (or emptying) the output file
SET "OUTFILE=dedup.txt"
TYPE NUL > "%OUTFILE%"

:: Removing duplicate lines
FOR /F "USEBACKQ DELIMS=" %%I IN ("%FILE%") DO (
    SET "LINE=%%I"
    REM A variable named FILE_<line> marks the line as already seen
    IF NOT DEFINED FILE_!LINE! (
        SET "FILE_!LINE!=1"
        >> "%OUTFILE%" ECHO(!LINE!
    )
)

Notes

  1. Seen lines are tracked as environment variable names, which CMD treats case-insensitively. For characters beyond plain ASCII, such as Cyrillic, the comparison may end up case-sensitive.
  2. The script does not work properly for lines containing exclamation marks (!) or equals signs (=).
  3. On the whole, CMD is poorly suited to full-scale text processing. PowerShell was introduced in large part to address this.
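
One possible way to make variant #2 tolerate exclamation marks is to keep delayed expansion off while the FOR variable is expanded and switch it on only around the commands that need !LINE!. This is a sketch under stated assumptions, not a definitive fix: sample.txt and its contents are hypothetical, lines containing equals signs (=) are still not handled, and lines containing double quotes have not been considered.

```batch
@ECHO OFF
REM Delayed expansion is off by default, so %%I keeps any "!" it contains.
SETLOCAL DisableDelayedExpansion

REM Hypothetical sample input, created only to make the sketch self-contained.
SET "FILE=sample.txt"
> "%FILE%" (
    ECHO Stop!
    ECHO stop!
    ECHO Go
    ECHO STOP!
)

REM Setting and emptying the output file
SET "OUTFILE=dedup.txt"
TYPE NUL > "%OUTFILE%"

FOR /F "USEBACKQ DELIMS=" %%I IN ("%FILE%") DO (
    SET "LINE=%%I"
    REM Switching delayed expansion on only for the test and the write
    SETLOCAL EnableDelayedExpansion
    IF DEFINED FILE_!LINE! (
        ENDLOCAL
    ) ELSE (
        >> "%OUTFILE%" ECHO(!LINE!
        ENDLOCAL
        REM Marking the line as seen after ENDLOCAL, so the marker persists
        SET "FILE_%%I=1"
    )
)

TYPE "%OUTFILE%"
```

The marker variable is set after ENDLOCAL on purpose: anything set between SETLOCAL and ENDLOCAL is discarded when the inner scope ends. With the sample input above, dedup.txt should end up holding Stop! and Go.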

Links

  1. EnableDelayedExpansion ss64.com/nt/delayedexpansion.html
  2. FOR /F ss64.com/nt/for_f.html
  3. IF ss64.com/nt/if.html
