Wednesday, March 21, 2012

FOR command: The DOS Sonic Screwdriver

When I was younger I enjoyed the British TV show "Doctor Who".  The imports we got in the States were the seasons where Tom Baker starred as the eponymous, Dalek-fighting Time Lord.  The shows were entertaining but had the deus ex machina of the sonic screwdriver that the Doctor would use for everything from picking locks to picking up women. ("Yes that is a sonic screwdriver in my pocket and I am happy to see you.")  I used to dream of having a single tool that could do all kinds of useful things.

Then I found the FOR command in Windows XP and my dreams were made reality -- in a clunky, geeky, command-line sort of way.  (Adult life never really turns out like you imagined it when you were a kid.)  What does the FOR command do?

Runs a specified command for each file in a set of files.

FOR %variable IN (set) DO command [command-parameters]
  %variable  Specifies a single letter replaceable parameter.
  (set)      Specifies a set of one or more files.  Wildcards may be used.
  command    Specifies the command to carry out for each file.
  command-parameters
             Specifies parameters or switches for the specified command.

So (set) is one or more filenames (which may include wildcards) on which we can perform a command.
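
As a minimal sketch of the basic form, the line below echoes the name of every .txt and .log file in the current directory.  The file patterns are only an illustration.  It is written for a batch file, so the variable is %%F; typed at the command prompt it would be %F.

```batch
@ECHO OFF
:: Batch-file form: the loop variable is written %%F (use %F at the prompt).
:: The *.txt and *.log patterns are illustrative, not from the original post.
FOR %%F IN (*.txt *.log) DO ECHO Found %%F
```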

Well that's fine for working on a bunch of files but what if I need to work on a bunch of directories?

FOR /D %variable IN (set) DO command [command-parameters]
    If set contains wildcards, then specifies to match against directory
    names instead of file names.

What if I need to recursively look through all of these directories?

FOR /R [[drive:]path] %variable IN (set) DO command [command-parameters]
    Walks the directory tree rooted at [drive:]path, executing the FOR
    statement in each directory of the tree.  If no directory
    specification is specified after /R then the current directory is
    assumed.  If set is just a single period (.) character then it
    will just enumerate the directory tree.

But wait, I remember using a FOR command in BASIC to loop through a bunch of integers.

FOR /L %variable IN (start,step,end) DO command [command-parameters]
    The set is a sequence of numbers from start to end, by step amount.
    So (1,1,5) would generate the sequence 1 2 3 4 5 and (5,-1,1) would
    generate the sequence (5 4 3 2 1)
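
The /D, /R, and /L switches described above can be tried with a short batch file like this one.  It is a sketch only, and C:\Temp is a hypothetical path chosen for illustration.

```batch
@ECHO OFF
:: Batch-file form, so loop variables are written %%A rather than %A.

:: /D: list the subdirectories of the current directory.
FOR /D %%A IN (*) DO ECHO Directory: %%A

:: /R: walk the tree under C:\Temp (hypothetical path) and list every .log file.
FOR /R "C:\Temp" %%A IN (*.log) DO ECHO Log file: %%A

:: /L: count from 1 to 5 in steps of 1.
FOR /L %%A IN (1,1,5) DO ECHO Number: %%A
```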

That's all good stuff, if fairly pedestrian.  But the next option is the one I use all the time and starts to crack open the sonic screwdriver capability of the FOR command.

FOR /F ["options"] %variable IN (file-set) DO command [command-parameters]
FOR /F ["options"] %variable IN ("string") DO command [command-parameters]
FOR /F ["options"] %variable IN ('command') DO command [command-parameters]

    filenameset is one or more file names.  Each file is opened, read
    and processed before going on to the next file in filenameset.

These seem like different functions but they get grouped together because they provide similar input and use the same ["options"] (discussed below).  But look at what is available now.  I can provide a set of one or more files, each of which will be opened, parsed line by line, and acted upon.  I can provide a string that will get parsed and acted upon.  But most importantly I can provide a command and have the output of that command parsed and acted upon.  The command can be a native DOS function or some other command line utility that produces output.  This is huge! 
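
Here is a small sketch of the command form, assuming the hostname utility is available on your machine.  With no options given, FOR /F hands the first whitespace-delimited token of each output line to the variable.

```batch
@ECHO OFF
:: Run hostname and capture the first token of its output in %%A.
FOR /F %%A IN ('hostname') DO ECHO This machine is %%A
```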

Before we get to some nifty screwdriving let's take a look at the options.

        eol=c           - specifies an end of line comment character
                          (just one)
        skip=n          - specifies the number of lines to skip at the
                          beginning of the file.

These are easy to grasp.  Use one character to mark comment lines in the input file: a line that begins with that character is skipped (although you can use it for other purposes).  Skip past the first one or more lines of the input to avoid processing header information.  It would be handy if there were a way to mark the end of the input file but it is up to us to process that in our code.

        delims=xxx      - specifies a delimiter set.  This replaces the
                          default delimiter set of space and tab.

Use one or more characters to parse the input.  This lets me process a comma separated .csv file but with a little creativity you can do much more.
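
A minimal sketch of delims= on a made-up CSV record follows.  With only delims= specified the variable still receives just the first token, so this echoes Smith.  One thing to watch for: consecutive delimiters are treated as a single delimiter, so empty CSV fields get swallowed.

```batch
@ECHO OFF
:: "Smith,John,Accounting" is a made-up record used only for illustration.
:: The comma replaces the default space/tab delimiters.
FOR /F "delims=," %%A IN ("Smith,John,Accounting") DO ECHO Last name: %%A
```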

        tokens=x,y,m-n  - specifies which tokens from each line are to
                          be passed to the for body for each iteration.

So from the input I get one or more lines of text that are going to be parsed by breaking them into tokens based on the delimiters specified.  For example parsing the string "A B C D" will produce 4 tokens, one for each letter.  In the FOR command I specify a variable (%X in the following examples) that by default will be assigned to the first token (A).  But if I want the variable to be assigned to the third token I would specify "tokens=3" and my variable %X will have the value C.

I can also generate multiple variables that follow in alphabetical sequence and assign them values based on the tokens I select.  So if I specify "tokens=2,4" the variable %X will have the value B and the variable %Y will have the value D.  Or I can specify a range so "tokens=2-4" will make %X=B, %Y=C, and %Z=D.

I can also use an * to generate one final variable whose value will be the rest of the unparsed line of text.  So "tokens=1,2*" will make %X=A, %Y=B, and %Z=C D.  Using "tokens=1*" will make %X=A and %Y=B C D.  "Tokens=*" will prevent parsing completely and make %X=A B C D.


This is another example with the same tokens= values described above.  Notice in the last 2 examples there is no token to assign to the variables at the end so the ECHO command just displays the variable name as if it were a string. 
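
The tokens= cases described above can be tried with a batch file like the sketch below, which parses the literal string "A B C D".  Because it is a batch file the variables are written %%X, %%Y, and %%Z.  The slashes in the output are just separators I added to show where one variable ends and the next begins.

```batch
@ECHO OFF
:: Parse the literal string "A B C D" with different tokens= values.
FOR /F "tokens=3"    %%X IN ("A B C D") DO ECHO tokens=3    : %%X
FOR /F "tokens=2,4"  %%X IN ("A B C D") DO ECHO tokens=2,4  : %%X / %%Y
FOR /F "tokens=2-4"  %%X IN ("A B C D") DO ECHO tokens=2-4  : %%X / %%Y / %%Z
FOR /F "tokens=1,2*" %%X IN ("A B C D") DO ECHO tokens=1,2* : %%X / %%Y / %%Z
FOR /F "tokens=1*"   %%X IN ("A B C D") DO ECHO tokens=1*   : %%X / %%Y
FOR /F "tokens=*"    %%X IN ("A B C D") DO ECHO tokens=*    : %%X
```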

An important point about the FOR command is that because it can return a sequence of variables it only allows single character variable names.  I tend to start with %A so I can get as many tokens as I might need.  If I need to nest FOR commands in a pipeline I usually start later in the alphabet for the FOR commands later in the chain to avoid collision.

Now scroll back up a bit and notice that the IN part of the FOR /F command can be a command, a string, or a file set.  And notice that the string to be parsed will be in double quotes.  But what if I have a file set that has a filespec that contains spaces?  Well, those filespecs will have to be enclosed in double quotes, otherwise they will be seen as different filespecs.  But if the filespecs are enclosed in quotes won't they be confused for strings?  Hmmmm... why, yes they would.  Well how do we get around this problem?  I'm glad you asked.

        usebackq        - specifies that the new semantics are in force

This option will use the backquote to specify an executable command, a single quote to specify a string, which leaves double quotes available for enclosing file names that contain spaces.  Those guys at Microsoft think of everything, don't they?
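
A sketch of the three quoting styles under usebackq follows.  "My Servers.txt" is a hypothetical file name containing a space, so the first line only does something useful if such a file exists in the current directory.

```batch
@ECHO OFF
:: Double quotes: a file name that contains spaces (hypothetical file).
FOR /F "usebackq" %%A IN ("My Servers.txt") DO ECHO From file    : %%A

:: Single quotes: a literal string to parse.
FOR /F "usebackq" %%A IN ('one two three') DO ECHO From string  : %%A

:: Backquotes: a command whose output is parsed.
FOR /F "usebackq" %%A IN (`ver`) DO ECHO From command : %%A
```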

So now we have a bunch of arrows in our quiver.  Let's go shoot some stuff.

Who is your computer talking to?  The netstat command will show all ports your computer has open:
I have my browser open to google.com so the IPv4 addresses are Google's servers.  Let's parse that output and see how the traffic gets from my computer to Google.

FOR /F "skip=4 tokens=1-4" %A in ('netstat -n') do IF %D==ESTABLISHED FOR /F "delims=:" %X in ('echo %C') do tracert %X

This all goes on a single command line even if it word wraps in your display.  In the first FOR command I am skipping the first 4 lines so I can ignore the column headings.  I get 4 tokens broken up by white space.  I am only interested in established connections so my IF checks %D, the fourth token, that shows the connection state.  If netstat tells me the connection is ESTABLISHED then I will parse the third token (variable %C) which has the IP address and port.  I use a second FOR command to echo the IP:port value and have FOR split the string at the colon to get just the IP address.  I use that as the parameter for the tracert command which shows each hop on the way from my PC to Google.

Well, that is interesting but not terribly useful.  Here is something I have actually used on my job.  I needed a way to get the list of users from a domain global group.  The NET GROUP command was selected because it is available on every platform.  But the output from the command lists the users in three columns which isn't useful if you need to do something else with the information like drop it into a spreadsheet or pipe the account names into another command.  So I gave the requestors this:

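@ECHO OFF
:: Quit if no group name was passed to the batch file.
IF %1.==. GOTO :Done

:: Skip the header lines, then hand the three name columns to :DumpEm.
FOR /F "skip=8 tokens=1-3*" %%I IN ('net group %1 /domain') DO CALL :DumpEm %%I %%J %%K
GOTO :Done

:DumpEm
:: The footer line starts with "The", which marks the end of the names.
IF %1.==The. GOTO :Done
:: Echo each column that actually holds a name, one per line.
ECHO %1
IF NOT %2.==. ECHO %2
IF NOT %3.==. ECHO %3

:Done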

I made the font small on the FOR command so it would fit into a single line to avoid confusion.  In this FOR command I'm skipping the header lines and grabbing the three columns of names.  (I will gloss over how I handle the input to the batch file for now.  That will be the subject of a future post.)

One thing to note is that because this is run from a batch file and not the command line the variables in the FOR command have to use two percent signs (%%).  In the IF statements the first thing I do is see if we are at the end of the output from the NET GROUP command.  For the other IF statements I only ECHO output if there is data.  (Sometimes you will see examples like IF NOT "%1"=="" but really all the IF command does is compare two strings.  If my variable has no value then "%1" resolves to "" and %1. resolves to a lone period.  Either way I have verified the empty string, and since I am more efficient if I type less I use the technique shown in my batch file.)


This shows the results of a normal NET GROUP command to compare it to the results of the ShowUsers.cmd file.  So I achieved the goal of skipping the header and footer and getting all the names in a single column.  Mission accomplished.

One final example that shows the directory parsing capabilities of the FOR command.  On our NTFS file servers I was asked to dump the Access Control Lists (ACLs for you cool guys, "who has access to what" if you are in upper management).  I just needed a simple list for the top level folders so I came up with this.

:: Show the ACLs for the top level folders on each file server data drive
::

SETLOCAL
SET AdmFS=\\FS0111\E$ \\FS0113\E$ \\FS0115\E$
Set FN=FileServerACLs.txt

IF EXIST %FN% DEL /Q %FN%

FOR %%A in (%AdmFS%) DO FOR /D %%B in (%%A\*.*) DO CACLS "%%B" >> %FN%

I can't show you the output but I will describe what happens.  The variable %AdmFS% lists the administrative shares on three file servers.  The variable %FN% has the name of the output file which gets cleared by the DEL command each time the script runs.

Now I chain together two FOR commands.  The first one parses %AdmFS% to call the second FOR command once for each file server.  The second FOR command lists the top level folders.  Because the folder names may contain spaces the variable from the second FOR command is enclosed in quotes so CACLS will correctly process its value.  The results of CACLS are redirected using >> so they always append to the output file.

The output from CACLS isn't pretty and fortunately I wasn't asked the follow up of having to list all the users in all of the groups that were output.  If I were working this project today I would use icacls.exe instead because the output is cleaner and the utility has more features.  And I would use PowerShell instead of DOS because it provides more capabilities for parsing results and creating friendly reports.

Didn't I say I would talk about PowerShell in the first blog post?  Isn't it about time I started?

Wednesday, March 14, 2012

Environment Under the Hood

In the last post I talked about several ways environment variables get created.  But a bunch of environment variables are available just by running the operating system.  Open a command prompt and enter the command SET to see all of the available variables.
Handy tip: If you follow SET with a letter you will see all of the variables that start with that letter.  In the example above SET L would show just %LOCALAPPDATA% and %LOGONSERVER%.

So where do these already available environment variables come from?  Well it looks a little something like this:
OK, not quite. 

There are 3 kinds of environment variables automatically available.  There are variables whose values are calculated at logon such as %USERNAME% and %COMPUTERNAME%.  These variables are static throughout the current session.

There are also variables that are static across sessions such as %OS% and %PATH%.  You can find these variables by going to computer Properties > Advanced > Environment Variables
Note there are two sections.  User variables are just available to the current user while System variables are available to all users on the computer.

Sometimes it is fun and informative to look under the hood.  How does the operating system remember these variables and their values between sessions?  If you said, "the registry", you win the prize.
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment and HKCU\Environment are where to check.  If you need to you can hack the registry to create new system variables.
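
If you would rather look than hack, a read-only sketch using the REG QUERY command lists both sets of stored variables without changing anything.

```batch
@ECHO OFF
:: System variables, available to all users on the computer.
REG QUERY "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment"

:: User variables, available only to the current user.
REG QUERY "HKCU\Environment"
```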

And note that even though these variables are static you can still change the values in your batch file.  It is not generally recommended but you can do it.  Most commonly a batch file will append a folder to the PATH with the command:

PATH %PATH%;C:\Some\New\Folder

So that's two types of variables.  But I said there were three.  Was I lying?  After all, I did mislead you earlier about where variables come from as a cheap excuse to show a Monty Python clip.  But I'm not lying, there really are three.

The third type of automatically available variable isn't displayed with the SET command.  These variables are not static during your session but are calculated as needed.  They are described at the end of HELP SET (or SET /? if you prefer).

%CD% - expands to the current directory string
%DATE% - expands to current date using same format as DATE command.
%TIME% - expands to current time using same format as TIME command.
%RANDOM% - expands to a random decimal number between 0 and 32767.
%ERRORLEVEL% - expands to the current ERRORLEVEL value
%CMDEXTVERSION% - expands to the current Command Processor Extensions
    version number.
%CMDCMDLINE% - expands to the original command line that invoked the
    Command Processor.

These variables can be mangled just like any other variable.
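
A quick sketch that shows a few of these calculated variables in action.  Run it twice and the time and random number should change even though nothing in the script SET them.

```batch
@ECHO OFF
:: None of these variables appear in the output of SET; they are computed on use.
ECHO Current directory : %CD%
ECHO Date and time     : %DATE% %TIME%
ECHO Random number     : %RANDOM%
```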

%CMDEXTVERSION% is used to determine what commands are available.  The value is 1 if the operating system is Windows NT and is 2 if the operating system is Windows 2000 or later.  If you check this variable and the value is 1 you are working with a limited command set and you might be wearing clothes that are out of style.

On Windows 7 and 2008 there is another variable named %HIGHESTNUMANODENUMBER% which is the highest NUMA node number on the machine.  This is handy information in a multi-processor environment with NUMA support and multi-threaded applications.  But for batch file processing leveraging this variable is probably overkill.

Commenters: Can you think of a reasonable use for the %HighestNumaNodeNumber% variable?  Or is it like the pizza shop that offers pineapple as a topping knowing full well that nobody ever orders pineapple as a topping?

Saturday, March 10, 2012

Slice and Dice Environment Variables

http://www.cherifrost.com/?tag=slap-chop

There are 2 types of environment variables and the options available for creating substrings on those variables are different.  Environment variables that are preceded with a single % sign are passed as arguments into the batch file or subroutine or from the FOR command.  Environment variables that are bracketed by % signs are created by the operating system or with the SET command.

OS and SET variables:
%var:str1=str2% Replace "str1" with "str2" in %var%. If "str1" starts with "*" then replace text up to "str1"
%var:~a,b% Substring "b" characters long starting at offset "a"
%var:~a% Substring from character "a" to the end
%var:~-a% Substring of the last "a" characters
%var:~a,-b% Substring starting at "a" of all but the last "b" characters
Remember that the first character of the string is offset zero. So %var:~0% is the same as %var%.

Parameters and FOR variables:
%~X remove quotes from var %X
%~fX full path
%~dX drive letter only
%~pX path only
%~nX file name only
%~xX extension only
%~sX short name
%~aX attributes of file
%~tX date/time of file
%~zX size of file
Those modifiers can be combined. So %~ftzaX will give a directory-like output for the variable %X showing the full path, date/time, size, and attributes.
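
As a sketch of the combined form, this batch file produces a DIR-like listing of every file in the current directory.  The *.* pattern is just an illustration.

```batch
@ECHO OFF
:: Batch-file form, so the FOR variable is %%F and the modifiers sit after the tilde.
:: a = attributes, t = date/time, z = size, f = full path.
FOR %%F IN (*.*) DO ECHO %%~ftzaF
```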

All well and good but what about if you have a parameter variable from which you need a positional substring? Use the SET command to assign it to a substringable variable.
SETLOCAL
SET X=%1
ECHO %X:~0,1%
This batch file will echo the first character of the string passed as the parameter.

What about the opposite case where you have a SET variable but you need to parse it as a file path?  I use a CALL statement to a subroutine which turns the variable back into a numbered parameter.
SETLOCAL
SET X=C:\windows\cmd.exe
CALL :Parse %X%
:: Stop here so the script does not fall through into :Parse a second time.
GOTO :EOF

:Parse
ECHO %~d1 %~p1 %~n1 %~x1
The variable %X% becomes my %1 parameter in the subroutine, and I can parse that to get the drive, path, filename, and extension.

So let's put it all together to demo all the slapping and chopping. A couple of notes on the demo below. In the first part of the script the "&" lets you combine multiple commands on a single line. In this case I'm just adding a comment. (I prefer using "::" to make comments rather than the "REM" statement.) In the bottom part of the script I'm parsing the %0 variable which is the script file as its own parameter.

@ECHO Off
SETLOCAL
SET X=You're gonna love my nuts
ECHO %X:~13,4%   & :: Word in the middle
ECHO %X:~13%     & :: Words to the end
ECHO %X:~-7%     & :: Last 7 characters
ECHO %X:~7,-8%   & :: Skip first 7 and last 8 chars
ECHO %X:'= a%    & :: Replace apostrophe
ECHO.

ECHO Full path of this batch : %~f0
ECHO Drive letter            : %~d0
ECHO Path without drive      : %~p0
ECHO Filename                : %~n0
ECHO Extension               : %~x0
ECHO Short name              : %~s0
ECHO File attributes         : %~a0
ECHO Date/time of last save  : %~t0
ECHO Size in bytes           : %~z0

http://www.dostips.com/  BTW, this guy has a bunch of cool tips and tricks if you get stuck having to work some complex batch files.

Commenters:  How else do you use substring capabilities for DOS environment variables?

Sunday, March 4, 2012

Why Script It?

Before we get into the nuts-and-bolts of making scripts to make you more productive, I want to take a minute to consider why you would or would not build a scripted solution.  I run into many IT professionals who have great technical knowledge but never develop skills at automating their tasks.  They get the job done, but not always efficiently.

There are valid reasons for not scripting your solutions.  Maybe it really is a task you will do once.  Or something you do so infrequently that there is little value to spending the time to create, test, and debug your script.  Or maybe it is a task for which there are no facilities that allow scripting such as a GUI only interface.

But more often than not you can build an automated solution.  And the more you build scripts the more adept you become at doing it, which makes you more productive, which means you (hopefully) make more money.

Repeatability
In IT if you have to do something once you usually have to do it dozens or hundreds or thousands of times over the course of a contract or project or career.  By scripting your solution you ultimately save time. 

Say a task takes a minute to do manually.  And assume it takes you an hour to create, test, and debug a script that does the same task in a second.  If you have to do that task once a week you get your time back in a little over a year (about 61 runs at 59 seconds saved each), and every run after that is time earned toward an extra lunch hour.

Consistency
Sometimes consistency is just a stylistic issue such as ensuring consistent capitalization or naming conventions.  But it can be critical in situations where case sensitivity is important, objects need to be in specific locations, or other systems require reliable output from your process.

Delegation
If you need to pass tasks on to others it is to your advantage to make those tasks as simple and bulletproof as possible.  Sometimes the people performing the tasks will not have the best technical skills, sometimes they won't have the extra time, and sometimes they just won't care.  If you can give your delegate the ability to push a few buttons you have a better chance for success than asking them to follow a detailed list of manual tasks.

Reliability
By creating scripts in a test environment you can make sure everything works as expected before you run the tasks against production systems and live data.  By working out issues beforehand you can reduce (and hopefully eliminate) negative impact to end users and reliance on backups.

Change Management
By showing the steps you will take before you take them, and showing confirmation that the steps produce the desired results in your test environment, you generate a clear procedure that can be reliably approved by internal change management facilities.

Auditability
I always make sure my scripts produce output showing the progress and results of the scripts.  I then save the logs with the project or change management records.  Two years from now when I get blamed for or questioned about something my scripts did I can show a clear audit trail to refute the allegations.  And if it is my fault I have the script and output to provide the required details to make corrections.

Commenters: What other reasons do you have for creating scripted solutions for your administrative tasks?