Ascii

ascii is a simple user-written Stata program by Adrian Mander. The program displays the ASCII characters together with their numbers, using the SMCL markup language. In this article I go through the program and explain how it works.

Introduction

ascii is a user-written package, developed by Adrian Mander, that simply displays the ASCII characters and their numbers using the SMCL markup language.
ASCII (pronounced /askee/), short for "American Standard Code for Information Interchange", is a character-encoding scheme. The complete ASCII table is printed below.

The image above is exactly what the ascii package is about: printing each ASCII number and showing its character. Here is an example.
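As a small illustration that does not depend on the package itself, Stata's built-in char() function and the SMCL {c #} directive both turn an ASCII number into its character. This is only a sketch of the idea, not the package's own code:

```stata
* char(n) returns the character whose ASCII number is n
display char(65)

* the SMCL directive {c #} displays the character with number #
display as text "65 = {c 65}"

* a short stretch of the table: numbers 65 to 70 next to their characters
forvalues n = 65/70 {
    display as text %3.0f `n' "  " as result char(`n')
}
```

The range 65/70 is arbitrary; any range of printable codes would do.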

Algorithm

The algorithm of the program can be simplified as described below:

ascii

Analysis of tabmiss package

Next, I explain the program code step by step to analyze how the program works.

The program begins by defining the name of the program, i.e. tabmiss, and the version of Stata that the program should be run under, i.e. version 7.0. The program uses standard Stata syntax and accepts a variable list and the if and in qualifiers for selecting observations. The varlist is optional, which means that if the user does not specify any variables, the program takes all the variables loaded in Stata into account. Similarly, the if and in qualifiers are optional.
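A minimal sketch of such a program header, reconstructed from the description above rather than copied from the package, could look like this. The name tabmiss_header is made up so that it does not clash with an installed tabmiss, and the display line is only there to show what the syntax command stored in the varlist macro.

```stata
* Hypothetical reconstruction, not the package source
capture program drop tabmiss_header
program define tabmiss_header
    version 7.0
    * square brackets make varlist, if and in optional;
    * an omitted varlist is filled in with all variables
    syntax [varlist] [if] [in]
    display "`varlist'"
end

sysuse auto, clear
* with an explicit variable list
tabmiss_header price mpg
* without one: every variable in the data set is listed
tabmiss_header
```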

The first row of the returned output is the table header, which gives the title of each column. It therefore has to be printed before the results of the calculations. Here is how the table and the column titles are defined. For creating the table, {hline #} was used to draw horizontal lines (# is the width of the line in characters). In addition, {c |} and {c +} were used to draw the vertical lines of the table. {c |} is the SMCL directive for a tall vertical line, and {c +}, which is placed under the {c |}, extends the vertical line created by {c |} and joins it to the horizontal line. In other words, {c +} is used to connect the vertical and horizontal lines.
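A header in this style can be sketched as follows. Only the titles Variable and Obs come from the text; the third title and the line widths are placeholders of mine, and I use the as text style directive here.

```stata
* title row: {c |} draws the vertical bar after the Variable column
display as text "    Variable {c |}     Obs" _col(30) "Missing"

* rule: 13 dashes, then {c +} directly under the bar, then the rest of the line
display as text "{hline 13}{c +}{hline 23}"
```

The 13 in {hline 13} matches the 13 characters that precede {c |} in the title row, so the {c +} lands exactly under the bar.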

You might wonder what _col(30) does. It simply prints the second part of the string starting at column 30, i.e. at the 30th character position. Let's go through this in more detail to make sure it is clear. The table begins by displaying the string "    Variable {c |}     Obs", which consists of 4 blank characters + Variable + 1 blank character + {c |} + 5 blank characters + Obs. Note that {c |} counts as only one character. In the example below, the blank characters are shown as hashtags.
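The count can be checked directly in Stata. In this sketch each # stands for one blank and a plain | stands in for {c |}, since both occupy a single character on screen:

```stata
* the header string with blanks made visible
display "####Variable#|#####Obs"

* 4 + 8 + 1 + 1 + 5 + 3 characters
display length("####Variable#|#####Obs")
```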

So overall this string is 22 characters long. _col(30) is a directive of the display command (not SMCL); it prints whatever follows starting at character position 30 and therefore skips 7 more characters (positions 23 to 29). Therefore the two commands below should yield identical results:
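A pair of commands of this kind, written here as my own illustration with X as a placeholder for the next column title, would be:

```stata
* 7 literal blanks between Obs and X
display as text "    Variable {c |}     Obs       X"

* the same gap produced by _col(30)
display as text "    Variable {c |}     Obs" _col(30) "X"
```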

The difference between the two commands in the example above is that the first adds 7 blank characters between Obs and X, whereas the second uses _col(30) to position the second of two separate strings. Now let's print the actual command that was used for creating the table.

To complete the table with the count and relative frequency of the missing and non-missing observations, tabmiss runs all the variables through a loop and prints the values in the table. In the loop, varlist is the local macro that holds the variable list. The example below demonstrates how to loop over a local macro.
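Outside a program, the same kind of loop can be tried by filling a local macro by hand. The variable names are taken from the auto data set that ships with Stata, and the loop variable is called i, as in the discussion further down:

```stata
sysuse auto, clear

* inside tabmiss the syntax command fills this macro; here it is set by hand
local varlist price mpg rep78

* i takes each variable name in turn
foreach i of local varlist {
    display "`i'"
}
```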

So how does the program count the number of missing observations in each variable? There are many solutions to this problem, but tabmiss creates a temporary binary variable as an indicator of missingness. The variable, which is named contar, is generated using the rowmiss() function of the egen command. Because rowmiss() is applied to one variable at a time, the temporary variable has a value of 1 for missing observations (rows) and 0 for non-missing observations.
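This step can be sketched for a single variable as follows. I use rep78 from the auto data set because it contains a few missing values; tabulate is added only to inspect the result.

```stata
sysuse auto, clear

* reserve a temporary variable name in the local macro contar
tempvar contar

* rowmiss() counts missing values per row; with one variable the result is 0 or 1
egen `contar' = rowmiss(rep78)

tabulate `contar'
```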

Once the temporary variable is generated, there are again different possibilities for counting the missing values. tabmiss obtains the count by running the summarize command on the observations that are equal to 1 and then using the r(N) scalar, which summarize returns automatically (type return list after the summarize command to see all the scalars that the command returns).

tabmiss saves the total number of missing observations in a local macro named faltan. The name itself has no special meaning to Stata; you could change it to anything (call it Mr_Carrot if you want!) and it would work the same. faltan is used later on for calculating the frequencies.
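The counting step described above might look like the following sketch, again using rep78 from the auto data set as a stand-in for the loop variable:

```stata
sysuse auto, clear
tempvar contar
egen `contar' = rowmiss(rep78)

* restrict summarize to the rows flagged as missing
quietly sum `contar' if `contar' == 1

* r(N) is the number of observations summarize used
local faltan = r(N)
display "missing observations: `faltan'"
```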

Note that quietly sum means quietly summarize; it does not sum up the variable!

To calculate the total number of observations (both missing and non-missing), tabmiss uses the same procedure that was used for counting the number of missing observations, i.e. using the summarize command to return the total number of observations in the r(N) scalar. The difference is that no logical expression is used to limit the command only to missing observations.

The summarize command uses the temporary variable, i.e. contar, rather than the actual variable `i'. This choice matters: summarize ignores missing values, so r(N) for `i' itself would count only the non-missing observations, whereas contar is never missing and therefore gives the total. Summarizing a binary variable is also unlikely to be slower than summarizing the original one (a difference you would notice only if you are running Stata on the same computer that you got on your birthday when you turned 14 and the data set includes tens of thousands of observations). The total number of observations is saved in a local macro named obser.
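A sketch of this step, with rep78 from the auto data set standing in for the loop variable, shows why the indicator gives the total while the variable itself would not:

```stata
sysuse auto, clear
tempvar contar
egen `contar' = rowmiss(rep78)

* no if qualifier: every row of the indicator is counted
quietly sum `contar'
local obser = r(N)
display "total observations: `obser'"

* for comparison: summarize skips the missing values of rep78 itself
quietly sum rep78
display "r(N) for rep78 itself: " r(N)
```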

To count the number of non-missing observations, the number of missing observations, which is stored in local `faltan', is subtracted from the total number of observations, which is stored in local `obser'. The resulting count of non-missing observations is stored in local nomiss.
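Continuing the same single-variable sketch (rep78 from the auto data set), the subtraction is one line:

```stata
sysuse auto, clear
tempvar contar
egen `contar' = rowmiss(rep78)
quietly sum `contar' if `contar' == 1
local faltan = r(N)
quietly sum `contar'
local obser = r(N)

* non-missing = total minus missing
local nomiss = `obser' - `faltan'
display "non-missing observations: `nomiss'"
```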

Once you understand what the `faltan', `nomiss', and `obser' macros contain, understanding this command becomes very simple. These locals hold numbers and can be used in arithmetic operations. As explained in the algorithm, the frequencies are calculated by dividing the number of missing observations by the total number of observations and multiplying the result by 100. The same procedure is used for calculating the frequency of non-missing observations.
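The arithmetic can be sketched as below. The macro names pmiss and pnomiss are my own placeholders, since the text does not say what the package calls them, and rep78 from the auto data set again stands in for the loop variable.

```stata
sysuse auto, clear
tempvar contar
egen `contar' = rowmiss(rep78)
quietly sum `contar' if `contar' == 1
local faltan = r(N)
quietly sum `contar'
local obser = r(N)
local nomiss = `obser' - `faltan'

* percentage of missing and of non-missing observations
local pmiss   = 100 * `faltan' / `obser'
local pnomiss = 100 * `nomiss' / `obser'
display %6.2f `pmiss' "   " %6.2f `pnomiss'
```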

So far, we have calculated the number of missing and non-missing observations as well as their frequencies. All that remains is to print these values in the columns that we defined at the outset of the program. Since this part is inside the loop, the program loops over the variables and completes the table row by row.

So what is display in text %12s supposed to mean? The content of the display command can be formatted, and the format syntax depends on the type of content. For string content, the format is % + number + s. Formatting the strings gives the column a constant width. In this example, the minimum width of each variable name is set to 12 characters. Therefore, if a variable name is 5 characters long, such as "price", it is preceded by 7 blanks so that the name is aligned to the right side of the column, where the line is drawn. For example, try the following command in your Stata:
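A pair of commands along these lines, using a width of 20 and two strings of my own choosing, shows both cases:

```stata
* a short string is padded on the left up to 20 characters
display %20s "price"

* a string longer than 20 characters is printed in full
display %20s "a_string_longer_than_twenty_characters"
```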

This example makes it clear that when the string is shorter than 20 characters, it is aligned to the right by adding blank characters to its left. However, the format does not truncate or fit strings that are longer than 20 characters.

So we have learned how to keep the variable names organized in the table, but what if the string (the variable name) is longer than the 12 characters specified by %12s in the program? If you remember, the Variable column was 13 characters wide (####Variable#{c |}). What happens if a variable name is longer than that? Obviously, it would deform the table. To avoid this problem, tabmiss also sets a maximum width for the variable name, which is 12 characters. The string function abbrev() is used to abbreviate names that are longer than 12 characters. abbrev(s,n) is usually used for abbreviating variable names, although it can abbreviate any string to n characters. In tabmiss this number is set to 12.
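The effect of abbrev() can be tried directly; the long name below is invented for the illustration:

```stata
* a name longer than 12 characters is shortened to exactly 12
display abbrev("a_very_long_variable_name", 12)

* a short name is returned unchanged, and %12s still right-aligns it
display %12s abbrev("price", 12)
```

Combining %12s with abbrev(s,12) therefore fixes the column at exactly 12 characters in both directions.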

tabmiss also formats the counted and calculated numbers. These all take numeric formats because each local macro holds either an integer (a count) or a frequency. The spacing between the numbers is added using blank strings. To make sure that you are creating a readable table, you should practice formatting the results carefully. For a detailed explanation, I refer you to the Stata User's Guide [U], section Formats: Controlling how data are displayed.
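Putting the pieces together, the whole procedure can be sketched as a short do-file. This is my own reconstruction from the steps above, not the package source: the macro names contar, faltan, obser and nomiss follow the text, while pmiss, the column titles Missing and Percent, the numeric formats and the column positions are assumptions.

```stata
sysuse auto, clear
local varlist price mpg rep78

* header: titles, then a rule with {c +} under the vertical bar
display as text "    Variable {c |}     Obs" _col(30) "Missing" _col(42) "Percent"
display as text "{hline 13}{c +}{hline 34}"

foreach i of local varlist {
    * indicator: 1 if the observation is missing, 0 otherwise
    tempvar contar
    quietly egen `contar' = rowmiss(`i')

    * missing count, total count, and their difference
    quietly sum `contar' if `contar' == 1
    local faltan = r(N)
    quietly sum `contar'
    local obser = r(N)
    local nomiss = `obser' - `faltan'
    local pmiss = 100 * `faltan' / `obser'

    * one table row: name in 12 characters, then the formatted numbers
    display as text %12s abbrev("`i'", 12) " {c |}" as result %8.0f `nomiss' _col(30) %7.0f `faltan' _col(42) %7.2f `pmiss'

    drop `contar'
}
```

The widths are chosen so that each number ends under the last letter of its column title; changing a title means adjusting the matching format or _col() position.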