Philip Steiner: 2010

Thursday, December 2, 2010

Smalltalk Lives

Somewhere in the heart of Reqpro, Smalltalk lives on. Got this notice when I quit:

Thursday, October 28, 2010

Quote of the day

functionality is an asset, but code is a liability
- Ted Dziuba on Taco Bell Programming

Friday, September 17, 2010

I don't understand why most generic PC keyboards, you know the kind that plug into a USB port on your PC, lack on-board USB ports themselves. It would be so convenient for things like keyboard lights. Back in the day, my Mac keyboard had Apple Desktop Bus ports, so I could plug the mouse directly into the keyboard.

Monday, September 13, 2010

Compute length of a string in a batch file

rem quote the whole assignment so the quotes are not stored (and counted) in the value
set "myvar=some string"
rem compute the length of a string
set "#=%myvar%"
set length=0
:loop
if defined # (
   rem shorten string by one character
   set "#=%#:~1%"
   rem increment the string count variable %length%
   set /A length += 1
   rem repeat until string is empty
   goto loop
)
echo myvar is %length% characters long!

Thanks to poster Secret_Doom on computing.net

Tuesday, September 7, 2010

How to fix "weird" characters in XML

I had been aggravated by an extraneous a-circumflex character (Â, Windows keymap Alt-0194, Unicode '\u00C2', HTML character entity &Acirc;) preceding all the degree symbols in XML output from C++ code using MSXML. For historical reasons, the degree symbol is defined in code as a literal const std::string = "°", which is later added to an XML document tree and then written out using the fout() function.
Here's an abridged sample:

<?xml version="1.0"?>
<Data>
<Temperature>Â°</Temperature>
</Data>

ISO-8859-1 (Latin-1) is a single-byte encoding: every character it defines occupies exactly one byte. It appears that MSXML stores the degree character in the XML document as a two-byte encoding. UTF-8 is a variable-width encoding: the ASCII characters with codes 0 to 127 keep their single-byte values, e.g. the capital letter X is 0x58 in both ASCII and UTF-8, while any character above 127 takes two or more bytes.
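The difference in byte counts is easy to check. The following is a minimal sketch in Python (chosen only for illustration; the code in question was C++ with MSXML), and the helper name hex_bytes is made up for this example:

```python
# Compare how two characters are encoded in UTF-8 and in ISO-8859-1.
samples = {"X": "capital letter X", "\u00b0": "degree sign"}


def hex_bytes(text, encoding):
    """Return the encoded bytes of text as an upper-case hex string."""
    return text.encode(encoding).hex().upper()


for char, name in samples.items():
    print(name,
          "UTF-8:", hex_bytes(char, "utf-8"),
          "ISO-8859-1:", hex_bytes(char, "iso-8859-1"))
```

The letter X comes out as the single byte 58 in both encodings, while the degree sign is C2B0 in UTF-8 and B0 in ISO-8859-1.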

The hex value C2B0 (1100 0010 1011 0000) is the two-byte UTF-8 encoding of the degree symbol: 0xC2 is the "lead unit" and 0xB0 is the continuation byte. After some Googling, I found a very clear explanation from Andy Hassall of a similar problem:

But your PHP code may be trying to treat UTF-8 as single-byte ISO-8859-1.

A British pound symbol is two bytes in UTF-8 - it's U+00A3 which is 0xC2 0xA3
in UTF-8.

http://www.fileformat.info/info/unic...00A3/index.htm

If you tried to display this as ISO-8859-1 you'd get:

0xC2 = Latin capital A with circumflex
0xA3 = British pound symbol

http://en.wikipedia.org/wiki/ISO_8859-1

Mike Brown explains it this way:

Take, for example, the non-breaking space character, which in HTML we often write as "&nbsp;", a predefined (in HTML, not XML) entity reference defined as equivalent to "&#160;", which in turn is interpreted as the single non-breaking-space character. Different encoding schemes will represent this character as different bit sequences.

For example, in the "iso-8859-1" encoding, the non-breaking space character maps to the bit sequence 10100000, an 8-bit byte representing a value that we can also easily express as decimal 160 or hex A0. But in "utf-8", the non-breaking space maps to the bit sequence 11000010 10100000. If we interpret this as a pair of 8-bit bytes, we could say they represent the values hex C2 followed by A0 (194 and 160).

Now imagine you are the web browser, receiving an HTTP message containing an HTML document. All you see in the message is a stream of bits. How do you know what 11000010 10100000 means?

If you think the document is encoded using utf-8, you'll correctly interpret this sequence as one single NO-BREAK SPACE character (that's its Unicode name).
If you think the document is encoded using iso-8859-1, you will incorrectly interpret it as *two* characters: (0xC2) LATIN CAPITAL LETTER A WITH CIRCUMFLEX followed by (0xA0) NO-BREAK SPACE.

This was pretty much the same thing I was seeing. The XML declaration does not specify an encoding. An XML parser assumes UTF-8 in that case, but a text editor is free to fall back on its own default, which here was evidently ISO-8859-1. MSXML writes the degree character as two bytes: the UTF-8 lead unit 0xC2, followed by the continuation byte 0xB0. When the XML file is opened with a text editor (jEdit), the first byte is displayed as the a-circumflex, which happens to be 0xC2 in ISO-8859-1, and the second as the degree symbol, which is 0xB0.
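The misreading can be reproduced in a few lines. This is a sketch in Python, not the MSXML code path itself, and the function name misread_as_latin1 is invented for the example:

```python
# Encode text as UTF-8, then (wrongly) decode the same bytes as ISO-8859-1.
def misread_as_latin1(text):
    """Return what UTF-8 encoded text looks like when read as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# Pound sign, no-break space, degree sign.
for char in ("\u00a3", "\u00a0", "\u00b0"):
    garbled = misread_as_latin1(char)
    # Show the code points of the characters the reader ends up with.
    print([f"U+{ord(c):04X}" for c in garbled])
```

Each single character turns into two, and the first is always U+00C2, the a-circumflex.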

I fixed the problem by specifying the encoding in the XML declaration for MSXML:

<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Temperature>°</Temperature>
</Data>

It works equally well if the encoding is specified as ISO-8859-1. Now that the XML declaration states the encoding explicitly, the degree symbol displays correctly, and with ISO-8859-1 MSXML renders the degree character as a single byte, 0xB0.
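For comparison, here is a sketch of the same idea using Python's standard xml.etree.ElementTree module instead of MSXML (so it illustrates the principle, not MSXML's behavior). It serializes the sample document with an explicit encoding declaration in both encodings; the xml_declaration argument assumes Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

# Build the same small document as in the sample above.
root = ET.Element("Data")
ET.SubElement(root, "Temperature").text = "\u00b0"  # degree sign

# Serialize with an explicit encoding declaration in each encoding.
as_utf8 = ET.tostring(root, encoding="utf-8", xml_declaration=True)
as_latin1 = ET.tostring(root, encoding="iso-8859-1", xml_declaration=True)

print(as_utf8)
print(as_latin1)

# A parser that honors the declaration recovers the same character either way.
for data in (as_utf8, as_latin1):
    assert ET.fromstring(data).find("Temperature").text == "\u00b0"
```

The UTF-8 output contains the two bytes C2 B0 and the ISO-8859-1 output the single byte B0, but because each file declares its own encoding, a reader that respects the declaration sees a plain degree symbol in both.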

Wednesday, April 7, 2010

one-liner: mount the current directory to a temporary top-level drive letter

Suppose you're doing command line stuff in some ridiculously deep folder, e.g. C:\some path\to some\ridiculously deep\folder\way down\below a reasonable\level, so the default command prompt pushes way over to the right edge of the screen, any output containing the path wraps 3 times in the window, and it's just really hard to read.

The following one-liner will "substitute" a temporary drive x: for the current path, effectively mounting C:\some path\to some\ridiculously deep\folder\way down\below a reasonable\level as x:\

for /f "tokens=*" %i in ('cd') do @subst x: "%i"

Delete the temporary drive with the command subst /d x:

Tuesday, April 6, 2010

Substituting characters inline

Use the following technique to make character substitutions on the fly from DOS command results.

Given a file listing with underscores in the file names, e.g.

foo_first.txt
foo_second.txt
foo_third.txt

This is ugly, but it works. Start a new command shell with delayed variable expansion option cmd /v, then run the following one-liner, e.g. to replace the underscores with spaces:

C:\>for /f %i in ('dir /b') do @set x=%i & echo !x:_= !

The result will be:

foo first.txt
foo second.txt
foo third.txt

The first trick, delayed variable expansion, is enabled by starting a new command shell with the /v option. This lets you read a variable's value at the moment each command runs by surrounding the variable with bangs ! instead of percents %. With the default immediate expansion, %x% is replaced once, when the whole line is parsed, so the echo would not pick up the new value each time dir /b returns a line. Only the value x already held, the last one set by a previous run, is echoed each time:

foo third.txt
foo third.txt
foo third.txt

The second trick, character substitution, is done by !x:_= !, which replaces every underscore in x with a space.

You can substitute other characters, e.g. !x:abc=def! would turn abcxyz.txt into defxyz.txt
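For readers who don't live in cmd, the effect of the loop can be sketched in Python. This only mirrors the result, roughly, and is not how cmd does it; the function name substitute and its defaults are invented for the example:

```python
# File names from the listing above.
names = ["foo_first.txt", "foo_second.txt", "foo_third.txt"]


def substitute(name, old="_", new=" "):
    """Replace every occurrence of old with new, roughly like !x:old=new! in cmd."""
    return name.replace(old, new)


for name in names:
    print(substitute(name))
```

Calling substitute("abcxyz.txt", "abc", "def") gives defxyz.txt, matching the last example.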

Wednesday, March 17, 2010

The Metric Law of 90s

Thanks to Bruce F. Webster, who reminded me of this hoary old saying:

The first 90 percent of a development project takes 90 percent of the schedule.

The remaining 10 percent of the project takes the other 90 percent of the schedule.

Note (15 April 2010) - discovered this is also known as the Ninety-ninety Rule

Thursday, March 11, 2010

Experience

This notion puzzled me for some time: Have you got X years' experience [cooking, programming computers, running a business], or one year, repeated X times?

Simply, have you grown? Are you doing something new this year, something you couldn't have comprehended X years ago? Did you acquire a new skill, learn a new point of view, make a new friend, visit a new place?

If not, you're on the hamster wheel. Time to bail out.

Tuesday, January 5, 2010

How to run SQL*Plus one liners on Windows

Suppose you have a simple one line query to run on your Oracle database. Oddly, SQL*Plus doesn't take SQL statements as a command line parameter.

For a simple one liner, it's often too much work to start up sqlplus, log on, type in the SQL statement, then exit. It may also be too trivial to save the SQL statement to a file that you then call from sqlplus, e.g. C:\>sqlplus user/pass @myfile.sql.

To do it all in one line on the command prompt, echo the statement into sqlplus:

echo select count(*) from mytable; | sqlplus user/pass

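This technique would probably also work on a Unix box, provided the statement is quoted to protect the parentheses, asterisk and semicolon from the shell, but I don't have an Oracle db installed on a Unix box to test it out.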
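A platform-neutral way to do the same piping is to hand the statement to the program on standard input from a script. The following Python sketch shows the idea; the function name run_one_liner is hypothetical, and it has not been tried against a real Oracle installation:

```python
import subprocess


def run_one_liner(statement, command):
    """Feed one statement to a command on standard input and return its output."""
    result = subprocess.run(
        command,
        input=statement + "\n",  # plays the role of the echo ... | pipe
        capture_output=True,
        text=True,
        check=True,              # raise if the command exits with an error
    )
    return result.stdout
```

Under those assumptions, run_one_liner("select count(*) from mytable;", ["sqlplus", "user/pass"]) would correspond to the one-liner above. Because the command is passed as a list, no shell is involved and the statement needs no quoting.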
