An Interesting Site For Computer Languages

This is a fun and fascinating site for anyone who uses, or is thinking of using, a computer programming language. It has code snippets in different computer languages so you can quickly see how they compare, and decide which one looks easiest to use on any particular class of problem.

http://rosettacode.org/

Rosetta Code

Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another. Rosetta Code currently has 775 tasks, 167 draft tasks, and is aware of 589 languages, though we do not (and cannot) have solutions to every task in every language.

http://rosettacode.org/wiki/Copy_a_string

That “Copy a string” page, for example, lists 131 language samples.

Ada

Long long ago I thought of learning Ada, but didn’t. Still, it is similar enough to many other languages that I can read it OK. At one time, Ada was expected to replace all other languages for D.O.D. work. Don’t know how far that got…

Ada

Ada provides three different kinds of strings. The String type is a fixed length string. The Bounded_String type is a string with variable length up to a specified maximum size. The Unbounded_String type is a variable length string with no specified maximum size. The Bounded_String type behaves a lot like C strings, while the Unbounded_String type behaves a lot like the C++ String class.

Fixed Length String Copying.

Src : String := "Hello";
Dest : String := Src;

Ada provides the ability to manipulate slices of strings.

Src : String := "Rosetta Stone";
Dest : String := Src(1..7); -- Assigns "Rosetta" to Dest
Dest2 : String := Src(9..13); -- Assigns "Stone" to Dest2
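
For readers more used to zero-based languages, here is a rough Python 3 sketch of the same two slices. It is only an analogy, not Ada semantics: Ada's Src(1..7) counts from 1 and includes both ends, while Python counts from 0 and leaves out the end index. The helper name ada_slice is made up for illustration.

```python
def ada_slice(text, first, last):
    """Mimic Ada's inclusive, 1-based slice text(first..last). Hypothetical helper."""
    return text[first - 1:last]

src = "Rosetta Stone"
dest = ada_slice(src, 1, 7)    # "Rosetta"
dest2 = ada_slice(src, 9, 13)  # "Stone"
print(dest, dest2)
```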

So lots of colons and semi-colons in the Pascal way…

Algol

What landed me at the site was Algol. It was fun to see how much I still remembered. Here the Algol-68 example shows the simplicity of the language:

ALGOL 68

In ALGOL 68 strings are simply flexible length arrays of CHAR;

(
  STRING src:="Hello", dest;
  dest:=src
)

You can also see that Ada borrowed heavily from Algol (as did Pascal… but it is hard to call that a borrow, as Niklaus Wirth designed both Pascal and Algol W…)

Then Algol-W shows a closer relation to the BEGIN-END languages (where 68 has parentheses that show up in C as braces). While this looks more complicated, note that it includes comments and a write statement that are not in the example above. A straight copy would look almost the same as 68, but with BEGIN-END and a declaration of fixed length.

ALGOL W

begin
    % strings are (fixed length) values in algol W. Assignment makes a copy   %
    string(10) a, copyOfA;
    a := "some text";
    copyOfA := a;
    % assignment to a will not change copyOfA                                 %
    a := "new value";
    write( a, copyOfA )
end.

I’m rather more fond of Algol-68… then again, it is what I learned in about ’73 so…

FORTRAN

Now a lot of folks wonder why I’m happy to use FORTRAN. They call it names like ‘old’ or ‘primitive’ or ‘whatever’. Yet it has a clean and terse nature that makes many things quite easy. So here’s the FORTRAN:

Fortran

str2 = str1

Because Fortran uses fixed length character strings if str1 is shorter than str2 then str2 is padded out with trailing spaces. If str1 is longer than str2 it is truncated to fit.
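
To make the pad-or-truncate rule concrete, here is a small Python 3 sketch that imitates it. This is an illustration only, not Fortran: the function name fortran_assign and its dest_len parameter are invented here, with dest_len standing in for the declared length of str2.

```python
def fortran_assign(source, dest_len):
    """Return what a fixed-length string of dest_len characters would hold
    after 'dest = source': truncated if too long, space-padded if too short."""
    return source[:dest_len].ljust(dest_len)

print(repr(fortran_assign("Hello", 8)))   # 'Hello   '
print(repr(fortran_assign("Hello", 3)))   # 'Hel'
```

The result always has exactly dest_len characters, which is the point of a fixed-length string.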

Now this does leave out the declaration of the variables str1 and str2, so it is a little misleading in that sense. Still, it shows the direct and uncomplicated nature of the language. Compare with the following example in C.

C

Right off the bat we have to specify a bunch of headers and libraries to include… Can’t the compiler figure that out on its own? Well, not really…

Do note that this example has three methods of doing the job in it. Each with different warnings and modes of failure. Oh, and we get to do memory management too.

C

#include <stdlib.h>	/* exit(), free() */
#include <stdio.h>	/* fputs(), perror(), printf() */
#include <string.h>
 
int
main()
{
	size_t len;
	char src[] = "Hello";
	char dst1[80], dst2[80];
	char *dst3, *ref;
 
	/*
	 * Option 1. Use strcpy() from <string.h>.
	 *
	 * DANGER! strcpy() can overflow the destination buffer.
	 * strcpy() is only safe if the source string is shorter than
	 * the destination buffer. We know that "Hello" (6 characters
	 * with the final '\0') easily fits in dst1 (80 characters).
	 */
	strcpy(dst1, src);
 
	/*
	 * Option 2. Use strlen() and memcpy() from <string.h>, to copy
	 * strlen(src) + 1 bytes including the final '\0'.
	 */
	len = strlen(src);
	if (len >= sizeof dst2) {
		fputs("The buffer is too small!\n", stderr);
		exit(1);
	}
	memcpy(dst2, src, len + 1);
 
	/*
	 * Option 3. Use strdup() from <string.h>, to allocate a copy.
	 */
	dst3 = strdup(src);
	if (dst3 == NULL) {
		/* Failed to allocate memory! */
		perror("strdup");
		exit(1);
	}
 
	/* Create another reference to the source string. */
	ref = src;
 
	/* Modify the source string, not its copies. */
	memset(src, '-', 5);
 
	printf(" src: %s\n", src);   /*  src: ----- */
	printf("dst1: %s\n", dst1);  /* dst1: Hello */
	printf("dst2: %s\n", dst2);  /* dst2: Hello */
	printf("dst3: %s\n", dst3);  /* dst3: Hello */
	printf(" ref: %s\n", ref);   /*  ref: ----- */
 
	/* Free memory from strdup(). */
	free(dst3);
 
	return 0;
}

All great stuff if you are writing operating systems, not so much good if you just want to read / copy / write a string…
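
The one idea in that C example worth carrying to any language is the difference between a copy and a second reference. Here is a rough Python 3 sketch of just that part, using a bytearray as a stand-in for the C char array. It is an analogy under that assumption; the variable names are mine, and none of the buffer-size worries of C apply here.

```python
src = bytearray(b"Hello")   # mutable buffer, standing in for char src[]
copy = bytes(src)           # an independent copy of the contents
ref = src                   # a second name for the same buffer, like ref = src in C

src[:] = b"-" * len(src)    # overwrite the buffer in place, like memset()

print(src.decode())    # -----
print(copy.decode())   # Hello
print(ref.decode())    # -----
```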

C# gets some of the simplicity back (but with other complexity shoveled in elsewhere) while C++ is somewhat in the middle:

C++

#include <iostream>
#include <string>
 
int main( ) {
   std::string original ("This is the original");
   std::string my_copy = original;
   std::cout << "This is the copy: " << my_copy << std::endl;
   original = "Now we change the original! ";
   std::cout << "my_copy still is " << my_copy << std::endl;
}

C#

string src = "Hello";
string dst = src;

R

So I’ve also mentioned that I’m starting to learn R. Why? Well, in addition to all that statistics stuff in it, the general design is pleasing and simple to use. Here’s R:

R

Copy a string by value:

str1 <- "abc"
str2 <- str1

I think you can quickly see how cruising through a couple of examples can let you focus in on what language suits you (and the problems you are solving) and potentially save months of time spent learning a language you later find is way more trouble than an alternative…

UNIX, Linux Shell Scripting

I also tend to do a lot with shell scripting. It, too, is simple and direct:

UNIX Shell

foo="Hello"
bar=$foo    # This is a copy of the string

The only ‘weirdness’ being in the use of the “$” sigil where you assign something to a variable by just using the name unadorned (no sigil) but get the contents out by reference with the sigil. Note that everything after the # is a comment.

As it is an interpreted language, you just type and go. No compile or link needed. And, since things are ‘threaded’, you can build up a ‘library’ of commands that can call each other. For example, things are traditionally kept in a directory named ‘bin’ (short for ‘binary’ from back when programs were mostly binary compiled things). Then in ‘your bin’ you can have little code snips that work together. In fact, I ‘make new commands’ so often, I have commands to do that for me.

“cmd” makes a command by tossing me into the editor in the right spot, then making the program “executable” by allowing it to run. In scripting you can use $1 $2 etc. for parameters passed in when called. So if I typed “cmd frog” the value “frog” would be assigned automagically to $1 and the command would end up named ‘frog’.

[chiefio@CentosBox ~]$ bcat cmd
cd $HOME/bin
vi $1
allow $1

What’s “allow”?

[chiefio@CentosBox ~]$ bcat allow
chmod +x $*

I just find it easier to type ‘allow’ than ‘chmod +x’, but I could have put that in the cmd script instead.
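
For anyone curious what ‘allow’ amounts to under the hood, here is a rough Python 3 sketch of the same idea using only the standard library. It is an approximation, not a replacement: the real ‘chmod +x’ takes the umask into account, while this version simply switches on all three execute bits. The function name allow is borrowed from my script, and the temporary ‘frog’ file is only there so the example can run on its own on a Unix-like system.

```python
import os
import stat
import tempfile

def allow(*paths):
    """Add execute permission for user, group and other to each path."""
    for path in paths:
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Demonstration in a throwaway directory standing in for $HOME/bin.
with tempfile.TemporaryDirectory() as bin_dir:
    script = os.path.join(bin_dir, "frog")
    with open(script, "w") as handle:
        handle.write("echo ribbit\n")
    allow(script)
    print(bool(os.stat(script).st_mode & stat.S_IXUSR))
```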

Similarly, that ‘bcat’ thing:

[chiefio@CentosBox ~]$ bcat bcat
cd $HOME/bin
more $*

It just goes to my ‘bin’ and prints out a particular command using the ‘pager’ named ‘more’ (so multi-page things take their time and let me page through them).

I have a lot of these kinds of things built up over time. On the CentOS box, which has little on it, I’ve already got:

[chiefio@CentosBox ~]$ lsb
200	    compx	getchiefio  l	      offbots	running    topinflect
2009	    cpdir	gethadley   lf	      olddata	runningp   tops
2010	    crush	grepmean    lildips   olds	runsect    trade
airport     crush2home	grp	    lls       over200	salpha	   trimstn
allow	    crush2sd	grpe	    lsb       over25	stnc	   v
archive     cum		h	    lsl       over50	stncnt	   v2prune
arscom	    cutPalog	had	    middips   pfg	swapstat   viinv
bcat	    dfdelta	heat	    modcnt    pfs	ted	   vimean
bigdips     dks		incomb	    mor       pfsearch	tick4	   xarchive
bin	    dogs	inf	    moreswap  pingout	tickcull4  xcrush
botinflect  DU		infull	    mreo      RPi	tickp
bots	    dusort	inin	    mroe      runbot	tickup
box	    form	inj	    nairp     runbots	tickupall
cm	    fortony	inpf	    nsbox     runetf	today
cmd	    g		inu	    offbot    runlist	top3

Where you can see things like “mor” and “mroe” for when I mis-type “more” …

Part of why I love the Unix / Linux world so much is that you can rapidly capture a process and just “have it” as a word. So “RPi” has the ssh command to log me into the Raspberry Pi doing DNS and BitTorrent. One word, and I have my login prompt… The 200, 2009 and 2010 commands do various searches on GHCN data for runs with different sets of data. Some of the pf series (pfg, pfs, pfsearch) do different kinds of stock searches. You can make commands for whatever you do regularly.

FORTH

Next, we’ll look at FORTH since Simon uses it and it is an interesting language. FORTH is also a threaded interpreted language where you build your own dictionary of “Forth words”. But it tends to operate closer to the hardware and depends on a ‘stack machine’ to run. So things put backwards are. Reverse Polish way like HP calculators.

Forth

Forth strings are generally stored in memory as prefix counted strings, where the first byte contains the string length. However, on the stack they are most often represented as (address, count) pairs. Thus the way you copy a string depends on where the source string comes from:

\ Allocate two string buffers
create stringa 256 allot
create stringb 256 allot
 
\ Copy a constant string into a string buffer
s" Hello" stringa place
 
\ Copy the contents of one string buffer into another
stringa count  stringb place

OK, so it includes the declaration of the variables, and a couple of comments. Other than that (which FORTRAN ought to do but didn’t show) it’s about the same complexity. Just way-a-round other is. So: variable other variable put. Also note that it uses the backslash for comments.

There are some more ‘fine points’, like the fact that s" followed by a space is a ‘Forth word’: the space does not end up in the string, but delimits the ‘word’ that declares the string literal. (It actually puts the address and length of “Hello” onto the stack, then ‘place’ stores the string in the buffer named next. To use Forth, you need to keep a vision of the stack machine in your head. But after that, it goes fast…)
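
If the ‘counted string’ idea is new, here is a rough Python 3 sketch of it: one length byte up front, then the characters. The functions place and count are named after the Forth words but are my own simplified stand-ins. In particular, count here hands back a Python string rather than an address and a length on a stack.

```python
def place(text, buffer):
    """Store text in buffer as a counted string: length byte, then the characters."""
    data = text.encode("ascii")
    if len(data) > 255 or len(data) + 1 > len(buffer):
        raise ValueError("string does not fit in the buffer")
    buffer[0] = len(data)
    buffer[1:1 + len(data)] = data

def count(buffer):
    """Read a counted string back out of buffer."""
    length = buffer[0]
    return bytes(buffer[1:1 + length]).decode("ascii")

stringa = bytearray(256)   # like: create stringa 256 allot
stringb = bytearray(256)   # like: create stringb 256 allot

place("Hello", stringa)          # like: s" Hello" stringa place
place(count(stringa), stringb)   # like: stringa count  stringb place
print(count(stringb))            # Hello
```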

Python

Why have I never actually learned Python beyond just enough to make prepackaged things ‘go’? Well, first off, you must pick an era. The language has mutually exclusive dialects. Then, there are enough ‘strangenesses’ that by the time you read about them, you could be done in a half dozen other languages…

Python
Works with: Python version 2.3, 2.4, and 2.5

Since strings are immutable, all copy operations return the same string. Probably the reference count is incremented.

>>> src = "hello"
>>> a = src
>>> b = src[:]
>>> import copy
>>> c = copy.copy(src)
>>> d = copy.deepcopy(src)
>>> src is a is b is c is d
True

To actually copy a string:

>>> a = 'hello'
>>> b = ''.join(a)
>>> a == b
True
>>> b is a  ### Might be True ... depends on "interning" implementation details!
False

As a result of object “interning” some strings such as the empty string and single character strings like ‘a’ may be references to the same object regardless of copying. This can potentially happen with any Python immutable object and should be of no consequence to any proper code.

Be careful with is – use it only when you want to compare the identity of the object. To compare string values, use the == operator. For numbers and strings any given Python interpreter’s implementation of “interning” may cause the object identities to coincide. Thus any number of names to identical numbers or strings might become references to the same objects regardless of how those objects were derived (even if the contents were properly “copied” around). The fact that these are immutable objects makes this a reasonable behavior.
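
A short Python 3 sketch of that advice follows. The strings and variable names are arbitrary choices of mine. The one call that may be unfamiliar is sys.intern, which is the Python 3 spelling (in Python 2 it was a builtin called intern).

```python
import sys

a = "hello world"
b = "".join(["hello", " ", "world"])   # built at run time

print(a == b)                          # True: same characters
# "a is b" could be True or False here, depending on the interpreter.
print(sys.intern(a) is sys.intern(b))  # True: interning returns one shared object
```

So == answers “same text?” and is answers “same object?”, and only the first is something a program ought to rely on for strings.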

So Python is not in my future unless someone pays me a lot of money…

Perl

Perl has similar ‘version’ issues, with 6 being a major change. But the basic form is still pretty direct. Perl does have other complications that make it a bit messy for my tastes for general purpose use. (It can be a life saver for some kinds of sys admin, though, so I’m happy to ‘go there’ if needed.) Here’s the ‘pre-6’ way… or really ‘ways’ as there are several. Note that stuff after a # is a comment. Notice it is kind of like the shell script above in the use of sigils, but now they are on both sides of the equation.

Perl

To copy a string, just use ordinary assignment:

my $original = 'Hello.';
my $new = $original;
$new = 'Goodbye.';
print "$original\n";   # prints "Hello."

To create a reference to an existing string, so that modifying the referent changes the original string, use a backslash:

my $original = 'Hello.';
my $ref = \$original;
$$ref = 'Goodbye.';
print "$original\n";   # prints "Goodbye."

If you want a new name for the same string, so that you can modify it without dereferencing a reference, assign a reference to a typeglob:

my $original = 'Hello.';
our $alias;
local *alias = \$original;
$alias = 'Good evening.';
print "$original\n";   # prints "Good evening."

Note that our $alias, though in most cases a no-op, is necessary under stricture. Beware that local binds dynamically, so any subroutines called in this scope will see (and possibly modify!) the value of $alias assigned here.

To make a lexical variable that is an alias of some other variable, the Lexical::Alias module can be used:

use Lexical::Alias;
my $original = 'Hello.';
my $alias;
alias $alias, $original;
$alias = 'Good evening.';
print "$original\n";   # prints "Good evening."

So it lets you do just about anything… which is nice sometimes… unless you just wanted to copy a string and be done quick… and found yourself researching 10 different ways and their details.

I won’t copy the whole thing for Perl6, but you can see the capricious change from “print” to “say” in the first example:

Perl 6

There is no special handling needed to copy a string; just assign it to a new variable:

my $original = 'Hello.';
my $copy = $original;
say $copy;            # prints "Hello."
$copy = 'Goodbye.';
say $copy;            # prints "Goodbye."
say $original;        # prints "Hello."

I hate capricious changes of reserved words in a language… so I generally avoid Perl now. Yes, petty on my part… but once I have something in my mind, I like to leave it alone and not overlay it with a bunch of re-writes and exceptions and untidiness.

In Conclusion

I know most folks will not find this ‘tech talk’ all that interesting. But hopefully folks will enjoy just looking at the variations in what computer languages can look like. It also lets you get a feel for what us folks who use them go through to get good with one (or worse, a dozen).

Also, should anyone be thinking about learning a programming language, just knowing you can go ‘cruise the examples’ can help clarify what is attractive and what is not. Much easier than realizing ‘that language is crap’ half way through the semester…

It is also an interesting testimony to the creativity of people to have created so many new languages in so short a time. But that is also a problem. Preserving past work is very hard when language fads change so fast. There is a great deal of work needed just to preserve access to older things written in older languages. Then add in that the ‘dialect’ slowly changes over time for many more active languages. Even FORTRAN has made changes that make early vs late look like entirely different languages.

But hopefully with a ‘Rosetta Stone’ of code, it won’t be that much of a problem…

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

4 Responses to An Interesting Site For Computer Languages

  1. LG says:

    FYI.
    It could be a WordPress ? / HTML ? issue.
    In the example showing how to copy strings in C,
    the LTxxxxx.hGT ‘s got dropped.

  2. Sandy McClintock says:

    If you like a GUI for R you can check out Rstudio. It has a nice business model which is basically use it at your own risk. However, if you are a large corporation and need certification and someone to keep your systems up to date and at the level of ‘worlds best practice’, you can afford to pay for services. Their development team includes some of the biggest names in the R-world.

  3. E.M.Smith says:

    @LG:

    Looks like wp stripped things between angle brackets so I’m off to Unicode Land to fix it…

    I think it’s fixed now… Tnks.

    @Sandy:

    I usually learn things long hand text, then shift to GUIs later if desired. Too many years fixing things remote over slow phone lines has left me unwilling to depend on any GUI…

  4. LG says:

    @ ChiefiO,
    Please consider sharing the details of some of those customized commands and where to place them as a teaching guide
    Thanks.
