Graham King


Rust: What I learnt so far


Update 2021: DO NOT USE THIS BLOG POST TO LEARN RUST. The language has changed tremendously. The best way to learn is to read Programming Rust. The second best way (but free!) is the official Rust book.

Update Spring 2020: I’m happy to report that seven years later I am now writing Rust for my day job (building an audio conferencing server). The best learning resource by far was The Rust Book (also free online). Compared to Go, Rust is bigger and initially a little scary, but under its strict exterior it’s simpler than it looks. Start typing, it’ll work out. The compiler is amazing.


This applies to 0.7pre; many things have changed in 0.8. In particular, core was renamed to std, and std was renamed to extra.

Rust is an open-source programming language being developed mostly by Mozilla. Its goal is the type of applications currently written in C++ (such as Firefox). Details at the Rust Wikipedia page.

I’ve been learning bits of it the past few days, and whilst Rust is still rough around the edges there’s a lot to enjoy. Rust is only at v0.7pre and changing daily, so you may have to adjust some of the code here.

Rust is a big language, and unless you come from C++ it will probably make your head hurt. In a good way :-)

The two most helpful introductions I have found so far are:

I’d encourage you to run through both of those, starting with Rust for Rubyists. When you get stuck reading one of them (and you will), switch back here.


Install

At time of writing Rust is v0.7pre:

git clone git://github.com/mozilla/rust.git
cd rust
git checkout incoming
./configure
make
sudo make install

The make step will take a while and heat up your machine nicely.

Hello world

fn main() {
	println("Hello World!");
}

So far so normal. We have curly braces, semi-colons at end-of-line, fn to declare a function, and main as the entry point of executables. Let’s run it.

Compile: rustc hello_world.rs. This makes a regular binary, hello_world.

Run: ./hello_world.

Instead of the compile / run cycle you can: rust run hello_world.rs.

You can even make your rust file a shell script by adding as the first line:

#!/usr/local/bin/rust run

(don’t forget to chmod +x hello_world.rs).

Ask your name

/* Ask the user for their name */

fn ask_name(prompt: ~str) -> ~str {
	println(prompt);
	return io::stdin().read_line();
}

fn main() {
	let name = ask_name(~"What is your name?");
	println(fmt!("Hello %s", name));
}

You declare a variable with let, and optionally give it a type. The compiler will try to infer the type, and complain if it can’t. Here are some variables:

let x;
x = 10;   // Compiler will infer  x: int
let pi: float;
let name: ~str;

The built-in types are what you would expect, plus some: List of Rust built-in types

The ~ is (briefly) explained in the next section. For now ~str is just how you declare a string on the heap. You use it for string literals too, as in let name = ~"Bob".

All variables are immutable by default: you can’t change their value once set.

let x;
x = 10;
x = 42;  // Compile error

You allow yourself to change a variable by prefixing its name with mut:

let mut x;
x = 10;
x = 32;  // All good

Each .rs file is a module. The line io::stdin().read_line() is using an io.rs file from the std library. Modules are grouped into crates (libraries) and on Unix compile to a standard .so file. The std library, which contains the io module, is imported by default, which is why you don’t see an import statement here.

Modules are explained in the tutorial under crates and the module system.

The final new part in this section is fmt!. The ! means it’s a macro, i.e. it’s expanded by the compiler. fmt is a very useful macro, because it does the printf style formatting with all the %s, %d, etc that you would expect.

The most useful part is %? which formats anything. You will probably use println(fmt!("%?", thing)) quite a lot. Unlike C’s printf, fmt is type checked at compile time.
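
For comparison, in the Rust that later stabilised, fmt! became format! and println became a macro that takes the format string directly. A minimal sketch, assuming a modern (1.x) toolchain rather than 0.7pre; greet and debug_of are names I made up for illustration:

```rust
// Hypothetical helpers, written for modern Rust rather than 0.7pre.
fn greet(name: &str) -> String {
    // {} plays the role of %s / %d: it uses the type's Display formatting.
    format!("Hello {}", name)
}

fn debug_of(numbers: &[i32]) -> String {
    // {:?} plays the role of %?: it works for any type implementing Debug.
    format!("{:?}", numbers)
}

fn main() {
    println!("{}", greet("Bob"));
    println!("{}", debug_of(&[1, 2, 3]));
}
```

As with fmt!, the format string is checked against its arguments at compile time.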

Memory management

The hardest part of Rust for me to understand is the memory management, and the three pointer types which declare it. Rust wants you to tell it how the memory for each pointer should be managed and checked.

~ and @ both mean you have a pointer to heap memory. Without one of those you have a normal local variable (stack memory).

@ means several places can point at that memory, and you want Rust to track who points there and garbage collect the memory. It’s like pointers in Go, and what happens internally in Java and Python.

~ means one place owns that memory. For garbage collection Rust only needs to track who the current owner is, and whether that owner is still in scope. Other pointers can refer to this memory, by “borrowing” it (using the third type of pointer, &), but only as long as the owner is in scope.

fn main() {

	let name: ~str,
		other: ~str;

	name = ~"Bob";

	other = name;
	println(other);

	// This won't compile, because 'other' now
	// owns the memory, you gave it away.
	println(name);
}

The last line in that example println(name) won’t compile because by then you have given away the unique access to that piece of memory, to other. Change all the ~ to @ and it works.
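
The ~ and @ sigils were removed before Rust 1.0, but the same two ideas survive. Here is a rough sketch of the example above in modern Rust, assuming String as the stand-in for ~str and Rc as the closest stand-in for @ (Rc is reference counted rather than garbage collected, so the match is not exact). The function names are mine:

```rust
use std::rc::Rc;

// Single owner: the value moves from `name` to `other`.
fn moved_value() -> String {
    let name = String::from("Bob"); // one owner, like ~"Bob"
    let other = name; // ownership moves to `other`
    // println!("{}", name); // would not compile: `name` was moved
    other
}

// Shared ownership: returns how many handles point at the string.
fn shared_owners() -> usize {
    let name = Rc::new(String::from("Bob"));
    let other = Rc::clone(&name); // second handle, same memory
    println!("{} {}", name, other); // both are still usable
    Rc::strong_count(&name)
}

fn main() {
    println!("{}", moved_value());
    println!("owners: {}", shared_owners());
}
```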

This is the most exciting part of Rust for me, because I hope that once I get it, it will improve my programming in other languages too.

Because @ is the type of pointer you’re probably familiar with, right now I bet you’re thinking “I’ll just use @ everywhere”. But you can’t, because the standard library will return unique pointers (~), and you can’t just put those into managed pointers (@).

You can however put either owned or managed pointers into the third type, the borrowed pointer, written &. Don’t worry, the compiler will tell you when to use one of those :-) In general the compiler is very good at telling you when you have the wrong type of pointer, so I just do what the compiler tells me and move on.
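
Borrowing is the part that carried over to modern Rust almost unchanged. A small sketch, assuming Box and Rc as today's owned and shared pointers; shout is a made-up function that only asks for a borrowed string and so accepts either:

```rust
use std::rc::Rc;

// Takes a borrowed string; it does not care who owns the memory.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let owned: Box<String> = Box::new(String::from("bob")); // single owner
    let shared: Rc<String> = Rc::new(String::from("alice")); // shared owner

    // Both kinds of pointer can be lent out as a plain & reference.
    println!("{}", shout(&owned));
    println!("{}", shout(&shared));
}
```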

There’s lots of other good stuff in Rust, so don’t get too caught up in the memory management, at least at first. Onwards!

Looping

fn main() {

	for 2.times {
		println("Basic loop sugar")
	}
	2.times(||{ println("Basic loop closure"); true });

	for [1,2,3].each |var| {
		println(fmt!("Sugary loop %d", *var));
	}

	[1,2,3].each(|var|{
		println(fmt!("Closurey loop %d", *var));
		true
	});
}

The loop syntax will look familiar to Ruby programmers. The two important features here are traits and closures.

Firstly, integers don’t have a method called times and vectors ([1,2,3] is a vector in Rust, meaning an array) don’t have a method called each. Those are added to the type by a trait, which I think is somewhere between an interface and a mixin. That can make it tricky to find which methods you can call on a given type.

Second, loops 1 and 2, and loops 3 and 4 are the same. for is just a nicer way of writing the loop beneath it.

Look at the documentation for times to see what I mean. See how it’s a method which takes a function? The first parameter will be familiar to Python programmers, it’s the object the method is called on. Ignore that, and look at the second parameter, which is a function with no arguments, returning a boolean.

Closures are well explained near the end of the Rust for Rubyists, Fizzbuzz chapter. They are essentially an anonymous function, declared by listing their parameters within ||, and their code in a block afterwards.

You give a closure to your iterator, and it calls it once each time through the loop. This will be familiar if you know JavaScript.

The bool that the closure returns tells the iterator whether to keep running or not. With the for sugar, true is assumed unless you call break. With the un-sugared version you have to explicitly say true.
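
Modern Rust's for loop works differently (it is built on the Iterator trait), but the protocol described here is easy to imitate. A sketch in modern Rust, where each is a hypothetical helper of my own, not a standard library method:

```rust
// Hypothetical helper: calls `body` for each item until it returns false.
fn each<T>(items: &[T], mut body: impl FnMut(&T) -> bool) {
    for item in items {
        if !body(item) {
            break; // the closure asked us to stop
        }
    }
}

fn main() {
    let mut seen = Vec::new();
    each(&[1, 2, 3, 4], |n| {
        seen.push(*n);
        *n < 2 // keep going only while n is below 2
    });
    println!("{:?}", seen); // prints [1, 2]
}
```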

Finally (yes there’s a lot going on in Rust) notice that it just says true, not return true;. A block in Rust is an expression (as opposed to a statement), so it can evaluate to something. If you end your block with a semi-colon, it doesn’t have a value. If you don’t end with a semi-colon, it has the last thing you wrote. So you can set a variable like this:

let has_sanity =
    if 1 == 1 { true }
    else { false };

By just saying true at the end of the closure you give to the iterator, the whole block evaluates to true, and the loop keeps going.

This is clearly explained in the tutorial’s Syntax basics section, in Expressions and semicolons.
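
This rule is unchanged in modern Rust. A short sketch (describe is a name I invented) showing both a function body and a plain block used as expressions:

```rust
// No `return` and no trailing semicolon: the if/else is the function's value.
fn describe(n: i32) -> &'static str {
    if n < 0 { "negative" } else { "non-negative" }
}

fn main() {
    // A block used as an expression; its last line is its value.
    let total = {
        let a = 2;
        let b = 3;
        a + b
    };
    println!("{} {}", describe(-1), total);
}
```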

Read a file

fn load(filename: ~str) -> ~[~str] {

	// The simple way:
	// let read_result = io::file_reader(~path::Path(filename));

	let read_result: Result<@Reader, ~str>;
	read_result = io::file_reader(~path::Path(filename));

	if read_result.is_ok() {
		let file = read_result.unwrap();
		return file.read_lines();
	}

	println(fmt!("Error reading file: %?", read_result.unwrap_err()));
	return ~[];
}

fn main() {
	let contents = load(~"myfile.txt");
	println(fmt!("%?", contents));
}

The loading itself is straightforward: turn the string filename into a Path object (with path::Path), build a Reader object (with io::file_reader) and return all the lines (with file.read_lines()).

The interesting thing here is the Result object which wraps multiple returns, and is how error handling in Rust is often done.

Result (the enum seems to have gone away in the new version) is a disjoint enumeration, containing either the success value (a Reader here) or an error (here a string). Many Rust methods return a Result, and Rust’s pattern matching is often used to check for errors.

A more “Rustic” way of writing this uses Pattern matching:

fn load(filename: ~str) -> ~[~str] {

	// The simple way:
	// let read_result = io::file_reader(~path::Path(filename));

	let read_result: Result<@Reader, ~str>;
	read_result = io::file_reader(~path::Path(filename));

	match read_result {
		Ok(file) => return file.read_lines(),
		Err(e) => {
			println(fmt!("Error reading file: %?", e));
			return ~[];
		}
	}

}

fn main() {
	let contents = load(~"myfile.txt");
	println(fmt!("%?", contents));
}

match is similar to switch. The Result enum contains two variants: Ok and Err.
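
io::file_reader no longer exists, but Result and match are still how this is done. A rough modern equivalent of load, assuming std::fs::read_to_string as the replacement for the Reader (my choice, not something from the original code):

```rust
use std::fs;

// Reads a file into lines; returns an empty Vec if it cannot be read.
fn load(filename: &str) -> Vec<String> {
    match fs::read_to_string(filename) {
        Ok(text) => text.lines().map(String::from).collect(),
        Err(e) => {
            println!("Error reading file: {:?}", e);
            Vec::new()
        }
    }
}

fn main() {
    let contents = load("myfile.txt");
    println!("{:?}", contents);
}
```

Here the error side of the Result is a std::io::Error rather than a string.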

Oh, and the angle brackets in Result<@Reader, ~str>? Yes, Rust has generics. Hopefully you know them from Java or C#. They’re a way of making static typing more flexible, so that you can for example define a Map which works on any type, but is still type checked by the compiler. The types in the map are specified when you create an instance of it.
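
To make the Map example concrete, here is a sketch in modern Rust using the standard HashMap. count_items is a hypothetical generic function of mine; it is written once and the compiler checks it separately for each type it is used with:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical generic helper: counts how often each item appears.
// It works for any item type that can be hashed and compared.
fn count_items<T: Hash + Eq>(items: Vec<T>) -> HashMap<T, usize> {
    let mut counts = HashMap::new();
    for item in items {
        *counts.entry(item).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // The concrete types are fixed where the map is created.
    let words: HashMap<&str, usize> = count_items(vec!["a", "b", "a"]);
    let numbers: HashMap<i32, usize> = count_items(vec![1, 1, 1]);
    println!("{:?} {:?}", words.get("a"), numbers.get(&1));
}
```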

Connect to a socket

extern mod std;

use std::{net_tcp,net_ip};
use std::uv;

fn fetch(code: ~str) -> ~[~str] {

	let ipaddr = net_ip::v4::parse_addr("205.156.51.232");
	let iotask = uv::global_loop::get();
	let connect_result = net_tcp::connect(ipaddr, 80, &iotask);
	let sock;

	let data_get = fmt!(
		"GET /pub/data/observations/metar/decoded/%s.TXT HTTP/1.0\n",
		code.to_ascii().to_upper().to_str_ascii());
	// On 0.6 master branch, this line above should be:
	// code.to_upper().to_str()

	let data_headers = "Host: weather.noaa.gov\n\n";

	match connect_result {
		Ok(socket) => { sock = net_tcp::socket_buf(socket); }
		Err(e) => { println(fmt!("%?", e)); return ~[]; }
	}

	sock.write(data_get.to_bytes());
	sock.write(data_headers.to_bytes());

	return sock.read_lines();
}

fn main() {
	let contents = fetch(~"CYVR");
	println(fmt!("%?", contents));
}

First we declare that we’re using the std external crate (meaning library). Until now we were only using core, which is imported by default (this has changed in 0.8 core->std, and std->extra).

Then we declare which parts of std we use. We don’t have to do this, but it allows us to avoid prefixing everything with std::.

We turn the IP address we’re connecting to (US weather service) into an internal representation, connect to port 80, wrap the socket in a Reader (at the socket_buf call) and finally in the last line read everything and return it (sock.read_lines()).
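
The net_tcp and uv modules are long gone. In modern Rust the same steps (connect, write the request, wrap the socket in a buffered reader, read all the lines) can be sketched with std::net, which is plain blocking I/O. This is only a sketch under assumptions: I am not assuming the weather server is still reachable, so main talks to a stand-in server on the loopback interface, and build_request and fetch are my own names. The request uses \r\n line endings, as HTTP specifies.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Builds the request text for a station code (path and host copied from above).
fn build_request(code: &str) -> String {
    format!(
        "GET /pub/data/observations/metar/decoded/{}.TXT HTTP/1.0\r\nHost: weather.noaa.gov\r\n\r\n",
        code.to_uppercase()
    )
}

// Connects, sends the request, and returns every line of the reply.
fn fetch(addr: &str, code: &str) -> std::io::Result<Vec<String>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_request(code).as_bytes())?;
    BufReader::new(stream).lines().collect()
}

fn main() -> std::io::Result<()> {
    // Stand-in server on the loopback interface, so the sketch needs no network.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?.to_string();
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (stream, _) = listener.accept()?;
        let mut reader = BufReader::new(stream);
        let mut line = String::new();
        // Read the request up to the blank line that ends the headers.
        while reader.read_line(&mut line)? > 0 && line != "\r\n" {
            line.clear();
        }
        reader.get_mut().write_all(b"station: CYVR\nwind: calm\n")
    });

    println!("{:?}", fetch(&addr, "cyvr")?);
    server.join().expect("server thread panicked")
}
```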

Of particular interest is this line:

let iotask = uv::global_loop::get();

All I/O in Rust is non-blocking, so that Rust can be highly concurrent using tasks: lightweight threads similar to Go’s goroutines, Python’s gevent greenlets, or node.js. It uses libuv for this.

Here we say to run the blocking I/O task on libuv’s global loop. Yeah, I don’t know what that means either. Let’s move on.

Objects

extern mod std;

use std::time;

struct User {
	name: ~str,
	age: int
}

impl User {

	fn new(name: ~str, age: int) -> User {
		User{name: name, age:age}
	}

	fn say_hi(&self) {
		println(fmt!("%s greets you.", self.name));
	}

	fn when(&self, age: int) -> int {
		if age < self.age { -1 }
		else { 1900 + (time::now().tm_year as int) + age - self.age }
	}
}

fn main() {
	//let u1 = User{name: ~"Bob", age: 36};
	let u1 = User::new(~"Bob", 36);
	u1.say_hi();

	println(fmt!("%s will be 40 in %d", u1.name, u1.when(40)));
}

Objects in Rust are a struct to hold the data, plus some functions grouped in an impl block.

The first function new is static. That’s the preferred way to define constructors. It doesn’t have to be called new, and constructors are optional (we’re just making a struct).

The second and third functions are the methods. They take the object itself as their first argument &self, just like in Python.

In new and when we’re using a functional style, without explicit return statements. The other item of note is that you cast with as. We’re casting an i32 (32-bit int) coming from time::now().tm_year to a regular int (whose size is machine-dependent).
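
The struct plus impl pattern is the same in modern Rust. A sketch of the User example, with one deliberate change that is my assumption rather than the original design: when takes the current year as a parameter, so the sketch needs no time library.

```rust
struct User {
    name: String,
    age: i32,
}

impl User {
    // Associated function (no self): the conventional constructor.
    fn new(name: &str, age: i32) -> User {
        User { name: name.to_string(), age }
    }

    // Method: borrows the object as &self.
    fn say_hi(&self) -> String {
        format!("{} greets you.", self.name)
    }

    // `current_year` is passed in by the caller (an assumption of this sketch).
    // Returns -1 if the user is already older than `age`.
    fn when(&self, age: i32, current_year: i16) -> i32 {
        if age < self.age { -1 }
        else { (current_year as i32) + age - self.age } // `as` casts i16 to i32
    }
}

fn main() {
    let u1 = User::new("Bob", 36);
    println!("{}", u1.say_hi());
    println!("{} will be 40 in {}", u1.name, u1.when(40, 2013));
}
```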

Use an external module – sqlite3

Let’s build and use an external module – a sqlite wrapper. Rust has a package manager called rustpkg which will install modules for you, but for now we’ll do it manually. Make sure you have SQLite 3 development files, package libsqlite3-dev in Ubuntu / Debian.

git clone git://github.com/linuxfood/rustsqlite.git
cd rustsqlite
rustc sqlite.rc

Compiling it will give you libsqlite-<something>.so. The .rc is just a convention for .rs files which contain libraries, and that convention may change soon. Anyway, let’s use that library:

extern mod sqlite;

fn db() {

	let database =
		match sqlite::open("test.db") {
			Ok(db) => db,
			Err(e) => {
				println(fmt!("Error opening test.db: %?", e));
				return;
			}
		};
	let mut result = database.exec("CREATE TABLE test (name text, age int)");
	println(fmt!("Create OK? %?", result.is_ok()));

	result = database.exec("INSERT INTO test VALUES ('Graham', 36)");
	println(fmt!("Insert OK? %?", result.is_ok()));
}

fn main() {
	db();
}

First we declare usage of the sqlite library, just like we did for std previously.

In the let database = part we’re using pattern matching, and the functional style evaluation. In the Ok case, we just wrote db, so db gets returned. This behaves like database = db.
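
The same shape (a match that either produces a value or returns from the whole function) works unchanged in modern Rust. A tiny sketch with a made-up function, double, that parses a number instead of opening a database:

```rust
// Hypothetical example: parse a number, or bail out of the function early.
fn double(input: &str) -> Option<i32> {
    let number = match input.trim().parse::<i32>() {
        Ok(n) => n, // the match evaluates to n, so number = n
        Err(e) => {
            println!("Error parsing {:?}: {}", input, e);
            return None; // leaves double() entirely
        }
    };
    Some(number * 2)
}

fn main() {
    println!("{:?}", double("21"));
    println!("{:?}", double("abc"));
}
```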

We declare result (a Result) with let mut, because we re-use it for the INSERT statement.

There are more examples of using sqlite in the test suite at the bottom of rustsqlite’s sqlite.rc.

To compile it you need to tell rustc where to find the library you are using. Assuming you copied libsqlite-<something>.so into the current directory, you just:

rustc -L . use_sqlite.rs

Mutable pointers

There’s one more bit related to memory management that might trip you up. Everything in Rust is immutable by default (constant). To make a variable mutable, you simply say mut in front of it.

let mut x: int;

Easy enough, right? The trick is that in the case of a managed (@) pointer, there are two things which can change – the data the pointer points to, or the pointer itself. This is also the case for unique pointers, but unique pointer contents inherit the mutability of the variable pointing to them. Managed pointers do not.

struct Point { x: int, y:int }

fn main() {

	// Local variable, easy
	let mut p1 = Point{x:1, y:2};
	p1.x = 10;

	// Owned pointer
	// also easy because target inherits mutability
	let mut p2 = ~Point{x:1, y:2};
	p2.x = 10;

	// Managed pointer

	// Not so easy. This won't work
	let mut pNO = @Point{x:1, y:2};
	pNO.x = 10;   // You could change pNO, but not pNO.x

	// Need to make the contents mutable.
	// Do not need to make the variable
	// itself mutable - you can
	// change p.x, but not p.
	let p3 = @mut Point{x:1, y:2};
	p3.x = 10;

	println(fmt!("%d, %d, %d", p1.x, p2.x, p3.x));
}
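
The @mut pointer did not survive to Rust 1.0, but the distinction did. In this sketch I assume Box for the owned pointer and Rc<RefCell<...>> for the shared, mutable one, which is a common modern pairing rather than a direct translation:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Point { x: i32, y: i32 }

// Returns the x values after mutating through each kind of pointer.
fn mutate_all() -> (i32, i32, i32) {
    // Local variable: `mut` on the binding is enough.
    let mut p1 = Point { x: 1, y: 2 };
    p1.x = 10;

    // Owned pointer: the Box contents inherit the binding's mutability.
    let mut p2 = Box::new(Point { x: 1, y: 2 });
    p2.x = 10;

    // Shared pointer: the contents need their own opt-in (RefCell here),
    // while the binding itself stays immutable.
    let p3 = Rc::new(RefCell::new(Point { x: 1, y: 2 }));
    p3.borrow_mut().x = 10;

    let x3 = p3.borrow().x;
    println!("{}, {}, {} (y is still {})", p1.x, p2.x, x3, p1.y);
    (p1.x, p2.x, x3)
}

fn main() {
    mutate_all();
}
```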

Multiple files

Each file is a module, and several files can make up a binary or library. This is module m2:

m2.rs

pub fn say_hi() {
	println("Hi!");
}

Names (functions, structs, etc) are private by default. Adding pub makes them public, accessible from other modules.

m1.rs

mod m2;

fn main() {
	m2::say_hi();
}

You reference other modules with mod <name>. By default module x is stored in file x.rs. Directories create a hierarchy of modules.

To compile: rustc m1.rs. The compiler will include m2.rs automatically.
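
The mod and pub keywords work the same way in modern Rust. To keep this sketch in a single file I use an inline module, which behaves like the separate m2.rs; greeting and secret are names I added to show the private-by-default rule:

```rust
// Inline module: equivalent to what `mod m2;` pulls in from m2.rs.
mod m2 {
    // `pub` makes the function visible outside the module.
    pub fn greeting() -> &'static str {
        secret()
    }

    // Private by default: only code inside m2 can call this.
    fn secret() -> &'static str {
        "Hi!"
    }
}

fn main() {
    println!("{}", m2::greeting());
    // m2::secret(); // would not compile: `secret` is private
}
```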

That’s everything I’ve learnt so far! More soon.