June 30, 2015

Go: Slice search vs map lookup

Posted in Software at 07:09 by graham

tl;dr Use a map.

Computer science 101 tells us that maps are constant time access, O(1). But what is the constant? The map has to compute the hash, find the right bucket (array access), the right item within the bucket (another array access), and potentially do that multiple times as we walk a chain of buckets (if we overflowed the original bucket). At what point is it faster to iterate through an array comparing each item until we find the one we want?

Motivated by this comparison in C++ I decided to compare Go’s built-in map and slice types. The code is in a gist, a simple set of benchmark tests.

I tried two cases:

  • first a traditional key-value setup comparing map[string]string with []*Item{string,string}. The break-even point here is just five items. Below that the slice is faster; above it the map is faster.
  • second a set of integers, comparing map[int]struct{} with []int. The break-even point is ten items.

These results are similar to the C++ results. They mean we won’t be doing clever hacks in the name of performance. I call that good news. Use the obvious data structure and it will also be the right choice for performance.
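For the integer-set case, the two lookups being compared could look roughly like this. This is a minimal sketch, not the gist's code: the names containsSlice, containsMap, compare and sink are made up for illustration, and it uses testing.Benchmark from the standard library so it runs as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

// containsSlice reports whether want is in items, using a linear scan.
func containsSlice(items []int, want int) bool {
	for _, v := range items {
		if v == want {
			return true
		}
	}
	return false
}

// containsMap reports whether want is in set, using a hash lookup.
func containsMap(set map[int]struct{}, want int) bool {
	_, ok := set[want]
	return ok
}

var sink bool // keeps the compiler from discarding the lookups

// compare times both lookups for n integers, searching for the last one.
func compare(n int) (sliceNs, mapNs int64) {
	items := make([]int, n)
	set := make(map[int]struct{}, n)
	for i := range items {
		items[i] = i
		set[i] = struct{}{}
	}
	want := n - 1 // worst case for the linear scan
	s := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = containsSlice(items, want)
		}
	})
	m := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = containsMap(set, want)
		}
	})
	return s.NsPerOp(), m.NsPerOp()
}

func main() {
	for _, n := range []int{5, 50} {
		s, m := compare(n)
		fmt.Printf("n=%d slice=%dns/op map=%dns/op\n", n, s, m)
	}
}
```

The numbers depend on the machine and Go version, and NsPerOp rounds to whole nanoseconds, so treat the output as a rough guide to where the break-even sits.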

Read the rest of this entry »

June 19, 2015

Software engineering practices

Posted in Software at 03:08 by graham

A selection of software engineering practices, from notes I took at an XTC meetup many years ago. They have survived the test of time very well. Practices change much more slowly than tools, and are typically a better investment.

  • Keep your methods short.
  • System should always be running – this means you need either live upgrades or a cluster.
  • Always have a goal in mind – visible task. For example keep your current task on an index card stuck to the side of your monitor.
  • Code should express intent – self documenting code.
  • Verbose test names – Think of test method names as a sentence.
  • The code is the design.
  • Do not duplicate code, effort, anything.
  • Twice is a smell, three times is a pattern.
  • No broken windows. Fight entropy.
  • Check everything into source control. Tools, libraries, documentation, everything.
  • Separate things that change.
  • Write pseudo code first and keep it as comments if it is still useful. OR, write pseudo code as methods and then flesh out.
  • Code for the common case – focus on building functionality that is the most useful first. 80% value for 20% effort.
  • Text is king.
  • Have courage – don’t talk instead of doing – resolve design debates by implementing the alternatives.
  • Take a break.
  • Proceed in small steps / small iterations.
  • Understand the domain. Better domain understanding lowers communication cost and means you can use vaguer specifications.
  • Make it work, then make it right, then make it fast.
  • Strive for a coherent abstraction – does each ‘unit’ fit with the others?
  • Open black boxes – look into bits of legacy systems you don’t know.
  • Automate.
  • Avoid ‘Manager’ objects – instead make them more specific: Retriever, Calculator, Finder, etc.
  • Think in terms of interfaces, of behaviour, not of data.
  • No magic: No wizards.

May 27, 2015

Go: The price of interface{}

Posted in Software at 00:43 by graham

Go’s empty interface{} is the interface that everything implements. It allows functions that can be passed any type. The function func f(any interface{}) can be called with a string f("a string"), an integer f(42), a custom type, or anything else.

This flexibility comes at a cost. When you assign a value to a type interface{}, Go will call runtime.convT2E to create the interface structure (read more about Go interface internals). That requires a memory allocation. More memory allocations mean more garbage on the heap, which means longer garbage collection pauses.
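One way to see this for yourself is to count allocations around a conversion. The sketch below is an illustration under assumptions: describe and sink are hypothetical names, and testing.AllocsPerRun is used from a plain program. The count it prints varies with the Go version and with what the compiler can prove about escape, so no expected figure is given.

```go
package main

import (
	"fmt"
	"testing"
)

// describe accepts any value via the empty interface and names its dynamic type.
func describe(v interface{}) string {
	switch v.(type) {
	case string:
		return "string"
	case int:
		return "int"
	default:
		return fmt.Sprintf("%T", v)
	}
}

// sink is package-level so that values stored in it escape to the heap.
var sink interface{}

func main() {
	fmt.Println(describe("a string"), describe(42), describe(3.5))

	n := 1 << 20 // a value large enough that the runtime cannot reuse a cached small integer
	allocs := testing.AllocsPerRun(1000, func() {
		sink = n // converting an int to interface{} boxes the value
		n++
	})
	fmt.Println("allocations per conversion:", allocs)
}
```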

Read the rest of this entry »

March 2, 2015

Quotes from veteran software engineers

Posted in Software at 07:53 by graham

The software industry is the most fashion-conscious industry I know of.
– Ivar Jacobson
If you take some of the programs that exist today, they are more complex than just about any artefact that humankind has built before.
– Bertrand Meyer
The best path to high-quality software is talented experts who share a pretty clear sense of what they want to produce. I have no idea how to produce good software without talented programmers.
– Peter Weinberger

Masterminds of Programming is a fascinating book, where the authors get many of our industry’s most highly regarded veterans to speak out about basically anything software related. The hook is that they’re being interviewed about the programming language they created.

Here are some of the best quotes and most interesting historical anecdotes.

Read the rest of this entry »

January 31, 2015

Raw sockets in Go: Link layer

Posted in Software at 21:55 by graham

Continuing our dive into the Internet Protocol Suite from Go (See part 1 Raw sockets in Go: IP layer), we are going to the link layer, so we can see the IP headers. This will also allow us to craft our own IP headers, or handle address families outside IP. We’ll send ping packets (ICMP echo request) and watch the kernel’s response.

Receive

This isn’t wrapped in Go, so we need a syscall. Otherwise it’s very similar to the IP layer in part 1, and pretty similar to the C equivalent.
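A minimal sketch of such a receiver, assuming Linux and Go's standard syscall package (hexDump is a helper name made up for this example):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// hexDump formats bytes as space-separated upper-case hex pairs.
func hexDump(b []byte) string {
	return fmt.Sprintf("% X", b)
}

func main() {
	// Address family, socket type, protocol: IPv4, raw, ICMP only.
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_RAW, syscall.IPPROTO_ICMP)
	if err != nil {
		// Usually a permissions error when not root and without CAP_NET_RAW.
		fmt.Fprintln(os.Stderr, "socket:", err)
		return
	}
	defer syscall.Close(fd)

	buf := make([]byte, 65535) // large enough for any IPv4 packet
	for {
		n, _, err := syscall.Recvfrom(fd, buf, 0)
		if err != nil {
			fmt.Fprintln(os.Stderr, "recvfrom:", err)
			return
		}
		fmt.Println(hexDump(buf[:n]))
	}
}
```

The loop runs until interrupted; each received packet is printed as one line of hex, IP header first.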

On the first line of main we request the AF_INET family, meaning IPv4. We could ask for a different address family (AF_* constants) – here’s a list of address families. Most of the protocols in that list are rare (AF_IPX, AF_APPLETALK, etc). We’re in an IP world today.

Other useful address families:

  • AF_INET6 for IPv6.
  • AF_UNIX for unix domain sockets. It is used in net.DialUnix and net.ListenUnix. The POSIX name for AF_UNIX is AF_LOCAL, but Go largely sticks to AF_UNIX. They are equivalent.
  • An odd / interesting one is AF_NETLINK, which is for talking to the kernel. Read about it in man 7 netlink or at Linux Journal. Docker has a netlink package.

The second parameter, SOCK_RAW, is what makes this a raw socket, where we receive IP packets. SOCK_STREAM would give us TCP, SOCK_DGRAM would give UDP.

The third parameter filters packets so we only receive ICMP. You need a protocol here. As man 7 raw says “Receiving of all IP protocols via IPPROTO_RAW is not possible using raw sockets”. We’ll do that in the next post in this series, at the physical / device driver layer.

Build and run it as root (only root or CAP_NET_RAW can open raw sockets). In a different window ping localhost. You should see something like this:

45 00 00 3C EA FF 40 00 40 06 51 BA 7F 00 00 01 7F 00 00 01 …

This is the IP Header. First byte 45 is 4 for the IP version (IPv4), and 5 for length of this header (5 32-bit words), and so on. This is just like the receive example in the previous post except that we also see the IP header.

Try replacing IPPROTO_ICMP in the Socket call with IPPROTO_TCP, and wget localhost. The first 20 bytes will be similar (the IP header), then you should see a TCP packet, and finally HTTP.
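To read those first 20 bytes without counting hex digits, the fixed part of the IPv4 header can be unpacked by hand. This is a sketch only: ipv4Header and parseIPv4Header are names invented for this example, and it ignores IP options, flags and the checksum.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// ipv4Header holds a few fields from the fixed 20-byte part of an IPv4 header.
type ipv4Header struct {
	Version   int
	HeaderLen int // in bytes
	TotalLen  int // header plus payload, in bytes
	TTL       int
	Protocol  int // IANA protocol number, e.g. 1 = ICMP, 6 = TCP, 17 = UDP
	Src, Dst  net.IP
}

// parseIPv4Header decodes the start of a raw IPv4 packet.
func parseIPv4Header(b []byte) (ipv4Header, error) {
	if len(b) < 20 {
		return ipv4Header{}, errors.New("short packet: need at least 20 bytes")
	}
	return ipv4Header{
		Version:   int(b[0] >> 4),       // high nibble of the first byte
		HeaderLen: int(b[0]&0x0F) * 4,   // low nibble counts 32-bit words
		TotalLen:  int(binary.BigEndian.Uint16(b[2:4])),
		TTL:       int(b[8]),
		Protocol:  int(b[9]),
		Src:       net.IP(b[12:16]),
		Dst:       net.IP(b[16:20]),
	}, nil
}

func main() {
	// The sample bytes from the hex dump shown earlier.
	sample := []byte{
		0x45, 0x00, 0x00, 0x3C, 0xEA, 0xFF, 0x40, 0x00,
		0x40, 0x06, 0x51, 0xBA, 0x7F, 0x00, 0x00, 0x01,
		0x7F, 0x00, 0x00, 0x01,
	}
	h, err := parseIPv4Header(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("version:", h.Version, "header bytes:", h.HeaderLen, "total bytes:", h.TotalLen)
	fmt.Println("ttl:", h.TTL, "protocol:", h.Protocol, "src:", h.Src, "dst:", h.Dst)
}
```

For the sample bytes this gives version 4, a 20-byte header, a total length of 60, TTL 64, protocol 6 (the IANA number for TCP), and 127.0.0.1 as both source and destination.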

Read the rest of this entry »

January 25, 2015

Continuous Delivery: my notes

Posted in Software at 05:44 by graham

Continuous Delivery, by Jez Humble and David Farley, is about three big ideas to get your code into production more reliably:

  • Make a deployment pipeline: commit -> unit test -> acceptance test -> … -> deploy -> release
  • Automate everything.
  • DevOps. The project team should be a mix of development, operations and quality assurance / test. Involve operations (sysadmins) from the start.

Stages of the deployment pipeline:

Stage 1: Commit Tests

Trigger off a version control push. Usually happens in Continuous Integration server.

  1. Static analysis (lint, code metrics like cyclomatic complexity & coupling)

  2. Compile

  3. Unit test (output code coverage):

    • Check that a single part of the app does what the programmer intended.
    • Should be very fast.
    • Do not touch the database, filesystem, frameworks, system time or external systems. Mock or stub these, or use in-memory db.
    • Avoid the UI.
    • Try to avoid testing async code. Should never need to sleep in unit tests.
    • Include one or two end-to-end tests to prove app basically runs
  4. Package a release candidate. Bake in version number.

    Use OS’s packaging tools (deb, rpm). Operations team will be familiar with it, all the tools support it.

  5. Push release candidate to artifact store (a file system, or full fledged artifact repository)

    Read the rest of this entry »

November 10, 2014

Release It: Write software for production

Posted in Software at 06:12 by graham

We need to design software to run in production. That’s the main lesson of Michael T. Nygard’s Release It. We often think of shipping the system as the end of the project, when in practice it is just the start.

Release It is an enjoyable book with some excellent production war stories. It suffers from being a little too broad in concepts, and a little too narrow in examples (all enterprise J2EE webapps). Despite this I’d recommend spending some time with it because it advocates a very important and easily overlooked idea: don’t code to pass the QA tests, code to avoid the 3am support call.

What follows are my notes from the book, which are a mixture of what the book says and what I think, grouped in categories that make sense to me.

Read the rest of this entry »

August 23, 2014

Learning assembler on Linux

Posted in Software at 04:49 by graham

For entertainment, I’m learning assembler on Linux. Jotting down some things I learn here.

There are two syntaxes, AT&T and Intel (Go uses its own, because Plan 9). They look very different, but once you get over that the differences are minimal. Linux tradition is mostly AT&T syntax, MS Windows mostly Intel.

There’s no standardisation, so each assembler can do things its own way. as, the GNU Assembler, is the most common one on Linux (and what gcc emits by default), but nasm, the Netwide Assembler, is very popular too. Code written for as will not assemble in nasm.

Read the rest of this entry »

June 28, 2014

Dump Go Abstract Syntax Tree

Posted in Software at 20:09 by graham

Go has good support for examining and modifying Go source code. This is a huge help in writing refactoring and code analysis tools. The first step is usually to parse a source file into its Abstract Syntax Tree representation. Here’s a complete program to display the AST for a given Go file:

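package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
    "os"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintln(os.Stderr, "usage: goast <myfile.go>")
        os.Exit(1)
    }
    // NewFileSet starts positions at 1, so no real position collides with token.NoPos (0).
    fset := token.NewFileSet()
    // A nil src tells the parser to read the file named by os.Args[1].
    f, err := parser.ParseFile(fset, os.Args[1], nil, 0)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    ast.Print(fset, f)
}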

Use:

  • Save that as goast.go
  • Build it: go build goast.go
  • Run it: ./goast <myfile.go>
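Once you have the tree, the usual next step in an analysis tool is to walk it. As a small sketch of that, assuming nothing beyond the standard go/ast package, this lists the function declarations in a piece of source (funcNames is a name invented for this example):

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcNames parses src and returns the names of its function and method
// declarations, in source order.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	// A non-nil src is parsed directly; the filename is only used in positions.
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(f, func(n ast.Node) bool {
		if fn, ok := n.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
		return true // keep descending into child nodes
	})
	return names, nil
}

func main() {
	src := "package demo\n\nfunc hello() {}\n\nfunc world() {}\n"
	names, err := funcNames(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(names) // prints [hello world]
}
```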

May 24, 2014

Sync, a Unix way

Posted in Software at 05:49 by graham

Ever since Dropbox, I’ve been searching for a self-hosted, secure (and now Condi-free) way of keeping my machines synchronised and backed up. There are lots. I tried many, wrote a couple myself, but none were exactly what I wanted.

My problem was thinking Windows, looking for a single program. Once I started thinking Unix, looking for modular components, the answers were obvious.

Storage

First we need a remote master storage to sync against, somewhere to back up our files. And we want that exposed as a local filesystem. I use the most obvious answer, sshfs:

sudo apt-get install sshfs
mkdir -p /home/graham/.backup/crypt  # Why 'crypt'? Read on.

sshfs server.example.com:backup /home/graham/.backup/crypt

You can use any storage that can appear as a filesystem, such as FTP (via curlftpfs), NTFS, and many others.

Encryption

There are two kinds of data: public data, and encrypted data. We want the second kind. Just layer encfs:

Read the rest of this entry »
