Monthly Curations: November 2019

What the Apollo Guidance Computer looked like

The “trinity of computation”

I never thought of it this way

Logic tells us what propositions exist (what sorts of thoughts we wish to express) and what constitutes a proof (how we can communicate our thoughts to others). Languages (in the sense of programming) tell us what types exist (what computational phenomena we wish to express) and what constitutes a program (how we can give rise to that phenomenon). Categories tell us what structures exist (what mathematical models we have to work with) and what constitutes a mapping between them (how they relate to one another). In this sense all three have ontological force; they codify what is, not how to describe what is already given to us.

In this sense they are foundational; if we suppose that they are merely descriptive, we would be left with the question of where these previously given concepts arise, leading us back again to foundations.
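
To make the logic-and-languages leg of that correspondence a bit more concrete, here is a tiny propositions-as-types sketch of my own (in TypeScript, purely illustrative, and not taken from the quoted text): read a type as a proposition, and a program of that type as a proof of it.

```typescript
// Propositions-as-types, in miniature.
// "A and B" is a pair type: proving it means supplying both proofs.
type And<A, B> = [A, B];

// "A implies B" is a function type: proving it means turning any proof of A into a proof of B.
type Implies<A, B> = (a: A) => B;

// Proof that (A and B) implies A: project out the first component.
const andElimLeft = <A, B>(p: And<A, B>): A => p[0];

// Proof that implication is transitive: compose the two given proofs.
const transitive =
  <A, B, C>(f: Implies<A, B>, g: Implies<B, C>): Implies<A, C> =>
  (a: A) => g(f(a));

// A concrete instance: (number and string) implies number.
const example: Implies<And<number, string>, number> = andElimLeft;
```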

Memristance is futile. Not.

I came across this Wired article recently, and what I read sounded too science-fiction-y to be true, so I decided to go to the source and found this video (see below) by a researcher at HP. It turns out to be both true and “science-fiction-y”.

We are used to thinking in terms of the standard circuit elements — resistors, capacitors, inductors. The first establishes a relationship between voltage and current, the second between voltage and charge, and the third between magnetic flux and current.

Now it never occurred to me to really think about it this way (it’s one of those things that’s only obvious in hindsight), but there is a missing piece of symmetry here.

Look at that list again, and it might jump out at you that current, voltage, charge and magnetic flux are related to each other in pairs, with the exception of charge and magnetic flux. Seeing this, it seems reasonable to speculate that there should be another circuit element relating precisely those two. And indeed someone did just that, about forty years ago, and named the missing piece the memristor.
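
To spell out that symmetry (this is the standard textbook framing going back to Chua’s 1971 paper, not something taken from the article): with voltage v, current i, charge q, and flux φ, each of the three familiar elements relates one pair of quantities, and the memristor is the element relating the remaining pair.

```latex
% Pairwise relations among voltage v, current i, charge q, and flux \varphi.
% (i = dq/dt and v = d\varphi/dt hold by definition, which is why those two
% pairs need no circuit element of their own.)
\[
\begin{aligned}
  \text{resistor:}  \quad & dv       = R \, di \\
  \text{capacitor:} \quad & dq       = C \, dv \\
  \text{inductor:}  \quad & d\varphi = L \, di \\
  \text{memristor:} \quad & d\varphi = M \, dq
\end{aligned}
\]
```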

Now I should acknowledge that there is a bit of controversy about whether what HP Labs claims to have discovered really matches up with this idea, so we’ll just have to wait a few years to test these claims, since the first commercial applications of this technology won’t be out for another five years at least.

But let’s continue. One of the observations made in the video linked above is that memristance obeys an inverse square law: the tinier the dimensions, the greater the observed effect. This also means it is something that belongs purely on a chip, not something you’d be putting on a breadboard any time soon.
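
If I’m reading the HP result correctly (this is the scaling from their dopant-drift model of the TiO₂ device, reproduced here from memory, so treat it as a sketch rather than gospel), the charge-dependent part of the memristance carries a 1/D² factor, where D is the thickness of the film:

```latex
% Charge-dependent part of the memristance in the dopant-drift model:
% \mu_V is the dopant mobility, R_ON the fully doped resistance, D the film thickness.
\[
\Delta M(q) \;\propto\; \frac{\mu_V \, R_{\mathrm{ON}}}{D^{2}} \, q(t)
\]
```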

The most exciting property, though, is that its behavior in the future depends on its past. It is both a logic component and a storage component. So you could build a dense cluster of these things and determine, in a configurable way, which parts perform which function, much like an FPGA on steroids.

I used to think (again, only because this is what I was taught) that the fundamental logic component was the NAND gate — but that turns out not to be the whole story. If we treat the interaction between input A and input/output B, expressed using memristors, as an IMP (material implication) gate, then we can construct a NAND gate out of these.
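
Here is the boolean-algebra side of that construction, as a quick sketch (it ignores the memristor physics, and the fact that in the real gate B is overwritten in place): material implication plus the constant false is functionally complete, because NOT b is just b IMP false, and a NAND b is a IMP (NOT b).

```typescript
// Material implication: a IMP b is false only when a is true and b is false.
const imp = (a: boolean, b: boolean): boolean => !a || b;

// NOT falls out immediately: b IMP false.
const not = (b: boolean): boolean => imp(b, false);

// NAND is then a IMP (NOT b), i.e. !a || !b.
const nand = (a: boolean, b: boolean): boolean => imp(a, not(b));

// Quick check against the usual truth table.
for (const a of [false, true]) {
  for (const b of [false, true]) {
    console.assert(nand(a, b) === !(a && b));
  }
}
```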

Further, multiple layers of these memristors can be stacked above a conventional CMOS layout, and densely packed together, leading to unprecedented on-chip memory, perhaps on the order of petabits!

So, how would this change things? It would certainly deprecate the SRAM -> DRAM -> hard drive pyramid of caches we have right now, and we would not only have an ocean of universal memory, but our processing elements would be floating on this ocean, and entirely commingled with it!

We certainly won’t need to deal with the Von Neumann bottleneck any more …

Comparative Latencies

It is usually hard to get an idea of how much the time taken by various fundamental operations varies. It does matter, but it’s hard to get it viscerally (intervals in nanoseconds, microseconds, and milliseconds aren’t really felt in the same way).

I came across the idea of representing the smallest number as a single second and expressing everything else in terms of it, so that the relationships between the numbers come across at more of a human scale, which results in the following table:

Op Latency Table

I wanted to show this in a single chart, but at that scale nothing beyond the last two values is even visible, so I had to break it into a series of smaller charts. (I could have used a log scale instead, but that would have lessened the impact of seeing these numbers side by side. A small sketch of the rescaling calculation follows the charts below.)

Mostly similar, as long as it’s on the chip

 

This is a big jump!

Tremendous jump! Main memory latency is like 0.1% of the SSD access time!

… so imagine how much slower disk is, compared to RAM.

And if that was slow, you should check out the internet …

… exceeded only by an operation on the “machine”; this is finally when we can feel seconds ticking by.

Obviously, the worst thing you can do is try to restart the “real” system.
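
For what it’s worth, here is a sketch of the rescaling itself. The latencies below are the commonly quoted ballpark figures (à la “latency numbers every programmer should know”), not the exact values from my table, so treat the output as illustrative rather than authoritative.

```typescript
// Rough, commonly quoted latencies, in nanoseconds (illustrative ballpark figures only).
const latenciesNs: Record<string, number> = {
  "L1 cache reference": 0.5,
  "L2 cache reference": 7,
  "Main memory reference": 100,
  "SSD random read": 150_000,
  "Datacenter network round trip": 500_000,
  "Disk seek": 10_000_000,
  "Transatlantic network round trip": 150_000_000,
  "Machine reboot (a guess)": 60_000_000_000, // call it a minute
};

// Rescale so the smallest latency reads as one second, then print at human scale.
const smallest = Math.min(...Object.values(latenciesNs));

const human = (seconds: number): string => {
  if (seconds < 60) return `${seconds.toFixed(1)} seconds`;
  if (seconds < 3_600) return `${(seconds / 60).toFixed(1)} minutes`;
  if (seconds < 86_400) return `${(seconds / 3_600).toFixed(1)} hours`;
  if (seconds < 31_536_000) return `${(seconds / 86_400).toFixed(1)} days`;
  return `${(seconds / 31_536_000).toFixed(1)} years`;
};

for (const [op, ns] of Object.entries(latenciesNs)) {
  console.log(`${op}: ${human(ns / smallest)}`);
}
```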

 

Lispium?

What if you did the following:

  • Take a Chromebook
  • Modify the Chromium build it runs so that SBCL runs within it
  • Create Lisp bindings to the internal surface, so that all UI elements can be created and manipulated from within the Lisp image
  • Allow downloading, compiling and running arbitrary Lisp code
  • Make one of the tabs always a REPL
  • Add a caching filesystem that would persist part or all of the image

… might this create a modern-day Lisp machine? Maybe.

Did I miss anything obvious here? If not, this sounds doable in a few years.

I’m lazy, so if you like this idea (I’m sure there’s a guaranteed niche market for these machines), go ahead and throw it onto Kickstarter. Or something.

What has speed brought?

Many good things, to be sure, but more has been omitted.

Perhaps Kent Pitman expressed it best:

My only problem with this is that things are ALREADY sped up. What’s the point of running a zillion times faster than the machines of yesteryear, yet still not be willing to sacrifice a dime of it to anything other than doing the same kinds of boring computations that you did before? I want speedups not just to make my same old boring life faster, but to buy me the flexibility to do something I wasn’t willing to do at slower speeds.

Is JavaScript the new Lisp?

I seem to have been blissfully unaware of just how far JavaScript has come over the last decade or so.

I wanted to make a foray into mobile app programming (ah ok, nothing serious! Just a toy game or two) — and when I looked around, it looked like I had to deeply immerse myself into either the entire iOS ecosystem or the entire Android ecosystem.

Well, in fact, it’s worse — because I first have to make that choice!

So there are platform-independent alternatives — Xamarin (C#? no thanks), Titanium (maybe), and PhoneGap (heard good things!). Eventually, though, I came across [this nifty open-source framework], which seemed like it would fit my use case of “a toy game project” just fine.

It was super easy to get started (hey! a simulator in the browser! — contrast that with a completely non-working emulator for Android). But very soon I ran into (what seemed like) a huge problem — how the $#% was I supposed to debug anything here?

The running server (context: the app development environment is basically a Node.js environment) just printed nonsensical error messages about “native view: undefined” or some such. This was horrible! How did anyone ever use this?

And then I encountered the obvious solution — the tooling for JavaScript is just really, really good, right out of the box. The simulator mentioned earlier is just another object in my web page — so I can drill into it, print out the values of any running objects, and — isn’t this live-debugging nirvana? — modify them in place to fix any error and resume.

Put that in your REPL and smoke it

The JavaScript console (sorry, my experience so far has only been with Chrome) is phenomenal. I can evaluate anything I like, getting not just text, but entire arbitrarily nested hierarchical objects dumped out for my inspection.

Yes, there is the whole “dynamic types => errors from typos” problem, and I ran into this pretty early on when a missing comma gave me a lot of grief. But this is somewhat made up for by the source-level debugging at the console, where I can see the problem and fix it right away!
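
To make that concrete, here is a tiny, entirely made-up example of the kind of thing you can do from the Chrome console (the `game` object and its fields are hypothetical stand-ins, not anything from the framework I was using): dump a live object, inspect it, and patch it in place without restarting anything.

```typescript
// A stand-in for a live object in a running app; in the real scenario it already
// exists on the page, but it is created here so the snippet is self-contained.
const game = {
  score: 42,
  player: { x: 10, y: 20 },
  update() { console.log("original update"); },
};

// From the console you can dump the whole nested object, or view a slice of it...
console.log(game);
console.table(game.player);

// ...and patch live state, or even swap out a method, and then just keep going.
game.score = 0;
game.update = () => console.log("patched update: bug worked around in place");
game.update();
```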

WTF? Everything just works? And there are tons of libraries too!

And here I was, thinking that the solution to the Lisp GUI problem was to tie WebKit bindings into an MVC framework to create a modern version of CLIM — but there’s already a (non-Lispy) version of that out there!

(I’m confused)

Propagating, artfully

Just came across [this paper] today, and it was super-interesting. (I’m assuming the title is a reference to [an earlier one] by Steele and Sussman).

You can read the abstract, or [watch Sussman talk about it], so I won’t waste your time summarizing it here. Instead, I’ll skip all the way ahead to the [thesis version] of the paper, past the “Conclusions” and on to the section titled Philosophical Insights. This part was quite … enlightening.

First off, the “concurrency problem” really isn’t one:

… the problem comes from trying to maintain the illusion that the events of a concurrent system occur in a particular chronological order.

But this artificiality is about concurrency in programs, not in the natural world, since

… concurrency is the natural state of affairs, and synchronicity is what’s difficult. The physical world in which we live is perfectly concurrent: every little patch of universe evolves on its own, according to local rules, and physical effects travel from one patch to another at a finite speed …

More importantly,

Our experience of time appears as a linear stream of events only because our memory imposes an order on them, which is the order in which events are remembered to have occurred … the illusion of synchronous time is sustainable for our society, however, because the patch of universe we occupy synchronizes far faster than the temporal resolution of “event” our society expects.

So, then, why is this a problem? Because we work around the behavior of the “analog world” to create the “digital world”.

A great deal of engineering effort is expended to build from this substrate computational devices that provide the illusion of synchronous time.

This would be fine, except that the cores of a modern machine all share memory, and that memory is what creates the problem.

This large memory implicitly imposes a linear order on the reads or writes that it experiences. Since synchronizing concurrent things well is hard, the implicit synchronization done by such a memory is terrible.

And this is what makes concurrent programming so hard, which is where the propagator network steps in.

… the structure of a propagator network is analogous to space, and the speed at which information propagates through it is analogous to the speed of light … the idea of partial information lets us build a propagation infrastructure that does not force the notion of synchronous time.
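
Here is a toy sketch of that idea, just to make it concrete (my own miniature version, nowhere near the real system in the thesis): cells hold partial information, in this case numeric intervals; propagators can fire in any order, any number of times; and because the merge operation only ever refines a cell, the network settles on a coherent answer regardless of scheduling.

```typescript
// A toy propagator network: cells hold partial information (intervals), and
// merging is intersection, so new information can only narrow what is known.

type Interval = { lo: number; hi: number };

class Cell {
  content: Interval = { lo: -Infinity, hi: Infinity }; // start out knowing nothing
  private watchers: Array<() => void> = [];

  addContent(news: Interval): void {
    const lo = Math.max(this.content.lo, news.lo);
    const hi = Math.min(this.content.hi, news.hi);
    // Treat sub-epsilon refinements as "no news" so floating-point round trips settle.
    if (lo > this.content.lo + 1e-9 || hi < this.content.hi - 1e-9) {
      this.content = { lo, hi };
      this.watchers.forEach((w) => w()); // re-run interested propagators, in any order
    }
  }

  watch(w: () => void): void {
    this.watchers.push(w);
    w();
  }
}

// A bidirectional propagator relating a Fahrenheit cell and a Celsius cell.
function fahrenheitCelsius(f: Cell, c: Cell): void {
  f.watch(() => c.addContent({
    lo: (f.content.lo - 32) * 5 / 9,
    hi: (f.content.hi - 32) * 5 / 9,
  }));
  c.watch(() => f.addContent({
    lo: c.content.lo * 9 / 5 + 32,
    hi: c.content.hi * 9 / 5 + 32,
  }));
}

const f = new Cell();
const c = new Cell();
fahrenheitCelsius(f, c);

f.addContent({ lo: 32, hi: 212 }); // partial information arrives piecemeal...
c.addContent({ lo: 0, hi: 37 });   // ...and in no particular order
console.log(f.content, c.content); // roughly { lo: 32, hi: 98.6 } and { lo: 0, hi: 37 }
```

The point is in the last three lines: information shows up piecemeal and out of order, and nothing needs to be synchronized for the two cells to converge on a consistent answer.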

Let’s take a step back now and consider the bigger picture.

An electronic computer is a fixed electrical circuit that performs a universal computation, that is, it implements a layer of indirection which allows it to mimic any computation that is described to it … we understand the meaning of machine instructions as sequences of things for the computer to do, with a very powerful notion of the flow of time

Seen in this way, we can derive many concepts in Computer Science as notions introduced to rearrange or describe parts of these sequences, such as imperative programming …

… pure linear sequences, of course, are insufficient for describing the computations we want to describe, so we add means of naming portions of such sequences and referring to them …

… or virtual memory …

… virtual memory introduces a notion of space. The memories of separate programs running on the same computer are kept separate … each program, therefore, has its own flow of time …

… or lexical scope …

… lexical scope introduces a finer-grained notion of space … each lexical scope is therefore a sort of point in space; and their interconnections give space a topology.

This leads us, inexorably, to thinking of the fundamental structure of computations as trees rather than lines.

Tree-shaped computing systems can of course be made out of line-shaped ones with some additional mechanism, such as a recursive interpreter of the tree structure of a program.
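
As a minimal illustration of that last point (my own sketch, not something from the thesis): the program is a tree, the machine underneath is a single line-shaped interpreter, and “time” is just the particular order in which that interpreter happens to walk the tree.

```typescript
// A program as a tree, interpreted by a line-shaped machine (plain recursion over
// one call stack): evaluation order is one particular traversal of the tree.

type Expr =
  | { kind: "num"; value: number }
  | { kind: "add"; left: Expr; right: Expr }
  | { kind: "mul"; left: Expr; right: Expr };

function evaluate(e: Expr): number {
  switch (e.kind) {
    case "num": return e.value;
    case "add": return evaluate(e.left) + evaluate(e.right); // left subtree "happens before" right
    case "mul": return evaluate(e.left) * evaluate(e.right);
  }
}

// (1 + 2) * (3 + 4), laid out as a tree rather than as a line of instructions.
const program: Expr = {
  kind: "mul",
  left:  { kind: "add", left: { kind: "num", value: 1 }, right: { kind: "num", value: 2 } },
  right: { kind: "add", left: { kind: "num", value: 3 }, right: { kind: "num", value: 4 } },
};

console.log(evaluate(program)); // 21
```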

… and I love this next bit …

Conceptually, the space of such a system is topologically a tree, and time is inherited from the underlying linear interpreter, and therefore constitutes some traversal of the tree that is space. The use of names to refer to reusable partial results of a tree computation turns it into a directed graph computation. The sequential time of the underlying computer constitutes a topological sort of the computation, forcing the graph to be acyclic.

Whew. Ok! So now we know! With this insight, let’s tackle the next bit, the bane of all FP-advocates: Side effects.

(Note: I would like the paper to end right here, at this high point. I could split this article into two halves, but if I stop here I probably won’t get around to the next one)

The essential reason why side effects tend to mess up models of computation is that side effects inescapably introduce time.

So the fundamental tension can be expressed as …

… we want to be able to build programs whose observable actions are what we want them to be; for that we need the observable actions of our programs to be predictable from the programs themselves.

… or equivalently, as …

… between giving our computers the freedom to operate as they will on their internal structures, where we are not concerned with the details of progress but only with its result; and constraining our computers to act the way we wish externally …

Ok, we know this, we’ve seen this before. What exactly is new about this?

The same tension exists in the architecture of human minds. Indeed, people too can both think and act … we have a word for the transition humans make between thought and action: “decision”.

This is something I found quite novel. Thinking doesn’t respect the flow of time at all, but action is constrained entirely by it. So we are faced with an (apparent?) Cartesian-like duality of thought and action, and we have to choose how to deal with it.

So the result is either that the primitives for action amount to always doing everything whenever one thinks about doing it, or that the story about thinking offers no primitives for action at all.

So either thoughts are constrained in a way to make actions predictable, or else “bridges” between these two worlds of thought and action are needed. Again, the propagator paradigm can step in and resolve this.

We can have three worlds, of thoughts, actions and decisions, each being some sort of propagator. To put it more concretely,

… when a network has many fan ins and cycles, and operates on interesting partial information structures, propagators can run in a chaotic order, and be run many times, and let the merging of the partial information ensure that they produce a coherent result at the end; this is thought.
… when a network is acyclic and the “partial” information structure is the all-or-nothing structure, then propagation mimics conventional evaluation, and the order of events is perfectly predictable; this is action
… propagators that will accept partial inputs, that can perhaps be refined with the passage of time, and will take upon themselves the responsibility of picking a moment, making a choice, and committing to an all-or-nothing answer; this is decision

That’s it! Read it once, read it twice; I can guarantee that you won’t think about computation the same way again!

Programming is like writing

From a comment on Slashdot:

I switched jobs from being a computer programmer to being an ESL teacher in Japan. Japan is somewhat famous for churning out students who know a lot about English, but can’t order a drink at McDonald’s. We used to have a name for those kinds of people with regard to programming languages: language lawyers. They can answer any question you put to them about a programming language, but couldn’t program to save their life. These people often make it past job interviews easily, but then turn out to be huge disappointments when they actually get down to work. I’ve read a lot about this problem, but the more I look at it, the more I realise that these disabled programmers are just like my students. They have a vocabulary of 5000 words, know every grammar rule in the book, but just can’t speak.

My current theory is that programming is quite literally writing. The vast majority of programming is not conceptually difficult (contrary to what a lot of people would have you believe). We only make it difficult because we suck at writing. The vast majority of programmers aren’t fluent, and don’t even have a desire to be fluent. They don’t read other people’s code. They don’t recognise or use idioms. They don’t think in the programming language. Most code sucks because we have the fluency equivalent of 3 year olds trying to write a novel. And so our programs are needlessly complex.

Those programmers with a “spark” are programmers who have an innate talent for the language. Or they are people who have read and read and read code. Or both. We teach programming wrong. We teach it the way Japanese teachers have been teaching English. We teach about programming and expect that students will spontaneously learn to write from this collection of facts.

In language acquisition there is a hypothesis called the “Input Hypothesis”. It states that all language acquisition comes from “comprehensible input”. That is, if you hear or read language that you can understand based on what you already know and from context, you will acquire it. Explanation does not help you acquire language. I believe the same is true of programming. We should be immersing students in good code. We should be burying them in idiom after idiom after idiom, allowing them to acquire the ability to program without explanation.

Alan Kay: The Computer Revolution Hasn’t Happened Yet

Watching this amazing video by Alan Kay, at OOPSLA 1998.

It starts off fairly normal, with some reminiscing about Smalltalk, but then slowly spirals onward and upward, generating one BIG insight after another. Here are just a few extracts — but think of this as a trailer; it’s no substitute for watching the talk itself.

The realization here—and it’s not possible to assign this realization to any particular person because it was in the seeds of Sketchpad, and in the seeds of [the] air training command file system, and in the seeds of Simula. That is, that once you have encapsulated, in such a way that there is an interface between the inside and the outside, it is possible to make an object act like anything.

 

The reason is simply this, that what you have encapsulated is a computer. You have done a powerful thing in computer science, which is to take the powerful thing you’re working on, and not lose it by partitioning up your design space. This is the bug in data and procedure languages. I think this is the most pernicious thing about languages a lot like C++ and Java, is that they think they’re helping the programmer by looking as much like the old thing as possible, but in fact they are hurting the programmer terribly by making it difficult for the programmer to understand what’s really powerful about this new metaphor. People who were doing time-sharing systems had already figured this out as well. Butler Lampson’s thesis in 1965 was about what you want to give a person on a time-sharing system, is something that is now called a virtual machine, which is not the same as what the Java VM is, but something that is as much like the physical computer as possible, but give one separately to everybody.

 

UNIX had that sense about it. The biggest problem with that scheme is that a UNIX process had an overhead of about two thousand bytes just to have a process, and so it was going to be very difficult in UNIX to let a UNIX process just be the number three. You would be going from three bits to a couple of thousand bytes, and you have this problem with scaling. A lot of the problem here is both deciding that the biological metaphor is the one that is going to win out over the next twenty-five years or so, and then committing to it enough to get it so it can be practical at all the levels of scale we actually need.

 

Here’s one we haven’t faced up to much yet, that, now we have to construct this stuff and soon we’ll be required to grow it. It’s very easy, for instance, to grow a baby six inches. They do it about ten times in their life and you never have to take it down for maintenance. But if you try and grow a 747, you are faced with an unbelievable problem, because it’s in this simple-minded mechanical world, in which the only object has been to make the artifact in the first place. Not to fix it. Not to change it. Not to let it live for a hundred years.

 

There is not one line of code in the Internet today that was in the original ARPANET. Of course, if we had IBM main frames in the original ARPANET, that wouldn’t have been true. This is a system that has expanded by a hundred million, has changed every atom and every bit, and has never had to stop. That is the metaphor we absolutely must apply to what we think are smaller things.

 

This notion of meta-programming. Lots of different ways of looking at it. One of them is that, any particular implementation is making pragmatic choices, and these pragmatic choices are likely not to be able to cover all of the cases, at the level of efficiency, and even at the level of richness required. Of course, this is standard OOP lore. This is why we encapsulate. We need to hide our messes. We need to have different ways of dealing with the same concepts in a way that does not distract the programmer. But in fact, it is also applicable, as the LISP people found, and we at Xerox PARC found; you can also apply it to the building of the language itself. The more the language can see its own structures, the more liberated you can be from the tyranny of a single implementation. I think this is one of the most critical things that very few people are worrying about in a practical form.

 

What they missed was, to me, the deepest thing I would like to communicate with you today, and that is we don’t know how to design systems yet. Let’s not make what we don’t know into a religion, for God’s sake. What we need to do is to constantly think and think and think about what’s important. We have to have our systems let us get to the next levels of abstraction as we come to them.

… and my favorite one:

When we think programming is small, that’s why your programs are so big. That’s why they become pyramids instead of gothic cathedrals.