August 18, 2016

Blaming Inslee for Washington's Early Prisoner Release Is Unconscionable

For a guy who keeps trying to convince me he's better than Jay Inslee, Bill Bryant seems determined to ensure I never vote for him in any election, ever.

As The Seattle Times so eloquently explains, the software bug leading to the early release of prisoners dates back to 2002, and wasn't discovered by the DOC until 2012, a year before Inslee took office! Even then, Inslee didn't learn of the problem until December 16th, after which he promptly held a news conference less than a week later. He fired multiple state workers and immediately instituted a rigorous double-checking procedure at the DOC to ensure prisoners were being released on time. Given Jay Inslee's position, I simply cannot imagine a better way to handle this, short of Jay Inslee miraculously developing mind reading powers that told him to audit the DOC for no apparent reason.

The fact that Bryant is blaming Inslee simply because Inslee was the one who discovered the problem is absolutely disgusting. I hear people learning about officials covering up mistakes all the time, and people always ask why they didn't just come forward about the problem and try to fix it. Well, that's exactly what Inslee did, and now Bryant is dragging him through the mud for it. I guess Bill Bryant is determined to make sure no good deed goes unpunished.

What kind of depraved human being would crucify Inslee for something that isn't his fault? An even better question is why they think I would vote for them. There are plenty of other real issues facing Jay Inslee that would be entirely valid criticisms of his administration, but I honestly don't care about them anymore. The only thing I know is that I don't want Bill Bryant elected to any position in any public office, ever, until he publicly apologizes for his morally reprehensible behavior towards Jay Inslee. Washington state does not need a leader that attacks people for doing the right thing just so they can get a few votes. No state in the entire country deserves a leader like that.

It's clear to me that Bill Bryant, and anyone who supports his shameless mudslinging, is the reason we can't have nice things.

July 30, 2016

Mathematical Notation Is Awful

Today, a friend asked me for help figuring out how to calculate the standard deviation over a discrete probability distribution. I pulled up my notes from college and was able to correctly calculate the standard deviation they had been unable to derive after hours upon hours of searching the internet and trying to piece together poor explanations from questionable sources. The crux of the problem was, as I had suspected, the astonishingly bad notation involved with this particular calculation. You see, the expected value of a given distribution $$X$$ is expressed as $$E[X]$$, which is calculated using the following formula:
\[ E[X] = \sum_{i=1}^{\infty} x_i p(x_i) \]
The standard deviation is the square root of the variance, and the variance is given in terms of the expected value.
\[ Var(X) = E[X^2] - (E[X])^2 \]
Except that $$E[X^2]$$ is of course completely different from $$(E[X])^2$$, but it gets worse, because $$E[X^2]$$ makes no notational sense whatsoever. In any other function in math, writing $$f(x^2)$$ means going through and substituting $$x^2$$ for $$x$$. In this case, however, $$E[X]$$ actually doesn't have anything to do with the resulting equation, because $$X \neq x_i$$, and as a result, the equation for $$E[X^2]$$ is this:
\[ E[X^2] = \sum_i x_i^2 p(x_i) \]
Only the first $$x_i$$ is squared. $$p(x_i)$$ isn't, because it doesn't make any sense in the first place. It should really be just $$P_{Xi}$$ or something, because it's a discrete value, not a function! It would also explain why the $$x_i$$ inside $$p()$$ isn't squared - because it doesn't even exist, it's just a gross abuse of notation. This situation is so bloody confusing I even explicitly laid out the equation for $$E[X^2]$$ in my own notes, presumably to prevent me from trying to figure out what the hell was going on in the middle of my final.
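One way to make the rule concrete is to treat $$E$$ as a function that takes the distribution and a transformation to apply to each value before weighting it. The sketch below assumes the distribution is stored as a plain dictionary from value to probability; the names `expectation`, `variance`, `std_dev` and the example numbers are made up for illustration.

```python
from math import sqrt

def expectation(pmf, g=lambda x: x):
    """E[g(X)]: apply g to each value, weight by that value's probability."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    # Var(X) = E[X^2] - (E[X])^2
    return expectation(pmf, lambda x: x * x) - expectation(pmf) ** 2

def std_dev(pmf):
    # The standard deviation is the square root of the variance.
    return sqrt(variance(pmf))

# Hypothetical distribution: X is 1, 2 or 3 with these probabilities.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

mean = expectation(pmf)                      # 1*0.2 + 2*0.5 + 3*0.3, about 2.1
mean_sq = expectation(pmf, lambda x: x * x)  # 1*0.2 + 4*0.5 + 9*0.3, about 4.9
print(mean, mean_sq, variance(pmf), std_dev(pmf))
```

Written this way, it is obvious that only the value gets squared: the probability is just a weight looked up from the table, never an argument that the transformation touches.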

That, however, was only the beginning. Another question required them to find the covariance between two separate discrete distributions, $$X$$ and $$Y$$. I have never actually done covariance, so my notes were of no help here, and I was forced to return to Wikipedia, which gives this helpful equation.
\[ cov(X,Y) = E[XY] - E[X]E[Y] \]
Oh shit. I've already established that $$E[X^2]$$ is impossible to determine because the notation doesn't rely on any obvious rules, which means that $$E[XY]$$ could evaluate to god knows what. Luckily, Wikipedia has an alternative calculation method:
\[ cov(X,Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - E(X))(y_i - E(Y)) \]
This almost works, except for two problems. One, $$\frac{1}{n}$$ doesn't actually work because we have a nonuniform discrete probability distribution, so we have to substitute multiplying in the probability mass function $$p(x_i,y_i)$$ instead. Two, Wikipedia refers to $$E(X)$$ and $$E(Y)$$ as the means, not the expected value. This gets even more confusing because, at the beginning of the Wikipedia article, it used brackets ($$E[X]$$), and now it's using parentheses ($$E(X)$$). Is that the same value? Is it something completely different? Calling it the mean would be confusing because the average of a given data set isn't necessarily the same as finding what the average expected value of a probability distribution is, which is why we call it the expected value. But naturally, I quickly discovered that yes, the mean and the average and the expected value are all exactly the same thing! Also, I still don't know why Wikipedia suddenly switched to $$E(X)$$ instead of $$E[X]$$ because it still means the exact same goddamn thing.
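For what it's worth, here is a minimal sketch of the weighted version of that calculation, assuming the joint distribution is given as a dictionary from `(x, y)` pairs to probabilities. It also computes $$E[XY] - E[X]E[Y]$$ under the usual reading that $$E[XY]$$ sums $$x \cdot y \cdot p(x,y)$$ over every pair, so the two forms can be compared. The function names and the example table are hypothetical.

```python
def marginal_means(joint):
    """Return (E[X], E[Y]) from a joint pmf {(x, y): probability}."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    return ex, ey

def covariance_centered(joint):
    # Sum of p(x, y) * (x - E[X]) * (y - E[Y]); the weight p replaces 1/n.
    ex, ey = marginal_means(joint)
    return sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())

def covariance_shortcut(joint):
    # cov(X, Y) = E[XY] - E[X]E[Y], with E[XY] = sum of x * y * p(x, y).
    ex, ey = marginal_means(joint)
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

# Hypothetical joint distribution over two 0/1 variables that tend to agree.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

print(covariance_centered(joint), covariance_shortcut(joint))  # both about 0.15
```

Both functions agree on this table, which is at least some reassurance that the bracket and parenthesis versions are describing the same quantity.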

We're up to, what, five different ways of saying the same thing? At least, I'm assuming it's the same thing, but there could be some incredibly subtle distinction between the two that nobody ever explains anywhere except in some academic paper locked up behind a paywall that was published 30 years ago, because apparently mathematicians are okay with this.

Even then, this is just one instance where the ambiguity and redundancy in our mathematical notation has caused enormous confusion. I find it particularly telling that the most difficult part about figuring out any mathematical equation for me is usually to simply figure out what all the goddamn notation even means, because usually most of it isn't explained at all. Do you know how many ways we have of taking the derivative of something?

$$f'(x)$$ is the same as $$\frac{dy}{dx}$$ or $$\frac{df}{dx}$$ or even $$\frac{d}{dx}f(x)$$, which is the same as $$\dot x$$, which is the same as $$Df$$, which is technically the same as $$D_xf(x)$$ and also $$D_xy$$, which is also the same as $$f_x(x)$$ provided x is the only variable, because taking the partial derivative of a function with only one variable is exactly the same as taking the derivative in the first place, and I've actually seen math papers abuse this fact instead of using some other sane notation for the derivative. And that's just for the derivative!

Don't even get me started on multiplication, where we use $$2 \times 2$$ in elementary school, $$*$$ on computers, but use $$\cdot$$ or simply stick two things next to each other in traditional mathematics. Not only is using $$\times$$ confusing as a multiplicative operator when you have $$x$$ floating around, but it's a real operator! It means cross product in vector analysis. Of course, the $$\cdot$$ also doubles as meaning the Dot Product, which is at least nominally acceptable since a dot product does reduce to a simple multiplication of scalar values. The Outer Product is generally given as $$\otimes$$, unless you're in Geometric Algebra, in which case it's given by $$\wedge$$, which of course means AND in binary logic. Geometric Algebra then re-uses the cross product symbol $$\times$$ to instead mean commutator product, and also defines the regressive product as the dual of the outer product, which uses $$\nabla$$. This conflicts with the gradient operator in multivariable calculus, which uses the exact same symbol in a totally different context, and just for fun it also defined $$*$$ as the "scalar" product, just to make sure every possible operator has been violently hijacked to mean something completely unexpected.

This is just one area of mathematics - it is common for many different subfields of math to redefine operators into their own meaning and god forbid any of these fields actually come into contact with each other because then no one knows what the hell is going on. Math is a language that is about as consistent as English, and that's on a good day.

I am sick and tired of people complaining that nobody likes math when they refuse to admit that mathematical notation sucks, and is a major roadblock for many students. It is useful only for advanced mathematics that take place in university graduate programs and research laboratories. It's hard enough to teach people calculus, let alone expose them to something useful like statistical analysis or matrix algebra that is relevant in our modern world when the notation looks like Greek and makes about as much sense as the English pronunciation rules. We simply cannot introduce people to advanced math by writing a bunch of incoherent equations on a whiteboard. We need to find a way to separate the underlying mathematical concepts from the arcane scribbles we force students to deal with.

Personally, I understand most of higher math by reformulating it in terms of lambda calculus and type theory, because they map to real world programs I can write and investigate and explore. Interpreting mathematical concepts in terms of computer programs is just one way to make math more tangible. There must be other ways we can explain math without having to explain the extraordinarily dense, outdated notation that we use.

April 28, 2016

The GPL Is Usually Overkill

Something that really bothers me about the GPL and free software crusaders in general is that they don't seem to understand the nuances behind the problem they are attempting to solve. I'm not entirely sure they are even trying to solve the right problem in the first place.

The core issue at play here is control. In a world run by software, we need control over what code is being executed on our hardware. This issue is of paramount importance as we move into an age of autonomous vehicles and wearable devices. Cory Doctorow's brilliant essay, Lockdown: The coming war on general-purpose computing, gives a detailed explanation of precisely why it is of such crucial importance that you have control over what software gets executed on your machine, and not some corporation. Are you really going to buy a car and then get into it when you have no way of controlling what it does? This isn't some desktop computer anymore, it's a machine that can decide whether you live or die.

It is completely true that, if everything was open source, we would then have total control over everything that is being run on our machines. However, proponents of open source software don't seem to realize that this is the nuclear option. Yes, it solves your problem, but it does so via massive overkill. There are much less extreme ways to achieve exactly what you want that don't involve requiring literally everything to be open source.

Our goal is to ensure that only trusted code is run on our machine. In order to achieve this goal, the firmware and operating system must be open source. This is because both the operating system and the firmware act as gatekeepers - if we have something like Intel's Trusted Execution Technology, we absolutely must have access to the firmware's source code, because it is the first link in our trust chain. If we can make the TXT engine work for us, we can use it to ensure that only operating systems we approve of can be run on our hardware.

We now reach the second stage of our trust chain. By using a boot-time validation of a cryptographic signature, we can verify that we are running an operating system of our choice. Because the operating system is what implements the necessary program isolation and protection mechanisms, it too is a required part of our trust chain and therefore must be open source. We also want the operating system to implement some form of permission restriction - ideally it would be like Android, except that anything running on the OS must explicitly tell you what resources it needs to access, not just apps you download from the store.

And... that's it. Most free software zealots that are already familiar with this chain of events would go on to say that you should only install open source software and whatever, but in reality this is unnecessary. You certainly want open source drivers, but once you have control over the firmware and the operating system, no program can be run without your permission. Any closed-source application can only do what I have given it permission to do. It could phone home without my knowledge or abuse the permissions I have granted it in other ways, but the hardware is still under my control. I can simply uninstall the application if I decide I don't like it. Applications cannot control my hardware unless I give them explicit permission to do so.

Is this enough? The FSF would argue that, no, it's not enough until your entire computer is running only open source code that can be verified and compiled by yourself, but I disagree. At some point, you have to trust someone, somewhere, as demonstrated by the Ken Thompson Hack. I'm fine with trusting a corporation, but I need to have total control over who I am trusting. If a corporation controls my firmware or my operating system, then they control who I trust. I can ask them politely to please trust or not trust some program, but I can't actually be sure of anything, because I do not control the root of the trust chain. Open source software is important for firmware and operating systems, because it's the only way to establish a chain of trust that you control. However, once I have an existing chain of trust, I don't need open-source programs anymore. I now control what runs on my computer. I can tell my computer to never run anything written by Oracle, and be assured that it will actually never trust anything written by Oracle.
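To make the idea of an owner-controlled trust root a little more tangible, here is a toy sketch. It is not how secure boot or TXT actually works: real systems verify asymmetric signatures against public keys, while this sketch uses HMAC-SHA256 from Python's standard library as a stand-in so it stays self-contained. The `TrustStore` class, the `sign` helper, and the publisher names are all hypothetical.

```python
import hashlib
import hmac

def sign(key, image):
    """Stand-in for a publisher signing an image (HMAC, not a real signature)."""
    return hmac.new(key, image, hashlib.sha256).digest()

class TrustStore:
    """The owner's list of publishers whose code is allowed to run."""

    def __init__(self):
        self._keys = {}  # publisher name -> verification key

    def trust(self, publisher, key):
        self._keys[publisher] = key

    def revoke(self, publisher):
        self._keys.pop(publisher, None)

    def may_run(self, publisher, image, signature):
        key = self._keys.get(publisher)
        if key is None:
            return False  # never trusted, or trust was revoked
        expected = hmac.new(key, image, hashlib.sha256).digest()
        # Constant-time comparison; fails if the image or signature was altered.
        return hmac.compare_digest(expected, signature)

store = TrustStore()
store.trust("ExampleOS", b"os-key")

os_image = b"kernel bytes"
os_sig = sign(b"os-key", os_image)

print(store.may_run("ExampleOS", os_image, os_sig))                # trusted and intact
print(store.may_run("ExampleOS", os_image + b"tampered", os_sig))  # modified image
print(store.may_run("UntrustedVendor", b"app", sign(b"vendor-key", b"app")))
```

The point of the sketch is only who holds the `TrustStore`: whoever can call `trust` and `revoke` decides what runs, regardless of whether the code being checked is open or closed source.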

If I sell you a piece of software, you have a right to decompile it, modify it, hack it, or uninstall it. This does not, however, require me to explain how it works to you, nor how to assemble it. If you trust me, you can run this piece of software on your hardware, but if you don't trust me, then a verifiable chain of trust will ensure that my software never runs on your machine. This is what matters. This is what having control means. It means being able to say what does or doesn't run on your computer. It does not require that everything is open source. It only requires an open source, verifiable chain of trust, and faith in whatever company made the software you've chosen to run. If a company's software is proprietary and they do something you don't like, use something else.

Open source code is important for firmware and operating systems, not everything.

March 13, 2016

The Right To Ignore: The Difference Between Free Speech And Harassment

On one hand, America was built on the values of free speech, which are obviously important. We cannot control what people are saying or our democracy falls apart. On the other hand, allowing harassment has a stifling effect on free speech, because it allows people to be bullied into silence. Before the internet, this distinction was fairly simple: Someone with a megaphone screaming hate speech in a park is exercising their right to free speech. Someone with a megaphone following a guy down the street is harassing them.

The arrival of the internet has made this line much more vague. However, the line between free speech and harassment is not nearly as blurry as some people might think. Our concept of reasonable freedom of speech is guided primarily by our ability to ignore it. The idea is, someone can go to a public park and say whatever they want, because other people can simply go somewhere else. As long as people have the ability to ignore whatever you're saying, you can say pretty much whatever you want. We have some additional controls on this for safety reasons, so you can't go to a park and talk about how you're going to kill all the gay people, tomorrow, with a shotgun, because that is a death threat.

Unfortunately, in the midst of defending free speech, a disturbing number of people have gotten it into their heads that other people aren't allowed to ignore them. This is harassment. The moment you take away someone's right to ignore what you are saying, you have crossed the line from free speech into harassment. Freedom of speech means that you have the right to say whatever you want, and everyone else has the right to ignore you. I'm not entirely sure why people think that free speech somehow lets them bypass blocking on a social network. Blocking people on the internet is what gives us our ability to ignore them. Blocking someone on the internet is the equivalent of walking away from some guy in the park who was yelling obscenities at you. If you bypass the blocking mechanism, this is basically chasing them down as they run to their car, screaming your opinions at them with a megaphone. Most societies think this is completely unacceptable behavior in real life, and it should be just as unacceptable on the internet.

On the other hand, enforcing political correctness is censorship. Political correctness is obviously something that should be encouraged, but enforcing it when someone is saying things you don't like in a public place is a clear violation of free speech. This is not a blurry line. This is not vague. If a Nazi supporter is saying how much they hate Jews, and they are not targeting this message at any one individual, this is clearly protected free speech. Now, if the Nazi is actually saying we should murder all of the Jews, this turns into hate speech because it is inciting violence against a group of people, and is restricted for the same safety reasons that prevent you from shouting "FIRE!" in a crowded movie theater.

Now, what if the Nazi supporter tells a specific person they are a dirty Jew? This gets into fuzzy territory. On one hand, we can say it isn't harassment so long as the targeted person can block the Nazi and the Nazi never attempts to circumvent this, or we can start classifying certain speech as being harassment if it is ever targeted at a specific individual. When we are debating what counts as harassment, this is the kind of stuff we should be debating. This is what society needs to figure out. Arguing that mass-block lists on Twitter are hurting free speech is absurd. If someone wants to disregard everything you're saying because you happen to be walking past a group of Nazi supporters, they have a right to do that. If someone wants to ignore everyone else on the planet, they have a right to do that.

This kind of stuff is a problem when the vast majority of humanity uses a single private platform to espouse their opinions. Namely, Facebook. This is because Facebook has recently been banning people for saying faggot, and because it's a private company, this is completely legal. This is also harmful to free speech. What appears to be happening is that websites, in an attempt to crack down on harassment, have instead accidentally started cracking down on free speech by outright banning people who say hateful things, instead of focusing on people who say hateful things to specific individuals.

Both sides of the debate are to blame for this. Extreme harassment on the internet has caused a backlash, resulting in a politically correct movement that calls for tempering all speech that might be even slightly offensive to anyone. In response, free speech advocates overreact and start attacking them by bypassing blocking mechanisms and harassing their members. This causes SJWs to overcompensate and start clamping down on all hateful speech, even hateful speech that is clearly protected free speech and has nothing to do with harassment. This just pisses off the free speech movement even more, causing them to oppose any restrictions on free speech, even reasonable ones. This just encourages SJWs to censor even more stuff in an endless spiral of aggression and escalation, until on one side, everyone is screaming obscenities and racial slurs in an incoherent cacophony, and on the other side, no one is saying anything at all.

If society is going to make any progress on this at all, we need to hammer out precisely what constitutes harassment and what is protected free speech. We need to make it absolutely clear that you can only harass an individual, and that everyone has the right to a block button. If a Nazi supporter starts screaming at you on Twitter, you have the right to block him, and he has the right to block you. We cannot solve harassment with censorship. Instead of banning people we disagree with, we need to focus on building tools that let us ignore people we don't want to listen to. If you object to everyone blocking you because you use insulting racial epithets all the time, maybe you should stop doing that, and perhaps some of them might actually listen to you.

Of course, if you disagree with me, you are welcome to exercise your right to tell me exactly how much you hate my guts and want me to die in the comments section below, and I will exercise my right to completely ignore everything you say.

February 8, 2016

Standard Intermediate Assembly For CPUs

Over in GPU land, a magical thing is happening. All the graphics card vendors and big companies got together and came up with SPIR-V as the technological underpinning of Vulkan, the as-of-yet unreleased new graphics API. SPIR-V is a cross-platform binary format for compiled shaders, which allows developers to use any language that can compile to SPIR-V to write shaders, and to run those compiled shaders on any architecture that supports SPIR-V. This is big news, and if it works as well as everyone's hoping it does, it will set the stage for a major change in how shaders are compiled in graphics engine toolchains.

How is this possible? The SPIR-V specification tells us that it is essentially a cross-platform intermediate assembly language. It's higher level than conventional assembly, but lower than an actual language. The specifics of the language are fine-tuned towards modern graphics hardware, so that the instructions can encode sufficient metadata about what they're doing to enable hardware optimizations, while still allowing those instructions to be efficiently decoded by the hardware and implemented by the chip's microcode.

While SPIR-V is specifically designed for GPUs, the specification bears some resemblance to another intermediate assembly language - LLVM IR. This move by GPU vendors towards an intermediate assembly representation mirrors how modern language design is moving towards a standardized intermediate representation that many different languages compile to, which can then itself be compiled to any CPU architecture required. The LLVM IR is used for C, C++, Haskell, Rust, and many others. This intermediate representation decouples the underlying hardware from the high level languages, allowing any language that compiles down to LLVM IR to compile to any of its supported CPU architectures - even asm.js.
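The decoupling is easier to see with a toy. The sketch below defines a made-up three-address intermediate representation (it is not LLVM IR or SPIR-V, and the opcodes and pseudo-assembly syntax are invented for illustration) and two independent consumers of it: one that executes the program directly and one that emits text for an imaginary register machine. Neither consumer needs to know which language produced the program.

```python
# Each instruction is (op, dest, a, b). This is a hypothetical toy IR.
PROGRAM = [
    ("const", "t0", 6, None),
    ("const", "t1", 7, None),
    ("mul", "t2", "t0", "t1"),
    ("ret", None, "t2", None),
]

def interpret(program):
    """Backend 1: execute the IR directly and return the value passed to ret."""
    env = {}
    for op, dest, a, b in program:
        if op == "const":
            env[dest] = a
        elif op == "add":
            env[dest] = env[a] + env[b]
        elif op == "mul":
            env[dest] = env[a] * env[b]
        elif op == "ret":
            return env[a]
        else:
            raise ValueError(f"unknown op: {op}")
    raise ValueError("program has no ret")

def emit_register_machine(program):
    """Backend 2: translate the IR to pseudo-assembly for an imaginary target."""
    lines = []
    for op, dest, a, b in program:
        if op == "const":
            lines.append(f"LOAD {dest}, #{a}")
        elif op in ("add", "mul"):
            lines.append(f"{op.upper()} {dest}, {a}, {b}")
        elif op == "ret":
            lines.append(f"RET {a}")
        else:
            raise ValueError(f"unknown op: {op}")
    return lines

print(interpret(PROGRAM))  # 42
print("\n".join(emit_register_machine(PROGRAM)))
```

Adding a new source language only means producing this list of tuples, and adding a new target only means writing another function that walks it, which is the same bargain the real intermediate representations offer at a much larger scale.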

However, we have a serious problem looming over our heads in CPU land. Did you know that the x86 mov instruction is Turing complete? In fact, even the page fault handler is Turing complete, so you can run programs on x86 without actually executing any instructions! The x86 architecture is so convoluted and bloated that it no longer has any predictable running time for any given set of instructions. Inserting random useless mov instructions can increase speed by 5% or more, false dependencies destroy performance, and Intel keeps introducing more and more complex instruction sets that don't even work properly. As a result, it's extremely difficult for any compiler to produce properly optimized assembly code, even when it's targeting a specific architecture.

One way to attack this problem is to advocate for RISC - Reduced Instruction Set Computer. The argument is that fewer instructions will be easier to implement, reducing the chance of errors and making it easier for compilers to actually optimize the code in a meaningful way. Unfortunately, RISC has a serious problem: the laws of physics. A modern CPU is so fast that it can process an instruction faster than the electrical signal can get to the other side of the chip. Consequently, it spends the majority of its time just waiting for memory. Both pipelining and branch prediction were created to deal with the memory latency problem, and it turns out that having complex instructions gives you a distinct advantage. The more complex your instruction is, the more the CPU has to do before it needs to fetch things from memory. This was the core observation of the Itanium instruction set, which relies on the compiler to determine which instructions can be executed in parallel in an attempt to remove the need for pipelining. Unfortunately, it turns out that removing dependency calculations is not enough - this is why many of Intel's new instructions are about encapsulating complex behaviors into single instructions instead of simply adding more parallel operators.

Of course, creating hardware that supports increasingly complex operations is very unsustainable, which is why modern CPUs don't execute assembly instructions directly. Instead, they use Microcode, which is the raw machine code that actually implements the "low-level" x86 assembly. Of course, at this point, x86 is so far removed from the underlying hardware it might as well be a (very crude) high level language all by itself. For example, the mov instruction usually doesn't actually move anything, it just renames the internal register being used. Because of this, the modern language stack looks something like this:

[Figure: Modern Language Stack]

Even though we're talking about four CPU architectures, what we really have is four competing intermediate layers. x86, x86-64, ARM and Itanium are all just crude abstractions above the CPU itself, which has its own architecture-dependent microcode that actually figures out how to run things. Since our CPUs will inevitably have complex microcode no matter what we do, why not implement something else with it? What if the CPUs just executed LLVM IR directly? Then we would have this:

[Figure: LLVM IR Microcode Stack]

Instead of implementing x86-64 with microcode, implement the LLVM intermediate assembly code with microcode. This would make writing platform-independent code trivial, and would allow for way more flexibility for hardware designers to experiment with their CPU architecture. The high-level nature of the instructions would allow the CPU to load large chunks of data into registers for complex operations and perform more efficient optimizations with the additional contextual information.

Realistically, this will probably never happen. For one, directly executing LLVM IR is probably a bad idea, because it was never developed with this in mind. Instead, Intel, AMD and ARM would have to cooperate to create something like SPIR-V that could be efficiently decoded and implemented by the hardware. Getting these competitors to actually cooperate with each other is the biggest obstacle to implementing something like this, and I don't see it happening anytime soon. Even then, a new standard architecture wouldn't replace LLVM IR, so you'd still have to compile to it.

In addition, an entire new CPU architecture is extraordinarily unlikely to be widely adopted. One of the primary reasons x86-64 won out over Itanium was because it was capable of running x86 code at native speed, and Itanium's x86 emulation was notoriously bad. Even if we somehow moved to an industry-wide standard assembly language, the vast majority of the world's programs are still built for x86, so an efficient translation between x86 and our new intermediate representation would be paramount. That's without even considering that you'd have to recompile your OS to take advantage of the new assembly language, and modern OSes still have some platform-specific hand-written assembly in them.

Sadly, as much as I like this concept, it will probably remain nothing more than a thought experiment. Perhaps as we move past the age of silicon and look towards new materials, we might get some new CPU architectures out of it. Maybe if we keep things like this in mind, next time we can do a better job than x86.