Comments on Erik McClure: Mathematical Notation Is Awful

2016-12-05T06:36:52.685-08:00

The second half of your post is something I'd like to discuss a little more. To start I'll just say that (regarding derivatives for example), the "df/dx versus f'(x)" thing essentially goes back to Leibniz vs Newton, while the other notations arose for various reasons: the "dotted-variable" notation is ONLY used to denote differentiation with respect to time (because this operation is just *so* common, e.g. in classical mechanics but really physics in general), while the other notations (Jacobian matrices, gradients, etc.) were likely invented to emphasize that multivariable functions were at play. To understand this, it usually suffices to study the relevant areas for a while. For example, if you learn some introductory Lagrangian mechanics you'll see just how useful it is to be able to write dots over certain things...
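For a side-by-side comparison, the notations mentioned above are written out below in LaTeX. These are standard textbook forms, not taken from the original post. The first group covers one-variable derivatives, time derivatives, gradients and Jacobians. The second line is the Euler-Lagrange equation for a Lagrangian L that depends on q, its time derivative, and t. There the dots keep time derivatives visually distinct from the partial derivatives:

```latex
f'(x) = \frac{df}{dx}
\qquad
\dot{q}(t) = \frac{dq}{dt}, \quad \ddot{q}(t) = \frac{d^{2}q}{dt^{2}}
\qquad
\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right)
\qquad
(J_F)_{ij} = \frac{\partial F_i}{\partial x_j}

\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) - \frac{\partial L}{\partial q} = 0
```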

If you look in old physics/engineering books you will see all sorts of weird stuff: people putting arrows over vectors, typesetting them in bold face, etc. The practice also shows up in lower-division math classes, likely because it is expected to lessen the cognitive load on novice students by removing, or at least diminishing, their need to remember the *types* of the objects they're manipulating. However, the overwhelming majority of modern math books, at least pure math books, don't do this. They will write "x" without even batting an eye, whether x is a number, a vector in R^n, a point on a manifold, or an element of some crazy Fréchet space or something. The expectation is that if I say "Suppose x is in R^n", then you technically *have* been told that it's a vector, by virtue of it being an element of R^n, and you should not need to be reminded. This ties into the later part of your post: it is very true that mathematics has sprawled into an enormous tree, and it's unfortunately quite common for two branches (even nearby branches!) to be mutually unintelligible. But I've also grown to accept the present state of affairs, since I genuinely believe that striving for absolute uniqueness in notation, as you seem to suggest, not only has diminishing returns but is just straight up *impossible*. What are your thoughts?

2016-12-05T06:16:03.949-08:00

Moving on, you say that "E[X]" (which, yes, is the same thing as "E(X)" -- mathematics is done by humans after all, and for better or worse there will always be such minor variations among people's notational preferences) is bad notation. Actually, it's fine though: I mean, at the end of the day it *is* just a notation and nothing more, but you can roughly think about E[-] as being some kind of function that consumes *random variables*, and produces a real number as output (called the "expectation", "expected value", or, indeed, also the "mean" of the r.v.). Thus, even though X does not "equal x_i", as you say, those numbers p(x_i) *are* part of the data packaged into X. [How? Well, the domain Ω of X is a probability space, which means it comes equipped with a gadget called a "probability measure", P, that allows us to decide how "large" subsets of Ω are. Thus, p(x_i) just means P({w in Ω such that X(w) = x_i}), or in words, it's the probability that an outcome occurs to which X assigns the numerical value x_i. Remember what I said above about X being a function from Ω to the reals.]
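The Python sketch below makes the "E[-] eats a random variable" idea concrete. It assumes a finite sample space with the uniform measure: the two-dice example from the earlier comment. The names `OMEGA`, `P`, `X`, `expectation` and `pmf` are illustrative choices, not library functions. The sketch computes E[X] twice. The first pass sums X(w)P({w}) over sample points. The second uses the textbook formula sum_i x_i p(x_i), with p(x_i) = P({w in Ω : X(w) = x_i}).

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: all 36 ordered pairs.
OMEGA = list(product(range(1, 7), repeat=2))

def P(event):
    """Uniform probability measure: P(A) = |A| / |OMEGA| for a subset A of OMEGA."""
    return Fraction(len(event), len(OMEGA))

def X(w):
    """Random variable: the sum of the two dice in sample point w."""
    return w[0] + w[1]

def expectation(rv):
    """E[-]: consumes a random variable (a function on OMEGA), returns a number."""
    return sum(rv(w) * P({w}) for w in OMEGA)

def pmf(rv, value):
    """p(x_i) = P({w in OMEGA : rv(w) = x_i})."""
    return P({w for w in OMEGA if rv(w) == value})

# The textbook formula sum_i x_i p(x_i), over the values X actually takes.
values = {X(w) for w in OMEGA}
via_pmf = sum(x * pmf(X, x) for x in values)

print(expectation(X), via_pmf)  # 7 7
```

Both routes give the same number. Grouping sample points by the value X assigns them is exactly what turns the first sum into the second.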

OK, next you say E[X^2] isn't actually equal to (E[X])^2. This is true, and it's good, because otherwise the variance, Var(X) = E[X^2] - (E[X])^2, would always give zero! To understand what the notation "E[X^2]" means, note that X^2 is a *new* random variable. Rigorously (going back to our definition of random variables as functions), it is the *composition* of the function X : Ω->R with the squaring map g : R->R, that is, X^2 = g ∘ X as a function. Intuitively, you can think of X^2 like this: to get the probability that X^2=x for some x > 0, you basically just find the probabilities that X=+sqrt(x) and X=-sqrt(x), and add 'em together. Pretty natural, right? The same reasoning applies to XY: it is a *new* discrete random variable, and to find out the probability that, say, XY=10, you have to say "hmm, well if X and Y both take on positive integer values, how could we possibly have XY=10? well, we could have X=1 and Y=10, or X=2 and Y=5, or X=5 and Y=2, or X=10 and Y=1"... and then add up all those probabilities! (Note that saying things like "X=1" is teeechnically abusive, because X is NOT a number -- it's a random variable, i.e. one of these "measurable functions", and functions can never equal numbers; it's a type mismatch -- but people do this all the time because it's more intuitive to think of these as "realizations" of a random variable than the rigorous way, which would be writing the awful eyesore P({w in Ω : X(w)Y(w) = 10}) in place of Pr(XY=10).)
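Here is a small Python sketch of the same points, assuming fair six-sided dice. Because of that assumption, only (2,5) and (5,2) give XY=10 here, unlike the general positive-integer case in the comment. It builds X^2 literally as the composition g ∘ X and shows that E[X^2] and (E[X])^2 differ by the variance. It then finds P(XY=10) as the measure of the preimage {w : X(w)Y(w) = 10}. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for one fair die: {1, ..., 6}, each outcome with probability 1/6.
DIE = range(1, 7)

def E_die(rv):
    """Expectation of a random variable defined on a single fair die roll."""
    return sum(Fraction(rv(w), 6) for w in DIE)

def X(w):
    """X: the face shown."""
    return w

def g(t):
    """The squaring map R -> R."""
    return t * t

def X_squared(w):
    """X^2 is the composition g ∘ X: first apply X, then square."""
    return g(X(w))

mean = E_die(X)                        # 7/2
second_moment = E_die(X_squared)       # 91/6
variance = second_moment - mean ** 2   # 35/12, so E[X^2] != (E[X])^2

# Two independent dice: XY is a new random variable on the 36-point space.
PAIRS = list(product(DIE, repeat=2))
event = [w for w in PAIRS if w[0] * w[1] == 10]   # {w : X(w)Y(w) = 10}
prob_xy_10 = Fraction(len(event), len(PAIRS))     # 2/36 = 1/18

print(mean, second_moment, variance, prob_xy_10)
```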

Again, I hope I didn't just confuse you with all this. Please let me know if anything is unclear; I'd be glad to help more.

2016-12-05T06:15:48.076-08:00

Hi there! I just read your post and wanted to help; I'm a PhD student in pure math so I can really relate to your struggles. Before I start, I'd like to just say that all the equations you pulled from Wikipedia are in fact correct, all this stuff does make sense, and I hope to convince you below that the notation, once understood, is actually very efficient and precise. I would guess that your confusions mostly stem from poor instruction (e.g. professors placing too much emphasis on computations rather than defining things carefully). I'll just start from the top and work my way down, and then you can ask me if anything I say is unclear.

Alright, so at the top you're discussing X, which you refer to as a "distribution", but really, it is an object called a (discrete) random variable. Many courses will gloss over what exactly is meant by a random variable, and say something vague, like "it's just some random numerical quantity". However, as the existence of your post shows, this can easily lead one into a swamp of confusion. So, here's the straight dope: a random variable is really a (measurable) *function*, X, from a *probability space* (what's this? see below), call it Ω, to the set of real numbers. Ignoring the "measurable" part (which is just a technical assumption), this means that X is a thingy which eats sample points (in other words, elements of the probability space Ω aka sample space), and spits out real numbers. For example, consider rolling two dice. The sample space here would be the set {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),...,(6,5),(6,6)}, which of course has 6*6=36 elements. An example of a discrete random variable would be the function X that sends a sample point to the sum of the two values, so for example X((6,5)) would just be 6+5 = 11. In a book you would see them write "Define a random variable X to be the sum of the two numbers rolled on the dice", or something. It's important to note that this is sort of misleading, because they're really defining X to be the sum NOT of two numbers, but of two random variables! (Which ones are they?) So there, now you know exactly what the X you're talking about above actually *is*.
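The dice example can be written down directly in Python, assuming fair dice and representing each sample point as a tuple (first die, second die). The helper names `X1` and `X2` are illustrative. They give one natural answer to the question in parentheses: the two coordinate random variables, which read off each die, and X is their pointwise sum.

```python
from itertools import product

# Sample space for two dice: 36 ordered pairs (first die, second die).
OMEGA = list(product(range(1, 7), repeat=2))

def X1(w):
    """Coordinate random variable: value shown by the first die."""
    return w[0]

def X2(w):
    """Coordinate random variable: value shown by the second die."""
    return w[1]

def X(w):
    """The 'sum of the two dice' random variable, i.e. X = X1 + X2 pointwise."""
    return X1(w) + X2(w)

print(len(OMEGA))   # 36
print(X((6, 5)))    # 11
```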

Above I should have also told you what "probability space" means. Well, it's basically a set, equipped with a "probability measure": a way of assigning probabilities (numbers between 0 and 1) to certain subsets which people call "events". Why only *certain* subsets, you ask? Uhhh... this is a technicality and not really relevant to understanding things for now. It's because by invoking the Axiom of Choice and constructing some awful monsters, you can show that in a lot of natural scenarios you can't find a consistent notion of measure that works for *all* subsets. (For a finite sample space like the dice example, every subset can be an event, so the issue never comes up.) You can read more about this in any introductory treatment of the Lebesgue measure.

2016-11-25T19:33:04.180-08:00

My statistics class did not give that definition; it was provided as a function in all cases, consistently, despite being a discrete value. Perhaps I had a terrible statistics teacher?

2016-11-25T15:57:24.461-08:00

Looks like your course notes are bad. That expression for expected value doesn't make sense, a quick google search shows a wikipedia article with the correct definition: https://en.wikipedia.org/wiki/Expected_value#Univariate_discrete_random_variable.2C_countable_case
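For comparison, here is the standard definition for a discrete random variable taking countably many values x_1, x_2, ... with probabilities p_1, p_2, ... . It is valid provided the series converges absolutely, and it is written here with the event notation used in the comments above:

```latex
\operatorname{E}[X] = \sum_{i} x_i \, p_i,
\qquad
p_i = P(X = x_i) = P\bigl(\{\omega \in \Omega : X(\omega) = x_i\}\bigr)
```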

2016-11-25T11:20:04.727-08:00

Absolutely with you on this; lately I'm finding myself thinking how cool it would be if scientific papers (especially in discrete maths) looked more like Jupyter notebooks. This is not necessarily the best example, but I was quite excited by how the LIGO observations of gravitational waves were explained in this format: https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html

2016-08-29T05:32:38.230-07:00

Funny story on my end: I learned more calculus from programming games than I did in class. I literally averaged Ds and Fs in my first semester, until the teacher stopped requiring "proper" work; from then on it was As, because my answers were never wrong, just my notation/proofs.

2016-07-31T19:19:11.469-07:00

Well, the good thing is that people define their notation in papers, due to overlap. The different styles in some sense prepare you for, eh, different styles. Finance, on the other hand, is witchcraft: all letters have been defined to mean exactly the same thing all the time, and you're just supposed to know what they mean. Then they invented a bunch more letters that sound Greek, just for fun, but since it is hard to typeset new letters, they use existing Greek letters... yeah, f*ck finance.

2016-07-31T07:46:07.331-07:00

This is a recent very interesting article about this: https://aeon.co/videos/maths-notation-is-needlessly-complex-it-can-and-should-be-better

2016-07-31T06:07:22.552-07:00

Mathematicians should give up on algebraic notation as a mess and instead go with prefix s-expressions, the way our new AI overlords will eventually (sensibly) insist upon. :-) I do wonder if we are making a mistake as fundamental as Roman-conditioned Europe ignoring Arabic numerals.
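Below is a toy Python evaluator that shows what the prefix style looks like in practice. It assumes a made-up representation: an expression is a number, a variable name, or a tuple whose first element is the operator. This is only an illustration, not any real s-expression library, and `OPS` and `evaluate` are hypothetical names. In this form, the algebraic 3x^2 + 2x - 1 becomes the nested prefix tree below.

```python
import operator
from functools import reduce

# Hypothetical operator table for the toy language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(expr, env):
    """Evaluate a prefix (s-expression-style) tree given variable bindings in env."""
    if isinstance(expr, (int, float)):
        return expr                      # a literal number
    if isinstance(expr, str):
        return env[expr]                 # a variable name
    op, *args = expr                     # (operator, arg1, arg2, ...)
    # Fold the operator left over the evaluated arguments.
    return reduce(OPS[op], (evaluate(a, env) for a in args))

# 3x^2 + 2x - 1 written as (- (+ (* 3 x x) (* 2 x)) 1)
poly = ("-", ("+", ("*", 3, "x", "x"), ("*", 2, "x")), 1)
print(evaluate(poly, {"x": 2}))  # 15
```

Variadic operators are folded left with `functools.reduce`, so `("-", a, b, c)` means (a - b) - c. That is one of several possible conventions.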

2016-07-31T05:27:26.328-07:00

I end up understanding math by rewriting it in code. It seems to be the only way I can finally understand how it's supposed to work. The only issue is that, of course, it fails with numbers like infinity; however, I still do my best to plod along with some kind of actual implementable logic.
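One common way to handle an infinite sum in code is to truncate it and watch the error shrink. The small Python sketch below assumes a geometric random variable N: the number of fair-coin flips up to and including the first head. Its expected value is the infinite series sum_{k>=1} k p (1-p)^(k-1), whose known closed form is 1/p. The function name is made up for this example, and exact `Fraction` arithmetic keeps rounding out of the picture.

```python
from fractions import Fraction

def geometric_mean_partial(p, n_terms):
    """Partial sum of E[N] = sum_{k>=1} k * p * (1-p)^(k-1), cut off after n_terms.

    N counts trials up to and including the first success (success probability p).
    """
    q = 1 - p
    return sum(k * p * q ** (k - 1) for k in range(1, n_terms + 1))

p = Fraction(1, 2)
exact = 1 / p  # closed form of the full infinite series: 1/p = 2

for n in (5, 10, 20, 40):
    approx = geometric_mean_partial(p, n)
    print(n, float(approx), float(exact - approx))  # error shrinks toward 0
```

For p = 1/2 the remaining error after n terms works out to (n + 2) / 2^n. The truncated sum never reaches 2, but it gets as close as you like.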