<b>Shawn O'Hare</b>: Mathematics and Data.<br><br><b>Law of Iterated Expectation</b> (2014-07-07)<br><br>Let $X$ and $Y$ be continuous real-valued random variables with joint probability density function $f(x, y)$. The marginal density function of $Y$ is $f_Y(y) := \int_{x \in \mathbb R} f(x, y) \ dx$. The expectation $E(Y)$ of $Y$ can be recovered by integrating against the marginal density function. In particular, \begin{equation}\label{eq:E(Y)} E(Y)=\int_{y \in \mathbb R} y f_Y(y) \ dy = \int_{y \in \mathbb R} \int_{x \in \mathbb R} y f(x, y) \ dx \ dy. \end{equation} The conditional probability density function of $Y$ given that $X$ is equal to some value $x$ is defined by \begin{equation}\label{eq:cdf} f_{Y \mid X} (y \mid x) := \frac{f(x, y)}{f_X(x)} = \frac{f(x,y)}{\int_{y \in \mathbb R} f(x, y) \ dy}. \end{equation} The conditional expectation $E(Y \mid X = x)$ of $Y$ given that $X$ has value $x$ is given by \begin{equation}\label{eq:cond exp} E(Y \mid X = x) = \int_{y \in \mathbb R} y f_{Y \mid X} (y \mid x) \ dy. \end{equation} But $E(Y \mid X = x)$ is a function of $x$, so evaluating it at the random variable $X$ yields a new random variable, denoted $E(Y \mid X)$, whose expectation we can in turn compute. Now \begin{align*} E (E (Y \mid X)) &= \int_{x \in \mathbb R} E(Y \mid X = x) f_X(x) \ dx \\ & = \int_{x \in \mathbb R} f_X(x) \left( \int_{y \in \mathbb R} y f_{Y \mid X}(y \mid x) \ dy \right) \ dx \quad \text{(by \ref{eq:cond exp})} \\ & = \int_{x \in \mathbb R} \int_{y \in \mathbb R} f_X(x) \cdot y \frac{f(x,y)}{f_X(x)} \ dy \ dx \quad \text{(by \ref{eq:cdf})}\\ & = \int_{y \in \mathbb R} \int_{x \in \mathbb R} y f(x, y) \ dx \ dy \\ & = E(Y). 
\quad \text{(by \ref{eq:E(Y)})} \end{align*} This result is sometimes called the <b>law of the iterated expectation</b>.<br><br><b>Distributions, Densities, and Mass Functions</b> (2014-07-01)<br><br>One unfortunate aspect of probability theory is that common measure-theoretic constructions are given different, often conflated, names. The technical definitions of the probability distribution, probability density function, and probability mass function of a random variable are all related to one another, as this post hopes to make clear. <br><br>We begin with some general measure-theoretic constructions. Let $(A, \mathcal A)$ and $(B, \mathcal B)$ be measurable spaces and suppose that $\mu$ is a measure on $(A, \mathcal A)$. Any $(\mathcal A, \mathcal B)$-measurable function $X \colon A \to B$ induces a <b>push-forward measure</b> $X_*(\mu)$ on $(B, \mathcal B)$ via \[ [X_*(\mu)](S):= \mu(X^{-1} (S)) \quad \text{for } S \in \mathcal B. \] <a href="http://www.shawnohare.com/2014/07/distributions-densities-and-mass.html#more">Read more »</a><br><br><b>Ring Endomorphisms of the Reals</b> (2014-04-24)<br><br>Today we present a (mostly) algebraic proof that the only field endomorphism of the real numbers $\mathbb R$ is the identity map. Many proofs of this fact invoke continuity, which we try to avoid. To be clear, we require in this discussion that ring morphisms fix the multiplicative identity. 
First we recall that any field of characteristic $0$ has a prime subfield isomorphic to the rationals $\mathbb Q$, so we will always assume that characteristic $0$ fields contain $\mathbb Q$.<br><a href="http://www.shawnohare.com/2014/04/ring-endomorphisms-of-reals.html#more">Read more »</a><br><br><b>Categorical Limits and Colimits</b> (2014-01-12)<br><br>Many universal constructions that appear in a wide variety of mathematical contexts can be realized as categorical limits and colimits. To define a limit we first need the notion of a cone over a diagram. A diagram of type $J$ in a category $\mathcal C$ is simply a functor $F \colon J \to \mathcal C$. We imagine $J$ as an indexing for the objects and morphisms of $\mathcal C$ under consideration. When $J$ is a finite category it can be visualized as a directed graph. <br><br>A cone over $F$ is a pair $(N, \Psi)$, where $N$ is an object of $\mathcal C$ and $\Psi$ is a collection of $\mathcal C$-morphisms $\Psi_X \colon N \to F(X)$ (one for each object $X$ in $J$) such that $\Psi_Y = F(f) \circ \Psi_X$ for every $J$-morphism $f \colon X \to Y$. Given two cones $(N, \Psi)$ and $(M, \Phi)$, we call a $\mathcal C$-morphism $u \colon N \to M$ a cone morphism from $(N, \Psi)$ to $(M, \Phi)$ provided that $u$ respects the cone property, which is to say that for every object $X$ in $J$ the morphism $\Psi_X$ factors through $u$, i.e., $\Psi_X = \Phi_X \circ u$. We will write $u \colon (N, \Psi) \to (M, \Phi)$ to denote that $u$ is a cone morphism. 
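As a concrete illustration (added here, and anticipating the definition of a limit as a terminal cone, which this excerpt does not reach), take $J$ to be the discrete category with exactly two objects $X_1$ and $X_2$:

```latex
% With J discrete there are no non-identity morphisms, so the
% compatibility condition on a cone is vacuous: a cone over F is
% just an object N together with morphisms \Psi_{X_i} : N -> F(X_i).
% The terminal cone is the product together with its projections:
\[
  \lim F \cong F(X_1) \times F(X_2),
  \qquad
  \Psi_{X_i} = \pi_i \colon F(X_1) \times F(X_2) \to F(X_i).
\]
```

In $\mathbf{Set}$ this recovers the Cartesian product; in $\mathbf{Grp}$, the direct product of groups.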
<br><a href="http://www.shawnohare.com/2014/01/categorical-limits-and-colimits.html#more">Read more »</a><br><br><b>TF-IDF for Document Clustering</b> (2013-12-10)<br><br>In this post we discuss a standard way to encode a text document as a vector using a term frequency-inverse document frequency (tf-idf) score for each word, with an aim to cluster similar documents in a corpus.<br><br><a href="http://www.shawnohare.com/2013/12/tf-idf-for-document-clustering.html#more">Read more »</a><br><br><b>The Continuum Hypothesis and Weird Probabilities</b> (2013-12-05)<br><br>I recently heard of an interesting "proof" that $(0,1)$ does not have cardinality $\aleph_1$. This would disprove the Continuum Hypothesis ($\textbf{CH}$), which asserts that any subset of $(0,1)$ is either countable or has the same cardinality as $(0,1)$. More precisely, $\textbf{CH}$ states that if $\omega_0$ denotes the first countable ordinal (i.e., the set of natural numbers) and $\omega_1$ denotes the first uncountable ordinal, then $| \omega_1| = | 2^{\omega_0}|$. 
Here $2^{\omega_0}$ is the set of all binary-valued functions $f \colon \omega_0 \to \{ 0, 1\}$, which has the same cardinality as the power set of $\omega_0$ and as $(0,1)$.<br><br><a href="http://www.shawnohare.com/2013/12/the-continuum-hypothesis-and-weird.html#more">Read more »</a><br><br><b>Simple Divisibility Tests</b> (2013-11-12)<br><br>In this post we justify some common divisibility tests that most school children are familiar with. Recall that a number such as $111$ is divisible by $3$ if and only if the sum of its digits is divisible by $3$. Since $1+1+1=3$ is divisible by $3$, we see that $111$ is divisible by $3$. <br><br>More generally, an integer $N$ is divisible by $3$ if and only if the sum of the digits appearing in $N$ is divisible by $3$. <div><br></div><a href="http://www.shawnohare.com/2013/11/simple-divisibility-tests.html#more">Read more »</a><br><br><b>A Simple Proof that the Harmonic Series Diverges</b> (2013-11-02)<br><br>One usually encounters the harmonic series \[ \sum_{k=1}^{\infty} \frac{1}{k} \] as an example of a series that diverges for non-obvious reasons. With $H_n$ defined to be the partial sum $H_n:=\sum_{k=1}^n \frac{1}{k}$, a typical way to prove divergence is to observe that \begin{align*} H_{2n} &= H_n + \frac{1}{n+1} + \dots + \frac{1}{2n} \\ & \geq H_n + \underbrace{\frac{1}{2n} + \dots + \frac{1}{2n}}_{n \text{ terms}} \\ & = H_n + \frac{1}{2}. \end{align*} In particular, $H_{2^n} \geq 1 + \frac{n}{2}$, so the sequence of partial sums of the harmonic series has an unbounded subsequence, whence the series diverges. 
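The bound $H_{2^n} \geq 1 + \frac{n}{2}$ is easy to check numerically; here is a quick sketch in Python (an addition to the original post):

```python
def H(n):
    # n-th partial sum of the harmonic series: 1 + 1/2 + ... + 1/n
    return sum(1.0 / k for k in range(1, n + 1))

# The partial sums grow without bound, but very slowly:
for n in range(1, 11):
    assert H(2**n) >= 1 + n / 2
```

Note that equality holds at $n = 1$, since $H_2 = 1 + \frac{1}{2}$, and the bound becomes looser as $n$ grows.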
<a href="http://www.shawnohare.com/2013/11/a-simple-proof-that-harnoic-series.html#more">Read more »</a><br><br><b>Flickr Tags are Useless</b> (2013-10-25)<br><br>Recently I began to code up an image classifier, and my hope was to use Flickr images as a source of real data. The Flickr API allows you to easily obtain the most recent 100 images that have a given tag. However, I found the correspondence between tags and content to be marginal. About 75% of the downloaded images tagged with "chicken" were pictures of a ferret (presumably named Chicken?), and more than half of the images tagged with "kitten" were of puppies. Images tagged with "sushi" included dresses, kids playing the violin, ducks, and an assortment of other objects with absolutely no relation to sushi.<br><br><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-XXkuXZBrH3c/UmsDfVgGKlI/AAAAAAAAAmc/Od6KC5jPDns/s1600/10319725425_5b45433ce4.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="320" src="http://4.bp.blogspot.com/-XXkuXZBrH3c/UmsDfVgGKlI/AAAAAAAAAmc/Od6KC5jPDns/s320/10319725425_5b45433ce4.jpg" width="208"></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Tagged with "sushi"</td></tr></tbody></table><br><br><a href="http://www.shawnohare.com/2013/10/flickr-tags-are-useless.html#more">Read more »</a><br><br><b>Sample Covariance Matrix</b> (2013-10-15)<br><br>Suppose $X_1, \dots, X_p$ are random variables and 
$X:=(X_1, \dots, X_p)$ is the random vector of said random variables. Given a sample $\mathbf x_1, \dots, \mathbf x_N$ of $X$, how do we obtain an estimate of the covariance matrix $\text{Cov}(X)$? <br><br> To this end, let $\mathbf X=( \mathbf X_{ij})$ denote the $N \times p$ data matrix whose $k$th row is $\mathbf x_k$, and let $\bar x_i$ denote the sample mean of $X_i$; that is, $\bar x_i$ is the mean of the $i$th column of $\mathbf X$. Then the sample covariance matrix $Q = (Q_{ij})$ is defined by \[ Q_{ij} := \frac{1}{N-1} \sum_{k=1}^{N} (\mathbf X_{ki} - \bar x_i)(\mathbf X_{kj} - \bar x_j). \] We can write this a bit more compactly if we introduce yet more notation. Let $\bar{\mathbf x}:=(\bar x_1, \dots, \bar x_p)$ denote the sample mean vector and $M$ the $N \times p$ matrix with all rows equal to $\bar{\mathbf x}$. The matrix $\mathbf X_c := \mathbf X - M$ is the data matrix centered about the sample means, and we quickly see that \[ Q = \frac{1}{N-1} \mathbf X_c^T \mathbf X_c. \]<br><br><b>US Life Expectancy Data</b> (2013-10-11)<br><br>I took a look at the World Health Organization's data on the at-birth life expectancies for various countries in the world. The number given for each year assumes that levels of mortality will remain roughly the same throughout the individual's life, and it is a summary of the mortality patterns across all individuals in a given population.<br /><br />From the WHO's global data I excised the data relating specifically to the United States. 
After plotting the data and fitting a regression line, it's pretty clear that so far the US life expectancy exhibits a linear rate of increase, growing from about 69 years in the early 1960s to around 79 years today.<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-8h-F_PPd7X0/Ulii8I48bKI/AAAAAAAAAmA/lLY1xcgeMg4/s1600/lr_life_expectancy.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="480" src="http://4.bp.blogspot.com/-8h-F_PPd7X0/Ulii8I48bKI/AAAAAAAAAmA/lLY1xcgeMg4/s640/lr_life_expectancy.png" width="640" /></a></div><b>Linear Models</b> (2013-10-11)<br><br>Suppose we think that a random variable $Y$ (the response) depends linearly on random variables $X_1, \dots, X_p$, where $p$ is some positive integer. We can model this by assuming that \[ Y =\underbrace{ \beta_0 + \sum_{j=1}^p \beta_j X_j}_{\hat Y} + \epsilon, \] where $\beta = (\beta_0, \dots, \beta_p)$ is the parameter vector of the model and $\epsilon$ is a random error term accounting for the residual between the true value of $Y$ and the linear prediction $\hat Y$. We assume that $\epsilon$ has mean $0$. Letting $\mathbf x$ denote the random vector $(1,X_1, \dots, X_p)$, this model can be rewritten \[ Y = \mathbf x \cdot \beta + \epsilon, \quad \text{or} \quad y(\mathbf x ) = \mathbf x \cdot \beta + \epsilon, \] where $\cdot$ denotes the standard inner product on $\mathbb R^{p+1}$. One often assumes, as we do now, that $\epsilon$ has a Gaussian distribution. The probability density function $p$ for $Y$ then becomes \[ p(y \mid \mathbf x, \beta) = \mathcal N(y \mid \mu (\mathbf x), \sigma^2 (\mathbf x)), \] where $\mu(\mathbf x) = \mathbf x \cdot \beta$. 
We will assume that the variance $\sigma^2(\mathbf x)$ is a constant $\sigma^2$. Suppose we make $N$ observations, namely $Y = y^{(i)}$ when $X_j = x^{(i)}_j$ for $j=1, \dots, p$ and $i = 1, \dots, N$. Let $\mathbf x^{(i)}$ denote the $i$th feature vector, and let $\mathbf y:=(y^{(1)}, \dots, y^{(N)})$ denote the vector of responses. For mathematical convenience we assume that each feature vector $\mathbf x^{(i)}$ is a $(p+1)$-vector with first entry equal to $1$. Finally, let $\mathbf X$ denote the $N \times (p+1)$ feature matrix whose $i$th row is the feature vector $\mathbf x^{(i)}$. Given the data above, how can we find $\beta$ such that $\hat Y$ is a best fit, in some sense? One standard way is to minimize the residual sum of squares \[ \text{RSS}(\beta) := \| \mathbf y - \mathbf X \beta \|^2 =(\mathbf y - \mathbf X \beta ) \cdot ( \mathbf y - \mathbf X \beta ) = \sum_{i=1}^N ( y^{(i)} - \mathbf x^{(i)} \cdot \beta)^2. \] Here $\mathbf X \beta$ denotes the usual matrix product. We can find all minima in the usual analytic way by computing the partials of $\text{RSS}(\beta)$ with respect to the variables $\beta_j$ for $j=0, \dots, p$. Letting $\mathbf X_j$ denote the $j$th column of $\mathbf X$, we see that \begin{align*} \frac{\partial \text{RSS}}{\partial \beta_j} & = -2 \sum_{i=1}^N ( y^{(i)} - \mathbf x^{(i)} \cdot \beta) x^{(i)}_j \\ &= -2 (\mathbf y - \mathbf X \beta) \cdot \mathbf X_j, \end{align*} and so all of the partials vanish precisely when \[ \mathbf X^T ( \mathbf y - \mathbf X \beta) = 0. \] In the case that $\mathbf X^T \mathbf X$ is nonsingular, the unique solution is \[ \hat \beta = (\mathbf X^T \mathbf X)^{-1} \mathbf X^T \mathbf y. 
\] <br><br><a href="http://www.shawnohare.com/2013/10/linear-models.html#more">Read more »</a><br><br><b>The Monty Hall Problem via Bayes' Rule</b> (2013-09-19)<br><br>The Monty Hall Problem is a puzzle presented in the context of a game show. Behind one of three doors is a prize, and behind the other two doors are goats. Here we assume that the prize is not a goat. You pick a door at random. The game's host, Monty Hall, knows which door the prize is behind. He opens one of the remaining two doors to reveal a goat and asks you if you'd like to switch your choice to the other unopened door. Should you switch?<br><br><a href="http://www.shawnohare.com/2013/09/the-monty-hall-problem-via-bayes-rule.html#more">Read more »</a><br><br><b>A Beginner's Guide to Matplotlib</b> (2013-09-12)<br><br>I recently stumbled upon this <a href="http://www.loria.fr/~rougier/teaching/matplotlib/">gentle introduction</a> to the Matplotlib module for Python.<br><br><b>Counting Inversions in Python</b> (2013-08-19)<br><br>Below is an O(n log n) algorithm for counting the number of inversions in an array of distinct positive integers. We call an index pair (i,j) of an array A an inversion provided that i < j and A[i] > A[j]. The idea is to break the array A into a left half L:=A[:n] and a right half R:=A[n:] and then count the number of inversions in L, in R, and the number of split inversions (i,j) where i is less than n and j is greater than or equal to n. 
If L and R are already sorted then we can easily count the number of split inversions via a slight augmentation of the merge subroutine of merge sort. Note that popping from the front of a Python list costs O(n), so the merge below walks the lists with indices to preserve the overall O(n log n) running time. <br /><br /><pre class="prettyprint">def SortCount(A):<br />    # Recursively sort A and count its inversions.<br />    if len(A) <= 1:<br />        return A, 0<br />    n = len(A) // 2<br />    C, c = SortCount(A[:n])  # inversions in the left half<br />    D, d = SortCount(A[n:])  # inversions in the right half<br />    B, b = MergeCount(C, D)  # split inversions<br />    return B, b + c + d<br /><br /><br />def MergeCount(A, B):<br />    # Merge the sorted lists A and B, counting split inversions.<br />    # When B[j] is placed before the remaining elements of A,<br />    # it forms an inversion with each of them.<br />    M = []<br />    i = j = count = 0<br />    while i < len(A) and j < len(B):<br />        if A[i] <= B[j]:<br />            M.append(A[i])<br />            i += 1<br />        else:<br />            count += len(A) - i<br />            M.append(B[j])<br />            j += 1<br />    M += A[i:] + B[j:]<br />    return M, count<br /><br /># For example, SortCount([3, 1, 2]) returns ([1, 2, 3], 2).<br /></pre><br><br><b>A Coin Tossing Game</b> (2013-08-06)<br><br>As an example of the law of total probability, I'm going to explore a slightly generalized variant of an interview question I recently encountered. The basic scenario is to imagine two players, say Amanda and Bill, both of whom have a weighted coin. They take turns tossing their coins, and whoever obtains the first heads wins. Amanda's coin comes up heads with probability $\theta$ and Bill's with probability $\psi$. If Amanda tosses first, what's the probability she will win? <a href="http://www.shawnohare.com/2013/08/a-coin-tossing-game.html#more">Read more »</a><br><br><b>A Natural Isomorphism Between a Vector Space and Its Double Dual</b> (2013-07-27)<br><br>Soon after beginning a study of abstract vector spaces, one is confronted with the fact that a finite dimensional vector space is naturally (or canonically) isomorphic to its double dual space. 
Intuitively, this means that the isomorphism is somehow independent of any particular choice of basis. This idea is made rigorous via natural transformations of functors between categories.<br><br><a href="http://www.shawnohare.com/2013/07/the-natural-isomorphism-between-vector.html#more">Read more »</a><br><br><b>Densities and Expectations</b> (2013-06-24)<br><br>For the layperson, it's probably most helpful to think of the density function $f$ associated to a random variable $X$ as the function you integrate to compute probabilities. Similarly, the expected value $E(X)$ is thought of as the average value that $X$ takes. Often $E(X)$ is defined directly in terms of the density function, and it's not clear to the beginner why $\int_{\mathbb R} x f(x) \ dx$ should compute the expected value of $X$. What should perhaps be a bit more obvious is that if you integrate $X$ over the entire probability space, with respect to the given probability measure, then you obtain the average value of $X$. This is indeed how the expectation of $X$ is typically defined in a more analytic setting.<br><a href="http://www.shawnohare.com/2013/06/densities-and-expectations.html#more">Read more »</a><br><br><b>Complex Numbers as Matrices</b> (2013-06-09)<br><br>Many people take exception to the complex number field $\mathbb C$ because the equality \begin{equation}\label{eq:neg1} i^2 = -1 \end{equation} rankles them in some way. I suspect that the layperson's distaste for this identity stems from their wont to interpret $i$ as a real number, since they have the most experience with the arithmetic of real numbers. 
Surely our convention of calling numbers such as $\sqrt 2$, $\pi$, and $e$ "real" but $-2i$ "purely imaginary" doesn't help. <br><br>Identities of the form \eqref{eq:neg1} abound in the wild. For instance, within the ring $M_2(\mathbb R)$ of $2 \times 2$ matrices with real entries, the rotation-by-$90^{\circ}$ matrix $R$ defined by \begin{equation}\label{eq:matrix} R := \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \end{equation} satisfies $R^2 = -\mathbf{1}$. Here $\mathbf{1}$ signifies the identity matrix \[ \mathbf{1}:=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \] which is the multiplicative identity in $M_2(\mathbb R)$. <br><a href="http://www.shawnohare.com/2013/06/many-people-take-exception-to-complex.html#more">Read more »</a><br><br><b>Mathematical Intro to Data Science</b> (2013-03-30)<br><br>I just came across <a href="http://www.math.pku.edu.cn/teachers/yaoy/">Yuan Yao's</a> lecture notes/book titled <a href="http://www.math.pku.edu.cn/teachers/yaoy/Fall2012/lectures.pdf">A Mathematical Introduction to Data Science</a>. Unfortunately, the last few chapters are presently empty.<br><br><b>Non-linear Additive Maps</b> (2013-03-29)<br><br>While teaching an introductory linear algebra course, a colleague noticed that most of the examples of additive maps he gave turned out to be linear. He asked whether I could think of a map that is additive but not linear. 
In a general context, the question is to find a ring $R$, $R$-modules $V$ and $W$, and a map $f \colon V \to W$ such that $f$ is a group homomorphism but not an $R$-module morphism, i.e., $f(x+y)=f(x)+f(y)$ for all $x,y \in V$, but there is some $r \in R$ and $z \in V$ such that $f(r \cdot z) \neq r \cdot f(z)$.<br><br><a href="http://www.shawnohare.com/2013/03/non-linear-additive-maps.html#more">Read more »</a><br><br><b>Symmetries of the Naturals</b> (2013-03-29)<br><br>This was a little problem I thought about while backpacking a few years ago.<br><br>We define a symmetry of the natural numbers to be any bijective function $f \colon \mathbb{N} \to \mathbb{N}$. Let $S$ denote the set of symmetries of $\mathbb N$.<br><br><b>Claim</b>. $S$ has the cardinality of the continuum.<br><br><a href="http://www.shawnohare.com/2013/03/symmetries-of-naturals.html#more">Read more »</a><br><br><b>Sums of Consecutive Integers</b> (2013-03-29)<br><br>A fun fact I recently learned is the following.<br><br><b>Claim</b>. A natural number can be written as the sum of at least two consecutive positive integers if and only if it is not a power of $2$.<br><br><a href="http://www.shawnohare.com/2013/03/sums-of-consecutive-integers.html#more">Read more »</a><br><br><b>All Horses Are the Same Color</b> (2013-03-29)<br><br>I was fortunate to TA UCSC's introductory proofs course three times. 
One of the exercises I like to pose to my students is the following fallacious induction proof that all horses are the same color.<br><br><b>The "Proof"</b><br><br>As a base case, it's clear that any horse is the same color as itself. Assume as an inductive hypothesis that in any collection of n horses all the horses are the same color, where n is some fixed natural number. Given any collection of n+1 horses, we may select a subset consisting of all but a single horse. By the inductive hypothesis, all the horses in this subset are the same color. Now regroup the horses and select another n-horse subset, but this time include the horse that was left out of the first selection. Again all the horses in this subset are the same color, so all the horses in the original set are the same color.<br><br><a href="http://www.shawnohare.com/2013/03/all-horses-are-same-color.html#more">Read more »</a><br><br><b>Code Snippets</b> (2013-03-29)<br><br>Presently, I am using Google's <a href="http://code.google.com/p/google-code-prettify/">Code-Prettify</a> to include code snippets in my posts. Getting this up and running was fairly painless.<br /><br />1. Navigate to Design > Template > Edit HTML.<br />2. After the <code> <head></code> tag, insert <code><script defer="defer" src="https://google-code-prettify.googlecode.com/svn/loader/run_prettify.js?autoload=true&lang=tex&skin=sons-of-obsidian"> </script></code>.<br />3. Within the <code><body></code> tag, insert <code>onload='prettyPrint()'</code>. The tag should now look similar to <code><body expr:class='"loading" + data:blog.mobileClass' onload='prettyPrint()'></code>.<br /><span style="font-family: monospace;">4. 
Consult the Google pages linked to above to see how to employ Code-Prettify.</span><br /><br />One must be careful to <a href="http://www.opinionatedgeek.com/dotnet/tools/htmlencode/encode.aspx">encode</a> each snippet so that it is HTML-ready.
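As an aside (a minimal sketch added here, not from the original post), Python's standard <code>html</code> module can perform this sort of encoding without an external tool:

```python
import html

# Escape a code snippet so it can be embedded safely in an HTML page.
snippet = 'if A[0] <= B[0] and name != "done":'
encoded = html.escape(snippet)
print(encoded)
# if A[0] &lt;= B[0] and name != &quot;done&quot;:
```

By default <code>html.escape</code> also escapes quotes, which matters when the snippet is placed inside an HTML attribute.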