In this post we discuss a standard way to encode a text document
as a vector, using a term frequency-inverse document frequency (tf-idf) score for each word,
with the aim of clustering similar documents in a corpus.
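The exact tf-idf weighting used in the post is not shown here. The following is a minimal sketch that assumes one common variant: tf is a word's count divided by the document's length, and idf is log(N / df), where N is the number of documents and df is the number of documents containing the word. Tokenization is a naive lowercase whitespace split, and the helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors), one tf-idf vector per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vectors.append([counts[w] / length * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "stocks fell as markets closed",
]
vocab, vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shares "the" and "cat": positive
print(cosine(vecs[0], vecs[2]))  # no shared words: zero
```

Cosine similarity is one common way to compare such vectors before clustering them. Note that a word appearing in every document gets idf = log(1) = 0, so it contributes nothing to any vector.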

## Tuesday, December 10, 2013

## Thursday, December 5, 2013

### The Continuum Hypothesis and Weird Probabilities

I recently heard of an interesting "proof" that $(0,1)$ does not
have cardinality $\aleph_1$. This would disprove the Continuum Hypothesis
($\textbf{CH}$), which asserts that any subset of $(0,1)$ is either countable or
has the same cardinality as $(0,1)$. More precisely, $\textbf{CH}$ states that
if $\omega_0$ denotes the first infinite ordinal (i.e., the set of natural numbers)
and $\omega_1$ denotes the first uncountable ordinal, then
$| \omega_1| = | 2^{\omega_0}|$. Here $2^{\omega_0}$ is the set of all
binary-valued functions $f \colon \omega_0 \to \{ 0, 1\}$, which has
the same cardinality as the power set of $\omega_0$ and as
$(0,1)$.
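As a worked restatement of the definitions above (a sketch, not taken from the original argument), the characteristic-function map makes the identification of $2^{\omega_0}$ with the power set explicit:

$$
\chi \colon \mathcal{P}(\omega_0) \to 2^{\omega_0}, \qquad
\chi(A)(n) = \begin{cases} 1 & n \in A, \\ 0 & n \notin A, \end{cases}
$$

which is a bijection with inverse $f \mapsto \{ n \in \omega_0 : f(n) = 1 \}$. Writing $\aleph_1 = |\omega_1|$ and $2^{\aleph_0} = |2^{\omega_0}|$, $\textbf{CH}$ takes its familiar cardinal form:

$$
\textbf{CH} \iff |\omega_1| = |2^{\omega_0}| \iff \aleph_1 = 2^{\aleph_0}.
$$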
