In this article we go through Gyöngy’s theorem. This result, widely used in mathematical finance, was originally published in Gyöngy, I. (1986). Mimicking the one-dimensional marginal distributions of processes having an Itô differential. Probability Theory and Related Fields, 71(4), 501–516. The idea is quite simple: assume you have a $d$-dimensional stochastic differential equation (SDE), and of the $d$ dimensions, one is more relevant to you. Gyöngy’s theorem says you can always replace the whole system with a single scalar equation and lose nothing that matters for pricing European payoffs.

Suppose your model consists of a scalar process $X_t \in \mathbb{R}$ (in finance, the stock price or its logarithm) and a vector of auxiliary factors $Y_t \in \mathbb{R}^{d-1}$ (for example, stochastic volatility or interest rates). Together they form the full system, driven by a $d$-dimensional Brownian motion $W_t$:

\[dX_t = \mu(t, X_t, Y_t)\, dt + \sigma(t, X_t, Y_t)\, dW_t\] \[dY_t = \beta(t, X_t, Y_t)\, dt + \gamma(t, X_t, Y_t)\, dW_t\]

Here $\mu$ is the scalar drift and $\sigma$ the $1 \times d$ diffusion row of $X_t$, while $\beta \in \mathbb{R}^{d-1}$ and $\gamma \in \mathbb{R}^{(d-1) \times d}$ are the corresponding objects for the factor vector $Y_t$. The key point is that $\mu$ and $\sigma$ depend on $Y_t$: the factors influence the dynamics of $X_t$, which is precisely what makes the system genuinely multidimensional and hard to work with. Gyöngy’s result shows that it is possible to replace this whole coupled system with a single scalar SDE for $X_t$ alone, provided the right scalar model is chosen.
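As a concrete instance of such a coupled system, here is a minimal Euler–Maruyama simulation using Heston-style dynamics as a stand-in: $X_t$ is the log-price and $Y_t$ a single variance factor. The model choice and every parameter value below are illustrative assumptions, not part of the theorem:

```python
import numpy as np

# Minimal Euler-Maruyama sketch of a coupled system (X_t, Y_t):
# X_t is the log-price, Y_t a single stochastic-variance factor
# (Heston-style dynamics; all parameter values are illustrative).
rng = np.random.default_rng(0)

def simulate(n_paths=50_000, n_steps=200, T=1.0, x0=0.0, y0=0.04,
             mu=0.02, kappa=1.5, theta=0.04, xi=0.3, rho=-0.7):
    dt = T / n_steps
    x = np.full(n_paths, x0)
    y = np.full(n_paths, y0)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        # sigma(t, X, Y) = sqrt(Y): the diffusion of X depends on the factor Y
        x = x + (mu - 0.5 * y) * dt + np.sqrt(y * dt) * z1
        # CIR dynamics for Y, truncated at zero to keep the variance nonnegative
        y = np.maximum(y + kappa * (theta - y) * dt
                       + xi * np.sqrt(y * dt) * z2, 0.0)
    return x, y

x_T, y_T = simulate()
print(x_T.mean(), x_T.std())  # roughly 0 and sqrt(theta) for these parameters
```

Pricing a European payoff in this model requires the full pair $(X_t, Y_t)$; the theorem says the marginal law of $X_t$ alone can be reproduced by a scalar SDE.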

This is done in the following way. Define the local drift and local variance by averaging the coefficients of $X_t$ over all possible states of the hidden factors, conditional on the current value of $X_t$:

\[\hat\mu(t, x) := \mathbb{E}\!\left[ \mu(t, X_t, Y_t) \mid X_t = x \right]\] \[\hat\sigma^2(t, x) := \mathbb{E}\!\left[ \sigma(t, X_t, Y_t) \sigma(t, X_t, Y_t)^\top \mid X_t = x \right]\]
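These conditional expectations rarely have closed forms, but they can be estimated from simulated paths of the full system by binning on the value of $X_t$ and averaging within each bin. A minimal sketch of that bin-average estimator, checked on toy data where the answer is known (the function name and distributions are illustrative):

```python
import numpy as np

def conditional_mean_by_bins(x, v, n_bins=30):
    """Estimate E[v | X = x] by averaging v within equal-width bins of x.

    With v playing the role of sigma(t, X_t, Y_t)^2, the result
    approximates the local variance hat{sigma}^2(t, x) on a grid.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=v, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore"):     # empty bins become NaN
        return centers, sums / counts

# Toy check: v = X^2 + 1 + noise, so E[v | X = x] = x^2 + 1 exactly.
rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)
v = x**2 + 1 + 0.1 * rng.standard_normal(200_000)
centers, local_var = conditional_mean_by_bins(x, v)
```

In a real application $x$ and $v$ would come from Monte Carlo paths of the full system at a fixed time $t$, one call per time slice.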

Then the mimicking SDE is:

\[d\hat{X}_t = \hat\mu(t, \hat{X}_t)\, dt + \sqrt{\hat\sigma^2(t, \hat{X}_t)}\, d\hat{W}_t\]

where $\hat{W}_t$ is a standard one-dimensional Brownian motion. Note that there is no $Y_t$ anywhere. The process $\hat{X}_t$ has exactly the same one-dimensional marginal distributions as $X_t$ at every time $t \in [0, T]$: for any bounded measurable $f$, we have $\mathbb{E}[f(X_t)] = \mathbb{E}[f(\hat{X}_t)]$. Only the marginals are preserved, so European prices agree, but path-dependent quantities (barriers, Asian averages) need not.
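The matching of marginals can be checked numerically in a toy model where $\hat\sigma^2$ is available in closed form: take $dX_t = Y\, dW_t$ with $Y$ drawn once at time zero from $\{a, b\}$ with equal probability, so that $X_t \mid Y = y \sim N(0, y^2 t)$ and Bayes’ rule gives $\hat\sigma^2(t, x)$ explicitly. All parameter values below are illustrative:

```python
import numpy as np

# Toy check: dX = Y dW with Y frozen at time 0, Y in {a, b} with equal
# probability. Then X_t | Y = y ~ N(0, y^2 t), and Bayes' rule gives
#   sigma2_hat(t, x) = E[Y^2 | X_t = x] = (a^2 p_a + b^2 p_b) / (p_a + p_b),
# where p_y is the N(0, y^2 t) density at x (normalizing constants cancel).
rng = np.random.default_rng(3)
a, b, T, n_steps, n_paths = 0.1, 0.3, 1.0, 200, 100_000
dt = T / n_steps

def sigma2_hat(t, x):
    if t == 0.0:                       # X_0 is deterministic: plain average
        return np.full_like(x, 0.5 * (a**2 + b**2))
    pa = np.exp(-x**2 / (2 * a**2 * t)) / a
    pb = np.exp(-x**2 / (2 * b**2 * t)) / b
    return (a**2 * pa + b**2 * pb) / (pa + pb)

# Full model: X_T = Y * W_T exactly.
y = np.where(rng.random(n_paths) < 0.5, a, b)
x_full = y * np.sqrt(T) * rng.standard_normal(n_paths)

# Mimicking scalar SDE, simulated by Euler-Maruyama.
x_mim = np.zeros(n_paths)
for k in range(n_steps):
    x_mim = x_mim + np.sqrt(sigma2_hat(k * dt, x_mim) * dt) \
        * rng.standard_normal(n_paths)

print(np.mean(x_full**2), np.mean(x_mim**2))  # both near (a^2 + b^2)/2 * T
```

Other statistics of the terminal marginal ($\mathbb{E}\vert X_T \vert$, quantiles) agree as well, up to Monte Carlo and discretization error.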

The strategy is to show that $X_t$ and $\hat{X}_t$ both satisfy the same evolution equation for expectations, then conclude they share the same law by uniqueness.

Step 1 — Probe with a test function

Pick any $\varphi \in C_c^\infty(\mathbb{R})$ and apply Itô’s formula to $\varphi(X_t)$. Since $X_t$ is a scalar Itô process with drift $\mu(t, X_t, Y_t)$ and quadratic variation $\vert\sigma(t, X_t, Y_t)\vert^2\, dt$, taking expectations kills the stochastic integral and gives:

\[\partial_t\, \mathbb{E}[\varphi(X_t)] = \mathbb{E}\!\left[ \varphi'(X_t)\, \mu(t, X_t, Y_t) \right] + \tfrac{1}{2}\, \mathbb{E}\!\left[ \varphi''(X_t)\, \vert\sigma(t, X_t, Y_t)\vert^2 \right]\]
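To see why the stochastic integral drops out, write Itô’s formula before taking expectations:

\[d\varphi(X_t) = \left( \varphi'(X_t)\, \mu(t, X_t, Y_t) + \tfrac{1}{2}\, \varphi''(X_t)\, \vert\sigma(t, X_t, Y_t)\vert^2 \right) dt + \varphi'(X_t)\, \sigma(t, X_t, Y_t)\, dW_t.\]

The $dW_t$ term has zero expectation: $\varphi'$ is bounded (compact support), so under the usual integrability of $\sigma$ the stochastic integral is a true martingale.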

This is exact, but the right-hand side still involves $Y_t$.

Step 2 — Apply the tower property

Since $\varphi'(X_t)$ is a function of $X_t$ alone, the tower property of conditional expectation gives:

\[\mathbb{E}\!\left[ \varphi'(X_t) \cdot \mu(t, X_t, Y_t) \right] = \mathbb{E}\!\left[ \varphi'(X_t) \cdot \mathbb{E}\!\left[ \mu(t, X_t, Y_t) \mid X_t \right] \right]\]

The inner expectation is exactly $\hat\mu(t, X_t)$ — the average drift of $X_t$ given its current value, with $Y_t$ integrated out. Applying the same trick to the variance term, the equation becomes:

\[\partial_t\, \mathbb{E}[\varphi(X_t)] = \mathbb{E}\!\left[ \varphi'(X_t)\, \hat\mu(t, X_t) + \tfrac{1}{2}\, \varphi''(X_t)\, \hat\sigma^2(t, X_t) \right],\]

having removed $Y_t$ completely: the right-hand side depends on the process only through $X_t$ itself.
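The tower-property identity used in this step can be verified numerically on a toy pair $(X, Y)$ where the inner conditional expectation is computable by hand (the distributions below are arbitrary illustrative choices):

```python
import numpy as np

# Monte Carlo sanity check of the tower-property step,
#   E[ g(X) h(X, Y) ] = E[ g(X) E[h(X, Y) | X] ],
# on a toy pair where the conditional expectation is known exactly.
rng = np.random.default_rng(2)
n = 1_000_000
x = rng.integers(0, 3, size=n)        # X uniform on {0, 1, 2}
y = x + rng.standard_normal(n)        # Y | X = k  ~  N(k, 1)

g = np.sin(x)                         # any bounded function of X alone
h = x * y                             # depends on both X and Y
h_cond = x.astype(float) ** 2         # E[h | X = k] = k * E[Y | X = k] = k^2

lhs = (g * h).mean()
rhs = (g * h_cond).mean()
print(lhs, rhs)  # the two values agree up to Monte Carlo error
```

Replacing $h$ by the drift $\mu$ or the squared diffusion $\vert\sigma\vert^2$ is exactly what Step 2 does.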

Step 3 — The scalar SDE satisfies the same equation

Apply Itô’s formula to $\varphi(\hat{X}_t)$ and take expectations. The drift of $\hat{X}_t$ is $\hat\mu(t, \hat{X}_t)$ and its quadratic variation is $\hat\sigma^2(t, \hat{X}_t)\, dt$ by construction, so:

\[\partial_t\, \mathbb{E}[\varphi(\hat{X}_t)] = \mathbb{E}\!\left[ \varphi'(\hat{X}_t)\, \hat\mu(t, \hat{X}_t) + \tfrac{1}{2}\, \varphi''(\hat{X}_t)\, \hat\sigma^2(t, \hat{X}_t) \right]\]

This is identical in form to what we derived for $X_t$. The mimicking SDE was designed precisely so this would happen.

Step 4 — Uniqueness closes the argument

Both $X_t$ and $\hat{X}_t$ start from the same distribution at $t = 0$ and satisfy the same equation for all test functions $\varphi$. This is the martingale problem for the generator:

\[\mathcal{L}_t \varphi(x) = \hat\mu(t, x)\, \varphi'(x) + \tfrac{1}{2}\, \hat\sigma^2(t, x)\, \varphi''(x)\]

Under mild regularity on $\hat\mu$ and $\hat\sigma^2$ (for instance, bounded measurable coefficients with $\hat\sigma^2$ nondegenerate), this problem has a unique solution in the space of probability measures. Both $X_t$ and $\hat{X}_t$ are solutions, so their laws must agree at every $t$. $\blacksquare$
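Equivalently, in PDE terms, the marginal density $p(t, x)$ of either process solves the Kolmogorov forward (Fokker–Planck) equation

\[\partial_t p(t, x) = -\partial_x\!\left( \hat\mu(t, x)\, p(t, x) \right) + \tfrac{1}{2}\, \partial_{xx}\!\left( \hat\sigma^2(t, x)\, p(t, x) \right),\]

which is the formal adjoint of the generator $\mathcal{L}_t$; uniqueness for this forward equation is another way of phrasing the final step.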

It is important to note that the equivalence is not pathwise: it is not that $X_t$ and $\hat{X}_t$ are close on the same probability space; in fact, the two processes can live on entirely different spaces. The proof works at the level of expectations, which means only their one-dimensional distributions coincide.

The tower property in step 2 is the real engine. It is what lets you integrate out $Y_t$ without approximation. The price is that $\hat\mu$ and $\hat\sigma^2$ are no longer simple explicit functions — they encode the average effect of the hidden factors on $X_t$. Computing them in practice requires knowing the joint distribution of $(X_t, Y_t)$, which is often the hard part.