Main Question
In this question:
- A polynomial $P(x)$ is written in Bernstein form of degree $n$ if it is written as $P(x)=\sum_{k=0}^n a_k {n \choose k} x^k (1-x)^{n-k},$ where $a_0, ..., a_n$ are the polynomial's Bernstein coefficients.
- For the Bernstein polynomial of $f(x)$ of degree $n$, $a_k = f(k/n)$ (a short code sketch of both definitions follows).
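Just to fix the notation (an illustration of mine, not part of the question), here is a minimal Python sketch that evaluates a polynomial from its Bernstein coefficients and forms the Bernstein coefficients $a_k=f(k/n)$ of the Bernstein polynomial of $f$; the function names are placeholders.

```python
from math import comb

def bernstein_eval(coeffs, x):
    # Evaluate P(x) = sum_k a_k * C(n,k) * x^k * (1-x)^(n-k), where n = len(coeffs) - 1.
    n = len(coeffs) - 1
    return sum(a * comb(n, k) * x**k * (1 - x)**(n - k) for k, a in enumerate(coeffs))

def bernstein_poly_coeffs(f, n):
    # Bernstein coefficients of the Bernstein polynomial of f of degree n: a_k = f(k/n).
    return [f(k / n) for k in range(n + 1)]

# Example: the degree-8 Bernstein polynomial of f(x) = x**2, evaluated at x = 0.3.
coeffs = bernstein_poly_coeffs(lambda x: x * x, 8)
print(bernstein_eval(coeffs, 0.3))  # about 0.11625 = 0.3**2 + 0.3*0.7/8, since B_n(x^2) = x^2 + x(1-x)/n
```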
Suppose $f:[0,1]\to [0,1]$ is continuous and belongs to a large class of functions (for example, the $k$-th derivative, $k\ge 0$, is continuous, Lipschitz continuous, concave, strictly increasing, of bounded variation, and/or in the Zygmund class, or $f$ is real analytic).
Then, compute the Bernstein coefficients of a sequence of polynomials ($g_n$) of degree 2, 4, 8, ..., $2^i$, ... that converge to $f$ from below and satisfy: $(g_{2n}-g_{n})$ is a polynomial with non-negative Bernstein coefficients once it's rewritten to a polynomial in Bernstein form of degree exactly $2n$. Assume $0\lt f(\lambda)\lt 1$ or $f$ is polynomially bounded.
The convergence rate must be $O(1/n^{r/2})$ if the class consists only of functions with a Lipschitz-continuous $(r-1)$-th derivative. The method may not introduce transcendental or trigonometric functions (as with Chebyshev interpolants).
See "Strategies", below, for different ways to answer this question.
Background
I asked this question in order to solve the so-called Bernoulli factory problem, described next. We're given a coin that shows heads with an unknown probability, $\lambda$. The goal is to use that coin (and possibly also a fair coin) to build a "new" coin that shows heads with a probability that depends on $\lambda$, call it $f(\lambda)$. This is the Bernoulli factory problem, and it can be solved only if $f$ is continuous (Keane and O'Brien 1994).
However, since I asked this question I have found a Bernoulli factory algorithm that I believe is general enough to cover all the cases that this question would help solve.
Since this question may be of broader interest, though, I leave this question open. See also my other open questions about the Bernoulli factory problem.
Polynomials that approach a factory function
An algorithm simulates a factory function $f(\lambda)$ via two sequences of polynomials that converge from above and below to that function. To use the algorithm, however, the polynomial sequences must meet certain requirements, one of which is:
For $f(\lambda)$ there must be a sequence of polynomials ($g_n$) in Bernstein form of degree 1, 2, 3, ... that converge to $f$ from below and satisfy: $(g_{n+1}-g_{n})$ is a polynomial with non-negative Bernstein coefficients once it's rewritten to a polynomial in Bernstein form of degree exactly $n+1$ (see end notes; Nacu and Peres 2005; Holtz et al. 2011). For $1-f(\lambda)$ there must likewise be a sequence of this kind.
A Matter of Efficiency
However, ordinary Bernstein polynomials converge to a function at the rate $\Omega(1/n)$ in general, a result known since Voronovskaya (1932), and that rate leads to an infinite expected number of coin flips in general. (See also my supplemental notes.)
But Lorentz (1966) showed that if the function is positive and has a continuous $k$-th derivative, there are polynomials with nonnegative Bernstein coefficients that converge at the rate $O(1/n^{k/2})$ (and thus can enable a finite expected number of coin flips if the function is "smooth" enough).
Thus, researchers have studied alternatives to Bernstein polynomials that improve the convergence rate for "smoother" functions. See Holtz et al. (2011), Sevy (1991), Waldron (2009), Costabile et al. (1996), Han (2003), Khosravian-Arab et al. (2018), and references therein; see also Micchelli (1973), Güntürk and Li (2021a, 2021b), Draganov (2014), and Tachev (2022).
These alternative polynomials usually come with results where the error bound is the desired $O(1/n^{k/2})$, but most of those results (with the notable exception of Sevy) have hidden constants with no upper bounds given, making them unimplementable (that is, it can't be known beforehand whether a given polynomial will come close to the target function within a user-specified error tolerance).
A Conjecture on Polynomial Approximation
The following is a conjecture that could help reduce this problem to the problem of finding explicit error bounds when approximating a function by polynomials.
Let $f(\lambda):[0,1]\to(0,1)$ have a continuous $r$-th derivative, where $r\ge 1$, let $M$ be the maximum of the absolute value of $f$ and its derivatives up to the $r$-th derivative, and denote the Bernstein polynomial of degree $n$ of a function $g$ as $B_n(g)$. Let $W_{2^0}(\lambda), W_{2^1}(\lambda), ..., W_{2^i}(\lambda),...$ be a sequence of bounded functions on [0, 1] that converge uniformly to $f$.
For each integer $n\ge 1$ that's a power of 2, suppose that there is $D>0$ such that—
$$\left|f(\lambda)-B_n(W_n(\lambda))\right| \le DM/n^{r/2},$$
whenever $0\le \lambda\le 1$. Then there is $C_0\ge D$ such that the polynomials $(g_n)$ in Bernstein form of degree 2, 4, 8, ..., $2^i$, ..., defined as $g_n=B_n(W_n(\lambda)) - C_0 M/n^{r/2}$, converge from below to $f$ and satisfy: $(g_{2n}-g_{n})$ is a polynomial with nonnegative Bernstein coefficients once it's rewritten to a polynomial in Bernstein form of degree exactly $2n$.
Equivalently (see also Nacu and Peres (2005)), there is $C_1>0$ such that the inequality—
$$0\le W_{2n}\left(\frac{k}{2n}\right) - \sum_{i=0}^k W_n\left(\frac{i}{n}\right)\sigma_{n,k,i}\le C_1M/n^{r/2},\tag{PB}$$
holds true for each integer $n\ge 1$ that's a power of 2 and whenever $0\le k\le 2n$, where $\sigma_{n,k,i} = {n\choose i}{n\choose {k-i}}/{2n \choose k}=\mathbb{P}(X_k=i)$ and $X_k$ is a hypergeometric($2n$, $k$, $n$) random variable.
$C_0$ or $C_1$ may depend on $r$ and the sequence $W_n$, but not on $f$ or $n$. When $C_0$ or $C_1$ exists, find a good upper bound for it.
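To make the inequality $(PB)$ concrete, here is a Python sketch (my own illustration; it proves nothing about the conjecture) that computes the middle expression of $(PB)$ exactly for a single power-of-two $n$, taking $W_n = 2f - B_n(f)$ purely as an example choice of the sequence $W_n$; the helper names and the sample $f$ are assumptions of mine.

```python
from math import comb
from fractions import Fraction

def bernstein_eval(coeffs, x):
    n = len(coeffs) - 1
    return sum(a * comb(n, k) * x**k * (1 - x)**(n - k) for k, a in enumerate(coeffs))

def B(f, n, x):
    # Bernstein polynomial of f of degree n, evaluated at x (exactly, via Fractions).
    return bernstein_eval([f(Fraction(k, n)) for k in range(n + 1)], x)

def sigma(n, k, i):
    # Hypergeometric(2n, k, n) probability P(X_k = i).
    return Fraction(comb(n, i) * comb(n, k - i), comb(2 * n, k))

def check_PB(f, W, n):
    # Middle expression of (PB) for 0 <= k <= 2n; the left inequality of (PB)
    # holds at this n if and only if min(mids) >= 0.
    mids = []
    for k in range(2 * n + 1):
        expect = sum(W(Fraction(i, n), n) * sigma(n, k, i)
                     for i in range(max(0, k - n), min(k, n) + 1))
        mids.append(W(Fraction(k, 2 * n), 2 * n) - expect)
    return min(mids), max(mids)

# Example f with values in (0,1), and example W_n(x) = 2*f(x) - B_n(f)(x).
f = lambda x: (x * x + x + 1) / 4
W = lambda x, n: 2 * f(x) - B(f, n, x)
print(check_PB(f, W, 4))   # checks the degree pair (n, 2n) = (4, 8)
```

Of course, a finite check like this cannot establish the conjecture; it only shows, for one $n$, how far the middle expression of $(PB)$ stays between $0$ and a candidate bound $C_1 M/n^{r/2}$.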
Strategies
The following are some strategies for answering these questions:
- Verify my proofs for the results on error bounds for certain polynomials in "Results Used in Approximations By Polynomials", including:
    - Iterated Boolean sums (linear combinations of iterates) of Bernstein polynomials, that is, $B_n(W_n) = f-(f-B_n(f))^k$, where the power denotes $k$-fold application of the operator $g\mapsto g-B_n(g)$ (see Note 4 in "End Notes" later in this page, and the sketch after this list): Propositions B10C and B10D.
    - Linear combinations of Bernstein polynomials (see Costabile et al. (1996)): Proposition B10.
    - The Lorentz operator (Holtz et al. 2011).
- Find the hidden constants $\theta_\alpha$, $s$, and $D$ as well as those in Lemmas 15, 17 to 22, 24, and 25 in Holtz et al. (2011).
- Find polynomials of the following kinds and find explicit bounds, with no hidden constants, on the approximation error for those polynomials:
    - Polynomial operators that preserve polynomials at a higher degree than linear functions.
    - Operators that produce a degree-$n$ polynomial from $O(n^2)$ sample points.
    - Polynomials built from samples at rational values of a function $f$ that cluster at a quadratic rate toward the endpoints (Adcock et al. 2019) (for example, values that converge to Chebyshev points $\cos(j\pi/n)$ with increasing $n$, or to Legendre points). See also chapters 7, 8, and 12 of Trefethen, Approximation Theory and Approximation Practice, 2013.
- Find a nonnegative random variable $X$ and a series $f(\lambda)=\sum_{a\ge 0}\gamma_a(\lambda)$ such that $\gamma_a(\lambda)/\mathbb{P}(X=a)$ (letting 0/0 equal 0) is a polynomial or rational function with rational Bernstein coefficients lying in $[0, 1]$.
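As mentioned in the iterated-Boolean-sums item above, the following Python sketch (my own, under the operator-iteration reading of the formula given there) builds $W_n=\sum_{j=0}^{k-1}(I-B_n)^{j}f$, so that $B_n(W_n)=f-(I-B_n)^{k}f$ is the iterated Boolean sum, and prints the largest error on a grid for a sample $f$; the sample $f$, the grid, and the function names are all placeholders, and the sketch does not verify any error bound.

```python
from math import comb

def bernstein_eval(coeffs, x):
    n = len(coeffs) - 1
    return sum(a * comb(n, j) * x**j * (1 - x)**(n - j) for j, a in enumerate(coeffs))

def bernstein_op(g, n):
    # Return B_n(g) as a callable; its Bernstein coefficients are g(j/n).
    coeffs = [g(j / n) for j in range(n + 1)]
    return lambda x: bernstein_eval(coeffs, x)

def boolean_sum_W(f, n, k):
    # W_n = sum_{j=0}^{k-1} (I - B_n)^j f, so that B_n(W_n) = f - (I - B_n)^k f.
    W, term = f, f
    for _ in range(k - 1):
        Bterm = bernstein_op(term, n)
        term = (lambda t, b: lambda x: t(x) - b(x))(term, Bterm)  # apply (I - B_n) once more
        W = (lambda w, t: lambda x: w(x) + t(x))(W, term)
    return W

# Example: k = 2 iterations on a sample smooth f, at degrees 4, 8, 16.
f = lambda x: (3 * x * x - 2 * x**3 + 1) / 3
for n in (4, 8, 16):
    approx = bernstein_op(boolean_sum_W(f, n, 2), n)   # this is B_n(W_n)
    err = max(abs(f(i / 200) - approx(i / 200)) for i in range(201))
    print(n, err)
```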
References
- Łatuszyński, K., Kosmidis, I., Papaspiliopoulos, O., Roberts, G.O., "Simulating events of unknown probabilities via reverse time martingales", arXiv:0907.4018v2 [stat.CO], 2009/2011.
- Keane, M.S., O'Brien, G.L., "A Bernoulli factory", ACM Transactions on Modeling and Computer Simulation 4(2), 1994.
- Holtz, O., Nazarov, F., Peres, Y., "New Coins from Old, Smoothly", Constructive Approximation 33 (2011).
- Nacu, Ş., Peres, Y., "Fast simulation of new coins from old", The Annals of Applied Probability 15(1A) (2005): 93-115.
- Micchelli, C., "The saturation class and iterates of the Bernstein polynomials", Journal of Approximation Theory 8(1) (1973): 1-18.
- Güntürk, C.S., Li, W., "Approximation with one-bit polynomials in Bernstein form", arXiv:2112.09183, 2021.
- Güntürk, C.S., Li, W., "Approximation of functions with one-bit neural networks", arXiv:2112.09181 [cs.LG], 2021.
- Draganov, B.R., "On simultaneous approximation by iterated Boolean sums of Bernstein operators", Results in Mathematics 66(1) (2014): 21-41.
- Tachev, G., "Linear combinations of two Bernstein polynomials", Mathematical Foundations of Computing, 2022.
- Sevy, J., "Acceleration of convergence of sequences of simultaneous approximants", dissertation, Drexel University, 1991.
- Waldron, S., "Increasing the polynomial reproduction of a quasi-interpolation operator", Journal of Approximation Theory 161 (2009).
- Costabile, F., Gualtieri, M.I., Serra, S., "Asymptotic expansion and extrapolation for Bernstein polynomials with applications", BIT 36 (1996).
- Han, X., "Multi-node higher order expansions of a function", Journal of Approximation Theory 124(2) (2003): 242-253. https://doi.org/10.1016/j.jat.2003.08.001
- Khosravian-Arab, H., Dehghan, M., Eslahchi, M.R., "A new approach to improve the order of approximation of the Bernstein operators: theory and applications", Numerical Algorithms 77 (2018): 111-150.
- Adcock, B., Platte, R.B., Shadrin, A., "Optimal sampling rates for approximating analytic functions from pointwise samples", IMA Journal of Numerical Analysis 39(3), July 2019.
Note 5: This condition is also known as a "consistency requirement"; it ensures that not only do the polynomials "increase" to $f(\lambda)$, but their Bernstein coefficients do as well. This condition is equivalent in practice to the following statement (Nacu & Peres 2005). For every integer $n\ge 1$ that's a power of 2, $a(2n, k)\ge\mathbb{E}[a(n, X_{n,k})]= \left(\sum_{i=0}^k a(n,i) {n\choose i}{n\choose {k-i}}/{2n\choose k}\right)$, where $a(n,k)$ is the degree-$n$ polynomial's $k$-th Bernstein coefficient, where $0\le k\le 2n$ is an integer, and where $X_{n,k}$ is a hypergeometric($2n$, $k$, $n$) random variable. A hypergeometric($2n$, $k$, $n$) random variable is the number of "good" balls among $n$ balls taken uniformly at random, all at once, from a bag containing $2n$ balls, $k$ of which are "good". See also my MathOverflow question on finding bounds for hypergeometric variables.
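As a concrete illustration of this equivalence (a sketch of my own; the coefficient arrays at the end are placeholders), the following Python code computes the degree-$2n$ rewrite of a degree-$n$ polynomial, whose $k$-th Bernstein coefficient equals $\mathbb{E}[a(n, X_{n,k})]$, and uses it to test the consistency requirement with exact rational arithmetic.

```python
from math import comb
from fractions import Fraction

def elevated_coeffs(coeffs_n):
    # Bernstein coefficients of the same degree-n polynomial rewritten in degree 2n;
    # the k-th elevated coefficient equals E[a(n, X_{n,k})] from the note above.
    n = len(coeffs_n) - 1
    return [sum(Fraction(comb(n, i) * comb(n, k - i), comb(2 * n, k)) * coeffs_n[i]
                for i in range(max(0, k - n), min(k, n) + 1))
            for k in range(2 * n + 1)]

def consistent(coeffs_n, coeffs_2n):
    # True if a(2n, k) >= E[a(n, X_{n,k})] for every 0 <= k <= 2n, that is, if
    # g_{2n} - g_n has nonnegative Bernstein coefficients when written in degree 2n.
    return all(b2 >= b1 for b1, b2 in zip(elevated_coeffs(coeffs_n), coeffs_2n))

# Placeholder coefficient arrays for degrees 4 and 8 (illustration only).
g4 = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10), Fraction(5, 10), Fraction(6, 10)]
g8 = [c + Fraction(1, 100) for c in elevated_coeffs(g4)]  # elevation of g4 plus a small bump
print(consistent(g4, g8))  # True by construction in this toy example
```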
Note 6: If $W_n(0)=f(0)$ and $W_n(1)=f(1)$ for every $n$, then the inequality $(PB)$ is automatically true when $k=0$ and $k=2n$, so that the statement has to be checked only for $0\lt k\lt 2n$. If, in addition, $W_n$ is symmetric about 1/2, so that $W_n(\lambda)=W_n(1-\lambda)$ whenever $0\le \lambda\le 1$, then the statement has to be checked only for $0\lt k\le n$ (since the values $\sigma_{n,k,i} = {n\choose i}{n\choose {k-i}}/{2n \choose k}$ are symmetric in that they satisfy $\sigma_{n,k,i}=\sigma_{n,k,k-i}$).