Jiaheng Chen

Hi! I am a Ph.D. student in Computational and Applied Mathematics at the University of Chicago, where I am fortunate to be advised by Daniel Sanz-Alonso. Before coming to Chicago, I received my B.S. in Mathematics and Applied Mathematics from Shanghai Jiao Tong University. Here is my CV.

My research lies at the intersection of applied mathematics, statistics, and data science. I am broadly interested in the mathematics of data science and scientific machine learning. A central theme of my work is integrating mathematical analysis with statistical, learning, and algorithmic methodologies to address theoretical and computational challenges in data-centric applications.

My recent work focuses on the theory and algorithms for learning and inference of operators, tensors, and dynamics, motivated by applications in inverse problems, data assimilation, and machine learning.


Publications and Preprints

  1. Convergence rates for learning pseudo-differential operators. With D. Sanz-Alonso, (2026).
    [arXiv] [Abstract]
    This paper establishes convergence rates for learning elliptic pseudo-differential operators, a fundamental operator class in partial differential equations and mathematical physics. In a wavelet-Galerkin framework, we formulate learning over this class as a structured infinite-dimensional regression problem with multiscale sparsity. Building on this structure, we propose a sparse, data- and computation-efficient estimator, which leverages a novel matrix compression scheme tailored to the learning task and a nested-support strategy to balance approximation and estimation errors. In addition to obtaining convergence rates for the estimator, we show that the learned operator induces an efficient and stable Galerkin solver whose numerical error matches its statistical accuracy. Our results therefore contribute to bringing together operator learning, data-driven solvers, and wavelet methods in scientific computing. (A schematic sketch of this regression structure appears after this list.)
  2. High-dimensional quasi-Monte Carlo via combinatorial discrepancy. With H. Jiang and N. Kirk, (2025).
    [arXiv] [Abstract]
    Monte Carlo (MC) and Quasi-Monte Carlo (QMC) methods are classical approaches for the numerical integration of functions $f$ over $[0,1]^d$. While QMC methods can achieve faster convergence rates than MC in moderate dimensions, their tractability in high dimensions typically relies on additional structure—such as low effective dimension or carefully chosen coordinate weights—since worst-case error bounds grow prohibitively large as $d$ increases.
    In this work, we study the construction of high-dimensional QMC point sets via combinatorial discrepancy, extending the recent QMC method of Bansal and Jiang. We establish error bounds for these constructions in weighted function spaces, and for functions with low effective dimension in both the superposition and truncation sense. We also present numerical experiments to empirically assess the performance of these constructions. (An illustrative Monte Carlo versus QMC comparison appears after this list.)
  3. Sharp concentration of simple random tensors II: asymmetry. With D. Sanz-Alonso, (2025).
    [arXiv] [Abstract]
    This paper establishes sharp concentration inequalities for simple random tensors. Our theory unveils a phenomenon that arises only for asymmetric tensors of order $p \ge 3:$ when the effective ranks of the covariances of the component random variables lie on both sides of a critical threshold, an additional logarithmic factor emerges that is not present in sharp bounds for symmetric tensors. To establish our results, we develop empirical process theory for products of $p$ different function classes evaluated at $p$ different random variables, extending generic chaining techniques for quadratic and product empirical processes to higher-order settings. (A schematic definition of the objects studied appears after this list.)
  4. Optimal estimation of structured covariance operators. With O. Al-Ghattas, D. Sanz-Alonso, and N. Waniorek, (2024).
    [arXiv] [Abstract]
    This paper establishes optimal convergence rates for estimation of structured covariance operators of Gaussian processes. We study banded operators with kernels that decay rapidly off the diagonal and $L^q$-sparse operators with an unordered sparsity pattern. For these classes of operators, we find the minimax optimal rate of estimation in operator norm, identifying the fundamental dimension-free quantities that determine the sample complexity. In addition, we prove that tapering and thresholding estimators attain the optimal rate. The proof of the upper bound for tapering estimators requires novel techniques to circumvent the issue that discretization of a banded operator does not result, in general, in a banded covariance matrix. To derive lower bounds for banded and $L^q$-sparse classes, we introduce a general framework to lift theory from high-dimensional matrix estimation to the operator setting. Our work contributes to the growing literature on operator estimation and learning, building on ideas from high-dimensional statistics while also addressing new challenges that emerge in infinite dimension. (A toy sketch of a tapering estimator appears after this list.)
  5. On the estimation of Gaussian moment tensors. With O. Al-Ghattas and D. Sanz-Alonso.
    Electronic Communications in Probability, 30, 1-15, (2025). [Journal] [arXiv] [Abstract]
    This paper studies two estimators for Gaussian moment tensors: the standard sample moment estimator and a plug-in estimator based on Isserlis's theorem. We establish dimension-free, non-asymptotic error bounds that demonstrate and quantify the advantage of Isserlis's estimator for tensors of even order $p>2.$ Our bounds hold in operator and entrywise maximum norms, and apply to symmetric and asymmetric tensors. (A toy comparison of the two estimators appears after this list.)
  6. Sharp concentration of simple random tensors. With O. Al-Ghattas and D. Sanz-Alonso.
    Information and Inference: A Journal of the IMA, 14(4), 1-41, (2025). [Journal] [arXiv] [Slides] [Abstract]
    This paper establishes sharp dimension-free concentration inequalities and expectation bounds for the deviation of the sum of simple random tensors from its expectation. As part of our analysis, we use generic chaining techniques to obtain a sharp high-probability upper bound on the suprema of $L_p$ empirical processes. In so doing, we generalize classical results for quadratic and product empirical processes to higher-order settings.
  7. Precision and Cholesky factor estimation for Gaussian processes. With D. Sanz-Alonso.
    SIAM/ASA Journal on Uncertainty Quantification, 13(3), 1085-1115, (2025). [Journal] [arXiv] [Abstract]
    This paper studies the estimation of large precision matrices and Cholesky factors obtained by observing a Gaussian process at many locations. Under general assumptions on the precision and the observations, we show that the sample complexity scales poly-logarithmically with the size of the precision matrix and its Cholesky factor. The key challenge in these estimation tasks is the polynomial growth of the condition number of the target matrices with their size. For precision estimation, our theory hinges on an intuitive local regression technique on the lattice graph which exploits the approximate sparsity implied by the screening effect. For Cholesky factor estimation, we leverage a block-Cholesky decomposition recently used to establish complexity bounds for sparse Cholesky factorization. (A toy sketch of the local-regression idea appears after this list.)
  8. Covariance operator estimation: sparsity, lengthscale, and ensemble Kalman filters. With O. Al-Ghattas, D. Sanz-Alonso, and N. Waniorek.
    Bernoulli, 31(3), 2377-2402, (2025). [Journal] [arXiv] [Slides] [Abstract]
    This paper investigates covariance operator estimation via thresholding. For Gaussian random fields with approximately sparse covariance operators, we establish non-asymptotic bounds on the estimation error in terms of the sparsity level of the covariance and the expected supremum of the field. We prove that thresholded estimators enjoy an exponential improvement in sample complexity compared with the standard sample covariance estimator if the field has a small correlation lengthscale. As an application of the theory, we study thresholded estimation of covariance operators within ensemble Kalman filters. (A toy sketch of thresholded covariance estimation appears after this list.)
  9. A machine learning framework for geodesics under spherical Wasserstein-Fisher-Rao metric and its application for weighted sample generation. With Y. Jing, L. Li, and J. Lu.
    Journal of Scientific Computing, 98(5), 1-34, (2024). [Journal] [Abstract]
    The Wasserstein–Fisher–Rao (WFR) distance is a family of metrics that gauge the discrepancy between two Radon measures, taking into account both transportation and weight change. The spherical WFR distance is a projected version of the WFR distance for probability measures, so that the space of Radon measures equipped with WFR can be viewed as a metric cone over the space of probability measures equipped with spherical WFR. Compared with the Wasserstein distance, geodesics under the spherical WFR metric are less well understood and remain an active research topic. In this paper, we develop a deep learning framework to compute geodesics under the spherical WFR metric, and the learned geodesics can be used to generate weighted samples. Our approach is based on a Benamou–Brenier type dynamic formulation for spherical WFR. To overcome the difficulty of enforcing the boundary constraint introduced by the weight change, a Kullback–Leibler divergence term based on the inverse map is added to the cost function. Moreover, a new regularization term using the particle velocity is introduced as a substitute for the Hamilton–Jacobi equation for the potential in the dynamic formulation. When used for sample generation, our framework can be beneficial for applications where weighted samples are given, especially in Bayesian inference, compared with sample generation based on previous flow models.
  10. Fluctuation suppression and enhancement in interacting particle systems. With L. Li, (2022).
    [arXiv] [Slides] [Abstract]
    In this work, we investigate the effects of interaction on the fluctuation of empirical measures. Systems with positive definite interaction potentials tend to exhibit smaller fluctuation than standard Monte Carlo sampling, while systems with negative definite potentials tend to exhibit larger fluctuation. Moreover, as the temperature goes to zero, the long-time fluctuation for positive definite kernels vanishes, while that for negative definite kernels blows up. This phenomenon may offer deeper insight into physical systems such as the Poisson–Boltzmann system, and may help explain the properties of some particle-based variational inference sampling methods.
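
Illustrative sketches

The sketches below are simplified toy illustrations of some of the ideas in the papers above; they are not the papers' actual methods, experiments, or code.

For paper 1, the regression structure can be made concrete as follows (a schematic sketch assuming an orthonormal wavelet basis $\{\psi_\lambda\}$ and a linear observation model $f_j = A u_j + \xi_j$; the paper's framework is more general). Expanding both sides in the basis gives

$$\langle \psi_\lambda, f_j \rangle \;=\; \sum_{\mu} a_{\lambda\mu}\, \langle \psi_\mu, u_j \rangle \;+\; \langle \psi_\lambda, \xi_j \rangle, \qquad a_{\lambda\mu} := \langle \psi_\lambda, A \psi_\mu \rangle,$$

so each row of the wavelet-Galerkin matrix $(a_{\lambda\mu})$ is the coefficient vector of a linear regression problem. For pseudo-differential operators, $|a_{\lambda\mu}|$ decays rapidly across scales and positions, which is the multiscale sparsity that compression can exploit.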
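
For paper 2, the following toy comparison pits plain Monte Carlo against an off-the-shelf scrambled Sobol' rule from SciPy (not the combinatorial-discrepancy construction of the paper) on a product integrand with decaying coordinate weights, a simple instance of low effective dimension; the integrand and all parameters are chosen for illustration only.

```python
# Toy MC vs. QMC comparison; integrand and parameters are illustrative.
import numpy as np
from scipy.stats import qmc

d, m = 8, 12                     # dimension; 2**m sample points
rng = np.random.default_rng(0)

# Product integrand with decaying coordinate weights; each factor integrates
# to 1 over [0, 1], so the exact integral over [0, 1]^d equals 1.
w = 1.0 / np.arange(1, d + 1)
f = lambda x: np.prod(1.0 + (x - 0.5) * w, axis=1)

x_mc = rng.random((2**m, d))                                 # i.i.d. uniform
x_qmc = qmc.Sobol(d, scramble=True, seed=0).random_base2(m)  # scrambled Sobol'

print("MC  error:", abs(f(x_mc).mean() - 1.0))
print("QMC error:", abs(f(x_qmc).mean() - 1.0))
```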
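
For papers 3 and 6, the object of study can be written schematically as follows (the notation here is for illustration; see the papers for the precise assumptions). Given independent random vectors $X_i^{(1)}, \dots, X_i^{(p)}$, one forms simple (rank-one) random tensors and studies the deviation of their empirical mean,

$$\frac{1}{n} \sum_{i=1}^{n} X_i^{(1)} \otimes \cdots \otimes X_i^{(p)} \;-\; \mathbb{E}\left[ X_1^{(1)} \otimes \cdots \otimes X_1^{(p)} \right],$$

with bounds phrased in terms of dimension-free quantities such as the effective ranks $r(\Sigma_k) = \operatorname{tr}(\Sigma_k)/\|\Sigma_k\|$ of the component covariances.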
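
For paper 4, here is a minimal matrix-level sketch of a tapering estimator (a simple linear taper applied to the sample covariance of a toy banded model; the paper's tapering scheme and its operator setting are more refined, and all parameters below are illustrative).

```python
# Toy tapering of a sample covariance; model and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 80, 40, 10

# Banded ground truth: triangular (Bartlett) kernel, positive semidefinite.
dist = np.abs(np.arange(d)[:, None] - np.arange(d)[None, :])
C = np.maximum(0.0, 1.0 - dist / 5.0)
C += 1e-8 * np.eye(d)                       # small jitter for numerical safety

X = rng.multivariate_normal(np.zeros(d), C, size=n)
C_sample = X.T @ X / n
W = np.maximum(0.0, 1.0 - dist / k)         # simple linear taper weights
C_taper = C_sample * W                      # entrywise tapering

op = lambda M: np.linalg.norm(M, 2)         # spectral (operator) norm
print("sample  error:", op(C_sample - C) / op(C))
print("tapered error:", op(C_taper - C) / op(C))
```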
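
For paper 5, here is a minimal small-dimension sketch comparing the sample fourth-moment tensor of a centered Gaussian with the plug-in estimator obtained by applying Isserlis's theorem to the sample covariance (dimensions and sample size are illustrative).

```python
# Toy comparison of sample vs. Isserlis plug-in fourth-moment estimators.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000

# Ground-truth covariance C; by Isserlis's theorem, for a centered Gaussian,
# E[x_i x_j x_k x_l] = C_ij C_kl + C_ik C_jl + C_il C_jk.
A = rng.standard_normal((d, d))
C = A @ A.T / d

def isserlis4(C):
    return (np.einsum("ij,kl->ijkl", C, C)
            + np.einsum("ik,jl->ijkl", C, C)
            + np.einsum("il,jk->ijkl", C, C))

X = rng.multivariate_normal(np.zeros(d), C, size=n)
M4_true = isserlis4(C)
M4_sample = np.einsum("ni,nj,nk,nl->ijkl", X, X, X, X) / n  # sample moment
M4_plugin = isserlis4(X.T @ X / n)                          # Isserlis plug-in

# Entrywise maximum-norm errors, as in the paper's norms of interest.
print("sample  max error:", np.abs(M4_sample - M4_true).max())
print("plug-in max error:", np.abs(M4_plugin - M4_true).max())
```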
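
For paper 7, here is a minimal sketch of the local-regression idea on a one-dimensional lattice (a toy tridiagonal precision and plain least squares; the paper's setting and estimator are more general). It rests on the standard Gaussian identity that regressing $X_v$ on the remaining coordinates yields coefficients $-\Theta_{vu}/\Theta_{vv}$ and residual variance $1/\Theta_{vv}$; by the screening effect, regressing only on nearby coordinates suffices.

```python
# Toy local-regression precision estimation; model is illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n, radius = 30, 4000, 2          # lattice size, samples, neighborhood radius

# Tridiagonal ground-truth precision (discrete-Laplacian-type model).
Theta = 2.0 * np.eye(d) - np.eye(d, k=1) - np.eye(d, k=-1)

# Sample X ~ N(0, Theta^{-1}) via the Cholesky factor of Theta.
L = np.linalg.cholesky(Theta)
Z = rng.standard_normal((n, d))
X = np.linalg.solve(L.T, Z.T).T

Theta_hat = np.zeros((d, d))
for v in range(d):
    nb = [u for u in range(d) if u != v and abs(u - v) <= radius]
    beta, *_ = np.linalg.lstsq(X[:, nb], X[:, v], rcond=None)
    s2 = np.mean((X[:, v] - X[:, nb] @ beta) ** 2)   # residual variance
    Theta_hat[v, v] = 1.0 / s2
    Theta_hat[v, nb] = -beta / s2

Theta_hat = (Theta_hat + Theta_hat.T) / 2            # symmetrize
print("relative operator-norm error:",
      np.linalg.norm(Theta_hat - Theta, 2) / np.linalg.norm(Theta, 2))
```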
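
For paper 8, here is a minimal sketch of covariance thresholding (a toy exponential covariance with a small correlation lengthscale; the threshold level and all parameters are illustrative, not the paper's experiments).

```python
# Toy thresholded covariance estimation; setup is illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n, ell = 100, 50, 0.02

# Exponential (Matern-1/2) covariance on a grid in [0, 1]: a small
# lengthscale ell makes the true covariance approximately sparse.
t = np.linspace(0.0, 1.0, d)
C = np.exp(-np.abs(t[:, None] - t[None, :]) / ell)

X = rng.multivariate_normal(np.zeros(d), C, size=n)
C_sample = X.T @ X / n
rho = np.sqrt(np.log(d) / n)                # threshold ~ sqrt(log d / n)
C_thresh = C_sample * (np.abs(C_sample) >= rho)

op = lambda M: np.linalg.norm(M, 2)         # spectral (operator) norm
print("sample      error:", op(C_sample - C) / op(C))
print("thresholded error:", op(C_thresh - C) / op(C))
```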

Notes

Teaching

Teaching Assistant, University of Chicago

STAT/CAAM 31521: Applied Stochastic Processes, Spring 2025

STAT/CAAM 31050: Applied Approximation Theory, Spring 2024

STAT/CAAM 38100: Measure-Theoretic Probability I, Winter 2024

STAT/CAAM 31150: Inverse Problems and Data Assimilation, Autumn 2023

Contact

Email: jiaheng@uchicago.edu

Office: George Herbert Jones Laboratory 307, 5747 S Ellis Avenue, Chicago, IL 60637