The Mathematics of Data Science Seminar meets (usually) on Tuesdays, 11:30-12:30, in MW 154. The schedule is below. If you would like to speak in the seminar, please contact Maria Han Veiga (hanveiga.1@osu.edu) or Vladimir Kobzar (kobzar.1@osu.edu).
| Date | Speaker | Title |
| --- | --- | --- |
| Fall 2024 | | |
| September 10, 11:30 | Bernardo Modenesi (University of Michigan) | Unveiling Hidden Patterns in Agent Behavior with Discrete-Choice and Network Theory |
| October 17, 16:00 (Thursday) | Tim Kunisky (Johns Hopkins University) | Spectral pseudorandomness, free probability, and the clique number of the Paley graph |
| November 19, 11:30 | Thomas O'Leary-Roseberry (UT Austin) | Efficient Infinite-Dimensional Bayesian Inversion using Derivative-Informed Neural Operators |
Title: Unveiling Hidden Patterns in Agent Behavior with Discrete-Choice and Network Theory
Abstract: Many datasets in data science stem from agents making repeated choices over time, with each choice leading to an observable outcome. In this setup, we introduce a novel approach to uncover latent agent heterogeneity, enhancing our understanding of agent behavior and improving causal inference estimation. By combining discrete choice models with network theory, we develop a method to measure agent similarity based on their patterns of choice. This results in a network-based unsupervised clustering technique that groups agents with similar behaviors, offering an interpretable alternative to black-box machine learning clustering models, with explicit estimation assumptions. In this seminar, I will illustrate our approach using labor market data, where workers (agents) and jobs (choices) are represented as nodes in a bipartite network, with edges denoting worker-job matches. By clustering workers based on their job choices, we can infer workers' unobserved skills, an important factor in economic analysis. Through Bayesian estimation, we reveal latent groups of similar workers, which are then used to make more accurate predictions of labor market outcomes and measurements of labor market discrimination than models relying only on observable characteristics. This seminar will detail our methodological framework, estimation strategy, and practical applications for understanding and predicting agent-choice dynamics.
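For intuition, here is a minimal sketch of the bipartite clustering idea: workers are embedded via the spectral structure of the worker-job biadjacency matrix and then grouped with plain k-means. This spectral stand-in, and every name and parameter in it, is an illustrative assumption of ours, not the speaker's Bayesian estimation procedure.

```python
# Hypothetical sketch: cluster workers by their job-choice patterns using a
# spectral embedding of the bipartite worker-job network. NOT the talk's
# Bayesian estimator; a simplified spectral co-clustering stand-in.
import jax
import jax.numpy as jnp

def spectral_worker_clusters(B, k, n_iter=50, seed=0):
    """B: (n_workers, n_jobs) 0/1 biadjacency matrix of observed matches."""
    # Degree-normalize the biadjacency matrix, as in spectral co-clustering.
    dw = jnp.maximum(B.sum(axis=1), 1.0)  # worker degrees
    dj = jnp.maximum(B.sum(axis=0), 1.0)  # job degrees
    Bn = B / jnp.sqrt(dw)[:, None] / jnp.sqrt(dj)[None, :]
    # Embed each worker by the top-k left singular vectors of Bn.
    U, _, _ = jnp.linalg.svd(Bn, full_matrices=False)
    X = U[:, :k]
    # Plain Lloyd's k-means on the embedding.
    key = jax.random.PRNGKey(seed)
    cent = X[jax.random.choice(key, X.shape[0], (k,), replace=False)]
    for _ in range(n_iter):
        labels = jnp.argmin(((X[:, None, :] - cent[None, :, :]) ** 2).sum(-1), axis=1)
        cent = jnp.stack([
            jnp.where(labels == c, 1.0, 0.0) @ X / jnp.maximum((labels == c).sum(), 1)
            for c in range(k)
        ])
    return labels
```

With B assembled from matched employer-employee data, the returned labels would play the role of the latent worker types that the abstract's Bayesian procedure estimates.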
Title: Spectral pseudorandomness, free probability, and the clique number of the Paley graph
Abstract: The Paley graph is a classical number-theoretic construction of a graph that is believed to behave "pseudorandomly" in many regards. Accurately bounding the clique number of the Paley graph is a long-standing open problem in number theory, with applications to several other questions about the statistics of finite fields. I will present a new approach to this problem, which also opens up intriguing connections with random matrix theory and free probability. In particular, I will show that certain deterministic induced subgraphs of the Paley graph have the same limiting spectrum as induced subgraphs on random subsets of vertices of the same size. I will discuss how this phenomenon arises as a consequence of asymptotic freeness (in the sense of free probability) of certain matrices associated with the Paley graph. I will then present conjectures describing a stronger analogy between random and pseudorandom deterministic induced subgraphs that would lead to clique number bounds improving on the state of the art. Along the way, I will describe new techniques for understanding the eigenvalue statistics of more general random or pseudorandom submatrices of certain structured matrices, like those associated with incoherent tight frames, and will mention how these helped to resolve a recent conjecture of Haikin, Zamir, and Gavish in frame theory.
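A small numerical sketch of the central phenomenon: build the Paley graph on F_p for a prime p ≡ 1 (mod 4), then compare the spectrum of a deterministic induced subgraph with that of a uniformly random vertex subset of the same size. The choice of prime and of the quadratic residues as the deterministic vertex set are our illustrative assumptions, not the talk's specific constructions.

```python
# Paley graph on Z/pZ: vertices 0..p-1, edge i~j iff (i - j) is a nonzero
# quadratic residue mod p (symmetric since -1 is a QR when p = 1 mod 4).
import jax
import jax.numpy as jnp

def paley_adjacency(p):
    """Adjacency matrix of the Paley graph on Z/pZ, p prime, p = 1 (mod 4)."""
    residues = jnp.unique(jnp.arange(1, p) ** 2 % p)  # nonzero QRs mod p
    diff = (jnp.arange(p)[:, None] - jnp.arange(p)[None, :]) % p
    return jnp.isin(diff, residues).astype(jnp.float32)

p = 101  # 101 = 1 (mod 4); illustrative choice
A = paley_adjacency(p)

# Deterministic induced subgraph: the quadratic residues themselves.
qr = jnp.unique(jnp.arange(1, p) ** 2 % p)
spec_det = jnp.linalg.eigvalsh(A[jnp.ix_(qr, qr)])

# Random induced subgraph of the same size, for comparison.
key = jax.random.PRNGKey(0)
rand = jax.random.choice(key, p, (qr.shape[0],), replace=False)
spec_rand = jnp.linalg.eigvalsh(A[jnp.ix_(rand, rand)])
print(spec_det[:5], spec_rand[:5])  # empirically similar bulk spectra
```

As p grows, the two empirical eigenvalue distributions should look increasingly alike, which is the spectral pseudorandomness the abstract describes.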
Title: Efficient Infinite-Dimensional Bayesian Inversion using Derivative-Informed Neural Operators
Abstract: We address Bayesian inverse problems (BIPs) for functions that parametrize PDE models, such as heterogeneous coefficient fields. These problems are challenging due to high-dimensional parameter spaces upon discretization, the computational demands of PDE-based likelihood evaluations, and the complex geometry of the posterior (e.g., concentration and non-linear multi-modality). While efficient algorithms for infinite-dimensional BIPs leverage posterior geometry through likelihood derivatives, they become impractical when the computational model (and its adjoints) is expensive to compute. In this talk, we introduce derivative-informed neural operators (DINOs), a class of neural operator surrogates trained to accurately approximate both PDE mappings and their derivatives (Fréchet derivatives). DINOs offer superior generalization over traditional neural operators, enabling precise derivative use in high-dimensional uncertainty quantification (UQ) tasks like BIPs. Specifically, reduced basis DINOs are a natural choice for algorithms that employ subspace decompositions. In the context of BIPs, these algorithms allow for efficient inversion in a likelihood-dominated subspace, with the prior accounting for the orthogonal complement. We will explore these algorithms within the frameworks of geometric MCMC and transport map variational inference. Our numerical results show that DINO-accelerated algorithms achieve state-of-the-art cost-accuracy performance compared to various other methods.
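To make the derivative-informed training idea concrete, here is a hedged sketch of the kind of loss involved: fit a surrogate to match both the forward map and its Jacobian, not the output alone. The tiny MLP, the toy forward map, and the weighting below are our assumptions for illustration, not the authors' reduced-basis DINO architecture.

```python
# Sketch of derivative-informed (Sobolev-style) surrogate training: penalize
# misfit in both the output and the Jacobian of the learned map.
import jax
import jax.numpy as jnp

def surrogate(params, m):
    W1, b1, W2, b2 = params
    return W2 @ jnp.tanh(W1 @ m + b1) + b2

def derivative_informed_loss(params, m, u_true, J_true, alpha=1.0):
    """L2 misfit on outputs plus weighted Frobenius misfit on Jacobians."""
    u_pred = surrogate(params, m)
    J_pred = jax.jacrev(surrogate, argnums=1)(params, m)  # autodiff Jacobian
    return (jnp.sum((u_pred - u_true) ** 2)
            + alpha * jnp.sum((J_pred - J_true) ** 2))

# Toy stand-in for the PDE map: u(m) = sin(A m), with exact Jacobian.
key = jax.random.PRNGKey(0)
kA, kM, k1, k2 = jax.random.split(key, 4)
d_in, d_out, width = 8, 4, 16
A = jax.random.normal(kA, (d_out, d_in))
m = jax.random.normal(kM, (d_in,))
u_true = jnp.sin(A @ m)
J_true = jnp.cos(A @ m)[:, None] * A  # exact derivative of the toy map

params = (jax.random.normal(k1, (width, d_in)) * 0.1, jnp.zeros(width),
          jax.random.normal(k2, (d_out, width)) * 0.1, jnp.zeros(d_out))
grads = jax.grad(derivative_informed_loss)(params, m, u_true, J_true)
```

In practice the Jacobian data would come from adjoint solves of the PDE model, and the surrogate's trained derivatives would then drive geometry-exploiting samplers such as the geometric MCMC methods mentioned above.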