El Shar, Ibrahim
(2023)
Exploiting Structure and Relaxations in Reinforcement Learning and Stochastic Optimal Control.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Stochastic optimal control studies the problem of sequential decision-making under uncertainty. Dynamic programming (DP) offers a principled approach to solving stochastic optimal control problems. A major drawback of DP methods, however, is that they quickly become intractable in large-scale problems.
In this thesis, we show how structural results and various relaxation techniques can be used to obtain good approximations and accelerate learning.
First, we propose a new provably convergent variant of Q-learning that leverages upper and lower bounds derived using information relaxation techniques to improve performance in the tabular setting.
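The thesis's exact algorithm is not reproduced here, but the core idea of combining Q-learning with known bounds can be sketched generically: each updated estimate is projected onto an interval [L, U] believed to contain the optimal Q-value. In this illustrative sketch, `L` and `U` are placeholder arrays; in the thesis they would be derived via information relaxation techniques.

```python
import numpy as np

def bounded_q_update(Q, L, U, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step, with the new estimate clipped into [L, U].

    Q, L, U: arrays of shape (num_states, num_actions); L and U are assumed
    valid lower/upper bounds on the optimal Q-function.
    """
    # Standard Q-learning target from the observed transition (s, a, r, s').
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    # Project the updated estimate onto the known bounds.
    Q[s, a] = np.clip(Q[s, a], L[s, a], U[s, a])
    return Q
```

The projection step keeps every iterate inside the bounding interval, which is one way tighter bounds can translate into faster, more stable learning.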
Second, we study weakly coupled DPs, a broad class of stochastic sequential decision problems composed of multiple subproblems that are coupled by linking constraints but are otherwise independent. We propose another Q-learning-based algorithm that makes use of Lagrangian relaxation to generate upper bounds and improve performance. We also extend our algorithm to the function approximation setting using Deep Q-Networks.
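The decomposition behind such Lagrangian bounds can be sketched as follows (an assumed generic setup, not the thesis's exact formulation): dualizing a linking budget constraint of the form sum_i c_i(s_i, a_i) <= b with a multiplier lam >= 0 lets each subproblem be solved independently with penalized rewards r_i - lam * c_i, and the sum of the decoupled values plus the constant lam * b / (1 - gamma) upper-bounds the optimal coupled value.

```python
import numpy as np

def subproblem_value(P, r, c, lam, gamma=0.95, iters=500):
    """Value iteration on one subproblem with Lagrangian-penalized rewards.

    P: transition kernel, shape (A, S, S); r, c: reward/cost, shape (S, A).
    Returns the subproblem's optimal value function under reward r - lam*c.
    """
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        # Q[s, a] = penalized reward + discounted expected next value.
        Q = (r - lam * c) + gamma * np.einsum('ast,t->sa', P, V)
        V = Q.max(axis=1)
    return V

def lagrangian_upper_bound(subproblems, lam, b, s0_list, gamma=0.95):
    """Sum of decoupled subproblem values plus the dualized budget term."""
    total = sum(subproblem_value(P, r, c, lam, gamma)[s0]
                for (P, r, c), s0 in zip(subproblems, s0_list))
    return total + lam * b / (1.0 - gamma)
```

Minimizing this bound over lam >= 0 yields the tightest Lagrangian upper bound, which an algorithm in this family can then use to guide or constrain learning.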
Finally, we study the problem of spatial dynamic pricing for a fixed number of shared resources that circulate in a network. For a general network, we show that the optimal value function is concave, and for a network composed of two locations, we show that the optimal policy enjoys certain monotonicity and bounded sensitivity properties. We use these results to propose a novel heuristic algorithm, which we compare against several baselines.
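One way such monotonicity results can be exploited is illustrated below. This is a hedged sketch, not the thesis's heuristic: in a two-location network the state can be summarized by the resource count at one location, and if the optimal price is known to be monotone in that count, an approximate policy can be repaired by a simple running-maximum pass (a cheap monotone repair, not the exact isotonic projection; the direction of monotonicity here is an assumption for illustration).

```python
import numpy as np

def project_monotone(prices):
    """Repair a price vector (indexed by resource count at location 1)
    into a nondecreasing sequence via a running maximum."""
    return np.maximum.accumulate(np.asarray(prices, dtype=float))
```

For example, an approximate policy [1, 3, 2, 5] would be repaired to [1, 3, 3, 5], restoring the structural property that the exact optimal policy is known to satisfy.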
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: El Shar, Ibrahim
ETD Committee:
Date: 19 January 2023
Date Type: Publication
Defense Date: 28 October 2022
Approval Date: 19 January 2023
Submission Date: 1 November 2022
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 131
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Industrial Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Markov decision processes; Approximate dynamic programming; Reinforcement learning
Date Deposited: 19 Jan 2023 19:18
Last Modified: 19 Jan 2024 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/43775