
Exploiting Structure and Relaxations in Reinforcement Learning and Stochastic Optimal Control

El Shar, Ibrahim (2023) Exploiting Structure and Relaxations in Reinforcement Learning and Stochastic Optimal Control. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Stochastic optimal control studies the problem of sequential decision-making under uncertainty. Dynamic programming (DP) offers a principled approach to solving stochastic optimal control problems. A major drawback of DP methods, however, is that they quickly become intractable for large-scale problems.
In this thesis, we show how structural results and various relaxation techniques can be used to obtain good approximations and accelerate learning.
First, we propose a new provably convergent variant of Q-learning that leverages upper and lower bounds derived using information relaxation techniques to improve performance in the tabular setting.
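To illustrate the flavor of this idea (a generic sketch, not the dissertation's algorithm), the Python snippet below runs tabular Q-learning while projecting every update onto precomputed lower and upper bound arrays lb and ub on the optimal action values; the environment interface (reset()/step()) and the bound arrays are assumptions for the example.

    import numpy as np

    def bounded_q_learning(env, n_states, n_actions, lb, ub,
                           episodes=500, gamma=0.95, alpha=0.1, eps=0.1, seed=0):
        # Tabular Q-learning whose iterates are kept inside [lb, ub],
        # where lb/ub are (n_states, n_actions) bounds on the optimal Q*.
        rng = np.random.default_rng(seed)
        Q = np.clip(np.zeros((n_states, n_actions)), lb, ub)
        for _ in range(episodes):
            s = env.reset()          # assumed interface: returns an integer state
            done = False
            while not done:
                # epsilon-greedy action selection
                a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
                s_next, r, done = env.step(a)  # assumed interface
                target = r if done else r + gamma * Q[s_next].max()
                Q[s, a] += alpha * (target - Q[s, a])
                Q[s, a] = np.clip(Q[s, a], lb[s, a], ub[s, a])  # project onto the bounds
                s = s_next
        return Q

Keeping the iterates inside valid bounds cannot hurt the fixed point (Q* itself satisfies the bounds) and tends to cut down the early-learning error that plain Q-learning suffers from.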
Second, we study weakly coupled DPs, a broad class of stochastic sequential decision problems composed of multiple subproblems that are coupled through linking constraints but are otherwise independent. We propose another Q-learning-based algorithm that uses Lagrangian relaxation to generate upper bounds and improve performance. We also extend our algorithm to the function approximation setting using Deep Q-Networks.
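For intuition on how Lagrangian relaxation produces upper bounds, here is a minimal sketch of the standard bound for a weakly coupled DP (maximization) with a per-period budget constraint sum_i d_i(s_i, a_i) <= b: dualizing the linking constraint with a multiplier lam >= 0 decouples the problem into independent subproblem DPs whose values, plus a constant, upper-bound the true value. The array layout and names below are assumptions, not the thesis's implementation.

    import numpy as np

    def lagrangian_upper_bound(subproblems, s0, lam, b, gamma=0.95, iters=1000):
        # subproblems: list of (P, r, d), where P has shape (A, S, S) giving
        # transition probabilities P[a, s, s'], and r, d have shape (S, A)
        # giving per-period reward and resource consumption.
        # Returns lam*b/(1-gamma) + sum_i V_i^lam(s0_i), an upper bound on the
        # coupled problem started from joint state s0, valid for any lam >= 0.
        bound = lam * b / (1.0 - gamma)  # constant term from dualizing the budget
        for (P, r, d), s in zip(subproblems, s0):
            V = np.zeros(r.shape[0])
            for _ in range(iters):  # value iteration on the penalized subproblem
                Q = r - lam * d + gamma * np.einsum("ast,t->sa", P, V)
                V = Q.max(axis=1)
            bound += V[s]
        return bound

Minimizing the returned bound over lam >= 0 gives the tightest bound of this form; each subproblem DP is only S_i-by-A_i rather than the exponentially large joint problem.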
Finally, we study the problem of spatial dynamic pricing for a fixed number of shared resources that circulate in a network. For the general network, we show that the optimal value function is concave; for a network composed of two locations, we show that the optimal policy enjoys certain monotonicity and bounded-sensitivity properties. We use these results to propose a novel heuristic algorithm, which we compare against several baselines.
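The abstract does not spell out the heuristic itself, so as a generic illustration of how such monotonicity structure can be exploited, the sketch below projects a noisy estimated price table onto a monotone sequence using the pool-adjacent-violators algorithm; the specific direction of monotonicity (prices weakly decreasing in the number of on-hand units at a location) is an assumption made for the example.

    import numpy as np

    def project_monotone(prices, decreasing=True):
        # Pool-adjacent-violators (PAV) least-squares projection of a 1-D
        # price table onto the set of monotone sequences (unit weights).
        y = np.asarray(prices, dtype=float)
        if decreasing:
            y = -y  # fit a nondecreasing sequence to -y instead
        sums, counts = [], []
        for v in y:
            sums.append(v)
            counts.append(1)
            # merge adjacent blocks while their means violate monotonicity
            while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
                s, c = sums.pop(), counts.pop()
                sums[-1] += s
                counts[-1] += c
        out = np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])
        return -out if decreasing else out

    # e.g. project_monotone([9.0, 9.4, 8.1, 8.3, 7.0])
    # -> array([9.2, 9.2, 8.2, 8.2, 7.0]), a weakly decreasing price table

Enforcing a structural property that the optimal policy is known to satisfy restricts the search space without excluding the optimum, which is the usual payoff of structure-based heuristics.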


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: El Shar, Ibrahim (Email: ije8@pitt.edu; Pitt Username: ije8; ORCID: 0000-0002-3093-1354)
ETD Committee:
Thesis Advisor: Jiang, Daniel (drjiang@pitt.edu)
Committee Member: Maillart, Lisa (maillart@pitt.edu)
Committee Member: Rajgopal, Jayant (j.rajgopal@pitt.edu)
Committee Member: Zeng, Bo (bzeng@pitt.edu)
Committee Member: Barati, Masoud (masoud.barati@pitt.edu)
Date: 19 January 2023
Date Type: Publication
Defense Date: 28 October 2022
Approval Date: 19 January 2023
Submission Date: 1 November 2022
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 131
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Industrial Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Markov decision processes; Approximate dynamic programming; Reinforcement learning
Date Deposited: 19 Jan 2023 19:18
Last Modified: 19 Jan 2024 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/43775
