Exploiting Structure and Relaxations in Reinforcement Learning and Stochastic Optimal Control

El Shar, Ibrahim (2023) Exploiting Structure and Relaxations in Reinforcement Learning and Stochastic Optimal Control. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Stochastic optimal control studies the problem of sequential decision-making under uncertainty. Dynamic programming (DP) offers a principled approach to solving stochastic optimal control problems. A major drawback of DP methods, however, is that they quickly become intractable for large-scale problems.
In this thesis, we show how structural results and various relaxation techniques can be used to obtain good approximations and accelerate learning.
First, we propose a new provably convergent variant of Q-learning that leverages upper and lower bounds derived using information relaxation techniques to improve performance in the tabular setting.
Second, we study weakly coupled DPs, a broad class of stochastic sequential decision problems composed of multiple subproblems that are independent except for a set of linking constraints. We propose another Q-learning-based algorithm that uses Lagrangian relaxation to generate upper bounds and improve performance. We also extend our algorithm to the function approximation case using Deep Q-Networks.
Finally, we study the problem of spatial dynamic pricing for a fixed number of shared resources that circulate in a network. For a general network, we show that the optimal value function is concave; for a network composed of two locations, we show that the optimal policy enjoys certain monotonicity and bounded sensitivity properties. We use these results to propose a novel heuristic algorithm, which we compare against several baselines.
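
As a rough illustration of the idea behind the first contribution, the sketch below runs ordinary tabular Q-learning and clips every iterate to precomputed lower and upper bounds. It is not the algorithm from the dissertation: the function name `bounded_q_learning`, the two-state deterministic MDP, and the bounds are all assumptions made for this example. In particular, the bounds here are the trivial ones (0 and the maximum reward divided by 1 - gamma) and are not derived from an information relaxation.

```python
import random


def bounded_q_learning(P, R, lower, upper, gamma=0.9, alpha=0.5,
                       steps=20000, seed=0):
    """Tabular Q-learning whose iterates are clipped to [lower, upper].

    P[s][a] is the (deterministic) next state and R[s][a] the reward.
    lower and upper are hypothetical, precomputed bounds on the optimal
    Q-values; how they are obtained is outside this sketch.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    # Start every entry at its lower bound.
    Q = [[float(lower[s][a]) for a in range(n_actions)]
         for s in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)          # uniform random exploration
        s_next = P[s][a]
        target = R[s][a] + gamma * max(Q[s_next])
        q = Q[s][a] + alpha * (target - Q[s][a])
        # Clip the updated estimate to the interval given by the bounds.
        Q[s][a] = min(max(q, lower[s][a]), upper[s][a])
        s = s_next
    return Q


# Toy MDP (assumed for illustration): taking action a moves to state a.
P = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.0, 2.0]]
gamma = 0.9
r_max = 2.0
lower = [[0.0, 0.0], [0.0, 0.0]]
upper = [[r_max / (1 - gamma)] * 2 for _ in range(2)]

Q = bounded_q_learning(P, R, lower, upper, gamma=gamma)
print([[round(q, 3) for q in row] for row in Q])
```

With bounds this loose, the clipping step seldom changes an update; tighter bounds restrict the iterates to a smaller interval around the optimal Q-values.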

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
El Shar, Ibrahim	ije8@pitt.edu	ije8	0000-0002-3093-1354

ETD Committee:

Title	Member	Email Address
Thesis Advisor	Jiang, Daniel	drjiang@pitt.edu
Committee Member	Maillart, Lisa	maillart@pitt.edu
Committee Member	Rajgopal, Jayant	j.rajgopal@pitt.edu
Committee Member	Zeng, Bo	bzeng@pitt.edu
Committee Member	Barati, Masoud	masoud.barati@pitt.edu

Date: 19 January 2023
Date Type: Publication
Defense Date: 28 October 2022
Approval Date: 19 January 2023
Submission Date: 1 November 2022
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 131
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Industrial Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Markov decision processes; Approximate dynamic programming; Reinforcement learning
Date Deposited: 19 Jan 2023 19:18
Last Modified: 19 Jan 2024 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/43775
