Structured Strategies for Learning and Exploration in Sequential Decision Making

Wang, Yijia (2022) Structured Strategies for Learning and Exploration in Sequential Decision Making. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Solving Markov decision processes (MDPs) efficiently is often challenging, for example when the state space or action space is large, when the reward function is sparse and delayed, or when there is a distribution of MDPs. Structure in the policy, value function, reward function, or state space can accelerate the learning process. In this thesis, we exploit such structure to solve MDPs effectively and efficiently. First, we study problems with a concave value function and a base-stock policy, and leverage these two structures to propose an approximate dynamic programming (ADP) algorithm. Next, we study the exploration problem in unknown MDPs, introduce a structured intrinsic reward, and propose a Bayes-optimal algorithm for learning that intrinsic reward. Finally, we turn to problems with a structured state space (slow and fast states), build a hierarchical model that exploits this structure, and propose ADP algorithms for the hierarchical model.

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Wang, Yijia	yiw94@pitt.edu	yiw94

ETD Committee:

Title	Member	Email Address	Pitt Username	ORCID
Thesis Advisor	Jiang, Daniel	drjiang@pitt.edu
Committee Member	Maillart, Lisa	maillart@pitt.edu	MAILLART	0000-0002-6321-2671
Committee Member	Rajgopal, Jayant	j.rajgopal@pitt.edu
Committee Member	Poloczek, Matthias	matthias.poloczek@gmx.de
Committee Member	Kharoufeh, Jeffrey	kharouf@clemson.edu

Date:

10 June 2022

Date Type:

Publication

Defense Date:

5 April 2022

Approval Date:

10 June 2022

Submission Date:

7 April 2022

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

202

Institution:

University of Pittsburgh

Schools and Programs:

Swanson School of Engineering > Industrial Engineering

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Markov decision processes; Approximate dynamic programming; Reinforcement learning.

Date Deposited:

10 Jun 2022 19:27

Last Modified:

10 Jun 2022 19:27

URI:

http://d-scholarship.pitt.edu/id/eprint/42524
