
Structured Strategies for Learning and Exploration in Sequential Decision Making

Wang, Yijia (2022) Structured Strategies for Learning and Exploration in Sequential Decision Making. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Solving Markov decision processes (MDPs) efficiently is challenging in many cases: for example, when the state space or action space is large, when the reward function is sparse and delayed, or when there is a distribution of MDPs. Structures in the policy, value function, reward function, or state space can be useful in accelerating the learning process. In this thesis, we exploit structures in MDPs to solve them effectively and efficiently. First, we study problems with a concave value function and a basestock policy and leverage these two structures to propose an approximate dynamic programming (ADP) algorithm. Next, we study the exploration problem in unknown MDPs, introduce a structured intrinsic reward to the problem, and propose a Bayes-optimal algorithm for learning the intrinsic reward. Finally, we move to problems with a structured state space (slow and fast states), build a hierarchical model that exploits the structure, and propose ADP algorithms for the hierarchical model.


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Wang, Yijia (Email: yiw94@pitt.edu; Pitt Username: yiw94)
ETD Committee:
Thesis Advisor: Jiang,
Committee Member: Maillart, Lisa (Email: maillart@pitt.edu; Pitt Username: MAILLART; ORCID: 0000-0002-6321-2671)
Committee Member: Rajgopal,
Committee Member: Polozek,
Committee Member: Kharoufeh,
Date: 10 June 2022
Date Type: Publication
Defense Date: 5 April 2022
Approval Date: 10 June 2022
Submission Date: 7 April 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 202
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Industrial Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Markov decision processes; Approximate dynamic programming; Reinforcement learning.
Date Deposited: 10 Jun 2022 19:27
Last Modified: 10 Jun 2022 19:27

