
Estimating the reliability of MDP policies: A confidence interval approach

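Tetreault, JR and Bohus, D and Litman, DJ (2007) Estimating the reliability of MDP policies: A confidence interval approach. In: NAACL HLT 2007 - Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, pp. 276-283.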

Published Version (PDF, 151kB). Available under license: see the attached license file.

Abstract

Past approaches to using reinforcement learning to derive dialog control policies have assumed that enough data had been collected to derive a reliable policy. In this paper we present a methodology for numerically constructing confidence intervals for the expected cumulative reward of a learned policy. These intervals are used to (1) better assess the reliability of the expected cumulative reward, and (2) perform a refined comparison between policies derived from different Markov Decision Process (MDP) models. We applied this methodology to a prior experiment whose goal was to select the best features to include in the MDP state space. Our results show that while some of the policies developed in the prior work exhibited very large confidence intervals, the policy developed from the best feature set had a much smaller confidence interval and thus showed very high reliability. © 2007 Association for Computational Linguistics.
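The abstract does not spell out how the intervals are constructed numerically. The sketch below illustrates one common way to do it, offered as an assumption rather than as the authors' exact procedure. Each row of observed transition counts under the learned policy is treated as a Dirichlet posterior with a uniform prior. Many plausible transition models are sampled from those posteriors, the fixed policy is evaluated on each sample, and percentiles of the start-state value are read off. The function names, the per-state reward vector, the discount factor, and the prior are all hypothetical choices made for illustration.

```python
import random

def sample_dirichlet(counts, prior=1.0, rng=random):
    # Hypothetical helper: draw one probability vector from
    # Dirichlet(counts + prior) by normalizing independent gamma variates.
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]

def evaluate_policy(P, R, gamma=0.9, tol=1e-8, max_iter=10000):
    # Iterative policy evaluation for a fixed policy:
    # repeatedly apply V <- R + gamma * P V until the values stop changing.
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iter):
        new_V = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V

def value_confidence_interval(counts, R, start=0, gamma=0.9,
                              samples=1000, alpha=0.05, seed=0):
    # counts[s][t]: observed number of s -> t transitions under the policy.
    # Returns an empirical (1 - alpha) percentile interval for V(start).
    rng = random.Random(seed)
    values = []
    for _ in range(samples):
        P = [sample_dirichlet(row, rng=rng) for row in counts]
        values.append(evaluate_policy(P, R, gamma)[start])
    values.sort()
    lo = values[int((alpha / 2) * samples)]
    hi = values[int((1 - alpha / 2) * samples) - 1]
    return lo, hi
```

As a usage example, multiplying every entry of a small count matrix by a constant (so that the same transition frequencies are backed by more data) concentrates the Dirichlet samples and narrows the interval. This mirrors the abstract's point that sparsely estimated models yield wide intervals and therefore less reliable policy-value estimates.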


Details

Item Type: Conference or Workshop Item
Status: Published
Creators/Authors:
Tetreault, JR
Bohus, D
Litman, DJ (Email: dlitman@pitt.edu; Pitt Username: DLITMAN)
Centers: University Centers > Learning Research and Development Center (LRDC)
Date: 1 December 2007
Date Type: Publication
Journal or Publication Title: NAACL HLT 2007 - Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Page Range: 276 - 283
Event Type: Conference
Schools and Programs: Dietrich School of Arts and Sciences > Computer Science
Dietrich School of Arts and Sciences > Intelligent Systems
Refereed: Yes
Date Deposited: 01 Dec 2015 16:13
Last Modified: 02 Feb 2019 15:59
URI: http://d-scholarship.pitt.edu/id/eprint/23208