Algoritmos de aprendizagem por reforço para problemas de otimização multiobjetivo

Oliveira, Thiago Henrique Freire de



dc.contributor.advisor	Doria Neto, Adrião Duarte
dc.contributor.advisorID		pt_BR
dc.contributor.advisorLattes	http://lattes.cnpq.br/1987295209521433	pt_BR
dc.contributor.author	Oliveira, Thiago Henrique Freire de
dc.contributor.authorID		pt_BR
dc.contributor.authorLattes	http://lattes.cnpq.br/0465224964961501	pt_BR
dc.contributor.referees1	Araújo, Aluizio Fausto Ribeiro
dc.contributor.referees1ID		pt_BR
dc.contributor.referees1Lattes	http://lattes.cnpq.br/8715023255304328	pt_BR
dc.contributor.referees2	Araújo, Daniel Sabino Amorim de
dc.contributor.referees2ID		pt_BR
dc.contributor.referees2Lattes	http://lattes.cnpq.br/4744754780165354	pt_BR
dc.contributor.referees3	Lima Júnior, Francisco Chagas de
dc.contributor.referees3ID		pt_BR
dc.contributor.referees3Lattes	http://lattes.cnpq.br/9342041276186254	pt_BR
dc.contributor.referees4	Melo, Jorge Dantas de
dc.contributor.referees4ID		pt_BR
dc.contributor.referees4Lattes	http://lattes.cnpq.br/7325007451912598	pt_BR
dc.contributor.referees5	Fernandes, Marcelo Augusto Costa
dc.contributor.referees5ID		pt_BR
dc.contributor.referees5Lattes	http://lattes.cnpq.br/3475337353676349	pt_BR
dc.date.accessioned	2021-06-22T16:49:50Z
dc.date.available	2021-06-22T16:49:50Z
dc.date.issued	2021-01-11
dc.description.abstract	Multi-objective optimization problems depict real situations, which makes this class of problems extremely important. Even though it has been studied for decades, it continues to pose challenges, especially given the increasing complexity of the problems that arise over time. Among the difficulties found in optimizing multiple objectives simultaneously, whether conflicting or not, one of the main ones faced by existing algorithms and approaches is the need for a priori knowledge of the problem, which leads to a predefined importance for each objective in an attempt to establish an isomorphism between a weighting and a solution. When this class of problems is addressed through reinforcement learning, two approaches predominate: single-policy and multi-policy. Algorithms and techniques that use the first approach suffer from the need for prior knowledge of the problem, an inherent characteristic of multi-objective problems. The second approach has other difficulties, such as a limited set of solutions and high computational cost. In this context, the work proposes two hybrid algorithms, called Q-Managed with reset and Q-Managed without reset. Both hybridize the Q-Learning algorithm and the ε-constraint approach, techniques that belong to reinforcement learning and multi-objective optimization, respectively. In summary, the proposed algorithms work as follows: Q-Learning is used to explore the environment, while the ε-constraint approach is used to dynamically delimit the environment (that is, to restrict the search in the solution space), which keeps the essence of how Q-Learning works intact.
This delimitation forces the learning agent to learn other solutions by blocking actions that lead to solutions it has already learned without improving them, that is, solutions to which the agent has already converged. The blocking of actions is performed by a manager, which is responsible for observing everything that occurs in the environment. The difference between the two proposed algorithms is essentially the choice of whether or not to reuse the knowledge already acquired about the environment once a solution is considered learned, that is, once the learning agent has converged to a particular solution. To test the effectiveness of the two versions of Q-Managed, traditional benchmarks were used that were also adopted in other works, allowing a fairer comparison. Two comparative approaches were adopted: the first implemented third-party algorithms for direct comparison, while the second used a metric common to all works that used the same benchmarks. In all tests that could be performed, the proposed algorithms proved effective, always finding the entire Pareto front.	pt_BR
dc.description.resumo	Problemas de otimização multiobjetivo retratam situações reais e, por isso, esta classe de problemas é extremamente importante. No entanto, mesmo já sendo estudada há décadas, esta classe de problemas continua a proporcionar situações desafiadoras, ainda mais com a crescente complexidade dos problemas que surgem ao longo do tempo. Dentre todas as dificuldades que podemos encontrar na otimização de múltiplos objetivos simultaneamente, sejam eles conflitantes ou não, uma das principais com que os algoritmos e abordagens existentes se deparam é a necessidade de conhecimento a priori do problema, ocasionando uma predefinição de importância para cada um dos objetivos, buscando estabelecer um isomorfismo entre a ponderação e uma solução. Já quando tratamos esta classe de problemas por meio da aprendizagem por reforço, duas abordagens são predominantes: política única (single-policy) e múltiplas políticas (multi-policy). Algoritmos e técnicas que utilizam a primeira abordagem sofrem com a necessidade de conhecimento prévio do problema, característica inerente dos problemas multiobjetivo. Já a segunda abordagem possui outras dificuldades, tais como: limitação do conjunto de soluções e elevado custo computacional. Diante deste contexto apresentado, o trabalho propõe dois algoritmos híbridos, chamados de Q-Managed with reset e Q-Managed without reset. Ambos são uma hibridização do algoritmo Q-Learning e da abordagem ε-constraint, técnicas pertencentes, respectivamente, à aprendizagem por reforço e à otimização multiobjetivo. De forma resumida, os algoritmos propostos atuam da seguinte forma: o Q-Learning é utilizado para a exploração do ambiente, enquanto a abordagem ε-constraint é utilizada para a delimitação dinâmica do ambiente (restrição da busca no espaço de soluções), permitindo manter intacta a essência de como o algoritmo Q-Learning atua.
Essa delimitação tem a seguinte finalidade: impor que o agente de aprendizagem aprenda outras soluções por meio do bloqueio de ações que o levem a soluções já aprendidas e sem melhoria das mesmas, ou seja, soluções para as quais o agente de aprendizagem já convergiu. Tal bloqueio de ações é realizado pela figura de um supervisor (Manager), responsável por observar tudo o que ocorre no ambiente. Com relação à diferença entre os algoritmos propostos, basicamente trata-se da escolha de aproveitar ou não o conhecimento já adquirido do ambiente após uma solução ser considerada aprendida, ou seja, após o agente de aprendizagem ter convergido para uma determinada solução. Como forma de testar a eficácia das duas versões do Q-Managed, foram utilizados benchmarks tradicionais, os quais também foram adotados em outros trabalhos, permitindo assim uma comparação mais justa. Assim, duas abordagens comparativas foram adotadas, sendo a primeira delas por meio da implementação dos algoritmos de terceiros para uma comparação direta, enquanto a segunda se deu por meio de uma métrica comum a todos que utilizaram os mesmos benchmarks. Em todos os testes possíveis, os algoritmos aqui propostos se mostraram eficazes, sempre encontrando toda a Fronteira de Pareto.	pt_BR
dc.identifier.citation	OLIVEIRA, Thiago Henrique Freire de. Algoritmos de aprendizagem por reforço para problemas de otimização multiobjetivo. 2021. 86f. Tese (Doutorado em Engenharia Elétrica e de Computação) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2021.	pt_BR
dc.identifier.uri	https://repositorio.ufrn.br/handle/123456789/32753
dc.language	pt_BR	pt_BR
dc.publisher	Universidade Federal do Rio Grande do Norte	pt_BR
dc.publisher.country	Brasil	pt_BR
dc.publisher.initials	UFRN	pt_BR
dc.publisher.program	PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO	pt_BR
dc.rights	Acesso Aberto	pt_BR
dc.subject	Otimização multiobjetivo	pt_BR
dc.subject	Q-Learning	pt_BR
dc.subject	ε-constraint	pt_BR
dc.subject	Fronteira de Pareto	pt_BR
dc.subject	Hypervolume	pt_BR
dc.subject	Abordagem de política única	pt_BR
dc.title	Algoritmos de aprendizagem por reforço para problemas de otimização multiobjetivo	pt_BR
dc.type	doctoralThesis	pt_BR
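
The abstract in this record reports that the proposed algorithms always find the entire Pareto front. As a minimal illustration of what that set is, the Python sketch below filters the non-dominated vectors out of a list of candidate returns. It is not the thesis's Q-Managed implementation: the function names, the maximization convention, and the sample (treasure, negative time) values are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b`, assuming every objective is maximized.

    `a` dominates `b` when it is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Vector]) -> List[Vector]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective returns: (treasure value, negative time spent).
candidates: List[Vector] = [(1, -1), (2, -3), (3, -5), (2, -4), (1, -2)]
front = pareto_front(candidates)
print(front)
```

In this sample, (2, -4) is dominated by (2, -3) and (1, -2) is dominated by (1, -1), so only three vectors remain on the front. The pairwise check is quadratic in the number of candidates, which is fine for small benchmark-sized sets.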

Files

Original bundle

Name: Algoritmosaprendizagemreforco_Oliveira_2021.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format

Collections

PPGEE - Doutorado em Engenharia Elétrica e de Computação
