RLPD: Reinforcement Learning with Prior Data
Can we simply apply existing off-policy methods to leverage offline data when learning online, without offline RL pre-training or explicit imitation terms that privilege the prior offline data? Answering this question is the authors' primary objective, and doing so required solving three main problems.
Expensive expert ...
Decoding RECAP: A Theoretical Look at $π^{*}_{0.6}$'s Reinforcement Learning Approach
In this post, I want to explore RECAP (RL with Experience and Corrections via Advantage-conditioned Policies), which combines advantage estimation with imitation learning, much like an actor-critic method in RL. In the RECAP algorithm, the advantages of actions are computed by a value network and fed into the VLM backbone as improvement indic...
Cantor’s Diagonal Argument: Not All Infinities Are Equal
One of the biggest surprises that I encountered while majoring in applied mathematics was the statement that “the cardinal numbers of $\mathbb{N}$ (the set of natural numbers) and $\mathbb{Z}$ (the set of integers) are equal”. The cardinal number of a set is defined as the size of the set. For finite sets, if a set $A$ is empty then the cardinal...
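The claim that $\mathbb{N}$ and $\mathbb{Z}$ have the same cardinal number can be made concrete with an explicit one-to-one pairing. The sketch below is only an illustration on a finite prefix, not a proof. It assumes $\mathbb{N}$ starts at 0, and the function names `nat_to_int` and `int_to_nat` are made up for this example.

```python
def nat_to_int(n: int) -> int:
    """Map 0, 1, 2, 3, 4, ... to 0, -1, 1, -2, 2, ... (assumes N starts at 0)."""
    if n < 0:
        raise ValueError("n must be a natural number")
    # Even naturals go to the non-negative integers, odd naturals to the negatives.
    return n // 2 if n % 2 == 0 else -((n + 1) // 2)


def int_to_nat(z: int) -> int:
    """Inverse of nat_to_int: every integer is hit by exactly one natural number."""
    return 2 * z if z >= 0 else -2 * z - 1


if __name__ == "__main__":
    # Print the first few pairs of the correspondence.
    for n in range(7):
        print(n, "<->", nat_to_int(n))
```

Because each function undoes the other, no integer is missed and none is paired twice, which is what "equal cardinal numbers" means for infinite sets.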
Basic Guide to build and run ROS 2 Services (Python & C++)
If you are not familiar with ROS 2 Topics, go to this page and learn about them first.
Topics are used for data streams (unidirectional), and Services are used for client/server interactions (bidirectional).
First, Services can work in a synchronous or asynchronous manner. If the service is synchronous, the client sends a Request and blocks until receiving a res...
Basic Guide to build and run ROS 2 Topics (Python & C++)
A Topic is a named channel that receives data from a publisher (a node). The publisher sends data to the topic without knowing which subscribers (nodes) receive it. Similarly, subscribers do not know which nodes send the data to the topic. On top of that, a node's ability to send data is not restricted to a single topic but send...
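The anonymity between publishers and subscribers can be illustrated with a toy model in plain Python. This is a sketch of the idea only: it does not use `rclpy` or any ROS 2 API, and the class `ToyTopicBus` and the topic name `/chatter` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyTopicBus:
    """Toy stand-in for topic routing (not the ROS 2 implementation)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # The subscriber registers on a topic name, not on a particular publisher.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # The publisher only names the topic; it never sees who receives the message.
        for callback in self._subscribers[topic]:
            callback(message)


if __name__ == "__main__":
    bus = ToyTopicBus()
    bus.subscribe("/chatter", lambda msg: print("subscriber A got:", msg))
    bus.subscribe("/chatter", lambda msg: print("subscriber B got:", msg))
    bus.publish("/chatter", "hello")
```

In this toy version the only thing the two sides share is the topic name, which mirrors the decoupling described above.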
Basic Guide to build and run ROS 2 Nodes (Python & C++)
Nodes are subprograms in an application, each responsible for only one thing. Nodes communicate with each other through topics, services, and parameters. Like OOP, nodes reduce code complexity and improve fault tolerance. Furthermore, nodes can be written in different programming languages, including Python and C++. Nodes should have a sing...
Basic Commands for ROS 2
Run nodes
ros2 run <package name> <node name>
NOTE: The “-h” option shows the available arguments and options, as below.
ros2 -h
ros2 run -h
ros2 node -h
List running nodes
ros2 node list
Show information about a running node
ros2 node info <node name>
WARNING: Running two nodes with identical names is discouraged. These could run at the same t...
Monte Carlo Tree Search
Intro
Monte Carlo Tree Search (MCTS) works well in practice but poses theoretical challenges.
In this post, I want to describe the MCTS algorithm and explain why it works.
Open-loop planning algorithms like MCTS can plan future actions from an initial state $s_0$. They assume access to a model of the environment, either stochastically or ...