<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://jaehyun-jeong.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jaehyun-jeong.github.io/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-04-03T17:45:02+09:00</updated><id>https://jaehyun-jeong.github.io/feed.xml</id><title type="html">Jaehyun Jeong</title><subtitle>Graduate in Applied Mathematics | Reinforcement Learning Enthusiast
</subtitle><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><entry><title type="html">RLPD: Reinforcement Learning with Prior Data</title><link href="https://jaehyun-jeong.github.io/2026/04/02/rlpd.html" rel="alternate" type="text/html" title="RLPD: Reinforcement Learning with Prior Data" /><published>2026-04-02T00:00:00+09:00</published><updated>2026-04-02T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2026/04/02/rlpd</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2026/04/02/rlpd.html"><![CDATA[<p>Can we simply apply existing off-policy methods to leverage offline data when learning online, without offline RL pre-training or explicit imitation terms that privilege the prior offline data? The primary objective of the authors is to answer this question. However, to do this, the authors had to solve three main problems.</p>

<ul>
  <li>Expensive expert data from real robots</li>
  <li>Sparsity of reward signal in robotics</li>
  <li>Poor sample efficiency with offline data</li>
</ul>

<p>Prior methods address these problems with offline pre-training or explicit imitation terms. Yet they are not sample-efficient, and they depend on a reliable data source, which is very expensive to obtain. On top of that, these algorithms are very sensitive to out-of-distribution (OOD) data due to their learning dynamics.</p>

<p>With these challenges in mind, RLPD aims to be robust to dataset quality while remaining sample-efficient. More precisely, RLPD can train on suboptimal data as well as expert data, and even on off-policy data. The authors propose three design choices on top of SAC (Soft Actor-Critic) to mitigate these problems: <strong>“Symmetric Sampling”</strong>, <strong>“Layer Normalization”</strong>, and <strong>“Random Ensemble Distillation”</strong>.</p>

<h2 id="symmetric-sampling">Symmetric Sampling</h2>

<p><img src="/assets/images/2026-04-02-rlpd/symmetric-sampling.png" alt="" /></p>

<p>The overall idea of symmetric sampling is extremely simple: each batch is built with 50% of its samples from the online replay buffer and 50% from the offline data buffer. Despite this simplicity, it handles the OOD problem in a stable manner and relaxes the requirement for high-quality data, so sub-optimal trajectories can be used as well.</p>

<ul>
  <li><strong>Offline data</strong>: Expert Demonstration (small) + Sub-optimal trajectories (large) collected by sub-optimal policies.</li>
</ul>
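
<p>The batch construction described above can be sketched as follows. This is only a minimal illustration, not the authors’ implementation: the buffers are plain NumPy arrays, and the names <code>symmetric_sample</code>, <code>online</code>, and <code>offline</code> are hypothetical.</p>

```python
import numpy as np

def symmetric_sample(online_buffer, offline_buffer, batch_size, rng):
    """Build one batch: half from the online replay buffer, half from offline data."""
    half = batch_size // 2
    online_idx = rng.integers(0, len(online_buffer), size=half)
    offline_idx = rng.integers(0, len(offline_buffer), size=batch_size - half)
    return np.concatenate(
        [online_buffer[online_idx], offline_buffer[offline_idx]], axis=0
    )

rng = np.random.default_rng(0)
online = np.zeros((100, 4))    # hypothetical online transitions, marked with 0s
offline = np.ones((1000, 4))   # hypothetical offline transitions, marked with 1s
batch = symmetric_sample(online, offline, batch_size=256, rng=rng)
print(batch.shape, batch[:, 0].mean())  # (256, 4) 0.5
```

<p>Note that the 50/50 split holds no matter how different the two buffer sizes are, which is the point of the scheme: the small online buffer is not drowned out by the large offline dataset.</p>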

<h2 id="layer-normalization">Layer Normalization</h2>

<p><img src="/assets/images/2026-04-02-rlpd/layer-normalization.png" alt="" /></p>

<p>Batch normalization normalizes each feature across the samples in a batch. Layer normalization instead normalizes each sample across the outputs of a layer. In other words, layer normalization computes the mean and standard deviation from a single sample’s activations across the layer’s outputs, rather than across a batch.</p>
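
<p>To make the difference concrete, here is a small NumPy sketch of both normalizations without learnable scale and shift parameters. The function names and the input shape are assumptions made for illustration only.</p>

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each sample (row) across its features."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Normalize each feature (column) across the batch."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=10.0, size=(8, 16))  # 8 samples, 16 features

ln = layer_norm(x)
bn = batch_norm(x)
print(np.abs(ln.mean(axis=1)).max())  # per-sample means are close to 0
print(np.abs(bn.mean(axis=0)).max())  # per-feature means are close to 0
# Each layer-normalized row has norm at most sqrt(16) = 4, whatever the input scale
print(np.linalg.norm(ln, axis=1).max())
```

<p>The last line is the property that matters for Q-values: the norm of a layer-normalized feature vector is bounded independently of how large the raw activations are.</p>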

<p>With layer normalization, RLPD mitigates the Q-value overestimation problem on OOD observations. This is because layer normalization bounds the Q-value by the norm of the final-layer weights, as shown below.</p>

<p>\(Q^*(s, a) = \sum_{s', r} p(s', r | s, a) \left[ r + \gamma \max_{a'} Q^*(s', a') \right]\)<br />
<em>The max operator in the Bellman optimality equation can trigger Q-value overestimation</em></p>

<p>\(\begin{aligned}
\|Q_{\theta,w}(s, a)\| &amp;= \|w^T \mathrm{relu}(\psi_\theta(s, a))\| \\
&amp;\le \|w\| \|\mathrm{relu}(\psi_\theta(s, a))\| \le \|w\| \|\psi_\theta(s, a)\| \\
&amp;\le \|w\| \quad (\because \text{LayerNorm})
\end{aligned}\)<br />
<em>Layer normalization constrains Q-values within the weight norm</em></p>

<h2 id="random-ensemble-distillation--high-utdupdate-to-data-ratio">Random Ensemble Distillation + High UTD (update-to-data) ratio</h2>

<p><img src="/assets/images/2026-04-02-rlpd/random-ensemble-distillation.png" alt="" /></p>

<p>The UTD ratio is the number of gradient updates per environment step, that is, per unit of newly collected data. A high UTD ratio lets the algorithm reuse its data more, which makes learning more sample-efficient. However, other studies have shown that it can lead to statistical overfitting (Li et al., 2022) due to repeated updates on the same samples. To ameliorate this, the authors use random ensemble distillation.</p>

<p>Random ensemble distillation addresses overfitting similarly to DDQN and TD3, by maintaining multiple value functions. It maintains an ensemble of $E$ Q-models, randomly selects 2 of them for the update step, and averages all $E$ Q-models when updating the policy to estimate the true Q-value.</p>
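
<p>A rough sketch of how the ensemble is used in the two updates is given below. It assumes that the 2 selected Q-models are combined with a minimum, as in clipped double Q-learning, and it leaves out SAC’s entropy term for brevity. The Q-values are random placeholders rather than network outputs, and all variable names are hypothetical.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
E, batch_size = 10, 4   # ensemble size and batch size (illustrative values)
gamma = 0.99

# Placeholder target-network outputs Q_i(s', a') for every ensemble member
target_q = rng.normal(size=(E, batch_size))
reward = np.ones(batch_size)

# Critic update: take the minimum over a random subset of 2 ensemble members
subset = rng.choice(E, size=2, replace=False)
td_target = reward + gamma * target_q[subset].min(axis=0)

# Policy update: average over all E members to estimate Q(s, a)
current_q = rng.normal(size=(E, batch_size))
policy_q = current_q.mean(axis=0)

print(subset, td_target.shape, policy_q.shape)
```

<p>Because a fresh subset is drawn for every update, each Q-model is trained against slightly different targets, which is what keeps the ensemble from collapsing onto the same overfitted estimate under a high UTD ratio.</p>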

<h2 id="rlpdreinforcement-learning-with-prior-data-algorithm">RLPD (Reinforcement Learning with Prior Data) Algorithm</h2>

<p><img src="/assets/images/2026-04-02-rlpd/rlpd.png" alt="" /></p>

<p>Green lines mark the steps shared across all tasks, and purple lines mark task-specific steps. The purple lines are optional and can be applied depending on the task.</p>

<p>As you can see in the pseudocode, the algorithm combines SAC, TD3, and the features introduced above: it incorporates clipped double Q-learning from TD3 and entropy maximization from SAC.</p>

<h2 id="experiments">Experiments</h2>

<p>In the experiments, the authors tried to answer the following questions.</p>

<ul>
  <li>Is RLPD competitive with prior work despite using <em>no pre-training and no explicit constraints</em>?</li>
  <li>Does RLPD transfer to <em>pixel-based</em> environments?</li>
  <li>Does LayerNorm <em>mitigate value divergence</em>?</li>
</ul>

<p>Let’s see the detailed results and the analysis.</p>

<h4 id="rlpds-competitiveness-with-prior-data-without-pre-training-nor-explicit-constraints">Is RLPD competitive with prior work <em>without pre-training or explicit constraints</em>?</h4>

<p align="center"><img src="/assets/images/2026-04-02-rlpd/fig4.png" width="70%" /></p>
<p><em>SACfD initializes the online replay buffer with the offline data</em></p>

<p>RLPD achieves about 2.5$\times$ the performance of prior methods on the sparse-reward Adroit ‘Door’ task.</p>

<h4 id="does-rlpd-transfer-to-pixels">Does RLPD transfer to pixels?</h4>

<p><em>V-D4RL (Lu et al., 2022), an offline dataset with only pixel observations.</em></p>

<p align="center"><img src="/assets/images/2026-04-02-rlpd/fig5.png" width="70%" /></p>

<p>To evaluate performance in pixel-based environments, they applied RLPD to V-D4RL (DeepMind Control Suite tasks with visual observations only). In these environments, the authors showed that RLPD provides consistent improvements over purely online approaches, while also greatly improving over a BC baseline.</p>

<p align="center"><img src="/assets/images/2026-04-02-rlpd/fig6.png" width="40%" /></p>

<p>Also, they demonstrate a remarkable improvement in performance when the offline dataset is combined with a high UTD (update-to-data) ratio. Note that UTD=10 means 10 gradient updates per environment step.</p>

<h4 id="does-layernorm-mitigate-value-divergence">Does LayerNorm mitigate value divergence?</h4>

<p align="center"><img src="/assets/images/2026-04-02-rlpd/fig7.png" width="70%" /></p>

<p>In the Adroit domain, LayerNorm is crucial for strong performance: removing it increases variance and reduces mean performance. In addition, in the AntMaze and Humanoid Walk environments, LayerNorm reduces excessive extrapolation.</p>

<h2 id="references">References</h2>

<ul>
  <li>Ball, P. J., Smith, L., Kostrikov, I., &amp; Levine, S. (2023). Efficient online reinforcement learning with offline data. In International Conference on Machine Learning (pp. 1577-1594). PMLR.</li>
</ul>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="RL" /><summary type="html"><![CDATA[Can we simply apply existing off-policy methods to leverage offline data when learning online, without offline RL pre-training or explicit imitation terms that privilege the prior offline data? The primary objective of the authors is to answer this question. However, to do this, the authors had to solve three main problems. Expensive expert data from real robots Sparsity of reward signal in robotics Poor sample efficiency with offline data There have been methods to address these problems with pre-training and imitation-term. Yet, they are not sample-efficient and require reliable data source and doing so is very expensive. On top of that, these algorithms are very sensitive to OOD(out-of-distribution) due to their learning dynamics. With these challenges in mind, RLPD provides robustness in dataset quality and sample-efficiency. More precisely, RLPD can train with suboptimal data and expert data and even with off-policy data. The authors propose three distinct methods to mitigate these problems based on SAC(Soft Actor-Critic). These are called the “Symmetric Sampling”, “Layer Normalization”, and “Random Ensemble Distillation”. Symmetric Sampling Overall idea of symmetric sampling is extremely simple. It constructs each batch with 50% samples from the replay buffer and 50% from the offline data buffer. In spite of this simplicity, it resolves the OOD problem in a stable manner, alleviating the restriction of data with sub-optimal trajectories. Offline data: Expert Demonstration (small) + Sub-optimal trajectories (large) collected by sub-optimal policies. Layer Normalization Batch normalization normalizes each feature value through samples. Unlike batch normalization, layer normalization normalizes each sample’s values through layer outputs. 
To provide further clarification, layer normalization calculates mean and standard deviation from a sample’s activations across the layer’s outputs, rather than across a batch. Through this method, RLPD can mitigate Q-value overestimation problem in OOD observations. This is because layer normalization constrains the Q-value within the weight norm, as shown below. \(Q^*(s, a) = \sum_{s', r} p(s', r | s, a) \left[ r + \gamma \max_{a'} Q^*(s', a') \right]\) Bellman optimal equation triggers Q-value overestimation \(\begin{aligned} \|Q_{\theta,w}(s, a)\| &amp;= \|w^T \mathrm{relu}(\psi_\theta(s, a))\| \\ &amp;\le \|w\| \|\mathrm{relu}(\psi_\theta(s, a))\| \le \|w\| \|\psi(s, a)\| \\ &amp;\le \|w\| (\because Layer Norm) \end{aligned}\) Layer normalization constrains Q-values within the weight norm Random Ensemble Distillation + High UTD(update-to-data) ratio UTD means the number of updates per batch. As a result of high UTD, the algorithm can use data more efficiently, and it means more sample-efficient learning. Ironically, other studies have shown that it can lead to statistical overfitting (Li et al., 2022) due to repeated updates on the same samples. To ameliorate this, authors have suggested to use random ensemble distillation. Random ensemble distillation addresses overfitting similarly to DDQN and TD3, by maintaining multiple value functions. In the context of random ensemble distillation, it maintains an ensemble of $E$ Q-models, randomly selects 2 for the update step, and averages all $E$ Q-models when updating the policy to estimate the true Q-value RLPD(Reinforcement Learning with Prior Data) Algorithm Green lines refer shared methods for all tasks and purple lines are task specific methods. The purple lines are optional and can be applied depending on the task. As you can see in the pseudocode, the algorithm is a combination of SAC, TD3, and the features I introduced above. 
This incorporates the clipped double Q-learning from TD3 and entropy maximization from SAC. Experiments In the experiments, the authors tried to answer the following questions. Is RLPD competitive with prior work despite using no pre-training nor having explicit constraints? Does RLPD transfer to pixel-based environments? Does LayerNorm mitigate value divergence? Let’s see the detailed results and the analysis. RLPD’s competitiveness with prior data without pre-training nor explicit constraints? SACfD initializes the online replay buffer with the offline data RLPD achieves 2.5$\times$ the performance on the sparse Adroit ‘Door’ task. Does RLPD transfer to pixels? V-D4RL (Lu et al., 2022), an offline dataset with only pixel observations. To evaluate the performance in pixel-based environments, they applied RLPD to V-D4RL(DeepMind Control Suite with visual observations only). In these environments, the authors proved that RLPD provides consistent improvements over online approaches, greatly improving over a BC baseline as well. Also, they demonstrate a remarkable improvement in performance with the offline dataset and high UTD(update-to-data) ratio. It is worth noting that UTD=10 means 10 times updates per batch. Does LayerNorm mitigate value divergence? In Adroit domain, LayerNorm plays a crucial role for strong performance. Excluding LayerNorm escalates variance and reduces mean performance. In addition, in AntMaze and Humanoid Walk environments, LayerNorm diminishes excessive extrapolation. References Ball, P. J., Smith, L., Kostrikov, I., &amp; Levine, S. (2023). Efficient online reinforcement learning with offline data. In International Conference on Machine Learning (pp. 1577-1594). 
PMLR.]]></summary></entry><entry><title type="html">Decoding RECAP: A Theoretical Look at $π^{*}_{0.6}$’s Reinforcement Learning Approach</title><link href="https://jaehyun-jeong.github.io/2026/03/29/pi-star-0.6.html" rel="alternate" type="text/html" title="Decoding RECAP: A Theoretical Look at $π^{*}_{0.6}$’s Reinforcement Learning Approach" /><published>2026-03-29T00:00:00+09:00</published><updated>2026-03-29T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2026/03/29/pi-star-0.6</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2026/03/29/pi-star-0.6.html"><![CDATA[<p>In this post, I want to explore RECAP(RL with Experience and Corrections via Advantage-conditioned Policies) which incorporates advantage estimation with imitation learning like actor-critic method in RL. In RECAP algorithm, advantage of actions are calculated through value network and feed this information into VLM backbone as improvement indicator. I believe that’s the overall concept of this method. However, this simple idea addresses the fundamental problem of combining RL with flow matching.</p>

<p>To begin with, I will explain why we need RL in the pi 0.6 model and why combining RL with pi 0.6 was challenging. After that, I will walk through the details of this method using its equations.</p>

<h2 id="why-rl">Why RL?</h2>

<p>In the field of physical intelligence, pretrained models have shown performance improvements on a number of tasks, such as folding laundry and assembling a box. Even so, the pretraining + fine-tuning strategy is highly sensitive to the environment setting and has a performance ceiling. In addition, when the robot encounters unseen observations, it is susceptible to distribution shift due to the lack of data. Applying RL can overcome these problems through human intervention and self-experience, because an RL method can collect data with the existing policy and learn from both that experience and human interventions.</p>

<h2 id="difficulties-in-rl--flow-matching-approach">Difficulties in RL + flow matching approach</h2>

<p>To understand why flow matching is hard to combine with RL, we need to look at how the most popular RL methods work.</p>

<h3 id="probability-distribution">Probability distribution</h3>

<p>PPO and SAC are among the most common methods in RL, and below are the key equations of these two algorithms. As you can verify, they need $ \pi_{\theta}(a_t \mid s_t) $, the probability (density) of action $a_t$ given state $s_t$.</p>

<ul>
  <li>
    <p>\(L^{CPI}(\theta) = \hat{\mathbb{E}}_{t} \left[ \frac{\pi_{\theta}(a_{t} \mid s_{t})}{\pi_{\theta_{\mathrm{old}}}(a_{t} \mid s_{t})} \hat{A}_{t} \right] = \hat{\mathbb{E}}_{t} \left[ r_{t}(\theta) \hat{A}_{t} \right]\)<br />
<em>PPO’s surrogate objective (before clipping)</em></p>
  </li>
  <li>
    <p>\(J_{\pi}(\phi)=\mathbb{E}_{s_{t}\sim\mathcal{D}}[\mathbb{E}_{a_{t}\sim\pi_{\phi}}[\alpha \log(\pi_{\phi}(a_{t}|s_{t}))-Q_{\theta}(s_{t},a_{t})]]\)<br />
<em>SAC’s objective function</em></p>
  </li>
</ul>
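
<p>For a policy with a closed-form density, these quantities are cheap to evaluate. The sketch below computes the probability ratio $r_t(\theta)$ and the surrogate objective for a one-dimensional Gaussian policy. The means, standard deviations, actions, and advantages are made-up numbers used purely for illustration.</p>

```python
import numpy as np

def gaussian_log_prob(action, mean, std):
    """Log-density of a 1-D Gaussian policy, available in closed form."""
    var = std ** 2
    return -0.5 * ((action - mean) ** 2 / var + np.log(2.0 * np.pi * var))

actions = np.array([0.1, -0.3, 0.5])      # sampled actions a_t
advantages = np.array([1.0, -0.5, 2.0])   # advantage estimates

log_p_old = gaussian_log_prob(actions, mean=0.0, std=1.0)  # old policy
log_p_new = gaussian_log_prob(actions, mean=0.1, std=1.0)  # updated policy

ratio = np.exp(log_p_new - log_p_old)     # r_t(theta)
surrogate = np.mean(ratio * advantages)   # L^CPI estimate
print(ratio, surrogate)
```

<p>Every step here relies on <code>gaussian_log_prob</code> being a simple formula. That is exactly the piece a flow matching policy does not provide.</p>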

<p>Flow matching, in contrast, generates continuous actions by integrating a learned vector field, so computing the exact action log-probability is expensive. This is the primary reason why naively combining RL with flow matching does not work.</p>

<h2 id="theoretical-details-of-recap">Theoretical details of RECAP</h2>

<p><img src="/assets/images/2026-03-18-pi-star-0.6/RECAP.png" alt="" /></p>

<p>However, the authors proposed a straightforward approach to avoid this problem. <strong>They train a value network and use it to label actions as “positive” or “negative”.</strong> More precisely, the value network returns the value, which is the expected cumulative sum of rewards, and from it the advantage (how good the action is compared to the expected value) is computed. Finally, the top 30% of advantages are categorized as positive and the bottom 30% as negative, and this signal is passed to the VLM backbone.</p>

<p>To train this value function, they used the reward function below. Note that success and failure are judged by a human. $C_{\text{fail}}$ is a large positive constant, so that the penalty $-C_{\text{fail}}$ clearly separates failed episodes from successful ones, and the agent receives a reward of -1 at every other step to encourage reaching the goal as quickly as possible.</p>

\[r_t =
\begin{cases}
0 &amp; \text{if } t = T \text{ and success} \\
-C_{\text{fail}} &amp; \text{if } t = T \text{ and failure} \\
-1 &amp; \text{otherwise.}
\end{cases}\]
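
<p>The reward above translates directly into code. In this sketch the value <code>c_fail=100.0</code> and the convention that steps run from $t = 0$ to $t = T$ are assumptions made for illustration. The source does not fix either of them.</p>

```python
def recap_reward(t, T, success, c_fail=100.0):
    """Reward at step t of an episode that ends at step T.

    c_fail is a hypothetical value for the failure penalty constant.
    """
    if t == T:
        return 0.0 if success else -c_fail
    return -1.0

def episode_return(T, success, c_fail=100.0):
    """Undiscounted return of a whole episode with steps t = 0, ..., T."""
    return sum(recap_reward(t, T, success, c_fail) for t in range(T + 1))

print(episode_return(T=5, success=True))    # -5.0
print(episode_return(T=5, success=False))   # -105.0
```

<p>With this reward, the value of a state is the negative of the expected number of remaining steps on successful episodes, and it drops sharply once failure becomes likely.</p>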

<p>For the value function, they used a <strong>distributional value function</strong>. It’s slightly different from a standard value function, which returns a single real number. They divided the value range into $B = 201$ bins and trained the value function as a classifier over these bins. The scalar value is then recovered as the expectation over the bin values. The detailed equations are as follows.</p>

\[\begin{flalign}
&amp; \min_{\phi} \mathbb{E}_{\tau \in \mathcal{D}} \left[ \sum_{\mathbf{o}_{t} \in \tau} H(R_{t}^{B}(\tau), p_{\phi}(V \mid \mathbf{o}_{t}, \ell)) \right] \\
&amp; V(\mathbf{o}_{t}, \ell) = \sum_{b=1}^{B} p_{\phi}(V=b \mid \mathbf{o}_{t}, \ell) \cdot v(b) \\
&amp; v(b) = V_{\min} + (b - 1) \frac{V_{\max} - V_{\min}}{B - 1} \quad \text{for } b \in \{1, 2, \dots, B\}
\end{flalign}\]
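
<p>As a small numerical illustration of the last two equations, the sketch below builds the bin values $v(b)$ and turns a vector of logits into a scalar value. The range <code>V_MIN</code>, <code>V_MAX</code> and the nearest-bin discretization in <code>return_to_bin</code> are assumptions for the example, not values taken from the paper.</p>

```python
import numpy as np

B = 201
V_MIN, V_MAX = -200.0, 0.0   # hypothetical value range

def bin_values(v_min, v_max, num_bins):
    """v(b) = v_min + (b - 1) * (v_max - v_min) / (B - 1) for b = 1..B."""
    b = np.arange(1, num_bins + 1)
    return v_min + (b - 1) * (v_max - v_min) / (num_bins - 1)

def expected_value(logits, v):
    """V(o_t) = sum_b p(V = b | o_t) * v(b), with p given by a softmax."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(np.sum(p * v))

def return_to_bin(ret, v):
    """Classification target: 0-based index of the bin nearest to the return."""
    return int(np.argmin(np.abs(v - ret)))

v = bin_values(V_MIN, V_MAX, B)
uniform_logits = np.zeros(B)   # stands in for the value network's output
print(v[0], v[-1], expected_value(uniform_logits, v))
print(return_to_bin(-42.3, v))
```

<p>Training then reduces to a cross-entropy loss between the predicted bin distribution and the bin index of the observed return, which is the first equation above.</p>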

<p>Once the value function is trained, advantages are computed and the top 30% are labeled as “positive”.</p>

\[A^{\pi}(\mathbf{o}_t, \mathbf{a}_t) = \mathbb{E}_{\rho_{\pi}(\tau)} \left[ \sum_{t'=t}^{t+N-1} r_{t'} + V^{\pi}(\mathbf{o}_{t+N}) \right] - V^{\pi}(\mathbf{o}_t)\]
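
<p>For a single recorded trajectory, the expectation is replaced by the observed rewards, which gives a simple one-sample estimate of this advantage. The rewards and value predictions below are made-up numbers, and <code>n_step_advantage</code> is a hypothetical helper name.</p>

```python
import numpy as np

def n_step_advantage(rewards, values, t, n):
    """One-sample estimate: sum of n rewards + V(o_{t+n}) - V(o_t)."""
    return rewards[t:t + n].sum() + values[t + n] - values[t]

rewards = np.array([-1.0, -1.0, -1.0, -1.0, -1.0])
# Value predictions for o_0 .. o_5 (illustrative numbers)
values = np.array([-5.0, -4.0, -2.5, -2.0, -1.0, 0.0])

adv = n_step_advantage(rewards, values, t=0, n=2)
print(adv)  # 0.5
```

<p>Here the value after two steps is better than the initial prediction implied, so the actions taken at the start of the trajectory receive a positive advantage.</p>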

<p>By labeling actions as positive or negative, the model learns to distinguish good actions from bad ones without computing explicit action log-probabilities. In this way, RECAP elegantly circumvents the core incompatibility between flow matching and standard RL objectives.</p>

<h2 id="real-application-of-recap">Real application of RECAP</h2>

<pre class="pseudocode">
\begin{algorithm}
\caption{RL with Experience and Corrections via Advantage-conditioned Policies (RECAP)}
\begin{algorithmic}
\REQUIRE multi-task demonstration dataset $\mathcal{D}_{\text{demo}}$
\STATE Train $V_{\text{pre}}$ on $\mathcal{D}_{\text{demo}}$ using Eq. 1
\STATE Train $\pi_{\text{pre}}$ on $\mathcal{D}_{\text{demo}}$ using Eq. 3 and $V_{\text{pre}}$
\STATE Initialize $\mathcal{D}_\ell$ with demonstrations for $\ell$
\STATE Train $V_\ell^0$ from $V_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 1
\STATE Train $\pi_\ell^0$ from $\pi_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 3 and $V_\ell^0$
\FOR{$k = 1$ to $K$}
  \STATE Collect data with $\pi_\ell^{k-1}$, add it to $\mathcal{D}_\ell$
  \STATE Train $V_\ell^k$ from $V_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 1
  \STATE Train $\pi_\ell^k$ from $\pi_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 3 and $V_\ell^k$
\ENDFOR
\end{algorithmic}
\end{algorithm}
</pre>

<p>The algorithm starts by training the value network on the pretraining data and then training the policy. After that, it fine-tunes both the value function and the policy for each task. Then, in the loop over $k$, more data is collected with the current policy: the robot acts with its policy, and a human intervenes to correct its actions when they appear unsafe or clearly wrong.</p>

<p>In line 7, human interventions are always labeled as positive, under the assumption that actions provided by a human are correct. The other actions, generated by the policy, are classified with the advantage function: if an action’s advantage is in the top 30%, it is labeled positive. Otherwise, it is labeled negative.</p>
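
<p>The labeling rule can be sketched as follows. This is an illustration under assumptions: the threshold is taken as a quantile over all advantages in the batch, and the names <code>label_actions</code>, <code>is_human</code>, and <code>positive_fraction</code> are hypothetical.</p>

```python
import numpy as np

def label_actions(advantages, is_human, positive_fraction=0.3):
    """Return a boolean array: True = positive, False = negative."""
    # Threshold chosen so that roughly the top 30% of advantages pass it
    threshold = np.quantile(advantages, 1.0 - positive_fraction)
    top_advantage = np.greater_equal(advantages, threshold)
    # Human corrections are always treated as positive
    return np.logical_or(top_advantage, is_human)

advantages = np.arange(10.0)           # illustrative advantages 0.0 .. 9.0
is_human = np.zeros(10, dtype=bool)
is_human[0] = True                     # the first action was a human correction

labels = label_actions(advantages, is_human)
print(labels.astype(int))  # [1 0 0 0 0 0 0 1 1 1]
```

<p>The first action ends up positive despite having the lowest advantage, because it came from a human intervention.</p>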

<h2 id="conclusion">Conclusion</h2>

<p>At first glance, I thought that it’s not like RL, since it pretrains the policy with imitation learning and, even in the final stage, data collection involves human intervention. On top of that, <strong>it’s not a reward-maximizing algorithm</strong>: the reward function is only used to train the value function. I believe this shows how hard it is to reach a goal without human guidance. With a full RL method like PPO or SAC, a humanoid walking environment can be solved with motion that is far from a natural human gait. In robotics, tasks like pick&amp;place and folding laundry are very difficult compared to walking, and they give very sparse rewards, since the reward is only provided when the task is done. As a result, researchers have developed sophisticated imitation learning methods inspired by RL, and a fully automated data collection and training loop remains a future challenge on the way to general-purpose robotic systems.</p>

<h2 id="references">References</h2>

<ol>
  <li>
    <p>Physical Intelligence et al., “$\pi^{*}_{0.6}$: a VLA That Learns From Experience,” <em>arXiv:2511.14759</em>, 2025. <a href="https://arxiv.org/abs/2511.14759">https://arxiv.org/abs/2511.14759</a></p>
  </li>
  <li>
    <p>Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O., “Proximal Policy Optimization Algorithms,” <em>arXiv:1707.06347</em>, 2017. <a href="https://arxiv.org/abs/1707.06347">https://arxiv.org/abs/1707.06347</a></p>
  </li>
  <li>
    <p>Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., and Levine, S., “Soft Actor-Critic Algorithms and Applications,” <em>arXiv:1812.05905</em>, 2018. <a href="https://arxiv.org/abs/1812.05905">https://arxiv.org/abs/1812.05905</a></p>
  </li>
</ol>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="RL" /><summary type="html"><![CDATA[In this post, I want to explore RECAP(RL with Experience and Corrections via Advantage-conditioned Policies) which incorporates advantage estimation with imitation learning like actor-critic method in RL. In RECAP algorithm, advantage of actions are calculated through value network and feed this information into VLM backbone as improvement indicator. I believe that’s the overall concept of this method. However, this simple idea addresses the fundamental problem of combining RL with flow matching. To begin with, I will first explain why you to know why we need RL in the pi 0.6 model and why combining RL with pi 0.6 was challenging. On top of that, I want to talk about the details of this method through equations. Why RL? In the field of Physical Intelligence, pretrained models have shown performance improvements in a number of tasks such as folding a laundry and assemble a box. Even so, pretraining + fine tune strategy was highly sensitive to the environment setting while having performance ceiling. In addition to that, if the robot encounters an unseen observations, it is susceptible to distribution shift due to the lack of data. However, applying RL can overcome these problems with human-intervention and self-experience. This is because that RL method can collect and learn from existing policy and human intervention. Difficulties in RL + flow matching approach To understand why the flow matching is hard to combine with RL, we need to understand the most popular RL method’s approach. Probability distribution PPO and SAC are the most common method in RL, and below is the key equations in these two algorithms. As you can verify, they needs $ \pi_{\theta}(a_t \mid s_t) $, which is the action distribution given by state $s_t$. 
\(L^{CPI}(\theta) = \hat{\mathbb{E}}_{t} \left[ \frac{\pi_{\theta}(a_{t} \mid s_{t})}{\pi_{\theta_{\mathrm{old}}}(a_{t} \mid s_{t})} \hat{A}_{t} \right] = \hat{\mathbb{E}}_{t} \left[ r_{t}(\theta) \hat{A}_{t} \right]\) PPO’s loss function \(J_{\pi}(\phi)=\mathbb{E}_{s_{t}\sim\mathcal{D}}[\mathbb{E}_{a_{t}\sim\pi_{\phi}}[\alpha \log(\pi_{\phi}(a_{t}|s_{t}))-Q_{\theta}(s_{t},a_{t})]]\) SAC’s objective function As you already know, flow matching method generates continuous actions through integration and it is not efficient to compute exact action log-probability. This is the primary reason why simple RL + flow matching does not work. Theoretical details of RECAP However, the authors proposed a straightforward approach to avoid this problem. They just implemented value network and they labeled actions as “positive” or “negative” with this value network. To be more precise, value network returns value which is the expected cumulative sum of rewards, and calculate advantage(how good the action is compared to expected value), and, finally, it categorizes the top 30% of advantages as positive and the bottom 30% as negative, passing this signal to the vlm backbone. To train this value function they used reward function like below. I also note that success and failure are decided by human. $-C_{\text{fail}}$ is a large enough negative value so that the value can be distinguished from positive observation, and the agent receives a -1 reward at every step to encourage reaching the goal as quickly as possible. \[r_t = \begin{cases} 0 &amp; \text{if } t = T \text{ and success} \\ -C_{\text{fail}} &amp; \text{if } t = T \text{ and failure} \\ -1 &amp; \text{otherwise.} \end{cases}\] When they calculate the value function, they used distributional value function. It’s slightly different from a standard value function which returns a real number. 
They divided values into $B = 201$ bins and trained the value function like a bin classifier, and to calculate the real value it calculates expected value. The detailed equations are as follows. \[\begin{flalign} &amp; \min_{\phi} \mathbb{E}_{\tau \in \mathcal{D}} \left[ \sum_{\mathbf{o}_{t} \in \tau} H(R_{t}^{B}(\tau), p_{\phi}(V \mid \mathbf{o}_{t}, \ell)) \right] \\ &amp; V(\mathbf{o}_{t}, \ell) = \sum_{b=1}^{B} p_{\phi}(V=b \mid \mathbf{o}_{t}, \ell) \cdot v(b) \\ &amp; v(b) = V_{\min} + (b - 1) \frac{V_{\max} - V_{\min}}{B - 1} \quad \text{for } b \in \{1, 2, \dots, B\} \end{flalign}\] Once the value function is trained, advantages are computed and the top 30% are labeled as “positive”. \[A^{\pi}(\mathbf{o}_t, \mathbf{a}_t) = \mathbb{E}_{\rho_{\pi}(\tau)} \left[ \sum_{t'=t}^{t+N-1} r_{t'} + V^{\pi}(\mathbf{o}_{t+N}) \right] - V^{\pi}(\mathbf{o}_t)\] By labeling observations as positive or negative, the model learns to distinguish good actions from bad ones without computing explicit action log-probabilities. In conclusion, RECAP elegantly circumvents the core incompatibility between flow matching and standard RL objectives. Real application of RECAP \begin{algorithm} \caption{RL with Experience and Corrections via Advantage-conditioned Policies (RECAP)} \begin{algorithmic} \REQUIRE multi-task demonstration dataset $\mathcal{D}_{\text{demo}}$ \STATE Train $V_{\text{pre}}$ on $\mathcal{D}_{\text{demo}}$ using Eq. 1 \STATE Train $\pi_{\text{pre}}$ on $\mathcal{D}_{\text{demo}}$ using Eq. 3 and $V_{\text{pre}}$ \STATE Initialize $\mathcal{D}_\ell$ with demonstrations for $\ell$ \STATE Train $V_\ell^0$ from $V_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 1 \STATE Train $\pi_\ell^0$ from $\pi_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 3 and $V_\ell^0$ \FOR{$k = 1$ to $K$} \STATE Collect data with $\pi_\ell^{k-1}$, add it to $\mathcal{D}_\ell$ \STATE Train $V_\ell^k$ from $V_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 
1 \STATE Train $\pi_\ell^k$ from $\pi_{\text{pre}}$ on $\mathcal{D}_\ell$ using Eq. 3 and $V_\ell^k$ \ENDFOR \end{algorithmic} \end{algorithm} The algorithm starts with training value network in the pretraining data, and train the policy. After that, it fine-tunes both the value function and policy for each task. Then in the for loop with k they collect more data with the policy while intervening bad actions by human. In the collecting process in the loop, robots collect data with their policy but a human corrects the robot’s actions when they appear unsafe or clearly wrong. In line 7, human interventions are always labeled as positive, under the assumption that actions provided by a human are correct and other actions that the policy has generated are classified with advantage function and if it’s in 30%, actions are positive. Otherwise, actions are negative. Conclusion At first glance, I thought that it’s not like RL since it pretrains policy with imitation learning and, even at the last, collecting data contains human intervention. On top of that, it’s not a reward maximizing algorithm. Reward function is just for value function training. I believe that it explains how hard to reach a goal without guidance of human. With full RL method like PPO and SAC, humanoid walking environment can be solved with motion which is far from natural human gait. In the case of robotics, their tasks like pick&amp;place and folding laundry are very difficult compared to walking, and they give very sparse rewards since the reward is only provided when the task is done. As a result, researchers have made sophisticated imitation learning method inspired by RL method, and fully automated data collecting and training loop is the future challenge humanity must solve for general-purpose robotic systems. References Physical Intelligence et al., “$\pi^{*}_{0.6}$: a VLA That Learns From Experience,” arXiv:2511.14759, 2025. 
https://arxiv.org/abs/2511.14759 Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O., “Proximal Policy Optimization Algorithms,” arXiv:1707.06347, 2017. https://arxiv.org/abs/1707.06347 Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., and Levine, S., “Soft Actor-Critic Algorithms and Applications,” arXiv:1812.05905, 2018. https://arxiv.org/abs/1812.05905]]></summary></entry><entry><title type="html">Cantor’s Diagonal Argument: Not All Infinities Are Equal</title><link href="https://jaehyun-jeong.github.io/2026/01/13/cantors-theorem.html" rel="alternate" type="text/html" title="Cantor’s Diagonal Argument: Not All Infinities Are Equal" /><published>2026-01-13T00:00:00+09:00</published><updated>2026-01-13T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2026/01/13/cantors-theorem</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2026/01/13/cantors-theorem.html"><![CDATA[<p>One of the biggest surprises that I encountered while majoring in applied mathematics was the statement that “the cardinal numbers of $\mathbb{N}$ (the set of natural numbers) and $\mathbb{Z}$ (the set of integers) are equal”. The cardinal number of a set is defined as the size of the set. For finite sets, if a set $A$ is empty then the cardinal number of $A$ is 0, and if the set $A$ has $k$ elements then the cardinal number of $A$ is $k$. However, for infinite sets $A$ and $B$, their cardinal numbers are equal if and only if there exists a one-to-one correspondence (bijective) from $A$ to $B$. We can deduce that aforementioned statement is true from the definition of cardinal number. The really interesting part of this theorem is that the mismatch between our intuition and a rigorous mathematical concepts, since our brains tend to believe, Intuitively, that the size of $\mathbb{Z}$ should be twice the size of $\mathbb{N}$, plus one.</p>

<p>It is also worth noting that “the cardinal numbers of $\mathbb{N}$ and $\mathbb{R}$ are different”. This blew my mind as well, because it shows that infinite sets come in different sizes: $\mathbb{N}$ is countable while $\mathbb{R}$ is uncountable. In this post, I want to prove that the cardinal numbers of $\mathbb{N}$ and $\mathbb{R}$ are different by proving the theorems below.</p>

<ul>
  <li>The open interval (0, 1) is not denumerable.</li>
  <li>The sets (0, 1) and $\mathbb{R}$ are equipotent (have the same size).</li>
</ul>

<h2 id="def-denumerable-set">Def) Denumerable set</h2>

<p>A set $S$ is denumerable if and only if there exists a bijective function from $S$ to $\mathbb{N}$.</p>

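<p><strong>NOTE: in this case, we write $S \sim \mathbb{N}$. More generally, if there exists a bijective function from a set $A$ to a set $B$, we write $A \sim B$ and say that the sets $A$ and $B$ are equipotent.</strong></p>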

<h2 id="th-1-the-open-unit-interval-0-1-of-real-numbers-is-nondenumerable">Th 1) The open unit interval (0, 1) of real numbers is nondenumerable</h2>

<p>$
\underline{\text{proof}}
$</p>

<p>$
\forall x \in (0, 1) \quad \exists x_1, x_2, x_3, \dots \in \{0, 1, \dots, 9\} \quad \text{s.t.} \quad x = 0.x_1 x_2 x_3 \dots
$</p>

<p>$
(\text{For example, } x = \frac{1}{3} = 0.333\dots \implies x_1 = 3 \land x_2 = 3 \land \dots)
$</p>

<p>$
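\text{when a decimal expansion ends in repeating zeros } (\text{such as } \frac{1}{4} = 0.25000\dots), \text{ we decrease the last non-zero digit by 1 and let 9s repeat after it }
$</p>

<p>$
(\text{as } \frac{1}{4} = 0.24999\dots), \text{ so that every } x \in (0, 1) \text{ has exactly one such expansion}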
$</p>

<p>$
\text{Under this convention, assume that } (0, 1) \text{ is denumerable so that}
$</p>

<p>$
\exists \text{ bijective } f: \mathbb{N} \to (0, 1) \quad \text{s.t.}
$</p>

<p>$
f(1) = 0.x_{11} x_{12} x_{13} \dots
$</p>

<p>$
f(2) = 0.x_{21} x_{22} x_{23} \dots
$</p>

<p>$
f(3) = 0.x_{31} x_{32} x_{33} \dots
$</p>

<p>$
\vdots
$</p>

<p>$
f(k) = 0.x_{k1} x_{k2} x_{k3} \dots
$</p>

<p>$
\text{and let } z \in (0, 1) \text{ be defined as follows:}
$</p>

\[z = 0.z_1 z_2 z_3 \dots \quad \text{s.t.} \quad \forall k \in \mathbb{N}, \quad
z_k =
\begin{cases}
1 &amp; (x_{kk} \neq 1) \\
2 &amp; (x_{kk} = 1)
\end{cases}\]

<p>$
\text{then } \forall n \in \mathbb{N}, \quad f(n) \neq z \quad (\because \forall n \in \mathbb{N}, \quad x_{nn} \neq z_n, \text{ and two expansions without repeating zeros that differ in a digit represent different numbers})
$</p>

<p>$
\therefore z \in (0, 1) \text{ is not in the image of } f, \text{ which contradicts the assumption that } f \text{ is bijective} \quad \blacksquare
$</p>
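
<p>As an illustration only, suppose the first three rows of the list were the following. These values are hypothetical and chosen just to show how $z$ is built; they are not part of the proof.</p>

<p>$
f(1) = 0.\underline{1}415\dots, \quad f(2) = 0.3\underline{3}33\dots, \quad f(3) = 0.24\underline{9}9\dots
$</p>

<p>$
x_{11} = 1 \implies z_1 = 2, \quad x_{22} = 3 \implies z_2 = 1, \quad x_{33} = 9 \implies z_3 = 1
$</p>

<p>$
\therefore z = 0.211\dots
$</p>

<p>Here $z$ differs from $f(1)$ in the first digit, from $f(2)$ in the second digit, and from $f(3)$ in the third digit. Since $z$ only uses the digits 1 and 2, it never ends in repeating zeros or repeating nines, so its expansion is unambiguous.</p>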

<h2 id="th-2-open-intervals-0-1-and--1-1-are-equipotent">Th 2) Open intervals (0, 1) and (-1, 1) are equipotent.</h2>

<p>$
1) \ (0,1) \sim (-1,1)
$</p>

<p>$
\underline{\text{proof}}
$</p>

<p>$
\text{The function } f: (0,1) \to (-1,1) \text{ given by } f(x) = 2x - 1 \text{ is a one-to-one correspondence.}
$</p>

<p>$
\because
$</p>

<p>$
\forall x_1, x_2 \in (0,1) \quad \text{s.t.} \quad x_1 \neq x_2
$</p>

<p>$
f(x_1) = 2x_1 - 1 \neq 2x_2 - 1 = f(x_2) \  (\because 2x_1 - 1 = 2x_2 - 1 \iff 2x_1 = 2x_2 \iff x_1 = x_2)
$</p>

<p>$
\therefore f \text{ is injective}
$</p>

<p>$
\forall y \in (-1,1) \ \exists x \in (0,1) \quad \text{s.t.} \quad
$</p>

<p>$
y = 2x - 1 \ (\because -1 \lt y \lt 1 \Rightarrow 0 \lt y+1 \lt 2 \Rightarrow 0 \lt \frac{y+1}{2} \lt 1)
$</p>

<p>$
\therefore f \text{ is surjective} \quad \blacksquare
$</p>
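
<p>The surjectivity step above amounts to writing down an explicit inverse. As a quick check (the notation $f^{-1}$ is added here for illustration and is not used elsewhere in the post):</p>

<p>$
f^{-1}: (-1,1) \to (0,1), \quad f^{-1}(y) = \frac{y+1}{2}, \quad f(f^{-1}(y)) = 2 \cdot \frac{y+1}{2} - 1 = y
$</p>

<p>$
(\text{For example, } y = -\frac{1}{2} \implies f^{-1}(y) = \frac{1}{4} \text{ and } f(\frac{1}{4}) = \frac{1}{2} - 1 = -\frac{1}{2})
$</p>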

<h2 id="th-3-the-open-intervals--1-1-and-mathbbr-are-equipotent">Th 3) The open intervals (-1, 1) and $\mathbb{R}$ are equipotent.</h2>

<p>$
2) \ (-1,1) \sim \mathbb{R}
$</p>

<p>$
\underline{\text{proof}}
$</p>

<p>$
\text{The function } g: (-1,1) \to \mathbb{R} \text{ given by } g(x) = \tan(\frac{\pi}{2}x) \text{ is a one-to-one correspondence}
$</p>

<p>$
\because
$</p>

<p>$
\forall x_1, x_2 \in (-1,1) \quad \text{s.t.} \quad x_1 \neq x_2 
$</p>

<p>$
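g(x_1) \neq g(x_2) \ \left( \because \frac{\pi}{2}x_1, \frac{\pi}{2}x_2 \in \left( -\frac{\pi}{2}, \frac{\pi}{2} \right) \text{ and } \tan \text{ is strictly increasing on } \left( -\frac{\pi}{2}, \frac{\pi}{2} \right) \right)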
$</p>

<p>$
\therefore g \text{ is injective}
$</p>

<p>$
\forall y \in \mathbb{R} \ \exists x \in (-1,1) \quad \text{s.t.} \quad g(x) = y
$</p>

<p>$
\left( \because \exists x' \in \left( -\frac{\pi}{2}, \frac{\pi}{2} \right) \quad \text{s.t.} \quad \tan(x')=y, \text{ namely } x' = \arctan(y), \text{ and } \frac{\pi}{2}x = x' \Rightarrow x = \frac{2}{\pi}x' \in (-1,1) \right)
$</p>

<p>$
\therefore g \text{ is surjective} \quad \blacksquare
$</p>
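
<p>Composing the maps from Th 2 and Th 3 gives one explicit bijection from (0, 1) to $\mathbb{R}$. This is only a sketch of how the two results fit together, and the name $h$ is introduced here for illustration.</p>

<p>$
h = g \circ f: (0,1) \to \mathbb{R}, \quad h(x) = g(f(x)) = \tan\left( \frac{\pi}{2}(2x - 1) \right)
$</p>

<p>$
h(\frac{1}{4}) = \tan(-\frac{\pi}{4}) = -1, \quad h(\frac{1}{2}) = \tan(0) = 0, \quad h(\frac{3}{4}) = \tan(\frac{\pi}{4}) = 1
$</p>

<p>Since a composition of two bijective functions is bijective, $h$ is a one-to-one correspondence between (0, 1) and $\mathbb{R}$.</p>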

<h2 id="conclusion">Conclusion</h2>

<p>$
\text{By Th 1, Th 2, and Th 3}
$</p>

<p>$
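\mathbb{N} \nsim (0,1) \text{ and } (0,1) \sim (-1,1) \sim \mathbb{R}, \text{ so } (0,1) \sim \mathbb{R}
$</p>

<p>$
\therefore \mathbb{N} \nsim \mathbb{R} \quad (\because \mathbb{N} \sim \mathbb{R} \text{ would imply } \mathbb{N} \sim (0,1)) \quad \blacksquare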
$</p>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="SetTheory" /><summary type="html"><![CDATA[One of the biggest surprises that I encountered while majoring in applied mathematics was the statement that “the cardinal numbers of $\mathbb{N}$ (the set of natural numbers) and $\mathbb{Z}$ (the set of integers) are equal”. The cardinal number of a set is defined as the size of the set. For finite sets, if a set $A$ is empty then the cardinal number of $A$ is 0, and if the set $A$ has $k$ elements then the cardinal number of $A$ is $k$. However, for infinite sets $A$ and $B$, their cardinal numbers are equal if and only if there exists a one-to-one correspondence (bijective) from $A$ to $B$. We can deduce that aforementioned statement is true from the definition of cardinal number. The really interesting part of this theorem is that the mismatch between our intuition and a rigorous mathematical concepts, since our brains tend to believe, Intuitively, that the size of $\mathbb{Z}$ should be twice the size of $\mathbb{N}$, plus one. Additionally, it is also worth noting that “the cardinal numbers of $\mathbb{N}$ and $\mathbb{R}$ are different”. This also blew my mind, and it explained how countable and uncountable sets can be different. In this post, I want to prove that the cardinal numbers of $\mathbb{N}$ and $\mathbb{R}$ are different by proving the theorems below. An open interval (0, 1) is not denumerable. The sets (0, 1) and $\mathbb{R}$ are equipotent. (have the same size) Def) Denumerable set A set $S$ is denumerable if and only if there exists a bijective function from $S$ to $\mathbb{N}$. NOTE: in this case, we denote $S \sim \mathbb{N}$, and we say the sets A and B are equipotent. 
Th 1) The open unit interval (0, 1) of real numbers is nondenumerable $ \underline{\text{proof}} $ $ \forall x \in (0, 1) \quad \exists x_1, x_2, x_3 \dots \in {0, 1, \dots, 9} \quad \text{s.t.} \quad x = 0.x_1 x_2 x_3 \dots $ $ (\text{For example, } x = \frac{1}{3} = 0.333\dots \implies x_1 = 3 \land x_2 = 3 \land \dots) $ $ \text{we will treat repeating zeros } (\text{such as } \frac{1}{4} = 0.25000\dots) \text{ by decreasing the last non-zero digit by 1 } $ $ (\text{as } \frac{1}{4} = 0.24999\dots) $ $ \text{Under this agreement, assume that } (0, 1) \text{ is denumerable so that} $ $ \exists \text{ bijective } f: \mathbb{N} \to (0, 1) \quad \text{s.t.} $ $ f(1) = 0.x_{11} x_{12} x_{13} \dots $ $ f(2) = 0.x_{21} x_{22} x_{23} \dots $ $ f(3) = 0.x_{31} x_{32} x_{33} \dots $ $ \vdots $ $ f(k) = 0.x_{k1} x_{k2} x_{k3} \dots $ $ \text{and let } z \in (0, 1) \text{ be defined as follows:} $ \[z = 0.z_1 z_2 z_3 \dots \quad \text{s.t.} \quad \forall k \in \mathbb{N}, \quad \begin{cases} z_k = 1 &amp; (x_{kk} \neq 1) \\ z_k = 2 &amp; (x_{kk} = 1) \end{cases}\] $ \text{then } \forall n \in \mathbb{N}, \quad f(n) \neq z \quad (\because \forall n \in \mathbb{N}, \quad x_{nn} \neq z_n \implies f(n) \neq z) $ $ \therefore \text{This contradicts our assumption} \quad \blacksquare $ Th 2) Open intervals (0, 1) and (-1, 1) are equipotent. 
$ 1) \ (0,1) \sim (-1,1) $ $ \underline{\text{proof}} $ $ \text{The function } f: (0,1) \to (-1,1) \text{ given by } f(x) = 2x - 1 \text{ is one-to-one correspondence.} $ $ \because $ $ \forall x_1, x_2 \in (0,1) \quad \text{s.t.} \quad x_1 \neq x_2 $ $ f(x_1) = 2x_1 - 1 \neq 2x_2 - 1 = f(x_2) \ (\because 2x_1 - 1 = 2x_2 - 1 \iff 2x_1 = 2x_2 \iff x_1 = x_2) $ $ \therefore f \text{ is injective} $ $ \forall y \in (-1,1) \ \exists x \in (0,1) \quad \text{s.t.} \quad $ $ y = 2x - 1 \ (\because -1 \lt y \lt 1 \Rightarrow 0 \lt y+1 \lt 2 \Rightarrow 0 \lt \frac{y+1}{2} \lt 1) $ $ \therefore f \text{ is surjective} \quad \blacksquare $ Th 3) The open intervals (-1, 1) and $\mathbb{R}$ are equipotent. $ 2) \ (-1,1) \sim \mathbb{R} $ $ \underline{\text{proof}} $ $ \text{The function } g: (-1,1) \to \mathbb{R} \text{ given by } g(x) = \tan(\frac{\pi}{2}x) \text{ is one-to-one correspondence} $ $ \because $ $ \forall x_1, x_2 \in (-1,1) \quad \text{s.t.} \quad x_1 \neq x_2 $ $ g(x_1) \neq g(x_2) \ (\because \tan(x) \text{ is one-to-one correspondence}) $ $ \therefore g \text{ is injective} $ $ \forall y \in \mathbb{R} \ \exists x \in (-1,1) \quad \text{s.t.} \quad g(x) = y $ $ \left( \because \exists x’ \quad \text{s.t.} \quad \tan(x’)=y \text{ and } \frac{\pi}{2}x = x’ \Rightarrow x = \frac{2}{\pi}x’ \right) $ $ \therefore g \text{ is surjective} \quad \blacksquare $ Conclusion $ \text{By Th 1, Th 2, and Th 3} $ $ \mathbb{N} \nsim (0,1) \text{ and } (0,1) \sim \mathbb{R} $ $ \therefore \mathbb{N} \nsim \mathbb{R} \quad \blacksquare $]]></summary></entry><entry><title type="html">Basic Guide to build and run ROS 2 Services (Python &amp;amp; C++)</title><link href="https://jaehyun-jeong.github.io/2025/12/21/ros2-services.html" rel="alternate" type="text/html" title="Basic Guide to build and run ROS 2 Services (Python &amp;amp; C++)" 
/><published>2025-12-21T00:00:00+09:00</published><updated>2025-12-21T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2025/12/21/ros2-services</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2025/12/21/ros2-services.html"><![CDATA[<p>If you don’t know about ROS 2 Topics, go to <a href="https://jaehyun-jeong.github.io/2025/12/09/ros2-topics.html">this</a> page and learn.</p>

<p><strong>Topics are used for data streams (unidirectional), and Services are used for client/server interactions (bidirectional).</strong></p>

<p>First, Services can work in a synchronous or asynchronous manner. With a synchronous call, the client sends a Request and blocks until it receives the Response. With an asynchronous call, the client sends a Request, registers a callback function for the Response, and continues its execution. When the server responds, the callback function is triggered.</p>

<p>Furthermore, a Service is defined by a name and a pair of messages: one message is the Request and the other is the Response.</p>

<p>Finally, only one server can exist for a given service name.</p>

<h2 id="simple-python-code">Simple Python code</h2>

<h3 id="server">Server</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python3
</span><span class="kn">import</span> <span class="nn">rclpy</span>
<span class="kn">from</span> <span class="nn">rclpy.node</span> <span class="kn">import</span> <span class="n">Node</span>
<span class="kn">from</span> <span class="nn">example_interfaces.srv</span> <span class="kn">import</span> <span class="n">AddTwoInts</span>


<span class="k">class</span> <span class="nc">AddTwoIntsServerNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">(</span><span class="s">"add_two_ints_server"</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">server_</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">create_service</span><span class="p">(</span>
            <span class="n">AddTwoInts</span><span class="p">,</span>
            <span class="s">"add_two_ints"</span><span class="p">,</span>  <span class="c1"># Use a verb for service name
</span>            <span class="bp">self</span><span class="p">.</span><span class="n">callback_add_two_ints</span><span class="p">,</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span><span class="s">"Add Two Ints server has been started"</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">callback_add_two_ints</span><span class="p">(</span>
        <span class="bp">self</span><span class="p">,</span>
        <span class="n">request</span><span class="p">:</span> <span class="n">AddTwoInts</span><span class="p">.</span><span class="n">Request</span><span class="p">,</span>
        <span class="n">response</span><span class="p">:</span> <span class="n">AddTwoInts</span><span class="p">.</span><span class="n">Response</span>
    <span class="p">):</span>
        <span class="n">response</span><span class="p">.</span><span class="nb">sum</span> <span class="o">=</span> <span class="n">request</span><span class="p">.</span><span class="n">a</span> <span class="o">+</span> <span class="n">request</span><span class="p">.</span><span class="n">b</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span>
            <span class="nb">str</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">a</span><span class="p">)</span> <span class="o">+</span> <span class="s">" + "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span> <span class="s">" = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="nb">sum</span><span class="p">)</span>
        <span class="p">)</span>
        <span class="k">return</span> <span class="n">response</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="n">args</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">AddTwoIntsServerNode</span><span class="p">()</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">shutdown</span><span class="p">()</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="client">Client</h3>

<p><em>Non-OOP method</em></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python3
</span><span class="kn">import</span> <span class="nn">rclpy</span>
<span class="kn">from</span> <span class="nn">rclpy.node</span> <span class="kn">import</span> <span class="n">Node</span>
<span class="kn">from</span> <span class="nn">example_interfaces.srv</span> <span class="kn">import</span> <span class="n">AddTwoInts</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="n">args</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">Node</span><span class="p">(</span><span class="s">"add_two_ints_client_no_oop"</span><span class="p">)</span>

    <span class="n">client</span> <span class="o">=</span> <span class="n">node</span><span class="p">.</span><span class="n">create_client</span><span class="p">(</span>
        <span class="n">AddTwoInts</span><span class="p">,</span>
        <span class="s">"add_two_ints"</span>
    <span class="p">)</span>
    <span class="k">while</span> <span class="ow">not</span> <span class="n">client</span><span class="p">.</span><span class="n">wait_for_service</span><span class="p">(</span><span class="mf">1.0</span><span class="p">):</span>
        <span class="n">node</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">warn</span><span class="p">(</span><span class="s">"Waiting for Add Two Ints server..."</span><span class="p">)</span>

    <span class="n">request</span> <span class="o">=</span> <span class="n">AddTwoInts</span><span class="p">.</span><span class="n">Request</span><span class="p">()</span>
    <span class="n">request</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="n">request</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="mi">8</span>

    <span class="n">future</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">call_async</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>  <span class="c1"># client.call (sync)
</span>    <span class="c1"># Spin until getting the response
</span>    <span class="n">rclpy</span><span class="p">.</span><span class="n">spin_until_future_complete</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">future</span><span class="p">)</span>

    <span class="n">response</span> <span class="o">=</span> <span class="n">future</span><span class="p">.</span><span class="n">result</span><span class="p">()</span>
    <span class="n">node</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span>
        <span class="nb">str</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">a</span><span class="p">)</span> <span class="o">+</span> <span class="s">" + "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span> <span class="s">" = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="nb">sum</span><span class="p">)</span>
    <span class="p">)</span>

    <span class="n">rclpy</span><span class="p">.</span><span class="n">shutdown</span><span class="p">()</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>

<p><em>OOP method</em></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python3
</span><span class="kn">import</span> <span class="nn">rclpy</span>
<span class="kn">from</span> <span class="nn">rclpy.node</span> <span class="kn">import</span> <span class="n">Node</span>
<span class="kn">from</span> <span class="nn">example_interfaces.srv</span> <span class="kn">import</span> <span class="n">AddTwoInts</span>
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>


<span class="k">class</span> <span class="nc">AddTwoIntsClient</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">(</span><span class="s">"add_two_ints_client"</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">client_</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">create_client</span><span class="p">(</span><span class="n">AddTwoInts</span><span class="p">,</span> <span class="s">"add_two_ints"</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">call_add_two_ints</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">while</span> <span class="ow">not</span> <span class="bp">self</span><span class="p">.</span><span class="n">client_</span><span class="p">.</span><span class="n">wait_for_service</span><span class="p">(</span><span class="mf">1.0</span><span class="p">):</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">warn</span><span class="p">(</span><span class="s">"Waiting for Add Two Ints server..."</span><span class="p">)</span>

        <span class="n">request</span> <span class="o">=</span> <span class="n">AddTwoInts</span><span class="p">.</span><span class="n">Request</span><span class="p">()</span>
        <span class="n">request</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">a</span>
        <span class="n">request</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="n">b</span>

        <span class="n">future</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">client_</span><span class="p">.</span><span class="n">call_async</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
        <span class="c1"># To add another argument, arguments must be wrapped with partial
</span>        <span class="n">future</span><span class="p">.</span><span class="n">add_done_callback</span><span class="p">(</span><span class="n">partial</span><span class="p">(</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">callback_call_add_two_ints</span><span class="p">,</span> <span class="n">request</span><span class="o">=</span><span class="n">request</span>
        <span class="p">))</span>

    <span class="k">def</span> <span class="nf">callback_call_add_two_ints</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">future</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">future</span><span class="p">.</span><span class="n">result</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span>
            <span class="nb">str</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">a</span><span class="p">)</span> <span class="o">+</span> <span class="s">" + "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span> <span class="s">" = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="nb">sum</span><span class="p">)</span>
        <span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="n">args</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">AddTwoIntsClient</span><span class="p">()</span>
    <span class="n">node</span><span class="p">.</span><span class="n">call_add_two_ints</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span>
    <span class="n">node</span><span class="p">.</span><span class="n">call_add_two_ints</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
    <span class="n">node</span><span class="p">.</span><span class="n">call_add_two_ints</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">shutdown</span><span class="p">()</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>

<p><strong>NOTE: In the OOP method, “rclpy.spin_until_future_complete(node, future)” is not required since the node is spun by “rclpy.spin(node)” in main. Instead, you need to register a callback function with “future.add_done_callback”, which is triggered when the Response arrives.</strong></p>

<h2 id="simple-c-code">Simple C++ code</h2>

<h3 id="server-1">Server</h3>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"rclcpp/rclcpp.hpp"</span><span class="cp">
#include</span> <span class="cpf">"example_interfaces/srv/add_two_ints.hpp"</span><span class="cp">
</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="o">::</span><span class="n">placeholders</span><span class="p">;</span>


<span class="k">class</span> <span class="nc">AddTwoIntsServerNode</span> <span class="o">:</span> <span class="k">public</span> <span class="n">rclcpp</span><span class="o">::</span><span class="n">Node</span><span class="p">{</span>
<span class="nl">public:</span>
    <span class="n">AddTwoIntsServerNode</span><span class="p">()</span> <span class="o">:</span> <span class="n">Node</span><span class="p">(</span><span class="s">"add_two_ints_server"</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">server_</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">create_service</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">&gt;</span><span class="p">(</span>
            <span class="s">"add_two_ints"</span><span class="p">,</span>
            <span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="o">&amp;</span><span class="n">AddTwoIntsServerNode</span><span class="o">::</span><span class="n">callbackAddTwoInts</span><span class="p">,</span> <span class="k">this</span><span class="p">,</span> <span class="n">_1</span><span class="p">,</span> <span class="n">_2</span><span class="p">)</span>
        <span class="p">);</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Add Two Ints Service has been started"</span><span class="p">);</span>
    <span class="p">}</span>
<span class="nl">private:</span>
    <span class="kt">void</span> <span class="n">callbackAddTwoInts</span><span class="p">(</span>
        <span class="k">const</span> <span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">::</span><span class="n">Request</span><span class="o">::</span><span class="n">SharedPtr</span> <span class="n">request</span><span class="p">,</span>
        <span class="k">const</span> <span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">::</span><span class="n">Response</span><span class="o">::</span><span class="n">SharedPtr</span> <span class="n">response</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">response</span><span class="o">-&gt;</span><span class="n">sum</span> <span class="o">=</span> <span class="n">request</span><span class="o">-&gt;</span><span class="n">a</span> <span class="o">+</span> <span class="n">request</span><span class="o">-&gt;</span><span class="n">b</span><span class="p">;</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"%d + %d = %d"</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">request</span><span class="o">-&gt;</span><span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">request</span><span class="o">-&gt;</span><span class="n">b</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">response</span><span class="o">-&gt;</span><span class="n">sum</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">rclcpp</span><span class="o">::</span><span class="n">Service</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">&gt;::</span><span class="n">SharedPtr</span> <span class="n">server_</span><span class="p">;</span>
<span class="p">};</span>


<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">){</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="n">argv</span><span class="p">);</span>
    <span class="k">auto</span> <span class="n">node</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">AddTwoIntsServerNode</span><span class="o">&gt;</span><span class="p">();</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">);</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">shutdown</span><span class="p">();</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="client-1">Client</h3>

<p><em>Non-OOP method</em></p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"rclcpp/rclcpp.hpp"</span><span class="cp">
#include</span> <span class="cpf">"example_interfaces/srv/add_two_ints.hpp"</span><span class="cp">
</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="o">::</span><span class="n">chrono_literals</span><span class="p">;</span>


<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">){</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="n">argv</span><span class="p">);</span>
    <span class="k">auto</span> <span class="n">node</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">rclcpp</span><span class="o">::</span><span class="n">Node</span><span class="o">&gt;</span><span class="p">(</span><span class="s">"add_two_ints_client_no_oop"</span><span class="p">);</span>

    <span class="k">auto</span> <span class="n">client</span> <span class="o">=</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">create_client</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">&gt;</span><span class="p">(</span><span class="s">"add_two_ints"</span><span class="p">);</span>
    <span class="k">while</span><span class="p">(</span><span class="o">!</span><span class="n">client</span><span class="o">-&gt;</span><span class="n">wait_for_service</span><span class="p">(</span><span class="mx">1s</span><span class="p">)){</span>
        <span class="n">RCLCPP_WARN</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Waiting for the server..."</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">auto</span> <span class="n">request</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">::</span><span class="n">Request</span><span class="o">&gt;</span><span class="p">();</span>
    <span class="n">request</span><span class="o">-&gt;</span><span class="n">a</span> <span class="o">=</span> <span class="mi">6</span><span class="p">;</span>
    <span class="n">request</span><span class="o">-&gt;</span><span class="n">b</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>

    <span class="k">auto</span> <span class="n">future</span> <span class="o">=</span> <span class="n">client</span><span class="o">-&gt;</span><span class="n">async_send_request</span><span class="p">(</span><span class="n">request</span><span class="p">);</span>
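    <span class="c1">// Spin until the response arrives; stop if spinning is interrupted</span>
    <span class="k">if</span><span class="p">(</span><span class="n">rclcpp</span><span class="o">::</span><span class="n">spin_until_future_complete</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">future</span><span class="p">)</span> <span class="o">!=</span> <span class="n">rclcpp</span><span class="o">::</span><span class="n">FutureReturnCode</span><span class="o">::</span><span class="n">SUCCESS</span><span class="p">){</span>
        <span class="n">RCLCPP_ERROR</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Failed to call the service"</span><span class="p">);</span>
        <span class="n">rclcpp</span><span class="o">::</span><span class="n">shutdown</span><span class="p">();</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">auto</span> <span class="n">response</span> <span class="o">=</span> <span class="n">future</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>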
    <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="n">node</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"%d + %d = %d"</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">request</span><span class="o">-&gt;</span><span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">request</span><span class="o">-&gt;</span><span class="n">b</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">response</span><span class="o">-&gt;</span><span class="n">sum</span><span class="p">);</span>

    <span class="n">rclcpp</span><span class="o">::</span><span class="n">shutdown</span><span class="p">();</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><em>OOP method</em></p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"rclcpp/rclcpp.hpp"</span><span class="cp">
#include</span> <span class="cpf">"example_interfaces/srv/add_two_ints.hpp"</span><span class="cp">
</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="o">::</span><span class="n">chrono_literals</span><span class="p">;</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="o">::</span><span class="n">placeholders</span><span class="p">;</span>


<span class="k">class</span> <span class="nc">AddTwoIntsClientNode</span> <span class="o">:</span> <span class="k">public</span> <span class="n">rclcpp</span><span class="o">::</span><span class="n">Node</span><span class="p">{</span>
<span class="nl">public:</span>
    <span class="n">AddTwoIntsClientNode</span><span class="p">()</span> <span class="o">:</span> <span class="n">Node</span><span class="p">(</span><span class="s">"add_two_ints_client"</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">client_</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">create_client</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">&gt;</span><span class="p">(</span><span class="s">"add_two_ints"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="kt">void</span> <span class="n">callAddTwoInts</span><span class="p">(</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">){</span>
        <span class="k">while</span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">client_</span><span class="o">-&gt;</span><span class="n">wait_for_service</span><span class="p">(</span><span class="mx">1s</span><span class="p">)){</span>
            <span class="n">RCLCPP_WARN</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Waiting for the server..."</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="k">auto</span> <span class="n">request</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">::</span><span class="n">Request</span><span class="o">&gt;</span><span class="p">();</span>
        <span class="n">request</span><span class="o">-&gt;</span><span class="n">a</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
        <span class="n">request</span><span class="o">-&gt;</span><span class="n">b</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>

        <span class="c1">// Send without blocking; the callback runs once the node is spinning</span>
        <span class="n">client_</span><span class="o">-&gt;</span><span class="n">async_send_request</span><span class="p">(</span>
            <span class="n">request</span><span class="p">,</span>
            <span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="o">&amp;</span><span class="n">AddTwoIntsClientNode</span><span class="o">::</span><span class="n">callbackCallAddInts</span><span class="p">,</span> <span class="k">this</span><span class="p">,</span> <span class="n">_1</span><span class="p">)</span>
        <span class="p">);</span>
    <span class="p">}</span>

<span class="nl">private:</span>

    <span class="kt">void</span> <span class="n">callbackCallAddInts</span><span class="p">(</span><span class="n">rclcpp</span><span class="o">::</span><span class="n">Client</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">&gt;::</span><span class="n">SharedFuture</span> <span class="n">future</span><span class="p">){</span>
        <span class="k">auto</span> <span class="n">response</span> <span class="o">=</span> <span class="n">future</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Sum: %d"</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">response</span><span class="o">-&gt;</span><span class="n">sum</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">rclcpp</span><span class="o">::</span><span class="n">Client</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">srv</span><span class="o">::</span><span class="n">AddTwoInts</span><span class="o">&gt;::</span><span class="n">SharedPtr</span> <span class="n">client_</span><span class="p">;</span>
<span class="p">};</span>


<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">){</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="n">argv</span><span class="p">);</span>
    <span class="k">auto</span> <span class="n">node</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">AddTwoIntsClientNode</span><span class="o">&gt;</span><span class="p">();</span>
    <span class="n">node</span><span class="o">-&gt;</span><span class="n">callAddTwoInts</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">);</span>
    <span class="n">node</span><span class="o">-&gt;</span><span class="n">callAddTwoInts</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">15</span><span class="p">);</span>
    <span class="n">node</span><span class="o">-&gt;</span><span class="n">callAddTwoInts</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">7</span><span class="p">);</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">);</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">shutdown</span><span class="p">();</span>

    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="ros-2-commands-for-services">ROS 2 commands for services</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 service <span class="nt">-h</span>

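ros2 service list
<span class="c"># OUT (the node's parameter services are listed as well)</span>
<span class="c"># /add_two_ints</span>

<span class="c"># Get the type of the service</span>
ros2 service type /add_two_ints
<span class="c"># OUT</span>
<span class="c"># example_interfaces/srv/AddTwoInts</span>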

<span class="c"># Pass the service type to the ros2 interface show command</span>
ros2 interface show example_interfaces/srv/AddTwoInts
<span class="c"># OUT</span>
<span class="c"># int64 a</span>
<span class="c"># int64 b</span>
<span class="c"># ---</span>
<span class="c"># int64 sum</span>

<span class="c"># Then you can test the server with the command below</span>
ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts <span class="s2">"{a: 7, b: 3}"</span>
<span class="c"># OUT</span>
<span class="c">#</span>
<span class="c"># waiting for service to become available...</span>
<span class="c"># requester: making request: example_interfaces.srv.AddTwoInts_Request(a=7, b=3)</span>
<span class="c">#</span>
<span class="c"># response:</span>
<span class="c"># example_interfaces.srv.AddTwoInts_Response(sum=10)</span>

<span class="c"># The server's service name can be remapped at startup with the -r argument</span>
ros2 run &lt;package name&gt; &lt;server executable name&gt; <span class="nt">--ros-args</span> <span class="nt">-r</span> &lt;service name&gt;:<span class="o">=</span>&lt;new service name&gt;
<span class="c"># The client can remap the service name the same way to match the server</span>
ros2 run &lt;package name&gt; &lt;client executable name&gt; <span class="nt">--ros-args</span> <span class="nt">-r</span> &lt;service name&gt;:<span class="o">=</span>&lt;new service name&gt;
</code></pre></div></div>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="ros2" /><summary type="html"><![CDATA[If you don’t know about ROS 2 Topics, go to this page and learn. Topics are used for data streams (unidirectional), and Services are used for a client/server interactions (bidirectional). First , Services can work in a synchronous or asynchronous manner. If the service is synchronous, the client sends a Request and blocks until receiving a response. However, if the service is asynchronous, the client sends a Request, registers a callback function for the response and continues its execution. When the server responds, the callback function is triggered. Furthermore, you define services by name and a pair of messages. One message is the Request and other message is the Response. Finally, only one server can exist for a given service name. Simple Python code server #!/usr/bin/env python3 import rclpy from rclpy.node import Node from example_interfaces.srv import AddTwoInts class AddTwoIntsServerNode(Node): def __init__(self): super().__init__("add_two_ints_server") self.server_ = self.create_service( AddTwoInts, "add_two_ints", # Use a verb for service name self.callback_add_two_ints, ) self.get_logger().info("Add Two Ints server has been started") def callback_add_two_ints( self, request: AddTwoInts.Request, response: AddTwoInts.Response ): response.sum = request.a + request.b self.get_logger().info( str(request.a) + " + " + str(request.b) + " = " + str(response.sum) ) return response def main(args=None): rclpy.init(args=args) node = AddTwoIntsServerNode() rclpy.spin(node) rclpy.shutdown() if __name__ == "__main__": main() Client Non-OOP method #!/usr/bin/env python3 import rclpy from rclpy.node import Node from example_interfaces.srv import AddTwoInts def main(args=None): rclpy.init(args=args) node = Node("add_two_ints_client_no_oop") client = node.create_client( AddTwoInts, "add_two_ints" ) 
while not client.wait_for_service(1.0): node.get_logger().warn("Waiting for Add Two Ints server...") request = AddTwoInts.Request() request.a = 3 request.b = 8 future = client.call_async(request) # client.call (sync) # Spin until getting the response rclpy.spin_until_future_complete(node, future) response = future.result() node.get_logger().info( str(request.a) + " + " + str(request.b) + " = " + str(response.sum) ) rclpy.shutdown() if __name__ == "__main__": main() OOP method #!/usr/bin/env python3 import rclpy from rclpy.node import Node from example_interfaces.srv import AddTwoInts from functools import partial class AddTwoIntsClient(Node): def __init__(self): super().__init__("add_two_ints_client") self.client_ = self.create_client(AddTwoInts, "add_two_ints") def call_add_two_ints(self, a, b): while not self.client_.wait_for_service(1.0): self.get_logger().warn("Waiting for Add Two Ints server...") request = AddTwoInts.Request() request.a = a request.b = b future = self.client_.call_async(request) # To add another argument, arguments must be wrapped with partial future.add_done_callback(partial( self.callback_call_add_two_ints, request=request )) def callback_call_add_two_ints(self, future, request): response = future.result() self.get_logger().info( str(request.a) + " + " + str(request.b) + " = " + str(response.sum) ) def main(args=None): rclpy.init(args=args) node = AddTwoIntsClient() node.call_add_two_ints(2, 7) node.call_add_two_ints(1, 4) node.call_add_two_ints(10, 20) rclpy.spin(node) rclpy.shutdown() if __name__ == "__main__": main() NOTE: In the OOP method, “rclpy.spin_until_future_complete(node, future)” is not required since the class is already spinning. Instead of this, it is required to add a callback function using “future.add_done_callback”. 
Simple C++ code Server #include "rclcpp/rclcpp.hpp" #include "example_interfaces/srv/add_two_ints.hpp" using namespace std::placeholders; class AddTwoIntsServerNode : public rclcpp::Node{ public: AddTwoIntsServerNode() : Node("add_two_ints_server") { server_ = this-&gt;create_service&lt;example_interfaces::srv::AddTwoInts&gt;( "add_two_ints", std::bind(&amp;AddTwoIntsServerNode::callbackAddTwoInts, this, _1, _2) ); RCLCPP_INFO(this-&gt;get_logger(), "Add Two Ints Service has been started"); } private: void callbackAddTwoInts( const example_interfaces::srv::AddTwoInts::Request::SharedPtr request, const example_interfaces::srv::AddTwoInts::Response::SharedPtr response) { response-&gt;sum = request-&gt;a + request-&gt;b; RCLCPP_INFO(this-&gt;get_logger(), "%d + %d = %d", (int)request-&gt;a, (int)request-&gt;b, (int)response-&gt;sum); } rclcpp::Service&lt;example_interfaces::srv::AddTwoInts&gt;::SharedPtr server_; }; int main(int argc, char **argv){ rclcpp::init(argc, argv); auto node = std::make_shared&lt;AddTwoIntsServerNode&gt;(); rclcpp::spin(node); rclcpp::shutdown(); return 0; } Client Non-OOP method #include "rclcpp/rclcpp.hpp" #include "example_interfaces/srv/add_two_ints.hpp" using namespace std::chrono_literals; int main(int argc, char **argv){ rclcpp::init(argc, argv); auto node = std::make_shared&lt;rclcpp::Node&gt;("add_two_ints_client_no_oop"); auto client = node-&gt;create_client&lt;example_interfaces::srv::AddTwoInts&gt;("add_two_ints"); while(!client-&gt;wait_for_service(1s)){ RCLCPP_WARN(node-&gt;get_logger(), "Waiting for the server..."); } auto request = std::make_shared&lt;example_interfaces::srv::AddTwoInts::Request&gt;(); request-&gt;a = 6; request-&gt;b = 2; auto future = client-&gt;async_send_request(request); rclcpp::spin_until_future_complete(node, future); auto response = future.get(); RCLCPP_INFO(node-&gt;get_logger(), "%d + %d = %d", (int)request-&gt;a, (int)request-&gt;b, (int)response-&gt;sum); rclcpp::shutdown(); return 0; } OOP method 
#include "rclcpp/rclcpp.hpp" #include "example_interfaces/srv/add_two_ints.hpp" using namespace std::chrono_literals; using namespace std::placeholders; class AddTwoIntsClientNode : public rclcpp::Node{ public: AddTwoIntsClientNode() : Node("add_two_ints_client") { client_ = this-&gt;create_client&lt;example_interfaces::srv::AddTwoInts&gt;("add_two_ints"); } void callAddTwoInts(int a, int b){ while(!this-&gt;client_-&gt;wait_for_service(1s)){ RCLCPP_WARN(this-&gt;get_logger(), "Waiting for the server..."); } auto request = std::make_shared&lt;example_interfaces::srv::AddTwoInts::Request&gt;(); request-&gt;a = a; request-&gt;b = b; client_-&gt;async_send_request( request, std::bind(&amp;AddTwoIntsClientNode::callbackCallAddInts, this, _1) ); } private: void callbackCallAddInts(rclcpp::Client&lt;example_interfaces::srv::AddTwoInts&gt;::SharedFuture future){ auto response = future.get(); RCLCPP_INFO(this-&gt;get_logger(), "Sum: %d", (int)response-&gt;sum); } rclcpp::Client&lt;example_interfaces::srv::AddTwoInts&gt;::SharedPtr client_; }; int main(int argc, char **argv){ rclcpp::init(argc, argv); auto node = std::make_shared&lt;AddTwoIntsClientNode&gt;(); node-&gt;callAddTwoInts(10, 5); node-&gt;callAddTwoInts(10, 15); node-&gt;callAddTwoInts(12, 7); rclcpp::spin(node); rclcpp::shutdown(); return 0; } ROS 2 commands for services ros2 service -h ros2 service list # OUT # example_interfaces/srv/AddTwoInts # put the output into ros2 interface command ros2 interface show example_interfaces/srv/AddTwoInts # OUT # int64 a # int64 b # --- # int64 sum # Then you can test this server like the below command ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts "{a: 7, b: 3}" # OUT # # waiting for service to become available... 
# requester: making request: example_interfaces.srv.AddTwoInts_Request(a=7, b=3) # # response: # example_interfaces.srv.AddTwoInts_Response(sum=10) # Service name can be changed with an argument below ros2 run &lt;package name&gt; &lt;server node name&gt; --ros-args -r &lt;service name&gt;:=&lt;change the service name to this&gt; # Client can change the service name with the argument below ros2 run &lt;package name&gt; &lt;client node name&gt; --ros-args -r &lt;service name&gt;:=&lt;change the service name to this&gt;]]></summary></entry><entry><title type="html">Basic Guide to build and run ROS 2 Topics (Python &amp;amp; C++)</title><link href="https://jaehyun-jeong.github.io/2025/12/09/ros2-topics.html" rel="alternate" type="text/html" title="Basic Guide to build and run ROS 2 Topics (Python &amp;amp; C++)" /><published>2025-12-09T00:00:00+09:00</published><updated>2025-12-09T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2025/12/09/ros2-topics</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2025/12/09/ros2-topics.html"><![CDATA[<p><strong>A Topic is a named channel that carries messages from publishers (nodes) to subscribers (nodes).</strong> A publisher sends data to the topic without knowing which subscribers (nodes) receive it. Similarly, subscribers do not know which nodes send the data to the topic. A node is not limited to a single topic; it can send data to several different topics. The data stream is also unidirectional: data flows from the publisher to the subscriber and cannot be returned to the publisher.</p>

<p>Technically, ROS 2 messages are transported by a middleware called <strong>DDS</strong>. However, users do not need to handle DDS directly, because libraries such as RCL abstract it away.</p>
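
<p>The decoupling described above can be sketched in plain Python before touching ROS 2 at all. This is only a toy model: the ToyBus class and its subscribe and publish methods are hypothetical names made up for illustration and are not part of rclpy or DDS. It shows that a publisher only names a topic and never learns who receives the data.</p>

```python
from collections import defaultdict


class ToyBus:
    """Toy stand-in for the middleware; NOT the ROS 2 API."""

    def __init__(self):
        # Maps a topic name to the list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, data):
        # The publisher only names the topic; it never sees the receivers
        for callback in self._subscribers[topic]:
            callback(data)


bus = ToyBus()
phone_inbox, radio_inbox = [], []

# Two independent subscribers listen to the same topic
bus.subscribe("robot_news", phone_inbox.append)
bus.subscribe("robot_news", radio_inbox.append)

bus.publish("robot_news", "Hi, this is C3PO.")
bus.publish("weather", "Sunny")  # nobody subscribed, so nothing is delivered
```

<p>Both inboxes end up with the same single message, and the publisher code did not change when the second subscriber was added. Data only flows one way here as well: a callback has no handle for answering the publisher.</p>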

<h2 id="simple-python-code">Simple Python code</h2>

<h3 id="publisher">Publisher</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python3
</span><span class="kn">import</span> <span class="nn">rclpy</span>
<span class="kn">from</span> <span class="nn">rclpy.node</span> <span class="kn">import</span> <span class="n">Node</span>
<span class="kn">from</span> <span class="nn">example_interfaces.msg</span> <span class="kn">import</span> <span class="n">String</span>


<span class="k">class</span> <span class="nc">RobotNewsStationNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">(</span><span class="s">"robot_news_station"</span><span class="p">)</span>  <span class="c1"># Naming the node after its file is a common convention.
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">robot_name</span> <span class="o">=</span> <span class="s">"C3PO"</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">publisher_</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">create_publisher</span><span class="p">(</span>
            <span class="n">String</span><span class="p">,</span>
            <span class="s">"robot_news"</span><span class="p">,</span>
            <span class="mi">10</span>
        <span class="p">)</span>
        <span class="c1"># Timer period of 0.5 s, i.e. two messages per second
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">timer_</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">create_timer</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">publish_news</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span><span class="s">"Robot News Station has been started."</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">publish_news</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">msg</span> <span class="o">=</span> <span class="n">String</span><span class="p">()</span>
        <span class="n">msg</span><span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"Hi, this is </span><span class="si">{</span><span class="bp">self</span><span class="p">.</span><span class="n">robot_name</span><span class="si">}</span><span class="s"> from the robot news station."</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">publisher_</span><span class="p">.</span><span class="n">publish</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="n">args</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">RobotNewsStationNode</span><span class="p">()</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">shutdown</span><span class="p">()</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>

<p><strong>NOTE: Do not forget to declare the “example_interfaces” dependency in the package.xml file (it provides the String message type) and to install the node in the setup.py.</strong></p>
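
<p>As a rough sketch of the setup.py half of this note: in an ament_python package, nodes are usually installed through the console_scripts entry points. The package name my_py_pkg and the module names below are hypothetical placeholders, not taken from this post, so adjust them to your own package. The package.xml half is typically a single line such as &lt;depend&gt;example_interfaces&lt;/depend&gt;.</p>

```python
# Fragment of a setup.py for an ament_python package.
# "my_py_pkg" and the module names are hypothetical placeholders.
package_name = "my_py_pkg"

entry_points = {
    "console_scripts": [
        # executable name = package.module:function
        f"robot_news_station = {package_name}.robot_news_station:main",
        f"smartphone = {package_name}.smartphone:main",
    ],
}

# In the real file this dict is passed on as: setup(..., entry_points=entry_points)
```

<p>With these placeholder names, and after rebuilding the workspace, the publisher would be started with: ros2 run my_py_pkg robot_news_station</p>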

<h3 id="subscriber">Subscriber</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python3
</span><span class="kn">import</span> <span class="nn">rclpy</span>
<span class="kn">from</span> <span class="nn">rclpy.node</span> <span class="kn">import</span> <span class="n">Node</span>
<span class="kn">from</span> <span class="nn">example_interfaces.msg</span> <span class="kn">import</span> <span class="n">String</span>


<span class="k">class</span> <span class="nc">SmartphoneNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">(</span><span class="s">"smartphone"</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">subscriber_</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">create_subscription</span><span class="p">(</span>
            <span class="n">String</span><span class="p">,</span>
            <span class="s">"robot_news"</span><span class="p">,</span>
            <span class="c1"># Callback invoked each time a message is received
</span>            <span class="bp">self</span><span class="p">.</span><span class="n">callback_robot_news</span><span class="p">,</span>
            <span class="c1"># QoS history depth (queue size)
</span>            <span class="mi">10</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span><span class="s">"Smartphone has been started."</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">callback_robot_news</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">:</span> <span class="n">String</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span><span class="n">msg</span><span class="p">.</span><span class="n">data</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="n">args</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">SmartphoneNode</span><span class="p">()</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">shutdown</span><span class="p">()</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>
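
<p>The last argument (10) in both create_publisher and create_subscription is the queue size. A minimal sketch of what such a bounded queue does, assuming the integer is treated as a keep-last history depth (my understanding of the default behavior, not something stated in this post). It uses a plain deque rather than any ROS 2 API, and a depth of 3 instead of 10 to keep the output short.</p>

```python
from collections import deque

# Toy model of a keep-last queue with depth 3 (the tutorial uses 10)
DEPTH = 3
queue = deque(maxlen=DEPTH)

# Five messages arrive before the subscriber callback gets a chance to run
for i in range(5):
    queue.append(f"news {i}")

# Only the newest DEPTH messages are still waiting; the two oldest were dropped
print(list(queue))  # prints ['news 2', 'news 3', 'news 4']
```

<p>A larger depth tolerates longer stalls in the callback at the cost of memory and of delivering older data later.</p>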

<h2 id="simple-c-code">Simple C++ code</h2>

<h3 id="publisher-1">Publisher</h3>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"rclcpp/rclcpp.hpp"</span><span class="cp">
#include</span> <span class="cpf">"example_interfaces/msg/string.hpp"</span><span class="cp">
</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="o">::</span><span class="n">chrono_literals</span><span class="p">;</span>

<span class="k">class</span> <span class="nc">RobotNewsStationNode</span> <span class="o">:</span> <span class="k">public</span> <span class="n">rclcpp</span><span class="o">::</span><span class="n">Node</span><span class="p">{</span>
<span class="nl">public:</span>
    <span class="n">RobotNewsStationNode</span><span class="p">()</span> <span class="o">:</span> <span class="n">Node</span><span class="p">(</span><span class="s">"robot_news_station"</span><span class="p">),</span> <span class="n">robot_name_</span><span class="p">(</span><span class="s">"R2D2"</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">publisher_</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">create_publisher</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">msg</span><span class="o">::</span><span class="n">String</span><span class="o">&gt;</span><span class="p">(</span><span class="s">"robot_news"</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
        <span class="n">timer_</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">create_wall_timer</span><span class="p">(</span><span class="mx">0.5s</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="o">&amp;</span><span class="n">RobotNewsStationNode</span><span class="o">::</span><span class="n">publishNews</span><span class="p">,</span> <span class="k">this</span><span class="p">));</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Robot News Station has been started"</span><span class="p">);</span>
    <span class="p">}</span>

<span class="nl">private:</span>
    <span class="kt">void</span> <span class="n">publishNews</span><span class="p">(){</span>
        <span class="k">auto</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">example_interfaces</span><span class="o">::</span><span class="n">msg</span><span class="o">::</span><span class="n">String</span><span class="p">();</span>
        <span class="n">msg</span><span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="s">"Hi, this is "</span><span class="p">)</span> <span class="o">+</span> <span class="n">robot_name_</span> <span class="o">+</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="s">" from the robot news station."</span><span class="p">);</span>
        <span class="n">publisher_</span><span class="o">-&gt;</span><span class="n">publish</span><span class="p">(</span><span class="n">msg</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">robot_name_</span><span class="p">;</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">Publisher</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">msg</span><span class="o">::</span><span class="n">String</span><span class="o">&gt;::</span><span class="n">SharedPtr</span> <span class="n">publisher_</span><span class="p">;</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">TimerBase</span><span class="o">::</span><span class="n">SharedPtr</span> <span class="n">timer_</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">){</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="n">argv</span><span class="p">);</span>
    <span class="k">auto</span> <span class="n">node</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">RobotNewsStationNode</span><span class="o">&gt;</span><span class="p">();</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">);</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">shutdown</span><span class="p">();</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="subscriber-1">Subscriber</h3>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"rclcpp/rclcpp.hpp"</span><span class="cp">
#include</span> <span class="cpf">"example_interfaces/msg/string.hpp"</span><span class="cp">
</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="o">::</span><span class="n">placeholders</span><span class="p">;</span>

<span class="k">class</span> <span class="nc">SmartphoneNode</span> <span class="o">:</span> <span class="k">public</span> <span class="n">rclcpp</span><span class="o">::</span><span class="n">Node</span><span class="p">{</span>
<span class="nl">public:</span>
    <span class="n">SmartphoneNode</span><span class="p">()</span> <span class="o">:</span> <span class="n">Node</span><span class="p">(</span><span class="s">"smartphone"</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">subscriber_</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">create_subscription</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">msg</span><span class="o">::</span><span class="n">String</span><span class="o">&gt;</span><span class="p">(</span>
            <span class="s">"robot_news"</span><span class="p">,</span>
            <span class="mi">10</span><span class="p">,</span>
            <span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="o">&amp;</span><span class="n">SmartphoneNode</span><span class="o">::</span><span class="n">callbackRobotNews</span><span class="p">,</span> <span class="k">this</span><span class="p">,</span> <span class="n">_1</span><span class="p">)</span>
        <span class="p">);</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Smartphone has been started."</span><span class="p">);</span>
    <span class="p">}</span>

<span class="nl">private:</span>
    <span class="kt">void</span> <span class="n">callbackRobotNews</span><span class="p">(</span><span class="k">const</span> <span class="n">example_interfaces</span><span class="o">::</span><span class="n">msg</span><span class="o">::</span><span class="n">String</span><span class="o">::</span><span class="n">SharedPtr</span> <span class="n">msg</span><span class="p">){</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"%s"</span><span class="p">,</span> <span class="n">msg</span><span class="o">-&gt;</span><span class="n">data</span><span class="p">.</span><span class="n">c_str</span><span class="p">());</span>
    <span class="p">}</span>

    <span class="n">rclcpp</span><span class="o">::</span><span class="n">Subscription</span><span class="o">&lt;</span><span class="n">example_interfaces</span><span class="o">::</span><span class="n">msg</span><span class="o">::</span><span class="n">String</span><span class="o">&gt;::</span><span class="n">SharedPtr</span> <span class="n">subscriber_</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">){</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="n">argv</span><span class="p">);</span>
    <span class="k">auto</span> <span class="n">node</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">SmartphoneNode</span><span class="o">&gt;</span><span class="p">();</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">);</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">shutdown</span><span class="p">();</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="bags">Bags</h2>

<p>Suppose you are developing robot software with ROS 2. Normally you would need the physical robot on hand to write and test your code. A “bag” removes that requirement: a ROS 2 bag records the data published on topics for as long as you like and can replay that data as many times as you want.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Help</span>
ros2 bag <span class="nt">-h</span>

<span class="c"># Record topics</span>
ros2 bag record &lt;topic name 1&gt; &lt;topic name 2&gt; ...
<span class="c"># Record topics with custom record name</span>
ros2 bag record <span class="nt">-o</span> &lt;record name&gt; &lt;topic name 1&gt; &lt;topic name 2&gt; ...
<span class="c"># Record all topics</span>
ros2 bag record <span class="nt">-a</span>

<span class="c"># Play a record</span>
ros2 bag play &lt;record name&gt;

<span class="c"># Print record Information</span>
ros2 bag info &lt;record name&gt;
</code></pre></div></div>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="ros2" /><summary type="html"><![CDATA[A Topic is a receiver of a signal from a publisher (node). The publisher is able to send data to the topic while not knowing which subscribers(nodes) receive this data. Similarly, subscribers do not know which nodes send the data to the topic. On top of that, Nodes’ capability of sending data is not restricted to sending to single topic but sending to multiple topics to different topics. In addition to that, the data stream is unidirectional. Data can be sent to subscriber but cannot be returned to the publisher. Technically, ROS 2 messages are transferred using middleware named DDS. However, users do not need to handle DDS as libraries such as RCL provide abstraction. Simple Python code Publisher #!/usr/bin/env python3 import rclpy from rclpy.node import Node from example_interfaces.msg import String class RobotNewsStationNode(Node): def __init__(self): super().__init__("robot_news_station") # Choosing the same node name with file name is quite common. self.robot_name = "C3PO" self.publisher_ = self.create_publisher( String, "robot_news", 10 ) # 0.5 means twice per seconds self.timer_ = self.create_timer(0.5, self.publish_news) self.get_logger().info("Robot News Station has been started.") def publish_news(self): msg = String() msg.data = f"Hi, this is {self.robot_name} from the robot news station." self.publisher_.publish(msg) def main(args=None): rclpy.init(args=args) node = RobotNewsStationNode() rclpy.spin(node) rclpy.shutdown() if __name__ == "__main__": main() NOTE: Do not forget to add “example_interfaces” library in the package.xml file for String message type and install the node in the setup.py. 
Subscriber #!/usr/bin/env python3 import rclpy from rclpy.node import Node from example_interfaces.msg import String class SmartphoneNode(Node): def __init__(self): super().__init__("smartphone") self.subscriber_ = self.create_subscription( String, "robot_news", # When the subscriber receives the message self.callback_robot_news, # queue size 10 ) self.get_logger().info("Smartphone has been started.") def callback_robot_news(self, msg: String): self.get_logger().info(msg.data) def main(args=None): rclpy.init(args=args) node = SmartphoneNode() rclpy.spin(node) rclpy.shutdown() if __name__ == "__main__": main() Simple C++ code Publisher #include "rclcpp/rclcpp.hpp" #include "example_interfaces/msg/string.hpp" using namespace std::chrono_literals; class RobotNewsStationNode : public rclcpp::Node{ public: RobotNewsStationNode() : Node("robot_news_station"), robot_name_("R2D2") { publisher_ = this-&gt;create_publisher&lt;example_interfaces::msg::String&gt;("robot_news", 10); timer_ = this-&gt;create_wall_timer(0.5s, std::bind(&amp;RobotNewsStationNode::publishNews, this)); RCLCPP_INFO(this-&gt;get_logger(), "Robot News Station has been started"); } private: void publishNews(){ auto msg = example_interfaces::msg::String(); msg.data = std::string("Hi, this is ") + robot_name_ + std::string(" from the robot news station."); publisher_-&gt;publish(msg); } std::string robot_name_; rclcpp::Publisher&lt;example_interfaces::msg::String&gt;::SharedPtr publisher_; rclcpp::TimerBase::SharedPtr timer_; }; int main(int argc, char **argv){ rclcpp::init(argc, argv); auto node = std::make_shared&lt;RobotNewsStationNode&gt;(); rclcpp::spin(node); rclcpp::shutdown(); return 0; } Subscriber #include "rclcpp/rclcpp.hpp" #include "example_interfaces/msg/string.hpp" using namespace std::placeholders; class SmartphoneNode : public rclcpp::Node{ public: SmartphoneNode() : Node("smartphone") { subscriber_ = this-&gt;create_subscription&lt;example_interfaces::msg::String&gt;( "robot_news", 10, 
std::bind(&amp;SmartphoneNode::callbackRobotNews, this, _1) ); RCLCPP_INFO(this-&gt;get_logger(), "Smartphone has been started."); } private: void callbackRobotNews(const example_interfaces::msg::String::SharedPtr msg){ RCLCPP_INFO(this-&gt;get_logger(), "%s", msg-&gt;data.c_str()); } rclcpp::Subscription&lt;example_interfaces::msg::String&gt;::SharedPtr subscriber_; }; int main(int argc, char **argv){ rclcpp::init(argc, argv); auto node = std::make_shared&lt;SmartphoneNode&gt;(); rclcpp::spin(node); rclcpp::shutdown(); return 0; } Bags Suppose you are building robot software with ROS 2 and a robot. Then you need the robot to code and test with. But “Bag” provides very handy features in this case. ROS 2 Bag can save data from topic with any amount of time, then can replay these data as many times as you want. # Help ros2 bag -h # Record topics ros2 bag record &lt;topic name 1&gt; &lt;topic name 2&gt; ... # Record topics with custom record name ros2 bag record -o &lt;record name&gt; &lt;topic name 1&gt; &lt;topic name 2&gt; ... # Record all topics ros2 bag record -a # Play a record ros2 bag play &lt;record name&gt; # Print record Information ros2 bag info &lt;record name&gt;]]></summary></entry><entry><title type="html">Basic Commands for ROS 2</title><link href="https://jaehyun-jeong.github.io/2025/12/06/ros2-commands.html" rel="alternate" type="text/html" title="Basic Commands for ROS 2" /><published>2025-12-06T00:00:00+09:00</published><updated>2025-12-06T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2025/12/06/ros2-commands</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2025/12/06/ros2-commands.html"><![CDATA[<h2 id="run-nodes">Run nodes</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 run &lt;package name&gt; &lt;node name&gt;
</code></pre></div></div>

<p><strong>NOTE: The “-h” option shows the available arguments and options, as in the examples below.</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 <span class="nt">-h</span>
ros2 run <span class="nt">-h</span>
ros2 node <span class="nt">-h</span>
</code></pre></div></div>

<h2 id="checking-running-nodes">Checking running nodes</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 node list
</code></pre></div></div>
<p><em>Check running nodes</em></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 node info &lt;node name&gt;
</code></pre></div></div>

<p><strong>WARNING: Running two nodes with identical names is discouraged. They can run at the same time, but a warning like the one below is shown.</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WARNING: Be aware that there are nodes <span class="k">in </span>the graph that share an exact name, which can have unintended side effects.
</code></pre></div></div>

<h2 id="running-nodes-with-the-same-node-name">Running nodes with the same node name</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 run &lt;package name&gt; &lt;node name&gt; <span class="nt">--ros-args</span> <span class="nt">-r</span> __node:<span class="o">=</span>&lt;new node name&gt;
</code></pre></div></div>
<p><em>“-r” can be replaced with “--remap”</em></p>

<h2 id="building-commands">Building commands</h2>

<p>The basic build command is</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>colcon build
</code></pre></div></div>
<p><em>Build all packages</em></p>

<p><strong>NOTE: Run the build command only from the workspace root, that is, the folder that contains the src folder.</strong></p>

<p>The command below builds only the selected package.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>colcon build <span class="nt">--packages-select</span> &lt;package name&gt;
</code></pre></div></div>

<h2 id="building-commands-only-for-python">Building commands only for Python</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>colcon build <span class="nt">--packages-select</span> &lt;package name&gt; <span class="nt">--symlink-install</span>
</code></pre></div></div>
<p>The “--symlink-install” option installs symbolic links to the source files instead of copies, so the package runs directly from the source. Rebuilding is therefore unnecessary after a Python file changes.</p>
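
<p>The difference between a copied file and a symbolic link can be sketched in plain Python. This is only an illustration of the idea, not how colcon is implemented, and all file names are hypothetical.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
import shutil
import tempfile

# Hypothetical file names, used only for this illustration.
with tempfile.TemporaryDirectory() as workspace:
    source = os.path.join(workspace, "my_node.py")
    copied = os.path.join(workspace, "installed_copy.py")
    linked = os.path.join(workspace, "installed_link.py")

    with open(source, "w") as f:
        f.write("print('version 1')\n")

    shutil.copy(source, copied)  # a copy is a snapshot of the file
    os.symlink(source, linked)   # a symlink points back to the source

    # Edit the source file after "installing" it.
    with open(source, "w") as f:
        f.write("print('version 2')\n")

    with open(copied) as f:
        copied_text = f.read()
    with open(linked) as f:
        linked_text = f.read()

print("copy sees:", copied_text.strip())
print("link sees:", linked_text.strip())
</code></pre></div></div>
<p>The copy still holds the old contents, while the link reflects the edit immediately. That is why a symlinked Python package does not need a rebuild after each change.</p>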

<h2 id="ros-2-with-gui">ROS 2 with GUI</h2>

<p>The commands below open the GUI tools for ROS 2.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rqt
rqt_graph  <span class="c"># Shows a graph of nodes and topics</span>
</code></pre></div></div>

<h2 id="topics">Topics</h2>

<p>The command below lists the currently active topics.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 topic list
</code></pre></div></div>

<p>To see the messages a topic is receiving, run the command below.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 topic <span class="nb">echo</span> &lt;topic name&gt;
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 topic info &lt;topic name&gt;
</code></pre></div></div>

<p>The commands below print frequency and bandwidth of the topic.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 topic hz &lt;topic name&gt;
ros2 topic bw &lt;topic name&gt;
</code></pre></div></div>

<p>The command below publishes messages to a topic directly from the terminal. The “-r” option sets the publishing rate in Hz.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 topic pub <span class="nt">-r</span> &lt;rate in Hz&gt; &lt;topic name&gt; &lt;interface name&gt; &lt;data&gt;
<span class="c"># Like this one</span>
ros2 topic pub <span class="nt">-r</span> 5 /robot_news example_interfaces/msg/String <span class="s2">"{data: 'Hello from the terminal'}"</span>
</code></pre></div></div>
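
<p>The value passed to “-r” is a rate in Hz, while timers in node code are usually created with a period in seconds. A minimal sketch of the conversion follows; the helper names are hypothetical and not part of ROS 2.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def period_from_rate(rate_hz):
    """Seconds between two consecutive messages for a rate in Hz."""
    return 1.0 / rate_hz


def rate_from_period(period_s):
    """Messages per second for a timer period in seconds."""
    return 1.0 / period_s


# "-r 5" on the command line means five messages per second.
print(period_from_rate(5))

# A timer created with a 0.5 second period fires twice per second.
print(rate_from_period(0.5))
</code></pre></div></div>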

<p>The command below renames the topic when the node is started.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 run my_py_pkg robot_news_station <span class="nt">--ros-args</span> <span class="nt">-r</span> __node:<span class="o">=</span>my_station <span class="nt">-r</span> robot_news:<span class="o">=</span>abc
</code></pre></div></div>
<p><em>Remaps the topic robot_news to abc</em></p>

<p><strong>NOTE: In the same way, node names and the topics used by publishers and subscribers can be remapped with the -r option.</strong></p>

<h2 id="interfaces">Interfaces</h2>

<p>The command below returns the interface information.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 interface show &lt;interface name&gt;
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 interface show geometry_msgs/msg/Twist
</code></pre></div></div>

<h2 id="bags">Bags</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Help</span>
ros2 bag <span class="nt">-h</span>

<span class="c"># Record topics</span>
ros2 bag record &lt;topic name 1&gt; &lt;topic name 2&gt; ...
<span class="c"># Record topics with custom record name</span>
ros2 bag record <span class="nt">-o</span> &lt;record name&gt; &lt;topic name 1&gt; &lt;topic name 2&gt; ...
<span class="c"># Record all topics</span>
ros2 bag record <span class="nt">-a</span>

<span class="c"># Play a record</span>
ros2 bag play &lt;record name&gt;

<span class="c"># Print record Information</span>
ros2 bag info &lt;record name&gt;
</code></pre></div></div>

<h2 id="services">Services</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ros2 service call &lt;service name&gt; &lt;interface name&gt; &lt;request&gt;
<span class="c"># Like this one</span>
ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts <span class="s2">"{a: 3, b: 7}"</span>
</code></pre></div></div>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="ros2" /><summary type="html"><![CDATA[Run nodes ros2 run &lt;package name&gt; &lt;node name&gt; NOTE: “-h” option shows arguments and options like below ros2 -h ros2 run -h ros2 node -h Checking running nodes ros2 node list Check running nodes ros2 node info &lt;node name&gt; WARNING: It is not encouraged to run two nodes with identical names. These could run at the same time, but they will show the message like below WARNING: Be aware that there are nodes in the graph that share an exact name, which can have unintended side effects. Running nodes with the same node name ros2 run &lt;package name&gt; &lt;node name&gt; --ros-args -r __node:=&lt;new node name&gt; “-r” can be replaced with “–remap” Building commands The basic build command is colcon build Build all packages NOTE: The build command should only be executed in the project folder that contains the src folder The command below builds only the selected package. colcon build --packages-select &lt;package name&gt; Building commands only for Python colcon build --packages-select &lt;package name&gt; --symlink-install ”–symlink-install” option makes the package run with source file. Therefore, rebuilding is unnecessary when the Python file changed. ROS 2 with GUI The commands below open the GUI tools for ROS 2. rqt rqt_graph # Shows a graph of packages Topics The command below shows currently running topics. ros2 topic list To see what topic is recieving, run the command below. ros2 topic echo &lt;topic name&gt; ros2 topic info &lt;topic name&gt; The commands below print frequency and bandwidth of the topic. ros2 topic hz &lt;topic name&gt; ros2 topic bw &lt;topic name&gt; The command below instantly publish a topic. 
ros2 topic pub -r &lt;seconds&gt; &lt;topic name&gt; &lt;interface name&gt; &lt;data&gt; # Like this one ros2 topic pub -r 5 /robot_news example_interfaces/msg/String "{data: 'Hello from the terminal'}" The command below can change the topic name. ros2 run my_py_pkg robot_news_station --ros-args -r __node:=my_station -r robot_news:=abc robot_news to abc NOTE: In the same way, node, topic publisher, and topic reciever can be remaped with -r option. Interfaces The command below returns the interface information. ros2 interface &lt;interface name&gt; ros2 interface show geometry_msgs/msg/Twist Bags # Help ros2 bag -h # Record topics ros2 bag record &lt;topic name 1&gt; &lt;topic name 2&gt; ... # Record topics with custom record name ros2 bag record -o &lt;record name&gt; &lt;topic name 1&gt; &lt;topic name 2&gt; ... # Record all topics ros2 bag record -a # Play a record ros2 bag play &lt;record name&gt; # Print record Information ros2 bag info &lt;record name&gt; Services ros2 service call &lt;server node name&gt; &lt;interface name&gt; &lt;request&gt; # Like this one ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts "{a: 3, b: 7}"]]></summary></entry><entry><title type="html">Basic Guide to build and run ROS 2 Nodes (Python &amp;amp; C++)</title><link href="https://jaehyun-jeong.github.io/2025/12/06/ros2-nodes.html" rel="alternate" type="text/html" title="Basic Guide to build and run ROS 2 Nodes (Python &amp;amp; C++)" /><published>2025-12-06T00:00:00+09:00</published><updated>2025-12-06T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2025/12/06/ros2-nodes</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2025/12/06/ros2-nodes.html"><![CDATA[<p>Nodes are subprograms in an application, each responsible for only one thing. Nodes communicate with each other through topics, services, and parameters. As with classes in OOP, nodes reduce code <strong>complexity</strong> and improve <strong>fault tolerance</strong>, since a failure in one node does not have to bring down the others. 
Furthermore, nodes can be written in <strong>many different programming languages</strong>, including Python and C++. Each node should have a single purpose while communicating with the other nodes.</p>

<p>In ROS 2, a package is an independent unit of an application. Packages contain nodes, and nodes in different packages can communicate with each other.</p>

<pre><code class="language-mermaid">flowchart TB

  %% -------------------------
  %% Package 1
  %% -------------------------
  subgraph Pkg1["sensing_pkg"]
    direction TB
    S1["Node: sensor_reader"]
    S2["Node: imu_reader"]
  end

  %% -------------------------
  %% Package 2
  %% -------------------------
  subgraph Pkg2["processing_pkg"]
    direction TB
    P1["Node: data_filter"]
    P2["Node: state_estimator"]
  end

  %% -------------------------
  %% Package 3
  %% -------------------------
  subgraph Pkg3["control_pkg"]
    direction TB
    C1["Node: controller"]
  end

  %% -------------------------
  %% Package 4
  %% -------------------------
  subgraph Pkg4["output_pkg"]
    direction TB
    O1["Node: actuator_driver"]
    O2["Node: logger"]
  end

  %% -------------------------
  %% Node-to-node communications
  %% -------------------------
  S1 -- "/sensor_data" --&gt; P1
  S2 -- "/imu_data" --&gt; P1

  P1 -- "/filtered_data" --&gt; P2
  P2 -- "/state" --&gt; C1

  C1 -- "/control_cmd" --&gt; O1
  C1 -- "/status" --&gt; O2
</code></pre>

<h3 id="write-code-for-nodes-python">Write code for nodes. (Python)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python3
</span><span class="kn">import</span> <span class="nn">rclpy</span>
<span class="kn">from</span> <span class="nn">rclpy.node</span> <span class="kn">import</span> <span class="n">Node</span>


<span class="k">class</span> <span class="nc">MyNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">(</span><span class="s">"py_test"</span><span class="p">)</span>  <span class="c1"># Create a node
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span><span class="s">"Hello world"</span><span class="p">)</span>  <span class="c1"># Logging with the node
</span>        <span class="c1"># Run timer_callback every 1 second.
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">create_timer</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">timer_callback</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">timer_callback</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">get_logger</span><span class="p">().</span><span class="n">info</span><span class="p">(</span><span class="s">"Hello"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="n">args</span><span class="p">)</span>

    <span class="n">node</span> <span class="o">=</span> <span class="n">MyNode</span><span class="p">()</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>  <span class="c1"># makes the node keep running
</span>
    <span class="n">rclpy</span><span class="p">.</span><span class="n">shutdown</span><span class="p">()</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="install-a-node-to-a-package">Install a node to a package.</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">setuptools</span> <span class="kn">import</span> <span class="n">find_packages</span><span class="p">,</span> <span class="n">setup</span>

<span class="n">package_name</span> <span class="o">=</span> <span class="s">'my_py_pkg'</span>

<span class="n">setup</span><span class="p">(</span>
    <span class="n">name</span><span class="o">=</span><span class="n">package_name</span><span class="p">,</span>
    <span class="n">version</span><span class="o">=</span><span class="s">'0.0.0'</span><span class="p">,</span>
    <span class="n">packages</span><span class="o">=</span><span class="n">find_packages</span><span class="p">(</span><span class="n">exclude</span><span class="o">=</span><span class="p">[</span><span class="s">'test'</span><span class="p">]),</span>
    <span class="n">data_files</span><span class="o">=</span><span class="p">[</span>
        <span class="p">(</span><span class="s">'share/ament_index/resource_index/packages'</span><span class="p">,</span>
            <span class="p">[</span><span class="s">'resource/'</span> <span class="o">+</span> <span class="n">package_name</span><span class="p">]),</span>
        <span class="p">(</span><span class="s">'share/'</span> <span class="o">+</span> <span class="n">package_name</span><span class="p">,</span> <span class="p">[</span><span class="s">'package.xml'</span><span class="p">]),</span>
    <span class="p">],</span>
    <span class="n">install_requires</span><span class="o">=</span><span class="p">[</span><span class="s">'setuptools'</span><span class="p">],</span>
    <span class="n">zip_safe</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">maintainer</span><span class="o">=</span><span class="s">'jj'</span><span class="p">,</span>
    <span class="n">maintainer_email</span><span class="o">=</span><span class="s">'jj@todo.todo'</span><span class="p">,</span>
    <span class="n">description</span><span class="o">=</span><span class="s">'TODO: Package description'</span><span class="p">,</span>
    <span class="n">license</span><span class="o">=</span><span class="s">'TODO: License declaration'</span><span class="p">,</span>
    <span class="n">extras_require</span><span class="o">=</span><span class="p">{</span>
        <span class="s">'test'</span><span class="p">:</span> <span class="p">[</span>
            <span class="s">'pytest'</span><span class="p">,</span>
        <span class="p">],</span>
    <span class="p">},</span>
    <span class="c1"># Where I've changed</span>
    <span class="n">entry_points</span><span class="o">=</span><span class="p">{</span>
        <span class="s">'console_scripts'</span><span class="p">:</span> <span class="p">[</span>
            <span class="s">"py_node = my_py_pkg.my_first_node:main"</span>  <span class="c1"># executable_name = package.module:function_name
</span>        <span class="p">],</span>
    <span class="p">},</span>
<span class="p">)</span>
</code></pre></div></div>
<p><em>setup.py</em></p>

<p>Then build the package, source the workspace, and run the node:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>colcon build <span class="nt">--packages-select</span> my_py_pkg  <span class="c"># Build with a node</span>
<span class="nb">source</span> ./install/setup.bash  <span class="c"># Source setup.bash after every build.</span>
ros2 run my_py_pkg py_node
</code></pre></div></div>

<p>You should see output similar to the following.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>INFO] <span class="o">[</span>1764994343.482507922] <span class="o">[</span>py_test]: Hello world
</code></pre></div></div>

<p><strong>NOTE: py_test is the “node name” and py_node is the “executable name”. The node name is defined in the node’s Python code, and the executable name is defined in the setup.py file.</strong></p>
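<p>To make the naming rule concrete, the sketch below parses the same <code>console_scripts</code> string with Python's standard <code>importlib.metadata.EntryPoint</code> (Python 3.9 or newer is assumed). It only illustrates how the string splits into an executable name, a module, and a function; it is not how colcon itself processes the file.</p>

```python
from importlib.metadata import EntryPoint

# The same string that appears under 'console_scripts' in setup.py.
spec = "py_node = my_py_pkg.my_first_node:main"

# Split "executable_name = package.module:function" at the first '='.
name, _, target = (part.strip() for part in spec.partition("="))

entry = EntryPoint(name=name, value=target, group="console_scripts")

print(entry.name)    # executable name used in `ros2 run my_py_pkg <name>`
print(entry.module)  # Python module that gets imported
print(entry.attr)    # function that gets called
```

<p>The node name (py_test) does not appear here at all, because it is set inside the node's constructor rather than in setup.py.</p>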

<p><strong>NOTE: Run the commands below every time the node’s code is changed.</strong><br /></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>colcon build <span class="nt">--packages-select</span> my_py_pkg
<span class="nb">source</span> ./install/setup.bash
ros2 run my_py_pkg py_node
</code></pre></div></div>

<h3 id="write-code-for-nodes-c">Write code for nodes. (C++)</h3>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"rclcpp/rclcpp.hpp"</span><span class="cp">
</span>
<span class="k">class</span> <span class="nc">MyNode</span> <span class="o">:</span> <span class="k">public</span> <span class="n">rclcpp</span><span class="o">::</span><span class="n">Node</span><span class="p">{</span>
<span class="nl">public:</span>
    <span class="n">MyNode</span><span class="p">()</span> <span class="o">:</span> <span class="n">Node</span><span class="p">(</span><span class="s">"cpp_test"</span><span class="p">),</span> <span class="n">counter_</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Hello world"</span><span class="p">);</span>
        <span class="n">timer_</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">create_wall_timer</span><span class="p">(</span>
            <span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">seconds</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span>
            <span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="o">&amp;</span><span class="n">MyNode</span><span class="o">::</span><span class="n">timerCallback</span><span class="p">,</span> <span class="k">this</span><span class="p">)</span>
        <span class="p">);</span>
    <span class="p">}</span>
<span class="nl">private:</span>
    <span class="kt">void</span> <span class="n">timerCallback</span><span class="p">(){</span>
        <span class="n">RCLCPP_INFO</span><span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">get_logger</span><span class="p">(),</span> <span class="s">"Hello %d"</span><span class="p">,</span> <span class="n">counter_</span><span class="p">);</span>
        <span class="n">counter_</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">TimerBase</span><span class="o">::</span><span class="n">SharedPtr</span> <span class="n">timer_</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">counter_</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">){</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="n">argv</span><span class="p">);</span>

    <span class="k">auto</span> <span class="n">node</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">MyNode</span><span class="o">&gt;</span><span class="p">();</span>
    <span class="n">rclcpp</span><span class="o">::</span><span class="n">spin</span><span class="p">(</span><span class="n">node</span><span class="p">);</span>

    <span class="n">rclcpp</span><span class="o">::</span><span class="n">shutdown</span><span class="p">();</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p><em>my_cpp_pkg/src/my_first_node.cpp</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake_minimum_required(VERSION 3.8)
project(my_cpp_pkg)

if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  add_compile_options(-Wall -Wextra -Wpedantic)
endif()

# find dependencies
find_package(ament_cmake REQUIRED)
find_package(rclcpp REQUIRED)

# Where I've changed
add_executable(cpp_node src/my_first_node.cpp)
ament_target_dependencies(cpp_node rclcpp)

install(TARGETS
  cpp_node
  DESTINATION lib/${PROJECT_NAME}
)

ament_package()
</code></pre></div></div>
<p><em>my_cpp_pkg/CMakeLists.txt</em></p>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="ros2" /><summary type="html"><![CDATA[Nodes are subprograms in an application, responsible for only one thing. Nodes communicate with each other through topics, services, and parameters. Like OOP, nodes reduce code complexity, and provide low fault tolerance. Even further, nodes can be written in many different programming languages including Python and C++. Nodes should have a single purpose while communicating each nodes. In ros2, a package is an independent unit in an application. packages contain nodes, enabling inter-package communication. flowchart TB %% ------------------------- %% Package 1 %% ------------------------- subgraph Pkg1["sensing_pkg"] direction TB S1["Node: sensor_reader"] S2["Node: imu_reader"] end %% ------------------------- %% Package 2 %% ------------------------- subgraph Pkg2["processing_pkg"] direction TB P1["Node: data_filter"] P2["Node: state_estimator"] end %% ------------------------- %% Package 3 %% ------------------------- subgraph Pkg3["control_pkg"] direction TB C1["Node: controller"] end %% ------------------------- %% Package 4 %% ------------------------- subgraph Pkg4["output_pkg"] direction TB O1["Node: actuator_driver"] O2["Node: logger"] end %% ------------------------- %% Node-to-node communications %% ------------------------- S1 -- "/sensor_data" --&gt; P1 S2 -- "/imu_data" --&gt; P1 P1 -- "/filtered_data" --&gt; P2 P2 -- "/state" --&gt; C1 C1 -- "/control_cmd" --&gt; O1 C1 -- "/status" --&gt; O2 Write code for nodes. (Python) #!/usr/bin/env python3 import rclpy from rclpy.node import Node class MyNode(Node): def __init__(self): super().__init__("py_test") # Create a node self.get_logger().info("Hello world") # Logging with the node # Run timer_callback every 1 second. 
self.create_timer(1.0, self.timer_callback) def timer_callback(self): self.get_logger().info("Hello") def main(args=None): rclpy.init(args=args) node = MyNode() rclpy.spin(node) # makes the node keep running rclpy.shutdown() if __name__ == "__main__": main() Install a node to a package. from setuptools import find_packages, setup package_name = 'my_py_pkg' setup( name=package_name, version='0.0.0', packages=find_packages(exclude=['test']), data_files=[ ('share/ament_index/resource_index/packages', ['resource/' + package_name]), ('share/' + package_name, ['package.xml']), ], install_requires=['setuptools'], zip_safe=True, maintainer='jj', maintainer_email='jj@todo.todo', description='TODO: Package description', license='TODO: License declaration', extras_require={ 'test': [ 'pytest', ], }, ''' Where I've changed ''' entry_points={ 'console_scripts': [ "py_node = my_py_pkg.my_first_node:main" # node_name = path_to_py_file:function_name ], }, ) setup.py and run below so that you can colcon build --packages-select my_py_pkg # Build with a node source ./install/setup.bash # Run setup.bash whenever finished building. ros2 run my_py_pkg py_node You should see output similar to the following. [INFO] [1764994343.482507922] [py_test]: Hello world NOTE: py_test is a “node name” and py_node is an “execution name”. Node name is defined in the node Python code, and excution name is defined in the setup.py file NOTE: Remember that below commands should be run every time the code for the node is fixed. colcon build --packages-select my_py_pkg source ./install/setup.bash ros2 run my_py_pkg my_node Write code for nodes. 
(C++) #include "rclcpp/rclcpp.hpp" class MyNode : public rclcpp::Node{ public: MyNode() : Node("cpp_test"), counter_(0) { RCLCPP_INFO(this-&gt;get_logger(), "Hello world"); timer_ = this-&gt;create_wall_timer( std::chrono::seconds(1), std::bind(&amp;MyNode::timerCallback, this) ); } private: void timerCallback(){ RCLCPP_INFO(this-&gt;get_logger(), "Hello %d", counter_); counter_++; } rclcpp::TimerBase::SharedPtr timer_; int counter_; }; int main(int argc, char **argv){ rclcpp::init(argc, argv); auto node = std::make_shared&lt;MyNode&gt;(); rclcpp::spin(node); rclcpp::shutdown(); return 0; } cmake_minimum_required(VERSION 3.8) project(my_cpp_pkg) if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") add_compile_options(-Wall -Wextra -Wpedantic) endif() # find dependencies find_package(ament_cmake REQUIRED) find_package(rclcpp REQUIRED) # Where I've changed add_executable(cpp_node src/my_first_node.cpp) ament_target_dependencies(cpp_node rclcpp) install(TARGETS cpp_node DESTINATION lib/${PROJECT_NAME} ) ament_package() my_cpp_pkg/CMakeLists.txt]]></summary></entry><entry><title type="html">Monte Carlo Tree Search</title><link href="https://jaehyun-jeong.github.io/2025/09/24/monte-carlo-tree-search.html" rel="alternate" type="text/html" title="Monte Carlo Tree Search" /><published>2025-09-24T00:00:00+09:00</published><updated>2025-09-24T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2025/09/24/monte-carlo-tree-search</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2025/09/24/monte-carlo-tree-search.html"><![CDATA[<h2 id="intro">Intro</h2>

<p>Monte Carlo Tree Search (MCTS) works well in practice but poses theoretical challenges.
In this post, I describe the MCTS algorithm and why it works.</p>

<p>Open-loop planning algorithms like MCTS plan future actions from an initial state $s_0$. They assume access to a model of the environment, which may be either stochastic or deterministic. By contrast, a typical reinforcement learning algorithm is closed-loop: at each time step, it selects an action based on the current state $s_t$.</p>
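<p>As a rough illustration of the distinction, the sketch below contrasts a planner that uses a model to pick a whole action sequence from $s_0$ with a policy that picks one action from the current state. The toy model (<code>step</code>, <code>reward</code>, <code>GOAL</code>) is a hypothetical example, not part of MCTS.</p>

```python
from itertools import product

ACTIONS = (-1, +1)   # hypothetical toy actions: move left or right
GOAL = 3             # hypothetical target position

def step(state, action):
    """Deterministic toy model: positions on the integer line."""
    return state + action

def reward(state):
    """Higher is better: negative distance to the goal."""
    return -abs(GOAL - state)

def open_loop_plan(s0, horizon):
    """Use the model to score every action sequence from s0; return the best."""
    def final_reward(plan):
        s = s0
        for a in plan:
            s = step(s, a)
        return reward(s)
    return max(product(ACTIONS, repeat=horizon), key=final_reward)

def closed_loop_policy(state):
    """Pick a single action from the current state only."""
    return +1 if state < GOAL else -1

# Open loop: the whole sequence is fixed before acting.
plan = open_loop_plan(s0=0, horizon=3)

# Closed loop: the action is re-chosen at every step from the observed state.
s = 0
for _ in range(3):
    s = step(s, closed_loop_policy(s))

print(plan, s)
```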

<h2 id="intuition">Intuition</h2>

<p>MCTS selects the action expected to yield the highest return. However, evaluating every state’s value is usually infeasible; if all values were known, the agent could simply choose the best action. Thus we must estimate values, and <strong>rollouts</strong> make this possible. A single rollout can be inaccurate, but MCTS mitigates this by repeating rollouts and expanding the tree: as the number of visits increases, the estimates become more accurate.</p>

<p>For deep trees, computation can be expensive: with a fixed branching factor $b$ (the number of actions), the number of nodes at depth $d$ is $b^d$, so the cost grows exponentially with depth. I expect there are methods that trade accuracy for computation; I plan to discuss them after further study.</p>

<h2 id="pseudocode">Pseudocode</h2>

<pre class="pseudocode">
\begin{algorithm}
\caption{MCTS: Selection–Expansion–Rollout (from $s_0$)}
\begin{algorithmic}
\STATE current $\leftarrow$ $s_0$
\WHILE{not Leaf(current)}
  \STATE current $\leftarrow$ $\arg\max_{s_i \in \mathcal{C}(\text{current})}\; \text{UCB1}(s_i)$
\ENDWHILE
\IF{$N(\text{current}) = 0$}
  \Return Rollout(current) \Comment{unvisited leaf $\rightarrow$ rollout}
\ELSE
  \FOR{each action $a$ available from current}
    \STATE addNewStateToTree(current, $a$)
  \ENDFOR
  \STATE current $\leftarrow$ firstNewChild(current)
  \Return Rollout(current) \Comment{expand then rollout}
\ENDIF
\end{algorithmic}
\end{algorithm}
</pre>

<pre class="pseudocode">
\begin{algorithm}
\caption{Rollout$(s_i)$}
\begin{algorithmic}
\STATE $s \gets s_i$
\WHILE{true}
  \IF{$\operatorname{Terminal}(s)$}
    \Return $\operatorname{Value}(s)$
  \ENDIF
  \STATE $a \gets \operatorname{Random}(\operatorname{AvailableActions}(s))$
  \STATE $s \gets \operatorname{Simulate}(a, s)$
\ENDWHILE
\end{algorithmic}
\end{algorithm}
</pre>

<p><strong>Note.</strong> The pseudocode above shows one iteration of MCTS; repeating it lets the agent choose the action with the highest estimated value. After each rollout, the return is backed up along the path from the rollout’s starting node to the root, updating $Q$ and $N$ at every node on that path. A node’s value is then estimated as the average of the returns backed up through it, $Q/N$.</p>
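<p>The two procedures above, together with backpropagation, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the toy problem (binary actions, horizon <code>DEPTH</code>, terminal <code>value</code>), the exploration constant <code>C</code>, and the rule of picking the final action by mean return are all choices made here for illustration.</p>

```python
import math
import random

DEPTH = 3          # toy horizon (assumption)
ACTIONS = (0, 1)   # binary actions
C = math.sqrt(2)   # exploration constant (assumption)

class Node:
    def __init__(self, state, parent=None):
        self.state = state    # tuple of actions taken so far
        self.parent = parent
        self.children = []
        self.q = 0.0          # cumulative return backed up through this node
        self.n = 0            # number of visits

def terminal(state):
    return len(state) == DEPTH

def value(state):
    # Toy terminal value (assumption): the first action matters most.
    return 10 * state[0] + sum(state[1:])

def ucb1(node):
    if node.n == 0:
        return math.inf       # unvisited children are tried first
    return node.q / node.n + C * math.sqrt(math.log(node.parent.n) / node.n)

def rollout(state, rng):
    # Random actions until a terminal state is reached.
    while not terminal(state):
        state = state + (rng.choice(ACTIONS),)
    return value(state)

def iterate(root, rng):
    node = root
    while node.children:                             # selection
        node = max(node.children, key=ucb1)
    if node.n > 0 and not terminal(node.state):      # expansion
        node.children = [Node(node.state + (a,), node) for a in ACTIONS]
        node = node.children[0]
    ret = rollout(node.state, rng)                   # rollout
    while node is not None:                          # backpropagation
        node.q += ret
        node.n += 1
        node = node.parent

def search(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(state=())
    for _ in range(iterations):
        iterate(root, rng)
    best = max(root.children, key=lambda c: c.q / c.n)
    return root, best.state[0]

root, best_action = search()
print(best_action, root.n)
```

<p>Every iteration ends by adding the rollout return to $Q$ and incrementing $N$ on the whole path back to the root, which is what makes the averages $Q/N$ improve with more visits.</p>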

<blockquote>
  <h3 id="ucb1">UCB1</h3>

\[\text{UCB1}(s_t) = \frac{Q(s_t)}{N(s_t)} + C\sqrt{\frac{\ln(N(s_{t-1}))}{N(s_t)}}\]

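  <p>$ Q(s_t) $: cumulative return backed up through $s_t$ (the diagrams in the Example section display the mean $Q(s_t)/N(s_t)$ under the label Q)</p>

  <p>$ N(s_t) $: number of visits to $s_t$; $N(s_{t-1})$ is the number of visits to its parent</p>

  <p>$ C $: exploration constant</p>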

  <p>The first term is the empirical mean (exploitation).<br />
The second term encourages exploration of less-visited nodes and shrinks as $N(s_t)$ grows.</p>
</blockquote>
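<p>As a worked check of the formula, the sketch below evaluates UCB1 for the two children of $s_0$ using the visit counts and returns from the Example section. The value of <code>c</code> and the convention of returning infinity for unvisited nodes are assumptions made for this illustration.</p>

```python
import math

def ucb1(q_total, n, n_parent, c=math.sqrt(2)):
    """UCB1 from cumulative return q_total, visits n, and parent visits n_parent."""
    if n == 0:
        return math.inf  # convention (assumption): try unvisited nodes first
    return q_total / n + c * math.sqrt(math.log(n_parent) / n)

# After Step 3: each child has one visit, the root has two.
left_3 = ucb1(q_total=20, n=1, n_parent=2)
right_3 = ucb1(q_total=10, n=1, n_parent=2)

# After Step 5: the left child has returns 20 and 0 (total 20),
# the right child still has a single return of 10, the root has three visits.
left_5 = ucb1(q_total=20, n=2, n_parent=3)
right_5 = ucb1(q_total=10, n=1, n_parent=3)

print(left_3 > right_3)   # left child is selected, then expanded (Step 4)
print(right_5 > left_5)   # right child is selected, then expanded (Step 6)
```

<p>After Step 5 both children have the same mean, so the exploration term alone decides in favor of the less-visited right child.</p>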

<h2 id="example">Example</h2>

<style>
  /* Two-column grid (responsive: one column on narrow screens) */
  .mmd-grid {
    display: grid;
    grid-template-columns: repeat(2, minmax(320px, 1fr));
    gap: 16px;
    align-items: start;
  }
  @media (max-width: 720px) {
    .mmd-grid { grid-template-columns: 1fr; }
  }

  /* Card + title */
  .mmd-card {
    border: 1px solid #D9F99D;
    border-radius: 10px;
    padding: 12px 12px 16px;
    background: #F2FCE7; /* light background (change if desired) */
  }
  .mmd-title {
    margin: 0 0 8px;
    text-align: center;
    font-weight: 600;
    font-size: 14px;
  }

  /* Center Mermaid diagrams */
  .mmd-card .mermaid { display: grid; place-items: center; }
  .mmd-card .mermaid > svg { display: block; margin: 0 auto; max-width: 100%; height: auto; }
</style>

<div class="mmd-grid">

  <!-- 1 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 0: Initialization</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T0[" "]
      direction TB
        t0_s0["s0<br />Q=0<br />N=0"]
      end
    </div>
  </section>

  <!-- 2 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 1: Expand</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T1[" "]
      direction TB
        t1_s0["s0<br />Q=0<br />N=0"]
        t1_s0 --&gt;|a1 = 0| t1_s1L["s1<br />Q=0<br />N=0"]
        t1_s0 --&gt;|a1 = 1| t1_s1R["s1<br />Q=0<br />N=0"]
      end
    </div>
  </section>

  <!-- 3 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 2: Rollout &amp; Backpropagation</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T2[" "]
      direction TB
        t2_s0["s0<br />Q=20<br />N=1"]
        t2_s0 --&gt;|a1 = 0| t2_s1L["s1<br />Q=20<br />N=1"]
        t2_s0 --&gt;|a1 = 1| t2_s1R["s1<br />Q=0<br />N=0"]
        t2_s1L -.-&gt;|"π(a_t &#124; s_t)"| t2_terL["s_ter"]
      end
    </div>
  </section>

  <!-- 4 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 3: Rollout &amp; Backpropagation</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T2[" "]
      direction TB
        t2_s0["s0<br />Q=15<br />N=2"]
        t2_s0 --&gt;|a1 = 0| t2_s1L["s1<br />Q=20<br />N=1"]
        t2_s0 --&gt;|a1 = 1| t2_s1R["s1<br />Q=10<br />N=1"]
        t2_s1R -.-&gt;|"π(a_t &#124; s_t)"| t2_terR["s_ter"]
      end
    </div>
  </section>

  <!-- 5 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 4: Expand</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T3[" "]
      direction TB
        t3_s0["s0<br />Q=15<br />N=2"]
        t3_s0 --&gt;|a1 = 0| t3_s1L["s1<br />Q=20<br />N=1"]
        t3_s0 --&gt;|a1 = 1| t3_s1R["s1<br />Q=10<br />N=1"]
        t3_s1L --&gt;|a2 = 0| t3_s2LL["s2<br />Q=0<br />N=0"]
        t3_s1L --&gt;|a2 = 1| t3_s2LR["s2<br />Q=0<br />N=0"]
      end
    </div>
  </section>

  <!-- 6 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 5: Rollout &amp; Backpropagation</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T4[" "]
      direction TB
        t4_s0["s0<br />Q=10<br />N=3"]
        t4_s0 --&gt;|a1 = 0| t4_s1L["s1<br />Q=10<br />N=2"]
        t4_s0 --&gt;|a1 = 1| t4_s1R["s1<br />Q=10<br />N=1"]
        t4_s1L --&gt;|a2 = 0| t4_s2LL["s2<br />Q=0<br />N=1"]
        t4_s1L --&gt;|a2 = 1| t4_s2LR["s2<br />Q=0<br />N=0"]
        t4_s2LL -.-&gt;|"π(a_t &#124; s_t)"| t4_terL["s_ter"]
      end
    </div>
  </section>

  <!-- 7 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 6: Expand</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T5[" "]
      direction TB
        t5_s0["s0<br />Q=10<br />N=3"]
        t5_s0 --&gt;|a1 = 0| t5_s1L["s1<br />Q=10<br />N=2"]
        t5_s0 --&gt;|a1 = 1| t5_s1R["s1<br />Q=10<br />N=1"]
        t5_s1L --&gt;|a2 = 0| t5_s2LL["s2<br />Q=0<br />N=1"]
        t5_s1L --&gt;|a2 = 1| t5_s2LR["s2<br />Q=0<br />N=0"]
        t5_s1R --&gt;|a2 = 0| t5_s2RL["s2<br />Q=0<br />N=0"]
        t5_s1R --&gt;|a2 = 1| t5_s2RR["s2<br />Q=0<br />N=0"]
      end
    </div>
  </section>

  <!-- 8 -->
  <section class="mmd-card">
    <h4 class="mmd-title">Step 7: Rollout &amp; Backpropagation</h4>
    <div class="mermaid">
    %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%%
    flowchart TB
      subgraph T6[" "]
      direction TB
        t6_s0["s0<br />Q=11<br />N=4"]
        t6_s0 --&gt;|a1 = 0| t6_s1L["s1<br />Q=10<br />N=2"]
        t6_s0 --&gt;|a1 = 1| t6_s1R["s1<br />Q=12<br />N=2"]
        t6_s1L --&gt;|a2 = 0| t6_s2LL["s2<br />Q=0<br />N=1"]
        t6_s1L --&gt;|a2 = 1| t6_s2LR["s2<br />Q=0<br />N=0"]
        t6_s1R --&gt;|a2 = 0| t6_s2RL["s2<br />Q=14<br />N=1"]
        t6_s1R --&gt;|a2 = 1| t6_s2RR["s2<br />Q=0<br />N=0"]
        t6_s2RL -.-&gt;|"π(a_t &#124; s_t)"| t6_terL["s_ter"]
      end
    </div>
  </section>
</div>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="RL" /><category term="Open-Loop" /><category term="Planning" /><summary type="html"><![CDATA[Intro Monte Carlo Tree Search (MCTS) works well in practice but poses theoretical challenges. In this writing, I want to describe MCTS algorithm, and why this algorithm works. Open-loop planning algorithms like MCTS, can plan future actions from an initial state $s_0$. They assume access to a model of the environment, either stochastically or deterministically. By contrast, typical reinforcement learning algorithm is closed-loop: at each time step it selects an action based on the current state $s_t$ Intuition MCTS selects the action expected to yeild the highest return. However, evaluating every state’s value is usually infeasible; if all values were known, the agent could simply choose the best action. Thus we must estimate values, and rollout make this possible. A single rollout can be inaccurate, but MCTS mitigates this by repeating rollouts and expanding the tree: as the number of visits increases, the estimates become more accurate. For deep trees, computation can be expensive: with a fixed branching factor (number of actions), the cost grows exponentially with depth. I believe there might be some sort of trade-off methologies. I plan to discuss these after further study. 
Pseudocode \begin{algorithm} \caption{MCTS: Selection–Expansion–Rollout (from $s_0$)} \begin{algorithmic} \STATE current $\leftarrow$ $s_0$ \WHILE{not Leaf(current)} \STATE current $\leftarrow$ $\arg\max_{s_i \in \mathcal{C}(\text{current})}\; \text{UCB1}(s_i)$ \ENDWHILE \IF{$N(\text{current}) = 0$} \Return Rollout(current) \Comment{unvisited leaf $\rightarrow$ rollout} \ELSE \FOR{each action $a$ available from current} \STATE addNewStateToTree(current, $a$) \ENDFOR \STATE current $\leftarrow$ firstNewChild(current) \Return Rollout(current) \Comment{expand then rollout} \ENDIF \end{algorithmic} \end{algorithm} \begin{algorithm} \caption{Rollout$(s_i)$} \begin{algorithmic} \STATE $s \gets s_i$ \WHILE{true} \IF{$\operatorname{Terminal}(s)$} \Return $\operatorname{Value}(s)$ \ENDIF \STATE $a \gets \operatorname{Random}(\operatorname{AvailableActions}(s))$ \STATE $s \gets \operatorname{Simulate}(a, s)$ \ENDWHILE \end{algorithmic} \end{algorithm} Note. The pseudocode above shows how to run MCTS algorithm step by step so that the agent can choose the action with the highest estimated value. A node’s value is typically computed as the averaged sum of leaves’ values including the node’s own value. UCB1 \[\text{UCB1}(s_t) = \frac{Q(s_t)}{N(s_t)} + C\sqrt{\frac{ln(N(s_{t-1}))}{N(s_t)}}\] $ Q(s_t) $: cumulative return $ N(s_t) $: The number of visits The first term is the empirical mean (exploitation). The second term encourages exploration of less-visited nodes and shrinks as $N(s_t)$ grows. 
Example Step 0: Initialization %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T0[" "] direction TB t0_s0["s0Q=0N=0"] end Step 1: Expand %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T1[" "] direction TB t1_s0["s0Q=0.00N=0"] t1_s0 --&gt;|a1 = 0| t1_s1L["s1Q=0N=0"] t1_s0 --&gt;|a1 = 1| t1_s1R["s1Q=0N=0"] end Step 2: Rollout &amp; Backpropagation %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T2[" "] direction TB t2_s0["s0Q=20N=1"] t2_s0 --&gt;|a1 = 0| t2_s1L["s1Q=20N=1"] t2_s0 --&gt;|a1 = 1| t2_s1R["s1Q=0N=0"] t2_s1L -.-&gt;|"π(a_t &#124; s_t)"| t2_terL["s_ter"] end Step 3: Rollout &amp; Backpropagation %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T2[" "] direction TB t2_s0["s0Q=15N=2"] t2_s0 --&gt;|a1 = 0| t2_s1L["s1Q=20N=1"] t2_s0 --&gt;|a1 = 1| t2_s1R["s1Q=10N=1"] t2_s1R -.-&gt;|"π(a_t &#124; s_t)"| t2_terR["s_ter"] end Step 4: Expand %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T3[" "] direction TB t3_s0["s0Q=15N=2"] t3_s0 --&gt;|a1 = 0| t3_s1L["s1Q=20N=1"] t3_s0 --&gt;|a1 = 1| t3_s1R["s1Q=10N=1"] t3_s1L --&gt;|a2 = 0| t3_s2LL["s2Q=0N=0"] t3_s1L --&gt;|a2 = 1| t3_s2LR["s2Q=0N=0"] end Step 5: Rollout &amp; Backpropagation %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T4[" "] direction TB t4_s0["s0Q=10N=3"] t4_s0 --&gt;|a1 = 0| t4_s1L["s1Q=10N=2"] t4_s0 --&gt;|a1 = 1| t4_s1R["s1Q=10N=1"] t4_s1L --&gt;|a2 = 0| t4_s2LL["s2Q=0N=1"] t4_s1L --&gt;|a2 = 1| t4_s2LR["s2Q=0N=0"] t4_s2LL -.-&gt;|"π(a_t &#124; s_t)"| t4_terL["s_ter"] end Step 6: Expand %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T5[" "] direction TB t5_s0["s0Q=10N=3"] t5_s0 --&gt;|a1 = 0| t5_s1L["s1Q=10N=2"] t5_s0 --&gt;|a1 = 1| t5_s1R["s1Q=10N=1"] t5_s1L --&gt;|a2 = 0| t5_s2LL["s2Q=0N=1"] t5_s1L --&gt;|a2 = 1| 
t5_s2LR["s2Q=0N=0"] t5_s1R --&gt;|a2 = 0| t5_s2RL["s2Q=0N=0"] t5_s1R --&gt;|a2 = 1| t5_s2RR["s2Q=0N=0"] end Step 7: Rollout &amp; Backpropagation %%{init:{'flowchart':{'useMaxWidth':false,'htmlLabels':true}}}%% flowchart TB subgraph T6[" "] direction TB t6_s0["s0Q=11N=4"] t6_s0 --&gt;|a1 = 0| t6_s1L["s1Q=10N=2"] t6_s0 --&gt;|a1 = 1| t6_s1R["s1Q=12N=2"] t6_s1L --&gt;|a2 = 0| t6_s2LL["s2Q=0N=1"] t6_s1L --&gt;|a2 = 1| t6_s2LR["s2Q=0N=0"] t6_s1R --&gt;|a2 = 0| t6_s2RL["s2Q=14N=1"] t6_s1R --&gt;|a2 = 1| t6_s2RR["s2Q=0N=0"] t6_s2RL -.-&gt;|"π(a_t &#124; s_t)"| t6_terL["s_ter"] end]]></summary></entry><entry><title type="html">Pinsker’s Inequality</title><link href="https://jaehyun-jeong.github.io/2025/09/06/pinskers-inequality.html" rel="alternate" type="text/html" title="Pinsker’s Inequality" /><published>2025-09-06T00:00:00+09:00</published><updated>2025-09-06T00:00:00+09:00</updated><id>https://jaehyun-jeong.github.io/2025/09/06/pinskers-inequality</id><content type="html" xml:base="https://jaehyun-jeong.github.io/2025/09/06/pinskers-inequality.html"><![CDATA[<h2 id="th-pinskers-inequality"><em>Th) Pinsker’s Inequality</em></h2>

<p>$\forall P, Q$: probability distributions on a measurable space $(U, \Sigma)$,</p>

<p>$\delta(P, Q) \leq \sqrt{\frac{1}{2} D_{\text{KL}}(P \| Q)}$</p>

<ul>
  <li>$\delta(P, Q)$ : Total variation distance</li>
  <li>$D_{\text{KL}}(P \| Q)$ : KL divergence</li>
</ul>
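<p>Before the proof, a quick numerical sanity check of the statement on random discrete distributions. This is only a sketch: it assumes natural logarithms for the KL divergence and uses $\delta(P,Q) = \tfrac{1}{2}\sum_x |p(x) - q(x)|$ for the discrete case. With $\log_2$ the KL value is larger by a factor $1/\ln 2$, so the inequality still holds. The helper names are made up for this example.</p>

```python
import math
import random

def total_variation(p, q):
    """delta(P, Q) = 1/2 * sum_x |p(x) - q(x)| for discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """D_KL(P || Q) in nats; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_distribution(k, rng):
    """Random strictly positive probability vector of length k."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
gaps = []
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    # Right-hand side minus left-hand side of Pinsker's inequality.
    gaps.append(math.sqrt(0.5 * kl_divergence(p, q)) - total_variation(p, q))

print(min(gaps))  # should never be negative (up to rounding)
```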

<hr />

<p>Proof)</p>

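<p>I prove only the discrete case. Throughout the proof, $\log$ denotes $\log_2$, which is where the $\ln 2$ factors come from.</p>

<p>We first prove a special case of Pinsker’s Inequality for two-point distributions; the full proof builds on it.</p>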

<blockquote>
  <h2 id="special-case-of-pinskers-inequality"><em>Special case of Pinsker’s Inequality</em></h2>
  <p>
\( P = \begin{cases}
1 &amp; \text{w.p. } p \\
0 &amp; \text{w.p. } 1-p
\end{cases} \)
</p>

  <p>
\( Q = \begin{cases}
1 &amp; \text{w.p. } q \\
0 &amp; \text{w.p. } 1-q
\end{cases} \)
</p>

  <p>s.t. $ p \ge q $</p>

  <p>$ |P-Q|_1 = |p-q| + |(1-p) - (1-q)| = 2|p-q| = 2(p-q) \quad (\because p \geq q)$</p>

  <p>$f(p,q) = p \log \frac{p}{q} + (1-p)\log \frac{1-p}{1-q} - \frac{1}{2 \ln 2}(2(p-q))^2$</p>

  <p>and</p>

  <p>$\frac{\partial f}{\partial q} = \frac{\partial}{\partial q}\left(p\log p - p\log q\right) + \frac{\partial}{\partial q}\left((1-p)(\log(1-p) - \log(1-q))\right) - \frac{\partial}{\partial q}\frac{1}{2\ln 2}(2(p-q))^2$</p>

  <p>$= -\frac{p}{q \ln 2} + \frac{1-p}{(1-q)\ln 2} - \frac{1}{2\ln 2}\cdot 2(2(p-q))(-2)$</p>

  <p>$= \frac{1}{\ln 2}\left(-\frac{p}{q} + \frac{1-p}{1-q}\right) + \frac{4}{\ln 2}(p-q)$</p>

  <p>$= \frac{1}{\ln 2}\left(-\frac{p}{q} + \frac{1-p}{1-q} + 4(p-q)\right)$</p>

  <p>$= -\frac{p-q}{\ln 2}\left(\frac{1}{q(1-q)} - 4\right) \le 0 \quad (\because p \ge q \land \frac{1}{q(1-q)} \ge 4)$</p>

  <p>and</p>

  <p>$q = p \implies f(p,q)=0$</p>

  <p>$\therefore f(p,q)\ge 0 \quad (p \ge q)$, because $f$ is non-increasing in $q$ and $f(p,p)=0$, so $f(p,q) \ge f(p,p) = 0$ for $q \le p$.</p>

  <p>which means that</p>

  <p>$f(p,q) = D_{\mathrm{KL}}(P \| Q) - \tfrac{1}{2 \ln 2} |P - Q|_1^2 \ge 0$</p>

  <p>$\therefore D_{\mathrm{KL}}(P \| Q) \ge \tfrac{1}{2 \ln 2} |P - Q|_1^2 \quad \cdots \quad (1)$</p>
</blockquote>
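<p>The special case can also be checked numerically. The sketch below evaluates $f(p,q)$ on a grid with $p \ge q$, using $\log_2$ to match the $\ln 2$ factors in the derivation; the grid resolution is an arbitrary choice.</p>

```python
import math

def f(p, q):
    """f(p, q) = D_KL(P || Q) in bits minus (2(p - q))^2 / (2 ln 2)."""
    kl_bits = 0.0
    if p > 0:
        kl_bits += p * math.log2(p / q)
    if p < 1:
        kl_bits += (1 - p) * math.log2((1 - p) / (1 - q))
    return kl_bits - (2 * (p - q)) ** 2 / (2 * math.log(2))

grid = [i / 100 for i in range(1, 100)]   # 0.01, 0.02, ..., 0.99
values = [f(p, q) for p in grid for q in grid if p >= q]

print(min(values))                    # non-negative up to rounding
print(f(0.7, 0.31) - f(0.7, 0.30))    # negative: f decreases in q for q <= p
```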

<p>Let</p>

<p>$p(x) := P_P(x)$<br />
$q(x) := P_Q(x)$<br />
$ A := \{x \mid p(x) \geq q(x) \} $</p>

<p>Then define the random variable</p>

<p>
\( Z(x) := \begin{cases}
1 &amp; (x \in A)  \\
0 &amp; (x \notin A)
\end{cases} \)
</p>

<p>Then the following holds.</p>

<blockquote>
  <h2 id="th-chain-rule-of-kl-divergence"><em>Th) chain rule of KL divergence</em></h2>
  <p>$D_{\text{KL}}(P \| Q) = D_{\text{KL}}(P(Z) \| Q(Z)) + D_{\text{KL}}(P \| Q \mid Z)$</p>

  <p>Proof)</p>

  <p>$D_{\text{KL}}(P(Z) \| Q(Z))$</p>

  <p>$= P(A) \log \frac{P(A)}{Q(A)} + P(A^c) \log \frac{P(A^c)}{Q(A^c)}$</p>

  <p>$= \sum_{x \in A} p(x) \log \frac{P(A)}{Q(A)} + \sum_{x \notin A} p(x) \log \frac{P(A^c)}{Q(A^c)} \quad\cdots\quad \text{(2)}$</p>

  <p>and</p>

  <p>$D_{\text{KL}}(P \| Q \mid Z)$</p>

  <p>
\(
= \mathbb{E}_{Z \sim P(Z)}\!\left[
  D_{\mathrm{KL}}\!\left( P(P \mid Z=z)\,\|\,P(Q \mid Z=z) \right)
\right] \quad (\text{KL divergence between two conditional probability distributions})
\)
</p>

  <p>$= P(A) D_{\text{KL}}(P(P \mid Z=1) \,\|\, P(Q \mid Z=1)) + P(A^c) D_{\text{KL}}(P(P \mid Z=0) \| P(Q \mid Z=0))$</p>

  <p>$= P(A) \sum_{x \in A} p(x \mid Z=1) \log \frac{p(x \mid Z=1)}{q(x \mid Z=1)} + P(A^c) \sum_{x \notin A} p(x \mid Z=0) \log \frac{p(x \mid Z=0)}{q(x \mid Z=0)}$</p>

  <p>$ = \sum_{x\in A} p(x)\,\log\left(\frac{p(x)}{q(x)}\cdot\frac{Q(A)}{P(A)}\right)+\sum_{x\notin A} p(x)\,\log\left(\frac{p(x)}{q(x)}\cdot\frac{Q(A^c)}{P(A^c)}\right) \quad\cdots\quad \text{(3)}$</p>

  <p>$ \left( \because p(x \mid Z=1) = \frac{p(x)}{P(A)} \text{, } q(x \mid Z=1) = \frac{q(x)}{Q(A)} \text{, } p(x \mid Z=0) = \frac{p(x)}{P(A^{c})} \text{, } q(x \mid Z=0) = \frac{q(x)}{Q(A^{c})}\right) $</p>

  <p>Combine (2), (3), then</p>

  <p>$D_{\text{KL}}(P(Z) \| Q(Z)) + D_{\text{KL}}(P \| Q \mid Z)$</p>

  <p>$= \sum_{x \in A} p(x) \log \frac{P(A)}{Q(A)} + \sum_{x \notin A} p(x) \log \frac{P(A^c)}{Q(A^c)} + \sum_{x \in A} p(x) \log \left( \frac{p(x)}{q(x)} \cdot \frac{Q(A)}{P(A)} \right) + \sum_{x \notin A} p(x) \log \left( \frac{p(x)}{q(x)} \cdot \frac{Q(A^c)}{P(A^c)} \right)$</p>

  <p>$= \sum_{x \in A} p(x) \left( \log \left( \frac{p(x)}{q(x)} \cdot \frac{Q(A)}{P(A)} \right) + \log \frac{P(A)}{Q(A)} \right) + \sum_{x \notin A} p(x) \left( \log \left( \frac{p(x)}{q(x)} \cdot \frac{Q(A^c)}{P(A^c)} \right) + \log \frac{P(A^c)}{Q(A^c)} \right)$</p>

  <p>$= \sum_{x \in U} p(x) \log \frac{p(x)}{q(x)}$</p>

  <p>$= D_{\text{KL}}(P \| Q)$</p>

  <p>$\blacksquare$</p>
</blockquote>
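<p>The chain rule can be verified on a small example. The sketch below uses a hypothetical four-point distribution pair, builds $A$, and compares $D_{\text{KL}}(P \| Q)$ with the sum of the two terms; the specific numbers are chosen only for illustration.</p>

```python
import math

# Hypothetical example distributions on four points.
p = [0.5, 0.2, 0.2, 0.1]
q = [0.25, 0.25, 0.25, 0.25]

def kl(ps, qs):
    """Discrete KL divergence in bits; terms with zero mass contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(ps, qs) if pi > 0)

# A = {x : p(x) >= q(x)} and its complement, as index lists.
A = [i for i in range(len(p)) if p[i] >= q[i]]
Ac = [i for i in range(len(p)) if p[i] < q[i]]

PA, QA = sum(p[i] for i in A), sum(q[i] for i in A)

# D_KL(P(Z) || Q(Z)): divergence between the two induced two-point laws.
kl_z = kl([PA, 1 - PA], [QA, 1 - QA])

# D_KL(P || Q | Z): P-weighted divergence between the conditionals.
def conditional_term(indices, p_mass, q_mass):
    return p_mass * kl([p[i] / p_mass for i in indices],
                       [q[i] / q_mass for i in indices])

kl_given_z = conditional_term(A, PA, QA) + conditional_term(Ac, 1 - PA, 1 - QA)

print(kl(p, q), kl_z + kl_given_z)  # the two numbers should agree
```

<p>Since the conditional term is itself a weighted sum of KL divergences, it is non-negative, which gives $D_{\text{KL}}(P \| Q) \ge D_{\text{KL}}(P(Z) \| Q(Z))$.</p>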

<hr />

<p>Let</p>

<p>
\( P_A := \begin{cases}
1 &amp; \text{w.p. } \sum_{x \in A} p(x) \\
0 &amp; \text{w.p. } \sum_{x \notin A} p(x)
\end{cases} \)
</p>
<p>
\( Q_A :=
\begin{cases}
1 &amp; \text{w.p. } \sum_{x \in A} q(x) \\
0 &amp; \text{w.p. } \sum_{x \notin A} q(x)
\end{cases} \)
</p>

<p>Then,</p>

<p>$
|P - Q|_1
$</p>

<p>$
= \sum_x |p(x) - q(x)|
$</p>

<p>$
= \sum_{x \in A} (p(x) - q(x)) + \sum_{x \notin A} (q(x) - p(x)) \quad (\because p(x) \geq q(x) \, \forall x \in A)
$</p>

<p>$
= \left| \sum_{x \in A} p(x) - \sum_{x \in A} q(x) \right| + \left| \sum_{x \notin A} q(x) - \sum_{x \notin A} p(x) \right|
$</p>

<p>$
= |P(P_A = 1) - P(Q_A = 1)| + |P(P_A = 0) - P(Q_A = 0)|
$</p>

<p>$
= \sum_{x \in \{0,1\}} |P(P_A = x) - P(Q_A = x)|
$</p>

<p>$
= |P(P_A) - P(Q_A)|_1 \quad\cdots\quad (4)
$</p>
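<p>Identity (4) can be checked with exact arithmetic. The sketch below uses Python's <code>fractions</code> module on a hypothetical four-point example and compares the $L_1$ distance between $P$ and $Q$ with the $L_1$ distance between the induced two-point distributions $P_A$ and $Q_A$.</p>

```python
from fractions import Fraction as F

# Hypothetical example distributions on four points (exact rationals).
p = [F(1, 2), F(1, 5), F(1, 5), F(1, 10)]
q = [F(1, 4)] * 4

# A = {x : p(x) >= q(x)}
A = [i for i in range(len(p)) if p[i] >= q[i]]

# Left-hand side of (4): L1 distance between P and Q.
l1_full = sum(abs(pi - qi) for pi, qi in zip(p, q))

# Right-hand side of (4): L1 distance between P_A and Q_A.
PA = sum(p[i] for i in A)
QA = sum(q[i] for i in A)
l1_two_point = abs(PA - QA) + abs((1 - PA) - (1 - QA))

print(l1_full, l1_two_point)
```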

<p>Therefore, below holds</p>

<p>$
D_{\mathrm{KL}}(P \| Q)
$</p>

<p>$
\ge D_{\mathrm{KL}}(P(Z) \| Q(Z)) \quad (\because\text{Chain rule of KL divergence and } D_{\mathrm{KL}}(P \| Q \mid Z) \ge 0)
$</p>

<p>$
= D_{\mathrm{KL}}(P(P_A) \| P(Q_A)) \quad (\because Z \text{ is distributed as } P_A \text{ under } P \text{ and as } Q_A \text{ under } Q)
$</p>

<p>$
\Rightarrow D_{\mathrm{KL}}(P \| Q) \ge D_{\mathrm{KL}}(P(P_A) \| P(Q_A))
$</p>

<p>$
\ge \frac{1}{2 \ln 2} |P(P_A) - P(Q_A)|_1^2 \quad (\because\text{Special case of Pinsker’s inequality, (1)})
$</p>

<p>$
= \frac{1}{2 \ln 2} |P - Q|_1^2 \quad (\because (4))
$</p>

<p>$
\Rightarrow \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \| Q)} \ge \sqrt{\tfrac{1}{4 \ln 2}} \, |P - Q|_1 \quad\cdots\quad (5)
$</p>
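
<p>A numerical check of (5) on a toy example may help. The distributions are my own illustrative assumption, with $\log$ taken as base 2: let $p = \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right)$ and $q = \left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{2}\right)$ on a three-point space.</p>

<p>$
D_{\mathrm{KL}}(P \| Q) = \tfrac{1}{2} \log 2 + \tfrac{1}{4} \log 1 + \tfrac{1}{4} \log \tfrac{1}{2} = \tfrac{1}{4}, \qquad |P - Q|_1 = \tfrac{1}{4} + 0 + \tfrac{1}{4} = \tfrac{1}{2}
$</p>

<p>$
\sqrt{\tfrac{1}{2} \cdot \tfrac{1}{4}} = \sqrt{\tfrac{1}{8}} \approx 0.354 \;\ge\; \sqrt{\tfrac{1}{4 \ln 2}} \cdot \tfrac{1}{2} \approx 0.601 \cdot 0.5 \approx 0.300
$</p>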

<p>$
\text{Let } A^\ast \in \Sigma \quad\text{s.t.}\quad \sup_{A \in \Sigma} |P(A) - Q(A)| = |P(A^\ast) - Q(A^\ast)| \quad (\because\text{Hahn decomposition theorem})
$</p>

<p>$
\text{In the discrete case we can take } A^\ast = \{x \mid p(x) \ge q(x)\}\text{; then let } p := P(A^\ast), \; q := Q(A^\ast)\text{, so that } p \ge q
$</p>

<p>$
|P - Q|_1 = |P(A^\ast) - Q(A^\ast)| + |P((A^{\ast})^c) - Q((A^{\ast})^c)|
$</p>

<p>$
= |P(A^\ast) - Q(A^\ast) - (P((A^{\ast})^c) - Q((A^{\ast})^c))|
$</p>

<p>$
= |p - q - (1-p) + (1-q)|
$</p>

<p>$
= 2(p - q)
$</p>

<p>$
= 2 \delta(P,Q) \quad\cdots\quad (6)
$</p>
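
<p>As a concrete instance of (6), take a toy example of my own choosing (not from the proof): $U = \{a, b, c\}$, $p = \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right)$, $q = \left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{2}\right)$. The supremum is attained at $A^\ast = \{a, b\}$.</p>

<p>$
\delta(P,Q) = P(A^\ast) - Q(A^\ast) = \tfrac{3}{4} - \tfrac{1}{2} = \tfrac{1}{4}, \qquad |P - Q|_1 = \tfrac{1}{4} + 0 + \tfrac{1}{4} = \tfrac{1}{2} = 2 \, \delta(P,Q)
$</p>

<p>With $\log$ taken as base 2, this example has $D_{\mathrm{KL}}(P \| Q) = \tfrac{1}{4}$, so the bound being proved reads $\delta(P,Q) = 0.25 \le \sqrt{\tfrac{1}{8}} \approx 0.354$.</p>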

<p>$
\therefore \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \| Q)} \;\;\ge\;\; \sqrt{\tfrac{1}{4 \ln 2}} \, |P - Q|_1 \quad (\because(5))
$</p>

<p>$
= \sqrt{\tfrac{1}{4 \ln 2}} \cdot 2 \delta(P,Q) \;\ge\; \delta(P,Q) \quad (\because(6) \text{ and } \ln 2 \le 1)
$</p>

<p>$
\Rightarrow \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \| Q)} \ge \delta(P,Q)
$</p>

<p>$\blacksquare$</p>]]></content><author><name>Jaehyun Jeong</name><email>jj.for.jaehyun@gmail.com</email></author><category term="ProbabilityTheory" /><category term="RL" /><summary type="html"><![CDATA[Th) Pinsker’s Inequality $\forall$ ( P, Q ): probability distributions on measurable space $( U, \Sigma )$, $\delta(P, Q) \leq \sqrt{\frac{1}{2} D_{\text{KL}}(P \| Q)}$ $\delta(P, Q)$ : Total variation $D_{\text{KL}}(P \| Q)$ : KL divergence Proof) I only prove for discrete case.]]></summary></entry></feed>