For options markets in general, comparing ATM implied volatility with historical realized volatility is one of the most popular and widely used methods of relative value trading, especially in shorter-dated options. With appropriate time scaling, realized volatility can be compared directly with ATM implied volatility to judge relative richness or cheapness, irrespective of the level of the underlying or changes in it. This follows directly from the concept of continuous delta-hedging.
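The ATM comparison can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names, the 252-day annualization factor, the use of the sample standard deviation, and the data are all assumptions made for the example, with changes expressed in basis points.

```python
import math
import statistics

def realized_normal_vol(changes_bp, periods_per_year=252):
    """Annualized realized volatility (bp) from periodic rate changes (bp).

    Assumes square-root-of-time scaling and the sample standard deviation.
    """
    return statistics.stdev(changes_bp) * math.sqrt(periods_per_year)

def richness_ratio(implied_vol_bp, changes_bp, periods_per_year=252):
    """Implied over realized volatility; above 1 suggests options look rich."""
    return implied_vol_bp / realized_normal_vol(changes_bp, periods_per_year)

# Made-up daily changes in a swap rate, in basis points.
daily_changes_bp = [4.0, -6.0, 2.5, -1.0, 5.5, -3.0, 0.5, -4.5]
print(round(realized_normal_vol(daily_changes_bp), 1))
print(round(richness_ratio(75.0, daily_changes_bp), 2))
```

For weekly changes, `periods_per_year` would be set to 52 instead.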

However, for someone interested in the relative richness of options away from the ATM strike, things are more difficult. From quoted option prices it is still possible to extract the implied volatility at a given strike (by inverting a pricing formula such as Black-Scholes), but coming up with a comparable realized volatility number is not straightforward. Computing the standard deviation of historical returns (or changes) provides only one estimate, irrespective of the strike of the option, and, as mentioned earlier, that estimate is most applicable to the ATM strike.

One rigorous method of computing the applicable volatility for OTM options from historical price information is to use the concept of break-even volatility. Break-even volatility can be construed as the value of the volatility parameter in a pricing formula such that, if we buy an OTM option at that value and delta-hedge it, the net expected PnL is zero. This expectation can be computed from historical price information over an appropriate period (for example, as a simple average over different historical delta-hedging back-tests, or by some similar methodology). Obviously, this requires considerable computational effort, since enough back-tests must be run to estimate the expected value.
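A minimal sketch of the break-even calculation follows. It assumes the normal (Bachelier) model, ignores discounting and the numeraire, hedges once per observation, and solves for the volatility by bisection; the function names, the bracket, and the sample path are hypothetical choices for illustration only.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bachelier_call(fwd, strike, vol, tau):
    """Undiscounted normal-model call price; vol is per sqrt(year)."""
    s = vol * math.sqrt(tau)
    d = (fwd - strike) / s
    return (fwd - strike) * norm_cdf(d) + s * norm_pdf(d)

def bachelier_delta(fwd, strike, vol, tau):
    return norm_cdf((fwd - strike) / (vol * math.sqrt(tau)))

def hedged_pnl(path, strike, vol, dt):
    """PnL of buying a call priced at `vol` and delta-hedging along `path`."""
    n = len(path) - 1
    pnl = -bachelier_call(path[0], strike, vol, n * dt)  # premium paid
    for i in range(n):
        tau = (n - i) * dt  # time left to expiry at step i
        delta = bachelier_delta(path[i], strike, vol, tau)
        pnl -= delta * (path[i + 1] - path[i])  # short-delta hedge PnL
    return pnl + max(path[-1] - strike, 0.0)  # option payoff at expiry

def break_even_vol(paths, strike, dt, lo=1e-3, hi=1e3, tol=1e-8):
    """Bisect for the vol at which the average hedged PnL is zero."""
    def avg_pnl(vol):
        return sum(hedged_pnl(p, strike, vol, dt) for p in paths) / len(paths)
    if avg_pnl(lo) * avg_pnl(hi) > 0:
        raise ValueError("no sign change of PnL inside the vol bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if avg_pnl(mid) > 0:
            lo = mid  # option bought too cheaply: break-even vol is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up path of a forward swap rate in bp, observed daily.
path = [250.0, 253.0, 249.0, 256.0, 262.0, 258.0, 265.0]
print(break_even_vol([path], strike=255.0, dt=1.0 / 252))
```

In practice `paths` would hold many overlapping or rolling historical windows, which is where the computational cost mentioned above comes from.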

One approximate way out is to exploit the historical distribution and use it to price a given option directly at any strike. The value of the volatility parameter can then be backed out by inverting the pricing engine and compared with the volatility implied by the option price at that strike, giving an estimate of relative richness. The following is an approach to developing such a method.

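For the swaption market, the market standard is to use normal basis-point volatility. Hence we start with historical changes in the underlying swap rate over a chosen interval (daily or weekly), scaled up appropriately for the option expiry. From these, a simple Gaussian kernel density can be extracted, which essentially smooths the realized histogram of changes. In the standard kernel density form, the distribution is then given by

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where K is the standard normal density function, $x_i$ are the $n$ scaled historical changes, and $h$ is the kernel bandwidth. However, before we use this distribution we impose two conditions:

$$E[x] = F, \qquad \mathrm{Var}[x] = \sigma_{\mathrm{implied}}^{2}$$

where the expectation and variance are taken under the adjusted distribution and $\sigma_{\mathrm{implied}}^{2}$ is the variance of the distribution implied by option prices.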

The first ensures that the distribution is centered on the current ATM forward level (F). That is, the expected value of the underlying is always equal to the current ATM forward, irrespective of historical levels. The second condition ensures that the currently priced-in uncertainty is maintained. Strictly speaking, this is not a required condition. However, since we are primarily interested in developing a methodology for skew comparison, it is useful. It can also be argued that the level of uncertainty currently priced in contains important forward-looking information, and that dropping it may result in significant under- or over-pricing of OTM options, depending on the gap between the implied variance and the variance of recent realized changes in the underlying. We enforce this condition by first extracting the complete implied distribution from option prices and then setting the variance of the historical distribution equal to that of the implied distribution. The alternative is to force the implied volatilities at the ATMF strike to be equal. The former is preferred as it is model independent.
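The kernel density and the two conditions can be sketched as follows. This is one possible construction under stated assumptions: the bandwidth uses Silverman's rule of thumb (the text does not specify one), the implied standard deviation `target_std` is taken as a given input rather than extracted from option prices, and the conditions are imposed by an affine shift and rescale of the kernel centers and bandwidth. All names and data are hypothetical.

```python
import math
import statistics

def fit_rescaled_kde(changes, forward, target_std, bandwidth=None):
    """Gaussian KDE of `changes`, shifted to mean `forward` and rescaled
    so that its standard deviation equals `target_std`.

    Returns (centers, h): the adjusted kernel centers and bandwidth.
    """
    n = len(changes)
    mean = statistics.fmean(changes)
    sd = statistics.pstdev(changes)
    if bandwidth is None:
        bandwidth = 1.06 * sd * n ** (-0.2)  # Silverman's rule (assumption)
    # A Gaussian KDE has variance = data variance + bandwidth^2.
    scale = target_std / math.sqrt(sd ** 2 + bandwidth ** 2)
    centers = [forward + (x - mean) * scale for x in changes]
    return centers, bandwidth * scale

def kde_pdf(x, centers, h):
    """Density of the adjusted distribution at rate level x."""
    norm = len(centers) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / h) ** 2) for c in centers) / norm

# Made-up daily changes (bp); forward and implied std (bp) are assumed inputs.
changes = [4.0, -6.0, 2.5, -1.0, 5.5, -3.0, 0.5, -4.5]
centers, h = fit_rescaled_kde(changes, forward=250.0, target_std=40.0)
print(kde_pdf(250.0, centers, h))
```

Because the rescale fixes the final variance, any prior scaling of the changes to the option expiry only matters here through the shape of the distribution, not its width.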

Note that to further improve the method one can add another condition that requires the entropy distance (Kullback-Leibler distance) between the historical distribution and the implied distribution to be minimized. This corresponds to the requirement that the difference in total information (or reduction in uncertainty) between the two distributions is not very significant. We skip this stage specifically to preserve the bi-modal distributions frequently observed in swap rates in recent times and to avoid parameterization. In essence, this allows us to extract information from the shape of the historical distribution, while maintaining the overall central tendency and uncertainty (as measured by variance). This is very useful for implementing distributional arbitrage.
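Although this stage is skipped, the entropy distance itself is simple to evaluate once both densities are tabulated on a common grid. The sketch below is illustrative only: the function names, the grid, the small floor on the second density, and the two Gaussian test densities are assumptions, and no minimization is attempted.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def kl_divergence(p, q, dx, floor=1e-300):
    """Kullback-Leibler distance KL(p || q) for densities sampled on a
    uniform grid with spacing dx. `floor` guards against log of zero."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / max(qi, floor))
    return total * dx

# Two made-up densities on a common grid.
dx = 0.01
grid = [-10.0 + i * dx for i in range(2001)]
historical = [normal_pdf(x, 0.0, 1.0) for x in grid]
implied = [normal_pdf(x, 1.0, 1.0) for x in grid]
print(kl_divergence(historical, implied, dx))
```

Note that the distance is not symmetric: swapping the two arguments generally gives a different value.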

Finally, the historical distribution thus obtained can be used to price an option from first principles:

$$C = N \cdot E\left[(F - K)^{+}\right]$$

where C is the price of a call, N is an appropriate numeraire (the DV01 of the underlying swap), E denotes the expectation operator under the adjusted historical distribution, F is the underlying rate, and K is the strike (not to be confused with the kernel K above).
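The pricing and inversion steps can be sketched as follows, assuming a midpoint-rule integration of the payoff against a density and a Bachelier (normal) formula as the pricing engine to invert. The function names, integration bounds, and the Gaussian density used in the example are illustrative assumptions; in practice `pdf` would be the adjusted historical density.

```python
import math

def price_call(pdf, strike, lo, hi, annuity=1.0, steps=4000):
    """annuity * E[(F - K)^+], integrating pdf over [lo, hi] by midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        if x > strike:
            total += (x - strike) * pdf(x)
    return annuity * total * dx

def bachelier_call(fwd, strike, vol, expiry, annuity=1.0):
    """Normal-model call price; vol is per sqrt(year), in rate units."""
    s = vol * math.sqrt(expiry)
    d = (fwd - strike) / s
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    return annuity * ((fwd - strike) * cdf + s * pdf)

def implied_normal_vol(price, fwd, strike, expiry, annuity=1.0,
                       lo=1e-6, hi=1e4, tol=1e-10):
    """Invert bachelier_call by bisection (the price is increasing in vol)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bachelier_call(fwd, strike, mid, expiry, annuity) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example with a Gaussian density standing in for the historical one.
fwd, std_T, expiry = 250.0, 40.0, 0.25  # bp, bp, years (made-up values)

def gaussian_pdf(x):
    z = (x - fwd) / std_T
    return math.exp(-0.5 * z * z) / (std_T * math.sqrt(2.0 * math.pi))

px = price_call(gaussian_pdf, 270.0, fwd - 8 * std_T, fwd + 8 * std_T)
print(implied_normal_vol(px, fwd, 270.0, expiry))
```

Repeating the inversion across strikes gives a "historical" skew that can be set against the implied skew strike by strike.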

This method is useful for looking at the relative richness of collars and strangles (relative to straddles) for shorter maturities (including midcurve options) and for implementing distributional arbitrage (trading the shape of the distribution). However, the underlying assumption, as in the original realized versus implied volatility method, is that historical returns contain useful information about future outcomes.
