Production Function Estimation and Extension¶

The estimation of firms' cost functions plays an important role in any empirical study of industry competition. Production function estimation quantifies the relationship between economic output and inputs such as labor and capital. Production functions also allow for predictions of productivity, costs, and the profit rate (markup) for a product or company.

Cobb-Douglas Production Function¶

The most common production function framework is the Cobb-Douglas, which takes the following functional form: $$ Y = V^{\theta_v} K^{\theta_k} E$$

where $Y$ is total output, $V$ is a production input that is adjustable in the short term, and $K$ is a production input facing short-term adjustment restrictions (such as capital). In this context, I implement the method from De Loecker, J., Eeckhout, J., & Unger, G. (2020), "The rise of market power and the macroeconomic implications," The Quarterly Journal of Economics, 135(2), 561–644.

To implement it, I rely on a panel of annual financial data for roughly 3,000 publicly traded companies from Compustat to estimate the parameters of interest $\theta_v$ and $\theta_k$ for different industries $j$. Letting $z_{it}^{(j)} \equiv \log\left(Z_{it}^{(j)}\right)$ denote the log of any variable $Z_{it}^{(j)}$ for company $i$, year $t$, and industry $j$, the production function can be written as

$$ y_{it}^{(j)} = \theta^{(j)}_v v_{it}^{(j)} + \theta^{(j)}_k k_{it}^{(j)} + \epsilon_{it}^{(j)} $$
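This is simply the log of the Cobb-Douglas form above, with the error term defined as the log of the productivity shifter: $$ \log Y = \theta_v \log V + \theta_k \log K + \log E \quad \Longrightarrow \quad y_{it}^{(j)} = \theta^{(j)}_v v_{it}^{(j)} + \theta^{(j)}_k k_{it}^{(j)} + \epsilon_{it}^{(j)}, \qquad \epsilon_{it}^{(j)} \equiv \log E_{it}^{(j)}.$$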

In the data, I use $y_{it}^{(j)} = \log(sale\_D), \ v_{it}^{(j)} = \log(cogs\_D), \ k_{it}^{(j)} = \log(ppet\_D)$ as measures of production (sales), a short-term adjustable input (cost of goods sold, COGS), and a long-term adjustable input (property, plant, and equipment), all deflated using the US GDP deflator (hence the \_D suffix), to estimate the COGS and capital (PP&E) elasticities of output $(\theta^{(j)}_v, \theta^{(j)}_k)$ for each year and industry in the sample.

To this end, I will estimate the annual elasticities sequentially between 1969 and 2021.

Extension¶

The framework above is flexible and allows for the inclusion of additional short-term and long-term adjustable inputs. In the Python implementation, I also consider $$ y_{it}^{(j)} = \theta^{(j)}_v v_{it}^{(j)} + \theta^{(j)}_x x_{it}^{(j)} + \theta^{(j)}_k k_{it}^{(j)} + \epsilon_{it}^{(j)} $$

where $x_{it}^{(j)} = \log(xsga\_D)$ is another short-term adjustable input corresponding to Selling, General and Administrative (SG&A) costs, and $\theta^{(j)}_x$ is its elasticity of output.

Method Implementation in Python¶

0. Preparation¶

Download the code from De Loecker et al. (2020) and follow their instructions to download the data. Clean the data using their code, then save a copy of the "data_main_upd_trim_1.dta" and "theta_W_s_window.dta" files. Given these copies, I start implementing the estimation method in Python.

In [1]:
#Housekeeping
import os
import sys
#sys.path.append("directoryname")  #Import functions stored in a (custom) folder
clear = lambda: os.system('cls')
clear()

os.chdir('[[ADD PATH]]/PF_estimation')

import time
import numpy as np
import pandas as pd
from numpy.random import seed
from numpy.random import randn
from optimparallel import minimize_parallel
from scipy.optimize import minimize
from scipy.optimize import fsolve
from scipy import stats
import seaborn as sns
import statsmodels.api as smf
import statsmodels.formula.api as smfform
from statsmodels.nonparametric.kernel_regression import KernelReg
import pyreadstat
from matplotlib import pyplot

#Turn off pandas warnings
pd.options.mode.chained_assignment = None  # default='warn'

Now, let's import the datasets and do some initial data cleaning to extend the panel to include 2017, 2018, 2019, 2020, and 2021:

In [2]:
#Import de loecker elasticities
Delocker_elast = pyreadstat.read_dta('./deloecker/theta_W_s_window.dta')[0]

#Extend panel to Obtain the updated series
extendyr   = [2017,2018,2019,2020,2021]
industries = Delocker_elast.ind2d.unique()

#Add NaN rows for the newly extended years
for ind in industries:
    samplepanel = pd.DataFrame({'year':extendyr , 'ind2d':ind , 'theta_WI1_ct':np.nan,  'theta_WI2_ct':np.nan,
                                'theta_WI2_xt':np.nan,  'theta_WI1_kt':np.nan,  'theta_WI2_kt':np.nan})
    Delocker_elast = pd.concat([Delocker_elast,samplepanel] )

Delocker_elast= Delocker_elast.sort_values(['ind2d', 'year'], ascending=[True, True])

Delocker_elast['theta_WI1_ct_update'] =  np.nan
Delocker_elast['theta_WI2_ct_update'] =  np.nan
#Keep only the variables of interest
Delocker_elast  = Delocker_elast[['ind2d','year','theta_WI1_ct','theta_WI1_ct_update','theta_WI2_ct','theta_WI2_ct_update']]

#import Compustat
Data = pyreadstat.read_dta('./deloecker/data_main_upd_trim_1.dta')[0]

#We will need: sale_D, cogs_D, capital_D, xsga_D, and identifiers
#Data.columns.values.tolist() shows all names 
names              = ['sale_D','cogs_D','capital_D','xsga_D']
Data               = Data[['gvkey','year','ind2d','sale_D','cogs_D','capital_D','xsga_D']]
Productivities     = pd.DataFrame(columns=['gvkey','year','ind2d','omega'])
Productivities_ext = pd.DataFrame(columns=['gvkey','year','ind2d','omega_ext'])
#compute logs and Lag the variables 
for varname in names:
    Data['l'+varname]       = np.log(Data[varname])
    Data['lag'+'l'+varname] = Data.groupby('gvkey')['l'+varname].shift()
C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\arraylike.py:399: RuntimeWarning: divide by zero encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)

1. Step 1: Kernel Regression¶

Run the following kernel regression, using only the observations of year $t$ and industry $j$: \begin{align*} y_{it}^{(j)} = \phi\left(v_{it}^{(j)}, k_{it}^{(j)}\right) + \epsilon_{it}^{(j)} \end{align*} and compute the predicted values $\widehat{y}_{it}^{(j)} \equiv \widehat{\phi}_{it}^{(j)}$; using these same estimates, also compute $\widehat{\phi}_{it-1}^{(j)}$. De Loecker et al (2020) use a non-parametric regression mainly because the sample may be small for some industries. However, OLS should yield similar results as long as the relationship between the factors (features) and the outcome variable (sales) is approximately linear.
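As a minimal, self-contained sketch of this step (the array values and names below are illustrative; the full implementation in the code cells further down uses the actual Compustat variables), the kernel regression and its fitted values can be computed with statsmodels' KernelReg:

from statsmodels.nonparametric.kernel_regression import KernelReg
import numpy as np

# Illustrative data for one industry-year: y is log sales, the columns of X are
# log COGS and log capital, and Xlag holds the lagged regressors.
y    = np.array([4.2, 5.1, 3.8, 6.0, 4.9])
X    = np.array([[3.0, 2.1], [3.8, 2.9], [2.7, 1.8], [4.5, 3.6], [3.5, 2.5]])
Xlag = np.array([[2.9, 2.0], [3.7, 2.8], [2.6, 1.7], [4.4, 3.5], [3.4, 2.4]])

kr          = KernelReg(endog=y, exog=X, var_type='cc')  # two continuous regressors
phi_hat     = kr.fit(X)[0]       # fitted values, i.e. \hat{phi}_{it}
phi_hat_lag = kr.fit(Xlag)[0]    # same estimates evaluated at the lagged inputs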

2. Step 2: Make a Guess¶

Make a guess for $\Theta^{(j)}_n = \left[\alpha_{n}^{(j)}, \ \theta_{vn}^{(j)}, \ \theta_{kn}^{(j)}\right]$, where $n$ is the iteration index and $\alpha_{n}^{(j)}$ is the guessed total factor productivity. I use the estimated parameters from an OLS regression as my initial guess $\Theta^{(j)}_0$. Given $\Theta^{(j)}_n$, compute productivities \begin{align*} \omega_{it}^{(j)} = & \ \widehat{\phi}_{it}^{(j)} - \Theta^{(j)}_n\ \begin{bmatrix} 1 \\ v_{it}^{(j)}\\ k_{it}^{(j)}\end{bmatrix} \end{align*}

3. Step 3: Make an Assumption about Total Factor Productivity and Estimate the Relationship¶

Assuming $\omega_{it}^{(j)}$ follows an AR(1) process (this assumption can be changed), estimate \begin{align*} \omega_{it}^{(j)} = & \ \rho \omega_{it-1}^{(j)} + u_{it}^{(j)} \end{align*} by OLS and compute the residuals $\widehat{u}_{it}^{(j)}$, which is why two consecutive years of data are needed. Annual data rarely display more than two years of persistence, so a single lag usually captures the dynamics accurately.
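Putting Steps 2 and 3 together in a minimal sketch (again with illustrative numbers; in the full code the fitted values come from the kernel regression and the design matrices include a constant so that the guessed TFP level $\alpha_n^{(j)}$ enters the product):

import numpy as np
import statsmodels.api as sm

# Illustrative \hat{phi}_{it}, \hat{phi}_{it-1} and design matrices [1, v, k] at t and t-1
phi_hat     = np.array([4.1, 5.0, 3.9, 5.8])
phi_hat_lag = np.array([4.0, 4.9, 3.8, 5.7])
Xc          = np.array([[1, 3.0, 2.1], [1, 3.8, 2.9], [1, 2.8, 1.9], [1, 4.4, 3.5]])
Xc_lag      = np.array([[1, 2.9, 2.0], [1, 3.7, 2.8], [1, 2.7, 1.8], [1, 4.3, 3.4]])

theta_n   = np.array([0.2, 0.8, 0.1])      # guess [alpha_n, theta_v, theta_k]
omega     = phi_hat - Xc @ theta_n         # Step 2: productivity implied by the guess
omega_lag = phi_hat_lag - Xc_lag @ theta_n

ar1 = sm.OLS(omega, sm.add_constant(omega_lag)).fit()  # Step 3: AR(1) by OLS
u   = ar1.resid                                        # innovations \hat{u}_{it}
rho = ar1.params[1]                                    # estimated persistence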

4. Step 4: Calculate the Moment Condition¶

Finally, compute the moment condition \begin{align*} E\left\{ \begin{bmatrix} u_{it}^{(j)} \cdot 1\\ u_{it}^{(j)} v_{it-1}^{(j)}\\ u_{it}^{(j)} k_{it}^{(j)} \end{bmatrix}\right\} = & \ 0_{3\times1}, \end{align*} which is collapsed into the GMM objective function described in Hansen, L. P. (1982), "Large sample properties of generalized method of moments estimators," Econometrica, 50(4), 1029–1054: \begin{align*} G(\Theta) =& \ \left(Z^\prime U\right)^\prime\left(Z^\prime U\right), \end{align*} where $Z = \left[1 \quad v_{it-1}^{(j)} \quad k_{it}^{(j)}\right]_{n_t^{(j)}\times 3}$ is the stacked matrix of instruments, $U = \left[u_{it}^{(j)}\right]_{n_t^{(j)} \times 1}$ is the stacked vector of innovations, and $n_t^{(j)}$ is the number of firms in industry $j$ at year $t$.
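Given the stacked instrument matrix $Z$ and the vector of innovations $U$, the objective is a single quadratic form; a minimal sketch with illustrative arrays (the functions defined below construct $Z$ and $U$ from the actual data):

import numpy as np

Z = np.array([[1.0, 2.9, 2.1],
              [1.0, 3.7, 2.8],
              [1.0, 2.7, 1.8],
              [1.0, 4.3, 3.5]])               # columns: [1, lagged log COGS, log capital]
U = np.array([0.04, -0.02, -0.01, -0.01])     # AR(1) innovations \hat{u}_{it}

Mom = Z.T @ U        # sample moments Z'U (3 x 1)
G   = Mom @ Mom      # GMM objective (Z'U)'(Z'U), a scalar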

Extension¶

If considering the extension with overhead costs (SG&A), then $Z = \left[1 \quad v_{it-1}^{(j)} \quad x_{it-1}^{(j)} \quad k_{it}^{(j)}\right]_{n_t^{(j)}\times 4}$, and $U$ comes from the extended kernel regression and AR(1) estimations that result from including this regressor.

5. Step 5: Define the Convergence Criteria and Iterate¶

If the objective function has not been minimized up to a certain tolerance, or if the Euclidean norm of the moment condition is not close enough to zero, or if the parameter change across iterations exceeds some tolerance $\eta$ close to $0$, i.e. $||\Theta^{(j)}_{n}-\Theta^{(j)}_{n-1}|| \geq \eta$, update the guess of Step 2 as $\Theta^{(j)}_{n+1}=\Theta^{(j)}_{n}$ and repeat until convergence, as sketched below.
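A minimal sketch of this stopping rule, with a hypothetical update_theta helper standing in for Steps 2 through 4 (the actual implementation below instead passes the GMM objective directly to scipy.optimize.minimize, which handles its own convergence tolerance):

import numpy as np

def update_theta(theta_prev):
    # Hypothetical stand-in for Steps 2-4: in the real procedure this would recompute
    # omega and u given theta_prev and minimize the GMM objective. Here it just moves
    # the guess halfway toward an illustrative fixed point so the loop terminates.
    target = np.array([0.12, 0.85, 0.07])
    return theta_prev + 0.5 * (target - theta_prev)

eta   = 1e-8                              # tolerance on the parameter change
theta = np.array([0.30, 0.60, 0.20])      # initial guess Theta_0
for n in range(1000):
    theta_new = update_theta(theta)
    if np.linalg.norm(theta_new - theta) < eta:   # ||Theta_n - Theta_{n-1}|| < eta
        break
    theta = theta_new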

This procedure can then be implemented for all industries in the sample.

Step 3 requires knowledge of $\Theta^{(j)}$ in the previous period. Since I do not know the initial value, I assume it equals the current-period guess in the first year, and then carry forward the estimated theta from the previous year. In practice, this produces a poor estimate of $\Theta^{(j)}$ for the first periods, but it improves over time. Also, I start my estimations in 1969.

Implementation in Python¶

Before implementing the iterative process across all industries, it is worth defining the objective function for both the first estimation year and subsequent years. In Python, this can be done as follows:

In [3]:
def GMM_Delocker_first(theta, Estimset, X, Xlag,extend=False, optim=True):
    #The equations are:
    #omegait = phiit - Theta_n [1 cogsit kit]
    #uit = omegait - rho*omegait-1
    #The moment conditions are mean(uit*1)=0, mean(uit*lag cogsit)=0 and mean(uit*kit)=0
    #Parameters:
    #theta: PF elasticities
    #Estimset: estimation set (after removing NA)
    #X: Design matrix
    #Xlag: lag of X without gaps
    #extend: Default False and determines whether to include xsga as a fixed factor or not
    #optim: True by default, states whether to return only the objective function (for optimization) or all the results.
    if extend:
          Z        = np.array(Estimset[['laglcogs_D','lcapital_D','lxsga_D']])
    else:    
          Z        = np.array(Estimset[['laglcogs_D','lcapital_D']])
          
    Z        = smf.add_constant(Z)
    omega    = np.array(Estimset['phi']) - np.matmul(X,theta)   
    lagomega = np.array(Estimset['lagphi']) -  np.matmul(Xlag,theta) 
    Xom      = smf.add_constant(lagomega)
    AR1      = smf.OLS(omega,Xom).fit()
    U        = AR1.resid
    rho      = AR1.params
    #objfun =  [np.mean(U*Estimset['laglcogs_D']),np.mean(U*Estimset['lcapital_D'])]
    Mom      = np.matmul(np.transpose(Z),U)
    objfun   =  np.matmul(np.transpose(Mom),Mom)
    #Save results
    if optim == False:
          Results    = {'Z': Z, 'omega':omega   , 'objfun':objfun , 'rho': rho}
          return Results
    else:
      return objfun

def GMM_Delocker(theta, thetalag, Estimset, X, Xlag,extend=False ,optim=True):
    #The equations are:
    #omegait = phiit - Theta_n [1 cogsit kit]
    #uit = omegait - rho*omegait-1
    #The moment conditions are mean(uit*1)=0, mean(uit*lag cogsit)=0 and mean(uit*kit)=0
    #Parameters:
    #theta: PF elasticities
    #Estimset: estimation set (after removing NA)
    #X: Design matrix
    #Xlag: lag of X without gaps
    #extend: Default False and determines whether to include xsga as a fixed factor or not
    #optim: True by default, states whether to return only the objective function (for optimization) or all the results.
    if extend:
          Z        = np.array(Estimset[['laglcogs_D','lcapital_D','lxsga_D']])
    else:    
          Z        = np.array(Estimset[['laglcogs_D','lcapital_D']]) 
    
    Z        = smf.add_constant(Z)
    omega    = np.array(Estimset['phi']) - np.matmul(X,theta)   
    lagomega = np.array(Estimset['lagphi']) -  np.matmul(Xlag,thetalag) 
    Xom      = smf.add_constant(lagomega)
    AR1      = smf.OLS(omega,Xom).fit()
    U        = AR1.resid
    rho      = AR1.params
    #objfun =  [np.mean(U*Estimset['laglcogs_D']),np.mean(U*Estimset['lcapital_D'])]
    Mom      = np.matmul(np.transpose(Z),U)
    objfun   =  np.matmul(np.transpose(Mom),Mom)
    #Save results
    if optim == False:
        Results    = {'Z': Z, 'omega':omega   , 'objfun':objfun , 'rho': rho}
        return Results
    else:
     return objfun    

Now, let's apply this iterative process to all industries and years.

In [4]:
all_years = range(1969,2022)

for ind in industries:

 print("Industry:",ind) 
 Industrydat_all = Data.loc[Data['ind2d'] == ind ]
 Industrydat_all = Industrydat_all.loc[(Industrydat_all['year']>=np.min(all_years) )& (Industrydat_all['year']<=np.max(all_years) )]
 #Construct industrydat with only cogs and capital
 Industrydat = Industrydat_all[['gvkey',  'year',  'ind2d',  'lsale_D', 'laglsale_D',  'lcogs_D', 'laglcogs_D', 'lcapital_D', 'laglcapital_D' ]]
 #Construct industrydat with, cogs, xsga and capital
 Industrydat_ext = Industrydat_all[['gvkey',  'year',  'ind2d',  'lsale_D', 'laglsale_D',  'lcogs_D', 'laglcogs_D', 'lcapital_D', 'laglcapital_D'
                                    , 'lxsga_D', 'laglxsga_D' ]]
 #Also drop inf and - inf  and na
 Industrydat     = Industrydat.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
 Industrydat_ext = Industrydat_ext.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
 for year in all_years:
     #Select the estimation year (lagged variables are already attached as columns; estimation starts in 1969)
     Estimset     = Industrydat[Industrydat['year']==year ] 
     Estimset_ext = Industrydat_ext[Industrydat_ext['year']==year ] 
     
     print("            year:",year,  "Obs:",len(Estimset))    
   
     #Determine theta0 
     Initial_all =  Delocker_elast[Delocker_elast['ind2d'] == ind ] 
     Initial_all = Initial_all[(Initial_all['year']>=((year-10))) & (Initial_all['year']<year)][['theta_WI1_ct','theta_WI2_ct']]
     Initial     = np.mean(Initial_all['theta_WI1_ct'])
     Initial_ext = np.mean(Initial_all['theta_WI2_ct'])
     
      #Step 1: Run the OLS and kernel regressions for the year in question
      #For standard model
     X         = np.array(Estimset[['lcogs_D','lcapital_D']])
     Xlag      = np.array(Estimset[['laglcogs_D','laglcapital_D']])
     Y         = np.array(Estimset['lsale_D'])
     Ylag      = np.array(Estimset['laglsale_D'])
     Xc        = smf.add_constant(X)
     Xlagc     = smf.add_constant(Xlag)
     #For Extended Model
     X_ext     = np.array(Estimset_ext[['lcogs_D','lcapital_D','lxsga_D']])
     Xlag_ext  = np.array(Estimset_ext[['laglcogs_D','laglcapital_D','laglxsga_D']])
     Y_ext     = np.array(Estimset_ext['lsale_D'])
     Ylag_ext  = np.array(Estimset_ext['laglsale_D'])
     Xc_ext    = smf.add_constant(X_ext)
     Xlagc_ext = smf.add_constant(Xlag_ext)
     
     OLS           = smf.OLS(Y,Xc).fit()
     OLS_ext       = smf.OLS(Y_ext,Xc_ext).fit()
     Model2        = KernelReg(endog=Y, exog=X,var_type='cc' )
     Model2lag     = KernelReg(endog=Ylag, exog=Xlag,var_type='cc' )
     Model2_ext    = KernelReg(endog=Y_ext, exog=X_ext,var_type='ccc' )
     Model2lag_ext = KernelReg(endog=Ylag_ext, exog=Xlag_ext,var_type='ccc' )
     
     
     thetaols      = OLS.params 
     thetaols_ext  = OLS_ext.params   
     Estimset['phi']         =  Model2.fit(X)[0] 
     Estimset['lagphi']      =  Model2.fit(Xlag)[0] 
     skipxsga = False
     if (ind==49) & ( year == 2021):
      skipxsga = True
      
     if(skipxsga == False):
      Estimset_ext['phi']     =  Model2_ext.fit(X_ext)[0] 
      Estimset_ext['lagphi']  =  Model2_ext.fit(Xlag_ext)[0]  
     
     #Use the average of the ten preceding years as the initial guess (if OLS gives negative elasticity estimates)
     if (thetaols[1]<0):
        print("Negative OLS elasticity estimate, using 10 year average instead")
        theta0 = np.array([Initial,Initial,Initial])  
     else:
        theta0 = np.array(thetaols)    
     if (thetaols_ext[1]<0):
        print("Negative OLS elasticity estimate, using 10 year average instead")
        theta0_ext = np.array([Initial_ext,Initial_ext,Initial_ext,Initial_ext])  
     else:
        theta0_ext = np.array(thetaols_ext)       
       
       #Steps 2-5: Run the GMM estimation for the year in question
  
     if (year == 1969):
         #First do the baseline Cobb-Douglas
         o2 = minimize(fun=GMM_Delocker_first, x0=theta0,args=(Estimset,Xc,Xlagc)   ,tol=1e-10)
         theta = o2.x
         #Retrieve Results
         Res = GMM_Delocker_first(theta, Estimset,Xc,Xlagc,optim=False)
         thetalag = theta 
         #Now do the Xsga Extended Cobb-Douglas
         extend   = True
         if(skipxsga == False):
          o2_ext = minimize(fun=GMM_Delocker_first, x0=theta0_ext,args=(Estimset_ext,Xc_ext,Xlagc_ext,extend ),tol=1e-10)
          theta_ext = o2_ext.x
          #Retrieve Results
          Res_ext = GMM_Delocker_first(theta_ext, Estimset_ext,Xc_ext,Xlagc_ext,optim=False,extend=extend)
          thetalag_ext = theta_ext 
         
     else:
         #First do the baseline Cobb-Douglas
         o2 = minimize(fun=GMM_Delocker, x0=theta0,args=(thetalag, Estimset,Xc,Xlagc)   ,tol=1e-10)
         theta = o2.x
         #Retrieve Results
         Res = GMM_Delocker(theta, thetalag, Estimset,Xc,Xlagc,optim=False)   
         thetalag = theta   
         #Now do the Xsga Extended Cobb-Douglas
         extend   = True
         if(skipxsga == False):
          o2_ext = minimize(fun=GMM_Delocker, x0=theta0_ext,args=(thetalag_ext, Estimset_ext,Xc_ext,Xlagc_ext,extend) ,tol=1e-10)
          theta_ext = o2_ext.x
          #Retrieve Results
          Res_ext      = GMM_Delocker(theta_ext, thetalag_ext, Estimset_ext,Xc_ext,Xlagc_ext,extend=extend,optim=False)   
          thetalag_ext = theta_ext   
        
    #Now save and attach to the main dataset
     position = (Delocker_elast['ind2d'] == ind) & (Delocker_elast['year'] ==year)
     Delocker_elast.loc[position,"theta_WI1_ct_update"] = theta[1]
     if(skipxsga == False):
      Delocker_elast.loc[position,"theta_WI2_ct_update"] = theta_ext[1]
      Estimset_ext['omega_ext'] = Res_ext['omega']
      Productivities_ext = pd.concat([Productivities_ext, Estimset_ext[['gvkey','year','ind2d','omega_ext']]], sort=False)
     else:
      Delocker_elast.loc[position,"theta_WI2_ct_update"] = np.nan
    
     Estimset['omega'] = Res['omega'] 
     Productivities     = pd.concat([Productivities, Estimset[['gvkey','year','ind2d','omega']]], sort=False)
Industry: 11
            year: 1969 Obs: 8
            [...]
            year: 2021 Obs: 12
Industry: 21
            year: 1969 Obs: 102
            [...]
            year: 2021 Obs: 354
Industry: 22
            year: 1969 Obs: 224
            [...]
            year: 2021 Obs: 233
Industry: 23
            year: 1969 Obs: 34
            [...]
            year: 2021 Obs: 70
Industry: 31
            year: 1969 Obs: 165
            [...]
            year: 2021 Obs: 148
Industry: 32
            year: 1969 Obs: 296
           [...]
            year: 2021 Obs: 758
Industry: 33
            year: 1969 Obs: 581
            [...]
            year: 2021 Obs: 958
Industry: 42
            year: 1969 Obs: 91
           [...]
            year: 2021 Obs: 128
Industry: 44
            year: 1969 Obs: 79
           [...]
            year: 2021 Obs: 56
Industry: 45
            year: 1969 Obs: 41
            [...]
            year: 2021 Obs: 77
Industry: 48
            year: 1969 Obs: 68
            [...]
            year: 2021 Obs: 184
Industry: 49
            year: 1969 Obs: 6
           [...]
            year: 2021 Obs: 3
Industry: 51
            year: 1969 Obs: 90
            [...]
            year: 2021 Obs: 606
Industry: 52
            year: 1969 Obs: 48
            [...]
            year: 2021 Obs: 303
Industry: 53
            year: 1969 Obs: 30
            [...]
            year: 2021 Obs: 126
Industry: 54
            year: 1969 Obs: 34
            [...]
            year: 2021 Obs: 169
Industry: 56
            year: 1969 Obs: 26
            [...]
            year: 2021 Obs: 91
Industry: 61
            year: 1969 Obs: 3
            [...]
            year: 2021 Obs: 44
Industry: 62
            year: 1969 Obs: 10
            [...]
            year: 2021 Obs: 87
Industry: 71
            year: 1969 Obs: 9
            [...]
            year: 2021 Obs: 41
Industry: 72
            year: 1969 Obs: 34
            [...]
            year: 2021 Obs: 83
Industry: 81
            year: 1969 Obs: 11
            [...]
            year: 2021 Obs: 16
Industry: 99
            year: 1969 Obs: 18
            [...]
            year: 2021 Obs: 11

Analysis of Estimated Elasticities¶

After performing the estimation procedure for all industries and years, save the output. Let's analyze the results. First, let's calculate the industry-specific profit rates $\pi_{t}^{(j)} = \left(SALES_{t}^{(j)} - COGS_{t}^{(j)} - XSG\&A_{t}^{(j)}\right)/SALES_{t}^{(j)}$, average the estimated elasticities by industry between 1970 and 2021, and add a variable containing each of the NAICS industry names:

In [5]:
# Analysis of Estimated Elasticities

profit_rate = (
    Data 
    .groupby(["ind2d","year"])[["sale_D", "cogs_D","xsga_D"]]
    .sum()
    
    )

profit_rate['profit_rate'] = (profit_rate['sale_D'] -profit_rate['cogs_D'] - profit_rate['xsga_D'] 
                              )/ profit_rate['sale_D']


Delocker_elast_plots =  pd.merge(Delocker_elast,profit_rate, 
                                 how='left', 
                                 on=['year','ind2d']) 

Delocker_elast_plots = Delocker_elast_plots.loc[
    (Delocker_elast_plots["year"] >= 1970) & (Delocker_elast_plots["year"] <= 2021)].copy()
 
#add industry names
naics2_map = {
    11: "Agriculture",
    21: "Mining & Oil",
    22: "Utilities",
    23: "Construction",
    31: "Mfg: Food/Textile",
    32: "Mfg: Chemicals/Plastics",
    33: "Mfg: Metals/Machinery",
    42: "Wholesale Trade",
    44: "Retail: Stores",
    45: "Retail: Nonstore",
    48: "Transport",
    49: "Warehousing",
    51: "Information",
    52: "Finance",
    53: "Real Estate",
    54: "Prof. Services",
    55: "Management",
    56: "Admin & Support",
    61: "Education",
    62: "Healthcare",
    71: "Arts & Entertainment",
    72: "Accommodation & Food",
    81: "Other Services",
    92: "Public Admin",
    99: "Other"
}
 
Delocker_elast_plots["industry_name"] = Delocker_elast_plots["ind2d"].map(naics2_map)
     
avg = (
    Delocker_elast_plots
     .groupby("industry_name")[["theta_WI1_ct_update", "theta_WI2_ct_update"]]
    .mean()
    )

Scatterplot of Average Elasticities¶

In the figure below, I plot the average estimated COGS production elasticities under the two-input model (COGS, capital) on the x-axis and the three-input model (COGS, XSGA, capital) on the y-axis. If both models produced the same results, the elasticities would fall close to the 45-degree line, which I also include in the plot. As shown below, the plot indicates that XSGA is a confounding factor that biases the COGS elasticity upwards, as all the average values lie to the right of (below) the 45-degree line.

This plot also shows that an increase in the use of materials that raises the Cost of Goods Sold by 1% (i.e., the company allocates 1% more budget to COGS) is associated with an average increase in sales of between roughly 0.55% and 0.82% under the extended model with SG&A, and somewhat more under the two-input model.

In [6]:
pyplot.figure(figsize=(6,6))
pyplot.scatter(avg['theta_WI1_ct_update'], avg['theta_WI2_ct_update'])
pyplot.xlabel("Elasticity of COGS (θ_WI1)")
pyplot.ylabel("Elasticity of COGS in extended model with SG&A (θ_WI2)")
pyplot.title("Average Production Elasticities")
# get axis limits (cover both x and y so they’re equal)
lims = [
    min(avg['theta_WI1_ct_update'].min(), avg['theta_WI2_ct_update'].min()),
    max(avg['theta_WI1_ct_update'].max(), avg['theta_WI2_ct_update'].max())
]

# draw 45-degree line
pyplot.plot(lims, lims, 'k--', alpha=0.8)

# set equal limits for x and y
pyplot.xlim(lims)
pyplot.ylim(lims)
pyplot.gca().set_aspect('equal', adjustable='box')

pyplot.show()
print(avg)
[Figure: scatterplot of average COGS production elasticities, two-input model (x-axis) vs. extended SG&A model (y-axis), with the 45-degree line.]
                         theta_WI1_ct_update  theta_WI2_ct_update
industry_name                                                    
Accommodation & Food                0.841001             0.730175
Admin & Support                     0.926873             0.701194
Agriculture                         0.856625             0.827836
Arts & Entertainment                0.983819             0.715711
Construction                        0.907885             0.767549
Education                           0.818034             0.619348
Finance                             0.857796             0.713387
Healthcare                          0.906066             0.755744
Information                         0.779578             0.541651
Mfg: Chemicals/Plastics             0.808359             0.747540
Mfg: Food/Textile                   0.924312             0.789714
Mfg: Metals/Machinery               0.852678             0.707716
Mining & Oil                        0.731751             0.635227
Other                               0.855267             0.748480
Other Services                      0.838727             0.734249
Prof. Services                      0.785833             0.708301
Real Estate                         0.832144             0.635094
Retail: Nonstore                    0.899675             0.732769
Retail: Stores                      0.813386             0.716557
Transport                           0.835681             0.733017
Utilities                           0.773288             0.696491
Warehousing                         0.875534             0.766604
Wholesale Trade                     0.888247             0.805094

Heatmap by Industry¶

 

Heatmaps are data representations in the form of a map or diagram in which data values are encoded as colors. In this context, the heatmaps show how each industry's elasticity changes across years. I plot two heatmaps, one for the two-input model and one for the three-input model. They suggest the following takeaways:

  1. SG&A is a confounding factor creating upward bias in the COGS elasticity: The heatmaps confirm the pattern shown in the scatterplot above. When overhead costs (SG&A) are included, the individual COGS elasticity is lower.
  2. Some industries rely more on COGS than others: Information appears to be the industry with the lowest COGS intensity, while Wholesale Trade appears to be the most COGS-intensive industry, as shown by the heatmap of the extended model. This makes sense, as firms in the information sector rely on the contributions of their engineers and computing centers, whereas wholesalers focus on selling and delivering goods to their customers.
  3. Industry reliance on COGS is relatively stable: The heatmaps do not show much within-industry variation across time, which suggests that industries tend to maintain a similar reliance on COGS over time. There are some exceptions, like Information, Admin & Support, and Real Estate, which have all experienced a fall in their reliance on COGS, particularly since the early 2000s.
In [7]:
#Heatmap by industry

pivot_elast_1 = (
    Delocker_elast_plots
     .pivot(index="year", columns="industry_name", values="theta_WI1_ct_update")
)


pivot_elast_2 = (
    Delocker_elast_plots
     .pivot(index="year", columns="industry_name", values="theta_WI2_ct_update")
)

#Make Year index integer
pivot_elast_1.index = pivot_elast_1.index.astype(int)
pivot_elast_2.index = pivot_elast_2.index.astype(int)

norm = pyplot.Normalize(vmin=0, vmax=1)

fig, axes = pyplot.subplots(1, 2, figsize=(20, 8), sharey=True)

sns.heatmap(
    pivot_elast_1,
    cmap="YlGnBu",
    norm=norm,
    ax=axes[0],
    cbar_kws={"label": "Elasticity"}
)
axes[0].set_title("Elasticity of COGS (θ_WI1)", fontsize=14)
axes[0].set_xlabel("Industry")
axes[0].set_ylabel("Year")
axes[0].set_xticklabels(pivot_elast_1.columns, rotation=45, ha="right")  # rotate industry labels

sns.heatmap(
    pivot_elast_2,
    cmap="YlGnBu",
    norm=norm,
    ax=axes[1],
    cbar_kws={"label": "Elasticity"}
)
axes[1].set_title("Elasticity of COGS in extended model with SG&A (θ_WI2)", fontsize=14)
axes[1].set_xlabel("Industry")
axes[1].set_ylabel("")  # avoid repeating "Year"
axes[1].set_xticklabels(pivot_elast_2.columns, rotation=45, ha="right")

pyplot.tight_layout()
pyplot.show()
 
[Figure: heatmaps of COGS elasticities by industry and year, baseline model (left) and extended SG&A model (right).]

Relationship Between Profit Rates and Elasticities¶

In this final step, I study the relationship between profit rates and the model-implied elasticities (I use the three-input model for this exercise) to determine whether industries with a higher reliance on COGS are more profitable or not. I plot a scatterplot of the elasticities $\theta_{t,extended \ model}^{(j)}$ (x-axis) against profit rates $\pi_t^{(j)}$ (y-axis). Then, I formally test this relationship by running the following panel regression $$\pi_t^{(j)} = \alpha_j + \theta_{t,extended \ model}^{(j)} \beta + \epsilon_{t}^{(j)} $$

Here $\alpha_j$ is an industry fixed effect and $\beta$ measures the following relationship:

  1. $\beta<0$ implies that industries that are more COGS intensive are less profitable.
  2. $\beta=0$ implies no relationship between an industry's COGS intensiveness and its profitability.
  3. $\beta>0$ implies that industries that are more COGS intensive are more profitable.

The scatterplot does not show a clear pattern. However, it suggests that the relationship between the COGS production elasticity and profit rates may differ across industries. The regressions confirm this: the pooled specification shows that, on average, the association is essentially zero (the coefficient of 0.0003 is not statistically different from zero at the 5% level). However, when I run a version of the panel regression above that allows a separate slope coefficient for each industry, a more striking pattern emerges: the response is heterogeneous across industries. For example, a 1 percentage point increase in the COGS production elasticity for Retail: Stores is associated with roughly a 0.13 percentage point increase in the profit rate. Conversely, a 1 percentage point increase in the COGS production elasticity in Mfg: Food/Textile translates into roughly a 0.40 percentage point decrease in the profit rate.

These results suggest that, depending on the industry, it may or may not pay off for a company to invest in a more efficient use of the inputs going into the Cost of Goods Sold. For example, a positive association between the COGS elasticity and the profit rate could be linked to a better use of the goods sold and to waste minimization. On the other hand, the negative association in Food & Textile may suggest that companies in that industry gain more from optimizing other inputs, such as capital (Food & Textile has high capital intensity requirements).
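In the code below, the industry fixed effects $\alpha_j$ are removed by demeaning both variables within each industry before running OLS. Writing out the within transformation (bars denote industry means over time) makes clear that the demeaned regression recovers the same $\beta$ as the fixed-effects specification: $$ \pi_t^{(j)} - \bar{\pi}^{(j)} = \left(\theta_{t,extended \ model}^{(j)} - \bar{\theta}^{(j)}\right)\beta + \left(\epsilon_{t}^{(j)} - \bar{\epsilon}^{(j)}\right). $$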

In [8]:
pyplot.figure(figsize=(8, 6))
sns.scatterplot(
    data=Delocker_elast_plots,
    x="theta_WI2_ct_update",
    y="profit_rate",
    hue="industry_name",
    legend="full"  # show all industries
)
pyplot.title("Profitability vs. COGS Elasticity in Extended SG&A Model")
pyplot.xlabel("COGS Production Elasticity in Extended  SG&A model (θ₂)")
pyplot.ylabel("Profit Rate")
pyplot.xlim(left=0, right=2)
pyplot.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
pyplot.show()

#Save Regression Data and demean to remove fixed effects
reg_data = Delocker_elast_plots[["profit_rate", "theta_WI1_ct_update", "theta_WI2_ct_update", "industry_name"]].dropna().copy()
reg_data["profit_rate"] = reg_data["profit_rate"] * 100
reg_data["theta_WI1_ct_update"] = reg_data["theta_WI1_ct_update"] * 100
reg_data["theta_WI2_ct_update"] = reg_data["theta_WI2_ct_update"] * 100

reg_data_demean = reg_data.copy()
reg_data_demean["profit_rate_demeaned"] = reg_data_demean.groupby("industry_name")["profit_rate"].transform(lambda x: x - x.mean())
reg_data_demean["theta_WI2_demeaned"] = reg_data_demean.groupby("industry_name")["theta_WI2_ct_update"].transform(lambda x: x - x.mean())

reg = smfform.ols("profit_rate_demeaned ~ theta_WI2_demeaned", data=reg_data_demean).fit(cov_type="cluster",
                                                                                 cov_kwds={"groups": reg_data_demean["industry_name"]})
 
full_reg = smfform.ols("profit_rate_demeaned ~ theta_WI2_demeaned:C(industry_name)", data=reg_data_demean).fit(cov_type="cluster",
                                                                                 cov_kwds={"groups": reg_data_demean["industry_name"]})
print(reg.summary())
print(full_reg.summary())
 
[Figure: scatterplot of profit rate vs. COGS production elasticity in the extended SG&A model, colored by industry.]
                             OLS Regression Results                             
================================================================================
Dep. Variable:     profit_rate_demeaned   R-squared:                       0.000
Model:                              OLS   Adj. R-squared:                 -0.001
Method:                   Least Squares   F-statistic:                  0.001380
Date:                  Mon, 22 Sep 2025   Prob (F-statistic):              0.971
Time:                          19:48:09   Log-Likelihood:                -3220.9
No. Observations:                  1195   AIC:                             6446.
Df Residuals:                      1193   BIC:                             6456.
Df Model:                             1                                         
Covariance Type:                cluster                                         
======================================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept           4.094e-16   3.28e-16      1.250      0.211   -2.33e-16    1.05e-15
theta_WI2_demeaned     0.0003      0.008      0.037      0.970      -0.016       0.017
==============================================================================
Omnibus:                      123.063   Durbin-Watson:                   0.511
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              816.253
Skew:                          -0.180   Prob(JB):                    5.66e-178
Kurtosis:                       7.033   Cond. No.                         14.7
==============================================================================

Notes:
[1] Standard Errors are robust to cluster correlation (cluster)
                             OLS Regression Results                             
================================================================================
Dep. Variable:     profit_rate_demeaned   R-squared:                       0.081
Model:                              OLS   Adj. R-squared:                  0.063
Method:                   Least Squares   F-statistic:                 4.135e+30
Date:                  Mon, 22 Sep 2025   Prob (F-statistic):               0.00
Time:                          19:48:09   Log-Likelihood:                -3170.4
No. Observations:                  1195   AIC:                             6389.
Df Residuals:                      1171   BIC:                             6511.
Df Model:                            23                                         
Covariance Type:                cluster                                         
================================================================================================================================
                                                                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------------------------------
Intercept                                                     4.094e-16   6.35e-16      0.645      0.519   -8.34e-16    1.65e-15
theta_WI2_demeaned:C(industry_name)[Accommodation & Food]       -0.1858    3.8e-16  -4.89e+14      0.000      -0.186      -0.186
theta_WI2_demeaned:C(industry_name)[Admin & Support]            -0.1726   6.83e-17  -2.53e+15      0.000      -0.173      -0.173
theta_WI2_demeaned:C(industry_name)[Agriculture]                 0.0734   2.33e-16   3.16e+14      0.000       0.073       0.073
theta_WI2_demeaned:C(industry_name)[Arts & Entertainment]        0.0069   5.26e-17    1.3e+14      0.000       0.007       0.007
theta_WI2_demeaned:C(industry_name)[Construction]               -0.0634   1.37e-17  -4.64e+15      0.000      -0.063      -0.063
theta_WI2_demeaned:C(industry_name)[Education]                   0.0610   2.65e-16    2.3e+14      0.000       0.061       0.061
theta_WI2_demeaned:C(industry_name)[Finance]                    -0.2734   4.79e-17   -5.7e+15      0.000      -0.273      -0.273
theta_WI2_demeaned:C(industry_name)[Healthcare]                  0.2637   1.33e-16   1.98e+15      0.000       0.264       0.264
theta_WI2_demeaned:C(industry_name)[Information]                 0.0014   1.63e-16    8.5e+12      0.000       0.001       0.001
theta_WI2_demeaned:C(industry_name)[Mfg: Chemicals/Plastics]    -0.1312   1.21e-16  -1.08e+15      0.000      -0.131      -0.131
theta_WI2_demeaned:C(industry_name)[Mfg: Food/Textile]          -0.4016   2.18e-16  -1.84e+15      0.000      -0.402      -0.402
theta_WI2_demeaned:C(industry_name)[Mfg: Metals/Machinery]       0.1744    7.3e-16   2.39e+14      0.000       0.174       0.174
theta_WI2_demeaned:C(industry_name)[Mining & Oil]                0.4498   1.37e-16   3.29e+15      0.000       0.450       0.450
theta_WI2_demeaned:C(industry_name)[Other]                       0.0064    1.7e-17   3.79e+14      0.000       0.006       0.006
theta_WI2_demeaned:C(industry_name)[Other Services]             -0.0043   2.95e-17  -1.44e+14      0.000      -0.004      -0.004
theta_WI2_demeaned:C(industry_name)[Prof. Services]             -0.0315   5.32e-17  -5.92e+14      0.000      -0.031      -0.031
theta_WI2_demeaned:C(industry_name)[Real Estate]                -0.2031    1.7e-16  -1.19e+15      0.000      -0.203      -0.203
theta_WI2_demeaned:C(industry_name)[Retail: Nonstore]           -0.0322   1.31e-17  -2.46e+15      0.000      -0.032      -0.032
theta_WI2_demeaned:C(industry_name)[Retail: Stores]              0.1287   5.38e-17   2.39e+15      0.000       0.129       0.129
theta_WI2_demeaned:C(industry_name)[Transport]                  -0.0735   7.82e-17   -9.4e+14      0.000      -0.074      -0.074
theta_WI2_demeaned:C(industry_name)[Utilities]                   0.0230   4.78e-17   4.81e+14      0.000       0.023       0.023
theta_WI2_demeaned:C(industry_name)[Warehousing]                 0.0033   6.63e-17   5.02e+13      0.000       0.003       0.003
theta_WI2_demeaned:C(industry_name)[Wholesale Trade]            -0.0429   1.85e-16  -2.32e+14      0.000      -0.043      -0.043
==============================================================================
Omnibus:                      160.527   Durbin-Watson:                   0.599
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1460.818
Skew:                          -0.264   Prob(JB):                         0.00
Kurtosis:                       8.391   Cond. No.                         24.6
==============================================================================

Notes:
[1] Standard Errors are robust to cluster correlation (cluster)

Final Steps¶

Finally, I standardize the log-productivity estimates $\widehat{\omega}_{it}^{(j)}$ by their firm average, so all values are interpreted as percentage deviations from the firm's steady state \begin{align*} \tilde{\omega}_{it}^{(j)} = & \ \widehat{\omega}_{it}^{(j)} - \widehat{\overline{\omega}}_{i}^{(j)} \end{align*} where $\widehat{\overline{\omega}}_{i}^{(j)} = \frac{1}{T}\sum_{t=1}^T \widehat{\omega}_{it}^{(j)}$ and $T$ is the number of observed periods for firm $i$. Then I save the output.

In [14]:
Data = pd.merge(Data,Productivities, how='left', on=['gvkey','year','ind2d']) 
Data = pd.merge(Data,Productivities_ext, how='left', on=['gvkey','year','ind2d']) 
 
#Normalize firm productivity to the historical average  
Data['omega_bar']= Data['omega'].groupby(Data['gvkey']).transform('mean')
Data['omega_tilde']= Data['omega'] -Data['omega_bar'] 
Data['omega_bar_ext']= Data['omega_ext'].groupby(Data['gvkey']).transform('mean')
Data['omega_tilde_ext']= Data['omega_ext'] -Data['omega_bar_ext'] 
#Finally save the output into .dta files
pyreadstat.write_dta(Delocker_elast,dst_path ='./output/theta_W_s_window_update.dta')
pyreadstat.write_dta(df=Data,dst_path ='./output/data_main_upd_trim_1_prod.dta')