DharaNivesh: Data Science for Financial Decisions: How to Generate Continuous Futures Contracts from NSE Data with nselib

Stock trading has become a common side hustle, thanks largely to the rapid pace of digitalization, which has made it easy to step into the market. As of March 2024, there are close to 15.1 crore Demat accounts, which is roughly ten percent of the total population. This percentage nearly doubles if we exclude those under 18 and over 60 years of age.

Nearly 80 percent of global trading volume can be attributed to Futures and Options (F&O) volumes on NSE and BSE, a trend that is not entirely mysterious. The technology revolution, fueled by data as the new oil, has introduced High-Frequency Trading (HFT) algorithms, machine learning, and artificial intelligence, ushering in an era of robotic traders in the market.

Challenges in Data Acquisition

While we now understand the use of data for informed decision-making, obtaining the right set of data in the right format is not always straightforward. The integration of advanced technologies and the sheer volume of available data can pose challenges for even the most experienced traders and analysts.

Continuing with our discussion on the integration of advanced technologies in stock trading, today we will look at how to get price and volume data for NSE-listed stocks using Python. We'll explore the available libraries, their documentation, and, most importantly, how to prepare this data for model building.

To begin with, several libraries can help us fetch and manipulate stock data in Python. Among the most popular are yfinance, nselib, and nsetools. These libraries are documented and user-friendly, making them suitable for both beginners and seasoned traders.

Link to Library Documentation

· nselib Documentation

· yfinance Documentation

· nsetools Documentation

Installing Required Library

We'll start by installing and importing these libraries. The nselib library, for example, can be installed using pip.

pip install nselib

Importing Libraries and Fetching Data

Once installed, you can easily download the price and volume data for any NSE stock. Here’s a simple example to get started with futures data for SBI (symbol `SBIN`).

# Importing necessary libraries
from nselib import capital_market
from nselib import derivatives
import pandas as pd

# Download SBIN stock futures data for the chosen date range
fut_data = derivatives.future_price_volume_data(symbol='SBIN', instrument='FUTSTK', from_date='01-01-2015', to_date='31-05-2024')

Example Output: DataFrame Structure

The output will provide you with a DataFrame containing columns like ‘Open’, ‘High’, ‘Low’, ‘Close’, ‘AveragePrice’ etc.

Embracing Challenge: Creating Continuous Contracts

So far, so good. But suppose I want to build a model using futures data. The problem is that our DataFrame will have at least three contracts running on any given day. For instance, if you filter the date to 01-05-2024, you will see three rows corresponding to the May, June, and July contracts. This is a common problem encountered by any analyst attempting to build a model using futures data.
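To make the problem concrete, here is a small sketch with synthetic rows. The column names `TIMESTAMP` and `EXPIRY_DT` follow the ones used later in this post; the `CLOSE` column name, the prices, and the June and July expiry dates are assumptions made for illustration and are not actual nselib output.

```python
import pandas as pd

# Synthetic rows: prices are invented; only the shape of the problem matters.
toy = pd.DataFrame({
    'TIMESTAMP': ['01-May-2024'] * 3 + ['02-May-2024'] * 3,
    'EXPIRY_DT': ['30-May-2024', '27-Jun-2024', '25-Jul-2024'] * 2,
    'CLOSE': [830.0, 835.5, 840.0, 832.0, 837.0, 841.5],
})

# Filtering to a single trading day still leaves one row per active contract.
same_day = toy[toy['TIMESTAMP'] == '01-May-2024']
print(same_day)
print(len(same_day), "rows for a single trading day")
```

A model that expects one price per day cannot use this frame directly, which is why the near-month rows have to be selected first.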

This issue is also discussed in the book *Technical Analysis of the Financial Markets* by John J. Murphy, who explains how analysts create continuation charts for futures. The technique most commonly employed is to link a number of contracts together to provide continuity. When one contract expires, another one is used. However, this is easier said than done, especially when dealing with a large dataset spanning, say, the last ten years. Manually selecting contracts becomes very difficult, especially when shifting from the current contract to the next.

For example, the May futures contract expired on the 30th. Until the 30th of May, the contract to use is therefore the May-expiry contract, but the next trading day has no May contract, so the 31st of May has to be tagged to the June contract. Essentially, two key tasks need to be accomplished:

· Creating a continuous price data series using the nearest-expiry (near-month) contract.

· Removing all other data that pertains to far-month contracts.

Approach to Creating Continuous Contracts

To create a continuous contract DataFrame using the approach described, follow these steps. We'll extract the necessary information from the `TIMESTAMP` and `EXPIRY_DT` columns, perform the required calculations, and filter the data to obtain the continuous futures contracts.

Step-by-Step Approach

Extracting Months and Years: We extract the month and year from the `TIMESTAMP` and `EXPIRY_DT` columns and create new columns `TIMESTAMP_Year`, `TIMESTAMP_Month`, `EXPIRY_Year`, and `EXPIRY_Month`.

# Extracting Month & Year from TIMESTAMP column
fut_data['TIMESTAMP'] = pd.to_datetime(fut_data['TIMESTAMP'], format='%d-%b-%Y')
fut_data['TIMESTAMP_Year'] = fut_data['TIMESTAMP'].dt.year
fut_data['TIMESTAMP_Month'] = fut_data['TIMESTAMP'].dt.month
# Keep only the calendar date once the year and month have been extracted
fut_data['TIMESTAMP'] = fut_data['TIMESTAMP'].dt.date

# Extracting Month & Year from EXPIRY_DT column
fut_data['EXPIRY_DT'] = pd.to_datetime(fut_data['EXPIRY_DT'], format='%d-%b-%Y')
fut_data['EXPIRY_Year'] = fut_data['EXPIRY_DT'].dt.year
fut_data['EXPIRY_Month'] = fut_data['EXPIRY_DT'].dt.month
# Keep only the calendar date once the year and month have been extracted
fut_data['EXPIRY_DT'] = fut_data['EXPIRY_DT'].dt.date
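As a quick check of the parsing step, the sketch below applies the same `'%d-%b-%Y'` format to two sample strings. It assumes the dates arrive as text like `30-May-2024`; if your nselib version returns dates in a different layout, the `format` argument would need to change accordingly.

```python
import pandas as pd

# Sample date strings in the day / abbreviated month / year layout assumed above.
dates = pd.Series(['30-May-2024', '27-Jun-2024'])
parsed = pd.to_datetime(dates, format='%d-%b-%Y')

# The .dt accessor exposes the calendar parts as integer columns.
parts = pd.DataFrame({
    'Year': parsed.dt.year,
    'Month': parsed.dt.month,
})
print(parts)
```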

Sorting Data: The DataFrame is sorted by `TIMESTAMP` to ensure chronological order.

# Sorting dates on the basis of TIMESTAMP column
fut_data = fut_data.sort_values('TIMESTAMP').reset_index(drop=True)

Creating Cumulative Month Columns: We create `TIMESTAMP_Month_Cum` and `EXPIRY_Month_Cum` columns by converting the year and month into a cumulative month count (years multiplied by 12 plus months).

# Creating year-index columns for TIMESTAMP and EXPIRY_DT, i.e. since the data starts from 2015, 2015 will be 0, 2016 will be 1, 2017 will be 2 and so on
fut_data['TIMESTAMP_Year_Index'] = fut_data['TIMESTAMP_Year'] - 2015
fut_data['EXPIRY_Year_Index'] = fut_data['EXPIRY_Year'] - 2015
  
# Creating cumulative columns
fut_data['TIMESTAMP_Month_Cum'] = ((12 * fut_data['TIMESTAMP_Year_Index']) + fut_data['TIMESTAMP_Month'])
fut_data['EXPIRY_Month_Cum'] = ((12 * fut_data['EXPIRY_Year_Index']) + fut_data['EXPIRY_Month'])
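The arithmetic behind the cumulative month count can be checked by hand. The helper `cumulative_month` below is a hypothetical name used only for this sketch; it mirrors the formula above and shows that the count keeps increasing across a year boundary, and that the choice of base year cancels out when two counts are subtracted.

```python
def cumulative_month(year, month, base_year=2015):
    """Months counted from the start of base_year (January of base_year is 1)."""
    return 12 * (year - base_year) + month

# A December trade date and a January expiry are one month apart,
# even though the raw month numbers (12 and 1) differ by 11.
trade = cumulative_month(2015, 12)
expiry = cumulative_month(2016, 1)
print(trade, expiry, expiry - trade)

# The base year drops out of the difference, so any base year gives the same gap.
alt = cumulative_month(2016, 1, base_year=2000) - cumulative_month(2015, 12, base_year=2000)
print(alt)
```

Because only the difference between the two counts is used later, the specific base year of 2015 matters only for readability of the intermediate columns.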

Calculating Month Difference: The `Month_diff` column is calculated as the absolute difference between `EXPIRY_Month_Cum` and `TIMESTAMP_Month_Cum`.

Generating Month_diff_list: `Month_diff_list` is created by grouping the DataFrame by `TIMESTAMP` and collecting the `Month_diff` values into a list for each date.

# Month gap between expiry and trade date for every row
fut_data['Month_diff'] = abs(fut_data['EXPIRY_Month_Cum'] - fut_data['TIMESTAMP_Month_Cum'])

# For each unique TIMESTAMP, compile the list of Month_diff values seen on that date
unique_groups = fut_data['TIMESTAMP'].unique()
grouped_data = {group: fut_data[fut_data['TIMESTAMP'] == group]['Month_diff'].tolist() for group in unique_groups}

# Attach each date's list to every row of that date
fut_data['Month_diff_list'] = [grouped_data[ts] for ts in fut_data['TIMESTAMP']]
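To see what `Month_diff_list` holds, here is a toy frame with two trading days and invented `Month_diff` values. The sketch uses `groupby` with `agg(list)` followed by `map`, which is one compact way to attach each date's list to every row of that date; it is offered as an illustration, not as the only approach.

```python
import pandas as pd

# Invented values: two trading days, three active contracts each.
toy = pd.DataFrame({
    'TIMESTAMP': ['2024-05-30'] * 3 + ['2024-05-31'] * 3,
    'Month_diff': [0, 1, 2, 1, 2, 3],
})

# Collect every Month_diff seen on a date, then attach that list to each row of the date.
lists_by_date = toy.groupby('TIMESTAMP')['Month_diff'].agg(list)
toy['Month_diff_list'] = toy['TIMESTAMP'].map(lists_by_date)
print(toy)
```

On the second date the smallest value in the list is 1 rather than 0, which is the situation right after an expiry: the nearest contract now expires in the following month.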

Creating Near_Month Column: The `Near_Month` column is calculated by subtracting the minimum value of `Month_diff_list` from each value in `Month_diff`, so the nearest-expiry contract on each date gets 0.

# Near_Month is 0 for the contract whose expiry month is closest to the trade date.
# Columns are referenced by name, so the result does not depend on column positions.
fut_data['Near_Month'] = fut_data['Month_diff'] - fut_data['Month_diff_list'].apply(min)

Filtering for Continuous Contracts: The DataFrame is filtered to keep only the rows where `Near_Month` is equal to 0, providing the continuous futures contracts.

fut_data = fut_data.loc[fut_data['Near_Month'] == 0]

This approach ensures that the futures data is continuous and suitable for building models, addressing the common challenge of dealing with multiple active contracts for different expiration dates.
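The whole procedure can also be condensed into a few vectorized steps. The sketch below is an alternative under stated assumptions, not the code used for this post: `to_continuous` is a hypothetical helper, the input is synthetic, and the `CLOSE` column, its prices, and the June to August expiry dates are invented for illustration. It skips the intermediate list column and takes each date's smallest month difference with `groupby(...).transform('min')`. It also omits the `abs()` call, on the assumption that an expiry never precedes its trade date.

```python
import pandas as pd

def to_continuous(df):
    """Keep, for each trading date, only the contract with the nearest expiry month."""
    out = df.copy()
    out['TIMESTAMP'] = pd.to_datetime(out['TIMESTAMP'], format='%d-%b-%Y')
    out['EXPIRY_DT'] = pd.to_datetime(out['EXPIRY_DT'], format='%d-%b-%Y')
    # Months between trade date and expiry; no base year is needed for a difference.
    out['Month_diff'] = (
        (12 * out['EXPIRY_DT'].dt.year + out['EXPIRY_DT'].dt.month)
        - (12 * out['TIMESTAMP'].dt.year + out['TIMESTAMP'].dt.month)
    )
    # Smallest month difference among the contracts listed on each date.
    nearest = out.groupby('TIMESTAMP')['Month_diff'].transform('min')
    kept = out[out['Month_diff'] == nearest]
    return kept.sort_values('TIMESTAMP').reset_index(drop=True)

# Synthetic input around a rollover: the May contract expires on 30 May.
raw = pd.DataFrame({
    'TIMESTAMP': ['30-May-2024'] * 3 + ['31-May-2024'] * 3,
    'EXPIRY_DT': ['30-May-2024', '27-Jun-2024', '25-Jul-2024',
                  '27-Jun-2024', '25-Jul-2024', '29-Aug-2024'],
    'CLOSE': [830.0, 835.0, 840.0, 836.0, 841.0, 846.0],
})

continuous = to_continuous(raw)
print(continuous[['TIMESTAMP', 'EXPIRY_DT', 'CLOSE']])
```

On this synthetic input, 30 May keeps the May contract and 31 May rolls to the June contract, matching the rollover described earlier. Checking `continuous['TIMESTAMP'].is_unique` is a simple sanity test that exactly one contract survives per trading day.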

Visualization of SBIN Continuous Chart

Below is the visualization of SBIN's continuous futures contract chart from 2015 to 2024:

Modeling Predictive Analytics

By employing the logic and techniques discussed, we have successfully created a continuous futures contract DataFrame, which forms a robust foundation for building predictive models.

In the next phase of this project, I will be using machine learning algorithms such as Random Forest and Artificial Neural Networks (ANN) to predict the next day's price. These models can help in formulating strategies like "buy today, sell tomorrow" or "sell today, buy tomorrow," potentially offering significant trading advantages.

Stay tuned for detailed insights into the modeling process, including data preprocessing steps, feature engineering, model training, evaluation, and the implementation of these trading strategies. We will also explore the performance metrics of these models and their real-world applicability.

Thank you for following along. Make sure to subscribe and keep following the blog for more in-depth analysis and updates on the latest in stock trading and machine learning. Your feedback and questions are always welcome, as they help in refining and enriching the content. Together, let's dive deeper into the fascinating world of stock trading with data science.

Sunday, June 23, 2024
