Sunday, August 4, 2024

Predicting Stock Prices: The Surprising Accuracy and Hidden Power of Linear Regression

Introduction to Linear Regression

Linear regression is one of the most fundamental and widely used statistical techniques in data analysis and machine learning. At its core, linear regression aims to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.

Mathematically, a simple linear regression model can be represented as:

y = β0 + β1x + ε

Where:

·       y is the dependent variable.

·       x is the independent variable.

·       β0 is the y-intercept.

·       β1 is the slope of the line.

·       ε is the error term.

In a multiple linear regression scenario, the equation expands to:

y = β0 + β1x1 + β2x2 + ... + βnxn + ε

Where x1, x2, ..., xn are multiple independent variables.

The goal of linear regression is to determine the values of the coefficients (β0 and β1 in the simple case, or β0, β1, ..., βn in multiple regression) that minimize the sum of squared errors between the predicted values and the actual values.
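For the simple one-variable case, minimizing the sum of squared errors has a standard closed-form solution (the ordinary least squares estimates):

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x},
\qquad
\text{and in the multiple case } \hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}y
```

where x̄ and ȳ are the sample means of the independent and dependent variables.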

 

Applications of Linear Regression

Linear regression is a versatile tool used in various fields to predict outcomes and analyze trends. Some of the key areas where linear regression is applied include:

1. Economics: Forecasting economic indicators such as GDP, unemployment rates, and inflation.

2. Finance: Modeling relationships between financial metrics, such as risk and return.

3. Healthcare: Predicting patient outcomes based on medical histories and other factors.

4. Marketing: Estimating the impact of advertising spend on sales.

5. Real Estate: Valuing properties based on features like location, size, and age.

6. Environmental Science: Assessing the impact of environmental variables on climate change.

 

Linear Regression in Stock Market Analysis

In the realm of stock market analysis, linear regression is a powerful tool for predicting stock prices and understanding market trends. Analysts use historical price data and various financial indicators to build regression models that can forecast future stock prices.

 

How Linear Regression is Used in the Stock Market

1. Trend Analysis: By examining the relationship between time and stock prices, analysts can identify long-term trends and potential turning points.

2. Price Prediction: Using historical data, analysts can predict future stock prices by modeling the relationship between a stock's past performance and various market factors.

3. Risk Management: Linear regression helps in assessing the volatility of stock returns, aiding in the development of risk management strategies.

4. Portfolio Optimization: By analyzing the relationships between different stocks, investors can optimize their portfolios for better returns.


Example: Using Python to Predict Stock Prices with Linear Regression

Let's dive into a practical example where we pull data from the National Stock Exchange (NSE) of India and use linear regression to predict stock prices.
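As a minimal sketch of such a pipeline (assuming the yfinance package and the NSE ticker "SBIN.NS" purely for illustration; any of the NSE data libraries discussed later on this blog would work just as well), we can regress the closing price on a simple day index to estimate the trend:

```python
# A minimal, illustrative pipeline: fetch daily prices for an NSE-listed stock
# and fit a linear trend (price as a function of a simple day index).
# Requires: pip install yfinance scikit-learn pandas numpy
import numpy as np
import yfinance as yf
from sklearn.linear_model import LinearRegression

# Roughly two years of daily data; 'SBIN.NS' is just an example symbol.
data = yf.Ticker("SBIN.NS").history(period="2y").dropna()

# Feature: an integer trading-day index. Target: the closing price.
data["DayIndex"] = np.arange(len(data))
X = data[["DayIndex"]].values
y = data["Close"].values

# Train on the earlier 80% of days, evaluate on the most recent 20%.
split = int(len(data) * 0.8)
model = LinearRegression().fit(X[:split], y[:split])

print("Slope (average price change per trading day):", model.coef_[0])
print("Intercept:", model.intercept_)
print("R^2 on the held-out recent period:", model.score(X[split:], y[split:]))

# Naive next-day estimate from the fitted trend line.
print("Trend-line estimate for the next day:", model.predict([[len(data)]])[0])
```

The chronological split matters: shuffling price data before splitting would leak future information into the training set and overstate accuracy.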


Conclusion

Linear regression is an invaluable tool for predicting stock prices and analyzing market trends. It provides a straightforward yet powerful approach to understanding the relationships between various market factors and stock performance. While time series analysis is a widely used method in predicting stock prices, we cannot overlook the importance of other machine learning models, which offer diverse perspectives and can enhance predictive accuracy.

Moreover, linear regression can also take momentum indicators such as moving averages and the relative strength index (RSI) as inputs when forecasting stock prices, as sketched below. These indicators help identify the strength and direction of market trends, providing additional insight into future price movements.
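As a rough sketch of that idea (the 20-day moving-average window and 14-period RSI below are common conventions, not values prescribed here), such indicators can be computed with pandas and used as extra regression features:

```python
# Build moving-average and RSI features from a 'Close' price series and
# regress the next day's close on them. Illustrative only.
import pandas as pd
from sklearn.linear_model import LinearRegression

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Simple rolling-mean RSI."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = -delta.clip(upper=0).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)

def build_features(close: pd.Series) -> pd.DataFrame:
    feats = pd.DataFrame({
        "close": close,
        "sma_20": close.rolling(20).mean(),   # 20-day moving average
        "rsi_14": rsi(close, 14),             # 14-period RSI
    })
    feats["target_next_close"] = close.shift(-1)  # next day's close
    return feats.dropna()

# 'close' is assumed to be a pandas Series of daily closing prices,
# e.g. the 'Close' column from the earlier example:
# feats = build_features(close)
# model = LinearRegression().fit(feats[["close", "sma_20", "rsi_14"]],
#                                feats["target_next_close"])
```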

In the ever-evolving landscape of stock market analysis, machine learning has brought new momentum and perspective, enabling analysts to make more informed decisions and optimize their investment strategies.

 

Thank you for taking the time to read this post. If you enjoyed the content and found it useful, please share it with others and follow my blog for more insightful articles on data analysis and machine learning. Happy investing🤑💲🤑!


Sunday, July 7, 2024

Amazing Insights: How Open Interest Can Wow Your Stock Market Moves

With the growing number of participants in the stock market, not just in India and the US but globally, there is an urgent need for investors and traders to make informed decisions promptly. This is especially important for retail investors, whose investment decisions are often influenced by unreliable tips based on loose sentiments. Therefore, it is crucial to validate these tips using fundamental and technical analysis.

Today, we will discuss using open interest to understand price trends, a topic many traders and investors find challenging. We will explore what open interest is, how it can be used in daily trading, and provide real-time examples to illustrate its application.

Firstly, open interest can be utilized by both equity and futures traders to predict underlying price trends. Although it primarily applies to futures markets, its insights are valuable to all traders. Open interest provides a third dimension to forecasts, complementing price and volume analysis.


What is Open Interest?
Open interest is the total number of outstanding or unliquidated contracts at the end of the day. Each contract involves both a buyer and a seller, who together create the contract. Open interest represents these outstanding contracts held by market participants. An increase or decrease in open interest indicates a corresponding increase or decrease in the number of contracts, reflecting the market's activity.


How does a change in Open Interest happen?
Every time a trade happens in the market, the open interest is affected in one of three ways: it increases, decreases, or remains unchanged. These scenarios can be summarized in the table below:

Buyer | Seller | Change in Open Interest
Enters new long | Enters new short | Increases
Enters new long | Exits old long | No change
Exits old short | Enters new short | No change
Exits old short | Exits old long | Decreases

Increase in Open Interest: Both the buyer and seller are initiating new contracts, leading to an increase in open interest.

No Change in Open Interest: One party (either the buyer or seller) is entering a new contract while the other is exiting an old position, resulting in no net change in the number of contracts.

Decrease in Open Interest: Both the buyer and seller are exiting their old positions, causing open interest to decrease.

To summarize:
- If both the buyer and seller initiate a new contract, open interest will increase.
- If both the buyer and seller liquidate an old position, open interest will decline.
- If one party initiates a new position while the other liquidates an old position, open interest will remain unchanged.
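The table is simple enough to encode directly. The small helper below (the action labels are just illustrative strings for the four cases) returns the resulting change in open interest:

```python
# Map the buyer's and seller's actions to the resulting change in open interest.
def open_interest_change(buyer_action: str, seller_action: str) -> str:
    buyer_new = buyer_action == "enters new long"
    seller_new = seller_action == "enters new short"
    if buyer_new and seller_new:
        return "increases"      # two new positions -> a new contract is created
    if not buyer_new and not seller_new:
        return "decreases"      # both sides liquidate -> a contract disappears
    return "no change"          # one new position offsets one liquidation

print(open_interest_change("enters new long", "enters new short"))  # increases
print(open_interest_change("exits old short", "exits old long"))    # decreases
print(open_interest_change("enters new long", "exits old long"))    # no change
```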



How to read and analyze changes in Open Interest?
Although open interest can be read against price alone, analysts usually incorporate volume as well when analyzing the market. Below is a table that outlines the different scenarios and the resulting market moves:

Price | Volume | Open Interest | Market
Increasing | Increasing | Increasing | Continues increasing
Increasing | Declining | Declining | Reversal, i.e. market to decline
Declining | Increasing | Increasing | Continues declining
Declining | Declining | Declining | Reversal, i.e. market to rise

 

If both open interest and volume increase, it indicates the continuation of the current trend. If both open interest and volume decline, it indicates a reversal of the current trend.
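Expressed as a small rule of thumb in code (a direct transcription of the table above, not a trading system):

```python
# Translate the price/volume/open-interest table into a simple rule of thumb.
def expected_market_move(price_rising: bool, volume_rising: bool, oi_rising: bool) -> str:
    if volume_rising and oi_rising:
        # Rising volume and open interest confirm the current price trend.
        return "trend continues " + ("up" if price_rising else "down")
    if not volume_rising and not oi_rising:
        # Falling volume and open interest suggest the move is losing participation.
        return "possible reversal " + ("down" if price_rising else "up")
    return "no clear signal from this table"

print(expected_market_move(price_rising=True, volume_rising=True, oi_rising=True))
print(expected_market_move(price_rising=True, volume_rising=False, oi_rising=False))
```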

Below is the Nifty 50 daily close price vs. Open Interest chart. Nifty 50 is plotted as a line chart on the right axis and Open Interest as a bar chart on the left axis.

In the chart we can see that the price of Nifty increased from 22700 to 23400 between 31st May and 3rd June, but the Open Interest bars show a light green color, i.e. short covering rather than long buildup. This corresponds to the second scenario in the table above. The market declined to 22000 on the very next day, with fresh shorts being built up. The market then started recovering, and we can see short covering on the following day. On the 6th there is a strong long buildup, i.e. the first scenario from the table above, and as the chart shows, the market continued to trend upward after that.

Courtesy: https://web.sensibull.com/open-interest/fut-oi-vs-time?tradingsymbol=NIFTY

Below is the chart of Bitcoin/TetherUS. We can see that on May 5th (marked with an arrow) the price declined with increasing volume and increasing Open Interest, i.e. the third scenario from the table above, indicating continuation of the declining trend, which is exactly what happened in the subsequent trading sessions.

Similarly, we can see that on May 10 the price moved up with increasing volume and open interest, leading to a slight uptrend in the underlying.

Courtesy: https://web.sensibull.com/open-interest/fut-oi-vs-time?tradingsymbol=NIFTY

So, to conclude, this is how we can use Open Interest together with volume and price to predict the market trend. Analysts often complement open interest with momentum indicators like RSI and Stochastics to gauge whether the underlying is oversold or overbought, but I will reserve that topic for a later discussion.

 

 

 

Sunday, June 23, 2024

How to Generate Continuous Futures Contracts from NSE Data with nselib

 

In today's world, stock trading has become a common side hustle, and the credit goes to the exponential pace of digitalization, which has made it incredibly easy to step into the fascinating realm of stock trading. As of March 2024, there are close to 15.1 crore Demat Accounts, which is roughly ten percent of the total population. This percentage nearly doubles if we exclude those under 18 and over 60 years of age. 

Nearly 80 percent of global trading volume can be attributed to Futures and Options (F&O) volumes on NSE and BSE, a trend that is not entirely mysterious. The technology revolution, primarily fueled by data as the new oil, has introduced High-Frequency Trading (HFT) algorithms, machine learning, and artificial intelligence, ushering an era of robotic traders into the market.


Challenges in Data Acquisition 

While we now understand the use of data for informed decision-making, obtaining the right set of data in the right format is not always straightforward. The integration of advanced technologies and the sheer volume of available data can pose challenges for even the most experienced traders and analysts.

Continuing with our discussion on the integration of advanced technologies in stock trading, today we will delve into how to get price volume data for NSE listed stocks using Python. We'll explore the available libraries, their documentation, and, most importantly, how to prepare this data for model building.

To begin with, several libraries can help us fetch and manipulate stock data in Python. Among the most popular are yfinance, nselib, and nsetools. These libraries provide comprehensive documentation and are user-friendly, making them ideal for both beginners and seasoned traders.

Link to Library Documentation

·       nselib Documentation

·       yfinance Documentation

·       nsetools Documentation

 

Installing Required Library

 

We'll start by installing and importing these libraries. The nselib library, for example, can be installed using pip.
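For example, nselib is available on PyPI and installs with a single command (the other libraries mentioned above install the same way):

```
pip install nselib
```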



Importing Libraries and Fetching Data

 

Once installed, you can easily download the price and volume data from any NSE stock. Here’s a simple example to get started with SBI data.
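A minimal sketch using nselib's capital_market module; the function name, parameters, and date format below follow the library's documentation, but verify them against the version you install:

```python
# Fetch price/volume (and deliverable) data for SBIN from NSE via nselib.
from nselib import capital_market

df = capital_market.price_volume_and_deliverable_position_data(
    symbol="SBIN",
    from_date="01-01-2024",   # dd-mm-yyyy strings
    to_date="30-06-2024",
)

print(df.shape)
print(df.head())
```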



Example Output: DataFrame Structure 

The output will provide you with a DataFrame containing columns like ‘Open’, ‘High’, ‘Low’, ‘Close’, ‘AveragePrice’ etc.

Embracing the Challenge: Creating Continuous Contracts

Continuing from our previous discussion: so far, so good. But suppose I want to build a model using derivative futures data. The problem is that our DataFrame will contain at least three contracts running on any given day. For instance, if you filter the data to 01-05-2024, you will see three rows corresponding to the May, June, and July contracts (see the sketch below). This is a common problem encountered by any analyst attempting to build a model on futures data.
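To see the problem concretely, the sketch below pulls stock-futures data with nselib's derivatives module. The future_price_volume_data function and the FUTSTK instrument code are assumptions based on the library's documentation; TIMESTAMP and EXPIRY_DT are the column names used in the steps that follow.

```python
# Fetch SBIN stock-futures data; each trading day should show several rows,
# one per open contract (near, next and far month).
import pandas as pd
from nselib import derivatives

fut = derivatives.future_price_volume_data(
    symbol="SBIN",
    instrument="FUTSTK",
    from_date="01-05-2024",
    to_date="31-05-2024",
)

# Depending on the library version, TIMESTAMP may be a string or a datetime;
# parse it explicitly before filtering.
fut["TIMESTAMP"] = pd.to_datetime(fut["TIMESTAMP"], dayfirst=True)

# Expect one row each for the May, June and July expiries on 1 May 2024.
print(fut.loc[fut["TIMESTAMP"] == "2024-05-01", ["TIMESTAMP", "EXPIRY_DT"]])
```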

This issue is also discussed in the book *Technical Analysis of the Financial Markets* by John J. Murphy, who explains how analysts create continuation charts for futures. The technique most commonly employed is to link a number of contracts together to provide continuity. When one contract expires, another one is used. However, this is easier said than done, especially when dealing with a large dataset spanning, say, the last ten years. Manually selecting contracts becomes very difficult, especially when shifting from the current contract to the next. 

For example, in May, the future contract for May expired on the 30th. Therefore, until the 30th of May, the contract to be used is the May expiring contract, but the next trading day will not have May contracts. Thus, the 31st of May has to be tagged to the June contract. Essentially, there are two key tasks that need to be accomplished: 

  1. Creating a continuous price data series using the most recent contract.
  2. Removing all other data that pertains to far-month contracts. 


Approach to Creating Continuous Contracts 

To create a continuous contract DataFrame using the approach described, follow these steps. We'll extract the necessary information from the `TIMESTAMP` and `EXPIRY_DT` columns, perform the required calculations, and filter the data to obtain the continuous futures contracts.

Step-by-Step Approach 

Extracting Months and Years: We extract the month and year from the `TIMESTAMP` and `EXPIRY_DT` columns and create new columns `TIMESTAMP_Year`, `TIMESTAMP_Month`, `EXPIRY_Year`, and `EXPIRY_Month`.


 

Sorting Data: The DataFrame is sorted by `TIMESTAMP` to ensure chronological order.


 

Creating Cumulative Month Columns: We create `TIMESTAMP_Month_Cum` and `EXPIRY_Month_Cum` columns by converting the year and month into a cumulative month count (years multiplied by 12 plus months).


 

Calculating Month Difference: The `Month_diff` column is calculated as the difference between `EXPIRY_Month_Cum` and `TIMESTAMP_Month_Cum`.

Generating Month_diff_list: The `Month_diff_list` column is created by grouping the DataFrame by `TIMESTAMP` and collecting the `Month_diff` values into a list for each date.


 

Creating Near_Month Column: The `Near_Month` column is calculated by subtracting the minimum value of `Month_diff_list` from each value in `Month_diff`.


 

Filtering for Continuous Contracts: The DataFrame is filtered to keep only the rows where `Near_Month` is equal to 0, providing the continuous futures contracts.
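Strung together in pandas, the steps above look roughly like this. It is a sketch that reuses the futures fetch shown earlier and assumes the `TIMESTAMP` and `EXPIRY_DT` columns; the intermediate `Month_diff_list` column is replaced by an equivalent group-wise minimum.

```python
# Build a continuous (near-month) futures series from a DataFrame that has
# one row per open contract per trading day.
import pandas as pd
from nselib import derivatives

df = derivatives.future_price_volume_data(
    symbol="SBIN", instrument="FUTSTK",
    from_date="01-01-2015", to_date="30-06-2024",
)

# Parse the date columns (dd-mm-yyyy style strings in NSE data).
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], dayfirst=True)
df["EXPIRY_DT"] = pd.to_datetime(df["EXPIRY_DT"], dayfirst=True)

# Step 1: extract month and year from the trade date and the expiry date.
df["TIMESTAMP_Year"], df["TIMESTAMP_Month"] = df["TIMESTAMP"].dt.year, df["TIMESTAMP"].dt.month
df["EXPIRY_Year"], df["EXPIRY_Month"] = df["EXPIRY_DT"].dt.year, df["EXPIRY_DT"].dt.month

# Step 2: sort chronologically.
df = df.sort_values("TIMESTAMP").reset_index(drop=True)

# Step 3: cumulative month counts (years * 12 + months).
df["TIMESTAMP_Month_Cum"] = df["TIMESTAMP_Year"] * 12 + df["TIMESTAMP_Month"]
df["EXPIRY_Month_Cum"] = df["EXPIRY_Year"] * 12 + df["EXPIRY_Month"]

# Steps 4-5: months to expiry for each row, and the distance from the nearest
# contract available on that trading day (the group-wise minimum stands in
# for the Month_diff_list described above).
df["Month_diff"] = df["EXPIRY_Month_Cum"] - df["TIMESTAMP_Month_Cum"]
df["Near_Month"] = df["Month_diff"] - df.groupby("TIMESTAMP")["Month_diff"].transform("min")

# Step 6: keep only the nearest contract for each trading day.
continuous = df[df["Near_Month"] == 0].sort_values("TIMESTAMP").reset_index(drop=True)
print(continuous.head())
```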


This approach ensures that the futures data is continuous and suitable for building models, addressing the common challenge of dealing with multiple active contracts for different expiration dates.


Visualization of SBIN Continuous Chart 

Below is the visualization of SBIN's continuous futures contract chart from 2015 to 2024:
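A chart along those lines can be drawn with matplotlib from the `continuous` DataFrame built above; the closing-price column name depends on what the data source returns, so it is left as a parameter here:

```python
# Plot the continuous near-month futures series built in the previous step.
import matplotlib.pyplot as plt

CLOSE_COL = "CLOSE"  # adjust to the closing-price column your data source returns

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(continuous["TIMESTAMP"], continuous[CLOSE_COL], linewidth=1)
ax.set_title("SBIN continuous (near-month) futures, 2015-2024")
ax.set_xlabel("Date")
ax.set_ylabel("Close price")
plt.tight_layout()
plt.show()
```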

Modeling Predictive Analytics 

By employing the logic and techniques discussed, we have successfully created a continuous futures contract DataFrame, which forms a robust foundation for building predictive models. 

In the next phase of this project, I will be using machine learning algorithms such as Random Forest and Artificial Neural Networks (ANN) to predict the next day's price. These models can help in formulating strategies like "buy today, sell tomorrow" or "sell today, buy tomorrow," potentially offering significant trading advantages. 

Stay tuned for detailed insights into the modeling process, including data preprocessing steps, feature engineering, model training, evaluation, and the implementation of these trading strategies. We will also explore the performance metrics of these models and their real-world applicability.

 

Thank you for following along. Make sure to subscribe and keep following the blog for more in-depth analysis and updates on the latest in stock trading and machine learning. Your feedback and questions are always welcome, as they help in refining and enriching the content. Together, let's dive deeper into the fascinating world of stock trading with data science.

 

Saturday, June 8, 2024

Unlocking the Power of Data Analytics: Lessons from Exit Polls and Beyond

 

Data, as defined by the dictionary, refers to facts or statistics collected for reference or analysis. However, the crucial question is how accurate or misleading data can be. The recent 2024 General Election exit polls serve as a prime example. Despite 80% of exit polls predicting certain outcomes, the actual results were starkly different, causing significant stock market volatility. This discrepancy led opposition parties to call for an investigation of polling companies for possible manipulations and sparked widespread criticism of the pollsters. Rather than analyze the poll results—ample articles already do so—this article aims to explore how data can be misleading and how AI can enhance data accuracy.

 

The Importance of Sample Size and Bias in Data Collection

 

Presumably, the key issues in exit polls are the sample size and potential bias within the sample. For instance, a leading Indian newspaper reported that a prominent exit poll predicting 350-380 seats had a sample size of only 450,000 voters. With approximately 968 million eligible voters and a 65.8% turnout (about 637 million voters), a sample size of 450,000 is woefully inadequate. This small sample size can lead to significant inaccuracies in predictions.

 

Moreover, ensuring an unbiased sample that is evenly distributed across the country is challenging. This explains why exit polls were accurate in states like MP, Gujarat, and Delhi but failed in UP and West Bengal. Increasing the sample size could help, but it brings its own challenges, such as higher costs and logistical difficulties. 

 

To provide a more accurate prediction, pollsters need to ensure that their sample represents the diverse population of voters. This involves not just increasing the sample size but also ensuring that it includes voters from different regions, socioeconomic backgrounds, ages, and other demographic factors. This level of detail is difficult to achieve but necessary for accurate data collection.

 

The Historical Perspective: Lessons from Abraham Wald

 

This situation reminds me of a historical case detailed in Matthew Syed's book "Black Box Thinking." Abraham Wald, a Hungarian mathematician who moved to America before World War II, worked on the Applied Mathematics Panel, a group of brilliant mathematicians working on behalf of the military. They analyzed a whole range of issues, from effective torpedo-launching patterns to the aerodynamic efficiency of missiles.

 

The military asked Wald to help with a crucial problem. Wartime leaders realized that they needed to reinforce their planes with armor to protect them from gunfire, but they could not armor the entire surface, because that would make the planes too heavy to fly. The air force had already examined the data, and to them the pattern seemed clear: most returning planes were riddled with gunfire across the wings and fuselage. The military therefore proposed adding armor to the areas with the most bullet holes, the wings and fuselage, and leaving the cockpit and tail unguarded. Wald disagreed, noting that the military had only considered data from planes that returned. The planes hit in the cockpit or tail, he argued, often did not come back. Wald's insight that the surviving planes had avoided hits in these critical areas led to reinforcing the cockpit and tail, profoundly impacting the war effort.

 

This story highlights the importance of considering all data, including what is unseen, to draw meaningful conclusions in aviation, business, politics, and beyond. Wald's analysis exemplifies the importance of not just relying on visible data but also accounting for the missing data. This approach can significantly alter the conclusions drawn from data analysis, leading to more effective and accurate solutions.

 

The Role of Artificial Intelligence in Modern Data Analytics

 

Artificial Intelligence (AI) powered analytics can address data bias and hidden information issues. A more accurate analysis during the 2024 elections came from a US-based research firm that used AI to gather sentiments from social media interactions. AI can assess what potential voters read, write, and how they respond online, reducing the risk of false positives common in traditional exit polls.

 

AI-driven data collection can cover a larger and more diverse audience than traditional methods, thereby minimizing bias. By analyzing social media interactions, AI captures a wider voter base, representing various societal segments and reducing bias. For example, AI can analyze data from millions of social media posts, providing insights into voter sentiment that are more representative of the entire population.

 

Furthermore, AI can continuously learn and improve its analysis methods. As more data is collected, AI algorithms can adjust and become more accurate over time. This ability to learn and adapt makes AI a powerful tool in data analytics, capable of providing increasingly accurate predictions and insights.

 

Applications Beyond Politics: AI in Business and Other Sectors

 

The 2024 general election exit poll results emphasize AI's untapped potential in revolutionizing analytics. Beyond politics, AI can transform data analytics across industries, ensuring more accurate, comprehensive, and unbiased insights. In business, for example, AI can be used to analyze consumer behavior, predict market trends, and optimize supply chains.

 

In healthcare, AI can analyze vast amounts of medical data to identify patterns and predict outcomes, leading to better patient care and treatment plans. In finance, AI can detect fraudulent activities by analyzing transaction patterns and identifying anomalies.

 

Moreover, AI's ability to process and analyze large datasets quickly and accurately can drive innovation in research and development. For instance, in the pharmaceutical industry, AI can analyze research data to identify potential new drugs and predict their effectiveness.

 

Conclusion: The Future of Data Analytics with AI

 

The integration of AI into data analytics can significantly improve the accuracy and reliability of data interpretations. By leveraging AI, we can mitigate the inherent biases and limitations of traditional data collection methods, leading to more informed decision-making. As we continue to harness AI's capabilities, we unlock new possibilities for innovation and progress in various fields, making data not just a collection of numbers, but a powerful tool for insight and change.

 

AI's ability to analyze large and diverse datasets quickly and accurately offers immense potential for improving decision-making across sectors. From politics and business to healthcare and finance, AI can transform how we collect, analyze, and use data. By addressing the challenges of sample size and bias, AI ensures that data analytics provides a more accurate and comprehensive picture, leading to better outcomes and innovations.

 

In conclusion, as we embrace the power of AI in data analytics, we must also ensure that its use is ethical and transparent. By doing so, we can fully realize AI's potential to revolutionize data analytics, making it a cornerstone of informed decision-making and progress in the modern world.

 

---

 

This article explores the transformative potential of AI in data analytics, particularly in reducing biases and increasing accuracy, using recent election exit polls as a case study.

 
