Data Science

How to Set the Number of Trees in Random Forest

Data Science | May 19, 2025 | 13 Views | 0 Likes | 0 Comments

Scientific publication: T. M. Lange, M. Gültas, A. O. Schmitt & F. Heinrich (2025). optRF: Optimising random forest stability by determining the optimal number of trees. BMC Bioinformatics, 26(1), 95. Follow this LINK to the original publication. Random Forest: A Powerful Tool for Anyone Working With Data. What is Random Forest? Have you ever wished you…

Parquet File Format – Everything You Need to Know!

Data Science | May 14, 2025 | 9 Views | 0 Likes | 0 Comments

With the amount of data growing exponentially in the last few years, one of the biggest challenges has become finding the optimal way to store various data flavors. Unlike in the (not so distant) past, when relational databases were considered the only way to go, organizations now want to perform analysis over raw data…

Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models

Data Science | May 9, 2025 | 12 Views | 0 Likes | 0 Comments

Thank you for the kind response to Part 1; it’s been encouraging to see so many readers interested in time series forecasting. In Part 1 of this series, we broke down time series data into trend, seasonality, and noise, discussed when to use additive versus multiplicative models, and built a Seasonal Naive baseline forecast using…

From a Point to L∞

Data Science | May 4, 2025 | 25 Views | 0 Likes | 0 Comments

Why you should read this: As someone who did a Bachelor's in Mathematics, I was first introduced to L¹ and L² as measures of distance… now they seem to be measures of error. Where have we gone wrong? But jokes aside, there seems to be this misconception that L₁ and L₂ serve the same function, and…

Building a Scalable and Accurate Audio Interview Transcription Pipeline with Google Gemini

Data Science | April 29, 2025 | 14 Views | 0 Likes | 0 Comments

This article is co-authored by Ugo Pradère and David Haüet. How hard can it be to transcribe an interview? You feed the audio to an AI model, wait a few minutes, and boom: perfect transcript, right? Well… not quite. When it comes to accurately transcribing long audio interviews, even more so when the spoken language is…

Government Funding Graph RAG

Data Science | April 24, 2025 | 16 Views | 0 Likes | 0 Comments

In this article, I present my latest open-source project — Government Funding Graph. The inspiration for this project came from a desire to make better tooling for grant writing, namely to suggest research topics, funding bodies, research institutions, and researchers. I have made Innovate UK grant applications in the past, so I have had an interest in…

Load-Testing LLMs Using LLMPerf

Data Science | April 19, 2025 | 19 Views | 0 Likes | 0 Comments

Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten, yet crucial, part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level is the practice of…

Layers of the AI Stack, Explained Simply

Data Science | April 14, 2025 | 19 Views | 0 Likes | 0 Comments

This is the first in a multi-part series on creating web applications with Generative AI integration. Table of Contents: Introduction; The Virtues of the Application Layer; Thick Wrappers; The Return of Clippy; Getting Stuff Done While You Sleep. Introduction: The AI space is a vast and complicated landscape. Matt Turck famously does his Machine Learning, AI,…

Mining Rules from Data

Data Science | April 9, 2025 | 17 Views | 0 Likes | 0 Comments

When working with products, we might need to introduce some “rules”. Let me explain what I mean by “rules” with practical examples: Imagine that we’re seeing a massive wave of fraud in our product, and we want to restrict onboarding for a particular segment of customers to lower this risk. For example, we found…

Are We Watching More Ads Than Content? Analyzing YouTube Sponsor Data

Data Science | April 4, 2025 | 19 Views | 0 Likes | 0 Comments

I’m definitely not the only person who feels that YouTube sponsor segments have become longer and more frequent recently. Sometimes, I watch videos that seem to be trying to sell me something every couple of seconds. On one hand, it’s great that both small and medium-sized YouTubers are able to make a living from their…