
Posts

Showing posts from August, 2024

Top data science tool

 

Federated learning ML

 

Federated learning

 If your data are used to train a Machine Learning model, chances are that a Data Scientist, a Data Engineer, or an ML Engineer is going to stumble upon it! I know, for example, that Walmart categorically refuses to use AWS because Amazon is its direct competitor, and Walmart doesn't want to risk having its data fall into the wrong hands. One solution to that data privacy problem could be to encrypt the data, but with typical encryption, that would mean the whole training dataset needs to be encrypted. And if the Data Engineers own the key that encrypts the data, what would stop them from decrypting it when they receive new customer data? One way to go about it is to use Fully Homomorphic Encryption (FHE). FHE means that when you encrypt data, the encryption preserves addition and multiplication operations. For example, if E is the encryption function, we have: E(a + b) = E(a) + E(b) and E(a × b) = E(a) × E(b). In practice, it means that FHE preserves any polynomial transformation of the data. If a compu...
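To make the property E(a × b) = E(a) × E(b) from the excerpt above concrete, here is a minimal sketch using textbook RSA. This is an assumption-laden toy, not FHE: textbook RSA preserves multiplication only (not addition), it is not secure, and the primes, exponent, and function names are hypothetical values picked for readability.

```python
# Toy textbook RSA: multiplicatively homomorphic only.
# NOT fully homomorphic and NOT secure; all constants are illustrative assumptions.
P, Q = 61, 53                  # tiny primes, for demonstration only
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E_PUB = 17                     # public exponent, coprime with PHI
D_PRIV = pow(E_PUB, -1, PHI)   # private exponent (modular inverse, Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt an integer smaller than N with the public key."""
    return pow(message, E_PUB, N)


def decrypt(cipher: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(cipher, D_PRIV, N)


def multiply_encrypted(cipher_a: int, cipher_b: int) -> int:
    """Multiply two ciphertexts without ever seeing the plaintexts."""
    return (cipher_a * cipher_b) % N


a, b = 7, 6
product_cipher = multiply_encrypted(encrypt(a), encrypt(b))

# The party holding the private key recovers a * b (valid while a * b < N).
assert decrypt(product_cipher) == a * b
```

The party doing the multiplication only ever handles ciphertexts; only the key holder can read the result. A real FHE scheme extends this idea so that both addition and multiplication work on encrypted values.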

Top ML for prediction

 

Python

 

Data scientist

 - Hugging Face
 - Tracking app like wandb
 - Annotation tools like VGG Image Annotator and Roboflow
 Advanced models
 - Generative models
 - Diffusion models
 - Diffusion models for art: Stable Diffusion

YOLO

 

You Only Look Once v1

 YOLO really changed the game when it comes to object detection! Back in the day, we had to slide a window over the image at different scales and apply an image classifier to each crop. The method was imprecise and very slow, and it was not really possible to run it in real time on videos, for example. YOLO (You Only Look Once) changed everything by predicting all the boxes and classes in a single pass, making it possible to run in real time while being small enough to fit in a mobile application. The first YOLO model came out in 2015, and YOLO v9 came out last week. There have been many improvements over the years, but some common features persist. The idea is to divide the image into a grid and, for each grid cell, predict the existence of a bounding box for each of the classes we are considering. When it comes to labeling the data, a grid cell is labeled as containing an object only if the center of the box falls inside it. If the grid cell contains a center, the "objectness" is labeled 1, and 0 otherwise. The model will try t...
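The labeling rule described above (a cell gets objectness 1 only if a box center falls inside it) can be sketched as follows. This is a simplified illustration, not the original YOLO code: the function name, the `(cx, cy, w, h)` box format with coordinates normalized to [0, 1], and the default 7 × 7 grid are assumptions made for the example, and class labels are left out.

```python
def objectness_grid(boxes, grid_size=7):
    """Build a grid_size x grid_size objectness label map.

    boxes: iterable of (cx, cy, w, h) with values normalized to [0, 1]
           (hypothetical format chosen for this sketch).
    Returns a list of rows; a cell is 1 if some box center lies in it, else 0.
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    for cx, cy, _w, _h in boxes:
        # Map the normalized center to a cell index; clamp so that a center
        # exactly on the right or bottom edge (1.0) stays inside the grid.
        col = min(int(cx * grid_size), grid_size - 1)
        row = min(int(cy * grid_size), grid_size - 1)
        grid[row][col] = 1
    return grid


# One box centered in the middle of the image, one near the top-left corner.
labels = objectness_grid([(0.5, 0.5, 0.2, 0.3), (0.05, 0.10, 0.1, 0.1)])
for row in labels:
    print(row)
```

Only the box center decides which cell is responsible; the box width and height do not affect the objectness label, even when the box spans several cells.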

Python Road map

 

Road to learn ML

 

Power of embedding

 

AI Architecture

 

AdamW optimizer

 Nowadays, most LLMs get trained with the AdamW optimizer as opposed to the Adam optimizer. Why? There used to be a time when Adam was the king among optimizers, and it didn't make much sense to spend too much time trying to find a better one. This has changed recently, and AdamW has become the default optimizer for LLM practitioners. The difference comes down to how we apply the regularization term to the weight parameters. For the typical gradient descent algorithm, if we want to apply an L2 regularization term, we modify the loss function such that: regularized loss = loss + L2 term. Then, we compute the gradient of that new loss to update the model parameters. The goal of the regularization term is to ensure that the weights don't grow too large, and it acts as a weight decay mechanism when we update the weights. In Adam, when we apply L2 regularization, we regularize the loss function as well, but it gets used differently. The loss function is used to compute the...
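As a rough illustration of the two ways of applying the regularization, here is a minimal scalar sketch of a single update step. It is not taken from the post or from any library: the function name, the `l2` and `weight_decay` parameters, and the hyperparameter defaults are assumptions for the example. With `l2 > 0` the penalty gradient is folded into the gradient and passes through Adam's moment estimates; with `weight_decay > 0` the decay is applied directly to the weight, which is the decoupled form usually called AdamW.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, weight_decay=0.0):
    """One scalar Adam update (hypothetical helper for illustration).

    l2           -> Adam with L2: penalty gradient l2 * w is added to grad.
    weight_decay -> AdamW-style: decay applied to w outside the moments.
    """
    g = grad + l2 * w                       # L2 term enters the gradient
    m = beta1 * m + (1 - beta1) * g         # first moment estimate
    v = beta2 * v + (1 - beta2) * g * g     # second moment estimate
    m_hat = m / (1 - beta1 ** t)            # bias correction
    v_hat = v / (1 - beta2 ** t)
    adaptive_update = m_hat / (math.sqrt(v_hat) + eps)
    # Decoupled decay uses the weight directly and skips the rescaling.
    w = w - lr * adaptive_update - lr * weight_decay * w
    return w, m, v


# Zero data gradient, so only the regularization moves the weight.
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2=0.01)
w_decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.01)
print(w_l2, w_decoupled)
```

In this toy setting the L2 variant moves the weight by roughly the full learning rate, because the penalty gradient is rescaled by the second-moment estimate, while the decoupled variant shrinks the weight by `lr * weight_decay * w`.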

Data analysis techniques

 #⃣ 10 Powerful Ways to Analyze Data ✅
 In the data-driven world we live in, understanding how to effectively analyze data is key to unlocking valuable insights. Here are 10 methods to consider:
 1⃣ Drill Up and Drill Down: Navigate between the big picture and finer details to uncover root causes or broader trends.
 2⃣ Slicing and Dicing: Break down large datasets into manageable segments for more precise analysis.
 3⃣ Segmentation: Divide customers into groups based on shared characteristics for targeted marketing and better customer understanding.
 4⃣ Data Visualization: Utilize charts and maps to visually represent data, making trends and outliers more apparent.
 5⃣ Driver-Based Relationships: Identify how changes in one area can influence others, revealing cause-and-effect relationships.
 6⃣ Benchmarking: Compare your data against internal or external benchmarks to assess performance.
 7⃣ Seasonality: A...