Telegram Group & Telegram Channel
This media is not supported in your browser
VIEW IN TELEGRAM
How do transformers work? Learn it by hand 👇

𝗪𝗮𝗹𝗸𝘁𝗵𝗿𝗼𝘂𝗴𝗵

[1] Given
↳ Input features from the previous block (5 positions)

[2] Attention
↳ Feed all 5 features to a query-key attention module (QK) to obtain an attention weight matrix (A). I will skip the details of this module. In a follow-up post I will unpack this module.

[3] Attention Weighting
↳ Multiply the input features with the attention weight matrix to obtain attention weighted features (Z). Note that there are still 5 positions.
↳ The effect is to combine features across positions (horizontally), in this case, X1 := X1 + X2, X2 := X2 + X3....etc.

[4] FFN: First Layer
↳ Feed all 5 attention weighted features into the first layer.
↳ Multiply these features with the weights and biases.
↳ The effect is to combine features across feature dimensions (vertically).
↳ The dimensionality of each feature is increased from 3 to 4.
↳ Note that each position is processed by the same weight matrix. This is what the term "position-wise" is referring to.
↳ Note that the FFN is essentially a multi layer perceptron.

[5] ReLU
↳ Negative values are set to zeros by ReLU.

[6] FFN: Second Layer
↳ Feed all 5 features (d=3) into the second layer.
↳ The dimensionality of each feature is decreased from 4 back to 3.
↳ The output is fed to the next block to repeat this process.
↳ Note that the next block would have a completely separate set of parameters.

#ai #tranformers #genai #learning

💯 BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
Please open Telegram to view this post
VIEW IN TELEGRAM



tg-me.com/DataScienceM/1637
Create:
Last Update:

How do transformers work? Learn it by hand 👇

𝗪𝗮𝗹𝗸𝘁𝗵𝗿𝗼𝘂𝗴𝗵

[1] Given
↳ Input features from the previous block (5 positions)

[2] Attention
↳ Feed all 5 features to a query-key attention module (QK) to obtain an attention weight matrix (A). I will skip the details of this module. In a follow-up post I will unpack this module.

[3] Attention Weighting
↳ Multiply the input features with the attention weight matrix to obtain attention weighted features (Z). Note that there are still 5 positions.
↳ The effect is to combine features across positions (horizontally), in this case, X1 := X1 + X2, X2 := X2 + X3....etc.

[4] FFN: First Layer
↳ Feed all 5 attention weighted features into the first layer.
↳ Multiply these features with the weights and biases.
↳ The effect is to combine features across feature dimensions (vertically).
↳ The dimensionality of each feature is increased from 3 to 4.
↳ Note that each position is processed by the same weight matrix. This is what the term "position-wise" is referring to.
↳ Note that the FFN is essentially a multi layer perceptron.

[5] ReLU
↳ Negative values are set to zeros by ReLU.

[6] FFN: Second Layer
↳ Feed all 5 features (d=3) into the second layer.
↳ The dimensionality of each feature is decreased from 4 back to 3.
↳ The output is fed to the next block to repeat this process.
↳ Note that the next block would have a completely separate set of parameters.

#ai #tranformers #genai #learning

💯 BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟

BY Data Science Machine Learning Data Analysis Books


Share with your friend now:
tg-me.com/DataScienceM/1637

View MORE
Open in Telegram


Data Science Machine Learning Data Analysis Books Telegram | DID YOU KNOW?

Date: |

For some time, Mr. Durov and a few dozen staffers had no fixed headquarters, but rather traveled the world, setting up shop in one city after another, he told the Journal in 2016. The company now has its operational base in Dubai, though it says it doesn’t keep servers there.Mr. Durov maintains a yearslong friendship from his VK days with actor and tech investor Jared Leto, with whom he shares an ascetic lifestyle that eschews meat and alcohol.

The messaging service and social-media platform owes creditors roughly $700 million by the end of April, according to people briefed on the company’s plans and loan documents viewed by The Wall Street Journal. At the same time, Telegram Group Inc. must cover rising equipment and bandwidth expenses because of its rapid growth, despite going years without attempting to generate revenue.

Data Science Machine Learning Data Analysis Books from us


Telegram Data Science Machine Learning Data Analysis Books
FROM USA