Consider the product ∣wσ′(wa+b)∣. Suppose ∣wσ′(wa+b)∣ ≥ 1.

(1) Argue that this can only ever occur if ∣w∣ ≥ 4.

(2) Supposing that ∣w∣ ≥ 4, consider the set of input activations a for which ∣wσ′(wa+b)∣ ≥ 1. Show that the set of a satisfying that constraint can range over an interval no greater in width than

(2/∣w∣) ln( ∣w∣(1 + √(1 − 4/∣w∣))/2 − 1 ).

(3) Show numerically that the above expression bounding the width of the range is greatest at ∣w∣ ≈ 6.9, where it takes a value ≈ 0.45. And so even given that everything lines up just perfectly, we still have a fairly narrow range of input activations which can avoid the vanishing gradient problem.

asked by Jeff Moden (8.2k points)

1 Answer


Final answer:

To satisfy the condition ∣wσ′(wa+b)∣ ≥ 1 we must have ∣w∣ ≥ 4, since σ′ never exceeds 1/4. Given ∣w∣ ≥ 4, the set of input activations satisfying the condition can range over an interval no greater in width than (2/∣w∣) ln( ∣w∣(1 + √(1 − 4/∣w∣))/2 − 1 ). Numerically, this expression is greatest at ∣w∣ ≈ 6.9, where it takes a value ≈ 0.45.

Step-by-step explanation:

(1) For the sigmoid activation, σ′(z) = σ(z)(1 − σ(z)). Since σ(z) lies in (0, 1), the product σ(1 − σ) is largest when σ = 1/2, i.e. at z = 0, where σ′(0) = 1/4. Hence ∣wσ′(wa+b)∣ ≤ ∣w∣/4 for every a and b, and the condition ∣wσ′(wa+b)∣ ≥ 1 can only hold if ∣w∣ ≥ 4.
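Part (1) rests on the fact that the sigmoid derivative never exceeds 1/4, which is easy to confirm numerically. A short sketch in Python (the grid and helper names below are my own, not from the question):

```python
import math

def sigma(z):
    # Logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

def sigma_prime(z):
    # Derivative of the sigmoid: sigma(z) * (1 - sigma(z)).
    s = sigma(z)
    return s * (1.0 - s)

# Scan a grid of z values: the derivative never exceeds 1/4,
# so |w * sigma_prime(w*a + b)| <= |w|/4, forcing |w| >= 4.
peak = max(sigma_prime(0.001 * k) for k in range(-10000, 10001))
print(peak)  # 0.25, attained at z = 0
```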

(2) Now suppose ∣w∣ ≥ 4 and write z = wa + b. The constraint ∣wσ′(z)∣ ≥ 1 becomes σ(z)(1 − σ(z)) ≥ 1/∣w∣. Solving the quadratic σ(1 − σ) = 1/∣w∣ gives the boundary values σ± = (1 ± √(1 − 4/∣w∣))/2, and the constraint holds exactly when σ− ≤ σ(z) ≤ σ+. Inverting the sigmoid, z = ln(σ/(1 − σ)), and since σ− = 1 − σ+ the allowed interval in z is symmetric about 0, with width 2 ln(σ+/(1 − σ+)). From ∣w∣σ+(1 − σ+) = 1 it follows that σ+/(1 − σ+) = ∣w∣σ+² = ∣w∣σ+ − 1 = ∣w∣(1 + √(1 − 4/∣w∣))/2 − 1. Finally, because z = wa + b, an interval of width Δz in z corresponds to an interval of width Δz/∣w∣ in a, so the set of a can range over an interval no greater in width than (2/∣w∣) ln( ∣w∣(1 + √(1 − 4/∣w∣))/2 − 1 ).
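The interval bound in part (2) can be sanity-checked numerically: solve σ(1 − σ) = 1/∣w∣ for the boundary values, invert the sigmoid, and compare the resulting width in a against the closed-form expression. A sketch, with illustrative helper names of my own:

```python
import math

def interval_width(w):
    # Boundary sigmoid values where sigma * (1 - sigma) = 1 / w.
    root = math.sqrt(1 - 4.0 / w)
    s_hi = (1 + root) / 2
    s_lo = (1 - root) / 2
    # Invert the sigmoid: z = ln(sigma / (1 - sigma));
    # the width in a is the width in z divided by |w|.
    z_hi = math.log(s_hi / (1 - s_hi))
    z_lo = math.log(s_lo / (1 - s_lo))
    return (z_hi - z_lo) / w

def closed_form(w):
    # The bound from the exercise: (2/|w|) ln(|w|(1 + sqrt(1 - 4/|w|))/2 - 1).
    return (2.0 / w) * math.log(w * (1 + math.sqrt(1 - 4.0 / w)) / 2 - 1)

# The two computations agree for any |w| >= 4.
for w in (4.5, 6.9, 10.0):
    assert abs(interval_width(w) - closed_form(w)) < 1e-9
```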

(3) To check the last part, treat the bound as a function of ∣w∣ and maximize it numerically over ∣w∣ > 4. The maximum occurs at ∣w∣ ≈ 6.9, where the width is ≈ 0.45. So even when the weight and bias line up perfectly, only a narrow band of input activations avoids the vanishing gradient problem.
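For part (3), a simple grid search over ∣w∣ locates the maximum of the width bound (the grid resolution and variable names are my own choices):

```python
import math

def width_bound(w):
    # (2/|w|) ln(|w|(1 + sqrt(1 - 4/|w|))/2 - 1), valid for |w| > 4;
    # the bound tends to 0 as |w| -> 4 and as |w| -> infinity.
    return (2.0 / w) * math.log(w * (1 + math.sqrt(1 - 4.0 / w)) / 2 - 1)

# Grid search on |w| in (4, 24] with step 0.001.
best_w, best_width = max(
    ((4 + 0.001 * k, width_bound(4 + 0.001 * k)) for k in range(1, 20001)),
    key=lambda pair: pair[1],
)
print(round(best_w, 1), round(best_width, 2))  # 6.9 0.45
```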


answered by ZecKa (7.6k points)