fikira

Statistics • Chi Square • P-value • Significance

The Statistics • Chi Square • P-value • Significance publication aims to provide a tool for combining different conditions and checking whether the outcome is significant using the Chi-Square Test and P-value.


🔶 USAGE

The basic principle is to compare two or more groups on the outcome of a question, such as asking men and women whether they would rather see a romantic or a non-romantic movie.

–––––––––––––––––––––––––––––––––––––––––––––
|       | ROMANTIC | NON-ROMANTIC | ⬅︎ MOVIE |
–––––––––––––––––––––––––––––––––––––––––––––
|  MEN  |     2    |       8      |    10    |
–––––––––––––––––––––––––––––––––––––––––––––
| WOMEN |     7    |       3      |    10    |
–––––––––––––––––––––––––––––––––––––––––––––
|⬆︎ SEX |     9    |      11      |    20    |
–––––––––––––––––––––––––––––––––––––––––––––

We calculate the Chi-Square statistic with this formula:

Χ² = Σ ( (Observed Value − Expected Value)² / Expected Value )


In this publication, this is:

    chiSquare = 0.
    for i = 0 to rows -1
        for j = 0 to colums -1
            // Observed count in cell (i, j)
            observedValue = aBin.get(i).aFloat.get(j)
            // Expected count = row total * column total / grand total; the 1e-12 floor protects against division by 0
            expectedValue = math.max(1e-12, aBin.get(i).aFloat.get(colums) * aBin.get(rows).aFloat.get(j) / sumT)

            chiSquare += math.pow(observedValue - expectedValue, 2) / expectedValue

Together with the degrees of freedom, which is (rows − 1) × (columns − 1), the P-value can be calculated. For the 2 × 2 table above, the degrees of freedom are (2 − 1) × (2 − 1) = 1.

In this case, the P-value is 0.02462.
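The calculation for the movie table can be sketched in Python as a cross-check. This is not the script's code: the names `chi_square` and `p_value_one_dof` are made up for this illustration, and the P-value uses a closed form that is only valid for 1 degree of freedom.

```python
import math

# Observed counts from the movie table
# (rows: men, women; columns: romantic, non-romantic).
observed = [[2, 8],
            [7, 3]]

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count = row total * column total / grand total,
            # floored to avoid division by zero (same guard as in the script).
            expected = max(1e-12, row_totals[i] * col_totals[j] / total)
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_one_dof(stat):
    """Upper-tail chi-square probability; this closed form holds for 1 degree of freedom only."""
    return math.erfc(math.sqrt(stat / 2.0))

stat, dof = chi_square(observed)
p = p_value_one_dof(stat)
print(round(stat, 4), dof, round(p, 5))
```

The row and column totals are derived from the four cells, so only the observed counts need to be entered.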

A P-value lower than 0.05 is considered to be significant. Statistically, women tend to choose a romantic movie more, while men prefer a non-romantic one.

Users can choose how the P-value is obtained: looked up in a standard table, or computed with a function derived from a JavaScript implementation at math.ucla.edu (see references below).

Note that the population (10 men + 10 women = 20) is small, which should be kept in mind when interpreting the result.

Either way, the script applies this principle to conditions built from values such as rsi, close, high, ...

🔹 CONDITION

Conditions are added to the left column ('CONDITION').

For example, previous rsi values (rsi[1]) between 0 and 100, divided into separate groups.


🔹 CLOSE

Then, the movement of the last close is evaluated:

  • UP when close is higher than previous close (close[1])
  • DOWN when close is lower than previous close
  • EQUAL when close is equal to previous close

It is also possible to use only 2 columns by adding EQUAL to UP or DOWN:

  • UP
  • DOWN/EQUAL

or

  • UP/EQUAL
  • DOWN

In other words, when the previous rsi value was between 80 and 90, this resulted in:

  • 19 times a current close higher than previous close
  • 14 times a current close lower than previous close
  • 0 times a current close equal to previous close

However, the P-value tells us it is not statistically significant.
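The UP / DOWN / EQUAL classification described above, including the option to fold EQUAL into one of the other two columns, can be sketched as follows. This is a Python illustration, not the script's code: `close_direction`, `count_directions` and the `merge_equal_into` parameter are hypothetical names, and the close prices are made up.

```python
from collections import Counter

def close_direction(close, prev_close, merge_equal_into=None):
    """Classify a close against the previous close.

    merge_equal_into: None keeps a separate EQUAL column; "UP" or "DOWN"
    counts equal closes in that column (the UP/EQUAL or DOWN/EQUAL layout).
    """
    if close > prev_close:
        return "UP"
    if close < prev_close:
        return "DOWN"
    return "EQUAL" if merge_equal_into is None else merge_equal_into

def count_directions(closes, merge_equal_into=None):
    """Count the direction of every close relative to the close one bar earlier."""
    pairs = zip(closes, closes[1:])  # (previous close, current close)
    return Counter(close_direction(cur, prev, merge_equal_into) for prev, cur in pairs)

closes = [10.0, 10.5, 10.5, 10.2, 10.8]  # made-up prices for illustration

three_columns = count_directions(closes)
two_columns = count_directions(closes, merge_equal_into="DOWN")
print(dict(three_columns))
print(dict(two_columns))
```

Merging EQUAL into a neighbouring column reduces the table to two columns, which also lowers the degrees of freedom.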

NOTE: Always keep in mind that past behaviour gives no certainty about future behaviour.

A vertical line is drawn at the beginning of the chosen population (max 4990)


Here, the results seem significant.

🔹 GROUPS

It is important to ensure that the groups are formed correctly. All possibilities should be present, and each condition should belong to exactly one group.


In the example above, the two top situations are acceptable; close[1] against close[2] can only be higher, lower or equal.

The two examples at the bottom, however, are very poorly constructed.

Several conditions can be placed in more than 1 group, and some conditions are not integrated into a group. Even if the results are significant, they are useless because of the group formation.

A population count is added as an aid to spot errors in group formation.


In this example, there is a discrepancy between the population and total count due to the absence of a condition.


The results when rsi was between 5-25 are not included, resulting in unreliable results.
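The population-versus-total check can be sketched in Python as follows. The group boundaries mimic the situation described above (nothing covers rsi between 5 and 25), but the rsi values, the group definitions and the function name `check_groups` are made up for this illustration and are not taken from the script.

```python
def check_groups(values, groups):
    """Compare the population with the number of group memberships.

    groups maps a label to a predicate. Returns a tuple
    (population, total_count, overlapping, uncovered), where overlapping
    holds values matched by more than one group and uncovered holds
    values matched by none.
    """
    population = len(values)
    total_count = 0
    overlapping = []
    uncovered = []
    for value in values:
        matches = [name for name, belongs in groups.items() if belongs(value)]
        total_count += len(matches)
        if len(matches) > 1:
            overlapping.append(value)
        elif not matches:
            uncovered.append(value)
    return population, total_count, overlapping, uncovered

rsi_values = [3.0, 12.0, 20.0, 40.0, 60.0, 80.0]  # made-up sample

# Badly formed groups: no group covers rsi > 5 and rsi <= 25.
groups_with_gap = {
    "rsi <= 5": lambda r: r <= 5,
    "rsi > 25 and rsi <= 50": lambda r: 25 < r <= 50,
    "rsi > 50": lambda r: r > 50,
}

population, total_count, overlapping, uncovered = check_groups(rsi_values, groups_with_gap)
print(population, total_count, overlapping, uncovered)
```

A total count below the population points to a missing group, and a total count above it points to overlapping groups. Matching counts alone are not proof of correct groups, since an overlap and a gap can cancel out, which is why the sketch also lists the offending values.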

🔹 PRACTICAL EXAMPLES

In this example, we have specific groups where the condition only applies to that group.
For example, the condition rsi > 55 and rsi <= 65 isn't true in another group.
Also, every possible rsi value (0 - 100) is present in 1 of the groups.

rsi > 15 and rsi <= 25: 28 times UP, 19 times DOWN and 2 times EQUAL. P-value: 0.01171

When looking in detail and examining the area 15-25 RSI, we see this:


The population is now not representative (only checking for RSI between 15-25; all other RSI values are not included), so we can ignore the P-value in this case. It is merely to check in detail. In this case, the RSI values 23 and 24 seem promising.

NOTE: We should check what the close price did without any condition.
If, for example, the close price had risen 100 times out of 100, this would make things very relative.

Because at least two conditions need to be present, we set one condition to 'always true' and another to 'always false', so that we get only the close values without any condition:


Changing the population or the conditions will change the P-value.





In the following example, the outcome is evaluated when:

  • close value from 1 bar back is higher than the close value from 2 bars back
  • close value from 1 bar back is lower than or equal to the close value from 2 bars back


Or:
  • close value from 1 bar back is higher than the close value from 2 bars back
  • close value from 1 bar back is equal to the close value from 2 bars back
  • close value from 1 bar back is lower than the close value from 2 bars back


In both examples, all possibilities of close[1] against close[2] are included in the calculations. close[1] can only be higher than, equal to or lower than close[2].

Both examples also include the results without a condition (5 = 5, which is always true, and 5 < 5, which is always false), so one can compare the direction of the current close.


🔶 NOTES

• Always keep in mind that:
  • Past behaviour gives no certainty about future behaviour.
  • Everything depends on time, cycles, events, fundamentals, technicals, ...

• This test only works for categorical data (data in categories), such as Gender {Men, Women} or color {Red, Yellow, Green, Blue} etc., but not numerical data such as height or weight. One might argue that such tests shouldn't use rsi, close, ... values.

• Consider what you're measuring

For example, the rsi of the current bar is calculated from the current close, so a condition on the current rsi is not independent of whether the close is higher than the previous close; this is inherent to the rsi calculation.


• Be careful; often, there are na-values at the beginning of the series, which are not included in the calculations!


• Always consider what the close price did without any condition.

• The numbers must be large enough. Each entry must be five or more. In other words, it is vital to make the 'population' large enough.

• The code can be developed further, for example, by splitting UP and DOWN into close UP 1-2%, close UP 2-3%, close UP 3-4%, ...

• rsi can be supplemented with stochRSI, MFI, sma, ema, ...
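The note above about entries being five or more can be checked with a short Python sketch. It rests on an assumption: the common rule of thumb is usually stated for the expected counts, so that is what is tested here. The function names are hypothetical, and the table is the movie example from the USAGE section.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def small_expected_cells(table, minimum=5.0):
    """List (row, column, expected) for every cell whose expected count is below minimum."""
    flagged = []
    for i, row in enumerate(expected_counts(table)):
        for j, expected in enumerate(row):
            if expected < minimum:
                flagged.append((i, j, expected))
    return flagged

# Movie example: rows men, women; columns romantic, non-romantic.
movie = [[2, 8],
         [7, 3]]

flagged = small_expected_cells(movie)
print(flagged)
```

For the movie table, both cells in the romantic column have an expected count of 4.5, which is one more reason to treat that small example with caution.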


🔶 SETTINGS

🔹 Population

• Choose the population size; in other words, how many bars you want to go back. If fewer bars are available than set, this will be automatically adjusted.

🔹 Inputs

At least two conditions need to be chosen.


• Users can add up to 11 conditions, where each condition can combine two different sub-conditions.

🔹 RSI

• Length

🔹 Levels

• Set the used levels as desired.

🔹 P-value

• P-value: P-value retrieved using a standard table method or a function.

• The function used is derived from the 'Chi-Square Distribution Function' JavaScript implementation:

LogGamma(Z) =>
	S = 1 
      + 76.18009173   / Z 
      - 86.50532033   / (Z+1)
      + 24.01409822   / (Z+2)
      - 1.231739516   / (Z+3)
      + 0.00120858003 / (Z+4)
      - 0.00000536382 / (Z+5)

	(Z-.5) * math.log(Z+4.5) - (Z+4.5) + math.log(S * 2.50662827465)

Gcf(float X, A) =>        // Good for X > A +1
	A0=0., B0=1., A1=1., B1=X, AOLD=0., N=0
	while (math.abs((A1-AOLD)/A1) > .00001) 
		AOLD := A1
		N    += 1
		A0   := A1+(N-A)*A0
		B0   := B1+(N-A)*B0
		A1   := X*A0+N*A1
		B1   := X*B0+N*B1
		A0   := A0/B1
		B0   := B0/B1
		A1   := A1/B1
		B1   := 1
	Prob      = math.exp(A * math.log(X) - X - LogGamma(A)) * A1
	1 - Prob

Gser(X, A) =>        // Good for X < A +1
	T9 = 1. / A
	G  = T9
	I  = 1
	while (T9 > G* 0.00001) 
		T9 := T9 * X / (A + I)
		G  := G + T9
		I  += 1
	
	G *= math.exp(A * math.log(X) - X - LogGamma(A))

Gammacdf(x, a) =>
	GI = 0.
	if (x<=0) 
		GI := 0
	else if (x<a+1) 
		GI := Gser(x, a)
	else 
		GI := Gcf(x, a)
	GI

compute(Z, DF) =>
	Chisqcdf  = Gammacdf(Z/2, DF/2)
	Chisqcdf := math.round(Chisqcdf * 100000) / 100000
	pValue    = 1 - Chisqcdf
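To sanity-check the functions above outside TradingView, they can be ported to Python and compared with cases that have a simple closed form: for 2 degrees of freedom the P-value is exp(−Z/2), and for 1 degree of freedom it is erfc(√(Z/2)). The port below is a sketch rather than the script itself: the snake_case names are mine, and Python's `round` may differ from `math.round` on exact ties.

```python
import math

def log_gamma(z):
    # Series approximation of ln(Gamma(z)), same coefficients as LogGamma above.
    s = (1.0 + 76.18009173 / z - 86.50532033 / (z + 1) + 24.01409822 / (z + 2)
         - 1.231739516 / (z + 3) + 0.00120858003 / (z + 4) - 0.00000536382 / (z + 5))
    return (z - 0.5) * math.log(z + 4.5) - (z + 4.5) + math.log(s * 2.50662827465)

def gcf(x, a):
    # Continued-fraction evaluation, intended for x >= a + 1.
    a0, b0, a1, b1, a_old, n = 0.0, 1.0, 1.0, x, 0.0, 0
    while abs((a1 - a_old) / a1) > 1e-5:
        a_old = a1
        n += 1
        a0 = a1 + (n - a) * a0
        b0 = b1 + (n - a) * b0
        a1 = x * a0 + n * a1
        b1 = x * b0 + n * b1
        a0 /= b1
        b0 /= b1
        a1 /= b1
        b1 = 1.0
    prob = math.exp(a * math.log(x) - x - log_gamma(a)) * a1
    return 1.0 - prob

def gser(x, a):
    # Series evaluation, intended for x < a + 1.
    t9 = 1.0 / a
    g = t9
    i = 1
    while t9 > g * 1e-5:
        t9 = t9 * x / (a + i)
        g += t9
        i += 1
    return g * math.exp(a * math.log(x) - x - log_gamma(a))

def gamma_cdf(x, a):
    # Regularized lower incomplete gamma function P(a, x).
    if x <= 0:
        return 0.0
    return gser(x, a) if x < a + 1 else gcf(x, a)

def chi_square_p_value(z, dof):
    # The CDF is rounded to 5 decimals before taking the upper tail, as in compute().
    cdf = round(gamma_cdf(z / 2.0, dof / 2.0) * 100000) / 100000
    return 1.0 - cdf

p_movie = chi_square_p_value(5.0505, 1)  # chi-square of the movie example, 1 degree of freedom
print(round(p_movie, 5))
```

Because the loops stop at a relative tolerance of 0.00001 and the CDF is rounded to five decimals, the result should only be trusted to about four or five decimal places.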


🔶 REFERENCES


LuxAlgo Dev: www.luxalgo.com
PineCoder: www.pinecoders.com

- We cannot control our emotions,
but we can control our keyboard -