Published by Vernon Mitchell. Modified over 9 years ago.
1
Repository Method to suit different investment strategies
Alma Lilia Garcia & Edward Tsang
2
Layout
• Motivation
• Repository Method procedure
• Receiver Operating Characteristic (ROC)
• Experimental design
• Experimental results
• Conclusions
3
Motivation
Many machine learning techniques have been applied to financial problems. Genetic Programming (GP) has been used to predict financial opportunities. However, when the number of profitable opportunities is extremely small, it is very difficult to detect those cases.
4
Confusion Matrix
Ten example cases, each written as (Reality, Prediction): ++, +–, ––, –+, ++, ––, –+, ––, ++, ––

Reality \ Prediction    –    +    total
–                       4    2    6
+                       1    3    4
total                   5    5    10
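The matrix can be tallied mechanically from the ten cases. This is a minimal sketch, assuming (as the slide's "Reality, Prediction" header suggests) that the first symbol of each pair is the real class and the second the predicted class; ASCII "-" stands in for "–", and `confusion_counts` is a hypothetical helper name.

```python
from collections import Counter

# The ten (reality, prediction) pairs from the slide, written with ASCII "+"/"-".
PAIRS = ["++", "+-", "--", "-+", "++", "--", "-+", "--", "++", "--"]


def confusion_counts(pairs):
    """Tally TP, FN, FP, TN from two-character (reality, prediction) strings."""
    cell_of = {("+", "+"): "TP", ("+", "-"): "FN", ("-", "+"): "FP", ("-", "-"): "TN"}
    counts = Counter(cell_of[(p[0], p[1])] for p in pairs)
    # Report every cell, including cells whose count is zero.
    return {cell: counts.get(cell, 0) for cell in ("TP", "FN", "FP", "TN")}


if __name__ == "__main__":
    c = confusion_counts(PAIRS)
    print(c)
    # Row totals of the matrix: real "-" cases and real "+" cases.
    print("reality -:", c["TN"] + c["FP"], " reality +:", c["FN"] + c["TP"])
```

The counts give TN = 4, FP = 2, FN = 1 and TP = 3, which are the four cells of the matrix above.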
5
The problem with few opportunities
10,000 cases: 9,900 are – (99%) and 100 are + (1%). In each matrix, rows are Reality and columns are Predictions.

Ideal prediction: Accuracy = Precision = Recall = 100%
Reality \ Predictions    –        +      total
–                        9,900    0      99%
+                        0        100    1%
total                    99%      1%

Easy score on accuracy: Accuracy = 99%, Precision = ? (no + predictions, so 0/0), Recall = 0%
Reality \ Predictions    –        +      total
–                        9,900    0      99%
+                        100      0      1%
total                    100%     0%

Random move from – to +: Accuracy = 98.02%, Precision = Recall = 1%
Reality \ Predictions    –        +      total
–                        9,801    99     99%
+                        99       1      1%
total                    99%      1%

Moves from – to +: Accuracy = 98.2%, Precision = Recall = 10% (accuracy dropped from 99%)
Reality \ Predictions    –        +      total
–                        9,810    90     99%
+                        90       10     1%
total                    99%      1%
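The arithmetic behind the four scenarios can be checked directly. This sketch uses the (TN, FP, FN, TP) counts read off the slide's matrices; `metrics`, `pct` and the scenario names are hypothetical labels chosen for illustration, and exact fractions are used so that the percentages are not blurred by floating-point rounding.

```python
from fractions import Fraction


def metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall as Fractions; None marks an undefined 0/0."""
    accuracy = Fraction(tp + tn, tn + fp + fn + tp)
    precision = Fraction(tp, tp + fp) if tp + fp else None  # no + predictions
    recall = Fraction(tp, tp + fn) if tp + fn else None      # no real + cases
    return accuracy, precision, recall


# (TN, FP, FN, TP) for 9,900 real "-" cases and 100 real "+" cases.
SCENARIOS = {
    "ideal": (9900, 0, 0, 100),
    "all negative": (9900, 0, 100, 0),
    "random move": (9801, 99, 99, 1),
    "moves": (9810, 90, 90, 10),
}


def pct(x):
    return "?" if x is None else f"{float(x):.2%}"


if __name__ == "__main__":
    for name, cells in SCENARIOS.items():
        acc, prec, rec = metrics(*cells)
        print(f"{name:>12}: accuracy={pct(acc)} precision={pct(prec)} recall={pct(rec)}")
```

Predicting everything as – has the best accuracy of the three imperfect classifiers, yet it is the only one that finds no opportunity at all; this is why the deck turns to precision, recall and ROC analysis.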
6
Repository Method
[Figure: GP generations 1, 2, …, 100; example rules R1 = Var1 > 0.6 and Var2 > Var3, R2 = Var2 > 0.6, …, Rn = …]
GP systems spend a lot of computational resources evolving complete populations over several generations. However, the standard procedure is to choose only the best individual of the evolution as the solution to the problem. The objective of the Repository Method is to mine the knowledge acquired by the evolutionary process in order to compile several rules that detect the rare cases in diverse ways. Since the number of positive examples is very small, it is important to gather all available information about them.
7
Repository Method
In order to mine the knowledge acquired by the evolutionary process, the Repository Method evolves a GP to create a population of decision trees R1, R2, …, Rn and then performs the following steps:
1- Rule extraction: rule Rk is selected by precision.
2- Rule simplification: Rk is simplified to R'k.
3- New rule detection: R'k is compared by (genotype) similarity with the rules in the repository, Rα … Rµ.
4- Add rule to the repository: if R'k is a novel rule, R'k is added to the rule repository.
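The four steps can be sketched in code. The slides do not specify how trees are turned into rules, how rules are simplified, or which similarity measure is used, so everything below is an assumption for illustration: each tree is taken to be already flattened into one conjunctive rule over `(variable, op, threshold)` conditions, the simplification `simplify` (keep only the tightest threshold per variable and operator) is hypothetical, and genotype similarity is approximated by the Jaccard overlap of condition sets with a hypothetical cut-off `max_similarity`. The GP run itself is not implemented.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumed representation: a condition is (variable, op, threshold) with op in
# {">", "<"}, and a rule is a conjunction (frozenset) of conditions.
Condition = Tuple[str, str, float]
Rule = FrozenSet[Condition]
Record = Dict[str, float]
Example = Tuple[Record, bool]  # (attribute values, is the case really +?)


def fires(rule: Rule, record: Record) -> bool:
    """True if every condition of the conjunctive rule holds for the record."""
    return all(record[v] > t if op == ">" else record[v] < t for v, op, t in rule)


def precision(rule: Rule, data: Sequence[Example]) -> float:
    """Share of the cases the rule flags as + that really are +."""
    hits = [label for record, label in data if fires(rule, record)]
    return sum(hits) / len(hits) if hits else 0.0


def simplify(rule: Rule) -> Rule:
    """Hypothetical simplification: keep only the tightest threshold per (variable, op)."""
    tightest: Dict[Tuple[str, str], float] = {}
    for v, op, t in rule:
        if (v, op) not in tightest:
            tightest[(v, op)] = t
        elif op == ">":
            tightest[(v, op)] = max(tightest[(v, op)], t)
        else:
            tightest[(v, op)] = min(tightest[(v, op)], t)
    return frozenset((v, op, t) for (v, op), t in tightest.items())


def similarity(a: Rule, b: Rule) -> float:
    """Hypothetical genotype similarity: Jaccard overlap of the condition sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def update_repository(repository: List[Rule], population: Sequence[Rule],
                      data: Sequence[Example], precision_threshold: float,
                      max_similarity: float = 0.75) -> List[Rule]:
    for rule in population:
        # 1- Rule extraction: skip rules below the precision threshold.
        if precision(rule, data) < precision_threshold:
            continue
        # 2- Rule simplification.
        candidate = simplify(rule)
        # 3- New rule detection: novel only if not too similar to any stored rule.
        if all(similarity(candidate, stored) < max_similarity for stored in repository):
            # 4- Add rule to the repository.
            repository.append(candidate)
    return repository


# Toy data and a toy "population" (each tree already flattened into one rule).
DATA: List[Example] = [
    ({"Var1": 0.9, "Var2": 0.7}, True),
    ({"Var1": 0.8, "Var2": 0.2}, False),
    ({"Var1": 0.1, "Var2": 0.9}, True),
    ({"Var1": 0.2, "Var2": 0.1}, False),
]
POPULATION: List[Rule] = [
    frozenset({("Var1", ">", 0.6), ("Var1", ">", 0.5), ("Var2", ">", 0.6)}),
    frozenset({("Var2", ">", 0.6)}),
    frozenset({("Var1", ">", 0.6), ("Var2", ">", 0.6)}),  # same as rule 1 once simplified
]

if __name__ == "__main__":
    for rule in update_repository([], POPULATION, DATA, precision_threshold=0.9):
        print(sorted(rule))
```

In the toy run the third rule is rejected as a duplicate of the simplified first rule, while the second rule is kept because it detects a positive case the first one misses; this mirrors the aim of collecting rules that detect rare cases "in diverse ways".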
8
Repository Method
[Figure: the rules R1, R2, …, Rn are filtered at PT = 1 (repository R1, R2, …, Rs), PT = .90, … (repository R1, R2, …, Rt), PT = .05, …]
An interesting question arises: which Precision Threshold (PT) is best for selecting rules? We propose trying different precision thresholds in order to generate different classifications. Every repository produces a different classification.
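A sweep over precision thresholds can be sketched as follows. The rule pool, the test-case IDs and the class labels are all hypothetical, and so is the voting scheme: the sketch assumes a repository labels a case + when at least one of its rules fires. Each threshold yields one repository, one classification and therefore one (false positive rate, true positive rate) point.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical rule pool: (training precision, IDs of test cases the rule fires on).
RULES: Dict[str, Tuple[float, FrozenSet[int]]] = {
    "R1": (1.00, frozenset({0})),
    "R2": (0.92, frozenset({0, 1})),
    "R3": (0.40, frozenset({2, 5})),
    "R4": (0.10, frozenset({3, 6, 7})),
}
POSITIVES: Set[int] = {0, 1, 2, 3}  # hypothetical test cases that are really +
N_CASES = 10                         # hypothetical test-set size (IDs 0..9)


def build_repository(pt: float) -> List[str]:
    """Keep every rule whose precision reaches the precision threshold PT."""
    return [name for name, (prec, _) in RULES.items() if prec >= pt]


def classify(repository: List[str]) -> Set[int]:
    """Assumed voting scheme: a case is + if at least one repository rule fires."""
    predicted: Set[int] = set()
    for name in repository:
        predicted |= RULES[name][1]
    return predicted


def roc_point(predicted: Set[int]) -> Tuple[float, float]:
    """(false positive rate, true positive rate) of one classification."""
    tp = len(predicted & POSITIVES)
    fp = len(predicted - POSITIVES)
    return fp / (N_CASES - len(POSITIVES)), tp / len(POSITIVES)


if __name__ == "__main__":
    for pt in (1.0, 0.90, 0.30, 0.05):
        repo = build_repository(pt)
        print(pt, repo, roc_point(classify(repo)))
```

Lowering PT admits more rules, so the classification moves from the conservative corner (few false alarms, few detections) towards the liberal one (more detections, more false alarms).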
9
ROC space
The Receiver Operating Characteristic (ROC) has been used extensively in Machine Learning to measure the performance of classifiers. A single classification produces a point in ROC space. However, some classifiers are able to produce a range of classifications; in that case they produce a curve that moves from the liberal to the conservative area.
10
ROC Curve
The main advantages of using ROC are:
1) It is able to deal with imbalanced classes.
2) It is able to deal with classifiers that produce a range of classifications.
3) It lets the user calculate the best trade-off between misclassifications and false alarms.
The Area Under the ROC curve (AUC) has been widely used to measure and compare the performance of different classifiers.
$\text{Slope} = \frac{\mu(1-\rho)}{\beta\rho}$, where $\rho$ is the proportion of + cases.
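A small sketch of the two quantities mentioned on this slide. The slides do not say how AUC was computed; the trapezoidal rule below is one common choice, and it assumes the points lie on a monotone curve anchored at (0, 0) and (1, 1). The slides also do not define μ and β; the sketch assumes the usual iso-performance-line reading, with μ the cost of a false alarm and β the cost of a missed opportunity. The names `auc`, `iso_slope` and `best_point` and the example points are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def auc(points: Sequence[Point]) -> float:
    """Trapezoidal area under the ROC curve through the points,
    anchored at (0, 0) and (1, 1). Assumes the points form a monotone curve."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def iso_slope(mu: float, beta: float, rho: float) -> float:
    """Slope = mu (1 - rho) / (beta rho), with rho the proportion of + cases.
    Assumed reading: mu = cost of a false alarm, beta = cost of a missed opportunity."""
    return mu * (1 - rho) / (beta * rho)


def best_point(points: Sequence[Point], slope: float) -> Point:
    """ROC point first touched by an iso-performance line of this slope,
    i.e. the point that maximises TPR - slope * FPR."""
    return max(points, key=lambda p: p[1] - slope * p[0])


if __name__ == "__main__":
    curve = [(0.0, 0.5), (1 / 6, 0.75), (0.5, 1.0)]  # hypothetical ROC points
    print(round(auc(curve), 4))
    print(best_point(curve, iso_slope(mu=1.0, beta=1.0, rho=0.25)))
    print(best_point(curve, iso_slope(mu=1.0, beta=3.0, rho=0.25)))
```

A steep slope (rare + cases or expensive false alarms) selects a conservative point; making missed opportunities more expensive flattens the slope and moves the choice towards the liberal side, which is the investor trade-off the deck describes.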
11
Experimental design
The aims of this work are: 1) to show that RM is able to produce a range of solutions capable of suiting the investor's requirements; 2) to analyze the influence of the evolutionary process on RM's performance. For that purpose, RM was tested in the following experiments:
Experiment 1: RM on random trees. RM collects rules from P0, a random population of decision trees. RM's performance is expected to be low because the decision trees are random.
Experiment 2: RM on partially evolved trees. RM gathers rules from P10, a population that has been evolved for 10 generations.
Experiment 3: RM on trees from different generations. RM collects and accumulates rules from P10, P20, …, P100; that is, every ten generations RM collects new rules and adds them to the rules accumulated so far.
12
Experimental results
[Figure: ROC curves plotted by Experiments 1, 2 and 3]
A standard GP result: Recall = 14%, Precision = 5% and Accuracy = 89%. This result is plotted at (0.09, 0.14).
13
Conclusions
It has been shown that RM offers a range of solutions to suit the risk guidelines of investors. Thus users can choose the best balance between misclassifications and false alarms according to their requirements. This makes RM a valuable tool for investors in balancing between not making mistakes and not missing opportunities. RM is able to extract predictive rules even from the earliest stages of the evolutionary process. The significance of this is twofold: (a) RM can potentially shorten the time spent in evolutionary computation; and (b) effort in the early part of the search is not wasted. However, to create a wider range of solutions, it is advisable to evolve the solutions at least past the exploration phase, especially when the problem is complex.
14
Questions?
15
Data set description
The Barclays stock data set consists of prices from March 1998 to January 2005. The attributes of each record are indicators derived from financial technical analysis. Technical analysis is used in financial markets to analyze the behaviour of stock prices; it is based mainly on historical price and volume trends. The indicators were calculated from the daily closing price.
16
Experimental results
The result of the standard GP is: recall = 14%, precision = 5% and accuracy = 89%. This result is plotted at (0.09, 0.14) in the ROC graph, which describes a conservative prediction. Figure 4 displays the ROC curves plotted by RM in the following experiments:
• Experiment 1: Using P0, AUC = .69. As can be observed from Figure 4, the majority of the points are clustered in the conservative part of the ROC curve because those classifications did not detect any positive case. However, RM was able to generate an interesting choice for the investor: when PT = 20%, recall = 38%, precision = 9% and accuracy = 87% (see Table V).
• Experiment 2: Using P10, the performance of RM increased considerably; the AUC increased from 0.69 to 0.74. In this experiment RM offers two valuable choices, PT = 30% and PT = 20%. The latter option provides recall = 63% and accuracy = 81%. As Table V shows, one of these choices lies on the conservative side and the other on the liberal side of the ROC curve.
17
Confusion Matrix
Reality \ Predictions    –           +           total
–                        400 (TN)    50 (FP)     450
+                        200 (FN)    350 (TP)    550
total                    600         400         1,000

True Positive Rate (recall) = TP/(TP+FN) = 350/(350+200) = 63.6%
False Positive Rate = FP/(FP+TN) = 50/(50+400) = 11.1%
Precision = TP/(TP+FP) = 350/(350+50) = 87.5%
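The three formulas can be reproduced from the four cells. This sketch uses the slide's counts; the helper name `rates` is hypothetical, and accuracy is added only as a derived check from the same counts. The resulting (FPR, TPR) pair, about (0.111, 0.636), is where this classification would sit in ROC space.

```python
def rates(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Rates derived from a 2x2 confusion matrix (rows = reality, columns = prediction)."""
    return {
        "tpr": tp / (tp + fn),                  # true positive rate (recall)
        "fpr": fp / (fp + tn),                  # false positive rate
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
    }


if __name__ == "__main__":
    for name, value in rates(tn=400, fp=50, fn=200, tp=350).items():
        print(f"{name}: {value:.1%}")
```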