1
Position pTreeSet, e.g., Tweet Sentiment Analysis (cell = 1 iff Term in Pos in Doc)
For a Tweetbase, construct Document pTrees (indexed by Term and Position)? Term pTrees are indexed by Doc and Pos; Pos pTrees are indexed by Doc and Term. Doc pTrees are named (indexed) by Term and Position.
[Figure: 3D bit cube with axes Docs (Tweet1, Tweet2, Tweet3, ...), Terms (vocabulary: a, AAPL, all, always, an, and, apple, April, are, buy, ...) and Position (1, 2, 3, 4, 5, 6, 7, ...); sample bit rows for Tweet1: buy 0100…, AAPL 0000…]
OR across a row gives the Term=a ExistentialTerm pTree. Sum gives the Term=a DocFreq (df) array. Likewise, OR for Existential AAPL pTrees and Sum for AAPL tf.
What about phrases? For 2-word phrases, use a 4D cube (Docs, 1st word, 2nd word, 2-word-phrase start position).
2wdPhraseStartPos pTree, indexed by (buy, AAPL, Tweet1): pre-compute, save and catalog? Or compute as needed by shifting the AAPL TP pTree right 1 bit, then ANDing with the buy TP pTree.
Multilevel 2wdPhraseStartPos pTrees: strides = D, W, W, P-1, where D=#docs, W=#wds, P=#positions.
Sentiment analysis (by doc):
PSB: Positive Sentiment BitMap, 1 iff the doc has positive AAPL sentiment.
PSV: Positive Sentiment ValueArray, measures the positive AAPL sentiment level? A PSV for each term? Might term context change the sentiment? With term position information we should be able to evaluate PSV in context!
PSB and PSV could be derived by hand (humans read tweets and assign a PSB or PSV).
Do we need to use PS minus NS measures? (NS = Negative Sentiment)
Research the literature on Sentiment Analysis (word/doc sentiment assessment software?).
Strategy: Each day buy the stock with the greatest Positive Sentiment Tweet Bloom?
Why might the positions of words be important? E.g., if "buy" and "AAPL" occur close together in tweet position, that is a stronger positive sentiment.
With multilevel Pos pTrees, a positive sentiment bloom: (buy, Tweet1) and (AAPL, Tweet1) close together is a stronger PS than (buy, Tweet1) and (AAPL, Tweet1) merely co-occurring; determine this by a level-1 AND.
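A minimal sketch of the shift-and-AND idea for 2-word phrase start positions, plus the OR (existential) and Sum (tf, df) reductions. It assumes an uncompressed bitmap stored as a Python int with position 1 in the lowest bit (so "shift right 1 bit" moves position p+1 onto p); the toy tweets and all function names are hypothetical, and real pTrees would be compressed, multilevel structures rather than plain ints.

```python
# Hypothetical toy Tweetbase: each tweet is a list of terms; position 1 is the first word.
tweets = {
    "Tweet1": ["buy", "AAPL", "again"],
    "Tweet2": ["all", "buy", "an", "AAPL"],
    "Tweet3": ["April", "apple", "buy", "AAPL"],
}

def term_position_bitmap(words, term):
    """Bitmap as an int: bit (p-1) is 1 iff `term` occupies position p."""
    bits = 0
    for p, w in enumerate(words, start=1):
        if w == term:
            bits |= 1 << (p - 1)
    return bits

def phrase_start_bitmap(words, first, second):
    """Shift the 2nd term's bitmap right by 1 bit, then AND with the 1st term's bitmap."""
    return term_position_bitmap(words, first) & (term_position_bitmap(words, second) >> 1)

def positions(bits):
    """List the 1-based positions whose bit is set."""
    return [i + 1 for i in range(bits.bit_length()) if (bits >> i) & 1]

# Start positions of the phrase "buy AAPL" in each tweet.
phrase = {doc: positions(phrase_start_bitmap(ws, "buy", "AAPL")) for doc, ws in tweets.items()}

# Sum over positions gives term frequency per doc; OR over positions gives the existential bit.
tf = {doc: bin(term_position_bitmap(ws, "AAPL")).count("1") for doc, ws in tweets.items()}
existential = {doc: int(term_position_bitmap(ws, "AAPL") != 0) for doc, ws in tweets.items()}
df = sum(existential.values())  # document frequency of AAPL

print(phrase, tf, df)
```

Computing the phrase bitmap on demand like this avoids materializing the full Docs x Words x Words x Positions cube; whether that beats pre-computing and cataloging depends on how often each phrase is queried.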
2
This movie doesn't care about cleverness, wit or any other kind of intelligent humor.
Those who find ugly meanings in beautiful things are corrupt without being charming. There are slow and repetitive parts, but it has just enough spice to keep it interesting.
4
My SA observations at this early stage:
Sentiment Analysis (SA) is typically done with NLP these days, and it can certainly miss a signal of a stock rise (e.g., the first example on this page).
SA is typically treated as [sophisticated] querying, not ad hoc data mining.
SA, as data mining, is very similar to anomaly detection, a la The Client.
SA, as a stock market recommender, must be very advanced.
The stock market seems to be driven by at least two 35,000 ft issues: supply and demand; long-term company health.
Since SA is a major driver, contrarianism is a good strategy: if everyone gets an SA signal to buy $AAPL, then the price zooms immediately (due to microsecond trading engines) and soars too quickly for small guys to benefit, except, possibly, for a slower, longer-term benefit from long-term $AAPL health.
Strategy: Attack BigData (too big for horizontalists) and take a contrarian view.
Strategy: Look for stocks with low SA that classify as high potential based on TooBigforVerticalProcessingofHorizontalData (TBVPHD). We compose the TBVPHD TrainingSet over time, based on a long history of 1-class classification of these "DiamondInTheRough (DITR)" stocks that rose contrary to Negative Sentiment.
How will we put together the TBVPHD TrainSet? How will we 1-class (the DITR stocks)?
Contrarian strategies shouldn't be based on queries but on data mining with TBVPHD technologies! For example, doing SA to buy stocks with high SA is a query-based strategy. Even finding other stocks that are related and may therefore also rise (e.g., component supplier companies) is pretty much query based: you know what you are looking for and write a query (albeit possibly a very sophisticated one) to find it. Whereas effective contrarianism (in my very uneducated opinion!) should be true data mining (clustering, classification and ARM).
5
Oblique FAUST: PR = P[(X o d) < a] (more answers to Arjun's question on FAUST Hull Classifiers)
D ≡ mR→mV = oblique vector. d = D/|D|.
From 2013_10_12, Midpoint FAUST Classification: separate classR and classV using the midpoints of means (mom) method. View mR, mV as vectors (mR ≡ vector from origin to pt_mR) and calculate
a = (mR + (mV-mR)/2) o d = ((mR+mV)/2) o d
(The very same formula works when D = mV→mR, i.e., points to the left.)
Training ≡ choosing the "cut-hyper-plane" (CHP), which is always an (n-1)-dimensional hyperplane (which cuts space in two).
Classifying is one horizontal program (AND/OR) across pTrees to get a mask pTree for each entire class (bulk classification).
Improve accuracy? E.g., by considering the dispersion within classes when placing the CHP:
1. Use the vector_of_medians, vom, to represent each class rather than the mean: vomV ≡ (median{v1 | v∈V}, median{v2 | v∈V}, ...).
2. Project each class onto the d-line (e.g., the R-class in the figure); then calculate the std of these distances from the origin along the d-line (one horizontal formula per class, using Md's method); then use the std ratio to place the CHP (no longer at the midpoint between mR [vomR] and mV [vomV]).
[Figure: r and v class points in dim 1 / dim 2 with mR, mV, vomR, vomV, the d-line, the unit vector d and the cut point a.]
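A minimal sketch of the midpoint-of-means cut, using plain Python lists instead of pTrees (so it classifies one point at a time rather than in bulk). The function names and toy classes are hypothetical. `cut_by_std_ratio` is one plausible reading of "use the std ratio to place the CHP", not necessarily the exact rule the slide has in mind.

```python
from math import sqrt
from statistics import pstdev

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train_midpoint(class_r, class_v):
    """Return the unit vector d = D/|D| with D = mV - mR, and a = ((mR + mV)/2) o d."""
    m_r, m_v = mean(class_r), mean(class_v)
    D = [v - r for r, v in zip(m_r, m_v)]
    norm = sqrt(dot(D, D))
    d = [x / norm for x in D]
    a = dot([(r + v) / 2 for r, v in zip(m_r, m_v)], d)
    return d, a

def cut_by_std_ratio(class_r, class_v, d):
    """Assumed variant: slide the cut along the d-line in proportion to each class's std."""
    pr = [dot(x, d) for x in class_r]
    pv = [dot(x, d) for x in class_v]
    mr, mv = sum(pr) / len(pr), sum(pv) / len(pv)
    sr, sv = pstdev(pr), pstdev(pv)
    if sr + sv == 0:
        return (mr + mv) / 2
    return mr + (mv - mr) * sr / (sr + sv)

def classify(x, d, a):
    """x is in class R iff x o d < a."""
    return "R" if dot(x, d) < a else "V"

# Hypothetical toy training classes.
class_r = [(1, 1), (2, 1), (1, 2)]
class_v = [(5, 5), (6, 5), (5, 6)]
d, a = train_midpoint(class_r, class_v)
a_std = cut_by_std_ratio(class_r, class_v, d)
print(classify((2, 2), d, a), classify((5, 5), d, a))
```

On this toy data the two classes have the same spread along the d-line, so the std-ratio cut coincides with the midpoint cut; with unequal spreads the cut moves toward the tighter class.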
6
Convex hull circumscriber, CHX.
FAUST Class Hulls: Use the inside of a Circumscribed Training Class Hull to approximate each class, i.e., use a series of (d,a) pairs, each defining a half-space {z | d o z (<|>) a}. Then the C-hull is &{(d,a)} P[X o d (<|>) a] & PC.
The question remaining: how to determine the series of (d,a) pairs?
1. Choose the next d to be perpendicular to all previous ones (the simplest way is to use e1, e2, ..., en as the d series).
2. Use the diagonals, e's, mean-to-median, mean-to-furthest, ...
3. Start with {ei}. Add a finer and finer grid of unit vectors until the diameter of CIRCX is close to the diameter of C.
Model-based 1-class classifier for HiValue, Durable TrainingSets (e.g., C = 10 yrs of normal activity; looking for anomalous activity). It may be worth the additional training time to continue to better the model in II by trimming the circumscriptor corners further.
Let CIRC1 be the 1st Circumscriptor: for each k, define Lk = {x | xk = minXk}, Hk = {x | xk = maxXk}. Classify x in C iff x is in CIRC1 (minXk ≤ xk ≤ maxXk for every k). (Eliminate outliers 1st? Replace minXk by the lowest count change and maxXk by the highest, or?)
Does C fill the CIRC1 corners? In high dimensions, corners can be huge.
a. Cap each corner with a fitted round cap (r2, r4, ...)? Barrel cap???
b. Diag cap it: e.g., D12 = e1+e2 (Y o D12 = Y1+Y2), D123 = e1+e2+e3 (Y o D123 = Y1+Y2+Y3), etc. Enclosing classes with linear boundaries perpendicular to sums of dim unit vectors and their negatives is good for multi-class too.
c. Use a C-circumscribing barrel wrt each (d,a) (limits the radial reach each time, with a round cap on corners).
d. Use a C-circumscribing sphere.
The ultimate Circumscriber is the convex hull, but algs for computing it are complex, even with VPHD tools created over 500 years. Can we do it with our HPVD tools (created over 10 yrs)?
Circumscribe linear pieces using diagonals: {z | d o z > min z1 + min z2 & d o z < max z1 + max z2} and {z | d o z > min z1 + max z2 & d o z < max z1 + min z2}.
[Figure: 3D example with axes e1, e2, e3 and diagonal D = e1+e2+e3; class points marked c.]
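A minimal sketch of a (d,a)-pair class hull: CIRC1 (the axis-aligned box) and then diagonal caps added on top. Each direction d stores the min and max of d o x over the training class, and a point is accepted iff every projection falls inside its interval. Names and data are hypothetical; here the diagonal bounds are taken as min/max of the projection itself, which is one way to trim corners, and the directions are left unnormalized since the same d is used for fitting and testing.

```python
from itertools import combinations

def project(d, x):
    return sum(di * xi for di, xi in zip(d, x))

def fit_hull(train, directions):
    """For each direction d, store (d, min, max) of d o x over the training class."""
    hull = []
    for d in directions:
        proj = [project(d, x) for x in train]
        hull.append((d, min(proj), max(proj)))
    return hull

def in_hull(x, hull):
    """x is in the hull iff every projection lies within its [min, max] interval."""
    return all(lo <= project(d, x) <= hi for d, lo, hi in hull)

def unit_and_diagonal_directions(n):
    """e1..en followed by the pairwise diagonals ei+ej and ei-ej."""
    dirs = [tuple(1 if i == k else 0 for i in range(n)) for k in range(n)]
    for i, j in combinations(range(n), 2):
        for sign in (1, -1):
            dirs.append(tuple(1 if k == i else sign if k == j else 0 for k in range(n)))
    return dirs

# Hypothetical training class lying in a band along the main diagonal.
train = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 0), (3, 2)]
dirs = unit_and_diagonal_directions(2)
box = fit_hull(train, dirs[:2])    # CIRC1: unit vectors e1, e2 only
capped = fit_hull(train, dirs)     # CIRC1 plus diagonal caps

print(in_hull((0, 3), box), in_hull((0, 3), capped), in_hull((2, 2), capped))
```

The corner point (0, 3) sits inside the box but outside the diagonal cap for e1-e2, which is the false positive that corner trimming is meant to remove.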
[Figure: class points (c) circumscribed with diagonals D = e1+e3 and D = e1-e3. Circumscribe linear pieces {z | d o z > min(zk) & d o z < max(zk)}, k = 1,2 (i.e., d = e1, e2).]
pTree Pillar k-means clustering (k reveals itself):
Choose m1 as a pt that maximizes Distance(X, avgX).
Choose m2 to maximize Dis(X, m1).
Choose m3 to maximize Σh=1..2 Dis(X, mh).
Choose m4 to maximize Σh=1..3 Distance(X, mh).
Do until min h=1..k Distance(X, mh) < Threshold (or do until mk < Threshold). This gives k. Apply pk-means. (Note we already have all the Dis(X, mh)s for the first round.)
D = m1→m2 line. Treat PCCs like parentheses: "(" corresponds to a PCI and ")" corresponds to a PCD. Each matched pair should indicate a cluster somewhere in the slice. Where? One could take the VoM as the best-guess centroid, then proceed by restricting to that slice. Or 1st apply R and do PCC parenthesizing on the R values to identify the radial slice where the cluster occurs, and take the VoM of the combo slice (linear/radial) as the centroid. Apply S to confirm.
Note: this is a possible clustering method for identifying density clusters (as opposed to round or convex clusters).
[Figure: d-line through pillars m2 and m3 with alternating PCI, PCD, PCI, PCD marks.]
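A minimal sketch of the pillar selection loop, on plain tuples rather than pTrees. It reads the stopping rule as "stop when the next candidate pillar is closer than Threshold to an already chosen pillar", which is an assumption about the intended test; the function name, data and threshold are hypothetical.

```python
from math import dist  # Python 3.8+

def choose_pillars(X, threshold):
    """Pick pillars by accumulated distance; the number returned is the revealed k."""
    n = len(X[0])
    avg = tuple(sum(x[i] for x in X) / len(X) for i in range(n))
    # m1: the point furthest from the overall average.
    pillars = [max(X, key=lambda x: dist(x, avg))]
    while True:
        candidates = [x for x in X if x not in pillars]
        if not candidates:
            break
        # Next pillar maximizes the sum of distances to all pillars chosen so far.
        nxt = max(candidates, key=lambda x: sum(dist(x, m) for m in pillars))
        # Assumed stopping rule: the candidate is too close to an existing pillar.
        if min(dist(nxt, m) for m in pillars) < threshold:
            break
        pillars.append(nxt)
    return pillars

# Hypothetical data: three tight groups near (0,0), (10,0) and (0,10).
X = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (0, 10), (1, 10)]
pillars = choose_pillars(X, threshold=3)
print(len(pillars), sorted(pillars))
```

The pillars would then seed pk-means. Because the criterion rewards distance, outliers tend to be picked first, which is why the notes elsewhere talk about restricting the choice to non-outliers.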
7
Why I like the FAUST Linear-Spherical-Radial Serial-Parallel Classifier very much (FAUST LSR SP)
The parallel part lets us build a pair of Linear, Spherical and Radial hull segments for every pTree computation (the more the merrier).
The serial part gives us the possibility of building a hull better than the convex hull! E.g., in a linear step, if we use not only min and max but also PCIs and PCDs, we could potentially do the following: on each PCC interval (not yet well defined in general, but in this example they are [pci1L, pcd1L], (pcd1L, pci2L), [pci2L, pcd2L]) we build hull segments, and then OR them.
[Figure: d-line with minL = pci1L, pcd1L, pci2L and maxL = pcd2L; the convex hull is shown in orange (lots of false positives).]
In general, if non-outlier pillars m1..mi-1 have been chosen, choose mi from nop(X, {m1, ..., mi-1}) (i.e., mi maximizes Σk=1..i-1 dis2(X, mk) and is a non-outlier). (Instead of using Smi or D2NN to eliminate outliers each round, one might get better pillars by constructing Lmi-1mi: X→R, eliminating outliers that show up on L, then picking the pillar to be the mean (or vector of medians) of the slice L^-1[(3PCC1+PCC2)/4, PCC2)?)
A PCC Pillar pkmeans clusterer: Assign each (object, class) a ClassWeightReals (all CW init at 0). Classes are numbered as they are revealed. As we are identifying pillar mj's, compute Lmj = X o (mj - mj-1) and
1. For the next larger PCI in Ld(C), left-to-right:
1.1a If followed by a PCD, Ck ≡ Avg(Ld^-1[PCI, PCD]) (or VoM). If Ck is the center of a sphere-gap (or barrel gap), declare Classk and mask off.
1.1b If followed by another PCI, declare next Classk = the sphere-gapped set around Ck = Avg(Ld^-1[(3PCI1+PCI2)/4, PCI2)). Mask it off.
2. For the next smaller PCD in Ld from the left side:
2.1a If preceded by a PCI, declare next Classk = subset of Ld^-1[PCI, PCD] sphere-gapped around Ck = Avg. Mask off.
2.1b If preceded by another PCD, declare next Classk = subset of same, sphere-gapped around Ck = Avg(Ld^-1([PCD2, (PCD1+PCD2)/4])). Mask off.
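A rough sketch of building linear hull segments on PCC intervals and OR-ing them. The notes say the PCC intervals are not yet well defined in general, so everything here is an assumption: PCI/PCD are read as precipitous count increase/decrease, detected as a jump of at least `jump` between adjacent fixed-width histogram bins of the projection values L = X o d, and each PCI is paired with the next PCD like a parenthesis. The function names, bin width and data are hypothetical.

```python
def pcc_intervals(values, bin_width, jump):
    """Return [start, end] intervals opened by a PCI and closed by the next PCD."""
    lo = min(values)
    nbins = int((max(values) - lo) // bin_width) + 1
    counts = [0] * nbins
    for v in values:
        counts[int((v - lo) // bin_width)] += 1
    padded = [0] + counts + [0]        # sentinel empty bins at both ends
    intervals, start = [], None
    for i in range(1, len(padded)):
        change = padded[i] - padded[i - 1]
        edge = lo + (i - 1) * bin_width  # boundary between the two bins being compared
        if change >= jump and start is None:          # PCI opens an interval
            start = edge
        elif change <= -jump and start is not None:   # PCD closes it
            intervals.append((start, edge))
            start = None
    return intervals

def in_segments(value, intervals):
    """OR of the per-interval hull segments along the d-line."""
    return any(a <= value <= b for a, b in intervals)

# Hypothetical projections of one class onto the d-line: two dense runs with a hole between.
L = [0, 0.2, 0.4, 0.5, 0.8, 5.0, 5.1, 5.3, 5.6, 5.9]
intervals = pcc_intervals(L, bin_width=1, jump=3)
print(intervals, in_segments(3.0, intervals), min(L) <= 3.0 <= max(L))
```

A point projecting to 3.0 passes the plain [minL, maxL] test but is rejected by the OR of PCC segments, which is the sense in which the serial hull can be tighter than the convex hull.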
8
Clustering with Oblique FAUST using cylindrical gaps (building cylindrical gaps around round clusters). From 2013_10_12.
What if clusters cannot be isolated by oblique hyperplanes? We make gaps local by adding, to the d-line planar gaps, cylindrical gaps around the d-line.
[Figure: point y, base point p and unit vector d; ypd gap > GWT along the d-line and ypdr gap > GWT^2 radially.]
On a dataset Y, we use 2 real-valued functionals (or SPTSs) to define cylindrical gaps: ypd = (y-p) o d (for planar gaps) and ypdr = (y-p) o (y-p) - ((y-p) o d)^2 (for cylinder gaps), where d = unit_vector.
Question: Are gaps in these SPTSs independent of p? I.e., can we simplify and always take p = origin?
(y-p) o d = y o d - p o d is a shift of y o d by the constant p o d, so ypd gaps do not depend on p.
But (y-p) o (y-p) = y o y - 2 p o y + p o p is not a constant shift of y o y, and
(y-p) o (y-p) - ((y-p) o d)^2 = y o y - (y o d)^2 - 2 p o y + 2 (p o d)(y o d) + p o p - (p o d)^2
is not a constant shift of y o y - (y o d)^2: the terms -2 p o y + 2 (p o d)(y o d) vary with y unless p is a multiple of d.
So the answer seems to be NO. We find ypd gaps using y o d, but need p (and d) when finding ypdr gaps.
We search (p,d,r) for large gaps in the ypdr and ypd SPTSs. We pick a gap width threshold (GWT) and search for (p,d,r) for which Gap(ypdr) > GWT^2 and Gap(ypd) > GWT. So we need a pTree-based [log(n) time] Gap Finder for these cylindrical gaps.
Note that the ypd gap situation changes when you change r, and the ypdr gap situation changes when you change d or p, so we can't just search for ypd gaps and then search for ypdr gaps, or vice versa.
A pTree-based Cylindrical Gap Finder (CGF):
1. Choose a small initial radius, r0 (a*global_density/(n-1) for some a?).
2. Create an r0 cylinder mask about the pd-line (a round cluster through which the pd-line runs should reveal ypd gaps even if the cylinder doesn't enclose the cluster).
3. Identify gaps in the ypdr SPTS after it is masked to the space between ypd gap1 and gap2.
[Figure: pd-line with the r0 cylinder, ypd gap1 and gap2, the ypd gap1,gap2 mask and the ypd,r gap.]
Are there problems here? Yes: what if the pd-line does not pierce our cluster near its center? Next slide.
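A small numeric check of the p-dependence question, computing ypd = (y-p) o d and ypdr = (y-p) o (y-p) - ((y-p) o d)^2 directly on tuples. The gap finder here just sorts the values and scans, a stand-in for the pTree-based log(n) gap finder the notes call for; data, names and the threshold are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def ypd(y, p, d):
    """Signed distance of y along the d-line through p (d is a unit vector)."""
    return dot(sub(y, p), d)

def ypdr(y, p, d):
    """Squared radial distance of y from the d-line through p."""
    w = sub(y, p)
    return dot(w, w) - dot(w, d) ** 2

def gaps(values, gwt):
    """Consecutive sorted values more than gwt apart, as (low, high) pairs."""
    s = sorted(values)
    return [(a, b) for a, b in zip(s, s[1:]) if b - a > gwt]

# Hypothetical data: two small groups strung along the x-axis.
Y = [(0, 0), (1, 0), (0.5, 0.5), (10, 0), (11, 0)]
d = (1, 0)

gaps_origin = gaps([ypd(y, (0, 0), d) for y in Y], gwt=3)
gaps_moved = gaps([ypd(y, (5, 7), d) for y in Y], gwt=3)

# How much each point's ypdr changes when p moves off the d-line through the origin.
shifts = [ypdr(y, (0, 2), d) - ypdr(y, (0, 0), d) for y in Y]

print(gaps_origin, gaps_moved, shifts)
```

The planar gap keeps its width when p moves (only its location shifts by p o d), while the ypdr values shift by different amounts for different points, consistent with needing p when searching for cylindrical gaps.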
9
Clustering with Oblique FAUST using cylindrical gaps 2 (from 2013_10_12)
What if the pd-line doesn't pierce the cluster at its widest? The yp,d gaps are still revealed, but the yp,d,r gaps may not be!
[Figure: pd-line with the r0 cylinder, yp,d gap1 and gap2, the yp,d gap1,gap2 mask and the yp,d,r gap.]
Solution ideas?
1. Before cylinder masking (step 1), move p using gradient descent on the yp,d gap width?
2. Before cylinder masking (step 1), move (p,d) using gradient descent and line search to minimize the variance of ypd.
3. Identify dense cylinders that get pieced together later?
4. Maximize each dense cylinder before finding the next?
5. We know we are in a cluster (by virtue of the fact that there are substantial yp,d gap1 and gap2), so we then move to neighboring (touching) cylinders with similar density (since they are touching there is no gap, and we are confident that we are in the same cluster).
6. If we are clustering to identify outliers (anomaly detection), then the clusters we want to identify are singleton [or doubleton] sets. We can simply test each "cluster" between two gaps (and especially the end ones) for outliership. (Note that we will always pierce outlier clusters at their widest.)
7. For (p,d,r), r very small, do a 2D planar search on t=(tp,td) to maximize the variance of ypd inside the r-cylinder. This is not a gradient search but a heuristic search: the variance may not be continuously differentiable in p and d, since the set of y changes as you change p or d (keeping r fixed). Also, the SPTS ypd must be recalculated every time you change p or d.
8. For (p,d), do a 2D planar search on t=(tp,td) to maximize the variance of ypd over the entire space, Y. Then gradient descend to maximize the variance, then 2D planar search, ...
CGF (Cylindrical Gap Finder):
1. Create a small-radius cylinder (r0 = a*global_density/(n-1) for some a?) about the pd-line.
2. Identify gaps in the yp,d,r SPTS after it's masked to the space between yp,d gap1 and gap2.
10
AAPL20 Document pTrees (1 level) (indexed by Term and Position)
Term T# Doc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 This is the Doc pTree for Term#1 =2015, Pos=1 Apple call day AAPL area chart ahead big check happy alert bond cours helpi all break custo her
11
AAPL20 Document pTrees (1 level) (indexed by Term and Position)
Term T# Doc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 https https iwatc https https join https https hype keep https https info lesso https inves list
12
AAPL20 Document pTrees (1 level) (indexed by Term and Position)
Term T# Doc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 looki momen nice peeps losin money onlin plans marke more out play mento multi pass poten might new pay progr
13
AAPL20 Document pTrees (1 level) (indexed by Term and Position)
Term T# prosp sale Starb take reaso send stock three run sensi stree today runne share subs trade safe sign swing up
14
AAPL20 Document pTrees (1 level) (indexed by Term and Position)
Term T# value year $GMCR $TWTR video $AAPL $GOOG visit $BAC $IDRA week $DNKN $JAKK winne $FB $SBUX
15
AAPL20 Document 2lev pTrees (indexed by Term, Position; Predicate=npz; strides=5)
Term T# Pos stride : 01000 00001 00100 check 13 1: 01000 00100 24 10: 00100 01000 00001 : 01000 00100 Check 13 13: 01000 00001 https 24 : 01000 00001 Check 13 16: 01000 00100 https 24 16: 01000 00100 AAPL : 01000 00100 00001 AAPL : 01000 00001 course14 3: 01000 00001 AAPL : 01000 10000 25 7: 10000 00100 01000 00001 course14 6: 01000 00100 00001 AAPL : 01000 00100 course14 7: 01000 00100 00001 https 25 : 01000 00001 course14 10: 01000 00100 https 25 16: 01000 00100 ahead 3 6: 01000 00100 00001 course14 13: 01000 00001 ahead 3 16: 01000 00100 00001 course14 16: 01000 00100 https 26 : 01000 00001 https 26 16: 01000 00100 alert 4 3: 01000 00100 00001 customer15 1: 01000 00100 26 17: 10000 00100 01000 00001 alert 4 6: 00100 00011 00001 customer15 10: 01000 00010 00100 alert 4 7: 01000 00100 customer15 13: 01000 00001 00100 alert 4 12: 00010 00100 01000 https 27 : 01000 00001 Alert 4 13: 01000 00001 00100 Day : 01000 00011 https 27 16: 01000 00100 Day : 01000 00001 27 20: 10000 00100 01000 Alert 4 16: 01000 00100 Day : 01000 00100 10000 Day : 01000 00100 All : 01000 00100 10000 00001 28 6: 00100 00001 01000 All : 01000 00001 00100 https 28 : 01000 00001 All : 01000 00100 17 10: 01000 00100 https 28 16: 01000 00100 17 13: 01000 00001 Enail 17 16: 01000 00100 Apple 6 13: 01000 00001 00100 29 7: 00100 Apple 6 16: 01000 00100 https 12: 00100 00011 00001 Happy 18 1: 01000 00001 https 29 13: 01000 00001 Happy 18 13: 01000 00001 Area : 01000 00100 00010 https 29 14: 00010 00100 01000 Happy 18 16: 01000 00100 Area : 01000 00001 00100 https 29 16: 01000 00100 Area : 01000 00100 helpin19 11: 01000 10000 30 11: 01000 00100 00001 helpin19 13: 01000 00001 Big : 00100 00011 00001 helpin19 16: 01000 00100 https 30 13: 01000 00001 Big : 01000 00001 00100 https 30 16: 01000 00100 Big : 01000 00100 Her : 01000 10000 Her : 01000 00001 32 13: 01000 00001 Bond : 01000 00001 00100 Her : 01000 00100 https 32 16: 01000 00100 Bond : 01000 00100 21 8: 00100 00011 00001 Break 10 10: 01000 00100 
00010 https 21 13: 01000 00001 32 13: 01000 00001 Break 10 13: 01000 00001 00100 https 21 16: 01000 00100 https 32 16: 01000 00100 Break 10 16: 01000 00100 https 32 18: 00001 00100 00011 22 7: 00100 01000 00001 Call : 01000 00100 00001 Call : 01000 00100 https 22 13: 01000 00001 Hype : 01000 00001 https 22 16: 01000 00100 Hype : 01000 00100 Chart 12 4: 00001 00100 00011 Chart 12 13: 01000 00001 00100 23 9: 00100 01000 00001 Chart 12 16: 01000 00100 Info : 01000 00100 https 23 13: 01000 00001 Info : 01000 00001 https 23 16: 01000 00100 Info : 01000 00100
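A minimal sketch of how a 2-level pTree with Predicate=npz (not pure zero) and stride 5 could be derived from a flat bit vector: the level-1 bit for a stride is 1 iff that stride contains at least one 1, and only the non-pure-zero strides are kept at level 0. The storage layout and the example vector are assumptions for illustration, not a reading of the table above.

```python
def two_level_ptree(bits, stride=5):
    """Return (level1, level0): level1 has one bit per stride (1 iff not pure zero);
    level0 keeps only the strides whose level-1 bit is 1."""
    padded_len = -(-len(bits) // stride) * stride   # round up to a multiple of stride
    padded = bits.ljust(padded_len, "0")
    strides = [padded[i:i + stride] for i in range(0, padded_len, stride)]
    level1 = "".join("1" if "1" in s else "0" for s in strides)
    level0 = [s for s in strides if "1" in s]
    return level1, level0

def expand(level1, level0, stride=5):
    """Rebuild the flat bit vector from the two levels."""
    kept = iter(level0)
    return "".join(next(kept) if b == "1" else "0" * stride for b in level1)

# Hypothetical Doc pTree over 20 docs for one (Term, Pos) pair: docs 7 and 18 contain it.
flat = "00000" + "01000" + "00000" + "00100"
level1, level0 = two_level_ptree(flat)
print(level1, level0)
```

Pure-zero strides cost one level-1 bit each, which is where the compression comes from on sparse Term/Position data.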
16
AAPL20 Doc 2lev pTrees (indexed by Term, Position; Predicate=npz; strides=5)
Term T# Pos AAPL20 Doc 2lev pTrees (indexed by Term, Position; Predicate=npz; strides=5) stride New : 01000 00100 10000 Investor35 13: 01000 00001 00100 New : 01000 00100 10000 Investor35 16: 01000 00100 New : 01000 00001 00100 New : 01000 00100 Iwatch : 01000 10000 00100 Runners 64 6: 01000 00100 Runners 6413: 01000 00001 00100 Nice : 00001 00100 10000 01000 Join : 01000 00001 00100 Runners 6416: 01000 00100 Nice : 01000 00100 10000 Join : 01000 00100 Nice : 01000 00100 10000 00010 Safe : 01000 10000 Nice : 01000 00100 10000 Safe : 01000 00001 00100 keep : 01000 00010 00100 Nice : 01000 00001 00100 Safe : 01000 00100 keep : 01000 00001 00100 Nice : 01000 00100 keep : 01000 00100 online : 10000 00100 01000 Sale : 01000 00100 00001 lessons 39 11: 01000 00010 00100 online : 01000 00001 00100 Sale : 01000 00001 00100 lessons 39 13: 01000 00001 00100 online : 01000 00100 Sale : 00100 10000 01000 lessons 39 16: 01000 00100 Send : 01000 00100 out : 01000 10000 Send : 01000 00001 00100 List : 00010 00100 01000 out : 00010 00100 10000 01000 Send : 00100 10000 01000 List : 01000 00001 00100 out : 01000 00001 00100 List : 01000 00100 out : 01000 00100 sensitive68 5: 01000 10000 00100 sensitive6813: 01000 00001 00100 Looking 41 2: 00001 00010 00100 01000 pass : 01000 00011 Sensitive6816: 00100 10000 01000 Looking 41 13: 01000 00001 00100 pass : 01000 00001 00100 Looking 41 16: 01000 00100 pass : 01000 00100 Shares : 01000 10000 00100 Shares : 01000 00001 00100 Losing : 00001 00010 00100 01000 Shares : 00100 10000 01000 Losing : 01000 00001 00100 Pay : 01000 00001 00100 Pay : 01000 00100 10000 Sign : 00100 10000 01000 Losing : 01000 00100 Pay : 01000 00100 Sign : 00010 00100 01000 Market : 00001 00010 00100 01000 Sign : 01000 00001 00100 Sign : 00100 10000 01000 Market : 01000 00001 00100 Plans : 01000 00100 00001 Market : 01000 00100 Plans : 01000 00001 00100 Starbucks71 4: 01000 10000 00100 Mentor : 10000 00010 00100 01000 Plans : 00100 10000 01000 starbucks7113: 01000 00001 
00100 Mentor : 01000 00001 00100 starbucks7116: 00100 10000 01000 Mentor : 01000 00100 Play : 01000 00001 00100 Play : 00100 10000 01000 Stock : 01000 10000 00100 Might : 01000 00010 00100 Stock : 00010 10000 00100 01000 Might : 01000 00001 00100 Stock : 01000 00001 00100 Might : 01000 00100 potential5913: 01000 00001 00100 Stock : 00100 10000 01000 Potential5916: 00100 10000 01000 Moments 46 6: 01000 00010 00100 Moments 46 13: 01000 00001 00100 Program : 01000 00011 00100 Street : 10000 00100 01000 Moments 46 16: 01000 00100 Program : 00100 10000 01000 Street : 01000 00001 00100 Street : 00100 10000 01000 Money : 01000 00010 00100 Prosperou61 10: 01000 00100 10000 Money : 01000 00001 00100 Subs : 01000 00100 10000 prosperou61 13: 01000 00011 00100 Money : 01000 00100 subs : 01000 00001 00100 prosperou61 16: 00100 10000 01000 Subs : 00100 10000 01000 More : 01000 00100 Reasons : 01000 00100 More : 01000 00001 00100 Swing : 00001 10000 00100 01000 reasons : 01000 00011 00100 More : 01000 00100 Swing : 01000 00001 00100 reasons : 00100 10000 01000 Swing : 00100 10000 01000 Multi-day49 5: 01000 00100 Multi-day4913: 01000 00001 00100 Run : 01000 00100 00010 Multi-day4916: 01000 00100 Run : 01000 00100 10000 Take : 01000 00100 Run : 01000 00011 00100 Take : 01000 00001 00100 Run : 00100 10000 01000 Take : 00100 10000 01000
17
AAPL20 Doc 2lev pTrees (indexed by Term, Position; Predicate=npz; strides=5)
Term T# Pos AAPL20 Doc 2lev pTrees (indexed by Term, Position; Predicate=npz; strides=5) stride $AAPL : 00010 00100 00001 Today : 00010 10000 00100 01000 $AAPL : 01000 00001 00100 Today : 00010 00100 $AAPL : 10000 01000 01100 Today : 01000 00001 00100 $AAPL : 00010 00001 00011 Today : 00100 10000 01000 $AAPL : 00100 10000 01000 $AAPL : 01000 00001 00100 00110 Trade : 00010 00001 00100 $SBUX : 01000 00010 00100 10000 $AAPL : 00101 10000 00100 01000 Trade : 00010 00001 01000 $SBUX : 01000 00001 00100 TRADE : 00100 01000 00011 $AAPL : 00010 00100 00001 $SBUX : 01000 00010 00100 10000 $AAPL : 01000 00100 10000 TRADE : 00100 01000 00011 $SBUX : 00100 10000 01000 $AAPL : 01000 00100 10000 TRADE : 00100 01000 00011 10100 $AAPL : 01000 00010 00100 Trade : 00001 10000 00100 01000 $TWTR : 01000 00001 00100 Trade : 00010 00100 $TWTR : 10000 00100 01100 $BAC : 01000 00001 00100 Trade : 01000 00001 00100 $TWTR : 01000 00100 00011 10000 $BAC : 00100 10000 01000 Trade : 00100 10000 01000 $TWTR : 00001 01000 10000 $BAC : 00010 00100 10000 $TWTR : 01000 00101 00100 $TWTR : 00010 00100 00110 Term : 00100 01000 00011 $DNKN : 01000 00001 00100 $TWTR : 00100 00010 10000 Term : 00010 00011 01000 $DNKN : 00100 10000 01000 $TWTR : 00100 10000 11000 Term : 01000 00001 00100 $DNKN : 01000 00010 00100 10000 $TWTR : 00010 00100 10000 Term : 00100 10000 01000 $TWTR : 01000 00100 10000 $FB : 01000 00001 00100 value : 01000 00001 00100 $FB : 10000 00001 01100 00010 : 01000 00001 00100 Value : 00100 10000 01000 $FB : 00010 00001 00011 : 00100 10000 01000 $FB : 00100 01000 10000 Video : 00100 01000 00011 $FB : 00100 10000 01000 : 00100 00010 10000 video : 01000 00001 00100 $FB : 00100 10000 00110 : 01000 00001 00100 video : 00100 10000 01000 $FB : 00101 00001 10000 01000 : 00100 10000 01000 $FB : 00100 10000 01000 $FB : 00010 00100 10000 visit : 00100 10000 01000 $FB : 01000 00100 10000 : 00010 00100 visit : 01000 00001 00100 $FB : 00100 00010 10000 01000 : 01000 00001 00100 visit : 00100 10000 
01000 $FB : 00100 10000 : 00100 10000 01000 Week : 00100 01000 00011 Week : 00100 01000 00011 $GMCR : 01000 00001 00100 Week : 01000 00001 00100 $GMCR : 00100 10000 01000 : 00010 00100 10000 : 01000 00001 00100 Week : 00100 10000 01000 : 00100 10000 01000 $GOOG : 01000 00001 00100 Winners 85 6: 00100 01000 00011 $GOOG : 00100 10000 01000 winners 8513: 01000 00001 00100 $GOOG : 01000 00010 00100 10000 Winners 8516: 00100 10000 01000 Pos Term T# Doc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 $IDRA : 00100 10000 00010 Year : 00010 00001 10000 $IDRA : 01000 00001 00100 Year : 01000 00001 00100 $IDRA : 00100 10000 01000 Year : 00100 10000 01000 $JAKK : 00001 00010 00100 10000 $JAKK : 01000 00001 00100 $JAKK : 00100 10000 01000
18
AAPL20 Position 2-Level pTrees Index by (Term,Document) Predicate= npz; Stride=5
check course custom out peeps trade $AAPL $FB $TWTR DOC 13 1 14 15 23 56 79 87 90 96 DOC1 AAPL helping her hype iwatch moments sensitive T# Lev1 Lev0 2 1 19 20 33 36 46 68 course info more visit $AAPL $FB $TWTR DOC 14 1 34 48 87 90 96 83 24 custom keep losing market money program sign today trade up $AAPL$FB $TWTR DOC4 15 1 38 42 47 60 70 43 26 78 80 87 90 79 96 course custom peeps trade $AAPL $FB $TWTR DOC 14 1 28 79 90 96 87 15 56 all happy new peeps prosperous safe year $AAPL $FB $TWTR DOC6 5 1 50 61 86 87 65 18 56 90 96 Course peeps take trade $AAPL $FB $TWTR DOC7 14 1 56 79 90 96 87 22 76 alerts multi-day peeps runners stock $AAPL $FB $TWTR DOC8 4 1 29 64 87 90 72 49 56 96 alerts day pass trade $AAPL $FB $TWTR DOC 4 1 16 21 54 79 87 90 96 alerts day pass trade $AAPL $FB $TWTR DOC 4 1 16 21 54 79 87 90 96 course mentor online street trade $AAPL $FB $TWTR DOC11 14 1 25 44 52 73 79 87 90 96 might reasons shares Starbucks three $AAPL $DNKN $GMCR $GOOG DOC12 1 30 45 62 69 71 77 87 89 91 92 alerts big peeps sign trade up week winners $AAPL $BAC $FB $TWTR DOC 4 1 8 29 56 70 79 80 84 85 87 88 90 96 Alerts join list out sign stock today up $AAPL $FB $TWTR DOC14 4 1 29 37 40 53 70 72 78 80 87 90 96 chart looking nice peeps play swing trade value $AAPL $FB $JAKK $TWTR DOC15 12 1 41 32 51 56 58 75 79 81 87 90 94 96 day pay run subs trade $AAPL $BAC $FB $TWTR DOC16 16 1 27 55 96 98 100 90 88 87 79 74 63 Alerts lessons peeps sign trade up video week $AAPL $FB $TWTR DOC17 4 1 29 39 84 87 90 82 80 79 70 56 96 course send trade $AAPL $FB $TWTR 0.75 DOC18 14 1 17 67 97 96 90 87 79 area breakout nice run today $AAPL $FB $IDRA $TWTR 3.5 DOC19 7 1 10 51 96 99 93 90 87 78 63 Kphrase (in sequence) and Kset pTrees: Ksets are created by OR-ing the kTerm pTrees. Note 2PhraseStartPos pTrees would involve #docs*#Words*#Words*#Pos cells so it could be too massive even for pTree technology! Terms=1Terms. Use downward closure (of ARM) to facilitate fast pTree creation for kPhrases from . 
We'll select "content" kPhrases and create Position pTrees for them (with start positions). Since tweets are short and the vocab is long, it might be best to do this by just scanning the tweets by hand?
I selected the following content 2phrases from AAPL20: custom trade; losing money; stock alert; multi-day runners; trade alerts; online trade; street mentor; daily chart; nice run; investor call; big winners; trade courses; nice breakout; customer trade; bond sale; trade mentorship; new year; daily alerts; value play.
Each word in a content kphrase must be a content Term, and the kphrase itself must be selected as a content kphrase (a la ARM downward closure). At the kth step we construct candidate content kphrases from pairs of content (k-1)phrases that coincide on (k-2) terms. Then we delete all such kphrases that are non-content and/or have zero count.
AAPL ahead bond call investor plans potential sale $AAPL DOC 2 1 3 9 59 66 57 35 31 11 87
Wds 87,88 #BAC) in doc16, pos=17,18
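A minimal sketch of the downward-closure step for in-sequence kphrases: two content (k-1)phrases are joined when the tail of one equals the head of the other (so they coincide on k-2 terms), and candidates with zero count are dropped. The by-hand "is this content?" judgment is not automated here. The function names are hypothetical, and the toy docs are stop-word-stripped term sequences loosely modeled on the tweets, not the actual AAPL20 data.

```python
def count_phrase(phrase, docs):
    """Number of occurrences of `phrase` (a tuple of terms) as a contiguous run."""
    k = len(phrase)
    return sum(
        1
        for words in docs
        for i in range(len(words) - k + 1)
        if tuple(words[i:i + k]) == phrase
    )

def next_content_phrases(prev_phrases, docs):
    """Join (k-1)phrases a, b with a[1:] == b[:-1]; keep candidates that actually occur."""
    prev = set(prev_phrases)
    candidates = {a + b[-1:] for a in prev for b in prev if a[1:] == b[:-1]}
    return sorted(c for c in candidates if count_phrase(c, docs) > 0)

# Hypothetical content-term sequences for two tweets.
docs = [
    ["iwatch", "helping", "hype", "AAPL"],
    ["trade", "alerts", "here"],
]
content_terms = [("iwatch",), ("helping",), ("hype",), ("AAPL",)]

phrases2 = next_content_phrases(content_terms, docs)
phrases3 = next_content_phrases(phrases2, docs)
print(phrases2, phrases3)
```

Only phrases whose sub-phrases survived the previous step are ever counted, so the candidate set stays far smaller than the full Words x Words x ... cube.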
19
AAPL20 Kphrases (in sequence); KSets (OR of Term pTrees) are used like keyword lists (create on the fly?).
For kPhrases, use downward closure to facilitate fast pTree creation. Select "content" kPhrases (StartPosition pTree = 1st Term SP).
I selected these from AAPL20 by hand: sensitive moments; iwatch helping; helping hype; hype AAPL; custom trade; losing money; stock alert; multi-day runners; trade alerts; online trade; street mentor; daily chart; nice run; investor call; big winners; trade courses; nice breakout; customer trade; bond sale; trade mentorship; new year; daily alerts; value play.
DOC1: Sensitive moment her iwatch helping hype AAPL. Phrases: sensitive-moments, iwatch-helping, hype-AAPL. T# Lev1 Lev0: 2 1 19 20 33 36 46 68. 68 & 46 =
So we might just indicate in the header if a Term launches a Content2Phrase, Content3Phrase, … as follows:
Phrases: T# DOC13: sign up trade alerts course big winners week $FB $AAPL $TWTR peeps. Lev1 Lev0: 70 1 80 79 4 14 8 85 84 29 90 87 96 56
Phrases: T# DOC1: Sensitive moment her iwatch helping hype AAPL. Lev1 Lev0: 2 1 19 20 33 36 46 68
Phrases: T# DOC2: Check out custom trade course $AAPL $TWTR peeps. Lev1 Lev0: 13 1 53 15 79 14 87 96 56
Phrases: T# DOC12: $SBUX three reasons starbucks shares might $SBUX $AAPL $GOOG $DNKN $GMCR. Lev1 Lev0: 1 77 62 71 69 45 30 95 87 92 89 91
Observations: On the previous page I sorted using software that put the terms in a strange order (which is not a problem, since we fully shuffle pTree placements anyway for our pTreeSet Security Mechanism). However, on this page I put them in position order so we could indicate phrases easily. If we shuffle, the phrase array can be as shown here and matched up to the Term list after retrieval. Also note that I discovered many errors on the previous page (sorry, too lazy to go through and correct them all).
The 20 tweets (opening positions):
1. At one of those sensitive moments
2. Check out my custom trading mentorship
3. For more info on our courses
4. Keep losing money in the market?
5. Custom trading classes with us
6. Happy new year to all! Have
7. Take a trading course with us!
8. Get stock alerts here multi-day runners!
9. Day pass for our trading alerts
10. Day pass for our trading alerts
11. Online Trading Course's The Street Mentor
12. $SBUX Three Reasons Starbucks Shares Might
13. Sign up for our trading alerts
14. Hot list of stocks out today
15. Nice looking daily chart on $JAKK
16. Been a nice run for sub's
17. Sign up for our daily trading
18. Send me an for the
19. $IDRA could run to the $3.50
20. $AAPL Apple plans investor call ahead
Remaining tweet text (later positions):
give her a iwatch helping the courses here: $AAPL $TWTR visit us here: https $FB $AAPL Sign up for a custom trading $FB $AAPL $TWTR Peeps #investing #trading a safe and prosperous new year! $FB $AAPL $TWTR Peeps #stocks $FB $AAPL $TWTR peeps here: $AAPL $TWTR #stocks $FB $AAPL $TWTR Traders. Stay Hot In http: #IBDNews here Big winners this week https: Sign up for our daily alerts for a swing trade here $TWTR from $ trade' at $ AH_ alerts and weekly video lessons here! 75% off trading courses here $FB area today Nice breakout here $FB of a potential bond sale: WSJ hype AAPL peeps. $TWTR #stocks #trading mentorship program with today https $FB #stocks #trading #investig $FB $AAPL $TWTR #investing #daytrading via @IBDinvestor$SBUX $AAPL $GOOG $DNKN $FB $AAPL $TWTR Peeps here! $AAPL $TWTR Join $FB $AAPL Peeps value play. http: Nice pay day $TWTR $AAPL $BAC https $AAPL $TWTR Peeps $AAPL $TWTR Traders $AAPL $TWTR Traders. http @YahooFinance19. $AAPL $TWTR $GMCR us! u8V $FB https
20
AAPL20 Term pTrees (indexed Docs=1,2,3,4; all Pos, Predicate=NotPureZero (np0), strides=10, 1st 20 Tweets, 100wds 1,5: 2 level 1,6: 1,8: P d 5 1 6 1 8 1 a 1 b 1 d 1 e 1 1 2 2 1 4 2 1 5 2 1 7 2 1 9 2 1 a 2 1 b 2 1 c 2 1 d 2 1 2 3 1 3 1 6 3 1 7 3 1 a 3 1 b 3 1 c 3 1 d 3 1 1 4 2 4 1 3 4 1 6 4 1 7 4 1 8 4 1 a 4 1 b 4 1 d 4 1 g 4 1 h 4 1 i 4 1 j 4 1 ahead AAPL 2015 alerts Apple all area bond big breakout check chart call custom course day helping happy her investor info hype iwatch lessons keep join looking list losing mentor market moments might money multi-day more nice new online pass out pay plans peeps potential play prosperous program reasons runners run safe sensitive send sale shares Starbucks sign street stock swing subs three take today trade up visit video value winners week $AAPL year $DNKN $BAC $GOOG $GMCR $FB $SBUX $JAKK $IDRA $TWTR 0.75 3.77 3.5 2.7 1 4 3 2 7 6 5 10 9 8 13 12 11 16 15 14 19 18 17 22 21 20 23 26 25 24 29 28 27 32 31 30 35 34 33 38 37 36 41 40 39 44 43 42 47 46 45 48 51 50 49 54 53 52 57 56 55 60 59 58 63 62 61 66 65 64 69 68 67 72 71 70 73 76 75 74 79 78 77 82 81 80 85 84 83 86 89 88 87 92 91 90 95 94 93 98 97 96 100 99 1,a: 1,b: 1,d: 1,e: 2,1: 2,2: 2,4: 2,5: 2,7: 2,9: 2,a: 2,b: 2,c: 2,c: 3,2: 3,3: 3,6: 3,7: 3,a: 3.b: 3,c: 3,d: 4,1: 4,2: 4,3: 4,6: 4,7: 4,8: 4,a: 4,b: 4,d: 4,g: 4,h: 4,i: 4,j: give her a iwatch helping the visit us here: https $FB $AAPL courses here: $AAPL $twtr a safe and prosperous new year! $FB $AAPL $TWTR Peeps #investing #trading Sign up for a custom trading here: $AAPL $TWTR #stocks $FB $AAPL $TWTR peeps $FB $AAPL $TWTR Peeps #stocks Stay Hot In http: #IBDNews $FB $AAPL $TWTR Traders. here Big winners this week https: Sign up for our daily alerts alerts and weekly video lessons here! from $ trade' at $ AH_ for a swing trade here $TWTR area today Nice breakout here $FB 75% off trading courses here $FB of a potential bond sale: WSJ peeps. hype AAPL mentorship program with us today. 
https $TWTR #stocks #trading #investing #stocks #trading #investig $FB $AAPL $TWTR #daytrading $FB $AAPL $TWTR Peeps via @IBDinvestor$SBUX $AAPL $GOOG $DNKN here! $AAPL $TWTR Join $FB $AAPL Peeps value play. http: $AAPL $TWTR Traders https $AAPL $TWTR Peeps Nice pay day $TWTR $AAPL $BAC http @YahooFinance19. $AAPL $TWTR Traders. $FB $AAPL $TWTR $GMCR u8V us! $FB https 3. For more info on our courses 2. Check out my custom trading mentorship 1. At one of those sensitive moments 6. Happy new year to all! Have 5. Custom trading classes with us 4. Keep losing money in the market? 8. Get stock alerts here multi-day runners! 7. Take a trading course with us! 11. Online Trading Course's The Street Mentor 10. Day pass for our trading alerts 9. Day pass for our trading alerts 13. Sign up for our trading alerts 12. $SBUX Three Reasons Starbucks Shares Might 16. Been a nice run for sub's 15. Nice looking daily chart on $JAKK 14. Hot list of stocks out today 19. $IDRA could run to the $3.50 18. Send me an for the 17. Sign up for our daily trading 20. $AAPL Apple plans investor call ahead Pos:
21
AAPL Term pTrees (indexed by Documents=5,6,7,8; all Positions; Predicate=NotPureZero (np0); all strides=10; 1 level and 2 level)
[Table: one row per (doc,pos) with columns DOC, POS, ct and Term numbers 1-100. Rows listed (doc,pos; positions above 9 are written a, b, c, ...): 5,1 5,2 5,3 5,6 5,7 5,8 5,9 5,a 6,1 6,2 6,3 6,5 6,8 6,a 6,b 6,c 6,g 6,h 6,i 6,j 7,1 7,3 7,4 7,7 7,8 7,9 7,a 7,b 8,2 8,3 8,5 8,6 8,7 8,8 8,9 8,a 8,b. Bit values not shown.]