Linear Search Efficiency Assessment P. Pete Chong Gonzaga University Spokane, WA 99258-0009


1 Linear Search Efficiency Assessment P. Pete Chong Gonzaga University Spokane, WA 99258-0009 chong@gonzaga.edu

2 Why Search Efficiency? Information systems help users obtain the "right" information for better decision making, which requires searching. Better organization reduces the time needed to find this "right" information, but requires sorting. Are the savings in search worth the cost of the sort?

3 Search Cost If we assume the records are accessed at random, the average cost of a successful linear search is (n+1)/2 comparisons. That is, for 1000 records, the average search cost is approximately 1000/2 = 500. For large n, we may use n/2 to simplify the calculation.
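As a quick sanity check (not part of the original slides), the (n+1)/2 figure can be verified by averaging the cost over all n equally likely targets:

```python
# Sketch: verify that the average number of comparisons for a successful
# linear search over n equally likely targets is (n + 1) / 2.

def avg_linear_search_cost(n):
    """Average comparisons when each of the n records is equally likely."""
    # Finding the record in position k (1-based) costs k comparisons.
    return sum(range(1, n + 1)) / n

print(avg_linear_search_cost(1000))  # 500.5, i.e. (1000 + 1) / 2
```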

4 Search Cost In reality, the usage pattern is not random. For payroll, for example, every record is accessed once and only once; in this case sorting has no effect on search efficiency. Most of the time, however, the distribution follows the 80/20 Rule (the Pareto Principle).

5 The Pareto Principle

Application                  80% of the...       ...in 20% of the...
Income distribution          Income/Wealth       People
Marketing decisions          Business            Customers
Population distribution      Population          Cities
Firm size distribution       Assets              Firms
Library resource mgmt        Transactions        Holdings
Academic productivity        Papers published    Authors
Software menu design         Usage               Features used
Database management          Accesses            Data accessed
Inventory control            Value               Inventory

6 An Example: Academic Productivity

Index  Papers  Authors   Paper       Cumulative  Cumulative     Author       Paper
  i     n_i    f(n_i)    subtotal    authors     papers         proportion   proportion
                         n_i f(n_i)  Σ f(n_i)    Σ n_i f(n_i)   x_i          φ_i
 26     242       1        242          1           242         0.003        0.137
 25     114       1        114          2           356         0.005        0.202
 24     102       1        102          3           458         0.008        0.260
 23      95       1         95          4           553         0.011        0.314
 22      58       1         58          5           611         0.014        0.347
 21      49       1         49          6           660         0.016        0.374
 20      34       1         34          7           694         0.019        0.394
 19      22       2         44          9           738         0.024        0.419
 18      21       2         42         11           780         0.030        0.442
 17      20       2         40         13           820         0.035        0.465
 16      18       1         18         14           838         0.038        0.475
 15      16       4         64         18           902         0.049        0.512
 14      15       2         30         20           932         0.054        0.529
 13      14       1         14         21           946         0.057        0.537
 12      12       2         24         23           970         0.062        0.550
 11      11       5         55         28          1025         0.076        0.581
 10      10       3         30         31          1055         0.084        0.598
  9       9       4         36         35          1091         0.095        0.619
  8       8       8         64         43          1155         0.116        0.655
  7       7       8         56         51          1211         0.138        0.687
  6       6       6         36         57          1247         0.154        0.707
  5       5      10         50         67          1297         0.181        0.736
  4       4      17         68         84          1365         0.227        0.774
  3       3      29         87        113          1452         0.305        0.824
  2       2      54        108        167          1560         0.451        0.885
  1       1     203        203        370          1763         1.000        1.000

Total number of groups (m): 26
Average number of publications (μ): 4.7649

7 A Typical Pareto Curve

8 Formulate the Pareto Curve Chen et al. (1994) define:
f(n_i) = the number of authors with n_i papers,
T = Σ_i f(n_i) = total number of authors,
R = Σ_i n_i f(n_i) = total number of papers,
μ = R/T = the average number of published papers per author.

9 Formulate the Pareto Curve For each index level i, let x_i be the fraction of the total number of authors and φ_i be the fraction of the total papers published, accumulating from the most productive group (j = m) down to group i. Then
x_i = (1/T) Σ_{j=i..m} f(n_j)  and  φ_i = (1/R) Σ_{j=i..m} n_j f(n_j).
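These definitions can be sketched in a few lines of code. The (n_i, f(n_i)) pairs below are an illustrative toy distribution, not the slide-6 data:

```python
# Sketch: compute T, R, mu, and the cumulative proportions x_i and phi_i
# for a toy author/paper distribution (illustrative values, not slide 6).
# Each entry is (n_i, f(n_i)): papers per author, number of such authors,
# sorted with the most prolific group first.
groups = [(10, 1), (5, 2), (2, 5), (1, 12)]

T = sum(f for _, f in groups)        # total authors
R = sum(n * f for n, f in groups)    # total papers
mu = R / T                           # average papers per author

x, phi = [], []
cum_authors = cum_papers = 0
for n, f in groups:
    cum_authors += f
    cum_papers += n * f
    x.append(cum_authors / T)        # fraction of authors (most prolific first)
    phi.append(cum_papers / R)       # fraction of papers they account for

print(T, R, round(mu, 3))            # 20 42 2.1
print(x[-1], phi[-1])                # 1.0 1.0  (both reach 100%)
```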

10 Formulate the Pareto Curve Plugging these values into the segment slope (φ_i - φ_{i+1})/(x_i - x_{i+1}), Chen et al. derive the slope formula:
s_i = n_i T/R = n_i /μ.
When n_i = 1, s_i = 1/μ = T/R; let us call this particular slope θ.

11 Revisit the Pareto Curve θ = 370/1763 = 0.21

12 The Significance We now have a quick way to quantify different usage concentrations. Simulation shows that in most situations a moderate sample size is sufficient to assess the usage concentration. The inverse of the average usage (θ = 1/μ) is easy to calculate.

13 Search Cost Calculation The expected search cost for a randomly ordered list is n/2; for 1000 records, that is 500. For a usage-sorted list with an 80/20 distribution, the expected cost is (200/2)(80%) + [(200+1000)/2](20%) = 200, a saving of 60%.
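The slide's arithmetic can be reproduced directly; this is just the weighted average of the two cases (hit in the hot 20% versus the cold 80%):

```python
# Sketch reproducing the slide's 80/20 search-cost arithmetic for n = 1000:
# 80% of searches hit the 200 most-used (front-sorted) records,
# 20% hit somewhere in the remaining 800.
n = 1000
hot = 0.2 * n                                   # the 200 most-used records
cost = (hot / 2) * 0.8 + ((hot + n) / 2) * 0.2  # 100*0.8 + 600*0.2
print(cost)                                     # 200.0, a 60% saving vs n/2 = 500
```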

14 Search Cost Calculation Let the first number in the 80/20 rule be a and the second be b. Since these two numbers are actually percentages, we have a + b = 1. Thus, the expected search cost for a list of n records is the weighted average:
(bn/2)(a) + [(bn+n)/2](b) = (bn/2)(a+b+1) = (bn/2)(2) = bn
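The algebraic collapse to bn can be checked numerically for several a/b splits (the particular splits below are arbitrary test values):

```python
# Sketch: verify the slide-14 identity numerically. For any split with
# a + b = 1, the weighted-average search cost collapses to b * n.
n = 1000
for a in (0.8, 0.7, 0.9):
    b = 1 - a
    cost = (b * n / 2) * a + ((b * n + n) / 2) * b
    assert abs(cost - b * n) < 1e-9   # e.g. a=0.8 gives 200 = 0.2 * 1000
print("weighted average equals b * n")
```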

15 Search Cost Calculation Thus, b indicates the cost of search as a fraction of the number of records in the list, and bn represents an upper bound on the number of comparisons. For a fully sorted list (by usage) with an 80/20 distribution, Knuth (1973) has shown that the average search cost C(n) is only 0.122n.

16 Search Cost Simulation

θ      b      C(n)       θ      b      C(n)
0.005  0.025  0.0068     0.20   0.21   0.1312
0.01   0.03   0.0086     0.25   0.24   0.1613
0.02   0.06   0.0215     0.30   0.26   0.1827
0.04   0.08   0.0320     0.40   0.295  0.2226
0.06   0.11   0.0501     0.50   0.33   0.2654
0.08   0.12   0.0569     0.60   0.37   0.3173
0.10   0.14   0.0712     0.70   0.405  0.3648
0.15   0.18   0.1037     0.80   0.44   0.4137
0.18   0.20   0.1218     1.00   0.50   0.5000

17 Search Cost Simulation

18 Search Cost Estimate Regression analyses yield:
b = 0.15 + 0.359θ, for 0.2 < θ < 1.0,
b = 0.034 + 0.984θ, for 0 < θ < 0.2, and
C(n) = 0.02 + 0.49θ
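These fits are easy to package as functions; the coefficients and the piecewise split at θ = 0.2 are taken directly from the slide:

```python
# Sketch of the slide-18 regression fits as functions (coefficients and the
# piecewise breakpoint at theta = 0.2 are as given on the slide).

def b_estimate(theta):
    """Estimated search-cost fraction b from the concentration measure theta."""
    if theta < 0.2:
        return 0.034 + 0.984 * theta
    return 0.15 + 0.359 * theta

def c_estimate(theta):
    """Estimated fully-sorted search-cost fraction C(n)/n."""
    return 0.02 + 0.49 * theta

theta = 0.21   # the slide-11 value, 370/1763
print(round(b_estimate(theta), 3), round(c_estimate(theta), 3))  # 0.225 0.123
```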

19 Conclusion The true search cost lies between the estimates b and C(n). We may use C(n) ≈ 0.5θ to quickly estimate the search cost of a fully sorted list. That is, take a moderate sample of usage; the search cost will be about half the inverse of the average usage, times the total number of records.

20 “Far-fetched” (?) Applications Define and assess the degree of monopoly? What is the effect of monopoly? Note the gap between b and C(n) (ideal). Gini Index?

