1 Organizing Open Online Computational Problem Solving Competitions By: Ahmed Abdelmeged

2 In 2011, researchers from the Harvard Catalyst Project were investigating the potential of crowdsourcing genome-sequencing algorithms.

3 So, they collected a few million sequencing problems and developed an electronic judge that evaluates sequencing algorithms by how well they solve these problems.

4 And they set up a two-week open online competition on TopCoder with a total prize purse of $6,000.

5 The results were astounding!

6 “... A two-week online contest... produced over 600 submissions.... Thirty submissions exceeded the benchmark performance of the US National Institutes of Health’s MegaBLAST. The best achieved both greater accuracy and speed (1,000 times greater).” -- Nature Biotechnology, 31(2):pp. 108–111, 2013.

7 We want to lower the barrier to entry for establishing such competitions by having “meaningful” competitions where participants assist the admin in evaluating their peers.

8 Thesis Statement “Semantic games of interpreted logic sentences provide a useful foundation to organize computational problem solving communities.”

14 Open online competitions have been quite successful in organizing computational problem solving communities.

15 “... A two-week online contest... produced over 600 submissions.... Thirty submissions exceeded the benchmark performance of the US National Institutes of Health’s MegaBLAST. The best achieved both greater accuracy and speed (1,000 times greater).” -- Nature Biotechnology, 31(2):pp. 108–111, 2013.

16 Let’s take a closer look at state-of-the-art approaches to organizing an open online competition for solving computational problems, using MAX-SAT as a sample problem.

17 MAXimum SATisfiability (MAX-SAT) problem Input: a Boolean formula in Conjunctive Normal Form (CNF). Output: an assignment satisfying the maximum number of clauses.
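To make the objective function fsat, used throughout the following slides, concrete: below is a minimal Java sketch, not part of the original deck, that counts how many clauses of a CNF formula an assignment satisfies. The encoding of formulas as arrays of signed integers is an assumption made for this example.

class MaxSat {
    // fsat: count the clauses of a CNF formula satisfied by an assignment.
    // Encoding (an assumption): a formula is an array of clauses; a clause is
    // an array of literals; literal +i means variable i is true, -i means false.
    public static int fsat(boolean[] assignment, int[][] cnf) {
        int satisfied = 0;
        for (int[] clause : cnf) {
            for (int literal : clause) {
                int variable = Math.abs(literal);
                if ((literal > 0) == assignment[variable]) { // literal agrees with assignment
                    satisfied++;
                    break; // one true literal satisfies the clause
                }
            }
        }
        return satisfied;
    }

    public static void main(String[] args) {
        int[][] cnf = {{1, -2}, {2, 3}, {-1}};              // (x1 or not x2) and (x2 or x3) and (not x1)
        boolean[] assignment = {false, false, false, true}; // index 0 unused; x1 = F, x2 = F, x3 = T
        System.out.println(fsat(assignment, cnf));          // prints 3: all clauses satisfied
    }
}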

18 The Omniscient Admin Approach A “trusted” admin prepares a thorough benchmark of MAX-SAT problem instances together with their correct solutions. This benchmark is used to evaluate individual MAX-SAT algorithms submitted by participants.

19 The Teaching Admin Approach Admin prepares a thorough benchmark of MAX-SAT problems and their “model” solutions. Benchmark used to evaluate individual MAX-SAT algorithms submitted by participants.

20 Cons Overhead to collect and solve problems. What if the admin incorrectly solves some problems?

21 The Open Benchmark Approach Admin maintains an open benchmark of problems and their solutions. Participants may object to any of the solutions before the competition starts.

22 Cons Overfitting: participants may tailor their algorithms to the benchmark.

23 The Learning Admin Approach An admin prepares a set of MAX-SAT problems and keeps track of the best solution produced by one of the algorithms submitted by participants. Pioneered by the FoldIt team.

24 Cons Works for optimization problems; it is not clear how to apply it to other computational problems, TQBF (True Quantified Boolean Formulas) for example.

25 Wouldn’t it be great if we had sports-like open online competitions (OOCs) where the admin referees the competition with minimal overhead?

26 However,

27 Research Question How to organize a “meaningful” open online computational problem solving competition where participants assist in the evaluation of their opponents?

28 Research Question How to organize a “meaningful”, sports-like, open online computational problem solving competition where the admin only referees the competition with minimal overhead?

29 Simpler Version: “meaningful” two-party competitions. The admin provides neither benchmark problems nor their solutions.

30 Attempt I Each participant prepares a benchmark of problems and solves their opponent’s benchmark problems. The admin checks solutions. Checking the correctness of a MAX-SAT problem solution can be an overhead to the admin.

31 Attempt II Each participant prepares a benchmark of problems and their solutions. Each participant solves their opponent’s problems. Admin “compares” both solutions for each problem to determine the winner. Admin has to correctly compare solutions. Admin cannot assume any of the solutions to be correct.

32 Attempt II Each participant prepares a benchmark of problems and their “model” solutions. Each participant solves their opponent’s problems. Admin only compares solutions to “model” solutions.

33 But participants are incentivized to provide the wrong “model” solution. The admin should compare solutions without trusting any of them.

34 Thesis “Semantic games of interpreted logic sentences provide a useful foundation to organize computational problem solving communities.”

35 Semantic Games A Semantic Game (SG) is a constructive debate of the correctness of an interpreted logic sentence (a.k.a claim) between two distinguished parties: the verifier which asserts that the claim holds, and the falsifier which asserts that the claim does not hold.

36 A Two-Party, SG-Based MAX-SAT Competition (I) Participants develop functions to: Provide side preference. Provide values for quantified variables based on values of variables in scope. ∀ φ ∈ CNFs ∃ v ∈ assignments(φ) ∀ f ∈ assignments(φ). fsat(f,φ)≤fsat(v,φ)

37 A Two-Party, SG-Based MAX-SAT Competition (II) Admin chooses sides for players based on their side preference. Let Pv be the verifier and Pf be the falsifier. ∀ φ ∈ CNFs ∃ v ∈ assignments(φ) ∀ f ∈ assignments(φ). fsat(f,φ)≤fsat(v,φ)

38 A Two-Party, SG-Based MAX-SAT Competition (III) Admin gets value provided by Pf for φ. Admin checks φ ∈ CNFs. If false, Pf loses. Admin gets value provided by Pv for v. Admin checks v ∈ assignments(φ). If false, Pv loses. ∀ φ ∈ CNFs ∃ v ∈ assignments(φ) ∀ f ∈ assignments(φ). fsat(f,φ)≤fsat(v,φ)

39 A Two-Party, SG-Based MAX-SAT Competition (IV) Admin gets value provided by Pf for f. Admin checks f ∈ assignments(φ). If false, Pf loses. Admin evaluates fsat(f,φ)≤fsat(v,φ). If true Pv wins, otherwise Pf wins. ∀ φ ∈ CNFs ∃ v ∈ assignments(φ) ∀ f ∈ assignments(φ). fsat(f,φ)≤fsat(v,φ)
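Slides 37 through 39 describe the admin's refereeing duties one move at a time; the sketch below assembles them into a single referee function. This is an illustration, not the thesis implementation: the MaxSatPlayer interface and the two membership checks are assumptions about how moves could be collected, and formulas and assignments use the encoding from the fsat sketch above.

interface MaxSatPlayer {
    int[][] provideFormula();                             // falsifier's move: φ
    boolean[] provideWitness(int[][] phi);                // verifier's move: v
    boolean[] provideChallenge(int[][] phi, boolean[] v); // falsifier's move: f
}

class MaxSatReferee {
    // Plays one semantic game and returns the winner.
    static MaxSatPlayer referee(MaxSatPlayer verifier, MaxSatPlayer falsifier) {
        int[][] phi = falsifier.provideFormula();
        if (!isCnf(phi)) return verifier;                 // φ not in CNFs: Pf loses
        boolean[] v = verifier.provideWitness(phi);
        if (!isAssignment(v, phi)) return falsifier;      // v not an assignment: Pv loses
        boolean[] f = falsifier.provideChallenge(phi, v);
        if (!isAssignment(f, phi)) return verifier;       // f not an assignment: Pf loses
        return fsat(f, phi) <= fsat(v, phi) ? verifier : falsifier;
    }

    static boolean isCnf(int[][] phi) {                   // assumed check: clauses are non-empty
        if (phi == null || phi.length == 0) return false;
        for (int[] clause : phi)
            if (clause == null || clause.length == 0) return false;
        return true;
    }

    static boolean isAssignment(boolean[] a, int[][] phi) { // assumed check: every variable has a value
        if (a == null) return false;
        for (int[] clause : phi)
            for (int literal : clause)
                if (Math.abs(literal) >= a.length) return false;
        return true;
    }

    static int fsat(boolean[] a, int[][] cnf) {           // as in the earlier sketch
        int satisfied = 0;
        for (int[] clause : cnf)
            for (int literal : clause)
                if ((literal > 0) == a[Math.abs(literal)]) { satisfied++; break; }
        return satisfied;
    }
}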

40 Rationale (I) Controllable admin overhead. ∀ φ ∈ CNFs ∃ v ∈ assignments(φ). satisfies-max(v,φ) ∀ φ ∈ CNFs ∃ v ∈ assignments(φ) ∀ f ∈ assignments(φ). fsat(f,φ)≤fsat(v,φ)

41 Rationale (II) Correct: there is a winning strategy for verifiers of true claims and for falsifiers of false claims, regardless of the opponent’s actions.

42 Rationale (III) Objective. Systematic. Learning chances.

43 SG-Based Two-Party Competitions We let participants debate the correctness of an interpreted predicate logic sentence specifying the computational problem of interest, assuming that participants choose to take opposite sides.

44 Out-of-The-Box, SG-Based, Two-Party MAX-SAT Competition ∀ φ ∈ CNFs ∃ v ∈ assignments(φ) ∀ f ∈ assignments(φ). fsat(f,φ) ≤ fsat(v,φ) 1. Falsifier provides a CNF formula φ. 2. Verifier provides an assignment v. 3. Falsifier provides an assignment f. 4. Admin evaluates fsat(f,φ) ≤ fsat(v,φ). If true, verifier wins. Otherwise, falsifier wins.

45 Pros and cons of out-of-the-box, SG-based, two-party competitions as a solution to our simpler version: “meaningful” two-party competitions in which the admin provides neither benchmark problems nor their solutions.

46 Pro (I): Systematic The rules of an SG are systematically derived from the syntax of its underlying claim. SGs are also defined for other logics.

47 Rules of SG(φ, A, v, f)
φ = ∀x : ψ. Move: f provides x0. Next game: SG(ψ[x0/x], A, v, f).
φ = ψ ∧ χ. Move: f chooses θ ∈ {ψ, χ}. Next game: SG(θ, A, v, f).
φ = ∃x : ψ. Move: v provides x0. Next game: SG(ψ[x0/x], A, v, f).
φ = ψ ∨ χ. Move: v chooses θ ∈ {ψ, χ}. Next game: SG(θ, A, v, f).
φ = ¬ψ. Move: N/A. Next game: SG(ψ, A, f, v).
φ = P(t0). v wins if P(t0) holds, otherwise f wins.
“The Game of Language: Studies in Game-Theoretical Semantics and Its Applications” -- Kulas and Hintikka, 1983
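The table is recursive, so a referee for full SGs can be written as a short recursive function over a formula AST. The sketch below (Java 17) is an illustration under assumptions: the AST records, the Player interface, and the mutable scope map standing in for the structure A are inventions for this example, not the thesis implementation. The dispatch mirrors the six rows of the table.

import java.util.Map;
import java.util.function.Predicate;

// One AST case per row of the table above.
sealed interface Formula permits ForAll, Exists, And, Or, Not, Atom {}
record ForAll(String name, Formula body) implements Formula {}
record Exists(String name, Formula body) implements Formula {}
record And(Formula left, Formula right) implements Formula {}
record Or(Formula left, Formula right) implements Formula {}
record Not(Formula body) implements Formula {}
record Atom(Predicate<Map<String, Object>> holds) implements Formula {}

interface Player {
    Object provideValue(String name, Map<String, Object> scope); // moves on quantifiers
    Formula choose(Formula left, Formula right);                 // moves on connectives
}

class SemanticGame {
    // Plays SG(φ, A, v, f); the scope map stands in for A. Returns the winner.
    static Player play(Formula phi, Map<String, Object> scope, Player v, Player f) {
        if (phi instanceof ForAll fa) {                   // ∀x : ψ, f provides x0
            scope.put(fa.name(), f.provideValue(fa.name(), scope));
            return play(fa.body(), scope, v, f);
        }
        if (phi instanceof And a)                         // ψ ∧ χ, f chooses a conjunct
            return play(f.choose(a.left(), a.right()), scope, v, f);
        if (phi instanceof Exists ex) {                   // ∃x : ψ, v provides x0
            scope.put(ex.name(), v.provideValue(ex.name(), scope));
            return play(ex.body(), scope, v, f);
        }
        if (phi instanceof Or o)                          // ψ ∨ χ, v chooses a disjunct
            return play(v.choose(o.left(), o.right()), scope, v, f);
        if (phi instanceof Not n)                         // ¬ψ, sides swap
            return play(n.body(), scope, f, v);
        return ((Atom) phi).holds().test(scope) ? v : f;  // P(t0): v wins iff it holds
    }
}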

48 Pro (II): Objective Competition result is based on skills that are precisely defined in the competition definition.

49 Pro (III): Correct Competition result is based on demonstrated possession (or lack) of skill. Problems incorrectly solved by the admin or an opponent cannot worsen a participant’s rank. There is a winning strategy for verifiers of true claims and falsifiers of false claims, regardless of the opponent’s actions.

50 Pro (III): Correct There is a winning strategy for verifiers of true claims and falsifiers of false claims, regardless of the opponent’s actions.

51 Pro (IV): Controllable Admin Overhead The admin’s overhead is to implement the structure interpreting the logic statement specifying a computational problem. It is always possible to strip functionality out of the interpreting structure at the cost of adding complexity to the logic statement.

52 Pro (V): Learning Losers can learn from SG traces.

53 Pro (VI): Automatable Participants can codify their strategies for playing SGs. Efficient and thorough evaluation. Codified strategies are useful by-products. Controlled information flow.

54 Challenges (I) Participants must take opposing sides! Neutrality is lost when sides are forced.

55 Con (II): Not Thorough Unlike sports games, a single game is not thorough enough.

56 Con (III): Issues Scaling to N-Party Competitions In sports, tournaments are used to scale two-party games to n-party competitions.

57 Challenges (II) Scaling to N-party competitions using a tournament, yet: Avoid collusion potential, especially in the context of open online competitions where Sybil identities are common and games are too fast to spectate! Ensure that participants get the same chance.

59 Issue (II): Neutrality Do participants get the same chance? We have to force sides on participants. We may have vastly different numbers of verifiers and falsifiers.

60 Issue (II): Correctness and Neutrality We have to force sides on participants, yet we cannot penalize forced losers, for the sake of competition correctness. We have to ensure that all participants get the same chance even though we may have vastly different numbers of verifiers and falsifiers.

61 Contributions 1. Computational Problem Solving Labs (CPSLs). 2. Simplified Semantic Games (SSGs). 3. Provably Collusion-Resistant SSG- Tournament Design.

62 Computational Problem Solving Labs (CPSLs)

63 CPSLs A structured interaction space centered around a claim. Community members contribute by submitting their strategies for playing an SSG of the lab’s claim. Submitted strategies compete in a provably collusion-resistant tournament of simplified semantic games.

64 Controlled, thorough, and efficient evaluation.

65 Codified Strategies Efficient and thorough evaluation. Useful by-products. Controlled information flow.

66 CPSLs (II) A structured interaction space centered around a claim. Community members contribute by submitting their strategies for playing an SSG of the lab’s claim. Once a new strategy is submitted to a CPSL, it competes against the strategies submitted by other members in a provably collusion-resistant tournament of simplified semantic games.

67 Highest Safe Rung Problem The Highest Safe Rung (HSR) problem is to find the largest number (n) of stress levels that a stress testing plan can examine using (q) tests and (k) copies of the product under test.
k = 1: n = q (linear search).
k >= q: n = 2^q (binary search).
1 < k < q: n = ?, ?
[Figure: ladder diagram with rungs 1 .. n, the highest safe rung marked.]
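For the open case 1 < k < q, the classic “egg drop” recurrence fills the gap: testing one rung either breaks a copy, leaving (q-1, k-1), or does not, leaving (q-1, k). The Java sketch below is an illustration, not taken from the deck. Note that it gives n(q, 1) = q, matching the slide's linear search case, but n(q, q) = 2^q - 1, one less than the slide's 2^q; the difference is the convention for whether “no rung is safe” counts as an outcome, so treat the exact constants as an assumption.

public class HighestSafeRung {
    // n(q, k) = n(q-1, k-1) + n(q-1, k) + 1, with n(0, k) = n(q, 0) = 0.
    public static long levels(int q, int k) {
        long[][] n = new long[q + 1][k + 1];
        for (int tests = 1; tests <= q; tests++)
            for (int copies = 1; copies <= k; copies++)
                n[tests][copies] = n[tests - 1][copies - 1] + n[tests - 1][copies] + 1;
        return n[q][k];
    }

    public static void main(String[] args) {
        System.out.println(levels(10, 1));  // 10: linear search
        System.out.println(levels(10, 10)); // 1023: binary search, 2^10 - 1
        System.out.println(levels(10, 2));  // 55: the open 1 < k < q case
    }
}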

68 Highest Safe Rung Admin Page
[Screenshot: the admin page of the Highest Safe Rung Computational Problem Solving Lab. It shows the problem description, an editable claim specification beginning "HSR() := forall Integer q : forall Integer k : ...", a Save button, the standings, and a Publish/Hide control for game traces.]

69 Highest Safe Rung
[Screenshot: the member page of the Highest Safe Rung Computational Problem Solving Lab. It shows the problem description; standings with columns Rank, Member, Latest contribution, # of faults, and Chosen side (verifier or falsifier); and controls to upload a new strategy, download traces of past games, download the strategy skeleton, and download the claim specification.]

70 Claim Specification

71 Simplified Semantic Games

72 SG Rules

73 SSGs Simpler: use auxiliary games to replace moves for conjunctions and disjunctions. Thoroughness potential: participants can provide several values for quantified variables.

74 SSG Rules

75 HSR Claim Specification

class HSRClaim {
    public static final String[] FORMULAS = new String[]{
        "HSR() := forall Integer q : forall Integer k : exists Integer n : HSRnqk(n, q, k) and ! exists Integer m : greater(m, n) and HSRnqk(m, q, k)",
        "HSRnqk(Integer n, Integer q, Integer k) := exists SearchPlan sp : correct(sp, n, q, k)"};

    public static boolean greater(Integer n, Integer m) {
        return n > m;
    }

    public static interface SearchPlan {}

    public static class ConclusionNode implements SearchPlan {
        Integer hsr;
    }

    public static class TestNode implements SearchPlan {
        Integer testRung;
        SearchPlan yes; // What to do when the jar breaks.
        SearchPlan no;  // What to do when the jar does not break.
    }

    public static boolean correct(SearchPlan sp, Integer n, Integer q, Integer k) {
        // sp satisfies the binary search tree property, has n leaves of depth
        // at most q, and all root-to-leaf paths have at most k "yes" branches.
        ...
    }
}

76 Strategy Specification One function per quantified variable.

77 HSR Strategy Skeleton

class HSRStrategy {
    public static Iterable<Integer> HSR_q() { ... }
    public static Iterable<Integer> HSR_k(Integer q) { ... }
    public static Iterable<Integer> HSR_n(Integer q, Integer k) { ... }
    public static Iterable<Integer> HSR_m(Integer q, Integer k, Integer n) { ... }
    public static Iterable<HSRClaim.SearchPlan> HSRnqk_sp(Integer n, Integer q, Integer k) { ... }
}
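As an illustration of how a member might fill in the skeleton, here is a hedged sketch of one possible strategy. Every method body is an assumption made for this example: HSR_q and HSR_k pick a few challenge instances (several values per variable, exploiting the SSG thoroughness potential), HSR_n reuses the HighestSafeRung recurrence sketched earlier, and HSRnqk_sp, which must construct an actual search plan tree, is omitted.

import java.util.List;

class ExampleHSRStrategy {
    // Falsifier moves: propose a few (q, k) instances to challenge the opponent.
    public static Iterable<Integer> HSR_q() { return List.of(5, 10); }
    public static Iterable<Integer> HSR_k(Integer q) { return List.of(1, 2, q); }

    // Verifier move: claim the n computed by the recurrence sketched earlier.
    public static Iterable<Integer> HSR_n(Integer q, Integer k) {
        return List.of((int) HighestSafeRung.levels(q, k));
    }

    // Falsifier move (the negation in the claim swaps sides): m = n + 1 is the
    // smallest value that could refute an opponent's claimed n.
    public static Iterable<Integer> HSR_m(Integer q, Integer k, Integer n) {
        return List.of(n + 1);
    }
}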

79 Semantic Game Tournaments

80 Tournament Design Scheduler: Neutral. Ranking function: Correct and anonymous. Can mask scheduler deficiencies.

81 Ranking Functions Input: a beating function representing the output of several games. Output: a total preorder of participants.

82 Beating Functions (of SG Tournaments) b_P(p_w, p_l, s_wc, s_lc, s_w): the sum of all gains of p_w against p_l while p_w chose side s_wc, p_l chose side s_lc, and p_w took side s_w. More complex than a simple win/loss relation.

83 Ranking Functions (Correctness) Non-Negative Regard for Wins. Non-Positive Regard for Losses.

84 Non-Negative Regard For Wins (NNRW) Additional wins cannot worsen Px’s rank w.r.t. other participants.

85 Non-Positive Regard For Losses (NPRL) Additional faults cannot improve Px’s rank w.r.t. other participants.

86 Ranking Functions (Anonymity) Output ranking is independent of participant identities. Ranking function ignores participants’ identities. Participants also ignore their opponents’ identities.

87 Limited Collusion Effect Slightly weaker notion than anonymity. What you want in practice. A participant Py can choose to lose on purpose against another participant Px, but that won’t make Px get ahead of any other participant Pz.

88 Limited Collusion Effect (LCE) Games outside Px’s control cannot worsen Px’s rank w.r.t. other participants.

89 Discovery A useful design principle for ranking functions. Under NNRW and NPRL: LCE = LFB. LFB is quite an unusual property to have. LFB lends itself to implementation.

90 Locally Fault Based (LFB) Relative rank of Px and Py depends only on faults made by either Px or Py.

92 Locally Fault Based (LFB)

93 Collusion Resistant Ranking Functions

94 Beating Functions Represent the outcome of a set of SSGs. b_P(p_w, p_l, s_wc, s_lc, s_w): the sum of all gains of p_w against p_l while p_w chose side s_wc, p_l chose side s_lc, and p_w took side s_w.

95 Beating Functions (Operations) b_P|w(p_x): games p_x wins. b_P|l(p_x): games p_x loses. b_P|fl(p_x): games p_x loses while not forced. b_P|c(p_x) = b_P|w(p_x) + b_P|fl(p_x): games p_x controls. Beating functions can be added; the empty beating function b_P0 is the identity element.

96 Ranking Functions Take a beating function to a ranking. A ranking is a total preorder.

97 Limited Collusion Effect There is no way p_y’s rank can be improved w.r.t. p_x’s rank behind p_x’s back.

98 Non-Negative Regard for Wins An extra win cannot worsen p_x’s rank.

99 Non-Positive Regard for Losses An extra loss cannot improve p_x’s rank.

100 Locally Fault Based Relative rank of p_x w.r.t. p_y depends only on faults made by either p_x or p_y.

101 Main Result

102 Visual Proof

103 Fault Counting Ranking Function Players are ranked according to the number of faults they make; the fewer the faults, the higher the rank. Satisfies the NNRW, NPRL, LFB, and LCE properties.
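A minimal sketch of this ranking function; the participant-to-fault-count map is an assumed summary of the beating function restricted to each participant's chosen-side games.

import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FaultCountingRanking {
    // Sort participants by ascending fault count; equal counts tie, so the
    // result reads off a total preorder.
    static List<String> rank(Map<String, Integer> faults) {
        return faults.keySet().stream()
                .sorted(Comparator.comparingInt(faults::get))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(rank(Map.of("P1", 3, "P2", 0, "P3", 1)));
        // [P2, P3, P1]: fewer faults ranks higher
    }
}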

104 Semantic Game Tournament Design For every pair of players: if they choose different sides, play a single SG; if they choose the same side, play two SGs in which they switch sides.
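A sketch of this pairing rule; the Participant and Game types are placeholder assumptions, not the thesis scheduler.

import java.util.ArrayList;
import java.util.List;

class SGScheduler {
    enum Side { VERIFIER, FALSIFIER }
    record Participant(String name, Side chosenSide) {}
    record Game(Participant verifier, Participant falsifier) {}

    static List<Game> schedule(List<Participant> players) {
        List<Game> games = new ArrayList<>();
        for (int i = 0; i < players.size(); i++)
            for (int j = i + 1; j < players.size(); j++) {
                Participant a = players.get(i), b = players.get(j);
                if (a.chosenSide() != b.chosenSide()) {
                    // Different sides: a single SG, both on their chosen side.
                    games.add(a.chosenSide() == Side.VERIFIER ? new Game(a, b) : new Game(b, a));
                } else {
                    // Same side: two SGs with switched sides, so each player is
                    // forced onto the unwanted side exactly once.
                    games.add(new Game(a, b));
                    games.add(new Game(b, a));
                }
            }
        return games;
    }
}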

105 Tournament Properties Our tournament is neutral.

106 Neutrality With n_v verifiers and n_f falsifiers, each player plays n_v + n_f - 1 SGs on their chosen side; those are the only games in which they may make faults.

107 Related Work Rating and Ranking Functions Tournament Scheduling Match-Level Neutrality

108 Rating and Ranking Functions (I) Dominated by heuristic approaches: Elo ratings, “Who’s #1?”. There are axiomatizations of rating functions in the field of Paired Comparison Analysis. LCE is not on the radar. Independence of Irrelevant Matches (IIM) is frowned upon.

109 Rating and Ranking Functions (II) Rubinstein [1980] characterized the points system (winner gets a point) by: Anonymity: ranks are independent of the names of participants. Positive responsiveness to the winning relation: changing the results of a participant p from a loss to a win guarantees that p’s rank improves. IIM: the relative ranking of two participants is independent of matches in which neither is involved. “Beating functions” are restricted to complete, asymmetric relations.

110 Tournament Scheduling Neutrality is off the radar; the focus is on maximizing winning chances for certain players and on delayed confrontation.

111 Match-Level Neutrality Dominated by heuristic approaches: compensation points, the pie rule.

112 Conclusion “Semantic games of interpreted logic sentences provide a useful foundation to organize computational problem solving communities.”

114 Future Work Problem decomposition labs. Social Computing. Evaluating Thoroughness.

115 Questions?

116 Thank You!

118 N-Party SG-Based Competitions A tournament of two-party SG-based competitions

119 N-Party SG-Based Competitions: Challenges (I) Collusion potential especially in the context of open online competitions.

120 N-Party SG-Based Competitions: Challenges (II) Neutrality. Two-party SG-based competitions are not neutral when one party is forced to a side.

122 Rationale (IV): Anonymous

123 Rationale (Objective) While constructively debating the correctness of an interpreted predicate logic sentence specifying a computational problem, participants provide and solve instances of that computational problem.

124 ∀ φ ∈ CNFs ∃ v ∈ assignments(φ) ∀ f ∈ assignments(φ). fsat(f,φ) ≤ fsat(v,φ)

126 Semantic Games

127 A “meaningful” competition is: Correct, Anonymous, Neutral, Objective, Thorough.

128 Correctness Rank is based on demonstrated possession (or lack) of skill. Suppose that we let participants create benchmarks of MAX-SAT problems and their solutions to evaluate their opponents. Participants would be incentivized to provide wrong solutions.

129 Anonymous Rank is independent of identities. There is a potential for collusion among participants. This potential arises from direct communication between participants and is aggravated by the open online nature of competitions.

130 Neutral The competition does not give an advantage to any of the participants. For example, a seeded tournament where the seed (or the initial ranking) can affect the final ranking is not considered neutral.

131 Objective Ranks are exclusively based on skills that are precisely defined in the competition definition. Such as solving MAX-SAT problems.

132 Thorough Ranks are based on solving several MAX-SAT problems.

133 Thesis “Semantic games of interpreted logic sentences provide a useful foundation to organize computational problem solving communities.”

134 Semantic Games Thoroughness means that the competition result is based on a wide enough range of skills that participants demonstrate during the competition.

