Google Hashcode 2018

Print Friendly, PDF & Email

For the first time, this year, me and three colleagues of mine participated to the Google Hash Code challenge; this year the problem was open to Europe, Middle East and Africa: in the end 4856 teams has participated to the game. We knew we didn’t have too many chances to get a good result due to the prevailing number of nerds, young students and people from universities but we weren’t playing to win, just to prove ourselves we could still resolve a theoretical problem – maybe not in the best way, but we could do it.

So, this time the qualification problem was about a set of rides to be allocated to a pool of self-driving vehicles

Task – Given a list of pre-booked rides in a city and a fleet of self-driving vehicles, assign the rides to vehicles, so that riders get to their destinations on time. For every ride that finishes on time (or early), you will earn points proportional to the distance of that ride; plus an additional bonus if the ride also started precisely on time.

The distance of a ride is not the classical euclidean distance but a kind of taxicab distance: given the Start and the End of a Ride, the distance between intersection [a, b] and intersection [x, y] is equal to |a − x| + |b − y|.

Each Ride has an Earliest Start and a Latest Finish time: the ride cannot start before ES and shouldn’t finish after LF. If it can start at ES, a Bonus is gained – should it ends after LF, no points will be gained; the #points is defined by the length of the ride.

Everything is quite clean.

A usual the problem is represented by a set of input files and the candidates should produce a corresponding set of files containing the proposed solution:

Google Hash Code 2018: input example
Google Hash Code 2018: submission example

First solution

The first approach was way too greedy and naïve:

  • we ordered the rides in various way but mainly trying to prefer the nearer in time or in space (at the beginning all the vehicles are in (0, 0))
  • then we allocated the first N-vehicles to the first N-rides
  • from now on, we cycled the various rides trying to place them one by one to the best vehicle, where “best vehicle” meant:
    • nearest in time
    • nearest place
    • privileging Bonus

We have obtained a 31.4 million score when the top of the scoreboard had raised up to 49.9 million! Well, it was the first #code, so we were nervous, we wanted to submit something decent… after all it was not so bad; as I said “we knew we could make it better, we feared we could make worst than that!“.

Some days after I analyzed the problem and I found it in the calculation that associated a Ride to a Vehicle: the “next free time” of the vehicle was recalculated in a wrong way, thus preventing us to allocate more rides to that vehicle.
But let’s proceed orderly.

Google #code – score

Second Solution

The surprising thing has been visualizing the results after the scoreboard has been closed – the result on the “Metropolis” test has always been quite disappointing:

Google #code – Metropolis input – what the hell?!?

How can the solution be so dreadfully horrible? The thing is that, as usual, Google has prepared different input files putting in the various test case different scenarios, not only different data-sizes.

Is a ride always feasible?

Google likes to include some non feasible cases in the problems that proposes – we could call it “noise”; in this year qualification problem we can identify some rides that cannot be satisfied. Consider the following two samples:

 "Should be easy" #1"Should be easy" #2"High bonus" #1
Start point(434, 885)(623, 425)(1495, 1737)
End point(25, 81)(588,906)(960, 488)
Early start4062881009
Latest finish246312393343
Distance from (0, 0)131910483232
Ride length12135161748
Min time to complete
(distance from (0,0) + ride length)
1319+1213 = 2532 > 24631048 + 516 = 1564 > 12393232 + 1748 = 5016 > 3343

That is, if the start location and the ride length are such that the minimum time to get to the starting point and complete the ride is higher than the Latest Finish, then the ride cannot ever be satisfied – we just drop it right from the start.

 Number of ridesFeasible rides
A) Example33
B) Should be easy300294
C) No hurry1000010000
D) Metropolis100009988
E) High bonus100009984

Focus on the rides

If it’s true that the number of the vehicles must be kept in any case into consideration, we can also say that the complexity added by the vehicles to the problem is just ralated to that number: all the cars start at (0,0) and they all have the same speed – things would have changed if the vehicles had had different properties but that’s not the case.

So, our focus is on the rides; each city has different sizes, each problem has a different number of rides and each ride has different parameters:

Each ride has:

  • a Start point
  • an End point
  • an Early Start time (ES)
    • the ride cannot start before that time but a vehicle can be there before ES and wait for it
  • a Latest End time (LE)
    • the ride can complete before that time but if it ends after it, no point is gained
  • a Length

we have to assign the Rides to the Vehicles, but since all the Vehicles are equals, we can just reduce the problem in chaining the Rides in the best possible way – that is, reducing the “wait time” of a vehicle between two rides

we could call the “S” time “slack time”: for example, moving from ride 12 to serve ride 22, the vehicles has to spend a given amount of time (the “street length” between the two rides: ride12.End ==> ride33.Start) and in case it arrived before ride33.EarlyStart, it has to wait some time slots.

Considerations on the topology

The challenge uses 5 different input files with these parameters:

  ExampleShould be easyNo hurryMetropolisHigh bonus
City size3x4800x10003000x200010000x100001500x2000
Feasible rides32941000099889984
Ride lengthavg2.66577.1316741418.351160.75

and this means that, in the various scenarios, we could have to tweak the solution in a proper way to get the best result – we’ll see that this actually gives some benefits.

Find below the result of the rides allocation when the list of the rides is sorted using different metrics; in the second line the percentage of the rides allocation and the number of bonuses obtained:

a) Example10
(100% - 1)
(100% - 1)
(100% - 1)
(100% - 1)
(100% - 1)
(100% - 1)
(100% - 1)
(100% - 1)
b) Should be easy176572
(100% - 283)
(100% - 265)
(100% - 128)
(100% - 265)
(100% - 166)
(100% - 265)
(100% - 274)
(100% - 265)
c) No Hurry15809027
(94.05% - 0)
(94.13% - 0)
(94.13 - 0)
(94.13% - 0)
(94.13% - 0)
(94.13% - 0)
(94.11% - 0)
(94.13% - 0)
d) Metropolis11027010
(80.29% - 5166)
(80.27% - 5908)
(80.64% - 479)
(80.33 - 5920)
(82.06 - 1197)
(80.27 - 5908)
(80.73% - 5213)
(80.27% - 5908)
e) High bonus21276945
(100% - 9688)
(100% - 8363)
(100% - 1083)
(100% - 8363)
(100% - 1545)
(100% - 8363)
(100% - 8321)
(100% - 8363)

Using different pre-sorting modes we can generate different output and choose the one with the highest score – using  the “score calculator” of our software we get a score of 48635214; for some kind of reason, submitting the result to the Google Judge System we always get some more points (48643728), probably I didn’t correctly implement the score calculation…


One of my team-mates extended his optimization even further, getting a 49517699 score, quite a good result, that put our team at position 42 in the Extended Round:

We are mildly happy of this result.

We enjoyed the challenge a lot and we’ll definitely play another round next years hoping to get a better result, maybe avoiding bugs in the code!

Leave a Reply

Your email address will not be published. Required fields are marked *