
Analyzing Megabus Pricing Structure Through Email Receipts and Confirmation Numbers

Published October 16th, 2019 by Emily

I have taken 80 Megabus rides in the past decade, and I have the receipts to prove it. I've gotten pretty good at anticipating the cost of the tickets: I know that prices are much higher on holidays, and I know that if I book about a month before an unimportant weekend, tickets cost between five and ten dollars.

However, I thought I would take a look at an aggregation of my email receipts and see if I could anticipate the price a bit better.

Confirmation Code

Each Megabus ticket has a confirmation number. I took a look at my confirmation numbers and was able to work out what just about every part of them means.

A: Ticket Number

As far as I can tell, the first number is what number you are in the queue for purchasing tickets. If you were the second person to purchase a ticket, your number will be 2. The highest number I've gotten in a confirmation code is 116, and that was when I booked a ticket the same day as the journey, only nine hours before.

B: Checksum?

I can't find any pattern in this number. My only thought is that it might be some kind of checksum, so that the system can tell if a confirmation code has been falsified.

C: Journey Departure Date (MMDDYY)

This is a six digit number that corresponds with the date of departure. In the example, the bus departs on June 30th, 2013.

D: Route Number

This corresponds to Megabus' list of routes. There are sometimes multiple routes between the same two cities. For example, the most common routes between Philadelphia and Washington are M31 and M31R, but M32R and M44R are also options, and once I took one route that was M21R. I haven't yet figured out what the route numbers mean but I'm going to look into that more in the future.

E: Journey Departure Time (HHMM)

This is a four digit number that corresponds with the time of departure in military time. In the example, the bus departs at 1:15 pm.

F: Departure Location -> Arrival Location

This shows where the bus is coming from and where it is going.
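As a minimal sketch, the date field (C) and time field (E) can be combined into a single departure timestamp. The function name `parse_departure` is my own, and the sketch assumes the two fields have already been pulled out of the confirmation number as strings, since how the fields are separated isn't spelled out here.

```python
from datetime import datetime

def parse_departure(date_field: str, time_field: str) -> datetime:
    """Combine an MMDDYY date field and an HHMM time field into one datetime.

    Assumes both fields are already extracted from the confirmation number.
    """
    return datetime.strptime(f"{date_field} {time_field}", "%m%d%y %H%M")

# The example from the text: June 30th, 2013 at 1:15 pm
departure = parse_departure("063013", "1315")
print(departure.isoformat())  # 2013-06-30T13:15:00
```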

Other Data

Now I had a lot of data to work with, but I also wanted to add two more important pieces to complete the set.

Purchase Date and Time

I was able to get this information from the date and time stamped on my confirmation email.
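With a purchase timestamp and a departure timestamp in hand, how far in advance a ticket was bought is a simple subtraction. This is only a sketch: `days_in_advance` is a name I made up, and the purchase time below is a hypothetical value chosen to mirror the "nine hours before" booking mentioned earlier.

```python
from datetime import datetime

def days_in_advance(purchased: datetime, departure: datetime) -> float:
    """Return how many days before departure the ticket was purchased."""
    return (departure - purchased).total_seconds() / 86400

# Hypothetical same-day booking, nine hours before a 1:15 pm departure
purchased = datetime(2013, 6, 30, 4, 15)
departure = datetime(2013, 6, 30, 13, 15)
lead = days_in_advance(purchased, departure)
print(lead)  # 0.375
```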

Ticket Price

I recorded the listed price of each ticket from the confirmation email. I was careful not to include the price of a seat reservation.

Analyze Data

For the purposes of this analysis, I only used three pieces of data: the ticket number, how far in advance I bought the ticket, and the ticket price.

In the end I had 125 lines of data. Megabus has a free rebooking policy so I frequently book a ticket only to change the trip later. As it turns out, 36% of the trips I book I later reschedule.
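The 36% figure is consistent with the two counts mentioned in this post, if you assume that every booking that didn't turn into one of the 80 rides was a rescheduled one. That assumption is mine, and the arithmetic is just:

```python
bookings = 125     # lines of data, one per booking
rides_taken = 80   # rides actually taken

# Assumption: any booking that did not become a ride was later rescheduled
rescheduled = bookings - rides_taken
share = rescheduled / bookings

print(rescheduled)     # 45
print(f"{share:.0%}")  # 36%
```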

First I wanted to verify that I had understood the confirmation number correctly. If I bought the ticket more than a month before the trip I should have been among the first to buy a ticket and my ticket number should have been low. If I bought the ticket the day of the journey, presumably I would be one of the last so my ticket number would be high. This isn't foolproof - most of the bus trips I've taken are only around 75% full - but if I graph the two I should see a relationship.

In fact, I did see a negative relationship. Obviously there were outliers, but in general the further in advance I bought the ticket, the fewer people had bought tickets ahead of me.
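One way to put a number on a relationship like this is a Pearson correlation coefficient, where a value below zero means the two quantities move in opposite directions. The sketch below is not the calculation behind the graph, and the sample values are placeholders made up for illustration, not my actual receipts.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder values for illustration only (NOT the real data set)
days_ahead = [30, 21, 14, 7, 1]
ticket_numbers = [2, 5, 9, 20, 40]

r = pearson(days_ahead, ticket_numbers)
print(r < 0)  # True: booking earlier goes with a lower ticket number
```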

Then I got to the real reason I made all these graphs: I wanted to see how the price of a ticket increases as I book it closer to the departure date.

I was shocked to see that there doesn't appear to be much of a relationship at all between how early I bought the ticket and the price of the ticket.

Perhaps the ticket price increases with the number of people who have already bought tickets?

There isn't much of a relationship here, either.

I think the reason I can't find a relationship between the price and when I book the ticket is that each trip is so different. A trip that isn't in demand will start at $1 and only go up to $30 by the day of the journey. However, if you book a trip on a holiday, the prices will start at $40 and only rise from there.

For this reason, I think the only way to determine how the price of a ticket changes over time is to actually track the price of the same ticket over many days before the departure date. Lucky for us, I've been running a program to track these prices over the past few years. Once I have time, I'll write a post with my findings.