The data only contains the start and end station for each trip, but does not contain the full path. Route geometries are computed for each (start station, end station) pair using the shortest path from OSRM.
This means that the computed routes are directionally correct but inexact. Trips that start and end at the same station are filtered out since the route geometry is ambiguous.
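For anyone wondering what that preprocessing might look like, here is a rough sketch, not the author's actual code. It deduplicates (start station, end station) pairs, skips same-station trips, and builds a request URL for OSRM's HTTP route service. The helper names, the `localhost:5000` base URL, and the `bike` profile name are all assumptions for illustration; the profile really depends on how the OSRM data was prepared.

```python
def unique_station_pairs(trips):
    """Return the distinct (start_id, end_id) pairs, skipping round trips.

    `trips` is an iterable of (start_id, end_id) tuples (hypothetical shape).
    """
    pairs = set()
    for start_id, end_id in trips:
        if start_id == end_id:
            # Same start and end station: the route geometry is ambiguous.
            continue
        pairs.add((start_id, end_id))
    return sorted(pairs)


def osrm_route_url(start, end, base_url="http://localhost:5000", profile="bike"):
    """Build an OSRM route request URL for one station pair.

    `start` and `end` are (longitude, latitude) tuples; OSRM expects
    longitude first. `base_url` and `profile` are assumed values.
    """
    coords = f"{start[0]},{start[1]};{end[0]},{end[1]}"
    return f"{base_url}/route/v1/{profile}/{coords}?overview=full&geometries=geojson"
```

Routing once per unique pair instead of once per trip keeps the number of requests manageable, since many trips share the same two stations.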
This limitation has some more interesting implications. For example, I noticed that some bike trips are noticeably slower than average. For those I’d assume that the rider either took a detour or made a stop along the way. The animation, however, makes it appear as if it were a very slow ride. It may be worth filtering out all rides that are essentially at walking speed or slower.
It would also be interesting to learn how many rides were excluded altogether, just to put things into perspective.
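A minimal sketch of that filter, assuming each trip record carries the computed route distance and the ride duration. The field names `route_distance_m` and `duration_s` and the 5 km/h walking-speed threshold are my assumptions, not something from the dataset or the project:

```python
WALKING_SPEED_KMH = 5.0  # assumed threshold for "walking speed"


def average_speed_kmh(route_distance_m, duration_s):
    """Average speed implied by the shortest-path distance and ride time."""
    return (route_distance_m / 1000) / (duration_s / 3600)


def split_by_speed(trips, min_speed_kmh=WALKING_SPEED_KMH):
    """Split trips into (kept, excluded) by implied average speed.

    Trips at or below `min_speed_kmh`, or with a non-positive duration,
    end up in `excluded` so they can be counted and reported.
    """
    kept, excluded = [], []
    for trip in trips:
        if trip["duration_s"] <= 0:
            excluded.append(trip)
            continue
        speed = average_speed_kmh(trip["route_distance_m"], trip["duration_s"])
        if speed > min_speed_kmh:
            kept.append(trip)
        else:
            excluded.append(trip)
    return kept, excluded
```

Returning the excluded trips alongside the kept ones makes it easy to report how many rides were dropped, e.g. `len(excluded) / len(trips)`.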
This is now at the top of my list of favorite data visualizations. I remember spending some time with Capital Bikeshare data in DC, which was also public at one point, though it looks like it only goes through 2016: https://capitalbikeshare.com/system-data. Would love to see the Lime/Bird version of this. Thanks for sharing.
The link above points to a 404 error page on GitHub. Looks like you forgot the hyphen in the name part of the url.
I’m working with subway data, particularly the A subway line, 32 mi long with about 2 million trips over 6 months across 66 stations. Trying to train a ConvLSTM to learn the spatiotemporal propagation of train headways.
I really wish Lyft invested in maintenance. I used Citibike this week for the first time in about a year, and the Hudson River Greenway dock by NY Waterway had 1/3 of its empty docks broken with flashing red lights, then about 5 ebikes that needed service.
Are you sure that wasn't the "staggered" bike dock? It forces you to dock in the rear row if the neighboring two front row spaces are free. This is to fit more bikes. The blinking red docks aren't broken. They're intentionally unavailable.
Also, the 5 e-bikes probably didn't need "service", they were just waiting for battery swaps. This is by design. The docks don't charge them.
CitiBike maintenance is generally fine. They're not leaving any significant number of broken bikes or docks. I think you may have just misunderstood how it works.
Interesting that Citi Bike publishes trip-level data. The bike share schemes in Dublin only publish station counts or free bike locations. So you can see the overall pattern of bike motion, but there’s no way to see how many north side trips go to the docks vs Heuston station vs the city center.
Do you find the OSRM shortest path routes probable for bikes? Not living in NYC, I expected pretty different paths. Say the "Hudson River Greenway" or whatever that's called.
+1 to this comment! I used to work in this space and have similarly seen many projects and professional attempts at visualizing this kind of trip data.
This is really nice. One request: when searching for a station name, let me type "and" instead of "&", e.g. typing "E 47th St and 2 Ave" would still return "E 47th & 2 Ave".
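One way this could work is to normalize both the query and the station names before comparing them. This is only a sketch; the function names and the sample station names are made up for illustration, and the site's real search may be implemented quite differently:

```python
import re


def normalize_station_query(text):
    """Lowercase, map the standalone word "and" to "&", collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\band\b", "&", text)  # whole word only, so "Grand" is untouched
    text = re.sub(r"\s+", " ", text).strip()
    return text


def search_stations(query, station_names):
    """Return station names whose normalized form contains the normalized query."""
    needle = normalize_station_query(query)
    return [
        name for name in station_names
        if needle in normalize_station_query(name)
    ]
```

For example, `search_stations("Grand St and 2 Ave", names)` would match a station stored as "Grand St & 2 Ave", because both sides go through the same normalization.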
They show a bike at a location, if it's rented it will disappear off the map, if it's "returned" (available to hire again) it will show back up on the map, but at a different location.
So "represents one real bike ride" is... I guess a lawyer would say technically true.
I was recording similar location data of a Car2Go-like service for a year or two some years ago. I realized that, since they charge rentals by the minute, I could estimate how much they earn by analyzing how long the cars disappear for.
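A back-of-the-envelope version of that estimate could look like the sketch below. It assumes you have time-sorted snapshots of which vehicle IDs were visible on the map, and it treats every disappearance as a paid rental, which overcounts (maintenance and relocations look the same). The snapshot format, the function names, and the per-minute rate are all hypothetical:

```python
from datetime import datetime, timedelta


def rental_durations(snapshots):
    """Infer rental durations from availability snapshots.

    `snapshots` is a time-sorted list of (timestamp, set_of_visible_ids).
    A vehicle counts as rented from the first snapshot where it is missing
    until the first snapshot where it is visible again.
    """
    missing_since = {}
    durations = []
    previous_visible = None
    for timestamp, visible in snapshots:
        if previous_visible is not None:
            # Vehicles that were visible last time but are gone now.
            for vehicle in previous_visible - visible:
                missing_since[vehicle] = timestamp
        # Vehicles that were missing and have shown up again.
        for vehicle in visible:
            if vehicle in missing_since:
                durations.append(timestamp - missing_since.pop(vehicle))
        previous_visible = visible
    return durations


def estimate_revenue(durations, rate_per_minute):
    """Total minutes missing multiplied by an assumed per-minute rate."""
    total_minutes = sum(d.total_seconds() for d in durations) / 60
    return total_minutes * rate_per_minute
```

The resolution is limited by the snapshot interval, so short rentals that start and end between two snapshots are never seen.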
Is MapLibre GL a cheaper (free?) open source alternative?
Cool stuff btw. I’m trying to visualize weather model data myself (millions of points) at https://futureradar.net and have been researching client-side techniques like yours.
It's often interesting to observe the different ways that privacy is approached in the US and Europe.
In Europe we often accept pretty grave restrictions of our liberty like the UK's Online Safety Act, which would never fly in the US, and we do so without much public comment.
On the other side of things, organisations in the US happily expose datasets like this one, which would give most EU Data Protection Officers a heart attack, and nobody bats an eyelid.
In Lyft's defense, they are providing it anonymized under the NYCBS Data Use Policy. They also aren't providing the exact GPS routes, which is why OSRM is used to calculate the shortest path instead.
https://www.reddit.com/r/MicromobilityNYC/comments/v457x0/9_...
Cool visualization.
This is beautifully done!
I've heard that releasing these sorts of data sets helps competitors do market research, and thus mitigates "winner takes all" forces. NYC also tends to be fairly pro-public-datasets: https://data.cityofnewyork.us/browse?%3BsortBy=most_accessed...