Introduction
We were interested in how user patterns might vary between Omaha and Lincoln. Omaha is the larger city (metro population of about one million people), whereas Lincoln is about a quarter of the size and a college town. To compare users, we first filtered out admin and other invalid trips.
Most of the available data pertains to time, so we made maximum use of the timestamps. We considered whether a person made trips in the winter (defined as December, January, February, and March) as a measure of their being an all-season cyclist. We also considered whether they biked at night. Nightfall is a tricky variable because the time of sunset shifts over the year. Fortunately, the Python suntime package can give us the sunrise and sunset time on any day, adjusting for latitude and daylight saving time. We also considered whether a trip was "one-way", which in this analysis means that it ends at the same station where it started (in effect, a round trip). This variable indicates whether a trip was more likely utilitarian or recreational. Finally, we considered each user's trip count and average trip duration.
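The per-user features described above could be computed roughly as follows. This is a minimal sketch under several assumptions not stated in the text: the `Trip` record and its field names are hypothetical, a trip is classified as a night trip from its start time alone, and `sun_times` is a hypothetical callable returning that day's sunrise and sunset in the same timezone as the trip timestamps (in practice it might wrap suntime's `Sun.get_sunrise_time` and `Sun.get_sunset_time`, converted to local time).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Dict, Iterable, List, Tuple

# Winter as defined in the text: December through March.
WINTER_MONTHS = {12, 1, 2, 3}

# Hypothetical callable: date -> (sunrise, sunset), expressed in the same
# timezone (and naive/aware style) as the trip timestamps.
SunTimes = Callable[[date], Tuple[datetime, datetime]]


@dataclass
class Trip:
    # Hypothetical record layout; the real dataset's field names are not given.
    user_id: str
    start: datetime
    end: datetime
    start_station: str
    end_station: str


def is_winter(trip: Trip) -> bool:
    return trip.start.month in WINTER_MONTHS


def is_night(trip: Trip, sun_times: SunTimes) -> bool:
    # Assumption: a night trip is one that starts before sunrise or after sunset.
    sunrise, sunset = sun_times(trip.start.date())
    return trip.start < sunrise or trip.start > sunset


def ends_at_start_station(trip: Trip) -> bool:
    # The flag the text calls "one-way".
    return trip.start_station == trip.end_station


def duration_minutes(trip: Trip) -> float:
    return (trip.end - trip.start).total_seconds() / 60


def share(flags: List[bool]) -> float:
    # Fraction of flags that are True.
    return sum(flags) / len(flags)


def user_features(
    trips: Iterable[Trip], sun_times: SunTimes
) -> Dict[str, Tuple[int, float, float, float, float]]:
    """Per user: (trip count, winter share, night share, one-way share, mean duration)."""
    by_user: Dict[str, List[Trip]] = defaultdict(list)
    for trip in trips:
        by_user[trip.user_id].append(trip)

    features = {}
    for user, user_trips in by_user.items():
        features[user] = (
            len(user_trips),
            share([is_winter(t) for t in user_trips]),
            share([is_night(t, sun_times) for t in user_trips]),
            share([ends_at_start_station(t) for t in user_trips]),
            sum(duration_minutes(t) for t in user_trips) / len(user_trips),
        )
    return features
```

Computing shares rather than yes/no flags matches the fractional values that appear in the cluster statistics, where, for example, a user's winter value can fall anywhere between 0 and 1.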
K-Means Clustering for Four Clusters
Describing the Clusters
With four clusters and the descriptive statistics below, we can define the clusters as follows:
- Cluster 0 (Local infrequency): These are occasional users who rarely use the system during the winter (most have no winter trips at all). They may use it during the evening.
- Cluster 1 (Tourist): These are also occasional users who rarely use the system during the winter and may use it during the evening. They differ from Cluster 0 in that they are more likely to make one-way trips, make slightly fewer trips, and take much longer trips (a mean duration of about 56 versus 24).
- Cluster 2 (Frequent): These are the most frequent users of the system, averaging about 1,800 trips each. Every user in this cluster made some winter trips, and this cluster has the highest share of winter trips.
- Cluster 3 (Frequent social): These are frequent users of the system, though less so than Cluster 2. They are slightly more likely to make one-way and night trips than Cluster 2 users.
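The text does not say how the four clusters were fit. As an illustration only, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm with k-means++ seeding) over per-user feature vectors such as (trip count, winter share, night share, one-way share, mean duration). The function names and the default seed are hypothetical choices, not the authors' code; in practice a library implementation such as scikit-learn's `KMeans` would usually be used.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def init_centers(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    # k-means++ seeding: each new center is drawn with probability
    # proportional to its squared distance from the nearest chosen center.
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # all points coincide with existing centers
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights)[0])
    return centers


def kmeans(
    points: List[Point], k: int = 4, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm; returns (cluster label per point, cluster centers)."""
    rng = random.Random(seed)
    centers = init_centers(points, k, rng)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean_point(members)
    return labels, centers
```

Because trip counts reach into the thousands while the three shares lie between 0 and 1, Euclidean distances on raw features are dominated by trip count and duration. The text does not state whether the features were standardized before clustering.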
Statistics for Cluster 0 (Local infrequency)
| statistic | trip count | winter share | night share | one-way share | mean duration |
|---|---|---|---|---|---|
| count | 42836 | 42836 | 42836 | 42836 | 42836 |
| mean | 5.89523 | 0.0884465 | 0.25797 | 0.532147 | 24.2404 |
| std | 15.7375 | 0.266521 | 0.403309 | 0.465253 | 9.93219 |
| min | 1 | 0 | 0 | 0 | 2 |
| 25% | 1 | 0 | 0 | 0 | 16.5 |
| 50% | 2 | 0 | 0 | 0.555556 | 25.5 |
| 75% | 4 | 0 | 0.5 | 1 | 32.5 |
| max | 185 | 1 | 1 | 1 | 49.2203 |
Statistics for Cluster 1 (Tourist)
| statistic | trip count | winter share | night share | one-way share | mean duration |
|---|---|---|---|---|---|
| count | 35900 | 35900 | 35900 | 35900 | 35900 |
| mean | 2.47889 | 0.0779591 | 0.20361 | 0.76414 | 55.7638 |
| std | 2.97054 | 0.255467 | 0.385383 | 0.400037 | 12.9453 |
| min | 1 | 0 | 0 | 0 | 40 |
| 25% | 1 | 0 | 0 | 0.583333 | 46 |
| 50% | 2 | 0 | 0 | 1 | 52.3333 |
| 75% | 3 | 0 | 0 | 1 | 62 |
| max | 122 | 1 | 1 | 1 | 96 |
Statistics for Cluster 2 (Frequent)
| statistic | trip count | winter share | night share | one-way share | mean duration |
|---|---|---|---|---|---|
| count | 22 | 22 | 22 | 22 | 22 |
| mean | 1820.91 | 0.23552 | 0.159872 | 0.0988543 | 13.6594 |
| std | 996.699 | 0.0663609 | 0.110981 | 0.132088 | 8.6342 |
| min | 1111 | 0.0878261 | 0.0483019 | 0.00126984 | 3.53841 |
| 25% | 1307.75 | 0.205502 | 0.0686736 | 0.00993912 | 6.66342 |
| 50% | 1454 | 0.244916 | 0.111485 | 0.0573053 | 11.3514 |
| 75% | 1855.25 | 0.284529 | 0.257867 | 0.12855 | 18.3469 |
| max | 5447 | 0.332565 | 0.404865 | 0.561656 | 33.7786 |
Statistics for Cluster 3 (Frequent social)
| statistic | trip count | winter share | night share | one-way share | mean duration |
|---|---|---|---|---|---|
| count | 347 | 347 | 347 | 347 | 347 |
| mean | 365.585 | 0.156223 | 0.19016 | 0.103283 | 12.6812 |
| std | 182.188 | 0.126897 | 0.130693 | 0.160661 | 8.8064 |
| min | 186 | 0 | 0 | 0 | 3.68841 |
| 25% | 230 | 0.0556363 | 0.0815076 | 0.0195956 | 6.7967 |
| 50% | 304 | 0.13215 | 0.182979 | 0.0501882 | 9.26971 |
| 75% | 442.5 | 0.229686 | 0.270637 | 0.113309 | 15.2136 |
| max | 1029 | 0.720096 | 0.669565 | 1 | 49.4857 |
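The rows of these tables follow the layout of pandas' `DataFrame.describe()` (the text does not say which tool produced them). The sketch below reproduces the same summary with only the standard library, under the assumption that each cluster's users are given as feature tuples in the column order used above; `FEATURES`, `describe`, and `describe_by_cluster` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev
from typing import Dict, List, Sequence

FEATURES = ["trip count", "winter share", "night share", "one-way share", "mean duration"]


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Count, mean, sample std, min, quartiles, max (needs at least two values)."""
    data = sorted(values)
    # "inclusive" interpolates linearly between order statistics, which is
    # the same convention as the default quartiles in pandas' describe().
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": mean(data),
        "std": stdev(data),  # sample standard deviation (n - 1 denominator)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


def describe_by_cluster(
    rows: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, Dict[str, Dict[str, float]]]:
    """For each cluster label, summarize every feature column of its rows."""
    grouped: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {name: describe(col) for name, col in zip(FEATURES, zip(*members))}
        for label, members in sorted(grouped.items())
    }
```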
Potential Additional Dimensions to Explore:
- Use date to get weekday vs. weekend
- Travel speed (hard to estimate with confidence: some trip durations look inaccurate, and we do not have route distances)
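Both ideas could be prototyped along these lines. This is a sketch under assumptions the text does not confirm: station latitude and longitude are available, and straight-line (haversine) distance between stations stands in for the missing route distance. Since a route can never be shorter than the straight line, the result is only a lower bound on speed, and it is zero for trips that end at their starting station. The helper names are hypothetical.

```python
import math
from datetime import datetime
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Coords = Tuple[float, float]  # (latitude, longitude) in degrees


def is_weekend(start: datetime) -> bool:
    # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return start.weekday() >= 5


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two points on a spherical Earth.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_speed_kmh(start: Coords, end: Coords, duration_minutes: float) -> Optional[float]:
    # Lower bound on speed: straight-line distance over elapsed time.
    # Returns None for non-positive durations, which suggest bad data.
    if duration_minutes <= 0:
        return None
    return haversine_km(*start, *end) / (duration_minutes / 60)
```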