Project description

Timeline

Report 1 due Mon, Oct 02 - Submit on Canvas

Report 2 due Mon, Oct 23 - Submit on Canvas

Report 3 due Mon, Nov 20 - Submit on Canvas

Final report due Wed, Dec 08 - Submit on Canvas

Introduction

TL;DR: Build a 4-step travel model for a US metropolitan region. That is your term project.

The project task is for you to use aggregate and disaggregate transportation planning methods to develop a 4-step travel demand model for a US metropolitan area of your choice. The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like!) and apply them to a real-world dataset to analyze it in a meaningful way.

All analyses must be done in Python or R, and all components of the project must be reproducible.

Logistics

This assignment will be performed in groups of 3-4 students. Groups will be chosen by the instructor to include a mix of programming skill levels and to match other metrics.

The primary deliverables for the project are a series of written, reproducible reports detailing your analysis.

Group Project Reports

You will complete a sequence of group reports over the semester. You will complete the term project in the interactive programming environment Jupyter notebooks using a Visual Studio (VS) codespace. The core code is provided to you, but you MUST make modifications to the models and provide descriptive text, as outlined in the below subsections.

Requirements and recommendations for reports 1, 2, and 3

  • You should submit a single *.ipynb file with the name “G[01,02,03]-NAMES.ipynb”. Please replace “NAMES” with the names of your team members.
  • The submitted notebook file should have a professional format and include sections headers (using #, ##, ###, etc. in markdown).
  • Figures should be included as deemed necessary by the group. Tutorials are provided in the VS codespace on figure generation.
  • You will have to perform some additional data filtering and manipulation tasks. Tutorials are provided in the VS codespace on how to conduct these tasks. Additional documentation is available on the Pandas webpage.
  • The course instructor and TA are available to answer coding and modeling questions. Please make maximal use of office hours.
  • This is a group project, so you should make use of your team members. Some students may be stronger in coding, while others may have a firmer grasp of the statistics. You are encouraged to make use of these skills and learn from one another.
  • You may want to split into sub-teams to estimate models, then come together to compare your results. Such operational details are ultimately up to the individual groups. However, all group members must review the submission in its entirety and approve the results.

General file structure

Introduction and data

This section includes an introduction to the project tasks and data. Describe the data and definitions of key variables. It should also include some exploratory data analysis. All of the EDA won’t fit in the paper, so focus on the EDA for the response variable and a few other interesting variables and relationships.

Methodology

This section includes a brief description of your modeling process. Explain the reasoning for the type of model you’re fitting, predictor variables considered for the model, including any interactions. Additionally, show how you arrived at the final model by describing the model selection process, any variable transformations (if needed), and any other relevant considerations that were part of the model fitting process.

Results

In this section, you will output the final model and include a brief discussion of the model assumptions, diagnostics, and any relevant model fit statistics.

This section also includes initial interpretations and conclusions drawn from the model.

Additional resources in VS codespace

  • Codebook_v1.2.xlsx: This file details the variable definitions for the NHTS dataset.
  • NHTS2017_UsersGuide_04232019_1.pdf: This file details the process for conducting the NHTS and how to use the data.
  • Python tutorials: Additional tutorial files are provided detailing how to use various Pandas features, how to generate charts, and many other useful features. The data to run these tutorials is available in the VS codespace. You are encouraged to work through these tutorials and use them to enhance your assignment submissions.

Report 1: Trip Generation

  1. Develop a regression models of non-work trip generation. The model must be a household-based model, in which the dependent variable is the total number of daily non-work trips generated by a given household, and the independent variables are relevant household characteristics. In each case, some experimentation will likely be required to find the “best” model that can be obtained given the available data and the model type chosen. Present and discuss the results of your analysis in terms of the statistical significance of the parameter estimates, the goodness-of-fit of the model, and the rationale for the variables selected for inclusion in the model.
  2. Similarly develop a household-based model of daily work trip generation. Compare your results with those obtained for the household-based non-work trip model with respect to explanatory variables used, parameter statistical significance, and model goodness-of-fit. Discuss likely reasons for these differences.
  3. Develop a cross-classification model of work trip generation using household size (i.e, number of persons in the household) and number of vehicles as the explanatory variables. Justify your selection of category definitions for each explanatory variable. Discuss your results in terms of their implications for how non-work trip generation varies with household size and number of autos, as well as in terms of the reliability of your results given the obtained cell frequencies. Also compare this model with the regression model developed in question 2. Which would you prefer to use for forecasting and why?

Report 2: Trip Distribution

  1. Run the various trip distribution modules contained in the provided Jupyter notebook for your CBSA.
  2. Run the various goodness of fit statistics for your dataset. As written, the code will run the statistics for the column/row IPF output. You may find that the statistics are quite poor. Why do you think this is the case? If you receive strong statistical fits, comment on this result.
  3. Create a new section that duplicates the goodness of fit code but change the predicted trip table to use the output from the entropy maximization algorithm. Compare the results with those obtained in question 2. Do the results differ? If so, why would you obtain different results? Which approach would you recommend?
  4. You were given the impedance function \(0.45e^(-1*t_ij )\). Does this impedance function appear to accurately represent the impendence caused by travel time (i.e., as measured by GoF statistics). Test an alternative impedance function and compare the results. Your impedance function does not need to improve on the one given in the assignment.

Report 3: Mode Choice

  1. Develop a multinomial logit (MNL) mode choice model for home-based non-work trips. Some experimentation will likely be required to find the “best” model that can be obtained given the available data and the model type chosen. Present and discuss the results of your analysis in terms of the statistical significance of the parameter estimates, the goodness-of-fit of the model, and the rationale for the variables selected for inclusion in the model. Your discussion should also include consideration of elasticity estimates.
  2. Similarly develop an MNL mode choice model for home-based work trips. Compare your results with those obtained for the home-based non-work trip model with respect to explanatory variables used, parameter statistical significance, and model goodness-of-fit. Discuss likely reasons for these differences (if any). Your discussion should also include consideration of elasticity estimates.

Final report

The final report is a culmination of your work as a group this semester. The final report should incorporate elements from other topics (e.g., land use and transit planning, equity, etc.) covered this semester. The report this semester will take the form of an informal discussion in class. Your group should talk about a few interesting transportation-relevant topics you learned about your chosen region. I will ask 1-2 follow up questions. The final report will be assessed as meets expectations or does not meet expectations.

Possible discussion points:

  • What transportation projects are happening in the area?
  • What are the major transportation challenges faced in the area?
  • How are the problems in your study area similar/different from those in Lincoln/Omaha?

Report marking criteria {report-marking-criteria}

The written reports are each worth 40 points, broken down as follows

Total 40 pts
Introduction/data 6 pts
Methodology 10 pts
Results 14 pts
Discussion + conclusion 6 pts
Organization + formatting 4 pts

Click here for a PDF of the written report rubric.

Introduction and data

This section includes an introduction to the project motivation, data, and research questions. Describe the data and definitions of key variables. It should also include some exploratory data analysis. All of the EDA won’t fit in the paper, so focus on the EDA for the response variable and a few other interesting variables and relationships.

Grading criteria

The research question and motivation are clearly stated in the introduction, including citations for the data source and any external research. The data are clearly described, including a description about how the data were originally collected and a concise definition of the variables relevant to understanding the report. The data cleaning process is clearly described, including any decisions made in the process (e.g., creating new variables, removing observations, etc.) The explanatory data analysis helps the reader better understand the observations in the data along with interesting and relevant relationships between the variables. It incorporates appropriate visualizations and summary statistics.

Methodology

This section includes a brief description of your modeling process. Explain the reasoning for the type of model you’re fitting, predictor variables considered for the model including any interactions. Additionally, show how you arrived at the final model by describing the model selection process, interactions considered, variable transformations (if needed), assessment of conditions and diagnostics, and any other relevant considerations that were part of the model fitting process.

Grading criteria

The analysis steps are appropriate for the data and research question. The group used a thorough and careful approach to select the final models; the approach is clearly described in the report. The model conditions and diagnostics are thoroughly and accurately assessed for their model. If violations of model conditions are still present, there was a reasonable attempt to address the violations based on the course content.

Results

This is where you will output the final model with any relevant model fit statistics.

Describe the key results from the model. The goal is not to interpret every single variable in the model but rather to show that you are proficient in using the model output to address the research questions, using the interpretations to support your conclusions. Focus on the variables that help you answer the research question and that provide relevant context for the reader.

Grading criteria

The model fit is clearly assessed, and interesting findings from the model are clearly described. Interpretations of model coefficients are used to support the key findings and conclusions, rather than merely listing the interpretation of every model coefficient. If the primary modeling objective is prediction, the model’s predictive power is thoroughly assessed.

Discussion + Conclusion

In this section you’ll include a summary of what you have learned about your study area from statistical analysis and literature review. In addition, discuss the limitations of your analysis and provide suggestions on ways the analysis could be improved. Any potential issues pertaining to the reliability and validity of your data and appropriateness of the statistical analysis should also be discussed here.

Grading criteria

Overall conclusions from analysis are clearly described, and the model results are put into the larger context of the subject matter and original research question. There is thoughtful consideration of potential limitations of the data and/or analysis, and ideas for future work are clearly described.

Organization + formatting

This is an assessment of the overall presentation and formatting of the written reports/Jupyter notebooks.

Grading criteria

The report neatly written and organized with clear section headers and appropriately sized figures with informative labels. Numerical results are displayed with a reasonable number of digits, and all visualizations are neatly formatted. All citations and links are properly formatted. If there is an appendix, it is reasonably organized and easy for the reader to find relevant information. The main body of the written report (not including the appendix) is no longer than 15 pages.

Points for reproducibility + organization will be based on the reproducibility of the written reports/Jupyter notebooks. The Jupyter notebooks should be neatly organized and interpretable to a knowledgeable reader.

Overall grading

The grade breakdown is as follows:

Total 35 %
Report 1: Trip Generation 10 %
Report 2: Trip Distribution 10 %
Report 3: Mode Choice 10 %
Final report 5 %

Note: No late project reports are accepted.

Grading summary

Grading of the project will take into account the following:

  • Content - What is the quality of the analysis?
  • Correctness - Are statistical procedures carried out and explained correctly?
  • Writing and Presentation - What is the quality of the statistical presentation, writing, and explanations?
  • Creativity and Critical Thought - Is the project carefully thought out? Does it appear that time and effort went into the coordination and implementation of the project?

A general breakdown of scoring is as follows:

  • 90%-100%: Outstanding effort. Students understand how to apply all statistical concepts, can put the results into a cogent argument, can identify weaknesses in the argument, and can clearly communicate the results to others.
  • 80%-89%: Good effort. Students understand most of the concepts, put together an adequate argument and communicate most results clearly to others.
  • 70%-79%: Passing effort. Students have misunderstanding of concepts in several areas, have some trouble putting results together in a cogent argument, and communication of results is sometimes unclear.
  • 60%-69%: Struggling effort. Students are making some effort, but have misunderstanding of many concepts and are unable to put together a cogent argument. Communication of results is unclear.
  • Below 60%: Students are not making a sufficient effort.

Late work policy

There is no late work accepted on this project. Be sure to turn in your work early to avoid any technological mishaps.