An official website of the U.S. government

What is the Student Innovator’s Toolbox?

The Student Innovator’s Toolbox was designed to connect students and faculty with data from the U.S. Treasury. Its purpose is to complement classroom learning about the federal budget, data science, and policy analysis, and to facilitate analysis and research.

The intended audience for this toolbox includes professors and instructors, directors of graduate and undergraduate programs, librarians dealing with data analysis and visualization, and students looking for a real-world project to flex their skills and develop data-driven solutions.


We offer three main ways for students and faculty to connect with us. We will keep this list dynamic and growing, so let us know if you have suggestions for other types of engagement that better fit your needs.

Use the data to learn about something else

Quantitative courses at every level require datasets for classroom lessons, homework, problem sets, exams, and more. Courses we believe are a good fit for our data include those that focus on probability, statistics, econometrics, economics, federal budgeting, and data science.

There are two high-level options of datasets: (1) a prepared, clean dataset and (2) a real-world, raw dataset. Below we explain what we mean by clean and raw, as well as the circumstances that call for each. Both options involve students using Python, R, Stata, SAS, Excel, or some other software to analyze (or clean then analyze) the data.

Clean dataset

A prepared and clean dataset is easier to use out of the gate because it is smaller and requires no data cleaning or data wrangling on the student’s part. That is, the data is already in the format and shape needed to begin the desired analysis immediately. There are no missing or duplicate values, and the dataset contains a limited number of fields/columns. Prepared and clean datasets are best for teaching other concepts (e.g. regression analysis) when you don’t want to overwhelm the student with the pre-cleaning steps. They’re also helpful for analysis and data visualization projects.
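As a rough illustration of what "clean" means in practice, the sketch below audits a small table for empty fields and exact duplicate rows using only Python's standard library. The sample data, the column names (`agency`, `fiscal_year`, `obligations`), and the helper `audit_rows` are all hypothetical and are not drawn from any actual Data Lab dataset.

```python
import csv
import io

# Hypothetical sample; the columns and values are invented for illustration.
SAMPLE_CSV = """agency,fiscal_year,obligations
Agency A,2020,100.0
Agency B,2020,250.5
Agency A,2021,
Agency B,2020,250.5
"""

def audit_rows(rows):
    """Count rows with empty fields and rows that exactly repeat an earlier row."""
    seen = set()
    rows_with_missing = 0
    duplicate_rows = 0
    for row in rows:
        # Treat an empty or absent value as missing.
        if any((value or "").strip() == "" for value in row.values()):
            rows_with_missing += 1
        key = tuple(row.values())
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)
    return {
        "rows": len(rows),
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = audit_rows(rows)
print(report)
```

A dataset that is clean in the sense described above would report zero for both `rows_with_missing` and `duplicate_rows`; this sample deliberately contains one of each.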

Raw dataset

In more advanced classes, the goal is to help the student understand how data will actually look when they first get it. This may require teaching the student how to handle missing values or de-duplicate records. Often the original owners of the data have structured it to serve their own business purposes, so the student may need to change the shape of the data entirely in a processing stage before beginning any analysis. These datasets are useful for students learning data wrangling and data cleaning techniques, for those learning more “real-world” data handling skills in order to prep datasets before applying statistical techniques, and for those about to learn languages typical of data science, e.g. R or Python.
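The three cleaning tasks mentioned above (de-duplicating records, handling missing values, and reshaping) can be sketched in a few lines of standard-library Python. This is a minimal example under stated assumptions: the "wide" layout with one column per fiscal year, the column names, the values, and the helper `clean_and_reshape` are all hypothetical, and dropping missing amounts is only one of several reasonable choices.

```python
import csv
import io

# Hypothetical raw extract in "wide" form: one column per fiscal year.
RAW_CSV = """agency,fy2020,fy2021
Agency A,100.0,120.0
Agency B,250.5,
Agency A,100.0,120.0
"""

def clean_and_reshape(raw_text):
    """De-duplicate rows, skip missing amounts, and reshape wide -> long."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    long_rows = []
    for row in reader:
        key = tuple(row.values())
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        for column, value in row.items():
            if column == "agency":
                continue
            if value is None or value.strip() == "":
                continue  # missing amount: dropped in this sketch
            long_rows.append({
                "agency": row["agency"],
                "fiscal_year": int(column[2:]),  # "fy2020" -> 2020
                "amount": float(value),
            })
    return long_rows

tidy = clean_and_reshape(RAW_CSV)
for record in tidy:
    print(record)
```

The result has one row per agency and fiscal year, which is the shape most statistical and plotting tools expect. In a course setting, an instructor might instead ask students to impute or flag the missing value rather than drop it.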

Use the data to learn about federal budgeting and/or federal spending

While the engagement above focuses on learning skills related to data and analysis, the courses mentioned are often agnostic to the actual content of the dataset. A second option is to use USAspending data specifically for its content, in courses touching on federal budgeting or federal spending processes. Programs interested in federal government data may include undergraduate and graduate programs in finance, accounting, public administration, and public policy, as well as business schools.

Similar to above, this engagement includes two options: a prepared, clean dataset or “real-world” raw dataset that requires cleaning and preparation prior to consumption.

Option 1) Clean dataset

See Engagement 1, Option 1

Option 2) Raw dataset

See Engagement 1, Option 2

Use the data for a consulting or capstone project (given a problem statement or question)

In this engagement, a student or student team uses the data to identify a real-world question to answer. Using the principles of user-centered design as outlined in the Resources section, the student team identifies a “client”, and the student or team acts as analysts or consultants. The student team conducts initial interviews with the client to determine how data can help answer a question or solve a problem. The team then researches, performs analysis, and provides recommendations in order to answer the question or advise a particular course of action. While this type of structure is common for public policy students, it can easily be applied to a wide variety of settings, from business to data science students, or even an interdisciplinary team where a “real-world” project is the goal (e.g. student associations, clubs).

The student team can decide, in consultation with their “client”, to produce a product or visualization instead of an analysis or recommendation. The focus would be less about gathering data to answer a specific question and providing a recommendation, and more about creating a product, tool, or visualization for the Data Lab’s website and portfolio. Students with a wide range of backgrounds, including quantitative analysis, graphic design, UX, UI, design, data visualization, data science, computer science, and more, can contribute to this type of engagement. Examples of software/languages used for executing tasks or analysis in this engagement are: Tableau, R and R Shiny, Python, Stata, JavaScript, HTML, CSS, D3, and React.


The roles, responsibilities, and academic and time expectations vary for each type of engagement. We sketch these out below, but they are not set in stone. They are completely open to discussion to ensure both the institution’s and the Data Lab team's needs are being fulfilled.

Using the data to learn (e.g. statistics or budgeting)

Roles & Responsibilities

The Data Lab team works with a professor to develop a useful dataset (clean or raw dataset) for a specific course. We will work with the professor to answer any questions and ensure that he/she is comfortable with the content and data structure in order to incorporate the dataset into the course. These interactions can be via email, phone, screen shares, or video conferences—whatever works best.

Academic & Time Expectations

We provide professors datasets and assistance through Engagements 1 and 2 as an upfront, pre-semester activity to allow enough time for professors to ask questions, understand the data, and incorporate the dataset into their curriculum. The Data Lab team is happy to field questions as they arise but, for the most part, the time or interactions required are mainly prior to a semester in order to get the right dataset for the course prepped, explained, and incorporated into the syllabus. We consider this a low level of commitment for both the professors and our team.

Use the data for a consulting or capstone project

Roles & Responsibilities

The roles/responsibilities and time expectations are similar to above, with the expectation of slightly less “hand-holding” in this engagement. This engagement may be useful for long-term, student-driven projects, such as an undergraduate thesis, master’s capstone project, or even a PhD thesis.

We are happy to meet with the students to get them started, but we will not offer a problem statement or task to execute. The students will explore the data themselves in order to decide on a problem they want to explore and answer, or a visualization/tool they want to build. The onus is on the student for the thought leadership and the drive to push the project along. In this engagement, there will likely be an undergraduate or graduate advisor assisting the student. We will work with the professor or advisor to establish what role our team will play in any evaluation (e.g. providing grades or answering surveys). However, all this is up for negotiation depending on your institution’s needs!

Academic & Time Expectations

These types of real-world projects typically span a full semester, and the students will likely be in a course or club that provides expectations and support during this period. Students may have academic requirements for the course that are not necessarily required by the Data Lab team (e.g. course readings or literature reviews). Students are expected to put in the regular number of hours required for one semester-long course, including classes, meetings, and work time, and to provide a final product. We will be available for questions and regular communication and expect progress reports on a regular basis (e.g., once every two weeks). Lastly, the team will attend a presentation or demonstration of the final product.


Analyst Guide to Federal Spending Data

The possible projects using the data could range from written analysis and recommendations to creating visualizations and a web-based tool to allow others to explore the data in a new way. Perhaps the final product answers a specific question or allows others with subject matter expertise to discover insights into the data.

While the final deliverable can range quite a bit, we expect three main components. First, the final product itself. Second, all code used to produce the final product. Third, a presentation on the process or journey the student/team took to arrive at their final conclusion or creation. We expect all three components to be made publicly available and potentially published to our main Data Lab website. Ideally, the student, school, or library will also be able to host the final products so that they do not disappear when the student moves on. We could also link to these permanent locations on the Data Lab website. Finally, the school is welcome to publish and promote the final products as a way to tout the value of real-world partnerships.

In summary, the three components of final products are:

  • Product/application/analysis (hosted on a school server and/or published to our site)
  • Code
  • Presentation on process

User-Centered Design Principles

While each user-centered design (UCD) process can look different for every team, there are four main stages to every UCD approach.

  • Observe and learn about the people for whom you are solving a problem.
  • Ideate or generate ideas about potential options that could address the challenge your target audience faces.
  • Create or make a prototype.
  • Check or test the chosen prototype to see if it actually solves the problem people are facing.

In reality, however, UCD is an incredibly iterative process—the stages do not flow linearly from one to the next. You may begin brainstorming and realize you need more information, sending your team back to the observation and learning stage. You may build a prototype, test it, and find out it doesn’t quite work, sending you back to the ideation phase.

The ultimate purpose of UCD when tackling any challenge is to keep your target audience first and foremost in any solution (how would they naturally navigate a situation or interact with something?) and to build for them. The prototype should cater to the people facing the challenge, not the other way around; if you build a grand solution that no one will use, you don’t have a solution at all.

The benefits of employing a user-centered design approach are numerous. The solution is much more likely to be effective and adopted by the target audience when it was built from the ground up with them in mind and tested throughout multiple phases. UCD can also reduce risk by allowing for quick, iterative stages that are nimble enough to make changes as new information is acquired and incorporated. Surprises are limited in the final release of the solution given the rounds of testing of the various prototypes throughout the lifecycle of the project. Once an idea moves into a higher-fidelity phase (e.g., a mock up that looks exactly how the phone app will be developed), resources (time, money, etc.) can be spent more confidently and wisely, knowing that quirks have been ironed out and unknown issues have been discovered and addressed earlier (e.g., with paper prototypes).

We encourage teams that we partner with to adopt this approach. Once a team identifies the target audience or who may be impacted by the project, we want students to go out and observe or interview people to gather and test the initial data. Later, once ideas have been generated and a prototype has been created, students should test the concept with users before switching fully into implementation mode. How would users use the project? What do they expect each element to do? Now that they have something to react to, are there other issues they foresee? Incorporate the users’ feedback into any final solution or prototype.

Tips for finding your client

In a client-based project, students analyze a policy problem, a challenge or missed opportunity and – with faculty guidance – develop recommendations to address it. Hence, a client with an interesting problem is key to producing a successful product or analysis.

But finding a client and a policy problem can be challenging. Here are a few tips to begin this essential first step of your product or analysis:

Follow your interests. Connect with an organization working on problems you’re interested in. For example, a local homeless service organization might be an appropriate client for students interested in preventing and ending homelessness or alleviating poverty. Learn about the organization’s goals and current initiatives. Then, work with the client to define a suitable and manageable policy problem or analysis that can provide value or insights to them.

Leverage your network. Reach out to your alumni and faculty networks and get an introduction to the problems and challenges their respective organizations are facing. Then, conduct initial research and identify an opportunity for a client organization.

Think local. Consider organizations/agencies you’re familiar with. These can be your high school, your city council, nonprofits in your community, or other small, public organizations where you may have a connection. Most of these organizations would be happy to have master’s students help increase their understanding of the distribution of federal funds and their impact.

Develop your subject matter niche. Take advantage of your student status and contact professionals on social media or via email or phone. Consider policy issues or topics in your field of study and share interesting insights or ideas with them. Then meet with them and ask whether they have a project you could carry out for them as part of completing your master’s degree. Bring some of your own project ideas, too.

Client-Based Project Opportunities

You + Data = Impact. By utilizing data, you can unlock valuable insights that empower agencies, nonprofits, and public organizations to make evidence-based decisions that positively impact our communities. Stay tuned for unique opportunities to get involved and make a difference.

Look for additional ideas and incentives in challenge and prize competitions within government, universities or data science organizations.

Tech Meetups

Join professionals, students and others in your local area to share ideas or build a team.

Data Sets

Downloadable Data Sets: Are you interested in creating a data visualization to answer questions like “How much does the federal government spend to combat the opioid crisis?”, or to learn how much is spent on border security, housing, or violence against women? Check out our database to get started!

Clean Data Sets (Option 1): For analysis and data visualization projects - Coming soon

Raw Data Sets (Option 2): For data cleaning projects - Coming soon


The DATA Act Capstone Project

Building an open source tool to make federal spending data more accessible and easy to navigate.

Syracuse University | Maxwell School of Citizenship and Public Affairs

Master's Project: Unspent Funds Across Federal Agencies

Analysis and policy recommendations to address unspent money in federal agencies.

Duke University | Sanford School of Public Policy
