This is the last of three posts on the use of AI in my entrepreneurship/economic development research class in Spring 2024. Post one summarized how I added AI to my syllabus and assignments. Post two covered my lesson plans for “AI week”. (Related: last semester, I dug into using AI for demographic and consumer spending research.)
This final post covers lessons learned. It builds upon a lightning-round talk I presented on Monday at a day-long AI institute organized by our Teaching and Learning Commons. If you prefer slides over a blog post, see below. (The slides also briefly summarize how I incorporated AI into my class, which posts one and two cover in detail.)
Thank you to my students. I learned much from them. They will do most of the talking below. Well, the AIs will have a lot to say too. All bolded text is my emphasis.
Questions & assumptions:
I had three big questions: Can the popular AI tools…
- Access numeric datasets? (ex. data.census.gov)
- Utilize static tabular data? (ex. a CEX table)
- Access paywalled content? (like a business intelligence subscription database)
My assumptions on the answers:
- No
- Maybe
- No way
Which AIs?
Since I told the students I wouldn’t require them to pay for any AI, we focused on popular and free tools: Copilot, Gemini, ChatGPT 3.5, and Perplexity. When I refer to “AI” in this post, I’m writing about these tools in the aggregate.
Although I used DataCommons last semester, I rarely recommended it to my students this spring, since I can’t find much evidence that Google has continued to develop the tool.
Many of you are aware of emerging, specialized AI for data. We’ll see how those develop. So far, most seem to focus on investment data. That’s not the focus of my class.
How did AI perform?
Based on student work, here is my quick summary:
- Yup, AI can’t access datasets and paywalled data. “I can’t do that for you” was one concise AI response we encountered.
- As expected, AI tends to use statistics found in narrative sources: press releases (including from Census Bureau and BLS), newspaper articles, and free reports (including from governmental agencies, market research report publishers, and trade associations).
- The AI tools can access Census Quick Facts tables.
- But AI can’t seem to handle long static tables, like the BLS Consumer Expenditure Survey tables in PDF and XLS format.
- However, Copilot can understand big tables opened in Edge.
- You can ask the AI to only use information from PDF and XLS files, just as you can limit Google searches to those file types.
Best practices, according to the students
- Even though AI can’t access datasets and business intelligence subscriptions, “you often get the best data from the datasets and the paywalled research tools”.
- Yet AI can be useful as a discovery tool for alternative sources, usually pages with data extracted from datasets or paywalled content (examples below).
- Use power words (smart words) and be as detailed as possible in your prompting. Here is one example prompt from a student who illustrated this point:
“What are the start-up costs for an animal hospital in the United States? Act as a business analyst. Consider the different types of animal hospitals, such as small clinics or large hospitals with in-house labs and surgeries, and the geographical location, such as urban or rural areas. Include expenses for equipment, rent, labor and training costs, and costs for obtaining licenses and permits necessary for running an animal hospital. Provide a detailed cost analysis with considerations of all of these variables. If possible, include results from the Economic Census.”
More student suggestions
- “From the experience in this exercise, the key to using AI for data research is to be as specific as possible. Adding keywords that specify subtopics, geography, and even the specific source, yields better results.”
- “My suggestion is to be as specific as possible and to be okay with digging deeper if necessary. Be prepared with several ways to ask for the same thing.”
- “For quality research, you need to know and include the best sources/datasets in your prompts”
- “I find that it is very helpful to have an existing knowledge base on the topic about which one is inquiring but it seems possible to use AI to educate yourself about an overview of the topic if necessary.” [So AI as a discovery and learning tool and then maybe you can write good prompts to find data.]
- “AI can provide useful brainstorming support and can dramatically reduce the time it takes to create an initial framework for doing data research but it does not replace doing the actual research and understanding the nuances of how various data interconnects to create a full picture.”
- “While AI is useful as a ‘search’ engine, it is not a ‘decision’ engine. AI is good at finding resources, but it cannot critically evaluate the scholastic values of the sources it finds. As such, AI can be a useful tool that complements research, but it cannot replace the role of critical human evaluation.”
- “Ask for the resources to be provided in a concise table with links.”
Examples of prompt engineering
During our “AI week” in class, we discussed this research question:
“I was searching for industry information of meal kit/delivery (e.g., Blue Apron, HelloFresh) in the U.S. But I was not able to locate anything. Could you kindly help me?”
The students brainstormed a list of topics to include in a prompt, reflecting what they had learned in the class already:
- Market size, demand, demographics/psychographics
- Top competitors and market share
- Financial information/benchmarking
- What geography to focus on?
- Sustainability considerations?
For another scenario, we researched the market for athletic/performance socks. The AI results weren’t very good, so one student kept pushing the AI to do better. (At one point we got information about the percentage of people who like to wear socks while sleeping, which was bizarrely off topic.) Notice the progression in these prompts:
- “What marketing strategies work best for the performance sock industry?”
- “What marketing strategies work best for the performance sock industry? Focus on the United States market. Only use information that focuses on running or performance socks. Include some industry or market statistics. Respond as if you are a market research analyst.”
- “I don’t want information about socks that aren’t performance socks. I only want information from running-specific or sock industry sources.”
For another topic, one student was very clear about how she wanted the sources formatted:
- First prompt: “Act as a venture capitalist and list what statistical industry data is relevant to deciding whether to invest in a national chain of animal hospitals”
- ChatGPT responded with a thick paragraph.
- Second prompt: “Create a list of what data sources to use for each item in the list above”
- ChatGPT added the requested information.
- Third prompt: “List recent industry statistical data for each item. Pull statistics from the sources listed where possible”
- And now she had a very useful list to use.
Honesty from the AIs
Refreshing responses?
- First prompt: “What is the profit margin for brewery startups in North Carolina?”
- A long answer was provided with some statistics.
- Second prompt: “What are your sources for the above answer?”
- Response: “My apologies, but I don’t have direct access to external sources or the internet to provide specific citations. The information I shared is based on my existing knowledge up to a certain point in time (around 2021) and general industry trends. For more accurate and up-to-date data, I recommend consulting reliable sources such as industry reports, financial analyses, or expert advice. If you’re considering starting a brewery, conducting thorough research and seeking professional guidance would be prudent.”
While Copilot provides links to its sources, Gemini does not unless you ask it to. There’s also been plenty of analysis of Gemini’s tendency to lie (or “hallucinate,” but that’s sugarcoated language). So I was pleased to see this response from Gemini:
“Remember, thorough market research is crucial. Visit these potential locations, talk to local residents and existing businesses, and assess the overall vibe to understand if it aligns with your vision and target audience. By combining information on general trends, local specifics, and your unique concept, you can make an informed decision about the best location for your used bookstore in North Carolina.”
So while AI tools can be poor at retrieving data, they can be useful for explaining research strategies, both primary and secondary.
Another example: a student asked Gemini, “What are the financial benchmarking for used bookstores in Greensboro, North Carolina?”
Response: “Unfortunately, obtaining specific financial benchmarks for independent used bookstores in Greensboro, NC, is challenging due to several factors:
- Privacy of Financial Information: Most independent businesses, including bookstores, are not obligated to publicly disclose their financial information.
- Limited Industry Data: Industry reports and data for used bookstores are often broad and may not provide specific details for a particular city or region.
However, you can explore alternative resources to gain insights into the financial aspects of operating a used bookstore in Greensboro”
Two more quotes, both from ChatGPT:
“As of my last update in January 2022, I don’t have access to real-time or region-specific data, including the profit margins for brewery startups in North Carolina. Profit margins for brewery startups can vary widely depending on various factors such as location, size of the brewery, operational efficiency, market demand, competition, and more. To get accurate and up-to-date information on profit margins for brewery startups in North Carolina, you may need to consult industry reports, market research, or reach out to local brewery associations or industry experts who may have access to such data. Additionally, speaking directly with brewery owners in North Carolina could provide valuable insights into the financial aspects of running a brewery in the region.”
“As of my last update, the U.S. Census Bureau’s American Community Survey (ACS) provides data on median gross rent for San Francisco. However, please note that this data might not reflect the most current rental prices, as the ACS is conducted over a period of time and the data is released periodically. According to the ACS data for 2019, the median gross rent in San Francisco was approximately $1,994 per month. However, it’s important to recognize that this figure represents the median for all rental units in San Francisco and may not capture the full range of rental prices across different neighborhoods and housing types. For the most up-to-date and detailed rental price information, I recommend consulting real estate websites, rental listing platforms, or contacting local property management companies in San Francisco. These sources often provide more current and specific data on rental prices in different neighborhoods and for various types of rental properties.”
Regarding that final response: as I reported in my first post, the students used both data.census.gov and AI to research housing data. (I had the students compare the AI results to ACS DP04, “Housing Characteristics”. The AI can’t access that table because it sits within the data.census.gov interface, but ChatGPT found some ACS data, perhaps from the Quick Facts table.) One benefit of using AI was getting Zillow and other real estate market data. Zillow, of course, is a different dataset with a different purpose and methodology. Still, ChatGPT was certainly on target in recommending multiple data sources for analyzing a housing market. Final student quote: “Overall, I really liked that Bing [Copilot] was able to deliver on more current market trend data, which only further supports the findings from the ACS tables.”