Madison startup leads the way in web scraping

Curate Solutions mines public documents for data with automated technology

The difference between professional “web scraping” and conducting a Google search is the difference between a photo exhibit in an art museum and all of the photos stored on all of the world’s iPhones and hard drives: It’s a matter of curation.

Web scraping, or web harvesting, is a way of extracting data from websites and weeding out what’s not needed. It can be done manually, but the term implies an automated process that copies data from the web and puts it into a spreadsheet or database.
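At its simplest, that automated process can be a short script. The sketch below uses the widely available requests and BeautifulSoup libraries to copy the rows of an HTML table into a CSV file; the URL and table layout are hypothetical placeholders for illustration, not any particular company’s system.

```python
# A minimal web-scraping sketch: fetch a page, pull rows out of an HTML
# table, and save them to a spreadsheet-style CSV file.
# The URL and table structure below are hypothetical placeholders.
import csv

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/listings"  # hypothetical page containing an HTML table


def scrape_table(url: str, out_path: str) -> int:
    """Copy every table row on the page into a CSV file; return the row count."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    rows = []
    for tr in soup.select("table tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)


if __name__ == "__main__":
    print(f"Saved {scrape_table(URL, 'listings.csv')} rows")
```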

Web scraping is done extensively by the e-commerce, real estate, travel and recruitment industries. According to online security firm Distil Networks, 38 percent of companies obtained data through web scraping in 2016, and the annual salary for web scrapers that year ranged from $58,000 to $128,000. Meanwhile, Amazon and Google make web-scraping tools available for free to anyone who knows how to use them.

Madison could become fertile ground for companies that can make web scraping accessible to non-techies. With the University of Wisconsin-Madison’s world-renowned computer science department and tech companies like Epic Systems Corp. and Promega Corp. attracting top talent, it’s just a matter of finding useful applications.

“It takes a certain skill-set to be able to do this,” says Liam Johnston, a third-year graduate student in the UW-Madison Department of Statistics.

Johnston teaches an undergraduate class on web scraping and related issues and advises researchers in various UW-Madison departments on ways to use web-scraping techniques to find narrow slices of data. For most people, he says, it is difficult to do this without a background in writing code.

Curate Solutions is a 2-year-old Madison company born out of the city’s gener8tor startup incubator. Taralinda and Dale Willis, the husband-and-wife team behind Curate, use the artificial intelligence-influenced web-scraping technology they created to find information in the meeting agendas and summaries that city councils, county boards and other public bodies post on their websites. Curate then passes that information on to clients, which tend to be general contractors looking for leads on commercial construction projects.

In Wisconsin alone, about 7,000 new documents per week are generated and then incorporated into the database that Curate’s software mines for hidden gems. Sunshine laws require municipalities to make most of these documents available, but small towns often lack the personnel or resources to do it in a coherent, accessible way, let alone in a way that combines all the data from all of the towns, large and small.

“Anybody could go out and look up this stuff, but nobody can do it as fast as we do,” says Curate CEO Taralinda Willis. “We have seen clients win projects in their hometowns that they wouldn’t have known about.”

If a Kwik Trip store in northern Wisconsin petitions its local government to annex an adjacent parcel of land, one likely reason is that it is planning to rebuild or expand the store’s wastewater system. To a company that builds wastewater treatment systems, this knowledge could be quite valuable.
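Conceptually, turning a document like that into a lead can be as simple as scanning agenda text for industry-specific terms. The sketch below is purely illustrative: the keywords, sample agenda excerpts and matching logic are assumptions made for this example, not Curate’s actual data or technology, which layers artificial intelligence on top of its statewide document database.

```python
# An illustrative (and deliberately simplistic) lead-flagging sketch:
# scan meeting-agenda text for keywords a contractor might care about.
# The keywords and sample agendas are invented for this example and are
# not Curate's actual data or method.
import re

KEYWORDS = ["annex", "wastewater", "rezoning", "site plan", "sewer extension"]


def flag_leads(agendas: dict[str, str], keywords: list[str]) -> list[tuple[str, list[str]]]:
    """Return (document name, matched keywords) for agendas mentioning any keyword."""
    leads = []
    for name, text in agendas.items():
        hits = [kw for kw in keywords
                if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)]
        if hits:
            leads.append((name, hits))
    return leads


if __name__ == "__main__":
    sample_agendas = {  # hypothetical agenda excerpts
        "town_board_agenda.txt": "Petition to annex parcel 12-044 and extend wastewater service.",
        "village_minutes.txt": "Approval of minutes; park shelter rental fees.",
    }
    for doc, hits in flag_leads(sample_agendas, KEYWORDS):
        print(f"{doc}: possible lead ({', '.join(hits)})")
```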

With growing revenue and investment from venture capitalists, Curate now has eight employees. While the future of web scraping as an industry is not assured, Taralinda Willis says the potential applications for the technology are limitless.

“Before we started, I thought, ‘Somebody must be doing this already,’” she says. “It is a tool that can have a lot of uses, but nobody else is specializing at the hyper-local level like we are.”

Dustin Beilke is a Madison freelance writer.