The Electric Reliability Council of Texas (ERCOT) manages the flow of electric power to more than 26 million Texas customers — representing about 90 percent of the state’s electric load. As the independent system operator for the region, ERCOT schedules power on an electric grid that connects more than 46,500 miles of transmission lines and 650+ generation units. It also performs financial settlement for the competitive wholesale bulk-power market and administers retail switching for 8 million premises in competitive choice areas.
One of the things ERCOT does really well is publish data to the public. The data ranges from details about how the grid operates to a wide variety of settlement and pricing data. The pricing data is very helpful to market participants, such as retail energy providers or owners of generation assets. Access to the reports starts at ERCOT’s public website, http://www.ercot.com. From there, the data is organized by type, such as retail data, or by market. The data is published in *mostly* electronic formats that applications can consume easily, such as CSV and XML. This makes the data easy to scrape and load into consumers’ IT systems.
Scraping
ERCOT used to have a publicly accessible list of all reports, in a nice XML format, located at http://mis.ercot.com/misapp. For some reason, public access to this list has been removed, which detracts from the overall usability of the MIS system, because the list made it easy to identify new reports as they came online.
Scraping ERCOT is rooted in the reportTypeId. Every report has a specific reportTypeId, an apparently arbitrary integer with no discernible meaning other than being a unique key. Finding a report’s reportTypeId is as simple as clicking the link to the report’s download page and reading the reportTypeId query parameter from the resulting URL. For example, the Real Time Market’s LMPs by Electrical Bus report, whose reportTypeId is 11485, is located at: http://mis.ercot.com/misapp/GetReports.do?reportTypeId=11485&reportTitle=LMPs%20by%20Electrical%20Bus&showHTMLView=&mimicKey
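To make the query-parameter lookup concrete, here is a minimal sketch in Python using only the standard library. It is not code from the library described below. The function names `report_type_id` and `report_listing_url` are hypothetical, and it is an assumption that a URL rebuilt in the same shape as the example will be accepted by the server.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Base path and example copied from the URL quoted in the text.
GET_REPORTS_URL = "http://mis.ercot.com/misapp/GetReports.do"
EXAMPLE_URL = (
    "http://mis.ercot.com/misapp/GetReports.do?reportTypeId=11485"
    "&reportTitle=LMPs%20by%20Electrical%20Bus&showHTMLView=&mimicKey"
)


def report_type_id(url: str) -> int:
    """Return the reportTypeId query parameter of a report listing URL."""
    # keep_blank_values keeps empty parameters such as showHTMLView.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return int(query["reportTypeId"][0])


def report_listing_url(type_id: int, title: str) -> str:
    """Build a listing URL with the same shape as EXAMPLE_URL (assumed layout)."""
    params = {"reportTypeId": type_id, "reportTitle": title, "showHTMLView": ""}
    # quote_via=quote encodes spaces as %20, matching the example URL.
    return f"{GET_REPORTS_URL}?{urlencode(params, quote_via=quote)}&mimicKey"
```

For instance, `report_type_id(EXAMPLE_URL)` reads the key back out of the example, and `report_listing_url(11485, "LMPs by Electrical Bus")` reproduces the example URL.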
The page at that URL is HTML and therefore scrapable. The easiest way to scrape it is to parse the HTML document and select all of the “a[href]” links on the page. From each link, the document can be traversed upward through its parent references to get the title of the document and other information.
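The link-collection step might look roughly like the sketch below, again in standard-library Python and not taken from the library itself. Python's `html.parser` is a streaming parser with no parent references, so instead of walking up from each link the sketch remembers the enclosing table row while reading it. The page layout is an assumption: each download link is taken to sit in a `<tr>` whose first text is the document title. `SAMPLE_HTML`, `ReportLinkParser`, and `parse_report_rows` are hypothetical names, and the sample row is made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><td>example_report_csv.zip</td>
      <td><a href="/misdownload/servlets/mirDownload?mimic_duns=000000000&amp;doclookupId=732865653">zip</a></td></tr>
</table>
"""


class ReportLinkParser(HTMLParser):
    """Collect the title and hrefs of every table row that contains a link."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows: {"title": str, "hrefs": [str]}
        self._text = None   # text chunks of the row being read, or None
        self._hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._text, self._hrefs = [], []
        elif tag == "a" and self._text is not None:
            href = dict(attrs).get("href")
            if href:
                self._hrefs.append(href)

    def handle_data(self, data):
        if self._text is not None and data.strip():
            self._text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "tr" and self._text is not None:
            if self._hrefs:
                # Assumption: the first text in the row is the title.
                title = self._text[0] if self._text else ""
                self.rows.append({"title": title, "hrefs": self._hrefs})
            self._text = None


def parse_report_rows(html: str) -> list:
    parser = ReportLinkParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

A tree-building parser with CSS selectors would match the “a[href]” description more directly; the streaming approach is used here only to keep the sketch free of third-party dependencies.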
Obtaining the files themselves is a separate step. The only access is through a URL that references just a docLookupId (spelled doclookupId in the query string), which is again an apparently arbitrary integer with no discernible meaning other than being a unique key. Example:
http://mis.ercot.com/misdownload/servlets/mirDownload?mimic_duns=000000000&doclookupId=732865653
Using the link, the file can be downloaded and then processed.
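As a rough illustration of the download step, the sketch below builds the download URL from a docLookupId, fetches it, and unpacks the result. It is standard-library Python, not the library's own code. It assumes that the downloaded document is a zip archive and that the `mimic_duns` value from the example URL can be reused as-is. All function names are hypothetical.

```python
import io
import zipfile
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

# Base path copied from the example download URL in the text.
DOWNLOAD_URL = "http://mis.ercot.com/misdownload/servlets/mirDownload"


def doc_lookup_id(href: str) -> int:
    """Return the doclookupId query parameter of a download link."""
    return int(parse_qs(urlsplit(href).query)["doclookupId"][0])


def download_url(lookup_id: int, mimic_duns: str = "000000000") -> str:
    """Build a download URL shaped like the example (assumed to be accepted)."""
    params = {"mimic_duns": mimic_duns, "doclookupId": lookup_id}
    return f"{DOWNLOAD_URL}?{urlencode(params)}"


def unzip(payload: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def fetch_document(lookup_id: int, timeout: float = 30.0) -> dict[str, bytes]:
    """Download one document and unpack it (assumes the payload is a zip)."""
    with urlopen(download_url(lookup_id), timeout=timeout) as response:
        return unzip(response.read())
```

Keeping `unzip` separate from `fetch_document` means the parsing side can be exercised against local files without touching the network.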
Implementation
https://github.com/kubetrade/ercot-mis-scraper
The library is a set of static methods for downloading and parsing each report. For the basic flow, the ErcotScraper is used to scrape the AvailableReports and to download and unzip a specific document. Each report type should have its own class, which uses the ErcotScraper for the basic flow and then adds report-specific behavior.
The reports *try* to follow a pattern: the package in which a report class resides roughly mirrors where you would find the report on ERCOT’s public website, http://www.ercot.com. The name of the report class also tries to follow the actual name of the report, converted to PascalCase.
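The naming convention can be illustrated with a small helper that turns a report title into a PascalCase identifier. This is a hypothetical Python sketch of the idea only; the class names actually used in the repository may differ, particularly around acronyms and punctuation.

```python
import re


def pascal_case(title: str) -> str:
    """Join the words of a report title, capitalizing each first letter."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    # Only the first character is changed, so acronyms such as "LMPs" survive.
    return "".join(word[:1].upper() + word[1:] for word in words)
```

Under these rules, `pascal_case("LMPs by Electrical Bus")` yields `LMPsByElectricalBus`.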
How can you help? Pull requests for new reports are welcome! Please follow the basic GitHub pull request flow: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests