Preserving state’s history: In a first, Punjab Vidhan Sabha proceedings since 1947 to be a click away
From SYL to Bhattal’s speech, Punjabi University to digitise 4 lakh archival images

Punjabi University, Patiala — in a first-of-its-kind initiative in the state — has launched the process of digitising around 4 lakh archival images of Assembly debates dating back to 1947.
Once completed, the record will cover key episodes from the state’s history, including the perennially controversial debates on the Sutlej Yamuna Link (SYL) canal between Punjab and Haryana that followed the reorganisation of 1966, when the two states were carved out.
The current project, officials said, is part of a Rs 14.70 crore research project titled ‘OCR (Optical Character Recognition) and Applications in Indian Languages’, and is being undertaken jointly by teams from CDAC Noida, IIIT-Hyderabad and Punjabi University, Patiala.
Dr Gurpreet Singh Lehal, one of the members of the Punjabi University team working on the project, told The Indian Express that the Punjab government had been planning to “float tenders” for converting the archival images of Punjab Vidhan Sabha proceedings into digital text. “But we told them that we will do it under similar project commissioned by the Centre for the Lok Sabha,” he said.
Punjab Vidhan Sabha officiating secretary Ram Lok Khatana added, “There was a proposal to digitise the archives. We came to know that Punjabi University had undertaken a similar project for the Lok Sabha. So we approached them for the job.”
Lehal, who will carry out the exercise with Dr Ankur Rana, the other member of the Punjabi University team, said they will convert 4 lakh archival images of the state Vidhan Sabha in addition to 15 lakh pages of Lok Sabha proceedings, starting from 1947.
He added that converting the Punjab Vidhan Sabha proceedings was the tougher task, as the debates had been recorded in several languages: English, Punjabi, Hindi and even Urdu. By comparison, Lok Sabha proceedings have largely been recorded in either Hindi or English.
Professor Arvind, the Vice Chancellor of Punjabi University, said that the current initiative aligns with the university’s commitment to scholarly pursuits and the dissemination of knowledge.
In an official press statement released last week, Arvind said, “This project will significantly contribute to the public accessibility of the archives that encompasses debates and resumes of Punjab Vidhan Sabha since 1947. It will empower users to explore and analyze these debates using keywords in English, Punjabi, Hindi, or Urdu. For instance, when a user inputs the keyword ‘SYL’ in English, an extensive compilation of debates containing this particular term in any of the four languages will be readily accessible, thereby offering valuable insights for research endeavours.”
According to Lehal, “The archives of debates in Punjab Vidhan Sabha exist as images or non-unicode font formats, rendering them unsuitable for search engine functionality. In order to enable search, it is necessary to convert the images into textual form and transform the existing non-unicode text into unicode format. The project involves leveraging advanced Artificial Intelligence technologies, such as optical character recognition (OCR) and script recognition, to convert the existing non-searchable images and non-unicode text into searchable formats. The multilingual nature of the debates, which encompass English, Punjabi, Hindi, and Urdu, presents significant challenges that necessitate the development of robust and highly accurate systems.”
Lehal said the digitisation will let users see which issues legislators raised in the Assembly in each era. “For instance, it would make users access the speeches made by the first woman CM of Punjab, Rajinder Kaur Bhattal,” he said.
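To illustrate why conversion to Unicode matters for multilingual search: once text is stored as Unicode, the script of a passage can be inferred from its code points alone. The Python sketch below is only an illustration and is not the project's method, which according to Lehal works on scanned images using OCR and script recognition. The function names `script_of` and `dominant_script` are hypothetical; the code-point ranges are the standard Unicode blocks for Devanagari (Hindi), Gurmukhi (Punjabi) and Arabic (Urdu).

```python
from collections import Counter

# Standard Unicode block ranges for three of the four scripts in the archive.
# Latin (English) is handled separately via an ASCII letter check.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # Hindi
    "Gurmukhi": (0x0A00, 0x0A7F),    # Punjabi
    "Arabic": (0x0600, 0x06FF),      # Urdu
}


def script_of(ch):
    """Return the script name for one character, or None if unclassified."""
    cp = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= cp <= high:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None  # digits, spaces, punctuation, other scripts


def dominant_script(text):
    """Return the most frequent script in text, or None if none is found."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    # The word "Punjab" written in three scripts, given as escapes so the
    # example does not depend on the editor's font support.
    print(dominant_script("\u0a2a\u0a70\u0a1c\u0a3e\u0a2c"))  # Gurmukhi spelling
    print(dominant_script("\u092a\u0902\u091c\u093e\u092c"))  # Devanagari spelling
    print(dominant_script("\u067e\u0646\u062c\u0627\u0628"))  # Urdu spelling
    print(dominant_script("SYL debate, 1966"))
```

Text in legacy non-Unicode fonts defeats this kind of check, because its bytes are stored as ordinary Latin characters and only look like Punjabi or Hindi when the matching font is applied. That is one reason such text has to be converted before it can be indexed.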